intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-02-04 20:00:11 +08:00

Author	SHA1	Message	Date
Hans Wennborg	c40f0fe434	Revert "Reland "[clang] Lower modf builtin using `llvm.modf` intrinsic" (#129885 )" This broke modff calls on 32-bit x86 Windows. See comment on the PR. > This updates the existing modf[f\|l] builtin to be lowered via the > llvm.modf.* intrinsic (rather than directly to a library call). > > The legalization issues exposed by the original PR (#126750) should have > been fixed in #128055 and #129264. This reverts commit `cd1d9a8fab`.	2025-03-10 16:35:03 +01:00
Matt Arsenault	2bada417c1	StructurizeCFG: Use poison instead of undef (#130459 ) There are a surprising number of codegen changes from this.	2025-03-10 22:29:15 +07:00
Matt Arsenault	81ca350b29	AMDGPU: Rename variable from undef to poison (#130460 )	2025-03-10 22:18:36 +07:00
Chris B	39cf545756	[HLSL][Driver] Use temporary files correctly (#130436 ) This updates the DXV and Metal Converter actions to properly use temporary files created by the driver. I've abstracted away a check to determine if an action is the last in the sequence because we may have between 1 and 3 actions depending on the arguments and environment.	2025-03-10 10:13:33 -05:00
Benson Chu	3b3356043c	Revert "[ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute" This reverts commit `1f05703176`.	2025-03-10 10:11:23 -05:00
Benson Chu	1f05703176	[ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute FPSCR and FPEXC will be stored in FPStatusRegs, after GPRCS2 has been saved. - GPRCS1 - GPRCS2 - FPStatusRegs (new) - DPRCS - GPRCS3 - DPRCS2 FPSCR is present on all targets with a VFP, but the FPEXC register is not present on Cortex-M devices, so a different number of bytes is pushed onto the stack depending on the target, which would affect alignment for subsequent saves. DPRCS1 will sum up all previous bytes that were saved, and will emit extra instructions to ensure that its alignment is correct. My assumption is that if DPRCS1 is able to correct its alignment, then all subsequent saves will also have correct alignment. Avoid annotating the saving of FPSCR and FPEXC for functions marked with the interrupt_save_fp attribute, even though this is done as part of frame setup. Since these are status registers, there really is no viable way of annotating this. Since these aren't GPRs or DPRs, they can't be used with .save or .vsave directives. Instead, just record that the intermediate registers r4 and r5 are saved to the stack again. Co-authored-by: Jake Vossen <jake@vossen.dev> Co-authored-by: Alan Phipps <a-phipps@ti.com>	2025-03-10 10:05:15 -05:00
A. Jiang	83936f5437	[libc++][docs] Remove mis-added entry for P2513R4 (#130581 ) P2513R4 neither touched library wording nor required library implementation to change. So it was probably a mistake to list it in libc++'s implementation status table.	2025-03-10 22:58:56 +08:00
Timm Baeder	cf6a520a7a	[clang][bytecode] Fix builtin_memcmp buffer sizes for pointers (#130570 ) Don't use the pointer size, but the number of elements multiplied by the element size.	2025-03-10 15:51:31 +01:00
erichkeane	8a8f1359ee	[OpenACC] Implement 'bind' ast/sema for 'routine' directive The 'bind' clause allows the renaming of a function during code generation. There are a few rules about when this can/cannot happen, and it takes either a string or identifier (previously mis-implemented as ID-expression) argument. Note there are additional rules to this in the implicit-function routine case, but that isn't implemented in this patch, as implicit-function routine is not yet implemented either.	2025-03-10 07:49:13 -07:00
Thomas Preud'homme	967ab7e08e	[mlir][TOSA] Fix linalg lowering of depthwise conv2d (#130293 ) The current lowering for tosa.depthwise_conv2d assumes that if both zero points are zero then it's a floating-point operation, by hardcoding the use of an arith.addf in the lowered code. Fix the code to check the element type to decide which add operation to use.	2025-03-10 14:49:05 +00:00
Bart Chrzaszcz	8885b5c062	[mlir] Fix bazel build after `f3dcc0fe22`	2025-03-10 14:36:40 +00:00
A2uria	aaa1adc398	[LLD][COFF] Add /noexp for link.exe compatibility (#128814 ) See #107346	2025-03-10 16:26:30 +02:00
Omair Javaid	3a41c7b483	[OpenMP] Mark Failing OpenMP Tests as XFAIL on Windows (#129040 ) This patch marks specific OpenMP runtime tests as XFAIL on Windows due to failures reported in #129023	2025-03-10 19:23:10 +05:00
Nick Sarnie	919d293176	[clang][SPIR-V] Use the SPIR-V backend by default (#129545 ) The SPIR-V backend is now a supported backend, and we believe it is ready to be used by default in Clang over the SPIR-V translator. Some IR generated by Clang today, such as those requiring SPIR-V target address spaces, cannot be compiled by the translator for reasons in this [RFC](https://discourse.llvm.org/t/rfc-the-spir-v-backend-should-change-its-address-space-mappings/82640), so we expect even more programs to work as well. Enable it by default, but keep some of the code as it is still called by the HIP toolchain directly. --------- Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>	2025-03-10 14:21:18 +00:00
Simon Pilgrim	a619f3111e	[X86] combineConcatVectorOps - add missing VT/Subtarget checks for MOV*DUP concatenation folds.	2025-03-10 14:18:41 +00:00
Austin	c35092c7f1	[ARM] Fix HW thread pointer functionality (#130027 ) - Separate check for hardware support of TLS register from check for support of Thumb2 encoding - Base decision to auto enable TLS on both hardware support and Thumb2 encoding support - Fix HW support check to correctly exclude M-Profile and include ARMV6K variants Reference: https://reviews.llvm.org/D114116	2025-03-10 14:18:11 +00:00
Kadir Cetinkaya	f059e58702	Revert "[libc++] Don't try to wait on a thread that hasn't started in std::async (#125433 )" This reverts commit `11766a4097`.	2025-03-10 15:13:41 +01:00
Alexey Bataev	9d37e61fc7	[SLP]Reduce number of alternate instruction, where possible Previous version was reviewed here: https://github.com/llvm/llvm-project/pull/123360 It is mostly the same, adjusted after the graph-to-tree transformation. The patch tries to remove wide alternate operations. Currently the SLP vectorizer emits something like this:
```
%0 = add i32
%1 = sub i32
%2 = add i32
%3 = sub i32
%4 = add i32
%5 = sub i32
%6 = add i32
%7 = sub i32

is transformed to

%v1 = add <8 x i32>
%v2 = sub <8 x i32>
%res = shuffle %v1, %v2, <0, 9, 2, 11, 4, 13, 6, 15>
```
i.e. half of the results are just unused. This leads to increased register pressure and potentially doubles the number of operations. The patch introduces SplitVectorize mode, where it splits the operations by opcode and instead produces something like this:
```
%v1 = add <4 x i32>
%v2 = sub <4 x i32>
%res = shuffle %v1, %v2, <0, 4, 1, 5, 2, 6, 3, 7>
```
This improves performance by reducing the number of ops. It also turns on some other improvements, like improved graph reordering.
-O3+LTO, AVX512
Metric: size..text
Program  results  results0  diff
test-suite :: MultiSource/Benchmarks/Olden/tsp/tsp.test  2788.00  2820.00  1.1%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test  278168.00  280904.00  1.0%
test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test  82682.00  83258.00  0.7%
test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test  139344.00  139712.00  0.3%
test-suite :: MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow.test  27149.00  27197.00  0.2%
test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  1008188.00  1009948.00  0.2%
test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test  39226.00  39290.00  0.2%
test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test  39229.00  39293.00  0.2%
test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test  2074533.00  2076549.00  0.1%
test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test  2074533.00  2076549.00  0.1%
test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test  798440.00  798952.00  0.1%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test  44123.00  44139.00  0.0%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test  318942.00  319038.00  0.0%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  1159880.00  1160152.00  0.0%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test  73595.00  73611.00  0.0%
test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  1146124.00  1146348.00  0.0%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test  203831.00  203847.00  0.0%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test  207662.00  207678.00  0.0%
test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test  589851.00  589883.00  0.0%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1398543.00  1398559.00  0.0%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1398543.00  1398559.00  0.0%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2050990.00  2051006.00  0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test  12559687.00  12559591.00  -0.0%
test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test  3074157.00  3074125.00  -0.0%
test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  1092252.00  1092188.00  -0.0%
test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test  779763.00  779715.00  -0.0%
test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test  253517.00  253485.00  -0.0%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test  848259.00  848035.00  -0.0%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test  93064.00  93016.00  -0.1%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  383747.00  383475.00  -0.1%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  673051.00  662907.00  -1.5%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  673051.00  662907.00  -1.5%

Olden/tsp - small variations
Prolangs-C/TimberWolfMC - small variations, some code not inlined
FreeBench/pifft - extra store <8 x double> vectorized, some other extra vectorizations
CFP2006/433.milc - better vector code
FreeBench/fourinarow - better vector code
Benchmarks/tramp3d-v4 - extra vector code, small variations
mediabench/gsm/toast - small variations
MiBench/telecomm-gsm - small variations
CINT2017rate/500.perlbench_r CINT2017speed/600.perlbench_s - better vector code, small variations
CINT2006/464.h264ref - some smaller code + changes similar to x264
DOE-ProxyApps-C/miniGMG - small variations
Benchmarks/Bullet - small variations
CFP2017rate/511.povray_r - small variations
DOE-ProxyApps-C/miniAMR - small variations
CFP2006/453.povray - small variations
DOE-ProxyApps-C++/CLAMR - small variations
MiBench/consumer-lame - small variations
CFP2006/447.dealII - small variations
CFP2017rate/538.imagick_r CFP2017speed/638.imagick_s - small variations
CFP2017rate/510.parest_r - better vector code, small variations
CFP2017rate/526.blender_r - small variations
CINT2006/403.gcc - small variations
CINT2006/400.perlbench - small variations
CFP2017rate/508.namd_r - small variations
ASCI_Purple/SMG2000 - small variations
JM/lencod - extra store <16 x i32>, small variations
DOE-ProxyApps-C++/miniFE - small variations
JM/ldecod - extra vector code, small variations, less shuffles
CINT2017speed/625.x264_s CINT2017rate/525.x264_r - the number of instructions increased, but looks like they are more performant. E.g., for function x264_pixel_satd_8x8, llvm-mca reports better throughput - 84 for the current version and 59 for the new version.

-O3+LTO, mcpu=sifive-p470
Metric: size..text
Program  results  results0  diff
test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test  580768.00  581118.00  0.1%
test-suite :: MultiSource/Applications/d/make_dparser.test  78854.00  78894.00  0.1%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test  633448.00  633750.00  0.0%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test  277002.00  277080.00  0.0%
test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  931938.00  931960.00  0.0%
test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test  2512806.00  2512822.00  0.0%
test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test  7659880.00  7659876.00  -0.0%
test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test  7659880.00  7659876.00  -0.0%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  1602448.00  1602434.00  -0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test  9496664.00  9496542.00  -0.0%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test  147424.00  147422.00  -0.0%
test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test  1764608.00  1764578.00  -0.0%
test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test  1764608.00  1764578.00  -0.0%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  841656.00  841632.00  -0.0%
test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  949026.00  948962.00  -0.0%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  946348.00  946284.00  -0.0%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  279794.00  279764.00  -0.0%
test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test  4776.00  4772.00  -0.1%
test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test  25074.00  25028.00  -0.2%
test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test  25074.00  25028.00  -0.2%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test  29336.00  29184.00  -0.5%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  535390.00  510124.00  -4.7%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  535390.00  510124.00  -4.7%
test-suite :: SingleSource/Regression/C/gcc-c-torture/execute/ieee/GCC-C-execute-ieee-pr50310.test  886.00  608.00  -31.4%

CINT2006/464.h264ref - extra v16i32 reduction
d/make_dparser - better vector code
JM/lencod - extra v16i32 reduction
Benchmarks/Bullet - smaller vector code
CINT2006/400.perlbench - better vector code
CINT2006/403.gcc - small variations
CINT2017speed/602.gcc_s CINT2017rate/502.gcc_r - small variations
CFP2017rate/510.parest_r - small variations
CFP2017rate/526.blender_r - small variations
MiBench/consumer-lame - small variations
CINT2017speed/600.perlbench_s CINT2017rate/500.perlbench_r - small variations
Benchmarks/7zip - small variations
CFP2017rate/511.povray_r - small variations
JM/ldecod - extra vector code
mediabench/g721/g721encode - extra vector code
mediabench/gsm - extra vector code
MiBench/telecomm-gsm - extra vector code
DOE-ProxyApps-C/miniGMG - extra vector code
CINT2017rate/525.x264_r CINT2017speed/625.x264_s - reduced number of wide operations and shuffles, saving the registers, similar to X86, extra code in pixel_hadamard_ac vectorized
ieee/GCC-C-execute-ieee-pr50310 - extra code vectorized

CINT2006/464.h264ref - extra vector code in find_sad_16x16
JM/lencod - extra vector code in find_sad_16x16
d/make_dparser - smaller vector code
Benchmarks/Bullet - small variations
CINT2006/400.perlbench - smaller vector code
CFP2017rate/526.blender_r - small variations, extra store <8 x float> in the loop, extra store <8 x i8> in loop
CINT2017rate/500.perlbench_r CINT2017speed/600.perlbench_s - small variations
MiBench/consumer-lame - small variations
JM/ldecod - extra vector code
mediabench/g721/g721encode - small variations

Reviewers: hiraditya
Reviewed By: hiraditya
Pull Request: https://github.com/llvm/llvm-project/pull/128907	2025-03-10 10:06:39 -04:00
David Spickett	cac6d431d0	[clang][test] Don't require specific alignment in test case (#130589 ) https://github.com/llvm/llvm-project/pull/129952 / `42d49a7724` added this test, which is failing on 32-bit ARM because the alignment chosen is 4, not 8. That would make sense if this is a 32/64-bit difference: https://lab.llvm.org/buildbot/#/builders/154/builds/13059
```
<stdin>:34:30: note: scanning from here
define dso_local void @_Z1fv(ptr dead_on_unwind noalias writable sret(%struct.B) align 4 %agg.result) #0 {
 ^
<stdin>:38:2: note: possible intended match here
 %0 = load ptr, ptr @x, align 4
 ^
```
The other test does not check alignment, so I'm assuming that it is not important here.	2025-03-10 13:45:44 +00:00
Simon Pilgrim	6b3bb44227	[X86] combineConcatVectorOps - convert PSHUFB/PSADBW/VPMADDUBSW/VPMADDUBSW concatenation to use combineConcatVectorOps recursion (#130592 ) Only concatenate nodes if at least one operand is beneficial to concatenate	2025-03-10 13:36:58 +00:00
LLVM GN Syncbot	754eeeaef1	[gn build] Port `0d2c55cb96`	2025-03-10 13:23:12 +00:00
Alex Bradbury	dffbc030e7	[MachineCopyPropagation] Recognise and delete no-op moves produced after forwarded uses (#129889 ) This change removes 189 static instances of no-op reg-reg moves (i.e. where src == dest) across llvm-test-suite when compiled for RISC-V rv64gc and with SPEC included.	2025-03-10 13:22:59 +00:00
Krzysztof Parzyszek	4e453d5292	[flang][OpenMP] Accept old FLUSH syntax in METADIRECTIVE (#130122 ) Accommodate it in OmpDirectiveSpecification, which may become the primary component of the actual FLUSH construct in the future.	2025-03-10 08:12:46 -05:00
Krzysztof Parzyszek	d67947162f	[flang][OpenMP] Implement HAS_DEVICE_ADDR clause (#128568 ) The HAS_DEVICE_ADDR clause indicates that the listed object(s) exist at an address that is a valid device address. Specifically, `has_device_addr(x)` means that (in C/C++ terms) `&x` is a device address. When entering a target region, `x` does not need to be allocated on the device, or have its contents copied over (in the absence of additional mapping clauses). Passing its address verbatim to the region for use is sufficient, and is the intended goal of the clause. Some Fortran objects use descriptors in their in-memory representation. If `x` had a descriptor, both the descriptor and the contents of `x` would be located in the device memory. However, the descriptors are managed by the compiler, and can be regenerated at various points as needed. The address of the effective descriptor may change, hence it's not safe to pass the address of the descriptor to the target region. Instead, the descriptor itself is always copied, but for objects like `x`, no further mapping takes place (as this keeps the storage pointer in the descriptor unchanged). --------- Co-authored-by: Sergio Afonso <safonsof@amd.com>	2025-03-10 08:11:01 -05:00
Yutong Zhu	773e88f9d6	[Clang] Force expressions with UO_Not to not be non-negative (#126846 ) This PR addresses the bug of not throwing warnings for the following code:
```c++
int test13(unsigned a, int b) {
  return a > ~(95 != b); // expected-warning {{comparison of integers of different signs}}
}
```
However, in the original issue, a comment mentioned that negation, pre-increment, and pre-decrement operators are also incorrect in this case. Fixes #18878	2025-03-10 14:08:50 +01:00
Matheus Izvekov	a1a6a83976	[clang] fix matching of nested template template parameters (#130447 ) When checking the template template parameters of template template parameters, the PartialOrdering context was not correctly propagated. This also has a few drive-by fixes, such as checking the template parameter lists of template template parameters, which was previously missing and would have been its own bug, but we need to fix it in order to prevent crashes in error recovery in a simple way. Fixes #130362	2025-03-10 10:08:43 -03:00
Phoebe Wang	507e0c3b67	[X86][APX] Try to replace non-NF with NF instructions when optimizeCompareInstr (#130488 ) https://godbolt.org/z/rWYdqnjjx	2025-03-10 21:08:01 +08:00
Simon Pilgrim	5d921710c0	[X86] combineConcatVectorOps - convert X86ISD::HADD/SUB concatenation to use combineConcatVectorOps recursion (#130579 ) Only concatenate X86ISD::HADD/SUB nodes if at least one operand is beneficial to concatenate	2025-03-10 13:03:27 +00:00
Hans Wennborg	28fa1fcf55	Revert "[clang] Fix missing diagnostic of declaration use when accessing TypeDecls through typename access (#129681 )" This caused incorrect -Wunguarded-availability warnings. See comment on the pull request. > We were missing a call to DiagnoseUseOfDecl when performing typename > access. > > This refactors the code so that TypeDecl lookups funnel through a helper > which performs all the necessary checks, removing some related > duplication on the way. > > Fixes #58547 > > Differential Revision: https://reviews.llvm.org/D136533 This reverts commit `4c4fd6b031`.	2025-03-10 14:02:04 +01:00
sommersun	848ba3854c	[DAG] fold AVGFLOORS to AVGFLOORU for non-negative operand (#84746 ) (#129678 ) Fold ISD::AVGFLOORS to ISD::AVGFLOORU for non-negative operand. Cover test is modified for uhadd with zero extension. Fixes #84746	2025-03-10 13:01:08 +00:00
SivanShani-Arm	a5c33e634b	[AArch64][ELF Parser] Fix out-of-scope variable usage (#130594 ) Return a reference to a persistent variable instead of a temporary copy.	2025-03-10 12:56:46 +00:00
Alex Bradbury	7fb71d1511	[RISCV][test] Add test case showing case where machine copy propagation leaves behind a no-op reg move Pre-commit for #129889.	2025-03-10 12:53:55 +00:00
Matt Arsenault	0d2c55cb96	AMDGPU: Move enqueued block handling into clang (#128519 ) The previous implementation wasn't maintaining a faithful IR representation of how this really works. The value returned by createEnqueuedBlockKernel wasn't actually used as a function, and hacked up later to be a pointer to the runtime handle global variable. In reality, the enqueued block is a struct where the first field is a pointer to the kernel descriptor, not the kernel itself. We were also relying on passing around a reference to a global using a string attribute containing its name. It's better to base this on a proper IR symbol reference during final emission. This now avoids using a function attribute on kernels and avoids using the additional "runtime-handle" attribute to populate the final metadata. Instead, associate the runtime handle reference to the kernel with the !associated global metadata. We can then get a final, correctly mangled name at the end. I couldn't figure out how to get rename-with-external-symbol behavior using a combination of comdats and aliases, so leaves an IR pass to externalize the runtime handles for codegen. If anything breaks, it's most likely this, so leave avoiding this for a later step. Use a special section name to enable this behavior. This also means it's possible to declare enqueuable kernels in source without going through the dedicated block syntax or other dedicated compiler support. We could move towards initializing the runtime handle in the compiler/linker. I have a working patch where the linker sets up the first field of the handle, avoiding the need to export the block kernel symbol for the runtime. We would need new relocations to get the private and group sizes, but that would avoid the runtime's special case handling that requires the device_enqueue_symbol metadata field. https://reviews.llvm.org/D141700	2025-03-10 19:54:04 +07:00
Matheus Izvekov	dbd82f33b5	[clang] NNS: don't print trailing scope resolution operator in diagnostics (#130529 ) This clears up the printing of a NestedNameSpecifier so a trailing '::' is not printed, unless it refers into the global scope. This fixes a bunch of diagnostics where the trailing :: was awkward. This also prints the NNS quoted consistently. There is a drive-by improvement to error recovery, where now we print the actual type instead of `<dependent type>`. This will clear up further uses of NNS printing in further patches.	2025-03-10 09:37:38 -03:00
Paul Walker	4fee3981a8	[LLVM][SVE] Add isel for scalable vector bfloat copysign operations. (#130098 )	2025-03-10 12:34:42 +00:00
SivanShani-Arm	4508d6aa72	[AArch64][ELF Parser] Fix out-of-scope variable usage (#130576 ) Return a reference to a persistent variable instead of a temporary copy.	2025-03-10 12:25:40 +00:00
Simon Pilgrim	88f010a2eb	[X86] Improve test coverage for concat(pmaddubsw(),pmaddubsw()) -> pmaddubsw(concat(),concat()) Ensure we have tests for both beneficial/non-beneficial concatenation cases	2025-03-10 12:22:15 +00:00
Simon Pilgrim	94a62b3085	[X86] Add test case showing it's not always beneficial to fold concat(pshufb(),pshufb()) -> pshufb(concat(),concat())	2025-03-10 12:09:32 +00:00
Akshat Oke	52225d2702	[AMDGPU][NewPM] Port AMDGPUReserveWWMRegs to NPM (#123722 )	2025-03-10 17:36:35 +05:30
Robert Dazi	fdc8e5ab62	[Clang][AArch64] Fix typo with colon-separated syntax for system registers (#105608 ) The range for Op0 was set to 1 instead of 3. The description of `e493f177ee` visually explains the encoding of implementation-defined system registers. `796787d07c/llvm/lib/Target/AArch64/AArch64SystemOperands.td (L658-L674)` Godbolt: https://godbolt.org/z/WK9PqPvGE Co-authored-by: v01dxyz <v01dxyz@v01d.xyz>	2025-03-10 12:05:41 +00:00
John Harrison	502385c241	[lldb] Remove an extraneous `printf` statement. (#130453 ) This was missed in review but is showing up in lldb-dap output.	2025-03-10 11:38:03 +00:00
Michael Buch	6a9df5b4dd	Revert "[lldb][asan] Add temporary logging to ReportRetriever" This reverts commit `39a4da20d8`. We skipped the failing tests in `6cc8b0bef07f4270303bec0fc203f251a2fde262`.	2025-03-10 11:37:13 +00:00
DianQK	dd21aacd76	[TailDuplicator] Do not restrict the computed gotos (#114990 ) Fixes #106846. This is what I learned from GCC. I found that GCC does not duplicate the BB that has indirect jumps with the jump table. I believe GCC has provided a clear explanation here: > Duplicate the blocks containing computed gotos. This basically unfactors computed gotos that were factored early on in the compilation process to speed up edge based data flow. We used to not unfactor them again, which can seriously pessimize code with many computed jumps in the source code, such as interpreters.	2025-03-10 19:34:07 +08:00
Kajetan Puchalski	0c7e895de3	[flang] Move parser invocations into ParserActions (#130309 ) FrontendActions.cpp is currently one of the biggest compilation units in all of flang. Measuring its compilation gives the following metrics: User time (seconds): 139.21 System time (seconds): 4.65 Maximum resident set size (kbytes): 5891440 (5.61 GB) This commit separates out explicit invocations of the parser into a separate compilation unit - ParserActions.cpp - through helper functions in order to decrease the maximum compilation time and memory usage of a single unit. After the split, the measurements of FrontendActions.cpp are as follows: User time (seconds): 70.08 System time (seconds): 3.16 Maximum resident set size (kbytes): 3961492 (3.7 GB) The measurements for the newly created ParserActions.cpp are as follows: User time (seconds): 104.33 System time (seconds): 3.37 Maximum resident set size (kbytes): 4185600 (3.99 GB) --------- Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>	2025-03-10 11:33:47 +00:00
Simon Pilgrim	99fdb5dfd8	[X86] combineConcatVectorOps - convert X86ISD::PALIGNR concatenation to use combineConcatVectorOps recursion (#130572 ) Only concatenate X86ISD::PALIGNR nodes if at least one operand is beneficial to concatenate	2025-03-10 11:32:15 +00:00
gbMattN	0ede277b62	Spelling in lit.cfg.py	2025-03-10 11:27:23 +00:00
Florian Hahn	dd86ece554	[MergeFunc] Check full IR and comdat keys in comdat.ll.	2025-03-10 11:11:03 +00:00
Marc Auberer	8d38906d08	[IR] Fix assertion error in User new/delete edge case (#129914 ) Fixes #129900 If `operator delete` was called after an unsuccessful constructor call after `operator new`, we ran into undefined behaviour. This was discovered by our malfunction tests, which explicitly check for this kind of bug, while preparing an upgrade to LLVM 20.	2025-03-10 11:53:45 +01:00
Artemiy Bulavin	f3dcc0fe22	[mlir] Refactor ConvertVectorToLLVMPass options (#128219 ) The `VectorTransformsOptions` on the `ConvertVectorToLLVMPass` is currently represented as a struct, which makes it not serialisable. This means a pass pipeline that contains this pass cannot be represented as textual form, which breaks reproducer generation and options such as `--dump-pass-pipeline`. This PR expands the `VectorTransformsOptions` struct into the two options that are actually used by the Pass' patterns: `vector-contract-lowering` and `vector-transpose-lowering` . The other options present in VectorTransformOptions are not used by any patterns in this pass. Additionally, I have changed some interfaces to only take these specific options over the full options struct as, again, the vector contract and transpose lowering patterns only need one of their respective options. Finally, I have added a simple lit test that just prints the pass pipeline using `--dump-pass-pipeline` to ensure the options on this pass remain serialisable. Fixes #129046	2025-03-10 10:32:03 +00:00
Simon Pilgrim	8c8eff2b35	[X86] Add test case showing it's not always beneficial to fold concat(pack(),pack()) -> pack(concat(),concat())	2025-03-10 10:23:30 +00:00
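
The AVGFLOORS to AVGFLOORU fold listed above (`848ba3854c`) can be illustrated with a small model. This is only a sketch, not LLVM's implementation: `avgFloorS`, `avgFloorU`, `sameBits`, and `foldHoldsForAllNonNegative` are hypothetical 8-bit helpers, and the sketch assumes the fold applies when both operands are non-negative. It checks exhaustively that the signed and unsigned floor averages produce the same bit pattern in that case, and shows that they can differ once an operand is negative.

```cpp
#include <cstdint>

// Hypothetical 8-bit model of a signed floor average: floor((a + b) / 2),
// computed in a wider type so the sum cannot overflow.
inline std::int8_t avgFloorS(std::int8_t a, std::int8_t b) {
  int sum = static_cast<int>(a) + static_cast<int>(b);
  int half = sum / 2;                 // integer division truncates toward zero
  if (sum < 0 && sum % 2 != 0)
    --half;                           // adjust truncation to floor for negative odd sums
  return static_cast<std::int8_t>(half);
}

// Hypothetical 8-bit model of an unsigned floor average.
inline std::uint8_t avgFloorU(std::uint8_t a, std::uint8_t b) {
  unsigned sum = static_cast<unsigned>(a) + static_cast<unsigned>(b);
  return static_cast<std::uint8_t>(sum / 2);
}

// True if both averages yield the same 8-bit pattern for these operands.
inline bool sameBits(std::int8_t a, std::int8_t b) {
  return static_cast<std::uint8_t>(avgFloorS(a, b)) ==
         avgFloorU(static_cast<std::uint8_t>(a), static_cast<std::uint8_t>(b));
}

// Exhaustively checks every pair of non-negative 8-bit operands (0..127).
inline bool foldHoldsForAllNonNegative() {
  for (int a = 0; a <= 127; ++a)
    for (int b = 0; b <= 127; ++b)
      if (!sameBits(static_cast<std::int8_t>(a), static_cast<std::int8_t>(b)))
        return false;
  return true;
}
```

In this model, `foldHoldsForAllNonNegative()` returns true, while `sameBits(-2, 0)` returns false: the signed average is -1 (bit pattern 0xFF), whereas the unsigned average of 0xFE and 0 is 0x7F. That difference is why the rewrite needs the non-negativity condition.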

530055 Commits