Commit Graph

530055 Commits

Author SHA1 Message Date
Hans Wennborg
c40f0fe434 Revert "Reland "[clang] Lower modf builtin using llvm.modf intrinsic" (#129885)"
This broke modff calls on 32-bit x86 Windows. See comment on the PR.

> This updates the existing modf[f|l] builtin to be lowered via the
> llvm.modf.* intrinsic (rather than directly to a library call).
>
> The legalization issues exposed by the original PR (#126750) should have
> been fixed in #128055 and #129264.

This reverts commit cd1d9a8fab.
2025-03-10 16:35:03 +01:00
Matt Arsenault
2bada417c1 StructurizeCFG: Use poison instead of undef (#130459)
There are a surprising number of codegen changes from this.
2025-03-10 22:29:15 +07:00
Matt Arsenault
81ca350b29 AMDGPU: Rename variable from undef to poison (#130460) 2025-03-10 22:18:36 +07:00
Chris B
39cf545756 [HLSL][Driver] Use temporary files correctly (#130436)
This updates the DXV and Metal Converter actions to properly use
temporary files created by the driver. I've abstracted away a check to
determine if an action is the last in the sequence because we may have
between 1 and 3 actions depending on the arguments and environment.
2025-03-10 10:13:33 -05:00
Benson Chu
3b3356043c Revert "[ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute"
This reverts commit 1f05703176.
2025-03-10 10:11:23 -05:00
Benson Chu
1f05703176 [ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute
FPSCR and FPEXC will be stored in FPStatusRegs, after GPRCS2 has been
saved.

- GPRCS1
- GPRCS2
- FPStatusRegs (new)
- DPRCS
- GPRCS3
- DPRCS2

FPSCR is present on all targets with a VFP, but the FPEXC register is
not present on Cortex-M devices, so different amounts of bytes are
being pushed onto the stack depending on our target, which would
affect alignment for subsequent saves.

DPRCS1 will sum up all previous bytes that were saved, and will emit
extra instructions to ensure that its alignment is correct. My
assumption is that if DPRCS1 is able to correct its alignment to be
correct, then all subsequent saves will also have correct alignment.

Avoid annotating the saving of FPSCR and FPEXC for functions marked
with the interrupt_save_fp attribute, even though this is done as part
of frame setup.  Since these are status registers, there really is no
viable way of annotating this. Since these aren't GPRs or DPRs, they
can't be used with .save or .vsave directives. Instead, just record
that the intermediate registers r4 and r5 are saved to the stack
again.

Co-authored-by: Jake Vossen <jake@vossen.dev>
Co-authored-by: Alan Phipps <a-phipps@ti.com>
2025-03-10 10:05:15 -05:00
A. Jiang
83936f5437 [libc++][docs] Remove mis-added entry for P2513R4 (#130581)
P2513R4 neither touched library wording nor required library
implementation to change. So it was probably a mistake to list it in
libc++'s implementation status table.
2025-03-10 22:58:56 +08:00
Timm Baeder
cf6a520a7a [clang][bytecode] Fix builtin_memcmp buffer sizes for pointers (#130570)
Don't use the pointer size, but the number of elements multiplied by the
element size.
2025-03-10 15:51:31 +01:00
erichkeane
8a8f1359ee [OpenACC] Implement 'bind' ast/sema for 'routine' directive
The 'bind' clause allows the renaming of a function during code
generation.  There are a few rules about when this can/cannot happen,
and it takes either a string or identifier (previously mis-implemetned
as ID-expression) argument.

Note there are additional rules to this in the implicit-function routine
case, but that isn't implemented in this patch, as implicit-function
routine is not yet implemented either.
2025-03-10 07:49:13 -07:00
Thomas Preud'homme
967ab7e08e [mlir][TOSA] Fix linalg lowering of depthwise conv2d (#130293)
Current lowering for tosa.depthwise_conv2d assumes if both zero points
are zero then it's a floating-point operation by hardcoding the use of a
arith.addf in the lowered code. Fix code to check for the element type
to decide what add operation to use.
2025-03-10 14:49:05 +00:00
Bart Chrzaszcz
8885b5c062 [mlir] Fix bazel build after f3dcc0fe22 2025-03-10 14:36:40 +00:00
A2uria
aaa1adc398 [LLD][COFF] Add /noexp for link.exe compatibility (#128814)
See #107346
2025-03-10 16:26:30 +02:00
Omair Javaid
3a41c7b483 [OpenMP] Mark Failing OpenMP Tests as XFAIL on Windows (#129040)
This patch marks specific OpenMP runtime tests as XFAIL on Windows due
to failures reported in #129023
2025-03-10 19:23:10 +05:00
Nick Sarnie
919d293176 [clang][SPIR-V] Use the SPIR-V backend by default (#129545)
The SPIR-V backend is now a supported backend, and we believe it is
ready to be used by default in Clang over the SPIR-V translator.

Some IR generated by Clang today, such as those requiring SPIR-V target
address spaces, cannot be compiled by the translator for reasons in this
[RFC](https://discourse.llvm.org/t/rfc-the-spir-v-backend-should-change-its-address-space-mappings/82640),
so we expect even more programs to work as well.

Enable it by default, but keep some of the code as it is still called by
the HIP toolchain directly.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2025-03-10 14:21:18 +00:00
Simon Pilgrim
a619f3111e [X86] combineConcatVectorOps - add missing VT/Subtarget checks for MOV*DUP concatenation folds. 2025-03-10 14:18:41 +00:00
Austin
c35092c7f1 [ARM] Fix HW thread pointer functionality (#130027)
- Separate check for hardware support of TLS register from check for support of Thumb2 encoding
- Base decision to auto enable TLS on both hardware support and Thumb2 encoding support
- Fix HW support check to correctly exclude M-Profile and include ARMV6K variants

reference:https://reviews.llvm.org/D114116
2025-03-10 14:18:11 +00:00
Kadir Cetinkaya
f059e58702 Revert "[libc++] Don't try to wait on a thread that hasn't started in std::async (#125433)"
This reverts commit 11766a4097.
2025-03-10 15:13:41 +01:00
Alexey Bataev
9d37e61fc7 [SLP]Reduce number of alternate instruction, where possible
Previous version was reviewed here https://github.com/llvm/llvm-project/pull/123360
It is mostly the same, adjusted after graph-to-tree transformation

Patch tries to remove wide alternate operations.
Currently SLP vectorizer emits something like this:
```
%0 = add i32
%1 = sub i32
%2 = add i32
%3 = sub i32
%4 = add i32
%5 = sub i32
%6 = add i32
%7 = sub i32

transformes to

%v1 = add <8 x i32>
%v2 = sub <8 x i32>
%res = shuffle %v1, %v2, <0, 9, 2, 11, 4, 13, 6, 15>
```
i.e. half of the results are just unused. This leads to increased
register pressure and potentially doubles number of operations.

Patch introduces SplitVectorize mode, where it splits the operations by
opcodes and produces instead something like this:
```
%v1 = add <4 x i32>
%v2 = sub <4 x i32>
%res = shuffle %v1, %v2, <0, 4, 1, 5, 2, 6, 3, 7>
```
It allows to improve the performance by reducing number of ops. Also, it
turns on some other improvements, like improved graph reordering.

-O3+LTO, AVX512
Metric: size..text
Program                                                                         size..text
                                                                                results     results0    diff
                       test-suite :: MultiSource/Benchmarks/Olden/tsp/tsp.test     2788.00     2820.00  1.1%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test   278168.00   280904.00  1.0%
               test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test    82682.00    83258.00  0.7%
                    test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test   139344.00   139712.00  0.3%
     test-suite :: MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow.test    27149.00    27197.00  0.2%
               test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  1008188.00  1009948.00  0.2%
          test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test    39226.00    39290.00  0.2%
   test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test    39229.00    39293.00  0.2%
 test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test  2074533.00  2076549.00  0.1%
test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test  2074533.00  2076549.00  0.1%
             test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test   798440.00   798952.00  0.1%
     test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test    44123.00    44139.00  0.0%
                       test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   318942.00   319038.00  0.0%
        test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  1159880.00  1160152.00  0.0%
     test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test    73595.00    73611.00  0.0%
                test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  1146124.00  1146348.00  0.0%
       test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test   203831.00   203847.00  0.0%
 test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test   207662.00   207678.00  0.0%
                test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test   589851.00   589883.00  0.0%
      test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1398543.00  1398559.00  0.0%
     test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1398543.00  1398559.00  0.0%
        test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2050990.00  2051006.00  0.0%

      test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12559687.00 12559591.00 -0.0%
                     test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test  3074157.00  3074125.00 -0.0%
         test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  1092252.00  1092188.00 -0.0%
            test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test   779763.00   779715.00 -0.0%
         test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test   253517.00   253485.00 -0.0%
                  test-suite :: MultiSource/Applications/JM/lencod/lencod.test   848259.00   848035.00 -0.0%
     test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test    93064.00    93016.00 -0.1%
                  test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   383747.00   383475.00 -0.1%
          test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   673051.00   662907.00 -1.5%
           test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   673051.00   662907.00 -1.5%

Olden/tsp - small variations
Prolangs-C/TimberWolfMC - small variations, some code not inlined
FreeBench/pifft - extra store <8 x double> vectorized, some other extra
vectorizations
CFP2006/433.milc - better vector code
FreeBench/fourinarow - better vector code
Benchmarks/tramp3d-v4 - extra vector code, small variations
mediabench/gsm/toast - small variations
MiBench/telecomm-gsm - small variations
CINT2017rate/500.perlbench_r
CINT2017speed/600.perlbench_s - better vector code, small variations
CINT2006/464.h264ref - some smaller code + changes similar to x264
DOE-ProxyApps-C/miniGMG - small variations
Benchmarks/Bullet - small variations
CFP2017rate/511.povray_r - small variations
DOE-ProxyApps-C/miniAMR - small variations
CFP2006/453.povray - small variations
DOE-ProxyApps-C++/CLAMR - small variations
MiBench/consumer-lame - small variations
CFP2006/447.dealII - small variations
CFP2017rate/538.imagick_r
CFP2017speed/638.imagick_s - small variations
CFP2017rate/510.parest_r - better vector code, small variations
CFP2017rate/526.blender_r - small variations
CINT2006/403.gcc - small variations
CINT2006/400.perlbench - small variations
CFP2017rate/508.namd_r - small variations
ASCI_Purple/SMG2000 - small variations
JM/lencod - extra store <16 x i32>, small variations
DOE-ProxyApps-C++/miniFE - small variations
JM/ldecod - extra vector code, small variations, less shuffles
CINT2017speed/625.x264_s
CINT2017rate/525.x264_r - the number of instructions increased, but
looks like they are more performant. E.g., for function
x264_pixel_satd_8x8, llvm-mca reports better throughput - 84 for the
current version and 59 for the new version.

-O3+LTO, mcpu=sifive-p470

Metric: size..text

                                                                               results    results0   diff
                                 test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test  580768.00  581118.00   0.1%
                                        test-suite :: MultiSource/Applications/d/make_dparser.test   78854.00   78894.00   0.1%
                                      test-suite :: MultiSource/Applications/JM/lencod/lencod.test  633448.00  633750.00   0.0%
                                           test-suite :: MultiSource/Benchmarks/Bullet/bullet.test  277002.00  277080.00   0.0%
                             test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  931938.00  931960.00   0.0%
                                         test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test 2512806.00 2512822.00   0.0%
                                test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 7659880.00 7659876.00  -0.0%
                                 test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 7659880.00 7659876.00  -0.0%
                            test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 1602448.00 1602434.00  -0.0%
                          test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 9496664.00 9496542.00  -0.0%
                     test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test  147424.00  147422.00  -0.0%
                    test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test 1764608.00 1764578.00  -0.0%
                     test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test 1764608.00 1764578.00  -0.0%
                                     test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  841656.00  841632.00  -0.0%
                                    test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  949026.00  948962.00  -0.0%
                            test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  946348.00  946284.00  -0.0%
                                      test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  279794.00  279764.00  -0.0%
                       test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test    4776.00    4772.00  -0.1%
                              test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test   25074.00   25028.00  -0.2%
                       test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test   25074.00   25028.00  -0.2%
                         test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test   29336.00   29184.00  -0.5%
                               test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  535390.00  510124.00  -4.7%
                              test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  535390.00  510124.00  -4.7%
test-suite :: SingleSource/Regression/C/gcc-c-torture/execute/ieee/GCC-C-execute-ieee-pr50310.test     886.00     608.00 -31.4%

CINT2006/464.h264ref - extra v16i32 reduction
d/make_dparser - better vector code
JM/lencod - extra v16i32 reduction
Benchmarks/Bullet - smaller vector code
CINT2006/400.perlbench - better vector code
CINT2006/403.gcc - small variations
CINT2017speed/602.gcc_s
CINT2017rate/502.gcc_r - small variations
CFP2017rate/510.parest_r - small variations
CFP2017rate/526.blender_r - small variations
MiBench/consumer-lame - small variations
CINT2017speed/600.perlbench_s
CINT2017rate/500.perlbench_r - small variations
Benchmarks/7zip - small variations
CFP2017rate/511.povray_r - small variations
JM/ldecod - extra vector code
mediabench/g721/g721encode - extra vector code
mediabench/gsm - extra vector code
MiBench/telecomm-gsm - extra vector code
DOE-ProxyApps-C/miniGMG - extra vector code
CINT2017rate/525.x264_r
CINT2017speed/625.x264_s - reduced number of wide operations and
shuffles, saving the registers, similar to X86, extra code in
pixel_hadamard_ac vectorized
ieee/GCC-C-execute-ieee-pr50310 - extra code vectorized

CINT2006/464.h264ref - extra vector code in find_sad_16x16
JM/lencod - extra vector code in find_sad_16x16
d/make_dparser - smaller vector code
Benchmarks/Bullet - small variations
CINT2006/400.perlbench - smaller vector code
CFP2017rate/526.blender_r - small variations, extra store <8 x float> in
the loop, extra store <8 x i8> in loop
CINT2017rate/500.perlbench_r
CINT2017speed/600.perlbench_s - small variations
MiBench/consumer-lame - small variations
JM/ldecod - extra vector code
mediabench/g721/g721encode - small variations

Reviewers: hiraditya

Reviewed By: hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/128907
2025-03-10 10:06:39 -04:00
David Spickett
cac6d431d0 [clang][test] Don't require specific alignment in test case (#130589)
https://github.com/llvm/llvm-project/pull/129952 /
42d49a7724 added this test which is
failing on 32-bit ARM because the alignment chosen is 4 not 8. Which
would make sense if this is a 32/64 bit difference

https://lab.llvm.org/buildbot/#/builders/154/builds/13059
```
<stdin>:34:30: note: scanning from here
define dso_local void @_Z1fv(ptr dead_on_unwind noalias writable sret(%struct.B) align 4 %agg.result) #0 {
                             ^
<stdin>:38:2: note: possible intended match here
 %0 = load ptr, ptr @x, align 4
 ^
```
The other test does not check alignment, so I'm assuming that it is not
important here.
2025-03-10 13:45:44 +00:00
Simon Pilgrim
6b3bb44227 [X86] combineConcatVectorOps - convert PSHUFB/PSADBW/VPMADDUBSW/VPMADDUBSW concatenation to use combineConcatVectorOps recursion (#130592)
Only concatenate nodes if at least one operand is beneficial to concatenate
2025-03-10 13:36:58 +00:00
LLVM GN Syncbot
754eeeaef1 [gn build] Port 0d2c55cb96 2025-03-10 13:23:12 +00:00
Alex Bradbury
dffbc030e7 [MachineCopyPropagation] Recognise and delete no-op moves produced after forwarded uses (#129889)
This change removes 189 static instances of no-op reg-reg moves (i.e.
where src == dest) across llvm-test-suite when compiled for RISC-V
rv64gc and with SPEC included.
2025-03-10 13:22:59 +00:00
Krzysztof Parzyszek
4e453d5292 [flang][OpenMP] Accept old FLUSH syntax in METADIRECTIVE (#130122)
Accommodate it in OmpDirectiveSpecification, which may become the
primary component of the actual FLUSH construct in the future.
2025-03-10 08:12:46 -05:00
Krzysztof Parzyszek
d67947162f [flang][OpenMP] Implement HAS_DEVICE_ADDR clause (#128568)
The HAS_DEVICE_ADDR indicates that the object(s) listed exists at an
address that is a valid device address. Specifically,
`has_device_addr(x)` means that (in C/C++ terms) `&x` is a device
address.

When entering a target region, `x` does not need to be allocated on the
device, or have its contents copied over (in the absence of additional
mapping clauses). Passing its address verbatim to the region for use is
sufficient, and is the intended goal of the clause.

Some Fortran objects use descriptors in their in-memory representation.
If `x` had a descriptor, both the descriptor and the contents of `x`
would be located in the device memory. However, the descriptors are
managed by the compiler, and can be regenerated at various points as
needed. The address of the effective descriptor may change, hence it's
not safe to pass the address of the descriptor to the target region.
Instead, the descriptor itself is always copied, but for objects like
`x`, no further mapping takes place (as this keeps the storage pointer
in the descriptor unchanged).

---------

Co-authored-by: Sergio Afonso <safonsof@amd.com>
2025-03-10 08:11:01 -05:00
Yutong Zhu
773e88f9d6 [Clang] Force expressions with UO_Not to not be non-negative (#126846)
This PR addresses the bug of not throwing warnings for the following
code:
```c++
int test13(unsigned a, int *b) {
        return a > ~(95 != *b); // expected-warning {{comparison of integers of different signs}}
}
```

However, in the original issue, a comment mentioned that negation,
pre-increment, and pre-decrement operators are also incorrect in this
case.

Fixes #18878
2025-03-10 14:08:50 +01:00
Matheus Izvekov
a1a6a83976 [clang] fix matching of nested template template parameters (#130447)
When checking the template template parameters of template template
parameters, the PartialOrdering context was not correctly propagated.

This also has a few drive-by fixes, such as checking the template
parameter lists of template template parameters, which was previously
missing and would have been it's own bug, but we need to fix it in order
to prevent crashes in error recovery in a simple way.

Fixes #130362
2025-03-10 10:08:43 -03:00
Phoebe Wang
507e0c3b67 [X86][APX] Try to replace non-NF with NF instructions when optimizeCompareInstr (#130488)
https://godbolt.org/z/rWYdqnjjx
2025-03-10 21:08:01 +08:00
Simon Pilgrim
5d921710c0 [X86] combineConcatVectorOps - convert X86ISD::HADD/SUB concatenation to use combineConcatVectorOps recursion (#130579)
Only concatenate X86ISD::HADD/SUB nodes if at least one operand is beneficial to concatenate
2025-03-10 13:03:27 +00:00
Hans Wennborg
28fa1fcf55 Revert "[clang] Fix missing diagnostic of declaration use when accessing TypeDecls through typename access (#129681)"
This caused incorrect -Wunguarded-availability warnings. See comment on
the pull request.

> We were missing a call to DiagnoseUseOfDecl when performing typename
> access.
>
> This refactors the code so that TypeDecl lookups funnel through a helper
> which performs all the necessary checks, removing some related
> duplication on the way.
>
> Fixes #58547
>
> Differential Revision: https://reviews.llvm.org/D136533

This reverts commit 4c4fd6b031.
2025-03-10 14:02:04 +01:00
sommersun
848ba3854c [DAG] fold AVGFLOORS to AVGFLOORU for non-negative operand (#84746) (#129678)
Fold ISD::AVGFLOORS to ISD::AVGFLOORU for non-negative operand. Cover test is modified for uhadd with zero extension.

Fixes #84746
2025-03-10 13:01:08 +00:00
SivanShani-Arm
a5c33e634b [AArch64][ELF Parser] Fix out-of-scope variable usage (#130594)
Return a reference to a persistent variable instead of a temporary copy.
2025-03-10 12:56:46 +00:00
Alex Bradbury
7fb71d1511 [RISCV][test] Add test case showing case where machine copy propagation leaves behind a no-op reg move
Pre-commit for #129889.
2025-03-10 12:53:55 +00:00
Matt Arsenault
0d2c55cb96 AMDGPU: Move enqueued block handling into clang (#128519)
The previous implementation wasn't maintaining a faithful IR
representation of how this really works. The value returned by
createEnqueuedBlockKernel wasn't actually used as a function, and
hacked up later to be a pointer to the runtime handle global
variable. In reality, the enqueued block is a struct where the first
field is a pointer to the kernel descriptor, not the kernel itself. We
were also relying on passing around a reference to a global using a
string attribute containing its name. It's better to base this on a
proper IR symbol reference during final emission.

This now avoids using a function attribute on kernels and avoids using
the additional "runtime-handle" attribute to populate the final
metadata. Instead, associate the runtime handle reference to the
kernel with the !associated global metadata. We can then get a final,
correctly mangled name at the end.

I couldn't figure out how to get rename-with-external-symbol behavior
using a combination of comdats and aliases, so leaves an IR pass to
externalize the runtime handles for codegen. If anything breaks, it's
most likely this, so leave avoiding this for a later step. Use a
special section name to enable this behavior. This also means it's
possible to declare enqueuable kernels in source without going through
the dedicated block syntax or other dedicated compiler support.

We could move towards initializing the runtime handle in the
compiler/linker. I have a working patch where the linker sets up the
first field of the handle, avoiding the need to export the block
kernel symbol for the runtime. We would need new relocations to get
the private and group sizes, but that would avoid the runtime's
special case handling that requires the device_enqueue_symbol metadata
field.

https://reviews.llvm.org/D141700
2025-03-10 19:54:04 +07:00
Matheus Izvekov
dbd82f33b5 [clang] NNS: don't print trailing scope resolution operator in diagnostics (#130529)
This clears up the printing of a NestedNameSpecifier so a trailing '::'
is not printed, unless it refers into the global scope.

This fixes a bunch of diagnostics where the trailing :: was awkward.
This also prints the NNS quoted consistenty.

There is a drive-by improvement to error recovery, where now we print
the actual type instead of `<dependent type>`.

This will clear up further uses of NNS printing in further patches.
2025-03-10 09:37:38 -03:00
Paul Walker
4fee3981a8 [LLVM][SVE] Add isel for scalable vector bfloat copysign operations. (#130098) 2025-03-10 12:34:42 +00:00
SivanShani-Arm
4508d6aa72 [AArch64][ELF Parser] Fix out-of-scope variable usage (#130576)
Return a reference to a persistent variable instead of a temporary copy.
2025-03-10 12:25:40 +00:00
Simon Pilgrim
88f010a2eb [X86] Improve test coverage for concat(pmaddubsw(),pmaddubsw()) -> pmaddubsw(concat(),concat())
Ensure we have tests for both beneficial/non-beneficial concatenation cases
2025-03-10 12:22:15 +00:00
Simon Pilgrim
94a62b3085 [X86] Add test case showing its not always beneficial to fold concat(pshufb(),pshufb()) -> pshufb(concat(),concat()) 2025-03-10 12:09:32 +00:00
Akshat Oke
52225d2702 [AMDGPU][NewPM] Port AMDGPUReserveWWMRegs to NPM (#123722) 2025-03-10 17:36:35 +05:30
Robert Dazi
fdc8e5ab62 [Clang][AArch64] Fix typo with colon-separated syntax for system registers (#105608)
The range for Op0 was set to 1 instead of 3.

The description of e493f177ee visually
explains the encoding of implementation-defined system registers.


796787d07c/llvm/lib/Target/AArch64/AArch64SystemOperands.td (L658-L674)

Gobolt: https://godbolt.org/z/WK9PqPvGE

Co-authored-by: v01dxyz <v01dxyz@v01d.xyz>
2025-03-10 12:05:41 +00:00
John Harrison
502385c241 [lldb] Remove an extraneous printf statement. (#130453)
This was missed in review but is showing up in lldb-dap output.
2025-03-10 11:38:03 +00:00
Michael Buch
6a9df5b4dd Revert "[lldb][asan] Add temporary logging to ReportRetriever"
This reverts commit 39a4da20d8.

We skipped the failing tests in `6cc8b0bef07f4270303bec0fc203f251a2fde262`.
2025-03-10 11:37:13 +00:00
DianQK
dd21aacd76 [TailDuplicator] Do not restrict the computed gotos (#114990)
Fixes #106846.

This is what I learned from GCC. I found that GCC does not duplicate the
BB that has indirect jumps with the jump table. I believe GCC has
provided a clear explanation here:

> Duplicate the blocks containing computed gotos. This basically
unfactors computed gotos that were factored early on in the compilation
process to speed up edge based data flow. We used to not unfactor them
again, which can seriously pessimize code with many computed jumps in
the source code, such as interpreters.
2025-03-10 19:34:07 +08:00
Kajetan Puchalski
0c7e895de3 [flang] Move parser invocations into ParserActions (#130309)
FrontendActions.cpp is currently one of the biggest compilation units in
all of flang. Measuring its compilation gives the following metrics:

User time (seconds): 139.21
System time (seconds): 4.65
Maximum resident set size (kbytes): 5891440 (5.61 GB)

This commit separates out explicit invocations of the parser into a
separate compilation unit - ParserActions.cpp - through helper functions
in order to decrease the maximum compilation time and memory usage of a
single unit.
After the split, the measurements of FrontendActions.cpp are as follows:

User time (seconds): 70.08
System time (seconds): 3.16
Maximum resident set size (kbytes): 3961492 (3.7 GB)

While the ones for the newly created ParserActions.cpp as follows:

User time (seconds): 104.33
System time (seconds): 3.37
Maximum resident set size (kbytes): 4185600 (3.99 GB)

---------

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
2025-03-10 11:33:47 +00:00
Simon Pilgrim
99fdb5dfd8 [X86] combineConcatVectorOps - convert X86ISD::PALIGNR concatenation to use combineConcatVectorOps recursion (#130572)
Only concatenate X86ISD::PALIGNR nodes if at least one operand is beneficial to concatenate
2025-03-10 11:32:15 +00:00
gbMattN
0ede277b62 Spelling in lit.cfg.py 2025-03-10 11:27:23 +00:00
Florian Hahn
dd86ece554 [MergeFunc] Check full IR and comdat keys in comdat.ll. 2025-03-10 11:11:03 +00:00
Marc Auberer
8d38906d08 [IR] Fix assertion error in User new/delete edge case (#129914)
Fixes #129900

If `operator delete` was called after an unsuccessful constructor call
after `operator new`, we ran into undefined behaviour.
This was discovered by our malfunction tests while preparing an upgrade
to LLVM 20, that explicitly check for such kind of bugs.
2025-03-10 11:53:45 +01:00
Artemiy Bulavin
f3dcc0fe22 [mlir] Refactor ConvertVectorToLLVMPass options (#128219)
The `VectorTransformsOptions` on the `ConvertVectorToLLVMPass` is
currently represented as a struct, which makes it not serialisable. This
means a pass pipeline that contains this pass cannot be represented as
textual form, which breaks reproducer generation and options such as
`--dump-pass-pipeline`.

This PR expands the `VectorTransformsOptions` struct into the two
options that are actually used by the Pass' patterns:
`vector-contract-lowering` and `vector-transpose-lowering` . The other
options present in VectorTransformOptions are not used by any patterns
in this pass.

Additionally, I have changed some interfaces to only take these specific
options over the full options struct as, again, the vector contract and
transpose lowering patterns only need one of their respective options.

Finally, I have added a simple lit test that just prints the pass
pipeline using `--dump-pass-pipeline` to ensure the options on this pass
remain serialisable.

Fixes #129046
2025-03-10 10:32:03 +00:00
Simon Pilgrim
8c8eff2b35 [X86] Add test case showing its not always beneficial to fold concat(pack(),pack()) -> pack(concat(),concat()) 2025-03-10 10:23:30 +00:00