Commit Graph

561300 Commits

Author SHA1 Message Date
Nico Weber
e05fffbbc5 Revert "[Clang] Add __builtin_common_reference (#121199)"
This reverts commit 3b9e203364.
Causes not-yet-understood semantic differences, see commits
on #121199.
2025-12-02 19:37:16 -05:00
Aiden Grossman
c5e9289ba5 [llvm-exegesis] Make rvv/filter.test deterministic
This should prevent the flaky failures that have been plaguing the
buildbots since the test was introduced and allow for offline
investigation without disrupting CI.

Reviewers: topperc, mshockwave

Reviewed By: mshockwave

Pull Request: https://github.com/llvm/llvm-project/pull/170014
2025-12-02 16:36:29 -08:00
Zachary Fogg
325a08267d [lldb] Fix Doxygen warning in SBTrace.h (#170394)
Remove errant `\a` command before `<directory>` in `SaveToDisk`
documentation. The `\a` Doxygen command expects a word argument, but
`<directory>` starts with `<` which Doxygen interprets as HTML. This
fixes:

```
llvm-project/lldb/include/lldb/API/SBTrace.h:60:
Warning 564: Error parsing Doxygen command a: No word followed the command. Command ignored.
```
2025-12-02 16:36:01 -08:00
Max Desiatov
94c8940f44 lldbgdbremote.md: Update qWasmLocal result description (#170393)
The current description mistakenly specified that an address of a local
value in some address space is returned. When testing this with Wasm
runtimes that already implement this command, it can be observed that
the value itself is returned. The value itself may be an address for
languages that use shadow stack in Wasm linear memory, but the value of
an arbitrary local does not always contain that address.
2025-12-02 16:27:14 -08:00
Matt Arsenault
9fd288e886 clang/AMDGPU: Enable opencl 2.0 features for unknown target (#170308)
Assume amdhsa triples support flat addressing, which matches
the backend logic for the default target. This fixes the
rocm device-libs build.
2025-12-02 19:11:30 -05:00
Stanislav Mekhanoshin
9dd3346589 [AMDGPU] Prevent folding of flat_scr_base_hi into a 64-bit SALU (#170373)
Fixes: SWDEV-563886
2025-12-02 16:08:00 -08:00
Farzon Lotfi
dd1b4abfb7 [HLSL][Matrix] Add support for Matrix element and trunc Casts (#168915)
fixes #168737
fixes #168755

This change adds support for Matrix truncations via the
ICK_HLSL_Matrix_Truncation enum. That ends up being most of the files
changed.

It also allows Matrix as an HLSL Elementwise cast as long as the cast
does not perform a shape transformation, i.e. 3x2 to 2x3.

Tests for the new elementwise and truncation behavior were added, as
well as sema tests to make sure we error on the shape transformation
cast.

I am punting right now on the ConstExpr Matrix support. That will need
to be addressed later. Will file a separate issue for that if reviewers
agree it can wait.
2025-12-02 19:02:25 -05:00
David Stone
45918f50aa [llvm][NFC] In SetVector, contains and count now automatically accept const T * arguments when the key is T * (#170377)
Also use `is_contained` to implement `contains`, since this tries the
`contains` member function of the set type first.
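
As a rough illustration of the idea (this is not the code from the patch), a lookup can accept `const T *` when the stored key type is `T *`. `TinySetVector` and its members are hypothetical names invented for this sketch; the real `llvm::SetVector` is considerably more involved. The sketch assumes C++17.

```cpp
#include <cassert>
#include <set>
#include <type_traits>
#include <vector>

// Hypothetical, simplified stand-in for an insertion-ordered set.
// This is NOT llvm::SetVector; it only illustrates the lookup signature.
template <typename T>
class TinySetVector {
  std::set<T> set_;
  std::vector<T> order_;

  // When T is a pointer `U *`, lookups take `const U *`;
  // otherwise they take `const T &`.
  using LookupArg = std::conditional_t<
      std::is_pointer_v<T>,
      std::add_pointer_t<std::add_const_t<std::remove_pointer_t<T>>>,
      const T &>;

public:
  // Returns true if the value was newly inserted.
  bool insert(const T &value) {
    if (!set_.insert(value).second)
      return false;
    order_.push_back(value);
    return true;
  }

  bool contains(LookupArg key) const {
    if constexpr (std::is_pointer_v<T>) {
      // The pointer is only compared, never written through,
      // so dropping const for the lookup is harmless here.
      return set_.count(const_cast<T>(key)) != 0;
    } else {
      return set_.count(key) != 0;
    }
  }
};
```

With this shape, a caller holding only a `const int *` can query a `TinySetVector<int *>` without writing a `const_cast` at the call site.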
2025-12-02 17:02:14 -07:00
David Stone
6c32535b20 [clang][NFC] Remove unused CFGStmtMap.h includes (#170383)
2025-12-02 17:02:00 -07:00
Mircea Trofin
e9c127428c [LTT] mark the CFI jumptable naked on Windows (#170371)
We were not marking the `.cfi.jumptable` functions as `naked` on Windows. The referenced bug (https://llvm.org/bugs/show_bug.cgi?id=28641#c3) appears to be fixed:

```bash
build/bin/opt -S -passes=lowertypetests -mtriple=i686-pc-win32 llvm/test/Transforms/LowerTypeTests/function.ll | build/bin/llc -O0
```

```
L_.cfi.jumptable:                       # @.cfi.jumptable
# %bb.0:                                # %entry
        #APP
        jmp     _f.cfi@PLT
        int3
        int3
        int3

        #NO_APP
        #APP
        jmp     _g.cfi@PLT
        int3
        int3
        int3

        #NO_APP
                                        # -- End function
        .section        .rdata,"dr"
        .p2align        4, 0x0                          # @0

```

Not seeing the spilled registers described in the bug anymore.
2025-12-02 15:47:35 -08:00
Thibault Monnier
6bdb838a05 [CIR] Upstream vec shuffle builtins in CIR codegen (#169178)
This PR is part of #167752. It upstreams the codegen and tests for the
shuffle builtins implemented in the incubator, including:
- `vinsert` + `insert`
- `pblend` + `blend`
- `vpermilp`
- `pshuf` + `shufp`
- `palignr`

It does NOT upstream the `perm`, `vperm2`, `vpshuf`, `shuf_i` / `shuf_f`
and `align` builtins, which are not yet implemented in the incubator.

This _is_ a large commit, but most of it is tests.

The `pshufd` / `vpermilp` builtins seem to have no test coverage in the
incubator, what should I do?
2025-12-02 15:29:12 -08:00
Drew Kersnar
9c78bc5de4 Revert "[LSV] Merge contiguous chains across scalar types" (#170381)
Reverts llvm/llvm-project#154069. I pointed out a number of issues
post-merge, most importantly examples of miscompiles:
https://github.com/llvm/llvm-project/pull/154069#issuecomment-3603854626.

While the motivation of the change is clear, I think the implementation
approach is flawed. It seems like the goal is to allow elements like
`load <2xi16>` and `load i32` to be vectorized together despite the
current algorithm not grouping them into the same equivalence classes. I
personally think that if we want to attempt this it should be a more
holistic approach, maybe even redefining the concept of an equivalence
class. This current solution seems like it would be really hard to do
bug-free, and even if the bugs were not present, it is only able to
merge chains that happen to be adjacent to each other after
`splitChainByContiguity`, which seems like it is leaving things up to
chance whether this optimization kicks in. But we can discuss more in
the re-land. Maybe the broader approach I'm proposing is too difficult,
and a narrow optimization is worthwhile. Regardless, this should be
reverted, it needs more iteration before it is correct.
2025-12-02 18:27:58 -05:00
Hendrik Hübner
e5f1d025aa [CIR] Lower calls to trivial copy constructor to cir::CopyOp (#168281)
This PR is a follow up to #167975 and replaces calls to trivial copy
constructors with `cir::CopyOp`.

---------

Co-authored-by: Andy Kaylor <akaylor@nvidia.com>
Co-authored-by: Henrich Lauko <henrich.lau@gmail.com>
2025-12-02 15:22:46 -08:00
Shilei Tian
dbb702fbcb [NFC][AMDGPU] Remove trailing white spaces in AMDGPU.td
2025-12-02 18:17:09 -05:00
Björn Pettersson
0f235c346c [LowerConstantIntrinsics] Improve tests related to llvm.objectsize. NFC (#132364)
Adding some new test cases (including FIXME:s) to highlight some bugs
related to lowering of llvm.objectsize.

One special case is when there are getelementptr instructions with index
types that are larger than the index type size for the pointer being
analysed. This will add a couple of tests to show what happens both when
using a smaller and larger index type, and when having out-of-bounds
indices (both too large and negative).
2025-12-02 23:12:42 +00:00
Petar Avramovic
aeea056f60 AMDGPU/GlobalISel: Report RegBankLegalize errors using reportGISelFailure (#169918)
Use standard GlobalISel error reporting with reportGISelFailure
and pass returning false instead of llvm_unreachable.
Also enables -global-isel-abort=0 or 2 for -global-isel -new-reg-bank-select.
Note: new-reg-bank-select with abort 0 or 2 runs LCSSA,
while "intended use" without abort or with abort 1 does not run LCSSA.
2025-12-02 23:49:21 +01:00
Alex Duran
ec6091f4de [OFFLOAD][LIBOMPTARGET] Start to update debug messages in libomptarget (#170265)
* Add compatibility support for DP and REPORT macros 
* Define a set of predefined Debug Type for libomptarget
* Start to update libomptarget files (OffloadRTL.cpp, device.cpp)
2025-12-02 23:45:23 +01:00
Valentin Clement (バレンタイン クレメン)
9885aed474 [flang][cuda] Add address cast for src and dst in TMA operations (#170375)
The src and dst pointers need to have an address cast.
2025-12-02 22:31:55 +00:00
Helena Kotas
434127b0c1 [HLSL] Static resources (#166880)
This change fixes a couple of issues with static resources:
- Enables assignment to static resource or resource array variables (fixes #166458)
- Initializes static resources and resource arrays with default constructor that sets the handle to poison
2025-12-02 22:25:17 +00:00
John Harrison
fff45ddcc0 [lldb-dap] Follow the spec more closely on 'initialize' arguments. (#170350)
Updates `InitializeRequestArguments` to correctly follow the spec, see
https://microsoft.github.io/debug-adapter-protocol/specification#Requests_Initialize.

This should correct which fields are tracked as optional and simplifies
some of the types to make sure they're meaningful (e.g. an
`optional<bool>` isn't any more helpful than a `bool` since undefined and
false are basically equivalent and it requires us to handle interpreting undefined as the default value in all the places we use the `optional<bool>`).
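
The trade-off described above can be sketched as follows. This is only an illustration: `ArgsWithOptional`, `ArgsWithBool`, `supportsFoo`, and `fooEnabled` are hypothetical names, not lldb-dap's actual types or fields.

```cpp
#include <cassert>
#include <optional>

// Hypothetical argument structs; the field name is illustrative only.
struct ArgsWithOptional {
  std::optional<bool> supportsFoo; // three states: unset, false, true
};

struct ArgsWithBool {
  bool supportsFoo = false; // an absent field simply keeps the default
};

// With the optional field, every reader has to repeat the default.
inline bool fooEnabled(const ArgsWithOptional &args) {
  return args.supportsFoo.value_or(false);
}

// With a plain bool, the default is stated once, at the declaration.
inline bool fooEnabled(const ArgsWithBool &args) {
  return args.supportsFoo;
}
```

Both forms give the same answer when "unset" and `false` mean the same thing, which is the situation the commit message describes; the plain `bool` just removes the repeated `value_or(false)`.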
2025-12-02 14:19:05 -08:00
Florian Hahn
41519b390f [SCEV] Add UDiv canonicalization tests with nested AddRecs.
Add more tests for follow-up to
https://github.com/llvm/llvm-project/pull/169576.
2025-12-02 22:18:16 +00:00
Valentin Clement (バレンタイン クレメン)
d3256d935d [flang][cuda] Add alignment to shared memory operation (#170372)
Shared memory for TMA operations needs to be aligned to 16. Add ability to
set an alignment on the cuf.shared_memory operation.
2025-12-02 22:13:19 +00:00
Florian Hahn
bd5fa63335 [VPlan] Remove duplicated computeCost call (NFC).
Remove a redundant duplicated computeCost call. NFC, just skipping an
unneeded call.
2025-12-02 21:59:40 +00:00
Erich Keane
4006df9b32 [OpenACC][CIR] Implement 'nohost' lowering. (#170369)
This clause is pretty small/trivial and is a simple 'set a bool' value
on the IR node, so its implementation is quite simple. We create the
Operation with this as 'false', so the 'nohost' marks it as true always.
2025-12-02 21:56:42 +00:00
Florian Hahn
f0e1254bce [LV] Use forced cost once for whole interleave group in legacy costmodel (#168270)
The VPlan-based cost model assigns the forced cost once for a whole
VPInterleaveRecipe. Update the legacy cost model to match this behavior.
This fixes a cost-model divergence, and assigns the cost in a way that
matches the generated code more accurately.

PR: https://github.com/llvm/llvm-project/pull/168270
2025-12-02 21:39:54 +00:00
Jason Macnak
139ebfa63d [Bazel] Fix --warn-backrefs errors in Analysis target (#170357)
Commit b262785 introduced a separate `AnalysisFpExc` target to try to
work around the lack of a bazel equivalent of single source file
properties. However, this introduces backref errors when
`--warn-backrefs` is enabled.

This change alternatively just adds the `-ftrapping-math` copt to the
entire `Analysis` target.

Fix suggested by @rocallahan.
2025-12-02 15:32:52 -06:00
asmok-g
d97746c56b [libc++] Fix the rest of __gnu_cxx::hash_XXX copy construction (#160525)
Co-authored-by: Alexander Kornienko <alexfh@google.com>
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
2025-12-02 22:18:50 +01:00
Andy Kaylor
12ae72744c [CIR] Upstream support for builtin_constant_p (#170354)
This upstreams the handler for the BI__builtin_constant_p function.
2025-12-02 21:15:17 +00:00
Kyungtak Woo
c77fe5845e [bazel] update bazel build for PluginScriptedProcess (#170364)
Adding the following dependencies to PluginScriptedProcess:
-         "//lldb:CoreHeaders",
-         "//lldb:SymbolHeaders",
-         "//llvm:Support",

For c50802cbee
2025-12-02 15:04:07 -06:00
Erich Keane
c910d821dc [OpenACC][CIR] Add worker/vector clause lowering for Routine (#170358)
These two are both incredibly similar and simple, basically identical to
'seq'. This patch adds them both together.
2025-12-02 12:58:11 -08:00
Yaxun (Sam) Liu
0bb987f409 Revert "[CUDA][HIP] Fix CTAD for host/device constructors (#168711)"
This reverts commit e719e93d41.

Revert this since it caused a regression in our internal CI.

Deduction guides with host/device attrs have already been
used in

https://github.com/ROCm/rocm-libraries/blob/develop/projects/rocrand/library/src/rng/utils/cpp_utils.hpp#L249

```
template<class V>
__host__ __device__ vec_wrapper(V) -> vec_wrapper<V>;
```
2025-12-02 15:42:22 -05:00
Andy Kaylor
ca3de05eca [CIR][NFC] Fix a release build warning (#170359)
This moves a call inside an assert to avoid a warning about the result
variable being unused in release builds.
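
A generic sketch of this pattern, not taken from the patch: `isSorted`, `smallestBefore`, and `smallestAfter` are hypothetical names. Note that a call moved inside `assert` is not executed at all when `NDEBUG` is defined, so this only works for calls without required side effects.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical side-effect-free check used only for validation.
inline bool isSorted(const std::vector<int> &v) {
  return std::is_sorted(v.begin(), v.end());
}

// Before: `sorted` is only read by assert. With NDEBUG defined the assert
// expands to nothing, so compilers may warn about an unused variable.
inline int smallestBefore(const std::vector<int> &v) {
  bool sorted = isSorted(v);
  assert(sorted && "input must be sorted");
  return v.front(); // caller must pass a non-empty vector
}

// After: the call lives inside the assert, so no variable is left over
// in release builds (and the check itself is skipped there).
inline int smallestAfter(const std::vector<int> &v) {
  assert(isSorted(v) && "input must be sorted");
  return v.front(); // caller must pass a non-empty vector
}
```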
2025-12-02 20:29:04 +00:00
Philip Reames
49a9787128 [SCEV] Regenerate a subset of auto updated tests
Reducing spurious diff in an upcoming change.
2025-12-02 12:16:53 -08:00
Razvan Lupusoru
b50a590984 [acc][flang] Add genLoad and genStore to PointerLikeType (#170348)
This patch extends the OpenACC PointerLikeType interface with two new
methods for generating load and store operations, enabling
dialect-agnostic memory access patterns.

New Interface Methods:
- genLoad(builder, loc, srcPtr, valueType): Generates a load operation
from a pointer-like value. Returns the loaded value.

- genStore(builder, loc, valueToStore, destPtr): Generates a store
operation to a pointer-like value.

Implementations provided for FIR pointer-like types, memref type (rank-0
only), and LLVM pointer types.

Extended TestPointerLikeTypeInterface.cpp with 'load' and 'store' test
modes.
2025-12-02 12:09:32 -08:00
Erich Keane
6dd639ec9e [CIR][OpenACC] Implement 'routine' lowering + seq clause (#170207)
The 'routine' construct just adds an acc.routine element to the global
module, which contains all of the information about the directive. It
contains a reference to the function, which also contains a reference to
the acc.routine, which this generates.

This handles both the implicit-func version (where the routine is
spelled without parens, and just applies to the next function) and
the explicit-func version (where the routine is spelled with the func
name in parens).

The AST stores the directive in an OpenACCRoutineDeclAttr in the
implicit case, so we can emit that when we hit the function declaration.
The explicit case is held in an OpenACCRoutineAnnotAttr on the function,
however, when we emit the function we haven't necessarily seen the
construct yet, so we can't depend on that attribute. Instead, we save up
the list in Sema so that we can emit them all at the end.

This results in the tests getting really hard to read (because ordering
is a little awkward based on spelling, with no way to fix it), so we
instead split the tests up based on topic.

One last thing: Flang spends some time determining if the clause lists
of two routines on the same function are identical, and omits the
duplicates. However, it seems to do a poor job on this when the ordering
isn't the same, or references are slightly different. This patch doesn't
bother trying that, and instead emits all, trusting the ACC dialect to
remove duplicates/handle duplicates gracefully.

Note: This doesn't cause emission of functions that would otherwise not
be emitted, but DOES emit routine references based on which function
they are attached to.
2025-12-02 11:55:14 -08:00
David Peixotto
fae64adaa6 [lldb] Handle deref of register and implicit locations (#169419)
This commit modifies the DWARF expression evaluator in how we handle the
deref operation for register and implicit locations on the stack. For a
typical memory location a deref operation will read the value from
memory. For register and implicit locations the deref operation will
read the value from the register or its implicit location. In lldb we
eagerly read register and implicit values and push them on the stack so
the deref operation for these becomes a "no-op" that leaves the value on
the stack and updates the tracked location kind.

The motivation for this change is to handle `DW_OP_deref*` operations on
location descriptions as described by the heterogeneous debugging
[extensions](https://rocm.docs.amd.com/projects/llvm-project/en/latest/LLVM/llvm/html/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html#a-2-5-4-4-4-register-location-description-operations).

Specifically, for register locations it states

> These operations obtain a register location. To fetch the contents of
> a register, it is necessary to use DW_OP_regval_type, use one of the
> DW_OP_breg* register-based addressing operations, or use DW_OP_deref* on
> a register location description.

My understanding is that this is the intended behavior from DWARF 5 as
well and is not a change in behavior.
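
A toy model of the deref behavior described above, written as a sketch rather than as lldb's implementation. `LocationKind`, `StackEntry`, `FakeMemory`, and `deref` are hypothetical names, and the choice to mark the result as a plain `Value` is an assumption made for this illustration; the commit message only says the tracked location kind is updated.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical location kinds tracked alongside each stack entry.
enum class LocationKind { Memory, Register, Implicit, Value };

struct StackEntry {
  uint64_t value;
  LocationKind kind;
};

// Toy address space: address -> stored word.
using FakeMemory = std::map<uint64_t, uint64_t>;

// Sketch of deref on the top-of-stack entry.
inline StackEntry deref(const StackEntry &top, const FakeMemory &memory) {
  switch (top.kind) {
  case LocationKind::Memory:
  case LocationKind::Value:
    // The entry holds an address: read the word stored there.
    return {memory.at(top.value), LocationKind::Value};
  case LocationKind::Register:
  case LocationKind::Implicit:
    // The contents were already read eagerly and pushed, so deref keeps
    // the value and only updates the tracked kind (assumed: Value).
    return {top.value, LocationKind::Value};
  }
  return top; // not reached; keeps compilers quiet about missing returns
}
```

For a memory entry holding address `0x1000`, the result is whatever the toy memory stores at `0x1000`; for a register entry, the value passes through unchanged.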
2025-12-02 11:13:48 -08:00
Krzysztof Drewniak
3f2e3e67c1 [mlir][AMDGPU][NFC] Fix overlapping masked load refinements (#159805)
The two patterns for handling vector.maskedload on AMD GPUs had an overlap
- both the "scalar mask becomes an if statement" pattern and the "masked
loads become a normal load + a select on buffers" patterns could handle
a load with a broadcast mask on a fat buffer resource.

This commit adds checks to resolve the overlap.
2025-12-02 11:02:45 -08:00
Med Ismail Bennani
c50802cbee Reland "[lldb] Introduce ScriptedFrameProvider for real threads (#161870)" (#170236)
This patch re-lands #161870 with fixes to the previous test failures.

rdar://161834688

Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
2025-12-02 18:59:40 +00:00
David Green
879dddf2b4 [AArch64] Add tests for umulh. NFC
2025-12-02 18:58:32 +00:00
LLVM GN Syncbot
6e262aa8ba [gn build] Port 41a53c0a23
2025-12-02 18:51:35 +00:00
Erick Ochoa Lopez
73979c1df9 [mlir][amdgpu] Lower amdgpu.make_dma_base (#169817)
* Adds lowering for `amdgpu.make_dma_base`
2025-12-02 13:48:31 -05:00
Changpeng Fang
697b1be09c [AMDGPU][NFC] Put gfx125x common features into 12_50_Common (#170338)
2025-12-02 10:47:00 -08:00
Robert Imschweiler
5c3c0020af [NFC] Refactor TargetLowering::getTgtMemIntrinsic to take CallBase parameter (#170334)
cf.
https://github.com/llvm/llvm-project/pull/133907#discussion_r2578576548
2025-12-02 19:42:31 +01:00
hjagasiaAMD
2183846a15 [AMDGPU] Fix AGPR_32 reg assign for mfma scale ops (#168964)
In MFMA rewrite pass, prevent AGPR_32 reg class assignment for scale
operands, not permitted by instruction format.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-12-02 13:41:16 -05:00
Med Ismail Bennani
41a53c0a23 [lldb/Target] Add BorrowedStackFrame and make StackFrame methods virtual (#170191)
This change makes StackFrame methods virtual to enable subclass
overrides and introduces BorrowedStackFrame, a wrapper that presents an
existing StackFrame with a different frame index.

This enables creating synthetic frame views or renumbering frames
without copying the underlying frame data, which is useful for frame
manipulation scenarios.

This also adds a new borrowed-info format entity to show what was the
original frame index of the borrowed frame.
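
The wrapper idea can be sketched roughly as below. `Frame` and `BorrowedFrame` are hypothetical, heavily simplified stand-ins invented for this illustration; they are not lldb's `StackFrame` or `BorrowedStackFrame`, and the method names are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base class with virtual accessors so subclasses can override.
class Frame {
public:
  Frame(uint32_t index, std::string function)
      : index_(index), function_(std::move(function)) {}
  virtual ~Frame() = default;

  virtual uint32_t GetFrameIndex() const { return index_; }
  virtual const std::string &GetFunctionName() const { return function_; }

private:
  uint32_t index_;
  std::string function_;
};

// Presents an existing frame under a different index without copying
// the underlying frame data.
class BorrowedFrame : public Frame {
public:
  BorrowedFrame(std::shared_ptr<Frame> original, uint32_t newIndex)
      : Frame(newIndex, std::string()), original_(std::move(original)) {}

  // Everything except the index is forwarded to the borrowed frame.
  const std::string &GetFunctionName() const override {
    return original_->GetFunctionName();
  }

  // The index the frame had before it was renumbered.
  uint32_t GetBorrowedIndex() const { return original_->GetFrameIndex(); }

private:
  std::shared_ptr<Frame> original_;
};
```

Wrapping a frame with index 3 as `BorrowedFrame(frame, 0)` reports index 0 through the base-class interface while still exposing the original index, which mirrors the purpose of the borrowed-info format entity mentioned above.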

Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
2025-12-02 10:41:03 -08:00
Nick Sarnie
1a3709cc7e [SPIRV] Error for zero-length arrays if not a shader (#169732)
I had a case where the frontend was generating a zero elem array in
non-shader code so it was just crashing in a release build.
Add a real error and make it not crash.

---------

Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
2025-12-02 18:24:59 +00:00
Jasmine Tang
e0db7f347c [WebAssembly] Optimize away mask of 63 for sra and srl( zext (and i32 63))) (#170128)
Follow up to #71844 after shl implementation
2025-12-02 18:23:17 +00:00
Shubham Sandeep Rastogi
23a22d0497 [SROA] Unify the names of new instructions created in SROA. (#167917)
In Debug builds, the names of adjusted pointers have a pointer-specific
name prefix which doesn't exist in non-debug builds.

This causes differences in output when looking at the output of SROA
with a Debug or Release compiler.

For most of our ongoing testing, we essentially use a Release+Asserts
build (basically release but without NDEBUG defined), however we ship a
Release compiler. Therefore we want to say with reasonable confidence
that building a large project with Release vs a Release+Asserts build
gives us the same output when the same compiler version is used.

This difference, however, makes it difficult to prove that the output is
the same if the only difference is the name when using LTO builds and
looking at bitcode.

Hence this change is being proposed.
2025-12-02 10:12:20 -08:00
serge-sans-paille
4587fe6be8 [lld] Fix typo in lld manpage, nfc (#170299)
2025-12-02 18:11:27 +00:00
Matt Arsenault
2c38632639 LTO: Remove unused TargetLibraryInfo include (#170340)
2025-12-02 18:10:48 +00:00