Commit Graph

561044 Commits

Author SHA1 Message Date
Mehdi Amini
b4c30b0e1e Fix LLVM test to use %python instead of python
This uses lit substitution, which fixes running this test in
environments where 'python' isn't in the PATH.
2025-12-01 05:53:02 -08:00
Ryan Holt
b27301ff5d [mlir][linalg] Re-enable linalg runtime verification test (#170129)
Test seems to pass after re-enabling without any additional changes.
2025-12-01 08:52:20 -05:00
David Green
c25ad27174 [AArch64] Remove unused references to MVT::f80. (#169545)
These f80 fp types are only supported on X86 and can be removed from
AArch64. It looks like they were copied from another backend by mistake.
2025-12-01 13:43:16 +00:00
Ryotaro Kasuga
d431f38860 [DA] Add tests for GCD MIV misses dependency due to overflow (NFC) (#169926)
Add two test cases where dependencies are missed due to overflows. These
will be fixed by #169927 and #169928, respectively.
2025-12-01 22:36:01 +09:00
Robert Imschweiler
8808beeb1a Reland: [OpenMP] Implement omp_get_uid_from_device() / omp_get_device_from_uid() (#168554)
Reland https://github.com/llvm/llvm-project/pull/164392 with Fortran support moved to follow-up PR
2025-12-01 14:18:31 +01:00
Jasmine Tang
4a6451af7b Fix typo in attr.td: Avaiable -> Available (#170116)
Follow up to #163618
2025-12-01 12:53:47 +00:00
Simon Pilgrim
05ad84095a [X86] combineConcatVectorOps - add handling to concat sqrt intrinsics together (#170113)
Similar to fdiv, we should be trying to concat these high latency instructions together
2025-12-01 12:45:45 +00:00
Giacomo Castiglioni
d3edc94d11 [MLIR][GPU] subgroup_mma fp64 extension - take 2 (#169061)
This PR re-lands #165873.

This PR extends the gpu.subgroup_mma_* ops to support fp64 type.
The extension requires special handling during the lowering to nvvm due
to the return type for load ops for fragment a and b (they return a
scalar instead of a struct).

The original PR did not guard the new test based on the required
architecture (sm80) which led to a failure on the cuda runners with T4
GPUs.
2025-12-01 07:39:59 -05:00
Paul Walker
8478de3d00 [LLVM][CodeGen] Remove failure cases when widening EXTRACT/INSERT_SUBVECTOR. (#162308)
This PR implements catch all handling for widening the scalable
subvector operand (INSERT_SUBVECTOR) or result (EXTRACT_SUBVECTOR). It
does this via the stack using masked memory operations. With general
handling available we can add optimizations for specific cases.
2025-12-01 12:32:58 +00:00
Simon Pilgrim
989ac4c9db [X86] Add tests showing failure to concat fp rounding intrinsics together. (#170108)
2025-12-01 12:07:01 +00:00
Sohaib Iftikhar
6157d46259 [MLIR|BUILD]: Fix for 8ceeba838 (#170110)
2025-12-01 12:00:58 +00:00
Ryotaro Kasuga
58770200a7 [DA] Clean up unnecessary member function declarations (#170106)
Follow-up for #169047. The previous PR moved some functions from DA to
Delinearization, but the member function declarations were not updated
accordingly. This patch removes them.
2025-12-01 11:57:09 +00:00
Luke Lau
d0df51bc93 [ConstantRange] Allow casting to the same bitwidth. NFC (#170102)
From the review in
https://github.com/llvm/llvm-project/pull/169527#discussion_r2567122387,
there are some users where we want to extend or truncate a ConstantRange
only if it's not already the destination bitwidth. Previously this
asserted, so this PR relaxes it to just be a no-op, similar to
IRBuilder::createZExt and friends.
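
A minimal sketch of the relaxed behaviour, using a toy range type rather than the real ConstantRange API. The class `UnsignedRange` and its `zero_extend` method are hypothetical names for illustration only:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnsignedRange:
    """Toy inclusive unsigned range [lo, hi] of a given bit width (hypothetical)."""
    lo: int
    hi: int
    bits: int

    def zero_extend(self, dst_bits):
        # Relaxed behaviour: extending to the current width is a no-op
        # instead of tripping the assertion below.
        if dst_bits == self.bits:
            return self
        assert dst_bits > self.bits, "zero_extend cannot narrow a range"
        return UnsignedRange(self.lo, self.hi, dst_bits)
```

Callers that only want a conversion when the widths differ can then call the cast unconditionally.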
2025-12-01 19:51:56 +08:00
Timm Baeder
48931e5e59 [clang][bytecode] Check memcmp builtin for one-past-the-end pointers (#170097)
We can't read from those and will run into an assertion sooner or later.

Fixes https://github.com/llvm/llvm-project/issues/170031
2025-12-01 12:43:35 +01:00
Mehdi Amini
577cd6fb02 [LIT] Workaround the 60 processed limit on Windows (#157759)
Python multiprocessing is limited to 60 workers at most:

6bc65c30ff/Lib/concurrent/futures/process.py (L669-L672)

Because the limit is per pool, we can work around it by using multiple
pools on Windows when we want to use more workers.
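
A minimal sketch of the idea, not lit's actual implementation: split the requested worker count into several pools that each stay within the per-pool limit. The helper name `plan_pools` and the default limit of 60 (taken from the message above) are assumptions for illustration:

```python
def plan_pools(total_workers, per_pool_limit=60):
    """Split total_workers into pool sizes that each respect per_pool_limit."""
    if total_workers < 1 or per_pool_limit < 1:
        raise ValueError("worker counts must be positive")
    full_pools, remainder = divmod(total_workers, per_pool_limit)
    sizes = [per_pool_limit] * full_pools
    if remainder:
        # One last, smaller pool takes the leftover workers.
        sizes.append(remainder)
    return sizes
```

Each returned size would then be used as the worker count of its own pool, with the tests distributed across the pools.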
2025-12-01 11:39:25 +00:00
Jan Patrick Lehr
130746addf [MLIR] Fix build after #169982 (#170107)
2025-12-01 12:37:09 +01:00
Jasmine Tang
edd1856686 [WebAssembly] Optimize away mask of 63 for shl ( zext (and i32 63))) (#152397)
Fixes https://github.com/llvm/llvm-project/issues/71844
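
A small sketch of why the mask is redundant, assuming the WebAssembly rule that `i64.shl` takes its shift count modulo 64. The function names are hypothetical and only model the arithmetic, not the backend change:

```python
MASK64 = (1 << 64) - 1


def wasm_i64_shl(value, count):
    """Model i64.shl: the shift count is interpreted modulo 64."""
    return (value << (count % 64)) & MASK64


def shl_with_explicit_mask(value, count32):
    """The pattern from the title: shl(value, zext(count32 & 63))."""
    return wasm_i64_shl(value, count32 & 63)
```

For any non-negative count, `count & 63` equals `count % 64`, so both functions agree and the `and` can be dropped.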
2025-12-01 11:32:46 +00:00
Simon Pilgrim
0e721b75aa [X86] Add tests showing failure to concat RCPPS + RSQRTPS intrinsics together. (#170098)
Can only do this for 128->256 cases as we can't safely convert to the RCP14/RSQRT14 variants
2025-12-01 11:28:34 +00:00
Simon Pilgrim
6c0a02f2ad [X86] Add tests showing failure to concat sqrt intrinsics together. (#170096)
Similar to fdiv, we should be trying to concat these high latency instructions together
2025-12-01 11:23:43 +00:00
Tom Eccles
bf22687c48 [OMPIRBuilder] CANCEL IF(FALSE) is still a cancellation point (#170095)
From OpenMP 4.0:

> When an if clause is present on a cancel construct and the if expression
> evaluates to false, the cancel construct does not activate cancellation.
> The cancellation point associated with the cancel construct is always
> encountered regardless of the value of the if expression.

This wording is retained unmodified in OpenMP 6.0.

This re-opens the already approved PR #164587, which was closed by
accident. The only change is a rebase.
2025-12-01 11:23:14 +00:00
Tom Eccles
b60a84a46f Revert "[flang][TBAA] refine TARGET/POINTER encoding" (#170105)
Reverts llvm/llvm-project#169544

This [regressed](https://lab.llvm.org/buildbot/#/builders/143/builds/12956)
the gfortran test suite.
2025-12-01 11:19:12 +00:00
Ming Yan
2c21790983 Revert "[MLIR][SCF] Sink scf.if from scf.while before region into after region in scf-uplift-while-to-for" (#169888)
Reverts llvm/llvm-project#165216
It is implemented in #169892.
2025-12-01 19:02:02 +08:00
Gergely Bálint
29fef3a51e [BOLT] Improve DWARF CFI generation for pac-ret binaries (#163381)
During the InsertNegateRAState pass we check the annotations on
instructions to decide where to generate the OpNegateRAState CFIs in the
output binary.

As only instructions in the input binary were annotated, we have to make
a judgement on instructions generated by other BOLT passes.
Incorrect placement may cause issues when an (async) unwind request
is received during the new "unknown" instructions.

This patch adds more logic to make a more informed decision by taking
into account:
- Unknown instructions in a BasicBlock with other instructions have the
  same RAState. Previously, if the BasicBlock started with an unknown
  instruction, the RAState was copied from the preceding block. Now, the
  RAState is copied from the succeeding instructions in the same block.
- Some BasicBlocks may only contain instructions with unknown RAState.
  As explained in issue #160989, these blocks already have incorrect
  unwind info. Because of this, the last known RAState based on the
  layout order is copied.

Updated bolt/docs/PacRetDesign.md to reflect changes.
2025-12-01 12:00:31 +01:00
Ming Yan
8ceeba8381 [MLIR][SCF] Canonicalize redundant scf.if from scf.while before region into after region (#169892)
When a `scf.if` directly precedes a `scf.condition` in the before region
of a `scf.while` and both share the same condition, move the if into the
after region of the loop. This helps simplify the control flow to enable
uplifting `scf.while` to `scf.for`.
2025-12-01 18:54:21 +08:00
Jim Lin
b7721c55fc [RISCV] Remove the duplicate for RV32/RV64 in zicond-fp-select-zfinx.ll. NFC.
2025-12-01 18:36:07 +08:00
Luke Lau
d1500d12be [SelectionDAG] Add SelectionDAG::getTypeSize. NFC (#169764)
Similar to how getElementCount avoids the need to reason about fixed and
scalable ElementCounts separately, this patch adds getTypeSize to do the
same for TypeSize.

It also goes through and replaces some of the manual uses of getVScale
with getTypeSize/getElementCount where possible.
2025-12-01 10:33:50 +00:00
Timm Baeder
b1620996f4 [clang][bytecode] Fix discarding ImplicitValueInitExprs (#170089)
They don't have side-effects, so this should be fine.

Fixes https://github.com/llvm/llvm-project/issues/170064
2025-12-01 11:33:33 +01:00
Luke Lau
2c9e9ffa77 [SCCP] Handle llvm.experimental.get.vector.length calls (#169527)
As noted in the reproducer provided in
https://github.com/llvm/llvm-project/issues/164762#issuecomment-3554719231,
on RISC-V after LTO we sometimes have trip counts exposed to vectorized
loops. The loop vectorizer will have generated calls to
@llvm.experimental.get.vector.length, but there are [some
properties](https://llvm.org/docs/LangRef.html#id2399) about the
intrinsic we can use to simplify it:

- The result is always less than or equal to both Count and MaxLanes
- If Count <= MaxLanes, then the result is Count

This teaches SCCP to handle these cases with the intrinsic, which allows
some single-iteration-after-LTO loops to be unfolded.

#169293 is related and also simplifies the intrinsic in InstCombine via
computeKnownBits, but it can't fully remove the loop since
computeKnownBits only does limited reasoning on recurrences.
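
A rough sketch of how the two properties above bound the result, treating the upper bound as inclusive (an assumption, chosen to be consistent with the second property). This is not the SCCP implementation; `vector_length_range` is a hypothetical helper working on plain integer bounds:

```python
def vector_length_range(count_min, count_max, max_lanes):
    """Conservative inclusive (lo, hi) bounds on the intrinsic's result."""
    if count_max <= max_lanes:
        # Count <= MaxLanes for every possible Count, so the result is
        # exactly Count and inherits its range.
        return (count_min, count_max)
    # Some Count may exceed MaxLanes: this sketch only uses the upper
    # bound from the first property and gives up on the lower bound.
    return (0, max_lanes)
```

When the trip count is known to fit in MaxLanes, the result equals the trip count, which is what lets a single-iteration loop be recognised.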
2025-12-01 10:29:21 +00:00
Tom Eccles
8ec2112ec8 [OMPIRBuilder] re-land cancel barriers patch #164586 (#169931)
A barrier will pause execution until all threads reach it. If some go to
a different barrier then we deadlock. This manifests in that the
finalization callback must only be run once. Fix by ensuring we always
go through the same finalization block whether the thread is cancelled
or not and no matter which cancellation point causes the cancellation.

The old callback only affected PARALLEL, so it has been moved into the
code generating PARALLEL. For this reason, we don't need similar changes
for other cancellable constructs. We need to create the barrier on the
shared exit from the outlined function instead of only on the cancelled
branch to make sure that threads exiting normally (without cancellation)
meet the same barriers as those which were cancelled. For example,
previously we might have generated code like

```
...
  %ret = call i32 @__kmpc_cancel(...)
  %cond = icmp eq i32 %ret, 0
  br i1 %cond, label %continue, label %cancel

continue:
  // do the rest of the callback, eventually branching to %fini
  br label %fini

cancel:
  // Populated by the callback:
  // unsafe: if any thread makes it to the end without being cancelled
  // it won't reach this barrier and then the program will deadlock
  %unused = call i32 @__kmpc_cancel_barrier(...)
  br label %fini

fini:
  // run destructors etc
  ret
```

In the new version the barrier is moved into fini. I generate it *after*
the destructors because the standard describes the barrier as occurring
after the end of the parallel region.

```
...
  %ret = call i32 @__kmpc_cancel(...)
  %cond = icmp eq i32 %ret, 0
  br i1 %cond, label %continue, label %cancel

continue:
  // do the rest of the callback, eventually branching to %fini
  br label %fini

cancel:
  br label %fini

fini:
  // run destructors etc
  // safe so long as every exit from the function happens via this block:
  %unused = call i32 @__kmpc_cancel_barrier(...)
  ret
```

To achieve this, the barrier is now generated alongside the finalization
code instead of in the callback. This is the reason for the changes to
the unit test.
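
The shape of the fix can be mimicked with ordinary threads. This is only an analogy in Python, not the generated code: `run_parallel_region` and its arguments are hypothetical, and `threading.Barrier` stands in for `__kmpc_cancel_barrier`:

```python
import threading


def run_parallel_region(num_threads, cancelled_ids):
    """Run a toy region; return the sorted ids of threads that left fini."""
    barrier = threading.Barrier(num_threads)
    finished = []
    lock = threading.Lock()

    def fini(tid):
        # Destructors would run here, then every thread meets one barrier.
        barrier.wait(timeout=5)
        with lock:
            finished.append(tid)

    def worker(tid):
        if tid not in cancelled_ids:
            pass  # "continue": the rest of the region body would run here
        # Both the "continue" and "cancel" paths branch to the same fini.
        fini(tid)

    threads = [threading.Thread(target=worker, args=(tid,))
               for tid in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(finished)
```

Because cancelled and non-cancelled threads wait on the same barrier, the barrier always sees the full thread count and nobody is left waiting.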

I'm unsure if I should keep the incorrect barrier generation callback
only on the cancellation branch in clang with the OMPIRBuilder backend
because that would match clang's ordinary codegen. Right now I have
opted to remove it entirely because it is a deadlock waiting to happen.

---

This re-lands #164586 with a small fix for a failing buildbot running
address sanitizer on clang lit tests.

In the previous version of the patch I added an insertion point guard
"just to be safe" and never removed it. There isn't insertion point
guarding on the other route out of this function and we do not
preserve the insertion point around getFiniBB either so it is not
needed here.

The problem flagged by the sanitizers was because the saved insertion
point pointed to an instruction which was then removed inside the FiniCB
for some clang codegen functions. The instruction was freed when it was
removed. Then accessing it to restore the insertion point was a use
after free bug.
2025-12-01 10:07:19 +00:00
Tom Eccles
34c44f21ae [flang][TBAA] refine TARGET/POINTER encoding (#169544)
Previously we were less specific for POINTER/TARGET: encoding that they
could alias with (almost) anything.

In the new system, the "target data" tree is now a sibling of the other
trees (e.g. "global data"). POINTER variables go at the root of the
"target data" tree, whereas TARGET variables get their own nodes under
that tree. For example,

```
integer, pointer :: ip
real, pointer :: rp
integer, target :: it
integer, target :: it2(:)
real, target :: rt
integer :: i
real :: r
```
- `ip` and `rp` may alias with any variable except `i` and `r`.
- `it`, `it2`, and `rt` may alias only with `ip` or `rp`.
- `i` and `r` cannot alias with any other variable.
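
The three rules can be checked against a toy model of the tree, where two accesses may alias only if one node is an ancestor of (or the same as) the other. The node names below are illustrative rather than the exact strings flang emits, and placing `i` and `r` under "global data" is an assumption:

```python
# Parent links of a toy TBAA forest (names are illustrative).
PARENT = {
    "any data": None,
    "target data": "any data",
    "global data": "any data",
    "target data/it": "target data",
    "target data/it2": "target data",
    "target data/rt": "target data",
    "global data/i": "global data",
    "global data/r": "global data",
}

# POINTER variables sit at the root of the "target data" tree;
# TARGET variables get their own nodes underneath it.
NODE_OF = {
    "ip": "target data",
    "rp": "target data",
    "it": "target data/it",
    "it2": "target data/it2",
    "rt": "target data/rt",
    "i": "global data/i",
    "r": "global data/r",
}


def is_ancestor_or_self(ancestor, node):
    """Walk parent links from node up to the root looking for ancestor."""
    while node is not None:
        if node == ancestor:
            return True
        node = PARENT[node]
    return False


def may_alias(x, y):
    """Two variables may alias if one node is on the other's root path."""
    a, b = NODE_OF[x], NODE_OF[y]
    return is_ancestor_or_self(a, b) or is_ancestor_or_self(b, a)
```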

Fortran 2023 15.5.2.14 gives restrictions on entities associated with
dummy arguments. These do not allow non-target globals to be modified
through dummy arguments and therefore I don't think we need to make all
globals alias with dummy arguments.

I haven't implemented it in this patch, but I wonder whether it is ever
possible for `ip` to alias with `rt` or even `it2`.

While I was updating the tests I fixed up some tests that still assumed
that local alloc tbaa wasn't the default.

I found no functional regressions in the gfortran test suite, fujitsu
test suite, spec2017, or a selection of HPC apps we test internally.
2025-12-01 10:05:56 +00:00
Benjamin Maxwell
1317083530 [AArch64][SME] Support saving/restoring ZT0 in the MachineSMEABIPass (#166362)
This patch extends the MachineSMEABIPass to support ZT0. This is done
with the addition of two new states:

- `ACTIVE_ZT0_SAVED`
  * This is used when calling a function that shares ZA, but does not 
    share ZT0 (i.e., no ZT0 attributes)
  * This state indicates ZT0 must be saved to the save slot, but ZA must 
    remain on, with no lazy save setup
- `LOCAL_COMMITTED`
  * This is used for saving ZT0 in functions without ZA state
  * This state indicates ZA is off and ZT0 has been saved
  * This state is general enough to support ZA, but the required 
    transitions have not been implemented†

To aid with readability, the state transitions have been reworked to a
switch of `transitionFrom(<FromState>).to(<ToState>)`, rather than 
nested ifs, which helps manage more transitions.

† This could be implemented to handle some cases of undefined behavior
better.
2025-12-01 09:55:49 +00:00
Igor Wodiany
dda15ad0aa [mlir][spirv] Use MapVector for BlockMergeInfoMap (#169636)
This should ensure that the structurizer while loop is deterministic
across runs. Use of `MapVector` addresses the source of the
nondeterminism which is use of a `Block*` as a map key.
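
For readers unfamiliar with the container, here is a toy model of the idea behind `MapVector`: entries live in a vector in insertion order and a side index maps keys to positions, so iteration never depends on key addresses. The Python class below is only an illustration (Python's built-in `dict` already iterates in insertion order):

```python
class MapVector:
    """Toy insertion-ordered map: a vector of entries plus a key index."""

    def __init__(self):
        self._entries = []  # [key, value] pairs in insertion order
        self._index = {}    # key -> position in _entries

    def insert(self, key, value):
        """Insert key if absent; return True if a new entry was added."""
        if key in self._index:
            return False
        self._index[key] = len(self._entries)
        self._entries.append([key, value])
        return True

    def __getitem__(self, key):
        return self._entries[self._index[key]][1]

    def __iter__(self):
        # Iteration order is the insertion order, independent of hashing.
        return iter((key, value) for key, value in self._entries)
```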

fixes #128547
2025-12-01 09:43:25 +00:00
Gergely Bálint
8e6fb0ee84 Reapply "[BOLT][BTI] Skip inlining BasicBlocks containing indirect tailcalls" (#169881) (#169929)
This reapplies commit 5d6d74359d.

Fix: added assertions to the requirements of the test

--------

Original commit message:

In the Inliner pass, tailcalls are converted to calls in the inlined
BasicBlock. If the tailcall is indirect, the `BR` is converted to `BLR`.

These instructions require different BTI landing pads at their targets.

As the targets of indirect tailcalls are unknown, inlining such blocks
is unsound for BTI: they should be skipped instead.
2025-12-01 10:20:23 +01:00
Steven Wu
8079d033c9 [CAS] Temporarily skip tests on old windows version (#170063)
2025-12-01 17:10:39 +08:00
Carlos Galvez
eb711d8e14 [clang-tidy][doc] Fix incorrect link syntax in cppcoreguidelines-pro-… (#170088)
…bounds-avoid-unchecked-container-access

Missing a trailing underscore to render it as a link.

Co-authored-by: Carlos Gálvez <carlos.galvez@zenseact.com>
2025-12-01 09:50:19 +01:00
Matthias Springer
147c466bcd [mlir][arith] Add support for min/max to ArithToAPFloat (#169760)
Add support for `arith.minnumf`, `arith.maxnumf`, `arith.minimumf`,
`arith.maximumf`.
2025-12-01 08:50:02 +00:00
ShashwathiNavada
9afb651613 Adding support for iterator in motion clauses. (#159112)
As described in section 2.14.6 of the OpenMP spec, this patch implements
support for iterators in motion clauses.

---------

Co-authored-by: Shashwathi N <nshashwa@pe31.hpc.amslabs.hpecorp.net>
2025-12-01 14:03:32 +05:30
Matthias Springer
05b1989551 [mlir][arith] Add support for negf to ArithToAPFloat (#169759)
Add support for `arith.negf`.
2025-12-01 08:28:23 +00:00
Matthias Springer
f67b018470 [mlir][SPIRV] Improve ub.unreachable lowering test case (#170083)
Addresses a comment on the PR that introduces the ub.unreachable ->
spirv.Unreachable lowering
(https://github.com/llvm/llvm-project/pull/169872#discussion_r2573670611).
2025-12-01 08:15:15 +00:00
Abhishek Varma
7ce71414ec [NFC][Linalg] Follow-up on ConvMatchBuilder (#170080)
-- This commit addresses [follow-up review comments on
169704](https://github.com/llvm/llvm-project/pull/169704#pullrequestreview-3521785548).
-- Contains NFC nit/minor changes.

Signed-off-by: Abhishek Varma <abhvarma@amd.com>
2025-12-01 13:44:15 +05:30
David Sherwood
17677ad7eb [LV] Don't create WidePtrAdd recipes for scalar VFs (#169344)
While attempting to remove the use of undef from more loop vectoriser
tests I discovered a bug where this assert was firing:

```
llvm::Constant* llvm::Constant::getSplatValue(bool) const: Assertion `this->getType()->isVectorTy() && "Only valid for vectors!"' failed.
...
 #8 0x0000aaaab9e2fba4 llvm::Constant::getSplatValue
 #9 0x0000aaaab9dfb844 llvm::ConstantFoldBinaryInstruction
```

This seems to be happening because we are incorrectly generating
WidePtrAdd recipes for scalar VFs. The PR fixes this by checking whether
a plan has a scalar VF only in legalizeAndOptimizeInductions.

This PR also removes the use of undef from the test `both` in
Transforms/LoopVectorize/iv_outside_user.ll, which is what started
triggering the assert.

Fixes #169334
2025-12-01 08:12:41 +00:00
Matthias Springer
4d7abe5355 [mlir][arith] Add support for cmpf to ArithToAPFloat (#169753)
Add support for `arith.cmpf`.
2025-12-01 09:12:11 +01:00
Vasily Leonenko
a751ed97ac [BOLT] Support runtime library hook via DT_INIT_ARRAY (#167467)
The major part of this PR is a commit implementing support for DT_INIT_ARRAY
for BOLT runtime libraries initialization. It also adds a related
hook-init test and fixes a couple of X86 instrumentation tests.

This commit follows implementation of instrumentation hook via
DT_FINI_ARRAY (https://github.com/llvm/llvm-project/pull/67348) and
extends it for BOLT runtime libraries (including instrumentation
library) initialization hooking.

Initialization has differences compared to finalization:
- Executables always use the ELF entry point address. The update code checks it
and updates the init_array entry if the ELF is a shared library (has no interp
entry) and has no DT_INIT entry. This commit also introduces the
"runtime-lib-init-hook" option to select the primary initialization hook
(entry_point, init, init_array), falling back to the next available hook in
the input binary. E.g., in the case of libc we can explicitly set it to
init_array.
- Shared library init_array entry relocations usually have the
R_AARCH64_ABS64 type on AArch64 binaries. We check the relocation type and
adjust the methods for reading init_array relocations in the discovery and
update methods.

---------

Co-authored-by: Vasily Leonenko <vasily.leonenko@huawei.com>
2025-12-01 10:55:00 +03:00
Timm Baeder
bbb0dbadfa [clang][AST] Add RecordDecl::getNumFields() (#170022)
Not sure why that didn't exist yet, but we have quite a few places using
the same `std::distance` pattern.
2025-12-01 08:33:54 +01:00
Luke Lau
dc5ce79cc1 [LV] Regenerate some check lines. NFC
The scalar loop doesn't exist anymore after 8907b6d393
2025-12-01 15:25:08 +08:00
Yingwei Zheng
9416b19e4f [InstCombine] Add missing constant check (#170068)
`cast<Constant>` is not guarded by a type check during canonicalization
of predicates. This patch adds a type check in the outer if to avoid the
crash. `dyn_cast` may introduce another nested if, so I just use
`isa<Constant>` instead.

Address the crash reported in
https://github.com/llvm/llvm-project/pull/153053#issuecomment-3593914124.
2025-12-01 15:20:45 +08:00
Jason Molenda
036279addf [lldb][debugserver] Return shared cache filepath in jGetSharedCacheInfo (#168474)
Add a "shared_cache_path" key-value to the jGetSharedCacheInfo response,
if we can fetch the shared cache path.

If debugserver and the inferior process are running with the same shared
cache UUID, there is a simple SPI to get debugserver's own shared cache
filepath and we will return that.

On newer OSes, there are SPIs we can use to get the inferior process'
shared cache filepath; we use them if necessary and if they are available.

The response for the jGetSharedCacheInfo packet will now look like


{"shared_cache_base_address":6609256448,"shared_cache_uuid":"B69FF43C-DBFD-3FB1-B4FE-A8FE32EA1062","no_shared_cache":false,"shared_cache_private_cache":false,"shared_cache_path":"/System/Volumes/Preboot/Cryptexes/OS/System/Library/dyld/dyld_shared_cache_arm64e"}

when we have the full information about the shared cache in the
inferior. There are three possible types of responses:

1. inferior has not yet mapped in a shared cache (read: when stopped at
dyld_start and dyld hasn't started executing yet). In this case, no
"shared_cache_path" is listed. ("shared_cache_base_address" will be 0,
"shared_cache_uuid" will be all-zeroes uuid)

2. inferior has a shared cache, but it is different than debugserver's
and we do not have the new SPI to query the shared cache filepath. No
"shared_cache_path" is listed.

3. We were able to find the shared cache filepath, and it is included in
the response, as above.
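
A client-side sketch of telling the three cases apart, using only the keys shown in the sample response. This is not lldb code: the function name and the returned labels are hypothetical, and case 1 is detected via a zero base address as described above:

```python
import json

# Sample response, trimmed from the example above.
SAMPLE = (
    '{"shared_cache_base_address":6609256448,'
    '"shared_cache_uuid":"B69FF43C-DBFD-3FB1-B4FE-A8FE32EA1062",'
    '"no_shared_cache":false,"shared_cache_private_cache":false,'
    '"shared_cache_path":"/System/Volumes/Preboot/Cryptexes/OS/System/'
    'Library/dyld/dyld_shared_cache_arm64e"}'
)


def classify_shared_cache(response_text):
    """Return a (label, path) pair for a jGetSharedCacheInfo response."""
    info = json.loads(response_text)
    if info.get("shared_cache_base_address", 0) == 0:
        # Case 1: no shared cache has been mapped in yet.
        return ("not-mapped", None)
    path = info.get("shared_cache_path")
    if path is None:
        # Case 2: a cache is mapped, but its filepath could not be found.
        return ("path-unknown", None)
    # Case 3: the filepath is included in the response.
    return ("path-known", path)
```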

I'm not using this information in lldb yet, but changes that build on
this will be forthcoming.

rdar://148939795
2025-11-30 21:40:13 -08:00
Men-cotton
81c5d468cf [MLIR][NVVM] Propagate verification failure for unsupported SM targets (#170001)
Fixes: https://github.com/llvm/llvm-project/issues/169113

Correctly propagate verification failure when
`NVVM::RequiresSMInterface` check fails during `gpu.module`
verification.
Previously, the walk was interrupted but the function returned
`success()`, causing a mismatch between the emitted diagnostic and the
return status. This led to assertion failures in Python bindings which
expect `failure()` when diagnostics are emitted.
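
The invariant being restored is that an emitted diagnostic always comes with a failed result. A toy sketch of that control flow, not the MLIR verifier itself; the function name, op names, and SM numbers are hypothetical:

```python
def verify_gpu_module(ops, target_sm):
    """Walk (name, min_sm) pairs; return (succeeded, diagnostics)."""
    diagnostics = []
    interrupted = False
    for name, min_sm in ops:
        if target_sm < min_sm:
            diagnostics.append(
                f"'{name}' requires sm_{min_sm}, target is sm_{target_sm}")
            interrupted = True
            break  # stop the walk at the first unsupported op
    # The bug was returning success here even when the walk had been
    # interrupted; the status must follow the interruption.
    return (not interrupted), diagnostics
```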

CC: @grypp
2025-12-01 09:50:13 +05:30
Brandon Wu
e2181400d7 [RISCV][llvm] Correct shamt in P extension EXTRACT_VECTOR_ELT lowering (#169823)
During operation legalization, the element type should have been turned into
XLenVT, which makes the SHL a no-op. We need to use the exact vector element
type instead.
2025-12-01 11:03:50 +08:00
Matt Arsenault
6369279a0c Revert "Revert "LangRef: Clarify llvm.minnum and llvm.maxnum about sNaN and signed zero (#112852)"" (#170067)
Reverts llvm/llvm-project#168838

Justification is confused and this did not receive adequate discussion,
particularly during a holiday week
2025-12-01 02:56:47 +00:00