Commit Graph

561896 Commits

Author SHA1 Message Date
Shreeyash Pandey
2a5420ea51 [libc] move abs_timeout and monotonicity out of linux dir (#167719)
This patch moves abs_timeout and monotonicity out of the linux dir into
common. Both of these functions depend on clock_gettime which is the
actual os-dependent component. As other features in `__support/threads`
may want to use these, it's better to share it in common.
2025-12-08 22:14:12 +05:30
Men-cotton
614fe6da14 [mlir][OpenMP] Fix crash in MapInfoOp conversion when type conversion fails (#171045)
Check the result of `convertType` before calling `TypeAttr::get`. This
prevents a crash on unsupported types (e.g. `tensor`) by ensuring the
pattern fails gracefully.

Added regression test: map-info-type-conversion-fail.mlir

Fixes: #108159
2025-12-08 17:30:22 +01:00
Rana Pratap Reddy
b32a2f418a [Clang][OpenCL][AMDGPU] Allow _Float16 and half vector type compatibility (#170605)
## Summary
Allowing implicit compatibility between `_Float16` vector types and
`half` vector types in OpenCL mode. This enables AMDGPU builtins to work
correctly across OpenCL, HIP, and C++ without requiring separate builtin
definitions.
## Problem Statement
When using AMDGPU image builtins that return half-precision vectors in
OpenCL, users encounter type incompatibility errors:
**Builtin Definition:**
`TARGET_BUILTIN(__builtin_amdgcn_image_load_1d_v4f16_i32, "V4xiiQtii",
"nc", "image-insts")`

**Test Case:**
```
typedef half half4 __attribute__((ext_vector_type(4)));
half4 test_builtin_image_load_1d_2(half4 v4f16, int i32, __amdgpu_texture_t tex) {
  return __builtin_amdgcn_image_load_1d_v4f16_i32(100, i32, tex, 120, i32);
}
```
**Error:**
```
error: returning '__attribute__((__vector_size__(4 * sizeof(_Float16)))) _Float16' 
(vector of 4 '_Float16' values) from a function with incompatible result type 
'half4' (vector of 4 'half' values)
```
## Solution
In OpenCL, allow implicit compatibility between `_Float16` vector types
and `half` vector types. This is needed for AMDGPU builtins that may
return _Float16 vectors to work correctly with OpenCL half vector types.
2025-12-08 21:56:35 +05:30
Michael Liao
fccb65ef8f [mlir] Fix '-Wtemplate-id-cdtor'. NFC 2025-12-08 11:22:48 -05:00
Simon Pilgrim
a05fc9edb9 HexagonGenWideningVecInstr.cpp - fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFC. (#171095) 2025-12-08 16:17:25 +00:00
zhijian lin
d1ad0856f8 Fix [PowerPC] llc crashed at -O1/O2/O3: Assertion `isImm() && "Wrong MachineOperand mutator"' failed. (#170548)
Fixed issue 
[[PowerPC] llc crashed at -O1/O2/O3: Assertion `isImm() && "Wrong
MachineOperand mutator"'
failed.](https://github.com/llvm/llvm-project/issues/167672)

The root cause of the crash is that the IMM operand sits at a different
operand number in PPC::XXSPLTW than in PPC::XXSPLTB/PPC::XXSPLTH.

The patch also fixes a potential bug: the new element index for
PPC::XXSPLTB/PPC::XXSPLTH/PPC::XXSPLTW was computed with the same logic,
but it should differ. We need to convert the element index into the proper
unit (byte for VSPLTB, halfword for VSPLTH, word for VSPLTW) because
PPC::XXSLDWI interprets its ShiftImm in 32-bit word units.
2025-12-08 11:16:55 -05:00
Sang Ik Lee
447af32fbb [MLIR][XeGPU][XeVM] create_nd_tdesc: use correct pitch from strides. (#170384)
Base memory pitch should be derived from base stride, not base width.
Remove offset fields from tensor descriptor payload and add pitch field.
2025-12-08 08:15:44 -08:00
Sang Ik Lee
b8ddbc4f03 [MLIR][XeVM] gpu.printf test: use correct runtime. (#170754)
The gpu printf test was not using the runtime required by lit.local.cfg.
All other tests in the directory correctly use the Level Zero runtime,
but the gpu printf test was using the SYCL runtime.
2025-12-08 08:14:56 -08:00
Ivan Butygin
ca8419d6cc [mlir][amdgpu] Fuse adjacent MemoryCounterWaitOp (#171148)
Taking the minimum value.
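
A minimal sketch of why the minimum is the natural choice when fusing, assuming each counter operand means "wait until at most N operations are outstanding" and that an absent counter means "do not wait on it". The struct and function names below are hypothetical and do not reflect the actual op definition:

```cpp
#include <algorithm>
#include <optional>

// Hypothetical stand-in for a counter wait. Each field is the maximum number
// of outstanding operations allowed; an empty optional means "no wait".
struct CounterWait {
  std::optional<int> load;
  std::optional<int> store;
};

// Combine one counter from two adjacent waits: if only one side waits, keep
// it; if both wait, the stricter (smaller) bound satisfies both.
static std::optional<int> minOpt(std::optional<int> a, std::optional<int> b) {
  if (!a)
    return b;
  if (!b)
    return a;
  return std::min(*a, *b);
}

// Fuse two adjacent waits into a single wait that implies both.
CounterWait fuse(const CounterWait &a, const CounterWait &b) {
  return {minOpt(a.load, b.load), minOpt(a.store, b.store)};
}
```

Under these assumptions, fusing `{load: 2}` with `{load: 5, store: 1}` gives `{load: 2, store: 1}`.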
2025-12-08 18:52:26 +03:00
Simon Pilgrim
ebdb903c10 [X86] Handle X86ISD::EXPAND/COMPRESS nodes as target shuffles (#171119)
Allows for shuffle simplification

Required a minor fix to the overly reduced compress-undef-float-passthrough.ll regression test
2025-12-08 15:48:43 +00:00
Anchu Rajendran S
b08c72b26c [Flang][OpenMP] Enables parsing of threadset clause (#169856) 2025-12-08 07:47:05 -08:00
Ramkumar Ramachandra
c5b90103da [VPlan] Use nuw when computing {VF,VScale}xUF (#170710)
These quantities should never unsigned-wrap. This matches the behavior
if only VFxUF is used (and not VF): when computing both VF and VFxUF,
nuw should hold for each step separately.
2025-12-08 15:46:02 +00:00
Benjamin Maxwell
9a5fa3075a [ADT] Add llvm::reverse_conditionally() iterator (#171040)
This patch adds a simple iterator range that allows conditionally
iterating a collection in reverse. It works with any collection
supported by `llvm::reverse(Collection)`.

```
void foo(bool Reverse, std::vector<int>& C) {
  for (int I : reverse_conditionally(C, Reverse)) {
    // ...
  }
}
```
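
For comparison, a standard-library-only sketch of the two-branch loop that such a helper avoids. The name `forEachMaybeReversed` is hypothetical, is not part of LLVM, and is not how the patch implements the range:

```cpp
#include <vector>

// Visit every element of `c`, back to front when `reverse` is true and front
// to back otherwise. Requires a container with begin/end and rbegin/rend.
template <typename Container, typename Fn>
void forEachMaybeReversed(Container &c, bool reverse, Fn fn) {
  if (reverse) {
    for (auto it = c.rbegin(); it != c.rend(); ++it)
      fn(*it);
  } else {
    for (auto it = c.begin(); it != c.end(); ++it)
      fn(*it);
  }
}
```

The callback-based shape duplicates the loop; a conditional range keeps a single range-based `for` at the call site.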
2025-12-08 15:28:09 +00:00
Matt Arsenault
886f54a04c DAG: Set MachinePointerInfo for stack when expanding divrem libcall (#170537) 2025-12-08 16:25:19 +01:00
Sameer Sahasrabuddhe
1ae957515c [AMDGPU][NFC] Update a comment about FLAT v/s LDSDMA
The change in #170263 does not do justice to common knowledge in the backend.
Fix the comment to reflect the relation between FLAT encoding, flat pointer
access, and LDSDMA operations.
2025-12-08 20:49:19 +05:30
Victor Chernyakin
a6fc5a1d77 [clang-tidy][NFC] Refactor fuchsia-multiple-inheritance (#171059) 2025-12-08 07:19:04 -08:00
Matt Arsenault
ce73cbb6ab clang: Use generic builtins in cuda complex builtins header (#171106)
There's no reason to use the ocml or nv prefixed functions and
maintain this list of alias macros. I left these macros in for
NVPTX in the scalbn and logb case, since those have a special
case hack in the AMDGPU codegen and probably do not work on ptx.
2025-12-08 16:16:24 +01:00
Dark Steve
cc19f420b9 [AMDGPU][NPM] Port AMDGPUArgumentUsageInfo to NPM (#170886)
Port AMDGPUArgumentUsageInfo analysis to the NPM to fix suboptimal code
generation when NPM is enabled by default.

Previously, DAG.getPass() returns nullptr when using NPM, causing the
argument usage info to be unavailable during ISel. This resulted in
fallback to FixedABIFunctionInfo which assumes all implicit arguments
are needed, generating unnecessary register setup code for entry
functions.

Fixes LLVM::CodeGen/AMDGPU/cc-entry.ll

Changes:
- Split AMDGPUArgumentUsageInfo into a data class and NPM analysis
wrapper
- Update SIISelLowering to use DAG.getMFAM() for NPM path
- Add RequireAnalysisPass in addPreISel() to ensure analysis
availability

This follows the same pattern used for PhysicalRegisterUsageInfo.
2025-12-08 20:38:00 +05:30
Tim Gymnich
0487154588 [mlir][amdgpu] Add workgroup_mask to MakeDmaDescriptorOp (#171103)
- add `workgroup_mask` and `early_timeout`
2025-12-08 16:02:18 +01:00
Luke Lau
e8219e5ce8 [VPlan] Use BlockFrequencyInfo in getPredBlockCostDivisor (#158690)
In 531.deepsjeng_r from SPEC CPU 2017 there's a loop that we
unprofitably loop vectorize on RISC-V.

The loop looks something like:

```c
  for (int i = 0; i < n; i++) {
    if (x0[i] == a)
      if (x1[i] == b)
        if (x2[i] == c)
          // do stuff...
  }
```

Because it's so deeply nested the actual inner level of the loop rarely
gets executed. However we still deem it profitable to vectorize, which
due to the if-conversion means we now always execute the body.

This stems from the fact that `getPredBlockCostDivisor` currently
assumes that blocks have 50% chance of being executed as a heuristic.

We can fix this by using BlockFrequencyInfo, which gives a more accurate
estimate of the innermost block being executed 12.5% of the time. We can
then calculate the cost divisor (the reciprocal of that probability) as
`HeaderFrequency / BlockFrequency`.
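
The 12.5% figure and the resulting divisor can be reproduced with a little arithmetic. This is only a sketch: it assumes each of the three nested conditions is taken half the time, and the helper names are hypothetical rather than the actual LLVM functions:

```cpp
#include <cassert>
#include <cstdint>

// Frequency of a block nested under `depth` conditions, assuming each
// condition is taken half the time (an assumption made for illustration).
uint64_t nestedBlockFreq(uint64_t headerFreq, unsigned depth) {
  return headerFreq >> depth;
}

// Hypothetical helper: number of header executions per execution of the
// predicated block, i.e. HeaderFrequency / BlockFrequency.
uint64_t predBlockCostDivisor(uint64_t headerFreq, uint64_t blockFreq) {
  assert(blockFreq != 0 && "block must have a non-zero frequency");
  return headerFreq / blockFreq;
}
```

With a header frequency of 800, the innermost block gets 800 / 2^3 = 100 (12.5%), so the divisor is 8. A flat 50% assumption corresponds to a divisor of only 2, which overestimates the cost contribution of the predicated block by a factor of 4.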

Fixing the cost here gives a 7% speedup for 531.deepsjeng_r on RISC-V.

Whilst there are a lot of changes in the in-tree tests, this doesn't
affect llvm-test-suite or SPEC CPU 2017 that much:

- On armv9-a -flto -O3 there are 0.0%/0.2% more geomean loops vectorized
on llvm-test-suite/SPEC CPU 2017.
- On x86-64 -flto -O3 **with PGO** there are 0.9%/0% fewer geomean loops
vectorized on llvm-test-suite/SPEC CPU 2017.

Overall geomean compile time impact is 0.03% on stage1-ReleaseLTO:
https://llvm-compile-time-tracker.com/compare.php?from=9eee396c58d2e24beb93c460141170def328776d&to=32fbff48f965d03b51549fdf9bbc4ca06473b623&stat=instructions%3Au
2025-12-08 14:28:26 +00:00
Erich Keane
dd06214394 [OpenACC][CIR] Implement routine 'bind'-with-a-string lowering (#170916)
The 'bind' clause emits an attribute on the RoutineOp that states which
function it should call on the device side. When provided in
double-quotes, the function on the device side should be the exact name
given. This patch emits the IR to do that.

As a part of that, we add a helper function to the OpenACC dialect to do
so, as well as a version that adds the ID version (though we don't
exercise that yet).

The 'bind' with an ID should do the MANGLED name, but it isn't quite
clear what that name SHOULD be yet. Since the signature of a function is
included in its mangling, and we're not providing said signature, we
have to come up with something. This is left as an exercise for a future
patch.
2025-12-08 06:23:13 -08:00
Simon Pilgrim
bab4d1e8b2 [X86] shift-i512.ll - extend test coverage (#171125)
Remove v8i64 dependency from original shift-by-1 tests - this was added for #132601 but is unlikely to be necessary

Add tests for general shifts as well as shift-by-constant and shift-of-constant examples
2025-12-08 14:17:00 +00:00
Hongyu Chen
11866c499b [DAGCombiner] Don't peek through bitcast when checking isMulAddWithConstProfitable (#171056)
Fixes https://github.com/llvm/llvm-project/issues/171035
Peeking through bitcast may cause type mismatch between `AddNode` and
`ConstNode` in `isMulAddWithConstProfitable`.
2025-12-08 22:09:12 +08:00
Mend Renovate
f1af9b027e Update [Github] Update GHA Dependencies (#171064)
This PR contains the following updates:

| Package | Type | Update | Change | Pending |
|---|---|---|---|---|
| [actions/checkout](https://redirect.github.com/actions/checkout) | action | patch | `v6.0.0` -> `v6.0.1` | |
| [actions/setup-node](https://redirect.github.com/actions/setup-node) | action | minor | `v6.0.0` -> `v6.1.0` | |
| [github/codeql-action](https://redirect.github.com/github/codeql-action) | action | patch | `v4.31.5` -> `v4.31.6` | `v4.31.7` |
2025-12-08 06:06:43 -08:00
Aiden Grossman
f29f01db8f [Sanitizer] Bump soft_rss_limit_mb in test (#170911)
This test is failing on some buildbots now that the internal shell has
been turned on and was failing previously on some ppc bots when turning
it on a while back (before it got reverted).

At least one X86 bot is barely hitting the limit
(https://lab.llvm.org/buildbot/#/builders/174/builds/28487 224MB-235MB).

This likely needs to be bumped due to changes in the process tree (now
that we invoke things through python rather than a bash shell) with the
enablement of the internal shell.
2025-12-08 06:04:41 -08:00
David Spickett
7fbd443491 [lldb] Remove printf in breakpoint add command
Added in 2110db0f49 / #156067.
2025-12-08 13:53:55 +00:00
Mehdi Amini
c1d030e9a4 [MLIR][ExecutionEngine] Don't create a _mlir_ wrapper function for internal linkage (#171115)
This is more or less NFC: we were creating wrappers for internal
functions, which are de facto not callable.
2025-12-08 14:42:00 +01:00
Jay Foad
07bafab83d [AMDGPU] Do not generate V_FMAC_DX9_ZERO_F32 on GFX12 (#171116)
GFX12 does not have the FMAC form of this instruction, only the FMA
form.

Fixes: #170437
2025-12-08 13:20:02 +00:00
Robert Imschweiler
33d779dfbf [OpenMP] Fix undefined symbol for Darwin builds (#170999)
cf.
https://github.com/llvm/llvm-project/pull/168554#issuecomment-3617253169
2025-12-08 14:15:39 +01:00
Adrian Vogelsgesang
7c832fca53 [lldb] Fix command line of target frame-provider register (#167803)
So far, the syntax was `target frame-provider register <cmd-options>
[<run-args>]`. Note the optional `run-args` at the end. They are
completely ignored by the actual command, but the command line parser
still accepts them.

This commit removes them.

This was probably a copy-paste error from `CommandObjectProcessLaunch`
which was probably used as a blue-print for `target frame-provider
register`.
2025-12-08 13:14:41 +00:00
Gergely Bálint
a5e8e77f7c [BOLT][PAC] Warn about synchronous unwind tables (#165227)
BOLT currently ignores functions with synchronous PAuth DWARF info.
If more than 10% of functions get ignored for inconsistencies, we
should emit a warning to only use asynchronous unwind tables.

See related issue: #165215
2025-12-08 13:34:48 +01:00
Mehdi Amini
60492898f8 [MLIR] Apply clang-tidy fixes for readability-identifier-naming in ShardOps.cpp (NFC) 2025-12-08 04:12:47 -08:00
Mehdi Amini
1bbff7290f [MLIR] Apply clang-tidy fixes for llvm-qualified-auto in VulkanRuntimeWrappers.cpp (NFC) 2025-12-08 04:12:47 -08:00
Tirthankar Mazumder
d94958b2f2 [InstCombine] Fold icmp samesign u{gt/lt} (X +nsw C2), C -> icmp s{gt/lt} X, (C - C2) (#169960)
Fixes #166973

Partially addresses #134028

Alive2 proof: https://alive2.llvm.org/ce/z/BqHQNN
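
As an informal complement to the Alive2 proof, the fold can be brute-forced over 8-bit values. This sketch skips inputs where `nsw` or `samesign` would make the original poison, and it additionally assumes that `C - C2` must not overflow (an assumption about the fold's precondition, not taken from the patch). The function name is hypothetical:

```cpp
#include <cstdint>

// Exhaustively checks, for i8, that
//   icmp samesign u{gt,lt} (X +nsw C2), C  ==  icmp s{gt,lt} X, (C - C2)
// on every input where the left-hand side is not poison.
bool foldHoldsForAllI8() {
  for (int x = -128; x <= 127; ++x) {
    for (int c2 = -128; c2 <= 127; ++c2) {
      int sum = x + c2;
      if (sum < -128 || sum > 127)
        continue; // nsw violated: the add is poison
      for (int c = -128; c <= 127; ++c) {
        if ((sum < 0) != (c < 0))
          continue; // samesign violated: the compare is poison
        int diff = c - c2;
        if (diff < -128 || diff > 127)
          continue; // assumed precondition: C - C2 does not overflow
        uint8_t us = static_cast<uint8_t>(sum);
        uint8_t uc = static_cast<uint8_t>(c);
        if ((us > uc) != (x > diff))
          return false; // ugt vs. sgt mismatch
        if ((us < uc) != (x < diff))
          return false; // ult vs. slt mismatch
      }
    }
  }
  return true;
}
```

The check relies on two facts: when both operands have the same sign, unsigned and signed comparisons agree, and when the add does not wrap, subtracting `C2` from both sides preserves the signed ordering.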
2025-12-08 13:05:37 +01:00
Simon Pilgrim
3a6781ea4d [X86] vector-shuffle-combining-avx512f.ll - add tests showing failure to simplify expand/compress nodes (#171113) 2025-12-08 12:02:43 +00:00
Benjamin Maxwell
32ff7100d7 [AArch64] Lower v8bf16 FMUL to BFMLAL top/bottom with +sve (#169655)
Assuming the predicate is hoisted, this should have a slightly better
throughput: https://godbolt.org/z/jb7aP7Efc

Note: SVE must be used to convert back to bf16 as the bfmlalb/t
instructions operate on even/odd lanes, but the neon bfcvtn/2 process
the top/bottom halves of vectors.
2025-12-08 11:56:18 +00:00
Mehdi Amini
5e3ffd66e7 [MLIR] Apply clang-tidy fixes for readability-identifier-naming in ArmRunnerUtils.cpp (NFC) 2025-12-08 03:48:14 -08:00
Jay Foad
f41edb3fb9 [AMDGPU] Add test cases for v_fmac_dx9_zero_f32 aka v_fmac_legacy_f32 (#171108) 2025-12-08 11:42:10 +00:00
Manuel Carrasco
56beac9f0c [SPIRV] Fix assertion violation caused by unexpected ConstantExpr. (#170524)
`SPIRVEmitIntrinsics::simplifyZeroLengthArrayGepInst` asserted that it
always expected a `GetElementPtrInst` from `IRBuilder::CreateGEP` (which
returns a `Value`). `IRBuilder` can fold and return a `ConstantExpr`
instead, thus violating the assertion. The patch fixes this by using
`GetElementPtrInst::Create` to always return a `GetElementPtrInst`.

This LLVM defect was identified via the AMD Fuzzing project.
2025-12-08 11:37:16 +00:00
Tom Stellard
e52cddc432 workflows/release-binaries: Use upload-release-artifact action for uploading (#170528) 2025-12-08 03:35:43 -08:00
David Spickett
405403c8ed [mlir] Fix GCC compilation warning in TuneExtensionOps.cpp (#168850)
Building with GCC produces:
```
<...>/TuneExtensionOps.cpp:180:26: warning: comparison of unsigned expression in ‘< 0’ is always false [-Wtype-limits]
  180 |   if (*selectedRegionIdx < 0 || *selectedRegionIdx >= getNumRegions())
      |       ~~~~~~~~~~~~~~~~~~~^~~
<...>/TuneExtensionOps.cpp: In member function ‘llvm::LogicalResult mlir::transform::tune::AlternativesOp::verify()’:
/home/david.spickett/llvm-project/mlir/lib/Dialect/Transform/TuneExtension/TuneExtensionOps.cpp:236:19: warning: comparison of unsigned expression in ‘< 0’ is always false [-Wtype-limits]
  236 |     if (regionIdx < 0 || regionIdx >= getNumRegions())
      |         ~~~~~~~~~~^~~
```

As we are sign extending these variables, use int64_t instead of size_t
for their type.
2025-12-08 11:06:42 +00:00
guillem-bartrina-sonarsource
f9e0fa8ba4 [analyzer] MoveChecker: correct invalidation of this-regions (#169626)
By completely omitting invalidation in the case of InstanceCall, we do
not clear the moved state of the fields of the this object after an
opaque call to a member function of the object itself.
2025-12-08 11:00:54 +00:00
Mehdi Amini
49496c531d [MLIR] Apply clang-tidy fixes for misc-use-internal-linkage in LLVMIRIntrinsicGen.cpp (NFC) 2025-12-08 02:48:34 -08:00
Simon Pilgrim
bb926c157f [X86] bitcnt-big-integer.ll - add test coverage for AVX512 targets with no VLX support (#171104) 2025-12-08 10:34:11 +00:00
Hans Wennborg
2e238bfa36 Build win release packages with LLDB_ENABLE_LIBXML2 (#170513)
Fixes #170461
2025-12-08 11:20:52 +01:00
Pierre van Houtryve
8aa82eff56 [AMDGPU][SIInsertWaitcnts] Wait on all LDS DMA operations when no aliasing store is found (#170660)
Previously, we would miss inserting a wait if the ds_read had AA info
but it didn't match any LDS DMA op, for example if we didn't track the
LDS DMA op it aliases with because it exceeded the tracking limit.
2025-12-08 11:02:24 +01:00
Jay Foad
7a59ab0e1a [AMDGPU] Common up some unsafe fexp lowering. NFC. (#170841) 2025-12-08 09:50:45 +00:00
Petar Avramovic
448ac1fb00 AMDGPU/GlobalISel: Fix broken exp10 lowering for f16 (#170708) 2025-12-08 10:35:40 +01:00
Stefan Gränitz
c347b2669b Remove LLVM_ABI from members of RuntimeLibraryAnalysis (NFC) (#170850)
Fix Windows build error: attribute 'dllexport' cannot be applied to member of 'dllexport' class
2025-12-08 10:27:17 +01:00
Dan Blackwell
bd1bd178f8 [fuzzer][test-only] Bump runs for reduce_inputs.test unseeded run (#169641)
I have seen a failure whereby the fuzzer failed to reach the expected
input and thus failed the test.

This patch bumps the max executions to 10,000,000 in order to give the
fuzzer a better chance of reaching the expected input. Most runs
complete successfully, so I do not see this adding test time in the
general case; I believe it's a fair tradeoff for the unlucky seed to run
for longer if it reduces the noise from false positives. Note, this
updates a different `RUN:` to
https://github.com/llvm/llvm-project/pull/165402.

rdar://162122184
2025-12-08 09:05:49 +00:00