Commit Graph

545382 Commits

Author SHA1 Message Date
Kelvin Li
2e67dcfdcd [flang] update ppc lit tests after using vector.insert and vector.extract (NFC) (#148775)
See https://github.com/llvm/llvm-project/pull/143272
2025-07-18 14:43:15 -07:00
Diego Caballero
c99c213e72 [mlir][Flang][NFC] Replace use of vector.insertelement/extractelement (#143272)
This PR is part of the last step to remove `vector.extractelement` and
`vector.insertelement` ops (RFC:
https://discourse.llvm.org/t/rfc-psa-remove-vector-extractelement-and-vector-insertelement-ops-in-favor-of-vector-extract-and-vector-insert-ops).
It replaces `vector.insertelement` and `vector.extractelement` with
`vector.insert` and `vector.extract` in Flang. It looks like no lit
tests are impacted?
2025-07-18 14:43:03 -07:00
Prabhu Rajasekaran
921c6dbeca [llvm] Introduce callee_type metadata
Introduce `callee_type` metadata which will be attached to the indirect
call instructions.

The `callee_type` metadata will be used to generate `.callgraph` section
described in this RFC:
https://lists.llvm.org/pipermail/llvm-dev/2021-July/151739.html

Reviewers: morehouse, petrhosek, nikic, ilovepi

Reviewed By: nikic, ilovepi

Pull Request: https://github.com/llvm/llvm-project/pull/87573
2025-07-18 14:40:54 -07:00
Stanislav Mekhanoshin
6d8e53d4af [AMDGPU] Support nv memory instructions modifier on gfx1250 (#149582) 2025-07-18 14:38:46 -07:00
Florian Mayer
1b8a136a09 [Sanitizer] remove array-bounds-pseudofn (#149430)
This has been replaced by -fsanitize-annotate-debug-info
2025-07-18 14:31:21 -07:00
Jonas Devlieghere
3641448e08 [lldb] Use StopInfoSP instead of StopInfo* (NFC)
Don't make assumptions about the lifetime of the underlying object and
use the shared_ptr to participate in reference counting and extend the
lifetime of the object to the end of the lexical scope.
2025-07-18 14:29:20 -07:00
Alex MacLean
965b68e8f2 [NVPTX] Prevent fptrunc of v2f32 from being folded into store (#149571) 2025-07-18 14:20:13 -07:00
Andres-Salamanca
b02787d33f [CIR] Fix alignment when lowering set/get bitfield operations (#148999)
This PR fixes incorrect alignment when lowering the set/get bitfield
operations to LLVM IR. The issue occurred because during lowering, the
function was being called with an alignment of 0, which caused it to
default to the alignment of the packed member. For example, if the
bitfield was packed inside a `u64i`, it would use an alignment of 8.
With this change, the generated code now matches what the classic
codegen produces.
In the assembly format, I changed it to be similar to how it's done in
`loadOp`. If there's a better approach, please feel free to point it out.
2025-07-18 16:13:34 -05:00
Princeton Ferro
d63ab5467d [NVPTX] don't erase CopyToRegs when folding movs into loads (#149393)
We may still need to keep CopyToReg even after folding uses into vector
loads, since the original register may be used in other blocks.

Partially reverts 1fdbe69849
2025-07-18 14:11:31 -07:00
Jay Foad
3be44e2580 [TableGen] Add some -time-phases support in CodeGenRegisters (#149309) 2025-07-18 22:05:54 +01:00
Shilei Tian
d46de86ca4 [NFC][AMDGPU] Re-enable two tests previously disabled due to missing upstream features (#149568)
This PR re-enables two tests that were previously disabled because they
depended on features not yet upstreamed.
2025-07-18 17:04:34 -04:00
Shilei Tian
ffb453989b [NFC][AMDGPU] Align all gfx1250 VOP1 MC tests with downstream (#149567)
This PR adds all VOP1 tests that haven't yet been upstreamed by copying
the relevant test files directly from downstream. Afterward, the
auto-generation script is run with the `--unique` option to deduplicate
any redundant tests that may have been introduced during the downstream
merge.

Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-18 17:03:26 -04:00
Ellis Hoag
fb5c94e712 [profdata] Use --hot-func-list to show all hot functions (#149428)
The `--hot-func-list` flag is used for sample profiles to dump the list
of hot functions. Add support to dump hot functions for IRPGO profiles
as well.

This also removes a `priority_queue` used for `--topn`. We can instead
store all functions and sort at the end before dumping. Since we are
storing `StringRef`s, I believe this won't consume too much memory.
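The `--topn` change can be illustrated with a small Python sketch. It assumes each profile record is reduced to a hypothetical `(function_name, count)` pair; it is not the `llvm-profdata` code, only a comparison of a bounded heap (the old approach) against storing everything and sorting once at the end (the new approach).

```python
import heapq

def top_n_with_heap(records, n):
    # Bounded min-heap: keep only the n hottest (count, name) pairs seen so
    # far, evicting the coldest one whenever a hotter record arrives.
    heap = []
    for name, count in records:
        if len(heap) < n:
            heapq.heappush(heap, (count, name))
        elif count > heap[0][0]:
            heapq.heapreplace(heap, (count, name))
    return [name for count, name in sorted(heap, reverse=True)]

def top_n_by_sorting(records, n):
    # Store everything, sort once (hottest first), then take the first n.
    # Keeping every record is assumed to be cheap, as with StringRefs.
    ordered = sorted(records, key=lambda r: (r[1], r[0]), reverse=True)
    return [name for name, _ in ordered[:n]]

# Hypothetical profile data.
profile = [("main", 500), ("foo", 1200), ("bar", 30), ("baz", 900)]
print(top_n_by_sorting(profile, 2))  # ['foo', 'baz']
```

Both functions return the same names when counts are distinct; the sorted version also makes dumping every function (as `--hot-func-list` needs) a simple matter of not truncating the list.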
2025-07-18 14:00:32 -07:00
Florian Hahn
004c67ea25 [LV] Vectorize maxnum/minnum w/o fast-math flags. (#148239)
Update LV to vectorize maxnum/minnum reductions without fast-math flags
by adding an extra check in the loop for whether any input to
maxnum/minnum is NaN; the check is needed because of maxnum/minnum
behavior w.r.t. signaling NaNs. Signed zeros are already handled
consistently by maxnum/minnum.

If any input is NaN:
 * exit the vector loop,
 * compute the reduction result up to the vector iteration that contained
   NaN inputs, and
 * resume in the scalar loop.

New recurrence kinds are added for reductions using maxnum/minnum
without fast-math flags.
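A scalar Python model can make these steps concrete. It is only a sketch under stated assumptions: `vf` stands for the vector width, a Python list of accumulators stands for the vector register, and `maxnum` models only the quiet-NaN behavior of `llvm.maxnum` (a NaN operand yields the other operand). Signaling NaNs, which motivate the real check, are not modeled, and none of these names come from the patch.

```python
import math

def maxnum(a, b):
    # Quiet-NaN model of llvm.maxnum: a NaN operand is ignored in favor
    # of the other operand.
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)

def scalar_max_reduction(xs, start, acc):
    # Plain scalar loop: the fallback that the vector loop resumes in.
    for i in range(start, len(xs)):
        acc = maxnum(acc, xs[i])
    return acc

def vectorized_max_reduction(xs, vf=4, init=-math.inf):
    lanes = [init] * vf  # one accumulator per vector lane
    i = 0
    while i + vf <= len(xs):
        chunk = xs[i:i + vf]
        # The extra check: leave the vector loop before an iteration
        # that has any NaN input.
        if any(math.isnan(x) for x in chunk):
            break
        lanes = [maxnum(acc, x) for acc, x in zip(lanes, chunk)]
        i += vf
    # Reduce the lanes: the result up to the last NaN-free vector iteration.
    acc = init
    for lane in lanes:
        acc = maxnum(acc, lane)
    # Resume in the scalar loop at the first element not yet processed.
    return scalar_max_reduction(xs, i, acc)

data = [1.0, 5.0, 2.0, 3.0, float("nan"), 7.0, 0.5, 4.0, 6.0]
print(vectorized_max_reduction(data))  # 7.0
```

Because the vector loop stops before any iteration containing a NaN, the lanes only ever combine NaN-free values, and every NaN input is processed by the scalar loop.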

PR: https://github.com/llvm/llvm-project/pull/148239
2025-07-18 21:58:19 +01:00
Jeffrey Byrnes
695660cdfd [AMDGPU] Provide control to force VGPR MFMA form (#148079)
This gives users an override to force selection of the VGPR form of
MFMA. Eventually we will drop this in favor of the compiler making better
decisions, but this provides a mechanism for users to address the cases
where MayNeedAGPRs favors the AGPR form and performance is degraded due
to poor RA.
2025-07-18 13:53:17 -07:00
Andre Kuhlenschmidt
abdd4536ce [flang][openacc] fix bugs with default(none) checking (#149220)
A report of the following code not generating an error led to fixing two bugs in directive checking.

- We should treat CombinedConstructs as OpenACC Constructs
- We should treat DoConstruct index variables as private. 

```fortran
subroutine sub(nn)
  integer :: nn, ii
  !$acc serial loop default(none)
  do ii = 1, nn
  end do
  !$acc end serial loop
end subroutine
```
Here `nn` should be flagged as needing a data clause while `ii` should
still get one implicitly.
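The intended rule after both fixes can be summarized with a hypothetical Python model (not flang's semantic checker): inside a `default(none)` construct, including a combined construct such as `serial loop`, every referenced variable needs an explicit data clause unless it is a DO index variable, which is treated as private.

```python
def missing_data_clauses(referenced, data_clause_vars, loop_indices):
    # Hypothetical model of the default(none) check: referenced variables
    # minus those with explicit data clauses minus DO index variables
    # (which are treated as private).
    return set(referenced) - set(data_clause_vars) - set(loop_indices)

# The serial loop above references nn and ii, has no data clauses,
# and uses ii as the DO index.
print(sorted(missing_data_clauses({"nn", "ii"}, set(), {"ii"})))  # ['nn']
```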
2025-07-18 13:50:09 -07:00
Peter Klausler
b6ea04a37b [flang][NFC] Fix build-time warning (#149549)
Don't increment the LHS variable of an assignment that also uses that
variable on the RHS.
2025-07-18 13:45:25 -07:00
Peter Klausler
9e5b2fbe86 [flang][runtime] Preserve type when remapping monomorphic pointers (#149427)
Pointer remappings unconditionally update the element byte size and
derived type of the pointer's descriptor. This is okay when the pointer
is polymorphic, but not when a monomorphic pointer is associated with an
extended type.

To communicate this monomorphic case to the runtime, add a new entry
point so as to not break forward binary compatibility.
2025-07-18 13:45:05 -07:00
Peter Klausler
680b8dd707 [flang][runtime] Handle spaces before ')' in alternative list-directed complex input (#149384)

List-directed reads of complex values that can't go through the usual
fast path (as in this bug's test case, which uses DECIMAL='COMMA')
didn't skip spaces before the closing right parenthesis correctly.
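A toy Python parser (hypothetical, not the flang runtime) shows the case being fixed. With DECIMAL='COMMA', `,` is the decimal mark and `;` separates the real and imaginary parts, so a value such as `(1,5;2,25 )` must have its blanks skipped before the closing `)`.

```python
def parse_complex_decimal_comma(text):
    # Toy model of list-directed complex input with DECIMAL='COMMA'.
    pos = 0

    def skip_blanks():
        nonlocal pos
        while pos < len(text) and text[pos] == " ":
            pos += 1

    def expect(ch):
        nonlocal pos
        skip_blanks()
        if pos >= len(text) or text[pos] != ch:
            raise ValueError(f"expected {ch!r} at position {pos}")
        pos += 1

    def read_number():
        nonlocal pos
        skip_blanks()
        start = pos
        # ',' is the decimal mark here, so it does not end a number.
        while pos < len(text) and text[pos] not in " ;)":
            pos += 1
        return float(text[start:pos].replace(",", "."))

    expect("(")
    real = read_number()
    expect(";")
    imag = read_number()
    # The fixed case: blanks between the last value and ')' are skipped.
    expect(")")
    return complex(real, imag)

print(parse_complex_decimal_comma("(1,5;2,25 )"))  # (1.5+2.25j)
```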

Fixes https://github.com/llvm/llvm-project/issues/149164.
2025-07-18 13:44:44 -07:00
Peter Klausler
97a8476068 [flang][runtime] Further work on speeding up work queue operations (#149189)
This patch avoids a trip through the work queue engine for cases on a
CPU where finalization and destruction actions during assignment were
handled without enqueueing another task.
2025-07-18 13:44:25 -07:00
Peter Collingbourne
9878ef3abd CodeGen: Respect function align attribute if less than preferred alignment.
Reviewers: arsenm, efriedma-quic

Reviewed By: arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/149444
2025-07-18 13:33:46 -07:00
Philip Reames
c5f0c4ad37 [RISCV][IA] Add test coverage for vp.store of interleaveN with one active 2025-07-18 13:33:23 -07:00
Kazu Hirata
cb6370167f [mlir] Deprecate OpPrintingFlags(std::nullopt_t) (NFC) (#149546)
This patch deprecates OpPrintingFlags(std::nullopt_t) to avoid use of
std::nullopt outside the context of std::optional.
2025-07-18 13:33:05 -07:00
Kazu Hirata
c98b05bd56 [mlir] Deprecate NamedAttrList(std::nullopt_t) (NFC) (#149544)
This patch deprecates NamedAttrList(std::nullopt_t) to avoid use of
std::nullopt outside the context of std::optional.
2025-07-18 13:32:56 -07:00
Kazu Hirata
36c78ec3c8 [DebugInfo] Use llvm::remove_if (NFC) (#149543)
We can pass a range to llvm::remove_if.
2025-07-18 13:32:49 -07:00
Kazu Hirata
3c1a09d939 [lldb] Use a range-based for loop instead of llvm::for_each (NFC) (#149541)
LLVM Coding Standards discourages llvm::for_each unless we already
have a callable.
2025-07-18 13:32:42 -07:00
Alex MacLean
9d9662e4bd [NVPTX][test] fixup version for ptxas on trunc-tofp.ll (#149558) 2025-07-18 13:27:31 -07:00
Chelsea Cassanova
d64802d6d9 [lldb][framework] Glob headers from source for framework (#148736)
When gathering the headers to fix up and place in LLDB.framework, we
were previously globbing the header files from a location in the build
directory. This commit changes this to glob from the source directory
instead, as we were globbing from the build directory without ensuring
that the necessary files were actually in that location before globbing.
2025-07-18 15:26:09 -05:00
Hanumanth
b846d8c3e2 [mlir][tosa] Fix tosa-reduce-transposes to handle large constants better (#148755)
This change addresses the performance issue in the **--tosa-reduce-transposes** implementation by working directly with the
raw tensor data, eliminating the costly intermediate attributes that caused the bottleneck.
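Working on raw data means a transpose becomes pure index arithmetic over a flat buffer. The Python sketch below is a generic illustration under that assumption, not the MLIR implementation; `transpose_flat` is a hypothetical helper, the buffer is row-major, and `perm` follows the convention that output dimension `i` is input dimension `perm[i]`.

```python
def transpose_flat(data, shape, perm):
    # Output shape: result dimension i is input dimension perm[i].
    out_shape = [shape[p] for p in perm]
    # Row-major strides of the input buffer.
    in_strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        in_strides[d] = in_strides[d + 1] * shape[d + 1]
    out = [None] * len(data)
    idx = [0] * len(out_shape)  # multi-index into the output
    for flat_out in range(len(data)):
        # Map the output multi-index back to a flat input offset.
        src = sum(idx[i] * in_strides[perm[i]] for i in range(len(perm)))
        out[flat_out] = data[src]
        # Advance the output multi-index in row-major order.
        for d in range(len(out_shape) - 1, -1, -1):
            idx[d] += 1
            if idx[d] < out_shape[d]:
                break
            idx[d] = 0
    return out, out_shape

print(transpose_flat([0, 1, 2, 3, 4, 5], [2, 3], [1, 0]))
# ([0, 3, 1, 4, 2, 5], [3, 2])
```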
2025-07-18 16:12:57 -04:00
Ellis Hoag
4dc6dfd653 [NFC][profdata] Apply lints and other format fixes (#149433)
Apply lints and other format fixes to
`llvm/tools/llvm-profdata/llvm-profdata.cpp`. This is intended to have
no functional change.
2025-07-18 13:08:29 -07:00
Jacob Lalonde
6a7f572ef9 [LLDB] Fix Memory64 BaseRVA, move all non-stack memory to Mem64. (#146777)
### Context

Over a year ago, I landed support for 64b Memory ranges in Minidump
(#95312). In this patch we added the Memory64 list stream, which is
effectively a Linked List on disk. The layout is a sixteen byte header
and then however many Memory descriptors.
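A Python sketch of that layout, assuming the field order of the Windows minidump structures (`MINIDUMP_MEMORY64_LIST`: `NumberOfMemoryRanges`, `BaseRva`; `MINIDUMP_MEMORY_DESCRIPTOR64`: `StartOfMemoryRange`, `DataSize`, all 64-bit) and assuming the memory bytes are written directly after the descriptor array. `layout_memory64_list`, `encode_memory64_list`, and the stream offset `0x1000` are hypothetical; this is not LLDB's code.

```python
import struct

HEADER_FMT = "<QQ"      # NumberOfMemoryRanges, BaseRva (two u64)
DESCRIPTOR_FMT = "<QQ"  # StartOfMemoryRange, DataSize (two u64)

def layout_memory64_list(list_rva, ranges):
    # Return (BaseRva, per-range RVAs) for [start, end) ranges.
    header_size = struct.calcsize(HEADER_FMT)          # 16 bytes
    descriptor_size = struct.calcsize(DESCRIPTOR_FMT)  # 16 bytes each
    base_rva = list_rva + header_size + descriptor_size * len(ranges)
    rvas = []
    rva = base_rva
    for start, end in ranges:
        # No per-range RVA is stored: each range follows the previous one.
        rvas.append(rva)
        rva += end - start
    return base_rva, rvas

def encode_memory64_list(list_rva, ranges):
    base_rva, _ = layout_memory64_list(list_rva, ranges)
    blob = struct.pack(HEADER_FMT, len(ranges), base_rva)
    for start, end in ranges:
        blob += struct.pack(DESCRIPTOR_FMT, start, end - start)
    return blob

ranges = [(0x5584162D8000, 0x5584162D9000), (0x5584162D9000, 0x5584162DB000)]
base_rva, rvas = layout_memory64_list(0x1000, ranges)
print(hex(base_rva), [hex(r) for r in rvas])  # 0x1030 ['0x1030', '0x2030']
```

Since `BaseRva` is the only anchor for every range in the list, getting the header size wrong shifts all of the ranges at once.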

### The Bug
This is a classic off-by-one error, where I added 8 bytes instead of 16
for the header. This caused the first region to start 8 bytes before the
correct RVA, thus shifting all memory reads by 8 bytes. We are writing
all the regions to disk correctly, with no physical corruption, but the
BaseRVA is defined wrong, meaning we were reading memory incorrectly.


![image](https://github.com/user-attachments/assets/049ef55d-856c-4f3c-9376-aeaa3fe8c0e1)


### Why wasn't this caught?

One problem we've had is forcing Minidump to actually use the 64b mode:
it would be a massive waste of resources to have a test that actually
wrote more than 4.2 GB of I/O to validate the 64b regions, so almost all
validation has been manual. As a weakness of manual testing, this issue
is pseudo-non-deterministic, because which regions end up in 64b or 32b
is decided greedily, iterating in the order they are laid out in
/proc/pid/maps. We often validated that 64b was written correctly by
hexdumping the Minidump itself, which was not corrupted (other than the
BaseRVA).


![image](https://github.com/user-attachments/assets/b599e3be-2d59-47e2-8a2d-75f182bb0b1d)

### Why is this showing up now?

During internal usage, we had a bug report that the Minidump wasn't
displaying values. I was unable to repro the issue, but during my
investigation I saw that the variables were in the 64b regions, which led
me to identify the bug.

### How do we prevent future regressions?

To prevent regressions, and honestly to save my sanity for figuring out
where 8 bytes magically came from, I've added a new API to
SBSaveCoreOptions.

`SBSaveCoreOptions::GetMemoryRegionsToSave()`
This provides the memory regions that we intend to include in the Coredump. I added it so we can compare what we intended to include against what was actually included. Traditionally we've always had issues comparing regions because Minidump includes `/proc/pid/maps`, and it can be difficult to know whether a memory region read failure was a genuine error or just a page that wasn't meant to be included.

We are also leveraging this API to choose the memory regions to be generated, as well as for testing which regions should be bytewise 1:1.

After much debate with @clayborg, I've moved all non-stack memory to the Memory64 list. This list doesn't add any meaningful overhead, and Greg originally suggested doing this in the original 64b PR. This also means we're exercising the 64b path every single time we save a Minidump, preventing regressions on this feature from slipping through testing in the future.

Snippet produced by [minidump.py](https://github.com/clayborg/scripts) 
```
MINIDUMP_MEMORY_LIST:
NumberOfMemoryRanges = 0x00000002
MemoryRanges[0] = [0x00007f61085ff9f0 - 0x00007f6108601000) @ 0x0003f655
MemoryRanges[1] = [0x00007ffe47e50910 - 0x00007ffe47e52000) @ 0x00040c65

MINIDUMP_MEMORY64_LIST:
NumberOfMemoryRanges = 0x000000000000002e
BaseRva              = 0x0000000000042669
MemoryRanges[0]      = [0x00005584162d8000 - 0x00005584162d9000)
MemoryRanges[1]      = [0x00005584162d9000 - 0x00005584162db000)
MemoryRanges[2]      = [0x00005584162db000 - 0x00005584162dd000)
MemoryRanges[3]      = [0x00005584162dd000 - 0x00005584162ff000)
MemoryRanges[4]      = [0x00007f6100000000 - 0x00007f6100021000)
MemoryRanges[5]      = [0x00007f6108800000 - 0x00007f6108828000)
MemoryRanges[6]      = [0x00007f6108828000 - 0x00007f610899d000)
MemoryRanges[7]      = [0x00007f610899d000 - 0x00007f61089f9000)
MemoryRanges[8]      = [0x00007f61089f9000 - 0x00007f6108a08000)
MemoryRanges[9]      = [0x00007f6108bf5000 - 0x00007f6108bf7000)
```
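The `MINIDUMP_MEMORY_LIST` values in the snippet can be cross-checked by hand: the second range's data appears to start exactly where the first range's data ends. A small Python check of that arithmetic, using only the numbers printed above (`end_rva` is a hypothetical helper):

```python
# (start, end, rva) for the two MINIDUMP_MEMORY_LIST entries in the snippet.
memory_list = [
    (0x00007F61085FF9F0, 0x00007F6108601000, 0x0003F655),
    (0x00007FFE47E50910, 0x00007FFE47E52000, 0x00040C65),
]

def end_rva(start, end, rva):
    # The bytes of [start, end) occupy [rva, rva + (end - start)) in the file.
    return rva + (end - start)

size0 = memory_list[0][1] - memory_list[0][0]
print(hex(size0), hex(end_rva(*memory_list[0])))  # 0x1610 0x40c65
```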

### Misc
As a part of this fix I had to look at LLDB logs a lot; you'll notice I added `0x` to many of the PRIx64 `LLDB_LOGF` calls. This is so the user (or I) can directly copy and paste the address from the logs instead of adding the hex prefix themselves.

Added some SBSaveCore tests for the new `GetMemoryRegionsToSave` API, and docstrings.

CC: @DavidSpickett, @da-viper @labath because we've been working together on save-core plugins. Review is optional and I didn't tag you, but I figured you'd want to know.
2025-07-18 13:05:15 -07:00
Joseph Huber
de59e7b86c [libc] Fix GPU benchmarking 2025-07-18 14:36:23 -05:00
Stanislav Mekhanoshin
cfa918bec1 [AMDGPU] Select flat GVS atomics on gfx1250 (#149554) 2025-07-18 12:31:29 -07:00
Roland McGrath
13f7786f72 [libc] Remove trivial .h.def files (#149466)
Remove all the .h.def files that already express nothing
whatsoever not already expressed in YAML.  Clean up a few YAML
files without materially changing any generated header output.

Many more .h.def files remain that need a bit of conversion in
YAML to express macro requirements and such.
2025-07-18 11:35:09 -07:00
Krzysztof Parzyszek
6acc6991f8 [STLForwardCompat] Improve category handling in transformOptional (#149539)
The old version would prefer the "const &" overload over the "&&" one
unless the former was not allowed in the given situation. In particular,
if the function passed was "[](auto &&)" the argument would be "const &"
even if the value passed to transformOptional was an rvalue reference.

This version improves the handling of expression categories, and the
lambda argument category will reflect the argument category in the above
scenario.
2025-07-18 13:34:15 -05:00
Tobias Decking
10b0dee97d [X86] Ensure that bit reversals of byte vectors are properly lowered on pure GFNI targets (#148304)
Fixes #148238.

When GFNI is present, custom bit reversal lowerings for scalar integers
become active. They work by swapping the bytes in the scalar value and
then reversing bits in a vector of bytes. However, the custom bit
reversal lowering for a vector of bytes is disabled if GFNI is present
in isolation, resulting in improperly lowered code.
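The scalar strategy described above (swap the bytes, then reverse the bits within each byte) can be checked with a plain Python model. `bitreverse` and its helpers are hypothetical; the per-byte step stands in for what the GFNI lowering does on a vector of bytes, and `bitreverse_naive` is a bit-by-bit reference.

```python
def reverse_bits_in_byte(b):
    # Per-byte bit reversal (the step applied to a vector of bytes).
    r = 0
    for i in range(8):
        if b & (1 << i):
            r |= 1 << (7 - i)
    return r

def byteswap(x, nbytes):
    return int.from_bytes(x.to_bytes(nbytes, "little"), "big")

def bitreverse(x, nbytes=4):
    # Step 1: swap the bytes of the scalar value.
    swapped = byteswap(x, nbytes)
    # Step 2: reverse the bits inside each byte.
    out = 0
    for k in range(nbytes):
        byte = (swapped >> (8 * k)) & 0xFF
        out |= reverse_bits_in_byte(byte) << (8 * k)
    return out

def bitreverse_naive(x, nbits=32):
    # Reference: move bit i to bit (nbits - 1 - i).
    out = 0
    for i in range(nbits):
        if (x >> i) & 1:
            out |= 1 << (nbits - 1 - i)
    return out

print(hex(bitreverse(0x12345678)))  # 0x1e6a2c48
```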

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-07-18 19:14:34 +01:00
Daniel Chen
4bf4e87576 Static_cast std::size_t to build flang_rt in 32-bit. (#149529) 2025-07-18 14:14:27 -04:00
Philip Reames
f6641e2f23 [RISCV][IA] Factor out code for extracting operands from mem insts [nfc] (#149344)
We're going to end up repeating the operand extraction four times once
all of the routines have been updated to support both plain load/store
and vp.load/vp.store. I plan to add masked.load/masked.store in the near
future, and we'd need to add that to each of the four cases. Instead,
factor out a single copy of the operand normalization.
2025-07-18 11:04:18 -07:00
Peter Collingbourne
b5e71d727b Add section type to support CFI jump table relaxation.
For context see main pull request: #147424.

Reviewers: MaskRay

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/149259
2025-07-18 10:48:42 -07:00
Kazu Hirata
796d5a89a1 [ADT] Use a range-based for loop instead of llvm::for_each (NFC) (#149542)
LLVM Coding Standards discourages llvm::for_each unless we already
have a callable.
2025-07-18 10:43:51 -07:00
Han-Chung Wang
3ea6da59ec [mlir][linalg] Allow pack consumer fusion if the tile size is greater than dimension size. (#149438)
This happens only when the tile size is greater than or equal to the
dimension size. In this case the slice covers the whole dimension (a full
slice), so it is fusible.

Such IR can be generated during the TileAndFuse process. It is hard to
fix in that driver, so we enable the naive fusion for this case.

---------

Signed-off-by: hanhanW <hanhan0912@gmail.com>
2025-07-18 10:42:42 -07:00
Philip Reames
87c2adbb58 [RISCV][IA] Precommit tests for deinterleaveN of masked.load 2025-07-18 10:39:11 -07:00
Jaden Angella
7fd91bb6e8 [mlir][EmitC]Expand the MemRefToEmitC pass - Adding scalars (#148055)
This aims to expand the MemRefToEmitC pass so that it can accept
global scalars.
From:
```mlir
memref.global "private" constant @__constant_xi32 : memref<i32> = dense<-1>
func.func @globals() {
  %0 = memref.get_global @__constant_xi32 : memref<i32>
  return
}
```
To:
```mlir
emitc.global static const @__constant_xi32 : i32 = -1
emitc.func @globals() {
  %0 = get_global @__constant_xi32 : !emitc.lvalue<i32>
  %1 = apply "&"(%0) : (!emitc.lvalue<i32>) -> !emitc.ptr<i32>
  return
}
```
2025-07-18 10:15:05 -07:00
Alexey Bataev
ff225b5d88 [SLP][NFC]Add a run line for the test, NFC 2025-07-18 10:14:18 -07:00
Shilei Tian
2c50e4cac2 [AMDGPU] Add support for v_sat_pk4_i4_[i8,u8] on gfx1250 (#149528)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
Co-authored-by: Foad, Jay <Jay.Foad@amd.com>
2025-07-18 13:08:50 -04:00
Shilei Tian
e11d28faee [AMDGPU] Add support for v_permlane16_swap_b32 on gfx1250 (#149518)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-18 13:05:08 -04:00
Muhammad Bassiouni
7e0ae019f8 [libc][math] Refactor exp10f16 implementation to header-only in src/__support/math folder. (#148408)
Part of #147386

in preparation for:
https://discourse.llvm.org/t/rfc-make-clang-builtin-math-functions-constexpr-with-llvm-libc-to-support-c-23-constexpr-math-functions/86450
2025-07-18 20:00:04 +03:00
Mohammadreza Ameri Mahabadian
10518c76de [mlir][spirv] Add conversion pass to rewrite splat constant composites to replicated form (#148910)

This adds a new SPIR-V dialect-level conversion pass
`ConversionToReplicatedConstantCompositePass`. This pass looks for splat
composite `spirv.Constant` or `spirv.SpecConstantComposite` and rewrites
them into `spirv.EXT.ConstantCompositeReplicate` or
`spirv.EXT.SpecConstantCompositeReplicate`, respectively.
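The idea behind the rewrite can be shown with a hypothetical Python model (not the MLIR pass): a composite constant is a list of element values, and when every element is identical (a splat) it can be stored as one value plus a replication marker instead of listing each element.

```python
def to_replicated_form(elements):
    # Hypothetical model: a non-empty composite whose elements are all
    # equal is a splat and can be written once, marked as replicated.
    if elements and all(e == elements[0] for e in elements):
        return ("replicate", elements[0])
    # Anything else stays a full composite.
    return ("composite", list(elements))

print(to_replicated_form([1.0, 1.0, 1.0, 1.0]))  # ('replicate', 1.0)
print(to_replicated_form([1, 2, 1]))              # ('composite', [1, 2, 1])
```

In this model a composite of identical nested composites (for example `[[0, 1], [0, 1]]`) also counts as a splat, because the nested lists compare equal.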

---------

Signed-off-by: Mohammadreza Ameri Mahabadian <mohammadreza.amerimahabadian@arm.com>
2025-07-18 12:59:39 -04:00
Eugene Epshteyn
2c2567da95 [flang] Fixed a crash with undeclared variable in implicit-do loop (#149513)
Fixed a crash in the following example:
```fortran
subroutine sub()
  implicit none
  print *, (i, i = 1, 2)  ! Problem: undeclared variable used in implied-do loop
end subroutine sub
```
The error message was already generated, but the compiler crashed before
it could display it.
2025-07-18 12:58:09 -04:00
Brox Chen
5138b61a25 [AMDGPU][True16][Codegen] remove packed build_vector pattern from true16 (#148715)
Some of the packed build_vector patterns use vgpr_32 for i16/f16/bf16.

In gfx11, bf16 arithmetic gets promoted to f32, and this is done via a
v2i16 pack. In true16 mode this v2i16 pack is selected to a
build_vector/v_lshlrev pattern which only accepts VGPR32. This causes
isel to insert an illegal copy "vgpr32 = copy vgpr16" between def and
use. In the end this illegal copy confuses the CSE pass and triggers
incorrect code elimination.

Remove the packed build_vector pattern from true16. After removal, ISel
will use vgpr16 build_vector patterns instead.
2025-07-18 12:55:11 -04:00