Commit Graph

518420 Commits

Author SHA1 Message Date
agozillon
3723449955 [OpenMP] Allocatable explicit member mapping fortran offloading tests (#113555)
This PR is one in a series of 3 that aim to add support for explicit
member mapping of allocatable components in derived types within
OpenMP+Fortran for Flang.

This PR provides all of the runtime tests that are currently
upstreamable, unfortunately some of the other tests would require
linking of the fortran runtime for offload which we currently do not do.
But regardless, this is plenty to ensure that the mapping is working in
most cases.
2024-11-16 12:22:33 +01:00
Louis Dionne
0fd6f684b9 [libc++] Adjust workflow file for building the libc++ docker image (#116366) 2024-11-16 12:05:12 +01:00
David Green
100376a2fa [AArch64] Add a test for phis of different types. NFC 2024-11-16 10:40:06 +00:00
Serge Pavlov
f97f96492d [GlobalISel][ARM] Legalize reset_fpmode (#115859)
Implement lowering intrinsic `reset_fpmode` in Global Selector for ARM
target.
2024-11-16 17:21:33 +07:00
Sergei Barannikov
b69f646c46 [AArch64] Remove unused SDNodes (NFC) (#116236)
The corresponding enum members were only used by `EmitMOPS`, which
immediately translated them to machine opcodes. Just pass the machine
opcodes instead.
2024-11-16 13:14:42 +03:00
Jay Foad
89cb0eefcb [AMDGPU] Move GCNPreRAOptimizations after MachineScheduler (#116211)
This is in preparation for adding a new optimization to the pass that
cares about the order of instructions. The existing optimization does
not care, so this just causes minor codegen differences.
2024-11-16 09:40:46 +00:00
Martin Storsjö
dc3156d8e6 [OpenMP] Don't hardcode _WIN32_WINNT for MinGW targets (#115708)
Instead respect what the toolchain default is (or what the user sets via
CMAKE_CXX_FLAGS).

This fixes builds with libcxx, with mingw toolchains targeting
msvcrt.dll, after 5d8be4c036aa5ce4a94f1f37a9155d5c877e23db; after that
commit, the libcxx public headers reference symbols such as iswspace_l,
which are unavailable when targeting msvcrt.dll on older versions of
Windows (it's only available in msvcrt.dll since Windows Vista).
2024-11-16 11:23:15 +02:00
Kunwar Grover
db115ba3ef [mlir][Linalg] Fix non-matmul linalg structured ops (#116412)
3ad0148020
broke linalg structured ops other than MatmulOp.

The patch:

- Changes the printer to hide additional attributes, which weren't
hidden before: "indexing_maps".
- Changes the build of every linalg structured op to have an indexing
map for matmul.

These changes combined hide the problem until you print the operation
in its generic form.

Reproducer:

```mlir
func.func public @bug(%arg0 : tensor<5x10x20xf32>, %arg1 : tensor<5x20x40xf32>, %arg3 : tensor<5x10x40xf32>) -> tensor<5x10x40xf32> {
  %out = linalg.batch_matmul ins(%arg0, %arg1 : tensor<5x10x20xf32>, tensor<5x20x40xf32>)
      outs(%arg3 : tensor<5x10x40xf32>) -> tensor<5x10x40xf32>
  func.return %out : tensor<5x10x40xf32>
}
```

Prints fine, with `mlir-opt <file>`, but if you do `mlir-opt
--mlir-print-op-generic <file>`:

```
#map = affine_map<(d0, d1, d2) -> (d0, d2)>
#map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
#map2 = affine_map<(d0, d1, d2) -> (d0, d1)>
#map3 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>
#map4 = affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>
#map5 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>
"builtin.module"() ({
  "func.func"() <{function_type = (tensor<5x10x20xf32>, tensor<5x20x40xf32>, tensor<5x10x40xf32>) -> tensor<5x10x40xf32>, sym_name = "bug", sym_visibility = "public"}> ({
  ^bb0(%arg0: tensor<5x10x20xf32>, %arg1: tensor<5x20x40xf32>, %arg2: tensor<5x10x40xf32>):
    %0 = "linalg.batch_matmul"(%arg0, %arg1, %arg2) <{operandSegmentSizes = array<i32: 2, 1>}> ({
    ^bb0(%arg3: f32, %arg4: f32, %arg5: f32):
      %1 = "arith.mulf"(%arg3, %arg4) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
      %2 = "arith.addf"(%arg5, %1) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
      "linalg.yield"(%2) : (f32) -> ()
    }) {indexing_maps = [#map, #map1, #map2], linalg.memoized_indexing_maps = [#map3, #map4, #map5]} : (tensor<5x10x20xf32>, tensor<5x20x40xf32>, tensor<5x10x40xf32>) -> tensor<5x10x40xf32>
    "func.return"(%0) : (tensor<5x10x40xf32>) -> ()
  }) : () -> ()
}) : () -> ()
```

The batch_matmul operation's builder now always inserts an indexing_map
which is unrelated to the operation itself. This was caught when a
transformation from one LinalgStructuredOp to another tried to pass
its attributes to the other op's builder and there were multiple
indexing_map attributes in the result.

This patch fixes this by specializing the builders for MatmulOp with
indexing map information.
2024-11-16 08:13:10 +00:00
Thorsten Schütt
2906fcadb8 [GlobalISel] Combine G_MERGE_VALUES of x and zero (#116283)
into zext x

LegalizerHelper has two padding strategies: undef or zero.

see LegalizerHelper:273
see LegalizerHelper:315

This PR is about zero sugar and Coke Zero.

; CHECK-NEXT: [[MV2:%[0-9]+]]:_(s64) = G_MERGE_VALUES %a(s32),
[[C]](s32)

Please continue padding merge values.

// %bits_8_15:(s8) = G_CONSTANT i8 0
// %0:(s16) = G_MERGE_VALUES %bits_0_7:(s8), %bits_8_15:(s8)

%bits_8_15 is defined by zero. For optimization, we pick zext.

// %0:_(s16) = G_ZEXT %bits_0_7:(s8)

The upper bits of %0 are zero and the lower bits come from %bits_0_7.
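
The equivalence can be checked with plain integer arithmetic. The sketch
below is not LLVM code: `mergeValues`, `zext` and `mergeWithZeroIsZext`
are hypothetical helpers that model an s8/s8 to s16 merge (low piece
first, as in the MIR above) and an s8 to s16 zero extension.

```cpp
#include <cassert>
#include <cstdint>

// Models G_MERGE_VALUES of two s8 pieces into an s16: the first operand
// supplies bits 0-7, the second supplies bits 8-15.
constexpr uint16_t mergeValues(uint8_t Lo, uint8_t Hi) {
  return static_cast<uint16_t>(Lo | (static_cast<uint16_t>(Hi) << 8));
}

// Models G_ZEXT from s8 to s16.
constexpr uint16_t zext(uint8_t Lo) { return static_cast<uint16_t>(Lo); }

// Exhaustively checks that merging with a zero high piece equals zext.
inline bool mergeWithZeroIsZext() {
  for (unsigned X = 0; X <= 0xFF; ++X)
    if (mergeValues(static_cast<uint8_t>(X), 0) !=
        zext(static_cast<uint8_t>(X)))
      return false;
  return true;
}
```

With a non-zero high piece the two operations differ, which is why the
combine only applies when the high operand is a known zero constant.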
2024-11-16 08:00:21 +01:00
Julian Schmidt
ec0a27f658 Revert "Reland: [clang][test] add testing for the AST matcher reference" (#116477)
Reverts llvm/llvm-project#112168
2024-11-16 07:34:20 +01:00
Valentin Clement
42be165dde Reland '[flang][cuda] Specialize entry point for scalar to desc data transfer' 2024-11-15 19:13:55 -08:00
Matthias Springer
309c890921 [llvm] APFloat: Add helpers to query NaN/inf semantics (#116315)
`APFloat` changes extracted from #116176 as per reviewer comments.
2024-11-16 11:48:05 +09:00
Valentin Clement (バレンタイン クレメン)
70b9440c88 Revert "[flang][cuda] Specialize entry point for scalar to desc data transfer" (#116458)
Reverts llvm/llvm-project#116457
2024-11-15 17:44:48 -08:00
Valentin Clement (バレンタイン クレメン)
43cb424a54 [flang][cuda] Specialize entry point for scalar to desc data transfer (#116457)
The runtime Assign function is not meant to initialize an array from a
scalar. For that we need to use DoAssignFromSource. Update the data
transfer from scalar to descriptor to use a new entry point that uses
this function underneath.
2024-11-15 17:41:23 -08:00
Kyungwoo Lee
ab27253ad3 [CGData][lld-macho] Merge CG Data by LLD (#112674)
LLD now processes raw CG data for stable functions, similar to how it
handles raw CG data for the outliner's hash tree. This data is encoded
in the custom section (`__llvm_merge`) within object files. LLD merges
this information into the indexed CG data file specified by the
`-codegen-data-generate-path={path}` option. For the linker that does
not support this feature, we could use `llvm-cgdata` tool --
https://github.com/llvm/llvm-project/blob/main/llvm/docs/CommandGuide/llvm-cgdata.rst.

Depends on #115750.
This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
2024-11-15 17:24:35 -08:00
Craig Topper
6a0905d11e [RISCV][GISel] Add isel patterns for i16 load/store (#116293)
In order to support f16 load/store we need to make load/stores with s16
register type legal. If regbank selection doesn't pick the FPR bank,
we'll be left with a GPR load or store which we don't have isel patterns
for from SelectionDAG.

In order to add the patterns we need to make i16 a legal type for the
GPR register class.

Tests are currently disabling the legality check because I haven't
updated the legalizer yet.
2024-11-15 17:23:46 -08:00
Craig Topper
131d73ed34 [RegAlloc] Remove redundant prints of LiveInterval weight. (#116451)
LiveInterval::print has included the weight since early 2018. We don't
need to print again after we print the interval.
2024-11-15 16:43:30 -08:00
vporpo
1be9827754 [SandboxVec][BottomUpVec] Implement packing of vectors (#116447)
Up until now we could only support packing of scalar elements. This
patch fixes this by implementing packing of vector elements, by
generating extractelement and insertelement instruction pairs.
2024-11-15 16:12:22 -08:00
Kazu Hirata
0d38f64e7d [memprof] Remove MemProf format Version 0 (#116442)
This patch removes MemProf format Version 0 now that version 2 and 3
seem to be working well.

I'm not touching version 1 for now because some tests still rely on
version 1.

Note that Version 0 is identical to Version 1 except that the MemProf
section of the indexed format has a MemProf version field.
2024-11-15 15:37:00 -08:00
Kazu Hirata
57ed628fb3 [memprof] Speed up caller-callee pair extraction (Part 2) (#116441)
This patch further speeds up the extraction of caller-callee pairs
from the profile.

Recall that we reconstruct a call stack by traversing the radix tree
from one of its leaf nodes toward a root.  The implication is that
when we decode many different call stacks, we end up visiting nodes
near the root(s) repeatedly.  That in turn adds many duplicates to our
data structure:

  DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> Calls;

only to be deduplicated later with sort+unique for each vector.

This patch makes the extraction process more efficient by keeping
track of indices of the radix tree array we've visited so far and
terminating traversal as soon as we encounter an element previously
visited.

Note that even with this improvement, we still add at least one
caller-callee pair to the data structure above for each call stack
because we do need to add a caller-callee pair for the leaf node with
the callee GUID being 0.

Without this patch, it takes 4 seconds to extract caller-callee pairs
from a large MemProf profile.  This patch shortens that down to
900ms.
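
A minimal sketch of the early-termination idea, assuming a simplified
tree where each node stores its GUID and the index of its caller's node.
The real radix tree array is encoded differently; `Node`, `CallEdge` and
`extractEdges` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Simplified stand-in for one radix tree entry (assumption, not the
// actual MemProf encoding).
struct Node {
  uint64_t Guid;
  int Parent; // index of the caller's node, or -1 at a root
};

using CallEdge = std::pair<uint64_t, uint64_t>; // (caller GUID, callee GUID)

std::vector<CallEdge> extractEdges(const std::vector<Node> &Tree,
                                   const std::vector<int> &Leaves) {
  std::vector<CallEdge> Edges;
  std::vector<bool> Visited(Tree.size(), false);
  for (int Leaf : Leaves) {
    // Recorded once per call stack: the leaf frame with callee GUID 0.
    Edges.emplace_back(Tree[Leaf].Guid, 0);
    // Walk toward the root, stopping at the first node already visited;
    // every edge above that node was recorded by an earlier walk.
    for (int I = Leaf; I != -1 && !Visited[I]; I = Tree[I].Parent) {
      Visited[I] = true;
      if (Tree[I].Parent != -1)
        Edges.emplace_back(Tree[Tree[I].Parent].Guid, Tree[I].Guid);
    }
  }
  return Edges;
}
```

For two call stacks that share their two outermost frames, this records
five edges instead of six. Duplicates can still arise when distinct
nodes carry the same GUIDs, so a later sort+unique step is still needed.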
2024-11-15 15:33:23 -08:00
Valentin Clement (バレンタイン クレメン)
b1fa9d154b [flang][cuda] Correctly embox logical constant (#116445) 2024-11-15 15:29:41 -08:00
Vitaly Buka
64c455077a [docs][asan][lsan] Drop list of supported architechures (#116302)
Full list is quite long, and quality of implementation can
vary.

Drop the lists to avoid confusion like
https://github.com/rust-lang/rust/pull/123617#issuecomment-2471695102

We don't maintain these for other sanitizers.
2024-11-15 15:15:50 -08:00
Kyungwoo Lee
816c975ea7 Fix crash from [CGData] Global Merge Functions (#112671) (#116241)
Module summary index is optional for this pass, and we shouldn't run it,
but import it as necessary.
2024-11-15 14:57:17 -08:00
vporpo
3be3b33e57 [SandboxVec][BottomUpVec] Implement pack of scalars (#115549)
This patch implements packing of scalar operands when the vectorizer
decides to stop vectorizing. Packing is implemented with a sequence of
InsertElement instructions.

Packing vectors requires different instructions so it's implemented in a
follow-up patch.
2024-11-15 14:45:17 -08:00
Valentin Clement (バレンタイン クレメン)
012fad975e [flang][cuda] Materialize the box in memory when dst is emboxed (#116320)
Similar to #116289 but for the dst.
2024-11-15 14:31:36 -08:00
Valentin Clement (バレンタイン クレメン)
e8469f1577 [flang][cuda] Add support for character type in cuf.alloc and cuf.data_transfer (#116277)
Add support for character type in bytes computation
2024-11-15 14:31:21 -08:00
Shilei Tian
4b50ec43d0 [Clang] Avoid Using byval for ndrange_t when emitting __enqueue_kernel_basic (#116435)
AMDGPU disabled the use of `byval` for struct argument passing in commit
d77c620. However, when emitting `__enqueue_kernel_basic`, Clang still
adds the `byval` attribute by default. Emitting the `byval` attribute by
default in this context doesn’t seem like a good idea, as
argument-passing conventions are highly target-dependent, and
assumptions here could lead to issues. This PR removes the addition of
the `byval` attribute, aligning the behavior with other
`__enqueue_kernel_*` functions.
2024-11-15 16:54:29 -05:00
Jon Roelofs
34ebfabc34 [llvm][ARM] Restore the default to -mstrict-align on Apple firmwares (#115546)
This is a partial revert of e314622f20

rdar://139237593
2024-11-15 13:54:21 -08:00
Ognyan Mirev
9204eba912 Remove device override for operator new when the C++ standard >= 26 (#114056)
Related to https://github.com/llvm/llvm-project/issues/114048
2024-11-15 13:53:24 -08:00
Kazu Hirata
ec353b7418 [memprof] Use llvm::function_ref instead of std::function (#116306)
We've seen bugs where we lost track of error states stored in the
functor because we passed the functor by value (that is,
std::function) as opposed to reference (llvm::function_ref).

This patch fixes a couple of places we pass functors by value.

While we are at it, this patch adds curly braces around a "for" loop
spanning multiple lines.
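
The failure mode can be reproduced in a few lines. This is a sketch, not
the MemProf code: `IntCallbackRef` is a hypothetical, stripped-down
stand-in for `llvm::function_ref<void(int)>`, and `Validator`,
`visitByValue` and `visitByRef` are made up for the example.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for llvm::function_ref<void(int)>: it stores a
// pointer to the caller's callable instead of copying it.
class IntCallbackRef {
  void (*Thunk)(void *, int) = nullptr;
  void *Callable = nullptr;

public:
  template <typename F>
  IntCallbackRef(F &Fn)
      : Thunk([](void *C, int V) { (*static_cast<F *>(C))(V); }),
        Callable(&Fn) {}
  void operator()(int V) const { Thunk(Callable, V); }
};

// A functor that remembers whether it ever saw a bad value.
struct Validator {
  bool SawError = false;
  void operator()(int V) {
    if (V < 0)
      SawError = true;
  }
};

// By value: std::function owns a copy of the functor, so the error
// state is set on the copy and never reaches the caller.
void visitByValue(std::function<void(int)> Callback) { Callback(-1); }

// By reference: the call reaches the caller's own functor.
void visitByRef(IntCallbackRef Callback) { Callback(-1); }
```

After `visitByValue(V)` the caller's `V.SawError` is still false; after
`visitByRef(V)` it is true.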
2024-11-15 13:03:24 -08:00
Florian Hahn
3734e4c0c4 [MergedLoadStore] Preserve common metadata when sinking stores. (#116382)
When sinking a store, preserve common metadata present on stores on both
sides of the diamond.

PR: https://github.com/llvm/llvm-project/pull/116382
2024-11-15 20:52:02 +00:00
Ramkumar Ramachandra
94eebf721a InstSimplify: support floating-point equivalences (#115152)
Since cd16b07 (IR: introduce CmpInst::isEquivalence), there is now an
isEquivalence routine in CmpInst that we can use to determine
equivalence in simplifySelectWithICmpEq. Implement this, extending the
code from integer-equalities to integer and floating-point equivalences.
2024-11-15 20:06:11 +00:00
Craig Topper
92f3f27106 [RISCV][GISel] Remove -disable-gisel-legality-check from most RVV tests. NFC 2024-11-15 12:04:55 -08:00
Janek van Oirschot
9a5e5e28ec [AMDGPU] Newly added test modified for recent SGPR use change (#116427)
Mistimed rebase for #112251, which added new tests that did not yet
account for the changes introduced in #112403.
2024-11-15 14:51:58 -05:00
Petr Hosek
1e492285f3 [Fuchsia] Include runtimes for armv8.1m.main-none-eabi (#116420)
These are needed by some of our users.
2024-11-15 11:32:15 -08:00
Craig Topper
47a0e24a3b [GISel][RISCV] Add G_SMIN/SMAX/UMIN/UMAX to GISelKnownBits::computeNumSignBits. (#116321) 2024-11-15 11:23:15 -08:00
Janek van Oirschot
bd9145c8c2 Reapply [AMDGPU] Avoid resource propagation for recursion through multiple functions (#112251)
I was wrong last patch. I viewed the `Visited` set purely as a possible
recursion deterrent where functions calling a callee multiple times are
handled elsewhere. This doesn't consider cases where a function is
called multiple times by different callers still part of the same call
graph. New test shows the aforementioned case.

Reapplies #111004, fixes #115562.
2024-11-15 18:40:05 +00:00
Peter Smith
098b0d18ad [LLD][AArch64] Detach Landing Pad creation from Thunk creation (#116402)
Move Landing Pad Creation to a new function that checks each thunk every
pass to see if it needs a landing pad. This permits a thunk to be
created without needing a landing pad, but later needing one due to
drifting out of direct branch range and requiring an indirect branch.

We record all the Thunks created so far in a new vector rather than
trying to iterate over the DenseMap as we need a deterministic order of
adding LandingPadThunks due to the short branch fall through. We cannot
use normalizeExistingThunk() either as that only iterates through live
thunks.

Fixes: https://crbug.com/377438309
Original PR: https://github.com/llvm/llvm-project/pull/108989

Sending without a new test case to fix existing test. A new regression
test will come in a separate PR as coming up with a small enough
reproducer for this case is non-trivial.
2024-11-15 18:18:18 +00:00
lialan
ef92aba52a [MLIR] Fix VectorEmulateNarrowType constant op mask bug (#116064)
This commit adds support for handling mask constants generated by the
`arith.constant` op in the `VectorEmulateNarrowType` pattern.
Previously, this pattern would not match due to the lack of mask
constant handling in `getCompressedMaskOp`.

The changes include:

1. Updating `getCompressedMaskOp` to recognize and handle
`arith.constant` ops as mask value sources.

2. Handling cases where the mask is not aligned with the emulated load
width. The compressed mask is adjusted to account for the offset.

Limitations:
- The arith.constant op can only have 1-dimensional constant values.

Resolves: #115742

Signed-off-by: Alan Li <me@alanli.org>
2024-11-15 10:06:40 -08:00
Krzysztof Parzyszek
0398cb4592 [flang][OpenMP][OpenACC] Use iterator_range in check-directive-structure, NFC (#115872)

The OpenMP code is already using iterator_range, lift it to the shared
header file.
2024-11-15 11:54:58 -06:00
Aaron Ballman
3130691a60 [C23] Move WG14 N2754 to the TS 18661 section
This paper is about the quantum exponent of NAN, which only applies if
we support decimal floating-point types from the TS. That is why the
status changed from Unknown to No.
2024-11-15 12:52:18 -05:00
Krzysztof Drewniak
f2e42d9324 [mlir][IntRangeInference] Handle ceildivsi(INT_MIN, x > 1) as expected (#116284)
Fixes #115293

While the definition of ceildivsi is integer division, rounding up, most
implementations will use `-(-a / b)` for dividing `a ceildiv b` with `a`
negative and `b` positive.

Mathematically, and for most integers, these two definitions are
equivalent. However, with `a == INT_MIN`, the initial negation is a
noop, which means that, while dividing and rounding up would give a
negative result, `-((- INT_MIN) / b)` is `-(INT_MIN / b)`, which is
positive.

This commit adds a special case to ceilDivSI inference to handle this
case and bring it in line with the operational instead of the
mathematical semantics of ceiling division.
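
The mismatch is easy to reproduce for 32-bit values. The sketch below is
not the MLIR implementation; `wrappingNeg`, `ceilDivOperational` and
`ceilDivMathematical` are hypothetical helpers, and the wrapping
negation goes through unsigned arithmetic because signed overflow is
undefined in C++.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Two's-complement negation that wraps instead of overflowing. The
// conversion back to int32_t is modular on mainstream compilers and
// guaranteed to be so from C++20 on.
int32_t wrappingNeg(int32_t V) {
  return static_cast<int32_t>(0u - static_cast<uint32_t>(V));
}

// Operational form for a < 0, b > 0: -((-a) / b).
int32_t ceilDivOperational(int32_t A, int32_t B) {
  return wrappingNeg(wrappingNeg(A) / B);
}

// Mathematical ceiling division, computed in 64 bits so nothing wraps.
int64_t ceilDivMathematical(int64_t A, int64_t B) {
  int64_t Q = A / B; // truncates toward zero
  return (A % B != 0 && ((A < 0) == (B < 0))) ? Q + 1 : Q;
}
```

For `a = -7, b = 2` both forms give -3. For `a = INT_MIN, b = 2` the
operational form gives 1073741824 while the mathematical one gives
-1073741824, which is the case the range inference has to special-case.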
2024-11-15 11:43:05 -06:00
Fangrui Song
d82422f69c [ELF] Remove errorOrWarn 2024-11-15 09:37:38 -08:00
Sergei Barannikov
032014ef10 [PowerPC] Add SDNPMemOperand to some nodes (#115580)
Nodes created with `getMemIntrinsicNode` have memory operands. In order
for operands to be propagated to machine instructions, the nodes should
have `SDNPMemOperand` property.

Similar to 3c8c385a.
2024-11-15 20:36:56 +03:00
Eric Astor
e9e8f59dd4 [clang] Instantiate attributes on LabelDecls (#115924)
Start propagating attributes on (e.g.) labels inside of templated
functions to their instances.
2024-11-15 12:33:20 -05:00
Cyndy Ishida
2d48489cc3 [Clang][Darwin] Introduce SubFrameworks as a SDK default location (#115048)
* Have clang always append & pass System/Library/SubFrameworks when determining default sdk search paths.
* Teach clang-installapi to traverse there for framework input.
* Teach llvm-readtapi that the library files (TBD or binary) in there should be considered private.

resolves: rdar://137457006
2024-11-15 09:27:08 -08:00
Stephen Tozer
2188a56a75 [DebugInfo][SimplifyCFG] Fully propagate merged invoke DILocations (#114235)
Currently when we merge invokes as part of SimplifyCFG we apply a merge
of the invoke DILocations to the merged invoke. We also insert an
unconditional branch to the merged invoke at the positions previously
occupied by the original invokes; as this branch is part of the
substitution for the invoke it has replaced, we should propagate the
original invoke DebugLoc to it.
2024-11-15 17:20:55 +00:00
Simon Pilgrim
92cc805193 [IR] Add ICmpInst::isCommutative and FCmpInst::isCommutative static wrappers (#116398)
Add static variants that can be used with the Predicate enum directly.
2024-11-15 17:13:43 +00:00
Anchu Rajendran S
e67e09a77e [Flang][OpenMP][Sema] Adding parsing and semantic support for scan directive. (#102792) 2024-11-15 09:10:36 -08:00
Joseph Huber
fd5fcfb1e6 [Clang] Add 'gpuintrin.h' to the release notes (#116410) 2024-11-15 11:08:06 -06:00