Commit Graph

19271 Commits

Author SHA1 Message Date
Billy Zhu
1e8dad3bef [MLIR][LLVM] Support Recursive DITypes (#80251)
Following the discussion from [this
thread](https://discourse.llvm.org/t/handling-cyclic-dependencies-in-debug-info/67526/11),
this PR adds support for recursive DITypes.

This PR adds:
1. DIRecursiveTypeAttrInterface: An interface that DITypeAttrs can
implement to indicate that it supports recursion. See full description
in code.
2. Importer & exporter support (The only DITypeAttr that implements the
interface is DICompositeTypeAttr, so the exporter is only implemented
for composites too. There will be two methods that each llvm DI type
that supports mutation needs to implement since there's nothing
general).

---------

Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
2024-03-15 09:58:25 -07:00
Matthias Gehre
01a31cee56 [MLIR] EmitC: Add subscript operator (#84783)
Introduces a SubscriptOp that allows to write IR like
```
func.func @load_store(%arg0: !emitc.array<4x8xf32>, %arg1: !emitc.array<3x5xf32>, %arg2: index, %arg3: index) {
  %0 = emitc.subscript %arg0[%arg2, %arg3] : <4x8xf32>, index, index
  %1 = emitc.subscript %arg1[%arg2, %arg3] : <3x5xf32>, index, index
  emitc.assign %0 : f32 to %1 : f32
  return
}
```
which gets translated into the C++ code
```
v1[v2][v3] = v0[v1][v2];
```

To make this happen, this
- adds the SubscriptOp
- allows the subscript op as rhs of emitc.assign
- updates the emitter to print SubscriptOps

The emitter prints emitc.subscript in a delayed fashing to allow it
being used as lvalue.
I.e. while processing
```
%0 = emitc.subscript %arg0[%arg2, %arg3] : <4x8xf32>, index, index
```
it will not emit any text, but record in the `valueMapper` that the name
for `%0` is `v0[v1][v2]`, see `CppEmitter::getSubscriptName`. Only when
that result is then used (here in `emitc.assign`), that name is inserted
into the text.
2024-03-15 11:08:34 +01:00
Mehdi Amini
2a547f0f6c [MLIR] Fix mlir-opt --show-dialects to not require any input (as documented) 2024-03-14 22:55:25 -07:00
Matthias Springer
e8e8df4c1b [mlir][sparse] Add has_runtime_library test op (#85355)
This commit adds a new test-only op:
`sparse_tensor.has_runtime_library`. The op returns "1" if the sparse
compiler runs in runtime library mode.

This op is useful for writing test cases that require different IR
depending on whether the sparse compiler runs in runtime library or
codegen mode.

This commit fixes a memory leak in `sparse_pack_d.mlir`. This test case
uses `sparse_tensor.assemble` to create a sparse tensor SSA value from
existing buffers. This runtime library reallocates+copies the existing
buffers; the codegen path does not. Therefore, the test requires
additional deallocations when running in runtime library mode.

Alternatives considered:
- Make the codegen path allocate. "Codegen" is the "default" compilation
mode and it is handling `sparse_tensor.assemble` correctly. The issue is
with the runtime library path, which should not allocate. Therefore, it
is better to put a workaround in the runtime library path than to work
around the issue with a new flag in the codegen path.
- Add a `sparse_tensor.runtime_only` attribute to
`bufferization.dealloc_tensor`. Verifying that the attribute can only be
attached to `bufferization.dealloc_tensor` may introduce an unwanted
dependency of `MLIRSparseTensorDialect` on `MLIRBufferizationDialect`.
2024-03-15 13:35:48 +09:00
Matthias Springer
5124eedd35 [mlir][sparse] Fix memory leaks (part 3) (#85184)
This commit fixes memory leaks in sparse tensor integration tests by
adding `bufferization.dealloc_tensor` ops.

Note: Buffer deallocation will be automated in the future with the
ownership-based buffer deallocation pass, making `dealloc_tensor`
obsolete (only codegen path, not when using the runtime library).
2024-03-15 13:31:47 +09:00
Matthias Springer
6ed4d15cf4 [mlir][sparse_tensor] Implement bufferization interface for foreach (#85183)
This commit fixes a memory leak in `sparse_codegen_foreach.mlir`. The
bufferization inserted a copy for the operand of `sparse_tensor.foreach`
because it conservatively assumed that the op writes to the operand.
2024-03-15 13:28:09 +09:00
Matthias Springer
102273a9b4 [mlir][Transform] Remove notifyOperationErased workaround (#84134)
D144193 (#66771) has been merged.
2024-03-15 10:29:36 +09:00
srcarroll
58ef9bec07 [mlir][math] Implement alternative decomposition for tanh (#85025)
The previous implementation decomposes `tanh(x)` into
`(exp(2x) - 1)/(exp(2x)+1), x < 0`
`(1 - exp(-2x))/(1 + exp(-2x)), x >= 0`
This is fine as it avoids overflow with the exponential, but the whole
decomposition is computed for both cases unconditionally, then the
result is chosen based off the sign of the input. This results in doing
two expensive `exp` computations.

The proposed change avoids doing the whole computation twice by
exploiting the reflection symmetry `tanh(-x) = -tanh(x)`. We can
"normalize" the input to be positive by setting `y = sign(x) * x`, where
the sign of `x` is computed as `sign(x) = (float)(x > 0) * (-2) + 1`.
Then compute `z = tanh(y)` with the decomposition above for `x >=0` and
"denormalize" the result `z * sign(x)` to retain the sign. The reason it
is done this way is that it is very amenable to vectorization.

This method trades the duplicate decomposition computations (which takes
5 instructions including an extra expensive `exp` and `div`) for 4 cheap
instructions to compute the signs value
1. `arith.cmpf` (which is a pre-existing instruction in the previous
impl)
2. `arith.sitofp`
3. `arith.mulf`
4. `arith.addf`

and 1 more instruction to get the right sign in the result
5. `arith.mulf`. Moreover, numerically, this implementation will yield
the exact same results as the previous implementation.
2024-03-14 19:18:56 -05:00
Aart Bik
4daf86ef3f [mlir][sparse] refactoring sparse runtime lib into less paths (#85332)
Two constructors could be easily refactored into one after a lot of
previous deprecated code has been removed.
2024-03-14 17:06:39 -07:00
Aart Bik
9d994e900f [mlir][sparse] remove deprecated toCOO from sparse runtime support lib (#85319) 2024-03-14 16:00:33 -07:00
Andrzej Warzyński
c56bd7ab79 [mlir][linalg] Enable masked vectorisation for depthwise convolutions (#81625)
This patch adds support for masked vectorisation of depthwise 1D WC
convolutions,`linalg.depthwise_conv_1d_nwc_wc`. This is implemented by
adding support for masking.

Two major assumptions are made:
  * only the channel dimension can be dynamic/scalable (i.e. the
    trailing dim),
  * when specifying vector sizes to use in the vectoriser, only the size
    corresponding to the channel dim is effectively used (other dims are
    inferred from the context).

In terms of scalable vectorisation, this should be sufficient to cover
all practical cases (i.e. making arbitrary dim scalable wouldn't make
much sense). As for more generic cases with dynamic shapes (e.g. W or N
dims being dynamic), more work would be needed. In particular, one would
have to consider the filter and input/output tensors separately.
2024-03-14 20:19:46 +00:00
Ingo Müller
d7f71a330d [mlir] Fix RUN command introduced in 516ccce7fa (#84765) (NFC)
There were two problems:
* The `%s` argument to `FileCheck` was repeated.
* A single dash for `-check-prefix` was used but we need two dashes.
2024-03-14 16:34:09 +00:00
Zahi Moudallal
8481fb1698 [MLIR][ROCDL] Fix BallotOp LLVM translation and add doc (#85116)
This modifies the return type of the intrinsic call to handle 32 and 64
bits
properly and document the MLIR operation.
2024-03-14 08:43:48 -07:00
Ingo Müller
516ccce7fa [mlir] Make the split markers of splitAndProcessBuffer configurable. (#84765)
This allows to define custom splitters, which is interesting for
non-MLIR inputs and outputs to `mlir-translate`. For example, one may
use `; -----` as a splitter of `.ll` files. The splitters are now passed
as arguments into `splitAndProcessBuffer`, the input splitter defaulting
to the previous default (`// -----`) and the output splitter defaulting
to the empty string, which also corresponds to the previous default. The
behavior of the input split marker should not change at all; however,
outputs now have one new line *more* than before if there is no splitter
(old: `insertMarkerInOutput = false`, new: `outputSplitMarker = ""`) and
one new line *less* if there is one. The value of the input splitter is
exposed as a command line options of `mlir-translate` and other tools as
an optional value to the previously existing flag `-split-input-file`,
which defaults to the default splitter if not specified; the value of
the output splitter is exposed with the new `-output-split-marker`,
which default to the empty string in `mlir-translate` and the default
splitter in the other tools. In short, the previous usage or omission of
the flags should result in previous behavior (modulo the new lines
mentioned before).
2024-03-14 13:55:50 +01:00
Sergio Afonso
7252d22803 [OpenMP][MLIR] NFC: Remove trailing whitespace (#85213) 2024-03-14 12:51:43 +00:00
Oleg Shyshkov
ef8062e35b [mlir][Target][Cpp] Fix include.
mlir/include/mlir/Target/Cpp/CppEmitter.h:27:45: error: unknown type name 'raw_ostream'; did you mean 'llvm::raw_ostream'?
   27 | LogicalResult translateToCpp(Operation *op, raw_ostream &os,
      |                                             ^~~~~~~~~~~
      |                                             llvm::raw_ostream
2024-03-14 13:23:30 +01:00
Marius Brehler
a82ca398ce [mlir][EmitC] Fix type in example (#85205) 2024-03-14 12:12:03 +01:00
Marius Brehler
2cf2ca3702 [mlir][Target][Cpp] Cleanup includes (#85105) 2024-03-14 10:39:16 +01:00
Kai Sasaki
34ba90745f [mlir][complex] Support Fastmath flag in conversion of complex.sqrt to standard (#85019)
When converting complex.sqrt op to standard, we need to keep the fast
math flag given to the op.

See:
https://discourse.llvm.org/t/rfc-fastmath-flags-support-in-complex-dialect/71981
2024-03-14 15:53:28 +09:00
Matthias Gehre
e6048b728d [MLIR][Bufferization] BufferResultsToOutParams: Add option to add attribute to output arguments (#84320)
Adds a new pass option `add-result-attr` that will make the pass add the
attribute `{bufferize.result}` to each argument that was converted from
a result.
This is important e.g. when later using the python bindings / execution
engine to understand which arguments are actually results.

To be able to test this, the pass option was added to the tablegen. To
avoid collisions with the existing, manually defined option struct
`BufferResultsToOutParamsOptions`, that one was renamed to
`BufferResultsToOutParamsOpts`.
2024-03-14 07:50:16 +01:00
Marius Brehler
071f72a8ec [mlir][Target][Cpp] Remove unused dialects (#85102)
Removes linking and registering dialects that are not support any more.
2024-03-14 07:26:57 +01:00
Matthias Springer
437fcc6eed [mlir][IR] Add additional rewriter constructor (#85044)
For convenience, add an additional constructor to `RewriterBase` and
`IRRewriter` that also sets the insertion point. `OpBuilder` provides a
similar constructor.
2024-03-14 12:45:07 +09:00
Uday Bondhugula
1e9bfcd9a4 [MLIR][Affine] Fix/complete access index invariance, add isInvariantAccess (#84602)
isAccessIndexInvariant had outdated code and didn't handle IR with
multiple
affine.apply ops, which is inconvenient when used as a utility.  This is
addressed by switching to use the proper API on AffineValueMap. Add
mlir::affine::isInvariantAccess exposed for outside use and tested via
the test pass. Add a method on AffineValueMap.  Add test cases to
exercise simplification and composition for invariant access analysis.

A TODO/FIXME has been added but this issue existed before.
2024-03-14 09:14:21 +05:30
Fehr Mathieu
7bdba956ef [mlir][arith] Fix arith.select canonicalization patterns (#84685)
Because `arith.select` does not propagate poison of the second or third
operand depending on the condition, some canonicalization patterns are
currently incorrect. This patch removes these incorrect patterns, and
adds a new pattern to fix the case of `i1` select with constants.

Patterns that are removed:
* select(predA, select(predB, x, y), y) => select(and(predA, predB), x,
y)
* select(predA, select(predB, y, x), y) => select(and(predA,
not(predB)), x, y)
* select(predA, x, select(predB, x, y)) => select(or(predA, predB), x,
y)
* select(predA, x, select(predB, y, x)) => select(or(predA, not(predB)),
x, y)
* arith.select %arg, %x, %y : i1 => and(%arg, %x) or and(!%arg, %y)
  
Pattern that is added:
* select(pred, false, true) => not(pred) for i1

The first two patterns are incorrect when `predB` is poison and `predA`
is false, as a non-poison `y` gets compiled to `poison`. The next two
patterns are incorrect when `predB` is poison and `predA` is true, as a
non-poison `x` gets compiled to `poison`. The last pattern is incorrect
as it propagates poison from all operands afer compilation.
2024-03-13 22:59:34 +01:00
Andrzej Warzyński
3b2694853e [mlir][nfc] Update Linalg matmul -> Vector OP test (#81416)
Updates "transform-op-matmul-to-outerproduct.mlir". Summary:
  * refines TD sequence so that it's easier to reason about the
     compilation pipeline (e.g.
     `transform.structured.vectorize_children_and_apply_patterns`
     is replaced with`transform.structured.vectorize `),
  * new input dims to be able to distinguish parallel from reduction
    dims,
  * updates LIT variable names (makes the output easier to follow),
  * removes "noise" from the expected LIT output (e.g. types).

These Linalg -> Vector tests using Transform Dialect are great reference
points for constructing lowering pipelines. This simplification +
clean-up will hopefully make it easier to follow.
2024-03-13 16:53:26 +00:00
Christian Sigg
bb893fa23f [mlir] Fix inlining-threshold.mlir test for NDEBUG builds. 2024-03-13 17:26:50 +01:00
Han-Chung Wang
bb82092de7 [mlir][tensor] Make getMixedPadImpl return static values when possible. (#85016)
If low and high are constants (i.e., not attributes), users still prefer
attributes. Otherwise, there could be failures in type inference. A
failure is introduced by
60e562d11a,
see the drop_known_unit_constant_low_high test for more details.
2024-03-13 08:52:05 -07:00
Slava Zakharin
732f5368cd [RFC][mlir] Add profitability callback to the Inliner. (#84258)
Discussion at https://discourse.llvm.org/t/inliner-cost-model/2992

This change adds a callback that reports whether inlining
of the particular call site (communicated via ResolvedCall argument)
is profitable or not. The default MLIR inliner pass behavior
is unchanged, i.e. the callback always returns true.
This callback may be used to customize the inliner behavior
based on the target specifics (like target instructions costs),
profitability of the inlining for further optimizations
(e.g. if inlining may enable loop optimizations or scalar optimizations
due to object shape propagation), optimization levels (e.g. -Os inlining
may be quite different from -Ofast inlining), etc.

One of the questions is whether the ResolvedCall entity represents
enough of the context for the custom inlining models to come up with
the profitability decision. I think we can start with this and
extend it as necessary.

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2024-03-13 08:23:10 -07:00
Tom Eccles
f46f5a01f4 [flang][OpenMP][OMPIRBuilder][mlir] Optionally pass reduction vars by ref (#84304)
Previously reduction variables were always passed by value into and out
of the initialization and combiner regions of the OpenMP reduction
declare operation.

This worked well for reductions of primitive types (and might perform
better than passing by reference). But passing by reference will be
useful for array and derived type reductions (e.g. to move allocation
inside of the init region).

Passing reductions by reference requires different LLVM-IR generation
when lowering from MLIR because some of the loads/stores/allocations
will now be moved inside of the init and combiner regions. This
alternate code generation is requested using a new attribute to
omp.wsloop and omp.parallel.

Existing lowerings from mlir are unaffected (these will continue to use
the by-value argument passing.

Flang will continue to pass by-value argument passing for trivial types
unless a (hidden) command line argument is supplied. Non-trivial types
will always use the by-ref lowering.

Array reductions are not ready yet (but are coming very soon). In the
meantime, this is tested by forcing existing reductions to use by-ref.

Commit series for by-ref OpenMP reductions 3/3

---------

Co-authored-by: Mats Petersson <mats.petersson@arm.com>
2024-03-13 14:51:09 +00:00
Jeff Niu
2dbaf26525 [mlir][ods] Fix generation of optional custom parsers (#84821)
We need to generate `.has_value` for `OptionalParseResult`, also ensure
that `auto result` doesn't conflict with `result` which is the variable
name for `OperationState`.
2024-03-13 00:12:37 -04:00
Yinying Li
88986d65e4 [mlir][sparse] Fix sparse_generate test (#85009)
std::uniform_int_distribution may behave differently in different
systems.
2024-03-12 21:39:37 -04:00
Yinying Li
c1ac9a09d0 [mlir][sparse] Finish migrating integration tests to use sparse_tensor.print (#84997) 2024-03-12 20:57:21 -04:00
Zahi Moudallal
accfbf4e49 [MLIR][ROCDL] Add BallotOp and lit test (#84856) 2024-03-12 17:07:16 -07:00
Peiming Liu
94e27c265a [mlir][sparse] reuse tensor.insert operation to insert elements into … (#84987)
…a sparse tensor.
2024-03-12 16:59:17 -07:00
Justin Lebar
6095f8bbc4 Get rid of noisy debug log in verifyOpAndAdjustFlags. (#84677)
This debug log adds noise to a large fraction of *other* debug logs when
you run with -debug, because it prints "Verifying operation: blah blah\n"
whenever those other debug logs dump an op.

You can use -debug-only to get around this, but sometimes -debug really
is what's called for!
2024-03-12 12:52:31 -07:00
Han-Chung Wang
7c83d1bd61 [mlir][vector] Use inferRankReducedResultType for subview type inference. (#84395)
Fixes https://github.com/openxla/iree/issues/16475
2024-03-12 11:46:05 -07:00
Marius Brehler
19266ca389 [mlir][EmitC] Add an emitc.conditional operator (#84883)
This adds an `emitc.conditional` operation for the ternary conditional
operator. Furthermore, this adds a converion from `arith.select` to the
new op.
2024-03-12 11:27:26 +01:00
Walter Erquinigo
e4a546756c [MLIR][LSP][NFC] Fix a header guard (#84862)
This header guard is wrong and conflicts with the one from Transport.h
2024-03-11 23:02:32 -04:00
Sayan Saha
26722f5b61 [MLIR] Fix incorrect memref::DimOp canonicalization, add tensor::DimOp canonicalization (#84225)
The current canonicalization of `memref.dim` operating on the result of
`memref.reshape` into `memref.load` is incorrect as it doesn't check
whether the `index` operand of `memref.dim` dominates the source
`memref.reshape` op. It always introduces `memref.load` right after
`memref.reshape` to ensure the `memref` is not mutated before the
`memref.load` call. As a result, the following error is observed:

```
$> mlir-opt --canonicalize input.mlir

func.func @reshape_dim(%arg0: memref<*xf32>, %arg1: memref<?xindex>, %arg2: index) -> index {
    %c4 = arith.constant 4 : index
    %reshape = memref.reshape %arg0(%arg1) : (memref<*xf32>, memref<?xindex>) -> memref<*xf32>
    %0 = arith.muli %arg2, %c4 : index
    %dim = memref.dim %reshape, %0 : memref<*xf32>
    return %dim : index
  }
```

results in:

```
dominator.mlir:22:12: error: operand #1 does not dominate this use
    %dim = memref.dim %reshape, %0 : memref<*xf32>
           ^
dominator.mlir:22:12: note: see current operation: %1 = "memref.load"(%arg1, %2) <{nontemporal = false}> : (memref<?xindex>, index) -> index
dominator.mlir:21:10: note: operand defined here (op in the same block)
    %0 = arith.muli %arg2, %c4 : index
```

Properly fixing this issue requires a dominator analysis which is
expensive to run within a canonicalization pattern. So, this patch fixes
the canonicalization pattern by being more strict/conservative about the
legality condition in which we perform this canonicalization.
The more general pattern is also added to `tensor.dim`. Since tensors are
immutable we don't need to worry about where to introduce the
`tensor.extract` call after canonicalization.
2024-03-11 19:37:33 -07:00
Matthias Springer
2a30684557 [mlir][Transforms] Use correct listener in dialect conversion (#84861)
There was a typo in the dialect conversion: `RewriterBase::Listener`
should be used instead of `ForwardingListener`.
2024-03-12 10:51:11 +09:00
James Newling
67ef4ae2c3 [MLIR][Tensor,MemRef] Fold expand_shape and collapse_shape if identity (#80658)
Before: op verifiers failed if the input and output ranks were the same
(i.e. no expansion or collapse). This behavior requires users of these
shape ops to verify manually that they are not creating identity
versions of these ops every time they build them -- problematic. This PR
removes this strict verification, and introduces folders for the the
identity cases.

The PR also removes the special case handling of rank-0 tensors for
expand_shape and collapse_shape, there doesn't seem to be any reason to
treat them differently.
2024-03-12 10:11:58 +09:00
Thomas Preud'homme
b2ea04673b [MLIR] Add missing omp_gen dep to MLIROpenMPDialect (#84552)
This fixes the following failure when doing a clean build (in particular
no .ninja* lying around) of lib/libMLIROpenMPDialect.a only:
```
In file included from mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp:29:
llvm/include/llvm/Frontend/OpenMP/OMPConstants.h:20:10: fatal error: llvm/Frontend/OpenMP/OMP.h.inc: No such file or directory
```
2024-03-11 23:10:26 +00:00
Thomas Preud'homme
36cf982d6c [MLIR] Add missing MLIRFuncDialect dep to MLIRAMDGPUTransforms (#84550)
This fixes the following failure when doing a clean build (in particular
no .ninja* lying around) of lib/libMLIRAMDGPUTransforms.a only:
```
In file included from mlir/lib/Dialect/AMDGPU/Transforms/OptimizeSharedMemory.cpp:21:
mlir/include/mlir/Dialect/Func/IR/FuncOps.h:29:10: fatal error: mlir/Dialect/Func/IR/FuncOps.h.inc: No such file or directory
```
2024-03-11 23:08:56 +00:00
Thomas Preud'homme
9688a6dae4 [MLIR] Add missing MLIRFuncDialect dep to MLIRNVVMToLLVM (#84548)
This fixes the following failure when doing a clean build (in particular
no .ninja* lying around) of lib/libMLIRNVVMToLLVM.a only:
```
In file included from mlir/lib/Conversion/NVVMToLLVM/NVVMToLLVM.cpp:18:
mlir/include/mlir/Dialect/Func/IR/FuncOps.h:29:10: fatal error: mlir/Dialect/Func/IR/FuncOps.h.inc: No such file or directory
```
2024-03-11 23:07:49 +00:00
Congcong Cai
ad23127222 [mlir][inline] avoid inline self-recursive function (#83092) 2024-03-12 06:49:09 +08:00
Yinying Li
83c9244ae4 [mlir][sparse] Migrate more tests to use sparse_tensor.print (#84833)
Continuous efforts following #84249.
2024-03-11 18:44:32 -04:00
Quinn Dawkins
60e562d11a [mlir][linalg] Add unit dim folding pattern for tensor.pad (#84684)
Unit extent dims that are not padded by a tensor.pad can be folded away.
When folding unit extent dims of surrounding linalg ops, this increases
the chance that the iteration space of the linalg op will align with
nearby pad ops, improving fusion opportunities.
2024-03-11 18:24:23 -04:00
Marius Brehler
a924da6d4b [mlir][IR] Add isInteger() (without width) (#84467)
For the singless and signed integers overloads exist, so that the width
does not need to be specified as an argument. This adds the same for
integers without checking for signedness.
2024-03-11 08:47:06 -07:00
Matthias Gehre
818af71b72 [mlir][emitc] Add ArrayType (#83386)
This models a one or multi-dimensional C/C++ array.

The type implements the `ShapedTypeInterface` and prints similar to
memref/tensor:
```
  %arg0: !emitc.array<1xf32>,
  %arg1: !emitc.array<10x20x30xi32>,
  %arg2: !emitc.array<30x!emitc.ptr<i32>>,
  %arg3: !emitc.array<30x!emitc.opaque<"int">>
```

It can be translated to a C array type when used as function parameter
or as `emitc.variable` type.
2024-03-11 16:40:57 +01:00
Krzysztof Drewniak
b05c15259b [mlir][AMDGPU] Improve amdgpu.lds_barrier, add warnings (#77942)
On some architectures (currently gfx90a, gfx94*, and gfx10**), we can
implement an LDS barrier using compiler intrinsics instead of inline
assembly, improving optimization possibilities and decreasing the
fragility of the underlying code.

Other AMDGPU chipsets continue to require inline assembly to implement
this barrier, as, by the default, the LLVM backend will insert waits on
global memory (s_waintcnt vmcnt(0)) before barriers in order to ensure
memory watchpoints set by debuggers work correctly.

Use of amdgpu.lds_barrier, on these architectures, imposes a tradeoff
between debugability and performance. The documentation, as well as the
generated inline assembly, have been updated to explicitly call
attention to this fact.

For chipsets that did not require the inline assembly hack, we move to
the s.waitcnt and s.barrier intrinsics, which have been added to the
ROCDL dialect. The magic constants used as an argument to the waitcnt
intrinsic can be derived from
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
2024-03-11 10:06:49 -05:00