intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-01-23 07:58:23 +08:00

Author	SHA1	Message	Date
Billy Zhu	1e8dad3bef	[MLIR][LLVM] Support Recursive DITypes (#80251 ) Following the discussion from [this thread](https://discourse.llvm.org/t/handling-cyclic-dependencies-in-debug-info/67526/11), this PR adds support for recursive DITypes. This PR adds: 1. DIRecursiveTypeAttrInterface: An interface that DITypeAttrs can implement to indicate that it supports recursion. See full description in code. 2. Importer & exporter support (The only DITypeAttr that implements the interface is DICompositeTypeAttr, so the exporter is only implemented for composites too. There will be two methods that each llvm DI type that supports mutation needs to implement since there's nothing general). --------- Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>	2024-03-15 09:58:25 -07:00
Matthias Gehre	01a31cee56	[MLIR] EmitC: Add subscript operator (#84783 ) Introduces a SubscriptOp that allows to write IR like ``` func.func @load_store(%arg0: !emitc.array<4x8xf32>, %arg1: !emitc.array<3x5xf32>, %arg2: index, %arg3: index) { %0 = emitc.subscript %arg0[%arg2, %arg3] : <4x8xf32>, index, index %1 = emitc.subscript %arg1[%arg2, %arg3] : <3x5xf32>, index, index emitc.assign %0 : f32 to %1 : f32 return } ``` which gets translated into the C++ code ``` v1[v2][v3] = v0[v1][v2]; ``` To make this happen, this - adds the SubscriptOp - allows the subscript op as rhs of emitc.assign - updates the emitter to print SubscriptOps The emitter prints emitc.subscript in a delayed fashion to allow it being used as lvalue. I.e. while processing ``` %0 = emitc.subscript %arg0[%arg2, %arg3] : <4x8xf32>, index, index ``` it will not emit any text, but record in the `valueMapper` that the name for `%0` is `v0[v1][v2]`, see `CppEmitter::getSubscriptName`. Only when that result is then used (here in `emitc.assign`), that name is inserted into the text.	2024-03-15 11:08:34 +01:00
Mehdi Amini	2a547f0f6c	[MLIR] Fix `mlir-opt --show-dialects` to not require any input (as documented)	2024-03-14 22:55:25 -07:00
Matthias Springer	e8e8df4c1b	[mlir][sparse] Add `has_runtime_library` test op (#85355 ) This commit adds a new test-only op: `sparse_tensor.has_runtime_library`. The op returns "1" if the sparse compiler runs in runtime library mode. This op is useful for writing test cases that require different IR depending on whether the sparse compiler runs in runtime library or codegen mode. This commit fixes a memory leak in `sparse_pack_d.mlir`. This test case uses `sparse_tensor.assemble` to create a sparse tensor SSA value from existing buffers. This runtime library reallocates+copies the existing buffers; the codegen path does not. Therefore, the test requires additional deallocations when running in runtime library mode. Alternatives considered: - Make the codegen path allocate. "Codegen" is the "default" compilation mode and it is handling `sparse_tensor.assemble` correctly. The issue is with the runtime library path, which should not allocate. Therefore, it is better to put a workaround in the runtime library path than to work around the issue with a new flag in the codegen path. - Add a `sparse_tensor.runtime_only` attribute to `bufferization.dealloc_tensor`. Verifying that the attribute can only be attached to `bufferization.dealloc_tensor` may introduce an unwanted dependency of `MLIRSparseTensorDialect` on `MLIRBufferizationDialect`.	2024-03-15 13:35:48 +09:00
Matthias Springer	5124eedd35	[mlir][sparse] Fix memory leaks (part 3) (#85184 ) This commit fixes memory leaks in sparse tensor integration tests by adding `bufferization.dealloc_tensor` ops. Note: Buffer deallocation will be automated in the future with the ownership-based buffer deallocation pass, making `dealloc_tensor` obsolete (only codegen path, not when using the runtime library).	2024-03-15 13:31:47 +09:00
Matthias Springer	6ed4d15cf4	[mlir][sparse_tensor] Implement bufferization interface for `foreach` (#85183 ) This commit fixes a memory leak in `sparse_codegen_foreach.mlir`. The bufferization inserted a copy for the operand of `sparse_tensor.foreach` because it conservatively assumed that the op writes to the operand.	2024-03-15 13:28:09 +09:00
Matthias Springer	102273a9b4	[mlir][Transform] Remove `notifyOperationErased` workaround (#84134 ) D144193 (#66771) has been merged.	2024-03-15 10:29:36 +09:00
srcarroll	58ef9bec07	[mlir][math] Implement alternative decomposition for tanh (#85025 ) The previous implementation decomposes `tanh(x)` into `(exp(2x) - 1)/(exp(2x)+1), x < 0` `(1 - exp(-2x))/(1 + exp(-2x)), x >= 0` This is fine as it avoids overflow with the exponential, but the whole decomposition is computed for both cases unconditionally, then the result is chosen based off the sign of the input. This results in doing two expensive `exp` computations. The proposed change avoids doing the whole computation twice by exploiting the reflection symmetry `tanh(-x) = -tanh(x)`. We can "normalize" the input to be positive by setting `y = sign(x) * x`, where the sign of `x` is computed as `sign(x) = (float)(x < 0) * (-2) + 1`. Then compute `z = tanh(y)` with the decomposition above for `x >= 0` and "denormalize" the result `z * sign(x)` to retain the sign. The reason it is done this way is that it is very amenable to vectorization. This method trades the duplicate decomposition computations (which takes 5 instructions including an extra expensive `exp` and `div`) for 4 cheap instructions to compute the sign value 1. `arith.cmpf` (which is a pre-existing instruction in the previous impl) 2. `arith.sitofp` 3. `arith.mulf` 4. `arith.addf` and 1 more instruction to get the right sign in the result 5. `arith.mulf`. Moreover, numerically, this implementation will yield the exact same results as the previous implementation.	2024-03-14 19:18:56 -05:00
Aart Bik	4daf86ef3f	[mlir][sparse] refactoring sparse runtime lib into less paths (#85332 ) Two constructors could be easily refactored into one after a lot of previous deprecated code has been removed.	2024-03-14 17:06:39 -07:00
Aart Bik	9d994e900f	[mlir][sparse] remove deprecated toCOO from sparse runtime support lib (#85319 )	2024-03-14 16:00:33 -07:00
Andrzej Warzyński	c56bd7ab79	[mlir][linalg] Enable masked vectorisation for depthwise convolutions (#81625 ) This patch adds support for masked vectorisation of depthwise 1D WC convolutions,`linalg.depthwise_conv_1d_nwc_wc`. This is implemented by adding support for masking. Two major assumptions are made: * only the channel dimension can be dynamic/scalable (i.e. the trailing dim), * when specifying vector sizes to use in the vectoriser, only the size corresponding to the channel dim is effectively used (other dims are inferred from the context). In terms of scalable vectorisation, this should be sufficient to cover all practical cases (i.e. making arbitrary dim scalable wouldn't make much sense). As for more generic cases with dynamic shapes (e.g. W or N dims being dynamic), more work would be needed. In particular, one would have to consider the filter and input/output tensors separately.	2024-03-14 20:19:46 +00:00
Ingo Müller	d7f71a330d	[mlir] Fix RUN command introduced in `516ccce7fa` (#84765 ) (NFC) There were two problems: * The `%s` argument to `FileCheck` was repeated. * A single dash for `-check-prefix` was used but we need two dashes.	2024-03-14 16:34:09 +00:00
Zahi Moudallal	8481fb1698	[MLIR][ROCDL] Fix BallotOp LLVM translation and add doc (#85116 ) This modifies the return type of the intrinsic call to handle 32 and 64 bits properly and document the MLIR operation.	2024-03-14 08:43:48 -07:00
Ingo Müller	516ccce7fa	[mlir] Make the split markers of splitAndProcessBuffer configurable. (#84765 ) This allows defining custom splitters, which is interesting for non-MLIR inputs and outputs to `mlir-translate`. For example, one may use `; -----` as a splitter of `.ll` files. The splitters are now passed as arguments into `splitAndProcessBuffer`, the input splitter defaulting to the previous default (`// -----`) and the output splitter defaulting to the empty string, which also corresponds to the previous default. The behavior of the input split marker should not change at all; however, outputs now have one new line more than before if there is no splitter (old: `insertMarkerInOutput = false`, new: `outputSplitMarker = ""`) and one new line less if there is one. The value of the input splitter is exposed as a command line option of `mlir-translate` and other tools as an optional value to the previously existing flag `-split-input-file`, which defaults to the default splitter if not specified; the value of the output splitter is exposed with the new `-output-split-marker`, which defaults to the empty string in `mlir-translate` and the default splitter in the other tools. In short, the previous usage or omission of the flags should result in previous behavior (modulo the new lines mentioned before).	2024-03-14 13:55:50 +01:00
Sergio Afonso	7252d22803	[OpenMP][MLIR] NFC: Remove trailing whitespace (#85213 )	2024-03-14 12:51:43 +00:00
Oleg Shyshkov	ef8062e35b	[mlir][Target][Cpp] Fix include. mlir/include/mlir/Target/Cpp/CppEmitter.h:27:45: error: unknown type name 'raw_ostream'; did you mean 'llvm::raw_ostream'? 27 \| LogicalResult translateToCpp(Operation *op, raw_ostream &os, \| ^~~~~~~~~~~ \| llvm::raw_ostream	2024-03-14 13:23:30 +01:00
Marius Brehler	a82ca398ce	[mlir][EmitC] Fix type in example (#85205 )	2024-03-14 12:12:03 +01:00
Marius Brehler	2cf2ca3702	[mlir][Target][Cpp] Cleanup includes (#85105 )	2024-03-14 10:39:16 +01:00
Kai Sasaki	34ba90745f	[mlir][complex] Support Fastmath flag in conversion of complex.sqrt to standard (#85019 ) When converting complex.sqrt op to standard, we need to keep the fast math flag given to the op. See: https://discourse.llvm.org/t/rfc-fastmath-flags-support-in-complex-dialect/71981	2024-03-14 15:53:28 +09:00
Matthias Gehre	e6048b728d	[MLIR][Bufferization] BufferResultsToOutParams: Add option to add attribute to output arguments (#84320 ) Adds a new pass option `add-result-attr` that will make the pass add the attribute `{bufferize.result}` to each argument that was converted from a result. This is important e.g. when later using the python bindings / execution engine to understand which arguments are actually results. To be able to test this, the pass option was added to the tablegen. To avoid collisions with the existing, manually defined option struct `BufferResultsToOutParamsOptions`, that one was renamed to `BufferResultsToOutParamsOpts`.	2024-03-14 07:50:16 +01:00
Marius Brehler	071f72a8ec	[mlir][Target][Cpp] Remove unused dialects (#85102 ) Removes linking and registering dialects that are no longer supported.	2024-03-14 07:26:57 +01:00
Matthias Springer	437fcc6eed	[mlir][IR] Add additional rewriter constructor (#85044 ) For convenience, add an additional constructor to `RewriterBase` and `IRRewriter` that also sets the insertion point. `OpBuilder` provides a similar constructor.	2024-03-14 12:45:07 +09:00
Uday Bondhugula	1e9bfcd9a4	[MLIR][Affine] Fix/complete access index invariance, add isInvariantAccess (#84602 ) isAccessIndexInvariant had outdated code and didn't handle IR with multiple affine.apply ops, which is inconvenient when used as a utility. This is addressed by switching to use the proper API on AffineValueMap. Add mlir::affine::isInvariantAccess exposed for outside use and tested via the test pass. Add a method on AffineValueMap. Add test cases to exercise simplification and composition for invariant access analysis. A TODO/FIXME has been added but this issue existed before.	2024-03-14 09:14:21 +05:30
Fehr Mathieu	7bdba956ef	[mlir][arith] Fix `arith.select` canonicalization patterns (#84685 ) Because `arith.select` does not propagate poison of the second or third operand depending on the condition, some canonicalization patterns are currently incorrect. This patch removes these incorrect patterns, and adds a new pattern to fix the case of `i1` select with constants. Patterns that are removed: * select(predA, select(predB, x, y), y) => select(and(predA, predB), x, y) * select(predA, select(predB, y, x), y) => select(and(predA, not(predB)), x, y) * select(predA, x, select(predB, x, y)) => select(or(predA, predB), x, y) * select(predA, x, select(predB, y, x)) => select(or(predA, not(predB)), x, y) * arith.select %arg, %x, %y : i1 => and(%arg, %x) or and(!%arg, %y) Pattern that is added: * select(pred, false, true) => not(pred) for i1 The first two patterns are incorrect when `predB` is poison and `predA` is false, as a non-poison `y` gets compiled to `poison`. The next two patterns are incorrect when `predB` is poison and `predA` is true, as a non-poison `x` gets compiled to `poison`. The last pattern is incorrect as it propagates poison from all operands after compilation.	2024-03-13 22:59:34 +01:00
Andrzej Warzyński	3b2694853e	[mlir][nfc] Update Linalg matmul -> Vector OP test (#81416 ) Updates "transform-op-matmul-to-outerproduct.mlir". Summary: * refines TD sequence so that it's easier to reason about the compilation pipeline (e.g. `transform.structured.vectorize_children_and_apply_patterns` is replaced with`transform.structured.vectorize `), * new input dims to be able to distinguish parallel from reduction dims, * updates LIT variable names (makes the output easier to follow), * removes "noise" from the expected LIT output (e.g. types). These Linalg -> Vector tests using Transform Dialect are great reference points for constructing lowering pipelines. This simplification + clean-up will hopefully make it easier to follow.	2024-03-13 16:53:26 +00:00
Christian Sigg	bb893fa23f	[mlir] Fix inlining-threshold.mlir test for NDEBUG builds.	2024-03-13 17:26:50 +01:00
Han-Chung Wang	bb82092de7	[mlir][tensor] Make getMixedPadImpl return static values when possible. (#85016 ) If low and high are constants (i.e., not attributes), users still prefer attributes. Otherwise, there could be failures in type inference. A failure is introduced by `60e562d11a`, see the drop_known_unit_constant_low_high test for more details.	2024-03-13 08:52:05 -07:00
Slava Zakharin	732f5368cd	[RFC][mlir] Add profitability callback to the Inliner. (#84258 ) Discussion at https://discourse.llvm.org/t/inliner-cost-model/2992 This change adds a callback that reports whether inlining of the particular call site (communicated via ResolvedCall argument) is profitable or not. The default MLIR inliner pass behavior is unchanged, i.e. the callback always returns true. This callback may be used to customize the inliner behavior based on the target specifics (like target instructions costs), profitability of the inlining for further optimizations (e.g. if inlining may enable loop optimizations or scalar optimizations due to object shape propagation), optimization levels (e.g. -Os inlining may be quite different from -Ofast inlining), etc. One of the questions is whether the ResolvedCall entity represents enough of the context for the custom inlining models to come up with the profitability decision. I think we can start with this and extend it as necessary. --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>	2024-03-13 08:23:10 -07:00
Tom Eccles	f46f5a01f4	[flang][OpenMP][OMPIRBuilder][mlir] Optionally pass reduction vars by ref (#84304 ) Previously reduction variables were always passed by value into and out of the initialization and combiner regions of the OpenMP reduction declare operation. This worked well for reductions of primitive types (and might perform better than passing by reference). But passing by reference will be useful for array and derived type reductions (e.g. to move allocation inside of the init region). Passing reductions by reference requires different LLVM-IR generation when lowering from MLIR because some of the loads/stores/allocations will now be moved inside of the init and combiner regions. This alternate code generation is requested using a new attribute to omp.wsloop and omp.parallel. Existing lowerings from mlir are unaffected (these will continue to use the by-value argument passing). Flang will continue to use by-value argument passing for trivial types unless a (hidden) command line argument is supplied. Non-trivial types will always use the by-ref lowering. Array reductions are not ready yet (but are coming very soon). In the meantime, this is tested by forcing existing reductions to use by-ref. Commit series for by-ref OpenMP reductions 3/3 --------- Co-authored-by: Mats Petersson <mats.petersson@arm.com>	2024-03-13 14:51:09 +00:00
Jeff Niu	2dbaf26525	[mlir][ods] Fix generation of optional custom parsers (#84821 ) We need to generate `.has_value` for `OptionalParseResult`, also ensure that `auto result` doesn't conflict with `result` which is the variable name for `OperationState`.	2024-03-13 00:12:37 -04:00
Yinying Li	88986d65e4	[mlir][sparse] Fix sparse_generate test (#85009 ) std::uniform_int_distribution may behave differently in different systems.	2024-03-12 21:39:37 -04:00
Yinying Li	c1ac9a09d0	[mlir][sparse] Finish migrating integration tests to use sparse_tensor.print (#84997 )	2024-03-12 20:57:21 -04:00
Zahi Moudallal	accfbf4e49	[MLIR][ROCDL] Add BallotOp and lit test (#84856 )	2024-03-12 17:07:16 -07:00
Peiming Liu	94e27c265a	[mlir][sparse] reuse tensor.insert operation to insert elements into … (#84987 ) …a sparse tensor.	2024-03-12 16:59:17 -07:00
Justin Lebar	6095f8bbc4	Get rid of noisy debug log in verifyOpAndAdjustFlags. (#84677 ) This debug log adds noise to a large fraction of other debug logs when you run with -debug, because it prints "Verifying operation: blah blah\n" whenever those other debug logs dump an op. You can use -debug-only to get around this, but sometimes -debug really is what's called for!	2024-03-12 12:52:31 -07:00
Han-Chung Wang	7c83d1bd61	[mlir][vector] Use inferRankReducedResultType for subview type inference. (#84395 ) Fixes https://github.com/openxla/iree/issues/16475	2024-03-12 11:46:05 -07:00
Marius Brehler	19266ca389	[mlir][EmitC] Add an `emitc.conditional` operator (#84883 ) This adds an `emitc.conditional` operation for the ternary conditional operator. Furthermore, this adds a conversion from `arith.select` to the new op.	2024-03-12 11:27:26 +01:00
Walter Erquinigo	e4a546756c	[MLIR][LSP][NFC] Fix a header guard (#84862 ) This header guard is wrong and conflicts with the one from Transport.h	2024-03-11 23:02:32 -04:00
Sayan Saha	26722f5b61	[MLIR] Fix incorrect memref::DimOp canonicalization, add tensor::DimOp canonicalization (#84225 ) The current canonicalization of `memref.dim` operating on the result of `memref.reshape` into `memref.load` is incorrect as it doesn't check whether the `index` operand of `memref.dim` dominates the source `memref.reshape` op. It always introduces `memref.load` right after `memref.reshape` to ensure the `memref` is not mutated before the `memref.load` call. As a result, the following error is observed: ``` $> mlir-opt --canonicalize input.mlir func.func @reshape_dim(%arg0: memref<*xf32>, %arg1: memref<?xindex>, %arg2: index) -> index { %c4 = arith.constant 4 : index %reshape = memref.reshape %arg0(%arg1) : (memref<*xf32>, memref<?xindex>) -> memref<*xf32> %0 = arith.muli %arg2, %c4 : index %dim = memref.dim %reshape, %0 : memref<*xf32> return %dim : index } ``` results in: ``` dominator.mlir:22:12: error: operand #1 does not dominate this use %dim = memref.dim %reshape, %0 : memref<*xf32> ^ dominator.mlir:22:12: note: see current operation: %1 = "memref.load"(%arg1, %2) <{nontemporal = false}> : (memref<?xindex>, index) -> index dominator.mlir:21:10: note: operand defined here (op in the same block) %0 = arith.muli %arg2, %c4 : index ``` Properly fixing this issue requires a dominator analysis which is expensive to run within a canonicalization pattern. So, this patch fixes the canonicalization pattern by being more strict/conservative about the legality condition in which we perform this canonicalization. The more general pattern is also added to `tensor.dim`. Since tensors are immutable we don't need to worry about where to introduce the `tensor.extract` call after canonicalization.	2024-03-11 19:37:33 -07:00
Matthias Springer	2a30684557	[mlir][Transforms] Use correct listener in dialect conversion (#84861 ) There was a typo in the dialect conversion: `RewriterBase::Listener` should be used instead of `ForwardingListener`.	2024-03-12 10:51:11 +09:00
James Newling	67ef4ae2c3	[MLIR][Tensor,MemRef] Fold expand_shape and collapse_shape if identity (#80658 ) Before: op verifiers failed if the input and output ranks were the same (i.e. no expansion or collapse). This behavior requires users of these shape ops to verify manually that they are not creating identity versions of these ops every time they build them -- problematic. This PR removes this strict verification, and introduces folders for the identity cases. The PR also removes the special case handling of rank-0 tensors for expand_shape and collapse_shape, as there doesn't seem to be any reason to treat them differently.	2024-03-12 10:11:58 +09:00
Thomas Preud'homme	b2ea04673b	[MLIR] Add missing omp_gen dep to MLIROpenMPDialect (#84552 ) This fixes the following failure when doing a clean build (in particular no .ninja* lying around) of lib/libMLIROpenMPDialect.a only: ``` In file included from mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp:29: llvm/include/llvm/Frontend/OpenMP/OMPConstants.h:20:10: fatal error: llvm/Frontend/OpenMP/OMP.h.inc: No such file or directory ```	2024-03-11 23:10:26 +00:00
Thomas Preud'homme	36cf982d6c	[MLIR] Add missing MLIRFuncDialect dep to MLIRAMDGPUTransforms (#84550 ) This fixes the following failure when doing a clean build (in particular no .ninja* lying around) of lib/libMLIRAMDGPUTransforms.a only: ``` In file included from mlir/lib/Dialect/AMDGPU/Transforms/OptimizeSharedMemory.cpp:21: mlir/include/mlir/Dialect/Func/IR/FuncOps.h:29:10: fatal error: mlir/Dialect/Func/IR/FuncOps.h.inc: No such file or directory ```	2024-03-11 23:08:56 +00:00
Thomas Preud'homme	9688a6dae4	[MLIR] Add missing MLIRFuncDialect dep to MLIRNVVMToLLVM (#84548 ) This fixes the following failure when doing a clean build (in particular no .ninja* lying around) of lib/libMLIRNVVMToLLVM.a only: ``` In file included from mlir/lib/Conversion/NVVMToLLVM/NVVMToLLVM.cpp:18: mlir/include/mlir/Dialect/Func/IR/FuncOps.h:29:10: fatal error: mlir/Dialect/Func/IR/FuncOps.h.inc: No such file or directory ```	2024-03-11 23:07:49 +00:00
Congcong Cai	ad23127222	[mlir][inline] avoid inline self-recursive function (#83092 )	2024-03-12 06:49:09 +08:00
Yinying Li	83c9244ae4	[mlir][sparse] Migrate more tests to use sparse_tensor.print (#84833 ) Continuous efforts following #84249.	2024-03-11 18:44:32 -04:00
Quinn Dawkins	60e562d11a	[mlir][linalg] Add unit dim folding pattern for tensor.pad (#84684 ) Unit extent dims that are not padded by a tensor.pad can be folded away. When folding unit extent dims of surrounding linalg ops, this increases the chance that the iteration space of the linalg op will align with nearby pad ops, improving fusion opportunities.	2024-03-11 18:24:23 -04:00
Marius Brehler	a924da6d4b	[mlir][IR] Add `isInteger()` (without width) (#84467 ) For signless and signed integers, overloads exist so that the width does not need to be specified as an argument. This adds the same for integers without checking for signedness.	2024-03-11 08:47:06 -07:00
Matthias Gehre	818af71b72	[mlir][emitc] Add ArrayType (#83386 ) This models a one or multi-dimensional C/C++ array. The type implements the `ShapedTypeInterface` and prints similar to memref/tensor: ``` %arg0: !emitc.array<1xf32>, %arg1: !emitc.array<10x20x30xi32>, %arg2: !emitc.array<30x!emitc.ptr<i32>>, %arg3: !emitc.array<30x!emitc.opaque<"int">> ``` It can be translated to a C array type when used as function parameter or as `emitc.variable` type.	2024-03-11 16:40:57 +01:00
Krzysztof Drewniak	b05c15259b	[mlir][AMDGPU] Improve amdgpu.lds_barrier, add warnings (#77942 ) On some architectures (currently gfx90a, gfx94, and gfx10*), we can implement an LDS barrier using compiler intrinsics instead of inline assembly, improving optimization possibilities and decreasing the fragility of the underlying code. Other AMDGPU chipsets continue to require inline assembly to implement this barrier, as, by default, the LLVM backend will insert waits on global memory (s_waitcnt vmcnt(0)) before barriers in order to ensure memory watchpoints set by debuggers work correctly. Use of amdgpu.lds_barrier, on these architectures, imposes a tradeoff between debuggability and performance. The documentation, as well as the generated inline assembly, have been updated to explicitly call attention to this fact. For chipsets that did not require the inline assembly hack, we move to the s.waitcnt and s.barrier intrinsics, which have been added to the ROCDL dialect. The magic constants used as an argument to the waitcnt intrinsic can be derived from llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp	2024-03-11 10:06:49 -05:00
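
Note on commit 58ef9bec07 (alternative tanh decomposition): the idea described in that message can be sketched in scalar Python. This is only an illustration of the arithmetic, not the MLIR pattern itself. The function names are hypothetical, and the sign is assumed to be +1 for non-negative inputs and -1 for negative ones, which is what the symmetry `tanh(-x) = -tanh(x)` requires.

```python
import math


def tanh_two_branch(x: float) -> float:
    """Older scheme: choose a formula by sign so that exp never overflows."""
    if x < 0:
        e = math.exp(2.0 * x)
        return (e - 1.0) / (e + 1.0)
    e = math.exp(-2.0 * x)
    return (1.0 - e) / (1.0 + e)


def tanh_sign_normalized(x: float) -> float:
    """Newer scheme: a single exp, using tanh(-x) = -tanh(x)."""
    # Assumed sign convention: -1.0 for negative x, +1.0 otherwise.
    sign = float(x < 0) * -2.0 + 1.0
    y = sign * x                   # y = |x|, so exp(-2y) cannot overflow
    e = math.exp(-2.0 * y)
    z = (1.0 - e) / (1.0 + e)      # tanh(y) for y >= 0
    return z * sign                # restore the original sign
```

In a vectorized setting the two-branch form evaluates both formulas for every lane and then selects one, which is why the single-`exp` form is cheaper there; the scalar sketch only shows that both forms compute the same value.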

19271 Commits