This allows defining custom splitters, which is useful for
non-MLIR inputs and outputs to `mlir-translate`. For example, one may
use `; -----` as a splitter of `.ll` files. The splitters are now passed
as arguments into `splitAndProcessBuffer`, the input splitter defaulting
to the previous default (`// -----`) and the output splitter defaulting
to the empty string, which also corresponds to the previous default. The
behavior of the input split marker should not change at all; however,
outputs now have one new line *more* than before if there is no splitter
(old: `insertMarkerInOutput = false`, new: `outputSplitMarker = ""`) and
one new line *less* if there is one. The value of the input splitter is
exposed as a command-line option of `mlir-translate` and other tools as
an optional value to the previously existing flag `-split-input-file`,
which defaults to the default splitter if not specified; the value of
the output splitter is exposed with the new `-output-split-marker` flag,
which defaults to the empty string in `mlir-translate` and to the default
splitter in the other tools. In short, the previous usage or omission of
the flags should result in the previous behavior (modulo the new lines
mentioned before).
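The split-and-rejoin flow described above can be sketched roughly as
follows. This is a simplified Python model, not the actual
`splitAndProcessBuffer` implementation: the function name
`split_and_process`, the line-based matching of the marker, and the way
chunks are rejoined are assumptions made for illustration, and the
sketch does not try to reproduce the exact new-line behavior discussed
above.

```python
def split_and_process(buffer, process, input_marker="// -----", output_marker=""):
    """Toy model: split `buffer` on marker lines, process each chunk, rejoin.

    Hypothetical helper for illustration only; the defaults mirror the
    defaults named in the text (input "// -----", output "").
    """
    chunks, current = [], []
    for line in buffer.splitlines():
        if line.strip() == input_marker:
            # A marker line ends the current chunk.
            chunks.append("\n".join(current))
            current = []
        else:
            current.append(line)
    chunks.append("\n".join(current))

    results = [process(chunk) for chunk in chunks]
    # With an empty output marker the results are simply concatenated
    # line by line; otherwise the marker is placed on its own line.
    joiner = "\n" + output_marker + "\n" if output_marker else "\n"
    return joiner.join(results)


if __name__ == "__main__":
    ll_input = "define void @a()\n; -----\ndefine void @b()"
    print(split_and_process(ll_input, str.upper,
                            input_marker="; -----", output_marker="; -----"))
```

Passing the same string for both markers keeps the output splittable
again by a later tool, which is the use case hinted at for `.ll` files.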
mlir/include/mlir/Target/Cpp/CppEmitter.h:27:45: error: unknown type name 'raw_ostream'; did you mean 'llvm::raw_ostream'?
   27 | LogicalResult translateToCpp(Operation *op, raw_ostream &os,
      |                                             ^~~~~~~~~~~
      |                                             llvm::raw_ostream
Adds a new pass option `add-result-attr` that will make the pass add the
attribute `{bufferize.result}` to each argument that was converted from
a result.
This is important e.g. when later using the python bindings / execution
engine to understand which arguments are actually results.
To be able to test this, the pass option was added to the tablegen. To
avoid collisions with the existing, manually defined option struct
`BufferResultsToOutParamsOptions`, that one was renamed to
`BufferResultsToOutParamsOpts`.
For convenience, add an additional constructor to `RewriterBase` and
`IRRewriter` that also sets the insertion point. `OpBuilder` provides a
similar constructor.
isAccessIndexInvariant had outdated code and didn't handle IR with
multiple affine.apply ops, which is inconvenient when used as a
utility. This is addressed by switching to use the proper API on
AffineValueMap. Add
mlir::affine::isInvariantAccess exposed for outside use and tested via
the test pass. Add a method on AffineValueMap. Add test cases to
exercise simplification and composition for invariant access analysis.
A TODO/FIXME has been added but this issue existed before.
Because `arith.select` does not propagate poison of the second or third
operand depending on the condition, some canonicalization patterns are
currently incorrect. This patch removes these incorrect patterns, and
adds a new pattern to fix the case of `i1` select with constants.
Patterns that are removed:
* select(predA, select(predB, x, y), y) => select(and(predA, predB), x, y)
* select(predA, select(predB, y, x), y) => select(and(predA, not(predB)), x, y)
* select(predA, x, select(predB, x, y)) => select(or(predA, predB), x, y)
* select(predA, x, select(predB, y, x)) => select(or(predA, not(predB)), x, y)
* arith.select %arg, %x, %y : i1 => and(%arg, %x) or and(!%arg, %y)
Pattern that is added:
* select(pred, false, true) => not(pred) for i1
The first two patterns are incorrect when `predB` is poison and `predA`
is false, as a non-poison `y` gets compiled to `poison`. The next two
patterns are incorrect when `predB` is poison and `predA` is true, as a
non-poison `x` gets compiled to `poison`. The last pattern is incorrect
as it propagates poison from all operands after compilation.
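The first counterexample can be checked with a small model. The sketch
below is plain Python, not MLIR: `POISON` is a sentinel, `select` only
looks at the operand that is actually chosen, and `and_`/`not_` are
assumed to propagate poison from any operand (the usual LLVM-style
semantics). All helper names are made up for illustration.

```python
POISON = object()  # sentinel standing in for a poison value


def select(cond, a, b):
    """Model of arith.select: only the condition and the chosen operand matter."""
    if cond is POISON:
        return POISON
    return a if cond else b


def and_(a, b):
    """Model of an i1 `and`: assumed to be poison if either operand is poison."""
    if a is POISON or b is POISON:
        return POISON
    return a and b


def not_(a):
    """Model of an i1 `not`: poison in, poison out."""
    return POISON if a is POISON else (not a)


def original(pred_a, pred_b, x, y):
    # select(predA, select(predB, x, y), y)
    return select(pred_a, select(pred_b, x, y), y)


def rewritten(pred_a, pred_b, x, y):
    # select(and(predA, predB), x, y), i.e. the removed pattern's output
    return select(and_(pred_a, pred_b), x, y)


if __name__ == "__main__":
    # predA is false and predB is poison: the original yields y,
    # the rewritten form yields poison.
    print(original(False, POISON, "x", "y"))
    print(rewritten(False, POISON, "x", "y") is POISON)
```

In the same model, `select(pred, False, True)` and `not_(pred)` agree
for `True`, `False`, and `POISON`, which is consistent with the added
pattern being safe.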
Updates "transform-op-matmul-to-outerproduct.mlir". Summary:
* refines TD sequence so that it's easier to reason about the
compilation pipeline (e.g.
`transform.structured.vectorize_children_and_apply_patterns`
is replaced with `transform.structured.vectorize`),
* new input dims to be able to distinguish parallel from reduction
dims,
* updates LIT variable names (makes the output easier to follow),
* removes "noise" from the expected LIT output (e.g. types).
These Linalg -> Vector tests using Transform Dialect are great reference
points for constructing lowering pipelines. This simplification +
clean-up will hopefully make it easier to follow.
If low and high are constants (i.e., not attributes), users still prefer
attributes. Otherwise, there could be failures in type inference. A
failure is introduced by 60e562d11a; see the
drop_known_unit_constant_low_high test for more details.
Discussion at https://discourse.llvm.org/t/inliner-cost-model/2992
This change adds a callback that reports whether inlining
of the particular call site (communicated via ResolvedCall argument)
is profitable or not. The default MLIR inliner pass behavior
is unchanged, i.e. the callback always returns true.
This callback may be used to customize the inliner behavior
based on the target specifics (like target instruction costs),
profitability of the inlining for further optimizations
(e.g. if inlining may enable loop optimizations or scalar optimizations
due to object shape propagation), optimization levels (e.g. -Os inlining
may be quite different from -Ofast inlining), etc.
One of the questions is whether the ResolvedCall entity represents
enough of the context for the custom inlining models to come up with
the profitability decision. I think we can start with this and
extend it as necessary.
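As a rough illustration of the callback idea (not the actual C++
interface), the Python sketch below models an inliner that asks a
profitability callback about every resolved call site. The
`ResolvedCall` fields, `callee_op_count`, and the function names are
hypothetical; only the "default always returns true" behavior is taken
from the description above.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ResolvedCall:
    """Hypothetical stand-in for the inliner's resolved call site."""
    caller: str
    callee: str
    callee_op_count: int  # illustrative cost metric, not part of MLIR


def always_profitable(call: ResolvedCall) -> bool:
    """Default callback: every call site is considered profitable."""
    return True


def select_calls_to_inline(
    calls: List[ResolvedCall],
    is_profitable: Callable[[ResolvedCall], bool] = always_profitable,
) -> List[ResolvedCall]:
    """Keep only the call sites the callback reports as profitable."""
    return [call for call in calls if is_profitable(call)]


if __name__ == "__main__":
    calls = [
        ResolvedCall("main", "small_helper", 4),
        ResolvedCall("main", "huge_kernel", 500),
    ]
    # A size-based model, e.g. something an -Os-like setting might prefer.
    small_only = select_calls_to_inline(calls, lambda c: c.callee_op_count <= 10)
    print([c.callee for c in small_only])
```

A custom model only has to supply the callback; the driver loop stays
the same, which is the point of exposing the decision as a hook.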
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
Previously reduction variables were always passed by value into and out
of the initialization and combiner regions of the OpenMP reduction
declare operation.
This worked well for reductions of primitive types (and might perform
better than passing by reference). But passing by reference will be
useful for array and derived type reductions (e.g. to move allocation
inside of the init region).
Passing reductions by reference requires different LLVM-IR generation
when lowering from MLIR because some of the loads/stores/allocations
will now be moved inside of the init and combiner regions. This
alternate code generation is requested using a new attribute to
omp.wsloop and omp.parallel.
Existing lowerings from MLIR are unaffected (these will continue to use
the by-value argument passing).
Flang will continue to use by-value argument passing for trivial types
unless a (hidden) command line argument is supplied. Non-trivial types
will always use the by-ref lowering.
Array reductions are not ready yet (but are coming very soon). In the
meantime, this is tested by forcing existing reductions to use by-ref.
Commit series for by-ref OpenMP reductions 3/3
---------
Co-authored-by: Mats Petersson <mats.petersson@arm.com>
We need to generate `.has_value` for `OptionalParseResult` and also
ensure that `auto result` doesn't conflict with `result`, which is the
variable name for `OperationState`.
This debug log adds noise to a large fraction of *other* debug logs when
you run with -debug, because it prints "Verifying operation: blah blah\n"
whenever those other debug logs dump an op.
You can use -debug-only to get around this, but sometimes -debug really
is what's called for!
The current canonicalization of `memref.dim` operating on the result of
`memref.reshape` into `memref.load` is incorrect as it doesn't check
whether the `index` operand of `memref.dim` dominates the source
`memref.reshape` op. It always introduces `memref.load` right after
`memref.reshape` to ensure the `memref` is not mutated before the
`memref.load` call. As a result, the following error is observed:
```
$> mlir-opt --canonicalize input.mlir
func.func @reshape_dim(%arg0: memref<*xf32>, %arg1: memref<?xindex>, %arg2: index) -> index {
  %c4 = arith.constant 4 : index
  %reshape = memref.reshape %arg0(%arg1) : (memref<*xf32>, memref<?xindex>) -> memref<*xf32>
  %0 = arith.muli %arg2, %c4 : index
  %dim = memref.dim %reshape, %0 : memref<*xf32>
  return %dim : index
}
```
results in:
```
dominator.mlir:22:12: error: operand #1 does not dominate this use
%dim = memref.dim %reshape, %0 : memref<*xf32>
^
dominator.mlir:22:12: note: see current operation: %1 = "memref.load"(%arg1, %2) <{nontemporal = false}> : (memref<?xindex>, index) -> index
dominator.mlir:21:10: note: operand defined here (op in the same block)
%0 = arith.muli %arg2, %c4 : index
```
Properly fixing this issue requires a dominator analysis which is
expensive to run within a canonicalization pattern. So, this patch fixes
the canonicalization pattern by being more strict/conservative about the
legality condition in which we perform this canonicalization.
The more general pattern is also added to `tensor.dim`. Since tensors are
immutable we don't need to worry about where to introduce the
`tensor.extract` call after canonicalization.
Before: op verifiers failed if the input and output ranks were the same
(i.e. no expansion or collapse). This behavior requires users of these
shape ops to verify manually that they are not creating identity
versions of these ops every time they build them -- problematic. This PR
removes this strict verification, and introduces folders for the
identity cases.
The PR also removes the special case handling of rank-0 tensors for
expand_shape and collapse_shape; there doesn't seem to be any reason to
treat them differently.
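To make the identity case concrete, here is a small Python model of a
collapse with static shapes. It is only a sketch: shapes are plain
lists, the reassociation is a list of index groups, and
`fold_identity_collapse` is a hypothetical helper that treats a collapse
as an identity when every group contains exactly one dimension.

```python
from math import prod


def collapse_shape(shape, reassociation):
    """Collapse each group of source dims into a single result dim."""
    return [prod(shape[i] for i in group) for group in reassociation]


def fold_identity_collapse(shape, reassociation):
    """Return the source shape if the collapse changes nothing, else None.

    Illustrative only: an identity here means every reassociation group
    holds exactly one source dimension, so the rank is unchanged.
    """
    if all(len(group) == 1 for group in reassociation):
        return list(shape)
    return None


if __name__ == "__main__":
    print(collapse_shape([2, 3, 4], [[0, 1], [2]]))
    print(fold_identity_collapse([2, 3], [[0], [1]]))
    print(fold_identity_collapse([2, 3, 4], [[0, 1], [2]]))
```

In this model a rank-0 shape (`[]` with an empty reassociation) goes
through the same path as any other identity, with no special casing.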
This fixes the following failure when doing a clean build (in particular
no .ninja* lying around) of lib/libMLIROpenMPDialect.a only:
```
In file included from mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp:29:
llvm/include/llvm/Frontend/OpenMP/OMPConstants.h:20:10: fatal error: llvm/Frontend/OpenMP/OMP.h.inc: No such file or directory
```
This fixes the following failure when doing a clean build (in particular
no .ninja* lying around) of lib/libMLIRAMDGPUTransforms.a only:
```
In file included from mlir/lib/Dialect/AMDGPU/Transforms/OptimizeSharedMemory.cpp:21:
mlir/include/mlir/Dialect/Func/IR/FuncOps.h:29:10: fatal error: mlir/Dialect/Func/IR/FuncOps.h.inc: No such file or directory
```
This fixes the following failure when doing a clean build (in particular
no .ninja* lying around) of lib/libMLIRNVVMToLLVM.a only:
```
In file included from mlir/lib/Conversion/NVVMToLLVM/NVVMToLLVM.cpp:18:
mlir/include/mlir/Dialect/Func/IR/FuncOps.h:29:10: fatal error: mlir/Dialect/Func/IR/FuncOps.h.inc: No such file or directory
```
Unit extent dims that are not padded by a tensor.pad can be folded away.
When folding unit extent dims of surrounding linalg ops, this increases
the chance that the iteration space of the linalg op will align with
nearby pad ops, improving fusion opportunities.
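The legality condition can be sketched as follows, assuming static
shapes and static low/high padding amounts. This is a Python model with
a hypothetical helper name, not the actual pattern: a dimension is
reported as foldable only if its extent is 1 and it receives no padding
on either side.

```python
def foldable_unit_dims(shape, low, high):
    """Indices of unit-extent dims that the pad leaves untouched.

    Illustrative model: `shape` is the static source shape and
    `low`/`high` are the static padding amounts per dimension.
    """
    return [
        i
        for i, (extent, lo, hi) in enumerate(zip(shape, low, high))
        if extent == 1 and lo == 0 and hi == 0
    ]


if __name__ == "__main__":
    # Dim 0 is a unit dim with no padding; dim 2 is a unit dim but is
    # padded on the low side, so it has to stay.
    print(foldable_unit_dims([1, 16, 1, 8], low=[0, 2, 1, 0], high=[0, 2, 0, 0]))
```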
Overloads exist for signless and signed integers so that the width
does not need to be specified as an argument. This adds the same for
integers without checking for signedness.
This models a one- or multi-dimensional C/C++ array.
The type implements the `ShapedTypeInterface` and prints similarly to
memref/tensor:
```
%arg0: !emitc.array<1xf32>,
%arg1: !emitc.array<10x20x30xi32>,
%arg2: !emitc.array<30x!emitc.ptr<i32>>,
%arg3: !emitc.array<30x!emitc.opaque<"int">>
```
It can be translated to a C array type when used as function parameter
or as `emitc.variable` type.
On some architectures (currently gfx90a, gfx94*, and gfx10**), we can
implement an LDS barrier using compiler intrinsics instead of inline
assembly, improving optimization possibilities and decreasing the
fragility of the underlying code.
Other AMDGPU chipsets continue to require inline assembly to implement
this barrier, as, by default, the LLVM backend will insert waits on
global memory (s_waitcnt vmcnt(0)) before barriers in order to ensure
memory watchpoints set by debuggers work correctly.
Use of amdgpu.lds_barrier, on these architectures, imposes a tradeoff
between debuggability and performance. The documentation, as well as the
generated inline assembly, have been updated to explicitly call
attention to this fact.
For chipsets that did not require the inline assembly hack, we move to
the s.waitcnt and s.barrier intrinsics, which have been added to the
ROCDL dialect. The magic constants used as an argument to the waitcnt
intrinsic can be derived from
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
#82629 added additional overloads to `replaceAllUsesWith` and
`replaceUsesWithIf`. This caused a build breakage with MSVC when called
with ops that can implicitly convert to `Value`.
```
external/llvm-project/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp(881): error C2666: 'mlir::RewriterBase::replaceAllUsesWith': 2 overloads have similar conversions
external/llvm-project/mlir/include\mlir/IR/PatternMatch.h(631): note: could be 'void mlir::RewriterBase::replaceAllUsesWith(mlir::Operation *,mlir::ValueRange)'
external/llvm-project/mlir/include\mlir/IR/PatternMatch.h(626): note: or 'void mlir::RewriterBase::replaceAllUsesWith(mlir::ValueRange,mlir::ValueRange)'
external/llvm-project/mlir/include\mlir/IR/PatternMatch.h(616): note: or 'void mlir::RewriterBase::replaceAllUsesWith(mlir::Value,mlir::Value)'
external/llvm-project/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp(882): note: while trying to match the argument list '(mlir::tensor::ExtractSliceOp, T)'
with
[
T=mlir::Value
]
```
Note: The LLVM build bots (Linux and Windows) did not break; this seems
to be an issue with `Tools\MSVC\14.29.30133\bin\HostX64\x64\cl.exe`.
This change renames the newly added overloads to `replaceAllOpUsesWith`
and `replaceOpUsesWithIf`.
MLIROpenACCTransforms does not use the LLVMIR dialect yet includes
LLVMIR headers. This causes building MLIROpenACCTransforms only from a
clean build to fail with:
In file included from mlir/lib/Dialect/OpenACC/Transforms/LegalizeData.cpp:9:
In file included from mlir/include/mlir/Dialect/OpenACC/Transforms/Passes.h:12:
mlir/include/mlir/Dialect/LLVMIR/Transforms/AddComdats.h:21:10: fatal error: 'mlir/Dialect/LLVMIR/Transforms/Passes.h.inc' file not found
This patch removes the problematic includes.
This commit adds two new notifications to `RewriterBase::Listener`:
* `notifyPatternBegin`: Called when a pattern application begins during
a greedy pattern rewrite or dialect conversion.
* `notifyPatternEnd`: Called when a pattern application finishes during
a greedy pattern rewrite or dialect conversion.
The listener infrastructure already provides a `notifyMatchFailure`
callback that notifies about the reason for a pattern match failure. The
two new notifications provide additional information about pattern
applications.
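The ordering of the three notifications can be illustrated with a toy
driver. The Python sketch below is not the MLIR API: the class, the
method names in snake_case, and the `apply_patterns` loop are
hypothetical, and the placement of the match-failure notification
between begin and end is an assumption of this model.

```python
class RecordingListener:
    """Hypothetical listener that records every notification it receives."""

    def __init__(self):
        self.events = []

    def notify_pattern_begin(self, pattern, op):
        self.events.append(("begin", pattern, op))

    def notify_match_failure(self, op, reason):
        self.events.append(("failure", op, reason))

    def notify_pattern_end(self, pattern, succeeded):
        self.events.append(("end", pattern, succeeded))


def apply_patterns(ops, patterns, listener):
    """Toy driver: try each (name, matcher) pattern on each op."""
    for op in ops:
        for name, matches in patterns:
            listener.notify_pattern_begin(name, op)
            succeeded = matches(op)
            if not succeeded:
                listener.notify_match_failure(op, f"{name} did not match")
            listener.notify_pattern_end(name, succeeded)


if __name__ == "__main__":
    listener = RecordingListener()
    apply_patterns(["add", "mul"], [("fold-add", lambda op: op == "add")], listener)
    for event in listener.events:
        print(event)
```

Bracketing each application with begin/end events lets a client
attribute any IR changes seen in between to a specific pattern.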
This change is in preparation for improving the handle update mechanism
in the `apply_conversion_patterns` transform op.
Until now, `transform.apply_conversion_patterns` consumed the target
handle and potentially invalidated handles. This commit adds tracking
functionality similar to `transform.apply_patterns`, such that handles
are no longer invalidated, but updated based on op replacements
performed by the dialect conversion.
This new functionality is hidden behind a `preserve_handles` attribute
for now.
This patch adds the `LowerVectorToArmNeonPattern` pattern to the
ArmNeon dialect.
This pattern inspects `vector.contract` ops that can be 1-1 mapped to an
`arm.neon.smmla` intrinsic. The contract ops must be separated into
tiles whose inputs must fit that of a single smmla op (`2x8xi32` inputs
and `2x2xi32` output). The `vector.contract` inputs must be sign
extended from narrow types (<=i8) to be converted. If all conditions are
met, an smmla op is inserted with additional `vector.shape_cast` ops to
handle linearizing the input and output dimensions.
Fixes that
```
Pattern {
  let tuple = (attr<"3 : i34">);
  not tuple.0;
  erase _;
}
```
would crash the PDLL parser because it expected a native constraint
after `not`.