16466 Commits

Author SHA1 Message Date
Shilei Tian
6bd74fd65f Revert commits for kernel environment
This reverts the commits for kernel environments, as they cause issues in AMD BB.
2023-07-23 23:32:31 -04:00
Shilei Tian
ae4292abe4 [MLIR][OpenMP] Remove local_unnamed_addr from check line 2023-07-23 18:51:02 -04:00
Shilei Tian
c7df940184 [MLIR][OpenMP] Fix wrong check lines in mlir/test/Target/LLVMIR/omptarget-region-device-llvm.mlir 2023-07-23 18:45:44 -04:00
Shilei Tian
c5c8040390 [OpenMP] Introduce kernel environment
This patch introduces a per-kernel environment. Previously, flags such as execution mode were set through global variables with names like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not tracked on a per-kernel basis, which prevents us from applying per-kernel optimizations in the device runtime.

This is a combination and refinement of patch series D116908, D116909, and D116910.

Depend on D155886.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142569
2023-07-23 18:36:01 -04:00
Andrzej Warzynski
5da317a79e [mlir][docs][nfc] Fix markdown link 2023-07-23 23:05:47 +02:00
Daniil Dudkin
8be07adfb4 [mlir][LLVM] Introduce reduction intrinsics for minimum/maximum
This patch adds support for the reduction intrinsics
for floating point minimum and maximum that have
been added to LLVM by https://reviews.llvm.org/D152370.

Related to: #63969

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D155869
2023-07-22 16:25:32 +03:00
Fabian Mora
4538347fb2 [mlir][rocdl] Change the translation of GridDim*Op to __ockl_get_num_groups
Currently, `ROCDL::GridDim*Op` is being translated to `__ockl_get_global_size`, however
to match the meaning of `gpu.grid_dim` it should instead be translated to
`__ockl_get_num_groups`. This change would also make it agree with the meaning
of `gridDimx.*` in HIP, see:
https://github.com/ROCm-Developer-Tools/hipamd/blob/develop/include/hip/amd_detail/amd_hip_runtime.h#L257

Difference between the functions:
```
__ockl_get_global_size =  blockDim * numBlocks
__ockl_get_num_groups = numBlocks
```
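
As a quick worked example of the difference, here is a minimal sketch assuming a one-dimensional launch. The block size of 256 and the block count of 4 are illustrative numbers, not values taken from the patch:

```python
# Illustrative launch configuration; both values are assumptions.
block_dim = 256   # threads per block (work-group size)
num_blocks = 4    # number of blocks (work-groups) in the grid

# What each device-library query corresponds to, per the formulas above.
ockl_get_global_size = block_dim * num_blocks  # total number of threads
ockl_get_num_groups = num_blocks               # number of blocks

print(ockl_get_global_size)  # 1024
print(ockl_get_num_groups)   # 4
```

Only the second value matches what `gpu.grid_dim` is meant to return.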

Reviewed By: krzysz00

Differential Revision: https://reviews.llvm.org/D156009
2023-07-22 11:33:11 +00:00
Matthias Springer
0318123f67 [mlir][Transforms] GreedyPatternRewriteDriver: Fix changed parameter
`changed` was not updated correctly when it was already set to "true" before calling `applyPatternsAndFoldGreedily`.

Differential Revision: https://reviews.llvm.org/D155934
2023-07-22 08:31:24 +02:00
Matthias Springer
c2d5d348a8 [mlir][transform] Add transform.apply_dce op
Add a transform that eliminates dead operations. This is useful after certain transforms (such as fusion) that create/clone new IR but leave the original IR in place.
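
The general idea of dead-operation elimination can be sketched in Python as below. This is only an illustration under simplifying assumptions, not the implementation behind `transform.apply_dce`: operations are modelled as hypothetical dicts with a single result, and an op is treated as dead when it has no side effects and its result is unused.

```python
def eliminate_dead_ops(ops):
    """Repeatedly drop side-effect-free ops whose result has no users."""
    ops = list(ops)
    changed = True
    while changed:
        changed = False
        # Every value that is still consumed by some remaining op.
        used = {operand for op in ops for operand in op["operands"]}
        kept = [op for op in ops if op["has_side_effects"] or op["result"] in used]
        if len(kept) != len(ops):
            ops = kept
            changed = True
    return ops


# Hypothetical IR after a fusion-like transform cloned %b into %c
# but left the original %b (and its now-unused user %d) in place.
example_ops = [
    {"result": "%a", "operands": [], "has_side_effects": False},
    {"result": "%b", "operands": ["%a"], "has_side_effects": False},
    {"result": "%d", "operands": ["%b"], "has_side_effects": False},
    {"result": "%c", "operands": ["%a"], "has_side_effects": False},
    {"result": None, "operands": ["%c"], "has_side_effects": True},
]
remaining = [op["result"] for op in eliminate_dead_ops(example_ops)]
print(remaining)  # ['%a', '%c', None]
```

The loop runs to a fixpoint because removing `%d` is what makes `%b` dead in the next round.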

Differential Revision: https://reviews.llvm.org/D155954
2023-07-22 08:25:02 +02:00
Mehdi Amini
5f1a388a11 Fix crash in ODS backend for Type/Attr when an incorrect construct is used for Type/Attr
Instead of crashing, try to print a useful error message.
2023-07-21 22:06:02 -07:00
TatWai Chong
a5f0b237be [mlir][tosa][fix] Add proper type checking trait for tosa mul
When operating on integer-type tensors, tosa elementwise multiplication
requires the element type of result to be a 32-bit integer rather
than the same type as inputs.
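
The rule can be sketched as a small Python check. The function name and the string encoding of element types are hypothetical, and the behavior for non-integer inputs (result type equals input type) is an assumption rather than something stated in this commit:

```python
def mul_result_type_is_valid(input_type: str, result_type: str) -> bool:
    """Check the result element type of an elementwise multiply."""
    is_integer = input_type.startswith("i") and input_type[1:].isdigit()
    if is_integer:
        # Integer inputs must produce a 32-bit integer result.
        return result_type == "i32"
    # Assumption: non-integer inputs keep the same element type.
    return result_type == input_type


print(mul_result_type_is_valid("i8", "i32"))   # True
print(mul_result_type_is_valid("i8", "i8"))    # False
print(mul_result_type_is_valid("f32", "f32"))  # True
```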

Change-Id: Ifd3d7ebd879be5c6b2c8e23aa6d7ef41f39c6d41

Reviewed By: mgehre-amd

Differential Revision: https://reviews.llvm.org/D154988
2023-07-21 23:29:05 +00:00
Jakub Kuderski
eaa4bc6557 [mlir][arith] Add canon pattern for chained arith.muli
@benvanik reported this as missing.
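
The commit message does not spell out the rewrite. A plausible reading is that `muli(muli(x, c0), c1)` is folded into `muli(x, c0 * c1)`. The sketch below illustrates that reading only; the tuple encoding of expressions and the `bit_width` parameter are hypothetical:

```python
def canonicalize_chained_mul(expr, bit_width=32):
    """Fold ("mul", ("mul", x, c0), c1) into ("mul", x, c0 * c1)."""
    if not (isinstance(expr, tuple) and expr[0] == "mul"):
        return expr
    _, lhs, rhs = expr
    if (
        isinstance(rhs, int)
        and isinstance(lhs, tuple)
        and lhs[0] == "mul"
        and isinstance(lhs[2], int)
    ):
        # Multiply the two constants with wraparound at the given bit width.
        folded = (lhs[2] * rhs) % (1 << bit_width)
        return ("mul", lhs[1], folded)
    return expr


print(canonicalize_chained_mul(("mul", ("mul", "x", 3), 5)))  # ('mul', 'x', 15)
```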

Reviewed By: Mogball

Differential Revision: https://reviews.llvm.org/D155907
2023-07-21 18:20:31 -04:00
Rafael Ubal Tena
b2d76a063d TOSA-to-Linalg lowering for element-wise ops
- Wrote complete documentation for the `Broadcastable` op trait. This is mostly meant as a thorough description of its previous behavior, with the exception of minor feature updates.

- Restricted legality criteria for a `Broadcastable` op in order to simplify current and future lowering passes and increase efficiency of code generated by those passes. The new restrictions are: 1) A dynamic dimension in an inferred result is not compatible with a static dimension in the actual result. 2) Broadcast semantics are restricted to input operands and not supported between inferred and actual result shapes.

- Implemented TOSA-to-Linalg lowering support for unary, binary, and ternary element-wise ops. This support is complete for all legal cases described in the `Broadcastable` trait documentation.

- Added unit tests for `tosa.abs`, `tosa.add`, and `tosa.select` as examples of unary, binary, and ternary ops.
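
The two new restrictions can be illustrated with a small Python sketch. It is not the trait's verifier: shapes are modelled as lists where `None` stands for a dynamic dimension, operands are assumed to have equal rank, and a static inferred dimension paired with a dynamic actual dimension is assumed to be legal. All function names are hypothetical.

```python
from typing import List, Optional

Shape = List[Optional[int]]  # None stands for a dynamic dimension


def infer_broadcast_shape(a: Shape, b: Shape) -> Optional[Shape]:
    """Broadcast two equal-rank operand shapes; return None if incompatible."""
    if len(a) != len(b):
        return None
    result: Shape = []
    for da, db in zip(a, b):
        if da == 1:
            result.append(db)
        elif db == 1:
            result.append(da)
        elif da is None or db is None:
            # A dynamic dimension takes the other side's static size, if any.
            result.append(da if db is None else db)
        elif da == db:
            result.append(da)
        else:
            return None
    return result


def result_is_legal(inferred: Shape, actual: Shape) -> bool:
    """Check the inferred result shape against the actual result shape."""
    if len(inferred) != len(actual):
        return False
    for di, dr in zip(inferred, actual):
        if di is None and dr is not None:
            return False  # restriction 1: dynamic inferred vs. static actual
        if di is not None and dr is not None and di != dr:
            return False  # restriction 2: no broadcasting toward the result
    return True


inferred = infer_broadcast_shape([1, 4], [3, 4])
print(inferred)                                # [3, 4]
print(result_is_legal(inferred, [3, 4]))       # True
print(result_is_legal([None, 4], [3, 4]))      # False
print(result_is_legal([1, 4], [3, 4]))         # False
```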

Reviewed By: eric-k256

Differential Revision: https://reviews.llvm.org/D153291
2023-07-21 22:08:33 +00:00
Jacques Pienaar
863e8123df [mlir] Move attr -> properties to not require Operation
This allows converting before an Operation is created, or without creating one at all.

Differential Revision: https://reviews.llvm.org/D155996
2023-07-21 14:54:43 -07:00
Srishti Srivastava
de826ea35d [MLIR][ANALYSIS] Add liveness analysis utility
This commit adds a utility to implement liveness analysis using the
sparse backward data-flow analysis framework. Theoretically, liveness
analysis assigns liveness to each (value, program point) pair in the
program and it is thus a dense analysis. However, since values are
immutable in MLIR, a sparse analysis, which will assign liveness to
each value in the program, suffices here.

Liveness analysis has many applications. It can be used to avoid the
computation of extraneous operations that have no effect on the memory
or the final output of a program. It can also be used to optimize
register allocation. Both of these applications help achieve one very
important goal: reducing runtime.

A value is considered "live" iff it:
  (1) has memory effects OR
  (2) is returned by a public function OR
  (3) is used to compute a value of type (1) or (2).
It is also to be noted that a value could be of multiple types (1/2/3) at
the same time.

A value "has memory effects" iff it:
  (1.a) is an operand of an op with memory effects OR
  (1.b) is a non-forwarded branch operand and a block where its op could
  take the control has an op with memory effects.

A value `A` is said to be "used to compute" value `B` iff `B` cannot be
computed in the absence of `A`. Thus, in this implementation, we say that
value `A` is used to compute value `B` iff:
  (3.a) `B` is a result of an op with operand `A` OR
  (3.b) `A` is used to compute some value `C` and `C` is used to compute
  `B`.

---

It is important to note that there already exists an MLIR liveness
utility here: llvm-project/mlir/include/mlir/Analysis/Liveness.h. So,
what is the need for this new liveness analysis utility being added by
this commit? That need is explained as follows:-

The similarity between these two utilities is that both use the
fixpoint iteration method to converge to the final result of liveness.
And, both have the same theoretical understanding of liveness as well.

However, the main difference between (a) the existing utility and (b)
the added utility is the "scope of the analysis". (a) is restricted to
analysing each block independently while (b) analyses blocks together,
i.e., it looks at how the control flows from one block to the other,
how a caller calls a callee, etc. The restriction in the former implies
that some potentially non-live values could be marked live and thus the
full potential of liveness analysis will not be realised.

This can be understood using the example below:

```
1 func.func private @private_dead_return_value_removal_0() -> (i32, i32) {
2   %0 = arith.constant 0 : i32
3   %1 = arith.addi %0, %0 : i32
4   return %0, %1 : i32, i32
5 }
6 func.func @public_dead_return_value_removal_0() -> (i32) {
7   %0:2 = func.call @private_dead_return_value_removal_0() : () -> (i32, i32)
8   return %0#0 : i32
9 }
```

Here, if we just restrict our analysis to a per-block basis like (a), we
will say that the %1 on line 3 is live because it is computed and then
returned outside its block by the function. But, if we perform a
backward data-flow analysis like (b) does, we will say that %0#1 of line
7 is not live because it isn't returned by the public function and thus,
%1 of line 3 is also not live. So, while (a) will be unable to suggest
any IR optimizations, (b) can enable this IR to convert to:-

```
1 func.func private @private_dead_return_value_removal_0() -> i32 {
2   %0 = arith.constant 0 : i32
3   return %0 : i32
4 }
5 func.func @public_dead_return_value_removal_0() -> i32 {
6   %0 = call @private_dead_return_value_removal_0() : () -> i32
7   return %0 : i32
8 }
```

One operation was removed and one unnecessary return value of the
function was removed and the function signature was modified. This is an
optimization that (b) can enable but (a) cannot. Such optimizations can
help remove a lot of extraneous computations that are currently being
done.
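
The backward propagation behind this example can be sketched in a few lines of Python. This is a simplified illustration, not the utility added by this commit: memory effects are ignored, values are plain strings, and the `depends_on` map (including the pairing of call results with callee return operands) is written out by hand.

```python
from collections import deque


def compute_live_values(depends_on, roots):
    """A value is live if it is a root or is used to compute a live value."""
    live = set(roots)
    worklist = deque(roots)
    while worklist:
        value = worklist.popleft()
        for operand in depends_on.get(value, ()):
            if operand not in live:
                live.add(operand)
                worklist.append(operand)
    return live


# Each value maps to the values needed to compute it.
depends_on = {
    "private.%1": ["private.%0"],   # %1 = arith.addi %0, %0
    "public.%0#0": ["private.%0"],  # first call result <- first returned value
    "public.%0#1": ["private.%1"],  # second call result <- second returned value
}
# Only values returned by the public function start out live.
roots = ["public.%0#0"]

all_values = {"private.%0", "private.%1", "public.%0#0", "public.%0#1"}
live = compute_live_values(depends_on, roots)
dead = all_values - live
print(sorted(live))  # ['private.%0', 'public.%0#0']
print(sorted(dead))  # ['private.%1', 'public.%0#1']
```

The two dead values correspond to the `arith.addi` and the second return value that are removed in the optimized IR.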

Signed-off-by: Srishti Srivastava <srishtisrivastava.ai@gmail.com>

Reviewed By: matthiaskramm, jcai19

Differential Revision: https://reviews.llvm.org/D153779
2023-07-21 13:29:14 -07:00
Uday Bondhugula
b36de52c98 NFC. Move remaining affine/memref test cases into respective dialect dirs
Move a bunch of lingering test cases from test/Transforms/ into
test/Dialect/Affine and MemRef.

Differential Revision: https://reviews.llvm.org/D155855
2023-07-21 22:36:01 +05:30
Lorenzo Chelini
4fb25ca51c [MLIR][Linalg] Preserve DPS when decomposing Softmax
Preserve destination passing style (DPS) when decomposing
`linalg.Softmax`; instead of creating a new empty, which may materialize
as a new buffer after bufferization, use the result directly.

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D155942
2023-07-21 18:03:26 +02:00
Guray Ozen
4622113820 [mlir][nvgpu] Set useDefaultAttributePrinterParser
Differential Revision: https://reviews.llvm.org/D155959
2023-07-21 17:00:39 +02:00
Matthias Springer
440808faf6 [mlir][linalg] MapCopyToThreadsOp: Support tensor.pad
Also return the generated loop op.

Differential Revision: https://reviews.llvm.org/D155950
2023-07-21 15:51:46 +02:00
Matthias Springer
a5bba98a58 [mlir][linalg] BufferizeToAllocationOp: Add option to materialize buffers for operands
Add an option that does not bufferize the targeted op itself, but just materializes a buffer for the destination operands. This is useful for partial bufferization of complex ops such as `scf.forall`, which need special handling (and an analysis of the region).

Differential Revision: https://reviews.llvm.org/D155946
2023-07-21 15:29:59 +02:00
Matthias Springer
20245ed4de [mlir][transform] Add apply_cse option to transform.apply_patterns op
Applying the canonicalizer and CSE in an interleaved fashion is useful after bufferization (and maybe other transforms) to fold away self copies.

Differential Revision: https://reviews.llvm.org/D155933
2023-07-21 15:13:56 +02:00
Jie Fu
3fd1790638 [mlir][nvgpu] Ignore -Wunused-function in NVGPUDialect.cpp (NFC)
In file included from /Users/jiefu/llvm-project/mlir/lib/Dialect/NVGPU/IR/NVGPUDialect.cpp:363:
/Users/jiefu/llvm-project/build-Release/tools/mlir/include/mlir/Dialect/NVGPU/IR/NVGPUAttrDefs.cpp.inc:22:36: error: unused function 'generatedAttributeParser' [-Werror,-Wunused-function]
static ::mlir::OptionalParseResult generatedAttributeParser(::mlir::AsmParser &parser, ::llvm::StringRef *mnemonic, ::mlir::Type type, ::mlir::Attribute &value) {
                                   ^
/Users/jiefu/llvm-project/build-Release/tools/mlir/include/mlir/Dialect/NVGPU/IR/NVGPUAttrDefs.cpp.inc:46:30: error: unused function 'generatedAttributePrinter' [-Werror,-Wunused-function]
static ::mlir::LogicalResult generatedAttributePrinter(::mlir::Attribute def, ::mlir::AsmPrinter &printer) {
                             ^
2 errors generated.
2023-07-21 20:50:48 +08:00
Matthias Springer
544f0e9161 [mlir] Fix build after D155680 2023-07-21 13:33:54 +02:00
Matthias Springer
ba745eea40 [mlir][bufferization] Remove cleanup pipeline from bufferization pass
To keep the pass simple, users should apply cleanup passes manually when necessary. In particular, `-cse -canonicalize` are often desirable to fold away self-copies that are created by the bufferization.

This addresses a comment in D120191.

Differential Revision: https://reviews.llvm.org/D155923
2023-07-21 12:11:25 +02:00
Ingo Müller
8fd207fd0d [mlir][transform][structured][python] Allow str arg in match_op_names.
Allow the `names` argument in `MatchOp.match_op_names` to be of type
`str` in addition to `Sequence[str]`. In this case, the argument is
treated as a list with one name, i.e., it is possible to write
`MatchOp.match_op_names(..., "test.dummy")` instead of
`MatchOp.match_op_names(..., ["test.dummy"])`.
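
A minimal sketch of how such an argument can be normalized is shown below. The helper name `normalize_names` is hypothetical and not part of the MLIR Python bindings:

```python
from typing import List, Sequence, Union


def normalize_names(names: Union[str, Sequence[str]]) -> List[str]:
    """Treat a single string as a list with one name."""
    # The str check must come first because a str is itself a sequence.
    if isinstance(names, str):
        return [names]
    return list(names)


print(normalize_names("test.dummy"))                 # ['test.dummy']
print(normalize_names(["test.dummy", "test.other"]))  # ['test.dummy', 'test.other']
```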

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D155807
2023-07-21 09:36:55 +00:00
Ingo Müller
522831384f [mlir][linalg][transform] Extend diagnostics of FuseIntoContainingOp.
This patch extends the diagnostic output of `FuseIntoContainingOp` when
it fails to find the next producer by also providing the location of the
affected transform op.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D155803
2023-07-21 09:34:04 +00:00
Guray Ozen
e56d6745f7 [mlir][nvgpu] Add tma.create.descriptor to create tensor map descriptor
The Op creates a tensor map descriptor object representing a tiled memory region. The descriptor is used by Tensor Memory Access (TMA). The `tensor` is the source tensor to be tiled. The `boxDimensions` is the size of the tiled memory region in each dimension.

The pattern here lowers `tma.create.descriptor` to a runtime function call that eventually calls the CUDA Driver's `cuTensorMapEncodeTiled`. For more information see below:
https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TENSOR__MEMORY.html

Depends on D155453

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D155680
2023-07-21 11:33:04 +02:00
Andrzej Warzynski
5c9db62587 [mlir][test] Add missing LIT config for mlir-cpu-config + emulator
Similarly to when using `lli`, make sure that when using
`mlir-cpu-runner` with an emulator, a full path to `mlir-cpu-runner` is
used. Otherwise `mlir-cpu-runner` won't be found and you will get the
following error:
```
Error while loading mlir-cpu-runner: No such file or directory
```

This patch should fix:
  * https://lab.llvm.org/buildbot/#/builders/179
The breakage was originally introduced in
https://reviews.llvm.org/D155405.

Differential Revision: https://reviews.llvm.org/D155920
2023-07-21 09:20:03 +00:00
Alex Zinenko
8dbddb1718 [mlir] allow region branch spec from parent op to itself
RegionBranchOpInterface did not allow the operation with regions to
specify itself as a successor. Therefore, this implied that the control
is always transferred to a region before being transferred back to the
parent op. Since the region can only transfer the control back to the
parent op from a terminator, this transitively implied that the first
block of any region with a RegionBranchOpInterface is always executed
until the terminator can transfer the control flow back. This is
trivially false for any conditional-like operation that may or may not
execute the region, as well as for loop-like operations that may not
execute the body.

Remove the restriction from the interface description and update the
only transform that relied on it.

See
https://discourse.llvm.org/t/rfc-region-control-flow-interfaces-should-encode-region-not-executed-correctly/72103.

Depends On: https://reviews.llvm.org/D155757

Reviewed By: Mogball, springerm

Differential Revision: https://reviews.llvm.org/D155822
2023-07-21 09:16:56 +00:00
Alex Zinenko
5d8813dec6 [mlir] allow dense dataflow to customize call and region operations
Initial implementations of dense dataflow analyses feature special cases
for operations that have region- or call-based control flow by
leveraging the corresponding interfaces. This is not necessarily
sufficient as these operations may influence the dataflow state by
themselves as well as through the control flow. For example,
`linalg.generic` and similar operations have region-based control flow
and their proper memory effects, so any memory-related analyses such as
last-writer require processing `linalg.generic` directly instead of, or
in addition to, the region-based flow.

Provide hooks to customize the processing of operations with region-
and call-based control flow in forward and backward dense dataflow
analysis. These hooks are triggered when control flow is transferred
between the "main" operation, i.e. the call or the region owner, and
another region. Such an approach allows the analyses to update the
lattice before and/or after the regions. In the `linalg.generic`
example, the reads from memory are interpreted as happening before the
body region and the writes to memory are interpreted as happening after
the body region. Using these hooks in generic analysis may require
introducing additional interfaces, but for now assume that the specific
analyses have special cases for the (rare) operations with call- and
region-based control flow that need additional processing.

Reviewed By: Mogball, phisiart

Differential Revision: https://reviews.llvm.org/D155757
2023-07-21 09:16:03 +00:00
Guray Ozen
9dad32cb90 [mlir][nvgpu] Improve finding module Op to for mbarrier.create
The current transformation expects the module op to be two levels higher; however, this is not always the case. This work searches for the module op in a while loop.
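
The search can be sketched in Python as a walk up the parent chain. The `Op` class, the op names, and the `module_name` parameter are hypothetical stand-ins for the real MLIR data structures:

```python
class Op:
    """Minimal stand-in for an operation with a parent pointer."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def find_enclosing_module(op, module_name="module"):
    """Walk up the parents until a module op is found, at any depth."""
    current = op.parent
    while current is not None and current.name != module_name:
        current = current.parent
    return current


module = Op("module")
func = Op("func", parent=module)
loop = Op("loop", parent=func)
inner_loop = Op("loop", parent=loop)
create = Op("mbarrier.create", parent=inner_loop)

# The module is four levels up here, not two.
print(find_enclosing_module(create) is module)  # True
```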

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D155825
2023-07-21 10:36:45 +02:00
Guray Ozen
70c2e0618a [mlir][nvgpu] Add nvgpu.tma.async.load and nvgpu.tma.descriptor
This work adds the `nvgpu.tma.async.load` Op, which requests a TMA load asynchronously using an mbarrier object.

It also creates the `nvgpu.tma.descriptor` type. The type is supposed to be created by the CUDA driver API's `cuTensorMapEncodeTiled`.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D155453
2023-07-21 10:23:25 +02:00
Alex Zinenko
2469cdd156 [mlir] remove RegionBranchOpInterface from linalg ops
Linalg structured ops do not implement control flow in the way expected
by RegionBranchOpInterface, and the interface implementation isn't
actually used anywhere. The presence of this interface without correct
implementation is confusing for, e.g., dataflow analyses.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D155841
2023-07-21 08:18:41 +00:00
Markus Böck
f117bbca04 [mlir] Add opt-in default property bytecode read and write implementation
Using properties currently requires at the very least implementing four methods/code snippets:
* `convertToAttribute`
* `convertFromAttribute`
* `writeToMlirBytecode`
* `readFromMlirBytecode`

This makes replacing attributes with properties harder than it has to be: Attributes by default do not require immediately defining custom bytecode encoding.

This patch therefore adds opt-in implementations of `writeToMlirBytecode` and `readFromMlirBytecode` that work with the default implementations of `convertToAttribute` and `convertFromAttribute`. They are provided by `defvar`s in `OpBase.td` and can be used by adding:
```
let writeToMlirBytecode = writeMlirBytecodeWithConvertToAttribute;
let readFromMlirBytecode = readMlirBytecodeUsingConvertFromAttribute;
```
to one's TableGen definition.

While this bytecode encoding is almost certainly not ideal for a given property, it allows more incremental use of properties and getting something sane working before optimizing the bytecode format.
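
The idea of deriving a default encoding from the attribute conversion can be sketched in Python. This is only an analogy: `ExampleProperties` is a hypothetical property struct, a dict stands in for an MLIR attribute, and JSON stands in for the attribute's bytecode encoding.

```python
import json
from dataclasses import dataclass


@dataclass
class ExampleProperties:
    """Hypothetical property storage for an op."""
    alignment: int
    name: str


def convert_to_attribute(props):
    # Stand-in for `convertToAttribute`: properties -> attribute.
    return {"alignment": props.alignment, "name": props.name}


def convert_from_attribute(attr):
    # Stand-in for `convertFromAttribute`: attribute -> properties.
    return ExampleProperties(alignment=attr["alignment"], name=attr["name"])


def write_bytecode(props) -> bytes:
    # Default writer: convert to an attribute, then serialize the attribute.
    return json.dumps(convert_to_attribute(props)).encode("utf-8")


def read_bytecode(data: bytes):
    # Default reader: deserialize the attribute, then convert it back.
    return convert_from_attribute(json.loads(data.decode("utf-8")))


original = ExampleProperties(alignment=8, name="buffer")
round_tripped = read_bytecode(write_bytecode(original))
print(round_tripped == original)  # True
```

No property-specific encoding had to be written, which is the convenience the opt-in defaults aim for.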

Differential Revision: https://reviews.llvm.org/D155286
2023-07-21 08:03:26 +02:00
wren romano
889f4bf264 [mlir][sparse] Improve DimLvlMapParser's handling of variable bindings
This commit comprises a number of related changes:

(1) Reintroduces the semantic distinction between `parseVarUsage` vs `parseVarBinding`, adds documentation explaining the distinction, and adds commentary to the one place that violates the desired/intended semantics.

(2) Improves documentation/commentary about the forward-declaration of level-vars, and about the meaning of the `bool` parameter to `parseLvlSpec`.

(3) Removes the `VarEnv::addVars` method, and instead has `DimLvlMapParser` handle the conversion issues directly.  In particular, the parser now stores and maintains the `{dims,lvls}AndSymbols` arrays, thereby avoiding the O(n^2) behavior of scanning through the entire `VarEnv` for each `parse{Dim,Lvl}Spec` call.  Unfortunately there still remains another source of O(n^2) behavior, namely: the `AsmParser::parseAffineExpr` method will copy the `DimLvlMapParser::{dims,lvls}AndSymbols` arrays into `AffineParser::dimsAndSymbols` on each `parse{Dim,Lvl}Spec` call; but fixing that would require extensive changes to `AffineParser` itself.

Depends On D155532

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D155533
2023-07-20 15:56:03 -07:00
varconst
81b4e7d2b0 [mlir][spirv] Extract more ops from the main implementation file. NFC.
Continue the work outlined in D155747 and split the main SPIR-V ops
implementation file into a few smaller and quicker to compile files.

Move control flow and memory ops to their own implementation files.
Create new `.cpp` files for tablegened code.

After this change, the `SPIRVOps.cpp` is 2k LoC-long and takes a
reasonable amount of time to compile.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D155883
2023-07-20 17:11:32 -04:00
Amanda Tang
057fc8e7d8 [ODS] Use Adaptor Trait for Shaped Type Inference
Author inferReturnTypeComponents methods with the Op Adaptor by using the InferShapedTypeOpAdaptor.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D155243
2023-07-20 19:41:08 +00:00
Andrzej Warzynski
64466b777b [mlir][test] Add emulator to the mlir-cpu-runner invocation
In https://reviews.llvm.org/D146917, MLIR's LIT configuration was
updated to allow us to use `mlir-cpu-runner` to run Arm SVE integration
tests. That update broke the following buildbot that doesn't support
SVE:

  https://lab.llvm.org/buildbot/#/builders/179/builds/6704

While that bot doesn't support SVE, it can run SVE tests under
emulation. This patch makes sure that whenever an Arm emulator is set
(via `ARM_EMULATOR_EXECUTABLE` CMake variable), it is used to run both
`lli` _and_ `mlir-cpu-runner`.

I am sending this without a review as it's a rather trivial change and I
want to quickly fix the spurious bot failure.
2023-07-20 19:08:23 +00:00
Giuseppe Rossini
4b3eaee270 [mlir][AMDGPU] Define wrappers for WMMA matrix ops
Wave Matrix Multiply Accumulate (WMMA) is the instruction to accelerate
matrix multiplication on RDNA3 architectures.  LLVM already provides a
set of intrinsics to generate wmma instructions. This change uses those
intrinsics to enable the feature in MLIR.

Reviewed By: krzysz00

Differential Revision: https://reviews.llvm.org/D152451
2023-07-20 18:38:35 +00:00
Jakub Kuderski
ab6827f2d4 [mlir][spirv] Extract Atomic/Cast/Group op implementation. NFC.
Continue the work outlined in D155747 and split the main SPIR-V ops
implementation file into a few smaller and quicker to compile files.
This organization matches the op definition organization in `.td` files.

In this patch, extract atomic, cast/conversion, and group op
implementation into separate files.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D155777
2023-07-20 11:15:30 -04:00
Ingo Müller
4f30746ca0 [mlir][transform][python] Add extended ApplyPatternsOp.
This patch adds a mixin for ApplyPatternsOp to _transform_ops_ext.py
with syntactic sugar for constructing such ops. Curiously, the op did
not have any constructors yet, probably because its tablegen definition
said to skip the default builders. The new constructor is thus quite
straightforward. The commit also adds a refined `region` property which
returns the first block of the single region.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D155435
2023-07-20 14:20:50 +00:00
Ingo Müller
f62cb13fb2 [mlir][linalg][transform] Rename ApplyPatternsOp.{region => patterns}.
This gives the region a more meaningful name. The topic came up in a
discussion on https://reviews.llvm.org/D155435, where the name `region`
would have led to a situation where a convenience accessor called
`region` (after the ODS name) would have returned a Block.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D155810
2023-07-20 14:20:45 +00:00
Ingo Müller
5f4f9220f9 [mlir][transform][gpu][python] Add MapForallToBlocks mix-in.
This patch adds a mix-in class for MapForallToBlocks with overloaded
constructors. This makes it optional to provide the return type of the
op, which defaults to `AnyOpType`.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D155717
2023-07-20 14:20:40 +00:00
Sergio Afonso
40340cf91a [MLIR][OpenMP][OMPIRBuilder] Use target triple to initialize IsGPU flag
This patch modifies the construction of the `OpenMPIRBuilder` in MLIR to
initialize the `IsGPU` flag using target triple information passed down from
the Flang frontend. If not present, it will default to `false`.

This replicates the behavior currently implemented in Clang, where the
`CodeGenModule::createOpenMPRuntime()` method creates a different
`CGOpenMPRuntime` instance depending on the target triple, which in turn has an
effect on the `IsGPU` flag of the `OpenMPIRBuilderConfig` object.
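
The decision can be sketched in Python as follows. The helper name and the set of architectures treated as GPU targets are assumptions made for illustration; the patch itself relies on LLVM's target triple handling:

```python
from typing import Optional

# Assumption: architectures treated as GPU targets in this sketch.
GPU_ARCHES = {"amdgcn", "nvptx", "nvptx64"}


def is_gpu_target(target_triple: Optional[str]) -> bool:
    """Derive the IsGPU flag from a target triple; default to False."""
    if not target_triple:
        return False
    # The architecture is the first dash-separated component of the triple.
    arch = target_triple.split("-", 1)[0]
    return arch in GPU_ARCHES


print(is_gpu_target("amdgcn-amd-amdhsa"))         # True
print(is_gpu_target("x86_64-unknown-linux-gnu"))  # False
print(is_gpu_target(None))                        # False
```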

Differential Revision: https://reviews.llvm.org/D151903
2023-07-20 15:07:50 +01:00
Markus Böck
f9173c2958 [mlir][LLVM] Convert noalias parameters into alias scopes during inlining
Currently, inlining a function with a `noalias` parameter leads to a large loss of optimization potential as the `noalias` parameter, an important hint for alias analysis, is lost completely.

This patch fixes this with the same approach as LLVM by annotating all users of the `noalias` parameter with appropriate alias and noalias scope lists.
The implementation done here is not as sophisticated as LLVM's, which has more infrastructure related to escaping and captured pointers, but should work in the majority of important cases.
Any deficiency can be addressed in future patches.

Related LLVM code: 27ade4b554/llvm/lib/Transforms/Utils/InlineFunction.cpp (L1090)

Differential Revision: https://reviews.llvm.org/D155712
2023-07-20 15:05:28 +02:00
Guray Ozen
836dbb8522 [mlir][nvgpu] Add mbarrier.arrive.expect_tx and mbarrier.try_wait.parity
This work adds two Ops:
`mbarrier.arrive.expect_tx` performs expect_tx on `mbarrier.barrier` and returns `mbarrier.barrier.token`
`mbarrier.try_wait.parity` waits on `mbarrier.barrier` and `mbarrier.barrier.token`

`mbarrier.arrive.expect_tx` is one of the requirements to enable H100 TMA support.

Depends on D154074 D154076 D154059 D154060

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D154094
2023-07-20 13:48:30 +02:00
Tobias Gysi
10fa27704b [mlir][llvm] Add branch weight op interface
This revision adds a branch weight op interface for the call / branch
operations that support branch weights. It can be used in the LLVM IR
import and export to simplify the branch weight conversion. An
additional mapping between call operations and instructions ensures
the actual conversion can be done in the module translation itself,
rather than in the dialect translation interface. It also has the
benefit that downstream users can amend custom metadata to the call
operation during the export to LLVM IR.

Reviewed By: zero9178, definelicht

Differential Revision: https://reviews.llvm.org/D155702
2023-07-20 10:46:04 +00:00
Ivan Butygin
9dec3fd812 [mlir] Add ub dialect and poison op.
Add new dialect boilerplate and `poison` op definition.

Discussion: https://discourse.llvm.org/t/rfc-poison-semantics-for-mlir/66245/24

Differential Revision: https://reviews.llvm.org/D154248
2023-07-20 11:19:43 +02:00
Markus Böck
b82acf8a14 [mlir][LLVM] Handle access groups during inlining
Handling access groups is luckily rather trivial: Any access groups from the call instruction are simply appended to any memory operations.
This is similar to one of the steps when handling alias scopes.
This patch nevertheless implements it as a separate function purely for readability purposes as it uses a different interface than alias scopes.
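
The appending step can be sketched in Python. The dict-based op representation, the group names, and the function name are hypothetical; the sketch only mirrors the description above:

```python
def append_call_access_groups(call_access_groups, inlined_ops):
    """Append the call's access groups to every inlined memory operation."""
    for op in inlined_ops:
        if op["is_memory_op"]:
            op["access_groups"] = op["access_groups"] + list(call_access_groups)
    return inlined_ops


inlined_ops = [
    {"name": "llvm.load", "is_memory_op": True, "access_groups": ["@group_a"]},
    {"name": "llvm.add", "is_memory_op": False, "access_groups": []},
    {"name": "llvm.store", "is_memory_op": True, "access_groups": []},
]
append_call_access_groups(["@call_group"], inlined_ops)

for op in inlined_ops:
    print(op["name"], op["access_groups"])
```

Non-memory operations are left untouched, and existing groups on a memory operation are kept.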

Differential Revision: https://reviews.llvm.org/D155795
2023-07-20 10:45:15 +02:00
Matthias Springer
2137915137 [mlir] Remove some code duplication between Builders.cpp and FoldUtils.cpp
Also update the documentation of `Operation::fold`, which did not take into account in-place foldings.

Differential Revision: https://reviews.llvm.org/D155691
2023-07-20 10:27:14 +02:00