Commit Graph

8727 Commits

Author SHA1 Message Date
Matthias Springer
a5bba98a58 [mlir][linalg] BufferizeToAllocationOp: Add option to materialize buffers for operands
Add an option that does not bufferize the targeted op itself, but just materializes a buffer for the destination operands. This is useful for partial bufferization of complex ops such as `scf.forall`, which need special handling (and an analysis if the region).

Differential Revision: https://reviews.llvm.org/D155946
2023-07-21 15:29:59 +02:00
Matthias Springer
20245ed4de [mlir][transform] Add apply_cse option to transform.apply_patterns op
Applying the canonicalizer and CSE in an interleaved fashion is useful after bufferization (and maybe other transforms) to fold away self copies.

Differential Revision: https://reviews.llvm.org/D155933
2023-07-21 15:13:56 +02:00
Guray Ozen
e56d6745f7 [mlir][nvgpu] Add tma.create.descriptor to create tensor map descriptor
The Op creates a tensor map descriptor object representing tiled memory region. The descriptor is used by Tensor Memory Access (TMA). The `tensor` is the source tensor to be tiled. The `boxDimensions` is the size of the tiled memory region in each dimension.

The pattern here lowers `tma.create.descriptor` to a runtime function call that eventually calls calls CUDA Driver's `cuTensorMapEncodeTiled`. For more information see below:
https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TENSOR__MEMORY.html

Depends on D155453

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D155680
2023-07-21 11:33:04 +02:00
Alex Zinenko
8dbddb1718 [mlir] allow region branch spec from parent op to itself
RegionBranchOpInterface did not allow the operation with regions to
specify itself as successors. Therefore, this implied that the control
is always transferred to a region before being transferred back to the
parent op. Since the region can only transfer the control back to the
parent op from a terminator, this transitively implied that the first
block of any region with a RegionBranchOpInterface is always executed
until the terminator can transfer the control flow back. This is
trivially false for any conditional-like operation that may or may not
execute the region, as well as for loop-like operations that may not
execute the body.

Remove the restriction from the interface description and update the
only transform that relied on it.

See
https://discourse.llvm.org/t/rfc-region-control-flow-interfaces-should-encode-region-not-executed-correctly/72103.

Depends On: https://reviews.llvm.org/D155757

Reviewed By: Mogball, springerm

Differential Revision: https://reviews.llvm.org/D155822
2023-07-21 09:16:56 +00:00
Alex Zinenko
5d8813dec6 [mlir] allow dense dataflow to customize call and region operations
Initial implementations of dense dataflow analyses feature special cases
for operations that have region- or call-based control flow by
leveraging the corresponding interfaces. This is not necessarily
sufficient as these operations may influence the dataflow state by
themselves as well we through the control flow. For example,
`linalg.generic` and similar operations have region-based control flow
and their proper memory effects, so any memory-related analyses such as
last-writer require processing `linalg.generic` directly instead of, or
in addition to, the region-based flow.

Provide hooks to customize the processing of operations with region-
cand call-based contol flow in forward and backward dense dataflow
analysis. These hooks are trigerred when control flow is transferred
between the "main" operation, i.e. the call or the region owner, and
another region. Such an apporach allows the analyses to update the
lattice before and/or after the regions. In the `linalg.generic`
example, the reads from memory are interpreted as happening before the
body region and the writes to memory are interpreted as happening after
the body region. Using these hooks in generic analysis may require
introducing additional interfaces, but for now assume that the specific
analysis have spceial cases for the (rare) operaitons with call- and
region-based control flow that need additional processing.

Reviewed By: Mogball, phisiart

Differential Revision: https://reviews.llvm.org/D155757
2023-07-21 09:16:03 +00:00
Guray Ozen
70c2e0618a [mlir][nvgpu] Add nvgpu.tma.async.load and nvgpu.tma.descriptor
This work adds `nvgpu.tma.async.load` Op that requests tma load asyncronusly using mbarrier object.

It also creates nvgpu.tma.descriptor type. The type is supposed be created by `cuTensorMapEncodeTiled` cuda drivers api.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D155453
2023-07-21 10:23:25 +02:00
Alex Zinenko
2469cdd156 [mlir] remove RegionBranchOpInterface from linalg ops
Linalg structure ops do not implement control flow in the way expected
by RegionBranchOpInterface, and the interface implementation isn't
actually used anywhere. The presence of this interface without correct
implementation is confusing for, e.g., dataflow analyses.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D155841
2023-07-21 08:18:41 +00:00
Markus Böck
f117bbca04 [mlir] Add opt-in default property bytecode read and write implementation
Using properties currently requires at the very least implementing four methods/code snippets:
* `convertToAttribute`
* `convertFromAttribute`
* `writeToMlirBytecode`
* `readFromMlirBytecode`

This makes replacing attributes with properties harder than it has to be: Attributes by default do not require immediately defining custom bytecode encoding.

This patch therefore adds opt-in implementations of `writeToMlirBytecode` and `readFromMlirBytecode` that work with the default implementations of `convertToAttribute` and `convertFromAttribute`. They are provided by `defvar`s in `OpBase.td` and can be used by adding:
```
let writeToMlirBytecode = writeMlirBytecodeWithConvertToAttribute;
let readFromMlirBytecode = readMlirBytecodeUsingConvertFromAttribute;
```
to ones TableGen definition.

While this bytecode encoding is almost certainly not ideal for a given property, it allows more incremental use of properties and getting something sane working before optimizing the bytecode format.

Differential Revision: https://reviews.llvm.org/D155286
2023-07-21 08:03:26 +02:00
Amanda Tang
057fc8e7d8 [ODS] Use Adaptor Trait for Shaped Type Inference
Author inferReturnTypeComponents methods with the Op Adaptor by using the InferShapedTypeOpAdaptor.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D155243
2023-07-20 19:41:08 +00:00
Giuseppe Rossini
4b3eaee270 [mlir][AMDGPU] Define wrappers for WMMA matrix ops
Wave Matrix Multiply Accumulate (WMMA) is the instruction to accelerate
matrix multiplication on RDNA3 architectures.  LLVM already provides a
set of intrinsics to generate wmma instructions. This change uses those
intrinsics to enable the feature in MLIR.

Reviewed By: krzysz00

Differential Revision: https://reviews.llvm.org/D152451
2023-07-20 18:38:35 +00:00
Jakub Kuderski
ab6827f2d4 [mlir][spirv] Extract Atomic/Cast/Group op implementation. NFC.
Continue to work outlined in D155747 and split the main SPIR-V ops
implementation file into a few smaller and quicker to compile files.
This organization matches the op definition organizaion in `.td` files.

In this patch, extract atomic, cast/conversion, and group op
implementation into separate files.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D155777
2023-07-20 11:15:30 -04:00
Ingo Müller
f62cb13fb2 [mlir][linalg][transform] Rename ApplyPatternsOp.{region => patterns}.
This gives the region a more meaningful name. The topic came up in a
discussion on https://reviews.llvm.org/D155435, where the name `region`
would have led to a situation where a convenience accessor called
`region` (after the ODS name) would have returned a Block.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D155810
2023-07-20 14:20:45 +00:00
Sergio Afonso
40340cf91a [MLIR][OpenMP][OMPIRBuilder] Use target triple to initialize IsGPU flag
This patch modifies the construction of the `OpenMPIRBuilder` in MLIR to
initialize the `IsGPU` flag using target triple information passed down from
the Flang frontend. If not present, it will default to `false`.

This replicates the behavior currently implemented in Clang, where the
`CodeGenModule::createOpenMPRuntime()` method creates a different
`CGOpenMPRuntime` instance depending on the target triple, which in turn has an
effect on the `IsGPU` flag of the `OpenMPIRBuilderConfig` object.

Differential Revision: https://reviews.llvm.org/D151903
2023-07-20 15:07:50 +01:00
Markus Böck
f9173c2958 [mlir][LLVM] Convert noalias parameters into alias scopes during inlining
Currently, inlining a function with a `noalias` parameter leads to a large loss of optimization potential as the `noalias` parameter, an important hint for alias analysis, is lost completely.

This patch fixes this with the same approach as LLVM by annotating all users of the `noalias` parameter with appropriate alias and noalias scope lists.
The implementation done here is not as sophisticated as LLVMs, which has more infrastructure related to escaping and captured pointers, but should work in the majority of important cases.
Any deficiency can be addressed in future patches.

Related LLVM code: 27ade4b554/llvm/lib/Transforms/Utils/InlineFunction.cpp (L1090)

Differential Revision: https://reviews.llvm.org/D155712
2023-07-20 15:05:28 +02:00
Guray Ozen
836dbb8522 [mlir][nvgpu] Add mbarrier.arrive.expect_tx and mbarrier.try_wait.parity
This work adds two Ops:
`mbarrier.arrive.expect_tx` performs expect_tx `mbarrier.barrier` returns `mbarrier.barrier.token`
`mbarrier.try_wait.parity` waits on `mbarrier.barrier` and `mbarrier.barrier.token`

`mbarrier.arrive.expect_tx` is one of the requirement to enable H100 TMA support.

Depends on D154074 D154076 D154059 D154060

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D154094
2023-07-20 13:48:30 +02:00
Tobias Gysi
10fa27704b [mlir][llvm] Add branch weight op interface
This revision adds a branch weight op interface for the call / branch
operations that support branch weights. It can be used in the LLVM IR
import and export to simplify the branch weight conversion. An
additional mapping between call operations and instructions ensures
the actual conversion can be done in the module translation itself,
rather than in the dialect translation interface. It also has the
benefit that downstream users can amend custom metadata to the call
operation during the export to LLVM IR.

Reviewed By: zero9178, definelicht

Differential Revision: https://reviews.llvm.org/D155702
2023-07-20 10:46:04 +00:00
Ivan Butygin
9dec3fd812 [mlir] Add ub dialect and poison op.
Add new dialect boilerplate and `poison` op definition.

Discussion: https://discourse.llvm.org/t/rfc-poison-semantics-for-mlir/66245/24

Differential Revision: https://reviews.llvm.org/D154248
2023-07-20 11:19:43 +02:00
Matthias Springer
2137915137 [mlir] Remove some code duplication between Builders.cpp and FoldUtils.cpp
Also update the documentation of `Operation::fold`, which did not take into account in-place foldings.

Differential Revision: https://reviews.llvm.org/D155691
2023-07-20 10:27:14 +02:00
Matthias Springer
dd115e5a9b [mlir][IR] Implement proper folder for IsCommutative trait
Commutative ops were previously folded with a special rule in `OperationFolder`. This change turns the folding into a proper `OpTrait` folder.

Differential Revision: https://reviews.llvm.org/D155687
2023-07-20 10:19:48 +02:00
Kelvin Li
99dc3935b9 [flang] Add PowerPC vec_convert, vec_ctf and vec_cvf intrinsic
Co-authored-by: Paul Scoropan <1paulscoropan@gmail.com>

Differential Revision: https://reviews.llvm.org/D155235
2023-07-19 22:27:55 -04:00
Jakub Kuderski
9415241c5b [mlir][spirv] Split op implementation file into subfiles. NFC.
The main op implementation file for SPIR-V grew past 5k LOC. This makes it
take a long time to compile and index with LSPs like clangd.

Pull out the first few SPIR-V extension ops into their own `.cpp` files,
just like we do with `.td` op definitions. This includes the
KHR/NV/Intel coop matrix and the integer dot prod extensions.

I plan to further split this in future revisions.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D155747
2023-07-19 16:48:47 -04:00
Mahesh Ravishankar
67399932c7 [mlir][Linalg] Cleanup the drop unit dims pass in Linalg.
TL;DR the following API functions have been merged

```
void populateFoldUnitExtentDimsViaReshapesPatterns(RewritePatternSet &patterns);
void populateFoldUnitExtentDimsViaSlicesPatterns(RewritePatternSet &patterns);
```

into

```
void populateFoldUnitExtentDimsPatterns(RewritePatternSet &patterns,
                                        ControlDropUnitDims &options);
```

To use the previous functionality use

```
ControlDropUnitDims options;
// By default options.rankReductionStrategy is
// ControlDropUnitDims::RankReductionStrategy::ReassociativeReshape.
populateFoldUnitExtentDimsPatterns(patterns, options);
```

and

```
ControlDropUnitDims options;
options.rankReductionStrategy = ControlDropUnitDims::RankReductionStrategy::ExtractInsertSlice
populateFoldUnitExtentDimsPatterns(patterns, options);

```

This pass is quite old and needed to be updated based on the current
approach to transformations in Linalg

- Instead of two patterns, one to just remove loop dimensions that are
  unit extent (and using 0 in the indexing maps), and another to drop
  the unit-extents in the operand shapes, combine into a single
  transformation. This avoid creating an intermediate step with
  indexing maps having 0's in the domains exp ressions.

- Expose the core transformation as a utility function and add a
  pattern that calls this transformation.

This is a mostly NFC change, apart from the API change and dropping
the patterns/test that only dropped the loops that are unit extents.

Differential Revision: https://reviews.llvm.org/D155518
2023-07-19 17:47:18 +00:00
Valentin Clement
1d49834280 [mlir][openacc] Relax verifier for the acc.reduction.recipe
Allow the init and combiner regions to have more
arguments to pass information.

Reviewed By: razvanlupusoru

Differential Revision: https://reviews.llvm.org/D155656
2023-07-19 10:31:37 -07:00
Razvan Lupusoru
65cd3cfed2 [nfc][openacc] Add missing comma in acc dialect operation macros
A comma is missing which is incorrect if macro is used.

Reviewed By: clementval

Differential Revision: https://reviews.llvm.org/D155722
2023-07-19 09:12:06 -07:00
Jakub Kuderski
68cd1dbc2e [mlir][spirv] Add cooperative matrix store op
Implement cooperative matrix store for the `SPV_KHR_cooperative_matrix`
extension: https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/KHR/SPV_KHR_cooperative_matrix.html.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D155631
2023-07-19 11:01:09 -04:00
Jakub Kuderski
1fa9e150b4 [mlir][spirv] Add cooperative matrix load op
Implement cooperative matrix load for the `SPV_KHR_cooperative_matrix`
extension: https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/KHR/SPV_KHR_cooperative_matrix.html.

Also some minor fixes in common code for custom parsing.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D155616
2023-07-19 10:55:27 -04:00
Razvan Lupusoru
132c376a33 [openacc] Add attribute to hold declare data clause information
For variables in declare clauses, their producing operation should be
marked with the data clause for ease of lookup and consistency
verification. Thus add an attribute that can be used for this purpose
plus verification that declare data operation matches the declare
data clause on variable.

Reviewed By: clementval

Differential Revision: https://reviews.llvm.org/D155640
2023-07-19 07:54:57 -07:00
Markus Böck
1dda134f85 [mlir][flang] Convert TBAA metadata to an attribute representation
The current representation of TBAA is the very last in-tree user of the `llvm.metadata` operation.
Using ops to model metadata has a few disadvantages:
* Building a graph has to be done through some weakly typed indirection mechanism such as `SymbolRefAttr`
* Creating the metadata has to be done through a builder within a metadata op.
* It is not multithreading safe as operation insertion into the same block is not thread-safe

This patch therefore converts TBAA metadata into an attribute representation, in a similar manner as it has been done for alias groups and access groups in previous patches.

This additionally has the large benefit of giving us more "correctness by construction" as it makes things like cycles in a TBAA graph, or references to an incorrectly typed metadata node impossible.

Differential Revision: https://reviews.llvm.org/D155444
2023-07-19 16:42:50 +02:00
Martin Erhart
07c079a97a [mlir][bufferization] Add lowering of bufferization.dealloc to memref.dealloc
Adds a generic lowering that suppors all cases of bufferization.dealloc
and one specialized, more efficient lowering for the simple case. Using
a helper function with for loops in the general case enables
O(|num_dealloc_memrefs|+|num_retain_memrefs|) size of the lowered code.

Depends on D155467

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D155468
2023-07-19 14:28:01 +00:00
Guray Ozen
bf62748342 [mlir][nvvm] Introduce Syncronization Ops for WGMMA
This work introduces : `wgmma.fence.aligned`, `wgmma.commit.group.sync.aligned` and `wgmma.wait.group.sync.aligned` Ops. They are used to syncronize warpgroup level matrix multiply-accumulate instructions, as known as WGMMA.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D155676
2023-07-19 11:45:04 +02:00
Markus Böck
ec3cbe92c0 [mlir][LLVM] add llvm.ssa.copy intrinsic
This is quite the niche intrinsic, whose whole purpose is to be able to essentially split an SSA value to be able to attach additional information to the new value.
It therefore really acts like a noop that just passes through its argument. It interestingly does not have any support in CodeGen and is therefore required to also be deleted by any pass creating it.

Differential Revision: https://reviews.llvm.org/D155678
2023-07-19 09:53:28 +02:00
Matthias Springer
9d072bbe0f [mlir][NFC] Avoid OpBuilder::setListener when possible
`setListener` is dangerous because an already registered listener may accidentally be overwritten/replaced. (A `ForwardingListener` must be used in such cases.) This change updates a few trivial call sites of `setListener`, where no forwarding listener is needed.

Differential Revision: https://reviews.llvm.org/D155599
2023-07-19 09:13:38 +02:00
Alexander Belyaev
2e52b093cf [mlir] Remove dead code in Analysis/FlatLinearValueConstraints.
Differential Revision: https://reviews.llvm.org/D154830
2023-07-19 08:19:53 +02:00
River Riddle
857b0a1f40 [mlir-lsp] Add client information to the InitializationParams
This is specified in the spec, but we just never really needed it. This
allows for users of the LSP libraries to inspect information about the
client that is connected to the server.

Differential Revision: https://reviews.llvm.org/D155566
2023-07-18 12:27:01 -07:00
Razvan Lupusoru
b232054fac [openacc] Update data clause attribute definition
Instead of using the I64EnumAttr for the DataClause, now use
EnumAttr instead. This makes tests more readable since now
one can use the format #acc<data_clause acc_copyin> instead of
just the number.

Reviewed By: clementval, vzakhari

Differential Revision: https://reviews.llvm.org/D155605
2023-07-18 11:39:33 -07:00
Amanda Tang
5267ed05bc [ODS] Use Adaptor Traits for Type Inference
Author inferReturnTypes methods with the Op Adaptor by using the InferTypeOpAdaptor.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D155115
2023-07-18 17:58:31 +00:00
Quentin Colombet
9be8219f60 [mlir][Linalg] Add an interface to decompose complex ops
This patch adds an interface, named AggregatedOpInterface, that decomposes
complex operations into simpler ones.

For now, make the interface specific to Linalg because although the concept
is general, the way to materialize it needs some maturing.

Use that interface with the softmax operator.

Differential Revision: https://reviews.llvm.org/D154363
2023-07-18 19:06:36 +02:00
Rahul Kayaith
67a910bbff [mlir][python] Remove PythonAttr mapping functionality
This functionality has been replaced by TypeCasters (see D151840)

depends on D154468

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D154469
2023-07-18 12:21:28 -04:00
Razvan Lupusoru
7496177d98 [openacc] Add dialect definition for acc declare
A declare directive is used to specify the creation of a visible device
copy of a variable for the duration of the implicit data region as it
relates to the scope in which the variable is declared.

In order to support this, the following new operations were added:
1) `acc.global_ctor` and `acc.global_dtor`. These are used whenever the
declare directive applies to a global.
2) `acc.declare_enter` and `acc.declare_exit`. These operations are
modeled similarly to `acc.enter_data` and `acc.exit_data`. The reason
they are not modeled like `acc.data` is so that these operations can be
used both for globals and regions like functions.
3) `acc.declare_device_resident` and `acc.declare_link`. These
operations are modeled in a manner consistent with previously defined
data entry operation model.

The `acc.getdeviceptr` was generalized so that it can be used with
acc.declare_exit.

Reviewed By: clementval, vzakhari

Differential Revision: https://reviews.llvm.org/D155322
2023-07-18 08:11:06 -07:00
iambrj
3dd9931c0f [MLIR][Presburger] Implement domain and range restriction for PresburgerRelation
This patch implements domain and range restriction for PresburgerRelation

Reviewed By: Groverkss

Differential Revision: https://reviews.llvm.org/D154798
2023-07-18 19:12:12 +05:30
Matthias Springer
db393288ff [mlir][NVGPU][transform] Add create_async_groups transform op
This transform looks for suitable vector transfers from global memory to shared memory and converts them to async device copies.

Differential Revision: https://reviews.llvm.org/D155569
2023-07-18 14:36:41 +02:00
Alex Zinenko
4a6b31b8d8 [mlir] NFC: untangle SCF Patterns.h and Transforms.h
These two headers both contained a strange mix of definitions related to
both patterns and non-pattern transforms. Put patterns and "populate"
functions into Patterns.h and standalone transforms into Transforms.h.

Depends On: D155223

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D155454
2023-07-18 11:27:36 +00:00
Andrzej Warzynski
3fa5ee67ba [mlir][ArmSME] Introduce custom TypeConverter for ArmSME
At the moment, SME-to-LLVM lowerings rely entirely on
`LLVMTypeConverter`. This patch introduces a dedicated `TypeConverter`
that inherits from `LLVMTypeConverter` (it will also be used when
lowering ArmSME Ops to LLVM).

The new type converter merely disables lowerings for `VectorType` to
prevent 2-d scalable vectors (common in the context of ArmSME), e.g.

   `vector<[16]x[16]xi8>`,

entering the LLVM Type converter. LLVM does not support arrays of
scalable vectors and hence the need for specialisation. In the case of
SME such types are effectively eliminated when emitting LLVM IR
intrinsics for SME.

Differential Revision: https://reviews.llvm.org/D155365
2023-07-18 09:35:32 +00:00
Cullen Rhodes
fb54fec726 [mlir][ArmSME] Implement tile allocation
This patch adds a pass '-allocate-sme-tiles' to the ArmSME dialect that
implements allocation of SME ZA tiles.

It does this at the 'func.func' op level by replacing
'arm_sme.get_tile_id' ops with 'arith.constant' ops that represent the
tile number. The tiles in use in a given function are tracked by an
integer function attribute 'arm_sme.tiles_in_use' that is a 16-bit tile
mask with a bit for each 128-bit element tile (ZA0.Q-ZA15.Q), the
smallest ZA tile granule. This is initialized on the first
'arm_sme.get_tile_id' rewrite and updated on each subsequent rewrite.
Mixing of different element tile types is supported.

Section B2.3.2 of the SME spec [1] describes how the 128-bit element
tiles overlap with other element tiles.

Depends on D154941

[1] https://developer.arm.com/documentation/ddi0616/aa

Reviewed By: awarzynski

Differential Revision: https://reviews.llvm.org/D154955
2023-07-18 08:46:40 +00:00
Andrzej Warzynski
447bb5bee4 [mlir][ArmSME] Introduce new lowering layer (Vector -> ArmSME)
At the moment, the lowering from the Vector dialect to SME looks like
this:

  * Vector --> SME LLVM IR intrinsics

This patch introduces a new lowering layer between the Vector dialect
and the Arm SME extension:

  * Vector --> ArmSME dialect (custom Ops) --> SME LLVM IR intrinsics.

This is motivated by 2 considerations:
1. Storing `ZA` to memory (e.g. `vector.transfer_write`) requires an
   `scf.for` loop over all rows of `ZA`. Similar logic will apply to
   "load to ZA from memory". This is a rather complex transformation and
   a custom Op seems justified.
2. As discussed in [1], we need to prevent the LLVM type converter from
   having to convert types unsupported in LLVM, e.g.
   `vector<[16]x[16]xi8>`. A dedicated abstraction layer with custom Ops
   opens a path to some fine tuning (e.g. custom type converters) that
   will allow us to avoid this.

To facilitate this change, two new custom SME Op are introduced:

  * `TileStoreOp`, and
  * `ZeroOp`.

Note that no new functionality is added - these Ops merely model what's
already supported. In particular, the following tile size is assumed
(dimension and element size are fixed):

  * `vector<[16]x[16]xi8>`

The new lowering layer is introduced via a conversion pass between the
Vector and the SME dialects. You can use the `-convert-vector-to-sme`
flag to run it. The following function:
```
func.func @example(%arg0 : memref<?x?xi8>) {
  // (...)
  %cst = arith.constant dense<0> : vector<[16]x[16]xi8>
  vector.transfer_write %cst, %arg0 : vector<[16]x[16]xi8>, memref<?x?xi8>
  return
}
```
would be lowered to:
```
  func.func @example(%arg0: memref<?x?xi8>) {
    // (...)
    %0 = arm_sme.zero : vector<[16]x[16]xi8>
    arm_sme.tile_store %arg0[%c0, %c0], %0 : memref<?x?xi8>, vector<[16]x[16]xi8>
    return
  }
```

Later, a mechanism will be introduced to guarantee that `arm_sme.zero`
and `arm_sme.tile_store` operate on the same virtual tile. For `i8`
elements this is not required as there is only one tile.

In order to lower the above output to LLVM, use
  * `-convert-vector-to-llvm="enable-arm-sme"`.

[1] https://github.com/openxla/iree/issues/14294

Reviewed By: WanderAway

Differential Revision: https://reviews.llvm.org/D154867
2023-07-18 08:04:59 +00:00
Cullen Rhodes
6ff9761a69 [mlir][ArmSME] Add custom get_tile_id and cast ops
This patch adds three new custom ops to the ArmSME dialect:

  * arm_sme.get_tile_id - returns a scalar integer representing an SME
    "virtual tile" that is not in use.
  * arm_sme.cast_tile_to_vector - casts from a tile id to a 2-d scalable
    vector type, which represents an SME "virtual tile".
  * arm_sme.cast_vector_to_tile - casts from a 2-d scalable vector type,
    which represents an SME "virtual tile", to a tile id.

The 'arm_sme.get_tile_id' op currently only supports tile 0, a follow-up
patch will implement proper tile allocation. A further follow-up patch
will demonstrate load/store to/from ZA using these ops.

See the op descriptions for further details and examples.

Thanks to @paulwalker-arm and @awarzynski for helping drive this.

Reviewed By: awarzynski, dcaballe

Differential Revision: https://reviews.llvm.org/D154941
2023-07-18 07:41:45 +00:00
Martin Erhart
d582562188 [mlir][bufferization] Add DeallocOp
The dealloc operation deallocates each of the given memrefs if there is no alias
to that memref in the list of retained memrefs and the corresponding
condition value is set. This condition can be used to indicate and pass on
ownership of memref values (or in other words, the responsibility of
deallocating that memref). If two memrefs alias each other, only one will be
deallocated to avoid double free situations.

The memrefs to be deallocated must be the originally allocated memrefs,
however, the memrefs to be retained may be arbitrary memrefs.

Returns a list of conditions corresponding to the list of memrefs which
indicates the new ownerships, i.e., if the memref was deallocated the
ownership was dropped (set to 'false') and otherwise will be the same as the
input condition.

Differential Revision: https://reviews.llvm.org/D155467
2023-07-18 07:32:49 +00:00
Matthias Springer
9f808f6e2f [mlir][vector][NFC] Drop get...AttrStrName helper functions
These functions are not needed. They are auto-generated from the `.td` files.

Differential Revision: https://reviews.llvm.org/D155483
2023-07-17 18:16:08 +02:00
Guray Ozen
baba13e9a1 [mlir][nvvm] Delete backslash
Delete the backslash. It was there to compile tablegen file. It looks like space also works fine.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D155474
2023-07-17 17:56:52 +02:00
Matthias Springer
0e8c68c301 [mlir][Interfaces] Fix DestinationStyleOpInterface for vector ops
This revision fixes `hasTensorSemantics` and `hasBufferSemantics` for vector transfer ops, which may have a vector operand. `VectorType` implements `ShapedType` and such operands do not affect whether an op has tensor or buffer semantics. Also implement `DestinationStyleOpInterface` on `TransferReadOp` so that `hasTensorSemantics`/`hasBufferSemantics` can be called. (The op has no inits, but this makes it symmetric to `TransferWriteOp`.)

Differential Revision: https://reviews.llvm.org/D155469
2023-07-17 17:40:18 +02:00