This PR adds an integration test for an argmax kernel with
`mlir-vulkan-runner`. This test exercises the `convert-to-spirv` pass
(landed in #95942) and demonstrates that we can use SPIR-V ops as
"intrinsics" among higher-level dialects.
Support for the `index` dialect in `mlir-vulkan-runner` is also added.
**Description**
This PR adds a new option to the `convert-to-spirv` pass that clones and
converts only GPU kernel modules, for integration testing. A pass option is
used instead of two separate passes because both variants consist of the same
`memref` type conversion and the same per-dialect patterns; they only differ
in the scope they run on. The PR also replaces the `gpu-to-spirv` pass with
the `convert-to-spirv` pass (with the new option) in `mlir-vulkan-runner`.
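As a rough illustration, here is a minimal sketch of the pass-option approach. It is not the upstream implementation: the option name `convert-gpu-modules` and the pass skeleton are assumptions, and the actual conversion patterns are elided.
```
// Minimal sketch only. The option name "convert-gpu-modules" and the pass
// skeleton are assumptions for illustration; the real pass defines its own
// option and conversion patterns.
#include "mlir/Dialect/GPU/IR/GPUDialect.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Pass/Pass.h"

namespace {
struct ConvertToSPIRVSketchPass
    : mlir::PassWrapper<ConvertToSPIRVSketchPass,
                        mlir::OperationPass<mlir::ModuleOp>> {
  // Hypothetical flag gating the conversion scope.
  Option<bool> convertGPUModules{
      *this, "convert-gpu-modules",
      llvm::cl::desc("Clone and convert only GPU kernel modules"),
      llvm::cl::init(false)};

  void runOnOperation() override {
    mlir::ModuleOp module = getOperation();
    if (!convertGPUModules) {
      // Default scope: convert the whole builtin.module (patterns elided).
      return;
    }
    // Test scope: visit only the GPU kernel modules and leave the host
    // code untouched.
    module.walk([&](mlir::gpu::GPUModuleOp gpuModule) {
      // Clone the kernel module and convert the clone into a spirv.module
      // (cloning and the actual conversion are elided in this sketch).
      (void)gpuModule;
    });
  }
};
} // namespace
```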
**Future Plan**
Use nested pass pipelines in `mlir-vulkan-runner` instead of adding
this option.
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
On Linux, lldb-dap uses the location of the lldb-dap binary to search for
lldb-server. Previously these were produced in different directories
corresponding to their BUILD file paths. It's not ideal that the BUILD
file location matters for the binary at runtime, but it doesn't hurt to
have this tool here too, like the others.
The CMake config creates two targets, `MLIRTensorMeshShardingExtensions`
and `MLIRTensorAllExtensions`, but in Bazel the `Func` dialect has only a
single `FuncExtensions`. Here I follow the `Func` dialect convention and
create only a single `TensorExtensions`.
This PR adds conversion patterns for the GPU dialect to the
`convert-to-spirv` pass, introduced in #95942. The pass can now convert each
`gpu.module` within a `builtin.module`, and the ops inside it, into a
`spirv.module`.
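For context, the sketch below (an assumption about the typical wiring, not the exact upstream code) shows how GPU-to-SPIR-V patterns are usually combined with a `SPIRVTypeConverter` and a `SPIRVConversionTarget`; the helper name `convertGPUModuleSketch` is illustrative only.
```
// Minimal sketch, assuming the standard MLIR dialect-conversion setup.
#include "mlir/Conversion/GPUToSPIRV/GPUToSPIRV.h"
#include "mlir/Dialect/SPIRV/IR/TargetAndABI.h"
#include "mlir/Dialect/SPIRV/Transforms/SPIRVConversion.h"
#include "mlir/IR/PatternMatch.h"
#include "mlir/Transforms/DialectConversion.h"

static mlir::LogicalResult convertGPUModuleSketch(mlir::Operation *gpuModule) {
  mlir::MLIRContext *context = gpuModule->getContext();

  // Derive the conversion target and type converter from the SPIR-V target
  // environment attached to (or defaulted for) the op.
  mlir::spirv::TargetEnvAttr targetEnv =
      mlir::spirv::lookupTargetEnvOrDefault(gpuModule);
  std::unique_ptr<mlir::ConversionTarget> target =
      mlir::SPIRVConversionTarget::get(targetEnv);
  mlir::SPIRVTypeConverter typeConverter(targetEnv);

  // Populate the GPU-to-SPIR-V patterns; in the real pass, patterns for the
  // other supported dialects go into the same set.
  mlir::RewritePatternSet patterns(context);
  mlir::populateGPUToSPIRVPatterns(typeConverter, patterns);

  return mlir::applyFullConversion(gpuModule, *target, std::move(patterns));
}
```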
**Future Plans**
- Use `gpu.launch_func` to invoke kernels from host functions
- Potentially integrate into the `mlir-vulkan-runner` for e2e testing
I assume the intent of the initial `*/*.py` was to also collect files
matching `*.py`, but that's not what Bazel does unless you use `**/*.py`,
which is what we're doing now. A few of these tests fail, so I explicitly
disabled them until someone has time to debug them.
This PR adds conversion patterns for the MemRef dialect to the
`convert-to-spirv` pass, introduced in #95942. Conversion from MemRef memory
spaces to SPIR-V storage classes is also included and runs before the final
dialect conversion phase.
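A minimal sketch of this two-phase structure is shown below. It assumes the existing MemRef-to-SPIR-V helpers (`spirv::MemorySpaceToStorageClassConverter`, `spirv::mapMemorySpaceToVulkanStorageClass`, `populateMemRefToSPIRVPatterns`); the function name `lowerMemRefSketch` is illustrative, and actually applying the memory-space converter over the module is elided.
```
// Minimal sketch only; the surrounding pass structure and the application
// of the memory-space converter over the module are elided.
#include "mlir/Conversion/MemRefToSPIRV/MemRefToSPIRV.h"
#include "mlir/Dialect/SPIRV/IR/TargetAndABI.h"
#include "mlir/Dialect/SPIRV/Transforms/SPIRVConversion.h"
#include "mlir/IR/PatternMatch.h"
#include "mlir/Transforms/DialectConversion.h"

static mlir::LogicalResult lowerMemRefSketch(mlir::Operation *module) {
  mlir::MLIRContext *context = module->getContext();

  // Phase 1: map memref memory spaces to SPIR-V storage classes (here the
  // Vulkan mapping) before the op conversion runs. Rewriting the memref
  // types in the module with this converter is elided.
  mlir::spirv::MemorySpaceToStorageClassConverter memorySpaceConverter(
      mlir::spirv::mapMemorySpaceToVulkanStorageClass);
  (void)memorySpaceConverter;

  // Phase 2: the final dialect conversion, with the memref-to-SPIR-V
  // patterns populated alongside the other dialects the pass handles.
  mlir::spirv::TargetEnvAttr targetEnv =
      mlir::spirv::lookupTargetEnvOrDefault(module);
  std::unique_ptr<mlir::ConversionTarget> target =
      mlir::SPIRVConversionTarget::get(targetEnv);
  mlir::SPIRVTypeConverter typeConverter(targetEnv);

  mlir::RewritePatternSet patterns(context);
  mlir::populateMemRefToSPIRVPatterns(typeConverter, patterns);

  return mlir::applyFullConversion(module, *target, std::move(patterns));
}
```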
**Future Plans**
- Add tests for ops other than `memref.load` and `memref.store`
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
The `str_to_float` conversion code doesn't need the features provided by
`fenv`, and the dependency is creating a blocker for hand-in-hand. This
patch uses a workaround to remove the dependency.
This reverts commit 0fa20c55b5.
Storing raw symbol names is generally preferred in profile files.
Demangling might lose information. Language frontends might use mangling
schemes not supported by LLVMDemangle
(https://github.com/llvm/llvm-project/issues/45901#issuecomment-2008686663).
In addition, calling `demangle` for each function has a significant
performance overhead (#102222).
I believe that even if we decide to provide a producer-side demangling,
it would not be on by default.
Pull Request: https://github.com/llvm/llvm-project/pull/102274
The implementations of these methods are legacy, and they are removed in
favor of the `scf::tileUsingSCF` methods as replacements. To bring the latter
on par with the requirements of the deprecated methods, the tiling now allows
one to specify the maximum number of tiles to use instead of specifying the
tile sizes. When tiling to `scf.forall`, this specification is used to
generate the `num_threads` version of the operation.
A slight deviation from the previous implementation is that the deprecated
methods always generated the `num_threads` variant of the `scf.forall`
operation. Now this is driven by the tiling options specified, which reduces
the indexing math generated when the tile sizes are specified.
**Moving from `linalg::tileToForallOp` to `scf::tileUsingSCF`**
```
OpBuilder b;
TilingInterface op;
ArrayRef<OpFoldResult> numThreads;
ArrayAttr mapping;
FailureOr<ForallTilingResult> result = linalg::tileToForallOp(b, op, numThreads, mapping);
```
can be replaced by
```
scf::SCFTilingOptions options;
options.setNumThreads(numThreads);
options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp);
options.setMapping(mapping.getValue()); // Note: setMapping takes an ArrayRef<Attribute>.
FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options);
```
This generates the `numThreads` version of the `scf.forall` for the
inter-tile loops, i.e.
```
... = scf.forall (%arg0, %arg1) in (%nt0, %nt1) shared_outs(...)
```
**Moving from `linalg::tileToForallOpUsingTileSizes` to
`scf::tileUsingSCF`**
```
OpBuilder b;
TilingInterface op;
ArrayRef<OpFoldResult> tileSizes;
ArrayAttr mapping;
FailureOr<ForallTilingResult> result = linalg::tileToForallOpUsingTileSizes(b, op, tileSizes, mapping);
```
can be replaced by
```
scf::SCFTilingOptions options;
options.setTileSizes(tileSizes);
options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp);
options.setMapping(mapping.getValue()); // Note: setMapping takes an ArrayRef<Attribute>.
FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options);
```
Also note that `linalg::tileToForallOpUsingTileSizes` effectively called
`linalg::tileToForallOp` by computing `numThreads` from the `op` and
`tileSizes`, and therefore generated the `numThreads` version of the
`scf.forall`. That is no longer the case. Instead, this now directly
generates the `tileSizes` version of the `scf.forall` op:
```
... = scf.forall(%arg0, %arg1) = (%lb0, %lb1) to (%ub0, %ub1) step(%step0, %step1) shared_outs(...)
```
If you actually want the `numThreads` version, it is up to the caller to
compute `numThreads` and call `options.setNumThreads` instead of
`options.setTileSizes`. Note that there is a slight difference between the
tile-size version and the `numThreads` version: the former requires an
additional `affine.max` on the tile size to ensure non-negative tile sizes,
while the `numThreads` version does not need this `affine.max` since, by
construction, the tile sizes are non-negative. In the previous
implementation, the `numThreads` version generated by
`linalg::tileToForallOpUsingTileSizes` avoided the `affine.max` operation.
To get back to the same state, downstream users will have to additionally
normalize the `scf.forall` operation.
**Changes to `transform.structured.tile_using_forall`**
The transform dialect op that called into `linalg::tileToForallOp` and
`linalg::tileToForallOpUsingTileSizes` has been modified to call
`scf::tileUsingSCF`. The transform dialect op always generates the
`numThreads` version of the `scf.forall` op, so when `tile_sizes` are
specified for it, the `tile_sizes` version of the `scf.forall` is first
generated by `scf::tileUsingSCF` and then normalized to get back to the
same state. There is therefore no functional change to
`transform.structured.tile_using_forall`: it always generates the
`numThreads` version of the `scf.forall` op (as it did before this change).
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>