25253 Commits

Author SHA1 Message Date
Rolf Morel
f12fcf030c [MLIR][Transform][Python] transform.foreach wrapper and .owner OpViews (#172228)
Friendlier wrapper for transform.foreach.

To facilitate that friendliness, makes it so that OpResult.owner returns
the relevant OpView instead of Operation. For good measure, also changes
Value.owner to return OpView instead of Operation, thereby ensuring
consistency. That is, makes it so that all op-returning .owner
accessors return OpView (and thereby give access to all goodies
available on registered OpViews.)

Reland of #171544 due to fixup for integration test.
2025-12-14 22:10:31 +00:00
Mehdi Amini
b9fe6532a7 Revert "[MLIR][Transform][Python] transform.foreach wrapper and .owner OpViews" (#172225)
Reverts llvm/llvm-project#171544 ; bots are broken.
2025-12-14 21:27:02 +00:00
Rolf Morel
4cdec92827 [MLIR][Transform][Python] transform.foreach wrapper and .owner OpViews (#171544)
Friendlier wrapper for `transform.foreach`.

To facilitate that friendliness, makes it so that `OpResult.owner`
returns the relevant `OpView` instead of `Operation`. For good measure,
also changes `Value.owner` to return `OpView` instead of `Operation`,
thereby ensuring consistency. That is, makes it so that all
op-returning `.owner` accessors return `OpView` (and thereby give access
to all goodies available on registered `OpView`s.)
2025-12-14 20:44:15 +00:00
Asher Mancinelli
4b267d5caa [MLIR][MemRef] Emit error on atomic generic result op defined outside the region (#172190)
While figuring out how to perform an atomic exchange on a memref, I
tried the generic atomic rmw with the yielded value captured from the
enclosing scope (instead of a plain atomic_rmw with
`arith::AtomicRMWKind::assign`). Instead of segfaulting, this PR changes
the pass to produce an error when the result is not found in the
region's IR map.

It might be more useful to give a suggestion to the user, but giving an
error message instead of a crash is at least an improvement, I think.

See: #172184
2025-12-14 08:31:43 -08:00
Ivan Butygin
f785ca0d72 [mlir][nvgpu] Move memref memspace attributes conversion to single place (#172156)
Also, some fixes for AMDGPU part for better naming.
2025-12-14 12:44:47 +03:00
Rolf Morel
b33354f272 [MLIR][Python][Transform] Print diagnostics also upon success (#172188)
If we do not collect the diagnostics from the
CollectDiagnosticsToStringScope, even when the named_sequence applied
successfully, the Scope object's destructor will assert (with an
unhelpful message).
2025-12-14 00:35:52 +00:00
Longsheng Mou
ad8d9e1428 [mlir][gpu] Use arith dialect to lower gpu.global_id (#171614)
This PR lowers the `gpu.global_id` op using the arith dialect instead of
the index dialect. Fixes #171303.
2025-12-13 18:43:12 +08:00
Guray Ozen
eeaf435859 [MLIR][Remarks] Improve the doc (#171128) 2025-12-13 10:53:38 +01:00
Susan Tan (ス-ザン タン)
47b4c6a7d7 [acc][test] add tests for RegionBranchOpInterface for acc regions (#172073)
Use the last modified analysis to test whether RegionBranchOpInterface
is correct on acc regions.
2025-12-12 18:03:44 -05:00
Maksim Levental
536163650e [mlir][LLVM] refactor FailOnUnsupportedFP (#172054)
Enable `FailOnUnsupportedFP` for `ConvertToLLVMPattern` and set it to
`true` for all `math-to-llvm` patterns. This fixes various invalid
lowerings of `math` ops on `fp8`/`fp4` types.
2025-12-12 22:31:39 +00:00
Zhewen Yu
d107b3c82a [MLIR][AMDGPU] Implement reifyDimOfResult for FatRawBufferCastOp (#171839)
Since `FatRawBufferCastOp` preserves the shape of its source operand,
the result dimensions can be reified by querying the source's
dimensions.

---------

Signed-off-by: Yu-Zhewen <zhewenyu@amd.com>
2025-12-12 11:39:00 -08:00
Maksim Levental
8d5ade8feb [mlir] enable APFloatWrappers on MacOS (#172070) 2025-12-12 11:34:23 -08:00
Durgadoss R
9dc6f18a3e [MLIR][NVVM] Fix results-check for mbarrier Op (#171657)
This patch fixes the lowering of the newly
added mbarrier.arrive Op w.r.t return value.
(Follow-up of PR #170545)

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-12-12 22:45:09 +05:30
Mehdi Amini
0570cab7c1 [MLIR] Apply clang-tidy fixes for misc-use-internal-linkage in IndexingUtils.cpp (NFC) 2025-12-12 08:22:33 -08:00
Ravil Dorozhinskii
3ae5f2782e [ROCDL] Added LDS barrier ops to ROCDL (gfx1250) (#171810)
Added `ds.atomic.barrier.arrive.rtn.b64` and
`ds.atomic.async.barrier.arrive.b64` to ROCDL. These are parts of the
LDS memory barrier concept in GFX1250. Also added alias analysis to
`global/flat` data prefetch ops. Extended rocdl tests.
2025-12-12 16:27:59 +01:00
Asher Mancinelli
568ce76c6e [MLIR][LLVM] Add pass to update ops with default visibility (#171727)
To support the `-fvisibility=...` option in Flang, we need a pass to
rewrite all the global definitions in the LLVM dialect that have the
default visibility to have the specified visibility. This change adds
such a pass.

Note that I did not add an option for `visibility=default`; I believe
this makes sense for compiler drivers since users may want to tack an
option on at the end of a compile line to override earlier options, but
I don't think it makes sense for this pass to accept
`visibility=default`--it would just be an early exit IIUC.
2025-12-12 06:49:20 -08:00
Erick Ochoa Lopez
5ebb928532 [mlir][amdgpu] Adds make_dma_gather_base (#171857)
* Adds `tdm_gather_base` type.
* Adds `make_dma_gather_base` op.
* Adds `make_dma_gather_base` lowering to ROCDL.
2025-12-12 09:20:38 -05:00
Kirill Vedernikov
07eb9fa43f [MLIR][NVVM] Support for dense and sparse MMA with block scaling (#170566)
This change adds dense and sparse MMA with block scaling intrinsics to
MLIR -> NVVM IR -> NVPTX flow. NVVM and NVPTX implementation is based on
PTX ISA 9.0.
2025-12-12 13:47:00 +01:00
Ryutaro Okada
04b197599e [MLIR] [Vector] Fix canonicalization for vector.scatter with tensor output (#168824)
Commit
7e7ea9c535
added tensor support for scatter, but running the existing
canonicalization on tensors causes bugs, so we fix the canonicalization
with tensor output.

Closes https://github.com/llvm/llvm-project/issues/168695

---------

Signed-off-by: Ryutaro Okada <1015ryu88@gmail.com>
2025-12-12 12:24:37 +00:00
Kunwar Grover
e4733424bc [mlir][Vector] Improve vector.transferx store-to-load-forwarding (#171840)
This patch changes the transfer_write -> transfer_read store-to-load
forwarding canonicalization pattern to rely on permutation maps rather
than on ad hoc logic. The old logic couldn't canonicalize a simple
unit dim broadcast through transfer_write/transfer_read, which is added
as a test in this patch.

This patch also details what would be needed to better support cases
which are not yet implemented.
2025-12-12 10:37:42 +00:00
lonely eagle
917e458b96 [mlir] Cleanup the addLegalOp of convert-linalg-to-std pass (NFC) (#171979) 2025-12-12 18:02:56 +08:00
ShivaChen
a318c50110 [mlir][tosa] Remove NegateOp to SubOp and 48-bit promotion in TosaToLinalg (#170622)
The patch is motivated by a failure of the TOSA conformance test negate_32x45x49_i16_full.

The TosaToLinalg pass has an optimization that rewrites TOSA Negate to Sub if the zero points are zero. However, when the input value is the minimum negative number, the rewrite causes an underflow. With the rewrite removed, the zp = 0 case goes through the promotion path, which avoids the underflow.

The promotion can go from int32 to int48. The TOSA negate specification does not mention support for int48. Should we consider removing the promotion to int48 to stay aligned with the TOSA spec?
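
The underflow described above can be reproduced with plain integer arithmetic. The following is a minimal sketch, not the pass's code: the helper names are hypothetical, and it assumes that the promoted result is clamped back to the input range before truncation.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Wrap an integer into the two's-complement range of the given width."""
    value &= (1 << bits) - 1
    if value >= (1 << (bits - 1)):
        value -= 1 << bits
    return value


def negate_as_sub(x: int, bits: int) -> int:
    """Negate as 0 - x, computed at the input width (the removed rewrite)."""
    return wrap_signed(0 - x, bits)


def negate_with_promotion(x: int, bits: int, wide_bits: int) -> int:
    """Negate at a wider width, then clamp to the input range (assumption)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    wide = wrap_signed(0 - x, wide_bits)
    return max(lo, min(hi, wide))


INT16_MIN = -(1 << 15)
print(negate_as_sub(INT16_MIN, 16))              # -32768: wraps back to the input
print(negate_with_promotion(INT16_MIN, 16, 32))  # 32767: clamped to the i16 maximum
```

Only the minimum value is affected; every other i16 input negates to the same result on both paths.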
2025-12-12 16:45:22 +08:00
Hongzheng Chen
1335a05ab8 [MLIR][Python] Fix AffineIfOp insertion point (#171957)
This bug was introduced by #108323, where the loc and ip were not
properly set. It may lead to errors when the operations are not linearly
inserted into the IR.
2025-12-11 20:23:46 -08:00
Hongzheng Chen
86cc934b4a [python] Expose replaceUsesOfWith C API (#171892)
This PR exposes the `replaceUsesOfWith` C API to Python.
2025-12-11 16:09:18 -08:00
Nishant Patel
71ee84acc4 [MLIR][Vector] Add unroll pattern for vector.constant_mask (#171518)
This PR adds unrolling for vector.constant_mask op based on the
targetShape. Each unrolled vector computes its local mask size in each
dimension (d) as:
min(max(originalMaskSize[d] - offset[d], 0), unrolledMaskSize[d]).
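
As a worked example of the formula above, here is a small Python sketch. The function names are illustrative and are not taken from the patch; it assumes the target shape evenly divides the original vector shape.

```python
from itertools import product


def unrolled_mask_sizes(original_mask_size, offset, unrolled_shape):
    """Per-dimension mask size of one unrolled tile, following the formula above."""
    return [
        min(max(m - o, 0), u)
        for m, o, u in zip(original_mask_size, offset, unrolled_shape)
    ]


def tile_offsets(shape, target_shape):
    """Offsets of every tile when `shape` is unrolled into `target_shape` pieces."""
    return [
        list(o)
        for o in product(*(range(0, s, t) for s, t in zip(shape, target_shape)))
    ]


# An 8x8 mask with the leading 5x3 elements set, unrolled into 4x4 tiles.
for off in tile_offsets([8, 8], [4, 4]):
    print(off, unrolled_mask_sizes([5, 3], off, [4, 4]))
# [0, 0] [4, 3]
# [0, 4] [4, 0]
# [4, 0] [1, 3]
# [4, 4] [1, 0]
```

A tile with a zero in any dimension is an all-false mask.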
2025-12-11 13:16:55 -08:00
Erick Ochoa Lopez
2f9b8b7428 [mlir][amdgpu] Continue lowering make_tdm_descriptor. (#171498)
* changes workgroup mask's type from i16 to vector<16xi1>
* changes pad_amount and pad_interval from Index to I32
* adds lit tests for padEnable, iteration and dynamic cases
* adds TODO for a future instrumentation pass to validate inputs
* adds descriptor groups 2 and 3
2025-12-11 15:49:50 -05:00
Ivan Butygin
c22d82a1d4 [mlir][amdgpu] Move GPU memory spaces conversion to single place (#171876) 2025-12-11 21:39:57 +03:00
Eric Feng
00b3a18550 [mlir][rocdl] add gfx950 smfmac instructions to rocdl dialect (#171737)
Signed-off-by: Eric Feng <Eric.Feng@amd.com>
2025-12-11 09:05:36 -08:00
Arun Thangamani
b0f1f77cfe [mlir][x86vector] Sink Vector.transfer_reads and vector.load before the consumer (#169333)
Adds a pattern that sinks vector producer ops (`vector.load` and
`vector.transfer_read`) forward in a block to their earliest legal use,
reducing live ranges and improving scheduling opportunities.

**The lowering pattern**: `batch_reduce.matmul` (input) ->
register-tiling(M, N) -> Vectorization (to `vector.contract`) ->
`unroll` vector.contract (`unit` dims) -> `hoisting` transformation
(move `C` loads/store outside batch/k loop) -> **sink vector producers**
-> apply `licm`, `canonicalization`, and `bufferize` ->
`vector.contract` to `fma` -> **sink vector producers**.
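
The sinking idea can be illustrated on a toy block. This is only a sketch under strong assumptions: ops are modelled as hypothetical `Op` tuples, each producer has a single result, and memory side effects (which the real pattern must respect for the move to be legal) are ignored.

```python
from typing import NamedTuple, Tuple


class Op(NamedTuple):
    result: str                # name of the single SSA result (toy model)
    name: str                  # operation name
    operands: Tuple[str, ...]  # names of the SSA values it uses


def sink_producers(block, producer_names):
    """Move each producer op to just before its first user in the block."""
    ops = list(block)
    for op in block:
        if op.name not in producer_names:
            continue
        pos = ops.index(op)
        first_use = next(
            (i for i in range(pos + 1, len(ops)) if op.result in ops[i].operands),
            None,
        )
        if first_use is None:
            continue  # no user in this block: leave the op where it is
        ops.insert(first_use, op)  # place a copy right before the first user
        del ops[pos]               # drop the original (pos < first_use)
    return ops


block = [
    Op("a", "vector.load", ()),
    Op("b", "vector.load", ()),
    Op("c", "arith.addf", ("x", "x")),
    Op("d", "arith.mulf", ("a", "c")),
    Op("e", "arith.addf", ("b", "d")),
]
print([op.result for op in sink_producers(block, {"vector.load", "vector.transfer_read"})])
# ['c', 'a', 'd', 'b', 'e']
```

Each load now sits directly in front of its consumer, so its value is live for a shorter stretch of the block.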
2025-12-11 16:59:55 +00:00
Luke Hutton
9f8e0f6606 [mlir][tosa] Add clamp op support to TosaNarrowI64ToI32 pass (#169308)
This commit allows the narrowing of `tosa.clamp` when the min/max
attributes are within the int32 range.
2025-12-11 15:51:36 +00:00
Andrzej Warzyński
8512c074c8 [mlir][vector] Remove hooks deprecated pre Release/22 branch (#171829)
As mentioned on Discourse,
  * https://discourse.llvm.org/t/psa-vector-standardise-operand-naming

I am removing the deprecated Vector hooks near the creation of the
release/22 branch. These hooks were introduced in #158258 (~September
'25, ~3 months ago), so I assume folks have had enough time to
transition away.
2025-12-11 15:22:36 +00:00
Erick Ochoa Lopez
87345d2ad4 [mlir][amdgpu] Add type conversion to populate method (NFC) (#171708)
* Renames populateAMDGPUMemorySpaceAttributeConversions to
populateAMDGPUTypeAndAttributeConversions.
* Adds TDMBaseType conversion to
populateAMDGPUTypeAndAttributeConversions.
2025-12-11 08:44:19 -05:00
Mehdi Amini
4ea7488c27 [MLIR] Apply clang-tidy fixes for readability-identifier-naming in MLIRServer.cpp (NFC) 2025-12-11 04:37:47 -08:00
Jacques Pienaar
6b7b0ab530 Enable pass instrumentation to signal failures. (#163126)
Enables adding instrumentation to pass manager that can track/flag
invariants. This would be useful for cases where one has tighter
requirements than the general dialects impose, or tighter requirements
for one phase of conversion than elsewhere.

I believe it would also enable making verify just a regular
instrumentation, but that is a non-goal, as verification is a first
class concept and the baseline for the ops and passes.

Would have enabled some of the requirements of
https://discourse.llvm.org/t/pre-verification-logic-before-running-conversion-pass-in-mlir/88318/10
.
2025-12-11 14:26:10 +02:00
Mehdi Amini
0f2f9e1c80 [MLIR] Apply clang-tidy fixes for performance-unnecessary-copy-initialization in DropUnitDims.cpp (NFC) 2025-12-11 04:19:46 -08:00
Mehdi Amini
3558537e28 [MLIR] Apply clang-tidy fixes for readability-identifier-naming in TosaOps.cpp (NFC) 2025-12-11 04:15:09 -08:00
Mehdi Amini
bb40d94721 [MLIR] Apply clang-tidy fixes for llvm-else-after-return in ElementwiseOpFusion.cpp (NFC) 2025-12-11 04:15:09 -08:00
Men-cotton
06aecdbebe [MLIR][SCF] Verify number of regions in scf.reduce (#171450)
This patch adds `ReduceOp::verifyRegions` to ensure that the number of
reduction regions equals the number of operands (`getReductions().size()
== getOperands().size()`).

Additionally, `ParallelOp::verify` is updated to gracefully handle cases
where the number of reduce operands differs from the initial values,
preventing verification logic crashes and relying on `ReduceOp` to
report structural inconsistencies.

Fixes: #118768
2025-12-11 12:39:39 +01:00
Durgadoss R
8af88a45ca [MLIR][NVVM] Update PMEvent lowering to intrinsics (#171649)
The patch updates the lowering of `id` based pmevent
also to intrinsics. The mask is simply (1 << event-id).
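
For illustration, the id-to-mask mapping in Python. The function name is hypothetical, and the 0..15 range check is an assumption based on the mask form being 16 bits wide.

```python
def pmevent_mask(event_id: int) -> int:
    """Return the one-hot mask for a single event id, i.e. 1 << event-id."""
    if not 0 <= event_id <= 15:  # assumed range: one bit per event in a 16-bit mask
        raise ValueError(f"event id out of range: {event_id}")
    return 1 << event_id


print(hex(pmevent_mask(0)))   # 0x1
print(hex(pmevent_mask(5)))   # 0x20
print(hex(pmevent_mask(15)))  # 0x8000
```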

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-12-11 16:28:12 +05:30
Kunwar Grover
f8d1f53bb6 [mlir][scf] Add value bound for computed upper bound of forall loop (#171158)
Add additional bound for the induction variable of the scf.forall such
that:
%iv <= %lower_bound + (%trip_count - 1) * step

Same as https://github.com/llvm/llvm-project/pull/126426 but for
scf.forall loop
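
The bound can be sanity-checked numerically. The sketch below assumes a positive step, an exclusive upper bound, and a trip count of ceildiv(ub - lb, step); the function names are illustrative.

```python
def trip_count(lb: int, ub: int, step: int) -> int:
    """Number of iterations of a loop from lb to ub (exclusive) with a positive step."""
    return max(0, -(-(ub - lb) // step))  # ceiling division


def iv_upper_bound(lb: int, ub: int, step: int) -> int:
    """Largest value the induction variable can take: lb + (trip_count - 1) * step."""
    return lb + (trip_count(lb, ub, step) - 1) * step


lb, ub, step = 0, 10, 3
ivs = list(range(lb, ub, step))
print(ivs)                           # [0, 3, 6, 9]
print(iv_upper_bound(lb, ub, step))  # 9
```

The computed bound (9) is tighter than the generic `%iv < %ub` bound (10) whenever the step does not evenly divide the range.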
2025-12-11 10:49:13 +00:00
Abhishek Varma
39a723edbb [Linalg] Add *Conv2D* matchers (#168362)
-- This commit is the fourth in the series of adding matchers
for linalg.*conv*/*pool*. Refer:
https://github.com/llvm/llvm-project/pull/163724
-- In this commit all variants of Conv2D convolution ops have been
   added.
-- It also refactors the way these matchers work to make adding more
matchers concise.

---------

Signed-off-by: Abhishek Varma <abhvarma@amd.com>
Signed-off-by: hanhanW <hanhan0912@gmail.com>
Co-authored-by: hanhanW <hanhan0912@gmail.com>
2025-12-10 23:39:14 -08:00
Razvan Lupusoru
575d6892bc [mlir][acc] Introduce acc loop tiling pass (#171692)
This pass implements the OpenACC loop tiling transformation for acc.loop
operations that have the tile clause (OpenACC 3.4 spec, section 2.9.8).

The tile clause specifies that the iterations of the associated loops
should be divided into tiles (rectangular blocks). The pass transforms a
single or nested acc.loop with tile clauses into a structure of "tile
loops" (iterating over tiles) containing "element loops" (iterating
within tiles).

For example, tiling a 2-level nested loop with tile(T1, T2):
```
  // Before tiling:
  acc.loop tile(T1, T2) control(%i, %j) = ...

  // After tiling:
  acc.loop control(%i) step (s1*T1) {        // tile loop 1
    acc.loop control(%j) step (s2*T2) {      // tile loop 2
      acc.loop control(%ii) = (%i) to (min(ub1, %i+s1*T1)) {
        acc.loop control(%jj) = (%j) to (min(ub2, %j+s2*T2)) {
          // loop body using %ii, %jj
        }
      }
    }
  }
```
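
The tile-loop / element-loop structure can be modelled in plain Python to check that tiling visits exactly the original iterations. This is a sketch, not the pass: it assumes exclusive upper bounds, positive steps `s1`/`s2`, and constant tile sizes `t1`/`t2`, and all names are illustrative.

```python
def original_iterations(lb1, ub1, s1, lb2, ub2, s2):
    """Iteration space of the untiled 2-level loop nest."""
    return [(i, j) for i in range(lb1, ub1, s1) for j in range(lb2, ub2, s2)]


def tiled_iterations(lb1, ub1, s1, t1, lb2, ub2, s2, t2):
    """Iteration space after tiling with tile(t1, t2), in tiled visiting order."""
    visited = []
    for i in range(lb1, ub1, s1 * t1):                           # tile loop 1
        for j in range(lb2, ub2, s2 * t2):                       # tile loop 2
            for ii in range(i, min(ub1, i + s1 * t1), s1):       # element loop 1
                for jj in range(j, min(ub2, j + s2 * t2), s2):   # element loop 2
                    visited.append((ii, jj))
    return visited


orig = original_iterations(0, 10, 1, 0, 7, 2)
tiled = tiled_iterations(0, 10, 1, 4, 0, 7, 2, 2)
print(sorted(tiled) == orig)  # True: same iterations, different order
print(tiled[:4])              # [(0, 0), (0, 2), (1, 0), (1, 2)]
```

The `min(ub, ...)` clamps handle the partial tiles at the edges when the tile size does not divide the trip count.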

Key features:
- Handles constant tile sizes and wildcard tile sizes ('*') which use a
configurable default tile size
- Properly handles collapsed loops with tile counts exceeding collapse
count by uncollapsing loops before tiling
- Distributes gang/worker/vector attributes appropriately: gang -> tile
loops, vector -> element loops
- Validates that tile size types are not wider than loop IV types
- Emits optimization remarks for tiling decisions

Three test files are added:
- acc-loop-tiling.mlir: Tests single and nested loop tiling with
constant tile sizes, unknown tile sizes (*), and loops with collapse
attributes
- acc-loop-tiling-invalid.mlir: Tests error diagnostic when tile size
type is wider than the loop IV type
- acc-loop-tiling-remarks.mlir: Tests optimization remarks emitted for
tiling decisions including default tile size selection

Co-authored-by: Vijay Kandiah <vkandiah@nvidia.com>
2025-12-10 14:18:28 -08:00
Mehdi Amini
c1fd5ac50b [MLIR] Apply clang-tidy fixes for readability-simplify-boolean-expr in NVVMDialect.cpp (NFC) 2025-12-10 10:23:08 -08:00
Charitha Saumya
3ece6626cb [mlir][xegpu] Add support for vector.extract_strided_slice XeGPU SIMT distribution with partial offsets. (#171512)
`vector.extract_strided_slice` can have two forms when specifying
offsets.

Case 1:
```
%1 = vector.extract_strided_slice %0 { offsets = [8, 0], sizes = [8, 16], strides = [1, 1]}
      : vector<24x16xf32> to vector<8x16xf32>
```

Case 2:
```
%1 = vector.extract_strided_slice %0 { offsets = [8], sizes = [8], strides = [1]}
      : vector<24x16xf32> to vector<8x16xf32>
```

These two ops mean the same thing, but case 2 is syntactic sugar to
avoid specifying offsets for fully extracted dims. Currently case 2
fails in XeGPU SIMT distribution. This PR fixes this issue.
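
The equivalence of the two forms can be shown by expanding the partial form to full rank. This is a minimal sketch with a hypothetical helper; it does not reproduce how the XeGPU distribution pattern handles the op.

```python
def normalize_slice(source_shape, offsets, sizes, strides):
    """Expand partial offsets/sizes/strides so they cover every source dimension.

    Trailing dimensions that were left out are extracted in full:
    offset 0, size equal to the source dimension, stride 1.
    """
    rank, k = len(source_shape), len(offsets)
    full_offsets = list(offsets) + [0] * (rank - k)
    full_sizes = list(sizes) + list(source_shape[k:])
    full_strides = list(strides) + [1] * (rank - k)
    return full_offsets, full_sizes, full_strides


# Case 2 (partial) expands to exactly the attributes written out in case 1.
print(normalize_slice([24, 16], [8], [8], [1]))
# ([8, 0], [8, 16], [1, 1])
```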
2025-12-10 09:53:56 -08:00
Mehdi Amini
f5c28bdaa6 [MLIR] Apply clang-tidy fixes for misc-use-internal-linkage in ViewOpGraph.cpp (NFC) 2025-12-10 09:45:24 -08:00
Jianhui Li
8adcf0ad5a [MLIR][XeGPU] Support subview memref: handling the base address during xegpu to xevm type conversion (#170541)
During the XeGPU-to-XeVM type conversion, a memref is lowered to its
base address. This PR extends the conversion to correctly handle memrefs
that include an offset, such as those generated by memref.subview.
2025-12-10 08:53:18 -08:00
Valentin Clement (バレンタイン クレメン)
bf81bdec66 [mlir][acc] Add isValidValueUse to OpenACCSupport (#171538)
Add a new API `isValidValueUse` to OpenACCSupport. This is used in
ACCImplicitData to check for values that are already legal in the OpenACC
region and do not require an implicit clause to be generated. An example
would be a CUDA Fortran device variable that is already on the GPU.
2025-12-10 08:19:04 -08:00
Ravil Dorozhinskii
fec0a64dae [ROCDL] Added global/flat data prefetch ops (#171449)
This PR brings data prefetch ops to ROCDL for gfx1250 architecture.
Extended all necessary rocdl tests.
2025-12-10 09:57:26 -05:00
Ivan Butygin
c9c4e6eb58 Reland [mlir][amdgpu] Add common gpu mem space conversions to convert-amdgpu-to-rocdl (#171599)
Reland https://github.com/llvm/llvm-project/pull/171543

Added missing GPU lib `MLIRGPUToGPURuntimeTransforms`.
2025-12-10 17:33:51 +03:00
Jacques Pienaar
1d0d7da57c [mlir] Add symbol user attribute interface. (#153206)
Enables verification of attributes that reference symbols, independent of the op.
That is, an Attribute with symbol uses can be verified on its own: its validity
belongs to the Attribute rather than to the operation it is attached to.

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2025-12-10 14:13:33 +00:00