Reland of #171544 due to fixup for integration test.
Friendlier wrapper for `transform.foreach`.
To facilitate that friendliness, makes it so that `OpResult.owner`
returns the relevant `OpView` instead of `Operation`. For good measure,
also changes `Value.owner` to return `OpView` instead of `Operation`,
thereby ensuring consistency. That is, makes it so that all
op-returning `.owner` accessors return `OpView` (and thereby give access
to all goodies available on registered `OpView`s).
While figuring out how to perform an atomic exchange on a memref, I
tried the generic atomic rmw with the yielded value captured from the
enclosing scope (instead of a plain atomic_rmw with
`arith::AtomicRMWKind::assign`). Instead of segfaulting, this PR changes
the pass to produce an error when the result is not found in the
region's IR map.
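A minimal sketch of the kind of IR that triggers the new error (hypothetical function and value names; the key point is that the yielded value is defined outside the atomic region):
```
func.func @atomic_exchange(%buf: memref<8xf32>, %i: index, %x: f32) -> f32 {
  %old = memref.generic_atomic_rmw %buf[%i] : memref<8xf32> {
  ^bb0(%current: f32):
    // %x is captured from the enclosing scope instead of being computed
    // from %current, so the lowering does not find it in the region's IR
    // map; it now reports an error here instead of segfaulting.
    memref.atomic_yield %x : f32
  }
  return %old : f32
}
```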
It might be more useful to give a suggestion to the user, but giving an
error message instead of a crash is at least an improvement, I think.
See: #172184
If we do not collect the diagnostics from the
CollectDiagnosticsToStringScope, even when the named_sequence applied
successfully, the Scope object's destructor will assert (with an
unhelpful message).
Enable `FailOnUnsupportedFP` for `ConvertToLLVMPattern` and set it to
`true` for all `math-to-llvm` patterns. This fixes various invalid
lowerings of `math` ops on `fp8`/`fp4` types.
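For illustration, one op affected by this (a small sketch; any `math` op on an 8-bit float type behaves similarly):
```
func.func @exp_fp8(%arg0: f8E4M3FN) -> f8E4M3FN {
  // With FailOnUnsupportedFP set, math-to-llvm now fails to lower this op
  // instead of emitting an invalid lowering for the fp8 type.
  %0 = math.exp %arg0 : f8E4M3FN
  return %0 : f8E4M3FN
}
```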
Since `FatRawBufferCastOp` preserves the shape of its source operand,
the result dimensions can be reified by querying the source's
dimensions.
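A rough sketch of what this enables (the assembly of the cast op is written from memory and may differ in detail; the point is that the result's dynamic dimension can be answered by querying %src):
```
func.func @dim_of_cast(%src: memref<?xf32>) -> index {
  %c0 = arith.constant 0 : index
  %cast = amdgpu.fat_raw_buffer_cast %src
      : memref<?xf32> to memref<?xf32, #amdgpu.address_space<fat_raw_buffer>>
  // Since the cast preserves the shape of %src, this can now be reified
  // as memref.dim %src, %c0.
  %dim = memref.dim %cast, %c0 : memref<?xf32, #amdgpu.address_space<fat_raw_buffer>>
  return %dim : index
}
```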
---------
Signed-off-by: Yu-Zhewen <zhewenyu@amd.com>
This patch fixes the lowering of the newly
added mbarrier.arrive Op w.r.t. its return value.
(Follow-up of PR #170545)
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
Added `ds.atomic.barrier.arrive.rtn.b64` and
`ds.atomic.async.barrier.arrive.b64` to ROCDL. These are part of the
LDS memory barrier concept in GFX1250. Also added alias analysis to
`global/flat` data prefetch ops. Extended rocdl tests.
To support the `-fvisibility=...` option in Flang, we need a pass to
rewrite all the global definitions in the LLVM dialect that have the
default visibility to have the specified visibility. This change adds
such a pass.
Note that I did not add an option for `visibility=default`; I believe
this makes sense for compiler drivers since users may want to tack an
option on at the end of a compile line to override earlier options, but
I don't think it makes sense for this pass to accept
`visibility=default`--it would just be an early exit IIUC.
This change adds dense and sparse MMA with block scaling intrinsics to
MLIR -> NVVM IR -> NVPTX flow. NVVM and NVPTX implementation is based on
PTX ISA 9.0.
This patch changes the transfer_write -> transfer_read load store
forwarding canonicalization pattern to work based on permutation maps
and less on ad hoc logic. The old logic couldn't canonicalize a simple
unit dim broadcast through transfer_write/transfer_read which is added
as a test in this patch.
This patch also documents in more detail what would be needed to support
cases that are not yet implemented.
The patch is motivated by the Tosa conformance test negate_32x45x49_i16_full failure.
The TosaToLinalg pass has an optimization that converts a Tosa Negate into a Sub when the zero points are zero. However, when the input value is the minimum negative number, this transformation overflows (e.g., for i16, negating -32768 yields 32768, which is not representable in i16). With the transformation removed, the zp = 0 case goes through the usual type promotion, which avoids the overflow.
The promoted type can range from int32 up to int48. The TOSA negate specification does not mention support for int48. Should we consider removing the promotion to int48 to stay aligned with the TOSA spec?
This bug was introduced by #108323, where the loc and ip were not
properly set. It may lead to errors when the operations are not linearly
inserted into the IR.
This PR adds unrolling for vector.constant_mask op based on the
targetShape. Each unrolled vector computes its local mask size in each
dimension (d) as:
min(max(originalMaskSize[d] - offset[d], 0), unrolledMaskSize[d]).
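A small worked example of that formula (hypothetical shapes, assuming a 1-D mask of size 8 unrolled with targetShape = [4]):
```
func.func @unroll_example() -> vector<8xi1> {
  // Original mask: first 3 lanes set.
  %mask = vector.constant_mask [3] : vector<8xi1>
  // Unrolled into two vector<4xi1> pieces:
  //   offset [0]: min(max(3 - 0, 0), 4) = 3  ->  vector.constant_mask [3] : vector<4xi1>
  //   offset [4]: min(max(3 - 4, 0), 4) = 0  ->  vector.constant_mask [0] : vector<4xi1>
  return %mask : vector<8xi1>
}
```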
* changes workgroup mask's type from i16 to vector<16xi1>
* changes pad_amount and pad_interval from Index to I32
* adds lit tests for padEnable, iteration and dynamic cases
* adds TODO for a future instrumentation pass to validate inputs
* adds descriptor groups 2 and 3
As mentioned on Discourse,
* https://discourse.llvm.org/t/psa-vector-standardise-operand-naming
I am removing the deprecated Vector hooks near the creation of the
release/22 branch. These hooks were introduced in #158258 (~September
'25, ~3 months ago), so I assume folks have had enough time to transition
away.
Enables adding instrumentation to the pass manager that can track/flag
invariants. This would be useful for cases where one has tighter
requirements than the general dialects impose, or for a phase of
conversion that has stricter invariants than elsewhere.
I believe it would also enable making verification just a regular
instrumentation, but that is a non-goal here, as verification is a
first-class concept and baseline for the ops and passes.
This would have enabled some of the requirements of
https://discourse.llvm.org/t/pre-verification-logic-before-running-conversion-pass-in-mlir/88318/10.
This patch adds `ReduceOp::verifyRegions` to ensure that the number of
reduction regions equals the number of operands (`getReductions().size()
== getOperands().size()`).
Additionally, `ParallelOp::verify` is updated to gracefully handle cases
where the number of reduce operands differs from the initial values,
preventing verification logic crashes and relying on `ReduceOp` to
report structural inconsistencies.
Fixes: #118768
The patch updates the lowering of the `id`-based pmevent to intrinsics
as well. The mask is simply (1 << event-id); e.g., event id 3 yields the
mask 0x8.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
Add an additional bound for the induction variable of the scf.forall such
that:
%iv <= %lower_bound + (%trip_count - 1) * %step
Same as https://github.com/llvm/llvm-project/pull/126426 but for
scf.forall loop
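A small worked example of the extra constraint (hypothetical bounds):
```
func.func @forall_bound_example() {
  // Trip count = ceildiv(10 - 0, 4) = 3, so in addition to 0 <= %i < 10 the
  // analysis can now also derive %i <= 0 + (3 - 1) * 4 = 8.
  scf.forall (%i) = (0) to (10) step (4) {
  }
  return
}
```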
-- This commit is the fourth in the series of adding matchers
for linalg.*conv*/*pool*. Refer:
https://github.com/llvm/llvm-project/pull/163724
-- In this commit all variants of Conv2D convolution ops have been
added.
-- It also refactors the way these matchers work to make adding more
matchers concise.
Signed-off-by: Abhishek Varma <abhvarma@amd.com>
---------
Signed-off-by: Abhishek Varma <abhvarma@amd.com>
Signed-off-by: hanhanW <hanhan0912@gmail.com>
Co-authored-by: hanhanW <hanhan0912@gmail.com>
This pass implements the OpenACC loop tiling transformation for acc.loop
operations that have the tile clause (OpenACC 3.4 spec, section 2.9.8).
The tile clause specifies that the iterations of the associated loops
should be divided into tiles (rectangular blocks). The pass transforms a
single or nested acc.loop with tile clauses into a structure of "tile
loops" (iterating over tiles) containing "element loops" (iterating
within tiles).
For example, tiling a 2-level nested loop with tile(T1, T2):
```
// Before tiling:
acc.loop tile(T1, T2) control(%i, %j) = ...
// After tiling:
acc.loop control(%i) step (s1*T1) {                          // tile loop 1
  acc.loop control(%j) step (s2*T2) {                        // tile loop 2
    acc.loop control(%ii) = (%i) to (min(ub1, %i+s1*T1)) {   // element loop 1
      acc.loop control(%jj) = (%j) to (min(ub2, %j+s2*T2)) { // element loop 2
        // loop body using %ii, %jj
      }
    }
  }
}
```
Key features:
- Handles constant tile sizes and wildcard tile sizes ('*') which use a
configurable default tile size
- Properly handles collapsed loops with tile counts exceeding collapse
count by uncollapsing loops before tiling
- Distributes gang/worker/vector attributes appropriately: gang -> tile
loops, vector -> element loops
- Validates that tile size types are not wider than loop IV types
- Emits optimization remarks for tiling decisions
Three test files are added:
- acc-loop-tiling.mlir: Tests single and nested loop tiling with
constant tile sizes, unknown tile sizes (*), and loops with collapse
attributes
- acc-loop-tiling-invalid.mlir: Tests error diagnostic when tile size
type is wider than the loop IV type
- acc-loop-tiling-remarks.mlir: Tests optimization remarks emitted for
tiling decisions including default tile size selection
Co-authored-by: Vijay Kandiah <vkandiah@nvidia.com>
`vector.extract_strided_slice` can have two forms when specifying
offsets.
Case 1:
```
%1 = vector.extract_strided_slice %0 { offsets = [8, 0], sizes = [8, 16], strides = [1, 1]}
: vector<24x16xf32> to vector<8x16xf32>
```
Case 2:
```
%1 = vector.extract_strided_slice %0 { offsets = [8], sizes = [8], strides = [1]}
: vector<24x16xf32> to vector<8x16xf32>
```
These two ops mean the same thing, but case 2 is syntactic sugar to
avoid specifying offsets for fully extracted dims. Currently case 2
fails in XeGPU SIMT distribution. This PR fixes this issue.
During the XeGPU-to-XeVM type conversion, a memref is lowered to its
base address. This PR extends the conversion to correctly handle memrefs
that include an offset, such as those generated by memref.subview.
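For illustration, the kind of memref this covers (a sketch; the subview result carries a static offset that the conversion now folds into the lowered base address):
```
func.func @with_offset(%src: memref<8x8xf32>) -> memref<4x8xf32, strided<[8, 1], offset: 16>> {
  // The subview starts at row 2, so the result type carries offset 2 * 8 = 16.
  %sub = memref.subview %src[2, 0] [4, 8] [1, 1]
      : memref<8x8xf32> to memref<4x8xf32, strided<[8, 1], offset: 16>>
  return %sub : memref<4x8xf32, strided<[8, 1], offset: 16>>
}
```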
Add a new API `isValidValueUse` to OpenACCSupport. This is used in
ACCImplicitData to check for values that are already legal in the OpenACC
region and do not require an implicit clause to be generated. An example
would be a CUDA Fortran device variable that is already on the GPU.
Enables verification of attributes that reference symbols, independent of the
op they are attached to (i.e., the validity is a property of the Attribute
itself, independent of the operation).
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>