Commit Graph

239 Commits

Longsheng Mou
8f9fc6ce47 [mlir][GPU] Add FunctionOpInterface check for OpToFuncCallLowering (#113449)
This PR adds a `FunctionOpInterface` check in `OpToFuncCallLowering` to
resolve a crash when the op is not nested inside a function. Fixes #113334.
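
A minimal sketch of the kind of guard described here; the helper name and
surrounding boilerplate are illustrative assumptions, not the exact upstream code:

```
#include "mlir/IR/PatternMatch.h"
#include "mlir/Interfaces/FunctionInterfaces.h"

using namespace mlir;

// Sketch only: bail out of the rewrite when the op is not nested inside a
// function-like op, instead of crashing on a null parent.
static LogicalResult matchAndRewriteSketch(Operation *op,
                                           PatternRewriter &rewriter) {
  // OpToFuncCallLowering needs an enclosing function to emit the call into.
  auto parentFunc = op->getParentOfType<FunctionOpInterface>();
  if (!parentFunc)
    return rewriter.notifyMatchFailure(op, "expected op inside a function");
  // ... proceed with the op-to-func-call lowering ...
  return success();
}
```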
2024-10-26 11:22:08 +08:00
Matthias Springer
206fad0e21 [mlir][NFC] Mark type converter in populate... functions as const (#111250)
This commit marks the type converter in `populate...` functions as
`const`. This is useful for debugging.

Patterns already take a `const` type converter. However, some
`populate...` functions not only add new patterns but also register
additional type conversion rules. That makes it difficult to find the
place where a type conversion was added in the code base. With this
change, all `populate...` functions that only populate patterns now take
a `const` type converter. Programmers can then conclude from the
function signature that these functions do not register any new type
conversion rules.

Also some minor cleanups around the 1:N dialect conversion
infrastructure, which did not always pass the type converter as a
`const` object internally.
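
A hypothetical `populate...` function illustrating the convention; the function
and pattern names below are made up for illustration:

```
#include "mlir/Conversion/LLVMCommon/TypeConverter.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Because this function only adds patterns (and registers no new type
// conversion rules), it can take the converter as `const`.
void populateMyDialectToLLVMConversionPatterns(
    const LLVMTypeConverter &converter, RewritePatternSet &patterns) {
  // `MyOpLowering` stands in for a ConversionPattern defined elsewhere;
  // conversion patterns already accept a const type converter.
  // patterns.add<MyOpLowering>(converter);
}
```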
2024-10-05 21:32:40 +02:00
Matthias Springer
2da417e7f6 [mlir][GPU] gpu.printf: Do not emit duplicate format strings (#110504)
Even if the same format string is used multiple times, emit just one
`LLVM::GlobalOp`.
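
A hedged sketch of the deduplication idea: look up an existing global by symbol
name before creating a new one. The helper name and builder details are
assumptions for illustration, not the upstream pattern:

```
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/IR/Builders.h"
#include "mlir/IR/BuiltinOps.h"

using namespace mlir;

// Reuse an existing format-string global if one with the same name exists,
// otherwise create it once at module level.
static LLVM::GlobalOp getOrCreateFormatString(OpBuilder &builder, Location loc,
                                              ModuleOp module,
                                              StringRef symName, StringRef fmt,
                                              Type i8Ty) {
  if (auto existing = module.lookupSymbol<LLVM::GlobalOp>(symName))
    return existing;
  OpBuilder::InsertionGuard guard(builder);
  builder.setInsertionPointToStart(module.getBody());
  auto arrayTy = LLVM::LLVMArrayType::get(i8Ty, fmt.size());
  return builder.create<LLVM::GlobalOp>(loc, arrayTy, /*isConstant=*/true,
                                        LLVM::Linkage::Internal, symName,
                                        builder.getStringAttr(fmt));
}
```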
2024-10-01 09:12:08 +02:00
Matthias Springer
49df12c01e [mlir][NFC] Minor cleanup around ModuleOp usage (#110498)
Use `moduleOp.getBody()` instead of `moduleOp.getBodyRegion().front()`.
2024-09-30 21:20:48 +02:00
Daniel Hernandez-Juarez
1c47fa9b62 [mlir][AMDGPU] Add support for AMD f16 math library calls (#108809)
In this PR we add support for AMD f16 math library calls
(`__ocml_*_f16`).

CC: @krzysz00 @manupak
2024-09-23 12:52:00 -05:00
Krzysztof Drewniak
90a0be9482 [mlir][LLVM] Refactor how range() annotations are handled for ROCDL intrinsics (#107658)
This commit introduces a ConstantRange attribute to match the
ConstantRange attribute type present in LLVM IR.

It then refactors the LLVM_IntrOpBase so that the basic part of the
intrinsic builder code can be re-used without needing to copy it or
get rid of important context. This, along with adding code for
handling an optional `range` attribute to that same base, allows us to
make the support for range() annotations generic without adding
another bit to IntrOpBase.

This commit then updates the lowering of index intrinsic operations to
use the new ConstantRange attribute and fixes a bug (where we'd be
subtracting 1 from upper bounds instead of adding it on operations
like gpu.block_dim) along the way.

The point of these changes is to enable these range annotations to be
used for the corresponding NVVM operations in a future commit.
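
As a concrete illustration of the bound fix: LLVM range annotations are
half-open, `[lower, upper)`, so an op whose value can be at most 1024 needs
`upper = 1024 + 1`, not `1024 - 1`. A tiny sketch of that conversion
(illustrative struct and helper, not upstream code):

```
#include <cstdint>
#include <utility>

struct InclusiveRange {
  uint64_t min; // smallest possible value (e.g. 1 for gpu.block_dim)
  uint64_t max; // largest possible value (e.g. 1024 for gpu.block_dim)
};

// Returns {lower, upper} suitable for a half-open [lower, upper) annotation.
static std::pair<uint64_t, uint64_t> toHalfOpen(InclusiveRange r) {
  return {r.min, r.max + 1}; // 1024 stays representable: [1, 1025)
}
```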
2024-09-12 09:46:42 -05:00
Victor Perez
75cb9edf09 [MLIR][GPU-LLVM] Add GPU to LLVM-SPV address space mapping (#102621)
Implement the following mapping:

- `global`: 1
- `workgroup`: 3
- `private`: 0

Add `addressSpaceToStorageClass`, which maps GPU address spaces to
SPIR-V storage classes, so that SPIR-V's existing
`storageClassToAddressSpace` (mapping SPIR-V storage classes to LLVM
address spaces) yields the mapping above *by definition*.
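
For illustration, a sketch of the resulting end-to-end mapping in C++. The
helper name and the direct numeric mapping are assumptions; the actual patch
goes through SPIR-V storage classes as described above:

```
#include "mlir/Dialect/GPU/IR/GPUDialect.h"
#include "llvm/Support/ErrorHandling.h"

using namespace mlir;

// GPU address space -> LLVM-SPV address-space number:
// global -> 1, workgroup -> 3, private -> 0.
static unsigned mapGpuAddressSpaceToLLVMSPV(gpu::AddressSpace space) {
  switch (space) {
  case gpu::AddressSpace::Global:
    return 1;
  case gpu::AddressSpace::Workgroup:
    return 3;
  case gpu::AddressSpace::Private:
    return 0;
  }
  llvm_unreachable("unknown GPU address space");
}
```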

---------

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
2024-08-16 11:18:35 +02:00
Victor Perez
d45de8003a [MLIR][GPU-LLVM] Convert gpu.func to llvm.func (#101664)
Add support in `-convert-gpu-to-llvm-spv` to convert `gpu.func` to
`llvm.func` operations.

- `spir_kernel`/`spir_func` calling conventions used for
kernels/functions.
- `workgroup` attributions encoded as additional `llvm.ptr<3>`
arguments.
- No attribute is used to annotate kernels.
- `reqd_work_group_size` attribute used to encode
`gpu.known_block_size`.
- `llvm.mlir.workgroup_attrib_size` used to encode workgroup attribution
sizes. This is attached to the pointer arguments that workgroup
attributions lower to.

**Note**: A notable missing feature that will be addressed in a
follow-up PR is a `-use-bare-ptr-memref-call-conv` option to replace
MemRef arguments with bare pointers to the MemRef element types instead
of the current MemRef descriptor approach.

---------

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
2024-08-09 16:09:11 +02:00
Emilio Cota
4d51e83728 [mlir] fixes for f6431f0c52 2024-07-25 22:18:05 -04:00
runseny
f6431f0c52 [MLIR][GPUToNVVM] support fastMath and other non-supported mathOp (#99890)
Support fastMath and other previously unsupported math ops that only
require float operands by calling the corresponding libdevice functions
directly from NVVM.

1. Lower math ops carrying a fastMath attribute to the correct libdevice
intrinsics.
2. Some ops in the math dialect were already lowered to libdevice, but
not all of them; this PR lowers all the remaining math ops that only
require float operands.
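
A hedged sketch of how the libdevice symbol could be chosen based on the
fastmath flag. `__nv_expf`/`__nv_fast_expf` and `__nv_sinf`/`__nv_fast_sinf`
are real libdevice symbols; the helper itself is illustrative, not the
upstream lowering:

```
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/StringSwitch.h"

// Map a math op name to a libdevice (`__nv_*`) call, preferring the fast
// variant when fastmath allows approximation. Only a couple of ops shown.
static llvm::StringRef chooseLibdeviceCall(llvm::StringRef opName, bool fast) {
  return llvm::StringSwitch<llvm::StringRef>(opName)
      .Case("math.exp", fast ? "__nv_fast_expf" : "__nv_expf")
      .Case("math.sin", fast ? "__nv_fast_sinf" : "__nv_sinf")
      .Default("");
}
```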
2024-07-25 13:54:58 +02:00
Matthias Springer
9e8ccf6b64 [mlir][Conversion] FuncToLLVM: Simplify bare-pointer handling (#96393)
Before this commit, there used to be a workaround in the
`func.func`/`gpu.func` op lowering when the bare-pointer calling
convention is enabled. This workaround "patched up" the argument
materializations for memref arguments. This can be done directly in the
argument materialization functions (as the TODOs in the code base
indicate).

This commit effectively reverts back to the old implementation
(a664c14001) and adds additional checks to
make sure that bare pointers are used only for function entry block
arguments.
2024-06-24 08:38:26 +02:00
Matthias Springer
3f33d2f3ca [mlir][GPUToNVVM] Fix memref function args/results (#96392)
The `gpu.func` op lowering accounts for memref arguments/results (both
"normal" and bare-pointer supported), but the `gpu.return` op lowering
did not. The lowering produced invalid IR that did not verify.

This commit uses the same lowering strategy as for `func.return` in the
`gpu.return` lowering. (The C++ implementation is copied. We may want to
share some code between `func` and `gpu` lowerings in the future.)
2024-06-23 09:51:12 +02:00
Krzysztof Drewniak
43fd4c49bd [mlir][GPU] Improve handling of GPU bounds (#95166)
This change reworks how range information for GPU dispatch IDs (block
IDs, thread IDs, and so on) is handled.

1. `known_block_size` and `known_grid_size` become inherent attributes
of GPU functions. This makes them less clunky to work with. As a
consequence, the `gpu.func` lowering patterns now only look at the
inherent attributes when setting target-specific attributes on the
`llvm.func` that they lower to.
2. At the same time, `gpu.known_block_size` and `gpu.known_grid_size`
are made official dialect-level discardable attributes which can be
placed on arbitrary functions. This allows for progressive lowerings
(without this, a lowering for `gpu.thread_id` couldn't know about the
bounds if it had already been moved from a `gpu.func` to an `llvm.func`)
and allows for range information to be provided even when
`gpu.*_{id,dim}` are being used outside of a `gpu.func` context.
3. All of these index operations have gained an optional `upper_bound`
attribute, allowing for an alternate mode of operation where the bounds
are specified locally and not inherited from the operation's context.
These also allow handling of cases where the precise launch sizes aren't
known, but can be bounded more precisely than the maximum of what any
platform's API allows. (I'd like to thank @benvanik for pointing out
that this could be useful.)

When inferring bounds (either for range inference or for setting `range`
during lowering) these sources of information are consulted in order of
specificity (`upper_bound` > inherent attribute > discardable attribute,
except that dimension sizes check for `known_*_bounds` to see if they
can be constant-folded before checking their `upper_bound`).

This patch also updates the documentation about the bounds and inference
behavior to clarify what these attributes do when set and the
consequences of setting them up incorrectly.
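
A hedged sketch of the precedence order described above; the helper and its
parameters are assumptions for illustration, not the inference code itself:

```
#include <cstdint>
#include <optional>

// Order of specificity when inferring an upper bound for a GPU index op:
// 1. the op's own `upper_bound` attribute,
// 2. the inherent known-size attribute on the enclosing gpu.func,
// 3. the discardable gpu.known_* attribute on an arbitrary enclosing function.
static std::optional<uint64_t>
inferUpperBound(std::optional<uint64_t> opUpperBound,
                std::optional<uint64_t> inherentKnownSize,
                std::optional<uint64_t> discardableKnownSize) {
  if (opUpperBound)
    return opUpperBound;
  if (inherentKnownSize)
    return inherentKnownSize;
  return discardableKnownSize;
}
```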

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2024-06-17 23:47:38 -05:00
Jay Foad
d4a0154902 [llvm-project] Fix typo "seperate" (#95373) 2024-06-13 20:20:27 +01:00
Fabian Mora
8e12f31be5 [mlir][gpu] Update LaunchFuncOp lowering in GPU to LLVM (#94991)
This patch updates the lowering of `LaunchFuncOp` in GPU to LLVM to only
legalize the operation with the converted operands, effectively removing
the lowering used by the old serialization pipeline.
It also removes all remaining uses of the old gpu serialization
infrastructure in `gpu-to-llvm`.

See [Compilation overview | 'gpu' Dialect - MLIR
docs](https://mlir.llvm.org/docs/Dialects/GPU/#compilation-overview) for
additional information on the target attributes compilation pipeline
that replaced the old serialization pipeline.
2024-06-10 20:22:22 -05:00
Kazu Hirata
dec8055a1e [mlir] Use StringRef::operator== instead of StringRef::equals (NFC) (#91560)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of
  10 under mlir/ in terms of their usage.

- The elimination of StringRef::equals brings StringRef closer to
  std::string_view, which has operator== but not equals.

- S == "foo" is more readable than S.equals("foo"), especially for
  !Long.Expression.equals("str") vs Long.Expression != "str".
2024-05-08 23:52:22 -07:00
Christian Sigg
a5757c5b65 Switch member calls to isa/dyn_cast/cast/... to free function calls. (#89356)
This change cleans up call sites. The next step is to mark the member
functions as deprecated.

See https://mlir.llvm.org/deprecation and
https://discourse.llvm.org/t/preferred-casting-style-going-forward.
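
For illustration, the shape of the change at a call site (a generic example,
not a specific diff from this commit):

```
#include "mlir/IR/BuiltinTypes.h"

using namespace mlir;

static bool isIndexOrInteger(Type type) {
  // Before: type.isa<IndexType>() || type.isa<IntegerType>()
  // After: free-function form, consistent with LLVM's casting utilities.
  return isa<IndexType>(type) || isa<IntegerType>(type);
}
```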
2024-04-19 15:58:27 +02:00
Jakub Kuderski
971b852546 [mlir][NFC] Simplify type checks with isa predicates (#87183)
For more context on isa predicates, see:
https://github.com/llvm/llvm-project/pull/83753.
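
A small usage sketch, assuming the `llvm::IsaPred` predicate introduced in the
referenced PR; the wrapper function is illustrative:

```
#include "llvm/ADT/STLExtras.h"
#include "llvm/Support/Casting.h"
#include "mlir/IR/BuiltinTypes.h"

using namespace mlir;

// Before: llvm::all_of(types, [](Type t) { return isa<FloatType>(t); });
// After: the isa predicate removes the lambda boilerplate.
static bool allFloat(ArrayRef<Type> types) {
  return llvm::all_of(types, llvm::IsaPred<FloatType>);
}
```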
2024-04-01 11:40:09 -04:00
Andrei Golubev
89cd345667 [mlir][LLVM] Use int32_t to indirectly construct GEPArg (#79562)
GEPArg can only be constructed from int32_t and mlir::Value. Explicitly
cast other types (e.g. unsigned, size_t) to int32_t to avoid narrowing
conversion warnings on MSVC. Recent examples include:

```
mlir\lib\Dialect\LLVMIR\Transforms\TypeConsistency.cpp: error C2398:
Element '1': conversion from 'size_t' to 'T' requires a narrowing
conversion
    with
    [
        T=mlir::LLVM::GEPArg
    ]

mlir\lib\Dialect\LLVMIR\Transforms\TypeConsistency.cpp: error C2398:
Element '1': conversion from 'unsigned int' to 'T' requires a narrowing
conversion
    with
    [
        T=mlir::LLVM::GEPArg
    ]
```
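
A hedged sketch of the fix pattern: cast explicitly to `int32_t` before
constructing a `GEPArg` so MSVC sees no narrowing conversion. The helper and
its use of a `size_t` index are illustrative:

```
#include <cstdint>
#include "llvm/ADT/SmallVector.h"
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"

using namespace mlir;

// Build GEP indices from values that arrive as size_t/unsigned: cast
// explicitly so GEPArg's int32_t constructor is selected without a
// narrowing-conversion warning.
static SmallVector<LLVM::GEPArg> makeGEPIndices(size_t fieldIndex) {
  SmallVector<LLVM::GEPArg> indices;
  indices.push_back(0);                                // already int32_t
  indices.push_back(static_cast<int32_t>(fieldIndex)); // explicit cast
  return indices;
}
```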

Co-authored-by: Nikita Kudriavtsev <nikita.kudriavtsev@intel.com>
2024-01-26 14:27:51 +01:00
Matthias Springer
5fcf907b34 [mlir][IR] Rename "update root" to "modify op" in rewriter API (#78260)
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`

The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.

Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
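
A usage sketch with the new names; the attribute set inside the callback is
just an example:

```
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Any in-place change to an op during a rewrite must be wrapped so the
// rewriter is notified -- not only changes to the pattern's root op.
static void markVisited(RewriterBase &rewriter, Operation *op) {
  rewriter.modifyOpInPlace(op, [&] {
    op->setAttr("example.visited", rewriter.getUnitAttr());
  });
}
```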
2024-01-17 11:08:59 +01:00
Guray Ozen
2aec7083ad [mlir][gpu] Use DenseI32Array for NVVM's maxntid and reqntid (NFC) (#77466) 2024-01-09 16:44:25 +01:00
Guray Ozen
763109e346 [mlir][gpu] Use known_block_size to set maxntid for NVVM target (#77301)
Setting the thread block size with `maxntid` on the kernel has great
performance benefits. With it, the downstream PTX compiler can do better
register allocation.

MLIR's `gpu.launch` and `gpu.launch_func` already have an attribute
(`known_block_size`) that records the thread block size when it is known.
This PR simply uses this attribute to set `maxntid`.
2024-01-08 14:49:19 +01:00
Krzysztof Drewniak
ddd6acd7a8 [mlir][GPU] Expand LLVM function attribute copies (#76755)
Expand the copying of attributes on GPU kernel arguments during LLVM
lowering.

Support copying attributes from values that are already LLVM pointers.

Support copying attributes, like `noundef`, that aren't specific to (the
pointer parts of) arguments.
2024-01-03 14:28:15 -06:00
Paul C Fuqua
11141bc68a Fix what seems to be a silly bug in gpu.set_default_device rewriting. Smoke test included. (#75756) 2023-12-20 09:35:42 -06:00
Mehdi Amini
6ac80a7677 Apply clang-tidy fixes for readability-identifier-naming in GPUToLLVMConversion.cpp (NFC) 2023-12-07 21:39:25 -08:00
Mehdi Amini
9415fca848 [mlir] Fix build with shared libs (missing cmake link dependency) (NFC) 2023-11-29 12:17:52 -08:00
Mehdi Amini
9e7b6f46ba [mlir] Adopt ConvertToLLVMPatternInterface GpuToLLVMConversionPass to align with convert-to-llvm (#73761)
This is a follow-up to the introduction of `convert-to-llvm`: it is
supposed to be a unifying pass built on the
`ConvertToLLVMPatternInterface`, but some specific conversions (like the
GPU target) aren't vanilla LLVM targets. Instead they need extra
customizations that are specific to LLVM-on-GPUs and our custom runtime
wrappers.
This change makes the GpuToLLVMConversionPass just as pluggable as
`convert-to-llvm` by using the same mechanism.
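
A rough sketch of what plugging into this mechanism looks like. The interface
name comes from the message above, but the header path, hook signature, and
dialect names here are assumptions and may differ from the actual API:

```
#include "mlir/Conversion/ConvertToLLVM/ToLLVMInterface.h"
#include "mlir/Conversion/LLVMCommon/TypeConverter.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Hypothetical dialect interface implementation: a dialect advertises its
// LLVM conversion patterns so that both convert-to-llvm and the (now
// pluggable) GpuToLLVMConversionPass can pick them up the same way.
struct MyDialectToLLVMInterface : public ConvertToLLVMPatternInterface {
  using ConvertToLLVMPatternInterface::ConvertToLLVMPatternInterface;

  void populateConvertToLLVMConversionPatterns(
      ConversionTarget &target, LLVMTypeConverter &typeConverter,
      RewritePatternSet &patterns) const final {
    // populateMyDialectToLLVMConversionPatterns(typeConverter, patterns);
  }
};
```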
2023-11-29 11:37:53 -08:00
Guray Ozen
edf5cae739 [mlir][gpu] Support Cluster of Thread Blocks in gpu.launch_func (#72871)
The NVIDIA Hopper architecture introduced the Cooperative Group Array
(CGA). It is a new level of parallelism that allows clusters of
Cooperative Thread Arrays (CTAs) to synchronize and communicate through
shared memory while running concurrently.

This PR enables support for CGA within the `gpu.launch_func` in the GPU
dialect. It extends `gpu.launch_func` to accommodate this functionality.

The GPU dialect remains architecture-agnostic, so we've added the CGA
functionality as optional parameters. We want to leverage the mechanisms
we already have in the GPU dialect, such as outlining and kernel
launching, making this a practical and convenient choice.

An example of this implementation can be seen below:

```
gpu.launch_func @kernel_module::@kernel
                clusters in (%1, %0, %0) // <-- Optional
                blocks in (%0, %0, %0)
                threads in (%0, %0, %0)
```

The PR also introduces index and dimensions Ops specific to clusters,
binding them to NVVM Ops:

```
%cidX = gpu.cluster_id  x
%cidY = gpu.cluster_id  y
%cidZ = gpu.cluster_id  z

%cdimX = gpu.cluster_dim  x
%cdimY = gpu.cluster_dim  y
%cdimZ = gpu.cluster_dim  z
```

We will introduce cluster support in `gpu.launch` Op in an upcoming PR. 

See [the
documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays)
provided by NVIDIA for details.
2023-11-27 11:05:07 +01:00
Mehdi Amini
e204b9198a Apply clang-tidy fixes for llvm-else-after-return in GPUToLLVMConversion.cpp (NFC) 2023-11-20 01:40:49 -08:00
Guray Ozen
ea84897ba3 [mlir][gpu] Introduce gpu.dynamic_shared_memory Op (#71546)
While the `gpu.launch` Op allows setting the size via the
`dynamic_shared_memory_size` argument, accessing the dynamic shared
memory is very convoluted. This PR implements the proposed Op,
`gpu.dynamic_shared_memory` that aims to simplify the utilization of
dynamic shared memory.

RFC:
https://discourse.llvm.org/t/rfc-simplifying-dynamic-shared-memory-access-in-gpu/

**Proposal from RFC**
This PR introduces the `gpu.dynamic_shared_memory` Op to use the dynamic
shared memory feature efficiently. It is a powerful feature that enables
the allocation of shared memory at runtime, alongside the kernel launch
on the host. Afterwards, the memory can be accessed directly from the
device. I believe a similar story exists for AMDGPU.

**Current way Using Dynamic Shared Memory with MLIR**

Let me illustrate the challenges of using dynamic shared memory in MLIR
with an example below. The process involves several steps:
- `memref.global`: the 0-sized array that LLVM's NVPTX backend expects
- `dynamic_shared_memory_size`: sets the size of dynamic shared memory
- `memref.get_global`: accesses the global symbol
- `memref.reinterpret_cast` and `memref.subview`: many ops for pointer arithmetic

```
// Step 1. Create 0-sized global symbol. Manually set the alignment
memref.global "private" @dynamicShmem  : memref<0xf16, 3> { alignment = 16 }
func.func @main() {
  // Step 2. Allocate shared memory
  gpu.launch blocks(...) threads(...)
    dynamic_shared_memory_size %c10000 {
    // Step 3. Access the global object
    %shmem = memref.get_global @dynamicShmem : memref<0xf16, 3>
    // Step 4. A sequence of `memref.reinterpret_cast` and `memref.subview` operations.
    %4 = memref.reinterpret_cast %shmem to offset: [0], sizes: [14, 64, 128],  strides: [8192,128,1] : memref<0xf16, 3> to memref<14x64x128xf16,3>
    %5 = memref.subview %4[7, 0, 0][7, 64, 128][1,1,1] : memref<14x64x128xf16,3> to memref<7x64x128xf16, strided<[8192, 128, 1], offset: 57344>, 3>
    %6 = memref.subview %5[2, 0, 0][1, 64, 128][1,1,1] : memref<7x64x128xf16, strided<[8192, 128, 1], offset: 57344>, 3> to memref<64x128xf16, strided<[128, 1], offset: 73728>, 3>
    %7 = memref.subview %6[0, 0][64, 64][1,1]  : memref<64x128xf16, strided<[128, 1], offset: 73728>, 3> to memref<64x64xf16, strided<[128, 1], offset: 73728>, 3>
    %8 = memref.subview %6[32, 0][64, 64][1,1] : memref<64x128xf16, strided<[128, 1], offset: 73728>, 3> to memref<64x64xf16, strided<[128, 1], offset: 77824>, 3>
    // Step 5. Use the shared memory
    "test.use.shared.memory"(%7) : (memref<64x64xf16, strided<[128, 1], offset: 73728>, 3>) -> (index)
    "test.use.shared.memory"(%8) : (memref<64x64xf16, strided<[128, 1], offset: 77824>, 3>) -> (index)
    gpu.terminator
  }
}
```

Here is the program above rewritten with the new Op:

```
func.func @main() {
    gpu.launch blocks(...) threads(...) dynamic_shared_memory_size %c10000 {
    	%i = arith.constant 18 : index
        // Step 1: Obtain shared memory directly
        %shmem = gpu.dynamic_shared_memory : memref<?xi8, 3>
        %c147456 = arith.constant 147456 : index
        %c155648 = arith.constant 155648 : index
        %7 = memref.view %shmem[%c147456][] : memref<?xi8, 3> to memref<64x64xf16, 3>
        %8 = memref.view %shmem[%c155648][] : memref<?xi8, 3> to memref<64x64xf16, 3>

        // Step 2: Utilize the shared memory
        "test.use.shared.memory"(%7) : (memref<64x64xf16, 3>) -> (index)
        "test.use.shared.memory"(%8) : (memref<64x64xf16, 3>) -> (index)
        gpu.terminator
    }
}
```

This PR resolves #72513
2023-11-16 14:42:17 +01:00
Mehdi Amini
d9dadfda85 Refactor ModuleToObject to offer more flexibility to subclass (NFC)
Some specific implementations of the offload may want more customization, and
may even avoid using in-tree LLVM by dispatching the ISA translation to a custom
solution. This refactoring makes it possible for such implementations to work
without even configuring the target backend in LLVM.

Reviewers: fabianmcg

Reviewed By: fabianmcg

Pull Request: https://github.com/llvm/llvm-project/pull/71165
2023-11-03 13:41:45 -07:00
Christian Ulmann
97a238e863 [MLIR][LLVM] Remove typed pointer conversion utils (#71169)
This commit removes the no longer required typed pointer helpers from the
LLVM dialect conversion utils. Typed pointers have been deprecated for a
while now and it's planned to soon remove them from the LLVM dialect.

Related PSA:
https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502
2023-11-03 13:02:35 +01:00
Christian Ulmann
dbd4a0dd38 [MLIR][GPUCommon] Remove typed pointer support (#70735)
This commit removes the GPUCommon's lowering support for typed pointers.
Typed pointers have been deprecated for a while now and it's planned to
soon remove them from the LLVM dialect.

Related PSA:
https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502
2023-10-31 09:22:44 +01:00
Nishant Patel
ced9f4f0e8 [MLIR] Modify lowering of gpu.alloc op to llvm (#69969)
If gpu.alloc has no async dependency (in case gpu.alloc uses a
hostShared allocation), create a new stream and synchronize. This PR is a
follow-up to #66401
2023-10-25 22:00:47 +03:00
Christian Ulmann
484668c759 Reland "[MLIR][LLVM] Change addressof builders to use opaque pointers" (#69292)
This relands fbde19a664, which was broken due to incorrect GEP element type creation.

This commit changes the builders of the `llvm.mlir.addressof` operations
to no longer produce typed pointers.

As a consequence, a GPU to NVVM pattern that still relied on typed
pointers had to be updated.
2023-10-17 11:33:45 +02:00
Christian Ulmann
9397e5f581 Revert "[MLIR][LLVM] Change addressof builders to use opaque pointers (#69215)"
This reverts commit fbde19a664 due to
breaking integration tests.
2023-10-17 06:31:48 +00:00
Christian Ulmann
fbde19a664 [MLIR][LLVM] Change addressof builders to use opaque pointers (#69215)
This commit changes the builders of the `llvm.mlir.addressof` operations
to no longer produce typed pointers.

As a consequence, a GPU to NVVM pattern and the toy example LLVM lowerings had to be updated, as they still relied on typed pointers.
2023-10-17 07:55:00 +02:00
Aart Bik
39038177ee [mlir][sparse][gpu] add CSC and BSR format to cuSparse GPU ops (#67509)
This adds two cuSparse formats to the GPU dialect support, together with
proper lowering and runtime CUDA support. It also fixes a few minor
omissions.
2023-09-27 09:32:25 -07:00
Nishant Patel
1002a1d058 [MLIR] Pass hostShared flag in gpu.alloc op to runtime wrappers (#66401)
This PR is a breakdown of the big PR
https://github.com/llvm/llvm-project/pull/65539 which enables Intel GPU
integration. In this PR we pass the hostShared flag to runtime wrappers
(required by SyclRuntimeWrappers, which will come in a subsequent PR) to
indicate whether the allocation is done on host-shared GPU memory or
device-only memory.
2023-09-26 15:32:11 -07:00
Nishant Patel
ebfea261e6 [MLIR] Pass count of parameters & gpu binary size to runtime wrappers (#66154)
This PR is a breakdown of the big PR #65539 which enables Intel GPU
integration. In this PR we pass the count of parameters and the size of
the GPU binary to runtime wrappers, since the SyclRuntimeWrappers (which
will come in a subsequent PR) require the SPIR-V size for compilation and
the number of parameters to iterate over the params.
2023-09-26 11:27:07 -07:00
Tobias Gysi
85175edd4e [mlir][llvm] Replace NullOp by ZeroOp (#67183)
This revision replaces the LLVM dialect NullOp by the recently
introduced ZeroOp. The ZeroOp is more generic in the sense that it
represents zero values of any LLVM type rather than null pointers only.

This is a follow-up to https://github.com/llvm/llvm-project/pull/65508
2023-09-25 11:11:52 +02:00
stefankoncarevic
fbf67bfaf0 [mlir][GPU] Handle LLVM pointer attributes on memref arguments.
Handle pointer attributes (noalias, nonnull, readonly, writeonly,
dereferenceable, dereferenceable_or_null). The "noalias" attribute is
ignored for non-bare pointers.

Reviewed By: krzysz00

Differential Revision: https://reviews.llvm.org/D157082
2023-09-11 15:10:55 +00:00
Adrian Kuegel
583e78b372 [mlir] Apply ClangTidy fixes (NFC)
Prefer to use .empty() instead of checking size().
2023-08-23 17:51:11 +02:00
Matthias Springer
7f4dbd83dc [mlir][GPU][NFC] Remove type converter hack
Remove `dangerousSetOptions` and call `promoteOperands` with the correct arguments directly.

Differential Revision: https://reviews.llvm.org/D158175
2023-08-18 15:28:47 +02:00
Aart Bik
289f7231f9 [mlir][sparse][gpu] minor code cleanup for sparse gpu ops
Consistent order of ops and related methods.
Also, renamed SpGEMMGetSizeOp to SpMatGetSizeOp
since this is a general utility for sparse matrices,
not specific to GEMM ops only.

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D157922
2023-08-14 15:08:57 -07:00
Matthias Springer
ce254598b7 [mlir][Conversion] Store const type converter in ConversionPattern
ConversionPatterns do not (and should not) modify the type converter that they are using.

* Make `ConversionPattern::typeConverter` const.
* Make member functions of the `LLVMTypeConverter` const.
* Conversion patterns take a const type converter.
* Various helper functions (that are called from patterns) now also take a const type converter.

Differential Revision: https://reviews.llvm.org/D157601
2023-08-14 09:03:11 +02:00
Fabian Mora
fcfeb1e5b3 [mlir][gpu] Add GPU target support to gpu-to-llvm.
**For an explanation of these patches see D154153.**

This patch modifies the lowering of `gpu.module` & `gpu.launch_func` in the `gpu-to-llvm` pass,
allowing the usage of the new GPU compilation mechanism in the patch series ending in D154153.

Instead of removing Modules, this patch preserves the module if it has target attributes so that the
`gpu-module-to-binary` pass can later serialize them.

Instead of lowering the kernel calls to the LLVM dialect, this patch primarily updates the operation's
arguments, leaving the job of converting the operation into LLVM instructions to the translation stage.
The reason for not lowering the operation to LLVM at this stage is that kernel launches do not have a
single one-to-one representation in LLVM. For example, a kernel launch can be represented by a call
to a kernel stub, like in CUDA or HIP.
Kernel launches are also intrinsically linked to the binary associated with the call, and the binaries are
converted during translation.

Depends on D154149

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D154152
2023-08-12 00:27:28 +00:00
Aart Bik
95a6c509c9 [mlir][sparse][gpu] add set csr pointers, remove estimate op, fix bugs
Rationale:
Since we only support the default algorithm for SpGEMM, we can remove the
estimate op (for now at least). This also introduces the set csr pointers
op, and fixes a few bugs in the existing lowering for the SpGEMM breakdown.
This revision paves the way for actual recognition of SpGEMM in the sparsifier.

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D157645
2023-08-10 13:52:47 -07:00
Nicolas Vasilache
888717e853 [mlir][transform] Enable gpu-to-nvvm via conversion patterns driven by TD
This revision untangles a few more conversion pieces and allows rewriting
the relatively intricate (and somewhat inconsistent) LowerGpuOpsToNVVMOpsPass
in a declarative fashion that provides much better understanding and control.

Differential Revision: https://reviews.llvm.org/D157617
2023-08-10 15:30:48 +00:00
Aart Bik
e7e4ed0d7a [mlir][sparse][gpu] only support default algorithm for SpGEMM
Rationale:
This is the approach taken for all the others too (SpMV, SpMM, SDDMM),
so it is more consistent to follow the same path (until we have a need
for more algorithms). Also, in a follow up revision, this will allow
us to remove some unused GEMM ops.

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D157542
2023-08-09 12:49:47 -07:00