Mirror of https://github.com/intel/llvm.git (synced 2026-01-13 19:08:21 +08:00)
This PR re-lands #165873. It extends the gpu.subgroup_mma_* ops to support the fp64 type. The extension requires special handling during the lowering to NVVM because the load ops for fragments A and B return a scalar instead of a struct. The original PR did not guard the new test on the required architecture (sm80), which led to a failure on the CUDA runners with T4 GPUs.
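The architecture guard described above can be illustrated with a minimal sketch. This is not the mechanism the PR uses to gate the test. The helper names and the small GPU table are hypothetical, and the only facts assumed are that fp64 MMA needs sm80 (compute capability 8.0) and that the T4 is a compute capability 7.5 device.

```python
from typing import Optional, Tuple

# Minimum compute capability for fp64 MMA, per the PR description (sm80).
FP64_MMA_MIN_CC: Tuple[int, int] = (8, 0)

# Hypothetical, deliberately partial lookup table used only for illustration.
KNOWN_GPUS = {
    "T4": (7, 5),    # Turing, below the sm80 requirement
    "A100": (8, 0),  # Ampere, meets the sm80 requirement
}


def supports_fp64_mma(cc: Tuple[int, int]) -> bool:
    """Return True if compute capability `cc` is at least sm80."""
    return cc >= FP64_MMA_MIN_CC


def should_run_fp64_mma_test(gpu_name: str) -> bool:
    """Run the fp64 MMA test only on GPUs known to meet the requirement.

    Unknown GPUs are skipped rather than assumed capable.
    """
    cc: Optional[Tuple[int, int]] = KNOWN_GPUS.get(gpu_name)
    return cc is not None and supports_fp64_mma(cc)
```

With a guard like this, a runner with T4 GPUs skips the fp64 test instead of failing it, which is the behaviour the re-landed PR restores.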
Multi-Level Intermediate Representation
See https://mlir.llvm.org/ for more information.