Add support for distributing the `vector.multi_reduction` operation across lanes in a warp. Currently, only 2D-to-1D reductions are supported. Given layouts for the source and accumulator vectors:

* If the reduction dimension is distributed across lanes, the reduction is non-lane-local and is done using warp shuffles. In this case, we simply rewrite the `MultiDimReductionOp` into a sequence of `ReductionOp`s inside the warp op body; the actual distribution is then done by the `WarpOpReduction` pattern.
* If the reduction dimension is not distributed across lanes, the reduction is lane-local. In this case, we yield the source and accumulator vectors from the warp op and perform the lane-local reduction outside the warp op using a sequence of `ReductionOp`s.

This PR also adds support for distributing `vector.shape_cast` based on layouts.
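The difference between the two cases can be illustrated with a small plain-Python model. This is not the MLIR implementation, only a sketch under stated assumptions: a hypothetical 4-lane warp, integer data, and a source of shape R x C reduced along dimension 1 with an accumulator of length R. "Distributing dimension 1" is assumed to mean that each lane owns a contiguous chunk of columns; "not distributing it" means each lane owns whole rows. The helper names (`shuffle_xor`, `warp_allreduce_sum`, `reduce_dim1_cols_distributed`, `reduce_dim1_rows_distributed`) are hypothetical, and the xor butterfly only stands in for a shuffle-based lowering (analogous to `gpu.shuffle xor`); the code actually produced by `WarpOpReduction` may differ.

```python
from typing import List

Matrix = List[List[int]]

# Hypothetical warp size chosen for illustration; real subgroups are often 16 or 32 lanes.
WARP_SIZE = 4


def shuffle_xor(values: List[int], mask: int) -> List[int]:
    """Model a warp shuffle: lane i reads the value held by lane i ^ mask."""
    return [values[lane ^ mask] for lane in range(len(values))]


def warp_allreduce_sum(values: List[int]) -> List[int]:
    """Butterfly reduction: after log2(W) xor-shuffle steps every lane holds the total."""
    offset = len(values) // 2
    while offset > 0:
        partner = shuffle_xor(values, offset)
        values = [v + p for v, p in zip(values, partner)]
        offset //= 2
    return values


def reduce_dim1_cols_distributed(src: Matrix, acc: List[int]) -> List[List[int]]:
    """Non-lane-local case: the reduction dimension (columns) is split across lanes.

    Each lane sees only part of every row, so lanes must exchange partial sums.
    Returns the result as seen by each lane (it ends up replicated in all lanes).
    """
    rows, cols = len(src), len(src[0])
    assert cols % WARP_SIZE == 0, "columns must divide evenly among lanes"
    chunk = cols // WARP_SIZE
    per_lane = [[0] * rows for _ in range(WARP_SIZE)]
    for r in range(rows):  # one 1D reduction per row of the 2D source
        # Each lane first reduces the columns it owns (a lane-local partial sum).
        partials = [sum(src[r][l * chunk:(l + 1) * chunk]) for l in range(WARP_SIZE)]
        # The partial sums are then combined across the warp with xor shuffles.
        totals = warp_allreduce_sum(partials)
        for lane in range(WARP_SIZE):
            per_lane[lane][r] = acc[r] + totals[lane]
    return per_lane


def reduce_dim1_rows_distributed(src: Matrix, acc: List[int]) -> List[List[int]]:
    """Lane-local case: the reduction dimension is NOT distributed; whole rows live in one lane.

    Each lane reduces its own rows and its own slice of the accumulator with no
    cross-lane communication. Returns the result slice owned by each lane.
    """
    rows = len(src)
    assert rows % WARP_SIZE == 0, "rows must divide evenly among lanes"
    chunk = rows // WARP_SIZE
    per_lane = []
    for lane in range(WARP_SIZE):
        my_rows = src[lane * chunk:(lane + 1) * chunk]
        my_acc = acc[lane * chunk:(lane + 1) * chunk]
        per_lane.append([a + sum(row) for a, row in zip(my_acc, my_rows)])
    return per_lane


# Usage: a 4 x 8 source reduced along dimension 1 with an accumulator.
src = [[r * 8 + c for c in range(8)] for r in range(4)]
acc = [100, 200, 300, 400]
expected = [a + sum(row) for a, row in zip(acc, src)]

cols_result = reduce_dim1_cols_distributed(src, acc)
rows_result = reduce_dim1_rows_distributed(src, acc)
```

Both layouts produce the same logical result. In the column-distributed layout every lane needs a shuffle sequence per row. In the row-distributed layout each lane finishes independently, which is why that reduction can be performed outside the warp op.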