This commit introduces a flag to allow skipping the potentially
recursive import of DICompositeType elements. This patch is essentially a
bandaid for the still broken recursive debug type import.
Some of our downstream inputs are produced by excessive use of
template metaprogramming, and thus contain tens of thousands of types
that all participate in such recursions. Unfortunately, the series of
patches that introduced type support is not easily revertible, because it
has been around for a while now and Modular depends on it.
We can consider reverting this change once the type importer has been shown to
be performant, but for now we are talking seconds vs. hours to import
specific files.
This commit extends the data layout to support scalable vectors. For
scalable vectors, the `TypeSize`'s scalable field is set accordingly,
and the alignment information remains the same as for normal vectors.
This behavior is in sync with what LLVM's data layout queries are
producing.
Before this change, scalable vectors incorrectly returned the same size
as "normal" vectors.
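The difference can be sketched as follows. This is a simplified model, not MLIR code: `TypeSize` here is a toy Python stand-in for the C++ class, `vector_size_in_bits` is a hypothetical helper, and padding and alignment are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeSize:
    """Toy stand-in for a type size: a minimum size plus a scalable flag."""
    min_bits: int
    scalable: bool


def vector_size_in_bits(num_elements: int, element_bits: int, scalable: bool) -> TypeSize:
    # For a scalable vector the actual size is min_bits * vscale, where vscale
    # is only known at run time, so only the minimum and the flag are recorded.
    return TypeSize(num_elements * element_bits, scalable)


fixed = vector_size_in_bits(4, 32, scalable=False)    # models vector<4xi32>
scalable = vector_size_in_bits(4, 32, scalable=True)  # models vector<[4]xi32>

# Both report the same minimum size, but they no longer compare equal.
assert fixed.min_bits == scalable.min_bits
assert fixed != scalable
```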
Restrict the types that are valid for EmitC operations, using what the
emitter currently supports as the restriction. Define utility
functions for valid types so that they can be used to restrict the
operations in TableGen and are also available for reuse in dialect
conversions.
This PR adds support for converting `vector.extract_strided_slice` and
`vector.extract` operations to equivalent `vector.shuffle` operations
that operate on linearized (1-D) vectors. `vector.shuffle` operations
operating on n-D (n > 1) vectors are also converted to equivalent shuffle
operations working on linearized vectors.
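To illustrate the index arithmetic behind such a conversion, here is a minimal sketch that computes the shuffle mask selecting a slice out of a row-major linearized vector. The function name is hypothetical, and the sketch assumes unit strides and offsets/sizes given for every dimension; it is not the pattern's actual implementation.

```python
from itertools import product


def strided_slice_shuffle_mask(shape, offsets, sizes):
    """Linear (row-major) indices of the elements a unit-stride slice selects."""
    # Row-major strides of the source shape, e.g. (4, 8) -> [8, 1].
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]

    mask = []
    # Enumerate slice coordinates in row-major order and linearize each one.
    for idx in product(*(range(n) for n in sizes)):
        mask.append(sum((o + i) * s for o, i, s in zip(offsets, idx, strides)))
    return mask


# Slice of size 2x3 at offset (1, 2) out of a 4x8 vector, viewed as 32 elements.
mask = strided_slice_shuffle_mask((4, 8), (1, 2), (2, 3))
assert mask == [10, 11, 12, 18, 19, 20]
```

The resulting list plays the role of the mask of a 1-D shuffle whose result has `len(mask)` elements.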
The pull request includes the following changes.
1. Refactors the interface to `PresburgerSpace::identifiers` into `setId` and a
const `getId`, instead of the previous `getId`, which returned a mutable
reference. `resetIds` does not need to be called to use identifiers; `setId`
calls `resetIds` if identifiers are not enabled.
2. Deprecates `FlatAffineRelation` by refactoring all usages of
`FlatAffineRelation` to `IntegerRelation`. To achieve this,
`FlatAffineRelation::compose` is refactored into
`IntegerRelation::mergeAndCompose`.
3. Deletes unneeded overrides of virtual functions `hasConsistentState`,
`clearAndCopyFrom` and `fourierMotzkinEliminate` from
`FlatLinearValueConstraints` as these were only used through
`FlatAffineRelation` and we now use `IntegerRelation`'s member functions
instead.
4. Fixes an existing bug in FlatLinearValueConstraints' constructor
which caused identifiers set by the superclass FlatLinearConstraints'
constructor to be erased.
5. Fixes `IntegerRelation::convertVarKind` not preserving identifiers.
If the Python callback throws an error, the C++ code will throw a
`py::error_already_set` that needs to be caught and handled in the C++
code.
This change is inspired by the similar solution in
`PySymbolTable::walkSymbolTables`.
This allows configuring both the op used for allocation and the op used
for copying memrefs.
It also changes the default behavior: the default allocation in
`BufferizationOptions` creates `memref.alloc` with `alignment = 64`,
whereas we previously created `memref.alloca` without any alignment.
Fixes
```
// TODO: Use alloc/memcpy callback from BufferizationOptions if called via
// BufferizableOpInterface impl of ToMemrefOp.
```
These were added in faf697e49b so things
can flow through non-opaque LLVM ptrs. Those ptrs are gone so there is
no reason for this to be around anymore. LLVM doesn't support f8 types,
they get converted to i8 when lowering to LLVM dialect.
Removing the f8 types makes LLVM::isCompatibleType and
LLVM::isCompatibleFloatingPointType consistent again.
Move the documentation of the ownership-based buffer deallocation pass
to a separate file. Also improve the documentation a bit and insert a
figure that explains the `bufferization.dealloc` op (copied from the
tutorial at the LLVM Dev Summit 2023).
This commit improves LLVM dialect's Mem2Reg interfaces to support
promotions of partial loads from larger memory slots. To support this,
the Mem2Reg interface methods are extended with additional data layout
parameters. The data layout is required to determine type sizes to
produce correct conversion sequences.
Note: There will be additional followups that introduce a similar
functionality for stores, and there are plans to support accesses into
the middle of memory slots.
This PR adds `TmaDescriptorBuilder`, which:
- Simplifies TMA generation.
- Makes the code ready to support various TMA configurations.
- Removes strings and uses the enums from `mlir.nvgpu.ENUMs` instead.
- Example: "swizzle = swizzle_128b, l2promo=none, oob=zero,
interleave=none" becomes enums in the `mlir.nvgpu` dialect.
- Enums have string equivalents that are used during IR writing and
generation (see `TmaDescriptorBuilder::tensormap_descriptor_ty`).
- Improves readability and abstracts the TMA descriptor builders into a
reusable component.
---------
Co-authored-by: Manish Gupta <manigupta@google.com>
The first argument to the nvvm_shfl_sync_* family
of intrinsics is the thread_mask (aka member_mask).
This patch renames the corresponding operand in the Op
to reflect this, i.e. `dst` -> `thread_mask`.
While we are here, add a summary and description
for this Op.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
This patch removes the LoopControl parsing/printing functions that are
no longer used after transitioning `omp.simdloop` and `omp.taskloop`
into loop wrapper operations.
This patch updates the definition of `omp.simdloop` to enforce the
restrictions of a wrapper operation. It has been renamed to `omp.simd`,
to better reflect the naming used in the spec. All uses of "simdloop" in
function names have been updated accordingly.
Some changes to Flang lowering and OpenMP to LLVM IR translation are
introduced to avoid compilation and test failures. The
eventual long-term solution might be different.
This commit fixes the following error when stopping the sparse compiler
pipeline after bufferization (e.g., with `test-analysis-only`):
```
LLVM ERROR: Building op `vector.print` but it isn't known in this MLIRContext: the dialect may not be loaded or this operation hasn't been added by the dialect. See also https://mlir.llvm.org/getting_started/Faq/#registered-loaded-dependent-whats-up-with-dialects-management
```
This commit adds a `walk` method to `PyOperationBase` that uses a Python
object as a callback, e.g. `op.walk(callback)`. Currently, the callback must
return a walk result explicitly.
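The callback contract can be sketched with a toy model. Everything below is a pure-Python stand-in written for illustration: `ToyOp` and the local `WalkResult` enum are hypothetical and are not the MLIR Python bindings, and the pre-order traversal is an assumption of the sketch.

```python
from enum import Enum


class WalkResult(Enum):
    ADVANCE = 0    # keep walking
    INTERRUPT = 1  # stop the whole walk
    SKIP = 2       # do not descend into this op's children


class ToyOp:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def walk(self, callback):
        """Pre-order walk; the callback must explicitly return a WalkResult."""
        result = callback(self)
        if not isinstance(result, WalkResult):
            raise TypeError("callback must return a WalkResult")
        if result is WalkResult.INTERRUPT:
            return WalkResult.INTERRUPT
        if result is WalkResult.SKIP:
            return WalkResult.ADVANCE
        for child in self.children:
            if child.walk(callback) is WalkResult.INTERRUPT:
                return WalkResult.INTERRUPT
        return WalkResult.ADVANCE


module = ToyOp("module", [ToyOp("func", [ToyOp("add"), ToyOp("mul")]), ToyOp("func2")])
visited = []


def callback(op):
    visited.append(op.name)
    # Stop as soon as the first "add" is found.
    return WalkResult.INTERRUPT if op.name == "add" else WalkResult.ADVANCE


module.walk(callback)
assert visited == ["module", "func", "add"]
```

A callback that forgets to return a result fails loudly in this model, which mirrors the requirement that the walk result be returned explicitly.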
We (SiFive) have had a walk method implemented in Python in our internal
Python tool for a while. However, the Python overhead is expensive and
it did not scale well to large MLIR files. Just replacing our walk with this
version reduced the entire execution time of the tool by 30~40%. There
are a few configs for which the tool takes several hours to finish, so
this commit significantly improves tool performance.
…se of tensor pack
When the vector sizes are not passed as inputs to the vector transform
operation, the vector sizes are queried from the static result shape in
the case of the `tensor.pack` op.
This patch fixes:
mlir/lib/Dialect/XeGPU/IR/XeGPUOps.cpp:58:2: error: extra ';'
outside of a function is incompatible with C++98
[-Werror,-Wc++98-compat-extra-semi]
A `sparse_tensor.extract_space %tensor at %iterator` extracts a *sparse*
iteration space defined by `%tensor`. The operation to traverse the
iteration space will be introduced in follow-up PRs.
The existing lowering for tosa.max_pool2d only supports dynamic
dimensions when the dynamic dimension is the batch dimension. This
change updates the lowering to support arbitrary dynamic dimensions on
the inputs and outputs of the tosa.max_pool2d operation.
This change also fixes a bug in the implementation of implicit
broadcasting in the tosa-to-linalg pass, which was introducing uses of
constant ops that violated dominance requirements.
At the moment there is no support for `vector.shuffle` on scalable
vectors: various hooks/helpers related to `vector.shuffle` simply
ignore the scalable flags (e.g. `ShuffleOp::inferReturnTypes`).
This is unlikely to change any time soon (vector shuffles are known to
be tricky for scalable vectors), hence this patch restricts
`vector.shuffle` to fixed-width vectors.
This adds a simple rewrite/legalization to decompose constant splats
larger than a single ArmSME tile into multiple SME virtual tile sized
splats. E.g. a constant splat to `vector<[8]x[8]xi32>` would decompose
into four `vector<[4]x[4]xi32>` splats.
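The tiling arithmetic can be sketched as follows. The helper names are hypothetical, and the sketch assumes a square SME virtual tile with 128 / element-bit-width minimum elements per side (so `[4]x[4]` for `i32`); all sizes are minimum sizes that are scaled by vscale at run time.

```python
def sme_tile_dim(element_bits):
    # Assumption: square tile with 128 / element_bits (minimum) elements per side.
    return 128 // element_bits


def decompose_splat(rows, cols, element_bits):
    """Return the (row, col) offsets of the tile-sized splats covering the vector."""
    tile = sme_tile_dim(element_bits)
    if rows % tile or cols % tile:
        raise ValueError("shape is not a multiple of the tile size")
    return [(r, c) for r in range(0, rows, tile) for c in range(0, cols, tile)]


# vector<[8]x[8]xi32> is covered by four [4]x[4] tiles.
offsets = decompose_splat(8, 8, 32)
assert offsets == [(0, 0), (0, 4), (4, 0), (4, 4)]
```

Because every tile of a splat holds the same value, each offset corresponds to one identical tile-sized splat.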