This update introduces two new remark emitting policies:
1. `RemarkEmittingPolicyAll`, which emits all remarks,
2. `RemarkEmittingPolicyFinal`, which only emits final remarks after
processing.
The `RemarkEngine` is modified to support these policies, allowing for
more flexible remark handling based on user configuration.
PR also adds flag to `mlir-opt`
```
--remark-policy=<value> - Specify the policy for remark output.
=all - Print all remarks
=final - Print final remarks
```
Relanding https://github.com/llvm/llvm-project/pull/160526
This PR requires RemarkEngine to be finalize manually. So here is usage:
```
MLIRContext ctx;
ctx.setRemarkEngine(...)
...
ctx.getRemarkEngine().shutdown() <-- PR adds this, it is required when the emission policy is final
```
This update introduces two new remark emitting policies:
1. `RemarkEmittingPolicyAll`, which emits all remarks,
2. `RemarkEmittingPolicyFinal`, which only emits final remarks after
processing.
The `RemarkEngine` is modified to support these policies, allowing for
more flexible remark handling based on user configuration.
PR also adds flag to `mlir-opt`
```
--remark-policy=<value> - Specify the policy for remark output.
=all - Print all remarks
=final - Print final remarks
```
`Operation::clone` does not clone the properties of unregistered ops.
This patch modifies the property initialization for unregistered ops to
initialize properties as attributes.
fixes#151640
---------
Signed-off-by: Boyana Norris <brnorris03@gmail.com>
This PR fixes a use-after-free error that happens when `DistinctAttr`
instances are created within a `PassManager` running with crash recovery
enabled. The root cause is that `DistinctAttr` storage is allocated in a
thread_local allocator, which is destroyed when the crash recovery
thread joins, invalidating the storage.
Moreover, even without crash reproduction disabling multithreading on
the context will destroy the context's thread pool, and in turn delete
the threadlocal storage. This means a call to
`ctx->disableMulthithreading()` breaks the IR.
This PR replaces the thread local allocator with a synchronised
allocator that's shared between threads. This persists the lifetime of
allocated DistinctAttr storage instances to the lifetime of the context.
### Problem Details:
The `DistinctAttributeAllocator` uses a
`ThreadLocalCache<BumpPtrAllocator>` for lock-free allocation of
`DistinctAttr` storage in a multithreaded context. The issue occurs when
a `PassManager` is run with crash recovery (`runWithCrashRecovery`), the
pass pipeline is executed on a temporary thread spawned by
`llvm::CrashRecoveryContext`. Any `DistinctAttr`s created during this
execution have their storage allocated in the thread_local cache of this
temporary thread. When the thread joins, the thread_local storage is
destroyed, freeing the `DistinctAttr`s' memory. If this attribute is
accessed later, e.g. when printing, it results in a use-after-free.
As mentioned previously, this is also seen after creating some
`DistinctAttr`s and then calling `ctx->disableMulthithreading()`.
### Solution
`DistinctAttrStorageAllocator` uses a synchronised, shared allocator
instead of one wrapped in a `ThreadLocalCache`. The former is what
stores the allocator in transient thread_local storage.
### Testing:
A C++ unit test has been added to validate this fix. (I was previously
reproducing this failure with `mlir-opt` but I can no longer do so and I
am unsure why.)
-----
Note: This is a 2nd attempt at my previous PR
https://github.com/llvm/llvm-project/pull/128566 that was reverted in
https://github.com/llvm/llvm-project/pull/133000. I believe I've
addressed the TSAN and race condition concerns.
- try_emplace(Key) is shorter than insert({Key, nullptr}).
- try_emplace performs value initialization without value parameters.
- We overwrite values on successful insertion anyway.
Currently, `DistinctAttr` uses an allocator wrapped in a
`ThreadLocalCache` to manage attribute storage allocations. This ensures
all allocations are freed when the allocator is destroyed.
However, this setup can cause use-after-free errors when
`mlir::PassManager` runs its passes on a separate thread as a result of
crash reproduction being enabled. Distinct attribute storages are
created in the child thread's local storage and freed once the thread
joins. Attempting to access these attributes after this can result in
segmentation faults, such as during printing or alias analysis.
Example: This invocation of `mlir-opt` demonstrates the segfault issue
due to distinct attributes being created in a child thread and their
storage being freed once the thread joins:
```
mlir-opt --mlir-pass-pipeline-crash-reproducer=. --test-distinct-attrs mlir/test/IR/test-builtin-distinct-attrs.mlir
```
This pull request changes the distinct attribute allocator to use
different allocators depending on whether or not threading is enabled
and whether or not the pass manager is running its passes in a separate
thread. If multithreading is disabled, a non thread-local allocator is
used. If threading remains enabled and the pass manager invokes its pass
pipelines in a child thread, then a non-thread local but synchronised
allocator is used. This ensures that the lifetime of allocated storage
persists beyond the lifetime of the child thread.
I have added two tests for the `-test-distinct-attrs` pass and the
`-enable-debug-info-on-llvm-scope` passes that run them with crash
reproduction enabled.
Remove builder API (e.g., `b.getFloat4E2M1FNType()`) and caching in
`MLIRContext` for low-precision FP types. Types are still cached in the
type uniquer.
For details, see:
https://discourse.llvm.org/t/rethink-on-approach-to-low-precision-fp-types/82361/28
Note for LLVM integration: Use `b.getType<Float4E2M1FNType>()` or
`Float4E2M1FNType::get(b.getContext())` instead of
`b.getFloat4E2M1FNType()`.
Currently we have `MLIRContext::getRegisteredOperations` which returns
all operations for the given context, with the addition of
`MLIRContext::getRegisteredOperationsByDialect` we can now retrieve the
same for a given dialect class.
Closes#111591
This PR adds `f8E8M0FNU` type to MLIR.
`f8E8M0FNU` type is proposed in [OpenCompute MX
Specification](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf).
It defines a 8-bit floating point number with bit layout S0E8M0. Unlike
IEEE-754 types, there are no infinity, denormals, zeros or negative
values.
```c
f8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111
Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```
Related PRs:
- [PR-107127](https://github.com/llvm/llvm-project/pull/107127)
[APFloat] Add APFloat support for E8M0 type
- [PR-105573](https://github.com/llvm/llvm-project/pull/105573) [MLIR]
Add f6E3M2FN type - was used as a template for this PR
- [PR-107999](https://github.com/llvm/llvm-project/pull/107999) [MLIR]
Add f6E2M3FN type
- [PR-108877](https://github.com/llvm/llvm-project/pull/108877) [MLIR]
Add f4E2M1FN type
This PR adds `f4E2M1FN` type to mlir.
`f4E2M1FN` type is proposed in [OpenCompute MX
Specification](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf).
It defines a 4-bit floating point number with bit layout S1E2M1. Unlike
IEEE-754 types, there are no infinity or NaN values.
```c
f4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs
Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5
```
Related PRs:
- [PR-95392](https://github.com/llvm/llvm-project/pull/95392) [APFloat]
Add APFloat support for FP4 data type
- [PR-105573](https://github.com/llvm/llvm-project/pull/105573) [MLIR]
Add f6E3M2FN type - was used as a template for this PR
- [PR-107999](https://github.com/llvm/llvm-project/pull/107999) [MLIR]
Add f6E2M3FN type
This PR adds `f6E2M3FN` type to mlir.
`f6E2M3FN` type is proposed in [OpenCompute MX
Specification](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf).
It defines a 6-bit floating point number with bit layout S1E2M3. Unlike
IEEE-754 types, there are no infinity or NaN values.
```c
f6E2M3FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs
Additional details:
- Zeros (+/-): S.00.000
- Max normal number: S.11.111 = ±2^(2) x (1 + 0.875) = ±7.5
- Min normal number: S.01.000 = ±2^(0) = ±1.0
- Max subnormal number: S.00.111 = ±2^(0) x 0.875 = ±0.875
- Min subnormal number: S.00.001 = ±2^(0) x 0.125 = ±0.125
```
Related PRs:
- [PR-94735](https://github.com/llvm/llvm-project/pull/94735) [APFloat]
Add APFloat support for FP6 data types
- [PR-105573](https://github.com/llvm/llvm-project/pull/105573) [MLIR]
Add f6E3M2FN type - was used as a template for this PR
This PR adds `f6E3M2FN` type to mlir.
`f6E3M2FN` type is proposed in [OpenCompute MX
Specification](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf).
It defines a 6-bit floating point number with bit layout S1E3M2. Unlike
IEEE-754 types, there are no infinity or NaN values.
```c
f6E3M2FN
- Exponent bias: 3
- Maximum stored exponent value: 7 (binary 111)
- Maximum unbiased exponent value: 7 - 3 = 4
- Minimum stored exponent value: 1 (binary 001)
- Minimum unbiased exponent value: 1 − 3 = −2
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs
Additional details:
- Zeros (+/-): S.000.00
- Max normal number: S.111.11 = ±2^(4) x (1 + 0.75) = ±28
- Min normal number: S.001.00 = ±2^(-2) = ±0.25
- Max subnormal number: S.000.11 = ±2^(-2) x 0.75 = ±0.1875
- Min subnormal number: S.000.01 = ±2^(-2) x 0.25 = ±0.0625
```
Related PRs:
- [PR-94735](https://github.com/llvm/llvm-project/pull/94735) [APFloat]
Add APFloat support for FP6 data types
- [PR-97118](https://github.com/llvm/llvm-project/pull/97118) [MLIR] Add
f8E4M3 type - was used as a template for this PR
This PR adds `f8E3M4` type to mlir.
`f8E3M4` type follows IEEE 754 convention
```c
f8E3M4 (IEEE 754)
- Exponent bias: 3
- Maximum stored exponent value: 6 (binary 110)
- Maximum unbiased exponent value: 6 - 3 = 3
- Minimum stored exponent value: 1 (binary 001)
- Minimum unbiased exponent value: 1 − 3 = −2
- Precision specifies the total number of bits used for the significand (mantissa),
including implicit leading integer bit = 4 + 1 = 5
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs
Additional details:
- Max exp (unbiased): 3
- Min exp (unbiased): -2
- Infinities (+/-): S.111.0000
- Zeros (+/-): S.000.0000
- NaNs: S.111.{0,1}⁴ except S.111.0000
- Max normal number: S.110.1111 = +/-2^(6-3) x (1 + 15/16) = +/-2^3 x 31 x 2^(-4) = +/-15.5
- Min normal number: S.001.0000 = +/-2^(1-3) x (1 + 0) = +/-2^(-2)
- Max subnormal number: S.000.1111 = +/-2^(-2) x 15/16 = +/-2^(-2) x 15 x 2^(-4) = +/-15 x 2^(-6)
- Min subnormal number: S.000.0001 = +/-2^(-2) x 1/16 = +/-2^(-2) x 2^(-4) = +/-2^(-6)
```
Related PRs:
- [PR-99698](https://github.com/llvm/llvm-project/pull/99698) [APFloat]
Add support for f8E3M4 IEEE 754 type
- [PR-97118](https://github.com/llvm/llvm-project/pull/97118) [MLIR] Add
f8E4M3 IEEE 754 type
This PR adds `f8E4M3` type to mlir.
`f8E4M3` type follows IEEE 754 convention
```c
f8E4M3 (IEEE 754)
- Exponent bias: 7
- Maximum stored exponent value: 14 (binary 1110)
- Maximum unbiased exponent value: 14 - 7 = 7
- Minimum stored exponent value: 1 (binary 0001)
- Minimum unbiased exponent value: 1 − 7 = −6
- Precision specifies the total number of bits used for the significand (mantisa),
including implicit leading integer bit = 3 + 1 = 4
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs
Additional details:
- Max exp (unbiased): 7
- Min exp (unbiased): -6
- Infinities (+/-): S.1111.000
- Zeros (+/-): S.0000.000
- NaNs: S.1111.{001, 010, 011, 100, 101, 110, 111}
- Max normal number: S.1110.111 = +/-2^(7) x (1 + 0.875) = +/-240
- Min normal number: S.0001.000 = +/-2^(-6)
- Max subnormal number: S.0000.111 = +/-2^(-6) x 0.875 = +/-2^(-9) x 7
- Min subnormal number: S.0000.001 = +/-2^(-6) x 0.125 = +/-2^(-9)
```
Related PRs:
- [PR-97179](https://github.com/llvm/llvm-project/pull/97179) [APFloat]
Add support for f8E4M3 IEEE 754 type
This speeds up registered op creation by 10-11% by allowing lookup by
TypeID instead of StringRef.
This can break your build/tests at runtime with an error that you're creating
an unregistered operation that you have registered. If so you are likely using
a class inheriting from the "real" operation. See for example in this patch the
case of:
class ConstantIndexOp : public arith::ConstantOp {
If one is using `builder.create<ConstantIndexOp>()` they actually create an
`arith.constant` operation, but the builder will fetch the TypeID for
the `ConstantIndexOp` class which does not correspond to any registered
operation. To fix it the `ConstantIndexOp` class got this addition:
static ::mlir::TypeID resolveTypeID() { return TypeID::get<ConstantOp>(); }
The base class llvm::ThreadPoolInterface will be renamed
llvm::ThreadPool in a subsequent commit.
This is a breaking change: clients who use to create a ThreadPool must
now create a DefaultThreadPool instead.
This patch expose the type and attribute names in C++ as methods in the
`AbstractType` and `AbstractAttribute` classes, and keep a map of names
to `AbstractType` and `AbstractAttribute` in the `MLIRContext`. Type and
attribute names should be unique.
It adds support in ODS to generate the `getName` methods in
`AbstractType` and `AbstractAttribute`, through the use of two new
variables, `typeName` and `attrName`. It also adds names to C++-defined
type and attributes.
This is a follow-up to 8c2bff1ab9 which lazy-initialized the
diagnostic and removed the need to dynamically abandon() an
InFlightDiagnostic. This further simplifies the code to not needed to
return a reference to an InFlightDiagnostic and instead eagerly emit
errors.
Also use `emitError` as name instead of `getDiag` which seems more
explicit and in-line with the common usage.
Add capabilities for comparing opaque properties. This is useful when
dealing with arbitrary operations which can be compare based on their
OperationName. Now you can furthermore compare their properties without
the need to determine their actual type.
A distinct attribute associates a referenced attribute with a unique
identifier. Every call to its create function allocates a new
distinct attribute instance. The address of the attribute instance
temporarily serves as its unique identifier. Similar to the names
of SSA values, the final unique identifiers are generated during
pretty printing.
Examples:
#distinct = distinct[0]<42.0 : f32>
#distinct1 = distinct[1]<42.0 : f32>
#distinct2 = distinct[2]<array<i32: 10, 42>>
This mechanism is meant to generate attributes with a unique
identifier, which can be used to mark groups of operations
that share a common properties such as if they are aliasing.
The design of the distinct attribute ensures minimal memory
footprint per distinct attribute since it only contains a reference
to another attribute. All distinct attributes are stored outside of
the storage uniquer in a thread local store that is part of the
context. It uses one bump pointer allocator per thread to ensure
distinct attributes can be created in-parallel.
Reviewed By: rriddle, Dinistro, zero9178
Differential Revision: https://reviews.llvm.org/D153360
applyExtensions can load further dialects, invalidating the reference to
the dialect pointer in the dialects DenseMap. Capture the pointer to
prevent that from happening.
Currently, the dialects precede the registered operations in the context object, which means that the latter is destroyed first. At the same time, Operation::~Operation dereferences the registered operation when destroying properties, which can cause use-after-free (e.g. if a dialect owns an op). This patch fixes that by changing the order of the members so that dialects come after registered operations.
Differential Revision: https://reviews.llvm.org/D151440
This new features enabled to dedicate custom storage inline within operations.
This storage can be used as an alternative to attributes to store data that is
specific to an operation. Attribute can also be stored inside the properties
storage if desired, but any kind of data can be present as well. This offers
a way to store and mutate data without uniquing in the Context like Attribute.
See the OpPropertiesTest.cpp for an example where a struct with a
std::vector<> is attached to an operation and mutated in-place:
struct TestProperties {
int a = -1;
float b = -1.;
std::vector<int64_t> array = {-33};
};
More complex scheme (including reference-counting) are also possible.
The only constraint to enable storing a C++ object as "properties" on an
operation is to implement three functions:
- convert from the candidate object to an Attribute
- convert from the Attribute to the candidate object
- hash the object
Optional the parsing and printing can also be customized with 2 extra
functions.
A new options is introduced to ODS to allow dialects to specify:
let usePropertiesForAttributes = 1;
When set to true, the inherent attributes for all the ops in this dialect
will be using properties instead of being stored alongside discardable
attributes.
The TestDialect showcases this feature.
Another change is that we introduce new APIs on the Operation class
to access separately the inherent attributes from the discardable ones.
We envision deprecating and removing the `getAttr()`, `getAttrsDictionary()`,
and other similar method which don't make the distinction explicit, leading
to an entirely separate namespace for discardable attributes.
Recommit d572cd1b06 after fixing python bindings build.
Differential Revision: https://reviews.llvm.org/D141742
This new features enabled to dedicate custom storage inline within operations.
This storage can be used as an alternative to attributes to store data that is
specific to an operation. Attribute can also be stored inside the properties
storage if desired, but any kind of data can be present as well. This offers
a way to store and mutate data without uniquing in the Context like Attribute.
See the OpPropertiesTest.cpp for an example where a struct with a
std::vector<> is attached to an operation and mutated in-place:
struct TestProperties {
int a = -1;
float b = -1.;
std::vector<int64_t> array = {-33};
};
More complex scheme (including reference-counting) are also possible.
The only constraint to enable storing a C++ object as "properties" on an
operation is to implement three functions:
- convert from the candidate object to an Attribute
- convert from the Attribute to the candidate object
- hash the object
Optional the parsing and printing can also be customized with 2 extra
functions.
A new options is introduced to ODS to allow dialects to specify:
let usePropertiesForAttributes = 1;
When set to true, the inherent attributes for all the ops in this dialect
will be using properties instead of being stored alongside discardable
attributes.
The TestDialect showcases this feature.
Another change is that we introduce new APIs on the Operation class
to access separately the inherent attributes from the discardable ones.
We envision deprecating and removing the `getAttr()`, `getAttrsDictionary()`,
and other similar method which don't make the distinction explicit, leading
to an entirely separate namespace for discardable attributes.
Differential Revision: https://reviews.llvm.org/D141742
The way dependent dialects are implemented is by recursively calling
loadDialect in the constructor. This means we have to reload from the
dialect table because the constructor might have rehashed that table.
The steps for loading a dialect are
1. Insert a nullptr into loadedDialects. This indicates the dialect is
loading
2. Call ctor(). This recursively loads dependent dialects
3. Insert the new dialect into the table.
We had a conflict between steps 2 and 3 here. You have to be extremely
unlucky though as rehashing is rare and operator[] does no generation
checking on DenseMap. Changing that to an iterator would've uncovered
this issue immediately.
X. Sun et al. (https://dl.acm.org/doi/10.5555/3454287.3454728) published
a paper showing that an FP format with 4 bits of exponent, 3 bits of
significand and an exponent bias of 11 would work quite well for ML
applications.
Google hardware supports a variant of this format where 0x80 is used to
represent NaN, as in the Float8E4M3FNUZ format. Just like the
Float8E4M3FNUZ format, this format does not support -0 and values which
would map to it will become +0.
This format is proposed for inclusion in OpenXLA's StableHLO dialect: https://github.com/openxla/stablehlo/pull/1308
As part of inclusion in that dialect, APFloat needs to know how to
handle this format.
Differential Revision: https://reviews.llvm.org/D146441
The concept of the ActionManager acts as a sort of "Hub" that can receive
various types of action and dispatch them to a set of registered handlers.
One handler will handle the action or it'll cascade to other handlers.
This model does not really fit the current evolution of the Action tracing
and debugging: we can't foresee a good case where this behavior compose with
the use-case behind the handlers. Instead we simplify it with a single
callback installed on the Context.
Differential Revision: https://reviews.llvm.org/D144811
This is a preparation for adding support for more infrastructure around the concept
of Action and make tracing Action more of a first class concept.
The doc will be updated later in a subsequent revision after the changes are
completed.
Action belongs to IR because of circular dependency: Actions are dispatched through
the MLIRContext but Action will learn to encapsulate IR construct.
Differential Revision: https://reviews.llvm.org/D144809
Float8E5M2FNUZ and Float8E4M3FNUZ have been added to APFloat in D141863.
This change adds these types as MLIR builtin types alongside Float8E5M2
and Float8E4M3FN (added in D133823 and D138075).
Reviewed By: krzysz00
Differential Revision: https://reviews.llvm.org/D143744
This allows for interfaces to define a set of "base classes",
which are interfaces whose methods/extra class decls/etc.
should be inherited by the derived interface. This more
easily enables combining interfaces and their dependencies,
without lots of awkard casting. Additional implicit conversion
operators also greatly simplify the conversion process.
One other aspect of this "inheritance" is that we also implicitly
add the base interfaces to the attr/op/type. The user can still
add them manually if desired, but this should help remove some
of the boiler plate when an interface has dependencies.
See https://discourse.llvm.org/t/interface-inheritance-and-dependencies-interface-method-visibility-interface-composition
Differential Revision: https://reviews.llvm.org/D140198
This streamlines the implementation and makes it so that the virtual
tables are in the binary instead of dynamically assembled during initialization.
The dynamic allocation size of op registration is also smaller with this
change.
This reverts commit 7bf1e441da
and re-introduce e055aad5ff
after fixing the windows crash by making ParseAssemblyFn a
unique_function again
Differential Revision: https://reviews.llvm.org/D141492