Removed DpasMacroBuilder::getSuppressionBlockCandidate. The dpas macro
now keeps growing until a dpas is seen that cannot be part of a macro,
even if there is no suppression opportunity, i.e. no sources are the
same within the macro. There is no performance drawback in doing so.
This also aligns with vISA's dpas macro logic.
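The grouping rule described above can be pictured with a small sketch. The `Inst` record, its `canJoinMacro` flag, and `macroLength` are hypothetical names invented for illustration; the real DpasMacroBuilder performs many more checks, which are collapsed into a single boolean here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction record for illustration only.
struct Inst {
    bool isDpas;
    bool canJoinMacro;  // assumption: stands in for all real eligibility checks
};

// Returns how many consecutive instructions, starting at `start`, would be
// grouped into one macro. Growth stops at the first instruction that cannot
// be in a macro; no suppression opportunity is required to keep growing.
inline std::size_t macroLength(const std::vector<Inst> &insts,
                               std::size_t start) {
    std::size_t n = 0;
    while (start + n < insts.size() &&
           insts[start + n].isDpas && insts[start + n].canJoinMacro) {
        ++n;
    }
    return n;
}
```

In this sketch the only stopping condition is an ineligible instruction or the end of the sequence, which mirrors the behavior the message describes.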
Prevent opencl-clang from automatically enabling extensions by undefining __SPIR__/__SPIRV__
macros. This way we only enable extensions that are passed to IGC from NEO, which are extensions
supported by the device we compile code for.
This change also enables the cl_khr_integer_dot_product extension in OpenCL C < 3.0.
- Pass strings by const reference where it makes sense.
- Replace construct + push_back with emplace_back.
- Move NewOutputArgs instead of copying it.
- Use const auto & instead of const auto where possible.
- Use cast instead of dyn_cast where the cast cannot fail.
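A minimal sketch of the first four idioms, using only the standard library. The `Option` and `Holder` types and all function names are hypothetical and do not come from the changed code; the `dyn_cast`/`cast` item is LLVM-specific and is not shown.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type, used only for illustration.
struct Option {
    std::string name;
    int value;
    Option(std::string n, int v) : name(std::move(n)), value(v) {}
};

// Pass by const reference: the caller's string is not copied.
inline std::size_t nameLength(const std::string &name) { return name.size(); }

inline std::vector<Option> buildOptions() {
    std::vector<Option> opts;
    // emplace_back constructs the element in place instead of building a
    // temporary Option and handing it to push_back.
    opts.emplace_back("opt-level", 2);
    opts.emplace_back("debug", 0);
    return opts;
}

// Take ownership by moving the argument instead of copying it.
struct Holder {
    std::vector<Option> options;
    explicit Holder(std::vector<Option> o) : options(std::move(o)) {}
};

inline int sumValues(const std::vector<Option> &opts) {
    int sum = 0;
    // const auto & avoids copying each element (and its std::string).
    for (const auto &opt : opts) sum += opt.value;
    return sum;
}
```

A caller would write `Holder h(std::move(opts));` so that the vector's storage is transferred rather than duplicated.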
Disable InterpreterPatternMatching by default for performance reasons.
Recompilation will still be triggered in some cases, but it is now
based on register pressure.
DenseSet would be preferred, since it is a more mature container type than SetVector.
Unfortunately, iteration over lifetime ends and lifetime edges must be deterministic, and of the two only SetVector guarantees a stable iteration order.
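The property that matters here is insertion-ordered iteration. The sketch below is a minimal standard-library stand-in for that idea, not LLVM's SetVector implementation; `OrderedSet` is a hypothetical name. It pairs a hash set for membership with a vector that records insertion order.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Minimal stand-in for an insertion-ordered set (assumption: illustrates the
// idea behind a set-plus-vector container, not LLVM's actual code).
template <typename T>
class OrderedSet {
public:
    // Returns true if the value was newly inserted.
    bool insert(const T &value) {
        if (!seen_.insert(value).second) return false;
        order_.push_back(value);
        return true;
    }
    bool contains(const T &value) const { return seen_.count(value) != 0; }
    std::size_t size() const { return order_.size(); }

    // Iteration follows insertion order, independent of hash values.
    typename std::vector<T>::const_iterator begin() const { return order_.begin(); }
    typename std::vector<T>::const_iterator end() const { return order_.end(); }

private:
    std::unordered_set<T> seen_;
    std::vector<T> order_;
};
```

Iterating a plain hash set of pointers can yield a different order from run to run, because the order depends on hash values; walking the vector avoids that.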
This commit changes STB_TranslateOutputArgs to use data structures
instead of pointers to arrays of char. This is done for the purpose of
preventing memory leaks.
For the `pOutput` field, llvm::SmallVector is used, as it works with
ZEBinaryBuilder::getBinaryObject(llvm::raw_pwrite_stream).
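As a rough illustration of why owning containers prevent leaks, the sketch below replaces raw character buffers with standard containers. `OutputArgsSketch`, its field names, and `translate` are hypothetical and do not reflect the real STB_TranslateOutputArgs layout; `std::vector<char>` stands in for llvm::SmallVector.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of an output struct that owns its buffers.
struct OutputArgsSketch {
    std::vector<char> output;  // assumption: stand-in for llvm::SmallVector
    std::string errorString;
};

// Hypothetical translation step: every return path hands back a struct whose
// destructor releases the buffers, so no manual delete[] is needed.
inline OutputArgsSketch translate(const std::string &input) {
    OutputArgsSketch out;
    if (input.empty()) {
        out.errorString = "empty input";
        return out;  // early exit cannot leak
    }
    out.output.assign(input.begin(), input.end());
    return out;
}
```

With raw `char *` fields, each early return would need a matching cleanup; with owning members, cleanup is implicit.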
When handling annotations with opaque pointers, a single getOperand()
call on the annotation struct is enough, because there is no intermediate
instruction (e.g. a bitcast) to look through as there is with typed pointers.
When `-cl-fast-relaxed-math` is enabled, IGC computes e^x using 2^x by
calculating 2^(x * log2(e)), where log2(e) is `M_LOG2E_F` (≈ 1.44269504).
For `exp(a * b)`, IGC transforms this to `exp2((a * b) * M_LOG2E_F)`.
The compiler must preserve the original multiplication order to avoid
overflow in critical cases.
Critical case: When `a` is large (e.g., `FLOAT_MAX`) and `b` is `0`:
- Correct order: `(a * b) * M_LOG2E_F` = `(FLOAT_MAX * 0) * M_LOG2E_F` = `0`
- Wrong order: `(a * M_LOG2E_F) * b` = `(FLOAT_MAX * M_LOG2E_F) * 0` = `INF * 0` = `NaN`
This change ensures that fast math flags are not applied to the
(x * M_LOG2E_F) multiplication in the exp builtin implementation,
preventing the reordering optimization in `CustomUnsafeOptPass.cpp`
that could lead to incorrect results.
The multiplication by `M_LOG2E_F` now happens right before passing
the value to the math.exp instruction, preserving the original
multiplication order.
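The critical case can be reproduced on the host with ordinary IEEE 754 single-precision arithmetic. This is a minimal sketch, assuming it is compiled without fast-math options; `kLog2E` and the two function names are hypothetical, and `FLT_MAX` from `<cfloat>` plays the role of `FLOAT_MAX` above.

```cpp
#include <cfloat>
#include <cmath>

// log2(e) as a float; the message refers to this constant as M_LOG2E_F.
constexpr float kLog2E = 1.44269504f;

// (a * b) * log2(e): the original multiplication order.
inline float exp2ArgOriginal(float a, float b) {
    volatile float ab = a * b;  // volatile forces rounding to float here
    return ab * kLog2E;
}

// (a * log2(e)) * b: the reassociated order.
inline float exp2ArgReordered(float a, float b) {
    volatile float scaled = a * kLog2E;  // overflows to +inf for a = FLT_MAX
    return scaled * b;                   // inf * 0 yields NaN
}
```

For `a = FLT_MAX` and `b = 0.0f`, the original order gives an argument of `0`, so `std::exp2` of it is `1`, while the reordered form gives `NaN`.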
When processing inline vISA, IGC checks that inputs match constraints.
Before this change, a failed check caused the compiler to silently drop the
instruction, or to assert if the compiler was built in debug mode.
After this change, the compiler emits an error regardless of the build type.
Replacing EATOMIC_IADD with EATOMIC_INC and EATOMIC_DEC seems to have
an unexpected performance impact on some workloads. Use an AIL flag to
bail out of the incompatible use case.
`SOALayoutChecker::visitBitCastInst()` assumed typed pointers and tried
to get pointer types, triggering an LLVM assert on opaque pointers.