Fixed a hang in the split barrier when it is used together with a regular barrier.
Case:
splitbarrier.signal()
regularbarrier()
splitbarrier.wait()
This sequence hung because the regular barrier and the split barrier were assigned the same barrier ID.
Now the split barrier takes a different ID from the regular one.
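The ID separation can be illustrated with a minimal sketch. This is not IGC's actual code: `BarrierIdAllocator`, `BarrierIds`, and `assignBarrierIds` are hypothetical names, and the sketch only assumes that IDs come from one shared counter so that the two barrier kinds can never alias.

```cpp
#include <cstdint>

// Hypothetical allocator (illustrative only, not an IGC class).
// Every call returns a fresh ID, so no two barriers share one.
class BarrierIdAllocator {
public:
    uint32_t allocate() { return next_++; }

private:
    uint32_t next_ = 0;
};

// IDs used by a kernel that contains both barrier kinds.
struct BarrierIds {
    uint32_t regular;
    uint32_t split;
};

// Draws both IDs from the same allocator, which guarantees they differ.
inline BarrierIds assignBarrierIds(BarrierIdAllocator &alloc) {
    BarrierIds ids{};
    ids.regular = alloc.allocate();
    ids.split = alloc.allocate();
    return ids;
}
```

With distinct IDs, the regular barrier placed between `signal()` and `wait()` no longer interferes with the pending split barrier.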
- Fix incorrect CodeScheduling behavior when the DPAS and the load are in
different BBs
- Fix incorrect RematChainsAnalysis behavior in some cases with selects
Extends the bail out in `SOALayoutChecker::MismatchDetected` to skip not only i8-based
GEPs with non-constant byte indices, but all GEPs with non-constant byte
indices that don't operate on the alloca promoted type.
For example:
```llvm
%8 = alloca [1024 x i8], align 1
memcpy.body:
%46 = getelementptr <8 x i32>, ptr addrspace(1) %45, i64 %pIV.0
%47 = getelementptr <8 x i32>, ptr %8, i64 %pIV.0
%48 = load <8 x i32>, ptr addrspace(1) %46, align 1
store <8 x i32> %48, ptr %47, align 1
%49 = add i64 %pIV.0, 1
```
Here the alloca element type is `i8`, but the GEP and the store operate on `<8 x i32>`.
With typed pointers, the alloca pointer had to be bitcast
from `i8*` to `i32*`, and the optimization was disabled because
`SOALayoutChecker::visitBitCastInst` detected the size mismatch.
With opaque pointers there is no bitcast to inspect, so the mismatch went
undetected, triggering asserts and leading to miscompilations.
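The extended rule can be sketched with a simplified model. This is an illustration under stated assumptions, not the real `SOALayoutChecker`: `GepInfo` and `bailsOutOnNonConstantGep` are hypothetical names, types are compared as plain strings instead of `llvm::Type*`, and only the rule described above is modeled.

```cpp
#include <string>

// Simplified stand-in for a GEP on a promoted alloca (not the LLVM API).
struct GepInfo {
    std::string sourceElementType; // e.g. "<8 x i32>"
    bool hasConstantIndices;       // true if every index is a constant
};

// Returns true when the rule described above rejects the GEP:
// the index is non-constant AND the GEP does not operate on the
// type the alloca would be promoted to.
inline bool bailsOutOnNonConstantGep(const GepInfo &gep,
                                     const std::string &promotedType) {
    if (gep.hasConstantIndices) {
        // This rule does not apply; other checks (not modeled here)
        // would handle constant-index GEPs.
        return false;
    }
    return gep.sourceElementType != promotedType;
}
```

In the IR above, `%47` has a non-constant index `%pIV.0` and a `<8 x i32>` source element type while the alloca is `i8`-based, so this sketch would report a mismatch.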
Removed DpasMacroBuilder::getSuppressionBlockCandidate. Now the dpas
macro is formed until a dpas is seen that cannot be in a macro, even
if there is no suppression opportunity, i.e. no sources are the same
within the macro. There is no performance drawback in doing so. This also
aligns with vISA's dpas macro logic.
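The new formation rule amounts to a greedy scan. The sketch below is a hypothetical model, not the `DpasMacroBuilder` code: `Inst` and `formDpasMacro` are invented for illustration, and it assumes that a non-dpas instruction also ends the macro.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical instruction model (not a vISA or IGC type).
struct Inst {
    bool isDpas;        // instruction is a dpas
    bool macroEligible; // dpas is allowed to be part of a macro
};

// Returns how many consecutive instructions, starting at `start`,
// form the macro. The scan stops at the first instruction that cannot
// be in a macro; it never checks for shared sources (suppression).
inline std::size_t formDpasMacro(const std::vector<Inst> &insts,
                                 std::size_t start) {
    std::size_t end = start;
    while (end < insts.size() && insts[end].isDpas &&
           insts[end].macroEligible) {
        ++end;
    }
    return end - start;
}
```

For a sequence of two eligible dpas followed by an ineligible one, the scan yields a macro of length two regardless of whether the two share any source operands.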
With typed pointers, the base type can be obtained from a more complex
type (e.g. a struct) by calling GetBaseType(). With opaque pointers
the type has to be deduced instead: inspect the GEP instruction to
recover the type, then call GetBaseType() on it.
Revert of change: 96b26e6: Add a new constructor to UnorderedMap class
Reapplies the revert because of a desync after commit
60ef89439c, which accidentally re-added the
constructor after the previous revert.
The LLVM IR that IGC receives from the LLVM 16-based SPIR-V Reader
contains OpenCL/SPIR-V builtins represented as TargetExtTy types.
Unfortunately, Clang 16 does not emit TargetExtTy and hence the modules
coming from Clang and SPIR-V Reader are not compatible and cannot be
linked together. The solution/workaround is to retype TargetExtTy types
as pointers in the correct address space. This approach works since the
mangling/OpenCL builtin call resolution is already done by the SPIR-V
Reader and IGC does not need to work on TargetExtTy types directly.
Such retyping also ensures that all the current pointer-based
optimizations continue to work.
This patch extends the retyping beyond just function arguments and
return types. It now also retypes TargetExtTy used in:
- local variables (alloca instructions)
- loads and stores of TargetExtTy values
- struct types containing TargetExtTy fields
- function attributes (byval, sret, byref)
The retyping is done in a single pass over the module.
The LegalizeFunctionSignature pass used to assume that the SPIR-V front end
always converts functions returning structs by value into functions returning them by reference.
That is not guaranteed, since the SPIR-V spec allows returning structs by value.
This PR fixes the problem.
Cast alignment values to uint64_t before bitwise NOT to generate
proper 64-bit alignment masks (0xFFFFFFFFFFFFFFF8) instead of
truncated 32-bit values (0xFFFFFFF8) in stack pointer operations.
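The truncation can be reproduced in a few lines of C++. This is a minimal sketch, assuming the alignment is held in an unsigned 32-bit variable; `truncatedMask`, `alignMask`, and `alignDown` are illustrative names, not IGC functions.

```cpp
#include <cstdint>

// Buggy form: the NOT is evaluated in 32 bits and the result is then
// zero-extended to 64 bits, so the upper 32 bits of the mask are 0.
constexpr uint64_t truncatedMask(uint32_t alignment) {
    return ~(alignment - 1u);
}

// Fixed form: widen to 64 bits first, then apply the NOT.
constexpr uint64_t alignMask(uint32_t alignment) {
    return ~(static_cast<uint64_t>(alignment) - 1u);
}

// Rounds an address down to a multiple of `alignment` (a power of two).
constexpr uint64_t alignDown(uint64_t addr, uint32_t alignment) {
    return addr & alignMask(alignment);
}

static_assert(truncatedMask(8) == 0x00000000FFFFFFF8ULL,
              "32-bit NOT loses the upper mask bits");
static_assert(alignMask(8) == 0xFFFFFFFFFFFFFFF8ULL,
              "64-bit NOT keeps the full mask");
```

Applied to a stack pointer above 4 GiB, the truncated mask would clear the upper 32 bits of the address, whereas the widened mask only clears the low alignment bits.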
Add the IGCRegisterPressurePublisher pass to make the register pressure
estimate available in CISABuilder.
Enable BCR for kernels with low register pressure.