Previously DispatchGRFStartRegister was used as base, but it points at
the start of 'constants+input' section of the payload.
GetTotalURBReadLength() refers only to inputs, so constants were missed
and GetMaxRegForThreadDispatch() didn't have the full picture.
When a 2D block load reads data consisting of multiple blocks
and the size of one block, calculated as:
```
block size = block width * block height * elem_size (bytes)
```
is less than the device GRF size, the hardware still needs a whole
GRF and reads each block into a new GRF, zero-padding the
rest of it. Later instructions expect this data to be laid out
contiguously in consecutive GRFs, so the data needs to be moved.
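The block-size check and the data movement described above can be sketched as follows. This is a minimal model, not the compiler's implementation: the function names are hypothetical, GRFs are modeled as a flat byte vector, and the 64-byte GRF size in the usage note is only an example value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Size of one block in bytes: block width * block height * element size.
std::size_t blockSizeBytes(std::size_t width, std::size_t height,
                           std::size_t elemSize) {
    return width * height * elemSize;
}

// True when there are several blocks and each one fills only part of a GRF,
// so the payloads of consecutive blocks are separated by zero padding.
bool needsCompaction(std::size_t blockBytes, std::size_t grfBytes,
                     std::size_t numBlocks) {
    return numBlocks > 1 && blockBytes < grfBytes;
}

// Hypothetical model: `padded` holds numBlocks GRFs back to back, each with
// blockBytes of payload followed by zero padding. Returns the payload bytes
// packed contiguously, which is the layout later instructions expect.
std::vector<std::uint8_t> compactBlocks(const std::vector<std::uint8_t>& padded,
                                        std::size_t blockBytes,
                                        std::size_t grfBytes,
                                        std::size_t numBlocks) {
    assert(blockBytes <= grfBytes);
    assert(padded.size() >= numBlocks * grfBytes);
    std::vector<std::uint8_t> packed;
    packed.reserve(numBlocks * blockBytes);
    for (std::size_t b = 0; b < numBlocks; ++b) {
        const auto start = static_cast<std::ptrdiff_t>(b * grfBytes);
        const auto end = start + static_cast<std::ptrdiff_t>(blockBytes);
        packed.insert(packed.end(), padded.begin() + start, padded.begin() + end);
    }
    return packed;
}
```

For example, with an assumed 64-byte GRF, two blocks of 8 x 2 two-byte elements are 32 bytes each, so `needsCompaction(32, 64, 2)` holds and `compactBlocks` returns 64 contiguous payload bytes.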
Lowers loads that use PHI instructions into the incoming blocks to avoid
unnecessary address space casts. This is done only when there are generic
pointers to local or private memory.
In opaque pointers mode, struct elements can be addressed by byte offset
instead of by indices. This case is not supported yet, so such structures
are simply not split.
Add a new pass that propagates null pointers across address space casts
and remove the no longer needed Generic Pointers Comparison Pattern Match.
This change is needed to fix a bug where a comparison between
generic pointers sometimes returns incorrect results.
Fixed a problem with the split barrier when it is used together with a regular barrier.
The sequence:
splitbarrier.signal()
regularbarrier()
splitbarrier.wait()
caused a hang because the same barrier ID was assigned to both the regular barrier and the split barrier.
Now the split barrier takes a different ID from the regular one.
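The idea behind the barrier fix above, giving each kind of barrier its own ID, can be illustrated with a small allocator. This is only a sketch: `BarrierIdAllocator` and its methods are hypothetical names, and the number of available IDs is an arbitrary example, not a hardware value.

```cpp
#include <cassert>
#include <set>

// Hypothetical allocator for barrier IDs. All names are illustrative and are
// not taken from the compiler source.
class BarrierIdAllocator {
public:
    explicit BarrierIdAllocator(int maxIds) : maxIds_(maxIds) {
        assert(maxIds > 0);
    }

    // Returns the lowest ID that is not currently in use, or -1 if none is left.
    int acquire() {
        for (int id = 0; id < maxIds_; ++id) {
            if (inUse_.insert(id).second) {
                return id;
            }
        }
        return -1;
    }

    // Makes an ID available again.
    void release(int id) { inUse_.erase(id); }

private:
    int maxIds_;
    std::set<int> inUse_;
};
```

Acquiring one ID for the regular barrier and a second one for the split barrier from the same allocator guarantees that the two never share an ID while both are live.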
This change rewrites the TargetExtTy retyper to use the
ValueMapTypeRemapper infrastructure, significantly improving the overall
design and maintainability of the code. The change also removes unused
cases added for additional safety if earlier retyping logic fails.
Two additional test cases are added, covering more complex retyping
scenarios.
The split-on-spill pass splits the live interval of variables that are
live-in/live-out of a loop and are used inside the loop. Splitting such
a spilled variable reduces RA constraints on the split variable, making
it possible to allocate a register for that variable in the loop.
This split must be done only when the variable is live-in to the loop or
live-out of the loop and is defined in the loop. The latter condition
exists because a variable that is written in the loop is also spilled to
its home location at loop exit.
Removed Ubuntu 20.04 support from build scripts.
Updated `add-apt-repository` section to current versions.
Unified buildSLT.sh to versions in buildIGC.sh.
Switch builds to use LLVM 16. Updated the documentation to treat LLVM 16 as default.
Refreshed parts of buildIGC.sh regarding supported versions. Fixed a bug when setting a variable in buildIGC.sh to a default value.
Force enabled exceptions for VC. This is a workaround while we're investigating why they're disabled.
OpenCL represents built-in variables like `get_global_id` with the generic
type `size_t`, which translates to i64. This change adds a new
optimization that narrows the built-in calculation to i32 if a use of the
built-in has an assumption hinting that the value fits in the i32 range.
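Why the narrowing described above is safe can be shown with a small model. This is a sketch under assumptions: the function names are hypothetical, the formula `group * localSize + local + offset` is the usual OpenCL definition of a global ID and not necessarily the exact calculation the pass rewrites, and the `assert` stands in for the assumption attached to the use.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// 64-bit reference form, matching a size_t-based built-in.
std::uint64_t globalId64(std::uint64_t group, std::uint64_t localSize,
                         std::uint64_t local, std::uint64_t offset) {
    return group * localSize + local + offset;
}

// True when the value fits in the non-negative part of the i32 range.
bool fitsInI32(std::uint64_t v) {
    return v <= static_cast<std::uint64_t>(
                    std::numeric_limits<std::int32_t>::max());
}

// Same calculation carried out in 32 bits. Unsigned 32-bit arithmetic wraps
// modulo 2^32, so it yields the low 32 bits of the 64-bit result; when that
// result fits in the i32 range, the low 32 bits are the whole value.
std::uint64_t globalId32(std::uint64_t group, std::uint64_t localSize,
                         std::uint64_t local, std::uint64_t offset) {
    // Stand-in for the assumption that the value fits in i32.
    assert(fitsInI32(globalId64(group, localSize, local, offset)));
    const std::uint32_t narrow =
        static_cast<std::uint32_t>(group) * static_cast<std::uint32_t>(localSize) +
        static_cast<std::uint32_t>(local) + static_cast<std::uint32_t>(offset);
    return narrow;  // widened back to 64 bits for the caller
}
```

Whenever the assumption holds, `globalId32` and `globalId64` return the same value, so the cheaper 32-bit form can replace the 64-bit one.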
The original logic may fail if two macros are built adjacently during
scheduling: the candidate instruction may depend on the first block but not
on the second block. As a result, it cannot be added to the second block.
When `SOALayoutChecker::visitBitCastInst()` encountered a ptr bitcast
on opaque pointers, it used to simply skip it. With this change, it checks
the users of the bitcast, as is done for typed pointers (just without the
ptr type checks).
Previously, a bitcast ptr such as `%p` below would be skipped without being handled properly.
```llvm
%arr = alloca [32 x float], align 4
%p = bitcast ptr %arr to ptr
```
Emit a compilation error when non-kernel functions exceed the hardware-supported
scratch space limit instead of silently dropping the kernel from the final
binary. This issue was observed on the O0 compilation path.