Clang 16 still lowers OpenCL/SPIR-V built-ins as ptr to opaque structs,
while SPIR-V Reader uses TargetExtTy values. This patch extends the
retyping function to also retype the return types of builtin function
declarations. Note that builtin function resolution is already done
earlier by SPIR-V Reader.
Move the PromoteToPredicatedMemoryAccess pass to
the optimization stage of the compiler.
This allows standard LLVM passes to optimize the IR before
the predication pass is applied.
Change the pass to support scalarized loads and stores.
Add a new pass to hoist conversion operations
to the common dominator to unblock the predication pass.
Fix generation of predicated stores when the address is uniform
and the stored value is not.
Fix issues with pattern matching of the pointer extraction in `InsertBranchOpt` by rematerializing the pointer and coordinates of typed access instructions.
Co-authored-by: Andrzejewski, Krystian <krystian.andrzejewski@intel.com>
Cleaned up dead code related to the deprecation of the patch token binary format. Removed unused code and adjusted some comments.
Most of these changes follow up on previous commits that deprecated the format in VC and OCL.
Some parts are still to be refactored; this doesn't cover all patch token code.
While adding Opaque Pointers support to JointMatrix, I found that 4 tests were failing due to this assert:
info: error, assertion failed: bits == elementSize
file: Source\IGC\Compiler\Optimizer\OpenCLPasses\PrivateMemory\PrivateMemoryResolution.cpp
function: TransposeHelperPrivateMem::handleLoadInst
line: 665
Failed Tests (4):
SYCL :: Matrix/SG32/joint_matrix_bf16_fill_k_cache_unroll.cpp
SYCL :: Matrix/SG32/joint_matrix_bf16_fill_k_cache_unroll_init.cpp
SYCL :: Matrix/joint_matrix_bf16_fill_k_cache_unroll.cpp
SYCL :: Matrix/joint_matrix_bf16_fill_k_cache_unroll_init.cpp
My investigation showed that the following resolution path:
alloca -> gep -> load
used an invalid vector element count, which caused this assert to fail.
To my understanding, the cause was that we used the elementSize saved in the "TransposeHelperPrivateMem" instance,
but that value wasn't updated as we walked through the instructions (alloca -> gep -> load), so there was a mismatch.
Changes:
* UseNewInlineRaytracing is now a mask that lets the user selectively enable new inline raytracing for particular shader types
* New regkey AddDummySlotsForNewInlineRaytracing forces an increased number of slots required for rayqueries, to test whether the UMD allocated the necessary HW stacks
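A minimal sketch of how a per-shader-type mask such as UseNewInlineRaytracing could be queried. The `ShaderKind` enum, its bit assignments, and the helper name are hypothetical and chosen only for illustration; the actual bit layout of the regkey is not stated here.

```cpp
#include <cstdint>

// Hypothetical shader-type bits. The real bit assignments used by the
// UseNewInlineRaytracing regkey are an assumption of this sketch.
enum class ShaderKind : std::uint32_t {
    Compute = 1u << 0,
    Pixel   = 1u << 1,
    Vertex  = 1u << 2,
};

// Returns true when the mask enables the new inline raytracing path
// for the given shader kind (hypothetical helper, not an IGC function).
constexpr bool useNewInlineRaytracing(std::uint32_t mask, ShaderKind kind) {
    return (mask & static_cast<std::uint32_t>(kind)) != 0;
}
```

With this layout, a mask of 0x3 would enable the new path for compute and pixel shaders only, and a mask of 0 would disable it everywhere.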
Fixed a problem with the split barrier when it is used together with a regular barrier.
Case:
splitbarrier.signal()
regularbarrier()
splitbarrier.wait()
was causing a hang due to assigning the same barrier ID to the regular barrier and the split barrier.
Now the split barrier takes a different ID than the regular one.
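The idea of keeping the two barrier kinds on distinct IDs can be sketched as follows. This is an illustration only: the `BarrierIds` type, the choice of 0 for the regular barrier, and the "next ID" policy are assumptions, not the values the compiler actually uses.

```cpp
#include <cstdint>

// Hypothetical ID assignment for barriers within one kernel.
struct BarrierIds {
    // ID reserved for the regular barrier; 0 is an assumption of this sketch.
    static constexpr std::uint32_t kRegular = 0;

    constexpr std::uint32_t regularId() const { return kRegular; }

    // The split barrier takes an ID that differs from the regular one, so a
    // regular barrier placed between signal() and wait() cannot alias it.
    constexpr std::uint32_t splitId() const { return kRegular + 1; }
};
```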
Removed obsolete logic now that the patch token format is fully deprecated in OCL.
The removed code patched offsets based on m_PatchLaterDataVector, which is always empty, as it was only used on the patch token code path.
When emitting zeinfo, IGC tagged the addr mode of images with no users as
stateful even if the module was compiled to use bindless images. This
caused NEO to throw an error, as it disallows the use of both bindless
and bindful mode in the same module.
This commit sets the default addr mode to bindless for modules that have
UseBindlessImage set to true.
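The default selection described above can be sketched roughly as below. The `AddrMode` enum, the `ModuleOptions` struct, and the helper name are hypothetical stand-ins; only the `UseBindlessImage` flag name comes from the commit text.

```cpp
// Hypothetical addressing modes for an image argument.
enum class AddrMode { Stateful, Bindless };

// Hypothetical container for the module-level flag named in the text.
struct ModuleOptions {
    bool UseBindlessImage = false;
};

// Default addr mode for an image with no users: follow the module setting
// so that bindless and bindful modes are never mixed in one module.
constexpr AddrMode defaultImageAddrMode(const ModuleOptions& opts) {
    return opts.UseBindlessImage ? AddrMode::Bindless : AddrMode::Stateful;
}
```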