In opaque pointer mode, GEPs that index into globals often have a
different shape. The SimplifyConstant pass assumed two-index GEPs
(0, index) and used the second operand directly as an element index.
However, flat aggregates can also be addressed with single-index GEPs.
See the two examples below from SYCL_CTS-math_builtin_float_double_1_ocl,
run in typed and opaque pointer modes.
Two-index GEP example:
%130 = getelementptr inbounds [2 x i32], [2 x i32] addrspace(2)* @__stgamma_ep_nofp64__ones, i64 0, i64 %129
%131 = bitcast i32 addrspace(2)* %130 to float addrspace(2)*
%132 = load float, float addrspace(2)* %131, align 4, !tbaa !5163, !noalias !5409
Single-index GEP example:
%103 = getelementptr inbounds float, ptr addrspace(2) @__stgamma_ep_nofp64__ones, i64 %102
%104 = load float, ptr addrspace(2) %103, align 4, !tbaa !5163, !noalias !5409
This patch changes the pass to always use the last GEP index as the
element selector. This works because the pass only transforms top-level
arrays of scalars/vectors; in these cases the loaded element is always
designated by the final GEP index, whether it is preceded by indices
selecting the aggregate itself (typed pointers) or stands alone as the
only index (opaque pointers).
Additionally, do not rely on bitcasts when deciding whether an index
adjustment is necessary: in opaque pointer mode, types can change
between instructions without any bitcasts.
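As a minimal sketch (using the LLVM C++ API; names are illustrative,
not the actual pass code), picking the element selector without assuming
the two-index shape or inspecting bitcasts could look like this:

#include "llvm/IR/Instructions.h"
using namespace llvm;

// The element loaded from a top-level array of scalars/vectors is always
// selected by the final GEP index, for both the (0, idx) typed-pointer
// shape and the single-index opaque-pointer shape.
static Value *getElementSelector(GetElementPtrInst *GEP) {
  return GEP->getOperand(GEP->getNumOperands() - 1);
}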
Compute workloads add the following implicit arguments:
* payloadHeader - 8 x i32, packing global_id_offset (3 x i32),
local_size (3 x i32) and 2 x i32 of reserved space.
* enqueued_local_size - 3 x i32
Most of the time only enqueued_local_size is used, leaving local_size
unnecessary. As a result, payloadHeader carries 20 unused bytes (see the
layout sketch below).
This commit enables the short payload header on the PVC platform.
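For illustration, a byte-layout sketch of the full vs. short header
(the struct and field names below are assumptions, not the actual IGC
definitions):

#include <cstdint>

struct PayloadHeaderFull {        // 8 x i32 = 32 bytes
  uint32_t global_id_offset[3];   // 12 bytes, still needed
  uint32_t local_size[3];         // 12 bytes, redundant when
                                  // enqueued_local_size is used instead
  uint32_t reserved[2];           // 8 bytes
};

struct PayloadHeaderShort {       // keeps only what is actually read
  uint32_t global_id_offset[3];   // 12 bytes
};

static_assert(sizeof(PayloadHeaderFull) - sizeof(PayloadHeaderShort) == 20,
              "the short header drops the 20 unused bytes");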
When verifying whether an operand access exceeds the declared variable
size, we need special handling for the madw instruction, as it writes
both the low and the high results to GRFs.
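One plausible way to picture the check (the helper and its parameters
are hypothetical; only the doubling for madw reflects this change):

// madw writes both the low and the high parts of the result to GRFs,
// so its destination footprint is twice the nominal access size.
static bool accessExceedsDecl(unsigned accessBytes, unsigned declBytes,
                              bool isMadw) {
  if (isMadw)
    accessBytes *= 2; // account for the additional high-result write
  return accessBytes > declBytes;
}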
This change addresses the handling of predicated stores for sub-DW
values with non-uniform stored values. The predicate alone is not enough
to calculate the correct offset, so we use `EMASK & Predicate` to
determine it.
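A rough sketch of the idea (the lane selection below is an
illustration, not the exact emitter code):

#include <bit>
#include <cstdint>

// For a predicated sub-DW store with a non-uniform value, the lane used
// to derive the offset must be enabled both by the execution mask and by
// the predicate; the predicate alone may point at a disabled lane.
static int firstEnabledLane(uint32_t emask, uint32_t pred) {
  uint32_t active = emask & pred;   // lanes enabled by both
  if (active == 0)
    return -1;                      // nothing to store
  return std::countr_zero(active);  // index of the first enabled lane
}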
When LLVM IR uses opaque pointers or inserts a bitcast to i8*, a
subsequent GEP is expressed in bytes. The legacy handleGEPInst always
scalarized indices starting from pGEP->getSourceElementType(). After the
i8* cast that type is i8, so the algorithm mistakenly treated the byte
index as a count of elements, producing a misscaled (too large)
scalarized index.
Example:
%a = alloca [16 x [16 x float]], align 4
%b = bitcast [16 x [16 x float]]* %a to i8*
%c = getelementptr inbounds i8, i8* %b, i64 64
Here, 64 is a byte offset into the original aggregate. The old
implementation, seeing i8, scaled as if the offset were 64 elements, not
64 bytes. Yet the meaningful base of the GEP is the alloca's aggregate
type [16 x [16 x float]], and element calculations should be based on
this type.
This change:
1. Introduces getFirstNonScalarSourceElementType(GEP), which walks back
from the GEP base through pointer casts to find a root aggregate
element type.
2. Adds handling in handleGEPInst so that an i8 GEP byte offset is
converted to an element index of the underlying base type.
This way the algorithm avoids basing element-index scalarization on an
incidental i8* and keeps the index calculation aligned with the
underlying allocation layout.
For reference, in typed pointer mode (or without the bitcast), the GEP
would look like this:
%a = alloca [16 x [16 x float]], align 4
%c = getelementptr inbounds [16 x [16 x float]], [16 x [16 x float]]* %a, i64 0, i64 1
Here, %c is a pointer to the second inner array ([16 x float]*).
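A minimal sketch of the approach using the LLVM C++ API (simplified;
not the actual IGC implementation):

#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Walk back from the GEP base through pointer casts to the aggregate
// type of the underlying allocation, e.g. [16 x [16 x float]].
static Type *getRootAggregateType(GetElementPtrInst *GEP) {
  Value *Base = GEP->getPointerOperand()->stripPointerCasts();
  if (auto *AI = dyn_cast<AllocaInst>(Base))
    return AI->getAllocatedType();
  return nullptr; // fall back to GEP->getSourceElementType()
}

// With the root type known, an i8 byte offset is rescaled into an element
// index of that type instead of being treated as an element count
// (64 bytes / 64 bytes per [16 x float] gives index 1 in the example).
static uint64_t byteOffsetToElementIndex(const DataLayout &DL, Type *EltTy,
                                         uint64_t ByteOffset) {
  return ByteOffset / DL.getTypeAllocSize(EltTy).getFixedValue();
}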
In cases where there are no local casts to the generic address space
and private memory is allocated in global space, we can replace
GenericCastToPtrExplicit with a simple address space cast.
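Roughly, and assuming the call's first argument is the generic pointer
(a sketch, not the actual pass code):

#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Replace a GenericCastToPtrExplicit call whose result is known to point
// into the global address space with a plain addrspacecast.
static void replaceWithAddrSpaceCast(CallInst *CI) {
  IRBuilder<> B(CI);
  Value *GenericPtr = CI->getArgOperand(0); // assumed generic pointer arg
  Value *Cast = B.CreateAddrSpaceCast(GenericPtr, CI->getType());
  CI->replaceAllUsesWith(Cast);
  CI->eraseFromParent();
}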
This PR introduces a test suite for the OpReadClockKHR SPIR-V
instruction, ensuring proper compilation and intrinsic generation across
different scenarios.
CloneAddressArithmetic marks rematerialized instructions with metadata.
Use this metadata in the RematChainsAnalysis pass to mark the patterns
that are safe to consider in scheduling.
Use the estimation of the target instruction (which is usually a load)
in the RegisterPressureTracker of the scheduler and schedule the remat
chain as a whole.
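For illustration, a small sketch of the metadata handshake between the
two passes (the metadata kind name is an assumption):

#include "llvm/IR/Instruction.h"
#include "llvm/IR/Metadata.h"
using namespace llvm;

// CloneAddressArithmetic side: tag each rematerialized instruction.
static void markRematted(Instruction *I) {
  I->setMetadata("remat", MDNode::get(I->getContext(), {}));
}

// RematChainsAnalysis side: recognize tagged instructions so the remat
// chain can be scheduled as a whole.
static bool isRematted(const Instruction *I) {
  return I->getMetadata("remat") != nullptr;
}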
Introduce an LLVM patch that builds upon commit 88da019977.
The original commit diagnosed an issue in the legacy inliner and claimed
to fix it, but the change was non-functional and only added a debug-mode
assert.
This patch modifies it to mitigate the problem in the cases where the
assert would trigger.