Commit Graph

18337 Commits

Author SHA1 Message Date
8ae6734b82 Remove initializeMergeUniformStoresPass 2025-08-22 17:59:32 +02:00
98bedfba35 Changes in code. 2025-08-22 16:58:07 +02:00
ed88b63592 Add get_coord formula for get_coord for 16 and 32 bit datatypes.
Platforms:
All

Keywords:
Feature

Related-to: GSD-11139
Resolves:
2025-08-22 12:31:50 +02:00
d85d0be961 2D block I/O for SIMD32
Expose SPIR-V API for 2D block load/store/prefetch for SIMD32 kernels.
Works on platforms with a minimum subgroup size of 16.
2025-08-22 10:15:56 +02:00
e679b1de8a Use Ray Query Return value in Compute Ray Tracing Extension
Modified intel_get_hit_candidate and intel_is_traversal_done functions.
2025-08-22 09:40:08 +02:00
6a6a8a0f22 Expose option to force linear scan RA via metadata
Expose option to force linear scan RA via metadata
2025-08-22 02:21:53 +02:00
7249d00150 Support width and pointer type agnostic loads/stores for private memory allocas in LowerGEPForPrivMem
The old handleStoreInst/loadEltsFromVecAlloca assume a 1:1 lane mapping
and equal sizes between the user value and the promoted vector element
type.

This is insufficient for mixed widths (e.g. <4 x i8> and <... x i32>),
cross-lane accesses created by the new byte-offset GEP lowering, or
pointers under opaque pointers (bitcasts between pointers and
non-pointers are illegal).

With the changes:
1) Stores (handleStoreInst and storeEltsToVecAlloca) normalize the
   source (scalar or vector) to a single integer of NeedBits = N *
   DstBits using ptrtoint/bitcast, split the big integer into K = ceil(
   NeedBits / SrcBits) chunks, bitcast/inttoptr each chunk back to
   the promoted lane type and insert into K consecutive lanes starting
   at the scalarized index.
2) Loads (handleLoadInst and loadEltsFromVecAlloca) read K promoted
   lanes starting at the scalarized index, convert each lane to
   iSrcBits, pack into i(K*SrcBits), truncate to i(NeedBits), then
   expand to the requested scalar or <N x DstScalarTy>. Use inttoptr for
   pointer results.
   There is also still a simple (old) path. If SrcBits == DstBits, just
   emit extractelement with casts (if needed).

All paths do a single load of the promoted vector,
extractelement/insertelement, and in case of stores only a single store
back.

With these changes, the LLVM IR emitted from LowerGEPForPrivMem
will look different. Instead of using plain bitcasts, there are now
ptrtoint/inttoptr instructions and there is additional packing/splitting
logic. For the simple (old) load path, the new implementation should
essentially emit the same pattern (potentially skipping bitcasts).

The additional integer/bitcast instruction sequences should be easily
foldable. Memory traffic is unchanged (still one vector load/store).
Overall register pressure should be similar, the pass still eliminates
GEPs and avoids private/scratch accesses.
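
As a rough standalone illustration of the chunk arithmetic above (the
names NeedBits, SrcBits and K follow the description; everything else is
hypothetical):

#include <cstdint>
#include <cstdio>

int main() {
    const unsigned N = 4;        // lanes in the user value, e.g. <4 x i8>
    const unsigned DstBits = 8;  // bits per user element (i8)
    const unsigned SrcBits = 32; // bits per promoted vector lane (i32)

    const unsigned NeedBits = N * DstBits;                 // 32
    const unsigned K = (NeedBits + SrcBits - 1) / SrcBits; // ceil(32/32) = 1

    // The <4 x i8> value is normalized to one i32 and written into K = 1
    // consecutive promoted lanes starting at the scalarized index.
    std::printf("NeedBits=%u, K=%u promoted lane(s)\n", NeedBits, K);
    return 0;
}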
2025-08-21 20:25:19 +02:00
9e80e069bb GenXStructSplitter opaque pointers fix
Handle the special opaque-pointer case where a GEP user indexes into a
non-structure type because leading zero indices are omitted.
2025-08-21 19:23:19 +02:00
5e4a7b9e87 Drop preprocessor checks for unsupported LLVM versions (#23491)
Drop preprocessor checks for unsupported LLVM versions

Delete all #if LLVM_VERSION_MAJOR checks that target LLVM versions older than 14.
2025-08-21 15:02:07 +02:00
47c1f6e944 Stub Vectorization for WAVEALL, CMP, SELECT enabled
Stub Vectorization for WAVEALL, CMP, SELECT is enabled by default.
The dependency check window is enlarged to 6 x the number of elements inside a slice.
2025-08-20 13:38:22 +02:00
4f0123a7d6 Bump CMake version
Change minimum CMake version to 3.13.4
2025-08-20 12:40:33 +02:00
9ae78bef34 Apply rule-of-three
Applies rule-of-three by explicitly deleting missing copy ctors.
2025-08-20 11:36:17 +02:00
6a2055eeeb Apply rule-of-three
Apply rule-of-three by explicitly deleting copy ctors.
2025-08-20 11:34:09 +02:00
0b5f9e11eb Add has_printf_calls to zeinfo
zeinfo now contains information on whether a kernel/function has printf
calls or function pointer calls. This allows NEO to create the
printf_buffer only when it is actually used.
2025-08-20 10:30:06 +02:00
8ca10e3d2e Enable loop unrolling in retry for 3D
Enable loop unrolling in retry for 3D
2025-08-20 02:41:26 +02:00
aefd0097e6 Handle single-index GEPs into flat aggregates in SimplifyConstant
In opaque pointer mode, GEPs that index into globals often have a
different shape. The SimplifyConstant pass assumed two-index GEPs
(0, index) and directly used the second operand as an element index.
However, flat aggregates can also be addressed using single-index GEPs.

See the two examples below from SYCL_CTS-math_builtin_float_double_1_ocl
run in typed and opaque pointer modes.

Two-index GEP example:
%130 = getelementptr inbounds [2 x i32], [2 x i32] addrspace(2)* @__stgamma_ep_nofp64__ones, i64 0, i64 %129
%131 = bitcast i32 addrspace(2)* %130 to float addrspace(2)*
%132 = load float, float addrspace(2)* %131, align 4, !tbaa !5163, !noalias !5409

Single-index GEP example:
%103 = getelementptr inbounds float, ptr addrspace(2) @__stgamma_ep_nofp64__ones, i64 %102
%104 = load float, ptr addrspace(2) %103, align 4, !tbaa !5163, !noalias !5409

This patch changes the pass to always use the last GEP index as the
element selector. This works because the pass only transforms top-level
arrays of scalars/vectors. In these cases, the element being loaded is
always designated by the final GEP index (whether earlier indices select
the enclosing aggregate, or a single index is used in opaque pointer
mode).
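
A minimal sketch (not the actual pass code) of taking the last GEP index
as the element selector, assuming the standard LLVM C++ API:

#include "llvm/IR/Instructions.h"

llvm::Value *getElementSelector(llvm::GetElementPtrInst *GEP) {
    // Covers both shapes: (0, index) in typed pointer mode and the
    // single-index form in opaque pointer mode.
    return GEP->getOperand(GEP->getNumOperands() - 1);
}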
2025-08-19 21:52:02 +02:00
215e971107 [Autobackout][FunctionalRegression]Revert of change: 6876fb54b2: Enable loop unrolling in retry
Enable loop unrolling in retry
2025-08-19 19:22:15 +02:00
38f1569e69 Revert of Remove redundant guard for a pattern of global imm offsets
Revert
2025-08-19 19:15:31 +02:00
e2c4ba8d76 Bitcast in StatelessToStateful pass
The fix prevents a crash in the StatelessToStateful pass
when all pointer users are bitcast instructions.
2025-08-19 13:43:37 +02:00
a057740b8a [Autobackout][FunctionalRegression]Revert of change: 6072b2cdf4: _OS_SUMMARY
_OS_DESCRIPTION
2025-08-19 03:05:22 +02:00
6876fb54b2 Enable loop unrolling in retry
Enable loop unrolling in retry
2025-08-18 21:11:44 +02:00
df0baa89e6 Include chrono explicitly
Include chrono explicitly
2025-08-18 20:41:35 +02:00
95ac72d3b8 Fix remat inst handling in CodeScheduling test
Temporary fix for remat inst handling in the CodeScheduling LIT test
that tolerates nondeterministic ordering of the first two loads.
2025-08-18 18:42:00 +02:00
46f497d623 GenXPromoteArray opaque pointers fix
Do not rely on bitcasts when deciding whether an index adjustment is
necessary. In opaque pointer mode, types can change between instructions
without bitcasts.
2025-08-18 14:13:46 +02:00
6072b2cdf4 _OS_SUMMARY
_OS_DESCRIPTION
2025-08-18 10:48:02 +02:00
9d9d6b3e5e enable ShortImplicitPayloadHeader on PVC
Compute workloads add the following implicit arguments:
 * payloadHeader - 8 x i32 packing global_id_offset (3 x i32),
   local_size (3 x i32) and 2 x i32 reserved.
 * enqueued_local_size - 3 x i32
Most of the time only enqueued_local_size is used, leaving local_size
unnecessary. As a result, payloadHeader has 20 unused bytes.

This commit enables the short payload header on the PVC platform.
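
A standalone sketch of the layout arithmetic described above (the struct
name is hypothetical; field order follows the commit message):

#include <cstdint>
#include <cstdio>

struct PayloadHeader {            // 8 x i32 = 32 bytes
    uint32_t global_id_offset[3]; // 12 bytes
    uint32_t local_size[3];       // 12 bytes, usually unused
    uint32_t reserved[2];         // 8 bytes
};

int main() {
    // When only enqueued_local_size is consumed, local_size plus the
    // reserved fields leave 12 + 8 = 20 unused bytes.
    std::printf("header: %zu bytes, unused: 20 bytes\n",
                sizeof(PayloadHeader));
    return 0;
}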
2025-08-18 09:07:53 +02:00
3c9eb3b099 [Autobackout][FunctionalRegression]Revert of change: bdd9b15ad7: Fix GEP lowering overflow issues
This change prevents usage of potentially
negative values which are then zero-extended to
64 bits as indexes.
v2.18.0
2025-08-16 02:57:21 +02:00
ceb9c26626 [Autobackout][FunctionalRegression]Revert of change: 76b5b50eb2: Only modify cr0 on debug SIP exit
Only modify cr0 on debug SIP exit
2025-08-16 00:29:20 +02:00
43da807c49 Changes in code. 2025-08-15 02:14:34 +02:00
587c7e9603 [Autobackout][FunctionalRegression]Revert of change: 882201b325: Use Ray Query Return value in Compute Ray Tracing Extension
Modified intel_get_hit_candidate and intel_is_traversal_done functions.
2025-08-15 01:01:14 +02:00
dcc6f77411 Fix issue in emit pattern with LVN matching for And
Fix issue with LVN matching for And in SIMD32 with a mad operation.
2025-08-15 00:20:33 +02:00
8eb1fe42bd Enable loop unroll but only for reducing code size during compilation retry
Enable loop unroll but only for reducing code size during compilation retry
2025-08-15 00:17:07 +02:00
d81684bd3f Fix the access bound check issue of src operand for madw instruction
For the madw instruction, only the dst operand needs special handling in the
verifier; src operands should be treated as in other instructions.
2025-08-14 22:58:17 +02:00
cedf0f970b Parameterize UnrollMaxCountForAllocai in GenTTI
Parameterize UnrollMaxCountForAllocai in GenTTI
2025-08-14 20:17:15 +02:00
4c2e31a450 Fix the bug of verifying if an operand access exceeds the declared variable size for madw instruction
When verifying whether an operand access exceeds the declared variable size, we need special
handling for the madw instruction, as this instruction writes both the low and high results to
GRFs.
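
A standalone sketch of the madw behavior the verifier has to account
for, as described above: a 32 x 32-bit multiply-add whose full 64-bit
result is written out as separate low and high dwords (values are
illustrative):

#include <cstdint>
#include <cstdio>

int main() {
    uint32_t src0 = 0xDEADBEEF, src1 = 0x12345678, src2 = 42;
    uint64_t full = (uint64_t)src0 * src1 + src2;
    uint32_t lo = (uint32_t)full;         // low result GRF
    uint32_t hi = (uint32_t)(full >> 32); // high result GRF
    std::printf("lo=0x%08X hi=0x%08X\n", (unsigned)lo, (unsigned)hi);
    return 0;
}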
2025-08-14 19:01:27 +02:00
941ba382ec Fix predicated store sub-DW value handling
This change addresses the handling of predicated
stores of sub-DW values with non-uniform stored values.
The predicate alone is not enough to calculate the correct
offset, so we use `EMASK & Predicate` to determine the
correct offset.
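
A rough standalone sketch of the idea: the store offset for a lane
depends on which lanes are actually enabled, i.e. on EMASK & Predicate
rather than the predicate alone (the names and the popcount-based
formulation here are illustrative assumptions, not the pass's actual
code):

#include <bit>
#include <cstdint>
#include <cstdio>

int main() {
    uint32_t emask = 0x0000FFFF; // execution mask (illustrative)
    uint32_t pred  = 0x0000AAAA; // per-lane predicate (illustrative)
    uint32_t active = emask & pred;

    unsigned lane = 9; // an enabled lane
    // Offset of this lane's sub-DW value = enabled lanes below it.
    unsigned offset = std::popcount(active & ((1u << lane) - 1));
    std::printf("lane %u -> offset %u\n", lane, offset);
    return 0;
}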
2025-08-14 18:14:13 +02:00
f68235fad2 Bump MINOR to 18 2025-08-14 13:29:44 +02:00
6cad180e82 Add lit test for conversion from i64 to double
Add lit test for conversion from i64 to double
2025-08-14 12:46:40 +02:00
c442009f88 Bump MINOR to 17 2025-08-14 12:23:27 +02:00
882201b325 Use Ray Query Return value in Compute Ray Tracing Extension
Modified intel_get_hit_candidate and intel_is_traversal_done functions.
2025-08-14 10:59:06 +02:00
e4d71856fa Change schedule priority according to dep type
For a barrier dependency, we shouldn't use latency cycles to calculate priority, because a barrier is an ordering issue, not a latency issue. Use occupancy instead.
2025-08-14 08:09:25 +02:00
e9afb1822b Changes in code. 2025-08-14 00:04:07 +02:00
e8906d0679 Fix i8/opaque pointer byte offset GEP scalarization in PrivateMemoryResolution
When LLVM IR uses opaque pointers or inserts a bitcast to i8*, a
subsequent GEP is expressed in bytes. The legacy handleGEPInst always
scalarized indices by starting from pGEP->getSourceElementType(). After
the i8* cast, the type is i8, so the algorithm mistakenly treated the
byte index as a count of elements, producing a misscaled (too large)
scalarized index.

Example:
%a = alloca [16 x [16 x float]], align 4
%b = bitcast [16 x [16 x float]]* %a to i8*
%c = getelementptr inbounds i8, i8* %b, i64 64

Here, 64 is a byte offset into the original aggregate. The old
implementation, seeing i8, scaled the offset as if it were 64 elements, not 64 bytes.

However, the meaningful base of the GEP is the alloca's aggregate type
[16 x [16 x float]], and the element calculations should be based on this
type.

This change:
1. Introduces getFirstNonScalarSourceElementType(GEP), which
walks back from the GEP base through pointer casts to find a root
aggregate element type.
2. Adds additional handling in handleGEPInst, so that i8 GEP byte offset
is converted to an element index of the underlying base type.

This way the algorithm avoids basing element index scalarization on an
incidental i8* and keeps index calculation aligned with the underlying
allocation layout.

For reference, in typed pointer mode (or without the bitcast), the GEP
would look like this:
%a = alloca [16 x [16 x float]], align 4
%c = getelementptr inbounds [16 x [16 x float]], [16 x [16 x float]]* %a, i64 0, i64 1

Here, %c is a pointer to the second inner array ([16 x float]*).
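
A standalone sketch of the byte-offset-to-element-index conversion using
the numbers from the example above:

#include <cstdio>

int main() {
    // %c = getelementptr inbounds i8, i8* %b, i64 64
    const unsigned byteOffset = 64;
    const unsigned elemSize = sizeof(float);        // 4 bytes
    const unsigned innerArrayBytes = 16 * elemSize; // one [16 x float] row

    // 64 bytes == 16 floats == exactly one inner row, matching the
    // typed-pointer GEP indices (0, 1).
    std::printf("float index: %u, inner-array index: %u\n",
                byteOffset / elemSize, byteOffset / innerArrayBytes);
    return 0;
}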
2025-08-13 22:53:48 +02:00
bdd9b15ad7 Fix GEP lowering overflow issues
This change prevents usage of potentially
negative values which are then zero-extended to
64 bits as indexes.
2025-08-13 20:41:06 +02:00
dcfe3f25db Change new inline raytracing setting
Change new inline raytracing setting
2025-08-13 17:49:52 +02:00
4458a3bfcc Stub vectorization for IGCVectorizer
Allow certain instructions to be "stub-vectorized".
New tests are added to cover the additional flexibility of
vectorization.
2025-08-13 14:54:45 +02:00
d19cdc5a52 Refactor ZEBinary flags and documentation
Refactored all conditions based on enableZEBinary() and supportsZEBin() as if they were always true, and removed those conditions.
2025-08-13 09:05:48 +02:00
aafca7ed1b Improve spill threshold handling
Improve spill threshold handling: thresholds are now expressed in units
of GRFs, calculated from the byte input.
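
A minimal sketch of a bytes-to-GRFs conversion, assuming a hypothetical
GRF size (the actual rounding and platform values may differ):

#include <cstdio>

int main() {
    const unsigned grfSizeBytes = 32;  // assumption; platform dependent
    const unsigned spillBytes   = 100; // example byte input

    // Round up to whole GRFs.
    unsigned thresholdGRFs = (spillBytes + grfSizeBytes - 1) / grfSizeBytes;
    std::printf("%u bytes -> %u GRFs\n", spillBytes, thresholdGRFs);
    return 0;
}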
2025-08-12 23:08:27 +02:00
d3ca4a545c Add -vc-codegen option handling for VLD
.
2025-08-12 17:00:51 +02:00
b799e7c1f2 Add GenericCastToPtrOpt pass
In cases where we have no local casts to generics and we allocate
private memory in global space, we can replace GenericCastToPtrExplicit
with a simple address space cast.
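
A minimal sketch (not the actual pass) of the replacement, assuming the
standard LLVM C++ API and that the generic pointer is the call's first
argument:

#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"

void replaceWithAddrSpaceCast(llvm::CallInst *Call) {
    llvm::IRBuilder<> B(Call);
    // Cast the generic pointer operand directly to the call's result
    // type; legal once we know there are no local casts to generic.
    llvm::Value *Src = Call->getArgOperand(0); // assumed position
    llvm::Value *Cast = B.CreateAddrSpaceCast(Src, Call->getType());
    Call->replaceAllUsesWith(Cast);
    Call->eraseFromParent();
}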
2025-08-12 15:45:04 +02:00