Commit Graph

18372 Commits

Author SHA1 Message Date
1380ea29db Refresh workaround files
Refreshes workaround-related files.
2025-09-01 15:48:29 +02:00
d69a75f632 [Autobackout][FunctionalRegression]Revert of change: a2dbe990ca: Replace Atomic Fence with GenISA_source_value
Replace Atomic Fence with GenISA_source_value
2025-09-01 13:44:04 +02:00
cf2dc92ae5 Changes in code. 2025-09-01 13:03:23 +02:00
46629d9b5f [Autobackout][FunctionalRegression]Revert of change: 5bffd05743: Modify Integer MAD Pattern Matching
Modify Integer MAD pattern matching to catch more cases.
2025-08-30 08:28:14 +02:00
a2dbe990ca Replace Atomic Fence with GenISA_source_value
Replace Atomic Fence with GenISA_source_value
2025-08-30 07:21:57 +02:00
5bffd05743 Modify Integer MAD Pattern Matching
Modify Integer MAD pattern matching to catch more cases.
2025-08-30 04:02:24 +02:00
5c1f4b67f2 Change new inline raytracing regkey default value to 12
Change new inline raytracing regkey default value to 12
2025-08-29 22:12:52 +02:00
a060a9a901 Prevent double erasure of instructions in HoistCongruentPHI
This change avoids an exception caused by double erasure of instructions in `HoistCongruentPHI`.
2025-08-29 18:40:22 +02:00
82a7975dea Support zero incoming values in MergeScalarPhisPass.
Add support to MergeScalarPhisPass for vectorizing phi instructions
with constant zero incoming values.
2025-08-29 17:56:52 +02:00
6cdf4a7232 Workaround for prefetch_cache_control failing due to unknown type size on opaque pointers
Currently `handleCacheControlINTELForPrefetch` requires type size to perform call conversion correctly,
but on opaque pointers such size is not available and we cannot just extract it anyhow.

We're waiting for "OpUntypedPrefetch" extension which will support opaque pointers in such cases.

In the meantime, this change skips the whole prefetch conversion when an opaque pointer type is involved,
pending an update on the status of "OpUntypedPrefetch".
2025-08-29 13:21:43 +02:00
a22afc6f41 Disable CodeScheduling for older platforms
Disable CodeScheduling for older platforms
2025-08-29 11:23:10 +02:00
75b46bbc7d Don't swap src operands if the swapping causes invalid datatype combination for mad instruction
Don't swap src0 and src1 of pseudo_mad instruction in HWConformity if the swapping
causes invalid datatype combination. For example:
pseudo_mad (32)  result1(0,0)<1>:d  x1(0,0)<2;0>:uw  r0.1<0;0>:d  z(0,0)<1;0>:d

In this case, we swap src0(actually src2) and src1 if src1 is scalar but src0 is
not, as src0(actually src2) has no regioning support:
pseudo_mad (32)  result1(0,0)<1>:d  r0.1<0;0>:d  x1(0,0)<2;0>:uw  z(0,0)<1;0>:d

After swapping, the datatype combination is invalid as it changes the datatype
combination from (W * D + D) to (D * W + D). If src2(actually src0) is D, HW only
supports (W * D + D).

Then we wouldn't generate mad, and we generate mul+add instead. But without this
swapping, actually we can generate mad as src0(actually src2) is aligned to dst.
2025-08-29 03:22:04 +02:00
74c3bcce15 Implement cross block load vectorization for inline raytracing
We do not yet have performance parity with the old implementation. One of the reasons is suboptimal loading from rtstack.
This change should coalesce loads for trivial rayquery usages.
2025-08-29 00:45:05 +02:00
3275c8a2a4 Handle selects and memsets in alloca tracking
Handle selects and memsets in alloca tracking
2025-08-28 21:29:37 +02:00
cca2a9fe60 IGCVectorizer now supports I32 PHI
IGCVectorizer now supports I32 Phi instructions.
2025-08-28 16:00:49 +02:00
0998b31acf Include additional headers in dev package
Include additional headers in dev package
2025-08-28 15:27:28 +02:00
137cd1df57 Fix constant folding in PredefinedConstantResolving
PredefinedConstantResolving pass caused type mismatch assertion
in tests while moving to opaque pointers. It happened when there was a
type difference between a global variable and its load instruction
user. With typed pointers the pass was skipped in this scenario because
the user of the same global was a bitcast and then its user was the load. What this
pass tried to do was doing RAUW operation on load to replace it with
global constant. This fix changes pass's behaviour by enabling constant
folding even when there is a type difference between load instruction
and global constant.

Example crashing IR:
```llvm
 @global = constant [3 x i64] [i64 16, i64 32, i64 64]
 define void @func(i64 %0) {
  %2 = load i64, ptr @global ; <-- crash
  ret void
 }
```
2025-08-28 14:02:41 +02:00
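The folding described in 137cd1df57 can be modeled outside LLVM. The sketch below is not IGC code: `ConstantArray` and `foldLoadFromConstant` are hypothetical names, and it assumes the load reads from offset 0 of the initializer. It shows the idea of folding a load whose type differs from the global's type by reading only as many bytes as the load needs, and refusing when the load is wider than the constant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a constant initializer such as
// @global = constant [3 x i64] [i64 16, i64 32, i64 64].
using ConstantArray = std::vector<std::uint64_t>;

// Models folding "load T, ptr @global" when T differs from the array type.
// Assumption: the load starts at byte offset 0 of the initializer.
// Returns false (no folding) when the load is wider than the constant.
template <typename T>
bool foldLoadFromConstant(const ConstantArray &init, T &out) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "the loaded type must be copyable byte by byte");
    const std::size_t totalBytes = init.size() * sizeof(std::uint64_t);
    if (sizeof(T) > totalBytes) {
        return false;
    }
    assert(init.data() != nullptr);
    // Read exactly sizeof(T) bytes from the start of the initializer.
    std::memcpy(&out, init.data(), sizeof(T));
    return true;
}
```

With the initializer from the crashing IR, folding an `i64` load yields the first element, 16, instead of tripping a type-mismatch assertion.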
f7a18fd28f add Code Scheduling LIT for SIMD32
Add new LIT for Code Scheduling to test SIMD32 kernels.
2025-08-28 11:49:50 +02:00
7682d93a20 Remove null dereference
Remove null return value dereference
2025-08-28 09:30:11 +02:00
8b32f60b27 [Autobackout][FunctionalRegression]Revert of change: 7b8b6da4df: Eliminate samples with the same texture coordinates
Adding ReassociatePass before GVN pass ensures the second sample gets
eliminated.
2025-08-28 07:33:23 +02:00
7b8b6da4df Eliminate samples with the same texture coordinates
Adding ReassociatePass before GVN pass ensures the second sample gets
eliminated.
2025-08-28 01:03:46 +02:00
c24729eddc Add Pre-RA register pressure stat 2025-08-27 18:09:24 +02:00
2fb53dfcba Use const ref. Initialize members.
In RematChainsAnalysis.hpp pass arguments using const reference to
avoid copy construction. Initialize pointer members to nullptr
to avoid uninitialized memory usage.
2025-08-27 17:51:05 +02:00
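The two habits named in 2fb53dfcba can be illustrated with a small sketch. `Instruction`, `ChainEnds`, and `findChainEnds` are hypothetical names, not taken from RematChainsAnalysis.hpp. The container is passed by const reference so the call makes no copy, and the pointer members default to `nullptr` so they are never read uninitialized.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instruction type, for illustration only.
struct Instruction {
    int opcode = 0;
};

// Pointer members get default initializers, so a default-constructed
// ChainEnds never holds indeterminate pointers.
struct ChainEnds {
    const Instruction *first = nullptr;
    const Instruction *last = nullptr;
};

// The vector is taken by const reference: no copy construction happens
// at the call site, and the function cannot modify the caller's data.
ChainEnds findChainEnds(const std::vector<Instruction> &insts) {
    ChainEnds ends;
    if (!insts.empty()) {
        ends.first = &insts.front();
        ends.last = &insts.back();
    }
    // Invariant: both pointers are set, or neither is.
    assert((ends.first == nullptr) == (ends.last == nullptr));
    return ends;
}
```

Because the argument is a reference, the returned pointers refer to the caller's own elements and stay valid as long as the caller's vector does.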
a29088c5d7 ConstCoalescing SEXT/ZEXT Fix
Fix bug where pass treated zero-extended values as sign-extended values in add instruction
2025-08-27 16:35:10 +02:00
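Why the extension kind matters for a29088c5d7 is easiest to see with plain integer arithmetic. The sketch below does not reproduce ConstCoalescing itself, and the helper names are hypothetical. It shows that the same 8-bit pattern contributes a different amount to an add depending on whether it is zero-extended or sign-extended.

```cpp
#include <cassert>
#include <cstdint>

// Zero extension: the 8-bit pattern is read as an unsigned value 0..255.
std::int32_t zeroExtend8(std::uint8_t bits) {
    return static_cast<std::int32_t>(bits);
}

// Sign extension: bit 7 is the sign bit, giving a value in -128..127.
// (v ^ 0x80) - 0x80 is a portable way to replicate the sign bit.
std::int32_t signExtend8(std::uint8_t bits) {
    const std::int32_t v = bits;
    return (v ^ 0x80) - 0x80;
}

// Adds an extended 8-bit operand to a base value.
std::int32_t addExtended(std::int32_t base, std::uint8_t bits, bool isSigned) {
    assert(base >= 0);  // keeps the illustration simple
    return base + (isSigned ? signExtend8(bits) : zeroExtend8(bits));
}
```

For the pattern `0xF0` and a base of 100, the zero-extended add gives 340 while the sign-extended add gives 84, so treating one as the other silently changes the computed value.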
d9eda715d4 Enable CodeScheduling on the 1st try
Enable CodeScheduling on the 1st try (not only on the recompilation)
2025-08-26 20:22:34 +02:00
b717c7c181 fix test
igc_opt returns non-zero on failed assert
2025-08-26 15:56:46 +02:00
46c11fc759 Rematerialization pass now supports CMP instructions
Rematerialization pass now supports CMP instructions
2025-08-26 14:42:06 +02:00
391a1da977 [LLVM 16] Fixing ResolveOCLRaytracingBuiltins by creating "struct.intel_ray_query_opaque_t" that's not present in generated BiF .bc file
When importing built-in types, the type called "struct.intel_ray_query_opaque_t" was properly imported on typed-pointers mode as:

```
%struct.intel_ray_query_opaque_t = type opaque
```

but on opaque types mode it was not present in the generated BiF .bc file, thus it was not imported.
It caused ResolveOCLRaytracingBuiltins pass to fail because it relied on having this type when creating Alloca.

```
auto *allocaType = IGCLLVM::getTypeByName(callInst.getModule(), "struct.intel_ray_query_opaque_t");
auto *alloca = m_builder->CreateAlloca(allocaType);
```

This patch adds workaround for it by creating such type when it is not present.
2025-08-26 13:14:12 +02:00
ea5c192f8c Enable Code Scheduling on recompilation
Enable Code Scheduling on recompilation
2025-08-26 12:41:53 +02:00
68d6ad7d08 CodeScheduling improvements
CodeScheduling improvements to ensure better register pressure handling
- Support handling of the remated instructions that are used by select
  (not memop)
- Various heuristics added to handle situations with small (split)
  loads
- Heuristic to populate the same vector added
2025-08-26 12:36:13 +02:00
7fb2b0b28f Revert DepWindow setting inside igc-vectorizer back
Revert DepWindow setting inside igc-vectorizer back
2025-08-25 19:45:11 +02:00
09e26cf081 Revert LoopUnrollMaxPercentThresholdBoostForHighRegPressure to 400
Revert LoopUnrollMaxPercentThresholdBoostForHighRegPressure to 400
2025-08-25 19:18:51 +02:00
7591da236d Adding WillReturn attribute to intrinsics with memory(read)
Adding `WillReturn` attributes to all intrinsics with `Ref` memory_effects
to solve performance regressions from LLVM 16 transition.
2025-08-25 12:14:00 +02:00
c1d34755f1 Support aggregate with bools promotion in functions
This PR enables support of structs and arrays with bools in function
arguments and return types.
2025-08-22 21:16:25 +02:00
cd5c825d4d Use alloca size instead of number of elements as the cutoff for promoting loop unrolling
Use alloca size instead of number of elements as the cutoff for promoting loop unrolling
2025-08-22 20:57:12 +02:00
8ae6734b82 Remove initializeMergeUniformStoresPass 2025-08-22 17:59:32 +02:00
98bedfba35 Changes in code. 2025-08-22 16:58:07 +02:00
ed88b63592 Add get_coord formula for 16 and 32 bit datatypes.
Platforms:
All

Keywords:
Feature

Related-to: GSD-11139
Resolves:
2025-08-22 12:31:50 +02:00
d85d0be961 2D block I/O for SIMD32
Expose SPIR-V API for 2D block load/store/prefetch for SIMD32 kernels.
Works for platforms with minimum subgroup-size=16.
2025-08-22 10:15:56 +02:00
e679b1de8a Use Ray Query Return value in Compute Ray Tracing Extension
Modified intel_get_hit_candidate and intel_is_traversal_done functions.
2025-08-22 09:40:08 +02:00
6a6a8a0f22 Expose option to force linear scan RA via metadata
Expose option to force linear scan RA via metadata
2025-08-22 02:21:53 +02:00
7249d00150 Support width and pointer type agnostic loads/stores for private memory allocas in LowerGEPForPrivMem
The old handleStoreInst/loadEltsFromVecAlloca assume 1:1 lane mapping
and equal sizes between user value and the promoted vector element type.

This is insufficient for mixed widths (e.g. <4 x i8> and <... x i32>),
cross-lane accesses created by the new byte-offset GEP lowering, or
pointers under opaque pointers (bitcasts between pointers and
non-pointers are illegal).

With the changes:
1) Stores (handleStoreInst and storeEltsToVecAlloca) normalize the
   source (scalar or vector) to a single integer of NeedBits = N *
   DstBits using ptrtoint/bitcast, split the big integer into K = ceil(
   NeedBits / SrcBits) chunks, bitcast/inttoptr each chunk back to
   the promoted lane type and insert into K consecutive lanes starting
   at the scalarized index.
2) Loads (handleLoadInst and loadEltsFromVecAlloca) read K promoted
   lanes starting at the scalarized index, convert each lane to
   iSrcBits, pack into i(K*SrcBits), truncate to i(NeedBits), then
   expand to the requested scalar or <N x DstScalarTy>. Use inttoptr for
   pointer results.
   There is also still a simple (old) path. If SrcBits == DstBits, just
   emit extractelement with casts (if needed).

All paths do a single load of the promoted vector,
extractelement/insertelement, and in case of stores only a single store
back.

With these changes, the LLVM IR emitted from LowerGEPForPrivMem
will look different. Instead of using plain bitcasts, there are now
ptrtoint/inttoptr instructions and there is additional packing/splitting
logic. For the simple (old) load path, the new implementation should
essentially emit the same pattern (potentially skipping bitcasts).

The additional integer/bitcast instruction sequences should be easily
foldable. Memory traffic is unchanged (still one vector load/store).
Overall register pressure should be similar, the pass still eliminates
GEPs and avoids private/scratch accesses.
2025-08-21 20:25:19 +02:00
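The split and pack arithmetic from 7249d00150 can be sketched with ordinary integers. This is a simplified model, not the LowerGEPForPrivMem code: the function names are hypothetical, `laneBits` plays the role of SrcBits (the promoted lane width), the value is limited to 64 bits, and placing the lowest bits in the first lane is an assumption the commit message does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// K = ceil(NeedBits / SrcBits): lanes needed to hold needBits bits.
unsigned chunkCount(unsigned needBits, unsigned laneBits) {
    assert(laneBits > 0);
    return (needBits + laneBits - 1) / laneBits;
}

// Mask with the low `bits` bits set (handles bits == 64 without overflow).
std::uint64_t lowMask(unsigned bits) {
    return bits >= 64 ? ~0ULL : ((1ULL << bits) - 1);
}

// Store path: split one NeedBits-wide integer into K lane-sized chunks.
// Assumption: lane 0 receives the lowest bits.
std::vector<std::uint64_t> splitIntoLanes(std::uint64_t value,
                                          unsigned needBits,
                                          unsigned laneBits) {
    assert(needBits <= 64 && laneBits > 0 && laneBits <= 64);
    std::vector<std::uint64_t> lanes;
    value &= lowMask(needBits);
    const unsigned k = chunkCount(needBits, laneBits);
    for (unsigned i = 0; i < k; ++i) {
        lanes.push_back(value & lowMask(laneBits));
        value = laneBits >= 64 ? 0 : value >> laneBits;
    }
    return lanes;
}

// Load path: pack K lanes into one integer, then truncate to NeedBits.
std::uint64_t packFromLanes(const std::vector<std::uint64_t> &lanes,
                            unsigned needBits, unsigned laneBits) {
    assert(needBits <= 64 && laneBits > 0 && laneBits <= 64);
    std::uint64_t packed = 0;
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        const std::size_t shift = i * laneBits;
        if (shift >= 64) {
            break;  // nothing above bit 63 can survive the truncation
        }
        packed |= (lanes[i] & lowMask(laneBits)) << shift;
    }
    return packed & lowMask(needBits);
}
```

A 64-bit value stored into 32-bit lanes occupies K = 2 lanes, and a 40-bit value also needs 2 lanes with the upper lane only partly used. Packing the lanes and truncating to NeedBits recovers the original value in both cases.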
9e80e069bb GenXStructSplitter opaque pointers fix
Handle the special opaque pointers case where a GEP user indexes into a
non-structure type because leading zero indices are omitted.
2025-08-21 19:23:19 +02:00
5e4a7b9e87 Drop preprocessor checks for unsupported LLVM versions (#23491)
Drop preprocessor checks for unsupported LLVM versions


Delete all #if LLVM_VERSION_MAJOR checks which were about LLVM versions older than 14.
2025-08-21 15:02:07 +02:00
47c1f6e944 Stub Vectorization for WAVEALL, CMP, SELECT enabled
Stub Vectorization for WAVEALL, CMP, SELECT enabled by default.
Dependency check window enlarged to 6 x number elements inside slice.
2025-08-20 13:38:22 +02:00
4f0123a7d6 Bump CMake version
Change minimum CMake version to 3.13.4
2025-08-20 12:40:33 +02:00
9ae78bef34 Apply rule-of-three
Applies rule-of-three by removing missing copy ctors.
2025-08-20 11:36:17 +02:00
6a2055eeeb Apply rule-of-three
Apply rule-of-three by explicitly deleting copy ctors.
2025-08-20 11:34:09 +02:00
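As an illustration of the rule-of-three change in 6a2055eeeb, the sketch below uses a hypothetical `ScratchBuffer` class rather than any IGC type. A class that defines a destructor to release a resource should also define or delete its copy operations. Deleting them turns an accidental copy, which would lead to a double free, into a compile error.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical owning class, for illustration only.
class ScratchBuffer {
public:
    explicit ScratchBuffer(std::size_t size)
        : m_data(new int[size]()), m_size(size) {}

    // A user-defined destructor is the trigger for the rule of three.
    ~ScratchBuffer() { delete[] m_data; }

    // Copying would leave two objects deleting the same array,
    // so both copy operations are explicitly deleted.
    ScratchBuffer(const ScratchBuffer &) = delete;
    ScratchBuffer &operator=(const ScratchBuffer &) = delete;

    std::size_t size() const { return m_size; }

    int &at(std::size_t index) {
        assert(index < m_size);
        return m_data[index];
    }

private:
    int *m_data = nullptr;
    std::size_t m_size = 0;
};

// The compiler now rejects copies; these checks document that.
static_assert(!std::is_copy_constructible<ScratchBuffer>::value,
              "ScratchBuffer must not be copy constructible");
static_assert(!std::is_copy_assignable<ScratchBuffer>::value,
              "ScratchBuffer must not be copy assignable");
```

Callers that need to share the buffer pass it by reference or pointer instead of copying it.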
0b5f9e11eb Add has_printf_calls to zeinfo
zeinfo now contains information on whether a kernel/function has printf calls
and function pointer calls. This allows neo to create printf_buffer when
it is really used.
2025-08-20 10:30:06 +02:00
8ca10e3d2e Enable loop unrolling in retry for 3D
Enable loop unrolling in retry for 3D
2025-08-20 02:41:26 +02:00