18653 Commits

Author SHA1 Message Date
db7015317a ADD, MUL, SUB i32 instructions added to IGCVectorizer
2025-10-22 11:53:09 +02:00
89c7117387 Prevent fast math flag propagation to __spirv_ocl_native_exp builtin implementation
This is a follow-up to 5f3b2b4c5a.
`__spirv_ocl_native_exp` has the same issue as `__spirv_ocl_exp`.
2025-10-22 10:56:29 +02:00
94f2cb27d9 Fix OCL atomic benchmark regression
For now, the atomic_iadd to atomic_inc/dec optimization is not applied
to OCL cases. More investigation is needed.
2025-10-21 21:16:48 +02:00
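The optimization mentioned above presumably replaces an atomic add of the constant +1 or -1 with an atomic increment or decrement. A minimal sketch of that decision, including the current OCL exclusion; `AtomicOp`, `simplifyAtomicIAdd` and the `isOpenCL` flag are hypothetical names, not IGC code:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical atomic opcodes; not IGC's actual enumeration.
enum class AtomicOp { IAdd, Inc, Dec };

// Returns the replacement opcode for an atomic_iadd whose operand is a
// known constant, or std::nullopt when the rewrite should not be applied.
// isOpenCL models the current restriction: OCL cases are skipped.
std::optional<AtomicOp> simplifyAtomicIAdd(int64_t constOperand, bool isOpenCL) {
    if (isOpenCL)
        return std::nullopt;   // excluded until the regression is understood
    if (constOperand == 1)
        return AtomicOp::Inc;  // atomic_iadd(p, 1)  -> atomic_inc(p)
    if (constOperand == -1)
        return AtomicOp::Dec;  // atomic_iadd(p, -1) -> atomic_dec(p)
    return std::nullopt;       // any other constant stays an atomic_iadd
}
```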
f8934ec463 Minor indent fix
2025-10-21 20:13:12 +02:00
bc27ff2baa Fix the bug in the destination register alignment checking
for pre-RA ACC sub

If the register is pre-assigned, it is not a candidate.
2025-10-21 19:01:10 +02:00
c982af7201 GenXVectorDecomposer: Fix iterative dead code elimination
Fixes vector decomposition correctness issues where some phi
parts were being eliminated despite being necessary for proper
vector reconstruction.
2025-10-21 17:06:28 +02:00
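The log does not describe the new elimination algorithm. As a generic illustration of how dead-code elimination can avoid removing phi parts that are still needed, the sketch below uses mark-and-sweep liveness: everything reachable through operands from a side-effecting root is kept, so phi cycles that feed a live use survive while fully dead cycles do not. `Node` and `computeLive` are hypothetical and independent of the GenXVectorDecomposer implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical IR node: operands are indices into the node list.
struct Node {
    std::vector<std::size_t> operands;
    bool hasSideEffects;  // e.g. a store or a return; always live
};

// Returns a liveness flag per node. A node is live if a side-effecting
// node uses it, directly or transitively. Phi cycles with no such use
// stay dead; phi parts that reach a live root are kept.
std::vector<bool> computeLive(const std::vector<Node>& nodes) {
    std::vector<bool> live(nodes.size(), false);
    std::vector<std::size_t> worklist;
    for (std::size_t i = 0; i < nodes.size(); ++i)
        if (nodes[i].hasSideEffects) {
            live[i] = true;
            worklist.push_back(i);
        }
    while (!worklist.empty()) {
        std::size_t n = worklist.back();
        worklist.pop_back();
        for (std::size_t op : nodes[n].operands) {
            assert(op < nodes.size());
            if (!live[op]) {
                live[op] = true;
                worklist.push_back(op);
            }
        }
    }
    return live;
}
```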
59e367e548 Update spill threshold logic 2025-10-21 16:51:17 +02:00
857fb62d05 Fix problem in split barrier
Fixed a problem with the split barrier when it is used together with a regular barrier.
In the following sequence:
splitbarrier.signal()
regularbarrier()
splitbarrier.wait()

the regular barrier and the split barrier were assigned the same barrier ID, which caused a hang.
Now the split barrier is assigned a different ID from the regular one.
2025-10-21 15:59:22 +02:00
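A minimal sketch of the idea, assuming a hypothetical allocator in which ID 0 is reserved for the regular barrier and split barriers take IDs from 1 upward. `BarrierIdAllocator` and its methods are illustrative only; the real ID assignment in IGC may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical barrier-ID allocator. ID 0 is reserved for the regular
// barrier, so a split barrier (signal/wait pair) can never share its ID.
class BarrierIdAllocator {
public:
    explicit BarrierIdAllocator(uint32_t maxBarriers) : maxBarriers_(maxBarriers) {
        assert(maxBarriers >= 2 && "need room for a regular and a split barrier");
    }

    static constexpr uint32_t regularBarrierId() { return 0; }

    // Returns the next free ID for a split barrier.
    uint32_t allocateSplitBarrierId() {
        assert(next_ < maxBarriers_ && "out of barrier IDs");
        return next_++;
    }

private:
    uint32_t maxBarriers_;
    uint32_t next_ = 1;  // start after the reserved regular-barrier ID
};
```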
2a53b762fa Turn 2 asserts into warnings - try 2
Turn NumDebugCUs == 1 and llvm.dbg.declare count asserts into warnings
2025-10-21 15:57:53 +02:00
68eb7029ba Fix CodeScheduling in case of DPAS in different BB
- Fix CodeScheduling incorrect behavior when the DPAS and the load are in
different BBs
- Fix RematChainsAnalysis incorrect behavior in some cases with selects
2025-10-21 15:55:39 +02:00
2a0dedc2ba Rematerialize runtime_value intrinsics
This change is to rematerialize `runtime_value` instructions.
2025-10-21 15:51:46 +02:00
849ca205c0 Remove legacy SPIRV Translator macros usages pt. 2
Remove usages of legacy SPIRV Translator macros from `IGC/BiFModule/Implementation/IMF/FP32`
2025-10-21 15:47:45 +02:00
aa69cad230 Remove legacy SPIRV Translator macros usages pt. 3
Remove usages of legacy SPIRV Translator macros from `IGC/BiFModule/Implementation/IMF/FP64`
2025-10-21 15:42:47 +02:00
61b9e70ce2 Remove legacy SPIRV Translator macros pt. 1
Remove usages of legacy SPIRV Translator macros from `IGC/BiFModule/Languages/`
2025-10-21 10:41:02 +02:00
941be5f779 Lower loads using PHI instructions
Lowers loads that use PHI instructions into the incoming blocks to avoid
unnecessary address space casts.
2025-10-21 09:41:28 +02:00
dd10c47fca Skip SOA promotion for variable non-promoted type GEPs
Extends the bail out in `SOALayoutChecker::MismatchDetected` to skip not only i8-based
GEPs with non-constant byte indices, but all GEPs with non-constant byte
indices that don't operate on the alloca promoted type.

For example:
```llvm
%8 = alloca [1024 x i8], align 1

memcpy.body:
  %46 = getelementptr <8 x i32>, ptr addrspace(1) %45, i64 %pIV.0
  %47 = getelementptr <8 x i32>, ptr %8, i64 %pIV.0
  %48 = load <8 x i32>, ptr addrspace(1) %46, align 1
  store <8 x i32> %48, ptr %47, align 1
  %49 = add i64 %pIV.0, 1
```
Here the alloca element type is `i8`, but the GEP and the store operate on `<8 x i32>`.
With typed pointers, the alloca pointer had to be bitcast
from `i8*` to `<8 x i32>*`, and the optimization was disabled because
`SOALayoutChecker::visitBitCastInst` detected the size mismatch.
With opaque pointers there is no such bitcast to check, so the mismatch went undetected,
triggering asserts and leading to miscompilations.
2025-10-21 07:14:03 +02:00
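The extended bail-out condition can be sketched as a predicate. `GepInfo` and the string type names are hypothetical stand-ins for the LLVM objects that `SOALayoutChecker::MismatchDetected` actually inspects, and the real condition in IGC may contain further checks.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of a GEP that indexes into a promoted alloca.
struct GepInfo {
    std::string sourceElementType;  // e.g. "i8" or "<8 x i32>"
    bool hasConstantIndices;        // all indices are compile-time constants
};

// Old rule: bail out only for i8-based GEPs with a variable index.
bool mismatchOld(const GepInfo& gep) {
    return gep.sourceElementType == "i8" && !gep.hasConstantIndices;
}

// Extended rule: additionally bail out for any GEP with a variable index
// whose element type differs from the alloca's promoted type.
bool mismatchNew(const GepInfo& gep, const std::string& promotedType) {
    return mismatchOld(gep) ||
           (!gep.hasConstantIndices && gep.sourceElementType != promotedType);
}
```

For the IR above, `getelementptr <8 x i32>, ptr %8, i64 %pIV.0` has a variable index and a `<8 x i32>` element type on an `i8` alloca, so the old rule misses it while the extended rule catches it.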
05d91c3547 Changes in code. 2025-10-20 23:36:13 +02:00
a0b0c172bc Use SmallSetVector instead of a SetVector in AllocationLivenessAnalyzer
Moving to SetVector introduced a performance hit detected with Blender; this change should mitigate some of it.
2025-10-20 21:36:06 +02:00
16724f2d5c Add an option to construct UnorderedMap from key/value arrays
This constructor makes it easy to deserialize the unordered maps from a shader cache.
2025-10-20 21:16:29 +02:00
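A minimal sketch of such a constructor, assuming a hypothetical `UnorderedMap` template that wraps `std::unordered_map`; IGC's actual class, its members and the constructor signature are not shown in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for the UnorderedMap class mentioned above.
template <typename K, typename V>
class UnorderedMap {
public:
    UnorderedMap() = default;

    // Builds the map from parallel key/value arrays, e.g. as read back from
    // a serialized shader cache: keys[i] maps to values[i]. If a key repeats,
    // the first occurrence wins (std::unordered_map::emplace semantics).
    UnorderedMap(const K* keys, const V* values, std::size_t count) {
        map_.reserve(count);
        for (std::size_t i = 0; i < count; ++i)
            map_.emplace(keys[i], values[i]);
    }

    // Returns a pointer to the value for key, or nullptr if absent.
    const V* find(const K& key) const {
        auto it = map_.find(key);
        return it == map_.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return map_.size(); }

private:
    std::unordered_map<K, V> map_;
};
```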
a75e33d72e Add canCachePartialWrites to platform methods
2025-10-20 20:36:08 +02:00
82c7c0a11b IGA SWSB: Refactor dpas macro builder (3rd try)
Removed DpasMacroBuilder::getSuppressionBlockCandidate. Now the dpas
macro is formed until a dpas is seen that cannot be in a macro, even
if there is no suppression opportunity, i.e. no sources are the same
within the macro. There is no performance drawback doing so. This also
aligns with vISA's dpas macro logic.
2025-10-20 19:59:24 +02:00
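The new grouping rule reads as a greedy loop: keep extending the current macro until a dpas that cannot join it is seen, without requiring any shared sources. A sketch with a hypothetical `Dpas` record and a caller-supplied `canJoin` predicate; the actual compatibility rules in IGA are not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical dpas instruction summary.
struct Dpas {
    int id;
};

// Splits a sequence of consecutive dpas instructions into macros.
// A macro keeps growing until canJoin(macro, next) returns false; source
// reuse (a suppression opportunity) is not required to extend it.
std::vector<std::vector<Dpas>> formMacros(
    const std::vector<Dpas>& seq,
    const std::function<bool(const std::vector<Dpas>&, const Dpas&)>& canJoin) {
    std::vector<std::vector<Dpas>> macros;
    for (const Dpas& d : seq) {
        if (macros.empty() || !canJoin(macros.back(), d))
            macros.push_back({d});  // start a new macro with this dpas
        else
            macros.back().push_back(d);
    }
    return macros;
}
```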
ac0e22751c Fix replacing memset intrinsic for opaque pointers
With typed pointers, the type can be obtained from a more complex type
(e.g. a struct) using GetBaseType(). With opaque pointers, this type also
has to be deduced, so it is now obtained by inspecting the GEP instruction
and then calling GetBaseType().
2025-10-20 18:04:51 +02:00
f1afce832f Revert of change: 96b26e6: Add a new constructor to UnorderedMap class
Reapplies the revert, as there was a desync after commit
60ef89439c, which accidentally re-added the
constructor after the previous revert.
2025-10-20 12:35:33 +02:00
60ef89439c [Autobackout][FunctionalRegression]Revert of change: 81333302cf: Rematerialize runtime_value intrinsics
This change is to rematerialize `runtime_value` instructions.
2025-10-18 19:28:13 +02:00
cdef50501c [Autobackout][FunctionalRegression]Revert of change: 96b26e630c: Add a new constructor to UnorderedMap class
Add a new constructor to UnorderedMap class to allow for creation from key and value arrays.
2025-10-18 06:45:27 +02:00
7278948f62 [Autobackout][FunctionalRegression]Revert of change: 448d9eda16: Update pre-RA scheduling for ACC
Scheduling according to the ACC number when 95% of instructions are ACC candidates
2025-10-18 06:19:22 +02:00
c7b3c6f482 Remove convergent on gradient
2025-10-18 03:57:56 +02:00
6eff9e99d7 Sampler opaque ptr readiness
This change is part of the effort to support opaque pointers in newer LLVM versions.
2025-10-18 00:12:50 +02:00
0e95445170 Extend TargetExtTy retyping to instructions and structs
The LLVM IR that IGC receives from the LLVM 16-based SPIR-V Reader
contains OpenCL/SPIR-V builtins represented as TargetExtTy types.
Unfortunately, Clang 16 does not emit TargetExtTy and hence the modules
coming from Clang and SPIR-V Reader are not compatible and cannot be
linked together. The solution/workaround is to retype TargetExtTy types
as pointers of correct address space. This approach works since the
mangling/OpenCL builtin call resolution is already done by the SPIR-V
Reader and IGC does not need to work on TargetExtTy types directly.
Such retyping also ensures that all the current pointer-based
optimizations continue to work.

This patch extends the retyping beyond just function arguments and
return types. It now also retypes TargetExtTy used in:
- local variables (alloca instructions)
- loads and stores of TargetExtTy values
- struct types containing TargetExtTy fields
- function attributes (byval, sret, byref)
The retyping is done in a single pass over the module.
2025-10-17 19:39:07 +02:00
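Retyping struct fields needs a recursive walk, since a struct may contain another struct that holds a `TargetExtTy`. A standalone sketch with a hypothetical `Type` tree (not LLVM's `llvm::Type` API): every target extension type becomes a pointer, and the address space is supplied by a caller-provided function because the log does not say which address space each builtin type maps to.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified type tree.
struct Type {
    enum Kind { Int, Pointer, Struct, TargetExt } kind;
    unsigned addrSpace = 0;                     // used by Pointer
    std::string targetExtName;                  // used by TargetExt, e.g. "spirv.Image"
    std::vector<std::shared_ptr<Type>> fields;  // used by Struct
};
using TypePtr = std::shared_ptr<Type>;

// Returns a copy of t in which every TargetExt type, including those nested
// inside structs, is replaced by a pointer in addrSpaceFor(name).
// The input tree is left unmodified.
template <typename F>
TypePtr retype(const TypePtr& t, F addrSpaceFor) {
    assert(t && "null type");
    switch (t->kind) {
    case Type::TargetExt: {
        auto p = std::make_shared<Type>();
        p->kind = Type::Pointer;
        p->addrSpace = addrSpaceFor(t->targetExtName);
        return p;
    }
    case Type::Struct: {
        auto s = std::make_shared<Type>(*t);  // shallow copy, then rewrite fields
        for (auto& f : s->fields)
            f = retype(f, addrSpaceFor);
        return s;
    }
    default:
        return t;  // leaf types are unchanged
    }
}
```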
1dcb701feb Changes in code. 2025-10-17 19:34:47 +02:00
d9ea8a2e3c Add a new constructor to UnorderedMap class
Add a new constructor to UnorderedMap class to allow for creation from key and value arrays.
2025-10-17 19:11:17 +02:00
0253ed6ecb Fix SEGFAULT when function returns struct by-val in input SPIRV
The LegalizeFunctionSignature pass assumed that the SPIR-V front end
always converts functions returning structs by value into functions returning them by reference.
This is not guaranteed, since the SPIR-V specification allows returning structs by value.

This PR fixes the problem.
2025-10-17 13:56:58 +02:00
4bd6b70328 Disable legacy 2d load scheduling in newer platforms
Disable legacy 2d load scheduling in CodeLoopSinking pass in newer platforms
2025-10-17 12:20:15 +02:00
e9078798d5 Changes around channel pruning.
Code improvements.
2025-10-17 12:15:53 +02:00
81333302cf Rematerialize runtime_value intrinsics
This change is to rematerialize `runtime_value` instructions.
2025-10-17 11:37:40 +02:00
b485ba6f1d transform Y*(X+1) into X*Y+Y
Adds a new transformation pass that rewrites the integer op `Y*(X+1)` into `X*Y+Y`.
On platforms with integer MAD, this is translated into a single
instruction.
2025-10-17 10:22:55 +02:00
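The rewrite is exact for fixed-width integers, because multiplication distributes over addition modulo 2^n; it holds even when `X+1` or the product wraps. A quick check in C++ using unsigned arithmetic, which wraps by definition (LLVM IR `add`/`mul` without `nsw`/`nuw` wrap the same way). The function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Original form: one add followed by one mul.
uint32_t mulOfIncrement(uint32_t x, uint32_t y) { return y * (x + 1u); }

// Rewritten form: x*y + y, a single integer MAD where the hardware has one.
uint32_t madForm(uint32_t x, uint32_t y) { return x * y + y; }
```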
448d9eda16 Update pre-RA scheduling for ACC
Scheduling according to the ACC number when 95% of instructions are ACC candidates
2025-10-17 00:16:48 +02:00
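A sketch of the threshold test, assuming it compares the number of ACC-candidate instructions with the total instruction count and that "95%" means "at least 95%". The function name is hypothetical; integer arithmetic avoids floating-point rounding at the boundary.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when at least 95% of the instructions are ACC candidates.
// Cross-multiplying avoids a division and floating-point rounding.
bool mostlyAccCandidates(uint64_t accCandidates, uint64_t totalInsts) {
    assert(accCandidates <= totalInsts);
    if (totalInsts == 0)
        return false;  // nothing to schedule
    return accCandidates * 100 >= totalInsts * 95;
}
```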
96b26e630c Add a new constructor to UnorderedMap class
Add a new constructor to UnorderedMap class to allow for creation from key and value arrays.
2025-10-16 22:55:52 +02:00
8a8cf790df Fix bug in stack alignment
Cast alignment values to uint64_t before bitwise NOT to generate
proper 64-bit alignment masks (0xFFFFFFFFFFFFFFF8) instead of
truncated 32-bit values (0xFFFFFFF8) in stack pointer operations.
2025-10-16 22:52:25 +02:00
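The difference is easy to reproduce: taking the bitwise NOT of a 32-bit value and then widening it zero-extends the result, so the upper 32 bits of the mask are cleared and would wipe out the high half of a 64-bit stack pointer. A minimal illustration; the function names are illustrative, not IGC's.

```cpp
#include <cassert>
#include <cstdint>

// Buggy pattern: NOT is computed in 32 bits, then zero-extended to 64 bits.
uint64_t alignMask32(uint32_t alignment) {
    return ~(alignment - 1u);  // 0x00000000FFFFFFF8 for alignment 8
}

// Fixed pattern: widen to uint64_t first, then NOT, giving a full 64-bit mask.
uint64_t alignMask64(uint32_t alignment) {
    return ~(static_cast<uint64_t>(alignment) - 1u);  // 0xFFFFFFFFFFFFFFF8
}

// Rounds a stack pointer down to the given power-of-two alignment.
uint64_t alignDown(uint64_t sp, uint32_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return sp & alignMask64(alignment);
}
```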
c4f288e2f4 [LLVM16][StatelessToStateful] DeterminePointerAlignment algorithm fix
The DeterminePointerAlignment algorithm analyzes the alignment of load/store instructions.

Before this fix, it walked over all loads/stores and picked the highest alignment.

The problem with this approach is that some loads/stores execute only on certain control-flow paths,
so in practice some of them are never executed. Their alignment could still be picked by accident, which caused a mismatch.

Such a case occurred in `__devicelib_memcpy`, which uses different memcpy strategies depending on the size:

ee397f94cf/libdevice/fallback-cstring.cpp (L47)

```llvm
22:                                               ; preds = %19
  %23 = getelementptr inbounds i8, ptr addrspace(4) %1, i64 %20
  %24 = load i8, ptr addrspace(4) %23, align 1                     ; <- here
  %25 = getelementptr inbounds i8, ptr addrspace(4) %0, i64 %20
  store i8 %24, ptr addrspace(4) %25, align 1                      ; <- here
  %26 = add nuw i64 %20, 1, !spirv.Decorations !15
  br label %19

27:                                               ; preds = %13
  %28 = icmp eq i64 %16, 0
  br i1 %28, label %29, label %58

29:                                               ; preds = %27
  %30 = and i64 %2, 3
  %31 = lshr i64 %2, 2
  br label %32

32:                                               ; preds = %35, %29
  %33 = phi i64 [ 0, %29 ], [ %41, %35 ]
  %34 = icmp ult i64 %33, %31
  br i1 %34, label %35, label %42

35:                                               ; preds = %32
  %36 = bitcast ptr addrspace(4) %1 to ptr addrspace(4)
  %37 = getelementptr inbounds i32, ptr addrspace(4) %36, i64 %33
  %38 = load i32, ptr addrspace(4) %37, align 4                     ; <- here
  %39 = bitcast ptr addrspace(4) %0 to ptr addrspace(4)
  %40 = getelementptr inbounds i32, ptr addrspace(4) %39, i64 %33
  store i32 %38, ptr addrspace(4) %40, align 4                      ; <- here
  %41 = add nuw nsw i64 %33, 1, !spirv.Decorations !17
  br label %32
```
2025-10-16 21:17:53 +02:00
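The log does not spell out the new algorithm. One conservative alternative to picking the highest alignment is to take the minimum over all accesses, so an access that runs only on some paths can never raise the assumed alignment. The sketch below illustrates that idea with a hypothetical `Access` record; it is not necessarily what the fix does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical summary of one load or store through the analyzed pointer.
struct Access {
    uint64_t alignment;  // the `align N` on the instruction, a power of two
};

// Conservative pointer alignment: the smallest alignment among all accesses.
// Picking the largest is unsafe when that access runs only on some paths
// (e.g. the i32 copy loop above) while other paths use `align 1`.
uint64_t conservativeAlignment(const std::vector<Access>& accesses) {
    if (accesses.empty())
        return 1;  // nothing known: assume byte alignment
    uint64_t result = accesses.front().alignment;
    for (const Access& a : accesses) {
        assert(a.alignment != 0 && (a.alignment & (a.alignment - 1)) == 0);
        result = std::min(result, a.alignment);
    }
    return result;
}
```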
d1b702c3ef Update scheduling heuristics for SIMD32 DPAS kernels
2025-10-16 17:45:55 +02:00
b30ba07f02 Revert of change: Serialize and deserialize Shader Unordered Map for cache state
2025-10-16 17:17:04 +02:00
ad4662733c [Autobackout][FunctionalRegression]Revert of change: 02e8acfbc8: Enable BCR for kernels with low register pressure
Adding IGCRegisterPressurePublisher pass to make register pressure
    estimation available in CISABuilder.
    Enable BCR for kernels with low register pressure.
v2.22.0
2025-10-16 15:59:49 +02:00
23e01bc9f2 [Autobackout][FunctionalRegression]Revert of change: ecb7315c86: Fix DIStringType length emitting in DWARF
For some cases, there wasn't DW_AT_string_length added to
    variable, which resulted in treating vla array as character.
2025-10-16 11:33:56 +02:00
40d882b1ac Changes in code. 2025-10-16 07:16:44 +02:00
bf94c04263 [Autobackout][FunctionalRegression]Revert of change: 0fe2acfbb4: src2 acc support
For single precision float instruction only
2025-10-16 04:30:12 +02:00
8955b7820a [Autobackout][FunctionalRegression]Revert of change: b3e1d9a27b: IGA SWSB: Refactor dpas macro builder
Removed DpasMacroBuilder::getSuppressionBlockCandidate. Now the dpas
    macro is formed until a dpas is seen that cannot be in a macro, even
    if there is no suppression opportunity, i.e. no sources are the same
    within the macro. There is no performance drawback doing so. This also
    aligns with vISA's dpas macro logic.
2025-10-16 03:07:45 +02:00
e31e178065 [Autobackout][FunctionalRegression]Revert of change: a20b78cbf7: Rematerialize runtime_value intrinsics
This change is to rematerialize `runtime_value` instructions.
2025-10-15 21:32:02 +02:00
b54358e688 Improve code around MCSOptimization.
Code improvements.
2025-10-15 18:35:56 +02:00
02e8acfbc8 Enable BCR for kernels with low register pressure
Adding IGCRegisterPressurePublisher pass to make register pressure
estimation available in CISABuilder.
Enable BCR for kernels with low register pressure.
2025-10-15 16:46:06 +02:00