Add a new pass that propagates null pointers across address space
casts, and remove the no longer needed Generic Pointers Comparison
pattern match. This change fixes a bug where comparisons between
generic pointers sometimes returned incorrect results.
The PR adds another pattern that detects packing of 4 i8 values into
a 32-bit scalar. The detected pattern packs values clamped to
[0, 127] using `shl` and `or` instructions.
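The commit text does not include the matched IR, but a minimal C sketch of the kind of source pattern being described (clamp to [0, 127], then pack with shifts and ors; all names here are illustrative, not from the compiler) might look like:

```c
#include <stdint.h>

/* Hypothetical illustration of the matched pattern: four values,
 * each clamped to [0, 127], packed into one 32-bit scalar via the
 * shl/or chain the pattern matcher is said to detect. */
static uint8_t clamp7(int v) {
    if (v < 0) return 0;
    if (v > 127) return 127;
    return (uint8_t)v;
}

uint32_t pack4(int a, int b, int c, int d) {
    return (uint32_t)clamp7(a)
         | ((uint32_t)clamp7(b) << 8)
         | ((uint32_t)clamp7(c) << 16)
         | ((uint32_t)clamp7(d) << 24);
}
```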
Re-apply multiple fixes
c8f734d Fix a warning issue related to the overloaded struct
77fb673 GEP canon is on only for OCL
8d12a0f avoid y*(1/x) for double precision type
fbf1aa9 [IGC VC] GenXVerify pass, initial
0ae6dfb SetMaxRegForThreadDispatch was hardcoded for up to 128 GRFs.
932eafa Minor update to DisableRecompilation regkey description
bc3034f Fix bugs and update the LIT test for linear scan RA
45f1295 Enable GRF read delay of send stall instructions
7c95f49 [Autobackout][FuncReg]Revert of change: 3f0c186620c74c Fix a few register regioning issues for 64b instructions on MTL platform
The SIMD shuffle down intrinsic takes "current" and "next" values and
combines them into a 2N variable (where N is the SIMD size) to deal
with OOB lanes when shuffling. The moves that initialize this
temporary variable are materialized in the emit vISA pass, so when
multiple shuffle intrinsic calls have identical source operands, we
end up with multiple temp variables and redundant moves.
This change adds per-basic-block caching of the temporary variable,
so multiple shuffles in the same BB can share a common source.
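As a rough scalar model of what the 2N temporary buys (assumed SIMD size and lane layout; this is not the backend's actual code), each lane i of the result reads element i + delta of the concatenated current/next buffer, so lanes that would shift out of bounds fall through into "next":

```c
#include <stdint.h>

#define SIMD 8  /* assumed SIMD size N for this sketch */

/* Scalar model of shuffle-down: "current" and "next" are combined
 * into one 2N-element temporary so out-of-bounds lanes read from
 * "next" instead of garbage. In the backend the moves building this
 * temporary are emitted per call site; caching it per basic block
 * lets calls with identical sources share one copy. */
void shuffle_down(const float *current, const float *next,
                  unsigned delta, float *out) {
    float combined[2 * SIMD];
    for (int i = 0; i < SIMD; ++i) {
        combined[i] = current[i];
        combined[SIMD + i] = next[i];
    }
    for (int i = 0; i < SIMD; ++i)
        out[i] = combined[i + delta];  /* assumes delta <= SIMD */
}
```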
When a shader has an early 'return' under non-uniform control flow,
all further calculations are skipped on the selected SIMD lanes. On
those lanes there might be a sample instruction that depends on
uninitialized coordinates, which is undefined behaviour according to
the spec. This WA initializes such lanes to zero to avoid corruptions.
Change sample sources execution mask
Before the change: sample source calculations were marked as
'subspanUse' and noMask was applied. Exception: noMask was not
applied if the sample result was a source to another sample;
'Vmask needed' was set instead.
After the change: sample sources are still marked as 'subspanUse',
but their list is also stored separately, since subspanUse is a
larger set than just the sample source calculations. Additionally, a
list of sample sources under control flow is created. Vmask is still
requested if a sample result is used as a source for another sample.
Execution mask policy: if the shader is executed with the dispatch
mask (Vmask not requested), the old behaviour is preserved and
noMask is applied. If Vmask is requested, noMask is applied only for
sample sources under control flow; this covers common application
misuses of sampling.
As the insertvalue chain is coalesced in deSSA, the emit part needs
to be changed accordingly.
1. Fixed emit errors related to partially-shared insertvalue:
   if src0 and dst are different vISA variables, src0 needs to be
   copied to dst first.
2. There is no need to have patterns for insertvalue/extractvalue.
With this, insertvalue/extractvalue on structs of primitive types
should work as expected.
Remove dead instructions with debug info enabled.
This change removes the IGC backend pattern match dependency setting on arguments to debug instructions, e.g.:
call void @llvm.dbg.value(metadata float %8, metadata !903, metadata !DIExpression()), !dbg !901 ; visa id: 20
This forces generation of the instruction producing %8. However, if that instruction is otherwise unused, it will be generated as dead code.
For example,
%8 = load float, float addrspace(1)* %7, align 4, !dbg !902 ; visa id: 18
...
%simdShuffle = call float @llvm.genx.GenISA.WaveShuffleIndex.f32(float %8, i32 0, i32 0), !dbg !981 ; visa id: 23
...
call void @llvm.dbg.value(metadata float %simdShuffle, metadata !904, metadata !DIExpression()), !dbg !901 ; visa id: 26
...
%9 = fadd fast float %simdShuffle, %simdShuffle.1, !dbg !982 ; visa id: 30
The pattern matcher will link out %simdShuffle and directly use %8
(via regioning). However, llvm.dbg.value creates a false dependency
on %simdShuffle and causes the shuffle to emit a dead broadcast mov
in vISA.
This change has no impact in -O0 debug mode; it only affects -O2
debug mode.
Refactoring by caching emask
Refactor the code to cache execMask and the number of active lanes
of the entire dispatch for reuse within a BB.
With this, GetExecutionMask() and GetNumActiveLanes() become
read-only and their results are cached for reuse within a BB.
Improve scalar atomic add/sub
For a scalar atomic (add/sub/inc/dec) without return and with a
uniform addend, a more efficient code sequence is used. For example,
"atomic_add (16|M0) p, 1" becomes:
emask = current emask
numBits = numOfOne(emask);
(W) atomic_add (1|M0) p, numBits
We basically save numBits for reuse within the same BB.
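The idea above can be sketched in C (names here are stand-ins for the pseudocode, not compiler APIs): rather than every active lane issuing atomic_add(p, 1), count the set bits of the execution mask once and issue a single scalar add of that count.

```c
#include <stdint.h>

/* Portable stand-in for numOfOne(emask): counts set bits. */
static unsigned popcount32(uint32_t x) {
    unsigned n = 0;
    while (x) { x &= x - 1; ++n; }
    return n;
}

/* Returns the single addend the "(W) atomic_add (1|M0) p, numBits"
 * would use for "atomic_add (16|M0) p, 1" under the given execution
 * mask: one add of numBits replaces numBits adds of 1. */
uint32_t scalar_atomic_addend(uint32_t emask) {
    return popcount32(emask);
}
```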
This change introduces clearing of the tag bits before generic
pointer comparison. It is required since some NULL generic pointers
may have the tag set and some may not.
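As a minimal sketch of why this matters (the tag width and position below are assumed for illustration, not taken from the compiler): if generic pointers carry an address-space tag in their high bits, a tagged NULL and an untagged NULL compare unequal bit-for-bit, so the tag bits must be masked away before the compare.

```c
#include <stdint.h>

/* Assumed tag location for this sketch: top bits of a 64-bit
 * generic pointer. A NULL with the tag set is still NULL. */
#define TAG_MASK 0xE000000000000000ull

/* Compare two generic pointers with the tag bits cleared first,
 * so tagged NULL == untagged NULL. */
int generic_ptr_eq(uint64_t a, uint64_t b) {
    return (a & ~TAG_MASK) == (b & ~TAG_MASK);
}
```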
Add helper lane mode to wave intrinsics
Added a helper lane mode argument to wave intrinsics. Much like
GenISA_WaveShuffleIndex, this argument denotes that helper lanes
should be active for the instruction when its value is 1.