- Add caching for register pressure estimation, real uses computation,
and value sizes
- Implement fragmentation-aware register pressure adjustment heuristic for large loads
- Add new heuristic for prioritizing loads that unlock DPAS instructions
- Fix initial register pressure estimation for hoisted loads and corresponding IEs in BBIn
- Fix register pressure estimation for ftobf
- Rework parts of the overall scheduling workflow to take advantage of
backtracking
- Add new heuristic to put instructions between the load and the subsequent shuffling to hide latency
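The caching item above can be illustrated with a small memoization sketch. It is not the scheduler's actual code: the class name `RegPressureCache`, the integer instruction IDs, and the rule that the cache is cleared whenever the schedule changes (for example after backtracking) are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical memoizing wrapper: instructions are identified by an integer
// ID, and `compute` stands in for an expensive register pressure estimate.
class RegPressureCache {
public:
    // Returns the cached estimate for instId, computing it only on a miss.
    template <typename ComputeFn>
    unsigned get(int instId, ComputeFn compute) {
        auto it = cache_.find(instId);
        if (it != cache_.end()) {
            ++hits_;
            return it->second;
        }
        unsigned value = compute(instId);
        cache_.emplace(instId, value);
        return value;
    }

    // Drops all entries; assumed to be called whenever the schedule changes
    // (for example after backtracking), since old estimates may be stale.
    void invalidate() { cache_.clear(); }

    std::size_t hits() const { return hits_; }

private:
    std::unordered_map<int, unsigned> cache_;
    std::size_t hits_ = 0;
};
```

The point of the design is that repeated queries for the same instruction during one scheduling attempt cost a hash lookup instead of a full recomputation, while invalidation keeps backtracking from reusing stale numbers.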
A Block2d load's return size per block is a multiple of the GRF size. If
the data actually returned per block is not a multiple of the GRF size,
it is rounded up to the next whole GRF, and the unused GRF storage is
filled with zeros.
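The rounding rule above is plain arithmetic, sketched below. The function names are hypothetical, and the GRF size is a parameter because it differs between platforms (32 or 64 bytes are assumed as typical values); the sketch only models the per-block byte count, not the 2D layout of the data inside the block.

```cpp
#include <cstdint>

// Rounds the per-block byte size of a 2D block load up to a whole number
// of GRFs (hypothetical helper; grfSizeBytes is platform dependent).
constexpr uint32_t roundUpToGRFs(uint32_t bytesPerBlock, uint32_t grfSizeBytes) {
    return (bytesPerBlock + grfSizeBytes - 1) / grfSizeBytes * grfSizeBytes;
}

// Number of GRFs a single block occupies after rounding up.
constexpr uint32_t grfsPerBlock(uint32_t bytesPerBlock, uint32_t grfSizeBytes) {
    return roundUpToGRFs(bytesPerBlock, grfSizeBytes) / grfSizeBytes;
}

// Bytes of zero-filled padding that follow the real data in the last GRF.
constexpr uint32_t paddingBytes(uint32_t bytesPerBlock, uint32_t grfSizeBytes) {
    return roundUpToGRFs(bytesPerBlock, grfSizeBytes) - bytesPerBlock;
}
```

For example, a block carrying 160 bytes of data with a 64-byte GRF occupies 192 bytes (3 GRFs), of which the last 32 bytes are zero padding; with a 32-byte GRF the same block fits exactly in 5 GRFs.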
Previously, some workloads used implicit int types in function
definitions. With LLVM 15, implicit ints are treated as errors.
The workaround to disable this error has now been removed, so IGC
enforces the same behavior as LLVM 15 and treats implicit ints
as errors.
If inline asm operands are aliases, the current code generates a copy
when the operand is an input, and does not handle aliased output
operands at all. With copies, it is tricky to decide whether to use
NoMask, especially for output operands. In addition, inline asm is most
likely used for performance, so additional copies should be avoided as
much as possible.
This change fixes aliased output operands and also removes the copies
by generating vISA alias declarations with a non-zero offset.
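Why an alias with an offset removes the need for copies can be shown with a small model, under the assumption that a declaration is either standalone or an alias of another declaration at a byte offset. The types and the helper `resolveAlias` are hypothetical and do not reflect vISA's actual data structures: resolving the alias chain yields the root storage and the accumulated offset, so reads and writes (including writes to output operands) land directly in the original variable without a copy or copy-back.

```cpp
#include <cstdint>
#include <string>

// Hypothetical model of a variable declaration: standalone when aliasOf is
// null, otherwise an alias of aliasOf starting at aliasOffset bytes.
struct VarDecl {
    std::string name;
    const VarDecl* aliasOf;
    uint32_t aliasOffset;
};

// Root storage plus byte offset that an operand actually addresses.
struct ResolvedOperand {
    const VarDecl* root;
    uint32_t byteOffset;
};

// Walks the alias chain, summing offsets, so the operand can address the
// root declaration directly instead of going through a temporary copy.
ResolvedOperand resolveAlias(const VarDecl& decl) {
    const VarDecl* cur = &decl;
    uint32_t offset = 0;
    while (cur->aliasOf != nullptr) {
        offset += cur->aliasOffset;
        cur = cur->aliasOf;
    }
    return {cur, offset};
}
```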
Previously, using `intel_reqd_sub_group_size(32)` on DG2 resulted in two
redundant SIMD32 call instructions being generated in vISA, which could
lead to unexpected issues. This change ensures that only a single SIMD32
call instruction is generated. All function arguments and return values
are now correctly passed using two SIMD16 instructions, eliminating
redundancy and improving
GRFs in fail safe
In fail-safe RA, we reserve some number of GRFs to guarantee RA
termination. When GRFs are reserved, we must also reduce the number of
available colors when determining the color ordering.
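A minimal sketch of why the color count matters, assuming a Chaitin/Briggs-style simplify step in which a node whose degree is below the number of colors can be ordered early because it is guaranteed a register. The function names are hypothetical. If the reserved GRFs were still counted as colors, a node could be treated as trivially colorable even though it cannot actually be colored.

```cpp
#include <cstdint>

// Hypothetical helper: colors that RA can actually hand out once fail-safe
// mode has reserved some GRFs.
uint32_t availableColors(uint32_t totalGRFs, uint32_t reservedGRFs) {
    return reservedGRFs >= totalGRFs ? 0 : totalGRFs - reservedGRFs;
}

// Simplify-step test: a node is trivially colorable only when its degree is
// below the number of colors that remain after the reservation.
bool isTriviallyColorable(uint32_t degree, uint32_t totalGRFs,
                          uint32_t reservedGRFs) {
    return degree < availableColors(totalGRFs, reservedGRFs);
}
```

With 128 GRFs and 4 reserved, a node of degree 125 would pass the test if all 128 registers were counted, but correctly fails it against the 124 colors that are really available.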
On platforms whose default cache policy is L1 and L3 cached, such as
DG2 or BMG, volatile accesses are cached as well. Since CUDA does not
cache volatile pointers, code written with that assumption is not
supported on Intel GPUs, because caching volatile accesses can lead to
hangs.
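One way to keep volatile accesses out of the caches is to override the platform default per access, sketched below. The enum values and function names are hypothetical and do not correspond to real cache-control encodings, and fully bypassing both L1 and L3 is an assumption of this sketch; the actual fix may choose a different uncached policy.

```cpp
// Hypothetical cache policies; real cache-control encodings are platform
// specific and are not modeled here.
enum class CachePolicy { L1CachedL3Cached, L1UncachedL3Cached, Uncached };

struct MemAccess {
    bool isVolatile;
};

// Volatile accesses ignore the cached platform default (assumed here to
// mean bypassing both L1 and L3); all other accesses keep the default.
CachePolicy selectCachePolicy(const MemAccess& access,
                              CachePolicy platformDefault) {
    if (access.isVolatile)
        return CachePolicy::Uncached;
    return platformDefault;
}
```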
In LegalizeFunctionSignatures, do not call `getFunction()`, which
returns the parent function. Add a path for LLVM 15+ that works with
opaque pointers, and a legacy path for LLVM 14.
In PromoteBools:
- Call `getType()` on the load instruction itself; calling `getType()`
on its source operand returns an opaque pointer.
- Use `getValueType()` in `promoteGlobalVariable` to work with
opaque pointers.
The metadata node !kernel_arg_base_type must mirror !kernel_arg_type for
OpenCL builtin types (e.g. image1d_t). Unfortunately, this is
inconsistent with LLVM 16-based Common Clang.
This patch ensures that every OpenCL builtin type (*_t) listed in
!kernel_arg_type is also present in !kernel_arg_base_type at the same
position.
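The mirroring rule can be sketched over plain string vectors standing in for the two metadata nodes. The helper names are hypothetical, and recognizing builtins purely by the `_t` suffix follows the "(*_t)" wording above; the real pass may use an explicit list of builtin types instead.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Treats names ending in "_t" (e.g. "image1d_t", "sampler_t") as OpenCL
// builtin types; a simplification taken from the "*_t" description.
bool isOpenCLBuiltinType(const std::string& name) {
    return name.size() > 2 && name.compare(name.size() - 2, 2, "_t") == 0;
}

// Copies every builtin type name from argTypes (!kernel_arg_type) into
// baseTypes (!kernel_arg_base_type) at the same position; other entries
// are left untouched.
void mirrorBuiltinBaseTypes(const std::vector<std::string>& argTypes,
                            std::vector<std::string>& baseTypes) {
    if (baseTypes.size() < argTypes.size())
        baseTypes.resize(argTypes.size());
    for (std::size_t i = 0; i < argTypes.size(); ++i) {
        if (isOpenCLBuiltinType(argTypes[i]))
            baseTypes[i] = argTypes[i];
    }
}
```

Working index by index keeps the two nodes aligned, which is the property later passes rely on when they look up an argument's base type by its position.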
Clang 16 still lowers OpenCL/SPIR-V built-ins as pointers to opaque
structs, while the SPIR-V Reader uses TargetExtTy values. This patch
extends the retyping function to also retype the return types of
function (builtin) declarations. Note that builtin function resolution
is already done earlier by the SPIR-V Reader.
This patch also changes how the ImageFuncsAnalysis pass recognizes
image/sampler types. Instead of relying on pointer element types, the
pass now uses IGC metadata (m_OpenCLArgBaseTypes), consistent with other
passes later in the pipeline.
Clang 16 still lowers OpenCL/SPIR-V built-ins as pointers to opaque
structs, while the SPIR-V Reader uses TargetExtTy values. This patch
extends the retyping function to also retype the return types of
function (builtin) declarations. Note that builtin function resolution
is already done earlier by the SPIR-V Reader.
This patch also changes how the ImageFuncsAnalysis pass recognizes
image/sampler types. Instead of relying on pointer element types, the
pass now uses IGC metadata (m_OpenCLArgType), consistent with other
passes later in the pipeline.