Change dyn_cast to cast where it's clear cast was intended.
- Change dyn_cast to cast where it's clear cast was intended.
- Fix lock scope in Dump.cpp.
- Pass sectName by ref in RelocSection constructor.
- Apply rule-of-three by explicitly deleting copy constructors.
- Fix memory leak in DbgDecoder in case of error by moving fopen to
constructor and fclose to destructor.
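The last two items can be sketched together as follows. This is a minimal illustration only: `FileDecoder` and `readByte` are hypothetical names, not the actual DbgDecoder interface, and the error handling (throwing on a failed open) is an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a decoder that owns a FILE*; not the real DbgDecoder.
class FileDecoder {
public:
    // Acquire the file in the constructor. If fopen fails we throw, so no
    // object (and no destructor call) ever exists for a null handle.
    explicit FileDecoder(const std::string& path)
        : file_(std::fopen(path.c_str(), "rb")) {
        if (!file_)
            throw std::runtime_error("cannot open " + path);
    }

    // The destructor is the only place that closes the file, so every exit
    // path of the caller (including early returns on error) releases it.
    ~FileDecoder() {
        assert(file_ != nullptr);
        std::fclose(file_);
    }

    // Rule of three: a copy would make two objects close the same FILE*.
    FileDecoder(const FileDecoder&) = delete;
    FileDecoder& operator=(const FileDecoder&) = delete;

    // Returns false on end of file or read error; the caller can simply
    // bail out without any manual cleanup.
    bool readByte(unsigned char& out) {
        return std::fread(&out, 1, 1, file_) == 1;
    }

private:
    std::FILE* file_;
};
```

With this shape, a function that constructs a `FileDecoder` on the stack and returns early on a decode error no longer needs a matching `fclose` on each return path.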
Add a bail out in SOALayoutChecker::MismatchDetected to treat i8-based
GEPs with non-constant byte indices as a mismatch and disable SOA
promotion for those allocas. This avoids incorrect results produced by the
legacy scalarization path when byte-wise addressing is used and the
offsets are not multiples of the lane size.
With opaque pointers we could end up with a group like this:
%C = getelementptr inbounds %class.MC_Vector, %class.MC_Vector* %.privateBuffer, i64 0, i32 1
%C2 = getelementptr inbounds %class.MC_Vector, %class.MC_Vector* %.privateBuffer, i64 0, i32 1
%A = getelementptr inbounds %class.MC_Particle, %class.MC_Particle* %.privateBuffer, i64 0, i32 4
%A2 = getelementptr inbounds %class.MC_Particle, %class.MC_Particle* %.privateBuffer, i64 0, i32 4
The problem is a type mismatch: when such a group is processed later, it tries
to access indices that aren't available in the given type, e.g. the 8th index of a 3-element struct.
Such a problematic group should be split into two separate groups, which solves the problem:
%C = getelementptr inbounds %class.MC_Vector, %class.MC_Vector* %.privateBuffer, i64 0, i32 1
%C2 = getelementptr inbounds %class.MC_Vector, %class.MC_Vector* %.privateBuffer, i64 0, i32 1

%A = getelementptr inbounds %class.MC_Particle, %class.MC_Particle* %.privateBuffer, i64 0, i32 4
%A2 = getelementptr inbounds %class.MC_Particle, %class.MC_Particle* %.privateBuffer, i64 0, i32 4
This PR adds a check that creates a new group instead of merging into
an existing group when the merge would result in mismatched types.
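The idea can be illustrated with a toy model that has no LLVM dependency. The `Gep` struct and `groupGeps` function below are hypothetical and only mirror the grouping decision: keying a group on the base pointer alone would merge all four GEPs above, while keying on the base pointer plus the source element type keeps them apart.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model of a GEP holding only the fields relevant to grouping
// (hypothetical; the real pass works on llvm::GetElementPtrInst).
struct Gep {
    std::string name;        // e.g. "%C"
    std::string base;        // base pointer, e.g. "%.privateBuffer"
    std::string sourceType;  // source element type, e.g. "%class.MC_Vector"
};

using GroupKey = std::pair<std::string, std::string>;

// Group by (base pointer, source element type). A GEP whose type differs
// from an existing group with the same base starts a new group.
std::map<GroupKey, std::vector<std::string>> groupGeps(const std::vector<Gep>& geps) {
    std::map<GroupKey, std::vector<std::string>> groups;
    for (const Gep& g : geps) {
        assert(!g.base.empty());
        groups[GroupKey(g.base, g.sourceType)].push_back(g.name);
    }
    return groups;
}
```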
Change args to const ref where it makes sense.
Use std::move where it makes sense
Apply rule of three
Replace unique_ptr, and pointers to unique_ptr, with plain shared_ptr
Update comment in CodeSinking
Use saved boolean value instead of calling method over and over.
igc_ocl_translation_ctx_impl handles -igc_opts passed to ocloc. The
implementation, however, doesn't handle malformed input, which results
in an infinite loop when no single quotes are provided or when the
second single quote is missing.
This commit adds handling of these cases.
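A minimal sketch of a parsing loop that is guaranteed to terminate on such input is shown below. The function name `extractQuotedOpts` and the exact accepted syntax (flag, optional whitespace, then a single-quoted value) are assumptions made for illustration; this is not the actual IGC implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Collects the text between single quotes that follows each occurrence of
// `flag`. Malformed occurrences are skipped or end the scan, and the search
// position always moves forward, so the loop cannot spin forever.
std::vector<std::string> extractQuotedOpts(const std::string& options,
                                           const std::string& flag = "-igc_opts") {
    assert(!flag.empty());  // an empty flag would never advance the position
    std::vector<std::string> result;
    std::size_t pos = 0;
    while ((pos = options.find(flag, pos)) != std::string::npos) {
        pos += flag.size();  // always advance past the flag just matched
        const std::size_t open = options.find_first_not_of(" \t", pos);
        if (open == std::string::npos || options[open] != '\'')
            continue;  // no opening quote: ignore this occurrence
        const std::size_t close = options.find('\'', open + 1);
        if (close == std::string::npos)
            break;  // opening quote without a closing one: stop parsing
        result.push_back(options.substr(open + 1, close - open - 1));
        pos = close + 1;
    }
    return result;
}
```

For example, `extractQuotedOpts("-igc_opts 'A=1,B=2'")` yields one entry, while `"-igc_opts A=1"` and `"-igc_opts 'A=1"` both yield an empty list instead of hanging.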
PromoteBools wrongly skipped the RAUW operation on calls returning ptr
after promoting their signature and adding their users to the promotion
queue. As a result those users were cleaned up and their own users
received undef operands; the resulting dead code was then eliminated,
which caused wrong test results. In short, the pass skipped RAUW where
it was necessary.
In cases like these:
```llvm
%call = call ptr @some-function(i1 true)
%bitcast = bitcast ptr %call to ptr
%ptrtoint = ptrtoint ptr %bitcast to i64
```
Pass output before the fix:
```llvm
%0 = call ptr @some-function(i8 1)
%ptrtoint = ptrtoint ptr undef to i64
```
Expected pass output:
```llvm
%0 = call ptr @some-function(i8 1)
%bitcast = bitcast ptr %0 to ptr
%ptrtoint = ptrtoint ptr %bitcast to i64
```
This commit revisits the functionality introduced by b74e645.
The previous solution didn't work on LLVM 16 with opaque pointers.
Additionally, it assumed a specific bitcast -> GEP pattern, which wasn't
guaranteed to keep working if SYCL codegen changed.
The new algorithm recursively searches the users of an AllocaInst
to find the GEP and lifetime instructions that need fixing.
MismatchDetected fixed an assert in the PrivateMemoryResolution pass
caused by mismatched widths. LowerGEPForPrivMem handles that case,
but when an alloca exceeds the allowed size it leaves the mismatched
widths for the later PrivateMemoryResolution pass, causing the assert
to fail. Extend mismatch detection to cover struct-of-array and
struct-of-vector cases.
The current implementation of MismatchDetection was skipping cases because of an algorithm version check.
This prevented mismatched cases from being detected, so I'm removing that check.
LLVM's instcombine pass (method
InstCombinerImpl::SimplifyDemandedVectorElts) can set the input of the
first insertelement to poison if all indices in the vector are inserted
(i.e. all elements are overwritten). This optimization has a hardcoded
limit on vector size in LLVM 15 and 16, which is too small for compute
workloads. Manually optimize such cases in IGC.
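The condition being checked can be sketched without any LLVM dependency. The helper below is hypothetical (`allLanesOverwritten` is not an IGC or LLVM function) and assumes every insertelement in the chain uses a constant lane index; it only answers whether the initial vector operand is fully overwritten and could therefore be replaced with poison.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true when a chain of insertelement instructions writes every lane
// of a vector with `numLanes` elements. `insertIndices` holds the constant
// lane index used by each insert in the chain, in any order.
bool allLanesOverwritten(std::size_t numLanes,
                         const std::vector<std::size_t>& insertIndices) {
    assert(numLanes > 0);
    std::vector<bool> written(numLanes, false);
    std::size_t remaining = numLanes;
    for (std::size_t idx : insertIndices) {
        if (idx >= numLanes)
            return false;  // out-of-range insert: stay conservative
        if (!written[idx]) {
            written[idx] = true;
            --remaining;  // count each lane only once, even if re-inserted
        }
    }
    return remaining == 0;
}
```

Because the check is linear in the number of inserts, it does not need a cap on vector size of the kind described above.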
When choosing between two nodes with the same SU number, allow breaking
the tie based on a subtree selection heuristic: pick a node if it's a
direct predecessor of the last scheduled node. This allows us to retire
the busy GRF faster and reduce register pressure quickly.
Currently this feature is disabled and must be enabled by setting a bit
in pre-sched ctrl.
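A minimal sketch of such a tie-break, under stated assumptions: the `Node` struct, the `breakTie` function and the `enabled` flag are hypothetical stand-ins for the scheduler's data structures and the control bit, and the fallback (keep the first candidate) is an arbitrary choice for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy scheduling node (hypothetical): `preds` lists the ids of its direct
// predecessors in the dependence graph.
struct Node {
    int id;
    unsigned suNumber;
    std::vector<int> preds;
};

bool isDirectPredOf(const Node& candidate, const Node& node) {
    return std::find(node.preds.begin(), node.preds.end(), candidate.id) !=
           node.preds.end();
}

// Chooses between two candidates with equal SU numbers. When the heuristic
// is enabled and exactly one candidate is a direct predecessor of the last
// scheduled node, that candidate wins; otherwise the first one is kept.
const Node& breakTie(const Node& a, const Node& b,
                     const Node* lastScheduled, bool enabled) {
    assert(a.suNumber == b.suNumber);
    if (enabled && lastScheduled != nullptr) {
        const bool aIsPred = isDirectPredOf(a, *lastScheduled);
        const bool bIsPred = isDirectPredOf(b, *lastScheduled);
        if (aIsPred != bIsPred)
            return aIsPred ? a : b;
    }
    return a;
}
```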
- Fix the incorrect estimation of the initial register pressure
- Support more special cases for code with various casts in the
CodeScheduling's RegisterPressureTracker
There's this assumption: "If m_hasPositivePointerOffset is true, BUFFER_OFFSET are assumed to be **zero**"
However, I've found a case on LLVM 16 with opaque pointers where this transformation causes the following test to fail:
https://github.com/intel/llvm/ => DeviceLib/string_test.cpp
This:
%15 = add i32 %bufferOffset1, 1
%16 = getelementptr inbounds i8, ptr addrspace(1) %1, i64 1 // unused
%17 = inttoptr i32 %bindlessOffset2 to ptr addrspace(2490368) //