When counting the instructions in a BB, skip `dbg` calls. Otherwise,
when compiling with `-g`, the count crosses ScalarAliasBBSizeThreshold
and scalar aliasing gets disabled.
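A minimal sketch of the counting rule above, not the actual pass code. `Inst` and `countNonDebugInsts` are hypothetical stand-ins; a real pass would ask the IR whether an instruction is a debug call instead of reading a flag.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an IR instruction; the flag models
// "this instruction is a dbg call".
struct Inst {
    bool isDebugCall;
};

// Count only non-debug instructions, so the value compared against a
// size threshold is the same with and without -g.
inline unsigned countNonDebugInsts(const std::vector<Inst>& bb) {
    unsigned n = 0;
    for (const Inst& i : bb)
        if (!i.isDebugCall)
            ++n;
    return n;
}
```

With this rule, a BB holding two real instructions and three dbg calls counts as 2 in both build modes.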
As some dead BBs are not removed in codegen emit, don't count those dead
BBs when counting the number of BBs.
Once those dead BBs are removed, F->size() should be used to get the number
of BBs. (This is a temporary solution. Once the regression caused by adding
simplifyCFG is resolved, this temporary solution should be removed.)
A std::map with a pointer as key has a non-deterministic iteration
order, as pointer values differ from run to run. This non-deterministic
order causes visa to generate different code.
This PR fixes it by sorting the map using a fixed value id. With this,
it generates the same order of CVariables for vector alias for
the same input function.
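A minimal sketch of the idea, assuming each value carries an id that is assigned in a fixed order. `Value`, `id`, and `sortedById` are hypothetical names for illustration, not the names used in the change.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IR value; `id` is assumed to be assigned
// in a fixed order (for example, instruction order in the function).
struct Value {
    int id;
};

// Copy the map's entries and order them by the stable id instead of by
// pointer value, so iteration is the same from run to run.
template <typename T>
std::vector<std::pair<const Value*, T>>
sortedById(const std::map<const Value*, T>& m) {
    std::vector<std::pair<const Value*, T>> out(m.begin(), m.end());
    std::sort(out.begin(), out.end(),
              [](const auto& a, const auto& b) {
                  return a.first->id < b.first->id;
              });
    return out;
}
```

Iterating over the returned vector instead of the map keeps the pointer-keyed lookup while making the visiting order independent of allocation addresses.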
For cases where vector alias could increase register pressure, don't
do vector alias.
Also, the extractMask optimization is favored over vector alias for
smaller vectors. The VA code checks whether extractMask can be done
and, if so, skips vector alias.
Vector alias uses a node value as the ID for a group of aliased values.
As two vectors of different sizes could be aliased to each other, a node
value may be different from the original one and thus have a different
element size than the original vector, which would cause incorrect offset
calculation.
This change fixes that by adding the type of the original base vector
to the base vector struct.
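A minimal sketch of why the original element type matters, under the assumption that the offset is a start element index times an element size. `BaseVecInfo` and `aliasByteOffset` are hypothetical names, not the actual struct or function.

```cpp
#include <cassert>

// Hypothetical record for a base vector. The element size of the original
// base vector is stored separately because the node value used as the
// group ID may have a different element type.
struct BaseVecInfo {
    unsigned nodeEltBytes;  // element size of the node value (group ID)
    unsigned origEltBytes;  // element size of the original base vector
};

// Byte offset of an aliased sub-vector that starts at element `startElt`
// of the original base vector.
inline unsigned aliasByteOffset(const BaseVecInfo& base, unsigned startElt) {
    return startElt * base.origEltBytes;
}
```

For a base vector of 4-byte elements whose node value has 2-byte elements, a sub-vector starting at element 3 sits at byte 12; using the node value's element size would give 6.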
In addition, the previous alignment checking code for subvector isn't
complete. This change re-implements it by getting all coalesced values,
checking the alignment of every one of them, and selecting the max.
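A minimal sketch of the max-over-all-values rule, assuming each coalesced value reports a minimum alignment in bytes. `requiredAlignment` is a hypothetical helper, not the function in the change.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Given the minimum alignment (in bytes) of every coalesced value,
// return the strictest one; that is what the shared storage must satisfy.
inline unsigned requiredAlignment(const std::vector<unsigned>& alignsInBytes) {
    unsigned maxAlign = 1;  // byte alignment is the weakest requirement
    for (unsigned a : alignsInBytes)
        maxAlign = std::max(maxAlign, a);
    return maxAlign;
}
```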
The offset of an aliaser into the base vector was calculated incorrectly
using the struct's size. The offset should be calculated based on the
base vector's element type.
Also, add minimal alignment as an argument of GetSymbol() so that the
CVariable's alignment can be set correctly for vector aliasing.
Besides, do minor refactoring to save compile time by skipping VectorAlias
earlier if it does not apply.
Also, add VectorAliasBBThreshold. If F's number of BBs is greater than
this threshold, skip VectorAlias to avoid increasing GRF pressure.
As VectorAlias is off, no functional change.
This change fixes the aliaser check done before setting a value as
an aliaser, to make sure no value can be an aliaser twice.
As VectorAlias is off by default, this change has no functional impact.
1. Refactor subvec aliasing and apply it to limited cases.
2. Add uniform checking to make sure subvec and vec have the same uniformity.
3. Further add alignment checking to make sure subvec's minimum alignment
requirement is guaranteed after becoming an alias to a larger vector.
(Note: as simdsize isn't available when doing analysis, the minimum
simdsize is used instead. This should be okay for dpas kernel as it
uses the minimum simdsize.)
4. This refactor also splits functionality into several sub functions for
ease of testing. With VATemp=1, it handles vectors that are basically
isolated; general cases are handled under VATemp=2.
(VATemp >> 2) & 0x3 controls extractelement aliasing, and
(VATemp >> 4) & 0x3 controls lifestart/end generation. Both
will be turned on and tested later if needed.
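A minimal sketch of how the VATemp fields described above could be decoded. The two shifted fields follow the text; treating the subvec mode as the low two bits (VATemp & 0x3) is an assumption, and `VATempFields` and `decodeVATemp` are hypothetical names.

```cpp
#include <cassert>

struct VATempFields {
    unsigned subVecMode;      // assumed: VATemp & 0x3 (1 = isolated, 2 = general)
    unsigned extractEltMode;  // (VATemp >> 2) & 0x3
    unsigned lifetimeMode;    // (VATemp >> 4) & 0x3
};

// Split a VATemp value into its three 2-bit fields.
inline VATempFields decodeVATemp(unsigned vaTemp) {
    return VATempFields{vaTemp & 0x3u,
                        (vaTemp >> 2) & 0x3u,
                        (vaTemp >> 4) & 0x3u};
}
```

Under that assumption, VATemp=0x16 decodes to subvec mode 2, extractelement mode 1, and lifestart/end mode 1.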
If dessa is off, coalescing insertvalue needs to check
whether operand 0 has a single user. If it does, continue the
chain; otherwise it has to stop.
This is to avoid coalescing the following case:
    a0 = insertvalue undef, s0, 0
    a1 = insertvalue a0, s1, 1
    a2 = insertvalue a1, s2, 2
    b1 = insertvalue a0, x1, 1
    b2 = insertvalue b1, x2, 2
       = foo(a2)
       = foo(b2)
{a1, a2} can be coalesced as well as {b1, b2}; but
{a1, a2} cannot coalesce with {b1, b2}.
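A minimal sketch of the single-user rule applied to the case above. `InsertValue`, `numUses`, and `collectChain` are hypothetical stand-ins for the IR and the coalescing walk, not the actual implementation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an insertvalue instruction.
struct InsertValue {
    const InsertValue* aggOperand;  // operand 0; nullptr models undef
    unsigned numUses;               // number of users of this value
};

// Walk up from `last` through operand 0 and stop as soon as the operand
// has more than one user. Returns the values that may be coalesced.
inline std::vector<const InsertValue*> collectChain(const InsertValue* last) {
    std::vector<const InsertValue*> chain{last};
    const InsertValue* cur = last;
    while (cur->aggOperand && cur->aggOperand->numUses == 1) {
        cur = cur->aggOperand;
        chain.push_back(cur);
    }
    return chain;
}
```

In the example, a0 has two users (a1 and b1), so the walk from a2 stops at a1 and the walk from b2 stops at b1, which keeps the two groups separate.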
EnableDeSSAAlias was originally an int during development of coalescing
alias (bitcast, etc) to allow finer control. It is stable now and no
longer needs to be an int.
This submit has the following changes:
1. Change EnableDeSSAAlias to bool.
2. Change DisableDeSSA to EnableDeSSA.
3. Guard the use of EnableDeSSAAlias with EnableDeSSA, as EnableDeSSAAlias
is used only if DeSSA is on.
No functional change expected from this submit.
Added an ocl internal option
-cl-intel-vector-coalesing=<0-5>
to control vector coalescing (extract/insert coalescing)
Remove unused driverInfo function : EnableVecAliasing()
Note that the previous failure might be due to wrong merging (I saw
that my change had unknown changes).
Change-Id: I37615f49e81040c0e1829cf0150512a4b17bcdb4
Added an ocl internal option
-cl-intel-vector-coalesing=<0-5>
to control vector coalescing (extract/insert coalescing)
Remove unused driverInfo function : EnableVecAliasing()
Change-Id: I5591235a439f60954bb393230221efdff866a772
Added an ocl internal option
-cl-intel-vector-coalesing=<0-5>
to control vector coalescing (extract/insert coalescing)
Remove unused driverInfo function : EnableVecAliasing()
Change-Id: Ib721bb431bd7e37a9611ada78c017d9985af7fba