Using ptr as address payload's type
By design, the address payload updating builtins do not create a new
address payload. If the payload's type were int rather than a pointer,
an updating builtin would look like the following:
(1) int addrP0 = createAddrPayload(...)
(2) v = block2d_read(addrP0, ...)
(3) int addrP1 = setBlockX(addrP0, ...)
In LLVM IR, addrP0 and addrP1 are different values, so it is legal for
LLVM to reorder the instructions into { (1), (3), (2) }. That is
incorrect, because (2) should use the original payload addrP0, not the
updated one (addrP1).
For this reason, the address payload should be of pointer type. With a
pointer type, the sequence becomes:
(1) int* addrP0 = createAddrPayload(...)
(2) v = block2d_read(addrP0, ...)
(3) setBlockX(addrP0, ...)
LLVM cannot reorder (2) and (3), as there is a dependence between them
via the pointer argument.
This PR changes the address payload's type to a pointer type (int*).
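A minimal C++-style sketch of why the pointer-typed payload blocks the
reordering. The builtin names and signatures below are illustrative
assumptions, not the actual IGC spelling: setBlockX writes the payload
through the pointer and block2d_read reads it through the same pointer,
so (2) and (3) are ordered by a memory dependence rather than by SSA
value names.

    // Hypothetical declarations for illustration only.
    extern "C" int* createAddrPayload(long base, int width, int height,
                                      int pitch, int blockX, int blockY);
    extern "C" int  block2d_read(int* addrPayload);         // reads the payload
    extern "C" void setBlockX(int* addrPayload, int newX);  // updates payload in place

    int example(long base) {
      int* addrP0 = createAddrPayload(base, 64, 32, 64, 0, 0);  // (1)
      int v = block2d_read(addrP0);                              // (2) reads *addrP0
      setBlockX(addrP0, 16);                                     // (3) writes *addrP0
      return v + block2d_read(addrP0);                           // sees the updated payload
    }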
1. The address payload argument is of type int, not int8; it is opaque.
   int8 was replaced with int to prevent IGC from scalarizing the int8.
2. Add an address payload copy builtin, in case copying is more
   efficient than creating a new address payload.
3. Add updating builtins for base/width/height/pitch, in addition to the
   existing ones for blockX and blockY.
   The block shape remains in the create builtin only, since updating
   the shape would require creating a new address payload (a sketch of
   this builtin surface follows the list).
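A rough sketch of the builtin surface described above, with illustrative
names and parameters (not the exact builtin spelling), assuming the int*
payload stays opaque:

    // Hypothetical names for illustration only; the real builtins may differ.
    extern "C" int* copyAddrPayload(int* src);               // may be cheaper than create
    extern "C" void setAddrPayloadBase(int* p, long base);
    extern "C" void setAddrPayloadWidth(int* p, int width);
    extern "C" void setAddrPayloadHeight(int* p, int height);
    extern "C" void setAddrPayloadPitch(int* p, int pitch);
    extern "C" void setAddrPayloadBlockX(int* p, int x);     // existed previously
    extern "C" void setAddrPayloadBlockY(int* p, int y);     // existed previously
    // No setter for the block shape: changing the shape requires creating
    // a new address payload, so the shape is a create-time argument only.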
New lsc2d block intrinsics that take the address payload as a single
operand for block2d read/write/prefetch (their names contain _ap_).
This is the first draft and is subject to change.
Vector alias uses a node value as the ID for a group of aliased values.
Because two vectors of different sizes can be aliased to each other, the
node value may differ from the original vector and thus have a different
element size, which leads to incorrect offset calculations.
This change fixes that by adding the type of the original base vector
to the base vector struct.
In addition, the previous alignment checking code for subvectors was
incomplete. This change re-implements it by collecting all coalesced
values, checking the alignment of every one of them, and selecting the
maximum.
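A minimal sketch of the offset fix, assuming a base-vector record along
the lines described above (the struct and helper names are illustrative):
the byte offset of an aliased value must be computed from the element
size of the original base vector, not from the element size of whatever
node value currently represents the alias group.

    #include "llvm/IR/DerivedTypes.h"
    #include "llvm/IR/Value.h"

    struct BaseVecInfo {                 // illustrative stand-in
      llvm::Value* NodeValue;            // representative value of the alias group
      llvm::Type*  OrigBaseTy;           // type of the original base vector (the fix)
      unsigned     OffsetInElts;         // offset of the aliased value, in elements
    };

    static uint64_t byteOffset(const BaseVecInfo& BV) {
      auto* VTy = llvm::cast<llvm::FixedVectorType>(BV.OrigBaseTy);
      uint64_t EltBytes = VTy->getElementType()->getScalarSizeInBits() / 8;
      return BV.OffsetInElts * EltBytes; // element size of the ORIGINAL base vector
    }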
1. Refactor subvector aliasing and apply it to limited cases.
2. Add a uniformity check to make sure the subvector and the vector have
   the same uniformity.
3. Add an alignment check to make sure the subvector's minimum alignment
   requirement is still guaranteed after it becomes an alias to a larger
   vector.
   (Note: as the simd size is not available during analysis, the minimum
   simd size is used instead. This should be fine for dpas kernels as
   they use the minimum simd size.)
4. The refactor also splits the functionality into several sub-functions
   for ease of testing. With VATemp=1, it handles vectors that are
   basically isolated; general cases are handled under VATemp=2.
   (VATemp >> 2) & 0x3 controls extractelement aliasing, and
   (VATemp >> 4) & 0x3 controls lifetime start/end generation. Both
   will be turned on and tested later if needed (a sketch of this bit
   layout follows the list).
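A small sketch of how the VATemp bits described above break down
(illustrative only; the actual key handling in IGC may differ):

    struct VAControls {
      int baseLevel;      //  VATemp       & 0x3 : 1 = isolated vectors only, 2 = general cases
      int eeAliasLevel;   // (VATemp >> 2) & 0x3 : extractelement aliasing
      int lifetimeLevel;  // (VATemp >> 4) & 0x3 : lifetime start/end generation
    };

    static VAControls decodeVATemp(unsigned VATemp) {
      return { int(VATemp & 0x3), int((VATemp >> 2) & 0x3), int((VATemp >> 4) & 0x3) };
    }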
Add support in the emit pass, dessa, and wianalysis for load combining
that uses a struct to combine loads.
As load combining is off, this change has no functional impact.
There is an issue (reported by Jakacki, Jakub) when merging two IVI
chains: two chains cannot be merged if they have different uniformity,
but the code did not check whether the two chains have the same
uniformity.
This change adds that check. It also refactors the code a little bit.
When grouping values of insertValue, it is only necessary to check
whether both are uniform or both are non-uniform; it is not necessary to
distinguish between global/group/thread uniformity.
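A minimal sketch of the added check, assuming a WIAnalysis-style
uniformity query (the interface and helper names below are illustrative):
two chains may only be merged when both are uniform or both are
non-uniform; the particular flavor of uniformity does not matter here.

    #include "llvm/IR/Value.h"

    struct UniformityQuery {                         // stand-in for IGC's WIAnalysis
      virtual bool isUniform(llvm::Value* V) const = 0;
      virtual ~UniformityQuery() = default;
    };

    static bool canMergeChains(llvm::Value* ChainA, llvm::Value* ChainB,
                               const UniformityQuery& WI) {
      // Only uniform vs. non-uniform matters; global/group/thread uniform
      // all count simply as "uniform" here.
      return WI.isUniform(ChainA) == WI.isUniform(ChainB);
    }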
Better alias handling
When grouping values of insertValue and insertElement, it is only
necessary to check whether both are uniform or both are non-uniform;
it is not necessary to distinguish between global/group/thread
uniformity.
Make insertelement coalescing and aliasing work on a function, instead
of on a basic block, so that special handling for them can be done
locally.
No functional change.
This is for combining LLVM LoadInst and StoreInst instructions. It works
even when the loads and stores have different element sizes (the current
memopt does not handle loads/stores with different element sizes).
This is the first submit. It has the majority of the boilerplate code
implemented. It includes store combining; load combining will be
added later.
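An illustrative source-level example (not taken from the patch) of the
kind of pattern the store combining targets, assuming p is suitably
aligned: adjacent stores with different element sizes that can be merged
into one wider store.

    void writeHeader(char* p, int tag, short len, short flags) {
      *reinterpret_cast<int*>(p)       = tag;    // 4-byte store
      *reinterpret_cast<short*>(p + 4) = len;    // 2-byte store
      *reinterpret_cast<short*>(p + 6) = flags;  // 2-byte store
      // The three adjacent stores cover 8 contiguous bytes and could be
      // combined into a single 8-byte store; the existing memopt skips
      // this case because the stored element sizes differ.
    }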
If dessa is off, insertvalue coalescing needs to check whether
operand 0 has a single use. If it does, the chain continues;
otherwise it has to stop.
This is to avoid coalescing cases like the following:
a0 = insertvalue undef, s0, 0
a1 = insertvalue a0, s1, 1
a2 = insertvalue a1, s2, 2
b1 = insertvalue a0, x1, 1
b2 = insertvalue b1, x2, 2
= foo(a2)
= foo(b2)
{a1, a2} can be coalesced, as can {b1, b2}; but
{a1, a2} cannot be coalesced with {b1, b2}.
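A minimal sketch of the check, using the standard LLVM hasOneUse() query
(the surrounding coalescing logic is only hinted at): when dessa is off,
the chain may be extended through operand 0 only if that operand has
exactly one use; in the example above a0 has two uses, so both chains
stop at a0.

    #include "llvm/IR/Instructions.h"

    // Returns true if the chain may continue through operand 0 of IVI.
    static bool canExtendChain(llvm::InsertValueInst* IVI) {
      llvm::Value* Agg = IVI->getAggregateOperand();  // operand 0
      return llvm::isa<llvm::InsertValueInst>(Agg) && Agg->hasOneUse();
    }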
For any insertvalue instruction chain, make its values aliases of each
other. Also, if two insertvalue chains define disjoint struct fields,
they are combined and set to alias each other.
This is to let dessa support insertValue into structs. This change is an
initial effort and is experimental in nature.
The rationale is to coalesce multiple loads/stores into a struct so that
a single, efficient load/store instruction can be emitted for them.
As EnableDeSSAAlias has been stable for a while, the key is no longer
needed. This change deletes the key EnableDeSSAAlias; with this, DeSSA
aliasing is always on and cannot be disabled. This simplifies dessa, as
the code for dessaalias=off can be safely deleted.
EnableDeSSAAlias was originally an int, to allow finer control during
the development of coalescing aliases (bitcast, etc.). It is stable now
and no longer needs to be an int.
This submit has the following changes:
1. Change EnableDeSSAAlias to bool;
2. Change DisableDeSSA to EnableDeSSA;
3. Guard the use of EnableDeSSAAlias with EnableDeSSA, as
   EnableDeSSAAlias is used only if DeSSA is on.
No functional change is expected from this submit.