This change is intended for immediate command lists that use the
flush task functionality.
With this change, all immediate command lists using the same CSR consume
shared allocations for the DSH and SSH heaps. This decreases the number of
SBA commands dispatched when multiple command lists coexist and dispatch
kernels. A new SBA command should be dispatched only when the current heap
allocation is exhausted.
The functionality is currently disabled and available under a debug key.
In the future it will be enabled by default for all immediate command
lists with flush task functionality enabled.
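A minimal sketch of the idea, assuming a simplified model: SharedHeap and
its members are hypothetical names for illustration and are not the NEO
implementation. It only shows that a new SBA is counted when the shared
allocation is first used or is exhausted.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model: one heap shared by every immediate command list
// that uses the same CSR.
struct SharedHeap {
    std::size_t capacity;       // bytes in the current heap allocation
    std::size_t used = 0;       // bytes already consumed from it
    unsigned sbaCount = 0;      // how many times SBA would be programmed
    bool sbaProgrammed = false; // false until the heap is used for the first time

    explicit SharedHeap(std::size_t cap) : capacity(cap) {}

    // Reserve `size` bytes and return the offset inside the current allocation.
    std::size_t reserve(std::size_t size) {
        assert(size <= capacity);
        if (!sbaProgrammed || used + size > capacity) {
            // First use, or the allocation is exhausted: a new allocation
            // means a new base address, so SBA has to be dispatched.
            used = 0;
            sbaProgrammed = true;
            ++sbaCount;
        }
        const std::size_t offset = used;
        used += size;
        return offset;
    }
};
```

In this model, several command lists calling reserve() on the same
SharedHeap trigger a single SBA until the capacity runs out.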
Related-To: NEO-7142
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
So far captureStateBaseAddress() was a wrapper around
programSbaTrackingCommands(), performing an additional check before
calling the latter. The check is apparently no longer relevant, so
merge the two functions and remove the code which is no longer
needed.
In practice, keep captureStateBaseAddress() and move the body of
programSbaTrackingCommands() into it. This keeps the diff impact on
the class hierarchy lower. Remove the second function. Simplify the
caller, which previously had to distinguish between these two functions.
Related-To: NEO-6774
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
Related-To: NEO-6075
Binding table entry count was zeroed even when the
ForceBtpPrefetchMode debug flag was enabled.
Fixes identified while working on the StateBaseAddress adaptation to
StreamProperties. Removing unused parameters, improving code reuse
(further improvements will come in following commits).
Related-To: NEO-6774
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
With compiler LSC WAs this gives better performance.
If the debugger is active, the policy is not changed, i.e.
it remains WBP.
Related-To: NEO-7003
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
The default TG dispatch size can be changed
to a better value based on the number of threads in a TG or
the number of threads currently available on the GPU.
The decision on what the TG dispatch size should be is based on
implemented heuristics.
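The commit does not spell out the heuristics, so the following is only
an illustrative sketch. chooseTgDispatchSize(), its parameters, the
default of 8 and the halving rule are all assumptions made for the
example, not the heuristics implemented in the driver.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical heuristic: start from the default dispatch size and halve
// it while one dispatch of `size` thread groups would need more threads
// than are currently available.
std::uint32_t chooseTgDispatchSize(std::uint32_t threadsPerTg,
                                   std::uint32_t availableThreads,
                                   std::uint32_t defaultSize = 8) {
    assert(threadsPerTg > 0);
    std::uint32_t size = defaultSize;
    while (size > 1 &&
           static_cast<std::uint64_t>(size) * threadsPerTg > availableThreads) {
        size /= 2;
    }
    return size;
}
```

With these assumptions, small thread groups keep the default size while
large ones fall back to smaller dispatch sizes.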
Signed-off-by: Rafal Maziejuk <rafal.maziejuk@intel.com>
Related-To: NEO-6989
For all platforms other than XE_HP_SDV (ATS), stop considering the
`useGlobalAtomics` flag as a decisive factor for triggering the SBA
(StateBaseAddress) programming on the HW. Only XE_HP_SDV supports this
flag.
For consistency of the implementation, keep the related logic in one
place only, that is, a helper in `command_encoder`, and then reuse it
in different places (`command_stream_receiver`).
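A minimal sketch of such a single helper, assuming a simplified
platform enum. Platform and isGlobalAtomicsReprogrammingRequired() are
hypothetical names chosen for illustration and do not come from the
code base.

```cpp
// Hypothetical platform identifier; only the distinction between
// XE_HP_SDV and everything else matters here.
enum class Platform { xeHpSdv, other };

// Hypothetical helper kept in one place and reused by all callers:
// a change of the flag requests SBA programming only on XE_HP_SDV.
constexpr bool isGlobalAtomicsReprogrammingRequired(Platform platform,
                                                    bool lastUseGlobalAtomics,
                                                    bool currentUseGlobalAtomics) {
    return platform == Platform::xeHpSdv &&
           lastUseGlobalAtomics != currentUseGlobalAtomics;
}
```

Callers then ask the helper instead of comparing the flag themselves,
so the platform rule is not duplicated.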
Related-To: NEO-6953
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
When the linear stream created for a command container does not have
enough space for a command and BB_END, it programs BB_END and allocates
a new command buffer. In this case the pointer returned from getSpace
points to storage in the new command buffer allocation.
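The behaviour can be sketched with a small stand-alone model.
ChainedStream, bbEndSize and bbEndMarker are assumptions made for this
example (the marker is a placeholder byte pattern, not a real opcode);
only the getSpace name is taken from the description above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

constexpr std::size_t bbEndSize = 4;       // assumed size of BB_END in bytes
constexpr std::uint8_t bbEndMarker = 0xA5; // placeholder pattern, not a real opcode

// Hypothetical stand-in for the linear stream of a command container.
class ChainedStream {
  public:
    explicit ChainedStream(std::size_t bufferBytes) : bufferSize(bufferBytes) {
        assert(bufferBytes > bbEndSize);
        allocateBuffer();
    }

    // Returns storage for `size` bytes. If the current buffer cannot hold
    // the command plus a trailing BB_END, the buffer is closed with BB_END
    // and the storage comes from a newly allocated buffer.
    std::uint8_t *getSpace(std::size_t size) {
        assert(size + bbEndSize <= bufferSize);
        if (used + size + bbEndSize > bufferSize) {
            std::fill_n(buffers.back().get() + used, bbEndSize, bbEndMarker);
            allocateBuffer();
        }
        std::uint8_t *space = buffers.back().get() + used;
        used += size;
        return space;
    }

    std::size_t bufferCount() const { return buffers.size(); }
    const std::uint8_t *buffer(std::size_t index) const { return buffers[index].get(); }

  private:
    void allocateBuffer() {
        buffers.push_back(std::make_unique<std::uint8_t[]>(bufferSize));
        used = 0;
    }

    std::size_t bufferSize;
    std::size_t used = 0;
    std::vector<std::unique_ptr<std::uint8_t[]>> buffers;
};
```

Space for BB_END is always kept in reserve, so closing a buffer never
overruns it.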
Related-To: NEO-5707
Signed-off-by: Maciej Plewka <maciej.plewka@intel.com>
OCL image surface state programming for Xe HP core now reuses the logic
of the EncodeSurfaceState helper.
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
Return if the context has multiple sub-devices related to a given root device.
Related-To: NEO-3691
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
- Block R/W in kernels requires a minimum of 16B (OWORD) alignment
to work properly without data corruption.
- Level Zero currently writes Base Surface State addresses with 4B
alignment, whereas OpenCL writes them aligned to the page size
(4KB).
- Added a function in encode buffer to verify that the size
being encoded has at least the minimum supported alignment of 4B; a
buffer with only this alignment will not support Block R/W.
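The alignment rules listed above can be sketched as follows. The
constant and function names are hypothetical and only restate the 4B
and 16B limits from the list; they are not the function added by this
change.

```cpp
#include <cstdint>

constexpr std::uint64_t minSurfaceAlignment = 4;  // minimum supported alignment (4B)
constexpr std::uint64_t blockRwAlignment = 16;    // OWORD alignment needed for Block R/W

// True when `value` is a multiple of `alignment` (alignment must be a power of two).
constexpr bool isAligned(std::uint64_t value, std::uint64_t alignment) {
    return (value & (alignment - 1)) == 0;
}

// Hypothetical checks mirroring the two limits from the description.
constexpr bool hasMinimumAlignment(std::uint64_t value) {
    return isAligned(value, minSurfaceAlignment);
}

constexpr bool supportsBlockReadWrite(std::uint64_t value) {
    return isAligned(value, blockRwAlignment);
}
```

A value such as 0x1004 passes the 4B check but fails the 16B check, so
it would be accepted for encoding while Block R/W stays unsupported.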
Change-Id: I6486c2cbbb0008834c779bf54918388d79c193bb
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
Define MI_MATH "greater than" function and simplify code
in encodeGreaterThanPredicate().
Change-Id: Ib1d0a3f712e672f105d0697a105e4d9b14301172
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
General purpose registers cannot be used for MI_MATH
calculations. ALU registers must be used.
To prevent passing a general purpose register into the
EncodeMath interface, enforce an ALU register type
at compile time.
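One way to get such compile-time enforcement is a scoped enum, shown
here as a sketch. AluRegister, AluOperands and makeAddition() are
hypothetical names with made-up encodings; the actual EncodeMath
interface may differ.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical ALU register type; the values are illustrative only.
enum class AluRegister : std::uint32_t { r0 = 0, r1 = 1, r2 = 2 };

struct AluOperands {
    AluRegister first;
    AluRegister second;
    AluRegister result;
};

// Accepts only AluRegister arguments. Passing a raw integer, such as the
// MMIO offset of a general purpose register, does not compile.
constexpr AluOperands makeAddition(AluRegister first, AluRegister second,
                                   AluRegister result) {
    return {first, second, result};
}

// A scoped enum has no implicit conversion from its underlying type.
static_assert(!std::is_convertible<std::uint32_t, AluRegister>::value,
              "raw register offsets must not convert to AluRegister");
```

A call such as makeAddition(0x2600u, 0x2608u, 0x2610u) is rejected by
the compiler, which is the effect the change aims for.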
Change-Id: I98aa8605cde27e7003029d33b3ef3bcfb2306878
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
encodeMulRegVal() makes extensive use of encodeAluAdd().
The following problems are addressed:
* encodeAluAdd() performs an addition and saves the
calculated result to the first register. Saving the
result to the first register clears the calculated result.
* An array of MI_MATH buffers is set up prior to performing a
series of encodeAluAdd() calls where the same registers are
reused for the calculations. For calculated results to be
carried over from one encodeAluAdd() operation to subsequent
encodeAluAdd() operations, the MI_MATH buffer needs to be
set up per encodeAluAdd().
Create EncodeMath<Family>::addition() to reserve an MI_MATH buffer
and perform the addition by calling encodeAluAdd().
Modify encodeAluAdd() to save the calculated result to a third
register. Then, after EncodeMath<Family>::addition() is called
in encodeMulRegVal(), copy the calculated result from the result
register to the first register from the EncodeMath<Family>::addition()
operation. This will allow the calculated value to be carried over
to subsequent addition operations.
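The add-then-copy pattern can be illustrated with a CPU-side model.
AluModel and mulRegVal() are hypothetical and only simulate register
contents; the shift-and-add loop is an assumption about how a
multiplication can be built from additions, not a copy of
encodeMulRegVal().

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of three registers; index 2 plays the role of the
// separate result register.
struct AluModel {
    std::array<std::uint64_t, 3> reg{};

    // One addition: writes the sum to `result` and leaves both inputs intact.
    void addition(std::size_t first, std::size_t second, std::size_t result) {
        reg[result] = reg[first] + reg[second];
    }

    // Copy the result back so the next addition can reuse it.
    void copy(std::size_t dst, std::size_t src) { reg[dst] = reg[src]; }
};

// Multiply a register value by `val` using only additions and copies.
std::uint64_t mulRegVal(std::uint64_t regValue, std::uint32_t val) {
    AluModel alu;
    alu.reg[0] = regValue; // operand, doubled on every step
    alu.reg[1] = 0;        // accumulator
    for (; val != 0; val >>= 1) {
        if (val & 1u) {
            alu.addition(1, 0, 2); // accumulator + operand -> result register
            alu.copy(1, 2);        // carry the sum over to the accumulator
        }
        alu.addition(0, 0, 2);     // operand + operand -> result register
        alu.copy(0, 2);            // carry the doubled operand over
    }
    return alu.reg[1];
}
```

Each addition writes to the result register first and is then copied
back, so intermediate values survive from one addition to the next.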
Change-Id: I9c6f8362a1ca2f7e3361aaa48d8748dd6ff0f4c8
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>