Level Zero doesn't use deferred allocations so no point in paying the
price to check for them in cleanAllocationList.
Signed-off-by: Michal Mrozek <michal.mrozek@intel.com>
Add mechanism to preallocate cmd buffer allocations in command stream
receiver reusable allocations list per command queue initialized.
This should limit additional allocations during hot loop.
Needs to be enabled in subsequent commits by setting product helper
method.
Related-To: NEO-8152
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
- not only gpuAddress is offset but also cpu address with data needs
to be offset while writing memory.
Related-To: GSD-6604
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
- set monitor fence dispatch for all cases task count post sync operation
- stand alone flush task count will not happen when already flushed and so
monitor fence
- monitor fence then must be dispatched together with task count post sync
Related-To: NEO-8395
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Add support for different timestamp packet counts per gfx family.
Change all packet counts to 1 except for xe-hpc.
Related-To: NEO-8154
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
- this change handles level zero immediate command lists on copy engine
- monitor fence will be dispatched for blocking calls
- asynchronous mode will dispatch monitor fence only on host synchronization
Related-To: NEO-8395
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
In L0 its not possible to track objects relations. For example CmdList
may be removed before Event.
In such case, Event needs to safely skip unregister call, without
accessing CmdList/CmdQueue object.
Related-To: NEO-8884
Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
- when Surface State is reused for new resource, State Cache needs to be
invalidated
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Program barrier to task stream, before next enqueue kernel.
This will reduce the number of batch buffer starts for sequences of
enqueue, barrier, enqueue, ... .
Related-To: NEO-8147
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Program barrier immediately to task stream.
This will reduce the number of batch buffer starts.
Related-To: NEO-8147
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
- allocate SSH in cmdContainer when scratch allocation used with
private heaps
- scratch SurfaceStates are addressed relative to
SurfaceStateBaseAddress and have to be placed on SSH
- remove not used SCRATCH_SSH heap type from bindelssHeapHelper
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Add debug flag ProgramBarrierInCommandStreamTask to program barrier
pipe control in task command stream instead of csr command stream.
This will reduce the number of batch buffer starts.
Related-To: NEO-8147
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
- use testTaskCountReady method to check TaskCount value
- download all allocations when TaskCount is ready
Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
If waitForBarrier is not passed outEvent then do
dcFlush on the next synchronize call.
Related-To: NEO-8147
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
- program global bindless ssba when external allocator used (
UseExternalAllocatorForSshAndDsh)
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>