Add missing test with debug flag
SetAmountOfReusableAllocationsPerCmdQueue unset.
Related-To: NEO-8152
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Add mechanism to preallocate cmd buffer allocations in command stream
receiver reusable allocations list per command queue initialized.
This should limit additional allocations during hot loop.
Needs to be enabled in subsequent commits by setting product helper
method.
Related-To: NEO-8152
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
- not only gpuAddress is offset but also cpu address with data needs
to be offset while writing memory.
Related-To: GSD-6604
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
- set monitor fence dispatch for all cases task count post sync operation
- stand alone flush task count will not happen when already flushed and so
monitor fence
- monitor fence then must be dispatched together with task count post sync
Related-To: NEO-8395
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Add support for different timestamp packet counts per gfx family.
Change all packet counts to 1 except for xe-hpc.
Related-To: NEO-8154
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
In L0 its not possible to track objects relations. For example CmdList
may be removed before Event.
In such case, Event needs to safely skip unregister call, without
accessing CmdList/CmdQueue object.
Related-To: NEO-8884
Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
- when Surface State is reused for new resource, State Cache needs to be
invalidated
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Program barrier to task stream, before next enqueue kernel.
This will reduce the number of batch buffer starts for sequences of
enqueue, barrier, enqueue, ... .
Related-To: NEO-8147
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Program barrier immediately to task stream.
This will reduce the number of batch buffer starts.
Related-To: NEO-8147
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Add debug flag ProgramBarrierInCommandStreamTask to program barrier
pipe control in task command stream instead of csr command stream.
This will reduce the number of batch buffer starts.
Related-To: NEO-8147
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
- use testTaskCountReady method to check TaskCount value
- download all allocations when TaskCount is ready
Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
By default prefer allocating memory first by KMD, instead of malloc first.
By default prefer not caching allocations on MTL devices. This results
in allocations being handled with non-coherent pat index.
For integrated devices when caching is not preferred do not allow
direct memory access in CPU domain. For map/unmap operations create
a dedicated memory allocation for CPU access, instead of accessing it
directly, reusing the same logic as when mapping/unmapping local memory.
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
If waitForBarrier is not passed outEvent then do
dcFlush on the next synchronize call.
Related-To: NEO-8147
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>