Program barrier to task stream, before next enqueue kernel.
This will reduce the number of batch buffer starts for sequences of
enqueue, barrier, enqueue, ... .
Related-To: NEO-8147
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Program barrier immediately to task stream.
This will reduce the number of batch buffer starts.
Related-To: NEO-8147
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
accept debug flag for all platforms
cleanup ocl unit tests for xe hpg platforms
remove not needed excludes
Related-To: NEO-8187
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
This change disables CPU caching for resources
not accessed by CPU for MTL devices.
Related-To: NEO-7194
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
Group allocation types related to cpu side allocations in function to
query gmm usage type. These types will have caching enabled even if
CPU caching is not preferred by GPU.
Add logic to query whether the cpu access is allowed for an allocation
(in cases when it is not preffered by GPU).
Related-To: NEO-7194
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
By default prefer allocating memory first by KMD, instead of malloc first.
By default prefer not caching allocations on MTL devices. This results
in allocations being handled with non-coherent pat index.
For integrated devices when caching is not preferred do not allow
direct memory access in CPU domain. For map/unmap operations create
a dedicated memory allocation for CPU access, instead of accessing it
directly, reusing the same logic as when mapping/unmapping local memory.
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
If waitForBarrier is not passed outEvent then do
dcFlush on the next synchronize call.
Related-To: NEO-8147
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Currently the whole code resides within the opencl/ tree, but the
mechanism is meant to be reused in L0 for kernel-ISA allocations
optimization (further work).
This commit is a preparation step, which extracts the generic mechanism
and moves the extracted part under the shared/ tree.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
Allows the user to use alignments > 64KB in `createUnifiedMemoryAllocation`
So that the restriction in `piextUSMDeviceAlloc` of the DPC++ runtime
could be lifted
Related-To: LOCI-4168
Signed-off-by: Lu, Wenbin <wenbin.lu@intel.com>
- Do not wait for GPU completion on pool exhaust if allocs are in use,
allocate new pool instead
- Free small buffer address range if allocs are not in use and
buffer pool is exhausted
Resolves: NEO-7769, NEO-7836
Signed-off-by: Igor Venevtsev <igor.venevtsev@intel.com>
- For static create() method for Kernel and MultiDeviceKernel force errcodeRet
parameter to be passed via reference (instead of a pointer)
- Move part of kernel's creation logic to initialize() method
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
Allows the user to use alignments > 64KB in `createUnifiedMemoryAllocation`
So that the restriction in `piextUSMDeviceAlloc` of the DPC++ runtime
could be lifted
Related-To: LOCI-4168
Signed-off-by: Lu, Wenbin <wenbin.lu@intel.com>
Pass resolveDependenciesByPipecontrol bool value to get command stream
methods instead of reevaluating the condition.
Related-To: NEO-7321
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
- Do not wait for GPU completion on pool exhaust if allocs are in use,
allocate new pool instead
- Reuse existing pool if allocs are not in use
Related-To: NEO-7769
Signed-off-by: Igor Venevtsev <igor.venevtsev@intel.com>