All problems with single address space mode have
been resolved and this Debug Key is no longer needed.
Related-to: NEO-7191
Signed-off-by: Yates, Brandon <brandon.yates@intel.com>
PVC platform with no support for atomic operations on system memory
must always allocate buffers in local memory to avoid atomic access violation.
Note: the feature is being implemented under the new registry key
AllocateBuffersInLocalMemoryForMultiRootDeviceContexts (disabled by default)
Related-To: NEO-7092
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
Heuristic is enabled by default
to disable, set:
AdjustThreadGroupDispatchSize=0
Related-To: NEO-6989
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Improves performance in workloads that create small opencl buffers.
To enable, set env var ExperimentalSmallBufferPoolAllocator=1
Known issues (will be addressed in further commits):
- cannot create subBuffer from such buffer
- pool buffer allocation should be reused
Related-To: NEO-7332
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
- Added Support for reading the Device LUID of the given device used in
Windows WDDM given EnableL0ReadLUIDExtension=1.
- Added inital support for passing back the NodeMask of 1.
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
This change is intended to be used in immediate command lists that are
using flush task functionality.
With this change all immediate command list using the same csr will consume
shared allocations for dsh and ssh heaps. This will decrease number of SBA
commands dispatched when multiple command lists coexists and dispatch kernels.
With this change new SBA command should be dispatched only when current heap
allocation is exhausted.
Functionality is currently disabled and available under debug key.
Functionality will be enabled by default for all immediate command lists
with flush task functionality enabled.
Related-To: NEO-7142
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Improve performance by binding the command buffer together with other
allocations if VM_BIND feature is available. Remove the legacy
flag PassBoundBOToExec from DebugManager to simplify the logic.
Adapt unit tests and reuse handy macros to generate proxy mock-methods.
Related-To: NEO-7348
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
use cpu copy with locked pointer if possible
because this is faster than copy on gpu
limit to buffers of size at most 64kb
Related-To: NEO-7332
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Related-To: NEO-7237
Enable copy on cpu by default.
This commit also changes barrierCounter to bool
barrierCalled
Signed-off-by: Szymon Morek <szymon.morek@intel.com>
This change reflects exact nature of debug variable and what is code
actually doing
Related-To: NEO-7187
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
- enable tile attach mode by default
- both root device and subdevice may be attached to
Related-To: NEO-7347
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Related-To: NEO-7237
If size is small enough, it is more efficient to
perform copy through locked ptr on CPU.
This change also introduces experimental flag to
enable this.
Signed-off-by: Szymon Morek <szymon.morek@intel.com>