Currently the whole code resides within the opencl/ tree, but the
mechanism is meant to be reused in L0 for kernel-ISA allocations
optimization (further work).
This commit is a preparation step, which extracts the generic mechanism
and moves the extracted part under the shared/ tree.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
- Do not wait for GPU completion on pool exhaust if allocs are in use,
allocate new pool instead
- Free small buffer address range if allocs are not in use and
buffer pool is exhausted
Resolves: NEO-7769, NEO-7836
Signed-off-by: Igor Venevtsev <igor.venevtsev@intel.com>
- Do not wait for GPU completion on pool exhaust if allocs are in use,
allocate new pool instead
- Reuse existing pool if allocs are not in use
Related-To: NEO-7769
Signed-off-by: Igor Venevtsev <igor.venevtsev@intel.com>
When a dummy kernel "kernel void_(){}" is passed in sources - specific
for workloads with ngen backend - enforce fallback to CTNI for the whole
application context (mark the context as non-zebinary).
Related-To: NEO-7772
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
Increase chunk alignment from 256 to 512.
Restores performance in some workloads with pool enabled but lowers maximum
possible number of buffers in pool from 256 to 128.
MemObj size will keep the value passed to clCreateBuffer ie. will not be
aligned up by chunk alignment.
CL_MEM_SIZE will now return same value as with pool disabled.
Related-To: NEO-7332
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
With this commit events created on multi root device contexts will
synchronize using signaled TagNodes instead of using taskCounts.
Signed-off-by: Maciej Plewka <maciej.plewka@intel.com>
Related-To: NEO-7105
check if flags allow buffer from pool
add buffer offset to aubtests
disable pool buffer where required
Related-To: NEO-7332
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Allow to: disable performance hints, make allocation lockable
Used in BufferPoolAllocator
Related-To: NEO-7332
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Allocations of buffers <= 64KB will be lockable, to
allow copying through locked pointer.
Related-To: NEO-7332
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Improves performance in workloads that create small opencl buffers.
To enable, set env var ExperimentalSmallBufferPoolAllocator=1
Known issues (will be addressed in further commits):
- cannot create subBuffer from such buffer
- pool buffer allocation should be reused
Related-To: NEO-7332
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Stack vector will not cause dynamic allocations in most circumstances
ie. number of root device indices not more than 16
Related-To: NEO-6837
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Dates corrected in copyright headers to reflect original publication date
(2018 for OpenCL, 2020 for Level Zero).
Signed-off-by: lgotszal <lukasz.gotszald@intel.com>
return if context has multiple sub devices related to a given root device
Related-To: NEO-3691
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
store reference to std of root device indices and device bitfields
store NEO::Device in USM properties
Related-To: NEO-3691
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
create program for all provided devices
move OCL specific code from shared to opencl
Related-To: NEO-5001
Change-Id: Ic352b4e907ae75426634ae4b3c7048edecaf83e7
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>