For an indirect kernel launch, some payload arguments are patched just
before the walker command. This change disables prefetch, performs a
batch buffer start to the next bytes, and then re-enables prefetch. All
these operations are performed between MI_STORE_REGISTER_MEM and
COMPUTE_WALKER.
Related-To: NEO-14584
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
Created EncodePostSync template struct to organize various post-sync
variables/functions from EncodeDispatchKernel
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
If some operations require a ULLS light stop, execute such operations
under a mutex, paired with the ULLS stop, to ensure no other thread
starts ULLS.
Related-To: NEO-14406, NEO-13922
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
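The locking pattern described in the ULLS commit above can be sketched
roughly as follows. This is an illustration only: `DirectSubmissionSketch`,
`stopAndRun` and `start` are hypothetical names, not the driver's actual
interfaces. The stop and the dependent operation share one critical
section, so a concurrent start has to wait until the operation is done.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Hypothetical stand-in for a direct submission (ULLS) controller.
struct DirectSubmissionSketch {
    std::mutex mtx;      // serializes start and stop
    bool running = true; // whether ULLS is currently active

    // Start ULLS; takes the same mutex as the stop path.
    void start() {
        std::lock_guard<std::mutex> lock(mtx);
        running = true;
    }

    // Stop ULLS and run `operation` while still holding the mutex, so no
    // other thread can restart ULLS before the operation has finished.
    void stopAndRun(const std::function<void()> &operation) {
        std::lock_guard<std::mutex> lock(mtx);
        running = false;
        operation();
    }
};
```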
When trimming old allocations in USM reuse, start from the largest
allocations.
This reduces memory usage more quickly once the max hold time is hit.
Related-To: NEO-6893, NEO-14429
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
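A minimal sketch of largest-first trimming, under stated assumptions:
`CachedAllocation`, `trimExpiredLargestFirst` and the per-pass limit
`maxToFree` are hypothetical and only illustrate the idea. The real reuse
cache may be organized differently.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical record of an allocation held for reuse.
struct CachedAllocation {
    std::size_t size;
    Clock::time_point savedAt;
};

// Removes up to `maxToFree` entries that exceeded `maxHoldTime`, visiting
// the largest allocations first. Returns the number of bytes released.
std::size_t trimExpiredLargestFirst(std::vector<CachedAllocation> &cache,
                                    Clock::time_point now,
                                    Clock::duration maxHoldTime,
                                    std::size_t maxToFree) {
    std::sort(cache.begin(), cache.end(),
              [](const CachedAllocation &a, const CachedAllocation &b) {
                  return a.size > b.size; // descending by size
              });
    std::size_t freedBytes = 0;
    std::size_t freedCount = 0;
    for (auto it = cache.begin(); it != cache.end() && freedCount < maxToFree;) {
        if (now - it->savedAt >= maxHoldTime) {
            freedBytes += it->size;
            it = cache.erase(it);
            ++freedCount;
        } else {
            ++it;
        }
    }
    return freedBytes;
}
```

With a limited number of frees per pass, visiting the largest expired
entries first releases the most memory per pass.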
Real allocation size should be used to properly apply limits and allow
more usm reuse hits.
Related-To: NEO-6893
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Related-To: NEO-14538
It is valid to copy a 2D region of a 3D image.
The current mip map checks do not account for that.
This change correctly checks mip-mapped 3D images.
Signed-off-by: Szymon Morek <szymon.morek@intel.com>
Save svmData on putting into reuse, instead of searching each time.
Change UNRECOVERABLE_IF to DEBUG_BREAK_IF.
Related-To: NEO-6893
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Related-To: NEO-14538
If the user passes a slice pitch that is larger than the region
to copy, do not overwrite memory that lies beyond the region but
within that slice pitch.
Signed-off-by: Szymon Morek <szymon.morek@intel.com>
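The slice pitch behavior described above can be illustrated with a small
host-side sketch. `copyRegionSketch` and its parameters are hypothetical,
and it assumes source and destination use the same pitches. Only the bytes
of the region are written; padding inside the row pitch or slice pitch is
left untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies `slices` x `rows` rows of `rowBytes` bytes each. Source and
// destination share the same (hypothetical) row and slice pitches.
// Padding bytes between rows and between slices are never written.
void copyRegionSketch(const unsigned char *src, unsigned char *dst,
                      std::size_t rowBytes, std::size_t rows, std::size_t slices,
                      std::size_t rowPitch, std::size_t slicePitch) {
    assert(rowPitch >= rowBytes);
    assert(slicePitch >= rowPitch * rows);
    for (std::size_t z = 0; z < slices; ++z) {
        for (std::size_t y = 0; y < rows; ++y) {
            const std::size_t offset = z * slicePitch + y * rowPitch;
            std::memcpy(dst + offset, src + offset, rowBytes);
        }
    }
}
```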
Pass hwInfo to isHeaplessModeEnabled and isForceBindlessRequired functions.
Related-To: NEO-14526
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
The patch applies to Level Zero.
Only allocations < 2MB will be fetched from the pool.
Allocations are shared and reused within a given device.
Additionally, I added a new debug flag to control the allocator:
EnableTimestampPoolAllocator
Related-To: NEO-12287
Signed-off-by: Fabian Zwoliński <fabian.zwolinski@intel.com>
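As a rough illustration of the size rule in the pool allocator commit
above, the sketch below uses a simple bump allocator.
`TimestampPoolSketch`, `tryAllocate` and the `enabled` member (standing in
for the EnableTimestampPoolAllocator debug flag) are hypothetical; the
actual allocator, its pool size and its reuse policy are not shown here.

```cpp
#include <cassert>
#include <cstddef>

// Requests of this size or larger bypass the pool (2MB rule from the commit).
constexpr std::size_t maxPooledSize = 2u * 1024u * 1024u;

// Hypothetical per-device pool that hands out offsets into one buffer.
struct TimestampPoolSketch {
    std::size_t capacity; // total pool bytes, chosen by the caller
    std::size_t used;     // bytes handed out so far
    bool enabled;         // models the debug flag

    explicit TimestampPoolSketch(std::size_t capacityIn)
        : capacity(capacityIn), used(0), enabled(true) {}

    // Returns true and sets `offsetOut` when the request is served from
    // the pool; returns false when the caller should allocate separately.
    bool tryAllocate(std::size_t size, std::size_t &offsetOut) {
        if (!enabled || size == 0 || size >= maxPooledSize ||
            size > capacity - used) {
            return false;
        }
        offsetOut = used;
        used += size;
        return true;
    }
};
```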
Related-To: NEO-14056
There is no need to explicitly wait on the Windows KMD during make
resident, as it has a while loop that does so anyway. The KMD wait affects
the API overhead of zeCommandQueueExecuteCommandLists on some platforms
(MTL and ARL).
Signed-off-by: Chandio, Bibrak Qamar <bibrak.qamar.chandio@intel.com>
The lock on csr must be taken before the lock on the residency controller
to prevent incorrect lock order. Csr may be locked in waitOnCpu called
from trimToBudget, which may lead to deadlocks.
Signed-off-by: Maciej Plewka <maciej.plewka@intel.com>
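The lock ordering described above can be sketched as follows. The types
and function names are hypothetical stand-ins, and the sketch assumes the
csr lock is recursive so that a nested wait may take it again. The point
is only the order: csr first, residency controller second, on every path.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for the two lock owners.
struct CsrSketch {
    std::recursive_mutex ownershipMutex; // assumption: recursive lock
};

struct ResidencyControllerSketch {
    std::mutex mtx;
    int trimCount = 0;
};

// Models a wait that needs the csr lock.
void waitOnCpuSketch(CsrSketch &csr) {
    std::lock_guard<std::recursive_mutex> lock(csr.ownershipMutex);
}

// Takes the csr lock first, then the residency controller lock. Because
// the csr lock is already held, the nested wait cannot block on a thread
// that holds the csr lock and is waiting for the residency controller.
void trimToBudgetSketch(CsrSketch &csr, ResidencyControllerSketch &rc) {
    std::unique_lock<std::recursive_mutex> csrLock(csr.ownershipMutex);
    std::lock_guard<std::mutex> rcLock(rc.mtx);
    waitOnCpuSketch(csr);
    ++rc.trimCount;
}
```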
Instead of printf, use a macro that flushes after printf.
Related-To: HSD-14024170600
Signed-off-by: Katarzyna Cencelewska <katarzyna.cencelewska@intel.com>
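A macro of the kind mentioned above might look like the sketch below.
`PRINT_AND_FLUSH` and `reportStepSketch` are hypothetical names used for
illustration; the macro actually used in the change is not named in the
commit message.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical macro: prints like printf, then flushes stdout so the
// output is visible immediately even if the process stops soon after.
#define PRINT_AND_FLUSH(...)      \
    do {                          \
        std::printf(__VA_ARGS__); \
        std::fflush(stdout);      \
    } while (0)

// Example call site.
void reportStepSketch(int step) {
    PRINT_AND_FLUSH("step %d done\n", step);
}
```

The `do { ... } while (0)` wrapper keeps the macro safe to use as a single
statement, for example in an unbraced `if`.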