Until now, a separate page was allocated for each kernel's ISA within
`KernelImmutableData::initialize()`. The ISA blocks are often much
smaller than a 64k page, which leads to poor memory utilization and was
even observed to cause a device OOM error if a single module has
several kernels.
Improve the situation by reusing the parent allocation (owned by the
module instance) for modules whose kernel ISAs can fit together within
a single 64k page. This improves memory utilization at the
single-module level.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
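The packing idea described above can be sketched as follows. This is a
minimal illustration, not the driver's code: `kPageSize`, `alignUp` and
`packIsaBlocks` are hypothetical names, and the alignment value is an
assumption supplied by the caller.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical constant matching the 64k page mentioned in the message.
constexpr std::size_t kPageSize = 64 * 1024;

// Round `value` up to the next multiple of `alignment` (alignment > 0).
constexpr std::size_t alignUp(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Returns the offset of every ISA block inside one shared page, or
// std::nullopt when the blocks do not fit and the caller would have to
// fall back to one allocation per kernel.
std::optional<std::vector<std::size_t>> packIsaBlocks(
    const std::vector<std::size_t> &isaSizes, std::size_t alignment) {
    std::vector<std::size_t> offsets;
    offsets.reserve(isaSizes.size());
    std::size_t cursor = 0;
    for (std::size_t size : isaSizes) {
        cursor = alignUp(cursor, alignment);
        // Check the size first so the subtraction below cannot underflow.
        if (size > kPageSize || cursor > kPageSize - size) {
            return std::nullopt;
        }
        offsets.push_back(cursor);
        cursor += size;
    }
    return offsets;
}
```

With a 64-byte alignment, blocks of 100, 200 and 300 bytes land at
offsets 0, 128 and 384 of the shared page, while two 40000-byte blocks
do not fit together.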
Related-To: LOCI-4176
- Given a base pointer passed into Get Peer Allocation, the base
pointer is used when mapping the new allocation to virtual memory.
- Enables users to use the same pointer for all devices in peer-to-peer.
- Currently unsupported on reserved memory due to mapped and exec
residency of virtual addresses.
Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
It is possible that a module has so many kernels that the 4GB limit of
GPU VA is depleted when each kernel allocates a 64 KB page for its own
ISA. In such a case, propagate ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY to
the API caller to indicate the actual problem.
Currently such a scenario is not detected: execution advances a bit
further, and the crashes that follow do not let the user easily
understand what happened.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
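A minimal sketch of the propagation described above, under stated
assumptions: `Result`, `IsaVaPool` and `initializeKernels` are
hypothetical stand-ins (the real driver returns `ze_result_t`), and the
4GB limit is read here as 4 GiB, which gives 65536 pages of 64 KiB.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for the Level Zero result type; only the two values used here.
enum class Result { Success, OutOfDeviceMemory };

constexpr std::uint64_t kVaLimit = 4ull << 30; // assumed 4 GiB VA window
constexpr std::uint64_t kPage = 64ull << 10;   // one 64 KiB page per ISA
static_assert(kVaLimit / kPage == 65536, "pages that fit in the window");

// Hypothetical pool that hands out one page of GPU VA at a time.
struct IsaVaPool {
    std::uint64_t used = 0;
    bool allocatePage() {
        if (kVaLimit - used < kPage) {
            return false; // VA range depleted
        }
        used += kPage;
        return true;
    }
};

// Reports the failure to the caller instead of continuing with a
// missing allocation and crashing later.
Result initializeKernels(IsaVaPool &pool, std::size_t kernelCount) {
    for (std::size_t i = 0; i < kernelCount; ++i) {
        if (!pool.allocatePage()) {
            return Result::OutOfDeviceMemory;
        }
    }
    return Result::Success;
}
```

The point of the design is only that the allocation failure is checked
where it happens and returned up the call chain, so the API caller sees
the out-of-memory condition rather than a later crash.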
- apply relevant flags only on platforms supporting these flags
- update command list preemption level when supported
- use actual kernel preemption level to program interface descriptor data
Related-To: NEO-7771
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
- printf output from a kernel is printed on the synchronize() call; if
a hang was detected, the printf buffer was not printed immediately but
only when the Kernel was destroyed
- this change copies the printf buffer with the internal engine
(whenever available) right after hang detection in the
CommandQueue::synchronize() call
Related-To: NEO-6427
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Dates corrected in copyright headers to reflect original publication date
(2018 for OpenCL, 2020 for Level Zero).
Signed-off-by: lgotszal <lukasz.gotszald@intel.com>
Instead of moving the ISAs for all kernels in a module when the module
is created, move the ISA when the kernel is created, to avoid
unnecessary memory transfers.
Related-To: LOCI-2009
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
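The deferred transfer described above can be illustrated with a small
sketch. `ModuleSketch` and its members are hypothetical; the "device"
copy is modelled as a second host container purely to count transfers.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical module model: ISA stays on the host until a kernel is
// actually created, and is transferred at most once per kernel.
struct ModuleSketch {
    std::unordered_map<std::string, std::vector<unsigned char>> hostIsa;
    std::unordered_map<std::string, std::vector<unsigned char>> deviceIsa;
    std::size_t transfers = 0;

    // Returns false when the module has no kernel with this name.
    bool createKernel(const std::string &name) {
        auto source = hostIsa.find(name);
        if (source == hostIsa.end()) {
            return false;
        }
        if (deviceIsa.find(name) == deviceIsa.end()) {
            deviceIsa.emplace(name, source->second); // the deferred copy
            ++transfers;
        }
        return true;
    }
};
```

Kernels that are never created never appear in `deviceIsa`, so no
transfer is spent on them.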
This instead of when the associated module is created, to avoid
allocating memory for kernels that are never created nor used.
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
Add an experimental extension to set the global work offset in L0.
The current L0 specification does not have an interface to export
experimental function symbols, so for now, applications need
to look up the symbol themselves, e.g. with dlsym on Linux.
A blackbox test showing functionality is also added.
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
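A minimal sketch of the dlsym-based lookup mentioned above, for Linux
only. `findSymbol` is a hypothetical helper; the library path and the
name of the experimental function are left to the caller, since neither
is stated in this message.

```cpp
#include <dlfcn.h>

// Looks up `symbolName` in the shared library at `libraryPath`.
// Passing nullptr as the path searches the symbols already loaded into
// the process. Returns nullptr when the library or symbol is missing.
// The handle is intentionally not closed, so the returned pointer
// stays valid for the lifetime of the process.
void *findSymbol(const char *libraryPath, const char *symbolName) {
    void *handle = dlopen(libraryPath, RTLD_NOW);
    if (handle == nullptr) {
        return nullptr;
    }
    return dlsym(handle, symbolName);
}
```

An application would cast the returned pointer to the function type it
expects and check it for nullptr before calling it. On older glibc
versions the program may need to be linked with `-ldl`.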