Related-To: NEO-9116
- To allow IPC handles to be shared between contexts, IPC handle
tracking is moved to the driver handle, where it is tracked globally.
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
- the test uses an image in the kernel
- the test allocates and releases many images to trigger the SurfaceState
reuse logic, which exercises reuse of SurfaceState slots
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Added a getDefaultDeviceHierarchy call that describes the default device
hierarchy for a gfx core. Refactored the L0 and OCL paths to use this
value by default and to override it when the user sets the
ZE_FLAT_DEVICE_HIERARCHY environment variable or the
ReturnSubDevicesAsApiDevices debug key.
Updated ReturnSubDevicesAsApiDevices to force COMPOSITE device hierarchy
when set to 0.
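
The selection order described above could look roughly like the sketch
below. It is only an illustration: `DeviceHierarchyMode`,
`resolveDeviceHierarchy`, the use of -1 for an unset debug key, the mapping
of debug key value 1 to FLAT, and the debug key taking precedence over the
environment variable are all assumptions, not the driver's actual code.

```cpp
#include <cassert>
#include <cstring>

// Illustrative enum; the names are assumptions, not the driver's own types.
enum class DeviceHierarchyMode { Composite, Flat, Combined };

// Hypothetical helper: start from the per-core default, then apply overrides.
// envValue: value of ZE_FLAT_DEVICE_HIERARCHY, or nullptr when it is not set.
// debugKey: value of ReturnSubDevicesAsApiDevices; -1 is assumed to mean "not set".
inline DeviceHierarchyMode resolveDeviceHierarchy(DeviceHierarchyMode coreDefault,
                                                  const char *envValue,
                                                  int debugKey) {
    DeviceHierarchyMode mode = coreDefault;

    if (envValue != nullptr) {
        if (std::strcmp(envValue, "COMPOSITE") == 0) {
            mode = DeviceHierarchyMode::Composite;
        } else if (std::strcmp(envValue, "FLAT") == 0) {
            mode = DeviceHierarchyMode::Flat;
        } else if (std::strcmp(envValue, "COMBINED") == 0) {
            mode = DeviceHierarchyMode::Combined;
        }
        // Unrecognized strings keep the per-core default in this sketch.
    }

    if (debugKey == 0) {
        mode = DeviceHierarchyMode::Composite; // stated in the message above
    } else if (debugKey == 1) {
        mode = DeviceHierarchyMode::Flat; // assumption, not stated in the message
    }
    return mode;
}
```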
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
If several kernel heaps share the same page, use a temporary buffer to
collect all of them and transfer them to memory in one shot.
Previously a separate transfer was performed for each kernel and, as
observed, these transfers did not always take effect immediately.
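
A minimal sketch of the staging idea, under stated assumptions:
`KernelHeapChunk` and `stageKernelHeaps` are hypothetical names, and the
actual transfer call is left to the caller. It only shows how several
heaps that live in one page can be gathered so that a single transfer
suffices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one kernel heap placed inside a shared page.
struct KernelHeapChunk {
    size_t offsetInPage;        // where the heap starts inside the page
    std::vector<uint8_t> bytes; // heap contents
};

// Gathers all chunks into one page-sized staging buffer. The caller would
// then issue a single transfer of the returned buffer instead of one
// transfer per kernel. Chunks are assumed not to overlap.
inline std::vector<uint8_t> stageKernelHeaps(const std::vector<KernelHeapChunk> &chunks,
                                             size_t pageSize) {
    std::vector<uint8_t> staging(pageSize, 0);
    for (const auto &chunk : chunks) {
        // Skip chunks that would not fit in the page (sketch-level handling).
        if (chunk.offsetInPage > pageSize ||
            chunk.bytes.size() > pageSize - chunk.offsetInPage) {
            continue;
        }
        std::copy(chunk.bytes.begin(), chunk.bytes.end(),
                  staging.begin() + static_cast<std::ptrdiff_t>(chunk.offsetInPage));
    }
    return staging;
}
```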
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
Related-To: NEO-9012
- Allows the memory size requested by the user to be up to the
physical memory size if that is set; otherwise the limit is the global
memory size.
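
The limit selection can be sketched as below. `isRequestedSizeAllowed` is
a hypothetical helper, a physical memory size of 0 is assumed to mean
"not set", and treating the limit as inclusive is also an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check of a user-requested size against the applicable limit.
// physicalMemSize == 0 models "physical memory size not set" (assumption).
inline bool isRequestedSizeAllowed(uint64_t requestedSize,
                                   uint64_t physicalMemSize,
                                   uint64_t globalMemSize) {
    const uint64_t limit = (physicalMemSize != 0) ? physicalMemSize : globalMemSize;
    return requestedSize <= limit;
}
```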
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
- program debugSurface's SurfaceState at the beginning of Bindless Surface
State Heap - SPECIAL_SSH
- ensure SPECIAL_SSH is resident
Related-To: NEO-7063
Signed-off-by: Hoppe, Mateusz <mateusz.hoppe@intel.com>
- When calculating the number of threads per workgroup for SIMD 1, return
the local work size (each software thread should be mapped to a whole
hardware thread).
- Correct the logic for calculating per-thread data space for SIMD 1.
- Minor: unit tests refactor.
- Corrected naming.
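
The SIMD 1 rule from the first item can be illustrated as follows.
`threadsPerWorkGroup` is a hypothetical helper, and the round-up division
used for the other SIMD widths is the conventional formula assumed here,
not code taken from the driver.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of hardware threads needed for one workgroup.
inline uint32_t threadsPerWorkGroup(uint32_t simd, uint32_t localWorkSize) {
    assert(simd != 0);
    if (simd == 1) {
        // Each software thread occupies a whole hardware thread.
        return localWorkSize;
    }
    // Assumed conventional case: round up to cover a partially filled thread.
    return (localWorkSize + simd - 1) / simd;
}
```

With plain round-up division SIMD 1 yields the same number; the explicit
branch only makes the rule visible and guards against width-specific
shortcuts that do not cover SIMD 1.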
Related-To: NEO-8261
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
Add support for different timestamp packet counts per gfx family.
Change all packet counts to 1 except for xe-hpc.
Related-To: NEO-8154
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
CmdList can be released before Event. In this case, GfxAllocation
destruction must be deferred.
Related-To: NEO-7966
Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
- this change handles level zero immediate command lists on copy engine
- monitor fence will be dispatched for blocking calls
- asynchronous mode will dispatch monitor fence only on host synchronization
Related-To: NEO-8395
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
In L0 it's not possible to track object relations. For example, a CmdList
may be removed before an Event.
In such a case, the Event needs to safely skip the unregister call without
accessing the CmdList/CmdQueue object.
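
One way to express "skip the unregister call if the owner is gone" is
sketched below with `std::weak_ptr`. This is a swapped-in technique chosen
for illustration; the `EventRegistry`, `CmdList`, and `Event` types here
are hypothetical stand-ins and do not reflect how the driver implements it.

```cpp
#include <cassert>
#include <memory>
#include <unordered_set>

struct Event;

// Hypothetical registry of events, owned by a command list.
struct EventRegistry {
    std::unordered_set<const Event *> events;
};

struct CmdList {
    std::shared_ptr<EventRegistry> registry = std::make_shared<EventRegistry>();
};

struct Event {
    // Non-owning link: it expires when the command list is destroyed.
    std::weak_ptr<EventRegistry> registry;

    void registerIn(CmdList &cmdList) {
        assert(cmdList.registry != nullptr);
        registry = cmdList.registry;
        cmdList.registry->events.insert(this);
    }

    // Returns true if the event was unregistered, false if the command list
    // no longer exists and the call was skipped without touching it.
    bool unregister() {
        if (auto alive = registry.lock()) {
            alive->events.erase(this);
            registry.reset();
            return true;
        }
        return false;
    }
};
```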
Related-To: NEO-8884
Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
Until now, a separate page was allocated for each kernel's ISA within
`KernelImmutableData::initialize()`. The ISA blocks are often much
smaller than a 64k page, which leads to poor memory utilization and
was even observed to cause a device OOM error if a single module has
several kernels.
Improve the situation by reusing the parent allocation (owned by the
module instance) for modules whose kernel ISAs fit together within
a single 64k page. This improves memory utilization at the
single-module level.
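
The fit check and offset assignment could be sketched as below.
`packIsasIntoSinglePage`, `alignUp`, and the per-kernel ISA alignment
parameter are hypothetical; the sketch only shows how kernel ISAs can be
laid out inside one shared 64k page, with a fallback signalled when they
do not fit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr size_t pageSize64k = 64 * 1024;

// Rounds value up to the next multiple of alignment.
inline size_t alignUp(size_t value, size_t alignment) {
    assert(alignment != 0);
    return (value + alignment - 1) / alignment * alignment;
}

// Hypothetical helper: computes per-kernel offsets inside one shared 64k
// page. Returns true and fills offsets when all ISAs fit; returns false
// (with offsets cleared) when the caller should fall back to separate
// allocations.
inline bool packIsasIntoSinglePage(const std::vector<size_t> &isaSizes,
                                   size_t isaAlignment,
                                   std::vector<size_t> &offsets) {
    offsets.clear();
    size_t cursor = 0;
    for (size_t size : isaSizes) {
        cursor = alignUp(cursor, isaAlignment);
        if (size > pageSize64k || cursor > pageSize64k - size) {
            offsets.clear();
            return false;
        }
        offsets.push_back(cursor);
        cursor += size;
    }
    return true;
}
```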
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>