If several kernel heaps are sharing the same page then use a temporary
buffer to collect all of them and transfer to memory in one shot.
Previously there were several transfers performed (one per kernel) and,
observably, they happened not to be immediately effective at times.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
So far, there is a separate page allocated for each kernel's ISA within
`KernelImmutableData::initialize()`. Apparently the ISA blocks are often
much smaller than a 64k page, which leads to poor memory utilization and
was even observed to cause the device OOM error if a single module has
several keys.
Improve the situation by reusing the parent allocation (owned by the
module instance) for modules, which kernel ISAs can fit together within
a single 64k page. This improves the memory utilization on a single
module level.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
So far, there is a separate page allocated for each kernel's ISA within
`KernelImmutableData::initialize()`. Apparently the ISA blocks are often
much smaller than a 64k page, which leads to poor memory utilization and
was even observed to cause the device OOM error if a single module has
several keys.
Improve the situation by reusing the parent allocation (owned by the
module instance) for modules, which kernel ISAs can fit together within
a single 64k page. This improves the memory utilization on a single
module level.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
This patch avoids returning error for system addresses in setArg
Related-To: GSD-3597
Signed-off-by: Joshua Santosh Ranjan <joshua.santosh.ranjan@intel.com>
Enabling on pvc after patch in igc.
Enabling only for JIT kernels because AOT could have been compiled with
IGC older than required.
Related-To: NEO-7712
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
- program surface states for redescribed images correctly. Image copy
to/from memory are using redescribed surface states,
- refactor state base address programming - program address and size
together, set max size at the beginning due to lack of Enable flag
- set GpuBase in WddmAllocation when external heap is used
- return max ssh required size from kernelInfo or based on stateful args
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
- store surface state info for bindless addressing in graphics
allocation
- remove map in BindlessHeapsHelper - bindlessInfo is constant for
the lifetime of an allocation
- program bindless offsets and surface states for images when used in
bindless kernel
- handle ouf of memory on surface state heap - return error
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
- also use helper when checking that is simd1 to have same flow
Related-To: NEO-8087
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
- use calculateNumThreadsPerThreadGroup instead of getThreadsPerWG to
have same flow and proper values of threads per work groups
Related-To: NEO-8087
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
Added setErrorDescription() and getErrorDescription() in DriverHandle
to record and retrieve the custom string for errors.
Related-To: LOCI-4619
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
- use calculateNumThreadsPerThreadGroup instead of getThreadsPerWG to
have same flow and proper values of threads per work groups
Related-To: NEO-8087
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
- remove useless flag ForceNumberOfThreadsInGpgpuThreadGroup
- add new flag "RemoveRestrictionsOnNumberOfThreadsInGpgpuThreadGroup"
to restore old path without restrictions about number of threads in
thread group
- fix forwarding information about hw local ids generations to
calculate numOfThreadsInThreadGroup correctly
Related-To: NEO-7952, NEO-7982
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
- allow bindless kernels to execute
- bindless addressing kernels are using private heaps mode
- do not differentiate bindful and bindless surface state base addresses
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Related-To: LOCI-4381
- Enabled support for customers to use full Virtual reservation range
with multiple physical mappings with additional allocations implicitly
included in residency.
- Buffer Surface state size extended for first allocation to stretch to
the bufferSize requested.
Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
Related-To: LOCI-4176
- Given a Base Pointer passed into Get Peer Allocation, then the base
pointer is used in the map of the new allocation to the virtual memory.
- Enables users to use the same pointer for all devices in Peer To Peer.
- Currently unsupported on reserved memory due to mapped and exec
resiedency of Virtual addresses.
Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
When setting kernel group size with incorrect values, error would not be
returned if method called with same arguments a second time.
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
- 3D btd command should be programed only once per context
- Add conditional pipe control command prior dispatching 3D btd command
- share 3D btd state between immediate and regular command lists
- add pipe control after ray tracing kernel to invalidate state cache
Related-To: NEO-5055
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
It is possible that a module has so many kernels that the 4GB limit of
GPU VA is depleted when each kernel allocates a 64 KB page for its own
ISA. In such case, propagate the ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY to
the API caller to indicate the actual problem.
Currently such scenario is not detected, the execution advances a bit
further and the following crashes do not let the user to easily
understand what happened.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
After resolving NEO-7684 in turns out that `zeKernelGetProperties`
is still returning wrong value for `maxNumSubgroups` since it
did not take into account `LargeGRF & SIMD` limitation.
Related-To: NEO-7829
Signed-off-by: Krzysztof Gibala <krzysztof.gibala@intel.com>