- program debugSurface's SurfaceState at the beginning of Bindless Surface
State Heap - SPECIAL_SSH
- ensure SPECIAL_SSH is resident
Related-To: NEO-7063
Signed-off-by: Hoppe, Mateusz <mateusz.hoppe@intel.com>
- For calculating number of threads per workgroup, for SIMD 1, return
local work size (each software thread should be mapped into a whole hardware
thread).
- Correct logic of calculating space for per thread data for SIMD 1.
- Minor: unit tests refactor.
- Corrected naming.
Related-To: NEO-8261
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
Add support for different timestamp packet counts per gfx family.
Change all packet counts to 1 except for xe-hpc.
Related-To: NEO-8154
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
So far, there is a separate page allocated for each kernel's ISA within
`KernelImmutableData::initialize()`. Apparently the ISA blocks are often
much smaller than a 64k page, which leads to poor memory utilization and
was even observed to cause the device OOM error if a single module has
several keys.
Improve the situation by reusing the parent allocation (owned by the
module instance) for modules, which kernel ISAs can fit together within
a single 64k page. This improves the memory utilization on a single
module level.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
value depends on CCS count:
- single CCS mode (default) - 50% available
- two CCS mode - 25% available
- four CCS mode - 12.5% available
Related-To: NEO-8377
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
- when Surface State is reused for new resource, State Cache needs to be
invalidated
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
So far, there is a separate page allocated for each kernel's ISA within
`KernelImmutableData::initialize()`. Apparently the ISA blocks are often
much smaller than a 64k page, which leads to poor memory utilization and
was even observed to cause the device OOM error if a single module has
several keys.
Improve the situation by reusing the parent allocation (owned by the
module instance) for modules, which kernel ISAs can fit together within
a single 64k page. This improves the memory utilization on a single
module level.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
value depends on CCS count:
- single CCS mode (default) - no limitations
- two CCS mode - 25% available
- four CCS mode - 12.5% available
Related-To: NEO-8377
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
- For calculating number of threads per workgroup, treat simd 1 as it
was simd 32
- Correct logic of calculating space for per thread data for simd 1
- Minor: unit tests refactor
- Corrected naming
Related-To: NEO-8261
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
For all of the devices, preferredPlatformName is initialized with
nullptr by default and platform name will be initialized to driver's default
platform name, at the moment this is "Intel(R) OpenCL Graphics".
When Platform is initialized and preferredPlatformName is not nullptr then
Platform name will be set to the value stored in preferredPlatformName.
Add ENABLE_LEGACY_PLATFORM_NAME AIL enum related to added functionality.
Move PlatformInfo to NEO namespace.
Related-To: HSD-22018809561
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
Products for which zebin has been set as default format in OCLOC:
- ICELAKE_LP
- TIGERLAKE_LP
- ROCKETLAKE
- ALDERLAKE_S
- ALDERLAKE_P
- ALDERLAKE_N
The default format does not override `--format` parameter.
Related-To: NEO-8334
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
- introduce 2 reuse pools to bindlessHeapHelper
- one pool stores slots for reuse, second pool stores released slots
- stateCacheDirty flags keep track of state cache - when pools are
switched - flags are set indicating flushing caches is needed after
old slots have been reused for new allocations
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
- program surface states for redescribed images correctly. Image copy
to/from memory are using redescribed surface states,
- refactor state base address programming - program address and size
together, set max size at the beginning due to lack of Enable flag
- set GpuBase in WddmAllocation when external heap is used
- return max ssh required size from kernelInfo or based on stateful args
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
- allocate SSH in cmdContainer when scratch allocation used with
private heaps
- scratch SurfaceStates are addressed relative to
SurfaceStateBaseAddress and have to be placed on SSH
- remove not used SCRATCH_SSH heap type from bindelssHeapHelper
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>