Make bindless heaps resident right after heap allocation.
Motivation is that SYCL bindless image can be passed as a value argument
or through memory. Therefore, we're not able to make its bindless heap
resident during kernel initialization or setting kernel arguments.
This fixes SYCL bindless image read_write_*D.cpp tests on DG2.
Related-To: NEO-7063
Signed-off-by: Wenju He <wenju.he@intel.com>
- if binding table entries are used in bindless kernel, program Binding
table
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
- check releaseHelper support when selecting bindless mode, if not
disabled, prefer bindless mode in L0 API
- bindless mode can be forced with DebugVariable: UseBindlessMode
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Use debug flag PrintKernelDispatchParameters to print params used in
thread group dispatch size heuristic when encoding kernel dispatch.
Related-To: NEO-6989
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
- SPECIAL_SSH is used for debug surface SurfaceState which must be
located at bindless offset zero
- limit size of external front window
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
So far, there is a separate page allocated for each kernel's ISA within
`KernelImmutableData::initialize()`. Apparently the ISA blocks are often
much smaller than a 64k page, which leads to poor memory utilization and
was even observed to cause the device OOM error if a single module has
several keys.
Improve the situation by reusing the parent allocation (owned by the
module instance) for modules, which kernel ISAs can fit together within
a single 64k page. This improves the memory utilization on a single
module level.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
So far, there is a separate page allocated for each kernel's ISA within
`KernelImmutableData::initialize()`. Apparently the ISA blocks are often
much smaller than a 64k page, which leads to poor memory utilization and
was even observed to cause the device OOM error if a single module has
several keys.
Improve the situation by reusing the parent allocation (owned by the
module instance) for modules, which kernel ISAs can fit together within
a single 64k page. This improves the memory utilization on a single
module level.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
- program global bindless ssba when external allocator used (
UseExternalAllocatorForSshAndDsh)
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
denorm support is controlled by IGC, we should just set zero by default
Related-To: NEO-8059
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
- allow bindless kernels to execute
- bindless addressing kernels are using private heaps mode
- do not differentiate bindful and bindless surface state base addresses
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>