- change additional size into local region size
- change walk order into dispatch walk order to distinguish for local id walk
Related-To: NEO-13350
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Update to use igc indirect detection v3. Fix for not detecting indirects
passed as implicit arguments.
Related-To: NEO-11396
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
- samplers using bindless adressing require patching bindless offsets to
sampler states on kernel's cross thread data
Related-To: NEO-10505
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
- Separate logic for fp16/32/46 caps.
- Add aggregated constexprs for local & global caps of given type
- Pass arguments by reference
- Add hwInfo as argument for future refactors
- Add static_asserts in L0 to ensure there is no mismatch between
internal/external caps
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
rename private scratch space into scratch space slot 1 as it can be generic
Related-To: NEO-9944
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
rename private scratch space into scratch space slot 1 as it can be generic
Related-To: NEO-9944
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
So far, there is a separate page allocated for each kernel's ISA within
`KernelImmutableData::initialize()`. Apparently the ISA blocks are often
much smaller than a 64k page, which leads to poor memory utilization and
was even observed to cause the device OOM error if a single module has
several keys.
Improve the situation by reusing the parent allocation (owned by the
module instance) for modules, which kernel ISAs can fit together within
a single 64k page. This improves the memory utilization on a single
module level.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
So far, there is a separate page allocated for each kernel's ISA within
`KernelImmutableData::initialize()`. Apparently the ISA blocks are often
much smaller than a 64k page, which leads to poor memory utilization and
was even observed to cause the device OOM error if a single module has
several keys.
Improve the situation by reusing the parent allocation (owned by the
module instance) for modules, which kernel ISAs can fit together within
a single 64k page. This improves the memory utilization on a single
module level.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
- also use helper when checking that is simd1 to have same flow
Related-To: NEO-8087
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
- use calculateNumThreadsPerThreadGroup instead of getThreadsPerWG to
have same flow and proper values of threads per work groups
Related-To: NEO-8087
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
- use calculateNumThreadsPerThreadGroup instead of getThreadsPerWG to
have same flow and proper values of threads per work groups
Related-To: NEO-8087
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
- allow bindless kernels to execute
- bindless addressing kernels are using private heaps mode
- do not differentiate bindful and bindless surface state base addresses
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>