compute-runtime/level_zero/core/source/kernel
Maciej Bielski 2778043d67 fix(l0): check for largeGRF when computing maxWorkGroupSize
Sizing context (PVC):
When using LargeGRF (a.k.a. GRF256) there are only 4 HW threads per EU
(instead of the default 8). Together with SIMD16, that means there can
be at most 64 work-items per EU. With 8 EUs per subslice this gives 512
work-items on a single subslice. For correct intra-WG synchronization,
all WIs of a work-group must execute on the same subslice (to access the
same SLM, where the synchronization primitives are stored). Thus, with
SIMD16 and LargeGRF the work-group size must not exceed 512 (PVC example).
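
The arithmetic above can be sketched as follows. This is only an
illustration: the helper names are hypothetical and are not part of
compute-runtime, and the thread counts are the PVC numbers quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (illustration only): HW threads available per EU,
// using the PVC figures from the text (4 with LargeGRF, 8 otherwise).
inline std::uint32_t threadsPerEuFor(bool largeGrf) {
    return largeGrf ? 4u : 8u;
}

// Hypothetical helper (illustration only): upper bound on the number of
// work-items that can be resident on a single subslice.
inline std::uint32_t maxWorkItemsPerSubslice(std::uint32_t simdSize,
                                             std::uint32_t threadsPerEu,
                                             std::uint32_t euPerSubslice) {
    assert(simdSize > 0 && threadsPerEu > 0 && euPerSubslice > 0);
    // work-items per EU = SIMD width * HW threads per EU
    return simdSize * threadsPerEu * euPerSubslice;
}
```

With SIMD16 and 8 EUs per subslice,
`maxWorkItemsPerSubslice(16, threadsPerEuFor(true), 8)` evaluates to
16 * 4 * 8 = 512, while the default (non-LargeGRF) case gives 1024.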

So far `maxWorkGroupSize` is taken solely from a DeviceInfo structure
both in `ModuleTranslationUnit::processUnpackedBinary()` and
`ModuleImp::initialize()`. This approach does not take kernel parameters
(LargeGRF) into account, so a kernel using LargeGRF with SIMD16 can be
submitted with the work-group size set to 1024, which leads to a hang.

Fix the `.maxWorkGroupSize` computation so that it takes the kernel
parameters into consideration.
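
A minimal sketch of the idea, not the actual change in `kernel_imp.cpp`:
the device-wide limit is clamped by a per-kernel limit. `KernelParams`,
`effectiveMaxWorkGroupSize` and the default of 8 EUs per subslice are
assumptions made for illustration (PVC numbers from this message).

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical description of the kernel properties that matter here.
struct KernelParams {
    std::uint32_t simdSize; // e.g. 16 or 32
    bool largeGrf;          // true when the kernel uses LargeGRF (GRF256)
};

// Hypothetical helper: combine the device-wide maximum with the
// kernel-specific bound so that a work-group always fits on one subslice.
inline std::uint32_t effectiveMaxWorkGroupSize(std::uint32_t deviceMaxWorkGroupSize,
                                               const KernelParams &kernel,
                                               std::uint32_t euPerSubslice = 8) {
    // Assumption: 4 HW threads per EU with LargeGRF, 8 otherwise (PVC).
    const std::uint32_t threadsPerEu = kernel.largeGrf ? 4u : 8u;
    const std::uint32_t kernelLimit = kernel.simdSize * threadsPerEu * euPerSubslice;
    return std::min(deviceMaxWorkGroupSize, kernelLimit);
}
```

Under these assumptions, a device limit of 1024 is reduced to 512 for a
SIMD16 LargeGRF kernel and left at 1024 for a SIMD16 kernel without
LargeGRF.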

Add new tests (for discrete platforms >= XeHP) and adapt existing ones;
fix cosmetics along the way.

Similar check for OCL:
https://github.com/intel/compute-runtime/blob/master/opencl/source/command_queue/enqueue_kernel.h#L130

Related-To: NEO-7684
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
2023-02-08 11:20:52 +01:00
..
kernel.cpp Copyright header update 2021-05-17 20:38:19 +02:00
kernel.h feature: print printf contents right after gpu hang detection 2023-01-11 08:14:00 +01:00
kernel_ext.cpp Add option for extending kernel 2022-05-16 12:08:41 +02:00
kernel_hw.h Cleanup includes 42 2023-01-25 09:16:39 +01:00
kernel_imp.cpp fix(l0): check for largeGRF when computing maxWorkGroupSize 2023-02-08 11:20:52 +01:00
kernel_imp.h Add state base address properties tracking for command lists 2023-01-31 12:47:17 +01:00
patch_with_implicit_surface.inl Reduce usage of global gfx core helper getter [3/n] 2022-12-13 11:13:11 +01:00
sampler_patch_values.h Use correct enum values for sampler in clamp mode 2022-01-20 18:15:53 +01:00