If on clCreateProgramWithSource() call we detect a nGen dummy kernel usage,
then enforce fallback to the patchtokens format (only for this kernel).
Extend this logic to all platforms (previously - only for TGL/ICL).
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
Allow for use of device binary in ocloc concat. Previously only
AR files could be concatenated.
This feature only works for zebin with AOT Product Config note.
Signed-off-by: Krystian Chmielewski <krystian.chmielewski@intel.com>
This commit adds support for decoding SHT_ZEBIN_GTPIN_INFO type
sections. For each section, passed data will be stored in kernel info
(for corresponding kernel).
Related-To: NEO-7689
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
This commit adds support for parsing SHT_NOBITS zebin's ELF sections
(containing global/constant zero-initialized data).
- Correction: in CTNI path, do not add related symbol if surface has not
been allocated.
Related-To: NEO-7196
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
This PR gives us the ability to distinguish shared allocation from
device allocation.
Related-To: NEO-7564
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
Sizing context (PVC):
When using LargeGRF (a.k.a GRF256) there are only 4 HW threads per EU
(instead of default 8). Together with SIMD16 that means that there can
be max 64 work-items per EU. With 8 EU per subslice this gives 512
work-items on a single subslice. For correct intra-WG synchronization
all its WIs must be executed on the same subslice (to access the same
SLM, where the synchronization primitives are stored). Thus, with SIMD16
and LargeGRF the work-group size must not exceed 512 (PVC example).
So far `maxWorkGroupSize` is taken solely from a DeviceInfo structure
both in `ModuleTranslationUnit::processUnpackedBinary()` and
`ModuleImp::initialize()`. This method does not take kernel parameters
(LargeGRF) into account. It allows to submit a kernel using LargeGRF
with SIMD16 with the work-group size set to 1024. That leads to a hang.
Fix the `.maxWorkGroupSize` computation so that it takes the kernel
parameters into consideration.
Add new (for discrete platforms >= XeHP) and adapt existing tests, fix
cosmetics by the way.
Similar check for OCL:
https://github.com/intel/compute-runtime/blob/master/opencl/source/comma
nd_queue/enqueue_kernel.h#L130
Related-To: NEO-7684
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
- add estimation parameter for interface descriptor data count
- add to the heap estimation alignment parameter for dynamic and surface heaps
- extend encode interface and implementations to allow child heaps
Related-To: NEO-5055
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
This commit switches the device ID logic from the deprecated
to the new one, so that if the user passes a hex value to the -device
parameter, ocloc will use the new implementation in the product config
helper. The change also introduces a fix for setting the values in the
correct order to configure the hwIfno correctly.
Signed-off-by: Daria Hinz daria.hinz@intel.com
Related-To: NEO-7487
Refactor structure and add field to pass USM memory type.
To maintain backwards compatibility with current applications,
pass 0 as type for device allocations, and 1 for host
allocations.
Related-To: LOCI-3771
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
Use getSubDevicesCount() from hwInfo to determine whether device
is root or not, instead of isSubDevice(), since the former does not
change with the affinity mask.
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
This commit is to introduce optimizations in ocloc when building
targets for release and family.
Instead of building fatbinary after all available targets in
the RTL ID table, we introduce optimizations when there is an
acronym available for the platform in the DEVICE table,
we limit to them only.
Signed-off-by: Daria Hinz <daria.hinz@intel.com>
Related-To: NEO-7582