So far, a separate page has been allocated for each kernel's ISA within
`KernelImmutableData::initialize()`. The ISA blocks are often much
smaller than a 64 KB page, which leads to poor memory utilization and
has even been observed to cause a device OOM error when a single module
has several kernels.
Improve the situation by reusing the parent allocation (owned by the
module instance) for modules whose kernel ISAs fit together within a
single 64 KB page. This improves memory utilization at the
single-module level.
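The packing decision described above can be sketched as follows. This is a minimal illustration, not the driver's code: `packIsaIntoSinglePage` and the 256-byte `isaAlignment` are hypothetical names and values chosen for the example; only the 64 KB page size comes from the message.

```cpp
#include <cstddef>
#include <vector>

// 64 KB page granularity, as stated in the commit message.
constexpr std::size_t isaPageSize = 64 * 1024;
// Assumption: hypothetical alignment of each kernel's ISA inside the shared page.
constexpr std::size_t isaAlignment = 256;

constexpr std::size_t alignUp(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Hypothetical helper: computes per-kernel offsets into one shared page.
// Returns true if all ISAs fit together, false if the module should fall
// back to one page per kernel. `offsets` is left empty on failure.
bool packIsaIntoSinglePage(const std::vector<std::size_t> &isaSizes,
                           std::vector<std::size_t> &offsets) {
    offsets.clear();
    std::size_t cursor = 0;
    for (std::size_t size : isaSizes) {
        cursor = alignUp(cursor, isaAlignment);
        if (size > isaPageSize - cursor) {
            offsets.clear();
            return false;
        }
        offsets.push_back(cursor);
        cursor += size;
    }
    return true;
}
```

With ISA sizes of 1000, 2000 and 300 bytes, the three kernels land at offsets 0, 1024 and 3072 of the same page, whereas two 40000-byte ISAs do not fit and keep their own pages.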
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
Enabling on pvc after patch in igc.
Enabling only for JIT kernels because AOT could have been compiled with
IGC older than required.
Related-To: NEO-7712
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
When a PRINT_DEBUG_MESSAGE message in the module is applicable to the
user, it is now also made available through
Driver::zeDriverGetLastErrorDescription.
ULTs are also added to verify that setErrorDescription successfully
stores the error message.
Related-To: LOCI-4653
Signed-off-by: Zhang, Winston <winston.zhang@intel.com>
- Debug key DumpZEBin should dump zebin elf for modules created from
SPIRV format
Related-To: NEO-7895
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Related-To: LOCI-4578
- Report all symbols in the Symbols Map for a Module as the Exported
symbols instead of using the External Functions Program Info.
- Resolves the issue of reporting symbols for platforms that don't have
ZEBIN binaries by default.
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
Related-To: NEO-6075
Ngen binaries contain stateful information; however, it is not used
in the ISA on PVC. Therefore, we can just ignore it.
- set flag ZebinIgnoreIcbeVersion to true by default
- for zebin, the icbe version check is guarded by this flag
- the icbe version check is mandatory only when patchtokens are used
Resolves: NEO-7904
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
Add "DumpZEBin" debug flag. When this flag is enabled, Zebin will be
dumped to a .elf file (with an appropriate suffix if such a file has
been dumped before).
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
Related-To: NEO-7895
Changed -cl-intel-allow-zebin to -cl-intel-enable-zebin only for
API options.
Related-To: NEO-7801
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
It is possible that a module has so many kernels that the 4GB limit of
GPU VA is depleted when each kernel allocates a 64 KB page for its own
ISA. In such a case, propagate ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY to
the API caller to indicate the actual problem.
Currently this scenario is not detected: execution advances a bit
further and the subsequent crashes make it hard for the user to
understand what happened.
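For scale, 4 GB / 64 KB = 65536 pages, so the limit is hit once a module needs more single-page kernels than that. A minimal sketch of the propagation idea follows; `Result`, `FakeVaSpace` and `initializeKernels` are hypothetical stand-ins for illustration (the real API returns `ze_result_t`), and the allocator is a toy counter rather than the driver's memory manager.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for ze_result_t; only the two outcomes needed here.
enum class Result { success, outOfDeviceMemory };

constexpr std::uint64_t gpuVaLimit = 4ull << 30;     // 4 GB, from the message
constexpr std::uint64_t kernelPageSize = 64ull << 10; // 64 KB, from the message
static_assert(gpuVaLimit / kernelPageSize == 65536, "4 GB holds 65536 pages of 64 KB");

// Toy address space that only counts how many bytes are in use.
struct FakeVaSpace {
    std::uint64_t used = 0;
    bool allocatePage() {
        if (used + kernelPageSize > gpuVaLimit) {
            return false; // allocation failed, nothing changed
        }
        used += kernelPageSize;
        return true;
    }
};

// Hypothetical module initialization: stop at the first failed allocation
// and report it instead of continuing with an unusable kernel.
Result initializeKernels(FakeVaSpace &va, std::size_t kernelCount) {
    for (std::size_t i = 0; i < kernelCount; ++i) {
        if (!va.allocatePage()) {
            return Result::outOfDeviceMemory;
        }
    }
    return Result::success;
}
```

Returning at the first failure is the point of the change: the caller sees the out-of-memory condition directly rather than a later, unrelated-looking crash.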
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
If gtpin is used, then don't check addressing mode
of the last explicit arg, which is
gtpin's surface.
Related-To: NEO-6075
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
This patch adds FP64 emulation support for ATS-M.
Introduce a new environment variable, NEO_FP64_EMULATION, which lets the
user opt in to FP64 emulation.
When emulation is enabled, we pass -cl-fp64-gen-emu (ocl) /
-ze-fp64-gen-emu (L0) as an internal option to IGC.
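The option selection can be illustrated roughly as below. This is a sketch under assumptions: the helper names are hypothetical, and treating exactly "1" as "enabled" is a guess about how the variable is interpreted. Only the variable name and the two option strings come from the message.

```cpp
#include <string>

enum class Api { openCl, levelZero };

// Assumption: the value "1" enables emulation; anything else (or an unset
// variable, passed here as nullptr) leaves it disabled.
bool isFp64EmulationRequested(const char *envValue) {
    return envValue != nullptr && std::string(envValue) == "1";
}

// Hypothetical helper: appends the API-specific emulation switch to the
// internal options passed to IGC when emulation is enabled.
std::string appendFp64EmulationOption(std::string internalOptions, Api api, bool enabled) {
    if (!enabled) {
        return internalOptions;
    }
    if (!internalOptions.empty()) {
        internalOptions += ' ';
    }
    internalOptions += (api == Api::openCl) ? "-cl-fp64-gen-emu" : "-ze-fp64-gen-emu";
    return internalOptions;
}
```

A caller would typically feed it `std::getenv("NEO_FP64_EMULATION")` (from `<cstdlib>`) and the options string built so far.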
Related-To: NEO-7611
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
* Moved zebin related files to zebin directory.
* Moved zebin related code to Zebin namespace.
* Separated zeInfo from zebin elf.
* Separated zeInfo decoding from zebin decoder.
* Refactored populateKernelPayloadArgument function.
Signed-off-by: Krystian Chmielewski <krystian.chmielewski@intel.com>
Resubmission of 871a3bd11d
Reverted by 9882e992ac due to Elmo
regression (most likely not related to the change anyway).
Fixup for 2778043d67
Related-To: NEO-7684, HSD-18027378546
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
This commit adds support for handling local symbols.
* Added 2 fields to SymbolInfo - binding, and associated
instructions segment id.
* Simplified code for decoding elf symbols and relocations.
* Simplified code for patching instruction segments.
* Changed logic of decodeElfSymbolTableAndRelocations:
* Add every global symbol to symbol map.
* Add any local symbol used by relocation to symbol map.
* Changed logic of link:
* After performing relocations remove local symbols from map.
* Replaced UNRECOVERABLE_IF with returning error.
* Removed LocalSymbolInfo structure used before for local kernel jumps.
* Removed old tests.
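The symbol-map rules listed above can be sketched as follows. The types are deliberately simplified and hypothetical (the real SymbolInfo carries more fields, and real relocations are patched into instruction segments); the sketch only shows which symbols enter the map, when locals leave it, and how an error is returned instead of aborting.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

enum class Binding { local, global };

struct Symbol {
    std::string name;
    Binding binding;
};

struct Relocation {
    std::size_t symbolIndex; // index into the ELF symbol table
};

using SymbolMap = std::unordered_map<std::string, Binding>;

// Decode step: every global symbol is added; a local symbol is added only
// if some relocation refers to it. A bad index yields an error (false)
// rather than an unrecoverable abort.
bool buildSymbolMap(const std::vector<Symbol> &symbols,
                    const std::vector<Relocation> &relocations,
                    SymbolMap &map) {
    for (const auto &symbol : symbols) {
        if (symbol.binding == Binding::global) {
            map.emplace(symbol.name, symbol.binding);
        }
    }
    for (const auto &relocation : relocations) {
        if (relocation.symbolIndex >= symbols.size()) {
            return false;
        }
        const auto &symbol = symbols[relocation.symbolIndex];
        if (symbol.binding == Binding::local) {
            map.emplace(symbol.name, symbol.binding);
        }
    }
    return true;
}

// Link step epilogue: once relocations are applied, locals are dropped so
// that only global symbols remain visible.
void dropLocalSymbols(SymbolMap &map) {
    for (auto it = map.begin(); it != map.end();) {
        if (it->second == Binding::local) {
            it = map.erase(it);
        } else {
            ++it;
        }
    }
}
```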
Signed-off-by: Krystian Chmielewski <krystian.chmielewski@intel.com>
This reverts commit 871a3bd11d.
This is due to an Elmo regression.
Related-To: NEO-7684, HSD-18027378546
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
This commit adds support for parsing SHT_NOBITS zebin's ELF sections
(containing global/constant zero-initialized data).
- Correction: in CTNI path, do not add related symbol if surface has not
been allocated.
Related-To: NEO-7196
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
Sizing context (PVC):
When using LargeGRF (a.k.a. GRF256) there are only 4 HW threads per EU
(instead of the default 8). Together with SIMD16, this means there can
be at most 64 work-items per EU. With 8 EUs per subslice this gives 512
work-items on a single subslice. For correct intra-WG synchronization,
all WIs of a work-group must be executed on the same subslice (to access
the same SLM, where the synchronization primitives are stored). Thus,
with SIMD16 and LargeGRF the work-group size must not exceed 512 (PVC
example).
So far `maxWorkGroupSize` is taken solely from a DeviceInfo structure
both in `ModuleTranslationUnit::processUnpackedBinary()` and
`ModuleImp::initialize()`. This approach does not take kernel parameters
(LargeGRF) into account. As a result, a kernel using LargeGRF with
SIMD16 can be submitted with the work-group size set to 1024, which
leads to a hang.
Fix the `.maxWorkGroupSize` computation so that it takes the kernel
parameters into consideration.
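The sizing arithmetic can be checked with a small sketch. The helper names are hypothetical and the constants are the PVC figures quoted in this message; the actual computation in the driver may be structured differently.

```cpp
#include <algorithm>
#include <cstdint>

// PVC figures quoted in the commit message.
constexpr std::uint32_t euPerSubslice = 8;
constexpr std::uint32_t defaultThreadsPerEu = 8;
constexpr std::uint32_t largeGrfThreadsPerEu = 4;

// Hypothetical helper: upper bound on work-items that can share one subslice.
constexpr std::uint32_t maxWorkItemsPerSubslice(std::uint32_t simdSize, bool largeGrf) {
    return (largeGrf ? largeGrfThreadsPerEu : defaultThreadsPerEu) * simdSize * euPerSubslice;
}

// Hypothetical helper: the per-kernel limit is the device-wide limit clamped
// to what fits on a single subslice for this kernel's SIMD size and GRF mode.
constexpr std::uint32_t kernelMaxWorkGroupSize(std::uint32_t deviceMaxWorkGroupSize,
                                               std::uint32_t simdSize, bool largeGrf) {
    return std::min(deviceMaxWorkGroupSize, maxWorkItemsPerSubslice(simdSize, largeGrf));
}

static_assert(kernelMaxWorkGroupSize(1024, 16, true) == 512, "SIMD16 + LargeGRF: 4 * 16 * 8");
static_assert(kernelMaxWorkGroupSize(1024, 16, false) == 1024, "SIMD16, default GRF: 8 * 16 * 8");
```

With a device-wide limit of 1024, the SIMD16 + LargeGRF case is clamped to 512, which is the value that avoids the hang described above.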
Add new tests (for discrete platforms >= XeHP), adapt existing ones, and
fix cosmetics along the way.
Similar check for OCL:
https://github.com/intel/compute-runtime/blob/master/opencl/source/command_queue/enqueue_kernel.h#L130
Related-To: NEO-7684
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
- do not trigger incorrect / spurious events from internal modules
for debugger
- do not register Elf for internal modules
Related-To: NEO-7605
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>