When one process had exported and then opened IPC handle
of memory, then close function was called twice for the
same BO handle. It caused debugBreak() and aborted
an application.
This change allows multiple separate BOs to share one
handle. The last shared handle owner calls close() function.
Related-To: NEO-7200
Signed-off-by: Wrobel, Patryk <patryk.wrobel@intel.com>
-start async thread at device initialization which initializes selected
builtins and exits
-share module across builtins using same binary
Resolves: NEO-7644
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
If on clCreateProgramWithSource() call we detect a nGen dummy kernel usage,
then enforce fallback to the patchtokens format (only for this kernel).
Extend this logic to all platforms (previously - only for TGL/ICL).
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
Allow for use of device binary in ocloc concat. Previously only
AR files could be concatenated.
This feature only works for zebin with AOT Product Config note.
Signed-off-by: Krystian Chmielewski <krystian.chmielewski@intel.com>
This commit adds support for decoding SHT_ZEBIN_GTPIN_INFO type
sections. For each section, passed data will be stored in kernel info
(for corresponding kernel).
Related-To: NEO-7689
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
This commit adds support for parsing SHT_NOBITS zebin's ELF sections
(containing global/constant zero-initialized data).
- Correction: in CTNI path, do not add related symbol if surface has not
been allocated.
Related-To: NEO-7196
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
Use getTransferType, getTransferThreshold in
preferCopyThroughLockedPtr to make the decision clear for which
Transfer Types is CpuMemCopy enabled.
Related-To: NEO-7564
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
This PR gives us the ability to distinguish shared allocation from
device allocation.
Related-To: NEO-7564
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
Related-To: LOCI-3860
- Fixed IPC Event pools that are allocated for a single device such that
when opened thru IPC only that device handle can be used by the process
which opened the IPC event pool.
- IPC Event handle includes numDevices as a field to determine if the
root device index is the only index allowed for this event pool.
Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
-start async thread at device initialization which initializes selected
builtins and exits
-share module across builtins using same binary
Resolves: NEO-7644
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
Sizing context (PVC):
When using LargeGRF (a.k.a GRF256) there are only 4 HW threads per EU
(instead of default 8). Together with SIMD16 that means that there can
be max 64 work-items per EU. With 8 EU per subslice this gives 512
work-items on a single subslice. For correct intra-WG synchronization
all its WIs must be executed on the same subslice (to access the same
SLM, where the synchronization primitives are stored). Thus, with SIMD16
and LargeGRF the work-group size must not exceed 512 (PVC example).
So far `maxWorkGroupSize` is taken solely from a DeviceInfo structure
both in `ModuleTranslationUnit::processUnpackedBinary()` and
`ModuleImp::initialize()`. This method does not take kernel parameters
(LargeGRF) into account. It allows to submit a kernel using LargeGRF
with SIMD16 with the work-group size set to 1024. That leads to a hang.
Fix the `.maxWorkGroupSize` computation so that it takes the kernel
parameters into consideration.
Add new (for discrete platforms >= XeHP) and adapt existing tests, fix
cosmetics by the way.
Similar check for OCL:
https://github.com/intel/compute-runtime/blob/master/opencl/source/comma
nd_queue/enqueue_kernel.h#L130
Related-To: NEO-7684
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
During event synchronize in commandlist, now the printf buffer
should get flushed out when host synchronize is called.
Related-To: LOCI-3681
Signed-off-by: Zhang, Winston <winston.zhang@intel.com>