The previous design limited the driver version to 16 bits.
The new design lets the driver version use (almost) the whole 32-bit range.
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
getGraphicsAllocation can return nullptr, in which case accessing
storageInfo causes a segmentation fault.
This change prevents that by adding a nullptr check.
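
The guard can be sketched as below. This is only an illustration: the
`*Sketch` types, the `numOfChunks` field and `chunkCountOrDefault` are
hypothetical stand-ins, not the driver's actual classes.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the driver's types; only the guard matters here.
struct StorageInfoSketch {
    uint32_t numOfChunks = 1;
};

struct GraphicsAllocationSketch {
    StorageInfoSketch storageInfo;
};

// Returns a fallback value instead of dereferencing a null allocation.
inline uint32_t chunkCountOrDefault(const GraphicsAllocationSketch *allocation,
                                    uint32_t fallback) {
    if (allocation == nullptr) {
        return fallback; // nothing to read from, avoid the segfault
    }
    return allocation->storageInfo.numOfChunks;
}
```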
Related-To: LOCI-4032
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
This reverts commit 871a3bd11d.
This is due to an Elmo regression.
Related-To: NEO-7684, HSD-18027378546
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
When ReturnSubDevicesAsApiDevices=1 is used to expose sub-devices as
root devices, the driver should interpret the values passed in the
mask as indices of the physical sub-devices.
For instance, in a dual-card system with multi-tile devices, we would have:
card 0, tile 0
card 0, tile 1
card 1, tile 0
card 1, tile 1
With:
ReturnSubDevicesAsApiDevices=0
ZE_AFFINITY_MASK=0,1
Then all tiles in card 0 and card 1 need to be exposed.
With:
ReturnSubDevicesAsApiDevices=1
ZE_AFFINITY_MASK=0,3
Then card 0 tile 0, and card 1 tile 1 need to be exposed.
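
The flat-index reading of the mask in the second case can be sketched as
below. This is a minimal illustration, not the driver's parser:
`PhysicalTile`, `flatIndexToTile` and `parseFlatMask` are hypothetical
names, and the sketch assumes every card has the same number of tiles
and that the mask holds plain comma-separated indices.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper type: identifies one physical tile.
struct PhysicalTile {
    uint32_t card;
    uint32_t tile;
};

// Assumption: every card exposes the same number of tiles.
inline PhysicalTile flatIndexToTile(uint32_t flatIndex, uint32_t tilesPerCard) {
    return {flatIndex / tilesPerCard, flatIndex % tilesPerCard};
}

// Parses a comma-separated mask such as "0,3" (plain indices only).
inline std::vector<PhysicalTile> parseFlatMask(const std::string &mask,
                                               uint32_t tilesPerCard) {
    std::vector<PhysicalTile> tiles;
    std::stringstream stream(mask);
    std::string entry;
    while (std::getline(stream, entry, ',')) {
        auto flatIndex = static_cast<uint32_t>(std::stoul(entry));
        tiles.push_back(flatIndexToTile(flatIndex, tilesPerCard));
    }
    return tiles;
}
```

With two tiles per card, `parseFlatMask("0,3", 2)` yields card 0 tile 0
and card 1 tile 1, matching the example above.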
Related-To: NEO-7137
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
- stateless mocs is present in all state base address commands
- select GPGPU pipeline is present in all pipeline select commands
Related-To: NEO-5055
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Related-To: LOCI-3871
- Enabled allocation at a specified base address in the targeted heap.
- Enabled virtual memory reservations to grow by allocating from the
start of the heap instead of the end of the heap.
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
When one process exported and then opened an IPC handle
of memory, the close function was called twice for the
same BO handle. This triggered debugBreak() and aborted
the application.
This change allows multiple separate BOs to share one
handle. The last owner of the shared handle calls close().
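
The "last owner closes" idea can be sketched with reference counting.
This is an assumption-laden illustration, not the code from this change:
`SharedHandle` and `BufferObjectSketch` are hypothetical names, and
`std::shared_ptr` stands in for whatever ownership tracking the driver
uses.

```cpp
#include <functional>
#include <memory>
#include <utility>

// Hypothetical wrapper: owns one kernel handle and closes it exactly once.
class SharedHandle {
  public:
    SharedHandle(int handle, std::function<void(int)> closeFn)
        : handle(handle), closeFn(std::move(closeFn)) {}
    ~SharedHandle() { closeFn(handle); } // runs when the last owner goes away
    SharedHandle(const SharedHandle &) = delete;
    SharedHandle &operator=(const SharedHandle &) = delete;
    int get() const { return handle; }

  private:
    int handle;
    std::function<void(int)> closeFn;
};

// Hypothetical buffer object: several of these may reference one handle.
struct BufferObjectSketch {
    std::shared_ptr<SharedHandle> handle;
};
```

Two `BufferObjectSketch` instances created from the same
`std::shared_ptr<SharedHandle>` trigger a single close call, when the
second of them is destroyed.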
Related-To: NEO-7200
Signed-off-by: Wrobel, Patryk <patryk.wrobel@intel.com>
- Start an async thread at device initialization which initializes
selected builtins and exits
- Share a module across builtins using the same binary
Resolves: NEO-7644
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
This commit adds support for parsing SHT_NOBITS sections in zebin ELF
files (containing zero-initialized global/constant data).
- Correction: in the CTNI path, do not add the related symbol if the
surface has not been allocated.
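
For context, an SHT_NOBITS section occupies no bytes in the ELF file, so
a loader has to zero-fill it from `sh_size` instead of copying from
`sh_offset`. A minimal sketch of that handling, assuming a simplified
header struct (`SectionHeaderSketch` and `loadSectionData` are
hypothetical and are not the zebin decoder's API):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t shtNobits = 8; // SHT_NOBITS value from the ELF specification

// Minimal subset of an ELF64 section header; field names follow the ELF spec.
struct SectionHeaderSketch {
    uint32_t sh_type;
    uint64_t sh_offset;
    uint64_t sh_size;
};

// Returns the bytes a section contributes to the in-memory image.
inline std::vector<uint8_t> loadSectionData(const SectionHeaderSketch &header,
                                            const std::vector<uint8_t> &file) {
    if (header.sh_type == shtNobits) {
        // No bytes are stored in the file: produce sh_size zero bytes.
        return std::vector<uint8_t>(static_cast<size_t>(header.sh_size), uint8_t{0});
    }
    if (header.sh_offset > file.size() ||
        header.sh_size > file.size() - header.sh_offset) {
        return {}; // malformed section: range lies outside the file
    }
    auto begin = file.begin() + static_cast<std::ptrdiff_t>(header.sh_offset);
    return std::vector<uint8_t>(begin, begin + static_cast<std::ptrdiff_t>(header.sh_size));
}
```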
Related-To: NEO-7196
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
Use getTransferType and getTransferThreshold in
preferCopyThroughLockedPtr to make it clear for which transfer types
CpuMemCopy is enabled.
Related-To: NEO-7564
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
This PR gives us the ability to distinguish shared allocations from
device allocations.
Related-To: NEO-7564
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
Related-To: LOCI-3860
- Fixed IPC event pools that are allocated for a single device so that,
when opened through IPC, only that device handle can be used by the
process which opened the IPC event pool.
- The IPC event handle includes numDevices as a field to determine if
the root device index is the only index allowed for this event pool.
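
The check implied by the numDevices field can be sketched as below. The
struct layout and the `isDeviceAllowed` helper are hypothetical; only
the rule (a pool created for one device is restricted to that root
device index) comes from this message.

```cpp
#include <cstdint>

// Hypothetical subset of the data carried in the IPC event pool handle.
struct IpcEventPoolDataSketch {
    uint32_t rootDeviceIndex; // device the pool was allocated on
    uint32_t numDevices;      // number of devices the pool was created for
};

// A single-device pool may only be used with its own root device index.
inline bool isDeviceAllowed(const IpcEventPoolDataSketch &data,
                            uint32_t deviceRootIndex) {
    if (data.numDevices == 1) {
        return deviceRootIndex == data.rootDeviceIndex;
    }
    return true; // assumption: multi-device pools are not restricted here
}
```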
Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
- Start an async thread at device initialization which initializes
selected builtins and exits
- Share a module across builtins using the same binary
Resolves: NEO-7644
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
Sizing context (PVC):
When using LargeGRF (a.k.a. GRF256) there are only 4 HW threads per EU
(instead of the default 8). Together with SIMD16 this means that there
can be at most 64 work-items per EU. With 8 EUs per subslice this gives
512 work-items on a single subslice. For correct intra-WG
synchronization all its WIs must be executed on the same subslice (to
access the same SLM, where the synchronization primitives are stored).
Thus, with SIMD16 and LargeGRF the work-group size must not exceed 512
(PVC example).
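
The arithmetic can be written out as a small sketch. The function name
is illustrative and not part of the driver; the inputs are the PVC
values quoted in this message.

```cpp
#include <cstdint>

// Upper bound on work-items that fit on one subslice.
// Illustrative helper; the inputs are the PVC numbers from the text.
constexpr uint32_t maxWorkItemsPerSubslice(uint32_t threadsPerEu,
                                           uint32_t simdSize,
                                           uint32_t euPerSubslice) {
    return threadsPerEu * simdSize * euPerSubslice;
}

// LargeGRF: 4 threads per EU, SIMD16, 8 EUs per subslice.
static_assert(maxWorkItemsPerSubslice(4, 16, 8) == 512, "LargeGRF, SIMD16");
// Default GRF: 8 threads per EU, SIMD16, 8 EUs per subslice.
static_assert(maxWorkItemsPerSubslice(8, 16, 8) == 1024, "default GRF, SIMD16");
```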
So far `maxWorkGroupSize` has been taken solely from the DeviceInfo
structure, both in `ModuleTranslationUnit::processUnpackedBinary()` and
`ModuleImp::initialize()`. This approach does not take kernel parameters
(LargeGRF) into account. It allows submitting a kernel that uses
LargeGRF with SIMD16 and a work-group size of 1024, which leads to a
hang.
Fix the `.maxWorkGroupSize` computation so that it takes the kernel
parameters into consideration.
Add new tests (for discrete platforms >= XeHP), adapt existing ones,
and fix cosmetics along the way.
Similar check for OCL:
https://github.com/intel/compute-runtime/blob/master/opencl/source/command_queue/enqueue_kernel.h#L130
Related-To: NEO-7684
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
During event synchronization in a command list, the printf buffer
is now flushed when host synchronize is called.
Related-To: LOCI-3681
Signed-off-by: Zhang, Winston <winston.zhang@intel.com>
To support the latest spec, in which sysman initialization can happen
independently of core initialization, add a new sysman directory
inside level_zero.
Related-To: LOCI-3887
Signed-off-by: Jitendra Sharma <jitendra.sharma@intel.com>