Related-To: LOCI-3871
- Enabled allocation of specified base address in the targeted heap.
- Enabled virtual memory reservations to grow by allocating at the start
of the heap vs the end of the heap.
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
When a single process exported an IPC memory handle and then
opened it, the close function was called twice for the
same BO handle. This triggered debugBreak() and aborted
the application.
This change allows multiple separate BOs to share one
handle. The last owner of the shared handle calls close().
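A minimal sketch of the "last owner closes" idea, assuming reference
counting via std::shared_ptr. SharedHandle, BufferObject and
countClosesForTwoOwners are hypothetical names used for illustration only
and are not the driver's actual types:

```cpp
#include <functional>
#include <memory>
#include <utility>

// Hypothetical wrapper: owns one OS-level handle and closes it exactly once.
class SharedHandle {
  public:
    SharedHandle(int handle, std::function<void(int)> closeFn)
        : handle(handle), closeFn(std::move(closeFn)) {}
    // Runs only when the last owner releases its reference.
    ~SharedHandle() { closeFn(handle); }
    SharedHandle(const SharedHandle &) = delete;
    SharedHandle &operator=(const SharedHandle &) = delete;
    int get() const { return handle; }

  private:
    int handle;
    std::function<void(int)> closeFn;
};

// Hypothetical stand-in for a buffer object referring to a shared handle.
struct BufferObject {
    std::shared_ptr<SharedHandle> handle;
};

// Creates two BOs that share one handle, releases both,
// and returns how many times close was invoked.
inline int countClosesForTwoOwners() {
    int closeCalls = 0;
    {
        auto shared = std::make_shared<SharedHandle>(
            42, [&closeCalls](int) { ++closeCalls; });
        BufferObject exported{shared};
        BufferObject imported{shared};
        shared.reset();          // local reference dropped, both BOs still own it
        exported.handle.reset(); // first owner released: no close yet
        if (closeCalls != 0) {
            return -1; // would indicate a premature close
        }
    } // `imported` goes out of scope here: the last owner closes the handle
    return closeCalls;
}
```

With two owners of the same handle, the close callback runs once rather
than twice, which is the behavior the change describes.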
Related-To: NEO-7200
Signed-off-by: Wrobel, Patryk <patryk.wrobel@intel.com>
-start async thread at device initialization which initializes selected
builtins and exits
-share module across builtins using same binary
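A rough sketch of both points, assuming a std::thread worker and a module
map keyed by binary name. BuiltinCache, Module and all member names are
hypothetical and do not reflect the real implementation:

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical module type: one instance per builtin binary.
struct Module {
    std::string binary;
};

class BuiltinCache {
  public:
    // Each entry is (builtin name, binary name).
    using Selection = std::vector<std::pair<std::string, std::string>>;

    // Starts the worker thread at construction; the thread initializes the
    // selected builtins and then exits.
    explicit BuiltinCache(Selection selected)
        : worker([this, selected = std::move(selected)] {
              for (const auto &entry : selected) {
                  initBuiltin(entry.first, entry.second);
              }
          }) {}

    ~BuiltinCache() { waitForInit(); }

    // Not safe to call concurrently from several threads; kept simple here.
    void waitForInit() {
        if (worker.joinable()) {
            worker.join();
        }
    }

    std::shared_ptr<Module> get(const std::string &name) {
        waitForInit();
        std::lock_guard<std::mutex> lock(mtx);
        auto it = builtins.find(name);
        if (it == builtins.end()) {
            return nullptr;
        }
        return it->second;
    }

  private:
    void initBuiltin(const std::string &name, const std::string &binary) {
        std::lock_guard<std::mutex> lock(mtx);
        auto &module = modules[binary]; // builtins with the same binary share it
        if (!module) {
            module = std::make_shared<Module>(Module{binary});
        }
        builtins[name] = module;
    }

    std::mutex mtx;
    std::map<std::string, std::shared_ptr<Module>> modules;
    std::map<std::string, std::shared_ptr<Module>> builtins;
    std::thread worker; // declared last so the other members exist before it starts
};
```

Two builtins registered with the same binary name end up pointing at the
same Module instance, while a builtin from a different binary gets its own.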
Resolves: NEO-7644
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
Use getTransferType and getTransferThreshold in
preferCopyThroughLockedPtr to make it clear for which
transfer types CpuMemCopy is enabled.
Related-To: NEO-7564
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
This PR gives us the ability to distinguish shared allocation from
device allocation.
Related-To: NEO-7564
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
Related-To: LOCI-3860
- Fixed IPC event pools that are allocated for a single device so that,
when opened through IPC, only that device handle can be used by the
process which opened the IPC event pool.
- IPC event handle now includes numDevices as a field to determine whether
the root device index is the only index allowed for this event pool.
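The check described above could look roughly like this. The struct layout
and the names IpcEventPoolData and isDeviceAllowed are assumptions made
for illustration; only the numDevices field and the root device index come
from this message:

```cpp
#include <cstdint>

// Hypothetical payload carried inside the IPC event pool handle.
struct IpcEventPoolData {
    std::uint32_t rootDeviceIndex; // device the pool was allocated on
    std::uint32_t numDevices;      // number of devices the pool was created for
};

// Returns true if the opening process may use `deviceIndex` with this pool.
// A pool created for a single device is restricted to its root device index.
inline bool isDeviceAllowed(const IpcEventPoolData &data, std::uint32_t deviceIndex) {
    if (data.numDevices == 1) {
        return deviceIndex == data.rootDeviceIndex;
    }
    return true;
}
```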
Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
Sizing context (PVC):
When using LargeGRF (a.k.a. GRF256) there are only 4 HW threads per EU
(instead of the default 8). Together with SIMD16, this means that there
can be at most 64 work-items per EU. With 8 EUs per subslice this gives
512 work-items on a single subslice. For correct intra-WG synchronization,
all WIs of a work-group must be executed on the same subslice (to access
the same SLM, where the synchronization primitives are stored). Thus, with
SIMD16 and LargeGRF the work-group size must not exceed 512 (PVC example).
So far `maxWorkGroupSize` has been taken solely from the DeviceInfo
structure, both in `ModuleTranslationUnit::processUnpackedBinary()` and
`ModuleImp::initialize()`. This approach does not take kernel parameters
(LargeGRF) into account, so a kernel using LargeGRF with SIMD16 can be
submitted with the work-group size set to 1024. That leads to a hang.
Fix the `.maxWorkGroupSize` computation so that it takes the kernel
parameters into consideration.
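The sizing arithmetic can be sketched as below. DeviceParams, KernelParams
and computeMaxWorkGroupSize are hypothetical names, and halving the thread
count for LargeGRF is an assumption generalized from the PVC numbers
(4 threads instead of 8); this is not the actual driver code:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical device-side inputs (PVC example: 1024, 8, 8).
struct DeviceParams {
    std::uint32_t maxWorkGroupSize; // value reported by DeviceInfo
    std::uint32_t threadsPerEu;     // default HW threads per EU
    std::uint32_t eusPerSubslice;
};

// Hypothetical kernel-side inputs.
struct KernelParams {
    std::uint32_t simdSize;
    bool largeGrf;
};

// Clamps the device-wide limit by what fits on one subslice for this kernel.
inline std::uint32_t computeMaxWorkGroupSize(const DeviceParams &device,
                                             const KernelParams &kernel) {
    // Assumption: LargeGRF halves the number of HW threads per EU.
    const std::uint32_t threads =
        kernel.largeGrf ? device.threadsPerEu / 2 : device.threadsPerEu;
    const std::uint32_t perSubslice =
        threads * kernel.simdSize * device.eusPerSubslice;
    return std::min(device.maxWorkGroupSize, perSubslice);
}
```

For the PVC numbers above, SIMD16 with LargeGRF yields 4 * 16 * 8 = 512,
while SIMD16 without LargeGRF keeps the device limit of 1024.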
Add new tests (for discrete platforms >= XeHP), adapt existing ones, and
fix cosmetic issues along the way.
Similar check for OCL:
https://github.com/intel/compute-runtime/blob/master/opencl/source/command_queue/enqueue_kernel.h#L130
Related-To: NEO-7684
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
During event synchronization in a command list, the printf buffer
is now flushed when host synchronize is called.
Related-To: LOCI-3681
Signed-off-by: Zhang, Winston <winston.zhang@intel.com>
- add estimation parameter for interface descriptor data count
- add alignment parameter to the heap estimation for dynamic and surface heaps
- extend encode interface and implementations to allow child heaps
Related-To: NEO-5055
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Refactor the structure and add a field to pass the USM memory type.
To maintain backwards compatibility with current applications,
pass 0 as the type for device allocations, and 1 for host
allocations.
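One way to picture the compatibility rule, as a sketch only. The struct
layout and the names UsmType, IpcMemoryData, makeIpcMemoryData and
usmTypeOf are hypothetical; only the values 0 (device) and 1 (host) come
from this message:

```cpp
#include <cstdint>

// Values taken from the message: 0 = device allocation, 1 = host allocation.
enum class UsmType : std::uint8_t {
    device = 0,
    host = 1,
};

// Hypothetical IPC payload with the added type field.
struct IpcMemoryData {
    std::uint64_t handle;
    std::uint8_t type;
};

inline IpcMemoryData makeIpcMemoryData(std::uint64_t handle, UsmType type) {
    return IpcMemoryData{handle, static_cast<std::uint8_t>(type)};
}

// A zero-filled payload, as an application unaware of the new field would
// pass, is interpreted as a device allocation.
inline UsmType usmTypeOf(const IpcMemoryData &data) {
    return data.type == 1 ? UsmType::host : UsmType::device;
}
```

Because device allocations map to 0, payloads that leave the new field
zeroed keep their previous meaning.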
Related-To: LOCI-3771
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>