Related-To: LOCI-4615
- Added Support for users to set ZE_FLAT_DEVICE_HIERARCHY to either FLAT
or COMPOSITE to change how devices are returned in zeDeviceGet and
clGetDeviceIDs.
- COMPOSITE is default behavior that exists today.
- FLAT returns all sub devices which have no sub devices and all root
devices that have no sub devices in zeDeviceGet ie with all devices
flattened out in order.
- Added zeDeviceGetRootDevice for one to retrieve the Root Device for
any SubDevice.
Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
By default prefer allocating memory first by KMD, instead of malloc first.
By default prefer not caching allocations on MTL devices. This results
in allocations being handled with non-coherent pat index.
For integrated devices when caching is not preferred do not allow
direct memory access in CPU domain. For map/unmap operations create
a dedicated memory allocation for CPU access, instead of accessing it
directly, reusing the same logic as when mapping/unmapping local memory.
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
If waitForBarrier is not passed outEvent then do
dcFlush on the next synchronize call.
Related-To: NEO-8147
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Instances returned by `getAllocationsVector()` in some cases cannot be
freed (in the `malloc/new` sense) until the `drain()` function invokes
`allocInUse()` on them. Plus, the `chunksToFree` container operates on
pairs `{offset, size}`, not pointers, so such pair cannot be used to
release allocations either.
Provide an optional callback, which can be implemented by the custom
pool derived from `AbstractBuffersPool`. This callback can be used, for
example, to perform actual release of an allocation related to the
currently processed chunk.
Additionally, provide the `drain()` and `tryFreeFromPoolBuffer()`
functions with pool-independent versions and keep the previous versions
as defaults (for allocators with a single pool). The new versions allow
reusing the code for cases when allocator has multiple pools.
In both cases, there was no such needs so far but it arose when working
on `IsaBuffersAllocator`. The latter is coming with future commits, but
the shared code modifications are extracted as an independent step.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
If workgroup dimension x is 1, use y to ajust for divisible by dispatch
size.
Related-To: NEO-7927, GSD-5417
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Added if conditions to enable useChunking flag by checking
with ExecutionEnvironment::isDebuggingEnabled.
Related-To: NEO-8164
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
- Preserve releases on CMake level.
- Instead of generating builtins per platform, generate them per-release
(+ correct naming accordingly).
- Stop using revisions in builtin compilation logic path, as they are
already embedded in release (device ip).
- Remove platform names & revisions from names for generated files
(related to builtins).
- Remove unnecessary code, refactor ULT logic.
Related-To: NEO-7783
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
The previous name "pageSize2Mb" defined in
shared/source/helpers/constant.h is inconsistent to other variable,
i.e. pageSize64k.
Furthermore, it's a bit misleading because the page size is defined in
Megabytes (MB), not in Megabits (Mb).
Related-to: NEO-7695
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
Related-to: NEO-7695
New debug keys added:
EnableBOChunking is now a mask
0 = no chunking (default).
1 = shared allocations only
2 = device allocations only
3 = shared and device allocations
MinimalAllocationSizeForChunking sets the minimum allocation
size to apply chunking. Default is 2MB.
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
- also use helper when checking that is simd1 to have same flow
Related-To: NEO-8087
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
- Adobe Photoshop
- Adobe Premiere Pro
- Adobe After Effect
use RCS as a default engine
Related-To: NEO-8049
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
- program global bindless ssba when external allocator used (
UseExternalAllocatorForSshAndDsh)
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Instances returned by `getAllocationsVector()` in some cases cannot be
freed (in the `malloc/new` sense) until the `drain()` function invokes
`allocInUse()` on them. Plus, the `chunksToFree` container operates on
pairs `{offset, size}`, not pointers, so such pair cannot be used to
release allocations either.
Provide an optional callback, which can be implemented by the custom
pool derived from `AbstractBuffersPool`. This callback can be used, for
example, to perform actual release of an allocation related to the
currently processed chunk.
Additionally, provide the `drain()` and `tryFreeFromPoolBuffer()`
functions with pool-independent versions and keep the previous versions
as defaults (for allocators with a single pool). The new versions allow
reusing the code for cases when allocator has multiple pools.
In both cases, there was no such needs so far but it arose when working
on `IsaBuffersAllocator`. The latter is coming with future commits, but
the shared code modifications are extracted as an independent step.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
Due to this flag was not properly handled on Windows, command buffer
allocations were never reused in immediate command lists in case of
host secondary buffers. This lead to huge host memory consumption
and performance degradation
Related-To: NEO-8072
Signed-off-by: Igor Venevtsev <igor.venevtsev@intel.com>
- use calculateNumThreadsPerThreadGroup instead of getThreadsPerWG to
have same flow and proper values of threads per work groups
Related-To: NEO-8087
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>