- Check allocation root device index during eviction
- Wait for and marked allocation only from the current root device index
Related-To: NEO-7920
Signed-off-by: Krzysztof Gibala <krzysztof.gibala@intel.com>
Enable for zebin format but not CM kernels.
Use heuristic of simdSize == 1 to detect CM kernels.
Related-To: NEO-7712
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
The memadvise with preferred location for kmd-migrated shared allocation
is set to device associated with cmd list by default to migrate data
to lmem on non-atomic gpu page fault as well (for performance reasons).
Related-To: NEO-7252
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
Extract distinct steps as dedicated functions, especially when the code
is duplicated. This eases analysis of the logic and highlights
differences between callers of a common code.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
Single command list object can be passed multiple times to the execution
command list.
Not all command list instances might require dynamic preamble, as it depends
what state is before particular command list instance.
Correctly assign the particular instance of command list to state transition.
Related-To: NEO-7828
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Global init flag is useful only for once per context initialization.
Correctly set the flag can save the visits to these once per context
calls.
Debugger programming is active not only when queue type allows it,
but also when commands state is dirty and debugger class available.
Related-To: NEO-7828
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
We want to report unrecognized device only after verifying
all possible variants, including deprecated ones.
Signed-off-by: Daria Hinz <daria.hinz@intel.com>
Related-To: NEO-7903
- initialization of FileLogger always removed log file - this change only
removes old file when logging is enabled in current run
Resolves: NEO-7199
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Unify fp16/fp32/fp64 across all platforms. The capabilities indicated by
those flags now refer to both emulated and native-supported (HW) ones:
- Global/local atomic load: HW support on all platforms (handled by native
i16 atomic or) for FP16, FP32 and FP64.
- Global/local atomic store: HW support on all platforms (handled by
native i16 atomic exchange) for FP16, FP32 and FP64.
- Global/local atomic compare/exchange: HW support on all platforms
for FP32.
- Global/local atomic min/max: Emulation support on all platforms for
FP64, HW support on all platforms for FP32, HW support on XE+ platforms
and emulation support on all others for FP16.
- Global atomic add: HW support for PVC+ platforms, emulation support on
all other platforms for FP64, HW support on XE+ platforms and emulation
support on all other platforms for FP32.
- Local atomic add: Emulation on all platforms for both FP64 and FP32.
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
Related-To: NEO-7734
- caps check is not needed when link engines are not available for
product
Related-To: NEO-7886
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
Ocloc supports passing hw ip version value to -device arg in
the form of major.minor.revision.
This change adds support for directly passed value as uint32_t as well.
Support added for single and fat binary.
Signed-off-by: Daria Hinz <daria.hinz@intel.com>
Related-To: NEO-7903
Immediate command list can use internal command queue.
Immediate command list then uses variable start offset and it does not
work with primary batch buffer.
Related-To: NEO-7807
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Before state transition was done twice, 1st time for estimation, 2nd time for
dispatch.
Now state transitions only during estimation and required state is saved then.
Commands are dispatched only when command list and property are marked to
dispatch.
During regular workload submission transition is performed only once and it
should be benefitial to reduce host overhead.
Related-To: NEO-7828
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
use GPU address from gpu allocation instead of CPU allocation
check page fault manager presence before migrating to GPU domain
Related-To: NEO-7690
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
- Do not wait for GPU completion on pool exhaust if allocs are in use,
allocate new pool instead
- Free small buffer address range if allocs are not in use and
buffer pool is exhausted
Resolves: NEO-7769, NEO-7836
Signed-off-by: Igor Venevtsev <igor.venevtsev@intel.com>
The memadvise with preferred location for kmd-migrated shared allocation
is set to device associated with cmd list by default to migrate data
to lmem on non-atomic gpu page fault too (for performance reasons).
Related-To: NEO-7252
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
Add "DumpZEBin" debug flag. When this flag is enabled, Zebin will be
dumped to a .elf file (with appropiate suffix, in case such file has
been dumped before).
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
Related-To: NEO-7895
Add mechanism to increase direct submission timeout up to a maximum
value when no new submissions were made since last sleep.
This should help in workloads that have delays between iterations larger
than current direct submission controller timeout.
Related-To: NEO-7878
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
ensure default hw ip version matches the value from helper
change pvc ult execution to revision 3
Related-To: NEO-7738
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>