Related-To: LOCI-4174
- Call zelSetDriverTeardown during L0 Driver teardown to prevent users
from calling into destroyed functions and encountering crashes
during teardown.
Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
Allow the user to use alignments larger than 64KB in
`createUnifiedMemoryAllocation`, so that the restriction in
`piextUSMDeviceAlloc` of the DPC++ runtime can be lifted.
Related-To: LOCI-4168
Signed-off-by: Lu, Wenbin <wenbin.lu@intel.com>
Allocating the vector backing storage on the stack makes it allocated
together with the whole command list object, so the state changes vector
data needs no second heap allocation.
Related-To: NEO-7828
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
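The idea above can be illustrated with a minimal sketch. `InlineVec`, `StateChange` and `CommandListSketch` are hypothetical names invented for this example and are not the driver's types; the capacity of 32 is chosen for illustration only. The point is that the element storage is a member array, so it is part of the owning object's footprint and no separate heap allocation takes place.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity vector: storage is a member array, so it lives
// inside whichever object owns it (no separate heap allocation).
template <typename T, std::size_t Capacity>
class InlineVec {
  public:
    // Returns false when the inline capacity is exhausted; a real container
    // might fall back to the heap at this point.
    bool push_back(const T &value) {
        if (count == Capacity) {
            return false;
        }
        storage[count++] = value;
        return true;
    }
    std::size_t size() const { return count; }
    const T &operator[](std::size_t i) const {
        assert(i < count);
        return storage[i];
    }
    void clear() { count = 0; }

  private:
    std::array<T, Capacity> storage{};
    std::size_t count = 0;
};

// Hypothetical stand-in for one state change record.
struct StateChange {
    int commandListIndex = 0;
    bool frontEndDirty = false;
};

// Hypothetical owner: the vector data is allocated together with the object.
struct CommandListSketch {
    InlineVec<StateChange, 32> stateChanges;
};
```

Because the storage is embedded, `sizeof(CommandListSketch)` grows with the capacity; that is the trade-off for skipping the second allocation.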
State changes are kept in a vector that reserves space for 32 state changes
in a single execute call. This can be useful when multiple commands are
executed at once.
More workloads use a single or a few command lists, so command queue
creation time could be more beneficial.
Related-To: NEO-7828
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
In immediate cmdlist, initialize copyThroughLockedPtrEnabled once at
creation, instead of querying the helper on each mem copy.
Related-To: NEO-7796
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
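A minimal sketch of the pattern, under stated assumptions: `HelperSketch`, its `copyThroughLockedPtrSupported` method and `ImmediateCmdListSketch` are hypothetical stand-ins, not the driver's classes. Only the member name `copyThroughLockedPtrEnabled` comes from the message above. The helper is queried in the constructor and every later copy reads the cached flag.

```cpp
#include <cassert>

// Hypothetical helper; queryCount only exists to make the saving observable.
struct HelperSketch {
    mutable int queryCount = 0;
    bool supported = true;
    bool copyThroughLockedPtrSupported() const {
        ++queryCount;
        return supported;
    }
};

class ImmediateCmdListSketch {
  public:
    // The helper is queried exactly once, at creation.
    explicit ImmediateCmdListSketch(const HelperSketch &helper)
        : copyThroughLockedPtrEnabled(helper.copyThroughLockedPtrSupported()) {}

    // Returns true when the copy would take the locked-pointer path.
    // No helper query happens here, only a read of the cached flag.
    bool appendMemoryCopy() const { return copyThroughLockedPtrEnabled; }

  private:
    const bool copyThroughLockedPtrEnabled;
};

// Usage: many copies, a single helper query.
inline void usageExample() {
    HelperSketch helper;
    ImmediateCmdListSketch cmdList(helper);
    for (int i = 0; i < 100; ++i) {
        cmdList.appendMemoryCopy();
    }
    assert(helper.queryCount == 1);
}
```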
- set flag ZebinIgnoreIcbeVersion to true by default
- for zebin, the icbe version check is performed only under the flag
- the icbe version check is mandatory only when patchtokens are used
Resolves: NEO-7904
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
In most cases we need to iterate over the engines associated with a single
root device.
Related-To: NEO-7925
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
Related-To: LOCI-4172, LOCI-4305, LOCI-4306
- Create a new IPC Memory handle upon call to getIpcMemHandle if the
previous handle has been freed.
- Release the IPC Memory Handle when zeMemPutIpcHandle is called.
- Create a new IPC Handle for tracking through zeMemGetAllocProperties
when ze_external_memory_export_fd_t is used.
- Convert FD to opaque IPC handle and IPC Handle to FD.
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
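The FD to opaque handle conversion from the last bullet above could look roughly like the sketch below. `OpaqueIpcHandle`, `fdToIpcHandle` and `ipcHandleToFd` are hypothetical names, and the 64-byte blob size is an assumption of this example; the driver's actual handle layout and tracking logic are not shown in this message.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical opaque handle: a fixed-size byte blob. The size of 64 bytes
// is an assumption made for this sketch.
struct OpaqueIpcHandle {
    char data[64];
};

static_assert(sizeof(int) <= sizeof(OpaqueIpcHandle::data),
              "file descriptor must fit into the opaque blob");

// Pack a file descriptor into the first bytes of a zeroed opaque handle.
inline OpaqueIpcHandle fdToIpcHandle(int fd) {
    assert(fd >= 0); // valid file descriptors are non-negative
    OpaqueIpcHandle handle{};
    std::memcpy(handle.data, &fd, sizeof(fd));
    return handle;
}

// Recover the file descriptor from an opaque handle produced above.
inline int ipcHandleToFd(const OpaqueIpcHandle &handle) {
    int fd = -1;
    std::memcpy(&fd, handle.data, sizeof(fd));
    return fd;
}
```

`std::memcpy` is used instead of a pointer cast so the sketch does not depend on the alignment of the byte array.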
Avoid a redundant call to gather the cpu timestamp, as we already have
that info from the first ioctl call.
Related-To: LOCI-4354
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@intel.com>
The current flow has a single synchronization point: config.file.
Reads remain non-blocking; only the write (caching) operation is
blocking (lock on config.file).
Related-To: NEO-4262
Signed-off-by: Diedrich, Kamil <kamil.diedrich@intel.com>
Command list batch buffers should be chained when no dynamic or global
preamble is present in the command queue.
Return to the command queue when a preamble is required.
Chain the last command list to the command queue epilog.
Provide the first command list batch buffer to KMD/ULLS when there is no
command queue preamble.
Related-To: NEO-7807
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
The memadvise with preferred location for a kmd-migrated shared allocation
is set by default to the device associated with the cmd list, so that data
also migrates to lmem on a non-atomic gpu page fault (for performance
reasons).
Related-To: NEO-7252
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
A single command list object can be passed multiple times to a command
list execution call.
Not every command list instance requires a dynamic preamble, as it depends
on the state before that particular command list instance.
Correctly assign each particular instance of a command list to its state
transition.
Related-To: NEO-7828
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
The global init flag is useful only for once-per-context initialization.
Setting the flag correctly saves visits to these once-per-context calls.
Debugger programming is active not only when the queue type allows it,
but also when the commands state is dirty and the debugger class is
available.
Related-To: NEO-7828
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
An immediate command list can use an internal command queue.
The immediate command list then uses a variable start offset, which does
not work with a primary batch buffer.
Related-To: NEO-7807
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Previously the state transition was done twice: first for estimation, then
for dispatch.
Now the state transition happens only during estimation, and the required
state is saved at that point.
Commands are dispatched only when the command list and property are marked
for dispatch.
During regular workload submission the transition is performed only once,
which should reduce host overhead.
Related-To: NEO-7828
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
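The estimate-once, dispatch-from-saved-marks flow described above can be sketched as follows. Everything here is hypothetical: `CmdListEntry`, `estimate`, `dispatch`, the single integer "front end state" and the 16-byte command size are simplifications invented for this example, not the driver's data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-instance record for one command list in an execute call.
struct CmdListEntry {
    int requiredFrontEndState = 0; // state this instance needs before it runs
    bool dispatchFrontEnd = false; // mark saved during estimation
    bool estimated = false;        // set once estimation has visited the entry
};

constexpr std::size_t frontEndCmdSize = 16; // assumed command size in bytes

// Pass 1: walk the state transitions once, save the marks, return the size.
inline std::size_t estimate(std::vector<CmdListEntry> &entries, int currentState) {
    std::size_t size = 0;
    for (auto &entry : entries) {
        entry.dispatchFrontEnd = entry.requiredFrontEndState != currentState;
        entry.estimated = true;
        if (entry.dispatchFrontEnd) {
            size += frontEndCmdSize;
            currentState = entry.requiredFrontEndState;
        }
    }
    return size;
}

// Pass 2: no state comparison is repeated; only the saved marks are read.
// Returns the states that would be programmed, in order.
inline std::vector<int> dispatch(const std::vector<CmdListEntry> &entries) {
    std::vector<int> programmedStates;
    for (const auto &entry : entries) {
        assert(entry.estimated); // dispatch relies on marks from estimation
        if (entry.dispatchFrontEnd) {
            programmedStates.push_back(entry.requiredFrontEndState);
        }
    }
    return programmedStates;
}
```

In this toy model, three instances requiring states 1, 1, 2 on top of an initial state 0 need two front end commands: one before the first instance and one before the third.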
Add "DumpZEBin" debug flag. When this flag is enabled, Zebin will be
dumped to a .elf file (with an appropriate suffix, in case such a file
has been dumped before).
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
Related-To: NEO-7895
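One possible way to pick a non-colliding dump file name is sketched below. The naming scheme (`<base>.elf`, then `<base>_0.elf`, `<base>_1.elf`, and so on) and the functions `pickDumpPath` and `dumpElf` are assumptions made for this example; the message above does not specify the suffix format the driver uses. The sketch requires C++17 for `std::filesystem`.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Returns the first path among <base>.elf, <base>_0.elf, <base>_1.elf, ...
// that does not exist yet. The naming scheme is an assumption of this sketch.
inline fs::path pickDumpPath(const fs::path &dir, const std::string &base) {
    assert(!base.empty());
    fs::path candidate = dir / (base + ".elf");
    for (int suffix = 0; fs::exists(candidate); ++suffix) {
        candidate = dir / (base + "_" + std::to_string(suffix) + ".elf");
    }
    return candidate;
}

// Writes the bytes to a fresh file and returns the path that was used.
inline fs::path dumpElf(const fs::path &dir, const std::string &base,
                        const std::vector<char> &bytes) {
    const fs::path path = pickDumpPath(dir, base);
    std::ofstream out(path, std::ios::binary);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return path;
}
```

The existence check and the write are not atomic, so two processes dumping at the same time could still pick the same name; a debug-only dump may tolerate that.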
Enable cpu copy for USM device to USM host transfer in level zero
immediate cmdlist.
Related-To: NEO-7553
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
1. separate front end programming when tracking is enabled and disabled;
this limits the number of conditional checks.
2. set up command list front end properties only when front end state is
dirty.
3. instanced context id should be set once, as this is a once-per-context
property.
Related-To: NEO-7828
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
When getting the residency count for all command lists, the driver is able
to reallocate the container only once rather than once per command list.
Add a non-zero initial value for command queue residual allocations.
Related-To: NEO-7828
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
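A minimal sketch of the single-reallocation idea, assuming a hypothetical `CommandListSketch` that exposes its residency entries as a vector of integer ids and a hypothetical `gatherResidency` function; the driver's real container and allocation types are not shown here. The total count is computed first, the container is sized once, and the per-list appends then never trigger a reallocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical command list: residency entries modelled as integer ids.
struct CommandListSketch {
    std::vector<int> residency;
};

// Collects residency entries of all command lists into one container.
// queueAllocations is the non-zero initial room kept for the queue's own
// allocations (the value passed in is up to the caller in this sketch).
inline std::vector<int> gatherResidency(const std::vector<CommandListSketch> &lists,
                                        std::size_t queueAllocations) {
    std::size_t total = queueAllocations;
    for (const auto &list : lists) {
        total += list.residency.size();
    }

    std::vector<int> container;
    container.reserve(total); // the only reallocation
    for (const auto &list : lists) {
        container.insert(container.end(), list.residency.begin(), list.residency.end());
    }
    assert(container.capacity() >= total); // room for queue allocations remains
    return container;
}
```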
- group same implementation into dedicated inl files
- remove duplicate implementations for similar hw generations
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
For mtl+ platforms, the returned config value should equal the hardware
ip version value.
This change fixes cases where a config had not been added and we
therefore returned an unknown value.
Signed-off-by: Daria Hinz <daria.hinz@intel.com>
Related-To: NEO-7738
For a primary batch buffer command list, the driver should not use return
points. Return points are useful when batch buffers are dispatched as
secondary; for primary buffers, patching the front end command is the more
desirable option.
Related-To: NEO-7807
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Implicit Scaling barrier has the same requirements as a kernel.
It must dispatch the bb start command at the same level as the command
list is dispatched.
Related-To: NEO-7807
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Extended the regkey ForceMemoryPrefetchForKmdMigratedSharedAllocations
to force memory prefetch of kmd-migrated shared allocations
in clEnqueueNDRangeKernel(), clEnqueueMemFillINTEL, ...
Related-To: NEO-7841
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
This fix is most important for multi command list execution use cases.
It is also beneficial for single command list execution, as the driver
saves on loop entries and exits.
Methods handling a single command list instead of an array of objects are
simpler.
Removed loops were at:
- CommandListExecutionContext constructor
- estimateLinearStreamSizeInitial method
- computePreemptionSize method
- collectPrintfContentsFromAllCommandsLists method
Related-To: NEO-7828
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>