Commit Graph

1932 Commits

Author SHA1 Message Date
Dunajski, Bartosz
23c08f4bca feature: Experimental support of immediate cmd list in-order execution [4/n]
Related-To: LOCI-4332

- Simplify CmdList-Event dependency
- Add waiting on in-order dependency
- Prepare Event for in-order synchronization
- Adjust downloading sync allocation in TBX mode

Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
2023-05-08 13:28:10 +02:00
Raiyan Latif
609265a0af fix: Free Peer Allocations in Virtual Memory Path
Related-To: LOCI-4359

Signed-off-by: Raiyan Latif <raiyan.latif@intel.com>
2023-05-03 01:15:18 +02:00
Neil R Spruit
102c38fc34 feature: Use L0 Loader teardown callback
Related-To: LOCI-4174

- Call zelSetDriverTeardown during L0 Driver teardown to prevent users
from calling into destroyed functions and encountering crashes
during teardown.

Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
2023-05-02 19:42:06 +02:00
Lu, Wenbin
5d653c8536 fix: Add alignment support to createUnifiedMemoryAllocation
Allows the user to use alignments > 64KB in `createUnifiedMemoryAllocation`

So that the restriction in `piextUSMDeviceAlloc` of the DPC++ runtime
could be lifted

Related-To: LOCI-4168

Signed-off-by: Lu, Wenbin <wenbin.lu@intel.com>
2023-05-02 09:19:23 +02:00
Neil R Spruit
7014ddefc2 fix: Set IPC type in ipcData explicitly
Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
2023-05-01 19:34:17 +02:00
Diedrich, Kamil
5149d74141 refactor: Remove globaly enabled cl_cache
Current behaviour will be detecd path existence

Related-To: NEO-4262

Signed-off-by: Diedrich, Kamil <kamil.diedrich@intel.com>
2023-04-28 23:28:49 +02:00
Dunajski, Bartosz
ef10c98497 feature: Experimental support of immediate cmd list in-order execution [3/n]
New allocation to track dependencies counter

Related-To: LOCI-4332

Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
2023-04-28 17:51:10 +02:00
Zbigniew Zdanowicz
b6b331fbe2 fix: update unit tests for command list primary batch buffer
Related-To: NEO-7807

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-28 17:15:21 +02:00
Dunajski, Bartosz
1dcab07300 fix: Call RelaxedOrdering regs init before in-order dependencies
Related-To: LOCI-4332

Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
2023-04-28 17:01:41 +02:00
Zbigniew Zdanowicz
7b0283e810 performance: allocate states vector together with command list
Allocating vector backing storage on stack makes it allocated
together with the whole command list object.
So no second use of heap for the state changes vector data.

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-28 16:11:51 +02:00
Fabian Zwolinski
cbce863dc2 refactor: Rename member variables to camelCase 3/n
Additionally enable clang-tidy check for member variables

Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
2023-04-28 16:01:14 +02:00
Lukasz Jobczyk
ff10e400c8 performance: Enable split for non-usm host
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
2023-04-28 14:41:33 +02:00
Zbigniew Zdanowicz
fd77b10c6f performance: lower the number of expected state changes in single exec call
State changes are kept in vector that is reserved for 32 state changes in
single execute call. It can be useful when multiple commands are executed
at once.
More workload use single or few command lists and so creation time of command
queue could be more benefitial.

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-28 13:10:11 +02:00
Lukasz Jobczyk
48114e5423 fix: Release temporary allocations from bcs split
Related-To: NEO-7933

Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
2023-04-28 12:51:07 +02:00
Dominik Dabek
2a262b22e1 performance: initialize cpu copy enabled bool once
In immediate cmdlist, initialize copyThroughLockedPtrEnabled at creation
once, instead of querying helper each mem copy.

Related-To: NEO-7796

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2023-04-28 11:56:46 +02:00
Cencelewska, Katarzyna
861ec524c6 fix: check icbe version only once when patchtoken
- set by default flag ZebinIgnoreIcbeVersion to true
- for zebin icbe version check is only inside flag
- only when use patchtoken then check icbe version is mandatory

Resolves: NEO-7904
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
2023-04-28 09:26:02 +02:00
Fabian Zwolinski
e351a90f81 refactor: Rename member variables to camelCase 2/n
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
2023-04-27 20:39:22 +02:00
Dunajski, Bartosz
75827b66c6 feature: Experimental support of immediate cmd list in-order execution [2/n]
- appendWaitOnEvents for previous dispatch
- update RelaxedOrdering logic
- update Event::setIsCompleted logic to reset already completed Event

Related-To: LOCI-4332

Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
2023-04-27 11:14:37 +02:00
Mateusz Jablonski
32d8a3bc6d fix: store registered engines per root device
in most cases we need to iterate over engines associated to single root device

Related-To: NEO-7925
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-04-27 10:54:07 +02:00
Spruit, Neil R
364c2da9fb feature: Add Support for zeMemPutIpcHandle & zeMemGet IPC Handle converters
Related-To: LOCI-4172, LOCI-4305, LOCI-4306

- Create a new IPC Memory handle upon call to getIpcMemHandle if the
previous handle has been freed.
- Release the Ipc Memory Handle when zeMemPutIpcHandle is called.
- Create a new IPC Handle for tracking thru zeMemGetAllocProperties
when ze_external_memory_export_fd_t is used.
- Convert FD to opaque IPC handle and IPC Handle to FD.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
2023-04-27 03:53:52 +02:00
Aravind Gopalakrishnan
e2bbed2f06 fix: Fix get global timestamp for host
To avoid redundant call to gather cpu timestamp as
we already have that info from first ioctl call.

Related-To: LOCI-4354

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@intel.com>
2023-04-26 19:19:50 +02:00
Zbigniew Zdanowicz
5adc1816ff performance: add platform check to get default flag for primary batch buffer
Related-To: NEO-7807

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-26 13:39:53 +02:00
Dunajski, Bartosz
14c3777409 feature: Experimental support of immediate cmd list in-order execution [1/n]
Related-To: LOCI-4332

Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
2023-04-26 13:15:59 +02:00
Diedrich, Kamil
26ca64bb28 Add process safety to cl_cache on Linux
Current flow will be to have one synchronization point
config.file. Read remains unblocking, only write(caching)
operation will be blocking (lock on config.file)

Related-To: NEO-4262

Signed-off-by: Diedrich, Kamil <kamil.diedrich@intel.com>
2023-04-25 17:35:40 +02:00
Zbigniew Zdanowicz
c0fcdef03e performance: remove not needed estimation
Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-25 14:47:43 +02:00
Lukasz Jobczyk
1e33d00676 Add early return from isAppendSplitNeeded if size too small
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
2023-04-25 14:40:08 +02:00
Zbigniew Zdanowicz
f451207372 performance: dispatch and chain command list batch buffers as primary
Command list batch buffers should be chained when no dynamic or global preamble
is present in command queue.
Return to command queue, when preamble is required.
Chain last command list to the command queue epilog.
Provide first command list batch buffer to KMD/ULLS when no command queue
preamble.

Related-To: NEO-7807

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-25 13:24:11 +02:00
Fabian Zwolinski
e2e00413a8 Apply CamelCase for class and struct names
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
2023-04-24 15:36:27 +02:00
Milczarek, Slawomir
bf778be99e [fix] zeCommandListAppendMemAdvise to clear preferred location correctly
The memadvise with preferred location for kmd-migrated shared allocation
is set to device associated with cmd list by default to migrate data
to lmem on non-atomic gpu page fault as well (for performance reasons).

Related-To: NEO-7252

Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
2023-04-24 14:51:49 +02:00
Zbigniew Zdanowicz
09ef0201c6 fix: correctly assign state transition when same command list executed twice
Single command list object can be passed multiple times to the execution
command list.
Not all command list instances might require dynamic preamble, as it depends
what state is before particular command list instance.
Correctly assign the particular instance of command list to state transition.

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-24 11:38:44 +02:00
Zbigniew Zdanowicz
6c6cf9dd0c performance: correct setting global init for debugger scenarios
Global init flag is useful only for once per context initialization.
Correctly set the flag can save the visits to these once per context
calls.
Debugger programming is active not only when queue type allows it,
but also when commands state is dirty and debugger class available.

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-24 09:42:32 +02:00
John Falkowski
bf88e4ef08 fix queryTimestampsExp to remove check for subdevice and incorrect packetId
Signed-off-by: John Falkowski <john.falkowski@intel.com>
2023-04-20 08:24:33 +02:00
Zbigniew Zdanowicz
669665deff performance: primary batch buffer use only on regular command lists
Immediate command list can use internal command queue.
Immediate command list then uses variable start offset and it does not
work with primary batch buffer.

Related-To: NEO-7807

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-19 19:36:51 +02:00
Zbigniew Zdanowicz
21ac5f2835 [perf] transition hw state only once, then dispatch command when needed
Before state transition was done twice, 1st time for estimation, 2nd time for
dispatch.
Now state transitions only during estimation and required state is saved then.
Commands are dispatched only when command list and property are marked to
dispatch.
During regular workload submission transition is performed only once and it
should be benefitial to reduce host overhead.

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-19 16:31:12 +02:00
Lu, Wenbin
c66546df73 Disable kernel timestamp when not using implicit scaling
Related-To: LOCI-2826

Signed-off-by: Lu, Wenbin <wenbin.lu@intel.com>
2023-04-19 12:14:17 +02:00
Kacper Nowak
c7adbc2140 Add debug key for dumping ELF to file
Add "DumpZEBin" debug flag. When this flag is enabled, Zebin will be
dumped to a .elf file (with appropiate suffix, in case such file has
been dumped before).
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
Related-To: NEO-7895
2023-04-18 20:40:25 +02:00
Zbigniew Zdanowicz
4ef879867c [fix] correct fence not ready value
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-14 16:43:45 +02:00
Dominik Dabek
8d834202af feat(l0): enable cpu copy for USM D2H
Enable cpu copy for USM device to USM host transfer in level zero
immediate cmdlist.

Related-To: NEO-7553

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2023-04-14 15:33:45 +02:00
Zbigniew Zdanowicz
f5f073b9fc [perf] move validation call before lock
Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-14 10:53:46 +02:00
Zbigniew Zdanowicz
cd899871b1 [perf] tweak front end programing to remove not needed steps
1. separate front end programing when tracking is enabled and disabled, it will
limit number of conditional checks.
2. setup command list front end properties only when front end state is dirty.
3. instanced context id should be set once, as this is one time per context
property.

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-13 11:43:26 +02:00
Zbigniew Zdanowicz
1a4dda57e7 [perf] reallocate residency container once for all command lists
When getting residency count for all command lists, driver is able to
reallocate container only once and not per each command list.
Add non-zero initial value for command queue residual allocations.

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-13 11:28:42 +02:00
Zbigniew Zdanowicz
63eb88b819 [refactor] reposition level zero command list implementations
- group same implementation into dedicated inl files
- remove double implementations for the similiar hw generations

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-13 11:00:24 +02:00
Daria Hinz
c3f4074f0a fix: Unification of aot config with hw ip version
In the case of mtl+ platforms, the returned config value
should equal the hardware ip version value.
This change fixes situations where some config has not been
added and in this case we returned an unknown value.

Signed-off-by: Daria Hinz <daria.hinz@intel.com>
Related-To: NEO-7738
2023-04-12 18:34:03 +02:00
Zbigniew Zdanowicz
f12b11786e [feat, perf] add primary batch buffer support to front end properties update
For primary batch buffer command list driver should not use return point.
Return points are useful when batch buffers are dispatched as secondary,
for primary buffers, patching of front end command is more desirable option.

Related-To: NEO-7807

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-12 16:18:55 +02:00
Zbigniew Zdanowicz
62ea1b1a58 [feat, perf] add primary batch buffer support to multi-tile barrier
Implicit Scaling barrier have the same requirements as kernel.
It must dispach bb start command with the same level as the command list
is dispatched.

Related-To: NEO-7807

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-12 16:18:38 +02:00
Compute-Runtime-Validation
41ad05eb52 Revert "l0_feature: Use L0 Loader teardown callback"
This reverts commit d31b950b9a.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2023-04-12 06:45:46 +02:00
Milczarek, Slawomir
01d03aa5b6 Extended regkey to force prefetch of shared memory in enqueue commands
Extended the regkey ForceMemoryPrefetchForKmdMigratedSharedAllocations
to force meory prefetch of kmd-migrated shared allocation
in clEnqueueNDRangeKernel(), clEnqueueMemFillINTEL, ...

Related-To: NEO-7841

Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
2023-04-11 11:23:48 +02:00
Neil R Spruit
d31b950b9a l0_feature: Use L0 Loader teardown callback
Related-To: LOCI-4174

- Call zelSetDriverTeardown during L0 Driver teardown to prevent users
from calling into destroyed functions and encountering crashes
during teardown.

Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
2023-04-11 11:16:26 +02:00
John Falkowski
e056082710 refactor graphics allocation structure elements for sub-allocation properties
Resolves:  LOCI-3772

Signed-off-by: John Falkowski <john.falkowski@intel.com>
2023-04-07 16:53:23 +02:00
Zbigniew Zdanowicz
66c19c7749 [perf] remove redundant for loops in command list execution method
This fix is most important for multi command list execution use cases.
It is also benefitial for single command list execution, as driver saves
on loop enters and exits.
Methods handling single command list instead of array of objects are simpler.

Removed loops were at:
- CommandListExecutionContext constructor
- estimateLinearStreamSizeInitial method
- computePreemptionSize method
- collectPrintfContentsFromAllCommandsLists method

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-07 15:27:04 +02:00