Commit Graph

341 Commits

Author SHA1 Message Date
Lukasz Jobczyk
bb86dba152 fix: add missing host ptr assignment increment
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
2023-05-19 14:53:27 +02:00
Zbigniew Zdanowicz
01c20212c3 performance: limit number of copies of dirty flags and state values
Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-05-09 09:45:41 +02:00
Zbigniew Zdanowicz
7b0283e810 performance: allocate states vector together with command list
Allocating vector backing storage on stack makes it allocated
together with the whole command list object.
So no second use of heap for the state changes vector data.

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-28 16:11:51 +02:00
Zbigniew Zdanowicz
fd77b10c6f performance: lower the number of expected state changes in single exec call
State changes are kept in vector that is reserved for 32 state changes in
single execute call. It can be useful when multiple commands are executed
at once.
More workload use single or few command lists and so creation time of command
queue could be more benefitial.

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-28 13:10:11 +02:00
Zbigniew Zdanowicz
5adc1816ff performance: add platform check to get default flag for primary batch buffer
Related-To: NEO-7807

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-26 13:39:53 +02:00
Zbigniew Zdanowicz
c0fcdef03e performance: remove not needed estimation
Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-25 14:47:43 +02:00
Zbigniew Zdanowicz
f451207372 performance: dispatch and chain command list batch buffers as primary
Command list batch buffers should be chained when no dynamic or global preamble
is present in command queue.
Return to command queue, when preamble is required.
Chain last command list to the command queue epilog.
Provide first command list batch buffer to KMD/ULLS when no command queue
preamble.

Related-To: NEO-7807

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-25 13:24:11 +02:00
Zbigniew Zdanowicz
09ef0201c6 fix: correctly assign state transition when same command list executed twice
Single command list object can be passed multiple times to the execution
command list.
Not all command list instances might require dynamic preamble, as it depends
what state is before particular command list instance.
Correctly assign the particular instance of command list to state transition.

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-24 11:38:44 +02:00
Zbigniew Zdanowicz
6c6cf9dd0c performance: correct setting global init for debugger scenarios
Global init flag is useful only for once per context initialization.
Correctly set the flag can save the visits to these once per context
calls.
Debugger programming is active not only when queue type allows it,
but also when commands state is dirty and debugger class available.

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-24 09:42:32 +02:00
Zbigniew Zdanowicz
669665deff performance: primary batch buffer use only on regular command lists
Immediate command list can use internal command queue.
Immediate command list then uses variable start offset and it does not
work with primary batch buffer.

Related-To: NEO-7807

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-19 19:36:51 +02:00
Zbigniew Zdanowicz
21ac5f2835 [perf] transition hw state only once, then dispatch command when needed
Before state transition was done twice, 1st time for estimation, 2nd time for
dispatch.
Now state transitions only during estimation and required state is saved then.
Commands are dispatched only when command list and property are marked to
dispatch.
During regular workload submission transition is performed only once and it
should be benefitial to reduce host overhead.

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-19 16:31:12 +02:00
Zbigniew Zdanowicz
f5f073b9fc [perf] move validation call before lock
Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-14 10:53:46 +02:00
Zbigniew Zdanowicz
cd899871b1 [perf] tweak front end programing to remove not needed steps
1. separate front end programing when tracking is enabled and disabled, it will
limit number of conditional checks.
2. setup command list front end properties only when front end state is dirty.
3. instanced context id should be set once, as this is one time per context
property.

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-13 11:43:26 +02:00
Zbigniew Zdanowicz
1a4dda57e7 [perf] reallocate residency container once for all command lists
When getting residency count for all command lists, driver is able to
reallocate container only once and not per each command list.
Add non-zero initial value for command queue residual allocations.

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-13 11:28:42 +02:00
Milczarek, Slawomir
01d03aa5b6 Extended regkey to force prefetch of shared memory in enqueue commands
Extended the regkey ForceMemoryPrefetchForKmdMigratedSharedAllocations
to force meory prefetch of kmd-migrated shared allocation
in clEnqueueNDRangeKernel(), clEnqueueMemFillINTEL, ...

Related-To: NEO-7841

Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
2023-04-11 11:23:48 +02:00
Zbigniew Zdanowicz
66c19c7749 [perf] remove redundant for loops in command list execution method
This fix is most important for multi command list execution use cases.
It is also benefitial for single command list execution, as driver saves
on loop enters and exits.
Methods handling single command list instead of array of objects are simpler.

Removed loops were at:
- CommandListExecutionContext constructor
- estimateLinearStreamSizeInitial method
- computePreemptionSize method
- collectPrintfContentsFromAllCommandsLists method

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-07 15:27:04 +02:00
Zbigniew Zdanowicz
09b58f4a22 [perf] group once per context calls under single condition
Plenty of calls require hw command programming only once per context.
There is no need to visit every method of them every execute call.
Set global init flag only if any of them is true and then visit all of them.
But for regular command list execution it can save time when there is single
global check.

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-07 09:21:28 +02:00
Zbigniew Zdanowicz
a5179aae0b [perf] add debug key and control variable to command list primary buffer
Related-To: NEO-7807

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-04 10:58:11 +02:00
Zbigniew Zdanowicz
7731264fe3 [fix] update ray tracing commands programing
- 3D btd command should be programed only once per context
- Add conditional pipe control command prior dispatching 3D btd command
- share 3D btd state between immediate and regular command lists
- add pipe control after ray tracing kernel to invalidate state cache

Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-03 11:21:24 +02:00
Milczarek, Slawomir
50da94dc56 Add regkey to force prefetch of shared memory in cmd list execute
Add the regkey ForceMemoryPrefetchForKmdMigratedSharedAllocations
to force meory prefetch of kmd-migrated shared allocation
in zeCommandQueueExecuteCommandLists().

Related-To: NEO-7841

Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
2023-04-03 11:14:18 +02:00
Fabian Zwolinski
c0603e0854 Allocate SipKernel per ctx for Offline dbg mode
- Add debuggingEnabledMode getter in ExecutionEnvironment
- Add new overloaded function - BuiltIns::getSipKernel
- Add perContextSipKernels map to BuiltIns
- Add OsContext to PreemptionHelper::programStateSip arguments
- Add new overloaded function - SipKernel::getBindlessDebugSipKernel

Related-To: NEO-7630
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
2023-03-30 16:40:41 +02:00
Zbigniew Zdanowicz
ef12312672 [perf] add selective properties update for one-time and multi-time properties
Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-03-23 15:59:50 +01:00
Zbigniew Zdanowicz
38e50007f7 [perf] simplify memory layout of command container class
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-03-23 13:31:47 +01:00
Cencelewska, Katarzyna
a4a296d59f wa: enable wa to add additional dummy blits after blit copy
- reduce number of dummy blits where are not needed
- track if dummy blit required in cmdlist

Related-To: NEO-7450
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
2023-03-17 10:43:00 +01:00
Mateusz Jablonski
0da5e6f277 refactor l0: cleanup cmake file level_zero/core/source/CMakeLists.txt
Related-To: NEO-7507
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-03-16 12:38:15 +01:00
Mateusz Hoppe
0204761add feature: gpu assert implementation
- allocate assert buffer when kernel has assert
- track assert kernels in cmdlists and cmdqueues
- check and print assert at sync calls: cmdqueue synchronize(), fence
synchronize(), event hostSynchronize(), synchronous imm cmdlists
append()

Related-To: NEO-5753

Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
2023-03-15 19:22:09 +01:00
Zbigniew Zdanowicz
86c91847cc [perf] change stream properties interfaces allowing fine grain selective update
Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-03-14 13:46:49 +01:00
Zbigniew Zdanowicz
c9b2138060 [perf] calculate scratch general base address once per execution
Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-03-10 18:01:27 +01:00
Zbigniew Zdanowicz
a365f8ae37 rename level zero file names to more appropriate
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-03-10 17:55:25 +01:00
Zbigniew Zdanowicz
f3324964f6 [perf] initialize stream properties only once without further check
Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-03-10 17:50:49 +01:00
Zbigniew Zdanowicz
24c8f089ed [perf] add state compute mode dirty flag to allow selective properties update
- full properties update is time intesive task and must be done only once
- selective update can be done after initial update
- dirty flag will allow to distinguish initial update is done

Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-03-10 17:27:08 +01:00
Zbigniew Zdanowicz
d93f00e075 [perf] simplify getting indirect heap memory location
Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-03-10 17:17:01 +01:00
Cencelewska, Katarzyna
398c7b2d29 refactor, remove typo in struct name
change name of EncodeSempahore to EncodeSemaphore
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
2023-03-10 15:44:25 +01:00
Kamil Kopryk
fa8579602f refactor: rename product helper files n/n
Related-To: NEO-7703
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
2023-03-10 13:24:38 +01:00
Zbigniew Zdanowicz
37768a15d3 Add scratch space support to global stateless heap model
Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-03-09 17:09:21 +01:00
Zbigniew Zdanowicz
d7d6a4040f Add fixes to state base address tracking
- debug address tracking estimation added for state base address tracking
- fix bt command estimation for private heap command lists
- set immediate command lists default one-time settings in csr
- simplify interface of estimate state base address
- set correct mode for legacy unit test expecting preamble

Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-03-09 11:14:04 +01:00
Cencelewska, Katarzyna
c274309d7b wa: add dummy blits before command MI_FLUSH_DW
to guarantee that all subblt got complete for previous copy
affect xe hpg

temporary changes under flag ForceDummyBlitWa

Related-To: NEO-7450

Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
2023-03-09 10:40:35 +01:00
Zbigniew Zdanowicz
f003666ad7 Add state base address transition for global stateless heap command lists
Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-03-08 12:32:15 +01:00
Zbigniew Zdanowicz
fd7a3c4096 Add creation of global stateless heap for the context
Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-03-07 15:00:34 +01:00
Zbigniew Zdanowicz
49def723b7 Unify layout of command list class
Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-03-07 11:38:47 +01:00
Cencelewska, Katarzyna
3e116ea378 refactor: use same paths when add command mi_semaphore_wait
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
2023-03-07 10:35:26 +01:00
Zbigniew Zdanowicz
d3c99f6414 Add level zero heap addressing enum, property and debug key
Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-03-01 18:28:00 +01:00
Zbigniew Zdanowicz
42b8a536db Fix redundant state base address dispatch
This fix handles scenario when regular command list uses context first,
then immediate command list is used for the first time.

Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-03-01 10:08:55 +01:00
Zbigniew Zdanowicz
34064811d2 Refactor state base address programing 4/n
- This change gets level one cache policy from cached values instead
of calling virtual methods

Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-02-27 17:30:36 +01:00
Zbigniew Zdanowicz
8d2028a986 Add tracked sba command dispatch in level zero
- When enabled, sba tracking dispatches preambleless, tracked sba commands
in command lists and command queues.
- Tracking disallows any untracked sba commands.
- Adding some tweaks to data initialization and processing.

Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-02-27 14:51:11 +01:00
Zbigniew Zdanowicz
3cb064fe95 Refactor state base address programing 3/n
This is small optimization to replace virtual call and retrieved struct with
cached value.

Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-02-23 13:08:32 +01:00
Zbigniew Zdanowicz
43a49c4486 Refactor state base address programing 2/n
This change allows to read sba data directly from sba properties

Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-02-23 12:20:25 +01:00
Mateusz Hoppe
87ea9473e4 fix: zeFenceHostSynchronize() to flush printf output
- zeFenceHostSynchronize() should flush printf output from GPU kernels

Related-To: NEO-7625

Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
2023-02-20 23:16:12 +01:00
Zbigniew Zdanowicz
2d6e5c2588 Fix issues in state base address properties tracking
- add correct stateless mocs state update in immediate command lists
- disallow stateless mocs dirty sba command dispatch when sba tracking enabled
- checks support first, only then do the dirty state check in csr

Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-02-17 13:38:47 +01:00
Warchulski, Jaroslaw
b485c025d0 Cleanup includes 57
Related-To: NEO-5548
Signed-off-by: Warchulski, Jaroslaw <jaroslaw.warchulski@intel.com>
2023-02-17 11:19:59 +01:00