Commit Graph

2566 Commits

Author SHA1 Message Date
Diedrich, Kamil
26ca64bb28 Add process safety to cl_cache on Linux
The current flow has a single synchronization point: config.file.
Reads remain non-blocking; only the write (caching) operation
is blocking (lock on config.file).

Related-To: NEO-4262

Signed-off-by: Diedrich, Kamil <kamil.diedrich@intel.com>
2023-04-25 17:35:40 +02:00
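The locking scheme described in 26ca64bb28 can be sketched as follows. This is a minimal illustration that assumes flock(2) as the locking primitive; the commit message does not say which primitive is used. ScopedFileLock and cacheBinary are hypothetical names, not identifiers from the repository.

```cpp
#include <cassert>
#include <fcntl.h>
#include <string>
#include <sys/file.h>
#include <unistd.h>

// Hypothetical RAII helper: holds an exclusive flock() on a file while alive.
class ScopedFileLock {
  public:
    explicit ScopedFileLock(const std::string &path) {
        fd = open(path.c_str(), O_CREAT | O_RDWR, 0644);
        // LOCK_EX blocks until every other process has released the lock.
        if (fd >= 0 && flock(fd, LOCK_EX) != 0) {
            close(fd);
            fd = -1;
        }
    }
    ~ScopedFileLock() {
        if (fd >= 0) {
            flock(fd, LOCK_UN);
            close(fd);
        }
    }
    ScopedFileLock(const ScopedFileLock &) = delete;
    ScopedFileLock &operator=(const ScopedFileLock &) = delete;
    bool locked() const { return fd >= 0; }

  private:
    int fd = -1;
};

// Writers serialize on the lock file; readers never take the lock.
bool cacheBinary(const std::string &lockPath, const std::string &entryPath,
                 const std::string &data) {
    assert(!lockPath.empty() && !entryPath.empty());
    ScopedFileLock lock(lockPath);
    if (!lock.locked()) {
        return false;
    }
    int out = open(entryPath.c_str(), O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (out < 0) {
        return false;
    }
    ssize_t written = write(out, data.data(), data.size());
    close(out);
    return written == static_cast<ssize_t>(data.size());
}
```

Because only the writer takes the lock, a concurrent reader is never delayed, which matches the "reads remain non-blocking" behavior the message describes.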
Zbigniew Zdanowicz
c0fcdef03e performance: remove not needed estimation
Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-25 14:47:43 +02:00
Lukasz Jobczyk
1e33d00676 Add early return from isAppendSplitNeeded if size too small
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
2023-04-25 14:40:08 +02:00
Zbigniew Zdanowicz
f451207372 performance: dispatch and chain command list batch buffers as primary
Command list batch buffers should be chained when no dynamic or global preamble
is present in the command queue.
Return to the command queue when a preamble is required.
Chain the last command list to the command queue epilog.
Provide the first command list batch buffer to KMD/ULLS when there is no
command queue preamble.

Related-To: NEO-7807

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-25 13:24:11 +02:00
Fabian Zwolinski
e2e00413a8 Apply CamelCase for class and struct names
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
2023-04-24 15:36:27 +02:00
Milczarek, Slawomir
bf778be99e [fix] zeCommandListAppendMemAdvise to clear preferred location correctly
The memadvise with preferred location for kmd-migrated shared allocation
is set to device associated with cmd list by default to migrate data
to lmem on non-atomic gpu page fault as well (for performance reasons).

Related-To: NEO-7252

Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
2023-04-24 14:51:49 +02:00
Zbigniew Zdanowicz
09ef0201c6 fix: correctly assign state transition when same command list executed twice
A single command list object can be passed multiple times for execution.
Not all command list instances require a dynamic preamble, as that depends
on the state preceding the particular command list instance.
Correctly assign the particular instance of the command list to the state
transition.

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-24 11:38:44 +02:00
Zbigniew Zdanowicz
6c6cf9dd0c performance: correct setting global init for debugger scenarios
The global init flag is useful only for once-per-context initialization.
Setting the flag correctly saves visits to these once-per-context
calls.
Debugger programming is active not only when the queue type allows it,
but also when the command state is dirty and the debugger class is available.

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-24 09:42:32 +02:00
Kacper Nowak
83e9a148ca Fix: Unify fp16/fp32/fp64 flags across all platforms
Unify fp16/fp32/fp64 across all platforms. The capabilities indicated by
those flags now refer to both emulated and native-supported (HW) ones:

- Global/local atomic load: HW support on all platforms (handled by native
i16 atomic or) for FP16, FP32 and FP64.
- Global/local atomic store: HW support on all platforms (handled by
native i16 atomic exchange) for FP16, FP32 and FP64.
- Global/local atomic compare/exchange: HW support on all platforms
for FP32.
- Global/local atomic min/max: Emulation support on all platforms for
FP64, HW support on all platforms for FP32, HW support on XE+ platforms
and emulation support on all others for FP16.
- Global atomic add: HW support for PVC+ platforms, emulation support on
all other platforms for FP64, HW support on XE+ platforms and emulation
support on all other platforms for FP32.
- Local atomic add: Emulation on all platforms for both FP64 and FP32.

Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
Related-To: NEO-7734
2023-04-20 15:03:39 +02:00
John Falkowski
bf88e4ef08 fix queryTimestampsExp to remove check for subdevice and incorrect packetId
Signed-off-by: John Falkowski <john.falkowski@intel.com>
2023-04-20 08:24:33 +02:00
Compute-Runtime-Validation
ca51e557a2 Revert "Remove default support for DCD"
This reverts commit a3e923e359.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2023-04-20 04:41:33 +02:00
Mateusz Hoppe
da6cb648b1 test: print command queue properties in verbose mode
- in zello_world_gpu

Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
2023-04-19 19:51:37 +02:00
Zbigniew Zdanowicz
669665deff performance: primary batch buffer use only on regular command lists
Immediate command list can use internal command queue.
Immediate command list then uses variable start offset and it does not
work with primary batch buffer.

Related-To: NEO-7807

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-19 19:36:51 +02:00
Fabian Zwolinski
a3e923e359 Remove default support for DCD
Related-To: NEO-7213
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
2023-04-19 19:18:48 +02:00
Zbigniew Zdanowicz
21ac5f2835 [perf] transition hw state only once, then dispatch command when needed
Previously, state transition was done twice: first for estimation, then for
dispatch.
Now the state transitions only during estimation, and the required state is
saved at that point.
Commands are dispatched only when the command list and property are marked for
dispatch.
During regular workload submission the transition is performed only once, which
should reduce host overhead.

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-19 16:31:12 +02:00
Lu, Wenbin
c66546df73 Disable kernel timestamp when not using implicit scaling
Related-To: LOCI-2826

Signed-off-by: Lu, Wenbin <wenbin.lu@intel.com>
2023-04-19 12:14:17 +02:00
Kacper Nowak
c7adbc2140 Add debug key for dumping ELF to file
Add "DumpZEBin" debug flag. When this flag is enabled, Zebin will be
dumped to a .elf file (with an appropriate suffix, in case such a file has
been dumped before).
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
Related-To: NEO-7895
2023-04-18 20:40:25 +02:00
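The suffix behavior mentioned in c7adbc2140 could be implemented along these lines. This is only a sketch: the helper name pickDumpFileName, the "_N" suffix format, and the injected existence check are assumptions made for illustration, and the actual naming scheme may differ.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper: returns "<base>.elf" when that name is free, otherwise
// the first free name among "<base>_1.elf", "<base>_2.elf", ...
// The existence check is injected so the logic can be exercised without disk I/O.
std::string pickDumpFileName(const std::string &base,
                             const std::function<bool(const std::string &)> &exists) {
    assert(!base.empty());
    std::string candidate = base + ".elf";
    for (unsigned int suffix = 1; exists(candidate); ++suffix) {
        candidate = base + "_" + std::to_string(suffix) + ".elf";
    }
    return candidate;
}
```

In real use the predicate would wrap a file system query, for example std::filesystem::exists.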
Zbigniew Zdanowicz
4ef879867c [fix] correct fence not ready value
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-14 16:43:45 +02:00
Dominik Dabek
8d834202af feat(l0): enable cpu copy for USM D2H
Enable cpu copy for USM device to USM host transfer in level zero
immediate cmdlist.

Related-To: NEO-7553

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2023-04-14 15:33:45 +02:00
Zbigniew Zdanowicz
f5f073b9fc [perf] move validation call before lock
Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-14 10:53:46 +02:00
Zbigniew Zdanowicz
cd899871b1 [perf] tweak front end programing to remove not needed steps
1. separate front end programming when tracking is enabled and disabled, it will
limit the number of conditional checks.
2. setup command list front end properties only when front end state is dirty.
3. instanced context id should be set once, as this is one time per context
property.

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-13 11:43:26 +02:00
Zbigniew Zdanowicz
1a4dda57e7 [perf] reallocate residency container once for all command lists
When getting the residency count for all command lists, the driver is able to
reallocate the container only once instead of once per command list.
Add non-zero initial value for command queue residual allocations.

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-13 11:28:42 +02:00
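The idea behind 1a4dda57e7 can be illustrated with a plain std::vector. The sketch below is not the driver code: CommandListStub and collectResidency are hypothetical stand-ins, and allocations are represented as plain integers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a command list that owns a residency container.
struct CommandListStub {
    std::vector<int> residency;
};

// Sum the residency counts first, reserve once, then append every list.
void collectResidency(const std::vector<CommandListStub> &lists, std::vector<int> &out) {
    std::size_t total = out.size();
    for (const auto &list : lists) {
        total += list.residency.size();
    }
    out.reserve(total); // at most one reallocation for all command lists
    for (const auto &list : lists) {
        out.insert(out.end(), list.residency.begin(), list.residency.end());
    }
    assert(out.size() == total);
}
```

Without the up-front reserve, each append could trigger its own reallocation as the container grows.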
Zbigniew Zdanowicz
63eb88b819 [refactor] reposition level zero command list implementations
- group same implementation into dedicated inl files
- remove double implementations for the similar hw generations

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-13 11:00:24 +02:00
Zbigniew Zdanowicz
c0f0472b6e test l0: add command queue tests
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-13 10:14:05 +02:00
Daria Hinz
c3f4074f0a fix: Unification of aot config with hw ip version
In the case of mtl+ platforms, the returned config value
should equal the hardware ip version value.
This change fixes situations where some config has not been
added and in this case we returned an unknown value.

Signed-off-by: Daria Hinz <daria.hinz@intel.com>
Related-To: NEO-7738
2023-04-12 18:34:03 +02:00
Zbigniew Zdanowicz
f12b11786e [feat, perf] add primary batch buffer support to front end properties update
For a primary batch buffer command list, the driver should not use a return point.
Return points are useful when batch buffers are dispatched as secondary.
For primary buffers, patching the front end command is the more desirable option.

Related-To: NEO-7807

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-12 16:18:55 +02:00
Zbigniew Zdanowicz
62ea1b1a58 [feat, perf] add primary batch buffer support to multi-tile barrier
The Implicit Scaling barrier has the same requirements as a kernel.
It must dispatch the bb start command at the same level as the command list
is dispatched.

Related-To: NEO-7807

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-12 16:18:38 +02:00
Compute-Runtime-Validation
41ad05eb52 Revert "l0_feature: Use L0 Loader teardown callback"
This reverts commit d31b950b9a.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2023-04-12 06:45:46 +02:00
Milczarek, Slawomir
01d03aa5b6 Extended regkey to force prefetch of shared memory in enqueue commands
Extended the regkey ForceMemoryPrefetchForKmdMigratedSharedAllocations
to force memory prefetch of kmd-migrated shared allocation
in clEnqueueNDRangeKernel(), clEnqueueMemFillINTEL, ...

Related-To: NEO-7841

Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
2023-04-11 11:23:48 +02:00
Neil R Spruit
d31b950b9a l0_feature: Use L0 Loader teardown callback
Related-To: LOCI-4174

- Call zelSetDriverTeardown during L0 Driver teardown to prevent users
from calling into destroyed functions and encountering crashes
during teardown.

Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
2023-04-11 11:16:26 +02:00
John Falkowski
e056082710 refactor graphics allocation structure elements for sub-allocation properties
Resolves: LOCI-3772

Signed-off-by: John Falkowski <john.falkowski@intel.com>
2023-04-07 16:53:23 +02:00
Zbigniew Zdanowicz
66c19c7749 [perf] remove redundant for loops in command list execution method
This fix matters most for multi command list execution use cases.
It is also beneficial for single command list execution, as the driver saves
on loop entries and exits.
Methods handling a single command list instead of an array of objects are simpler.

Removed loops were at:
- CommandListExecutionContext constructor
- estimateLinearStreamSizeInitial method
- computePreemptionSize method
- collectPrintfContentsFromAllCommandsLists method

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-07 15:27:04 +02:00
Mateusz Jablonski
31f32cc16e fix implicit args: generate local ids as for grf size 32
Related-To: IGC-6936

Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-04-07 11:37:07 +02:00
Zbigniew Zdanowicz
d4109eb153 [feat, perf] add closing mechanism to command list primary batch buffers
This change adds space reservation in command list for returning batch buffer
start hw command.
Primary batch buffer can be run from direct submission or from KMD call and
must be aligned to required size.
Ending patch for batch buffer start must be in the last command buffer of the
command list.

Related-To: NEO-7807

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-07 11:28:41 +02:00
Zbigniew Zdanowicz
1fcf564cc1 Enable state base address tracking
Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-07 11:22:24 +02:00
Zbigniew Zdanowicz
09b58f4a22 [perf] group once per context calls under single condition
Plenty of calls require hw command programming only once per context.
There is no need to visit each of these methods on every execute call.
Set the global init flag only if any of them is required, and only then visit
all of them.
For regular command list execution, a single global check saves time.

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-07 09:21:28 +02:00
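A rough illustration of the single-condition grouping from 09b58f4a22 follows. All names here (OncePerContextState, its fields, programOncePerContext) are hypothetical and chosen for the example; the actual set of once-per-context steps in the driver is not listed in the commit message.

```cpp
#include <cassert>

// Hypothetical once-per-context requirements; field names are illustrative only.
struct OncePerContextState {
    bool frontEndDirty = false;
    bool stateSipDirty = false;
    bool debuggerDirty = false;

    // One combined flag, so the common path needs a single check.
    bool globalInit() const { return frontEndDirty || stateSipDirty || debuggerDirty; }
};

// Returns how many once-per-context steps were visited during this execute call.
int programOncePerContext(OncePerContextState &state) {
    if (!state.globalInit()) {
        return 0; // regular execution: no step is visited
    }
    int visited = 0;
    bool *steps[] = {&state.frontEndDirty, &state.stateSipDirty, &state.debuggerDirty};
    for (bool *dirty : steps) {
        ++visited;      // the step is visited (command programming omitted here)
        *dirty = false; // and its requirement is cleared
    }
    assert(!state.globalInit());
    return visited;
}
```

After the first execute call clears the requirements, later calls pay for one boolean check instead of one check per step.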
Zbigniew Zdanowicz
9ce5351d3f [fix] invalidate state caches only for heaps used by initialized context
This is a number of small tweaks to state cache invalidation:
1. Invalidate if heap was actually created.
2. Check if os context was actually initialized.
3. Check if the heap allocation was actually submitted. An allocation can carry
a zero task count when it is stored in csr internal storage before the csr was
used, because the csr task count (zero) is assigned to the heap allocation when
it is stored.

Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-07 09:16:12 +02:00
Matias Cabral
97bc43d18b Use correct timer resolution in zello_timestamp
Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
2023-04-06 14:34:30 +02:00
Katarzyna Cencelewska
1dc3b9b1e1 fix: add missing call make resident for dummy blit allocation
Resolves: HSD-18028732286
Signed-off-by: Katarzyna Cencelewska <katarzyna.cencelewska@intel.com>
2023-04-06 09:22:53 +02:00
Zbigniew Zdanowicz
e695059152 [perf] reduce host overhead in command list reset call
There is no need to reset all fields and load support flags every reset call.
Add dedicated calls that will reset values and dirty flags.
Call virtual methods only once at init time.

Related-To: NEO-7828

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-05 11:29:39 +02:00
Mateusz Hoppe
d83684785c feature: add dumping debug elf file
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
2023-04-04 18:28:16 +02:00
Compute-Runtime-Validation
e1af516c25 Revert "Enable state base address tracking"
This reverts commit 6a08d29869.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2023-04-04 11:37:19 +02:00
Dominik Dabek
1c52017ceb fix: use correct allocation type in program init
Globals surface allocation via USM manager will have correct allocation
type set (instead of just BUFFER) and will use cpu copy when possible.

Related-To: NEO-7796

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2023-04-04 11:31:11 +02:00
Zbigniew Zdanowicz
a5179aae0b [perf] add debug key and control variable to command list primary buffer
Related-To: NEO-7807

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-04 10:58:11 +02:00
Brandon Yates
7dc4fd8dda fix(l0): Fix cache properties for subdevice
total cache size calculation should use subdevice count
rather than MultiTileArchInfo to ensure correct size when subdevices
are queried

Related-to: LOCI-4217

Signed-off-by: Brandon Yates <brandon.yates@intel.com>
2023-04-04 08:05:09 +02:00
Zbigniew Zdanowicz
6a08d29869 Enable state base address tracking
Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-03 15:26:09 +02:00
Zbigniew Zdanowicz
7731264fe3 [fix] update ray tracing commands programing
- 3D btd command should be programmed only once per context
- Add conditional pipe control command prior to dispatching 3D btd command
- share 3D btd state between immediate and regular command lists
- add pipe control after ray tracing kernel to invalidate state cache

Related-To: NEO-5055

Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
2023-04-03 11:21:24 +02:00
Milczarek, Slawomir
50da94dc56 Add regkey to force prefetch of shared memory in cmd list execute
Add the regkey ForceMemoryPrefetchForKmdMigratedSharedAllocations
to force memory prefetch of kmd-migrated shared allocation
in zeCommandQueueExecuteCommandLists().

Related-To: NEO-7841

Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
2023-04-03 11:14:18 +02:00
Naklicki, Mateusz
d9be9b32f7 fix: add offset when alloc address does not match src/dst address for blit cmd
Include an offset for the appendMemoryCopyBlitRegion
parameters when a source/destination pointer is offset within its
allocation.
Lack of this offset could result in an invalid pointer when source and
destination are in the same allocation or do not point to the start of the
allocation.

Related-To: NEO-7694
Signed-off-by: Naklicki, Mateusz <mateusz.naklicki@intel.com>
2023-04-03 10:01:11 +02:00
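The offset that d9be9b32f7 refers to is the distance between the user pointer and the start of its allocation. A minimal sketch, assuming a simplified AllocationRange type and a helper offsetWithinAllocation that are both hypothetical and not taken from the driver:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a graphics allocation: base address and size in bytes.
struct AllocationRange {
    uint64_t baseAddress;
    std::size_t size;
};

// Byte offset of ptr from the start of the allocation that contains it.
std::size_t offsetWithinAllocation(const AllocationRange &alloc, uint64_t ptr) {
    assert(ptr >= alloc.baseAddress && ptr - alloc.baseAddress < alloc.size);
    return static_cast<std::size_t>(ptr - alloc.baseAddress);
}
```

When source and destination live in the same allocation, each pointer gets its own offset from the shared base, so the two regions stay distinct even though the allocation is the same.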
Fabian Zwolinski
c0603e0854 Allocate SipKernel per ctx for Offline dbg mode
- Add debuggingEnabledMode getter in ExecutionEnvironment
- Add new overloaded function - BuiltIns::getSipKernel
- Add perContextSipKernels map to BuiltIns
- Add OsContext to PreemptionHelper::programStateSip arguments
- Add new overloaded function - SipKernel::getBindlessDebugSipKernel

Related-To: NEO-7630
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
2023-03-30 16:40:41 +02:00