Extended the regkey ForceMemoryPrefetchForKmdMigratedSharedAllocations
to force meory prefetch of kmd-migrated shared allocation
in clEnqueueNDRangeKernel(), clEnqueueMemFillINTEL, ...
Related-To: NEO-7841
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
Related-To: LOCI-4174
- Call zelSetDriverTeardown during L0 Driver teardown to prevent users
from calling into destroyed functions and encountering crashes
during teardown.
Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
This fix is most important for multi command list execution use cases.
It is also benefitial for single command list execution, as driver saves
on loop enters and exits.
Methods handling single command list instead of array of objects are simpler.
Removed loops were at:
- CommandListExecutionContext constructor
- estimateLinearStreamSizeInitial method
- computePreemptionSize method
- collectPrintfContentsFromAllCommandsLists method
Related-To: NEO-7828
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
This change adds space reservation in command list for returning batch buffer
start hw command.
Primary batch buffer can be run from direct submission or from KMD call and
must be aligned to required size.
Ending patch for batch buffer start must be in the last command buffer of the
command list.
Related-To: NEO-7807
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Plenty of calls require hw command programming only once per context.
There is no need to visit every method of them every execute call.
Set global init flag only if any of them is true and then visit all of them.
But for regular command list execution it can save time when there is single
global check.
Related-To: NEO-7828
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
This is number of small tweaks to state cache invalidation:
1. Invalidate if heap was actually created.
2. Check if os context was actually initialized.
3. Heap allocation was actually submitted, as it might attain zero task count
value, when allocation is stored in csr internal storage, as csr wasn't used,
but the csr task count being zero is assigned to heap allocation when stored.
Related-To: NEO-5055
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
There is no need to reset all fields and load support flags every reset call.
Add dedicated calls that will reset values and dirty flags.
Call virtual methods only once at init time.
Related-To: NEO-7828
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
total cache size calculation should use subdevice count
rather than MultiTileArchInfo to ensure correct size when subdevices
are queried
Related-to: LOCI-4217
Signed-off-by: Brandon Yates <brandon.yates@intel.com>
- 3D btd command should be programed only once per context
- Add conditional pipe control command prior dispatching 3D btd command
- share 3D btd state between immediate and regular command lists
- add pipe control after ray tracing kernel to invalidate state cache
Related-To: NEO-5055
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Add the regkey ForceMemoryPrefetchForKmdMigratedSharedAllocations
to force meory prefetch of kmd-migrated shared allocation
in zeCommandQueueExecuteCommandLists().
Related-To: NEO-7841
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
Include an offset for the appendMemoryCopyBlitRegion
parameters when an source/destination pointer is offseted within
allocation.
Lack of this offset could result in an invalid pointer when source and
destination are on the same allocation or they do not point the start of
allocation.
Related-To: NEO-7694
Signed-off-by: Naklicki, Mateusz <mateusz.naklicki@intel.com>
Changed -cl-intel-allow-zebin to -cl-intel-enable-zebin only for
API options.
Related-To: NEO-7801
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
This fix will allow save memory and do not allocate private heaps
when global heaps are enabled.
Related-To: NEO-5055
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
When state base address tracking is enabled and command list use private heaps
then command list at destroy time must calls all compute CSRs that were using
that heap to invalidate state caches.
This allows new command list to reuse the same heap allocation for different
surface states, so before new use cached states are invalidated.
Related-To: NEO-5055
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Allows the user to use alignments > 64KB in `createUnifiedMemoryAllocation`
So that the restriction in `piextUSMDeviceAlloc` of the DPC++ runtime
could be lifted
Related-To: LOCI-4168
Signed-off-by: Lu, Wenbin <wenbin.lu@intel.com>
Enables P2P Copy support for all Image API related calls:
- zeCommandListAppendImageCopy
- zeCommandListAppendImageCopyRegion
- zeCommandListAppendImageCopyToMemory
- zeCommandListAppendImageCopyFromMemory
Related-To: LOCI-4112
Signed-off-by: Raiyan Latif <raiyan.latif@intel.com>
It is possible that a module has so many kernels that the 4GB limit of
GPU VA is depleted when each kernel allocates a 64 KB page for its own
ISA. In such case, propagate the ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY to
the API caller to indicate the actual problem.
Currently such scenario is not detected, the execution advances a bit
further and the following crashes do not let the user to easily
understand what happened.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
After resolving NEO-7684 in turns out that `zeKernelGetProperties`
is still returning wrong value for `maxNumSubgroups` since it
did not take into account `LargeGRF & SIMD` limitation.
Related-To: NEO-7829
Signed-off-by: Krzysztof Gibala <krzysztof.gibala@intel.com>
`ZE_extension_image_view` and `ZE_extension_image_view_planar`
should be reported by NEO, and `ZE_STRUCTURE_TYPE_IMAGE_VIEW_PLANAR_EXT_DESC`
needs to be recognized
Related-to: LOCI-3769
Signed-off-by: Lu, Wenbin <wenbin.lu@intel.com>
- reduce number of dummy blits where are not needed
- track if dummy blit required in cmdlist
Related-To: NEO-7450
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
This change allows to set DebuggingMode via
ZET_ENABLE_PROGRAM_DEBUGGING env var
0: Disabled
1: Online
2: Offline
Related-To: NEO-7630
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
- share same code between csr and cmd container to get default heap size
- share handling of debug flag to change heap size
- share platform level surface heap size between csr and command list
- refactor heap size files
- put heap size constant and function into namespace
- command list surface heap size increased to 2MB for xehp+ to match csr
- command list increased surface heap size only for sba tracking
- sba tracking heap consumption increased due to different reset policy
Related-To: NEO-5055
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
- state base address tracking allows to reuse base address state
- surface state slots can be reused after sba reload or cache flush
- to avoid cache flush after each reset, then allow to gradualy consume heaps
- only until natural heap depletion and then dispatch reload of sba state
Related-To : NEO-5055
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>