Add flag for setting localPreferred (implicit when gmm localOnly=0 and
NonLocalOnly=0) when allocating buffer, svmGpu and image.
Related-To: NEO-9695
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Enable programming pat indexes on mtl linux for device buffers.
Change DrmMemoryManager::allocateMemoryByKMD to use gemCreateExt.
Set mmap flags based on coherency.
Map as write back on legacy and coherent.
On non-coherent map as write combined.
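The mapping rule above can be sketched as below. This is a minimal illustration only: the enum and function names are hypothetical and are not taken from the driver sources.

```cpp
#include <cstdint>

// Hypothetical names for illustration; not the driver's actual types.
enum class CoherencyMode : std::uint8_t { legacy, coherent, nonCoherent };
enum class MmapCaching : std::uint8_t { writeBack, writeCombined };

// Legacy and coherent allocations map as write back,
// non-coherent allocations map as write combined.
constexpr MmapCaching selectMmapCaching(CoherencyMode mode) {
    return mode == CoherencyMode::nonCoherent ? MmapCaching::writeCombined
                                              : MmapCaching::writeBack;
}
```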
Changes currently disabled, to enable use debug keys:
DisableGemCreateExtSetPat=0
UseGemCreateExtInAllocateMemoryByKMD=1
Reorder BufferObject to decrease padding.
Related-To: NEO-7896
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Enable programming pat indexes on mtl linux for device buffers.
Change DrmMemoryManager::allocateMemoryByKMD to use gemCreateExt.
Related-To: NEO-7896
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Added OverrideImmediateCmdListSynchronousMode to override synchronous
mode for immediate command list
Related-To: NEO-10316
Signed-off-by: Yoon, Young Jin <young.jin.yoon@intel.com>
Query system total memory size and limit usm host allocation recycle to
use at most x%.
x is read from ExperimentalEnableDeviceAllocationCache for device and
ExperimentalEnableHostAllocationCache for host.
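A rough sketch of the x% limit, assuming the flag value is a plain percentage of the queried total memory size. The helper names are hypothetical and the bookkeeping is simplified compared to whatever the driver actually does.

```cpp
#include <cstdint>

// Hypothetical helper: upper bound, in bytes, for allocations kept for reuse.
// Divides before multiplying to avoid overflow on large totals,
// at the cost of a few bytes of precision.
constexpr std::uint64_t maxRecycledBytes(std::uint64_t totalMemoryBytes, std::uint32_t percent) {
    return (totalMemoryBytes / 100u) * percent;
}

// An allocation may be kept for reuse only if the cache stays within the limit.
constexpr bool canRecycle(std::uint64_t cachedBytes, std::uint64_t allocationSize, std::uint64_t limitBytes) {
    return allocationSize <= limitBytes && cachedBytes <= limitBytes - allocationSize;
}
```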
Related-To: GSD-7497
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
for cached resources: OverridePatIndexForCachedTypes
for uncached resources: OverridePatIndexForUncachedTypes
Related-To: NEO-10157
Signed-off-by: Katarzyna Cencelewska <katarzyna.cencelewska@intel.com>
Modified ioctl_helper_prelim to support the extension of gem_create_ext,
i.e. prelim_drm_i915_gem_create_ext_mempolicy.
Added two debug variables to be used for the mempolicy extension.
Modified functions in memory_info and drm_memory_manager to support extension
Added numaif.h from https://github.com/numactl/numactl/tree/master,
v2.0.14
Related-To: NEO-8276
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
EnableDeviceUsmAllocationPool and EnableHostUsmAllocationPool for device
and host allocations respectively.
Pool size will be set to flag value * MB.
Allocation size threshold to be pooled is 1MB.
Pools are created per context.
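The sizing rules above, as a minimal sketch. It assumes the 1MB threshold is inclusive and that a non-positive flag value leaves the pool disabled; both are assumptions, and all names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t oneMegabyte = 1024u * 1024u;
// Assumption: allocations up to and including 1MB are eligible for pooling.
constexpr std::size_t poolingThreshold = 1u * oneMegabyte;

// Pool size is the flag value multiplied by 1MB.
// Assumption: a non-positive value means the pool is disabled (size 0).
constexpr std::size_t poolSizeInBytes(std::int32_t flagValue) {
    return flagValue > 0 ? static_cast<std::size_t>(flagValue) * oneMegabyte : 0u;
}

// Only non-empty allocations at or below the threshold go to the pool.
constexpr bool fitsInPool(std::size_t allocationSize) {
    return allocationSize > 0u && allocationSize <= poolingThreshold;
}
```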
Related-To: NEO-9700
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Align to the new PAT and cache coherency support
There is an issue with coherency=non_coh, which
is default option for some platforms.
Add temporary W/A until this issue is resolved.
xe_drm.h header is generated from the series
"PAT and cache coherency support"
from https://patchwork.freedesktop.org/series/123027/
Related-To: NEO-9421, NEO-8324
Signed-off-by: Naklicki, Mateusz <mateusz.naklicki@intel.com>
Align to the new PAT and cache coherency support
xe_drm.h header is generated from the series
"PAT and cache coherency support"
from https://patchwork.freedesktop.org/series/123027/
Related-To: NEO-9421, NEO-8324
Signed-off-by: Naklicki, Mateusz <mateusz.naklicki@intel.com>
Temporarily opt-out from additional compatibility checks
on DG2 and MTL for Blender and its derivatives AOT-compiled kernels.
This prevents a long kernel recompilation.
Additionally, same behavior can be enforced for other applications
manually via NEO debug key named DoNotUseProductConfigForValidationWa.
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
Related-To: NEO-9240
Device allocation chunking only applies for multi-tile mode for implicit scaling
Related-To: NEO-9051
Signed-off-by: John Falkowski <john.falkowski@intel.com>
Signed-off-by: Michal Mrozek <michal.mrozek@intel.com>
Related-To: NEO-6989
-Prevent imbalance in multi dimensional dispatches
-Make sure to utilize as many EUs as possible
-Prefer the highest possible tg dispatch count
-Make sure that xe_core doesn't have uneven workgroups
Add mechanism to preallocate cmd buffer allocations in command stream
receiver reusable allocations list per command queue initialized.
This should limit additional allocations during hot loop.
Needs to be enabled in subsequent commits by setting product helper
method.
Related-To: NEO-8152
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Temporarily opt-out from additional compatibility checks
on DG2 for Blender AOT-compiled kernels.
This prevents a long kernel recompilation.
Additionally, same behavior can be enforced for other applications
manually via NEO debug key named DoNotUseProductConfigForValidationWa.
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
Related-To: NEO-9240
Use debug flag PrintKernelDispatchParameters to print params used in
thread group dispatch size heuristic when encoding kernel dispatch.
Related-To: NEO-6989
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Added getDefaultDeviceHierarchy call that describes default device
hierarchy for a gfx core. Refactored L0 and OCL paths to use this
value by default and override this value when user sets
ZE_FLAT_DEVICE_HIERARCHY environment variable or
ReturnSubDevicesAsApiDevices debug key.
Updated ReturnSubDevicesAsApiDevices to force COMPOSITE device hierarchy
when set to 0.
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
Program barrier to task stream, before next enqueue kernel.
This will reduce the number of batch buffer starts for sequences of
enqueue, barrier, enqueue, ... .
Related-To: NEO-8147
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
This patch avoids returning error for system addresses in setArg
Related-To: GSD-3597
Signed-off-by: Joshua Santosh Ranjan <joshua.santosh.ranjan@intel.com>
Add USER_FENCE before PREFETCH call and after the BIND
Related-To: NEO-8098
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
Signed-off-by: John Falkowski <john.falkowski@intel.com>
Add debug flag ProgramBarrierInCommandStreamTask to program barrier
pipe control in task command stream instead of csr command stream.
This will reduce the number of batch buffer starts.
Related-To: NEO-8147
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
This change disables CPU caching for resources
not accessed by CPU for MTL devices.
Related-To: NEO-7194
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
When upstream ioctl helper is created it will try to create small
allocation, adding I915_GEM_CREATE_EXT_SET_PAT extension. If it
succeeds, for all resources with valid pat index value it will then
explicitly program pat index value with gem create ext call.
PrintBOCreateDestroyResult value can be used to:
- print whether the set pat extension is supported by the kernel, when
ioctl helper is created
- print whether set pat extension was added for a given gem create ext
call and what pat index value was programmed
Note: introduced changes are disabled by default.
Toggle DisableGemCreateExtSetPat can be used to enable new functionality.
Related-To: NEO-7896
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
This would avoid recalculating reference timestamps
when event is used with different command lists.
Related-To: LOCI-4563
Signed-off-by: Joshua Santosh Ranjan <joshua.santosh.ranjan@intel.com>
By default prefer allocating memory first by KMD, instead of malloc first.
By default prefer not caching allocations on MTL devices. This results
in allocations being handled with non-coherent pat index.
For integrated devices when caching is not preferred do not allow
direct memory access in CPU domain. For map/unmap operations create
a dedicated memory allocation for CPU access, instead of accessing it
directly, reusing the same logic as when mapping/unmapping local memory.
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
If no outEvent is passed to waitForBarrier, then do
dcFlush on the next synchronize call.
Related-To: NEO-8147
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Related-to: NEO-7695
New debug keys added:
EnableBOChunking is now a mask
0 = no chunking (default).
1 = shared allocations only
2 = device allocations only
3 = shared and device allocations
MinimalAllocationSizeForChunking sets the minimum allocation
size to apply chunking. Default is 2MB.
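How the mask and the minimum size could combine, as a minimal sketch. The names are hypothetical, and treating the size check as inclusive (size >= minimum) is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical allocation kinds for illustration.
enum class AllocKind { shared, device, other };

constexpr std::uint32_t chunkSharedBit = 1u; // EnableBOChunking bit 0
constexpr std::uint32_t chunkDeviceBit = 2u; // EnableBOChunking bit 1
constexpr std::size_t defaultMinChunkingSize = 2u * 1024u * 1024u; // 2MB default

// Chunking applies when the allocation kind is selected in the mask
// and the allocation is at least the minimal size (assumed inclusive).
constexpr bool shouldChunk(std::uint32_t mask, AllocKind kind, std::size_t size,
                           std::size_t minSize = defaultMinChunkingSize) {
    return size >= minSize &&
           ((kind == AllocKind::shared && (mask & chunkSharedBit) != 0u) ||
            (kind == AllocKind::device && (mask & chunkDeviceBit) != 0u));
}
```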
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
- remove useless flag ForceNumberOfThreadsInGpgpuThreadGroup
- add new flag "RemoveRestrictionsOnNumberOfThreadsInGpgpuThreadGroup"
to restore old path without restrictions about number of threads in
thread group
- fix forwarding information about hw local ids generations to
calculate numOfThreadsInThreadGroup correctly
Related-To: NEO-7952, NEO-7982
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
- by default ZE_ENABLE_PCI_ID_DEVICE_ORDER is disabled
- by default devices are sorted by type (discrete first), then by pci order
- when ZE_ENABLE_PCI_ID_DEVICE_ORDER is enabled, devices are sorted by pci id
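The two orderings can be sketched with a single comparator. The struct and function names are hypothetical, and reading "pci order" as ordering by PCI domain/bus/device/function is an assumption.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical device descriptor; field names are illustrative only.
struct DeviceInfo {
    bool isDiscrete;
    std::uint32_t pciDomain;
    std::uint32_t pciBus;
    std::uint32_t pciDevice;
    std::uint32_t pciFunction;
};

// Lexicographic comparison of the PCI address.
constexpr bool pciLess(const DeviceInfo &a, const DeviceInfo &b) {
    if (a.pciDomain != b.pciDomain) return a.pciDomain < b.pciDomain;
    if (a.pciBus != b.pciBus) return a.pciBus < b.pciBus;
    if (a.pciDevice != b.pciDevice) return a.pciDevice < b.pciDevice;
    return a.pciFunction < b.pciFunction;
}

// Default: discrete devices first, then PCI order.
// With pciIdOrder set: PCI order only.
constexpr bool deviceLess(const DeviceInfo &a, const DeviceInfo &b, bool pciIdOrder) {
    if (!pciIdOrder && a.isDiscrete != b.isDiscrete) return a.isDiscrete;
    return pciLess(a, b);
}

inline void sortDevices(std::vector<DeviceInfo> &devices, bool pciIdOrder) {
    std::stable_sort(devices.begin(), devices.end(),
                     [pciIdOrder](const DeviceInfo &a, const DeviceInfo &b) {
                         return deviceLess(a, b, pciIdOrder);
                     });
}
```

Here pciIdOrder stands for the state of ZE_ENABLE_PCI_ID_DEVICE_ORDER; how the variable is read is left out of the sketch.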
Related-To: LOCI-4520
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
Read if support for chunking is available in the KMD.
If available, KMD will create a BO with 1 or more chunks,
depending on the chunk size selected.
Related-To: NEO-7695
Sync to
https://github.com/intel-gpu/drm-uapi-helper/releases/tag/v2.0-rc18
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
Signed-off-by: John Falkowski <john.falkowski@intel.com>
On warm reset, with the default BAR size set by BIOS, VF BAR
allocation fails because of a bug in the pci driver,
which impacts SRIOV functionality.
Resize the VF BAR for successful allocation of the VF BAR
post warm reset.
Related-To: LOCI-4481
Signed-off-by: Bellekallu Rajkiran <bellekallu.rajkiran@intel.com>
- new debug key EnableDeviceStateVerification to check device state not
only in debug mode
Related-To: NEO-7669
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
When the flag is disabled, gmm flag Cacheable won't be set on xe_hp and later
Related-To: NEO-7194
Signed-off-by: Katarzyna Cencelewska <katarzyna.cencelewska@intel.com>
- add debug flag EnableCpuCacheForResources to be able to allow coherency when
resources could be cacheable
Resolves: NEO-7194
Signed-off-by: Katarzyna Cencelewska <katarzyna.cencelewska@intel.com>
Add debug variable to set sleep duration for HBM
IFR to complete
Related-To: LOCI-4298
Signed-off-by: Bellekallu Rajkiran <bellekallu.rajkiran@intel.com>
- set by default flag ZebinIgnoreIcbeVersion to true
- for zebin, the icbe version check is gated by this flag
- the icbe version check is mandatory only when patchtokens are used
Resolves: NEO-7904
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
Add "DumpZEBin" debug flag. When this flag is enabled, Zebin will be
dumped to a .elf file (with appropriate suffix, in case such file has
been dumped before).
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
Related-To: NEO-7895
Add mechanism to increase direct submission timeout up to a maximum
value when no new submissions were made since last sleep.
This should help in workloads that have delays between iterations larger
than current direct submission controller timeout.
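A minimal sketch of such a timeout policy. The commit message does not state how the timeout grows, so the growth factor is a caller-supplied parameter here; that, the function name, and keeping the timeout unchanged when new submissions were seen are all assumptions.

```cpp
#include <cstdint>

// Hypothetical helper: compute the next controller timeout in microseconds.
// The timeout grows only when no new submissions were made since the last
// sleep, and it never exceeds maxUs.
constexpr std::uint64_t nextTimeoutUs(std::uint64_t currentUs, std::uint64_t maxUs,
                                      bool newSubmissionsSinceLastSleep,
                                      std::uint64_t growthFactor) {
    if (newSubmissionsSinceLastSleep || growthFactor <= 1u) {
        return currentUs; // assumption: leave the timeout unchanged
    }
    // Overflow-safe check for currentUs * growthFactor > maxUs.
    if (currentUs > maxUs / growthFactor) {
        return maxUs;
    }
    return currentUs * growthFactor;
}
```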
Related-To: NEO-7878
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Add debug key LogZEInfo for logging ZE Info from zebin elf.
ZE Info will be dumped to a file (default igdrcl.log)
Related-To: NEO-7895
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
Add new regkey KMDSupportForCrossTileMigrationPolicy
(disabled by default, in absence of KMD support for cross-tile migrations)
to control placement of shared allocation and memory prefetch behavior.
Related-To: NEO-7885
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
Add the regkey ForceMemoryPrefetchForKmdMigratedSharedAllocations
to force memory prefetch of kmd-migrated shared allocation
in zeCommandQueueExecuteCommandLists().
Related-To: NEO-7841
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
Apply the KMD advise with preferred device location for KMD-migrated
shared allocation to migrate to lmem on every GPU page fault
(default KMD migration policy).
Related-To: NEO-7851
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
Do not make indirect allocations resident if kernel does not use
indirect access.
For both level zero and opencl.
Currently disabled by default, enable with debug flag
DetectIndirectAccessInKernel
Related-To: NEO-7712
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Related-To: LOCI-3884
- Added check for valid device properties stype to remove the feature
specific debug vars that enabled/disabled reading of the pNext.
- Requires applications to properly set the device properties stype
in order for the pNext to be read for extensions.
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
If applications call Prefetch APIs, like
zeCommandListAppendMemoryPrefetch and
clEnqueueMigrateMemINTEL, then enable the use of KMD calls
by default.
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
to guarantee that all sub-blits of the previous copy have completed;
affects xe hpg
Related-To: NEO-7450
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
Current support in the stack does not allow for concurrent access to
shared-allocations from host and peer devices when using page-faults.
So disable caps for now and introduce debug key for experimentation.
Access will be added by default as support in the stack becomes
available.
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
This change introduces an extension to query the device IP version for L0,
which corresponds to the PRODUCT_CONFIG value.
For OCL, the old mechanism is maintained with a debug flag,
and the default behavior has been unified with L0.
Signed-off-by: Daria Hinz <daria.hinz@intel.com>
Related-To: NEO-7735
- both drivers: OpenCL and LevelZero cannot be debugged within single
process
Related-To: NEO-7025
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
When env variable is set, then copies are always done on CPU.
Change the logic of CPU copy to make sure we lock if targeting device memory.
Related-To: NEO-7564
Signed-off-by: Michal Mrozek <michal.mrozek@intel.com>
All problems with single address space mode have
been resolved and this Debug Key is no longer needed.
Related-to: NEO-7191
Signed-off-by: Yates, Brandon <brandon.yates@intel.com>
PVC platform with no support for atomic operations on system memory
must always allocate buffers in local memory to avoid atomic access violation.
Note: the feature is being implemented under the new registry key
AllocateBuffersInLocalMemoryForMultiRootDeviceContexts (disabled by default)
Related-To: NEO-7092
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
Heuristic is enabled by default
to disable, set:
AdjustThreadGroupDispatchSize=0
Related-To: NEO-6989
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Improves performance in workloads that create small opencl buffers.
To enable, set env var ExperimentalSmallBufferPoolAllocator=1
Known issues (will be addressed in further commits):
- cannot create subBuffer from such buffer
- pool buffer allocation should be reused
Related-To: NEO-7332
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
- Added Support for reading the Device LUID of the given device used in
Windows WDDM given EnableL0ReadLUIDExtension=1.
- Added initial support for passing back the NodeMask of 1.
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
This change is intended to be used in immediate command lists that are
using flush task functionality.
With this change all immediate command lists using the same csr will consume
shared allocations for dsh and ssh heaps. This will decrease the number of SBA
commands dispatched when multiple command lists coexist and dispatch kernels.
With this change new SBA command should be dispatched only when current heap
allocation is exhausted.
Functionality is currently disabled and available under debug key.
Functionality will be enabled by default for all immediate command lists
with flush task functionality enabled.
Related-To: NEO-7142
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Improve performance by binding the command buffer together with other
allocations if VM_BIND feature is available. Remove the legacy
flag PassBoundBOToExec from DebugManager to simplify the logic.
Adapt unit tests and reuse handy macros to generate proxy mock-methods.
Related-To: NEO-7348
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
Use cpu copy with locked pointer if possible,
because this is faster than copy on gpu.
Limit to buffers of size at most 64KB.
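The selection rule can be sketched as follows. The names are hypothetical, and "if possible" is reduced to a single boolean for whether the pointer can be locked, which is a simplification.

```cpp
#include <cstddef>

// Buffers larger than this are copied on the GPU.
constexpr std::size_t maxCpuCopyBufferSize = 64u * 1024u;

// Prefer the CPU path only for small buffers whose pointer can be locked.
constexpr bool preferCpuCopy(std::size_t bufferSize, bool canLockPointer) {
    return canLockPointer && bufferSize <= maxCpuCopyBufferSize;
}
```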
Related-To: NEO-7332
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Related-To: NEO-7237
Enable copy on cpu by default.
This commit also changes barrierCounter to bool
barrierCalled
Signed-off-by: Szymon Morek <szymon.morek@intel.com>
This change reflects the exact nature of the debug variable and what the code
is actually doing
Related-To: NEO-7187
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
- enable tile attach mode by default
- both root device and subdevice may be attached to
Related-To: NEO-7347
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Related-To: NEO-7237
If size is small enough, it is more efficient to
perform copy through locked ptr on CPU.
This change also introduces experimental flag to
enable this.
Signed-off-by: Szymon Morek <szymon.morek@intel.com>