Do not make indirect allocations resident if the kernel does not use
indirect access.
Applies to both Level Zero and OpenCL.
Currently disabled by default; enable with the debug flag
DetectIndirectAccessInKernel.
Related-To: NEO-7712
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Related-To: LOCI-3884
- Added check for valid device properties stype to remove the feature
specific debug vars that enabled/disabled reading of the pNext.
- Requires applications to properly set the device properties stype
in order for the pNext to be read for extensions.
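
A minimal sketch of the stype check described above. It uses hypothetical stand-in types (`StructureType`, `BaseProperties`, `DeviceProperties`, `ExampleExtension`) and a hypothetical helper `fillExtensions` rather than the real Level Zero headers or driver code; the point is only that the pNext chain is walked when the caller has set the top-level stype correctly, and skipped otherwise.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the Level Zero structures; names are illustrative only.
enum class StructureType : std::uint32_t { Unknown = 0, DeviceProperties = 1, ExampleExtension = 2 };

struct BaseProperties {
    StructureType stype;
    void *pNext;
};

struct DeviceProperties {
    StructureType stype;
    void *pNext;
    std::uint32_t deviceId;
};

struct ExampleExtension {
    StructureType stype;
    void *pNext;
    std::uint32_t value;
};

// Fills extension structs chained through pNext, but only if the caller
// initialized stype on the top-level struct. Returns the number of
// extensions that were filled.
inline int fillExtensions(DeviceProperties *props) {
    assert(props != nullptr);
    if (props->stype != StructureType::DeviceProperties) {
        return 0; // stype not set properly: pNext cannot be trusted, do not read it
    }
    int filled = 0;
    auto *ext = static_cast<BaseProperties *>(props->pNext);
    while (ext != nullptr) {
        if (ext->stype == StructureType::ExampleExtension) {
            reinterpret_cast<ExampleExtension *>(ext)->value = 42u; // placeholder payload
            ++filled;
        }
        ext = static_cast<BaseProperties *>(ext->pNext);
    }
    return filled;
}
```

With this shape, an application that leaves stype unset gets the base properties only, and its chained extension structs are left untouched.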
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
If applications call Prefetch APIs, like
zeCommandListAppendMemoryPrefetch and
clEnqueueMigrateMemINTEL, then enable the use of KMD calls
by default.
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
Guarantees that all subblt operations from the previous copy have completed.
Affects Xe HPG.
Related-To: NEO-7450
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
Current support in the stack does not allow for concurrent access to
shared-allocations from host and peer devices when using page-faults.
So disable caps for now and introduce debug key for experimentation.
Access will be added by default as support in the stack becomes
available.
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
This change introduces an extension to query the device IP version for L0,
which corresponds to the PRODUCT_CONFIG value.
For OCL, the old mechanism is maintained with a debug flag,
and the default behavior has been unified with L0.
Signed-off-by: Daria Hinz <daria.hinz@intel.com>
Related-To: NEO-7735
- both drivers, OpenCL and LevelZero, cannot be debugged within a single
process
Related-To: NEO-7025
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
When env variable is set, then copies are always done on CPU.
Change the logic of CPU copy to make sure we lock if targeting device memory.
Related-To: NEO-7564
Signed-off-by: Michal Mrozek <michal.mrozek@intel.com>
All problems with single address space mode have
been resolved and this Debug Key is no longer needed.
Related-to: NEO-7191
Signed-off-by: Yates, Brandon <brandon.yates@intel.com>
PVC platform with no support for atomic operations on system memory
must always allocate buffers in local memory to avoid atomic access violation.
Note: the feature is being implemented under the new registry key
AllocateBuffersInLocalMemoryForMultiRootDeviceContexts (disabled by default)
Related-To: NEO-7092
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
Heuristic is enabled by default
to disable, set:
AdjustThreadGroupDispatchSize=0
Related-To: NEO-6989
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Improves performance in workloads that create small opencl buffers.
To enable, set env var ExperimentalSmallBufferPoolAllocator=1
Known issues (will be addressed in further commits):
- cannot create subBuffer from such buffer
- pool buffer allocation should be reused
Related-To: NEO-7332
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
- Added support for reading the device LUID of the given device used in
Windows WDDM when EnableL0ReadLUIDExtension=1.
- Added initial support for passing back the NodeMask of 1.
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
This change is intended to be used in immediate command lists that are
using flush task functionality.
With this change all immediate command lists using the same csr will consume
shared allocations for dsh and ssh heaps. This will decrease the number of SBA
commands dispatched when multiple command lists coexist and dispatch kernels.
With this change a new SBA command should be dispatched only when the current
heap allocation is exhausted.
Functionality is currently disabled and available under debug key.
Functionality will be enabled by default for all immediate command lists
with flush task functionality enabled.
Related-To: NEO-7142
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Improve performance by binding the command buffer together with other
allocations if VM_BIND feature is available. Remove the legacy
flag PassBoundBOToExec from DebugManager to simplify the logic.
Adapt unit tests and reuse handy macros to generate proxy mock-methods.
Related-To: NEO-7348
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
Use CPU copy with a locked pointer if possible,
because this is faster than copying on the GPU.
Limited to buffers of at most 64KB.
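
A minimal sketch of the selection rule stated above, assuming the decision depends only on the buffer size and on whether the pointer can be locked. The names `maxBufferSizeForCpuCopy` and `preferCpuCopy` are hypothetical and not taken from the driver.

```cpp
#include <cassert>
#include <cstddef>

// Size limit taken from the commit message (64KB).
constexpr std::size_t maxBufferSizeForCpuCopy = 64 * 1024;

// Returns true when the copy should go through a locked pointer on the CPU.
inline bool preferCpuCopy(std::size_t bufferSize, bool pointerCanBeLocked) {
    return pointerCanBeLocked && bufferSize <= maxBufferSizeForCpuCopy;
}
```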
Related-To: NEO-7332
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Related-To: NEO-7237
Enable copy on cpu by default.
This commit also changes barrierCounter to bool
barrierCalled
Signed-off-by: Szymon Morek <szymon.morek@intel.com>
This change reflects the exact nature of the debug variable and what the code
is actually doing
Related-To: NEO-7187
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
- enable tile attach mode by default
- both root device and subdevice may be attached to
Related-To: NEO-7347
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Related-To: NEO-7237
If the size is small enough, it is more efficient to
perform the copy through a locked ptr on the CPU.
This change also introduces an experimental flag to
enable this.
Signed-off-by: Szymon Morek <szymon.morek@intel.com>
This optimization removes pipeline select from the command list preamble
and presents it to the command queue for the necessary state update.
The code is disabled by default and available under a debug key.
Related-To: NEO-5019
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
- Enabled default setting of Program & Global Symbols to be generated by
IGC when building L0 Modules, with the ability to fall back to previous
behavior through build failure checks.
- Enabled selective disable of default program or global symbol
generation through debug variables.
Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
This change gives fine-grained control over front end configuration for each
kernel.
It makes it possible to inject an FE command in the command queue and return to
the exact place in the command list.
Programming commands in the queue removes the need to patch commands in command
lists, which is a costly operation.
It also allows programming context information for each command list.
Related-To: NEO-5019
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
- Enabled default setting of Program & Global Symbols to be generated by
IGC when building L0 Modules, with the ability to fall back to previous
behavior through build failure checks.
- Enabled selective disable of default program or global symbol
generation through debug variables.
Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
Introduce a debug variable to control which engines
the transfer will be split into
Related-To: NEO-7173
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
Certain platforms might not require prefetcher to
be disabled in direct submission. This change
provides a way to control that behaviour.
Signed-off-by: Rafal Maziejuk <rafal.maziejuk@intel.com>
Related-To: NEO-7218
optimization available under flag
ForceCsrLockInBcsEnqueueOnlyForGpgpuSubmission
Related-To: NEO-7011
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
New debug toggle disables limitation of work-group count for related queries.
Additionally OverrideMaxWorkGroupCount toggle was updated
to behave the same way, ignoring underlying engine type
when max-work group count is queried.
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
EOT WA requires allocating last 64KB of kernel heap and putting EOT
signature at the last 16 bytes of kernel heap
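
The offsets implied above can be sketched as follows. The heap size is passed in as a parameter because the source does not state it; `EotLayout` and `computeEotLayout` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Sizes taken from the commit message.
constexpr std::uint64_t eotReservedSize = 64 * 1024; // last 64KB of the kernel heap
constexpr std::uint64_t eotSignatureSize = 16;       // last 16 bytes of the kernel heap

struct EotLayout {
    std::uint64_t reservedOffset;  // start of the reserved 64KB region
    std::uint64_t signatureOffset; // start of the 16-byte EOT signature
};

// Computes heap-relative offsets for the reserved region and the signature.
inline EotLayout computeEotLayout(std::uint64_t heapSize) {
    assert(heapSize >= eotReservedSize);
    return {heapSize - eotReservedSize, heapSize - eotSignatureSize};
}
```

The signature therefore sits inside the reserved region, at its very end.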
Related-To: NEO-7099
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
The new regkey is EnableUsmConcurrentAccessSupport that takes a bitmask
with usm capabilities to enable concurrent access for (bit0: host, bit1: device,
bit2: shared single-device, bit3: shared cross-device, bit4: shared system)
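
A minimal sketch of how such a bitmask could be decoded. The bit positions follow the list above; the enum, the helper name, and the treatment of a negative value as "regkey not set" are assumptions, not the driver's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Bit positions as listed in the commit message; the enum itself is hypothetical.
enum class UsmConcurrentAccessBit : std::uint32_t {
    Host = 1u << 0,
    Device = 1u << 1,
    SharedSingleDevice = 1u << 2,
    SharedCrossDevice = 1u << 3,
    SharedSystem = 1u << 4,
};

// Returns true when the given capability bit is set in the regkey value.
// A negative value is treated here as "regkey not set" (an assumption).
inline bool isConcurrentAccessEnabled(std::int32_t regkeyValue, UsmConcurrentAccessBit bit) {
    if (regkeyValue < 0) {
        return false;
    }
    return (static_cast<std::uint32_t>(regkeyValue) & static_cast<std::uint32_t>(bit)) != 0u;
}
```

Under this reading, EnableUsmConcurrentAccessSupport=5 would select host (bit0) and shared single-device (bit2).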
Related-To: NEO-6733
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
The new regkey is aimed to test cross-tile migration for buffers,
esp. first touch policy on h/w with support for page faults.
Related-To: NEO-6977
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
Related-To: NEO-7003
Add function to control l1 policy for both
stateless and surface state cache.
Signed-off-by: Szymon Morek <szymon.morek@intel.com>
- Added EnableProgramSymbolTableGeneration to enable or disable default
behavior for IGC to generate the program symbol tables for L0 modules
with exported functions.
- Default value set to true to add -library-compilation to all module
builds.
Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
With flag enabled, when app calls freeSVMAlloc on device usm allocation,
don't free it immediately but save it,
and try to use it on subsequent allocations.
This allocation cache will be trimmed if an allocation fails.
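
The caching behavior described above can be sketched roughly as below. This is not the driver's implementation: `AllocationCache` and `allocateWithCache` are hypothetical names, `std::malloc`/`std::free` stand in for device USM allocation, and reusing any cached allocation of at least the requested size is an assumed matching policy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>

// Hypothetical cache of freed allocations, keyed by size.
class AllocationCache {
  public:
    // Called instead of freeing: keep the allocation for later reuse.
    void insert(void *ptr, std::size_t size) {
        assert(ptr != nullptr);
        cached.emplace(size, ptr);
    }

    // Returns a cached allocation of at least `size` bytes, or nullptr.
    void *get(std::size_t size) {
        auto it = cached.lower_bound(size);
        if (it == cached.end()) {
            return nullptr;
        }
        void *ptr = it->second;
        cached.erase(it);
        return ptr;
    }

    // Releases everything held by the cache.
    void trim() {
        for (auto &entry : cached) {
            std::free(entry.second);
        }
        cached.clear();
    }

    std::size_t count() const { return cached.size(); }

  private:
    std::multimap<std::size_t, void *> cached;
};

// Allocation path: try the cache, then the allocator; on failure trim and retry once.
inline void *allocateWithCache(AllocationCache &cache, std::size_t size) {
    if (void *ptr = cache.get(size)) {
        return ptr;
    }
    void *ptr = std::malloc(size);
    if (ptr == nullptr) {
        cache.trim();
        ptr = std::malloc(size);
    }
    return ptr;
}
```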
Related-To: NEO-6893
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Add support for PRELIM_I915_GEM_CREATE_EXT_VM_PRIVATE extension
to create VM_PRIVATE BOs.
Related-to: NEO-6730
Signed-off-by: Naklicki, Mateusz <mateusz.naklicki@intel.com>
This commit introduces debug variable to override device name reported
by CL_DEVICE_NAME property in OpenCL and ze_device_properties_t.name in
level_zero
Signed-off-by: Pawel Wilma <pawel.wilma@intel.com>
Add debug flag FailBuildProgramWithStatefulAccess, which makes it
possible to fail program build/module creation
when stateful access is used (except builtins) on
PVC and later platforms.
Related-To: NEO-6075
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
- new feature, enabled with PRELIM build
- implementation of debug session for linux
- move ResourceClass enum from Drm to drm_debug.h
Resolves: NEO-6814
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Add debug variable ForceGrfNumProgrammingWithScm.
Do not update large grf value in StreamProperties when unnecessary.
Related-To: NEO-6659
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
require 48bit resource for ring/semaphore buffer
for multi tile allocations select first tile
for single tile allocation select preferred tile
Related-To: NEO-6698
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
In direct submission scenario command/ring/semaphore buffer allocations
are placed in the same memory bank to ensure that their memory is updated in
correct order
Related-To: NEO-6698
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
This change introduces a new flag called DisableGpuHangDetection.
The flag is off by default, so hang detection stays active.
To disable hang checking, set the flag to true.
Related-To: NEO-6681
Signed-off-by: Patryk Wrobel <patryk.wrobel@intel.com>
This feature is disabled by default, controlled with the knob
AppendMemoryPrefetchForKmdMigratedSharedAllocations
Related-To: NEO-6740
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
Related-To: NEO-6075
After this change driver will fail clBuildProgram/zeModuleCreate api calls
whenever stateful access is discovered on PVC.
This is required since in this case allocation greater than 4GB
will not work.
If the user still wants to use stateful addressing mode, the
-cl-opt-smaller-than-4GB-buffers-only / -ze-opt-smaller-than-4GB-buffers-only
build option should be passed, but then the user cannot use
buffers greater than 4GB.
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
Rename:
- debug flag ProgramPipeControlPriorToNonPipelinedStateCommand
to ProgramExtendedPipeControlPriorToNonPipelinedStateCommand
- local variables
Related-To: NEO-6615
Signed-off-by: Krzysztof Gibala <krzysztof.gibala@intel.com>
Forces an extended buffer size in L0 USM calls by adding the number of
pages given by the debug flag when the flag is >= 1
Usage:
ForceExtendedUSMBufferSize=2
size += (2 * pageSize)
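
The usage line above amounts to the following arithmetic. This is a small sketch; `applyForcedExtension` is a hypothetical helper, and the page size is passed in rather than queried from the system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// `flagValue` mirrors the debug flag; values below 1 leave the size unchanged.
inline std::size_t applyForcedExtension(std::size_t size, std::int32_t flagValue, std::size_t pageSize) {
    if (flagValue >= 1) {
        size += static_cast<std::size_t>(flagValue) * pageSize;
    }
    return size;
}
```

With a 4096-byte page and a flag value of 2, a 1000-byte request becomes 1000 + 8192 bytes.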
Signed-off-by: Krzysztof Gibala <krzysztof.gibala@intel.com>
Forces an extended buffer size by adding the number of pages given by the
debug flag when the flag is >= 1 in:
- clHostMemAllocINTEL
- clDeviceMemAllocINTEL
- clSharedMemAllocINTEL
Usage:
ForceExtendedUSMBufferSize=2
size += (2 * pageSize)
Signed-off-by: Krzysztof Gibala <krzysztof.gibala@intel.com>
FilterBdfPath is used only on Linux as a filter for BDF
when opening from /dev/dri/by-path
FilterDeviceId is used on both OSes as a filter for device id
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
This patch makes PCIe BDF the default method for UUID
calculation.
Related-To: LOCI-2909
Signed-off-by: Joshua Santosh Ranjan <joshua.santosh.ranjan@intel.com>
Group small allocations and reuse mapped memory in order to keep map
count small.
Related-To: NEO-6417
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
Forces an extended buffer size by adding the number of pages given by the
debug flag when the flag is >= 1 in:
- clCreateBuffer
- clCreateBufferWithProperties
- clCreateBufferWithPropertiesINTEL
Usage:
ForceExtendedBufferSize=2
size += (2 * pageSize)
Signed-off-by: Krzysztof Gibala <krzysztof.gibala@intel.com>
- new flag ExperimentalEnableSourceLevelDebugger that
allows communication with debugger library
Related-To: NEO-6514
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Currently the only supported scenario is a single in-order queue.
Instead of resolving dependencies via semaphores, do this with pipe controls.
Signed-off-by: Michal Mrozek <michal.mrozek@intel.com>
- update crossthreaddata size according to argument offsets
when processing patchtoken binary when DATA PARAMETER STREAM SIZE
is lower than size required for arguments
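
A minimal sketch of the size adjustment described above, assuming each argument is described by an offset and a size within cross-thread data. `ArgDescriptor` and `requiredCrossThreadDataSize` are hypothetical names, not the driver's types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one kernel argument inside cross-thread data.
struct ArgDescriptor {
    std::uint32_t offset;
    std::uint32_t size;
};

// Returns the reported stream size, grown if any argument would end past it.
inline std::uint32_t requiredCrossThreadDataSize(std::uint32_t dataParameterStreamSize,
                                                 const std::vector<ArgDescriptor> &args) {
    std::uint32_t required = dataParameterStreamSize;
    for (const auto &arg : args) {
        required = std::max(required, arg.offset + arg.size);
    }
    return required;
}
```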
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Introduce the debug regkey OverrideMocsIndexForScratchSpace
to control MOCS index in surface state for scratch space
Related-To: NEO-6509
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
Introduce the regkey OverrideL1CacheControlInSurfaceStateForScratchSpace
to control cache policy in surface state for scratch space
Related-To: NEO-3227
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
Related-To: NEO-6075
After this change the driver will fail clBuildProgram/zeModuleCreate api calls
whenever stateful access is discovered and the device has shared system usm caps
enabled. This is required since in this case allocations greater than 4GB
will not work.
If the user still wants to use stateful addressing mode, the
-cl-opt-smaller-than-4GB-buffers-only / -ze-opt-smaller-than-4GB-buffers-only
build option should be passed, but then the user cannot use
buffers greater than 4GB.
For now, it will stay enabled with debug key, to allow
customers easier transition to this model. This may be
reenabled by default after customers feel their code
is ready for it.
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
Rename ProgramAdditionalPipeControlBeforeStateComputeModeCommand to
ProgramPipeControlPriorToNonPipelinedStateCommand
Related-To: NEO-6056
Signed-off-by: Krzysztof Gibala <krzysztof.gibala@intel.com>