PVC platform with no support for atomic operations on system memory
must always allocate buffers in local memory to avoid atomic access violation.
Note: the feature is being implemented under the new registry key
AllocateBuffersInLocalMemoryForMultiRootDeviceContexts (disabled by default)
Related-To: NEO-7092
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
Heuristic is enabled by default
to disable, set:
AdjustThreadGroupDispatchSize=0
Related-To: NEO-6989
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Improves performance in workloads that create small opencl buffers.
To enable, set env var ExperimentalSmallBufferPoolAllocator=1
Known issues (will be addressed in further commits):
- cannot create subBuffer from such buffer
- pool buffer allocation should be reused
Related-To: NEO-7332
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
- Added Support for reading the Device LUID of the given device used in
Windows WDDM given EnableL0ReadLUIDExtension=1.
- Added inital support for passing back the NodeMask of 1.
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
This change is intended to be used in immediate command lists that are
using flush task functionality.
With this change all immediate command list using the same csr will consume
shared allocations for dsh and ssh heaps. This will decrease number of SBA
commands dispatched when multiple command lists coexists and dispatch kernels.
With this change new SBA command should be dispatched only when current heap
allocation is exhausted.
Functionality is currently disabled and available under debug key.
Functionality will be enabled by default for all immediate command lists
with flush task functionality enabled.
Related-To: NEO-7142
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Improve performance by binding the command buffer together with other
allocations if VM_BIND feature is available. Remove the legacy
flag PassBoundBOToExec from DebugManager to simplify the logic.
Adapt unit tests and reuse handy macros to generate proxy mock-methods.
Related-To: NEO-7348
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
use cpu copy with locked pointer if possible
because this is faster than copy on gpu
limit to buffers of size at most 64kb
Related-To: NEO-7332
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Related-To: NEO-7237
Enable copy on cpu by default.
This commit also changes barrierCounter to bool
barrierCalled
Signed-off-by: Szymon Morek <szymon.morek@intel.com>
This change reflects exact nature of debug variable and what is code
actually doing
Related-To: NEO-7187
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
- enable tile attach mode by default
- both root device and subdevice may be attached to
Related-To: NEO-7347
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Related-To: NEO-7237
If size is small enough, it is more efficient to
perform copy through locked ptr on CPU.
This change also introduces experimental flag to
enable this.
Signed-off-by: Szymon Morek <szymon.morek@intel.com>
This optimization removes pipeline select from command list preamble
and presented to command queue for necessary state update.
Code is disabled by default and available under debug key.
Related-To: NEO-5019
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
- Enabled default setting of Program & Global Symbols to be generated by
IGC when building L0 Modules with the ability to fallback to previous
behavior thru build failure checks.
- Enabled selective disable of default program or global symbol
generation thru debug variables.
Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
This change gives fine grain control over front end configuration for each
kernel.
As it gives possible to inject FE command in command queue and return to exact
place in command list.
Programming commands in queue makes patching commands in command lists
not needed as that operation is costly.
And it allows to program context information for each command list too.
Related-To: NEO-5019
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
- Enabled default setting of Program & Global Symbols to be generated by
IGC when building L0 Modules with the ability to fallback to previous
behavior thru build failure checks.
- Enabled selective disable of default program or global symbol
generation thru debug variables.
Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
Introduce debug variable to control which engines
the tranfser will be split into
Related-To: NEO-7173
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
Certain platforms might not require prefetcher to
be disabled in direct submission. This change
provides a way to control that behaviour.
Signed-off-by: Rafal Maziejuk <rafal.maziejuk@intel.com>
Related-To: NEO-7218
This commit adds support for ZEX_NUMBER_OF_CCS flag which can be used
for limiting number of CCS engines
Format is as follows:
ZEX_NUMBER_OF_CCS=RootDeviceIndex:NumberOfCCS;RootDeviceIndex:NumberOfCCS...
i.e. setting Root Device Index 0 to 4 CCS, and Root Device Index 1 To 1 CCS
ZEX_NUMBER_OF_CCS=0:4,1:1
Related-To: NEO-7195
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
optimization available under flag
ForceCsrLockInBcsEnqueueOnlyForGpgpuSubmission
Related-To: NEO-7011
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
New debug toggle disables limitation of work-group count for related queries.
Additionally OverrideMaxWorkGroupCount toggle was updated
to behave the same way, ignoring underlying engine type
when max-work group count is queried.
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
EOT WA requires allocating last 64KB of kernel heap and putting EOT
signature at the last 16 bytes of kernel heap
Related-To: NEO-7099
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
The new regkey is EnableUsmConcurrentAccessSupport that takes a bitmask
with usm capabilities to enable concurrent access for (bit0: host, bit1: device,
bit2: shared single-device, bit3: shared cross-device, bit4: shared system)
Related-To: NEO-6733
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
The new regkey is aimed to test cross-tile migration for buffers,
esp. first touch policy on h/w with support for page faults.
Related-To: NEO-6977
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
Related-To: NEO-7003
Add function to control l1 policy for both
stateless and surface state cache.
Signed-off-by: Szymon Morek <szymon.morek@intel.com>
- Added EnableProgramSymbolTableGeneration to enable or disable default
behavior for IGC to generate the program symbol tables for L0 modules
with exported functions.
- Default value set to true to add -library-compilation to all module
builds.
Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
Most customers use this already, so enable by default to have
consistent enumeration of devices.
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
With flag enabled, when app calls freeSVMAlloc on device usm allocation,
don't free it immediately but save it,
and try to use it on subsequent allocations.
This allocation cache will be trimmed if an allocation fails.
Related-To: NEO-6893
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Add support for PRELIM_I915_GEM_CREATE_EXT_VM_PRIVATE extension
to create VM_PRIVATE BOs.
Related-to: NEO-6730
Signed-off-by: Naklicki, Mateusz <mateusz.naklicki@intel.com>
Most customers use this already, so enable by default to have
consistent enumeration of devices.
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
This commit introduces debug variable to override device name reported
by CL_DEVICE_NAME property in OpenCL and ze_device_properties_t.name in
level_zero
Signed-off-by: Pawel Wilma <pawel.wilma@intel.com>
I've added debug flag FailBuildProgramWithStatefulAccess which makes
possible to fail build program/module creation
with stateful access(except builtins) on
pvc and later platforms.
Related-To: NEO-6075
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
- new feature, enabled with PRELIM build
- implementation of debug session for linux
- move ResourceClass enum from Drm to drm_debug.h
Resolves: NEO-6814
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Define single .clang-tidy configuration with all used checks and use
NOLINT to selectively silence tool. That way cleanup should be easier.
third_part/ has its own configuration that disables clang-tidy for this
folder.
Signed-off-by: Artur Harasimiuk <artur.harasimiuk@intel.com>
Add debug variable ForceGrfNumProgrammingWithScm.
Do not update large grf value in StreamProperties when unnecessary.
Related-To: NEO-6659
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
require 48bit resource for ring/semaphore buffer
for multi tile allocations select first tile
for single tile allocation select preferred tile
Related-To: NEO-6698
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
In direct submission scenario command/ring/semaphore buffer allocations
are placed in the same memory bank to ensure that their memory is updated in
correct order
Related-To: NEO-6698
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
This change introduces the new flag called DisableGpuHangDetection.
By default it is disabled. When someone wants to disable hang checking,
then this flag can be set to true.
Related-To: NEO-6681
Signed-off-by: Patryk Wrobel <patryk.wrobel@intel.com>
This feature is disabled by default, controlled with the knob
AppendMemoryPrefetchForKmdMigratedSharedAllocations
Related-To: NEO-6740
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>