In TBX mode, the host could not write to host buffers after access from device
code due to the lack of a migration mechanism post-initial TBX upload.
Migration is unnecessary with real hardware, but required for TBX.
This patch introduces a new page fault manager type that extends the original
CPU fault manager, enabling automatic migration of host buffers in TBX mode.
Refactoring was necessary to avoid diamond inheritance, achieved by using a
template parameter as the base class for OS-specific fault managers.
Related-To: NEO-12268
Signed-off-by: Jack Myers <jack.myers@intel.com>
Leverage features of the mechanism to simplify implementation:
- The maximum number of possible cache-region reservations is a small
value known at compile-time
- Each reservation is unique (described by `CacheRegion`) so can have
a dedicated entry with either zero (free) or non-zero (reserved) value
So, there is no need for a dynamic collection (unordered_map here) to
keep track of reservations. A simple array is enough for that purpose.
Also, add some helper-code to enable array-indexing with the values of
`CacheRegion` enum.
Related-To: NEO-12837
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
Refactors the STATE_BASE_ADDRESS to align with the latest specification.
Removes redundant functionality for multiple GPU partial writes and
atomics.
Related-To: NEO-13147
Signed-off-by: Vysochyn, Illia <illia.vysochyn@intel.com>
This PR handles the situation in which a component
has reserved a front window space for itself in the external heap,
so that the Compute Runtime cannot access this area.
In such a situation, we perform the following steps:
1. reserve 4GB chunk in heapStandard
2. split our chunk into 2 parts: heapFrontWindow, heapRegular
3. from this point on, map all linearStream allocations in reserved 4GB
chunk
Patch applies to Windows and WSL.
Patch only applies when the bindless global allocator is enabled.
Related-To: HSD-16025889919
Signed-off-by: Fabian Zwoliński <fabian.zwolinski@intel.com>
- move available device calculcation into common helper
- change interface to have code available where no descriptor is available
- expand unit test for implementation of new inteface
Related-To: NEO-13350
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
- split the allocation code from command list or kernel
- allow to call allocation code in all parts of the driver
Related-To: NEO-13350
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Performs minor renaming (mostly capitalization) in order to align with
specification.
Renames L1_CACHE_POLICY to L1_CACHE_CONTROL.
Related-To: NEO-13147
Signed-off-by: Vysochyn, Illia <illia.vysochyn@intel.com>
- change additional size into local region size
- change walk order into dispatch walk order to distinguish for local id walk
Related-To: NEO-13350
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
PrintCalculatedTimestamps - print ts in level zero paths
PrintTimestampPacketContents - add logging also to level zero paths
ForceUseOnlyGlobalTimestamps - force using a global ts
Related-To: HSD-14023527252
Signed-off-by: Katarzyna Cencelewska <katarzyna.cencelewska@intel.com>
L3 bank count should be queried from KMD
L3 bank size should be queried from device blob
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
This adds abbility to load different versions of the backend
compiler based on underlying device.
Related-To: NEO-12747
Signed-off-by: Chodor, Jaroslaw <jaroslaw.chodor@intel.com>
This adds abbility to load different versions of the backend
compiler based on underlying device.
Related-To: NEO-12747
Signed-off-by: Chodor, Jaroslaw <jaroslaw.chodor@intel.com>
- bitset is 64 bit in size, context ids may go beyond that limit
when multiple devices are available
- this change subtracts contextId of first context for a given root
device - tracked state dirty contexts ids are now zero-based
Resolves: GSD-10025
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Add a constant to describe minimal size of SyncBuffer object.
Related-To: HSD-18039952263, HSD-18039940553, HSD-18039937640
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
mid thread preemption can be enabled only by ftrWalkerMTP flag
pre-Xe2 devices doesn't support MTP
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
- prepare bindful ssh when kernel requires ssh heap and
SurfaceStateBaseAddress
- remove lastAppendedKernelBindlesMode - local ssh heap may be needed
for bindless kernels with scratch or misaligned buffer args
- use ssh heap gpu address to program SurfaceStateBaseAddress, global base is
used for BindlessSurfaceState and DynamicState
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
In order to debug on XE, XE_EXEC_QUEUE_SET_PROPERTY_EUDEBUG
needs to be set up.
Related-To: NEO-12691
Signed-off-by: Jitendra Sharma <jitendra.sharma@intel.com>
Allow for different required version for VC compiled kernels.
Define constant for detection disabled.
Related-To: NEO-12491
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Check indirect detection version from igc header for JIT.
Move required version to its own method.
This allows for different required versions per platform.
Related-To: NEO-12491
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Moved pathExists from SysCalls to path.h.
In ULTs, use unchanged pathExists and mock stat, getFileAttributesA instead.
Add Windows and Linux ULTs for pathExists.
Signed-off-by: Fabian Zwoliński <fabian.zwolinski@intel.com>
- always issue flush request before export
Apparently it's expected to flush the object (which might convert them
from one format to another for export, or remove aux buffer uses or
anything not supported by export).
- use modifier to select tiling mode
Previously we just assumed that whatever tiling mode was picked by mesa
will match the one picked by GMMLIB but that's not always the case
and in particular on Arc and Xe it doesn't work ... Mesa picks Tile4
and GMMLIB picks Tile64 ...
Fixes: #761Fixes: #736
Signed-off-by: Sylvain Munaut <tnt@246tNt.com>
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
- if no hp copy engine available, create group with regular and hp
contexts
Related-To: NEO-11983
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Future HW will not support cache reservation uniquely for the whole
platform. Implementation of some functions may vary between products.
Related-To: NEO-10158
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
Related-To: NEO-12197
Currently for new resources user thread must wait before submitting
actual workload. With this commit, instead of waiting on user thread,
request is sent to background ULLS controller thread and additional
semaphore is programmed. ULLS controller will perform actual wait
and signal semaphore when paging fence reaches required value.
Signed-off-by: Szymon Morek <szymon.morek@intel.com>
computeMaxNeededSubSliceSpace is no longer needed as getHighestEnabledSubSlice
already determines maximum index from all enabled subslices
Related-To: NEO-12073
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
when no subslice info provided, assume max subslices per slice are enabled
Related-To: NEO-12073
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
- add support for contect group with HP copy engine
- choose HP copy engine when available
Related-To: NEO-11983
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
- add support for contect group with HP copy engine
- choose HP copy engine when available
Related-To: NEO-11983
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
When BCS3 is not available, use last available copy engine as internal.
Related-To: HSD-18039263936
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
This change enables support for
`cl_intel_subgroup_matrix_multiply_accumulate_tf32` extension for PVC B0
and later.
Related-To: GSD-7825
Signed-off-by: Ratajewski, Andrzej <andrzej.ratajewski@intel.com>
This change enables support for cl_intel_subgroup_buffer_prefetch extension for
PVC and later.
Related-To: GSD-7825
Signed-off-by: Ratajewski, Andrzej <andrzej.ratajewski@intel.com>
This feature enables supported platforms in ocloc even
if not enabled for driver.
Allows sharing single ocloc instance for multiple driver-platform
configurations
Related-To: NEO-10531
Signed-off-by: Chodor, Jaroslaw <jaroslaw.chodor@intel.com>
Move getCachingPolicyOptions method present in existing
cache_policy_dg2_and_later.inl in new
get_caching_policy_options.inl file.
This would help in reusing getCachingPolicyOptions
method in any newly created cache_policy_* file.
Related-To: NEO-8306
Signed-off-by: Jitendra Sharma <jitendra.sharma@intel.com>
Modified to print out error messages to stdout when disable scratch page
is used.
Related-To: GSD-7611
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
This change enables support for cl_intel_subgroup_2d_block_io extension for
PVC and later.
Related-To: GSD-7825
Signed-off-by: Ratajewski, Andrzej <andrzej.ratajewski@intel.com>
When remap is enabled, we must set different base offset for copy engines.
Copy engines must use BCS0 base.
Related-To: NEO-10678
Signed-off-by: Andrzej Koska <andrzej.koska@intel.com>