CmdList can be released before Event. In this case, GfxAllocation
destruction must be deferred.
Related-To: NEO-7966
Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
- this method allocates System Memory
- argument is not needed - ExternalHeap is selected inside this function
- remove unneeded ults
- allocate memory in Device Pool for external heap allocation in
OsAgnosticMemoryManager
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
- introduce 2 reuse pools to bindlessHeapHelper
- one pool stores slots for reuse, second pool stores released slots
- stateCacheDirty flags keep track of state cache - when pools are
switched - flags are set indicating flushing caches is needed after
old slots have been reused for new allocations
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
- program surface states for redescribed images correctly. Image copy
to/from memory are using redescribed surface states,
- refactor state base address programming - program address and size
together, set max size at the beginning due to lack of Enable flag
- set GpuBase in WddmAllocation when external heap is used
- return max ssh required size from kernelInfo or based on stateful args
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Add USER_FENCE before PREFETCH call and after the BIND
Related-To: NEO-8098
Signed-off by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
Signed-off-by: John Falkowski <john.falkowski@intel.com>
- store surface state info for bindless addressing in graphics
allocation
- remove map in BindlessHeapsHelper - bindlessInfo is constant for
the lifetime of an allocation
- program bindless offsets and surface states for images when used in
bindless kernel
- handle ouf of memory on surface state heap - return error
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
The previous name "pageSize2Mb" defined in
shared/source/helpers/constant.h is inconsistent to other variable,
i.e. pageSize64k.
Furthermore, it's a bit misleading because the page size is defined in
Megabytes (MB), not in Megabits (Mb).
Related-to: NEO-7695
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
When creating shared USM, currently default root device index
is used when accessing memoryManager.
This change fixes this issue, by using device provided by caller.
In case device is not provided, then default root device index
could be used.
Related-To: LOCI-4474
Signed-off-by: Jitendra Sharma <jitendra.sharma@intel.com>
Related-To: LOCI-4381
- Enabled support for customers to use full Virtual reservation range
with multiple physical mappings with additional allocations implicitly
included in residency.
- Buffer Surface state size extended for first allocation to stretch to
the bufferSize requested.
Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
- getGlobalBindlessHeapConfiguration() should be used to choose global
alloctor for SSH
- remove not needed and incorrect unit tests
- remove not needed branches
- bindless mode controls bindless compilation only
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Read if support for chunking is available in the KMD.
If available, KMD will create a BO with 1 or more chunks,
depending on the chunk size selected.
Related-To: NEO-7695
Sync to
https://github.com/intel-gpu/drm-uapi-helper/releases/tag/v2.0-rc18
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
Signed-off-by: John Falkowski <john.falkowski@intel.com>
Related-To: LOCI-4176
- Given a Base Pointer passed into Get Peer Allocation, then the base
pointer is used in the map of the new allocation to the virtual memory.
- Enables users to use the same pointer for all devices in Peer To Peer.
- Currently unsupported on reserved memory due to mapped and exec
resiedency of Virtual addresses.
Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
- allocateGraphicsMemoryUsingKmdAndMapItToCpuVA in case of no compression
- allocate32BitGraphicsMemoryImpl in case of allocate by KMD
remove redundant ctor of StorageInfo class
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
Allows the user to use alignments > 64KB in `createUnifiedMemoryAllocation`
So that the restriction in `piextUSMDeviceAlloc` of the DPC++ runtime
could be lifted
Related-To: LOCI-4168
Signed-off-by: Lu, Wenbin <wenbin.lu@intel.com>
use StackVec instead of unordered map
resize container at MemoryManager's creation time
Related-To: NEO-7925
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
in most cases we need to iterate over engines associated to single root device
Related-To: NEO-7925
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
Related-To: LOCI-4172, LOCI-4305, LOCI-4306
- Create a new IPC Memory handle upon call to getIpcMemHandle if the
previous handle has been freed.
- Release the Ipc Memory Handle when zeMemPutIpcHandle is called.
- Create a new IPC Handle for tracking thru zeMemGetAllocProperties
when ze_external_memory_export_fd_t is used.
- Convert FD to opaque IPC handle and IPC Handle to FD.
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
- Check allocation root device index during eviction
- Wait for and marked allocation only from the current root device index
Related-To: NEO-7920
Signed-off-by: Krzysztof Gibala <krzysztof.gibala@intel.com>
The memadvise with preferred location for kmd-migrated shared allocation
is set to device associated with cmd list by default to migrate data
to lmem on non-atomic gpu page fault too (for performance reasons).
Related-To: NEO-7252
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
Extended the regkey ForceMemoryPrefetchForKmdMigratedSharedAllocations
to force meory prefetch of kmd-migrated shared allocation
in clEnqueueNDRangeKernel(), clEnqueueMemFillINTEL, ...
Related-To: NEO-7841
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
Globals surface allocation via USM manager will have correct allocation
type set (instead of just BUFFER) and will use cpu copy when possible.
Related-To: NEO-7796
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Allows the user to use alignments > 64KB in `createUnifiedMemoryAllocation`
So that the restriction in `piextUSMDeviceAlloc` of the DPC++ runtime
could be lifted
Related-To: LOCI-4168
Signed-off-by: Lu, Wenbin <wenbin.lu@intel.com>
Related-To: LOCI-3871
- Relaxed the Virtual Memory Reservation to allow pStart and not fail if
the pStart value is not obtained.
- Moves checks on pStart to the user to check and determine if they want
to re-reserve or use the address allocated.
- Changed reserveGpuAddress to use unit64_t type to allow internal
address range structure assignment without cast.
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
reuse EncodeEnableRayTracing in CommandStreamReceiver
add method to determine need for 48b resource flag for RT allocations
Related-To: NEO-7606
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
to guarantee that all subblt got complete for previous copy
affect xe hpg
Related-To: NEO-7450
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
If user provided not-null hostptr field, then the driver
should try to use it. This change adds omitted functionality,
which handles the described case also in createUnifiedMemoryAllocation().
Related-To: NEO-7600
Signed-off-by: Wrobel, Patryk <patryk.wrobel@intel.com>
Related-To: LOCI-3871
- Enabled allocation of specified base address in the targeted heap.
- Enabled virtual memory reservations to grow by allocating at the start
of the heap vs the end of the heap.
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
When one process had exported and then opened IPC handle
of memory, then close function was called twice for the
same BO handle. It caused debugBreak() and aborted
an application.
This change allows multiple separate BOs to share one
handle. The last shared handle owner calls close() function.
Related-To: NEO-7200
Signed-off-by: Wrobel, Patryk <patryk.wrobel@intel.com>
This commit adds support for parsing SHT_NOBITS zebin's ELF sections
(containing global/constant zero-initialized data).
- Correction: in CTNI path, do not add related symbol if surface has not
been allocated.
Related-To: NEO-7196
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
- Added support for mapping any portion of a virtual allocation to a
physical mapping with a lookup function for reserved virtual addresses.
- Added support for multiple mappings linked to the same virtual
reservation.
- Fixed bug with 64 bit addresses on windows with invalid addresses
passed to the user.
Related-To: LOCI-3904, LOCI-3914, LOCI-3931
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
Add a clearQueueTillFirstFailure interface to DeferredDeleter, which
iterates the queue from the front and delete the allocations in the
queue till a failure. It is called by defer deletion of allocations
occupied by mutliple contexts to unlock the execution in main thread
Related-To: NEO-7532
Signed-off-by: HeFan2017 <fan.f.he@intel.com>
Add a clearQueueTillFirstFailure interface to DeferredDeleter, which
iterates the queue from the front and delete the allocations in the
queue till a failure. It is called by defer deletion of allocations
occupied by mutliple contexts to unlock the execution in main thread
Related-To: NEO-7532
Signed-off-by: HeFan2017 <fan.f.he@intel.com>
Add a clearQueueTillFirstFailure interface to DeferredDeleter, which
iterates the queue from the front and delete the allocations in the
queue till a failure. It is called by defer deletion of allocations
occupied by mutliple contexts to unlock the execution in main thread.
Related-To: NEO-7532
Signed-off-by: HeFan2017 <fan.f.he@intel.com>
Confirm the allocations used in an appendMemoryCopy operation
belong to the same context as the list.
Related-To: LOCI-1996
Resolves: NEO-6162
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
This patch force KMD allocation path for USM shared
Additionally we force 64kb page from lock which is
required to properly program GPU VA
Related-To: NEO-6913
Signed-off-by: Kamil Diedrich kamil.diedrich@intel.com
Confirm the allocations used in an appendMemoryCopy operation
belong to the same context as the list.
Related-To: LOCI-1996
Resolves: NEO-6162
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
This to follow specification, which says:
zeMemOpenIpcHandle:
- Multiple calls to this function with the same IPC handle will return
unique pointers.
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
- Enable support for L0 Virtual Memory reservation on Linux and Windows.
- Excludes support for Linux to allow pStart option
Related-To: LOCI-3397, LOCI-1543
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
If a handle cannot be obtained, like PRIME_HANDLE_TO_FD, then
properly check for the error and propagate it upwards.
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
Ensure memory prefetch be applied in every execution of command list.
Related-To: NEO-6740
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
This patch force KMD allocation path for USM shared
Additionally we force 64kb page from lock which is
required to properly program GPU VA
Related-To: NEO-6913
Signed-off-by: Kamil Diedrich kamil.diedrich@intel.com
If a handle cannot be obtained, like PRIME_HANDLE_TO_FD, then
properly check for the error and propagate it upwards.
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
Allow to: disable performance hints, make allocation lockable
Used in BufferPoolAllocator
Related-To: NEO-7332
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Allocations of buffers <= 64KB will be lockable, to
allow copying through locked pointer.
Related-To: NEO-7332
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
PVC platform with no support for atomic operations on system memory
must always allocate buffers in local memory to avoid atomic access violation.
Note: the feature is being implemented under the new registry key
AllocateBuffersInLocalMemoryForMultiRootDeviceContexts (disabled by default)
Related-To: NEO-7092
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
This change replaces unneeded copying of std::vectors
with usage of const references. Furthermore, it adds
reserve() call before filling the container via push_back().
Signed-off-by: Patryk Wrobel <patryk.wrobel@intel.com>
Discarding RAII lock returned from function almost always
is a bug. This change introduces usage of [[no_discard]]
attribute from C++17 to prevent such misues.
Signed-off-by: Patryk Wrobel <patryk.wrobel@intel.com>
Do not use aligned size when storing allocation
Trim allocation cache before deleting devices
Related-To: NEO-6893
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
- Properly check for IPC event handle flag to determine if the event
pool memory is sharable between processes.
- Given Host Visible Event Pool, a check is done to determine if the
Host memory can be shared between the processes.
- Enabled handling if Event Host Memory is shareable for DRM
- If Event Pool Memory is Not shareable, then retrieving the IPC Event
Pool Handle returns unsupported.
Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
Add virtual deconstructor to OsHandle and deconstructor to OsHandleLinux
Add override keyword to destructor
Add overriding deconstructor to OsHandleWin
Add newline before private members
https://github.com/intel/compute-runtime/pull/550
Signed-off-by: Cameron S Murtagh <cameron.murtagh00@gmail.com>
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
EOT WA requires allocating last 64KB of kernel heap and putting EOT
signature at the last 16 bytes of kernel heap
Related-To: NEO-7099
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
Remove the restriction on USM allocation created in a single local memory region
with latest KMD fix for cross tile migration thrashing b/t lmem (dii-3516)
Related-To: NEO-6909
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
This commit fixes problem with untransfered shared usm memory to gpu
when there is submit to gpu trigerred by user event. Also there is a fix
for dead lock problem caused by mixed orders of locking mutexes in csr
and in direct submission controller.
Related-To: NEO-6762
Signed-off-by: Maciej Plewka <maciej.plewka@intel.com>
With this commit on DG2 32bit driver will check if passed host ptr for
clEnqueueReadBuffer is write combined memory. If check will be true copy
will be make on CPU.
Signed-off-by: Maciej Plewka <maciej.plewka@intel.com>
With flag enabled, when app calls freeSVMAlloc on device usm allocation,
don't free it immediately but save it,
and try to use it on subsequent allocations.
This allocation cache will be trimmed if an allocation fails.
Related-To: NEO-6893
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Use enum class for MemoryPool in GraphicsAllocation
This change will ensure that GA is constructed in the proper way
- Rename namespace for isSystemMemoryPool method
- Add method getMemoryPoolString for logging actual pool which is in used
- Remove wrong pattern in GraphicsAllocation constructor
Related-To: NEO-6523
Signed-off-by: Krzysztof Gibala <krzysztof.gibala@intel.com>
when using implicit scaling, 2 dma-buf handles, one per tile, are
needed to support dma access from peer.
Related-To: LOCI-3122
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
Ensure KMD migrations for USM allocations to occur between smem and lmem only
Related-To: NEO-6969
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
Define single .clang-tidy configuration with all used checks and use
NOLINT to selectively silence tool. That way cleanup should be easier.
third_part/ has its own configuration that disables clang-tidy for this
folder.
Signed-off-by: Artur Harasimiuk <artur.harasimiuk@intel.com>
When using implicit scaling, device allocations may have
more than one internal allocation created internally. In that case,
a separate dma-buf handle per internal allocation needs to be
exported.
So introduced two driver experimental extensions to export and
import more than one IPC handle:
- zexMemGetIpcHandles
- zexMemOpenIpcHandles
Related-To: LOCI-2919
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
This is fixed reupload of this commit after auto revert
With this commit OpenCL will track if external host memory is used from
few threads and will secure to update task count in all threads before
destroing allocation.
Resolves: NEO-6807
Signed-off-by: Maciej Plewka <maciej.plewka@intel.com>
With this commit OpenCL will track if external host memory is used from
few threads and will secure to update task count in all threads before
destroing allocation.
Resolves: NEO-6807
Signed-off-by: Maciej Plewka <maciej.plewka@intel.com>
- different physical storage for every HW context
- adds support for debugging with implicit scaling on
- reorganize tests
Relates-To: NEO-6883
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Stack vector will not cause dynamic allocations in most circumstances
ie. number of root device indices not more than 16
Related-To: NEO-6837
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
This change introduces checking of the return value
of wait function in case of blocking version of
evictUnusedAllocations(). Furthermore, it propagates
the error to the callers. It contains also ULTs.
Related-To: NEO-6681
Signed-off-by: Patryk Wrobel <patryk.wrobel@intel.com>
There is no need to force 2MB alignment for CPU allocation in dual
storage usage. Additionaly for WSL this will allow to avoid usage of
malloc in driver path.
Relates-To: NEO-6620
Signed-off-by: Kamil Diedrich <kamil.diedrich@intel.com>
require 48bit resource for ring/semaphore buffer
for multi tile allocations select first tile
for single tile allocation select preferred tile
Related-To: NEO-6698
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
In direct submission scenario command/ring/semaphore buffer allocations
are placed in the same memory bank to ensure that their memory is updated in
correct order
Related-To: NEO-6698
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
When making graphics allocations resident in multi-GPU scenarios,
we should make them resident only if there's an allocation for that
device. So return appropriate null pointer and skip it.
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
Add a per-instance SVMAllocsManager::nonGpuDomainAllocs container for
all allocations to be removed in
moveAllocationsWithinUMAllocsManagerToGpuDomain. This approach replaces
the current iterative search and performs the task faster.
Add 7 new unit-tests to verify the functionality related to
nonGpuDomainAllocs container, both in expected and unexpected/synthetic
scenarios.
For UTs replace a dummy unifiedMemoryManager pointer with a pointer to
an instace of SVMAllocsManager, otherwise a SegFault error is thrown at
the end of tests.
Perform overall cleanup in related tests implementation, includes but
not limited to removal of:
- givenInitialPlacementGpu\
WhenMovingToGpuDomainThenFirstAccessDoesNotInvokeTransfer
As it is fully covered by:
givenAllocationMovedToGpuDomain\
WhenVerifyingPagefaultThenAllocationIsMovedToCpuDomain
- givenInitialPlacementGpu\
WhenVerifyingPagefaultThenFirstAccessDoesNotInvokeTransfer
As it is fully covered by:
givenTbxAndnitialPlacementGpu\
WhenVerifyingPagefaultThenMemoryIsUnprotectedOnly
Finally, reduce code duplication where possible.
Related-To: NEO-6658
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
This feature is disabled by default, controlled with the knob
AppendMemoryPrefetchForKmdMigratedSharedAllocations
Related-To: NEO-6740
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>