Commit Graph

732 Commits

Author SHA1 Message Date
Jack Myers 7f9fadc314 fix: regression caused by tbx fault mngr
Addresses regressions from the reverted merge
of the tbx fault manager for host memory.

Recursive locking of mutex caused deadlock.

To fix, separate tbx fault data from base
cpu fault data, allowing separate mutexes
for each, eliminating recursive locks on
the same mutex.

By separating, we also help ensure that tbx-related
changes don't affect the original cpu fault manager code
paths.

As an added safe guard preventing critical regressions
and avoiding another auto-revert, the tbx fault manager
is hidden behind a new debug flag which is disabled by default.

Related-To: NEO-12268
Signed-off-by: Jack Myers <jack.myers@intel.com>
2025-01-09 07:48:53 +01:00
Semenov Herman (Семенов Герман) 9f07f56f7f performance: align structures for 64-bit platforms
Signed-off-by: Semenov Herman (Семенов Герман) <GermanAizek@yandex.ru>
2025-01-09 06:03:39 +01:00
Lukasz Jobczyk 983b46fbbb performance: Align host USM to 2MB
Only on discrete devices and if size is greater than 2MB

Resolves: NEO-12652

Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
2025-01-07 14:32:26 +01:00
Dominik Dabek 5b429dd415 fix: usm reuse, check for in use before returning
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2024-12-20 18:24:18 +01:00
Compute-Runtime-Validation 124e755b9d Revert "fix: regression caused by tbx fault mngr"
This reverts commit 9a14fe2478.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2024-12-19 17:35:03 +01:00
Jack Myers 9a14fe2478 fix: regression caused by tbx fault mngr
Addresses regressions from the reverted merge
of the tbx fault manager for host memory.

This fixes attempts by the tbx fault manager
to protect/unprotect host buffer memory, even
if the host ptr was not driver-allocated.

In the case of the smoke test that triggered
the critical regression, clCreateBuffer was
called with the CL_MEM_USE_HOST_PTR flag.
The subsequent `mprotect` calls on the
provided host ptr then failed.

Related-To: NEO-12268
Signed-off-by: Jack Myers <jack.myers@intel.com>
2024-12-18 23:16:36 +01:00
Filip Hazubski a0cc124b2e performance: Pass RootDeviceIndicesContainer by reference
Additionally pass std::map by reference in UsmMemAllocPoolsManager c-tor.

Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
2024-12-17 14:18:30 +01:00
Dominik Dabek d298e5ddb3 refactor: usm reuse, memory manager pointers
Keep pointers to memory managers in reuse structure.

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2024-12-16 17:09:51 +01:00
Mateusz Hoppe 0589a70dc7 feature(sysman): reinitialize gfxPartition on reset
Related-To: NEO-13203

Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
2024-12-12 16:27:25 +01:00
Compute-Runtime-Validation 6c5d9a6ed7 Revert "feature: extend TBX page fault manager from CPU implementation"
This reverts commit 51c0e80299.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2024-12-12 12:30:22 +01:00
Jack Myers 51c0e80299 feature: extend TBX page fault manager from CPU implementation
In TBX mode, the host could not write to host buffers after access from device
code due to the lack of a migration mechanism post-initial TBX upload.
Migration is unnecessary with real hardware, but required for TBX.

This patch introduces a new page fault manager type that extends the original
CPU fault manager, enabling automatic migration of host buffers in TBX mode.

Refactoring was necessary to avoid diamond inheritance, achieved by using a
template parameter as the base class for OS-specific fault managers.

Related-To: NEO-12268
Signed-off-by: Jack Myers <jack.myers@intel.com>
2024-12-11 09:09:50 +01:00
Filip Hazubski 3315db7d92 fix: Correct mutex logic in SVMAllocsManager::freeSVMAllocImpl
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
2024-12-10 16:16:53 +01:00
Compute-Runtime-Validation 484210d656 Revert "fix: limit usm device reuse based on used memory"
This reverts commit 1252b10ba9.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2024-12-05 23:17:51 +01:00
Fabian Zwoliński d2ce3badfc fix: bindlessHeapsHelper handle unavailable external heap
This PR handles the situation in which a component
has reserved a front window space for itself in the external heap,
so that the Compute Runtime cannot access this area.

In such a situation, we perform the following steps:
1. reserve 4GB chunk in heapStandard
2. split our chunk into 2 parts: heapFrontWindow, heapRegular
3. from this point on, map all linearStream allocations in reserved 4GB
chunk

Patch applies to Windows and WSL.
Patch only applies when the bindless global allocator is enabled.

Related-To: HSD-16025889919
Signed-off-by: Fabian Zwoliński <fabian.zwolinski@intel.com>
2024-12-05 14:18:01 +01:00
Dominik Dabek 1252b10ba9 fix: limit usm device reuse based on used memory
Calculate available memory for usm device reuse based as (total device
memory - used memory) * fraction for reuse.

Use sys mem allocs for devices without local memory.

Related-To: NEO-12902

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2024-12-04 08:11:23 +01:00
Mateusz Jablonski 2039b1c41b refactor: remove not needed code
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2024-11-28 16:16:30 +01:00
Dominik Dabek e55aa958b7 fix: track usm reuse usage in multiple contexts
Add tracking of memory used for usm reuse mechanism when multiple cl
contexts are used.
Tracking for device added to NEO::Device, for host added to
NEO::MemoryManager.

This fixes usm reuse using x% of memory per each context instead of
globally.

Related-To: NEO-13308

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2024-11-26 16:00:45 +01:00
Mateusz Jablonski db6fe7892c fix: remove destroyed allocations from eviction lists
mark explicitly made resident allocations

Related-To: NEO-13246, GSD-10319
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2024-11-25 22:19:54 +01:00
Wenbin Lu 2ba80ce114 feature: support physical host memory
Related-To: NEO-11981

Signed-off-by: Wenbin Lu <wenbin.lu@intel.com>
2024-11-20 08:19:52 +01:00
Lukasz Jobczyk 5b3d244e97 refactor: Add AIL for hostptrs drain
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
2024-11-19 15:13:30 +01:00
Dominik Dabek 471615926f fix: adjust limiting device usm reuse
if limiting, disable device usm reuse (set max size to 0)

do not reserve vector for allocation infos if reuse is disabled

Related-To: NEO-12924

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2024-11-15 14:37:27 +01:00
Bartosz Dunajski 7bf22ed33e feature: counter based allocation peer sharing
Related-To: NEO-13079

Signed-off-by: Bartosz Dunajski <bartosz.dunajski@intel.com>
2024-11-13 15:01:32 +01:00
Lukasz Jobczyk 7f3896d05f performance: Ensure hostptrs removed before creating new one
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
2024-11-12 13:55:15 +01:00
Szymon Morek 8aa5331bc1 fix: wait for latest known usage of indirect usm
Related-To: GSD-9989

Signed-off-by: Szymon Morek <szymon.morek@intel.com>
2024-11-04 16:24:30 +01:00
Alicja Lukaszewicz 654fdc1345 feature: add query for additional device properties
Related-To: NEO-12590

Signed-off-by: Alicja Lukaszewicz <alicja.lukaszewicz@intel.com>
2024-10-29 20:40:27 +01:00
Bartosz Dunajski 7f5e6b4124 Revert "fix: Enable 64k pages for TSB allocation"
This reverts commit eed69f45ed.

Signed-off-by: Bartosz Dunajski <bartosz.dunajski@intel.com>
2024-10-28 16:02:34 +01:00
Dominik Dabek 741101551e fix: add infrastructure to limit device usm reuse max memory used
Related-To: NEO-12924

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2024-10-25 21:54:41 +02:00
Jaroslaw Warchulski 2ba5ee2f6b Revert "fix: use full size for HEAP_EXTENDED initialization"
This reverts commit 5afc63df93.

Related-To: GSD-8948
Signed-off-by: Jaroslaw Warchulski <jaroslaw.warchulski@intel.com>
2024-10-22 19:27:31 +02:00
Maciej Bielski 45e78fea76 fix: use productHelper in getPatIndexInfoString() on Windows
Fix the PAT-index reporting in logger as currently on Windows reported
values are simply wrong.

The changed logic dependends on `RootDeviceEnvironment` and in order to
avoid introducing such dependencies into logger.[ch] the
`logAllocation()` is no longer a member of `FileLogger` but
a free-function instead (and a separate .cpp file). This is important
because the source files `logger.[ch]` are also used by ocloc library
and there is no point to contaminate ocloc code structure with
unnecessary dependencies.

Related-To: NEO-9421
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
2024-10-22 19:27:13 +02:00
Wenbin Lu a8a40d2afd feature: support SVM heap in reserveVirtualMem
Related-To: NEO-11981

Signed-off-by: Wenbin Lu <wenbin.lu@intel.com>
2024-10-22 16:47:14 +02:00
Bartosz Dunajski eed69f45ed fix: Enable 64k pages for TSB allocation
Related-To: HSD-18040274716

Signed-off-by: Bartosz Dunajski <bartosz.dunajski@intel.com>
2024-10-22 15:12:42 +02:00
Lukasz Jobczyk 8a647f6a39 Revert "performance: Ensure hostptrs removed before creating new one"
This reverts commit 5b2f2f3d83.

Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
2024-10-21 10:36:06 +02:00
Jaroslaw Warchulski 5afc63df93 fix: use full size for HEAP_EXTENDED initialization
Related-To: GSD-8948
Signed-off-by: Jaroslaw Warchulski <jaroslaw.warchulski@intel.com>
2024-10-18 18:14:34 +02:00
Bartosz Dunajski 52e9a6e07f feature: Initial CB events IPC support
Related-To: NEO-11925

Signed-off-by: Bartosz Dunajski <bartosz.dunajski@intel.com>
2024-10-16 13:33:59 +02:00
Jaroslaw Warchulski 5626295e61 fix: don't split HEAP_EXTENDED among root devices
Related-To: GSD-8948
Signed-off-by: Jaroslaw Warchulski <jaroslaw.warchulski@intel.com>
2024-10-15 15:01:29 +02:00
Bartosz Dunajski f430279d62 fix: change doorbell type
Related-To: NEO-12870

Signed-off-by: Bartosz Dunajski <bartosz.dunajski@intel.com>
2024-10-14 14:58:11 +02:00
Mateusz Hoppe 5ae2552b4b fix: track shifted contextIds in bitset in bindlessHeapsHelper
- bitset is 64 bit in size, context ids may go beyond that limit
when multiple devices are available
- this change subtracts contextId of first context for a given root
device - tracked state dirty contexts ids are now zero-based

Resolves: GSD-10025

Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
2024-10-09 10:32:29 +02:00
Lukasz Jobczyk 5b2f2f3d83 performance: Ensure hostptrs removed before creating new one
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
2024-10-08 13:04:56 +02:00
Dominik Dabek 9159e2acd4 fix: limit max size for allocation reuse
Limit max size for allocation reuse mechanism to 256MB.

Related-To: NEO-6893

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2024-10-08 11:52:47 +02:00
Compute-Runtime-Validation 60afb83b3b Revert "performance: Ensure hostptrs removed before creating new one"
This reverts commit a890ed5648.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2024-10-08 07:45:55 +02:00
Dominik Dabek 3685852ce0 Revert "performance: device usm sets localOnly...
Required"

This reverts commit a479afdbc8.

Related-To: NEO-12879

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2024-10-07 11:59:37 +02:00
Compute-Runtime-Validation 41df1a6f47 Revert "feature: support SVM heap in reserveVirtualMem"
This reverts commit bfaeeb01d6.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2024-10-03 14:53:50 +02:00
Dominik Dabek 752f313808 fix: limit allocation cache memory wastage
Allocations over a certain size will be checked for memory utilization
when chosen for reuse.
If utilization is below a threshold, they will not be reused.

Related-To: NEO-6893

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2024-10-01 09:49:19 +02:00
Maciej Plewka 80f75ceace fix: submit dummy exec to pin memory during zeContextMakeMemoryResident call
Signed-off-by: Maciej Plewka <maciej.plewka@intel.com>
2024-09-23 14:43:59 +02:00
Lukasz Jobczyk a890ed5648 performance: Ensure hostptrs removed before creating new one
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
2024-09-19 14:34:19 +02:00
Compute-Runtime-Validation e4d2f16632 Revert "performance: Ensure hostptrs removed before creating new one"
This reverts commit ac1d203555.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2024-09-18 19:31:34 +02:00
Lukasz Jobczyk ac1d203555 performance: Ensure hostptrs removed before creating new one
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
2024-09-18 14:23:16 +02:00
Michal Mrozek dd631610b3 refactor: move memory tracking to memory manager
- remove wddm specific code
- improve total size reported to be in decimal

Signed-off-by: Michal Mrozek <michal.mrozek@intel.com>
2024-09-12 17:32:38 +02:00
Michal Mrozek da59b88122 refactor: improve logging of allocations
add capability to measure total amount of allocated memory
Signed-off-by: Michal Mrozek <michal.mrozek@intel.com>
2024-09-12 11:00:27 +02:00
Dominik Dabek f03c2fd4a7 fix: initialize scalar field, usm pooling
Related-To: NEO-6893

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2024-09-11 11:44:56 +02:00