Commit Graph

48 Commits

Author SHA1 Message Date
Jack Myers c26d24e555 fix: tbx page fault manager hang issue
- Updated `isAllocTbxFaultable` to exclude `gpuTimestampDeviceBuffer` from being
faultable.
- Replaced `SpinLock` with `RecursiveSpinLock` in `CpuPageFaultManager` and
`TbxPageFaultManager` to allow recursive locking.
- Added unit tests to verify the correct handling of `gpuTimestampDeviceBuffer`
in `TbxCommandStreamTests`.

Related-To: NEO-13748
Signed-off-by: Jack Myers <jack.myers@intel.com>
2025-02-18 05:05:38 +01:00
Compute-Runtime-Validation 116f7270be Revert "fix: tbx page fault manager hang issue"
This reverts commit 7d4e70a25b.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2025-02-12 10:38:05 +01:00
Jack Myers 7d4e70a25b fix: tbx page fault manager hang issue
- Updated `isAllocTbxFaultable` to exclude `gpuTimestampDeviceBuffer` from being
faultable.
- Replaced `SpinLock` with `RecursiveSpinLock` in `CpuPageFaultManager` and
`TbxPageFaultManager` to allow recursive locking.
- Added unit tests to verify the correct handling of `gpuTimestampDeviceBuffer`
in `TbxCommandStreamTests`.

Related-To: NEO-13748
Signed-off-by: Jack Myers <jack.myers@intel.com>
2025-02-12 02:19:37 +01:00
Jack Myers 7f9fadc314 fix: regression caused by tbx fault mngr
Addresses regressions from the reverted merge
of the tbx fault manager for host memory.

Recursive locking of mutex caused deadlock.

To fix, separate tbx fault data from base
cpu fault data, allowing separate mutexes
for each, eliminating recursive locks on
the same mutex.

By separating, we also help ensure that tbx-related
changes don't affect the original cpu fault manager code
paths.

As an added safe guard preventing critical regressions
and avoiding another auto-revert, the tbx fault manager
is hidden behind a new debug flag which is disabled by default.

Related-To: NEO-12268
Signed-off-by: Jack Myers <jack.myers@intel.com>
2025-01-09 07:48:53 +01:00
Compute-Runtime-Validation 124e755b9d Revert "fix: regression caused by tbx fault mngr"
This reverts commit 9a14fe2478.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2024-12-19 17:35:03 +01:00
Jack Myers 9a14fe2478 fix: regression caused by tbx fault mngr
Addresses regressions from the reverted merge
of the tbx fault manager for host memory.

This fixes attempts by the tbx fault manager
to protect/unprotect host buffer memory, even
if the host ptr was not driver-allocated.

In the case of the smoke test that triggered
the critical regression, clCreateBuffer was
called with the CL_MEM_USE_HOST_PTR flag.
The subsequent `mprotect` calls on the
provided host ptr then failed.

Related-To: NEO-12268
Signed-off-by: Jack Myers <jack.myers@intel.com>
2024-12-18 23:16:36 +01:00
Compute-Runtime-Validation 6c5d9a6ed7 Revert "feature: extend TBX page fault manager from CPU implementation"
This reverts commit 51c0e80299.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2024-12-12 12:30:22 +01:00
Jack Myers 51c0e80299 feature: extend TBX page fault manager from CPU implementation
In TBX mode, the host could not write to host buffers after access from device
code due to the lack of a migration mechanism post-initial TBX upload.
Migration is unnecessary with real hardware, but required for TBX.

This patch introduces a new page fault manager type that extends the original
CPU fault manager, enabling automatic migration of host buffers in TBX mode.

Refactoring was necessary to avoid diamond inheritance, achieved by using a
template parameter as the base class for OS-specific fault managers.

Related-To: NEO-12268
Signed-off-by: Jack Myers <jack.myers@intel.com>
2024-12-11 09:09:50 +01:00
Young Jin Yoon b1f73355ac feature: capture multiple cpu pagefault handler
Recorded multiple page fault handlers by using vector in
cpu_page_fault_manager_linux.

Added a static handlerIndex in order to track the depth of
handler logic to call appropriate previous handlers.

Related-To: NEO-11563
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
2024-11-20 02:40:30 +01:00
Szymon Morek b2fd1972a4 fix: add cpu alloc to eviction list only once
Related-To: NEO-12572

Also, before migration to GPU domain, remove it from this list

Signed-off-by: Szymon Morek <szymon.morek@intel.com>
2024-10-01 11:47:32 +02:00
Compute-Runtime-Validation 4f96b6132f Revert "feature: capture multiple cpu pagefault handler"
This reverts commit 4b3a6e9cfe.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2024-09-28 07:53:30 +02:00
Young Jin Yoon 4b3a6e9cfe feature: capture multiple cpu pagefault handler
Recorded multiple page fault handlers by using vector in
cpu_page_fault_manager_linux.

Added a static handlerIndex in order to track the depth of
handler logic to call appropriate previous handlers.

Related-To: NEO-11563
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
2024-09-27 00:34:45 +02:00
Compute-Runtime-Validation 5569ebbe1f Revert "feature: capture multiple cpu pagefault handler"
This reverts commit 44f2912195.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2024-09-03 05:18:26 +02:00
Young Jin Yoon 44f2912195 feature: capture multiple cpu pagefault handler
Recorded multiple page fault handlers by using vector in
cpu_page_fault_manager_linux.

Added a static handlerIndex in order to track the depth of
handler logic to call appropriate previous handlers.

Related-To: NEO-11563
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
2024-09-02 16:46:35 +02:00
Kozlowski, Marek bd8fc07bb7 fix: Replace printf with current logging practice
* add missing stdout flush

Signed-off-by: Kozlowski, Marek <marek.kozlowski@intel.com>
2024-07-15 14:22:04 +02:00
Lukasz Jobczyk 8217b76cef refactor: Add key to not register pagefault handler on migration
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
2024-05-28 08:45:34 +02:00
Mateusz Jablonski cb2b572e94 feature: add support for null aub mode
In this mode AUB csr will be created, however, no aub file will be created

Related-To: NEO-11097
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2024-04-09 16:59:42 +02:00
Mateusz Jablonski dd1b9d6abc refactor: correct naming of enum class constants 8/n
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-12-19 08:18:18 +01:00
Mateusz Jablonski c9664e6bad refactor: rename global debug manager to debugManager
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-11-30 13:00:59 +01:00
Young Jin Yoon 91deddb69b feature: register handler when we migrate to GPU
Created registerFaultHandler() and checkFaultHandlerFromPageFaultManager()
and removed registering sigaction() from the contructor of the
PageFaultManagerLinux class.

Added if statment to check the current pagefault handler is from the
pagefault manager. If not, register the pagefault handler of the current
pagefault manager on linux.

Refactored windows exception vector adding logic to
registerFaultHandler() and call upon the constructor of the
PageFaultManagerWindows, and make
checkFaultHandlerFromPageFaultManager() always return true for windows.

Related-To: NEO-8190
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
2023-08-14 11:14:03 +02:00
Milczarek, Slawomir 027c51d396 feature: Add CPU side USM allocation to trim candidate list on page fault
Enable eviction of CPU side USM allocation for UMD migrations on Windows.
Reverts incorrect auto-revert commit 218de586a4f28b1de3e983b9006e7a99d3a4d10e.

Related-To: NEO-8015

Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
2023-07-25 15:21:12 +02:00
Compute-Runtime-Validation 918b41d26d Revert "feature: Add CPU side USM allocation to trim candidate list on page f...
This reverts commit 60a4448a07.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2023-07-24 08:44:22 +02:00
Milczarek, Slawomir 60a4448a07 feature: Add CPU side USM allocation to trim candidate list on page fage fault
Enable eviction of CPU side USM allocation for UMD migrations on Windows.

Related-To: NEO-8015
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
2023-07-23 10:24:28 +02:00
Compute-Runtime-Validation 4a562e352b Revert "feature: Add CPU side USM allocation to trim candidate list on page f...
This reverts commit cce2cc920d.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2023-07-21 16:40:59 +02:00
Milczarek, Slawomir cce2cc920d feature: Add CPU side USM allocation to trim candidate list on page fault
Enable eviction of CPU side USM allocation for UMD migrations on Windows.

Related-To: NEO-8015

Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
2023-07-21 14:18:38 +02:00
Milczarek, Slawomir a6a0b95344 fix: Cpu page fault manager with control of host ptr eviction
Related-To: NEO-8015

Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
2023-07-12 14:13:17 +02:00
Jaime Arteaga 1e9e877394 Style: Add 0x prefix to PrintUmdSharedMigration logs
This to align with format used on another tools, like onetrace.

Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2023-01-03 03:37:10 +01:00
Warchulski, Jaroslaw f275eea6ec Cleanup includes 14
Cleaned up files:
shared/source/device/device.h

Related-To: NEO-5548

Signed-off-by: Warchulski, Jaroslaw <jaroslaw.warchulski@intel.com>
2022-12-23 10:46:34 +01:00
Lukasz Jobczyk 8927399cce Set proper gpu domain transfer handler for CAL
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
2022-11-17 11:53:02 +01:00
Jaime Arteaga db58e50564 Improve PrintUmdSharedMigration
Add size and timing data.

Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2022-07-18 19:47:13 +02:00
Daniel Chabrowski 7463e1970b Cleanup headers
Make TUs and headers self-contained, remove unused headers

Signed-off-by: Daniel Chabrowski <daniel.chabrowski@intel.com>
2022-05-18 11:42:06 +02:00
Milczarek, Slawomir 7cd4ca5ce7 Fixed AUB capture in HW mode for umd-migrated shared allocations
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
2022-03-30 12:04:58 +02:00
Maciej Bielski a71b88fefb PageFaultHandler: speedup UM allocations lookup
Add a per-instance SVMAllocsManager::nonGpuDomainAllocs container for
all allocations to be removed in
moveAllocationsWithinUMAllocsManagerToGpuDomain. This approach replaces
the current iterative search and performs the task faster.

Add 7 new unit-tests to verify the functionality related to
nonGpuDomainAllocs container, both in expected and unexpected/synthetic
scenarios.

For UTs replace a dummy unifiedMemoryManager pointer with a pointer to
an instace of SVMAllocsManager, otherwise a SegFault error is thrown at
the end of tests.

Perform overall cleanup in related tests implementation, includes but
not limited to removal of:

- givenInitialPlacementGpu\
WhenMovingToGpuDomainThenFirstAccessDoesNotInvokeTransfer

As it is fully covered by:

givenAllocationMovedToGpuDomain\
WhenVerifyingPagefaultThenAllocationIsMovedToCpuDomain

- givenInitialPlacementGpu\
WhenVerifyingPagefaultThenFirstAccessDoesNotInvokeTransfer

As it is fully covered by:

givenTbxAndnitialPlacementGpu\
WhenVerifyingPagefaultThenMemoryIsUnprotectedOnly

Finally, reduce code duplication where possible.

Related-To: NEO-6658
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
2022-03-16 11:18:24 +01:00
Milczarek, Slawomir 2be98a1e62 Create kmd migrated allocation with initial placement
Implements ZE_HOST_MEM_ALLOC_FLAG_BIAS_INITIAL_PLACEMENT
for zeMemAllocShared with KMD migrated allocation.

Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
2022-02-01 15:42:10 +01:00
Bartosz Dunajski e8cbcd2ab9 Dont make allocations non-AubWritable during migration
Signed-off-by: Bartosz Dunajski <bartosz.dunajski@intel.com>
2021-10-04 15:36:48 +02:00
Jaime Arteaga 803d7cdd8a Add debug key to print UMD shared migrations
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2021-07-22 08:37:09 +02:00
Spruit, Neil R 771722f3d7 L0 Support for hints to disable CPU Migration of USM memory
- Added support for disabling CPU migration of USM memory given
ZE_MEMORY_ADVICE_SET_READ_MOSTLY && ZE_MEMORY_ADVICE_SET_PREFERRED_LOCATION

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
2021-07-20 04:34:16 +02:00
Maciej Dziuban 7334920ed3 Add UsmInitialPlacement debug flag
Signed-off-by: Maciej Dziuban <maciej.dziuban@intel.com>
2021-07-12 15:57:59 +02:00
Milczarek, Slawomir 997604c2dd AUB CSR to use a common AUB and TBX gpu domain handler
Related-To: NEO-5667

Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
2021-04-02 15:53:10 +02:00
Mateusz Hoppe c994bf6f00 Fix pagefault Cpu transfers in TBX mode
Related-To: NEO-5286

Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
2021-03-08 19:22:02 +01:00
Maciej Dziuban 94be510c18 Add initial placement hints for USM in OpenCL
Related-To: NEO-5059
Signed-off-by: Maciej Dziuban <maciej.dziuban@intel.com>
2020-12-01 18:30:26 +01:00
Lukasz Jobczyk f825364770 Unprotect memory after migration to CPU
Change-Id: I280296422ec30583752a0b62c7c1c8777aa84c32
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
2020-10-14 11:14:32 +02:00
Maciej Dziuban 97ec64d22c Optimize first access to shared allocations
Change-Id: Ia3ce5f1e448128e7c9dfffb9ad49aaee15bdf948
Signed-off-by: Maciej Dziuban <maciej.dziuban@intel.com>
Related-To: NEO-5059
2020-09-15 12:59:07 +02:00
Maciej Dziuban 7c7cfb1099 Delete unneeded memory transfer for USM
Change-Id: I7b11a132b621069febd5b851f9e29e7177d8d395
Signed-off-by: Maciej Dziuban <maciej.dziuban@intel.com>
Related-To: NEO-5059
2020-09-14 16:13:58 +02:00
Lukasz Jobczyk 5739d526c4 Broadcast signal to all threads while handling USM pagefault
Related-To: NEO-4721

Change-Id: I77185f8db2576f626c1b6b5615ab5d8f9b22076f
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
2020-07-08 12:54:01 +02:00
Mateusz Jablonski 93c1e1b976 Add MultiGraphicsAllocation to USM
Related-To: NEO-4672
Change-Id: I53ea4bea73ae6d52840146f63bc561bb90f9fe62
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2020-07-02 09:39:21 +02:00
Mateusz Jablonski 7df9945ebe Add absolute include paths
Change-Id: I67a6919bbbff1d30c7d6cdb257b41c87bad51e7f
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2020-02-23 23:49:12 +01:00
kamdiedrich e072275ae6 Reorganization directory structure [3/n]
Change-Id: If3dfa3f6007f8810a6a1ae1a4f0c7da38544648d
2020-02-23 23:48:28 +01:00