68 Commits

Author SHA1 Message Date
Jack Myers 7f9fadc314 fix: regression caused by tbx fault mngr
Addresses regressions from the reverted merge
of the tbx fault manager for host memory.

Recursive locking of a mutex caused a deadlock.

To fix this, separate the tbx fault data from the base
cpu fault data so that each has its own mutex,
eliminating recursive locks on the same mutex.

Separating them also helps ensure that tbx-related
changes don't affect the original cpu fault manager code
paths.

As an added safeguard against critical regressions
and another auto-revert, the tbx fault manager
is hidden behind a new debug flag, which is disabled by default.

Related-To: NEO-12268
Signed-off-by: Jack Myers <jack.myers@intel.com>
2025-01-09 07:48:53 +01:00
Compute-Runtime-Validation 124e755b9d Revert "fix: regression caused by tbx fault mngr"
This reverts commit 9a14fe2478.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2024-12-19 17:35:03 +01:00
Jack Myers 9a14fe2478 fix: regression caused by tbx fault mngr
Addresses regressions from the reverted merge
of the tbx fault manager for host memory.

This prevents the tbx fault manager from attempting
to protect/unprotect host buffer memory when
the host ptr was not driver-allocated.

In the case of the smoke test that triggered
the critical regression, clCreateBuffer was
called with the CL_MEM_USE_HOST_PTR flag.
The subsequent `mprotect` calls on the
provided host ptr then failed.

Related-To: NEO-12268
Signed-off-by: Jack Myers <jack.myers@intel.com>
2024-12-18 23:16:36 +01:00
Compute-Runtime-Validation 6c5d9a6ed7 Revert "feature: extend TBX page fault manager from CPU implementation"
This reverts commit 51c0e80299.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2024-12-12 12:30:22 +01:00
Jack Myers 51c0e80299 feature: extend TBX page fault manager from CPU implementation
In TBX mode, the host could not write to host buffers after access from device
code because there was no migration mechanism after the initial TBX upload.
Migration is unnecessary with real hardware, but required for TBX.

This patch introduces a new page fault manager type that extends the original
CPU fault manager, enabling automatic migration of host buffers in TBX mode.

Refactoring was necessary to avoid diamond inheritance, achieved by using a
template parameter as the base class for OS-specific fault managers.

Related-To: NEO-12268
Signed-off-by: Jack Myers <jack.myers@intel.com>
2024-12-11 09:09:50 +01:00
Szymon Morek e6d11eb04b performance: stop ULLS for BCS during migration
Related-To: NEO-13340

When the regular copy CSR has direct submission enabled,
stop it before migrating on the internal CSR.

Signed-off-by: Szymon Morek <szymon.morek@intel.com>
2024-12-02 17:57:12 +01:00
Compute-Runtime-Validation bced7e4621 Revert "performance: stop ULLS for BCS during migration"
This reverts commit 81ba52aac4.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2024-11-29 04:42:26 +01:00
Szymon Morek 81ba52aac4 performance: stop ULLS for BCS during migration
Related-To: NEO-13340

When the regular copy CSR has direct submission enabled,
stop it before migrating on the internal CSR.

Signed-off-by: Szymon Morek <szymon.morek@intel.com>
2024-11-27 20:06:50 +01:00
Szymon Morek b2fd1972a4 fix: add cpu alloc to eviction list only once
Related-To: NEO-12572

Also, remove it from this list before migrating to the GPU domain.

Signed-off-by: Szymon Morek <szymon.morek@intel.com>
2024-10-01 11:47:32 +02:00
Kozlowski, Marek bd8fc07bb7 fix: Replace printf with current logging practice
* add missing stdout flush

Signed-off-by: Kozlowski, Marek <marek.kozlowski@intel.com>
2024-07-15 14:22:04 +02:00
Mateusz Jablonski dd1b9d6abc refactor: correct naming of enum class constants 8/n
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-12-19 08:18:18 +01:00
Mateusz Jablonski 739d181026 refactor: correct naming of enum class constants 6/n
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-12-13 14:48:52 +01:00
Mateusz Jablonski c9664e6bad refactor: rename global debug manager to debugManager
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-11-30 13:00:59 +01:00
Milczarek, Slawomir 027c51d396 feature: Add CPU side USM allocation to trim candidate list on page fault
Enable eviction of CPU side USM allocation for UMD migrations on Windows.
Reverts incorrect auto-revert commit 218de586a4f28b1de3e983b9006e7a99d3a4d10e.

Related-To: NEO-8015

Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
2023-07-25 15:21:12 +02:00
Compute-Runtime-Validation 918b41d26d Revert "feature: Add CPU side USM allocation to trim candidate list on page f...
This reverts commit 60a4448a07.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2023-07-24 08:44:22 +02:00
Milczarek, Slawomir 60a4448a07 feature: Add CPU side USM allocation to trim candidate list on page fault
Enable eviction of CPU side USM allocation for UMD migrations on Windows.

Related-To: NEO-8015
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
2023-07-23 10:24:28 +02:00
Compute-Runtime-Validation 4a562e352b Revert "feature: Add CPU side USM allocation to trim candidate list on page f...
This reverts commit cce2cc920d.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2023-07-21 16:40:59 +02:00
Milczarek, Slawomir cce2cc920d feature: Add CPU side USM allocation to trim candidate list on page fault
Enable eviction of CPU side USM allocation for UMD migrations on Windows.

Related-To: NEO-8015

Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
2023-07-21 14:18:38 +02:00
Jaime Arteaga 37ed03a15c feature: Propagate error from makeResident to caller
Have makeResident return the error to the caller instead of always
returning SUCCESS. This allows interfaces like zeContextMakeMemoryResident
to fail properly.

Additionally, change the translation of MemoryOperationsStatus from
ZE_RESULT_ERROR_OUT_OF_HOST_MEMORY to
ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY, since a failure to make
resources resident means the device, not the host, has run out
of memory.

Related-To: LOCI-4443

Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2023-05-24 21:08:27 +02:00
Mateusz Jablonski 0da5e6f277 refactor l0: cleanup cmake file level_zero/core/source/CMakeLists.txt
Related-To: NEO-7507
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-03-16 12:38:15 +01:00
Michal Mrozek 9d0f1879ca [fix] add migrated pointers to proper container.
When GPU-to-CPU migration occurs, we need to populate the proper container.

Signed-off-by: Michal Mrozek <michal.mrozek@intel.com>
2023-03-15 09:51:46 +01:00
Jaime Arteaga 1e9e877394 Style: Add 0x prefix to PrintUmdSharedMigration logs
This aligns with the format used by other tools, such as onetrace.

Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2023-01-03 03:37:10 +01:00
Mateusz Jablonski e3ede4bb92 Correct naming in memadvise flags
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2022-12-07 14:49:43 +01:00
Lukasz Jobczyk 8927399cce Set proper gpu domain transfer handler for CAL
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
2022-11-17 11:53:02 +01:00
Jaime Arteaga db58e50564 Improve PrintUmdSharedMigration
Add size and timing data.

Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2022-07-18 19:47:13 +02:00
Mateusz Hoppe 5956aea18d Limit header includes from level_zero device.h
- remove including debugger_l0.h from device.h
- add getL0Debugger() to shared NEO Device

Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
2022-07-06 16:41:17 +02:00
Jaime Arteaga 803d7cdd8a Add debug key to print UMD shared migrations
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2021-07-22 08:37:09 +02:00
Jaime Arteaga 2588997e32 Remove memory.cpp from L0 core source
It was only hosting two methods, which fit better in
driver_handle_imp.cpp.

Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2021-07-21 16:45:53 +02:00
Spruit, Neil R 771722f3d7 L0 Support for hints to disable CPU Migration of USM memory
- Added support for disabling CPU migration of USM memory when both
ZE_MEMORY_ADVICE_SET_READ_MOSTLY and ZE_MEMORY_ADVICE_SET_PREFERRED_LOCATION are set

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
2021-07-20 04:34:16 +02:00
Bartosz Dunajski 3c88492229 Revert "Extended import device memory"
This reverts commit ea6555e788c98314160a11898212c2d664999705.

Signed-off-by: Bartosz Dunajski <bartosz.dunajski@intel.com>
2021-07-16 09:56:52 +02:00
Kamil Diedrich d5fdb949eb Extended import device memory
Signed-off-by: Kamil Diedrich <kamil.diedrich@intel.com>
2021-07-07 16:12:36 +02:00
Jaime Arteaga aa51c5ee76 Add support for ZE_IPC_MEMORY_FLAG_BIAS_UNCACHED
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2021-07-02 17:56:18 +02:00
Jaime Arteaga 5e29dccddc Add IPC events support
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2021-06-08 08:11:15 +02:00
lgotszal 3bd4bca911 Copyright header update
Dates corrected in copyright headers to reflect original publication date
(2018 for OpenCL, 2020 for Level Zero).

Signed-off-by: lgotszal <lukasz.gotszald@intel.com>
2021-05-17 20:38:19 +02:00
Compute-Runtime-Validation dd6653892e Revert "Move SVM allocs memory manager to L0::Context (1/N)"
This reverts commit 9080e2ee5b.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2021-05-09 12:37:44 +02:00
Jaime Arteaga 9080e2ee5b Move SVM allocs memory manager to L0::Context (1/N)
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2021-05-07 22:17:10 +02:00
Jaime Arteaga 5f0e4f8e2a Revert "Move memory managers to L0::Context (1/N)"
This reverts commit 9ce887b8b53a787a7e0a0d808c96e295655ae57b.

Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2021-05-06 04:56:09 +02:00
Jaime Arteaga 1f1fbb193b Move memory managers to L0::Context (1/N)
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2021-05-05 23:01:42 +02:00
Jaime Arteaga ef5174f3fc Eliminate wrappers in L0::Context class for driverHandle calls
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2021-04-20 23:50:23 +02:00
Jaime Arteaga 128cd8a31c Add support for non-IPC P2P access to L0
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2021-04-20 01:05:40 +02:00
Jaime Arteaga ebb1474210 Isolate shared allocations with respect to context
Related-To: LOCI-1996

Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2021-04-17 03:38:46 +02:00
Jaime Arteaga da7aef49e6 Isolate device allocations with respect to context
Related-To: LOCI-1996

Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2021-04-09 17:47:47 +02:00
Jaime Arteaga ddca333045 Improve support for L0 uncached device allocations
Make sure UNCACHED flags are translated into a MOCS index
that disables L3 caching.

Related-To: NEO-5500

Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2021-04-08 13:00:03 +02:00
Jaime Arteaga 0561ec183d Add ULT for changeMemoryOperationStatusToL0ResultType
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2021-03-31 17:26:43 +02:00
Jaime Arteaga c7e65a90d8 Free IPC memory on closeIpcMemHandle() call
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2021-03-27 11:25:57 +01:00
Jaime Arteaga 0dc73ad686 Isolate host allocations with respect to context
Related-To: LOCI-1996

Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2021-03-25 06:15:59 +01:00
Compute-Runtime-Validation 46a971de81 Revert "Free IPC memory on closeIpcMemHandle() call"
This reverts commit cda914f7d0.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2021-03-24 06:58:47 +01:00
Jaime Arteaga cda914f7d0 Free IPC memory on closeIpcMemHandle() call
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2021-03-23 19:09:12 +01:00
Jaime Arteaga 6a81edfbe1 Add support for ZE_RELAXED_ALLOCATION_LIMITS_EXP_FLAG_MAX_SIZE
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2021-03-15 16:20:10 +01:00
Jaime Arteaga 71940061b8 Make sure IPC handles are correctly copied
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2021-03-13 18:22:56 +01:00