Jack Myers
7f9fadc314
fix: regression caused by tbx fault mngr
Addresses regressions from the reverted merge
of the tbx fault manager for host memory.
Recursive locking of a mutex caused a deadlock.
To fix this, separate the tbx fault data from the base
cpu fault data, allowing a separate mutex
for each and eliminating recursive locks on
the same mutex.
Separating them also helps ensure that tbx-related
changes don't affect the original cpu fault manager code
paths.
As an added safeguard against critical regressions
and another auto-revert, the tbx fault manager
is hidden behind a new debug flag, which is disabled by default.
Related-To: NEO-12268
Signed-off-by: Jack Myers <jack.myers@intel.com>
2025-01-09 07:48:53 +01:00
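A minimal sketch of the locking fix described above, assuming a fault handler that may fall back from the TBX path into the CPU path. All names here (`FaultManagerSketch`, `CpuFaultData`, `TbxFaultData`, `tbxFaultManagerEnabled`) are hypothetical; the commit does not name the flag or the structures. It only illustrates why two distinct mutexes remove the recursive lock that a single shared mutex would hit.

```cpp
#include <cstddef>
#include <mutex>
#include <unordered_map>

// Hypothetical per-allocation records; the real NEO structures differ.
struct CpuFaultData {
    std::size_t size = 0;
    bool cpuAccessAllowed = false;
};

struct TbxFaultData {
    std::size_t size = 0;
    bool needsUpload = false;
};

class FaultManagerSketch {
  public:
    // Hypothetical stand-in for the new debug flag; disabled by default.
    bool tbxFaultManagerEnabled = false;

    void insertCpuAllocation(void *ptr, std::size_t size) {
        std::lock_guard<std::mutex> lock(cpuMtx);
        cpuAllocs[ptr] = CpuFaultData{size, false};
    }

    void insertTbxAllocation(void *ptr, std::size_t size) {
        std::lock_guard<std::mutex> lock(tbxMtx);
        tbxAllocs[ptr] = TbxFaultData{size, false};
    }

    bool handleFault(void *ptr) {
        if (!tbxFaultManagerEnabled) {
            return handleCpuFault(ptr); // original CPU path, untouched
        }
        std::lock_guard<std::mutex> tbxLock(tbxMtx);
        auto it = tbxAllocs.find(ptr);
        if (it != tbxAllocs.end()) {
            it->second.needsUpload = true; // host may write; re-upload before next GPU use
            return true;
        }
        // Not a TBX allocation: fall back to the CPU path, which locks cpuMtx
        // while tbxMtx is still held. With one shared std::mutex this would be
        // a recursive lock (a deadlock, formally undefined behavior). With two
        // distinct mutexes it is safe as long as every thread takes them in the
        // same order: tbxMtx first, then cpuMtx.
        return handleCpuFault(ptr);
    }

    bool handleCpuFault(void *ptr) {
        std::lock_guard<std::mutex> lock(cpuMtx);
        auto it = cpuAllocs.find(ptr);
        if (it == cpuAllocs.end()) {
            return false;
        }
        it->second.cpuAccessAllowed = true;
        return true;
    }

    bool cpuAccessAllowed(void *ptr) {
        std::lock_guard<std::mutex> lock(cpuMtx);
        auto it = cpuAllocs.find(ptr);
        return it != cpuAllocs.end() && it->second.cpuAccessAllowed;
    }

    bool tbxNeedsUpload(void *ptr) {
        std::lock_guard<std::mutex> lock(tbxMtx);
        auto it = tbxAllocs.find(ptr);
        return it != tbxAllocs.end() && it->second.needsUpload;
    }

  private:
    std::mutex cpuMtx;
    std::mutex tbxMtx;
    std::unordered_map<void *, CpuFaultData> cpuAllocs;
    std::unordered_map<void *, TbxFaultData> tbxAllocs;
};
```

Keeping the flag check at the top of `handleFault` means that, while the flag is off, control never touches the TBX data or its mutex, which matches the stated goal of isolating the original CPU code paths.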
Compute-Runtime-Validation
124e755b9d
Revert "fix: regression caused by tbx fault mngr"
This reverts commit 9a14fe2478.
Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2024-12-19 17:35:03 +01:00
Jack Myers
9a14fe2478
fix: regression caused by tbx fault mngr
Addresses regressions from the reverted merge
of the tbx fault manager for host memory.
This fixes attempts by the tbx fault manager
to protect/unprotect host buffer memory even
when the host ptr was not driver-allocated.
In the smoke test that triggered
the critical regression, clCreateBuffer was
called with the CL_MEM_USE_HOST_PTR flag,
and the subsequent `mprotect` calls on the
application-provided host ptr failed.
Related-To: NEO-12268
Signed-off-by: Jack Myers <jack.myers@intel.com>
2024-12-18 23:16:36 +01:00
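The guard described above can be sketched as a single predicate, assuming the fault manager knows whether each host buffer was allocated by the driver. `HostBufferInfo` and `mayProtectHostBuffer` are hypothetical names, not NEO APIs. The page-alignment check reflects the POSIX requirement that `mprotect` be given a page-aligned address (it fails with `EINVAL` otherwise), one plausible way an application-supplied pointer makes the call fail.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical description of a host buffer; field names are illustrative.
struct HostBufferInfo {
    void *hostPtr = nullptr;
    std::size_t size = 0;
    bool driverAllocated = false; // false for CL_MEM_USE_HOST_PTR buffers
};

// Returns true only if the fault manager may change page protection on the
// buffer. Memory the application passed in (e.g. with CL_MEM_USE_HOST_PTR)
// is not owned by the driver, so it is never protected. As an extra check,
// POSIX mprotect() rejects a start address that is not page-aligned (EINVAL).
bool mayProtectHostBuffer(const HostBufferInfo &info, std::size_t pageSize) {
    if (!info.driverAllocated || info.hostPtr == nullptr || info.size == 0 || pageSize == 0) {
        return false;
    }
    const auto address = reinterpret_cast<std::uintptr_t>(info.hostPtr);
    return address % pageSize == 0;
}
```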
Compute-Runtime-Validation
6c5d9a6ed7
Revert "feature: extend TBX page fault manager from CPU implementation"
This reverts commit 51c0e80299.
Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2024-12-12 12:30:22 +01:00
Jack Myers
51c0e80299
feature: extend TBX page fault manager from CPU implementation
In TBX mode, the host could not write to host buffers after they were accessed
from device code, because there was no migration mechanism after the initial TBX upload.
Migration is unnecessary on real hardware but is required for TBX.
This patch introduces a new page fault manager type that extends the original
CPU fault manager, enabling automatic migration of host buffers in TBX mode.
Refactoring was necessary to avoid diamond inheritance; this was achieved by using a
template parameter as the base class for the OS-specific fault managers.
Related-To: NEO-12268
Signed-off-by: Jack Myers <jack.myers@intel.com>
2024-12-11 09:09:50 +01:00
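The "template parameter as base class" refactoring can be sketched as follows. The class names are hypothetical stand-ins. Without the template, an OS-specific TBX manager would derive from both an OS-specific CPU manager and a TBX manager, each of which derives from the CPU manager, producing a diamond. Parameterizing the OS layer on its base yields a single linear chain instead.

```cpp
#include <string>

// Hypothetical base: the original CPU page fault manager.
class CpuPageFaultManager {
  public:
    virtual ~CpuPageFaultManager() = default;
    virtual std::string describe() const { return "cpu"; }
};

// Adds TBX host-buffer migration on top of whatever base it is given.
template <typename BaseFaultManager>
class TbxPageFaultManager : public BaseFaultManager {
  public:
    std::string describe() const override { return "tbx+" + BaseFaultManager::describe(); }
};

// The OS-specific layer takes its base as a template parameter, so the same
// OS code can sit on the plain CPU manager or on the TBX manager. Each
// instantiation is a single inheritance chain, so there is no diamond.
template <typename BaseFaultManager>
class LinuxPageFaultManager : public BaseFaultManager {
  public:
    std::string describe() const override { return "linux(" + BaseFaultManager::describe() + ")"; }
};

using LinuxCpuFaultManager = LinuxPageFaultManager<CpuPageFaultManager>;
using LinuxTbxFaultManager = LinuxPageFaultManager<TbxPageFaultManager<CpuPageFaultManager>>;
```

The trade-off of this mixin style is that the OS-specific code is compiled once per base it is combined with, in exchange for avoiding virtual inheritance.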
Bartosz Dunajski
dab4166837
fix: add missing aub polls on sync points
Related-To: HSD-14023925176
Signed-off-by: Bartosz Dunajski <bartosz.dunajski@intel.com>
2024-11-21 09:17:54 +01:00
Bartosz Dunajski
dd8460beba
refactor: reduce TBX download timeout for unit tests
Signed-off-by: Bartosz Dunajski <bartosz.dunajski@intel.com>
2024-09-09 19:05:03 +02:00
Bartosz Dunajski
db611962f7
fix: improve task count handling in tbx download path
Related-To: HSD-18039789178
Signed-off-by: Bartosz Dunajski <bartosz.dunajski@intel.com>
2024-08-28 15:32:15 +02:00
Szymon Morek
b8f181d50e
performance: remove trim candidate list
Related-To: NEO-11755
Removing the trim candidate list reduces the overhead
caused by residency handling. Allocations required
for eviction are placed in an eviction container managed
by the CSR.
Signed-off-by: Szymon Morek <szymon.morek@intel.com>
2024-08-23 12:21:50 +02:00
Bartosz Dunajski
696b02bfd3
fix: improve TBX downloading after L0 Event sync
Related-To: HSD-18038498579
Signed-off-by: Bartosz Dunajski <bartosz.dunajski@intel.com>
2024-08-23 10:42:17 +02:00
Bartosz Dunajski
24cfd203ab
fix: don't download tbx allocations on heapless first device submission
Related-To: HSD-18039476929
Signed-off-by: Bartosz Dunajski <bartosz.dunajski@intel.com>
2024-08-06 14:03:42 +02:00
Mateusz Hoppe
b3d72ddd3d
fix: write memory for resident allocations in simulation mode
- refactor and call processFlushResidency() on memoryOperationsHandler
- call free() to remove the allocation from resident allocations when
the graphics allocation is released
Related-To: NEO-11719
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
2024-06-14 18:49:01 +02:00
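A hypothetical sketch of the residency bookkeeping described above. The member names echo the commit message (`processFlushResidency`, `free`), but the class, its signatures, and `SimAllocation` are invented for illustration and do not reflect NEO's actual memory operations handler. It shows why releasing an allocation must also remove it from the resident set: otherwise a later flush would write memory for an allocation that no longer exists.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a graphics allocation in simulation mode.
struct SimAllocation {
    int id;
    bool writtenToSimulator = false;
};

class ResidencyTracker {
  public:
    void makeResident(SimAllocation *alloc) {
        if (std::find(resident.begin(), resident.end(), alloc) == resident.end()) {
            resident.push_back(alloc);
        }
    }

    // On each flush, write memory for every resident allocation, since the
    // AUB/TBX simulator only sees data that is explicitly written to it.
    void processFlushResidency() {
        for (auto *alloc : resident) {
            alloc->writtenToSimulator = true;
        }
    }

    // Called when the allocation is released, so a later flush never
    // touches a dangling pointer.
    void free(SimAllocation *alloc) {
        resident.erase(std::remove(resident.begin(), resident.end(), alloc), resident.end());
    }

    std::size_t residentCount() const { return resident.size(); }

  private:
    std::vector<SimAllocation *> resident;
};
```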
Mateusz Jablonski
cb2b572e94
feature: add support for null aub mode
In this mode an AUB csr is created, but no aub file is created.
Related-To: NEO-11097
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2024-04-09 16:59:42 +02:00
Filip Hazubski
d25026b263
refactor: Add getTotalMemBankSize function to ReleaseHelper
Minor refactor of ULTs to not use hard-coded bank sizes.
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
2024-03-06 09:53:56 +01:00
Michal Mrozek
64232ec370
fix: choose proper csr for low priority immediate command lists
Resolves: NEO-10168
Signed-off-by: Michal Mrozek <michal.mrozek@intel.com>
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
2024-02-28 12:45:02 +01:00
Mateusz Jablonski
de93bc6928
refactor: correct naming of enum class constants 10/n
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-12-19 11:30:39 +01:00
Mateusz Jablonski
739d181026
refactor: correct naming of enum class constants 6/n
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-12-13 14:48:52 +01:00
Mateusz Jablonski
c9664e6bad
refactor: rename global debug manager to debugManager
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-11-30 13:00:59 +01:00
Mateusz Hoppe
83ac95d293
fix: L0 - remove synchronization with events on appends in tbx mode
Related-To: NEO-9400
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
2023-11-27 10:39:55 +01:00
Compute-Runtime-Validation
fca2159430
Revert "fix: if device hierarchy is flat then getSubDevicesCount return 1u"
This reverts commit cb0bb57f49.
Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2023-10-26 15:40:29 +02:00
Baj, Tomasz
cb0bb57f49
fix: if device hierarchy is flat then getSubDevicesCount return 1u
Related-To: NEO-9167
Signed-off-by: Baj, Tomasz <tomasz.baj@intel.com>
2023-10-25 15:51:52 +02:00
Mateusz Hoppe
52b0f32688
fix: offset cpu address when writing chunk in simulated csr
- not only the gpuAddress but also the cpu address of the data needs
to be offset when writing memory.
Related-To: GSD-6604
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
2023-10-23 17:01:25 +02:00
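A minimal sketch of the bug class fixed above, assuming a chunk write that takes an allocation's GPU base, its CPU base, and a chunk offset. `WriteRecord` and `writeMemoryChunk` are hypothetical; a real writer would forward the data to the AUB/TBX stream rather than recording it. Offsetting only the GPU address would copy bytes from the start of the buffer into the middle of the GPU range; the offset must be applied to both sides.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record standing in for a write to the AUB/TBX stream.
struct WriteRecord {
    std::uint64_t gpuAddress;
    const void *cpuAddress;
    std::size_t size;
};

// Writes only [offset, offset + size) of an allocation. The same offset is
// applied to the GPU destination and to the CPU source pointer so that the
// bytes written correspond to the GPU range they land in.
void writeMemoryChunk(std::vector<WriteRecord> &out, std::uint64_t gpuBase,
                      const void *cpuBase, std::size_t offset, std::size_t size) {
    const auto *cpuBytes = static_cast<const std::uint8_t *>(cpuBase);
    out.push_back({gpuBase + offset, cpuBytes + offset, size});
}
```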
Dunajski, Bartosz
25195ebc96
fix: capability to write memory chunk in aub/tbx mode
Related-To: GSD-6604
Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
2023-10-19 19:13:11 +02:00
Mateusz Hoppe
f5cb7df7cd
fix: do not download event allocation in TBX mode
- only download when the allocation was used, as indicated by taskCount
Resolves: NEO-8312
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
2023-08-29 16:27:33 +02:00
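A hypothetical sketch of the "download only when used" rule above. The sentinel value `notUsed`, the alias `TaskCountType`, and `EventAllocationSketch` are assumptions for illustration; the commit only states that usage is indicated by taskCount. An allocation that no submission has touched has nothing new to read back from TBX, so the download is skipped.

```cpp
#include <cstdint>
#include <limits>

using TaskCountType = std::uint32_t; // hypothetical alias

// Hypothetical sentinel meaning "never used by any submission".
constexpr TaskCountType notUsed = std::numeric_limits<TaskCountType>::max();

struct EventAllocationSketch {
    TaskCountType taskCount = notUsed; // updated by a submission that uses it
};

// Download from TBX only if some submission actually used the allocation.
bool shouldDownload(const EventAllocationSketch &allocation) {
    return allocation.taskCount != notUsed;
}
```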
Dunajski, Bartosz
cd9ad1f04c
fix: decanonize GPU VA during TBX memory read.
Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
2023-07-26 19:44:19 +02:00
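Canonical addresses sign-extend the top implemented address bit into the upper bits; decanonizing clears those bits again so the address can be used as a plain offset, which is presumably what the TBX memory read expects. The sketch below assumes a 48-bit GPU virtual address space; the real width is device-dependent, and these helpers are illustrative rather than NEO's own.

```cpp
#include <cstdint>

// Assumption: 48-bit GPU virtual address space.
constexpr unsigned addressWidth = 48;
constexpr std::uint64_t addressMask = (std::uint64_t{1} << addressWidth) - 1;
constexpr std::uint64_t signBit = std::uint64_t{1} << (addressWidth - 1);

// Canonical form: bit 47 is replicated into bits 63..48.
constexpr std::uint64_t canonize(std::uint64_t address) {
    return (address & signBit) ? (address | ~addressMask) : (address & addressMask);
}

// Decanonized form: bits 63..48 cleared.
constexpr std::uint64_t decanonize(std::uint64_t address) {
    return address & addressMask;
}
```

For example, `canonize(0x0000800000000000)` is `0xFFFF800000000000`, and `decanonize` maps it back.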
Mateusz Jablonski
30c5d8a681
fix: pass gmm helper to getDumpSurfaceInfo function
gmm may not exist for a buffer allocation
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-07-03 11:59:52 +02:00
Dunajski, Bartosz
5fe9d70066
feature: new multitile post sync layout for immediate write [1/n]
No functional changes in this commit. This is prework.
Related-To: NEO-7966
Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
2023-06-07 13:11:10 +02:00
Fabian Zwolinski
e351a90f81
refactor: Rename member variables to camelCase 2/n
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
2023-04-27 20:39:22 +02:00
Kamil Kopryk
fa8579602f
refactor: rename product helper files n/n
Related-To: NEO-7703
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
2023-03-10 13:24:38 +01:00
Warchulski, Jaroslaw
0556d543a3
Cleanup includes 56
Related-To: NEO-5548
Signed-off-by: Warchulski, Jaroslaw <jaroslaw.warchulski@intel.com>
2023-02-16 14:42:44 +01:00
Warchulski, Jaroslaw
8c17313c8b
Cleanup includes 53
Cleaned up files:
opencl/source/mem_obj/image.inl
shared/offline_compiler/source/decoder/zebin_manipulator.h
shared/source/aub_mem_dump/aub_alloc_dump.h
shared/source/compiler_interface/intermediate_representations.h
shared/source/helpers/blit_commands_helper_base.inl
shared/source/utilities/debug_file_reader.h
shared/source/utilities/software_tags.h
shared/source/xe_hpc_core/hw_cmds_pvc.h
Related-To: NEO-5548
Signed-off-by: Warchulski, Jaroslaw <jaroslaw.warchulski@intel.com>
2023-02-15 13:21:14 +01:00
Kamil Kopryk
2484c7ceb2
refactor: rename hw_helper files to gfx_core_helper files
Related-To: NEO-6853
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
2023-02-01 19:37:51 +01:00
Warchulski, Jaroslaw
77501d86ba
Cleanup includes 35
Cleaned up files:
shared/source/command_stream/command_stream_receiver.h
Related-To: NEO-5548
Signed-off-by: Warchulski, Jaroslaw <jaroslaw.warchulski@intel.com>
2023-01-17 18:51:40 +01:00
Warchulski, Jaroslaw
a2fe929f0c
Cleanup includes 18
Cleaned up files:
shared/source/command_stream/command_stream_receiver_hw.h
shared/source/compiler_interface/compiler_interface.h
shared/source/direct_submission/direct_submission_hw.h
shared/source/helpers/dirty_state_helpers.h
Related-To: NEO-5548
Signed-off-by: Warchulski, Jaroslaw <jaroslaw.warchulski@intel.com>
2023-01-02 13:28:29 +01:00
Kamil Kopryk
3c5b3d4bac
Refactor: don't use global ProductHelper getter in shared files 2/n
Related-To: NEO-6853
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
2022-12-29 09:50:06 +01:00
Kamil Kopryk
232b886056
Rename HwInfoConfig to ProductHelper
Related-To: NEO-6853
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
2022-12-14 14:39:52 +01:00
Mateusz Jablonski
8f308f24e5
Reduce usage of global gfx core helper getter [1/n]
Related-To: NEO-6853
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2022-12-09 17:27:37 +01:00
Kamil Kopryk
03b687881f
Rename HwHelper -> GfxCoreHelper
Related-To: NEO-6853
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
2022-12-09 10:29:06 +01:00
Maciej Plewka
4b42b066f8
Use dedicated using type for TaskCount
Related-To: NEO-7155
Signed-off-by: Maciej Plewka <maciej.plewka@intel.com>
2022-11-28 16:44:44 +01:00
Mateusz Jablonski
a17df8fa86
Return SubmissionStatus from processResidency method
This allows a non-binary status to be returned to the API layer.
Related-To: NEO-7412
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2022-11-15 13:17:43 +01:00
Warchulski, Jaroslaw
6cbb3cfb05
Cleanup includes 3
Related-To: NEO-5548
Signed-off-by: Warchulski, Jaroslaw <jaroslaw.warchulski@intel.com>
2022-11-07 14:52:31 +01:00
Warchulski, Jaroslaw
fb25f96081
Cleanup includes 2
Related-To: NEO-5548
Signed-off-by: Warchulski, Jaroslaw <jaroslaw.warchulski@intel.com>
2022-11-07 10:36:50 +01:00
Dunajski, Bartosz
06a647a5e9
Set SkipResourceCleanup in TBX mode
Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
2022-10-27 12:23:08 +02:00
Compute-Runtime-Validation
638aba45a0
Revert "Set SkipResourceCleanup in TBX mode"
This reverts commit cb83c1d935.
Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2022-10-26 07:09:29 +02:00
Dunajski, Bartosz
cb83c1d935
Set SkipResourceCleanup in TBX mode
Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
2022-10-25 14:31:35 +02:00
Fabian Zwolinski
645600d141
Return error when there is no memory to evict
We want to return an error code to the application instead of aborting when
we are unable to make more memory resident.
Related-To: NEO-7289
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
2022-09-22 14:26:55 +02:00
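A hypothetical sketch of the behavior change above: when making an allocation resident requires evicting others and nothing is left to evict, report a status instead of aborting. `MemoryOperationStatus`, `makeResident`, and the byte-budget model are invented for illustration; the real residency logic is more involved.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical status type returned to the API layer.
enum class MemoryOperationStatus { success, outOfMemory };

// Evicts entries from `evictable` (sizes in bytes) until `requested` bytes
// fit in `freeBudget`. If the eviction candidates run out first, return an
// error the caller can translate into an API error code, rather than abort.
MemoryOperationStatus makeResident(std::size_t requested, std::size_t &freeBudget,
                                   std::vector<std::size_t> &evictable) {
    while (freeBudget < requested) {
        if (evictable.empty()) {
            return MemoryOperationStatus::outOfMemory;
        }
        freeBudget += evictable.back();
        evictable.pop_back();
    }
    freeBudget -= requested;
    return MemoryOperationStatus::success;
}
```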
Jobczyk, Lukasz
a285712cc4
Add missing download allocation calls
Signed-off-by: Jobczyk, Lukasz <lukasz.jobczyk@intel.com>
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
2022-03-31 09:49:22 +02:00
Lukasz Jobczyk
a230f267e1
Poll task count indefinitely on high throttle command queue
Resolves: NEO-6781
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
2022-03-25 10:06:16 +01:00
Dominik Dabek
e0c892ed55
Add lock to downloading allocations on tbx
When running multiple threads, one thread could clear
allocationsForDownload while another was iterating over it.
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2022-02-16 16:51:41 +01:00
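A minimal sketch of the race and its fix, assuming the download list is a set guarded by one mutex. `DownloadList` is a hypothetical wrapper and `int` stands in for an allocation handle. Because iteration and clearing happen under the same lock, no thread can clear the container while another is still walking it.

```cpp
#include <mutex>
#include <unordered_set>
#include <vector>

// Hypothetical wrapper around allocationsForDownload; int stands in for an
// allocation handle.
class DownloadList {
  public:
    void add(int allocationId) {
        std::lock_guard<std::mutex> lock(mtx);
        allocationsForDownload.insert(allocationId);
    }

    // Iterates and clears under one lock, so a concurrent caller cannot
    // clear the set in the middle of the iteration.
    std::vector<int> downloadAll() {
        std::lock_guard<std::mutex> lock(mtx);
        std::vector<int> downloaded(allocationsForDownload.begin(), allocationsForDownload.end());
        allocationsForDownload.clear();
        return downloaded;
    }

  private:
    std::mutex mtx;
    std::unordered_set<int> allocationsForDownload;
};
```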
Patryk Wrobel
498cf5e871
Implement GPU hang detection
This change uses DRM_IOCTL_I915_GET_RESET_STATS to detect
GPU hangs. When a hang is detected,
zeCommandQueueSynchronize returns ZE_RESULT_ERROR_DEVICE_LOST.
Related-To: NEO-5313
Signed-off-by: Patryk Wrobel <patryk.wrobel@intel.com>
2022-01-31 13:48:17 +01:00
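A hedged sketch of the decision step only. `ResetStats` mirrors the field layout of `struct drm_i915_reset_stats` from the i915 uAPI header, which a real implementation would fill with `ioctl(fd, DRM_IOCTL_I915_GET_RESET_STATS, &stats)`. The ioctl call is omitted here so the sketch runs without a GPU. Treating any active or pending batch as a hang is an assumption about the policy, and `SyncResult` is a stand-in for the `ze_result_t` values named in the commit.

```cpp
#include <cstdint>

// Local mirror of the fields of struct drm_i915_reset_stats.
struct ResetStats {
    std::uint32_t ctxId;
    std::uint32_t flags;
    std::uint32_t resetCount;
    std::uint32_t batchActive;  // batches of this context active during a reset
    std::uint32_t batchPending; // batches of this context pending during a reset
    std::uint32_t pad;
};

// Stand-ins for ZE_RESULT_SUCCESS and ZE_RESULT_ERROR_DEVICE_LOST.
enum class SyncResult { success, deviceLost };

// Assumption: a context that had a batch active or pending when the GPU was
// reset is considered hung.
bool isHangDetected(const ResetStats &stats) {
    return stats.batchActive > 0 || stats.batchPending > 0;
}

SyncResult synchronizeResult(const ResetStats &stats) {
    return isHangDetected(stats) ? SyncResult::deviceLost : SyncResult::success;
}
```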