Commit Graph

112 Commits

Author SHA1 Message Date
Brandon Yates
106e8be9a9 fix: Don't abort application due to gpu fault when debugging is enabled
Signed-off-by: Brandon Yates <brandon.yates@intel.com>
2025-01-30 23:37:50 +01:00
Naklicki, Mateusz
218122c46b test: make sure scratch pages are disabled on pvc+
Signed-off-by: Naklicki, Mateusz <mateusz.naklicki@intel.com>
2025-01-16 14:23:24 +01:00
Naklicki, Mateusz
3e29ca9057 fix: explicitly disable scratch pages on xekmd platforms
Signed-off-by: Naklicki, Mateusz <mateusz.naklicki@intel.com>
2025-01-16 12:09:27 +01:00
Compute-Runtime-Validation
95b3d16d76 Revert "fix: add scratch page support with xekmd"
This reverts commit dec695f405.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2024-12-23 19:05:29 +01:00
Naklicki, Mateusz
dec695f405 fix: add scratch page support with xekmd
Also explicitly disable it by default.


Signed-off-by: Naklicki, Mateusz <mateusz.naklicki@intel.com>
2024-12-23 11:00:12 +01:00
Mateusz Jablonski
e7a8936d70 test: don't use product specific ioctl helper in generic tests
Related-To: NEO-13527
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2024-12-18 17:26:12 +01:00
Mateusz Jablonski
593a6c54ea feature: add debug flag to ignore product specific ioctl helper creation
Related-To: NEO-13527
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2024-12-18 13:52:30 +01:00
Mateusz Jablonski
270570c5d3 refactor: move i915 specific logic to ioctl helper i915
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2024-12-18 09:44:30 +01:00
Young Jin Yoon
ebdded1bb9 fix: change error message for GPU page fault
Change the error message for GPU page fault to match
with the message from gdb output

Related-To: NEO-13093
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
2024-10-26 01:29:50 +02:00
Oskar Hubert Weber
3ccab79ed8 test: dir create
Related-To: NEO-11500

Signed-off-by: Oskar Hubert Weber <oskar.hubert.weber@intel.com>
2024-08-05 13:58:25 +02:00
Mateusz Jablonski
8edc40adbc fix: populate SliceInfo based on TopologyMap in drm path
pick minimal config in case of multi tile

Related-To: NEO-12073
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2024-07-31 15:55:21 +02:00
Wenbin Lu
e2f1735cc5 test: use realistic values in topology query tests
Related-To: NEO-9489

Signed-off-by: Wenbin Lu <wenbin.lu@intel.com>
2024-07-29 16:34:22 +02:00
Mateusz Jablonski
382584067a fix: setup initial l3 bank count before querying topology
Resolves: NEO-12169
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2024-07-29 12:24:45 +02:00
Mateusz Jablonski
71f4088a1e fix: correct hw info setting in drm path
add fallback to get max eu per ss from topology if not available in other way

Related-To: NEO-12073
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2024-07-26 17:23:56 +02:00
Mateusz Jablonski
85df385582 fix: ensure system info is queried before querying topology
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2024-07-26 16:03:59 +02:00
Mateusz Jablonski
0cdfa882eb fix: correct setting hw info in drm flow
firstly, setup hw info using product specific functions
secondly, query system info from GuC to setup max values
then, query memory info
then, query engine info as it depends on memory info
then, query topology as it depends on engine info

Related-To: NEO-12073
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2024-07-26 12:20:40 +02:00
Compute-Runtime-Validation
4dc737fc4a Revert "fix: correct setting hw info in drm flow"
This reverts commit 3b2a0983f5.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2024-07-25 05:41:33 +02:00
Mateusz Jablonski
3b2a0983f5 fix: correct setting hw info in drm flow
firstly, setup hw info using product specific functions
secondly, query system info from GuC to setup max values
thirdly, query topology to setup current topology data

Related-To: NEO-12073
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2024-07-24 16:18:58 +02:00
Mateusz Jablonski
64873a5dd2 test: remove dependency of i915 from mock ioctl helper
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2024-07-24 13:21:48 +02:00
Wenbin Lu
357a607d22 refactor: correct the naming of some topology-related variables
Related-To: NEO-9489

Signed-off-by: Wenbin Lu <wenbin.lu@intel.com>
2024-07-24 08:39:12 +02:00
Compute-Runtime-Validation
1929e54e91 Revert "feature: temporarily enable scratch page on pvc"
This reverts commit e3b97e3716.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2024-07-12 09:55:25 +02:00
Mateusz Hoppe
c660784df2 fix: fallback path while creating drm context
- if create VM ioctl fails, fallback to query VM from created context
- in fallback path context's VM will not have flags applied

Related-To: NEO-7813

Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
2024-07-09 14:10:51 +02:00
Young Jin Yoon
e3b97e3716 feature: temporarily enable scratch page on pvc
Enabled scratch page by default on PVC by setting
isDisableScratchPagesSupported to false for PVC.

Related-To: GSD-7742
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
2024-07-01 21:50:16 +02:00
Bartosz Dunajski
da9c009b88 feature: assign unique interrupt to queue
Related-To: NEO-8179

Signed-off-by: Bartosz Dunajski <bartosz.dunajski@intel.com>
2024-06-07 10:06:31 +02:00
Katarzyna Cencelewska
12d1bce6b9 fix: change gmm resource for externalHostPtr
Resolves: NEO-10157

Signed-off-by: Katarzyna Cencelewska <katarzyna.cencelewska@intel.com>
2024-05-22 16:50:17 +02:00
Compute-Runtime-Validation
94a4bbac57 Revert "fix: change gmm resource for externalHostPtr"
This reverts commit 63843862df.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2024-05-21 07:43:53 +02:00
Katarzyna Cencelewska
63843862df fix: change gmm resource for externalHostPtr
Resolves: NEO-10157

Signed-off-by: Katarzyna Cencelewska <katarzyna.cencelewska@intel.com>
2024-05-21 00:43:29 +02:00
Young Jin Yoon
49cc1a0ba0 fix: use llx for fprintf and IoctlFunctions
Changed format for address printing from %lx to %llx for
fprintf introduced in drm_neo.cpp, and then use
IoctlFunctions::fprintf instead of std::printf to avoid
errors on gcc.

Changed formate for address printing from %lx to %llx for
snprintf introduced in drm_test.cpp, and then type casted
to long long unsigned int explictly to avoid errors.

Related-To: GSD-7611
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
2024-05-20 16:39:49 +02:00
Young Jin Yoon
e204d27190 fix: print to stdout for disable scratch page
Modified to print out error messages to stdout when disable scratch page
is used.

Related-To: GSD-7611
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
2024-05-16 15:05:07 +02:00
Young Jin Yoon
06faaab5bb refactor: read scratch page options during init
Change scratch page logic to initialize during Drm::create.

Related-To: GSD-7742
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
2024-05-15 08:56:14 +02:00
Young Jin Yoon
07aa53fd87 fix: disable scratch page by default only on PVC
Disabled scratch paged by default only on PVC with productHelper.

Related-To: GSD-7742
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
2024-05-01 23:44:48 +02:00
Mateusz Jablonski
9468915768 fix: correct preemption support in xe path
preemption is always supported by xe kmd

Related-To: NEO-10496, HSD-18037744953
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2024-04-04 13:29:02 +02:00
Young Jin Yoon
907129bb33 feature: disable scratch page by default
Modified default values for disableScratch and gpuPageFault
to true and 10 respectively in drm_neo.cpp, in order to
disable scratch pages by default.
Modified to set gpuPageFault to 0 as a default value when
scratch page is not disabled.

Related-To: GSD-5673
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
2024-04-04 09:50:02 +02:00
Mateusz Jablonski
420e1391b2 fix: handle not aligned gtt size reported by i915
when i915 reports gtt size between 47 and 48 bits we consider
it as 48 bit VA space

Related-To: GSD-8215
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2024-03-29 07:51:06 +01:00
Compute-Runtime-Validation
e3f50e8aa9 Revert "fix: handle not aligned gtt size reported by i915"
This reverts commit dae901c13f.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2024-03-28 12:03:23 +01:00
Mateusz Jablonski
dae901c13f fix: handle not aligned gtt size reported by i915
when i915 reports gtt size between 47 and 48 bits we consider
it as 48 bit VA space

Related-To: GSD-8215
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2024-03-28 08:46:53 +01:00
Mateusz Jablonski
d94be09020 refactor: remove not needed check for exec softpin
Related-To: NEO-10496
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2024-03-22 17:30:49 +01:00
Young Jin Yoon
ec009cf9e3 fix: abort only when disabling scratch page
Modifed getResetStatus to abort only when scratch page is disabled
Removed an incorrect UNRECOVERABLE_IF statement based on the status:
validPageFault can be true when banned flag is not set, if CAT error
does not occur as a result of page fault.

Related-To: GSD-5673
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
2024-03-21 21:55:25 +01:00
Compute-Runtime-Validation
016c234893 Revert "feature: disable scratch page by default"
This reverts commit dab5469f81.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2024-03-16 01:52:00 +01:00
Young Jin Yoon
dab5469f81 feature: disable scratch page by default
Modified default values for disableScratch and gpuPageFault
to true and 10 respectively in drm_nep.cpp, in order to
disable scratch pages by default.

Related-To: GSD-5673
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
2024-03-15 11:44:10 +01:00
Young Jin Yoon
82728ff394 feature: add logic to iterate for all contexts to check GPU pagefault
Implemented to go through entire contexts in the process and then query
reset status to check the unexpected GPU segfault.

Added a new debug variable GpuFaultCheckThreshold to change the checking
frequency for each hang check for performance analysis.

Related-To: GSD-5673
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
2024-03-15 07:48:39 +01:00
Young Jin Yoon
7b81c4e08f feature: abort when unexpected GPU page fault detected
If ResetStats from i915 is from the GPU page fault, abort
the entire process instead of disabling engines.
Added a fallback mechanism when prelim_drm_i915_reset_stats
fails.

Related-To: GSD-5673
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
2024-03-14 08:14:59 +01:00
Brandon Yates
76de854a69 feature: Set Debug Attach Available for Xe
Related-to: NEO-8402

Signed-off-by: Brandon Yates <brandon.yates@intel.com>
2024-01-24 09:04:11 +01:00
Mateusz Jablonski
2eba5b35e4 refactor: correct naming of DrmParam enum values
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-12-13 15:43:46 +01:00
Mateusz Jablonski
739d181026 refactor: correct naming of enum class constants 6/n
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-12-13 14:48:52 +01:00
Mateusz Jablonski
432142c574 refactor: correct naming of enum class constants 4/n
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-12-13 08:08:51 +01:00
Mateusz Jablonski
cff6c81be0 refactor: correct naming of DrmIoctl enums
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-12-12 10:02:19 +01:00
Mateusz Jablonski
beafea9b39 refactor: correct naming of enum class constants 2/n
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-12-11 13:13:35 +01:00
Mateusz Jablonski
c9664e6bad refactor: rename global debug manager to debugManager
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-11-30 13:00:59 +01:00
Mateusz Jablonski
36194c4e7d refactor: correct variable namings
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-11-29 23:49:03 +01:00