72 Commits

Author SHA1 Message Date
Young Jin Yoon 82728ff394 feature: add logic to iterate for all contexts to check GPU pagefault
Iterate over all contexts in the process and query the reset status of
each one to detect unexpected GPU page faults.

Added a new debug variable GpuFaultCheckThreshold to control how often the
page fault check runs during hang checks, for performance analysis.

Related-To: GSD-5673
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
2024-03-15 07:48:39 +01:00
Young Jin Yoon 7b81c4e08f feature: abort when unexpected GPU page fault detected
If ResetStats from i915 reports a GPU page fault, abort
the entire process instead of disabling engines.
Added a fallback mechanism for when prelim_drm_i915_reset_stats
fails.

Related-To: GSD-5673
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
2024-03-14 08:14:59 +01:00
Brandon Yates 76de854a69 feature: Set Debug Attach Available for Xe
Related-to: NEO-8402

Signed-off-by: Brandon Yates <brandon.yates@intel.com>
2024-01-24 09:04:11 +01:00
Mateusz Jablonski 2eba5b35e4 refactor: correct naming of DrmParam enum values
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-12-13 15:43:46 +01:00
Mateusz Jablonski 739d181026 refactor: correct naming of enum class constants 6/n
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-12-13 14:48:52 +01:00
Mateusz Jablonski 432142c574 refactor: correct naming of enum class constants 4/n
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-12-13 08:08:51 +01:00
Mateusz Jablonski cff6c81be0 refactor: correct naming of DrmIoctl enums
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-12-12 10:02:19 +01:00
Mateusz Jablonski beafea9b39 refactor: correct naming of enum class constants 2/n
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-12-11 13:13:35 +01:00
Mateusz Jablonski c9664e6bad refactor: rename global debug manager to debugManager
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-11-30 13:00:59 +01:00
Mateusz Jablonski 36194c4e7d refactor: correct variable namings
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-11-29 23:49:03 +01:00
Mateusz Jablonski 35c1f34672 refactor: move number of threads per eu to release helper
Related-To: HSD-18034098647
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-11-20 12:16:33 +01:00
Mateusz Jablonski 3ceafa2259 fix: remove setting debug flags for ioctl helper xe
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-09-26 15:42:52 +02:00
Cencelewska, Katarzyna 98dae70415 fix: add helper to properly call GemCreate on xe kmd
Related-To: NEO-8325
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
2023-09-08 12:27:11 +02:00
Bari, Pratik 3f083360a2 feature(sysman): Added sysfs filenames for the memory module
- The sysfs filenames have been added in the sysfsNameToFileMap of the
SysmanKmdInterface classes.
- The functions returning the sysfs filenames have been removed from the
shared directory.
- The ULTs have been added to return the sysfs filenames.

Related-To: LOCI-4699

Signed-off-by: Bari, Pratik <pratik.bari@intel.com>
2023-08-03 22:36:17 +02:00
Kamil Kopryk 082d33bb7c fix: correct query topology on xe
Related-To: NEO-7996
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
2023-06-22 13:24:52 +02:00
Matias Cabral 96517a08aa feature: Implement zetMetricGroupGetGlobalTimestampsExp()
Resolves: LOCI-3072

Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
2023-06-21 09:48:41 +02:00
Bari, Pratik a15e8a9679 feature: Added changes for Porting Memory API with XE driver
The Memory Info object is used in the getState function for memory.
Some of the ULTs in the memory modules have been modified.
A function to return the sysfs nodes for the Memory address range has
been added in the IoctlHelper class corresponding to the XE and i915
drivers.

Related-To: LOCI-4397

Signed-off-by: Bari, Pratik <pratik.bari@intel.com>
2023-06-20 21:38:17 +02:00
Matias Cabral cfa187aec6 feature: Support for metrics group exp extension
Support zet_metric_global_timestamps_resolution_exp_t

Resolves: LOCI-4350

Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
2023-06-16 07:48:32 +02:00
Mateusz Jablonski 3b981331c9 fix: correct handling ZE_ENABLE_PCI_ID_DEVICE_ORDER flag
- by default ZE_ENABLE_PCI_ID_DEVICE_ORDER is disabled
- by default devices are sorted by type (discrete first), then by pci order
- when ZE_ENABLE_PCI_ID_DEVICE_ORDER is enabled, devices are sorted by pci id

Related-To: LOCI-4520

Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-06-14 16:27:55 +02:00
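The ordering rules listed in commit 3b981331c9 can be sketched roughly as follows. This is only an illustration, not the driver's code: `DeviceInfo`, its fields, and `sortDevices` are hypothetical names, and the flag is passed in as a plain `bool` rather than read from the environment.

```cpp
#include <algorithm>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical device record; the real driver keeps this data elsewhere.
struct DeviceInfo {
    bool isDiscrete;
    uint32_t domain, bus, device, function; // PCI address components
};

// pciIdOrder == false: discrete devices first, then PCI order within each group.
// pciIdOrder == true:  PCI order only (ZE_ENABLE_PCI_ID_DEVICE_ORDER enabled).
void sortDevices(std::vector<DeviceInfo> &devices, bool pciIdOrder) {
    std::sort(devices.begin(), devices.end(),
              [pciIdOrder](const DeviceInfo &a, const DeviceInfo &b) {
                  if (!pciIdOrder && a.isDiscrete != b.isDiscrete) {
                      return a.isDiscrete; // discrete sorts ahead of integrated
                  }
                  return std::tie(a.domain, a.bus, a.device, a.function) <
                         std::tie(b.domain, b.bus, b.device, b.function);
              });
}
```

With the flag disabled, a discrete card at a higher bus number still sorts ahead of an integrated device at a lower one.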
Mateusz Jablonski 6f21d133cf fix: extend MemoryInfo class interface to expose single memory region
unify logic of OverrideDrmRegion debug flag

Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-05-30 16:27:42 +02:00
Mateusz Hoppe 9c17cb9bd9 fix: add CLOEXEC flag when opening gpu cards
- close-on-exec prevents old file descriptors from leaking when exec() is
called

Resolves: NEO-7944

Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
2023-05-09 11:53:57 +02:00
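The close-on-exec behaviour named in commit 9c17cb9bd9 can be illustrated with plain POSIX calls. This is a minimal sketch, not the driver's code: `openWithCloexec` and `hasCloexec` are hypothetical helper names, and the path is whatever device node the caller passes in.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: open a node with O_CLOEXEC so the kernel closes the
// descriptor automatically when the process calls exec().
int openWithCloexec(const char *path) {
    return ::open(path, O_RDWR | O_CLOEXEC);
}

// Returns true when the close-on-exec flag is set on fd.
bool hasCloexec(int fd) {
    int flags = ::fcntl(fd, F_GETFD);
    return flags != -1 && (flags & FD_CLOEXEC) != 0;
}
```

Passing the flag to `open()` sets it atomically, which avoids the window between `open()` and a later `fcntl(fd, F_SETFD, FD_CLOEXEC)` in which another thread could fork and exec.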
Mateusz Jablonski fd1ad7c1f0 feature: setup heap extended host size based on system memory size
Related-To: NEO-7665
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-04-28 15:23:01 +02:00
Mateusz Jablonski e4a446df58 feature usm: add debug flag to allocate shared USM in heap extended
Related-To: NEO-7665
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-04-13 11:30:09 +02:00
Compute-Runtime-Validation b11a64718a Revert "feature usm: allocate shared USM in heap extended"
This reverts commit 03ed1e1e12.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2023-03-30 11:39:59 +02:00
Mateusz Jablonski 03ed1e1e12 feature usm: allocate shared USM in heap extended
Related-To: NEO-7665
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-03-29 16:04:05 +02:00
Mateusz Hoppe d8f99161dd fix: create VMs with correct flags when perContextVms used
Related-To: NEO-7813

Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
2023-03-28 13:09:46 +02:00
Fabian Zwolinski 65c73a690f Introduce Online, Offline, Disabled DebuggingModes
This change allows setting DebuggingMode via the
ZET_ENABLE_PROGRAM_DEBUGGING env var:
0: Disabled
1: Online
2: Offline

Related-To: NEO-7630
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
2023-03-17 09:31:17 +01:00
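The value mapping listed in commit 65c73a690f could be read from the environment along these lines. This is a sketch under assumptions: the enum and function names are hypothetical, and treating an unset or unrecognized value as Disabled is a choice made here, not something the commit message states.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical enum mirroring the three values listed in the commit message.
enum class DebuggingMode { disabled = 0, online = 1, offline = 2 };

// Maps the textual value of the variable to a mode.
// Assumption: anything other than "1" or "2" (including unset) means disabled.
DebuggingMode parseDebuggingMode(const char *value) {
    if (value == nullptr) {
        return DebuggingMode::disabled;
    }
    const std::string text(value);
    if (text == "1") {
        return DebuggingMode::online;
    }
    if (text == "2") {
        return DebuggingMode::offline;
    }
    return DebuggingMode::disabled;
}

// Reads ZET_ENABLE_PROGRAM_DEBUGGING from the process environment.
DebuggingMode debuggingModeFromEnv() {
    return parseDebuggingMode(std::getenv("ZET_ENABLE_PROGRAM_DEBUGGING"));
}
```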
Fabian Zwolinski 93a30f002b L0 Debugger - check debug_eu entry.
Related-To: NEO-7790
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
2023-03-15 16:14:49 +01:00
Mateusz Hoppe e62c5e25d5 refactor: change debugging enabled to debugging mode
Related-To: NEO-7630

Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
2023-03-15 13:41:41 +01:00
Compute-Runtime-Validation 3e1d931296 Revert "L0 Debugger - check debug_eu entry"
This reverts commit 9f935276a0.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2023-03-15 12:28:08 +01:00
Fabian Zwolinski 9f935276a0 L0 Debugger - check debug_eu entry
Related-To: NEO-7790
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
2023-03-13 14:46:28 +01:00
Mateusz Jablonski 553dd7f21f refactor: return thread per eu from compiler product helper
Related-To: NEO-7442
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-03-08 16:25:20 +01:00
Aravind Gopalakrishnan d75c4d3ec7 fix: Skip adding device to list if context creation fails
Propagate error codes from ioctl failure properly up the layers
so that we skip exposing bad root devices.

Related-To: NEO-7709

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@intel.com>
2023-02-16 11:40:54 +01:00
Warchulski, Jaroslaw 11764dd9bf Cleanup includes 40
Cleaned up files:
shared/source/os_interface/linux/drm_neo.h
shared/source/os_interface/windows/wddm/um_km_data_translator.h

Related-To: NEO-5548
Signed-off-by: Warchulski, Jaroslaw <jaroslaw.warchulski@intel.com>
2023-01-23 16:19:35 +01:00
Mateusz Jablonski 1fd8b26499 refactor: rename IoctlHelper::get to IoctlHelper::getI915Helper
remove drm version parameter as i915 is always expected

Related-To: NEO-7578
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-01-09 12:32:45 +01:00
Kamil Kopryk 468d722efb Move clGfxCoreHelper ownership to rootDeviceEnv
Related-To: NEO-6853
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
2023-01-05 12:58:38 +01:00
Mateusz Hoppe f19abda0e2 Set root device index in OsContext
- correctly choose default engine context accounting for root device
index and subdevices bitfield

Related-To: NEO-7516

Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
2022-11-16 23:02:19 +01:00
Mateusz Jablonski 930ca001a1 Add a layer to translate exec buffer error to SubmissionStatus value
handle errors: EWOULDBLOCK, ENOSPC, ENOMEM, ENXIO

Related-To: NEO-7144, NEO-7412
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2022-11-04 10:32:54 +01:00
Dunajski, Bartosz a9ba581d97 Always use unrecoverable drm context
Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
2022-10-31 10:08:57 +01:00
Compute-Runtime-Validation 040d6693cd Revert "Always use unrecoverable drm context"
This reverts commit 343371faad.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2022-10-29 19:28:04 +02:00
Dunajski, Bartosz 343371faad Always use unrecoverable drm context
Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
2022-10-28 16:13:15 +02:00
Kamil Diedrich 380e2dcc35 [WSL2] Avoid gdi calls during process exit
Related-To: NEO-7380
Signed-off-by: Kamil Diedrich <kamil.diedrich@intel.com>
2022-10-21 12:37:07 +02:00
Mateusz Hoppe 5bd4b9eb48 Do not call DebuggerOpen ioctl again on EBUSY
Resolves: NEO-7429

Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
2022-10-19 10:26:35 +02:00
Naklicki, Mateusz ec3668fc18 Add initialization method to ioctl helpers
Signed-off-by: Naklicki, Mateusz <mateusz.naklicki@intel.com>
2022-09-22 11:55:59 +02:00
Mateusz Jablonski 9bde277184 Read frequency from file system based on drm version
Related-To: NEO-7300
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2022-09-21 13:28:18 +02:00
Michal Mrozek 3d5e34f727 Reduce the size of masks to 4.
32 is not required.

Signed-off-by: Michal Mrozek <michal.mrozek@intel.com>
2022-09-19 21:53:40 +02:00
Mateusz Jablonski f42e012bd8 Retry calling ioctl when getting -EBUSY from ioctl call
Related-To: NEO-7195
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2022-08-31 11:55:36 +02:00
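The retry behaviour named in commit f42e012bd8 follows a common wrapper pattern around `ioctl`, sketched below. This is not the driver's code: `callWithRetry` is a hypothetical name, the call is abstracted as a callable so the example runs without a device, and retrying on EINTR and EAGAIN as well as EBUSY is an assumption.

```cpp
#include <cerrno>
#include <functional>

// Hypothetical wrapper: repeat the call while it fails with a transient error.
// Assumption: the call follows the ioctl convention of returning -1 and
// setting errno on failure.
int callWithRetry(const std::function<int()> &call) {
    int ret;
    do {
        ret = call();
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN || errno == EBUSY));
    return ret;
}
```

The loop is unbounded, so a production version would likely need a retry limit or a way to exempt specific requests.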
Warchulski, Jaroslaw aed890a219 Move files between shared/test/unit_test and /common (fixtures, helpers, mocks)
unit_test/fixtures/mock_aub_center_fixture.h -> common/fixtures
unit_test/helpers/raii_hw_helper.h -> common/helpers
unit_test/helpers/static_size3.h -> common/helpers
unit_test/helpers/ult_limits.h -> common/helpers
unit_test/memory_manager/mock_prefetch_manager.h -> common/memory_manager
common/mocks/mock_aub_stream.h -> unit_test/mocks
common/mocks/mock_csr_simulated_common_hw.h -> unit_test/mocks
common/mocks/mock_direct_submission_diagnostic_collector.h -> unit_test/mocks
common/mocks/mock_lrca_helper.h -> unit_test/mocks
common/mocks/mock_tbx_stream.h -> unit_test/mocks
common/mocks/linux/mock_os_context_linux.h -> unit_test/mocks/linux
common/mocks/windows/mock_wddm_direct_submission.h -> unit_test/mocks/windows

Related-To: NEO-6524
Signed-off-by: Warchulski, Jaroslaw <jaroslaw.warchulski@intel.com>
2022-08-11 12:53:19 +02:00
Mateusz Jablonski c3d40c210f ULT refactor: remove i915 header dependency from drm_mock.h/drm_query_mock.h
Related-To: NEO-6999
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2022-08-09 15:04:22 +02:00
Mateusz Jablonski 6450be2414 Remove redundant device and revision id members from Drm class
Drm should set these values directly to hw info in root device environment

Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2022-08-09 10:13:32 +02:00