Commit Graph

187 Commits

Author SHA1 Message Date
Young Jin Yoon
9633f49dab fix: make gpuFaultCheckCounter more robust
Modified drm_neo.h and .cpp to check when condition is greater
than and equal to instead of equal, and changed gpuFaultCheckCounter
to be atomic

Related-To: GSD-5673
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
2024-03-15 10:40:12 +01:00
Young Jin Yoon
82728ff394 feature: add logic to iterate for all contexts to check GPU pagefault
Implemented to go through entire contexts in the process and then query
reset status to check the unexpected GPU segfault.

Added a new debug variable GpuFaultCheckThreshold to change the checking
frequency for each hang check for performance analysis.

Related-To: GSD-5673
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
2024-03-15 07:48:39 +01:00
Brandon Yates
76de854a69 feature: Set Debug Attach Available for Xe
Related-to: NEO-8402

Signed-off-by: Brandon Yates <brandon.yates@intel.com>
2024-01-24 09:04:11 +01:00
Jitendra Sharma
aa191b6f88 feature: Set runalone mode for contexts with online debugging
Related-To: NEO-9139

Signed-off-by: Jitendra Sharma <jitendra.sharma@intel.com>
2024-01-17 09:01:30 +01:00
Brandon Yates
ba0db2488a refactor: Implement Xe Resoure Registration (2/x)
Refactor drm_debug.cpp into IoctlHelper

Related-to: NEO-9161

Signed-off-by: Brandon Yates <brandon.yates@intel.com>
2024-01-11 08:26:29 +01:00
Mateusz Jablonski
138fb65401 refactor: correct naming of enum class constants 11/n
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-12-19 14:52:57 +01:00
Mateusz Jablonski
dd1b9d6abc refactor: correct naming of enum class constants 8/n
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-12-19 08:18:18 +01:00
Mateusz Jablonski
35c1f34672 refactor: move number of threads per eu to release helper
Related-To: HSD-18034098647
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-11-20 12:16:33 +01:00
John Falkowski
f0175b3916 feature: set device allocation chunking as default
Device allocation chunking only applies for multi-tile mode for implicit scaling

Related-To: NEO-9051

Signed-off-by: John Falkowski <john.falkowski@intel.com>
2023-11-07 10:58:17 +01:00
Mateusz Jablonski
4dfa12c8eb fix: add mechanism to detect gpu timestamp overflows
unify naming CpuGpu to GpuCpu

Related-To: NEO-8394
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-10-19 16:31:06 +02:00
Dunajski, Bartosz
06a02552ce refactor: debug flag to override PAT index for given memory type
Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
2023-10-12 15:47:28 +02:00
Mateusz Jablonski
85eafc9e61 fix: query drm info to aligned storages
xe topology info to byte aligned storage
xe engine info to 2 byte aligned storage
system info to 4 byte aligned storage

all other info to 8 byte aligned storage

Related-To: NEO-9038
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-10-06 16:11:40 +02:00
Compute-Runtime-Validation
d5f90ae155 Revert "fix: query drm info to 8 byte aligned storage"
This reverts commit 9b344280d6.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2023-10-05 06:07:27 +02:00
Mateusz Jablonski
9b344280d6 fix: query drm info to 8 byte aligned storage
Related-To: NEO-9038
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-10-03 18:43:53 +02:00
Compute-Runtime-Validation
8fa0b90f35 Revert "fix: query drm info to 8 byte aligned storage"
This reverts commit d0e615820c.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2023-10-03 15:41:11 +02:00
Mateusz Jablonski
d0e615820c fix: query drm info to 8 byte aligned storage
Related-To: NEO-9038
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-10-03 11:13:38 +02:00
Bari, Pratik
b9837ef068 feature(sysman): Added changes for Porting Frequency API
The new classes SysmanKmdInterface, SysmanKmdInterfaceI915 and
SysmanKmdInterfaceXe have been introduced.
A map is maintained in the SysmanKmdInterfaceI915 and
SysmanKmdInterfaceXe class for the sysfs file names.
The access specifier of the function getDrmVersion has been changed from
protected to public so as to use it in the sysman code. This is required
for the SysmanKmdInterface pointer to point to the
SysmanKmdInterfaceI915 and SysmanKmdInterfaceXe accordingly.
The ULTs have been added for the new sysfs file path corresponding to
the i915 and the Xe driver.

Related-To: LOCI-4399

Signed-off-by: Bari, Pratik <pratik.bari@intel.com>
2023-07-13 08:41:05 +02:00
Young Jin Yoon
81822e3716 refactor: rename pageSize2Mb to pageSize2M
The previous name "pageSize2Mb" defined in
shared/source/helpers/constant.h is inconsistent to other variable,
i.e. pageSize64k.

Furthermore, it's a bit misleading because the page size is defined in
Megabytes (MB), not in Megabits (Mb).

Related-to: NEO-7695
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
2023-07-10 20:12:09 +02:00
Jaime Arteaga
23eeaf816d feature: Add debug keys for chunking allocation and size
Related-to: NEO-7695

New debug keys added:

EnableBOChunking is now a mask
0 = no chunking (default).
1 = shared allocations only
2 = device allocations only
3 = shared and device allocations

MinimalAllocationSizeForChunking sets the minimum allocation
size to apply chunking. Default is 2MB.

Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
2023-07-07 23:39:43 +02:00
Kamil Kopryk
082d33bb7c fix: correct query topology on xe
Related-To: NEO-7996
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
2023-06-22 13:24:52 +02:00
Matias Cabral
96517a08aa feature: Implement zetMetricGroupGetGlobalTimestampsExp()
Resolves: LOCI-3072

Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
2023-06-21 09:48:41 +02:00
Matias Cabral
cfa187aec6 feature: Support for metrics group exp extension
Support zet_metric_global_timestamps_resolution_exp_t

Resolves: LOCI-4350

Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
2023-06-16 07:48:32 +02:00
Mateusz Jablonski
3b981331c9 fix: correct handling ZE_ENABLE_PCI_ID_DEVICE_ORDER flag
- by default ZE_ENABLE_PCI_ID_DEVICE_ORDER is disabled
- by default devices are sorted by type (discrete first), then by pci order
- when ZE_ENABLE_PCI_ID_DEVICE_ORDER is enabled, devices are sorted by pci id

Related-To: LOCI-4520

Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-06-14 16:27:55 +02:00
Jaime Arteaga
2efd6e547a feature: Add support for chunking in the UMD (1/N)
Read if support for chunking is available in the KMD.
If available, KMD will create a BO with 1 or more chunks,
depending on the chunk size selected.

Related-To: NEO-7695

Sync to
https://github.com/intel-gpu/drm-uapi-helper/releases/tag/v2.0-rc18

Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
Signed-off-by: John Falkowski <john.falkowski@intel.com>
2023-06-02 23:27:40 +02:00
Milczarek, Slawomir
ac9a96c07f refactor: Unify getters to check platform support for KMD migration
Related-To: NEO-6465

Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
2023-05-17 15:45:42 +02:00
Mateusz Jablonski
88c352c580 refactor: move query engine / memory info logic to ioctl helper
Related-To: NEO-7931
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-05-02 10:05:26 +02:00
Mateusz Jablonski
fd1ad7c1f0 feature: setup heap extended host size based on system memory size
Related-To: NEO-7665
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-04-28 15:23:01 +02:00
Fabian Zwolinski
c441e9e971 refactor: Rename member variables to camelCase
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
2023-04-26 16:05:07 +02:00
Maciej Plewka
bab299ee78 Increment fenceValue only after successful bind operation
Related-To: NEO-7835

Signed-off-by: Maciej Plewka <maciej.plewka@intel.com>
2023-04-19 12:26:45 +02:00
Mateusz Hoppe
d8f99161dd fix: create VMs with correct flags when perContextVms used
Related-To: NEO-7813

Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
2023-03-28 13:09:46 +02:00
Fabian Zwolinski
93a30f002b L0 Debugger - check debug_eu entry.
Related-To: NEO-7790
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
2023-03-15 16:14:49 +01:00
Compute-Runtime-Validation
3e1d931296 Revert "L0 Debugger - check debug_eu entry"
This reverts commit 9f935276a0.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2023-03-15 12:28:08 +01:00
Fabian Zwolinski
9f935276a0 L0 Debugger - check debug_eu entry
Related-To: NEO-7790
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
2023-03-13 14:46:28 +01:00
Mateusz Jablonski
553dd7f21f refactor: return thread per eu from compiler product helper
Related-To: NEO-7442
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-03-08 16:25:20 +01:00
Joshua Santosh Ranjan
b180a83a24 [Sysman] Add support to control lifetime of file descriptor
This patch adds support to open and close the drm fd,
so that more granular control of drm fd could be
achieved.

Related-To: LOCI-3986

Signed-off-by: Joshua Santosh Ranjan <joshua.santosh.ranjan@intel.com>
2023-03-06 17:55:13 +01:00
Jitendra Sharma
6968d26f3a Sysman: Add support for sysman APIs
In level_zero/sysman directory This change:
- Adds support for accessing linux based filesystem
- Add support for telemetry
- Add support for LinuxSysmanImp

Related-To: LOCI-3889
Signed-off-by: Jitendra Sharma <jitendra.sharma@intel.com>
2023-02-28 09:50:17 +01:00
František Zatloukal
beaff2b735 Include cstdint to fix GCC 13 build
Signed-off-by: František Zatloukal <fzatlouk@redhat.com>
2023-02-22 08:02:49 +01:00
Aravind Gopalakrishnan
d75c4d3ec7 fix: Skip adding device to list if context creation fails
Propogate error codes from ioctl failure properly up the layers
so that we skip exposing bad root devices.

Related-To: NEO-7709

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@intel.com>
2023-02-16 11:40:54 +01:00
Milczarek, Slawomir
7f5fae4c2f Enable recoverable page faults by default on faultable hardware
Related-To: NEO-6355

Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
2023-01-30 13:46:15 +01:00
Compute-Runtime-Validation
06928a345a Revert "Enable recoverable page faults by default on faultable hardware"
This reverts commit 4591b27900.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2023-01-27 07:02:48 +01:00
Milczarek, Slawomir
4591b27900 Enable recoverable page faults by default on faultable hardware
Related-To: NEO-6355

Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
2023-01-25 19:22:16 +01:00
Warchulski, Jaroslaw
bc13db734d Cleanup includes 41
Cleaned up files:
shared/source/command_stream/aub_command_stream_receiver_hw.h
shared/source/helpers/common_types.h
shared/source/os_interface/linux/drm_neo.h
shared/source/os_interface/windows/hw_device_id.h

Related-To: NEO-5548
Signed-off-by: Warchulski, Jaroslaw <jaroslaw.warchulski@intel.com>
2023-01-24 14:37:36 +01:00
Warchulski, Jaroslaw
11764dd9bf Cleanup includes 40
Cleaned up files:
shared/source/os_interface/linux/drm_neo.h
shared/source/os_interface/windows/wddm/um_km_data_translator.h

Related-To: NEO-5548
Signed-off-by: Warchulski, Jaroslaw <jaroslaw.warchulski@intel.com>
2023-01-23 16:19:35 +01:00
Lukasz Jobczyk
3bf4291558 Align mmap cpu va to 2MB for sizes greater than 2MB
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
2023-01-17 18:26:49 +01:00
Mateusz Jablonski
103f522f18 Create definition of tag allocation layout
we use tag allocation for multiple purposes, therefore we should define
all offsets in one place

Resolves: NEO-7559
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2022-12-06 16:49:07 +01:00
Mateusz Jablonski
bb308c04ed Refactor aubstream include interface
set include path to third_party/aub_stream
rename third_party/aub_stream/headers -> third_party/aub_stream/aubstream

Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2022-11-23 10:30:13 +01:00
Warchulski, Jaroslaw
f35f59b573 Cleanup includes 5
Related-To: NEO-5548
Signed-off-by: Warchulski, Jaroslaw <jaroslaw.warchulski@intel.com>
2022-11-18 22:46:38 +01:00
Jaime A Arteaga Molina
192f02785d Use PRELIM_DRM_I915_QUERY_FABRIC_INFO for canAccessPeer when available
When available, PRELIM_DRM_I915_QUERY_FABRIC_INFO is used to query
connectivity between two devices. If not, then a copy is performed.

Signed-off-by: Jaime A Arteaga Molina <jaime.a.arteaga.molina@intel.com>
2022-11-10 07:30:32 +01:00
Joshua Santosh Ranjan
6a2e016d7f Add support for UUID
Related-To: LOCI-3304

Signed-off-by: Joshua Santosh Ranjan <joshua.santosh.ranjan@intel.com>
2022-11-09 12:29:32 +01:00
Mateusz Jablonski
930ca001a1 Add a layer to translate exec buffer error to SubmissionStatus value
handle errors: EWOULDBLOCK, ENOSPC, ENOMEM, ENXIO

Related-To: NEO-7144, NEO-7412
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2022-11-04 10:32:54 +01:00