53 Commits

Author SHA1 Message Date
Mayank Raghuwanshi a69110a7ec feature: Add support for RAS mdfi errors
Related-To: LOCI-4479

Signed-off-by: Mayank Raghuwanshi <mayank.raghuwanshi@intel.com>
2023-06-13 10:14:36 +02:00
Devarinti, Puneeth Kumar Reddy c03867b55c feature: Add debug logs for RAS module
Related-To: LOCI-3880

Signed-off-by: Devarinti, Puneeth Kumar Reddy <puneeth.kumar.reddy.devarinti@intel.com>
2023-05-09 08:12:06 +02:00
Mayank Raghuwanshi 9cc5763800 fix: Revert spec 1.5 RAS changes from Sysman
Related-To: LOCI-4351

Signed-off-by: Mayank Raghuwanshi <mayank.raghuwanshi@intel.com>
2023-04-27 05:29:33 +02:00
Jitendra Sharma d29ed25f8b Add support for global_operations in new sysman design
Related-To: LOCI-4135
Signed-off-by: Jitendra Sharma <jitendra.sharma@intel.com>
2023-04-05 17:25:03 +02:00
Mayank Raghuwanshi 3816b85fa0 Add check for memory type before calculating ras hbm errors
Related-To: LOCI-3500

Signed-off-by: Mayank Raghuwanshi <mayank.raghuwanshi@intel.com>
2023-03-31 13:47:41 +02:00
Mayank Raghuwanshi 065232eac7 Add support for ras l3 fabric errors
Related-To: LOCI-3966

Signed-off-by: Mayank Raghuwanshi <mayank.raghuwanshi@intel.com>
2023-03-30 12:47:45 +02:00
Mateusz Jablonski 659cacf2c9 refactor l0 cmake: reduce include directories
Related-To: NEO-7507
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-03-17 13:41:55 +01:00
Mateusz Jablonski a7830eb478 refactor l0 cmake: add CMakeLists.txt files to solution
Related-To: NEO-7507
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-03-17 12:09:03 +01:00
Mateusz Jablonski cb7437b6b7 refactor l0 tools: cleanup cmake files
append sources in current directory

Related-To: NEO-7507
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-03-16 14:05:14 +01:00
Mayank Raghuwanshi 29ed6ea077 Add support for l3_bank and subslice Ras errors
Related-To: LOCI-3662

Signed-off-by: Mayank Raghuwanshi <mayank.raghuwanshi@intel.com>
2023-02-22 09:14:15 +01:00
Mayank Raghuwanshi 07d3353b1f Add support for sysman zesFabricPortGetFabricErrorCounters API
Related-To: LOCI-3398

Signed-off-by: Mayank Raghuwanshi <mayank.raghuwanshi@intel.com>
2023-02-13 06:50:23 +01:00
Mayank Raghuwanshi 5a833e2c08 Add support for RAS CSC HW errors
Related-To: LOCI-3699

Signed-off-by: Mayank Raghuwanshi <mayank.raghuwanshi@intel.com>
2023-02-03 18:36:12 +01:00
Mayank Raghuwanshi 9968857c29 Change category for some sysman ras errors
Related-To: LOCI-3648

Signed-off-by: Mayank Raghuwanshi <mayank.raghuwanshi@intel.com>
2022-12-23 18:43:41 +01:00
Joshua Santosh Ranjan 522076cf82 Avoid adding subdevice flag if ReturnSubDevicesAsApiDevices is set
Related-To: LOCI-3656

Signed-off-by: Joshua Santosh Ranjan <joshua.santosh.ranjan@intel.com>
2022-12-19 05:15:51 +01:00
Joshua Santosh Ranjan 7c050291bf Fix fabric ras errors accumulated to all devices
This patch fixes the issue that fabric ras errors
from all devices are reported for all devices.

Related-To: LOCI-3548

Signed-off-by: Joshua Santosh Ranjan <joshua.santosh.ranjan@intel.com>
2022-11-16 12:03:50 +01:00
Mayank Raghuwanshi ffcca3ba53 Use physical subdeviceId for sysman ras, freq and standby module
Related-To: LOCI-2925, LOCI-2926, LOCI-3236
Signed-off-by: Mayank Raghuwanshi <mayank.raghuwanshi@intel.com>
2022-11-14 14:10:23 +01:00
Warchulski, Jaroslaw fb25f96081 Cleanup includes 2
Related-To: NEO-5548
Signed-off-by: Warchulski, Jaroslaw <jaroslaw.warchulski@intel.com>
2022-11-07 10:36:50 +01:00
Joshua Santosh Ranjan 436ec1234b Sysman Add support for auxiliary bus for fabric Ras
Related-To: LOCI-3531

Signed-off-by: Joshua Santosh Ranjan <joshua.santosh.ranjan@intel.com>
2022-10-28 18:18:33 +02:00
Kulkarni, Ashwin Kumar 44649faa0f Defer Sysman Engine Module Initialization
With this change, the sysman Engine API is no longer initialized during zeInit.
Initialization, and with it Engine API handle creation, is done only
when the user explicitly requests to enumerate handles
using zesDeviceEnumEngineGroups

Related-To: LOCI-3127

Signed-off-by: Kulkarni, Ashwin Kumar <ashwin.kumar.kulkarni@intel.com>
2022-08-26 08:41:22 +02:00
Kamil Kopryk 582ed0565b Use memcpy_s instead of memcpy
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
2022-08-01 12:43:29 +02:00
Mateusz Hoppe 5956aea18d Limit header includes from level_zero device.h
- remove including debugger_l0.h from device.h
- add getL0Debugger() to shared NEO Device

Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
2022-07-06 16:41:17 +02:00
Kulkarni, Ashwin Kumar 49aaf62bbd Lazy init implementation for RAS module
Related-To: LOCI-3127

Signed-off-by: Kulkarni, Ashwin Kumar <ashwin.kumar.kulkarni@intel.com>
2022-07-04 18:29:57 +02:00
Mayank Raghuwanshi 281c98dcf9 Add firmware util interface for sysman windows
Related-To: LOCI-3132

Signed-off-by: Mayank Raghuwanshi <mayank.raghuwanshi@intel.com>
2022-06-24 08:42:48 +02:00
Artur Harasimiuk e9be9b64c6 clang-tidy configuration cleanup
Define single .clang-tidy configuration with all used checks and use
NOLINT to selectively silence tool. That way cleanup should be easier.
third_party/ has its own configuration that disables clang-tidy for this
folder.

Signed-off-by: Artur Harasimiuk <artur.harasimiuk@intel.com>
2022-05-11 14:02:04 +02:00
Mayank Raghuwanshi c637903132 Modify getSupportedRasErrorTypes function for gt Ras errors
Related-To: LOCI-2934

Signed-off-by: Mayank Raghuwanshi <mayank.raghuwanshi@intel.com>
2022-04-22 08:26:07 +02:00
Bellekallu Rajkiran cf9a5ed7d7 Add prelim support for ras diagnostics and firmware
Related-To: LOCI-2864

Signed-off-by: Bellekallu Rajkiran <bellekallu.rajkiran@intel.com>
2022-03-24 06:58:25 +01:00
Filip Hazubski b79d9a8e10 Correct structs to explicitly initialize members
Affected structs are DebugAreaHeader, Ras and APITracerImp.

Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
2022-03-14 15:40:28 +01:00
Compute-Runtime-Validation 1a823356a3 Revert "Add prelim support for ras diagnostics and firmware"
This reverts commit 5a2145ad8d.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2022-03-06 11:31:15 +01:00
Bellekallu Rajkiran 5a2145ad8d Add prelim support for ras diagnostics and firmware
Related-To: LOCI-2864

Signed-off-by: Bellekallu Rajkiran <bellekallu.rajkiran@intel.com>
2022-03-03 18:51:21 +01:00
Mayank Raghuwanshi 94d09f75b7 Get RAS HBM errors count using firmware interface
-- master-commit
Add functionality to retrieve memory errors from Firmware
-- master-commit

Related-To: LOCI-2491, LOCI-2726

Signed-off-by: Mayank Raghuwanshi <mayank.raghuwanshi@intel.com>
2021-12-08 18:57:24 +01:00
Mayank Raghuwanshi 2ec2d514ec Update create Handle mechanism for sysman RAS
Use set instead of vector to get the supported error types,
using vector may cause duplication of error types when querying
supported error types from different interfaces which in turn
may cause duplication of handles.

Signed-off-by: Mayank Raghuwanshi <mayank.raghuwanshi@intel.com>
2021-12-02 12:39:30 +01:00
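
The reasoning in commit 2ec2d514ec (a set avoids duplicate error types, and therefore duplicate handles) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `RasErrorType` stands in for the real Sysman error type enum, and `collectSupportedTypes` and `demo` are hypothetical helpers, not functions from the driver.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for the Sysman RAS error type enum.
enum class RasErrorType { Correctable, Uncorrectable };

// Merges the error types reported by several interfaces.
// A std::set keeps one entry per distinct type, so a type reported
// by more than one interface cannot produce a duplicate handle later.
std::set<RasErrorType> collectSupportedTypes(
    const std::vector<std::vector<RasErrorType>> &perInterface) {
    std::set<RasErrorType> supported;
    for (const auto &types : perInterface) {
        supported.insert(types.begin(), types.end());
    }
    return supported;
}

// Two illustrative interfaces that both report Correctable.
inline void demo() {
    std::vector<std::vector<RasErrorType>> reports;
    reports.push_back({RasErrorType::Correctable, RasErrorType::Uncorrectable});
    reports.push_back({RasErrorType::Correctable});
    const std::set<RasErrorType> supported = collectSupportedTypes(reports);
    assert(supported.size() == 2); // one entry per distinct type
}
```

With a vector and plain appends, the same input would yield three entries, and creating one handle per entry would create two handles for the correctable type.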
Kamil Kopryk 9ccf43e441 Correct branch_dir_suffix in cmake
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
Related-To: NEO-6245
2021-09-14 16:00:20 +02:00
John Falkowski dc77174255 TimerResolution Device Properties 1.2
Signed-off-by: John Falkowski <john.falkowski@intel.com>
2021-07-06 11:37:07 +02:00
lgotszal 5c43c6fd94 Update MIT copyright headers to always use SPDX
Related-to: IGC-4296

Signed-off-by: lgotszal <lukasz.gotszald@intel.com>
2021-06-23 14:00:21 +02:00
Jitendra Sharma 46c51cb8a9 Sysman device reset stability fix
Close PMT, and PMU fds created during Sysman's init before calling
device reset.

Signed-off-by: Jitendra Sharma <jitendra.sharma@intel.com>
2021-04-15 11:53:10 +02:00
Mayank Raghuwanshi 0f973f146e Implement zesRasGetConfig and zesRasSetConfig
Signed-off-by: Mayank Raghuwanshi <mayank.raghuwanshi@intel.com>
2021-03-02 16:07:01 +01:00
Mayank Raghuwanshi 5cd5705239 Implement clear option for zesRasGetState
Signed-off-by: Mayank Raghuwanshi <mayank.raghuwanshi@intel.com>
2021-01-21 15:00:13 +01:00
mraghuwa 978003e96e Add subdevice support for RAS module
Change-Id: Iced5aeed86d6b19a4710992155257e420ae1296f
Signed-off-by: mraghuwa <mayank.raghuwanshi@intel.com>
2020-10-27 17:54:29 +01:00
mraghuwa a8a013b0c3 Implement zesRasGetState to retrieve cache errors
Change-Id: I9fbba505db6551f510cb20ea71604af53db61960
Signed-off-by: mraghuwa <mayank.raghuwanshi@intel.com>
2020-10-20 15:45:36 +02:00
mraghuwa 2643346b48 Update Sysman RAS Module
Change-Id: I2b99dae4336811ea4b539da48c1434657a9cf62a
Signed-off-by: mraghuwa <mayank.raghuwanshi@intel.com>
2020-10-09 08:23:19 +02:00
Pawel Cieslak fb821f21f5 Cmake format script
Related-To: NEO-1157

Change-Id: Ie1b907e838cfb9ad0d75cc8971d415f7c77103c9
Signed-off-by: Pawel Cieslak <pawel.cieslak@intel.com>
2020-08-19 16:36:30 +02:00
mraghuwa 220c575850 Update Ras Api's as per latest spec
Change-Id: I29b77eee0832fcca6d989f9ef41b01b17232a91e
Signed-off-by: mraghuwa <mayank.raghuwanshi@intel.com>
2020-08-06 08:31:03 +02:00
Jaime Arteaga 902fc2f6c4 level-zero v1.0 (2/N)
Change-Id: I1419231a721fab210e166d26a264cae04d661dcd
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
Signed-off-by: macabral <matias.a.cabral@intel.com>
Signed-off-by: davidoli <david.olien@intel.com>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
Signed-off-by: Latif, Raiyan <raiyan.latif@intel.com>
Signed-off-by: Artur Harasimiuk <artur.harasimiuk@intel.com>
2020-08-03 13:11:13 +02:00
Bill Jordan c64eab402e Update all L0 Sysman getHandle routines to match spec.
Updated all the getHandle routines to use the following
algorithm:
    n = min(*pCount, available)
    if (*pCount == 0 || *pCount > available) {
        *pCount = available;
    }
    if (pArray != nullptr) {
        for(i = 0; i < n; i++) {
            pArray[i] = handle[i];
        }
    }

Change-Id: I3b2a2170c2b52d1651bddae4f85f361fd86167a0
Signed-off-by: Bill Jordan <bill.jordan@intel.com>
2020-07-30 07:27:21 +02:00
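
The getHandle algorithm from commit c64eab402e can be written out as a small compilable sketch. This is an illustration under assumptions, not the driver's code: `Handle` is a hypothetical stand-in for an opaque Sysman handle type, `getHandles` is an invented name, and the real routines return a result code rather than `void`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an opaque Sysman handle type.
using Handle = int;

// Follows the algorithm in the commit message: `handles` plays the
// role of the internally stored handle list (an assumption).
void getHandles(const std::vector<Handle> &handles, uint32_t *pCount, Handle *pArray) {
    assert(pCount != nullptr); // the count pointer is always required
    const uint32_t available = static_cast<uint32_t>(handles.size());
    // The number of entries to copy uses the caller's original count.
    const uint32_t n = std::min(*pCount, available);
    // A zero count acts as a size query; an oversized count is clamped.
    if (*pCount == 0 || *pCount > available) {
        *pCount = available;
    }
    // Copy only when the caller supplied an output array.
    if (pArray != nullptr) {
        for (uint32_t i = 0; i < n; i++) {
            pArray[i] = handles[i];
        }
    }
}
```

A caller would typically invoke this twice: once with `*pCount == 0` and a null array to learn how many handles exist, then again with an array of that size. Because `n` is computed before `*pCount` is updated, a size query never writes to the output array.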
Jitendra Sharma 92b15507b0 Initialize variables and validate pointers before actually using them
Change-Id: Iae6fbeac124e1a02da419f5071e1ebc292b390cf
Signed-off-by: Jitendra Sharma <jitendra.sharma@intel.com>
2020-07-14 11:17:58 +02:00
Bill Jordan e8bd440773 Don't allow copying or moving Sysman-related objects.
Change-Id: I70dd97bffa1c4d08f05eb796c6d6a2eb66f06f4b
Signed-off-by: Bill Jordan <bill.jordan@intel.com>
2020-07-10 21:05:15 +02:00
Vilvaraj, T J Vivek 3c50e1ede6 fix uninitialized class member in Ctor.
Change-Id: Idc3e8a2ddccaf9c94639a3f499824e86de830fd4
2020-07-07 21:48:18 +02:00
T.J. Vivek Vilvaraj 96a7b1e066 add rules to install RAS udev rules
- create rules to install Udev rules in configurable location
- create files relating to RAS counters

Change-Id: Iebd57ba2dd09494ea4586b305cd56c86a71fb8b0
2020-07-02 10:25:21 +02:00
Vilvaraj, T J Vivek 0c9c55cd17 add counter support for RAS.
- added dual handle support for RAS Correctable and Uncorrectable Errors.
- added reset counter for RAS.
- added Os Specific ULT for RAS

Change-Id: Ia10115bf6720ab211f549571e810ec0d6c0801ec
2020-06-25 08:48:11 +02:00
Vilvaraj, T J Vivek 98c6e85ae9 fixes for RAS implementation class
- add default constructor
- fix init function to be a public method

Change-Id: I9e9c3c0d1305497f030f44a1f50b2499b93d3e0c
Signed-off-by: Vilvaraj, T J Vivek <t.j.vivek.vilvaraj@intel.com>
2020-05-01 21:25:48 +02:00