feature: iterate over all contexts to check for GPU page faults

Implemented iteration over all contexts in the process, querying each
context's reset status to detect unexpected GPU segfaults.

Added a new debug variable, GpuFaultCheckThreshold, that adjusts how often
the fault check runs relative to hang checks, for performance analysis.

Related-To: GSD-5673
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
Author: Young Jin Yoon
Date: 2024-02-26 09:53:24 +00:00
Committed by: Compute-Runtime-Automation
Parent: 5111f30116
Commit: 82728ff394
10 changed files with 149 additions and 9 deletions


```diff
@@ -98,6 +98,8 @@ class DrmMemoryManager : public MemoryManager {
     size_t getSizeOfChunk(size_t allocSize);
     bool checkAllocationForChunking(size_t allocSize, size_t minSize, bool subDeviceEnabled, bool debugDisabled, bool modeEnabled, bool bufferEnabled);
+    MOCKABLE_VIRTUAL void checkUnexpectedGpuPageFault();
   protected:
     void registerSharedBoHandleAllocation(DrmAllocation *drmAllocation);
     BufferObjectHandleWrapper tryToGetBoHandleWrapperWithSharedOwnership(int boHandle);
```