documentation: Add metrics GPU specific document
Add new file with L0 GPU specific details of metrics support

Resolves: LOCI-4209
Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
parent
d02885767d
commit
8f159308ac
|
@ -1,6 +1,6 @@
|
||||||
<!---
|
<!---
|
||||||
|
|
||||||
Copyright (C) 2018-2021 Intel Corporation
|
Copyright (C) 2018-2023 Intel Corporation
|
||||||
|
|
||||||
SPDX-License-Identifier: MIT
|
SPDX-License-Identifier: MIT
|
||||||
|
|
||||||
|
@ -85,8 +85,8 @@ Directly linking to the runtime library is not supported:
|
||||||
* Intel Graphics Compiler - https://github.com/intel/intel-graphics-compiler
|
* Intel Graphics Compiler - https://github.com/intel/intel-graphics-compiler
|
||||||
|
|
||||||
In addition, to enable performance counters support, the following packages are needed:
|
In addition, to enable performance counters support, the following packages are needed:
|
||||||
* Intel Metrics Discovery - https://github.com/intel/metrics-discovery
|
* Intel(R) Metrics Discovery (MDAPI) - https://github.com/intel/metrics-discovery
|
||||||
* Intel Metrics Library for MDAPI - https://github.com/intel/metrics-library
|
* Intel(R) Metrics Library for MDAPI - https://github.com/intel/metrics-library
|
||||||
|
|
||||||
## How to provide feedback
|
## How to provide feedback
|
||||||
|
|
||||||
|
@ -103,6 +103,7 @@ for more details.
|
||||||
## See also
|
## See also
|
||||||
|
|
||||||
* [Contribution guidelines](https://github.com/intel/compute-runtime/blob/master/CONTRIBUTING.md)
|
* [Contribution guidelines](https://github.com/intel/compute-runtime/blob/master/CONTRIBUTING.md)
|
||||||
|
* [Programmers Guide](https://github.com/intel/compute-runtime/blob/master/programmers-guide/PROGRAMMERS_GUIDE.md)
|
||||||
* [Frequently Asked Questions](https://github.com/intel/compute-runtime/blob/master/FAQ.md)
|
* [Frequently Asked Questions](https://github.com/intel/compute-runtime/blob/master/FAQ.md)
|
||||||
|
|
||||||
### Level Zero specific
|
### Level Zero specific
|
||||||
|
|
|
@ -31,7 +31,7 @@ Since the intention of immediate command lists are to primarily provide a razor
|
||||||
## Programming model
|
## Programming model
|
||||||
|
|
||||||
Pseudo-code for creating immediate command list (async mode):
|
Pseudo-code for creating immediate command list (async mode):
|
||||||
```
|
```cpp
|
||||||
ze_command_queue_desc_t cmdQueueDesc = {ZE_STRUCTURE_TYPE_COMMAND_QUEUE_DESC};
|
ze_command_queue_desc_t cmdQueueDesc = {ZE_STRUCTURE_TYPE_COMMAND_QUEUE_DESC};
|
||||||
cmdQueueDesc.pNext = nullptr;
|
cmdQueueDesc.pNext = nullptr;
|
||||||
cmdQueueDesc.flags = 0;
|
cmdQueueDesc.flags = 0;
|
||||||
|
@ -44,7 +44,7 @@ Pseudo-code for creating immediate command list (async mode):
|
||||||
|
|
||||||
Submitting commands and synchronization:
|
Submitting commands and synchronization:
|
||||||
Launching kernels:
|
Launching kernels:
|
||||||
```
|
```cpp
|
||||||
zeCommandListAppendLaunchKernel(cmdList, kernel, &dispatchTraits,
|
zeCommandListAppendLaunchKernel(cmdList, kernel, &dispatchTraits,
|
||||||
events[0], 0, nullptr);
|
events[0], 0, nullptr);
|
||||||
// If Async mode, use event for sync
|
// If Async mode, use event for sync
|
||||||
|
@ -52,7 +52,7 @@ Launching kernels:
|
||||||
```
|
```
|
||||||
|
|
||||||
Performing copies:
|
Performing copies:
|
||||||
```
|
```cpp
|
||||||
zeCommandListAppendMemoryCopy(cmdList, deviceBuffer, hostBuffer, allocSize,
|
zeCommandListAppendMemoryCopy(cmdList, deviceBuffer, hostBuffer, allocSize,
|
||||||
events[0],
|
events[0],
|
||||||
0, nullptr);
|
0, nullptr);
|
||||||
|
@ -66,7 +66,7 @@ Performing copies:
|
||||||
```
|
```
|
||||||
|
|
||||||
Pseudo-code for creating immediate command list (sync mode):
|
Pseudo-code for creating immediate command list (sync mode):
|
||||||
```
|
```cpp
|
||||||
ze_command_queue_desc_t cmdQueueDesc = {ZE_STRUCTURE_TYPE_COMMAND_QUEUE_DESC};
|
ze_command_queue_desc_t cmdQueueDesc = {ZE_STRUCTURE_TYPE_COMMAND_QUEUE_DESC};
|
||||||
cmdQueueDesc.pNext = nullptr;
|
cmdQueueDesc.pNext = nullptr;
|
||||||
cmdQueueDesc.flags = 0;
|
cmdQueueDesc.flags = 0;
|
||||||
|
@ -78,7 +78,7 @@ Pseudo-code for creating immediate command list (sync mode):
|
||||||
```
|
```
|
||||||
|
|
||||||
Launching kernel:
|
Launching kernel:
|
||||||
```
|
```cpp
|
||||||
zeCommandListAppendLaunchKernel(cmdList, kernel, &dispatchTraits,
|
zeCommandListAppendLaunchKernel(cmdList, kernel, &dispatchTraits,
|
||||||
nullptr, 0, nullptr);
|
nullptr, 0, nullptr);
|
||||||
```
|
```
|
||||||
|
|
|
@ -0,0 +1,58 @@
|
||||||
|
<!---
|
||||||
|
|
||||||
|
Copyright (C) 2023 Intel Corporation
|
||||||
|
|
||||||
|
SPDX-License-Identifier: MIT
|
||||||
|
|
||||||
|
-->
|
||||||
|
|
||||||
|
# GPU Metrics collection in Level Zero
|
||||||
|
|
||||||
|
* [Introduction](#Introduction)
|
||||||
|
* [Dependencies](#Dependencies)
|
||||||
|
* [Environment Setup](#Environment-Setup)
|
||||||
|
* [EU Stall Sampling](#EU-Stall-Sampling)
|
||||||
|
* [Limitations](#Limitations)
|
||||||
|
|
||||||
|
|
||||||
|
# Introduction
|
||||||
|
|
||||||
|
Implementation-independent details of Level-Zero metrics are described in the Level-Zero specification [Metrics Section](https://spec.oneapi.io/level-zero/latest/tools/PROG.html#metrics). This implementation supports Time-based and Event-based sampling. Two domains are supported: one for collecting GPU performance metrics and one for collecting EU stall sampling data (type ZET_METRIC_TYPE_IP).
|
||||||
|
|
||||||
|
# Dependencies
|
||||||
|
|
||||||
|
Metrics collection depends on:
|
||||||
|
|
||||||
|
* Intel(R) Metrics Discovery (MDAPI) - https://github.com/intel/metrics-discovery
|
||||||
|
* Intel(R) Metrics Library for MDAPI - https://github.com/intel/metrics-library
|
||||||
|
|
||||||
|
# Environment Setup
|
||||||
|
|
||||||
|
As described in the Level-Zero specification [Tools Section](https://spec.oneapi.io/level-zero/latest/tools/PROG.html#environment-variables), the environment variable `ZET_ENABLE_METRICS` must be set to 1.
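An application can also set the variable for its own process before the driver is initialized. The following is a minimal sketch, assuming a POSIX environment where `setenv` is available; the helper name `enableLevelZeroMetrics` is hypothetical, and the requirement that it runs before `zeInit` is an assumption based on the variable being read at driver initialization.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not part of Level Zero): sets ZET_ENABLE_METRICS=1
// for the current process. Assumption: this must run before the driver is
// initialized (for example before zeInit), since the variable is expected
// to be read at initialization time.
// Returns true if the variable reads back as "1".
bool enableLevelZeroMetrics() {
    // POSIX setenv; the last argument (1) overwrites an existing value.
    if (setenv("ZET_ENABLE_METRICS", "1", 1) != 0) {
        return false;
    }
    const char *value = std::getenv("ZET_ENABLE_METRICS");
    return value != nullptr && std::strcmp(value, "1") == 0;
}
```

Setting the variable in the launching shell or service configuration has the same effect and avoids any ordering concerns inside the application.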
|
||||||
|
|
||||||
|
## Linux
|
||||||
|
Additionally, on Linux it is required to disable the performance stream paranoid mode of the i915 kernel module driver. This can be done with the following command:
|
||||||
|
|
||||||
|
```
|
||||||
|
sudo sysctl dev.i915.perf_stream_paranoid=0
|
||||||
|
```
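A tool may want to verify this setting before opening a metric streamer, so that it can report a clear error. The sketch below reads the procfs view of the same sysctl key; the function names are hypothetical, and it assumes the key is exposed at `/proc/sys/dev/i915/perf_stream_paranoid`, which is where the kernel maps `dev.i915.perf_stream_paranoid` when the i915 module is loaded.

```cpp
#include <fstream>
#include <string>

// Returns true if the sysctl file content means "paranoid mode disabled".
// The file is expected to hold a small integer followed by a newline.
bool isParanoidDisabled(const std::string &content) {
    const auto first = content.find_first_not_of(" \t\r\n");
    if (first == std::string::npos) {
        return false; // empty or whitespace-only content
    }
    const auto last = content.find_last_not_of(" \t\r\n");
    return content.substr(first, last - first + 1) == "0";
}

// Hypothetical helper: reads the procfs entry for dev.i915.perf_stream_paranoid.
// Returns false if the file cannot be opened (for example, i915 is not loaded).
bool perfStreamParanoidDisabled(
    const std::string &path = "/proc/sys/dev/i915/perf_stream_paranoid") {
    std::ifstream file(path);
    if (!file.is_open()) {
        return false;
    }
    std::string content;
    std::getline(file, content);
    return isParanoidDisabled(content);
}
```

Reading the value does not require elevated privileges; only changing it with `sysctl` does.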
|
||||||
|
|
||||||
|
# EU Stall Sampling
|
||||||
|
|
||||||
|
HW-assisted EU stall sampling allows statistically correlating Xe-Vector Engine (XVE) stall events to the executed instructions and breaks down the stall events by different stall reasons. Using the Instruction Pointer it is possible to point to the GPU kernel source code line causing the most stalls.
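To illustrate the idea of attributing stalls to instructions, the sketch below accumulates stall counts per Instruction Pointer and picks the one with the largest total. The `StallSample` structure and the function name are hypothetical and do not reflect the layout of the data returned by the driver; mapping the resulting IP back to a source line would additionally need the kernel's debug information.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified sample: one instruction pointer and the number
// of stall events attributed to it. Not the driver's actual report format.
struct StallSample {
    uint64_t ip;
    uint64_t stallCount;
};

// Sums stall counts per IP and returns the IP with the largest total.
// Returns 0 when there are no samples (or all counts are zero).
uint64_t hottestInstructionPointer(const std::vector<StallSample> &samples) {
    std::unordered_map<uint64_t, uint64_t> totals;
    for (const auto &sample : samples) {
        totals[sample.ip] += sample.stallCount;
    }
    uint64_t bestIp = 0;
    uint64_t bestCount = 0;
    for (const auto &entry : totals) {
        if (entry.second > bestCount) {
            bestIp = entry.first;
            bestCount = entry.second;
        }
    }
    return bestIp;
}
```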
|
||||||
|
|
||||||
|
# Limitations
|
||||||
|
|
||||||
|
## EU Stall Sampling
|
||||||
|
|
||||||
|
* Only supported on Linux
|
||||||
|
* Does not support streamer markers
|
||||||
|
* The inherent nature of the samples only makes sense for Time-based sampling. Therefore, Event-based sampling is not supported.
|
||||||
|
|
||||||
|
## GPU performance metrics
|
||||||
|
|
||||||
|
* To obtain the most recent metric values using Time-based sampling, it is necessary to read all metric reports from the hardware buffer and calculate them all. This may be a costly operation if the hardware buffer is not read at frequent intervals. Therefore, it is recommended to call zetMetricStreamerReadData() at a time interval that does not require processing a large number of reports. This interval can be calculated from the sampling rate chosen when opening the metrics streamer (zet_metric_streamer_desc_t.samplingPeriod).
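The read interval mentioned above can be estimated with simple arithmetic. This is a minimal sketch, assuming `samplingPeriod` is expressed in nanoseconds and that one report is produced per sampling period; `readIntervalFor` and the `maxReportsPerRead` budget are hypothetical names chosen for illustration.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: given the streamer sampling period (assumed to be in
// nanoseconds, as in zet_metric_streamer_desc_t.samplingPeriod) and the
// maximum number of reports the application wants to process per read,
// returns how often zetMetricStreamerReadData() should be called.
std::chrono::nanoseconds readIntervalFor(uint32_t samplingPeriodNs,
                                         uint32_t maxReportsPerRead) {
    // 64-bit product of two 32-bit values cannot overflow uint64_t.
    const uint64_t totalNs =
        static_cast<uint64_t>(samplingPeriodNs) * maxReportsPerRead;
    // Clamp to the largest representable duration.
    const uint64_t limit =
        static_cast<uint64_t>(std::chrono::nanoseconds::max().count());
    return std::chrono::nanoseconds(
        static_cast<std::chrono::nanoseconds::rep>(totalNs < limit ? totalNs : limit));
}
```

For example, with a sampling period of 1,000,000 ns (1 ms) and a budget of 100 reports per read, `readIntervalFor(1000000, 100)` yields 100 ms between reads.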
|
||||||
|
|
||||||
|
## notifyEveryNReports
|
||||||
|
|
||||||
|
* On Linux, notifyEveryNReports for performance metrics always returns true as soon as one metric report is available.
|
|
@ -12,8 +12,9 @@ SPDX-License-Identifier: MIT
|
||||||
|
|
||||||
This document provides the architectural design followed in the Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver. Implementation details and optimization guidelines are explained, as well as a description of the different features available for the different supported platforms.
|
This document provides the architectural design followed in the Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver. Implementation details and optimization guidelines are explained, as well as a description of the different features available for the different supported platforms.
|
||||||
|
|
||||||
|
### [Allocations greater than 4GB](ALLOCATIONS_GREATER_THAN_4GB.md)
|
||||||
### [Implicit scaling](IMPLICIT_SCALING.md)
|
### [Implicit scaling](IMPLICIT_SCALING.md)
|
||||||
### [Immediate Commandlist](IMMEDIATE_COMMANDLIST.md)
|
### [Immediate Commandlist](IMMEDIATE_COMMANDLIST.md)
|
||||||
### [System Memory Allocations in Level Zero](SYSTEM_MEMORY_ALLOCATIONS.md)
|
### [L0 Metrics](METRICS.md)
|
||||||
### [Module Symbols and Linking in Level Zero](MODULE_SYMBOL_SUPPORT.md)
|
### [Module Symbols and Linking in Level Zero](MODULE_SYMBOL_SUPPORT.md)
|
||||||
### [Allocations greater than 4GB](ALLOCATIONS_GREATER_THAN_4GB.md)
|
### [System Memory Allocations in Level Zero](SYSTEM_MEMORY_ALLOCATIONS.md)
|