mirror of
https://github.com/intel/compute-runtime.git
synced 2025-11-15 10:14:56 +08:00
Add multi-CCS mode documentation for ZEX_NUMBER_OF_CCS
This documentation explains functional and performance considerations when selecting a multi-CCS mode on PVC.

Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
committed by
Compute-Runtime-Automation
parent
f219617823
commit
787b71c7d0
@@ -13,7 +13,7 @@ SPDX-License-Identifier: MIT

The following document describes the driver experimental extensions implemented in the Level Zero Intel(R) GPU driver. These extensions are meant to test and/or gather feedback on interfaces before they are potentially added to the Level Zero specification, as well as to provide access to functionality specific to Intel(R) GPUs.

Access to these extensions is possible through [zeDriverGetExtensionFunctionAddress](https://spec.oneapi.io/level-zero/latest/core/api.html?highlight=zedrivergetextensionfunctionaddress#_CPPv435zeDriverGetExtensionFunctionAddress18ze_driver_handle_tPKcPPv). Sample code:
Driver extensions may be defined as APIs (interfaces), flags, or environment variables. In the case of interfaces, these shall be accessed through [zeDriverGetExtensionFunctionAddress](https://spec.oneapi.io/level-zero/latest/core/api.html?highlight=zedrivergetextensionfunctionaddress#_CPPv435zeDriverGetExtensionFunctionAddress18ze_driver_handle_tPKcPPv). Sample code:

```cpp
@@ -24,4 +24,5 @@ pFnzexMemGetIpcHandles zexMemOpenIpcHandlePointer = nullptr;
ze_result_t res = zeDriverGetExtensionFunctionAddress(hDriver, "zexMemOpenIpcHandles", reinterpret_cast<void **>(&zexMemOpenIpcHandlePointer));
```

### [Multiple IPC Handles](MULTIPLE_IPC_HANDLES.md)
### [Multi-CCS Modes](MULTI_CCS_MODES.md)

76
level_zero/doc/experimental_extensions/MULTI_CCS_MODES.md
Normal file
@@ -0,0 +1,76 @@
<!---

Copyright (C) 2022 Intel Corporation

SPDX-License-Identifier: MIT

-->

# Multi-CCS Modes

* [Overview](#Overview)
* [Functional and Performance Considerations](#Functional-and-Performance-Considerations)
* [Interaction with Affinity Mask](#Interaction-with-Affinity-Mask)
* [Availability](#Availability)

# Overview

Xe HPC (PVC) contains 4 CCSs (Compute-Command Streamers) per tile, which can be used to access a common pool of Execution Units (EUs). The hardware allows selecting a specific distribution of EUs among CCSs, such as:

- All EUs may be assigned to a single CCS, in which case only 1 CCS needs to be exposed to users.
- EUs may be distributed equally among the 4 CCSs, and all CCSs are exposed.

Applications query the number of CCSs exposed in the target device by using Level Zero [command queue groups](https://spec.oneapi.io/level-zero/latest/core/PROG.html#command-queue-groups).

Depending on their execution patterns, applications may benefit more from one configuration or another: some may benefit from using a single CCS to access all EUs, while others may benefit from using more than 1 CCS, each with a fixed number of assigned EUs. For instance:

- A single-process job may benefit from using 1 CCS with access to all EUs.
- A two-process job per tile with uniform work may benefit more from using 2 CCSs, with half of the EUs assigned to each.
- A two-process job per tile with non-uniform work may benefit from using 1 CCS with access to all EUs.
- A four-process job per tile with uniform work may benefit more from using 4 CCSs, with a quarter of the EUs assigned to each.
- A four-process job per tile with non-uniform work may benefit from using 1 CCS with access to all EUs.

To help applications select the mode that best fits their needs, the Level Zero driver introduces a new experimental driver environment variable named `ZEX_NUMBER_OF_CCS`. This environment variable is read at `zeInit()` time, after reading `ZE_AFFINITY_MASK`, and allows users to select one of the following modes:

- 1 CCS Mode (DEFAULT): Each tile exposes 1 CCS, which has access to all EUs. Other CCSs are disabled.
- 2 CCS Mode: Each tile exposes 2 CCSs, each with half of the EUs assigned to it. If no work is submitted to one of the CCSs, then its EUs remain idle, even if the other CCS has active work.
- 4 CCS Mode: Each tile exposes 4 CCSs, each having a quarter of the EUs assigned to it. As with 2 CCS mode, EUs of idle CCSs cannot be used by other CCSs.
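
The arithmetic behind these modes can be sketched in a few lines. The helper names below (`euSharePerCcs`, `idleEuShare`) are hypothetical and are not part of the driver; the sketch only restates the static partitioning described above, assuming EUs are split evenly among the exposed CCSs.

```cpp
#include <stdexcept>

// Fraction of a tile's EUs statically assigned to each exposed CCS.
// ccsMode is the number of CCSs exposed per tile (1, 2, or 4).
// Hypothetical helper for illustration; not a driver API.
double euSharePerCcs(int ccsMode) {
    if (ccsMode != 1 && ccsMode != 2 && ccsMode != 4) {
        throw std::invalid_argument("CCS mode must be 1, 2, or 4");
    }
    return 1.0 / ccsMode;
}

// Fraction of a tile's EUs left idle when only busyCcs of the exposed
// CCSs have work, since EUs of idle CCSs cannot be used by other CCSs.
double idleEuShare(int ccsMode, int busyCcs) {
    if (busyCcs < 0 || busyCcs > ccsMode) {
        throw std::invalid_argument("busyCcs must be between 0 and ccsMode");
    }
    return (ccsMode - busyCcs) * euSharePerCcs(ccsMode);
}
```

For example, `idleEuShare(4, 1)` evaluates to 0.75: in 4 CCS mode, a process that submits to a single CCS leaves three quarters of the tile's EUs unused.
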

The format for `ZEX_NUMBER_OF_CCS` is a comma-separated list of device-mode pairs, i.e., `ZEX_NUMBER_OF_CCS=<Root Device Index>:<CCS Mode>,<Root Device Index>:<CCS Mode>...`. For instance, in a dual-PVC system, an application could use the following to set root device index 0 to 4 CCS mode and root device index 1 to 1 CCS mode:

`ZEX_NUMBER_OF_CCS=0:4,1:1`
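
To make the format concrete, the following is a minimal sketch of how such a value could be split into device-mode pairs. It is not the driver's parser: the function name `parseNumberOfCcs` is hypothetical, and rejecting malformed entries with an exception is a choice made for this sketch, since the handling of invalid input is not described here.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Parses a value such as "0:4,1:1" into {root device index -> CCS mode}.
// Hypothetical helper for illustration; not the driver's implementation.
std::map<int, int> parseNumberOfCcs(const std::string &value) {
    std::map<int, int> modes;
    std::istringstream entries(value);
    std::string entry;
    while (std::getline(entries, entry, ',')) {
        const auto colon = entry.find(':');
        if (colon == std::string::npos) {
            throw std::invalid_argument("expected <index>:<mode>, got '" + entry + "'");
        }
        // std::stoi throws std::invalid_argument on non-numeric input.
        const int deviceIndex = std::stoi(entry.substr(0, colon));
        const int ccsMode = std::stoi(entry.substr(colon + 1));
        if (ccsMode != 1 && ccsMode != 2 && ccsMode != 4) {
            throw std::invalid_argument("CCS mode must be 1, 2, or 4");
        }
        modes[deviceIndex] = ccsMode;
    }
    return modes;
}
```

With the example above, `parseNumberOfCcs("0:4,1:1")` yields a map in which root device index 0 maps to mode 4 and root device index 1 maps to mode 1.
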

# Functional and Performance Considerations

- *What happens when multiple applications run concurrently with different modes?*

When an application submits work in a mode different from the one currently used by another application, submissions from the second application are blocked until all current submissions from the first application finish, since the mode can only be changed when the GPU is idle. Mixing applications with different modes should therefore be avoided to prevent performance regressions.

- *What happens when submitting to only 1 CCS from multiple workloads?*

Since virtual engines are disabled in 1 CCS mode, all submissions go to the same engine and are serialized by GuC. GuC time-slices those submissions: each process gets a time quantum (5 ms by default), after which GuC tries to preempt it and switch to another workload.
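
The effect of this serialization can be approximated with a simple round-robin model. The sketch below is not GuC's scheduling algorithm: it assumes ideal round-robin slicing with a fixed quantum and no preemption cost, and the function name `roundRobinFinishTimes` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns the time (in ms) at which each workload finishes when all of them
// share one engine under idealized round-robin time slicing.
// remainingMs holds the execution time each workload needs on its own.
// Simplified model for illustration; real scheduling is more involved.
std::vector<double> roundRobinFinishTimes(std::vector<double> remainingMs, double quantumMs) {
    if (quantumMs <= 0.0) {
        throw std::invalid_argument("quantum must be positive");
    }
    std::vector<double> finish(remainingMs.size(), 0.0);
    double now = 0.0;
    bool pending = true;
    while (pending) {
        pending = false;
        for (std::size_t i = 0; i < remainingMs.size(); ++i) {
            if (remainingMs[i] <= 0.0) {
                continue; // this workload has already finished
            }
            const double slice = std::min(quantumMs, remainingMs[i]);
            now += slice;
            remainingMs[i] -= slice;
            if (remainingMs[i] <= 0.0) {
                finish[i] = now;
            } else {
                pending = true; // needs at least one more quantum
            }
        }
    }
    return finish;
}
```

In this model, two workloads needing 10 ms and 5 ms with a 5 ms quantum finish at 15 ms and 10 ms respectively, so the shorter workload takes twice as long as it would alone.
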

- *What happens in a multi-process application, with all processes having similar workload?*

2 CCS or 4 CCS modes may be used, as they ensure concurrent execution of all processes.

- *What happens if an application wants to use multiple CCSs with unbalanced work?*

It is better to use 1 CCS mode, so that all EUs are available to all queues and the bigger workload is not limited to a fraction of the EUs. GPU partitioning is static, so a CCS with only a small amount of work underuses its statically assigned share of resources (50% of the EUs in 2 CCS mode, 25% in 4 CCS mode).

- *What happens with implicit scaling?*

When implicit scaling is enabled, only part of each tile is used for the split workgroups: 100% of each tile with 1 CCS mode, 50% with 2 CCS mode, and 25% with 4 CCS mode. It is therefore recommended to use only 1 CCS mode with implicit scaling to avoid performance regressions.

# Interaction with Affinity Mask

`ZE_AFFINITY_MASK` is read by the Level Zero driver prior to `ZEX_NUMBER_OF_CCS`. Therefore, the mask can hide some root devices and change their indexes, and `ZEX_NUMBER_OF_CCS` applies to the root device indexes after masking. For instance, in a 4-PVC system, we could have:

- Process 0 - `ZE_AFFINITY_MASK=0.0` `ZEX_NUMBER_OF_CCS=0:1`
- Process 1 - `ZE_AFFINITY_MASK=0.1` `ZEX_NUMBER_OF_CCS=0:1`
- Process 2 - `ZE_AFFINITY_MASK=1.0` `ZEX_NUMBER_OF_CCS=0:4`
- Process 3 - `ZE_AFFINITY_MASK=1.1` `ZEX_NUMBER_OF_CCS=0:4`

Alternatively, a process may select different modes for each tile. For instance, the following line selects card 0's tile 1 with 4 CCSs, and card 1's tile 0 with 2 CCSs:

- `ZE_AFFINITY_MASK=0.1,1.0` `ZEX_NUMBER_OF_CCS=0:4,1:2`
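
The renumbering implied by these examples can be sketched as follows. This is an illustration under stated assumptions, not the driver's logic: it assumes that each mask entry has the form `<card>` or `<card>.<tile>`, that the cards left visible keep their ascending physical order, and that they are renumbered from 0. The function name `visibleToPhysical` is hypothetical.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>

// Maps each root device index visible after masking to the physical card
// it refers to. Assumption: visible cards keep ascending physical order
// and are renumbered starting at 0. Hypothetical helper, not a driver API.
std::map<int, int> visibleToPhysical(const std::string &affinityMask) {
    std::set<int> cards; // distinct physical cards named in the mask, sorted
    std::istringstream entries(affinityMask);
    std::string entry;
    while (std::getline(entries, entry, ',')) {
        // Only the card part before the optional '.' matters here.
        cards.insert(std::stoi(entry.substr(0, entry.find('.'))));
    }
    std::map<int, int> mapping;
    int visibleIndex = 0;
    for (int card : cards) {
        mapping[visibleIndex++] = card;
    }
    return mapping;
}
```

Under these assumptions, `visibleToPhysical("1.0")` maps visible index 0 to physical card 1, which is why Process 2 above uses `ZEX_NUMBER_OF_CCS=0:4` rather than `1:4`.
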

# Availability

- `ZEX_NUMBER_OF_CCS` is only supported and meant to be used on PVC.
- `ZEX_NUMBER_OF_CCS` can also be used by applications using the Intel OpenCL GPU driver.