Enable implicit scaling via platform config

Related-To: NEO-6819
Signed-off-by: Daniel Chabrowski <daniel.chabrowski@intel.com>
Daniel Chabrowski
2022-05-23 17:03:53 +00:00
committed by Compute-Runtime-Automation
parent 630ecfdd09
commit b5495169ca
13 changed files with 71 additions and 12 deletions


@@ -32,7 +32,7 @@ To manage the resources on those sub-devices, the UMD introduces two main develo
* *Implicit scaling* model, on which application allocates and submits to the root device and driver is responsible for distribution of work and memory across tiles.
* *Explicit scaling* model, on which application is responsible for distributing work and memory across tiles using sub-device handles.
-When doing allocations in implicit scaling mode, driver *colors* an allocation among the available tiles. Default coloring divides an allocation size evenly by the number of avaialable tiles. Other policies include dividing the allocation in chunks of a given size, which are then interleaved on each tile.
+When doing allocations in implicit scaling mode, driver *colors* an allocation among the available tiles. Default coloring divides an allocation size evenly by the number of available tiles. Other policies include dividing the allocation in chunks of a given size, which are then interleaved on each tile.
When scheduling a kernel for execution, driver distributes the kernel workgroups among the available tiles. Default mechanism is called *Static Partitioning*, where the workgroups are evenly distributed among tiles. For instance, in a 2-tile system, half of the workgroups go to tile 0, and the other half to tile 1.
@@ -40,7 +40,7 @@ The number of CCSs, or compute engines, currently available with implicit scalin
No implicit scaling support is available for BCSs. Considering that, two models are followed in terms of discovery of copy engines:
-* In Level Zero, the copy engines from sub-device 0 are exposed also in the root device. This to align the engine model on both the implicit and the non-implicit-scaling scenarios.
+* In Level Zero, the copy engines from sub-device 0 are exposed also in the root device. This is to align the engine model on both the implicit and the non-implicit-scaling scenarios.
* In OpenCL, copy engines are not exposed in the root device.
Since implicit scaling is only done for EUs, which are associated only with kernels submitted to CCS, BCSs are currently not being exposed and access to them are done through sub-device handles.
@@ -76,4 +76,4 @@ For workloads with no coherent L3 caches among tiles, such as XeHP_SDV, the foll
* `ForceMultiGpuAtomics`: Set to `0` to have global atomics (slow mode for multi-tile) and `1` to have atomics on L3 cache (fast mode for on tile).
* Caches are flushed after every kernel. This can be disabled with `DoNotFlushCaches=1`.
-* Kernels are serialized to maintain functional correctness of split execution.
+* Kernels are serialized to maintain functional correctness of split execution.
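The first hunk describes two allocation-coloring policies (an even split across tiles, and fixed-size chunks interleaved across tiles) plus *Static Partitioning* of kernel workgroups. As a rough illustration only, the sketch below models these as plain arithmetic on byte offsets and workgroup IDs. All function names are hypothetical, and two details are assumptions not stated in the text: chunks are interleaved round-robin starting at tile 0, and when a size or workgroup count does not divide evenly, the first tiles each receive one extra unit. This is not the driver's actual code.

```python
from typing import List, Tuple


def split_evenly(total: int, parts: int) -> List[Tuple[int, int]]:
    """Return (start, length) pairs splitting `total` units into `parts` contiguous ranges.

    Assumption (hypothetical policy): when `total` is not divisible by `parts`,
    the first `total % parts` ranges each receive one extra unit.
    """
    if parts <= 0:
        raise ValueError("parts must be positive")
    base, extra = divmod(total, parts)
    result = []
    start = 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)
        result.append((start, length))
        start += length
    return result


def color_evenly(size: int, num_tiles: int) -> List[Tuple[int, int, int]]:
    """Default coloring: one contiguous (tile, offset, length) byte range per tile."""
    return [(tile, offset, length)
            for tile, (offset, length) in enumerate(split_evenly(size, num_tiles))]


def color_chunked(size: int, num_tiles: int, chunk_size: int) -> List[Tuple[int, int, int]]:
    """Chunked coloring: fixed-size chunks assigned round-robin to tiles (assumed order)."""
    if num_tiles <= 0 or chunk_size <= 0:
        raise ValueError("num_tiles and chunk_size must be positive")
    return [(index % num_tiles, offset, min(chunk_size, size - offset))
            for index, offset in enumerate(range(0, size, chunk_size))]


def static_partition(num_workgroups: int, num_tiles: int) -> List[range]:
    """Static partitioning: a contiguous, evenly sized range of workgroup IDs per tile."""
    return [range(start, start + length)
            for start, length in split_evenly(num_workgroups, num_tiles)]


if __name__ == "__main__":
    # 2-tile example from the text: half the workgroups go to tile 0, half to tile 1.
    print(static_partition(8, 2))          # [range(0, 4), range(4, 8)]
    print(color_evenly(1 << 20, 2))        # [(0, 0, 524288), (1, 524288, 524288)]
    print(color_chunked(4 * 65536, 2, 65536))
    # [(0, 0, 65536), (1, 65536, 65536), (0, 131072, 65536), (1, 196608, 65536)]
```

Sharing one `split_evenly` helper keeps even coloring and static partitioning on the same remainder rule. A real driver may also need to respect page or cache-line alignment when coloring, which this sketch ignores.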