<!---

Copyright (C) 2021 Intel Corporation

SPDX-License-Identifier: MIT

-->

# Release Notes v1.3

Level Zero Core API.

January 2022

## Changes in this release:

### Implicit Scaling

Implicit scaling is now enabled by default in Level Zero on Xe HPC (PVC) B and later steppings. The `EnableImplicitScaling` debug key may be used to enable (`EnableImplicitScaling=1`) or disable (`EnableImplicitScaling=0`) implicit scaling on Xe HPC and other multi-tile architectures.

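Debug keys such as `EnableImplicitScaling` are read from the process environment, so they can be exported in the shell or set programmatically. The sketch below shows the programmatic route. It assumes a POSIX system (`setenv`) and also assumes that the driver reads these variables at initialization, which means they must be set before the first `zeInit` call. `disableImplicitScaling` is a hypothetical helper name. `NEOReadDebugKeys=1` is set as well because, as the v1.1 notes below describe, Release builds may otherwise ignore debug keys.

```cpp
#include <cassert>
#include <cstdlib>  // std::getenv; setenv is POSIX
#include <cstring>

// Hypothetical helper: ask the driver to disable implicit scaling.
// Assumption: the driver reads debug keys from the environment when it is
// initialized, so this must run before the first zeInit() call.
bool disableImplicitScaling() {
    // Release builds may ignore debug keys unless NEOReadDebugKeys=1.
    if (setenv("NEOReadDebugKeys", "1", /*overwrite=*/1) != 0) {
        return false;
    }
    // EnableImplicitScaling=0 disables implicit scaling, =1 enables it.
    return setenv("EnableImplicitScaling", "0", /*overwrite=*/1) == 0;
}
```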
### [Blocking Free](https://spec.oneapi.io/level-zero/latest/core/api.html#zememfreeext)

The blocking free memory policy has been implemented for the `zeMemFreeExt` extension. The defer free policy will be added in an upcoming release.

### [PCI Properties Extension](https://spec.oneapi.io/level-zero/latest/core/EXT_PCIProperties.html#pci-properties-extension)

Support for the PCI properties extension has been added via the `zeDevicePciGetPropertiesExt` interface. It currently provides access only to the device's BDF (bus/device/function) address. The device bandwidth property will be exposed in the future, depending on support from the underlying components.

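The BDF address is returned as separate numeric fields. Many tools, such as `lspci`, expect the conventional `domain:bus:device.function` hexadecimal notation, so the sketch below shows how to produce it. `PciAddress` is a hypothetical stand-in. It assumes the same four fields as the extension's `ze_pci_address_ext_t` (`domain`, `bus`, `device`, `function`), so check your header before relying on it. `formatBdf` is also a hypothetical name. In a real application, the values would come from `zeDevicePciGetPropertiesExt`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in assumed to mirror the fields of ze_pci_address_ext_t.
struct PciAddress {
    uint32_t domain;
    uint32_t bus;
    uint32_t device;
    uint32_t function;
};

// Formats a PCI address as DDDD:BB:DD.F in hex, e.g. "0000:3a:00.0".
std::string formatBdf(const PciAddress &addr) {
    char buf[48];
    std::snprintf(buf, sizeof(buf), "%04x:%02x:%02x.%x",
                  static_cast<unsigned>(addr.domain),
                  static_cast<unsigned>(addr.bus),
                  static_cast<unsigned>(addr.device),
                  static_cast<unsigned>(addr.function));
    return std::string(buf);
}
```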
### [Memory Compression Hints](https://spec.oneapi.io/level-zero/latest/core/EXT_MemoryCompressionHints.html#memory-compression-hints-extension)

Memory compression hints for shared and device memory allocations and images have been added.

### Sampler Address Modes Fix

The Level Zero driver had a bug in its implementation of the `ZE_SAMPLER_ADDRESS_MODE_CLAMP_TO_BORDER` and `ZE_SAMPLER_ADDRESS_MODE_CLAMP` address modes: the two modes were swapped. This is now fixed, and applications can use the driver version to determine which address mode to use. Details on how DPC++ handles this can be found in:

[https://github.com/intel/llvm/blob/756c2e8fb45e44b51b32bd8a22b3c325f17bb5c9/sycl/plugins/level_zero/pi_level_zero.cpp#L5264](https://github.com/intel/llvm/blob/756c2e8fb45e44b51b32bd8a22b3c325f17bb5c9/sycl/plugins/level_zero/pi_level_zero.cpp#L5264)

# Release Notes v1.2

Level Zero Core API.

August 2021

## Changes in this release:

### [Extension to create image views for planar formats](https://spec.oneapi.com/level-zero/latest/core/api.html?highlight=relaxed#relaxedalloclimits-enums)

This extension allows each plane of a planar-format image to be accessed individually, and allows image views to be created that interpret an existing image differently.

Sample code:

[https://github.com/intel/compute-runtime/blob/master/level_zero/core/test/black_box_tests/zello_image_view.cpp](https://github.com/intel/compute-runtime/blob/master/level_zero/core/test/black_box_tests/zello_image_view.cpp)

### [Extension for querying image properties](https://spec.oneapi.io/level-zero/latest/core/api.html?highlight=ze_image_memory_properties_exp_t#_CPPv432ze_image_memory_properties_exp_t)

This extension allows querying the different properties of an image, such as size, row pitch, and slice pitch.

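Row and slice pitch are needed when copying image data to or from a linear host buffer. The sketch below locates one texel under an assumed linear (non-tiled) layout, in which consecutive rows are `rowPitch` bytes apart and consecutive slices are `slicePitch` bytes apart. `texelByteOffset` is a hypothetical helper. The pitch values would come from the image-properties query.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of texel (x, y, z) in a linear image layout.
// Assumption: rows are rowPitch bytes apart and slices slicePitch bytes apart,
// both as reported for the image; bytesPerTexel depends on the image format.
std::size_t texelByteOffset(std::size_t x, std::size_t y, std::size_t z,
                            std::size_t bytesPerTexel, std::size_t rowPitch,
                            std::size_t slicePitch) {
    assert(rowPitch >= bytesPerTexel);  // a row holds at least one texel
    return z * slicePitch + y * rowPitch + x * bytesPerTexel;
}
```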
### [Definition of ZE_STRUCTURE_TYPE_DEVICE_PROPERTIES_1_2 properties](https://spec.oneapi.io/level-zero/latest/core/api.html?highlight=ze_structure_type_device_properties_1_2#_CPPv439ZE_STRUCTURE_TYPE_DEVICE_PROPERTIES_1_2)

Setting `stype` to `ZE_STRUCTURE_TYPE_DEVICE_PROPERTIES_1_2` lets applications request that the driver return the timer resolution in cycles per second, as defined in the v1.2 specification:

```cpp
ze_api_version_t version;
zeDriverGetApiVersion(hDriver, &version);
...
ze_device_properties_t devProperties = {};
// Request the v1.2 semantics for timerResolution
devProperties.stype = ZE_STRUCTURE_TYPE_DEVICE_PROPERTIES_1_2;
zeDeviceGetProperties(device, &devProperties);

uint64_t timerResolutionInCyclesPerSecond = devProperties.timerResolution;
```

If `stype` is set to `ZE_STRUCTURE_TYPE_DEVICE_PROPERTIES` instead of `ZE_STRUCTURE_TYPE_DEVICE_PROPERTIES_1_2`, the timer resolution is returned in nanoseconds, as defined in v1.1:

```cpp
ze_api_version_t version;
zeDriverGetApiVersion(hDriver, &version);
...
ze_device_properties_t devProperties = {};
devProperties.stype = ZE_STRUCTURE_TYPE_DEVICE_PROPERTIES;
zeDeviceGetProperties(device, &devProperties);

uint64_t timerResolutionInNanoSeconds = devProperties.timerResolution;
```

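The unit of `timerResolution` depends on which `stype` was passed, so any code that converts timestamps must know which one it requested. The sketch below converts a delta between two device timestamps, measured in timer ticks, into nanoseconds under either interpretation. `TimerResolutionUnit` and `ticksToNanoseconds` are hypothetical names. The conversion goes through `double`, which loses precision for very large tick counts.

```cpp
#include <cassert>
#include <cstdint>

// How the timerResolution field should be interpreted (hypothetical enum).
enum class TimerResolutionUnit {
    Nanoseconds,     // stype == ZE_STRUCTURE_TYPE_DEVICE_PROPERTIES
    CyclesPerSecond  // stype == ZE_STRUCTURE_TYPE_DEVICE_PROPERTIES_1_2
};

// Converts a timestamp delta in timer ticks into nanoseconds.
double ticksToNanoseconds(uint64_t ticks, uint64_t timerResolution,
                          TimerResolutionUnit unit) {
    assert(timerResolution != 0);
    if (unit == TimerResolutionUnit::CyclesPerSecond) {
        // timerResolution ticks per second -> 1e9 / timerResolution ns per tick
        return static_cast<double>(ticks) * 1e9 /
               static_cast<double>(timerResolution);
    }
    // timerResolution is already the length of one tick in nanoseconds.
    return static_cast<double>(ticks) * static_cast<double>(timerResolution);
}
```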
### Extension to set preferred allocation for USM shared allocations

[`ZE_DEVICE_MEM_ALLOC_FLAG_BIAS_INITIAL_PLACEMENT`](https://spec.oneapi.io/level-zero/latest/core/api.html?highlight=mem_alloc_flag_bias_initial_placement#_CPPv447ZE_DEVICE_MEM_ALLOC_FLAG_BIAS_INITIAL_PLACEMENT) and [`ZE_HOST_MEM_ALLOC_FLAG_BIAS_INITIAL_PLACEMENT`](https://spec.oneapi.io/level-zero/latest/core/api.html?highlight=mem_alloc_flag_bias_initial_placement#_CPPv445ZE_HOST_MEM_ALLOC_FLAG_BIAS_INITIAL_PLACEMENT) can now be set in
`ze_device_mem_alloc_flags_t` and `ze_host_mem_alloc_flags_t`, respectively, when creating a shared allocation, to tell
the driver where the allocation should initially be placed.

### [IPC Memory Cache Bias Flags](https://spec.oneapi.io/level-zero/latest/core/api.html?highlight=ze_ipc_memory_flag_bias_cached#ze-ipc-memory-flags-t)

`ZE_IPC_MEMORY_FLAG_BIAS_CACHED` and `ZE_IPC_MEMORY_FLAG_BIAS_UNCACHED` can be passed when opening an IPC
memory handle with `zeMemOpenIpcHandle` to set the cache settings of the imported allocation.

### [Support for preferred group size](https://spec.oneapi.io/level-zero/latest/core/api.html?highlight=ze_kernel_preferred_group_size_properties_t#ze-kernel-preferred-group-size-properties-t)

`ze_kernel_preferred_group_size_properties_t` can be chained to `zeKernelGetProperties` to query the preferred
group-size multiple of a kernel for submission. Submitting a kernel with a group size that matches the preferred
multiple returned by the driver may improve performance on certain platforms.

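Once the preferred multiple is known, the application still has to choose a concrete group size. The sketch below shows one way to do that. It assumes the structure reports a single value (`preferredMultiple`) and that the caller already knows the device's maximum group size. `pickGroupSize` is a hypothetical helper, and its policy (round down to the multiple, never exceed the maximum) is only one reasonable choice.

```cpp
#include <cassert>
#include <cstdint>

// Chooses a group size that is a multiple of preferredMultiple, is at most
// the desired size when possible, and never exceeds maxGroupSize.
uint32_t pickGroupSize(uint32_t desired, uint32_t preferredMultiple,
                       uint32_t maxGroupSize) {
    assert(preferredMultiple > 0 && maxGroupSize > 0);
    uint32_t size = desired < maxGroupSize ? desired : maxGroupSize;
    // Round down to the nearest multiple of preferredMultiple.
    uint32_t rounded = (size / preferredMultiple) * preferredMultiple;
    if (rounded != 0) {
        return rounded;
    }
    // The desired size is below one multiple: use one multiple if the device
    // allows it, otherwise fall back to the device maximum.
    return preferredMultiple <= maxGroupSize ? preferredMultiple : maxGroupSize;
}
```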
### [Module compilation options](https://spec.oneapi.io/level-zero/latest/core/PROG.html#module-build-options)

Optimization levels can now be passed to `zeModuleCreate` using the `-ze-opt-level` option. The level is forwarded
to the underlying graphics compiler as a hint indicating the desired degree of optimization.

### [Extension to read the timestamps of each subdevice](https://spec.oneapi.io/level-zero/latest/core/api.html?highlight=zeeventquerytimestampsexp#zeeventquerytimestampsexp)

This extension defines the `zeEventQueryTimestampsExp` interface to query for timestamps of the parent device or
all of the available subdevices.

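When a kernel runs across several subdevices, each subdevice reports its own start and end timestamps, and an application often wants a single overall duration. The sketch below reduces the per-subdevice values to that duration, measured from the earliest start to the latest end. It assumes all timestamps share a common time base and ignores counter wraparound. `SubdeviceTimestamp` is a hypothetical simplification of the per-subdevice results, not the extension's actual result type. The tick count can then be converted to time using the device timer resolution.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical simplification of one subdevice's kernel timestamps (ticks).
struct SubdeviceTimestamp {
    uint64_t kernelStart;
    uint64_t kernelEnd;
};

// Overall span of a kernel that ran on several subdevices, in ticks.
// Assumption: all subdevices share one time base and counters did not wrap.
uint64_t overallKernelTicks(const std::vector<SubdeviceTimestamp> &ts) {
    assert(!ts.empty());
    uint64_t start = ts.front().kernelStart;
    uint64_t end = ts.front().kernelEnd;
    for (const auto &t : ts) {
        start = std::min(start, t.kernelStart);
        end = std::max(end, t.kernelEnd);
    }
    return end - start;
}
```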
### [Extension to set thread arbitration policy](https://spec.oneapi.io/level-zero/latest/core/api.html?highlight=ze_structure_type_device_properties_1_2#kernelschedulinghints)

The `zeKernelSchedulingHintExp` interface allows applications to set the desired thread arbitration policy for the
target kernel. Available policies can be queried by applications through `zeDeviceGetModuleProperties` with the
[`ze_scheduling_hint_exp_properties_t`](https://spec.oneapi.io/level-zero/latest/core/api.html?highlight=ze_scheduling_hint_exp_properties_t#_CPPv435ze_scheduling_hint_exp_properties_t) structure.

Policies include:

* `ZE_SCHEDULING_HINT_EXP_FLAG_OLDEST_FIRST`
* `ZE_SCHEDULING_HINT_EXP_FLAG_ROUND_ROBIN`
* `ZE_SCHEDULING_HINT_EXP_FLAG_STALL_BASED_ROUND_ROBIN`

### [Extension for cache reservation](https://spec.oneapi.io/level-zero/latest/core/EXT_CacheReservation.html#cache-reservation-extension)

With `zeDeviceReserveCacheExt`, applications can reserve sections of the GPU cache for exclusive use. Cache level
support varies between platforms.

Likewise, `zeDeviceSetCacheAdviceExt` can be used to assign a memory region to either the reserved or the non-reserved cache region. If the default behavior is selected, the non-reserved region is used, which is accessible to all clients and applications.

# Release Notes v1.1

Level Zero Core API.

April 2021

## Changes in this release:

### Device allocations larger than 4GB

https://spec.oneapi.com/level-zero/latest/core/api.html?highlight=relaxed#relaxedalloclimits-enums

The L0 driver now allows buffers larger than 4GB to be allocated. To use this feature, the `ze_relaxed_allocation_limits_exp_desc_t`
structure must be passed to the allocation call (for example `zeMemAllocDevice` or `zeMemAllocShared`) as a linked descriptor, chained through the `pNext` member of `ze_device_mem_alloc_desc_t`.

Sample code:

```cpp
// Relaxed-limits descriptor that lifts the maximum allocation size
ze_relaxed_allocation_limits_exp_desc_t relaxedDesc = {};
relaxedDesc.stype = ZE_STRUCTURE_TYPE_RELAXED_ALLOCATION_LIMITS_EXP_DESC;
relaxedDesc.flags = ZE_RELAXED_ALLOCATION_LIMITS_EXP_FLAG_MAX_SIZE;

// Chain it to the device allocation descriptor
ze_device_mem_alloc_desc_t deviceDesc = {};
deviceDesc.stype = ZE_STRUCTURE_TYPE_DEVICE_MEM_ALLOC_DESC;
deviceDesc.pNext = &relaxedDesc;

void *ptr = nullptr;
zeMemAllocDevice(context, &deviceDesc, size, 0, device, &ptr);
```

In addition, kernels need to be compiled with `-ze-opt-greater-than-4GB-buffer-required`. This option must be
passed in the `pBuildFlags` field of the `ze_module_desc_t` descriptor when calling `zeModuleCreate`.

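The option is only needed when a kernel actually accesses such a buffer, so an application might add it conditionally. The sketch below does this. It assumes 4 GiB as the threshold; on real hardware, compare against the device's reported maximum allocation size instead. `buildFlagsFor` is a hypothetical helper. Its result would be handed to `pBuildFlags` through `c_str()`, and the string must stay alive until `zeModuleCreate` returns.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed threshold: buffers larger than 4 GiB need the large-buffer option.
constexpr uint64_t kFourGiB = 4ull * 1024 * 1024 * 1024;

// Hypothetical helper: appends the large-buffer build option to an existing
// set of module build flags when the largest buffer exceeds 4 GiB.
std::string buildFlagsFor(uint64_t largestBufferSize, const std::string &baseFlags) {
    std::string flags = baseFlags;
    if (largestBufferSize > kFourGiB) {
        if (!flags.empty()) {
            flags += ' ';  // build options are space-separated
        }
        flags += "-ze-opt-greater-than-4GB-buffer-required";
    }
    return flags;
}
```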
### zeDeviceGetGlobalTimestamps for CPU/GPU synchronized time

https://spec.oneapi.com/level-zero/latest/core/api.html?highlight=zedevicegetglobaltimestamps#_CPPv427zeDeviceGetGlobalTimestamps18ze_device_handle_tP8uint64_tP8uint64_t

`zeDeviceGetGlobalTimestamps` returns synchronized host and device global timestamps, which can be used to correlate CPU and GPU time.

Sample code:

```cpp
uint64_t hostTimestamp = 0;
uint64_t deviceTimestamp = 0;
// Both values are sampled together so that they correlate with each other
zeDeviceGetGlobalTimestamps(hDevice, &hostTimestamp, &deviceTimestamp);
```

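A typical use of a correlated pair is to map later device timestamps, such as kernel start times, onto the host timeline. The sketch below does this under explicit assumptions. First, the host value is in nanoseconds. Second, the device counter runs at the frequency reported as `timerResolution` in cycles per second. Third, the counter does not wrap between samples. `TimestampPair` and `deviceToHostNs` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// One correlated pair, as returned together by zeDeviceGetGlobalTimestamps.
struct TimestampPair {
    uint64_t host;    // assumed to be in nanoseconds
    uint64_t device;  // in device timer ticks
};

// Estimates the host time (ns) corresponding to a later device timestamp.
// Assumptions: host values are nanoseconds, the device counter runs at
// timerCyclesPerSecond, and it did not wrap since the reference sample.
uint64_t deviceToHostNs(const TimestampPair &ref, uint64_t deviceTs,
                        uint64_t timerCyclesPerSecond) {
    assert(deviceTs >= ref.device && timerCyclesPerSecond != 0);
    const uint64_t ticks = deviceTs - ref.device;
    // Split into whole seconds and remainder to avoid overflowing ticks * 1e9
    // (valid for timer frequencies below roughly 18 GHz).
    const uint64_t seconds = ticks / timerCyclesPerSecond;
    const uint64_t remainder = ticks % timerCyclesPerSecond;
    const uint64_t elapsedNs = seconds * 1000000000ull +
                               remainder * 1000000000ull / timerCyclesPerSecond;
    return ref.host + elapsedNs;
}
```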
### Global work offset

https://spec.oneapi.com/level-zero/latest/core/api.html?highlight=globaloffset#_CPPv426zeKernelSetGlobalOffsetExp18ze_kernel_handle_t8uint32_t8uint32_t8uint32_t

Applications can now set a global work offset for kernels.

Sample code:

```cpp
...
// One work-group of sizeX work-items along X
uint32_t groupSizeX = sizeX;
uint32_t groupSizeY = 1u;
uint32_t groupSizeZ = 1u;
zeKernelSetGroupSize(kernel, groupSizeX, groupSizeY, groupSizeZ);

// Shift the global IDs seen by the kernel by `offset` along X
uint32_t offsetx = offset;
uint32_t offsety = 0;
uint32_t offsetz = 0;
zeKernelSetGlobalOffsetExp(kernel, offsetx, offsety, offsetz);
...
```

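The offset shifts the global IDs that the kernel observes, while the number of work-items stays the same. The sketch below is a host-side model of that arithmetic for the X dimension. It assumes the usual `offset + groupId * groupSize + localId` definition of a global ID, which is the same convention as OpenCL's global offset. `globalIdsX` is hypothetical and does not launch any kernel. With the single group of `sizeX` work-items above, it predicts IDs `offset` through `offset + sizeX - 1`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Models the X global IDs covered by a 1-D dispatch with a global offset:
// globalId = offset + groupId * groupSize + localId (assumed convention).
std::vector<uint32_t> globalIdsX(uint32_t offsetX, uint32_t groupSizeX,
                                 uint32_t groupCountX) {
    std::vector<uint32_t> ids;
    ids.reserve(static_cast<std::size_t>(groupSizeX) * groupCountX);
    for (uint32_t group = 0; group < groupCountX; ++group) {
        for (uint32_t local = 0; local < groupSizeX; ++local) {
            ids.push_back(offsetX + group * groupSizeX + local);
        }
    }
    return ids;
}
```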
### Atomic floating point properties

https://spec.oneapi.com/level-zero/latest/core/api.html?highlight=ze_structure_type_float_atomic_ext_properties#_CPPv432ze_float_atomic_ext_properties_t

Applications can now query which floating-point atomic operations the device supports in kernels.
This is done by passing `ze_float_atomic_ext_properties_t` to `zeDeviceGetModuleProperties` as a linked property structure.

Sample code:

```cpp
ze_device_module_properties_t kernelProperties = {};
kernelProperties.stype = ZE_STRUCTURE_TYPE_DEVICE_MODULE_PROPERTIES;
ze_float_atomic_ext_properties_t extendedProperties = {};
extendedProperties.stype = ZE_STRUCTURE_TYPE_FLOAT_ATOMIC_EXT_PROPERTIES;
// Chain the extension structure to the module properties query
kernelProperties.pNext = &extendedProperties;
zeDeviceGetModuleProperties(hDevice, &kernelProperties);

if (extendedProperties.fp16Flags & ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_ADD) {
    // kernels can use FP16 atomic add and subtract on global memory
}
```

### Context Creation for specific devices

https://spec.oneapi.com/level-zero/latest/core/api.html?highlight=zecontextcreate#_CPPv417zeContextCreateEx18ze_driver_handle_tPK17ze_context_desc_t8uint32_tP18ze_device_handle_tP19ze_context_handle_t

Added `zeContextCreateEx` to create a context with a set of devices. Resources allocated against that context
are visible only to the devices for which the context was created.

Sample code:

```cpp
std::vector<ze_device_handle_t> devices;
devices.push_back(device0);
devices.push_back(device1);
...
ze_context_handle_t hContext = nullptr;
zeContextCreateEx(hDriver, &desc, static_cast<uint32_t>(devices.size()), devices.data(), &hContext);
```

### Change in timer resolution

https://spec.oneapi.com/level-zero/latest/core/api.html?highlight=timerresolution#_CPPv4N22ze_device_properties_t15timerResolutionE

The timer resolution returned in device properties has been changed to cycles per second (v1.0 reports it in nanoseconds).
To help libraries transition to the new resolution, the `UseCyclesPerSecondTimer` environment variable has been defined.
When it is set to 1, the driver returns the resolution defined for v1.1 (cycles per second); otherwise, it still
returns the resolution defined for v1.0 (nanoseconds). This environment variable is temporary: it is only meant to be used while applications
and libraries complete their transition to v1.1, and it will eventually be removed, leaving the v1.1 resolution as the default.

When querying the timer resolution, applications therefore need to keep the following in mind:

* If `ZE_API_VERSION_1_0` is returned by `zeDriverGetApiVersion`: the timer resolution is in nanoseconds.
* If `ZE_API_VERSION_1_1` is returned by `zeDriverGetApiVersion` and `UseCyclesPerSecondTimer` is not set: the timer resolution is in nanoseconds, as in v1.0.
* If `ZE_API_VERSION_1_1` is returned by `zeDriverGetApiVersion` and `UseCyclesPerSecondTimer=1`: the timer resolution is in cycles per second, as in v1.1.

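The rules listed above can be encoded in a small helper. It reads the same environment variable the driver reads, so the application interprets `timerResolution` consistently. In the sketch below, `TimerUnit` and `timerUnitForV11` are hypothetical names, and `apiMinor` is assumed to be the minor part of the version returned by `zeDriverGetApiVersion`. The helper does not account for `NEOReadDebugKeys`. In Release builds where that key is required but unset, the driver may ignore `UseCyclesPerSecondTimer`, and in that case the helper would report the wrong unit.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

enum class TimerUnit { Nanoseconds, CyclesPerSecond };

// Hypothetical helper encoding the v1.1 rules.
// apiMinor: 0 for ZE_API_VERSION_1_0, 1 for ZE_API_VERSION_1_1.
TimerUnit timerUnitForV11(int apiMinor) {
    if (apiMinor < 1) {
        return TimerUnit::Nanoseconds;  // v1.0 always reports nanoseconds
    }
    // v1.1: cycles per second only when UseCyclesPerSecondTimer=1
    const char *value = std::getenv("UseCyclesPerSecondTimer");
    const bool cyclesPerSecond = value != nullptr && std::strcmp(value, "1") == 0;
    return cyclesPerSecond ? TimerUnit::CyclesPerSecond : TimerUnit::Nanoseconds;
}
```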
Note: In Release builds, `NEOReadDebugKeys=1` may be needed for the driver to read environment variables. To confirm that the L0 driver is
reading the environment variables, set `PrintDebugSettings=1`, which prints the non-default debug values at the start of the
application. For example:

```sh
$ PrintDebugSettings=1 UseCyclesPerSecondTimer=1 ./zello_world_gpu
Non-default value of debug variable: PrintDebugSettings = 1
Non-default value of debug variable: UseCyclesPerSecondTimer = 1
...
```

Sample code:

If `UseCyclesPerSecondTimer=1` is set:

```cpp
ze_api_version_t version;
zeDriverGetApiVersion(hDriver, &version);
...
ze_device_properties_t devProperties = {};
devProperties.stype = ZE_STRUCTURE_TYPE_DEVICE_PROPERTIES;
zeDeviceGetProperties(device, &devProperties);

if (version == ZE_API_VERSION_1_1) {
    uint64_t timerResolutionInCyclesPerSecond = devProperties.timerResolution;
} else {
    uint64_t timerResolutionInNanoSeconds = devProperties.timerResolution;
}

...
```

If `UseCyclesPerSecondTimer` is not set:

```cpp
ze_api_version_t version;
zeDriverGetApiVersion(hDriver, &version);
...
ze_device_properties_t devProperties = {};
devProperties.stype = ZE_STRUCTURE_TYPE_DEVICE_PROPERTIES;
zeDeviceGetProperties(device, &devProperties);

uint64_t timerResolutionInNanoSeconds = devProperties.timerResolution;
...
```