Add documentation on use of allocations greater than 4GB

Related-To: NEO-7825

Signed-off-by: Maciej Plewka <maciej.plewka@intel.com>
Maciej Plewka 2023-03-29 11:36:16 +00:00 committed by Compute-Runtime-Automation
parent 24f73f4686
commit 2677e11c10
2 changed files with 156 additions and 1 deletions

@@ -0,0 +1,154 @@
<!---
Copyright (C) 2023 Intel Corporation
SPDX-License-Identifier: MIT
-->
# Allocations greater than 4GB
* [Introduction](#introduction)
* [Creating allocations greater than 4GB](#creating-allocations-greater-than-4gb)
* [Intel Graphics Compiler build flags](#intel-graphics-compiler-build-flags)
# Introduction
The OpenCL and Level Zero APIs limit the size of a single memory allocation. The maximum allocation size for each API can be queried with:
* `clGetDeviceInfo` with param name `CL_DEVICE_MAX_MEM_ALLOC_SIZE` in OpenCL
* `zeDeviceGetProperties` (the `maxMemAllocSize` member of `ze_device_properties_t`) in Level Zero
The hardware's "stateful addressing model" limits the maximum allocation size to 4GB. Because of this limitation, the default maximum allocation size supported by NEO is 4GB.
This limitation can be relaxed for both APIs when the following conditions are met:
* the kernel must be compiled in stateless mode, see [Intel Graphics Compiler build flags](#intel-graphics-compiler-build-flags)
* the memory must be allocated with a flag that allows a bigger allocation size, see [Creating allocations greater than 4GB](#creating-allocations-greater-than-4gb)
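The check for when the relaxed path is needed can be sketched as a small helper. This is only an illustration: `needsRelaxedLimits` is a hypothetical name that is not part of NEO, OpenCL or Level Zero, and `maxMemAllocSize` stands for a limit the application has already queried with `clGetDeviceInfo` or `zeDeviceGetProperties`.

```cpp
#include <cstdint>

// Hypothetical helper, not part of NEO, OpenCL or Level Zero.
// requestedSize:   allocation size in bytes that the application wants.
// maxMemAllocSize: limit in bytes previously queried from the device.
// Sizes are kept in 64-bit integers because values above 4GB do not fit in 32 bits.
// Returns true when the request exceeds the reported limit, so the allocation
// needs the relaxed-limit flag and a kernel compiled in stateless mode.
constexpr bool needsRelaxedLimits(std::uint64_t requestedSize, std::uint64_t maxMemAllocSize) {
    return requestedSize > maxMemAllocSize;
}
```

Comparing against the queried value instead of a hard-coded 4GB constant keeps the check valid on devices that report a different limit.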
# Creating allocations greater than 4GB
## Level Zero
To allocate more than 4GB of memory in Level Zero, pass a `ze_relaxed_allocation_limits_exp_desc_t` structure to the API call that allocates the memory.
This structure must be passed through the `pNext` member of:
* `ze_device_mem_alloc_desc_t` when allocating with `zeMemAllocShared` or `zeMemAllocDevice`
```cpp
ze_relaxed_allocation_limits_exp_desc_t relaxedAllocationLimitsExpDesc = {ZE_STRUCTURE_TYPE_RELAXED_ALLOCATION_LIMITS_EXP_DESC};
relaxedAllocationLimitsExpDesc.flags |= ZE_RELAXED_ALLOCATION_LIMITS_EXP_FLAG_MAX_SIZE;
ze_device_mem_alloc_desc_t deviceDesc = {ZE_STRUCTURE_TYPE_DEVICE_MEM_ALLOC_DESC};
deviceDesc.pNext = &relaxedAllocationLimitsExpDesc;
zeMemAllocDevice(hContext, &deviceDesc, size, alignment, hDevice, pptr);
```
* `ze_host_mem_alloc_desc_t` when allocating with `zeMemAllocHost`
```cpp
ze_relaxed_allocation_limits_exp_desc_t relaxedAllocationLimitsExpDesc = {ZE_STRUCTURE_TYPE_RELAXED_ALLOCATION_LIMITS_EXP_DESC};
relaxedAllocationLimitsExpDesc.flags |= ZE_RELAXED_ALLOCATION_LIMITS_EXP_FLAG_MAX_SIZE;
ze_host_mem_alloc_desc_t hostDesc = {ZE_STRUCTURE_TYPE_HOST_MEM_ALLOC_DESC};
hostDesc.pNext = &relaxedAllocationLimitsExpDesc;
zeMemAllocHost(hContext, &hostDesc, size, alignment, pptr);
```
The `ze_relaxed_allocation_limits_exp_desc_t` structure must have the `ZE_RELAXED_ALLOCATION_LIMITS_EXP_FLAG_MAX_SIZE` flag set.
## OpenCL
To allocate more than 4GB of memory in OpenCL, use the `CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL` flag.
* For API calls:
  * `clCreateBuffer`
  * `clCreateBufferWithProperties`
  * `clCreateBufferWithPropertiesINTEL`

The `CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL` flag must be set in the passed `cl_mem_flags flags` parameter.
```cpp
cl_mem_flags flags = 0;
flags |= CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL;
cl_mem buffer = clCreateBuffer(context, flags, size, host_ptr, errcode_ret);
```
* For API call:
  * `clSVMAlloc`

The `CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL` flag must be set in the passed `cl_svm_mem_flags flags` parameter.
```cpp
cl_svm_mem_flags flags = 0;
flags |= CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL;
void *alloc = clSVMAlloc(context, flags, size, alignment);
```
* For API calls:
  * `clSharedMemAllocINTEL`
  * `clDeviceMemAllocINTEL`
  * `clHostMemAllocINTEL`

The `CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL` flag must be set in the `cl_mem_flags` or `cl_mem_flags_intel` property passed through the `cl_mem_properties_intel *properties` parameter.
```cpp
cl_mem_flags_intel flags = 0;
flags |= CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL;
cl_mem_properties_intel properties[] = {CL_MEM_FLAGS_INTEL, flags, 0};
void *alloc = clSharedMemAllocINTEL(context, device, properties, size, alignment, errcode_ret);
```
## Debug Keys
NEO allows the buffer size limitation to be relaxed with a debug key named `AllowUnrestrictedSize` (works with both APIs).
When set to 1, the maximum allocation size is ignored during buffer creation, even if `ZE_RELAXED_ALLOCATION_LIMITS_EXP_FLAG_MAX_SIZE`/`CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL` is not passed.
When set to 0, size restrictions are enforced.
Keep in mind that this is only a debug key, used for driver development and debugging. It is not part of any specification, so there is no guarantee that it will work correctly in every case, and it can be deprecated at any time.
# Intel Graphics Compiler build flags
To compile a kernel in the stateless addressing model, which is required to use buffers bigger than 4GB, the following compilation flag must be used:
## Level Zero
`-ze-opt-greater-than-4GB-buffer-required`: this flag must be set in the `pBuildFlags` member of the `ze_module_desc_t` that is passed to `zeModuleCreate`.
```cpp
ze_module_desc_t moduleDesc = {ZE_STRUCTURE_TYPE_MODULE_DESC};
moduleDesc.format = ZE_MODULE_FORMAT_IL_SPIRV;
moduleDesc.pInputModule = moduleData;
moduleDesc.inputSize = moduleSize;
moduleDesc.pBuildFlags = "-ze-opt-greater-than-4GB-buffer-required";
zeModuleCreate(hContext, hDevice, &moduleDesc, phModule, phBuildLog);
```
## OpenCL
`-cl-intel-greater-than-4GB-buffer-required`: this flag must be set in the `options` parameter that is passed to `clBuildProgram`.
```cpp
const char options[] = "-cl-intel-greater-than-4GB-buffer-required";
clBuildProgram(program, num_devices, device_list, options, callback, user_data);
```
When the above flags are passed, the compiler compiles kernels in the stateless addressing model, which allows allocations of any size to be used.
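Applications that already pass other build options need to combine them with the flag above. A minimal sketch of one way to do that follows. `withBuildFlag` is a hypothetical helper, not part of NEO or of either API, and it assumes the options are separated by single spaces.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns `options` with `flag` appended, unless an
// identical space-separated token is already present.
// Assumes options are separated by single spaces.
std::string withBuildFlag(const std::string &options, const std::string &flag) {
    assert(!flag.empty());
    // Pad both strings with spaces so that only whole tokens match.
    const std::string padded = " " + options + " ";
    if (padded.find(" " + flag + " ") != std::string::npos) {
        return options;
    }
    return options.empty() ? flag : options + " " + flag;
}
```

The `c_str()` of the returned string can then be passed as `options` to `clBuildProgram` or assigned to `pBuildFlags`, provided the string outlives the build call.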
# References
https://spec.oneapi.io/level-zero/latest/core/api.html#relaxedalloclimits
https://spec.oneapi.io/level-zero/latest/core/api.html#ze-module-desc-t

@@ -15,4 +15,5 @@ This document provides the architectural design followed in the Intel(R) Graphic
### [Implicit scaling](IMPLICIT_SCALING.md)
### [Immediate Commandlist](IMMEDIATE_COMMANDLIST.md)
### [System Memory Allocations in Level Zero](SYSTEM_MEMORY_ALLOCATIONS.md)
### [Module Symbols and Linking in Level Zero](MODULE_SYMBOL_SUPPORT.md)
### [Allocations greater than 4GB](ALLOCATIONS_GREATER_THAN_4GB.md)