diff --git a/programmers-guide/ALLOCATIONS_GREATER_THAN_4GB.md b/programmers-guide/ALLOCATIONS_GREATER_THAN_4GB.md new file mode 100644 index 0000000000..672f8cd6b6 --- /dev/null +++ b/programmers-guide/ALLOCATIONS_GREATER_THAN_4GB.md @@ -0,0 +1,154 @@ + + +# Allocations greater than 4GB + +* [Introduction](#introduction) +* [Creating allocations greater than 4GB](#creating-allocations-greater-than-4gb) +* [Intel Graphics Compiler build flags](#intel-graphics-compiler-build-flags) + +# Introduction + +The OpenCL and Level Zero APIs restrict the size of a single memory allocation. The maximum allocation size can be queried with: + +* `clGetDeviceInfo` with param name `CL_DEVICE_MAX_MEM_ALLOC_SIZE` in OpenCL +* `zeDeviceGetProperties` (the `maxMemAllocSize` member of `ze_device_properties_t`) in Level Zero + +In the hardware's stateful addressing model, a single allocation is limited to 4GB. Because of this limitation, the default maximum allocation size supported by NEO is 4GB. + + +It's possible to relax this limitation for both APIs when both of the following conditions are met: + +* the kernel must be compiled in stateless mode (see [Intel Graphics Compiler build flags](#intel-graphics-compiler-build-flags)) +* the memory must be allocated with a flag that allows a bigger allocation size (see [Creating allocations greater than 4GB](#creating-allocations-greater-than-4gb)) + +# Creating allocations greater than 4GB + +## Level Zero + +To allocate memory greater than 4GB in Level Zero, pass a `ze_relaxed_allocation_limits_exp_desc_t` struct to the API call that allocates memory. 
+ +This structure must be passed through the `pNext` member of: +* `ze_device_mem_alloc_desc_t` when allocating with `zeMemAllocShared` and `zeMemAllocDevice` + + ```cpp + ze_relaxed_allocation_limits_exp_desc_t relaxedAllocationLimitsExpDesc = {ZE_STRUCTURE_TYPE_RELAXED_ALLOCATION_LIMITS_EXP_DESC}; + relaxedAllocationLimitsExpDesc.flags |= ZE_RELAXED_ALLOCATION_LIMITS_EXP_FLAG_MAX_SIZE; + + ze_device_mem_alloc_desc_t deviceDesc = {ZE_STRUCTURE_TYPE_DEVICE_MEM_ALLOC_DESC}; + deviceDesc.pNext = &relaxedAllocationLimitsExpDesc; + + zeMemAllocDevice(hContext, &deviceDesc, size, alignment, hDevice, pptr); + ``` + +* `ze_host_mem_alloc_desc_t` when allocating with `zeMemAllocHost` + + ```cpp + ze_relaxed_allocation_limits_exp_desc_t relaxedAllocationLimitsExpDesc = {ZE_STRUCTURE_TYPE_RELAXED_ALLOCATION_LIMITS_EXP_DESC}; + relaxedAllocationLimitsExpDesc.flags |= ZE_RELAXED_ALLOCATION_LIMITS_EXP_FLAG_MAX_SIZE; + + ze_host_mem_alloc_desc_t hostDesc = {ZE_STRUCTURE_TYPE_HOST_MEM_ALLOC_DESC}; + hostDesc.pNext = &relaxedAllocationLimitsExpDesc; + + zeMemAllocHost(hContext, &hostDesc, size, alignment, pptr); + ``` + +The `ze_relaxed_allocation_limits_exp_desc_t` structure must have the `ZE_RELAXED_ALLOCATION_LIMITS_EXP_FLAG_MAX_SIZE` flag set. + + +## OpenCL + +To allocate memory greater than 4GB in OpenCL, use the `CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL` flag. + +* For API calls: + * `clCreateBuffer` + * `clCreateBufferWithProperties` + * `clCreateBufferWithPropertiesINTEL` + + The `CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL` flag must be set in the passed `cl_mem_flags flags` param. + + ```cpp + cl_mem_flags flags = 0; + flags |= CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL; + + cl_mem buffer = clCreateBuffer(context, flags, size, host_ptr, errcode_ret); + ``` +* For API call: + * `clSVMAlloc` + + The `CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL` flag must be set in the passed `cl_svm_mem_flags flags` param. 
+ + ```cpp + cl_svm_mem_flags flags = 0; + flags |= CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL; + + void *alloc = clSVMAlloc(context, flags, size, alignment); + ``` + +* For API calls: + * `clSharedMemAllocINTEL` + * `clDeviceMemAllocINTEL` + * `clHostMemAllocINTEL` + + The `CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL` flag must be set in the `cl_mem_flags` or `cl_mem_flags_intel` property, in the `cl_mem_properties_intel *properties` param. + + ```cpp + cl_mem_flags_intel flags = 0; + flags |= CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL; + cl_mem_properties_intel properties[] = {CL_MEM_FLAGS_INTEL, flags, 0}; + + void *alloc = clSharedMemAllocINTEL(context, device, properties, size, alignment, errcode_ret); + ``` + +## Debug Keys + +NEO can also relax the buffer size limitation through a debug key named `AllowUnrestrictedSize`, which works with both APIs. + +When set to 1, the maximum allocation size is ignored during buffer creation even if `ZE_RELAXED_ALLOCATION_LIMITS_EXP_FLAG_MAX_SIZE`/`CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL` is not passed. + +When set to 0, size restrictions are enforced. + +Keep in mind that this is only a debug key, used for driver development and debugging. It is not part of any specification, so there is no guarantee that it works correctly in every case, and it can be deprecated at any time. 
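The size rules described in this section can be summarised as a small predicate. The following is only a sketch of the documented behaviour, not NEO's implementation: `AllocationRequest` and `isSizeAccepted` are hypothetical names introduced here, and the model covers only what the text above states (the per-allocation flag and `AllowUnrestrictedSize` set to 1).

```cpp
#include <cstdint>

// Hypothetical model of the size check described above; this is not NEO code.
struct AllocationRequest {
    std::uint64_t size;  // requested size in bytes
    bool relaxedFlagSet; // ZE_RELAXED_ALLOCATION_LIMITS_EXP_FLAG_MAX_SIZE or
                         // CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL was passed
};

// maxAllocSize:  the limit reported by the device query.
// debugKeyIsOne: true when the AllowUnrestrictedSize debug key is set to 1.
inline bool isSizeAccepted(const AllocationRequest &request,
                           std::uint64_t maxAllocSize,
                           bool debugKeyIsOne) {
    if (request.size <= maxAllocSize) {
        return true; // within the default limit, no flag is needed
    }
    // Above the limit: accepted only with the flag or the debug key.
    return request.relaxedFlagSet || debugKeyIsOne;
}
```

Under this model, a request above the reported limit is rejected unless the application passed the relaxing flag or the debug key is enabled. Whether kernels can actually address such an allocation still depends on stateless compilation.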
+ + +# Intel Graphics Compiler build flags + +Buffers bigger than 4GB require kernels compiled in the stateless addressing model. To compile a kernel this way, the following compilation flags must be used: + +## Level Zero + +`-ze-opt-greater-than-4GB-buffer-required` This flag must be set in the `pBuildFlags` member of the `ze_module_desc_t` that is passed to `zeModuleCreate`. + +```cpp +ze_module_desc_t moduleDesc = {ZE_STRUCTURE_TYPE_MODULE_DESC}; +moduleDesc.format = ZE_MODULE_FORMAT_IL_SPIRV; +moduleDesc.pInputModule = moduleData; +moduleDesc.inputSize = moduleSize; +moduleDesc.pBuildFlags = "-ze-opt-greater-than-4GB-buffer-required"; + +zeModuleCreate(hContext, hDevice, &moduleDesc, phModule, phBuildLog); +``` + +## OpenCL + +`-cl-intel-greater-than-4GB-buffer-required` This flag must be set in the `options` param that is passed to `clBuildProgram`. + +```cpp +const char options[] = "-cl-intel-greater-than-4GB-buffer-required"; + +clBuildProgram(program, num_devices, device_list, options, callback, user_data); +``` + + +When the above flags are passed, the compiler compiles kernels in the stateless addressing model, allowing the use of allocations of any size. 
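Applications that already pass their own build options need to append the flag rather than replace the whole string. A minimal sketch of one way to do that follows. `Api`, `statelessFlag`, and `withStatelessFlag` are hypothetical helpers written for this illustration and are not part of NEO, OpenCL, or Level Zero. Only the two flag strings come from this guide.

```cpp
#include <string>

// Hypothetical helpers for illustration; not part of any driver or API.
enum class Api { OpenCL, LevelZero };

// Returns the stateless-compilation flag quoted in this guide for the given API.
inline const char *statelessFlag(Api api) {
    return api == Api::OpenCL ? "-cl-intel-greater-than-4GB-buffer-required"
                              : "-ze-opt-greater-than-4GB-buffer-required";
}

// Appends the flag to existing build options unless it is already present.
inline std::string withStatelessFlag(const std::string &options, Api api) {
    const std::string flag = statelessFlag(api);
    if (options.find(flag) != std::string::npos) {
        return options; // already requested, leave the options untouched
    }
    return options.empty() ? flag : options + " " + flag;
}
```

The resulting string would then be passed as `options` to `clBuildProgram` or assigned, via `c_str()`, to `pBuildFlags`. In the latter case the `std::string` has to outlive the `zeModuleCreate` call.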
+ +# References + +https://spec.oneapi.io/level-zero/latest/core/api.html#relaxedalloclimits + +https://spec.oneapi.io/level-zero/latest/core/api.html#ze-module-desc-t diff --git a/programmers-guide/PROGRAMMERS_GUIDE.md b/programmers-guide/PROGRAMMERS_GUIDE.md index eb9936c219..3bff0ca724 100644 --- a/programmers-guide/PROGRAMMERS_GUIDE.md +++ b/programmers-guide/PROGRAMMERS_GUIDE.md @@ -15,4 +15,5 @@ This document provides the architectural design followed in the Intel(R) Graphic ### [Implicit scaling](IMPLICIT_SCALING.md) ### [Immediate Commandlist](IMMEDIATE_COMMANDLIST.md) ### [System Memory Allocations in Level Zero](SYSTEM_MEMORY_ALLOCATIONS.md) -### [Module Symbols and Linking in Level Zero](MODULE_SYMBOL_SUPPORT.md) \ No newline at end of file +### [Module Symbols and Linking in Level Zero](MODULE_SYMBOL_SUPPORT.md) +### [Allocations greater than 4GB](ALLOCATIONS_GREATER_THAN_4GB.md) \ No newline at end of file