<!-- -
Copyright (C) 2020-2023 Intel Corporation
SPDX-License-Identifier: MIT
-->
# Frequently asked questions (OpenCL)
For general questions,
see the [main FAQ](https://github.com/intel/compute-runtime/blob/master/FAQ.md).
## OpenCL version
### Which version of OpenCL is supported?
See [README.md](https://github.com/intel/compute-runtime/blob/master/README.md).
## Known Issues and Limitations
OpenCL compliance of a driver built from open-source components should not be
assumed by default. Intel will clearly designate / tag specific builds to
indicate production quality including formal compliance. Other builds should be
considered experimental.
### What is the functional delta to the "Beignet" driver?
Intel's former open-source [Beignet driver](https://github.com/intel/beignet) provided
sharing capabilities with the Mesa OpenGL driver.
NEO supports platforms starting with Gen8 graphics (formerly Broadwell).
For earlier platforms, please use the Beignet driver.
## Feature: cl_intel_va_api_media_sharing extension
### Where can I learn more about this extension?
See the enabling [guide](cl_intel_va_api_media_sharing.md).
## Feature: cl_cache
Originally, compute-runtime had an experimental cache implementation, which was replaced
in Q2'23 with a more robust approach. The legacy solution is now considered deprecated, and the
old experimental controls explained below will be removed by EOY 2023.
### What is cl_cache?
This is a mechanism to cache binary representations of OpenCL kernels that the application provides
in text form. Because the binary representations are stored, compilation is required only the first time,
which improves performance.
### Linux
#### Official instructions
##### Environment flags
* `NEO_CACHE_PERSISTENT` - integer value to enable (1) or disable (0) the on-disk binary cache. When enabled,
  NEO will try to cache and reuse compiled binaries. Default is on.
* `NEO_CACHE_DIR` - path to the persistent cache directory. The default is `$XDG_CACHE_HOME/neo_compiler_cache`
  if `$XDG_CACHE_HOME` is set, and `$HOME/.cache/neo_compiler_cache` otherwise. If none of these environment
  variables is set, the on-disk cache is disabled.
* `NEO_CACHE_MAX_SIZE` - cache eviction is triggered once the total size of cached binaries exceeds this value in
  bytes (default is 1GB). Set to 0 to disable size-based cache eviction.
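
The directory lookup order described above can be sketched as a small shell function. This only illustrates the documented rule and is not the driver's code: `neo_default_cache_dir` is a hypothetical helper name, and treating an empty variable the same as an unset one is an assumption.

```bash
# Hypothetical helper: prints the cache directory implied by the documented rule,
# or returns 1 when no directory can be derived (on-disk cache disabled).
# Assumption: an empty variable is treated like an unset one.
neo_default_cache_dir() {
    if [ -n "${NEO_CACHE_DIR:-}" ]; then
        printf '%s\n' "$NEO_CACHE_DIR"
    elif [ -n "${XDG_CACHE_HOME:-}" ]; then
        printf '%s\n' "$XDG_CACHE_HOME/neo_compiler_cache"
    elif [ -n "${HOME:-}" ]; then
        printf '%s\n' "$HOME/.cache/neo_compiler_cache"
    else
        return 1
    fi
}

# Example settings: keep the cache enabled and cap it at 512 MiB (value in bytes).
export NEO_CACHE_PERSISTENT=1
export NEO_CACHE_MAX_SIZE=$((512 * 1024 * 1024))
```

Calling `neo_default_cache_dir` before starting the application is a quick way to see which directory would be expected to hold the cached binaries under these assumptions.
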
##### How cl_cache works (Linux implementation)
When the persistent cache is enabled, the driver creates `config.file` on first use. This file stores the directory
size and also serves as the entry point to the caching mechanism.
Each write to disk has the following steps:
1. lock config.file (advisory lock)
2. create temporary file
3. write content to file
4. rename temporary file to proper hash name

Reads are not blocked.
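
The write steps above can be sketched in shell as follows. This is an illustration of the lock, temporary file, rename pattern and not the driver's implementation: `cache_store` and its arguments are hypothetical, and the sketch assumes the `flock` utility from util-linux is available.

```bash
# Hypothetical helper: store file $3 in cache directory $1 under the name $2.
cache_store() {
    dir=$1
    name=$2
    src=$3
    (
        # 1. take an exclusive advisory lock on config.file (opened on fd 9)
        flock 9 || exit 1
        # 2. create a temporary file inside the cache directory
        tmp=$(mktemp "$dir/tmp.XXXXXX") || exit 1
        # 3. write the content
        cat "$src" > "$tmp" || { rm -f "$tmp"; exit 1; }
        # 4. rename to the final name; a rename within one filesystem is atomic,
        #    so a reader sees either no entry or the complete file
        mv -f "$tmp" "$dir/$name"
    ) 9>>"$dir/config.file"
}
```

Because the final step is an atomic rename, a reader in this sketch never observes a partially written entry, which is presumably why reads do not need the lock.
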
The eviction mechanism works as follows:
1. lock config.file (advisory lock)
2. scandir will gather all entries created by the driver
3. stat all files and check last usage time
4. sort files
5. remove the least recently used files until 1/3 of the size has been freed
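
The eviction steps can be sketched in the same way. This is a hedged illustration, not the driver's code: `cache_entries` and `cache_evict` are hypothetical names, the sketch assumes GNU `stat` and util-linux `flock`, it uses access time as the "last usage time", and it assumes the 1/3 target is measured against the current total size of the cached files. It also does not update the size stored in `config.file`.

```bash
# Hypothetical helper: print "<atime> <size> <path>" for every cached file in
# directory $1, skipping config.file itself (GNU stat syntax).
cache_entries() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        [ "${f##*/}" = config.file ] && continue
        stat -c '%X %s %n' "$f"
    done
}

# Hypothetical helper: evict least recently used files from cache directory $1.
cache_evict() {
    dir=$1
    (
        # 1. take the advisory lock on config.file (opened on fd 9)
        flock 9 || exit 1
        # 2.-3. gather the entries and stat them
        entries=$(cache_entries "$dir")
        total=0
        for size in $(printf '%s\n' "$entries" | cut -d' ' -f2); do
            total=$((total + size))
        done
        # Assumption: the eviction target is 1/3 of the current total size.
        target=$((total / 3))
        freed=0
        # 4.-5. sort oldest first, then remove files until the target is reached
        printf '%s\n' "$entries" | sort -n | while read -r atime size path; do
            [ "$freed" -ge "$target" ] && break
            rm -f -- "$path"
            freed=$((freed + size))
        done
    ) 9>>"$dir/config.file"
}
```
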
#### Legacy approach
In the working directory, manually create a *cl_cache* directory.
The driver will use this directory to store the binary representations of the compiled kernels.
Note: This will work on all supported OSes.
##### Configuring cl_cache location
Cached kernels can be stored in a different directory than the default one.
This is useful when the application is installed into a directory
for which the user doesn't have permissions.
Set the environment variable named `cl_cache_dir` to the new location of the cl_cache directory.
##### Example:
If the application's directory is `/home/user/Document`, by default cl_cache will be stored in
`/home/user/Document/cl_cache`. If the new path should be `/home/user/Desktop/cl_cache_place`,
set the environment variable `cl_cache_dir` to `/home/user/Desktop/cl_cache_place`.
```bash
export cl_cache_dir=/home/user/Desktop/cl_cache_place
```
Subsequent application runs that pass the same source code with the `cl_cache_dir` environment variable set will
reuse previously cached kernel binaries instead of compiling kernels from source.
### Windows
#### Official instructions (implementation pending)
##### Environment flags
* `NEO_CACHE_PERSISTENT` - integer value to enable (1) or disable (0) the on-disk binary cache. When enabled,
  NEO will try to cache and reuse compiled binaries. Default is on.
* `NEO_CACHE_DIR` - path to the persistent cache directory. The default is `%LocalAppData%\NEO\neo_compiler_cache`
  if `%LocalAppData%` is found. If none of these environment variables is set, the on-disk cache is disabled.
* `NEO_CACHE_MAX_SIZE` - cache eviction is triggered once the total size of cached binaries exceeds this value in
  bytes (default is 1GB). Set to 0 to disable size-based cache eviction.
##### How cl_cache works (Windows implementation)
When the persistent cache is enabled, the driver creates `config.file` on first use. This file stores the directory
size and also serves as the entry point to the caching mechanism.
Each write to disk has the following steps:
1. lock config.file (advisory lock)
2. create temporary file
3. write content to file
4. rename temporary file to proper hash name

Reads are not blocked.
The eviction mechanism works as follows:
1. lock config.file (advisory lock)
2. Windows system calls gather all entries created by the driver
3. check last usage time
4. sort files
5. remove the least recently used files until 1/3 of the size has been freed
#### Legacy approach
##### Windows configuration
To set a new location for the cl_cache directory, add a new environment variable:
1. variable name: `cl_cache_dir`
1. variable value: `<destination_directory_for_cl_cache>`
##### Example:
If the application is located in `C:\Program Files\application\app.exe`,
by default cl_cache will be stored in `C:\Program Files\application\cl_cache`.
If the new path should be `C:\Users\USER\Documents\application\cl_cache`, create a new environment variable named `cl_cache_dir` with the value `C:\Users\USER\Documents\application\cl_cache`.
##### What are the known limitations of cl_cache for Windows?
1. Not thread safe.
(Workaround: Make sure your clBuildProgram calls are executed in a thread-safe fashion.)
1. Binary representation may not be compatible between various versions of NEO and IGC drivers.
(Workaround: Manually empty *cl_cache* directory prior to update)
1. Cache is not automatically cleaned. (Workaround: Manually empty *cl_cache* directory)
1. Cache may exhaust disk space and cause further failures.
(Workaround: Monitor and manually empty *cl_cache* directory)
1. Cache is not process safe.
## Feature: Out of order queues
### Implementation details of out of order queues
The current implementation of out of order queues allows multiple kernels to run concurrently.
This allows for better device utilization in scenarios where a single kernel doesn't fill the whole device.
More details can be found here:
* [Sample applications](https://github.com/intel/compute-samples/tree/master/compute_samples/applications/commands_aggregation)
* [IWOCL(*) presentation](https://www.iwocl.org/wp-content/uploads/iwocl-2019-michal-mrozek-intel-breaking-the-last-line-of-performance-border.pdf)
### Known issues and limitations
1. Turning on profiling on an out of order command queue serializes kernel execution.
1. Blocking a command queue with user events blocks all further submissions until the event is unblocked.
1. Commands blocked by user events are also serialized once they are unblocked.
## Feature: Double-precision emulation (FP64)
By default, the NEO driver enables double-precision operations only on platforms with supporting hardware.
This is signified by exposing the "cl_khr_fp64" extension in the extension string.
For other platforms, this support can be emulated by the compiler (IGC).
### How do I enable emulation?
FP64 emulation can only be enabled on Linux. Two settings have to be configured.
#### Runtime setting:
There are two ways you can enable this feature in NEO:
* Set the environment variable **OverrideDefaultFP64Settings** to **1**:
`OverrideDefaultFP64Settings=1`
* In the **igdrcl.config** configuration file in the same directory as the application binary
(you may have to create this file), add the following line:
`OverrideDefaultFP64Settings = 1`
#### Compiler setting:
IGC reads flags only from the environment, so set **IGC_EnableDPEmulation** to **1**:
`IGC_EnableDPEmulation=1`
After both settings have been set, you can run the application normally.
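
As a convenience, both settings can be applied to a single run through the environment. The wrapper below is only a sketch: `run_with_fp64_emulation` is a hypothetical name and `./my_opencl_app` is a placeholder for your application.

```bash
# Hypothetical wrapper: run a command with both FP64 emulation settings enabled
# (runtime setting for NEO, compiler setting for IGC).
run_with_fp64_emulation() {
    env OverrideDefaultFP64Settings=1 IGC_EnableDPEmulation=1 "$@"
}

# Usage (placeholder application name):
#   run_with_fp64_emulation ./my_opencl_app
```

Setting the variables per invocation keeps the emulation mode scoped to one process instead of the whole shell session.
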
### Known issues and limitations
Intel does not claim full specification conformance when using emulated mode.
We reserve the right to not fix issues that appear only in emulation mode.
Performance degradation is to be expected and has not been measured by Intel.