diff --git a/level_zero/doc/FAQ.md b/level_zero/doc/FAQ.md index 7c6b3cac20..1f16060988 100644 --- a/level_zero/doc/FAQ.md +++ b/level_zero/doc/FAQ.md @@ -14,7 +14,6 @@ see the [main FAQ](https://github.com/intel/compute-runtime/blob/master/FAQ.md). ## Feature: l0_cache Mechanism to cache binary representations of GPU kernels passed to zeModuleCreate(), -to avoid compilation overheads in subsequent calls. Please see more information in -https://github.com/intel/compute-runtime/blob/master/opencl/doc/FAQ.md#feature-cl_cache. +to avoid compilation overhead in subsequent calls. -To enable it, please create a *l0_cache* directory in the working directory. +Detailed description in [programmers-guide/COMPILER_CACHE.md](https://github.com/intel/compute-runtime/blob/master/programmers-guide/COMPILER_CACHE.md) diff --git a/opencl/doc/FAQ.md b/opencl/doc/FAQ.md index 0a0421d571..c95ff5d1cf 100644 --- a/opencl/doc/FAQ.md +++ b/opencl/doc/FAQ.md @@ -50,128 +50,7 @@ This is a mechanism to cache binary representations of OpenCL kernels provided i the application. By storing the binary representations, compiling is required only the first time, which improves performance. -### Linux - -#### Official instructions - -##### Environment flags - -NEO_CACHE_PERSISTENT - integer value to enable (1)/disable (0) on-disk binary cache. When enabled - Neo will try to cache and reuse compiled binaries. Default is on. - -NEO_CACHE_DIR - path to persistent cache directory. Default values are $XDG_CACHE_HOME/neo_compiler_cache - if $XDG_CACHE_HOME is set, $HOME/.cache/neo_compiler_cache otherwise. If none of environment - variables are set then on-disk cache is disabled. - -NEO_CACHE_MAX_SIZE - Cache eviction is triggered once total size of cached binaries exceeds the value in - bytes (default is 1GB). Set to 0 to disable size-based cache eviction. 
- -##### How cl_cache works (Linux implementation) - -When persistent cache is enabled at first occurance driver create config.file which contains the directory -size and is also entry point to caching mechanism. - -Each write to disk has following steps: -1. lock config.file (advisor lock) -2. create temporary file -3. write content to file -4. rename temporary file to proper hash name - -Reads are unblocked - -Eviction mechanism is working as follow: -1. lock config.file (advisor lock) -2. scandir will gather all entries created by the driver -3. stat all files and check last usage time -4. sort files -5. remove least recently used files with 1/3 amount size - -#### Legacy approach - -In the working directory, manually create *cl_cache* directory. -The driver will use this directory to store the binary representations of the compiled kernels. -Note: This will work on all supported OSes. - -##### Configuring cl_cache location - -Cached kernels can be stored in a different directory than the default one. -This is useful when the application is installed into a directory -for which the user doesn't have permissions. - -Set the environment variable named `cl_cache_dir` to new location of cl_cache directory. - -##### Example: - -If the application's directory is `/home/user/Document`, by default cl_cache will be stored in - `/home/user/Document/cl_cache`. If the new path should be `/home/user/Desktop/cl_cache_place`, - set environment variable `cl_cache_dir` to `/home/user/Desktop/cl_cache_place`. -```bash -export cl_cache_dir=/home/user/Desktop/cl_cache_place -``` - -Subsequent application runs with passed source code and `cl_cache_dir` environment variable set will -reuse previously cached kernel binaries instead of compiling kernels from source. - -### Windows - -#### Official instructions (implementation pending) - -##### Environment flags - -NEO_CACHE_PERSISTENT - integer value to enable (1)/disable (0) on-disk binary cache. 
When enabled - Neo will try to cache and reuse compiled binaries. Default is on. - -NEO_CACHE_DIR - path to persistent cache directory. Default values are %LocalAppData%\NEO\neo_compiler_cache - if %LocalAppData% is found. If none of environment - variables are set then on-disk cache is disabled. - -NEO_CACHE_MAX_SIZE - Cache eviction is triggered once total size of cached binaries exceeds the value in - bytes (default is 1GB). Set to 0 to disable size-based cache eviction. - -##### How cl_cache works (Windows implementation) - -When persistent cache is enabled at first occurance driver create config.file which contains the directory -size and is also entry point to caching mechanism. - -Each write to disk has following steps: -1. lock config.file (advisor lock) -2. create temporary file -3. write content to file -4. rename temporary file to proper hash name - -Reads are unblocked - -Eviction mechanism is working as follow: -1. lock config.file (advisor lock) -2. windows system calls will gather all entries created by the driver -3. check last usage time -4. sort files -5. remove least recently used files with 1/3 amount size - -#### Legacy approach - -##### Windows configuration - -To set the new location of cl_cache directory - add new environment variable: -1. variable name: `cl_cache_dir` -1. variable value: - -##### Example: - -If application is located in `C:\Program Files\application\app.exe`, -by default cl_cache will be stored in `C:\Program Files\application\cl_cache`. -If the new path should be `C:\Users\USER\Documents\application\cl_cache`, create a new environment variable named `cl_cache_dir` with the value `C:\Users\USER\Documents\application\cl_cache`. - -##### What are the known limitations of cl_cache for Windows? - -1. Not thread safe. -(Workaround: Make sure your clBuildProgram calls are executed in thread safe fashion.) -1. Binary representation may not be compatible between various versions of NEO and IGC drivers. 
-(Workaround: Manually empty *cl_cache* directory prior to update) -1. Cache is not automatically cleaned. (Workaround: Manually empty *cl_cache* directory) -1. Cache may exhaust disk space and cause further failures. -(Workaround: Monitor and manually empty *cl_cache* directory) -1. Cache is not process safe. +Detailed description in [programmers-guide/COMPILER_CACHE.md](https://github.com/intel/compute-runtime/blob/master/programmers-guide/COMPILER_CACHE.md) ## Feature: Out of order queues diff --git a/programmers-guide/COMPILER_CACHE.md b/programmers-guide/COMPILER_CACHE.md new file mode 100644 index 0000000000..efe3525850 --- /dev/null +++ b/programmers-guide/COMPILER_CACHE.md @@ -0,0 +1,134 @@ + + +# Compiler Cache + +- [Introduction](#Introduction) +- [Configuration](#Configuration) +- [Implementation](#Implementation) +- [Key Features](#Key-Features) +- [Debug Keys](#Debug-Keys) +- [Potential Issues and Limitations](#Potential-Issues-and-Limitations) + +# Introduction + +Compiler cache (cl_cache) is a mechanism that improves the performance of just-in-time (JIT) compilation. +When a kernel is compiled, the resulting binary is stored in the cache. When compilation of the same kernel is requested again, the cached binary is used instead of recompiling the kernel. This avoids recompilation both for kernels provided in source form (OpenCL C) and for kernels provided in an intermediate representation such as SPIR-V. Compute Runtime implements a persistent cache, meaning that compiled kernels are stored on disk long-term. Cached files are preserved across multiple runs of OpenCL/Level Zero applications and are deleted only when eviction is needed. + +The cache is available in both APIs supported by Compute Runtime: OpenCL and Level Zero. + +The following file extensions are used: + +- OpenCL: *.cl_cache* +- Level Zero: *.l0_cache* + +Each cache file is named \<hash\>.\<extension\>.
+The hash uniquely identifies a cache file and is computed from the following attributes: + +Kernel sources +- Translation Input + +Provided options +- Build API Options +- Build Internal Options + +User-defined constants +- Specialization Constant Identifiers +- Specialization Constant Values + +Compiler version +- IGC Revision +- IGC Library Size +- IGC Library Modification Time + +Additional information +- Hardware Info + +The hash is deterministic: the same set of attributes always produces the same hash. +This guarantees that, under the same conditions, the driver always reads and writes the same binary file. + +# Configuration + +## Windows + +| Environment Variable | Value | Description | +| -------------------- | ------------------------------------------------------------------ | ------------------------------------------------------------ | +| NEO_CACHE_PERSISTENT | 0: disabled<br>1: enabled<br>Default: 1 | Enable or disable the on-disk binary cache.<br>When enabled, Compute Runtime tries to cache and reuse compiled binaries. | +| NEO_CACHE_DIR | \<path\><br>Default: %LocalAppData%\NEO\neo_compiler_cache | Path to the persistent cache directory.<br>If `NEO_CACHE_DIR` is not set and %LocalAppData% cannot be accessed,<br>the on-disk cache is disabled. | +| NEO_CACHE_MAX_SIZE | \<size\><br>Default: 1 GB | Maximum size of the compiler cache in bytes.<br>The total size of files stored in the cache will never exceed this value.<br>If adding a new binary would cause the cache to exceed its limit, the eviction mechanism is triggered.<br>Set to 0 to disable size-based cache eviction. | + +## Linux + +| Environment Variable | Value | Description | +| -------------------- | --------------------------------------------------------------- | ------------------------------------------------------------ | +| NEO_CACHE_PERSISTENT | 0: disabled<br>1: enabled<br>Default: 1 | Enable or disable the on-disk binary cache.<br>When enabled, Compute Runtime tries to cache and reuse compiled binaries. | +| NEO_CACHE_DIR | \<path\><br>Default: $XDG_CACHE_HOME/neo_compiler_cache | Path to the persistent cache directory.<br>The default is $XDG_CACHE_HOME/neo_compiler_cache if $XDG_CACHE_HOME is set, and $HOME/.cache/neo_compiler_cache otherwise.<br>If none of `NEO_CACHE_DIR`, $XDG_CACHE_HOME, or $HOME is set, the on-disk cache is disabled. | +| NEO_CACHE_MAX_SIZE | \<size\><br>Default: 1 GB | Maximum size of the compiler cache in bytes.<br>The total size of files stored in the cache will never exceed this value.<br>If adding a new binary would cause the cache to exceed its limit, the eviction mechanism is triggered.<br>Set to 0 to disable size-based cache eviction. | + +# Implementation + +## Cache Creation + +When the persistent cache is enabled, the driver creates *config.file* on first use. This file stores the total size of the cache directory and serves as the entry point to the caching mechanism. As cache files are created or evicted, the contents of *config.file* are updated automatically. +This avoids recalculating the total cache size, which would require iterating over all files, on every compilation. + +Each write to disk consists of the following steps: + +1. lock *config.file* (advisory lock) +1. create a temporary file +1. write the contents to the temporary file +1. rename the temporary file to its hash-based name +1. store the updated directory size in *config.file* +1. unlock *config.file* + +## Cache Eviction + +Since the Compute Runtime cl_cache is a persistent cache, it is not cleared when the application exits or the system is rebooted. +Cached files remain until eviction is needed, which happens when the cache limit set by `NEO_CACHE_MAX_SIZE` is about to be reached. + +At that point, Compute Runtime activates the eviction mechanism, which works as follows: + +1. lock *config.file* (advisory lock) +1. gather all cache files +1. sort the files by last access time (read/write) +1. remove the least recently accessed files with a combined size of 1/3 of `NEO_CACHE_MAX_SIZE` +1. store the updated directory size in *config.file* +1. unlock *config.file* + +Removing the least recently accessed files first discards the binaries that are least likely to be reused. +This keeps the cache populated with the most recently used kernels. + +# Key Features + +- Thread and process safety, provided by mutexes and file locking +- Performance gains from skipping repeated compilation of the same kernel +- Automatic eviction of old cache files + +# Debug Keys + +The cl_cache mechanism consists of many stages, and an error can occur in any of them. +For this reason, we have placed error checks at each critical point.
+If a problem occurs, information about it can be obtained with a debug key: + +`PrintDebugMessages=0/1` : when enabled, debug messages are printed to the console + +Additionally, the `BinaryCacheTrace` flag provides deeper insight into the hash creation process: + +`BinaryCacheTrace=0/1` : when enabled, cl_cache generates a trace file containing the values of all attributes used to calculate the unique hash for a compiled binary. This helps explain why a specific kernel is or isn't being retrieved from the cache as expected. + +*Note: `PrintDebugMessages` propagates all debug messages from the driver; it is not exclusive to cl_cache.* + +# Potential Issues and Limitations + +- Since the cl_cache mechanism automatically tracks the size of all created and deleted cache files, users should not manually add, modify, or remove individual cache files. Doing so would leave an incorrect directory size in *config.file*, which could cause the cache to exceed the limit set by `NEO_CACHE_MAX_SIZE`. However, it is safe to delete the entire cache directory, or all files in it at once. + +- If *config.file* cannot be opened or read (e.g., due to corruption), the cache is not used and we fall back to the standard compilation path. In this situation, information about the error is printed in debug messages. + +- cl_cache relies on the *last access time* of files to order evictions. If *last access time* updates are disabled on the filesystem (for example, when it is mounted with `noatime`), cl_cache cannot accurately determine which files were least recently used. Consequently, files may be evicted in a suboptimal order, which can reduce the effectiveness of the cache.
To address this limitation, a fallback sorting method for eviction is planned; it may use file size or creation time as criteria so that the cache can still be managed efficiently when *last access time* is unavailable. + +- When generating the unique hash, the compiler cache does not take into account environment variables of external components. Changes to these variables may not trigger cache invalidation, which can lead to unexpected behavior and errors that are difficult to debug. \ No newline at end of file diff --git a/programmers-guide/PROGRAMMERS_GUIDE.md b/programmers-guide/PROGRAMMERS_GUIDE.md index 72b0052a2d..e4084dc30c 100644 --- a/programmers-guide/PROGRAMMERS_GUIDE.md +++ b/programmers-guide/PROGRAMMERS_GUIDE.md @@ -13,6 +13,7 @@ SPDX-License-Identifier: MIT This document provides the architectural design followed in the Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver. Implementation details and optimization guidelines are explained, as well as a description of the different features available for the different supported platforms. ### [Allocations greater than 4GB](ALLOCATIONS_GREATER_THAN_4GB.md) +### [NEO Compiler Cache](COMPILER_CACHE.md) ### [Implicit scaling](IMPLICIT_SCALING.md) ### [Immediate Commandlist](IMMEDIATE_COMMANDLIST.md) ### [L0 Metrics](METRICS.md)
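The write and eviction steps listed under Implementation in COMPILER_CACHE.md can be modeled with a short shell sketch. This is an illustration only, not the Compute Runtime code. The function names `store_binary` and `evict_locked` are hypothetical. The sketch assumes Linux with GNU `find`/`stat` and util-linux `flock`. It also assumes that *config.file* holds the directory size as plain decimal text (its real format is not documented here) and that eviction stops once at least a third of the size limit has been freed.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the cl_cache write and LRU eviction steps.
# NOT the Compute Runtime implementation. Assumes Linux with GNU find
# (-printf), GNU stat (-c) and util-linux flock, and that config.file
# stores the cache directory size as plain decimal text.

# evict_locked DIR MAX_SIZE
# Deletes the least recently accessed *.cl_cache / *.l0_cache files until
# at least MAX_SIZE/3 bytes are freed, then prints the number of bytes freed.
# The caller must already hold the lock on DIR/config.file.
evict_locked() {
    target=$(( $2 / 3 ))
    # One line per cache file: "<access time> <size> <path>", oldest first.
    find "$1" -maxdepth 1 -type f \( -name '*.cl_cache' -o -name '*.l0_cache' \) \
        -printf '%A@ %s %p\n' | LC_ALL=C sort -n | {
        freed=0
        while read -r _atime size path; do
            [ "$freed" -ge "$target" ] && break
            rm -f -- "$path"
            freed=$(( freed + size ))
        done
        echo "$freed"
    }
}

# store_binary DIR HASH EXT SRC MAX_SIZE
# Stores the file SRC in the cache as DIR/HASH.EXT.
store_binary() (
    dir=$1 name="$2.$3" src=$4 max=$5
    exec 9>>"$dir/config.file"          # open config.file (created if missing)
    flock -x 9                          # 1. take the advisory lock
    [ -e "$dir/$name" ] && exit 0       # already cached: nothing to store
    size=$(stat -c %s -- "$src")
    total=$(cat -- "$dir/config.file"); total=${total:-0}
    if [ "$max" -gt 0 ] && [ $(( total + size )) -gt "$max" ]; then
        freed=$(evict_locked "$dir" "$max")   # limit would be exceeded
        if [ "$total" -gt "$freed" ]; then total=$(( total - freed )); else total=0; fi
    fi
    tmp=$(mktemp "$dir/tmp.XXXXXX")     # 2. create a temporary file
    cat -- "$src" > "$tmp"              # 3. write the contents
    mv -f -- "$tmp" "$dir/$name"        # 4. rename it to the hash-based name
    echo $(( total + size )) > "$dir/config.file"   # 5. store the new size
)                                       # 6. fd 9 closes here, releasing the lock
```

Keeping the running total in *config.file* means that a normal write only needs the size of the new binary. The cache directory is scanned only when eviction runs, which matches the rationale given under Cache Creation. Because the lock is tied to file descriptor 9 of the function's subshell, it is released automatically when the subshell exits, including on the early `exit 0` for an already-cached hash.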