So far, there is a separate page allocated for each kernel's ISA within
`KernelImmutableData::initialize()`. Apparently the ISA blocks are often
much smaller than a 64k page, which leads to poor memory utilization and
was even observed to cause the device OOM error if a single module has
several keys.
Improve the situation by reusing the parent allocation (owned by the
module instance) for modules, which kernel ISAs can fit together within
a single 64k page. This improves the memory utilization on a single
module level.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
Changes:
- replaced registry keys with environment variables
for cl_cache in OCL
- added compiler cache helpers
- implemented support for new env vars on Windows
- added tests
New env vars mechanism works as follows:
If `PERSISTENT_CACHE` is set,
driver checks if `NEO_CACHE_DIR` is set.
If `NEO_CACHE_DIR` is not set,
driver uses `%LocalAppData%\NEO\neo_compiler_cache`
as `cl_cache` destination folder.
If `NEO_CACHE_DIR` is not set and `%LocalAppData%`
path could not be obtained,
compiler cache is disabled.
In the current Windows implementation,
special characters in the folder path are not supported.
Related-To: NEO-8092
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
Fix bug introduced in neo 27314 - splitBarrierRequired was set for all
commands, should be only for cl_command_barrier.
Related-To: NEO-8147
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
value depends on CCS count:
- single CCS mode (default) - 50% available
- two CCS mode - 25% available
- four CCS mode - 12.5% available
Related-To: NEO-8377
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
- unify Linux and Windows default settings
- unify override default code
- correct size estimation when fence is required
- call virtual function once for both estimation and dispatch
Related-To: NEO-8395
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
- when Surface State is reused for new resource, State Cache needs to be
invalidated
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
So far, there is a separate page allocated for each kernel's ISA within
`KernelImmutableData::initialize()`. Apparently the ISA blocks are often
much smaller than a 64k page, which leads to poor memory utilization and
was even observed to cause the device OOM error if a single module has
several keys.
Improve the situation by reusing the parent allocation (owned by the
module instance) for modules, which kernel ISAs can fit together within
a single 64k page. This improves the memory utilization on a single
module level.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
Program barrier to task stream, before next enqueue kernel.
This will reduce the number of batch buffer starts for sequences of
enqueue, barrier, enqueue, ... .
Related-To: NEO-8147
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
- this method allocates System Memory
- argument is not needed - ExternalHeap is selected inside this function
- remove unneeded ults
- allocate memory in Device Pool for external heap allocation in
OsAgnosticMemoryManager
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Use the same file `os_handle.h` on both
Linux and Windows.
Change implementation of `HandleType` -> `UnifiedHandle` to
`std::variant<int, void *>`
use `int` on Linux
use `void *` on Windows
Related-To: NEO-8092
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>