Commit Graph

95 Commits

Author SHA1 Message Date
Compute-Runtime-Validation
5535ef3049 Revert "performance(ocl): enable usm pool allocator"
This reverts commit 7bc8424a69.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2023-12-29 05:54:07 +01:00
Dominik Dabek
7bc8424a69 performance(ocl): enable usm pool allocator
Enable opencl usm pool allocator by default

Related-To: NEO-9700

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2023-12-28 13:14:41 +01:00
Dominik Dabek
d238a68bae fix(ocl): usm pool allocator correct size
Wrong debug flag was used for setting host allocation pool size

Related-To: NEO-9700

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2023-12-27 23:14:28 +01:00
Dominik Dabek
2fe3804cc2 performance(ocl): add usm allocation pooling flag
EnableDeviceUsmAllocationPool and EnableHostUsmAllocationPool for device
and host allocations respectively.

Pool size will be set to flag value * MB.

Allocation size threshold to be pooled is 1MB.

Pools are created per context.

Related-To: NEO-9700

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2023-12-27 11:42:01 +01:00
Mateusz Jablonski
27fbdde4c5 refactor: correct naming of unified memory enums
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-12-13 15:58:21 +01:00
Mateusz Jablonski
739d181026 refactor: correct naming of enum class constants 6/n
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-12-13 14:48:52 +01:00
Compute-Runtime-Validation
a2994e9b29 Revert "performance(ocl): set pool allocator threshold 1MB"
This reverts commit fc1d93af8e.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2023-12-09 07:02:42 +01:00
Dominik Dabek
fc1d93af8e performance(ocl): set pool allocator threshold 1MB
Increase pool allocator threshold to 1MB
Remove stack allocations based on threshold in tests.

Related-To: NEO-9690

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2023-12-06 19:55:48 +01:00
Mateusz Jablonski
c9664e6bad refactor: rename global debug manager to debugManager
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-11-30 13:00:59 +01:00
Maciej Bielski
c7a971a28f feature: add optional onChunkFree callback to AbstractBuffersPool
Instances returned by `getAllocationsVector()` in some cases cannot be
freed (in the `malloc/new` sense) until the `drain()` function invokes
`allocInUse()` on them. Plus, the `chunksToFree` container operates on
pairs `{offset, size}`, not pointers, so such pair cannot be used to
release allocations either.

Provide an optional callback, which can be implemented by the custom
pool derived from `AbstractBuffersPool`. This callback can be used, for
example, to perform actual release of an allocation related to the
currently processed chunk.

Additionally, provide the `drain()` and `tryFreeFromPoolBuffer()`
functions with pool-independent versions and keep the previous versions
as defaults (for allocators with a single pool). The new versions allow
reusing the code for cases when allocator has multiple pools.

In both cases, there was no such needs so far but it arose when working
on `IsaBuffersAllocator`. The latter is coming with future commits, but
the shared code modifications are extracted as an independent step.

Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
2023-07-13 17:26:51 +02:00
Compute-Runtime-Validation
9c7950cd22 Revert "feature: add optional onChunkFree callback to AbstractBuffersPool"
This reverts commit b7ecf99abb.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2023-07-07 04:31:30 +02:00
Maciej Bielski
b7ecf99abb feature: add optional onChunkFree callback to AbstractBuffersPool
Instances returned by `getAllocationsVector()` in some cases cannot be
freed (in the `malloc/new` sense) until the `drain()` function invokes
`allocInUse()` on them. Plus, the `chunksToFree` container operates on
pairs `{offset, size}`, not pointers, so such pair cannot be used to
release allocations either.

Provide an optional callback, which can be implemented by the custom
pool derived from `AbstractBuffersPool`. This callback can be used, for
example, to perform actual release of an allocation related to the
currently processed chunk.

Additionally, provide the `drain()` and `tryFreeFromPoolBuffer()`
functions with pool-independent versions and keep the previous versions
as defaults (for allocators with a single pool). The new versions allow
reusing the code for cases when allocator has multiple pools.

In both cases, there was no such needs so far but it arose when working
on `IsaBuffersAllocator`. The latter is coming with future commits, but
the shared code modifications are extracted as an independent step.

Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
2023-07-06 10:38:55 +02:00
Maciej Bielski
7ea8ed1757 refactor: extract generic parts of small buffers allocator
Currently the whole code resides within the opencl/ tree, but the
mechanism is meant to be reused in L0 for kernel-ISA allocations
optimization (further work).

This commit is a preparation step, which extracts the generic mechanism
and moves the extracted part under the shared/ tree.

Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
2023-06-13 10:46:03 +02:00
Mateusz Jablonski
4f72835b7d fix: create dedicated class for root device indices to store unique values
remove method to removing duplicates from StackVec as the method
implicitly sorted the vector

Related-To: GSD-4692
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-06-12 22:24:06 +02:00
Fabian Zwolinski
e351a90f81 refactor: Rename member variables to camelCase 2/n
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
2023-04-27 20:39:22 +02:00
Krzysztof Gibala
16db7cc890 fix: Add missing checks in multi gpu scenario
- Check allocation root device index during eviction
- Wait for and marked allocation only from the current root device index

Related-To: NEO-7920
Signed-off-by: Krzysztof Gibala <krzysztof.gibala@intel.com>
2023-04-24 23:26:28 +02:00
Igor Venevtsev
3e5101424d Optimize small buffers allocator
- Do not wait for GPU completion on pool exhaust if allocs are in use,
allocate new pool instead
- Free small buffer address range if allocs are not in use and
buffer pool is exhausted

Resolves: NEO-7769, NEO-7836

Signed-off-by: Igor Venevtsev <igor.venevtsev@intel.com>
2023-04-19 11:56:50 +02:00
Igor Venevtsev
6aadf63725 Revert "Optimize small buffers allocator"
This reverts commit f57ff2913c.

Resolves: HSD-15013057572

Signed-off-by: Igor Venevtsev <igor.venevtsev@intel.com>
2023-03-24 12:17:54 +01:00
Igor Venevtsev
f57ff2913c Optimize small buffers allocator
- Do not wait for GPU completion on pool exhaust if allocs are in use,
allocate new pool instead
- Reuse existing pool if allocs are not in use

Related-To: NEO-7769

Signed-off-by: Igor Venevtsev <igor.venevtsev@intel.com>
2023-03-15 19:12:30 +01:00
Kacper Nowak
efba242570 fix(zebin): Extend oneDNN WA for whole application context
When a dummy kernel "kernel void_(){}" is passed in sources - specific
for workloads with ngen backend - enforce fallback to CTNI for the whole
application context (mark the context as non-zebinary).

Related-To: NEO-7772
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
2023-03-07 14:21:57 +01:00
Warchulski, Jaroslaw
64f735481d Cleanup includes 48
Cleaned up files:
shared/source/command_container/command_encoder.inl
shared/source/os_interface/hw_info_config.h

Related-To: NEO-5548
Signed-off-by: Warchulski, Jaroslaw <jaroslaw.warchulski@intel.com>
2023-02-10 17:23:02 +01:00
Mateusz Jablonski
24c5352350 refactor: remove redundant including of compiler_cache.h
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-02-03 11:16:31 +01:00
Warchulski, Jaroslaw
439aa6c87f Cleanup includes 43
Cleaned up files:
level_zero/core/test/unit_tests/mocks/mock_kernel.h
opencl/source/mem_obj/mem_obj.h

Related-To: NEO-5548
Signed-off-by: Warchulski, Jaroslaw <jaroslaw.warchulski@intel.com>
2023-01-25 11:33:39 +01:00
Maciej Plewka
fa4830036a feature(ocl) use tags to synchronize multi root device events
Signed-off-by: Maciej Plewka <maciej.plewka@intel.com>
2023-01-23 10:28:01 +01:00
Maciej Plewka
1421796541 Revert "feature(ocl) use tags to synchronize multi root device events"
This reverts commit 353a7510b2bd2d774d0b7ee82ee48eae7f5dc1d3.

Signed-off-by: Maciej Plewka maciej.plewka@intel.com
2023-01-17 11:29:58 +01:00
Dominik Dabek
0c3cde2141 fix(ocl): adjust pool buffer allocator
Increase chunk alignment from 256 to 512.
Restores performance in some workloads with pool enabled but lowers maximum
possible number of buffers in pool from 256 to 128.

MemObj size will keep the value passed to clCreateBuffer ie. will not be
aligned up by chunk alignment.
CL_MEM_SIZE will now return same value as with pool disabled.

Related-To: NEO-7332

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2023-01-13 14:20:29 +01:00
Mateusz Jablonski
c4759884d8 fix: defer initialization of cross root device tag allocations
additional tag allocations are not needed before creating OCL contexts
with multiple root devices

Related-To: NEO-7634

Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
2023-01-13 13:13:05 +01:00
Maciej Plewka
16bc84e27d feature(ocl) use tags to synchronize multi root device events
Signed-off-by: Maciej Plewka <maciej.plewka@intel.com>
2023-01-13 08:09:32 +01:00
Warchulski, Jaroslaw
3d59dce80c Cleanup includes 27
Cleaned up files:
opencl/source/command_queue/command_queue.h
shared/source/built_ins/registry/built_ins_registry.h
shared/source/kernel/kernel_descriptor.h

Related-To: NEO-5548
Signed-off-by: Warchulski, Jaroslaw <jaroslaw.warchulski@intel.com>
2023-01-11 16:10:28 +01:00
Warchulski, Jaroslaw
4794648978 Cleanup includes 26
Cleaned up files:
opencl/source/command_queue/csr_selection_args.h
opencl/source/event/event.h
shared/source/helpers/engine_control.h
shared/source/sku_info/definitions/sku_info.h

Related-To: NEO-5548

Signed-off-by: Warchulski, Jaroslaw <jaroslaw.warchulski@intel.com>
2023-01-11 09:10:45 +01:00
Warchulski, Jaroslaw
77b88f19a1 Cleanup includes 23
Cleaned up files:
opencl/source/execution_environment/cl_execution_environment.h
opencl/source/helpers/cl_validators.h
opencl/test/unit_test/mocks/mock_cl_device.h
opencl/test/unit_test/mocks/mock_context.h
shared/source/helpers/cache_policy.h
shared/source/image/image_surface_state.h

Related-To: NEO-5548
Signed-off-by: Warchulski, Jaroslaw <jaroslaw.warchulski@intel.com>
2023-01-09 12:30:30 +01:00
Warchulski, Jaroslaw
5eef40fedd Cleanup includes 22
Cleaned up files:
opencl/source/built_ins/builtins_dispatch_builder.h
opencl/source/context/context.h
opencl/source/gtpin/gtpin_notify.h
opencl/source/kernel/kernel.h
opencl/source/kernel/multi_device_kernel.h
opencl/source/mem_obj/buffer.h
opencl/source/mem_obj/mem_obj.h
shared/source/built_ins/registry/built_ins_registry.h
shared/source/page_fault_manager/cpu_page_fault_manager.h

Related-To: NEO-5548
Signed-off-by: Warchulski, Jaroslaw <jaroslaw.warchulski@intel.com>
2023-01-05 16:59:01 +01:00
Kamil Kopryk
7c23ea3928 Refactor: don't use global ProductHelper getter in ocl files 2/n
Related-To: NEO-6853
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
2022-12-29 09:41:39 +01:00
Compute-Runtime-Validation
876de37b92 Revert "Feature(OCL) Use tag nodes for root device synchronization"
This reverts commit 547d1c37b3.

Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>
2022-12-24 19:14:33 +01:00
Maciej Plewka
547d1c37b3 Feature(OCL) Use tag nodes for root device synchronization
With this commit events created on multi root device contexts will
synchronize using signaled TagNodes instead of using taskCounts.

Signed-off-by: Maciej Plewka <maciej.plewka@intel.com>

Related-To: NEO-7105
2022-12-23 15:48:54 +01:00
Dominik Dabek
3cfc8a0b68 Update flag for ocl pool buffer allocator
Flag == -1 - platform default
Flag == 0 - disabled
Flag == 1 - enabled for single device contexts
Flag == 2 - enabled for all contexts

Related-To: NEO-7332

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2022-12-23 10:55:57 +01:00
Dominik Dabek
2d34f00b3e Prepare for pool buffer enabling 3/n
Add per platform config
Reorder checks in allocateBufferFromPool

Related-To: NEO-7332

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2022-12-16 10:47:34 +01:00
Warchulski, Jaroslaw
be647d42d9 Cleanup includes 12
Related-To: NEO-5548
Signed-off-by: Warchulski, Jaroslaw <jaroslaw.warchulski@intel.com>
2022-12-07 13:14:15 +01:00
Dominik Dabek
70dbce12d1 Prepare for pool buffer enabling 1/n
check if flags allow buffer from pool
add buffer offset to aubtests
disable pool buffer where required

Related-To: NEO-7332

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2022-12-05 23:51:30 +01:00
Lukasz Jobczyk
7c572b4090 Do not free SVM alloc under SVM manager lock
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
2022-12-02 12:36:10 +01:00
Dominik Dabek
67bfebb25e Add additional create buffer arguments
Allow to: disable performance hints, make allocation lockable

Used in BufferPoolAllocator

Related-To: NEO-7332

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2022-11-21 16:19:53 +01:00
Dominik Dabek
d1a6054af9 enable create subBuffer from pooled buffer
Allow creating subBuffer from buffer from buffer pool allocator
by redirecting the call to the pool buffer and adjusting offset

Related-To: NEO-7332

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2022-10-19 09:33:10 +02:00
Dominik Dabek
e151bc6e2d [OCL] Flag for allocating small buffers from pool
Improves performance in workloads that create small opencl buffers.

To enable, set env var ExperimentalSmallBufferPoolAllocator=1

Known issues (will be addressed in further commits):
- cannot create subBuffer from such buffer
- pool buffer allocation should be reused

Related-To: NEO-7332

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2022-10-14 12:18:42 +02:00
Kamil Kopryk
d4d54f5093 Cleanup includes
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
2022-07-25 09:58:38 +02:00
Sebastian Luzynski
b8cf0c757a Notify gtpin onCommandBufferComplete
Notify gtpin onContextDestroy before SVM Allocations are deleted.

Resolves: NEO-6985

Signed-off-by: Sebastian Luzynski <sebastian.jozef.luzynski@intel.com>
2022-05-20 16:42:13 +02:00
Artur Harasimiuk
3f04769f07 style: configure readability-identifier-naming.FunctionCase
Signed-off-by: Artur Harasimiuk <artur.harasimiuk@intel.com>
2022-05-17 20:55:56 +02:00
Artur Harasimiuk
e9be9b64c6 clang-tidy configuration cleanup
Define single .clang-tidy configuration with all used checks and use
NOLINT to selectively silence tool. That way cleanup should be easier.
third_part/ has its own configuration that disables clang-tidy for this
folder.

Signed-off-by: Artur Harasimiuk <artur.harasimiuk@intel.com>
2022-05-11 14:02:04 +02:00
Dominik Dabek
8d1ad5a4f3 Refactor: use stack vector for root device indices
Stack vector will not cause dynamic allocations in most circumstances
ie. number of root device indices not more than 16

Related-To: NEO-6837

Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
2022-04-14 14:05:42 +02:00
Katarzyna Cencelewska
a06fbd2077 Remove device enqueue part 10
- remove DeviceQueue

Related-To: NEO-6559
Signed-off-by: Katarzyna Cencelewska <katarzyna.cencelewska@intel.com>
2022-01-19 17:41:06 +01:00
Katarzyna Cencelewska
d2818aaea2 Remove device enqueue part 5
-remove scheduler and builtin_kernels_simulation

Related-To: NEO-6559
Signed-off-by: Katarzyna Cencelewska <katarzyna.cencelewska@intel.com>
2022-01-13 14:15:26 +01:00