Each host function gets a unique ID within a CSR.
One MI store writes the ID to signal that the host function is ready,
and one MI semaphore wait waits for the ID to be cleared.
Bit 0 of the ID serves as the pending/completed flag:
host function IDs start at 1 and are incremented by 2,
so every ID always has bit 0 set.
This is required because a semaphore wait can wait on 4 bytes only.
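The ID scheme described above can be sketched as follows. This is only an illustration: the names `HostFunctionIdAllocator`, `isPending`, and `kCompletedTag`, and the 64-bit ID type, are assumptions and are not taken from the driver sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants, named here for illustration only.
constexpr std::uint64_t kFirstHostFunctionId = 1; // IDs start with 1
constexpr std::uint64_t kHostFunctionIdStep = 2;  // step of 2 keeps bit 0 set
constexpr std::uint64_t kCompletedTag = 0;        // value of a cleared tag

// Hands out IDs 1, 3, 5, ... for one CSR (assumed to be used by one thread).
class HostFunctionIdAllocator {
  public:
    std::uint64_t next() {
        const std::uint64_t id = nextId;
        nextId += kHostFunctionIdStep;
        return id;
    }

  private:
    std::uint64_t nextId = kFirstHostFunctionId;
};

// Bit 0 is the pending/completed flag: set means pending, clear means completed.
constexpr bool isPending(std::uint64_t tag) {
    return (tag & 1u) != 0;
}

// Bit 0 sits in the low 4 bytes, so a 4-byte compare can observe the flag.
constexpr std::uint32_t lowDword(std::uint64_t tag) {
    return static_cast<std::uint32_t>(tag);
}
```

Because every valid ID is odd, a cleared tag (0) can never be mistaken for a pending host function.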
Adjust command buffer programming and patching logic to IDs.
Add a hostFunction callable class, called through its invoke method,
which stores the information required for the callback.
Add host function streamer - stores all host function data
for a given CSR.
All user provided host functions are stored in unordered map,
where key is host function ID.
Add a host function scheduler and a thread pool, under a debug flag.
The single-threaded scheduler loops over all registered host function
streamers and dispatches ready-to-execute host functions to the thread pool.
Allow out-of-order host function execution for OOQ, under a debug flag.
Each host function has a bool isInOrder flag that indicates whether it can
be executed out of order. In that mode the ID tag is cleared immediately,
so the semaphore wait unblocks before the host function executes.
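A minimal sketch of the in-order versus out-of-order tag handling, assuming a single scheduler step that runs the callback inline rather than on a thread pool. `HostFunctionStreamer`, `scheduleOnce`, and the `log` vector are hypothetical names used only to make the ordering visible; only `isInOrder` and `invoke` come from the description above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct HostFunction {
    std::function<void()> callback;
    bool isInOrder = true; // false: may run out of order
    void invoke() const { callback(); }
};

// Hypothetical per-CSR storage: the tag written by the GPU plus user callbacks.
struct HostFunctionStreamer {
    std::atomic<std::uint64_t> tag{0};
    std::unordered_map<std::uint64_t, HostFunction> hostFunctions; // key: ID
    std::vector<std::string> log; // records event order for this sketch
};

// One scheduler pass over a single streamer.
inline void scheduleOnce(HostFunctionStreamer &streamer) {
    const std::uint64_t id = streamer.tag.load();
    if ((id & 1u) == 0) {
        return; // bit 0 clear: nothing pending
    }
    auto it = streamer.hostFunctions.find(id);
    if (it == streamer.hostFunctions.end()) {
        return; // unknown ID, ignored in this sketch
    }
    HostFunction fn = std::move(it->second);
    streamer.hostFunctions.erase(it);

    if (!fn.isInOrder) {
        // Out of order: clear the tag first so the semaphore wait unblocks.
        streamer.tag.store(0);
        streamer.log.push_back("tag cleared");
        fn.invoke();
    } else {
        // In order: the tag is cleared only after the callback has finished.
        fn.invoke();
        streamer.tag.store(0);
        streamer.log.push_back("tag cleared");
    }
}
```

In a real scheduler `fn.invoke()` would be handed to the thread pool; it runs inline here so the ordering is deterministic.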
Remove the Host Function worker implementation based on CV and atomics.
Rename classes.
Related-To: NEO-14577
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
If enabled, USM pools will be allocated on the first USM allocation.
Used by default in ULTs to avoid unnecessary allocations of pool storage.
Related-To: NEO-16084
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Calculate the lowest and highest software priority in one place.
Related-To: HSD-18043767497
Signed-off-by: Katarzyna Cencelewska <katarzyna.cencelewska@intel.com>
* add common host function worker interface
* add worker as a single thread per CSR with 3 modes
* add logic for waiting on the internal tag and checking for GPU hang
* if the tag is in the pending state, read callback data, run the callback
and signal completion
* threads exit the work loop once a stop request
is made in finish
* add multi-threaded unit tests
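The work loop from the list above might look roughly like this. It is a sketch under assumptions: `WorkerState`, `LoopExit`, and `workLoop` are hypothetical names, the three worker modes are not modeled, and the loop is written as a plain function so it can run on any thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical shared state between the submitting side and the worker.
struct WorkerState {
    std::atomic<std::uint32_t> tag{0};     // bit 0 set: callback pending
    std::atomic<bool> stopRequested{false}; // set from finish
    std::atomic<bool> gpuHang{false};       // set when a hang is detected
    std::function<void()> callback;         // callback data for the pending tag
};

enum class LoopExit { stopRequested, gpuHang };

inline LoopExit workLoop(WorkerState &state) {
    while (!state.stopRequested.load()) {
        if (state.gpuHang.load()) {
            return LoopExit::gpuHang; // stop waiting on a hung GPU
        }
        if ((state.tag.load() & 1u) != 0) {
            state.callback();   // run the user callback
            state.tag.store(0); // signal completion
        } else {
            std::this_thread::yield(); // nothing pending, keep polling
        }
    }
    return LoopExit::stopRequested;
}
```

A per-CSR worker would typically run `workLoop` on its own `std::thread` and join it after setting `stopRequested`.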
Related-To: NEO-14577
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
Initially under a debug flag.
Track residency of the pool and its chunks.
If the pool is already resident or already evicted, the memory
operation on a chunk from that pool can be skipped.
Return an error when a chunk that is not allocated in the pool is used.
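One way to picture the residency tracking is the sketch below. It assumes the pool becomes resident with its first resident chunk and is evicted with its last one; that policy, and the names `TrackedPool`, `Status`, and `memoryOperations`, are illustrative assumptions rather than the driver's actual design.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

enum class Status { success, invalidChunk };

// Hypothetical pool that tracks which chunks exist and which are resident.
class TrackedPool {
  public:
    void allocateChunk(std::size_t offset) { allocated.insert(offset); }

    Status makeResident(std::size_t offset) {
        if (allocated.count(offset) == 0) {
            return Status::invalidChunk; // chunk was never allocated from this pool
        }
        residentChunks.insert(offset);
        if (!poolResident) {
            ++memoryOperations; // stands in for the real residency call
            poolResident = true;
        }
        return Status::success; // pool already resident: operation skipped
    }

    Status evict(std::size_t offset) {
        if (allocated.count(offset) == 0) {
            return Status::invalidChunk;
        }
        residentChunks.erase(offset);
        if (poolResident && residentChunks.empty()) {
            ++memoryOperations; // stands in for the real evict call
            poolResident = false;
        }
        return Status::success; // pool already evicted or still in use: skipped
    }

    int memoryOperations = 0; // counts operations that were not skipped

  private:
    std::unordered_set<std::size_t> allocated;
    std::unordered_set<std::size_t> residentChunks;
    bool poolResident = false;
};
```

With two chunks in one pool, only the first `makeResident` and the last `evict` reach the counter; every other call is skipped.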
Related-To: NEO-16303
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Enhanced direct submission idle detection to ensure that
ULLS contexts are not terminated if any context in the same group
is still busy or has pending work.
Idle detection now accurately considers the state of all CSRs
in a context group before terminating any direct submission.
Controlled with DirectSubmissionControllerContextGroupIdleDetection
(note: the feature is disabled by default in the first step).
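The group-wide idle check can be illustrated with a short sketch. `CsrState` and `canTerminateDirectSubmission` are hypothetical names; the real controller tracks more state than these two flags.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical snapshot of one CSR in a context group.
struct CsrState {
    bool busy;           // currently executing work
    bool hasPendingWork; // work submitted but not yet completed
};

// Direct submission may be terminated only when every CSR in the group is idle.
inline bool canTerminateDirectSubmission(const std::vector<CsrState> &group) {
    return std::all_of(group.begin(), group.end(), [](const CsrState &csr) {
        return !csr.busy && !csr.hasPendingWork;
    });
}
```

A single busy CSR, or one with pending work, is enough to keep direct submission alive for the whole group.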
Related-To: NEO-13325
Signed-off-by: Slawomir Milczarek <slawomir.milczarek@intel.com>
Related-to: NEO-15981
- Disables IPC Windows support by default by setting
EnableShareableWithoutNTHandle to 0
- EnableShareableWithoutNTHandle can be set to 1 to enable
IPC Windows support
- Addresses potential performance issues on Windows
with shareable memory until a better solution is found
to support IPC on Windows.
Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com>