- Command Stream Receiver should be used for locking instead.
- Remove unneeded synchronization in clSetUserEventStatus
Change-Id: I17050dc70cb0be03b2003043a9666ba8df1a83c9
- Mark Kernel for aux translation
- Initial implementation of dispatchAuxTranslation for future use
Change-Id: Ifca1c9a893876eecc5678cdc4f564b2bfcae959a
This commit adds the capability to selectively enable/disable AUB capture,
e.g. by toggling the registry key externally or by specifying a filter
with a kernel name and/or a kernel start index and kernel end index.
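A minimal sketch of such a filter check, using illustrative names only
(AubCaptureFilter and shouldCaptureKernel are not the actual driver symbols):

    #include <cstdint>
    #include <string>

    // Illustrative sketch; names do not match the actual driver code.
    struct AubCaptureFilter {
        std::string kernelName;             // empty matches any kernel name
        uint32_t kernelStartIdx = 0;        // first kernel (by enqueue order) to capture
        uint32_t kernelEndIdx = UINT32_MAX; // last kernel to capture
    };

    bool shouldCaptureKernel(const AubCaptureFilter &filter,
                             const std::string &name, uint32_t kernelIdx) {
        bool nameMatches = filter.kernelName.empty() || filter.kernelName == name;
        bool indexInRange = kernelIdx >= filter.kernelStartIdx &&
                            kernelIdx <= filter.kernelEndIdx;
        return nameMatches && indexInRange;
    }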
Change-Id: Ib5d39c21863fbc4a95aa73c949b9779ff993de0f
- This allows applications to force N:1 aggregation by creating an
out-of-order queue (see the snippet below).
- That switches the CSR to an N:1 submission model where commands from
multiple command streams may be aggregated.
- That forces scenarios returning an event to be aggregated as well.
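On the application side this only requires requesting an out-of-order queue;
a minimal OpenCL 2.0 host snippet (error handling omitted):

    #include <CL/cl.h>

    cl_command_queue createOutOfOrderQueue(cl_context context, cl_device_id device) {
        // Out-of-order execution is what allows the driver to aggregate
        // commands from multiple command streams into one submission (N:1).
        cl_queue_properties props[] = {
            CL_QUEUE_PROPERTIES, CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, 0};
        cl_int err = CL_SUCCESS;
        return clCreateCommandQueueWithProperties(context, device, props, &err);
    }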
Change-Id: I8fd8d7f88bb2665234ee90870133120b206710a8
We don't need BuiltinDispatchInfoBuilder in every place where built-ins
are used, specifically in .cpp files generated from kernel binaries.
Change-Id: Ie739951cdc93873993f78ad14cee656122af51fd
Signed-off-by: Artur Harasimiuk <artur.harasimiuk@intel.com>
When inheriting the task count from parent events,
do not take externally synchronized events into account (sketch below).
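Roughly, the rule looks as follows (a simplified stand-in for the parent
events; not the actual event classes):

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Simplified stand-in; real parent events carry much more state.
    struct ParentEvent {
        bool externallySynchronized;
        uint32_t taskCount;
    };

    uint32_t inheritTaskCount(const std::vector<ParentEvent> &parents) {
        uint32_t inherited = 0;
        for (const auto &parent : parents) {
            if (parent.externallySynchronized) {
                continue; // externally synchronized events contribute no task count
            }
            inherited = std::max(inherited, parent.taskCount);
        }
        return inherited;
    }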
Change-Id: I52d861e482669a18e2aca499c813716bb4951b74
- Change the way we handle blocked commands.
- Instead of allocating a CPU pointer and populating it with commands, create
a real IndirectHeap that may later be submitted to the GPU.
- That removes a lot of copy operations that were happening at submit time.
- For device enqueue, this requires dsh & ssh to be passed directly to the
underlying commands; in that scenario device queue buffers are not used.
Change-Id: I1124a8edbb46777ea7f7d3a5946f302e7fdf9665
- This removes Instruction Heap allocation from the enqueue path.
- The blocked path is handled as well.
- The heap is no longer allocated on demand; it is bound to kernelInfo.
Change-Id: I54545beceed3404ee0330a8bac2b0934944cac30
- Switch to the internal heap for kernel ISA allocations.
- Remove IH from various functions.
- Remove IHState from CSR; IH is never dirty.
- ISA is no longer copied on enqueue calls.
Change-Id: I0099cf2a9ebab6192ea03a74dd35f7da963fd5a5
- Make sure that the blocks' ISA is made resident in both the blocked &
non-blocked paths.
- Fix a bug where the private surface was not made resident in the blocked
path.
Change-Id: Ie564595b176b94ecc7c79d7efeae20598c5874fb
- This is to make sure those functions are not called when gtpin is not used
(see the sketch below).
- This avoids CPU instruction cache pollution.
- Our enqueue path needs to be as thin as possible; even with this small
change there is a visible gain in ULT execution time.
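One way to picture the guard (made-up symbol names; the actual GT-Pin hooks
and flag differ):

    // Illustrative only; the flag and callback names are assumptions.
    extern bool gtpinIsAttached;       // set when a GT-Pin client initializes
    void gtpinNotifyKernelSubmitted(); // placeholder for the real notification

    inline void notifyGtpinOnEnqueue() {
        if (!gtpinIsAttached) {
            return; // common enqueue path stays free of GT-Pin work
        }
        gtpinNotifyKernelSubmitted();
    }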
Change-Id: I44cc2144754cda95ca1fe058184cd8a151b8d35c
- Add a KmdNotifyProperties struct to the CapabilityTable that can be extended
by upcoming KmdNotify-related optimizations.
- Add a quick KMD sleep optimization that is called from the async events
handler.
- The optimization checks taskCount in a busy loop with a much smaller delay
than the basic version of the KMD Notify optimization (sketch below).
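The shape of that quick-sleep wait, as a sketch with assumed knobs (the delay
and iteration count parameters are illustrative, not the tuned defaults):

    #include <atomic>
    #include <chrono>
    #include <cstdint>
    #include <thread>

    // Illustrative sketch: poll the hardware tag with a short sleep before
    // falling back to the regular (longer) KMD Notify wait.
    bool quickKmdSleepWait(const std::atomic<uint32_t> &hwTag,
                           uint32_t taskCountToWait,
                           std::chrono::microseconds quickDelay,
                           uint32_t maxIterations) {
        for (uint32_t i = 0; i < maxIterations; ++i) {
            if (hwTag.load() >= taskCountToWait) {
                return true; // GPU already reached the task count
            }
            std::this_thread::sleep_for(quickDelay); // much smaller than the basic delay
        }
        return false; // caller falls back to the basic KMD Notify path
    }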
Change-Id: I60c851c59895f0cf9de1e1f21e755a8b4c2fe900
- Read-only memory cannot be used for allocation;
OSes cannot create a graphics allocation for such memory.
- If memory allocation fails for a host_ptr passed
to enqueueWrite calls, then try making a new allocation
and copy the host_ptr on the CPU (see the sketch below).
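Roughly, the fallback looks like this (illustrative names; tryWrapHostPtr
stands in for the real zero-copy allocation attempt):

    #include <cstdlib>
    #include <cstring>

    // Illustrative stand-ins for the real memory-manager interfaces.
    struct GraphicsAllocation {
        void *cpuPtr;
    };
    // Zero-copy attempt; assumed to return nullptr when the OS rejects the
    // host pointer, e.g. because it points to read-only memory.
    GraphicsAllocation *tryWrapHostPtr(const void *hostPtr, size_t size);

    GraphicsAllocation *obtainWriteAllocation(const void *hostPtr, size_t size) {
        if (GraphicsAllocation *alloc = tryWrapHostPtr(hostPtr, size)) {
            return alloc; // zero-copy path succeeded
        }
        // Fallback: make a fresh allocation and copy the host data on the CPU.
        auto *alloc = new GraphicsAllocation{std::malloc(size)};
        std::memcpy(alloc->cpuPtr, hostPtr, size);
        return alloc;
    }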
Change-Id: I415a4673ae1319ea8f77e53bd8fba7489fe85218
- Due to use cases where one shared buffer may be mapped to multiple CL
buffers, we need to flush DC between enqueues.
Change-Id: I05d7f844afe31d52a0004f5e2e5efa776f9dadbe
- For an in-order queue, the application can have fine-grained completion
granularity.
- For an out-of-order queue, the application wants to execute workloads
concurrently.
- This change disables pipe control nooping for in-order queue calls when an
event is returned.
Change-Id: Iaeaf677f768f7434b2efa1842b50653ab80777ad
- This change enables multiple independent command queues to execute
concurrently without stalling pipe controls in between.
- This change removes L3 flushes between kernels.
- Dependencies between commands are resolved via the task level mechanism.
- Out-of-order queues do not change the task level between submissions.
- In-order queues increase the task level between submissions.
- Whenever the task level changes, a pipe control with a CS stall is emitted
between GPGPU_WALKERs (see the sketch below).
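A much simplified sketch of the task level bookkeeping (illustrative types;
the real CSR logic also handles flushing and multi-queue state):

    #include <cstdint>

    void emitPipeControlWithCsStall(); // placeholder for the real command emission

    struct QueueState {
        bool outOfOrder;
        uint32_t taskLevel;
    };

    void beforeWalkerSubmission(QueueState &queue, uint32_t &csrTaskLevel) {
        if (!queue.outOfOrder) {
            ++queue.taskLevel; // in-order queues bump the task level per submission
        }
        if (queue.taskLevel != csrTaskLevel) {
            // A task level change marks a dependency boundary between GPGPU_WALKERs.
            emitPipeControlWithCsStall();
            csrTaskLevel = queue.taskLevel;
        }
    }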
Change-Id: I558653b296424e4775d060df3072e2a50684b715
- Call waitForTaskCountAndCleanAllocationList with the latest flushed task
count to reflect what was actually sent to the HW.
- Refactor cleanAllocationList into waitForTaskCountAndCleanAllocationList.
Change-Id: I5301185c5fce212e39eb017b952b43c279559cf4