Read indirect_stateless_count in module external functions.
If greater than 0, mark all kernels that have the has_stack_calls flag
set from this module as having indirect accesses.
Related-To: NEO-7712
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Adds a per-kernel and per-device query to determine the
number of GRF registers that a kernel was compiled for.
This is an informal query for now, but may be added to
a formally supported extension in the future.
Related-To: NEO-9807
Signed-off-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
Program barrier to task stream, before next enqueue kernel.
This will reduce the number of batch buffer starts for sequences of
enqueue, barrier, enqueue, ... .
Related-To: NEO-8147
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Program barrier immediately to task stream.
This will reduce the number of batch buffer starts.
Related-To: NEO-8147
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Allows the user to use alignments > 64KB in `createUnifiedMemoryAllocation`
So that the restriction in `piextUSMDeviceAlloc` of the DPC++ runtime
could be lifted
Related-To: LOCI-4168
Signed-off-by: Lu, Wenbin <wenbin.lu@intel.com>
Unit tests should not write output to the console.
Instead, every output should be captured.
Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>
- For static create() method for Kernel and MultiDeviceKernel force errcodeRet
parameter to be passed via reference (instead of a pointer)
- Move part of kernel's creation logic to initialize() method
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
When state base address tracking is enabled and command list use private heaps
then command list at destroy time must calls all compute CSRs that were using
that heap to invalidate state caches.
This allows new command list to reuse the same heap allocation for different
surface states, so before new use cached states are invalidated.
Related-To: NEO-5055
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Allows the user to use alignments > 64KB in `createUnifiedMemoryAllocation`
So that the restriction in `piextUSMDeviceAlloc` of the DPC++ runtime
could be lifted
Related-To: LOCI-4168
Signed-off-by: Lu, Wenbin <wenbin.lu@intel.com>
Do not make indirect allocations resident if kernel does not use
indirect access.
For both level zero and opencl.
Currently disabled by default, enable with debug flag
DetectIndirectAccessInKernel
Related-To: NEO-7712
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
- program using device must be released before device is destroyed.
Reset program unique_ptr in fixtures before device fixture tearDown()
- reorder fixtures members to ensure correct order of destructors
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Do not make indirect allocations resident if kernel does not use
indirect access.
Enable for both level zero and opencl.
Related-To: NEO-7712
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
This change allows IGC to not pass enqueued_local_size argument (i.e.
when it is known). Removed expects in NEO ULTs were blocking
corresponding IGC change.
Related-To: NEO-6322
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>