Add flag for forcing execution of kernels on single tile
Force cooperative kernels to use only single tile
Related-to: NEO-6729
Signed-off-by: Naklicki, Mateusz <mateusz.naklicki@intel.com>
- cases with null lws should only fail when computed
lws sizes result in too big number of workgroups
Related-To: NEO-6976
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Introduce debug variable to control which engines
the tranfser will be split into
Related-To: NEO-7173
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
optimization available under flag
ForceCsrLockInBcsEnqueueOnlyForGpgpuSubmission
Related-To: NEO-7011
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
Previous fix was causing the runtime to get buffer size
without gfx allocation, causing a seg fault.
This commit moves the fix logic to enqueue handler,
only changing the enqueueProperties.
Related-To: NEO-6948
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
This change introduces detection of GPU hangs in flushBcsTask()
function. The new code has been covered with ULTs.
Related-To: NEO-6681
Signed-off-by: Patryk Wrobel <patryk.wrobel@intel.com>
This change introduces detection of GPU hangs in blocking
calls to enqueueHandler() function. Moreover, usages of
this function template have been revised and adjusted to
check the exit code. Furthermore, enqueueBlit() and
dispatchBcsOrGpgpuEnqueue() functions returns value now.
ULTs have been added to cover new cases.
Signed-off-by: Patryk Wrobel <patryk.wrobel@intel.com>
Related-To: NEO-6681
In constructor of CommandComputeKernel we had been doing multiple allocations
of memory on heap due to lack of call to std::vector copy-constructor or reserve
member function.
Furthermore, in production code there is only one place, where we create objects
of this type and we redundantly copy the local variable, which could be moved.
This change:
- ensures that constructor of CommandComputeKernel performs single allocation
in the worst case; in the best case, it does not allocate memory due to usage
of std::move on input parameter
- steals the memory of the local variable in place of usage of the constructor
to remove redundant copying and memory allocations
- uses reserve() method to reduce the number of allocations during creation
of this local variable
Signed-off-by: Patryk Wrobel <patryk.wrobel@intel.com>
-program barrier after global fence allocation is programmed
-do not double barrier timestamp in blit enqueue
-flush GPGPU while submitting to BCS when barrier requested
Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>