to guarantee that all subblt got complete for previous copy
affect xe hpg
Related-To: NEO-7450
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
- This change gets level one cache policy from cached values instead
of calling virtual methods
Related-To: NEO-5055
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
- for DG2 platforms it is valid only for G10/G11/G12
- for MTL platforms it is valid only for 12.70.0 and 12.71.0
Additionally:
- setup default hw ip version for each platform
- merge dg2 specific product helper tests to single file
Related-To: HSD-14015808183, HSD-14015812625, HSD-14016015202
Related-To: HSD-14015812559, HSD-14015816823
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
This is small optimization to replace virtual call and retrieved struct with
cached value.
Related-To: NEO-5055
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Do not make indirect allocations resident if kernel does not use
indirect access.
Enable for both level zero and opencl.
Related-To: NEO-7712
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
This refactor simplifies internal interfaces and implementations for
different platforms. The change is introduction into adapting
state base address helper class for all functional needs of the driver.
Related-To: NEO-5055
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Sizing context (PVC):
When using LargeGRF (a.k.a GRF256) there are only 4 HW threads per EU
(instead of default 8). Together with SIMD16 that means that there can
be max 64 work-items per EU. With 8 EU per subslice this gives 512
work-items on a single subslice. For correct intra-WG synchronization
all its WIs must be executed on the same subslice (to access the same
SLM, where the synchronization primitives are stored). Thus, with SIMD16
and LargeGRF the work-group size must not exceed 512 (PVC example).
So far `maxWorkGroupSize` is taken solely from a DeviceInfo structure
both in `ModuleTranslationUnit::processUnpackedBinary()` and
`ModuleImp::initialize()`. This method does not take kernel parameters
(LargeGRF) into account. It allows to submit a kernel using LargeGRF
with SIMD16 with the work-group size set to 1024. That leads to a hang.
Fix the `.maxWorkGroupSize` computation so that it takes the kernel
parameters into consideration.
Add new (for discrete platforms >= XeHP) and adapt existing tests, fix
cosmetics by the way.
Similar check for OCL:
https://github.com/intel/compute-runtime/blob/master/opencl/source/comma
nd_queue/enqueue_kernel.h#L130
Related-To: NEO-7684
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
This commit switches the device ID logic from the deprecated
to the new one, so that if the user passes a hex value to the -device
parameter, ocloc will use the new implementation in the product config
helper. The change also introduces a fix for setting the values in the
correct order to configure the hwIfno correctly.
Signed-off-by: Daria Hinz daria.hinz@intel.com
Related-To: NEO-7487
This commit is to introduce optimizations in ocloc when building
targets for release and family.
Instead of building fatbinary after all available targets in
the RTL ID table, we introduce optimizations when there is an
acronym available for the platform in the DEVICE table,
we limit to them only.
Signed-off-by: Daria Hinz <daria.hinz@intel.com>
Related-To: NEO-7582