Related-To: NEO-7237
Enable copy on cpu by default.
This commit also changes barrierCounter to bool
barrierCalled
Signed-off-by: Szymon Morek <szymon.morek@intel.com>
Related-To: NEO-7237
If size is small enough, it is more efficient to
perform copy through locked ptr on CPU.
This change also introduces experimental flag to
enable this.
Signed-off-by: Szymon Morek <szymon.morek@intel.com>
Another step towards cleaner callers of
StateBaseAddressHelper<>::programStateBaseAddress.
Export programming state base address into a separate function to
improve code reuse and reduce copy-pasted fragments, which make code
modifications or maintenance more and more difficult over time. Use
specialization for gen-specific variations.
Related-To: NEO-6774
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
Related-To: NEO-6075
Binding table entry count was zeroed even when
ForceBtpPrefetchMode debug flag was enabled
It was only needed for LOAD_BALANCED scenarios, so with recent disabling
of this feature in KMD, it is no longer required.
Signed-off-by: Michal Mrozek <michal.mrozek@intel.com>
When header is included for the first time in translation unit,
then preprocessor simply copy-pastes its content. If we define a
constant in a header file and this constant has internal linkage
then each and every translation unit, which includes this header
will have its own copy of this constant.
C++17 introduces inline variables, which are meant to allow creation
of variables in header files, which do not cause multiple instances.
The inline variable has a single instance when:
- constexpr is used without static (constexpr implicitly implies inline)
- inline is used without static
- inline const is used without static (const does not imply internal linkage
when used with inline)
Signed-off-by: Patryk Wrobel <patryk.wrobel@intel.com>
This change is a baseline for tight control over
when dispatch pipeline state commands and which
pipeline state properties can be changed for a
given hardware platform
Related-To: NEO-5019
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Certain platforms might not require prefetcher to
be disabled in direct submission. This change
provides a way to control that behaviour.
Signed-off-by: Rafal Maziejuk <rafal.maziejuk@intel.com>
Related-To: NEO-7218
This commit adds support for ZEX_NUMBER_OF_CCS flag which can be used
for limiting number of CCS engines
Format is as follows:
ZEX_NUMBER_OF_CCS=RootDeviceIndex:NumberOfCCS;RootDeviceIndex:NumberOfCCS...
i.e. setting Root Device Index 0 to 4 CCS, and Root Device Index 1 To 1 CCS
ZEX_NUMBER_OF_CCS=0:4,1:1
Related-To: NEO-7195
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
The default TG dispatch size can be changed
to a better value based on number of threads in TG or
currently available amount of threads on GPU.
Decision on what TG dispatch size should be are based on
implemented heuristics.
Signed-off-by: Rafal Maziejuk <rafal.maziejuk@intel.com>
Related-To: NEO-6989
Use cl_intel_subgroup_matrix_multiply_accumulate in place
of previous cl_intel_subgroup_matrix_multiply_accumulate_for_PVC
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>