- program surface states for redescribed images correctly. Image copy
to/from memory are using redescribed surface states,
- refactor state base address programming - program address and size
together, set max size at the beginning due to lack of Enable flag
- set GpuBase in WddmAllocation when external heap is used
- return max ssh required size from kernelInfo or based on stateful args
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
- also use helper when checking that is simd1 to have same flow
Related-To: NEO-8087
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
- program global bindless ssba when external allocator used (
UseExternalAllocatorForSshAndDsh)
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Due to this flag was not properly handled on Windows, command buffer
allocations were never reused in immediate command lists in case of
host secondary buffers. This lead to huge host memory consumption
and performance degradation
Related-To: NEO-8072
Signed-off-by: Igor Venevtsev <igor.venevtsev@intel.com>
- use calculateNumThreadsPerThreadGroup instead of getThreadsPerWG to
have same flow and proper values of threads per work groups
Related-To: NEO-8087
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
- use calculateNumThreadsPerThreadGroup instead of getThreadsPerWG to
have same flow and proper values of threads per work groups
Related-To: NEO-8087
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
denorm support is controlled by IGC, we should just set zero by default
Related-To: NEO-8059
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
- allow bindless kernels to execute
- bindless addressing kernels are using private heaps mode
- do not differentiate bindful and bindless surface state base addresses
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
denorm support is controlled by IGC, we should just set zero by default
Related-To: NEO-8059
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
Related-To: NEO-6075
Ngen binaries contain stateful information, however they are
not used in isa on Pvc. Therefore, we can just ignore them.
- getGlobalBindlessHeapConfiguration() should be used to choose global
alloctor for SSH
- remove not needed and incorrect unit tests
- remove not needed branches
- bindless mode controls bindless compilation only
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
- when debugging is enabled, assert() in gpu kernel will trigger SW
exception
Related-To: NEO-5753
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Related-To: LOCI-4332
- Signal non-timestamp Walkers with in-order CL value
- Event host synchronization based on CL signal value
Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
adjust thread group dispatch size on pvc if chosen size does not evenly
divide dimension
this is to avoid leftover thread groups
Related-To: NEO-7927
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
This change is part of performance feature to start command list batch buffers
as primary.
Implicit Scaling sometimes require to jump over control section and these jumps
must maintain the same level of batch buffer as the whole command list.
Related-To: NEO-7807
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
This change adds space reservation in command list for returning batch buffer
start hw command.
Primary batch buffer can be run from direct submission or from KMD call and
must be aligned to required size.
Ending patch for batch buffer start must be in the last command buffer of the
command list.
Related-To: NEO-7807
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
This feature is part of performance improvement to dispatch and start
command buffers as primary batch buffers.
When exhausted command buffer is closed, then reserve exact space for chained
batch buffer start and bind it to the next command buffer.
When closing command buffer, then save ending pointer and
reserve aligned space.
Related-To: NEO-7807
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
- Fix potential memleak in case ASSERT returns false and test gets
aborted
- Remove not needed function argument
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
- share same code between csr and cmd container to get default heap size
- share handling of debug flag to change heap size
- share platform level surface heap size between csr and command list
- refactor heap size files
- put heap size constant and function into namespace
- command list surface heap size increased to 2MB for xehp+ to match csr
- command list increased surface heap size only for sba tracking
- sba tracking heap consumption increased due to different reset policy
Related-To: NEO-5055
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
- this change uses unified approach to reuse command buffer
- unified method takes all available space when reseting stream
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
- state base address tracking allows to reuse base address state
- surface state slots can be reused after sba reload or cache flush
- to avoid cache flush after each reset, then allow to gradualy consume heaps
- only until natural heap depletion and then dispatch reload of sba state
Related-To : NEO-5055
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
to guarantee that all subblt got complete for previous copy
affect xe hpg
temporary changes under flag ForceDummyBlitWa
Related-To: NEO-7450
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
to guarantee that all subblt got complete for previous copy
affect xe hpg
Related-To: NEO-7450
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
- This change gets level one cache policy from cached values instead
of calling virtual methods
Related-To: NEO-5055
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
This is small optimization to replace virtual call and retrieved struct with
cached value.
Related-To: NEO-5055
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
- updates coming from regular list are updated in csr last sent variables
- all per context and per kernel transitions kept in single place
- state updates from intermediate to regular are set in csr properties
- global atomics support duplicates removed
Related-To: NEO-5055
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
- add estimation parameter for interface descriptor data count
- add to the heap estimation alignment parameter for dynamic and surface heaps
- extend encode interface and implementations to allow child heaps
Related-To: NEO-5055
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Observed about 50MB reduction in overall binaries size (directory build))
when building all targets
with MSVC (Visual Studio 2022 17.3.0 preview 6)
using Debug 64 configuration.
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>