- check releaseHelper support when selecting bindless mode, if not
disabled, prefer bindless mode in L0 API
- bindless mode can be forced with DebugVariable: UseBindlessMode
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Use debug flag PrintKernelDispatchParameters to print params used in
thread group dispatch size heuristic when encoding kernel dispatch.
Related-To: NEO-6989
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
- SPECIAL_SSH is used for debug surface SurfaceState which must be
located at bindless offset zero
- limit size of external front window
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
So far, there is a separate page allocated for each kernel's ISA within
`KernelImmutableData::initialize()`. Apparently the ISA blocks are often
much smaller than a 64k page, which leads to poor memory utilization and
was even observed to cause the device OOM error if a single module has
several keys.
Improve the situation by reusing the parent allocation (owned by the
module instance) for modules, which kernel ISAs can fit together within
a single 64k page. This improves the memory utilization on a single
module level.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
So far, there is a separate page allocated for each kernel's ISA within
`KernelImmutableData::initialize()`. Apparently the ISA blocks are often
much smaller than a 64k page, which leads to poor memory utilization and
was even observed to cause the device OOM error if a single module has
several keys.
Improve the situation by reusing the parent allocation (owned by the
module instance) for modules, which kernel ISAs can fit together within
a single 64k page. This improves the memory utilization on a single
module level.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
- program global bindless ssba when external allocator used (
UseExternalAllocatorForSshAndDsh)
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
denorm support is controlled by IGC, we should just set zero by default
Related-To: NEO-8059
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
- allow bindless kernels to execute
- bindless addressing kernels are using private heaps mode
- do not differentiate bindful and bindless surface state base addresses
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
denorm support is controlled by IGC, we should just set zero by default
Related-To: NEO-8059
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
Related-To: NEO-6075
Ngen binaries contain stateful information, however they are
not used in isa on Pvc. Therefore, we can just ignore them.
- getGlobalBindlessHeapConfiguration() should be used to choose global
alloctor for SSH
- remove not needed and incorrect unit tests
- remove not needed branches
- bindless mode controls bindless compilation only
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
- when debugging is enabled, assert() in gpu kernel will trigger SW
exception
Related-To: NEO-5753
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Related-To: LOCI-4332
- Signal non-timestamp Walkers with in-order CL value
- Event host synchronization based on CL signal value
Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
adjust thread group dispatch size on pvc if chosen size does not evenly
divide dimension
this is to avoid leftover thread groups
Related-To: NEO-7927
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
ensure default hw ip version matches the value from helper
change pvc ult execution to revision 3
Related-To: NEO-7738
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
ensure default hw ip version matches the value from helper
change pvc ult execution to revision 3
Related-To: NEO-7738
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
This change is part of performance feature to start command list batch buffers
as primary.
Implicit Scaling sometimes require to jump over control section and these jumps
must maintain the same level of batch buffer as the whole command list.
Related-To: NEO-7807
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
- reduce number of dummy blits where are not needed
- track if dummy blit required in cmdlist
Related-To: NEO-7450
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
- share same code between csr and cmd container to get default heap size
- share handling of debug flag to change heap size
- share platform level surface heap size between csr and command list
- refactor heap size files
- put heap size constant and function into namespace
- command list surface heap size increased to 2MB for xehp+ to match csr
- command list increased surface heap size only for sba tracking
- sba tracking heap consumption increased due to different reset policy
Related-To: NEO-5055
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
to guarantee that all subblt got complete for previous copy
affect xe hpg
temporary changes under flag ForceDummyBlitWa
Related-To: NEO-7450
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
- This change gets level one cache policy from cached values instead
of calling virtual methods
Related-To: NEO-5055
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>