- For calculating number of threads per workgroup, for SIMD 1, return
local work size (each software thread should be mapped into a whole hardware
thread).
- Correct logic of calculating space for per thread data for SIMD 1.
- Minor: unit tests refactor.
- Corrected naming.
Related-To: NEO-8261
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
Add support for different timestamp packet counts per gfx family.
Change all packet counts to 1 except for xe-hpc.
Related-To: NEO-8154
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
CmdList can be released before Event. In this case, GfxAllocation
destruction must be deferred.
Related-To: NEO-7966
Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
- this change handles level zero immediate command lists on copy engine
- monitor fence will be dispatched for blocking calls
- asynchronous mode will dispatch monitor fence only on host synchronization
Related-To: NEO-8395
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
In L0 its not possible to track objects relations. For example CmdList
may be removed before Event.
In such case, Event needs to safely skip unregister call, without
accessing CmdList/CmdQueue object.
Related-To: NEO-8884
Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>
So far, there is a separate page allocated for each kernel's ISA within
`KernelImmutableData::initialize()`. Apparently the ISA blocks are often
much smaller than a 64k page, which leads to poor memory utilization and
was even observed to cause the device OOM error if a single module has
several keys.
Improve the situation by reusing the parent allocation (owned by the
module instance) for modules, which kernel ISAs can fit together within
a single 64k page. This improves the memory utilization on a single
module level.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
value depends on CCS count:
- single CCS mode (default) - 50% available
- two CCS mode - 25% available
- four CCS mode - 12.5% available
Related-To: NEO-8377
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
- when Surface State is reused for new resource, State Cache needs to be
invalidated
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
So far, there is a separate page allocated for each kernel's ISA within
`KernelImmutableData::initialize()`. Apparently the ISA blocks are often
much smaller than a 64k page, which leads to poor memory utilization and
was even observed to cause the device OOM error if a single module has
several keys.
Improve the situation by reusing the parent allocation (owned by the
module instance) for modules, which kernel ISAs can fit together within
a single 64k page. This improves the memory utilization on a single
module level.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
- this method allocates System Memory
- argument is not needed - ExternalHeap is selected inside this function
- remove unneeded ults
- allocate memory in Device Pool for external heap allocation in
OsAgnosticMemoryManager
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
value depends on CCS count:
- single CCS mode (default) - no limitations
- two CCS mode - 25% available
- four CCS mode - 12.5% available
Related-To: NEO-8377
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>