Enable eviction of CPU side USM allocation for UMD migrations on Windows.
Related-To: NEO-8015
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
Enable eviction of CPU side USM allocation for UMD migrations on Windows.
Related-To: NEO-8015
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
Related-To: LOCI-4615
- Added Support for users to set ZE_FLAT_DEVICE_HIERARCHY to either FLAT
or COMPOSITE to change how devices are returned in zeDeviceGet and
clGetDeviceIDs.
- COMPOSITE is default behavior that exists today.
- FLAT returns all sub devices which have no sub devices and all root
devices that have no sub devices in zeDeviceGet ie with all devices
flattened out in order.
- Added zeDeviceGetRootDevice for one to retrieve the Root Device for
any SubDevice.
Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
By default prefer allocating memory first by KMD, instead of malloc first.
By default prefer not caching allocations on MTL devices. This results
in allocations being handled with non-coherent pat index.
For integrated devices when caching is not preferred do not allow
direct memory access in CPU domain. For map/unmap operations create
a dedicated memory allocation for CPU access, instead of accessing it
directly, reusing the same logic as when mapping/unmapping local memory.
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
If waitForBarrier is not passed outEvent then do
dcFlush on the next synchronize call.
Related-To: NEO-8147
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
Instances returned by `getAllocationsVector()` in some cases cannot be
freed (in the `malloc/new` sense) until the `drain()` function invokes
`allocInUse()` on them. Plus, the `chunksToFree` container operates on
pairs `{offset, size}`, not pointers, so such pair cannot be used to
release allocations either.
Provide an optional callback, which can be implemented by the custom
pool derived from `AbstractBuffersPool`. This callback can be used, for
example, to perform actual release of an allocation related to the
currently processed chunk.
Additionally, provide the `drain()` and `tryFreeFromPoolBuffer()`
functions with pool-independent versions and keep the previous versions
as defaults (for allocators with a single pool). The new versions allow
reusing the code for cases when allocator has multiple pools.
In both cases, there was no such needs so far but it arose when working
on `IsaBuffersAllocator`. The latter is coming with future commits, but
the shared code modifications are extracted as an independent step.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
If workgroup dimension x is 1, use y to ajust for divisible by dispatch
size.
Related-To: NEO-7927, GSD-5417
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
The new classes SysmanKmdInterface, SysmanKmdInterfaceI915 and
SysmanKmdInterfaceXe have been introduced.
A map is maintained in the SysmanKmdInterfaceI915 and
SysmanKmdInterfaceXe class for the sysfs file names.
The access specifier of the function getDrmVersion has been changed from
protected to public so as to use it in the sysman code. This is required
for the SysmanKmdInterface pointer to point to the
SysmanKmdInterfaceI915 and SysmanKmdInterfaceXe accordingly.
The ULTs have been added for the new sysfs file path corresponding to
the i915 and the Xe driver.
Related-To: LOCI-4399
Signed-off-by: Bari, Pratik <pratik.bari@intel.com>
Added if conditions to enable useChunking flag by checking
with ExecutionEnvironment::isDebuggingEnabled.
Related-To: NEO-8164
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
- Preserve releases on CMake level.
- Instead of generating builtins per platform, generate them per-release
(+ correct naming accordingly).
- Stop using revisions in builtin compilation logic path, as they are
already embedded in release (device ip).
- Remove platform names & revisions from names for generated files
(related to builtins).
- Remove unnecessary code, refactor ULT logic.
Related-To: NEO-7783
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
The previous name "pageSize2Mb" defined in
shared/source/helpers/constant.h is inconsistent to other variable,
i.e. pageSize64k.
Furthermore, it's a bit misleading because the page size is defined in
Megabytes (MB), not in Megabits (Mb).
Related-to: NEO-7695
Signed-off-by: Young Jin Yoon <young.jin.yoon@intel.com>
Related-to: NEO-7695
New debug keys added:
EnableBOChunking is now a mask
0 = no chunking (default).
1 = shared allocations only
2 = device allocations only
3 = shared and device allocations
MinimalAllocationSizeForChunking sets the minimum allocation
size to apply chunking. Default is 2MB.
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
- also use helper when checking that is simd1 to have same flow
Related-To: NEO-8087
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
- Adobe Photoshop
- Adobe Premiere Pro
- Adobe After Effect
use RCS as a default engine
Related-To: NEO-8049
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
- program global bindless ssba when external allocator used (
UseExternalAllocatorForSshAndDsh)
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
- all bos from Module must have requireImmediateBinding
flag set
- this change fixes hang in debugger - where MODULE LOAD event
was not sent
Resolves: NEO-8121
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Instances returned by `getAllocationsVector()` in some cases cannot be
freed (in the `malloc/new` sense) until the `drain()` function invokes
`allocInUse()` on them. Plus, the `chunksToFree` container operates on
pairs `{offset, size}`, not pointers, so such pair cannot be used to
release allocations either.
Provide an optional callback, which can be implemented by the custom
pool derived from `AbstractBuffersPool`. This callback can be used, for
example, to perform actual release of an allocation related to the
currently processed chunk.
Additionally, provide the `drain()` and `tryFreeFromPoolBuffer()`
functions with pool-independent versions and keep the previous versions
as defaults (for allocators with a single pool). The new versions allow
reusing the code for cases when allocator has multiple pools.
In both cases, there was no such needs so far but it arose when working
on `IsaBuffersAllocator`. The latter is coming with future commits, but
the shared code modifications are extracted as an independent step.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
Due to this flag was not properly handled on Windows, command buffer
allocations were never reused in immediate command lists in case of
host secondary buffers. This lead to huge host memory consumption
and performance degradation
Related-To: NEO-8072
Signed-off-by: Igor Venevtsev <igor.venevtsev@intel.com>
- use calculateNumThreadsPerThreadGroup instead of getThreadsPerWG to
have same flow and proper values of threads per work groups
Related-To: NEO-8087
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
- global stateless mode should save surface state base address
- correctly retrieve scratch offset for front end programming
- do not override general base address value and use indirect heap property
Related-To: NEO-7808
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
- use calculateNumThreadsPerThreadGroup instead of getThreadsPerWG to
have same flow and proper values of threads per work groups
Related-To: NEO-8087
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
With direct submission disabled this resulted in waitTimeout long enough
that kmdWait fallback was rarely used.
This caused more CPU spin time.
Related-To: GSD-3612
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
When upstream ioctl helper is created it will try to create small
allocation, adding I915_GEM_CREATE_EXT_SET_PAT extension. If it
succeeds, for all resources with valid pat index value it will then
explicitly program pat index value with gem create ext call.
PrintBOCreateDestroyResult value can be used to:
- print whether the set pat extension is supported by the kernel, when
ioctl helper is created
- print whether set pat extension was added for a given gem create ext
call and what pat index value was programmed
Resolves: NEO-7896
Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>
For image view mapped directly to UV plane,
the dimensions should 2 times smaller than
dimensions of the source image.
(1 raw UV pair maps to 2x2 block of original image)
Related-To: NEO-7936
Signed-off-by: Jaroslaw Chodor <jaroslaw.chodor@intel.com>
- Abstracts product helpers logic for uuid
- Add UUID retrieval for Linux for Sysman via zesInit path
Related-To: LOCI-4137
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@intel.com>
- remove useless flag ForceNumberOfThreadsInGpgpuThreadGroup
- add new flag "RemoveRestrictionsOnNumberOfThreadsInGpgpuThreadGroup"
to restore old path without restrictions about number of threads in
thread group
- fix forwarding information about hw local ids generations to
calculate numOfThreadsInThreadGroup correctly
Related-To: NEO-7952, NEO-7982
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
denorm support is controlled by IGC, we should just set zero by default
Related-To: NEO-8059
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
- Value correction: IntelGTSectionType::ProductConfig to 6, add new type
IntelGTSectionType::vISAAbiVersion = 5 - currently ignored by the
runtime
- For zebin manipulator: allow to extract PRODUCT_FAMILY from AOT
productConfig - required by IGA wrapper for binary encoding/decoding +
add tests
- Bump ZEInfo version to the latest: 1.32
Related-To: IGC-6300
Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>
The Memory Info object is used in the getState function for memory.
Some of the ULTS in the memory modules has been modified.
A function to return the sysfs nodes for the Memory address range has
been added in the IoctlHelper class corresponding to the XE and i915
driver.
Related-To: LOCI-4397
Signed-off-by: Bari, Pratik <pratik.bari@intel.com>
Set no preference for KMD migrated shared allocation and rely on ACC to migrate.
Related-To: NEO-6839
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
- allow bindless kernels to execute
- bindless addressing kernels are using private heaps mode
- do not differentiate bindful and bindless surface state base addresses
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
- by default ZE_ENABLE_PCI_ID_DEVICE_ORDER is disabled
- by default devices are sorted by type (discrete first), then by pci order
- when ZE_ENABLE_PCI_ID_DEVICE_ORDER is enabled, devices are sorted by pci id
Related-To: LOCI-4520
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
Use flushStamp=taskCount when passed flushStamp==0.
This will cause driver to busy wait for a short while before falling
back to use kmd notify.
Related-To: GSD-3612
Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>
When creating shared USM, currently default root device index
is used when accessing memoryManager.
This change fixes this issue, by using device provided by caller.
In case device is not provided, then default root device index
could be used.
Related-To: LOCI-4474
Signed-off-by: Jitendra Sharma <jitendra.sharma@intel.com>
denorm support is controlled by IGC, we should just set zero by default
Related-To: NEO-8059
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
Currently the whole code resides within the opencl/ tree, but the
mechanism is meant to be reused in L0 for kernel-ISA allocations
optimization (further work).
This commit is a preparation step, which extracts the generic mechanism
and moves the extracted part under the shared/ tree.
Related-To: NEO-7788
Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>
remove method to removing duplicates from StackVec as the method
implicitly sorted the vector
Related-To: GSD-4692
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
Related-To: NEO-6075
Ngen binaries contain stateful information, however they are
not used in isa on Pvc. Therefore, we can just ignore them.
When receiving EAGAIN, UMD should retry the submission again.
On scenarios where KMD runs out of ring-space, an error
should be propagated up to the user. Those scenarios are
more rare, but still we will need KMD support in the future
to differentiate those two cases: retry and propagate error to
user, since right now, KMD returns the same error code.
Related-To: NEO-8040
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
Adding properties to selectively copy properties for surface state,
dynamic state and binding table base addresses.
Related-To: NEO-7808
Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>
Related-To: LOCI-4381
- Enabled support for customers to use full Virtual reservation range
with multiple physical mappings with additional allocations implicitly
included in residency.
- Buffer Surface state size extended for first allocation to stretch to
the bufferSize requested.
Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
- getGlobalBindlessHeapConfiguration() should be used to choose global
alloctor for SSH
- remove not needed and incorrect unit tests
- remove not needed branches
- bindless mode controls bindless compilation only
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Read if support for chunking is available in the KMD.
If available, KMD will create a BO with 1 or more chunks,
depending on the chunk size selected.
Related-To: NEO-7695
Sync to
https://github.com/intel-gpu/drm-uapi-helper/releases/tag/v2.0-rc18
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
Signed-off-by: John Falkowski <john.falkowski@intel.com>
On Warm reset, With default bar size set by bios, VF bar
allocation is getting failed because of bug in pci driver
which impacts SRIOV functionality.
Resize VF bar size for succesful allocation of VF bar
post warm reset.
Related-To: LOCI-4481
Signed-off-by: Bellekallu Rajkiran <bellekallu.rajkiran@intel.com>
- when debugging is enabled, assert() in gpu kernel will trigger SW
exception
Related-To: NEO-5753
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
- new debug key EnableDeviceStateVerification to check device state not
ony in debug mode
Related-To: NEO-7669
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
- when bindless adressing is set in zeinfo, kernel attributes should
reflect that.
Related-To: NEO-7063
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
- move isCachingOnCpuAvailable to product helper
- isCachingOnCpuAvailable should return false on mtl
- if wsl, skip checking method from product helper
Related-To: NEO-7194
Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>
- IGC stores invalid string index at the end of printf buffer,
invalid index is equal to (int32_t)-1 extended to (char*)
- printKernelOutput() should not parse buffer after invalid index was
found to avoid invalid string pointer dereference
Related-To: NEO-5753
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Have makeResident return error to the caller, instead of always
SUCCESS. This will allow interfaces like zeContextMakeMemoryResident
to fail properly.
Additionally, change the parsing of MemoryOperationsStatus from
ZE_RESULT_ERROR_OUT_OF_HOST_MEMORY to
ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY, since when making resources
resident, it is the device running out of memory, instead of the
host.
Related-To: LOCI-4443
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
Related-To: LOCI-4176
- Given a Base Pointer passed into Get Peer Allocation, then the base
pointer is used in the map of the new allocation to the virtual memory.
- Enables users to use the same pointer for all devices in Peer To Peer.
- Currently unsupported on reserved memory due to mapped and exec
resiedency of Virtual addresses.
Signed-off-by: Neil R Spruit <neil.r.spruit@intel.com>
- vm_bind with user fence updates fence value independently for every
VM hence with per-context VMs, every context needs its unique fence
address. This prevents 2 contexts from updating value possibly
writing lower value than the one that was already stored
Resolves: NEO-8004
Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>
Get ipVersion from productHelper function on xe and upstream.
On prelim first try to query ipVersion from kmd,
if it fails, get ipVersion from productHelper function.
Related-To: NEO-7786
Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>
This fixes illegal memory accesses by the job submitted to the GuC.
Also some unit tests are added to harness the vmBind operation.
Related-To: NEO-7996
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
when flag disabled, gmm flag Cacheable won't set on xe_hp and later
Related-To: NEO-7194
Signed-off-by: Katarzyna Cencelewska <katarzyna.cencelewska@intel.com>
- add debug flag EnableCpuCacheForResources to be able to allow coherency when
resources could be cacheable
Resolves: NEO-7194
Signed-off-by: Katarzyna Cencelewska <katarzyna.cencelewska@intel.com>
This commit keeps KMD migration still disabled by default on PVC platform.
Related-To: NEO-6465
Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>
This patch add new environment variables to control compiler cache.
Works as follow: If persistent cache is set driver check if NEO_CACHE_DIR
is set. If not then driver checks XDG_CACHE_HOME - If exists
then driver create neo_compiler_cache folder, if
not then driver checks HOME directory. If each NEO_CACHE_DIR,
XDG_CACHE_HOME and HOME are not set then compiler cache is disabled.
Current support is for Linux only.
Signed-off-by: Diedrich, Kamil <kamil.diedrich@intel.com>
Related-To: NEO-4262
- allocateGraphicsMemoryUsingKmdAndMapItToCpuVA in case of no compression
- allocate32BitGraphicsMemoryImpl in case of allocate by KMD
remove redundant ctor of StorageInfo class
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>