compute-runtime

Commit Graph

Author	SHA1	Message	Date
Mateusz Jablonski	8dd80efbb1	refactor: move getting thread per eu configs to release helper Related-To: HSD-18034098647 Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>	2023-11-21 09:44:32 +01:00
Dunajski, Bartosz	30777d4d4c	feature: use indirect semaphore for 64b values Related-To: NEO-8145 Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>	2023-11-09 16:58:45 +01:00
Compute-Runtime-Validation	fca2159430	Revert "fix: if device hierarchy is flat then getSubDevicesCount return 1u" This reverts commit `cb0bb57f49`. Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>	2023-10-26 15:40:29 +02:00
Baj, Tomasz	cb0bb57f49	fix: if device hierarchy is flat then getSubDevicesCount return 1u Related-To: NEO-9167 Signed-off-by: Baj, Tomasz <tomasz.baj@intel.com>	2023-10-25 15:51:52 +02:00
Mateusz Jablonski	6d2d16d68e	fix: avoid overflow of gpu time stamp in ns Related-To: NEO-8394 Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>	2023-10-24 14:56:15 +02:00
Filip Hazubski	08e92d154f	fix: Add getDefaultDeviceHierarchy call to GfxCoreHelper Added getDefaultDeviceHierarchy call that describes default device hierarchy for a gfx core. Refactored L0 and OCL paths to use this value by default and override this value when user sets ZE_FLAT_DEVICE_HIERARCHY environment variable or ReturnSubDevicesAsApiDevices debug key. Updated ReturnSubDevicesAsApiDevices to force COMPOSITE device hierarchy when set to 0. Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>	2023-10-06 12:32:41 +02:00
Mateusz Jablonski	a033df33ff	fix: remove preferSmallWorkgroupSizeForKernel method Related-To: HSD-18033866078 Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>	2023-09-29 11:55:09 +02:00
Mateusz Jablonski	09044dfbaa	refactor: remove not needed code Related-To: NEO-7527 Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>	2023-09-27 14:35:49 +02:00
Maciej Bielski	97e7cda912	feature: Optimize intra-module kernel ISA allocations So far, there is a separate page allocated for each kernel's ISA within `KernelImmutableData::initialize()`. Apparently the ISA blocks are often much smaller than a 64k page, which leads to poor memory utilization and was even observed to cause the device OOM error if a single module has several keys. Improve the situation by reusing the parent allocation (owned by the module instance) for modules, which kernel ISAs can fit together within a single 64k page. This improves the memory utilization on a single module level. Related-To: NEO-7788 Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>	2023-09-21 13:55:45 +02:00
Mateusz Hoppe	69f5ca6345	feature: bindless addressing - flush state cache after reusing SS slot - when Surface State is reused for new resource, State Cache needs to be invalidated Related-To: NEO-7063 Signed-off-by: Mateusz Hoppe <mateusz.hoppe@intel.com>	2023-09-20 12:53:32 +02:00
Compute-Runtime-Validation	913a926fd4	Revert "feature: Optimize intra-module kernel ISA allocations" This reverts commit `c348831470`. Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>	2023-09-19 14:16:05 +02:00
Maciej Bielski	c348831470	feature: Optimize intra-module kernel ISA allocations So far, there is a separate page allocated for each kernel's ISA within `KernelImmutableData::initialize()`. Apparently the ISA blocks are often much smaller than a 64k page, which leads to poor memory utilization and was even observed to cause the device OOM error if a single module has several keys. Improve the situation by reusing the parent allocation (owned by the module instance) for modules, which kernel ISAs can fit together within a single 64k page. This improves the memory utilization on a single module level. Related-To: NEO-7788 Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>	2023-09-19 12:05:09 +02:00
Mrozek, Michal	d9f938f3db	refactor: remove not needed code Signed-off-by: Mrozek, Michal <michal.mrozek@intel.com>	2023-09-12 14:25:04 +02:00
Jitendra Sharma	9818ef61a5	feature: Report correct GRF register count Based on Large GRF enabled or not, report correct GRF register. Related-To: NEO-6788 Signed-off-by: Jitendra Sharma <jitendra.sharma@intel.com>	2023-09-04 11:42:48 +02:00
Compute-Runtime-Validation	154530ad23	Revert "feature: Report correct GRF register count" This reverts commit `8eb3fe222e`. Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>	2023-09-01 15:12:57 +02:00
Jitendra Sharma	8eb3fe222e	feature: Report correct GRF register count Based on Large GRF enabled or not, report correct GRF register. Related-To: NEO-6788 Signed-off-by: Jitendra Sharma <jitendra.sharma@intel.com>	2023-08-31 18:48:29 +02:00
Mateusz Jablonski	27e459dfd0	fix: add missing cache flushes on MTL and later integrated GPUs hdc pipeline / untyped dataport cache flushes were applied only on discrete GPUs Related-To: GSD-5085 Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>	2023-07-28 19:30:13 +02:00
Kacper Nowak	b908203001	fix: Compile built-ins per release - Preserve releases on CMake level. - Instead of generating builtins per platform, generate them per-release (+ correct naming accordingly). - Stop using revisions in builtin compilation logic path, as they are already embedded in release (device ip). - Remove platform names & revisions from names for generated files (related to builtins). - Remove unnecessary code, refactor ULT logic. Related-To: NEO-7783 Signed-off-by: Kacper Nowak <kacper.nowak@intel.com>	2023-07-11 16:02:36 +02:00
Cencelewska, Katarzyna	0d7aefe66b	fix: Unify logic calculating threads per work group part 1 Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>	2023-06-29 10:43:22 +02:00
Cencelewska, Katarzyna	68d81c82a7	fix: Use proper value about hw local id generations - remove useless flag ForceNumberOfThreadsInGpgpuThreadGroup - add new flag "RemoveRestrictionsOnNumberOfThreadsInGpgpuThreadGroup" to restore old path without restrictions about number of threads in thread group - fix forwarding information about hw local ids generations to calculate numOfThreadsInThreadGroup correctly Related-To: NEO-7952, NEO-7982 Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>	2023-06-26 16:35:42 +02:00
Cencelewska, Katarzyna	7cb3278eb3	fix: add function to calculate number of threads per tg Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>	2023-06-13 14:02:24 +02:00
Cencelewska, Katarzyna	d2436a8231	fix: add limitations for setting gmm flag Cacheable - move isCachingOnCpuAvailable to product helper - isCachingOnCpuAvailable should return false on mtl - if wsl, skip checking method from product helper Related-To: NEO-7194 Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>	2023-05-30 17:04:57 +02:00
Mateusz Jablonski	61055478d4	fix: adjust scope of disable L3 for debug WA Related-To: HSD-1609398399 Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>	2023-05-30 14:23:16 +02:00
Filip Hazubski	d234bc970d	refactor: Move getMaxNumSamplers function to ProductHelper Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>	2023-05-18 09:25:07 +02:00
Cencelewska, Katarzyna	5f22e9eaca	fix: don't set Cacheable on xe_hp and later Related-To: NEO-7194 Signed-off-by: Cencelewska, Katarzyna <katarzyna.cencelewska@intel.com>	2023-05-18 09:17:32 +02:00
Milczarek, Slawomir	66eb1c9c0a	refactor: Add helpers to control kmd migration support on PVC platform This commit keeps KMD migration still disabled by default on PVC platform. Related-To: NEO-6465 Signed-off-by: Milczarek, Slawomir <slawomir.milczarek@intel.com>	2023-05-15 13:51:19 +02:00
Fabian Zwolinski	cbce863dc2	refactor: Rename member variables to camelCase 3/n Additionally enable clang-tidy check for member variables Signed-off-by: Fabian Zwolinski <fabian.zwolinski@intel.com>	2023-04-28 16:01:14 +02:00
Zbigniew Zdanowicz	4c7bc2ca98	[feature, perf] add algorithm to chain command buffers in container This feature is part of performance improvement to dispatch and start command buffers as primary batch buffers. When exhausted command buffer is closed, then reserve exact space for chained batch buffer start and bind it to the next command buffer. When closing command buffer, then save ending pointer and reserve aligned space. Related-To: NEO-7807 Signed-off-by: Zbigniew Zdanowicz <zbigniew.zdanowicz@intel.com>	2023-04-05 15:49:01 +02:00
Rafal Maziejuk	b9828b543e	feature: adjust maxWorkGroupSize value Related-To: NEO-7357 Signed-off-by: Rafal Maziejuk <rafal.maziejuk@intel.com>	2023-03-28 15:19:52 +02:00
Mateusz Jablonski	5610eae710	refactor: fix typo Barrierl -> Barrier Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>	2023-03-21 15:58:24 +01:00
Filip Hazubski	0bee81c0c0	refactor: Move isLinearStoragePreferred function from gfx to product helper Signed-off-by: Filip Hazubski <filip.hazubski@intel.com>	2023-03-15 18:51:59 +01:00
Mateusz Jablonski	340f932ca2	refactor: move GfxCoreHelper::getExtensions to CompilerProductHelper Related-To: NEO-7800 Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>	2023-03-14 13:56:19 +01:00
Raiyan Latif	d5c909c9f9	Fix calculation of number of Ray-Tracing stacks MaxDualSubSlicesSupported is filled inside GT_SYSTEM_INFO structure when querying the KMD appropriately with the number of enabled DualSubSlices. However we need to find the highest index of the last enabled DualSubSlice. For proper allocation of thread scratch space, allocation has to be done based on native die config (including unfused or non-enabled DualSubSlices). Since HW doesn't provide us a way to know the exact native die config, in SW we need to allocate RT stacks with enough size based on the last used DualSubSlice. The IsDynamicallyPopulated field in GT_SYSTEM_INFO is used to indicate if system details are populated either via Fuse reg. or hard-coded. Based on this field's value, we calculate the numRtStacks appropriately. Related-To: LOCI-3954 Signed-off-by: Raiyan Latif <raiyan.latif@intel.com>	2023-03-13 10:48:10 +01:00
Compute-Runtime-Validation	678e47de2d	Revert "Adjust maxWorkGroupSize value" This reverts commit `f7685a93e4`. Signed-off-by: Compute-Runtime-Validation <compute-runtime-validation@intel.com>	2023-02-21 14:45:36 +01:00
Rafal Maziejuk	f7685a93e4	Adjust maxWorkGroupSize value Related-To: NEO-7357 Signed-off-by: Rafal Maziejuk <rafal.maziejuk@intel.com>	2023-02-17 09:34:15 +01:00
Maciej Bielski	2778043d67	fix(l0): check for largeGRF when computing maxWorkGroupSize Sizing context (PVC): When using LargeGRF (a.k.a GRF256) there are only 4 HW threads per EU (instead of default 8). Together with SIMD16 that means that there can be max 64 work-items per EU. With 8 EU per subslice this gives 512 work-items on a single subslice. For correct intra-WG synchronization all its WIs must be executed on the same subslice (to access the same SLM, where the synchronization primitives are stored). Thus, with SIMD16 and LargeGRF the work-group size must not exceed 512 (PVC example). So far `maxWorkGroupSize` is taken solely from a DeviceInfo structure both in `ModuleTranslationUnit::processUnpackedBinary()` and `ModuleImp::initialize()`. This method does not take kernel parameters (LargeGRF) into account. It allows to submit a kernel using LargeGRF with SIMD16 with the work-group size set to 1024. That leads to a hang. Fix the `.maxWorkGroupSize` computation so that it takes the kernel parameters into consideration. Add new (for discrete platforms >= XeHP) and adapt existing tests, fix cosmetics by the way. Similar check for OCL: https://github.com/intel/compute-runtime/blob/master/opencl/source/command_queue/enqueue_kernel.h#L130 Related-To: NEO-7684 Signed-off-by: Maciej Bielski <maciej.bielski@intel.com>	2023-02-08 11:20:52 +01:00
Dominik Dabek	8da362afae	fix(l0): do not memcpy on cpu if need unlock ptr Do not use cpu memory copy on windows if need to unlock locked ptr. Related-To: NEO-7553 Signed-off-by: Dominik Dabek <dominik.dabek@intel.com>	2023-02-02 10:41:39 +01:00
Kamil Kopryk	2484c7ceb2	refactor: rename hw_helper files to gfx_core_helper files Related-To: NEO-6853 Signed-off-by: Kamil Kopryk <kamil.kopryk@intel.com>	2023-02-01 19:37:51 +01:00

38 Commits
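
The sizing argument in commit 2778043d67 (LargeGRF halves the HW threads per EU, so the largest safe work-group shrinks) can be checked with a few lines of arithmetic. The sketch below is illustrative only: `maxWorkItemsPerSubslice` is a hypothetical helper, not a function from compute-runtime, and the numbers are the PVC figures quoted in that commit message.

```cpp
#include <cstdint>

// Hypothetical helper (not compute-runtime API): upper bound on the number of
// work-items that can be resident on a single subslice, and therefore on the
// size of a work-group that must synchronize through that subslice's SLM.
constexpr std::uint32_t maxWorkItemsPerSubslice(std::uint32_t threadsPerEu,
                                                std::uint32_t simdWidth,
                                                std::uint32_t euPerSubslice) {
    return threadsPerEu * simdWidth * euPerSubslice;
}

// PVC figures from the commit message: 8 EUs per subslice, SIMD16.
// LargeGRF leaves 4 HW threads per EU: 4 * 16 * 8 = 512 work-items.
static_assert(maxWorkItemsPerSubslice(4, 16, 8) == 512, "LargeGRF, SIMD16");
// Default GRF mode has 8 HW threads per EU: 8 * 16 * 8 = 1024 work-items.
static_assert(maxWorkItemsPerSubslice(8, 16, 8) == 1024, "default GRF, SIMD16");
```

Under these assumptions, a work-group of 1024 work-items fits in default GRF mode but exceeds the 512 limit once LargeGRF is enabled, which matches the hang scenario the commit describes.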