The old handleStoreInst/loadEltsFromVecAlloca implementations assume a
1:1 lane mapping and equal sizes between the user value and the
promoted vector element type. This is insufficient for mixed widths
(e.g. <4 x i8> vs. <... x i32>), for cross-lane accesses created by the
new byte-offset GEP lowering, and for pointer values under opaque
pointers (bitcasts between pointers and non-pointers are illegal).
With the changes:
1) Stores (handleStoreInst and storeEltsToVecAlloca) normalize the
source (scalar or vector) to a single integer of NeedBits = N * DstBits
using ptrtoint/bitcast, split that integer into K = ceil(NeedBits /
SrcBits) chunks, bitcast/inttoptr each chunk back to the promoted lane
type, and insert the chunks into K consecutive lanes starting at the
scalarized index (both paths are sketched after this list).
2) Loads (handleLoadInst and loadEltsFromVecAlloca) read K promoted
lanes starting at the scalarized index, convert each lane to iSrcBits,
pack the lanes into i(K*SrcBits), truncate to i(NeedBits), and expand
the result to the requested scalar or <N x DstScalarTy>, using inttoptr
for pointer results.
The simple (old) path is still present: if SrcBits == DstBits, we just
emit extractelement with casts where needed.
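For example, loading a float from a promoted <8 x i32> alloca at lane
%idx could simply become (names hypothetical):
%vec = load <8 x i32>, ptr %promoted
%lane = extractelement <8 x i32> %vec, i32 %idx
%res = bitcast i32 %lane to float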
All paths perform a single load of the promoted vector followed by
extractelement/insertelement, and, in the case of stores, only a single
store back.
With these changes, the LLVM IR emitted from LowerGEPForPrivMem looks
different. Instead of plain bitcasts, there are now ptrtoint/inttoptr
instructions and additional packing/splitting logic. For the simple
(old) load path, the new implementation should emit essentially the
same pattern (potentially skipping bitcasts). The additional
integer/bitcast instruction sequences should be easily foldable. Memory
traffic is unchanged (still one vector load/store). Overall register
pressure should be similar, and the pass still eliminates GEPs and
avoids private/scratch accesses.
zeinfo now records whether a kernel/function has printf calls or
function pointer calls. This allows neo to create the printf_buffer
only when it is actually used.
In opaque pointer mode, GEPs that index into globals often have a
different shape. The SimplifyConstant pass assumed two-index GEPs
(0, index) and directly used the second operand as an element index.
However, flat aggregates can also be addressed with single-index GEPs.
See the two examples below from SYCL_CTS-math_builtin_float_double_1_ocl
run in typed and opaque pointer mode.
Two-index GEP example:
%130 = getelementptr inbounds [2 x i32], [2 x i32] addrspace(2)* @__stgamma_ep_nofp64__ones, i64 0, i64 %129
%131 = bitcast i32 addrspace(2)* %130 to float addrspace(2)*
%132 = load float, float addrspace(2)* %131, align 4, !tbaa !5163, !noalias !5409
Single-index GEP example:
%103 = getelementptr inbounds float, ptr addrspace(2) @__stgamma_ep_nofp64__ones, i64 %102
%104 = load float, ptr addrspace(2) %103, align 4, !tbaa !5163, !noalias !5409
This patch changes the pass to always use the last GEP index as the
element selector. This works because the pass only transforms top-level
arrays of scalars/vectors. In these cases, the element being loaded is
always designated by the final GEP index, whether earlier indices
select into the aggregate (typed pointer mode) or a single index is
used (opaque pointer mode).
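For a two-element array such as the one above, the load can then be
folded into a select keyed on that final index. A minimal sketch,
assuming the pass lowers a two-element array to a select, with made-up
element values (the real contents of @__stgamma_ep_nofp64__ones are
not shown):
%is.first = icmp eq i64 %102, 0
%val = select i1 %is.first, float 1.000000e+00, float -1.000000e+00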
Do not rely on bitcasts when deciding whether an index adjustment is
necessary: in opaque pointer mode, types can change between
instructions without bitcasts.
Compute workloads add the following implicit arguments:
* payloadHeader - 8 x i32 packing global_id_offset (3 x i32),
local_size (3 x i32), and 2 x i32 reserved.
* enqueued_local_size - 3 x i32
Most of the time only enqueued_local_size is used, leaving local_size
unnecessary. As a result, payloadHeader carries 20 unused bytes.
This commit enables the short payload header on the PVC platform.
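For reference, the two layouts as LLVM struct types (type names are
hypothetical, and the short form is assumed to keep only
global_id_offset, dropping the unused local_size and reserved fields):
%payloadHeader.full = type { [3 x i32], [3 x i32], [2 x i32] }
; ^ global_id_offset, local_size, reserved (32 bytes total)
%payloadHeader.short = type { [3 x i32] } ; global_id_offset (12 bytes)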
When verifying whether an operand access exceeds the declared variable
size, we need special handling for the madw instruction, as it writes
both the low and high results to GRFs.
This change addresses the handling of predicated stores of sub-DW
values with non-uniform stored values. The predicate alone is not
enough to calculate the correct offset, so we use `EMASK & Predicate`
to determine it.
When LLVM IR uses opaque pointers or inserts a bitcast to i8*, a
subsequent GEP is expressed in bytes. The legacy handleGEPInst always
scalarized indices starting from pGEP->getSourceElementType(). After
the i8* cast, that type is i8, so the algorithm mistakenly treated the
byte index as a count of elements, producing a misscaled (too large)
scalarized index.
Example:
%a = alloca [16 x [16 x float]], align 4
%b = bitcast [16 x [16 x float]]* %a to i8*
%c = getelementptr inbounds i8, i8* %b, i64 64
Here, 64 is a byte offset into the original aggregate. The old
implementation, seeing i8, scaled it as if it were 64 elements, not
64 bytes. Yet the meaningful base of the GEP is the alloca's aggregate
type [16 x [16 x float]], and the element calculations should be based
on that type.
This change:
1. Introduces getFirstNonScalarSourceElementType(GEP), which walks back
from the GEP base through pointer casts to find the root aggregate
element type.
2. Adds handling in handleGEPInst so that an i8 GEP byte offset is
converted to an element index of the underlying base type.
This way the algorithm avoids basing element-index scalarization on an
incidental i8* and keeps the index calculation aligned with the
underlying allocation layout.
For reference, in typed pointer mode (or without the bitcast), the GEP
would look like this:
%a = alloca [16 x [16 x float]], align 4
%c = getelementptr inbounds [16 x [16 x float]], [16 x [16 x float]]* %a, i64 0, i64 1
Here, %c points to the second inner array, i.e. it has type
[16 x float]*.
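Conceptually, the new handling rescales the byte offset by the root
element size before scalarizing: for the example above, 64 bytes /
4 bytes per float = flattened element index 16, which is exactly the
start of the second [16 x float] row. For a dynamic byte offset %n the
rescaling could look like this (names hypothetical; assumes the offset
is a multiple of the element size):
%elt.idx = udiv i64 %n, 4 ; byte offset -> float element index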
In cases where there are no local-to-generic casts and private memory
is allocated in the global address space, we can replace
GenericCastToPtrExplicit with a simple address space cast.
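A minimal sketch of the replacement, assuming IGC's usual numbering
where addrspace(4) is generic and addrspace(1) is global (the
GenericCastToPtrExplicit call shape is elided):
; before: %p = call ... GenericCastToPtrExplicit ... (ptr addrspace(4) %g)
%p = addrspacecast ptr addrspace(4) %g to ptr addrspace(1)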