This patch moves abs_timeout and monotonicity out of the linux directory
into common. Both of these functions depend on clock_gettime, which is
the actual OS-dependent component. As other features in
`__support/threads` may want to use them, it is better to share them in
common.
Check the result of `convertType` before calling `TypeAttr::get`. This
prevents a crash on unsupported types (e.g. `tensor`) by ensuring the
pattern fails gracefully.
Added regression test: map-info-type-conversion-fail.mlir
Fixes: #108159
## Summary
Allow implicit compatibility between `_Float16` vector types and
`half` vector types in OpenCL mode. This enables AMDGPU builtins to work
correctly across OpenCL, HIP, and C++ without requiring separate builtin
definitions.
## Problem Statement
When using AMDGPU image builtins that return half-precision vectors in
OpenCL, users encounter type incompatibility errors:
**Builtin Definition:**
`TARGET_BUILTIN(__builtin_amdgcn_image_load_1d_v4f16_i32, "V4xiiQtii",
"nc", "image-insts")`
**Test Case:**
```c
typedef half half4 __attribute__((ext_vector_type(4)));
half4 test_builtin_image_load_1d_2(half4 v4f16, int i32, __amdgpu_texture_t tex) {
return __builtin_amdgcn_image_load_1d_v4f16_i32(100, i32, tex, 120, i32);
}
```
**Error:**
```
error: returning '__attribute__((__vector_size__(4 * sizeof(_Float16)))) _Float16'
(vector of 4 '_Float16' values) from a function with incompatible result type
'half4' (vector of 4 'half' values)
```
## Solution
In OpenCL, allow implicit compatibility between `_Float16` vector types
and `half` vector types. This is needed for AMDGPU builtins that may
return `_Float16` vectors to work correctly with OpenCL `half` vector types.
Fixed issue
[[PowerPC] llc crashed at -O1/O2/O3: Assertion `isImm() && "Wrong
MachineOperand mutator"'
failed.](https://github.com/llvm/llvm-project/issues/167672)
The root cause of the crash is that the immediate operand is at a
different operand index in PPC::XXSPLTW than in
PPC::XXSPLTB/PPC::XXSPLTH.
The patch also fixes a potential bug: the new element index for
PPC::XXSPLTB/PPC::XXSPLTH/PPC::XXSPLTW used the same logic, but it
should differ. We need to convert the element index into the proper unit
(byte for VSPLTB, halfword for VSPLTH, word for VSPLTW) because
PPC::XXSLDWI interprets its ShiftImm in 32-bit word units.
The gpu printf test was not using the runtime required by lit.local.cfg.
All other tests in the directory correctly use the Level Zero runtime,
but the gpu printf test was using the SYCL runtime.
These quantities should never unsigned-wrap. This matches the behavior
if only VFxUF is used (and not VF): when computing both VF and VFxUF,
nuw should hold for each step separately.
This patch adds a simple iterator range that allows conditionally
iterating a collection in reverse. It works with any collection
supported by `llvm::reverse(Collection)`.
```cpp
void foo(bool Reverse, std::vector<int> &C) {
  for (int I : reverse_conditionally(C, Reverse)) {
    // ...
  }
}
```
The change in #170263 does not do justice to common knowledge in the backend.
Fix the comment to reflect the relation between FLAT encoding, flat pointer
access, and LDSDMA operations.
There's no reason to use the ocml or nv prefixed functions and
maintain this list of alias macros. I left these macros in for
NVPTX in the scalbn and logb case, since those have a special
case hack in the AMDGPU codegen and probably do not work on ptx.
Port AMDGPUArgumentUsageInfo analysis to the NPM to fix suboptimal code
generation when NPM is enabled by default.
Previously, DAG.getPass() returned nullptr when using NPM, causing the
argument usage info to be unavailable during ISel. This resulted in
fallback to FixedABIFunctionInfo which assumes all implicit arguments
are needed, generating unnecessary register setup code for entry
functions.
Fixes LLVM::CodeGen/AMDGPU/cc-entry.ll
Changes:
- Split AMDGPUArgumentUsageInfo into a data class and NPM analysis
wrapper
- Update SIISelLowering to use DAG.getMFAM() for NPM path
- Add RequireAnalysisPass in addPreISel() to ensure analysis
availability
This follows the same pattern used for PhysicalRegisterUsageInfo.
In 531.deepsjeng_r from SPEC CPU 2017 there's a loop that we
unprofitably loop vectorize on RISC-V.
The loop looks something like:
```c
for (int i = 0; i < n; i++) {
if (x0[i] == a)
if (x1[i] == b)
if (x2[i] == c)
// do stuff...
}
```
Because it's so deeply nested the actual inner level of the loop rarely
gets executed. However we still deem it profitable to vectorize, which
due to the if-conversion means we now always execute the body.
This stems from the fact that `getPredBlockCostDivisor` currently
assumes that blocks have 50% chance of being executed as a heuristic.
We can fix this by using BlockFrequencyInfo, which gives a more accurate
estimate of the innermost block being executed 12.5% of the time. We can
then calculate the probability as `HeaderFrequency / BlockFrequency`.
Fixing the cost here gives a 7% speedup for 531.deepsjeng_r on RISC-V.
While there are a lot of changes in the in-tree tests, this doesn't
affect llvm-test-suite or SPEC CPU 2017 that much:
- On armv9-a -flto -O3 there's 0.0%/0.2% more geomean loops vectorized
on llvm-test-suite/SPEC CPU 2017.
- On x86-64 -flto -O3 **with PGO** there's 0.9%/0% less geomean loops
vectorized on llvm-test-suite/SPEC CPU 2017.
Overall geomean compile time impact is 0.03% on stage1-ReleaseLTO:
https://llvm-compile-time-tracker.com/compare.php?from=9eee396c58d2e24beb93c460141170def328776d&to=32fbff48f965d03b51549fdf9bbc4ca06473b623&stat=instructions%3Au
The 'bind' clause emits an attribute on the RoutineOp that states which
function it should call on the device side. When provided in
double-quotes, the function on the device side should be the exact name
given. This patch emits the IR to do that.
As a part of that, we add a helper function to the OpenACC dialect to do
so, as well as a version that adds the ID version (though we don't
exercise that yet).
The 'bind' with an ID should do the MANGLED name, but it isn't quite
clear what that name SHOULD be yet. Since the signature of a function is
included in its mangling, and we're not providing said signature, we
have to come up with something. This is left as an exercise for a future
patch.
Remove the v8i64 dependency from the original shift-by-1 tests - this was added for #132601 but is unlikely to be necessary.
Add tests for general shifts as well as shift-by-constant and shift-of-constant examples.
This test is failing on some buildbots now that the internal shell has
been turned on and was failing previously on some ppc bots when turning
it on a while back (before it got reverted).
At least one X86 bot is barely hitting the limit
(https://lab.llvm.org/buildbot/#/builders/174/builds/28487 224MB-235MB).
This likely needs to be bumped due to changes in the process tree (now
that we invoke things through python rather than a bash shell) with the
enablement of the internal shell.
So far, the syntax was `target frame-provider register <cmd-options>
[<run-args>]`. Note the optional `run-args` at the end. They are
completely ignored by the actual command, but the command line parser
still accepts them.
This commit removes them.
This was probably a copy-paste error from `CommandObjectProcessLaunch`,
which was likely used as a blueprint for `target frame-provider
register`.
BOLT currently ignores functions with synchronous PAuth DWARF info.
If more than 10% of functions get ignored for inconsistencies, we
should emit a warning to only use asynchronous unwind tables.
See related issue: #165215
Assuming the predicate is hoisted, this should have a slightly better
throughput: https://godbolt.org/z/jb7aP7Efc
Note: SVE must be used to convert back to bf16, as the bfmlalb/t
instructions operate on even/odd lanes, whereas the NEON bfcvtn/bfcvtn2
instructions process the top/bottom halves of vectors.
`SPIRVEmitIntrinsics::simplifyZeroLengthArrayGepInst` asserted that it
always expected a `GetElementPtrInst` from `IRBuilder::CreateGEP` (which
returns a `Value`). `IRBuilder` can fold and return a `ConstantExpr`
instead, thus violating the assertion. The patch fixes this by using
`GetElementPtrInst::Create` to always return a `GetElementPtrInst`.
This LLVM defect was identified via the AMD Fuzzing project.
Building with GCC produces:
```
<...>/TuneExtensionOps.cpp:180:26: warning: comparison of unsigned expression in ‘< 0’ is always false [-Wtype-limits]
180 | if (*selectedRegionIdx < 0 || *selectedRegionIdx >= getNumRegions())
| ~~~~~~~~~~~~~~~~~~~^~~
<...>/TuneExtensionOps.cpp: In member function ‘llvm::LogicalResult mlir::transform::tune::AlternativesOp::verify()’:
<...>/TuneExtensionOps.cpp:236:19: warning: comparison of unsigned expression in ‘< 0’ is always false [-Wtype-limits]
236 | if (regionIdx < 0 || regionIdx >= getNumRegions())
| ~~~~~~~~~~^~~
```
As we are sign extending these variables, use int64_t instead of size_t
for their type.
By completely omitting invalidation in the case of InstanceCall, we do
not clear the moved-from state of the fields of the `this` object after
an opaque call to a member function of the object itself.
Previously, we would miss inserting a wait if the ds_read had AA info
but it didn't match any LDS DMA op, for example if we didn't track the
LDS DMA op it aliases with because it exceeded the tracking limit.
I have seen a failure whereby the fuzzer failed to reach the expected
input and thus failed the test.
This patch bumps the max executions to 10,000,000 in order to give the
fuzzer a better chance of reaching the expected input. Most runs
complete successfully, so I do not see this adding test time in the
general case; I believe it's a fair tradeoff for the unlucky seed to run
for longer if it reduces the noise from false positives. Note, this
updates a different `RUN:` line than
https://github.com/llvm/llvm-project/pull/165402.
rdar://162122184