Don't make assumptions about the lifetime of the underlying object; instead,
use the shared_ptr to participate in reference counting and extend the
object's lifetime to the end of the lexical scope.
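For illustration, a minimal sketch of the pattern (the `Widget` type and function names are placeholders, not from the surrounding code):

```cpp
#include <memory>

struct Widget {
  void doWork() {}
};

// Copy the shared_ptr into a local: the copy participates in reference
// counting, so the object stays alive until the end of this lexical scope
// even if the original owner drops its reference in the meantime.
void process(const std::shared_ptr<Widget> &widget) {
  std::shared_ptr<Widget> keepAlive = widget;
  keepAlive->doWork();
} // keepAlive releases its reference here
```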
This PR fixes incorrect alignment when lowering `set` and `getBitField`
operations to LLVM IR. The issue occurred because, during lowering, the
function was called with an alignment of 0, which caused it to default to
the alignment of the packed member. For example, if the bitfield was packed
inside a `u64i`, it would use an alignment of 8.
With this change, the generated code now matches what the classic
codegen produces.
In the assembly format, I changed it to be similar to how it's done in
loadOp. If there's a better approach, please feel free to point it out.
We may still need to keep CopyToReg even after folding uses into vector
loads, since the original register may be used in other blocks.
Partially reverts 1fdbe69849
This PR adds all VOP1 tests that haven't yet been upstreamed by copying
the relevant test files directly from downstream. Afterward, the
auto-generation script is run with the `--unique` option to deduplicate
any redundant tests that may have been introduced during the downstream
merge.
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
The `--hot-func-list` flag is used for sample profiles to dump the list
of hot functions. Add support to dump hot functions for IRPGO profiles
as well.
This also removes a `priority_queue` used for `--topn`. We can instead
store all functions and sort at the end before dumping. Since we are
storing `StringRef`s, I believe this won't consume too much memory.
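A rough sketch of the simplification (illustrative only; the real code stores `StringRef`s into existing name storage, and the names below are made up):

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Collect every (function name, max count) pair while reading the profile,
// then sort once at the end and print the top N, instead of maintaining a
// bounded priority_queue during the traversal.
void dumpTopN(std::vector<std::pair<std::string, uint64_t>> hottest,
              size_t topN) {
  std::sort(hottest.begin(), hottest.end(),
            [](const auto &a, const auto &b) { return a.second > b.second; });
  if (hottest.size() > topN)
    hottest.resize(topN);
  for (const auto &entry : hottest)
    std::printf("%s: %llu\n", entry.first.c_str(),
                static_cast<unsigned long long>(entry.second));
}
```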
Update LV to vectorize maxnum/minnum reductions without fast-math flags by
adding an extra check in the loop for whether any inputs to maxnum/minnum are
NaN, which is needed due to maxnum/minnum behavior w.r.t. signaling NaNs
(illustrated below). Signed zeros are already handled consistently by
maxnum/minnum.
If any input is NaN:
* exit the vector loop,
* compute the reduction result up to the vector iteration that contained NaN
  inputs, and
* resume in the scalar loop.
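As background (not part of the patch): for quiet NaNs, `llvm.maxnum`/`llvm.minnum` follow libm `fmax`/`fmin` and return the non-NaN operand, while signaling-NaN handling is where behavior is not uniform, which is why the vector loop bails out once a NaN is seen. A tiny standalone sketch of the quiet-NaN rule:

```cpp
#include <cmath>
#include <cstdio>

int main() {
  double qnan = std::nan("");
  // fmax (the model for llvm.maxnum with quiet NaNs) returns the other
  // operand when one input is NaN, so a NaN input does not propagate into
  // a running maximum on its own.
  std::printf("%f\n", std::fmax(qnan, 1.0)); // prints 1.000000
  std::printf("%f\n", std::fmax(2.0, qnan)); // prints 2.000000
  return 0;
}
```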
New recurrence kinds are added for reductions using maxnum/minnum
without fast-math flags.
PR: https://github.com/llvm/llvm-project/pull/148239
This gives the user an override to force selection of the VGPR form of MFMA.
Eventually we will drop this in favor of the compiler making better
decisions, but it provides a mechanism for users to address the cases
where MayNeedAGPRs favors the AGPR form and performance is degraded due
to poor RA.
A report of the following code not generating an error led to fixing two bugs in directive checking.
- We should treat CombinedConstructs as OpenACC Constructs
- We should treat DoConstruct index variables as private.
```fortran
subroutine sub(nn)
integer :: nn, ii
!$acc serial loop default(none)
do ii = 1, nn
end do
!$acc end serial loop
end subroutine
```
Here `nn` should be flagged as needing a data clause, while `ii` should
still be treated as implicitly private.
Pointer remappings unconditionally update the element byte size and
derived type of the pointer's descriptor. This is okay when the pointer
is polymorphic, but not when a monomorphic pointer is associated with a
target of an extended type.
To communicate this monomorphic case to the runtime, add a new entry
point so as to not break forward binary compatibility.
…d complex input
List-directed reads of complex values that can't go through the usual
fast path (as in this bug's test case, which uses DECIMAL='COMMA')
didn't skip spaces before the closing right parenthesis correctly.
Fixes https://github.com/llvm/llvm-project/issues/149164.
This patch avoids a trip through the work queue engine for cases on a
CPU where finalization and destruction actions during assignment can be
handled without enqueueing another task.
When gathering the headers to fix up and place in LLDB.framework, we
were previously globbing the header files from a location in the build
directory. This commit changes this to glob from the source directory
instead, since nothing guaranteed that the necessary files were actually
present in the build directory before globbing.
This change addresses a performance issue in the **--tosa-reduce-transposes** implementation by working directly with the
raw tensor data, eliminating the need to create costly intermediate attributes that lead to a bottleneck.
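As a generic illustration of the idea (this is not the pass's actual code; the helper below is hypothetical): a transpose can be applied directly to a flat row-major buffer, so no per-transpose intermediate attribute objects need to be materialized.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: permute a row-major buffer according to 'perm' by
// computing source offsets directly from the raw data, with no intermediate
// element-by-element attribute construction.
std::vector<float> transposeRaw(const std::vector<float> &src,
                                const std::vector<int64_t> &shape,
                                const std::vector<int64_t> &perm) {
  std::vector<int64_t> outShape(shape.size());
  for (size_t i = 0; i < perm.size(); ++i)
    outShape[i] = shape[perm[i]];

  // Row-major strides of the source shape.
  std::vector<int64_t> strides(shape.size(), 1);
  for (int64_t i = static_cast<int64_t>(shape.size()) - 2; i >= 0; --i)
    strides[i] = strides[i + 1] * shape[i + 1];

  std::vector<float> dst(src.size());
  std::vector<int64_t> idx(shape.size(), 0); // multi-index into the output
  for (size_t linear = 0; linear < dst.size(); ++linear) {
    // Output coordinate idx[d] corresponds to source dimension perm[d].
    int64_t srcOffset = 0;
    for (size_t d = 0; d < idx.size(); ++d)
      srcOffset += idx[d] * strides[perm[d]];
    dst[linear] = src[srcOffset];
    // Advance the output multi-index in row-major order.
    for (int64_t d = static_cast<int64_t>(idx.size()) - 1; d >= 0; --d) {
      if (++idx[d] < outShape[d])
        break;
      idx[d] = 0;
    }
  }
  return dst;
}
```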
### Context
Over a year ago, I landed support for 64b Memory ranges in Minidump
(#95312). In that patch we added the Memory64 list stream, which is
effectively a linked list on disk: a sixteen-byte header followed by
however many memory descriptors.
### The Bug
This is a classic off-by-one error, where I added 8 bytes instead of 16
for the header. This caused the first region to start 8 bytes before the
correct RVA, thus shifting all memory reads by 8 bytes. We were writing all
the regions to disk correctly, with no physical corruption, but the RVA was
defined incorrectly, meaning we were reading memory from the wrong offsets.
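For reference, the on-disk layout in question (field names follow the public Minidump format; this is an illustrative sketch, not the LLDB writer code):

```cpp
#include <cstdint>

// The Memory64 list header is sixteen bytes, not eight: a range count plus
// the BaseRva that all of the following descriptors' data is laid out after.
struct MinidumpMemory64ListHeader {
  uint64_t NumberOfMemoryRanges; // 8 bytes
  uint64_t BaseRva;              // 8 more bytes -> 16-byte header
};

// Each descriptor stores only the start address and size; its data lives at
// BaseRva plus the sizes of all preceding ranges, so an 8-byte error in the
// header shifts every 64b read by 8 bytes.
struct MinidumpMemoryDescriptor64 {
  uint64_t StartOfMemoryRange;
  uint64_t DataSize;
};
```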

### Why wasn't this caught?
One problem we've had is forcing Minidump to actually use the 64b mode:
it would be a massive waste of resources to have a test that actually
wrote >4.2 GB of IO to validate the 64b regions, so almost all
validation has been manual. As a weakness of manual testing, this issue
is pseudo-non-deterministic, as which regions end up in 64b or 32b is
decided greedily, iterating in the order they are laid out in
/proc/pid/maps. We often validated that 64b was written correctly by
hexdumping the Minidump itself, which was not corrupted (other than the
BaseRVA).

### Why is this showing up now?
During internal usage, we had a bug report that the Minidump wasn't
displaying values. I was unable to repro the issue, but during my
investigation I saw that the variables were in the 64b regions, which led
me to identify the bug.
### How do we prevent future regressions?
To prevent regressions, and honestly to save my sanity for figuring out
where 8 bytes magically came from, I've added a new API to
SBSaveCoreOptions: `SBSaveCoreOptions::GetMemoryRegionsToSave()`.
It returns the memory regions that we intend to include in the Coredump. I added this so we can compare what we intended to include versus what was actually included. Traditionally we've always had issues comparing regions because Minidump includes `/proc/pid/maps`, and it can be difficult to know whether a memory-region read failure was a genuine error or just a page that wasn't meant to be included.
We are also leveraging this API to choose the memory regions to be generated, as well as for testing which regions should be bytewise 1:1.
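A hedged sketch of how the new call might be used from the C++ SB API (the return type and the surrounding calls are my assumptions about the API shape, not copied from this patch):

```cpp
#include <cinttypes>
#include <cstdio>

#include "lldb/API/SBMemoryRegionInfo.h"
#include "lldb/API/SBMemoryRegionInfoList.h"
#include "lldb/API/SBSaveCoreOptions.h"

// Assumed shape: GetMemoryRegionsToSave() returns an SBMemoryRegionInfoList
// describing what the core writer intends to include.
void dumpPlannedRegions(lldb::SBSaveCoreOptions &options) {
  lldb::SBMemoryRegionInfoList regions = options.GetMemoryRegionsToSave();
  for (uint32_t i = 0; i < regions.GetSize(); ++i) {
    lldb::SBMemoryRegionInfo region;
    if (regions.GetMemoryRegionAtIndex(i, region))
      std::printf("[0x%" PRIx64 " - 0x%" PRIx64 ")\n", region.GetRegionBase(),
                  region.GetRegionEnd());
  }
}
```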
After much debate with @clayborg, I've moved all non-stack memory to the Memory64 List. This list doesn't incur any meaningful overhead, and Greg originally suggested doing this in the original 64b PR. This also means we're exercising the 64b path every single time we save a Minidump, preventing regressions on this feature from slipping through testing in the future.
Snippet produced by [minidump.py](https://github.com/clayborg/scripts)
```
MINIDUMP_MEMORY_LIST:
NumberOfMemoryRanges = 0x00000002
MemoryRanges[0] = [0x00007f61085ff9f0 - 0x00007f6108601000) @ 0x0003f655
MemoryRanges[1] = [0x00007ffe47e50910 - 0x00007ffe47e52000) @ 0x00040c65
MINIDUMP_MEMORY64_LIST:
NumberOfMemoryRanges = 0x000000000000002e
BaseRva = 0x0000000000042669
MemoryRanges[0] = [0x00005584162d8000 - 0x00005584162d9000)
MemoryRanges[1] = [0x00005584162d9000 - 0x00005584162db000)
MemoryRanges[2] = [0x00005584162db000 - 0x00005584162dd000)
MemoryRanges[3] = [0x00005584162dd000 - 0x00005584162ff000)
MemoryRanges[4] = [0x00007f6100000000 - 0x00007f6100021000)
MemoryRanges[5] = [0x00007f6108800000 - 0x00007f6108828000)
MemoryRanges[6] = [0x00007f6108828000 - 0x00007f610899d000)
MemoryRanges[7] = [0x00007f610899d000 - 0x00007f61089f9000)
MemoryRanges[8] = [0x00007f61089f9000 - 0x00007f6108a08000)
MemoryRanges[9] = [0x00007f6108bf5000 - 0x00007f6108bf7000)
```
### Misc
As part of this fix I had to look at LLDB logs a lot; you'll notice I added `0x` to many of the `PRIx64` `LLDB_LOGF` calls. This is so the user (or I) can directly copy-paste the address from the logs instead of adding the hex prefix themselves.
Added some SBSaveCore tests and docstrings for the new GetMemoryRegionsToSave API.
CC: @DavidSpickett, @da-viper, @labath, because we've been working together on save-core plugins. Review is optional and I didn't tag you as reviewers, but I figured you'd want to know.
Remove all the .h.def files that express nothing not already expressed
in YAML. Clean up a few YAML
files without materially changing any generated header output.
Many more .h.def files remain that need a bit of conversion in
YAML to express macro requirements and such.
The old version would prefer the `const &` overload over the `&&` one
unless the former was not allowed in the given situation. In particular,
if the function passed was `[](auto &&)`, the argument would be `const &`
even if the value passed to transformOptional was an rvalue reference.
This version improves the handling of expression categories, and the
lambda argument category will reflect the argument category in the above
scenario.
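To make the intended behavior concrete, here is a standalone sketch (my own minimal re-implementation over `std::optional`, not the LLVM code): with a generic `[](auto &&)` callback, the argument's value category should mirror that of the optional passed in.

```cpp
#include <cassert>
#include <optional>
#include <type_traits>
#include <utility>

// Forward the wrapped value with the same value category as the optional
// itself, so the callback can observe whether it got an lvalue or an rvalue.
template <typename Opt, typename F>
auto transformOpt(Opt &&in, F &&f)
    -> std::optional<decltype(f(*std::forward<Opt>(in)))> {
  if (!in)
    return std::nullopt;
  return f(*std::forward<Opt>(in));
}

int main() {
  auto probe = [](auto &&v) {
    // decltype(v) is an rvalue reference exactly when the optional itself
    // was an rvalue -- the property this change restores for [](auto &&).
    return std::is_rvalue_reference_v<decltype(v)>;
  };
  std::optional<int> o{42};
  assert(!*transformOpt(o, probe));           // lvalue optional -> lvalue arg
  assert(*transformOpt(std::move(o), probe)); // rvalue optional -> rvalue arg
  return 0;
}
```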
Fixes #148238.
When GFNI is present, custom bit reversal lowerings for scalar integers
become active. They work by swapping the bytes in the scalar value and
then reversing bits in a vector of bytes. However, the custom bit
reversal lowering for a vector of bytes is disabled if GFNI is present
in isolation, resulting in broken code.
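For readers unfamiliar with the trick, a plain C++ sketch of the equivalent scalar computation (illustrative only): a full 64-bit bit reversal is a byte swap followed by reversing the bits within each byte, and that per-byte step is the part GFNI accelerates as a vector-of-bytes bit reversal.

```cpp
#include <cstdint>

uint64_t bitreverse64(uint64_t v) {
  v = __builtin_bswap64(v); // swap the bytes
  // Reverse the bits within each byte (every shift stays inside a byte lane).
  v = ((v & 0xF0F0F0F0F0F0F0F0ULL) >> 4) | ((v & 0x0F0F0F0F0F0F0F0FULL) << 4);
  v = ((v & 0xCCCCCCCCCCCCCCCCULL) >> 2) | ((v & 0x3333333333333333ULL) << 2);
  v = ((v & 0xAAAAAAAAAAAAAAAAULL) >> 1) | ((v & 0x5555555555555555ULL) << 1);
  return v;
}
```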
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
We're going to end up repeating the operand extraction four times once
all of the routines have been updated to support both plain load/store
and vp.load/vp.store. I plan to add masked.load/masked.store in the near
future, and we'd need to add that to each of the four cases. Instead,
factor out a single copy of the operand normalization.
This happens only when the tile size used is greater than or equal to the
dimension size. In this case, it is a full slice, so it is fusible.
Such IR can be generated during the TileAndFuse process. It is hard to
fix in that driver, so we enable the naive fusion for this case.
---------
Signed-off-by: hanhanW <hanhan0912@gmail.com>
…s to replicated form
This adds a new SPIR-V dialect-level conversion pass
`ConversionToReplicatedConstantCompositePass`. This pass looks for splat
composite `spirv.Constant` or `spirv.SpecConstantComposite` and rewrites
them into `spirv.EXT.ConstantCompositeReplicate` or
`spirv.EXT.SpecConstantCompositeReplicate`, respectively.
---------
Signed-off-by: Mohammadreza Ameri Mahabadian <mohammadreza.amerimahabadian@arm.com>
Fixed a crash in the following example:
```fortran
subroutine sub()
implicit none
print *, (i, i = 1, 2) ! Problem: using undefined var in implied-do loop
end subroutine sub
```
The error message was already generated, but the compiler crashed before
it could display it.
Some of the packed build_vector patterns use vgpr_32 for i16/f16/bf16.
In gfx11, bf16 arithmetic gets promoted to f32, and this is done via a
v2i16 pack. In true16 mode this v2i16 pack is selected to a
build_vector/v_lshlrev pattern which only accepts VGPR32. This causes
ISel to insert an illegal copy "vgpr32 = copy vgpr16" between def and
use. In the end this illegal copy confuses the CSE pass and triggers
wrong code elimination.
Remove the packed build_vector pattern from true16. After removal, ISel
will use vgpr16 build_vector patterns instead.