Originally implemented in https://github.com/swiftlang/swift/pull/29014.
I've made a couple of changes:
1. Use the target's address size, not lldb's
2. Replace the loop with a format string
This change fixes an issue with the use of `-alternatename` in the MSVC
CRT on ARM64EC, where both mangled and demangled symbol names are
specified. Without this patch, the demangled name could be resolved to
an anti-dependency alias of the target. Since chaining anti-dependency
aliases is not allowed, this results in an undefined symbol.
The root cause isn't specific to ARM64EC; it can affect other targets as
well, even when anti-dependency aliases aren't involved. The
accompanying test case demonstrates a scenario where the symbol could be
resolved from an archive. However, because the archive member is pulled
in after the first pass of alternate name resolution, and archive
members don't override weak aliases, eager resolution would incorrectly
skip it.
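For reference, here is a hedged sketch of what such an alternate-name
directive looks like in source form (the symbol names are made up, not the
actual CRT symbols):

```cpp
// The CRT embeds linker directives like this one: an otherwise unresolved
// reference to `bar` is redirected to `bar_impl` during symbol resolution.
#pragma comment(linker, "/alternatename:bar=bar_impl")
```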
It was disabled because there may be artificial padding. After [refining the pack op semantics](773e158c64),
we can assume that there is no artificial padding. Thus, the check can
be removed, and we can unconditionally enable the consumer fusion if it
is a perfect tiling case.
Signed-off-by: hanhanW <hanhan0912@gmail.com>
Summary:
This is a bit of an awkward transition point for the new and old
drivers. Previously, AMDGPU used this to generate offloading bundles, but
the new driver much prefers to output the file itself. This patch
changes the behavior to always respect `--gpu-bundle-output` instead of
having it be the default behavior. This means that we effectively get to
override the default new driver behavior with this flag now. This should
hopefully fix some errors in the downstream comgr tests.
Summary:
Previously, querying for the offload architecture tool would invoke the
user's PATH, which is bad when potentially using the driver from a
direct path. This patch changes this to *only* consider the
`offload-arch` that's supposed to live next to the driver executable.
Now we will no longer pick up a potentially conflicting version of this
tool, and it should always be found (since it's a clang tool that's
installed alongside the driver).
This relands PRs #143108 and #144538.
The original PR was reverted due to a mistake that made all the mlir
tests run only if the SPIRV target was enabled. This is now resolved since
enabling spirv-tools no longer requires the SPIRV target.
spirv-tools are not required by default to run the SPIRV mlir tests, but
they can be optionally enabled in some SPIRV mlir tests to verify that the
produced SPIRV assembly passes validation.
The other reverted PR #144685 is no longer needed and is not part of this
relanding.
Original commit message:
> At the MLIR level, unsigned integers and signless integers are different
types; indeed, when looking up the two types in the type definition cache
they do not match.
> Hence, when translating a module which contains both unsigned and
signless integers, the generated SPIR-V will contain the same type
declaration twice (something like OpTypeInt 32 0), which is not permitted
in SPIR-V, and such generated modules fail validation.
> This patch solves the problem by mapping unsigned integer types to
signless integer types before looking up in the type definition cache.
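A minimal sketch of the mapping idea, assuming a standalone helper rather
than the actual serializer code:

```cpp
#include "mlir/IR/BuiltinTypes.h"

// Map unsigned integer types onto their signless equivalents before
// consulting the type definition cache, so both spellings resolve to a
// single OpTypeInt declaration.
static mlir::Type getTypeForCacheLookup(mlir::Type type) {
  if (auto intType = llvm::dyn_cast<mlir::IntegerType>(type))
    if (intType.isUnsigned())
      return mlir::IntegerType::get(intType.getContext(), intType.getWidth());
  return type;
}
```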
---------
Signed-off-by: Davide Grohmann <davide.grohmann@arm.com>
This reverts commit 0844812b2e with a shape fix in 1db4c6b275.
The revision restricts the `linalg.pack` op to not have artificial
padding semantics. E.g., the example below is valid without the change and
becomes invalid with it.
```mlir
func.func @foo(%src: tensor<9xf32>) -> tensor<100x8xf32> {
%cst = arith.constant 0.000000e+00 : f32
%dest = tensor.empty() : tensor<100x8xf32>
%pack = linalg.pack %src
padding_value(%cst : f32)
inner_dims_pos = [0]
inner_tiles = [8] into %dest
: tensor<9xf32> -> tensor<100x8xf32>
return %pack : tensor<100x8xf32>
}
```
IMO, it is a misuse to create pack ops with artificial padding sizes,
because the intention of the pack op is to relayout the source based on
target intrinsics, etc. The output shape is expected to be
`tensor<2x8xf32>`. If people need extra padding, they can create a new pad
op followed by the pack op.
This also makes consumer tiling much easier, because the consumer fusion
does not support artificial padding sizes. It is very hard to make it work
without ad-hoc patterns, because the tile sizes are defined on the source,
which implies that you don't have a core_id/thread_id to write padding
values to the whole tile.
People may wonder how/why the pad tiling implementation works. The answer
is that it creates an `if-else` branch to handle the case. In my
experience, this is a struggle during transformations, because most of the
time people only need one side of the branch, given that the tile sizes
are usually greater than the padding sizes. However, the implementation is
conservatively correct in terms of semantics. Given that the `pack` op was
introduced to serve relayout needs better, having the restriction makes
sense to me.
Removed tests:
- `no_bubble_up_pack_extending_dimension_through_expand_cannot_reassociate`
  from `data-layout-propagation.mlir`: it is a duplicate of
  `bubble_up_pack_non_expanded_dims_through_expand` after fixing the shape.
- `fuse_pack_consumer_with_untiled_extra_padding` from
`tile-and-fuse-consumer.mlir`: it was created for artificial padding in
the consumer fusion implementation.
The other changes in the lit tests just fix the shapes.
---------
Signed-off-by: hanhanW <hanhan0912@gmail.com>
This allows setting an optional integer level for a given debug type. The
string format is `type[:level]`, and the integer is interpreted as follows:
- if not provided: all debugging for this debug type is enabled.
- if >0: all messages with a level less than or equal to the given level
  are enabled.
- if 0: this debug type is disabled, but unlike a positive level it does
  not disable the other debug types; it acts as a negative filter.
The LDBG() macro is updated to accept an optional log level to
illustrate the feature. Here is the expected behavior:
LDBG() << "A"; // Identical to LDBG(1) << "A";
LDBG(2) << "B";
With `--debug-only=some_type`: we'll see A and B in the output.
With `--debug-only=some_type:1`: we'll see A but not B in the output.
With `--debug-only=some_type:2`: we'll see A and B in the output. (same
with any level above 2)
With `--debug-only=some_type:0`: we'll see neither A nor B in the
output, but we'll see any other logging for other debug types.
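A minimal sketch of the level comparison described above (illustrative, not
the actual implementation; the negative-filter interaction between different
debug types is not modeled here):

```cpp
#include <map>
#include <string>

// `requested` maps each debug type from --debug-only to its level, with -1
// standing for "no :level suffix given".
static bool isDebugEnabled(const std::map<std::string, int> &requested,
                           const std::string &type, int msgLevel) {
  auto it = requested.find(type);
  if (it == requested.end())
    return false;                  // this debug type was not requested
  if (it->second < 0)
    return true;                   // bare `type`: everything for this type
  return msgLevel <= it->second;   // so `type:0` prints nothing for this type
}
```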
Summary:
This reference counter tracks how many threads are using a given slab.
Currently it's a 64-bit integer; this patch reduces it to a 32-bit
integer. The benefit of this is that we save a few registers now that we
no longer need to use two for these operations. This increases the risk
of overflow, but given that the largest value we accept for a single
slab is ~131,000, it is a long way off the maximum of four billion or
so. Obviously we can oversubscribe the reference count by having threads
attempt to claim the lock and then try to free it, but I assert that it
is exceedingly unlikely that we will somehow have over four billion GPU
threads stalled in the same place.
A later optimization could be done to split the reference counter and
pointers into a struct of arrays, which would save 128 KiB of static
memory (as we currently use 512 KiB for the slab array).
Changed the warning message:
- **From**: 'Attempt to free released memory'
**To**: 'Attempt to release already released memory'
- **From**: 'Attempt to free non-owned memory'
**To**: 'Attempt to release non-owned memory'
- **From**: 'Use of memory after it is freed'
**To**: 'Use of memory after it is released'
All connected tests and their expectations have been changed
accordingly.
Inspired by [this
PR](https://github.com/llvm/llvm-project/pull/147542#discussion_r2195197922)
Clang's current implementation only works on array types, but GCC (which
is where we got this attribute) supports it on pointers as well as
arrays.
Fixes #150951
Previously we saved registers in the shadow space of the callee before
calling __delayLoadHelper2. Now we save arguments in the shadow space of
the caller and allocate shadow space for the callee.
Fixes #51941
---------
Co-authored-by: Benjamin Santerre <benjamin.santerre@gmail.com>
This controls whether tag checking is performed for loads and
stores, or stores only.
It requires a specific architecture feature, which we detect via a HWCAP3
bit and a cpuinfo feature.
Live process tests look for this and adjust expectations accordingly; core
file tests use an updated core file with this feature enabled.
The size of the core file has increased and there's nothing I can do about
that. It could be due to the presence of new architecture features or
kernel changes since I last generated them.
I can generate a smaller file that has the tag segment,
but that segment does not actually contain tag data. So
that's no use.
I'm not really sure what the point of these was, but they originated
in the base support commit for gfx942 mfma support. These don't impact
the selection at all, so they don't belong in this test. They were causing
allocation failures depending on whether the AGPR or VGPR form was used.
The `readability-qualified-auto` check currently looks at the unsugared
type, skipping any typedefs, to determine whether the variable is a
pointer type. This may not be the desired behaviour, in particular when
the type depends on compilation flags.
For example
```
#if CONDITION
using Handler = int *;
#else
using Handler = uint64_t;
#endif
```
A more common example is some implementations of `std::array` use
pointers as iterators.
This introduces the IgnoreAliasing option so that
`readability-qualified-auto` does not look beyond typedefs.
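A hedged illustration of the intended effect (the function and its uses are
hypothetical, and the exact diagnostics are assumptions):

```cpp
using Handler = int *; // may instead be `using Handler = uint64_t;`
                       // under a different configuration

Handler getHandler();

void use() {
  // Without IgnoreAliasing, the check desugars `Handler`, sees a pointer,
  // and asks for `auto *h`. With IgnoreAliasing enabled, it stops at the
  // alias and leaves this declaration alone.
  auto h = getHandler();
  (void)h;
}
```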
---------
Co-authored-by: juanbesa <juanbesa@devvm33299.lla0.facebook.com>
Co-authored-by: Kazu Hirata <kazu@google.com>
Co-authored-by: EugeneZelenko <eugene.zelenko@gmail.com>
Co-authored-by: Baranov Victor <bar.victor.2002@gmail.com>
Reapply #140091.
branch-folder hoists common instructions from TBB and FBB into their
pred. Without this patch it achieves this by splicing the instructions from TBB
and deleting the common ones in FBB. That moves the debug locations and debug
instructions from TBB into the pred without modification, which is not
ideal. Debug locations are handled in #140063.
This patch handles debug instructions in the simplest way possible, which is
to just kill (undef) them. We kill and hoist the ones in FBB as well as TBB,
because otherwise the fact that there's an assignment on the code path is
deleted (which might lead to a prior location extending further than it
should).
There's possibly something we could do to preserve some variable locations in
some cases, but this is the easiest not-incorrect thing to do.
Note I had to replace the constant DBG_VALUEs with ones that use registers in the test; it
turns out setDebugValueUndef doesn't undef constant DBG_VALUEs... which feels
wrong to me, but isn't something I want to touch right now.
---
Fix end-iterator-dereference and add test.
This will be used to detect the presence of Arm's new Memory Tagging
store-only checking feature. This commit just adds the plumbing to get
that value into the detection function.
FreeBSD has not allocated a number for HWCAP3 and already has AT_ARGV
defined as 29. So instead of attempting to read from FreeBSD processes,
I've explicitly passed 0. We don't want to be reading some other entry
accidentally.
If/when FreeBSD adds HWCAP3 we can handle it like we do for
AUXV_FREEBSD_AT_HWCAP.
No extra tests here, those will be coming with the next change for MTE
support.
No change of meaning, just formatting and an extra example to make it
easier to comprehend:
* Split separate, important points into their own paragraphs.
* Remove a contraction.
* Finally, show how to use "static" on a function (see the sketch below).
  Before, we just showed why namespaces were bad, but not what you should
  do instead.
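A minimal sketch of the kind of example being added (the helper function is
hypothetical):

```cpp
// Preferred for file-local helpers: mark the function `static` instead of
// wrapping it in an anonymous namespace.
static int numDigits(int value) {
  int count = 1;
  while (value /= 10)
    ++count;
  return count;
}
```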
If a chunk is empty and there are no other non-empty chunks in the same
section, `removeEmptySections()` will remove the entire section. In this
case, use a section index of 0, as the MSVC linker does, instead of
asserting.
When vectorizing with predication, some loops that were previously
vectorized without zvfhmin/zvfbfmin will no longer be vectorized, because
the masked load/store or gather/scatter cost is reported as illegal.
This is due to a discrepancy where for these costs we check
isLegalElementTypeForRVV but for regular memory accesses we don't.
But for bf16 and f16 vectors we don't actually need the extension
support for loads and stores, so this adds a new function which takes
this into account.
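A hedged sketch of the distinction, using illustrative names rather than the
actual RISC-V cost-model hook:

```cpp
enum class EltKind { F16, BF16, F32, I64 };

// For plain vector loads and stores, f16/bf16 elements only need the vector
// unit to move bits, so no zvfhmin/zvfbfmin requirement applies here, unlike
// the existing element-type legality check used for arithmetic.
static bool isLegalLoadStoreElement(EltKind kind, unsigned elen) {
  switch (kind) {
  case EltKind::F16:
  case EltKind::BF16:
  case EltKind::F32:
    return true;       // only bits are moved; no FP extension is required
  case EltKind::I64:
    return elen >= 64; // e.g. i64 elements remain illegal on zve32x
  }
  return false;
}
```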
For regular memory accesses we should probably also e.g. return an
invalid cost for i64 elements on zve32x, but it doesn't look like we
have tests for this yet.
We also should probably not be vectorizing these bf16/f16 loops to begin
with if we don't have zvfhmin/zvfbfmin and zfhmin/zfbfmin. I think this
is due to the scalar costs being too cheap. I've added tests for this in
a100f63672 to fix in another patch.
Summary:
This wait restricts how long we wait on a slab. The only reason this
isn't an infinite loop is to prevent complete deadlocks. However, this
limit was *just* on the cusp of waiting long enough for the allocation
to be done. Just increase this to a sufficiently large value, because
this limit only exists to keep the interface wait-free in the absolute
worst case scheduling scenario. This *MASSIVELY* improved performance
for mixed allocations, as we no longer shuffle around creating more slabs
than necessary.
Summary:
We previously used `match_all` as the shortcut to figure out which
threads were destined for which slots. This lowers to a for-loop which,
even if it often only executes once, still causes some slowdown,
especially when divergent. Instead we use a single ballot call and
calculate the mapping from it.
Here the ballot tells us which lanes are the first in a block, either
the starting index or the barrier for a new 32-bit int. We then use some
bit magic to figure out for each lane ID its closest leader. For the
length we simply use the length calculated by the leader of the
remaining bits to be written. This removes the match any and the
shuffle, which improves the minimum number of cycles this takes by about
5%.
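A hedged sketch of the "closest leader" bit trick (standalone C++, not the
actual libc GPU code):

```cpp
#include <cstdint>

// `ballot` has a bit set for every lane that starts a block; each lane picks
// the highest set bit at or below its own lane id as its leader.
static uint32_t closestLeader(uint32_t ballot, uint32_t laneId) {
  // Keep our own bit and the bits of the lanes below us.
  uint32_t candidates = ballot & (~0u >> (31u - laneId));
  if (candidates == 0)
    return laneId; // defensive: the starting lane should always be a leader
  // The leader is the most significant remaining bit.
  return 31u - static_cast<uint32_t>(__builtin_clz(candidates));
}
```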
Summary:
This slightly increases performance in a few places. First, we
optimistically assume the cached slab has ample space which lets us
avoid the atomic load on the highly contended counter in the case that
it is likely to succeed. Second, we no longer call `match_any` twice as
we can calculate the uniform slabs at the moment we return them.
Third, we always choose a random index on a 32-bit boundary. This
means that in the fast case we fulfil the allocation with a single
`fetch_or`, and in the other case we quickly move to the free bit.
This nets around a 7.75% improvement for the fast path case.
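A hedged sketch of the fast path (standalone, not the actual allocator; the
retry for the slow case is left to the caller):

```cpp
#include <atomic>
#include <cstdint>

// Try to claim a randomly chosen bit in one 32-bit word of the slab bitfield
// with a single fetch_or. On the fast path that one atomic is enough; on the
// slow path the returned snapshot already tells us which nearby bits are free.
static int tryClaimBit(std::atomic<uint32_t> &word, uint32_t chosenBit) {
  uint32_t before = word.fetch_or(1u << chosenBit, std::memory_order_acq_rel);
  if (!(before & (1u << chosenBit)))
    return static_cast<int>(chosenBit); // fast case: claimed in one operation
  uint32_t freeBits = ~before;
  if (!freeBits)
    return -1;                          // the word is full; try another word
  return __builtin_ctz(freeBits);       // candidate free bit for the retry
}
```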
We can do it all in finalizeStore if we ensure it always sees the stores.
For that, I needed to fix a hidden bug where finalizeStore wouldn't see all
stores, because sometimes the iterator got out of sync and no longer
pointed to the store.
This also removes the waits before volatile LDS stores, which never needed
them; that was a bug until now.
Summary:
The slots in this allocation scheme are statically allocated. All sizes
share the same array of slots, but are given different starting
locations to space them apart. The previous implementation used a
trivial linear slice. This is inefficient because it provides the more
likely allocations (1-1024 bytes) with just as much space as a highly
unlikely one (1 MiB).
This patch uses a cubic easing function to gradually shrink the gaps.
For example, we used to get around 700 free slots for a 16-byte
allocation; now we get around 2100 before it starts encroaching on the
32-byte allocation space. This could be improved further, but I think
this is sufficient.
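A minimal sketch of the easing idea, with illustrative constants that are
not the allocator's real values:

```cpp
#include <cstdint>

static constexpr uint32_t TotalSlots = 8192; // shared by all size buckets
static constexpr uint32_t NumBuckets = 17;   // e.g. 16 B up to 1 MiB

// The starting offset of each bucket follows a cubic ease-out curve, so the
// small, common sizes get a much larger share of the slot array than the
// rare large ones.
static uint32_t bucketStartSlot(uint32_t bucket) {
  double t = static_cast<double>(bucket) / NumBuckets;
  double eased = 1.0 - (1.0 - t) * (1.0 - t) * (1.0 - t);
  return static_cast<uint32_t>(eased * TotalSlots);
}
```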