Commit Graph

546486 Commits

sribee8
741df45bc3 [libc] Reland #149423 "wchar string conversion functions mb to wc" (#150667)
Added missing includes in the test files for the null check

---------

Co-authored-by: Sriya Pratipati <sriyap@google.com>
2025-07-28 17:40:10 +00:00
Dave Lee
eba0c57411 [llvm][utils] Add summary formatter for SmallBitVector (#150542)
Originally implemented in https://github.com/swiftlang/swift/pull/29014.

I've made a couple of changes:
1. Use the target's address size, not LLDB's
2. Replace the loop with a format string
2025-07-28 10:38:51 -07:00
Jacek Caban
ac31d64a64 [LLD][COFF] Avoid resolving symbols with -alternatename if the target is undefined (#149496)
This change fixes an issue with the use of `-alternatename` in the MSVC
CRT on ARM64EC, where both mangled and demangled symbol names are
specified. Without this patch, the demangled name could be resolved to
an anti-dependency alias of the target. Since chaining anti-dependency
aliases is not allowed, this results in an undefined symbol.

The root cause isn't specific to ARM64EC; it can affect other targets as
well, even when anti-dependency aliases aren't involved. The
accompanying test case demonstrates a scenario where the symbol could be
resolved from an archive. However, because the archive member is pulled
in after the first pass of alternate name resolution, and archive
members don't override weak aliases, eager resolution would incorrectly
skip it.
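
For illustration, a hedged sketch of how such directives are typically
embedded (symbol names hypothetical; `/alternatename:` is the directive
form that `-alternatename` corresponds to):

```cpp
// If `foo` is never defined, references to it resolve to `foo_default`.
#pragma comment(linker, "/alternatename:foo=foo_default")

extern "C" void foo();            // may remain undefined
extern "C" void foo_default() {}  // fallback definition
```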
2025-07-28 19:26:25 +02:00
Han-Chung Wang
3f3fac8478 [mlir][linalg] Enable pack consumer fusion for all perfect tiling cases. (#150672)
It was disabled because there may be artificial padding. After [refining the pack op semantics](773e158c64),
we can assume that there is no artificial padding. Thus, the check can
be removed, and we can unconditionally enable the consumer fusion if it
is a perfect tiling case.

Signed-off-by: hanhanW <hanhan0912@gmail.com>
2025-07-28 10:23:54 -07:00
Jasmine Tang
522ac23609 [WebAssembly] Add pattern for relaxed nmadd (#150684)
Following in the footsteps of https://github.com/llvm/llvm-project/pull/147487
(support for madd), this PR adds support for nmadd.

https://github.com/llvm/llvm-project/issues/55932 tracks this work.
2025-07-28 10:20:04 -07:00
Joseph Huber
2368be38a1 [HIP] Always respect --gpu-bundle-output in the new driver (#150989)
Summary:
This is a bit of an awkward transition point for the new and old
drivers. Previously AMDGPU used this to generate offloading bundles, but
the new driver much prefers to output the file itself. This patch
changes the behavior to always respect `--gpu-bundle-output` instead of
having it be the default behavior. This means that we effectively get to
override the default new driver behavior with this flag now. This should
hopefully fix some errors in the downstream comgr tests.
2025-07-28 12:04:49 -05:00
satyanarayana reddy janga
c03b0dd9f4 Add MTIA and META to triple (#150236)
Ref:
https://ai.meta.com/blog/next-generation-meta-training-inference-accelerator-AI-MTIA/
This PR contains:
1. MTIA (Meta Training and Inference Accelerator) as an environment.
2. Meta as the vendor.


### Testing 
Added a unittest for the relevant changes

### Reviewers
@clayborg , @jeffreytan81 , @Jlalond
2025-07-28 10:03:20 -07:00
Alexander Richardson
fc2850fc76 [IR2VecTest] Avoid magic constants
Instead, make the members of Vocabulary public. The magic constants were
causing test failures with https://github.com/llvm/llvm-project/pull/139357.

Reviewed By: svkeerthy, boomanaiden154

Pull Request: https://github.com/llvm/llvm-project/pull/150878
2025-07-28 09:50:51 -07:00
Joseph Huber
4f58c829fd [Clang] Search for 'offload-arch' only next to the clang driver (#150965)
Summary:
Previously, querying for the offload architecture tool would search the
user's PATH, which is bad when potentially using the driver from a
direct path. This patch changes this to *only* consider the
`offload-arch` that's supposed to live next to the driver executable.
Now we will no longer pick up a potentially conflicting version of this
tool, and it should always be found (since it's a clang tool that's
installed alongside the driver).
2025-07-28 11:36:31 -05:00
Davide Grohmann
0121a8e431 Reland "[mlir][spirv] Fix int type declaration duplication when serializing" (#145687)
This relands PRs #143108 and #144538.

The original PR was reverted due to a mistake that made all the mlir
tests run only if the SPIRV target was enabled. This is now resolved,
since enabling spirv-tools no longer requires the SPIRV target.

spirv-tools are not required by default to run SPIRV mlir tests, but
they can be optionally enabled in some SPIRV mlir tests to verify that
the produced SPIRV assembly passes validation.

The other reverted PR #144685 is no longer needed and is not part of
this relanding.

Original commit message:

> At the MLIR level, unsigned integers and signless integers are different
types. Indeed, when looking up the two types in the type definition cache,
they do not match.
> Hence, a translated SPIR-V module which contains both unsigned and
signless integers will contain the same type declaration twice
(something like OpTypeInt 32 0), which is not permitted in SPIR-V, and
such generated modules fail validation.
> This patch solves the problem by mapping unsigned integer types to
signless integer types before looking up in the type definition cache.

---------

Signed-off-by: Davide Grohmann <davide.grohmann@arm.com>
2025-07-28 12:34:30 -04:00
Han-Chung Wang
496d31c8a9 Reapply "[mlir][linalg] Restrict linalg.pack to not have artificial padding." (#150675) (#150680)
This reverts commit
0844812b2e
with a shape fix in
1db4c6b275

The revision restricts the `linalg.pack` op to not have artificial
padding semantics. E.g., the below is valid without the change, and it
becomes invalid with the change.

```mlir
func.func @foo(%src: tensor<9xf32>) -> tensor<100x8xf32> {
  %cst = arith.constant 0.000000e+00 : f32
  %dest = tensor.empty() : tensor<100x8xf32>
  %pack = linalg.pack %src
    padding_value(%cst : f32)
    inner_dims_pos = [0]
    inner_tiles = [8] into %dest
    : tensor<9xf32> -> tensor<100x8xf32>
  return %pack : tensor<100x8xf32>
}
```

IMO, it is a misuse if we use pack ops with artificial padding sizes
because the intention of the pack op is to relayout the source based on
target intrinsics, etc. The output shape is expected to be
`tensor<2x8xf32>`. If people need extra padding sizes, they can create a
new pad op followed by the pack op.

This also makes consumer tiling much easier because the consumer fusion
does not support artificial padding sizes. It is very hard to make it
work without using ad-hoc patterns, because the tile sizes are defined
in terms of the source, which implies that you don't have a
core_id/thread_id to write padding values to the whole tile.

People may ask why the pad tiling implementation works. The answer is
that it creates an `if-else` branch to handle the case. In my
experience, it is a struggle to transform because most of the time
people only need one side of the branch, given that the tile sizes are
usually greater than the padding sizes. However, the implementation is
conservatively correct in terms of semantics. Given that the `pack` op
was introduced to serve relayout needs better, having the restriction
makes sense to me.

Removed tests:
- `no_bubble_up_pack_extending_dimension_through_expand_cannot_reassociate`
from `data-layout-propagation.mlir`: it is a duplicate of
`bubble_up_pack_non_expanded_dims_through_expand` after we fix the
shape.
- `fuse_pack_consumer_with_untiled_extra_padding` from
`tile-and-fuse-consumer.mlir`: it was created for artificial padding in
the consumer fusion implementation.

The other changes in lit tests are just fixing the shape.

---------

Signed-off-by: hanhanW <hanhan0912@gmail.com>
2025-07-28 09:29:15 -07:00
Tomer Shafir
5f20518f5b [Clang][Docs] Fix typo in clang.rst (#150907) 2025-07-29 00:13:46 +08:00
Mehdi Amini
9c82f87aec Introduce a "log level" support for DEBUG_TYPE (#150855)
This allows setting an optional integer level for a given debug type. The
string format is `type[:level]`, and the integer is interpreted as follows:

- if not provided: all debugging for this debug type is enabled.
- if >0: all debug output with a level less than or equal to the given
level is enabled.
- if 0: this debug type is disabled without disabling the other
debug types; it acts as a negative filter.

The LDBG() macro is updated to accept an optional log level to
illustrate the feature. Here is the expected behavior:

LDBG() << "A"; // Identical to LDBG(1) << "A";
LDBG(2) << "B";

With `--debug-only=some_type`: we'll see A and B in the output.  
With `--debug-only=some_type:1`: we'll see A but not B in the output. 
With `--debug-only=some_type:2`: we'll see A and B in the output. (same
with any level above 2)
With `--debug-only=some_type:0`: we'll see neither A nor B in the
output, but we'll see any other logging for other debug types.
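
A minimal usage sketch, assuming the updated `LDBG()` macro from this
patch (pass name and messages hypothetical):

```cpp
#define DEBUG_TYPE "some_type"

void runSomePass() {
  LDBG() << "A";  // defaults to level 1
  LDBG(2) << "B"; // enabled for --debug-only=some_type or some_type:2 and above
}
```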
2025-07-28 18:10:36 +02:00
Joseph Huber
b2322772f2 [libc] Reduce reference counter to a 32-bit integer (#150961)
Summary:
This reference counter tracks how many threads are using a given slab.
Currently it's a 64-bit integer; this patch reduces it to a 32-bit
integer. The benefit of this is that we save a few registers now that we
no longer need to use two for these operations. This increases the risk
of overflow, but given that the largest value we accept for a single
slab is ~131,000 it is a long way off of the maximum of four billion or
so. Obviously we can oversubscribe the reference count by having threads
attempt to claim the lock and then try to free it, but I assert that it
is exceedingly unlikely that we will somehow have over four billion GPU
threads stalled in the same place.

A later optimization could be done to split the reference counter and
pointers into a struct of arrays, that will save 128 KiB of static
memory (as we currently use 512 KiB for the slab array).
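
A minimal sketch of the idea, with hypothetical names (the real slab
structure lives in libc's GPU allocator):

```cpp
#include <atomic>
#include <cstdint>

struct Slab {
  // A 32-bit count is ample: a slab holds at most ~131,000 users, far
  // below the ~4 billion limit, and it frees a register pair on the GPU.
  std::atomic<uint32_t> ref_count{0};

  void acquire() { ref_count.fetch_add(1, std::memory_order_relaxed); }

  // Returns true when the caller dropped the last reference.
  bool release() {
    return ref_count.fetch_sub(1, std::memory_order_acq_rel) == 1;
  }
};
```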
2025-07-28 11:05:36 -05:00
enh-google
701de35f67 [libc] Stop duplicating wcschr(). (#150661)
Three implementations of wcschr() is two too many.
2025-07-28 12:05:19 -04:00
Baghirov Feyruz
f0c90dfcd8 Rename 'free' in warning messages to 'release' (#150935)
Changed the warning message:

- **From**: 'Attempt to free released memory'
   **To**: 'Attempt to release already released memory'
- **From**: 'Attempt to free non-owned memory'
   **To**: 'Attempt to release non-owned memory'
- **From**: 'Use of memory after it is freed' 
   **To**: 'Use of memory after it is released'

All connected tests and their expectations have been changed
accordingly.

Inspired by [this
PR](https://github.com/llvm/llvm-project/pull/147542#discussion_r2195197922)
2025-07-28 18:02:56 +02:00
Aaron Ballman
837b2d464f [[gnu::nonstring]] should work on pointers too (#150974)
Clang's current implementation only works on array types, but GCC (which
is where we got this attribute) supports it on pointers as well as
arrays.
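
A hedged sketch of what this permits (GCC attribute spelling; the
declarations themselves are hypothetical):

```cpp
// `nonstring` marks character data that is not necessarily NUL-terminated.
__attribute__((nonstring)) char tag[4] = {'a', 'b', 'c', 'd'}; // array: already worked
__attribute__((nonstring)) char *cursor = tag;                 // pointer: now accepted
```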

Fixes #150951
2025-07-28 11:53:33 -04:00
Krishna Pandey
6a45697fa6 [CI] Downgrade to clang-20 for libc fullbuild (#150246)
To be reverted when llvm-21 issues are resolved with the precommit CIs.
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
2025-07-28 11:49:19 -04:00
Evgenii Kudriashov
75b79c9238 [LLD][X86] Match delayLoad thunk with MSVC (#149521)
Previously we saved registers in the shadow space of the callee before
calling __delayLoadHelper2. Now we save arguments in the shadow space of
the caller and allocate shadow space for the callee.

Fixes #51941

---------

Co-authored-by: Benjamin Santerre <benjamin.santerre@gmail.com>
2025-07-28 17:45:16 +02:00
David Spickett
0209e76fe6 [lldb][AArch64][Linux] Show MTE store only setting in mte_ctrl (#145033)
This controls whether tag checking is performed for loads and 
stores, or stores only.

It requires a specific architecture feature, which we detect
via HWCAP3 and a cpuinfo feature.

Live process tests look for this and adjust expectations
accordingly, core file tests are using an updated file with
this feature enabled.

The size of the core file has increased and there's nothing I can do
about that. It could be the presence of new architecture features or
kernel changes since I last generated them.

I can generate a smaller file that has the tag segment,
but that segment does not actually contain tag data. So
that's no use.
2025-07-28 16:40:00 +01:00
Ellis Hoag
819f020b28 Use F.hasOptSize() instead of checking optsize directly (#147348) 2025-07-28 08:38:52 -07:00
Matt Arsenault
a496a985d9 AMDGPU: Remove -stress-regalloc arguments from mfma selection tests (#150890)
I'm not really sure what the point of these was, but they originated
in the base support commit for gfx942 mfma support. These don't impact
the selection at all, so they don't belong in these tests. They were
causing allocation failures depending on whether the AGPR or VGPR form
was used.
2025-07-29 00:30:01 +09:00
Matt Arsenault
6fb8e58565 AMDGPU: Disable AGPR allocation in VGPR MFMA tests (#150873) 2025-07-29 00:26:24 +09:00
Florian Hahn
f9f68af4b8 [SCEV] Make sure LCSSA is preserved when re-using phi if needed.
If we insert a new add instruction, it may introduce a new use outside
the loop that contains the phi node we re-use. Use fixupLCSSAFormFor to
fix LCSSA form, if needed.

This fixes a crash reported in
https://github.com/llvm/llvm-project/pull/147824#issuecomment-3124670997.
2025-07-28 16:24:46 +01:00
Juan Besa
4d259de2ae [clang-tidy] Add IgnoreAliasing option to readability-qualified-auto check (#147060)
The `readability-qualified-auto` check currently looks at the unsugared
type, skipping any typedefs, to determine if the variable is a pointer
type. This may not be the desired behaviour, in particular when the type
depends on compilation flags. For example:

```cpp
#if CONDITION
using Handler = int *;
#else
using Handler = uint64_t;
#endif
```

A more common example is that some implementations of `std::array` use
pointers as iterators.

This introduces the IgnoreAliasing option so that
`readability-qualified-auto` does not look beyond typedefs.
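
A hedged sketch of the effect (`makeHandler` is hypothetical):

```cpp
using Handler = uint64_t; // `int *` under another build configuration

Handler makeHandler();

void use() {
  // With IgnoreAliasing=true the check stops at the typedef, so it no
  // longer demands `auto *` here when Handler happens to alias a pointer.
  auto h = makeHandler();
  (void)h;
}
```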

---------

Co-authored-by: juanbesa <juanbesa@devvm33299.lla0.facebook.com>
Co-authored-by: Kazu Hirata <kazu@google.com>
Co-authored-by: EugeneZelenko <eugene.zelenko@gmail.com>
Co-authored-by: Baranov Victor <bar.victor.2002@gmail.com>
2025-07-28 18:20:02 +03:00
Luke Lau
5f2092dae3 [RISCV][LV] Update f16/bf16 loop vectorizer tests. NFC
This fixes a failing test after the changes in #150908 affected the
result in #150882.
2025-07-28 23:19:03 +08:00
Muhammad Bassiouni
5bcbcf8d53 [libc][math] Refactor asinhf implementation to header-only in src/__support/math folder. (#150843)
Part of #147386

in preparation for: https://discourse.llvm.org/t/rfc-make-clang-builtin-math-functions-constexpr-with-llvm-libc-to-support-c-23-constexpr-math-functions/86450
2025-07-28 18:14:47 +03:00
Orlando Cazalet-Hyams
fbf6271c7d Reapply (2) [BranchFolding] Kill common hoisted debug instructions (#149999)
Reapply #140091.

branch-folder hoists common instructions from TBB and FBB into their
pred. Without this patch it achieves this by splicing the instructions from TBB
and deleting the common ones in FBB. That moves the debug locations and debug
instructions from TBB into the pred without modification, which is not
ideal. Debug locations are handled in #140063.

This patch handles debug instructions - in the simplest way possible, which is
to just kill (undef) them. We kill and hoist the ones in FBB as well as TBB
because otherwise the fact there's an assignment on the code path is deleted
(which might lead to a prior location extending further than it should).

There's possibly something we could do to preserve some variable locations in
some cases, but this is the easiest not-incorrect thing to do.

Note I had to replace the constant DBG_VALUEs to use registers in the test;
it turns out setDebugValueUndef doesn't undef constant DBG_VALUEs... which
feels wrong to me, but isn't something I want to touch right now.

---

Fix end-iterator-dereference and add test.
2025-07-28 16:13:35 +01:00
David Spickett
d26ca8b872 [lldb][AArch64] Add HWCAP3 to register field detection (#145029)
This will be used to detect the presence of Arm's new Memory Tagging
store only checking feature. This commit just adds the plumbing to get
that value into the detection function.

FreeBSD has not allocated a number for HWCAP3 and already has AT_ARGV
defined as 29. So instead of attempting to read from FreeBSD processes,
I've explicitly passed 0. We don't want to be reading some other entry
accidentally.

If/when FreeBSD adds HWCAP3 we can handle it like we do for
AUXV_FREEBSD_AT_HWCAP.

No extra tests here, those will be coming with the next change for MTE
support.
2025-07-28 16:09:24 +01:00
David Spickett
0462dfe39f [llvm][docs] Refresh "Restrict Visibility" in Coding Standards (#150914)
No change of meaning, just formatting and an extra example to make it
easier to comprehend:
* Split separate, important points into their own paragraphs.
* Remove a contraction.
* Finally, show how to use "static" on a function. Before, we just
showed why namespaces were bad, but not what you should do instead.
2025-07-28 16:04:07 +01:00
Jacek Caban
38cd66a6ce [LLD][COFF] Move resolving alternate names to SymbolTable (NFC) (#149495) 2025-07-28 17:02:49 +02:00
Florian Hahn
8437038984 [LoopIdiom] Add test where LCSSA needs preserving when re-using PHI (NFC) 2025-07-28 16:02:18 +01:00
Jacek Caban
1ab04fc94c [LLD][COFF] Allow symbols with empty chunks to have no associated output section in the PDB writer (#149523)
If a chunk is empty and there are no other non-empty chunks in the same
section, `removeEmptySections()` will remove the entire section. In this
case, use a section index of 0, as the MSVC linker does, instead of
asserting.
2025-07-28 17:01:26 +02:00
Luke Lau
fe4f6c1a58 [RISCV] Cost bf16/f16 vector non-unit memory accesses as legal without zvfhmin/zvfbfmin (#150882)
When vectorizing with predication, some loops that were previously
vectorized without zvfhmin/zvfbfmin will no longer be vectorized because
the masked load/store or gather/scatter cost returns illegal.

This is due to a discrepancy where for these costs we check
isLegalElementTypeForRVV but for regular memory accesses we don't.

But for bf16 and f16 vectors we don't actually need the extension
support for loads and stores, so this adds a new function which takes
this into account.

For regular memory accesses we should probably also e.g. return an
invalid cost for i64 elements on zve32x, but it doesn't look like we
have tests for this yet.

We also should probably not be vectorizing these bf16/f16 loops to begin
with if we don't have zvfhmin/zvfbfmin and zfhmin/zfbfmin. I think this
is due to the scalar costs being too cheap. I've added tests for this in
a100f63672 to fix in another patch.
2025-07-28 22:59:49 +08:00
Will Froom
4b1d5b8d4f [MLIR] Fix pipelineInitializationKey never being correctly updated (#150948)
Prior to this change, `pipelineInitializationKey` would never be updated,
so `initialize` would always be called even if the pipeline didn't
change.
2025-07-28 15:47:12 +01:00
Dan Blackwell
33cc58f46f [compiler-rt][libFuzzer] Add support for capturing SIGTRAP exits. (#149120)
Swift's FatalError raises a SIGTRAP, which currently causes the fuzzer
to exit without writing out the crashing input.

rdar://142975522
2025-07-28 07:46:48 -07:00
Felix Weiglhofer
a22d010002 opencl: Ensure printf symbol is not mangled. (#150210)
Fixes #122453.
2025-07-28 16:24:54 +02:00
Joseph Huber
a1a610a128 [libc] Increase the number of times we wait on a slab
Summary:
This wait restricts how long we wait on a slab. The only reason this
isn't an infinite loop is to prevent complete deadlocks. However, this
limit was *just* on the cusp of waiting long enough for the allocation
to be done. Just increase this to a sufficiently large value, because
this limit only exists to keep the interface wait-free in the absolute
worst case scheduling scenario. This *MASSIVELY* improved performance
for mixed allocations, as we no longer shuffled around creating more
slabs than necessary.
2025-07-28 09:23:29 -05:00
Joseph Huber
a7649007ef [libc] Rework match any use in hot allocate bitfield loop
Summary:
We previously used `match_all` as the shortcut to figure out which
threads were destined for which slots. This lowers to a for-loop which,
even if it often executes only once, still causes some slowdown,
especially when divergent. Instead we use a single ballot call and then
calculate the mapping directly.

Here the ballot tells us which lanes are the first in a block, either
the starting index or the barrier for a new 32-bit int. We then use some
bit magic to figure out, for each lane ID, its closest leader. For the
length we simply use the length calculated by the leader of the
remaining bits to be written. This removes the `match_any` and the
shuffle, which improves the minimum number of cycles this takes by about
5%.
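
A hedged sketch of the "closest leader" computation described above,
assuming a 64-bit ballot mask where bit i is set iff lane i starts a
block (names hypothetical):

```cpp
#include <cstdint>

// Each lane's closest leader is the highest set bit of the ballot at or
// below its own lane id. Assumes the masked value is nonzero, i.e. the
// first active lane always starts a block.
uint32_t closest_leader(uint64_t ballot, uint32_t lane_id) {
  uint64_t at_or_below = ballot & (~0ull >> (63 - lane_id)); // keep bits [0, lane_id]
  return 63 - __builtin_clzll(at_or_below);                  // index of nearest leader
}
```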
2025-07-28 09:23:29 -05:00
Joseph Huber
9975dfdf80 [libc] Small performance improvements to GPU allocator
Summary:
This slightly increases performance in a few places. First, we
optimistically assume the cached slab has ample space, which lets us
avoid the atomic load on the highly contended counter in the case that
it is likely to succeed. Second, we no longer call `match_any` twice, as
we can calculate the uniform slabs at the moment we return them.
Third, we always choose a random index on a 32-bit boundary. This
means that in the fast case we fulfil the allocation with a single
`fetch_or`, and in the other case we quickly move to the free bit.

This nets around a 7.75% improvement for the fast path case.
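
A hedged sketch of the single-`fetch_or` fast path (bitfield layout
hypothetical):

```cpp
#include <atomic>
#include <cstdint>

// Try to claim bit `bit` of a 32-bit bitfield word with one atomic op;
// on failure the caller falls back to scanning for a free bit.
bool try_claim(std::atomic<uint32_t> &word, uint32_t bit) {
  uint32_t mask = 1u << bit;
  uint32_t old = word.fetch_or(mask, std::memory_order_relaxed);
  return (old & mask) == 0; // true: the bit was free and is now ours
}
```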
2025-07-28 09:23:29 -05:00
Nikita Popov
166493d692 [FunctionAttrs] Fix function signature mismatch in test (NFC)
There was a return type mismatch, which unintentionally blocked
attribute inference in this test.
2025-07-28 16:16:42 +02:00
Timm Baeder
904de95e71 [clang][bytecode][NFC] Fix a few clang-tidy complaints (#150940) 2025-07-28 15:57:49 +02:00
Nikita Popov
01d4b8e9a6 [FunctionAttrs] Add additional tests (NFC)
Add test coverage for noalias, and for unknown function calls.
2025-07-28 15:57:31 +02:00
Pierre van Houtryve
a6532c2ada [AMDGPU][gfx12] Clean-up implementation of waits before SCOPE_SYS stores (#150587)
We can do it all in finalizeStore if we ensure it always sees the
stores. For that, I needed to fix a hidden bug where finalizeStore
wouldn't see all stores, because sometimes the iterator got out of sync
and didn't point to the store anymore.

This also removes the waits before volatile LDS stores, which never
needed them; that was a bug until now.
2025-07-28 15:38:46 +02:00
Michael Buch
c8a091e1b6 [lldb][NFC] Use IterationAction for ModuleList::ForEach callbacks (#150930) 2025-07-28 14:35:39 +01:00
halbi2
a63bbf2f1e [clang] Diagnose [[nodiscard]] return types in Objective-C++ (#142541)
My solution was to copy-paste getUnusedResultAttr and
hasUnusedResultAttr from CallExpr into ObjCMessageExpr too.

Fixes #141504
2025-07-28 09:26:28 -04:00
Joseph Huber
5dc9937ea9 [libc] Improve starting indices for GPU allocation (#150432)
Summary:
The slots in this allocation scheme are statically allocated. All sizes
share the same array of slots, but are given different starting
locations to space them apart. The previous implementation used a
trivial linear slice. This is inefficient because it provides the more
likely allocations (1-1024 bytes) with just as much space as a highly
unlikely one (1 MiB).

This patch uses a cubic easing function to gradually shrink the gaps.
For example, we used to get around 700 free slots for a 16-byte
allocation, now we get around 2100 before it starts encroaching on the
32 byte allocation space. This could be improved further, but I think
this is sufficient.
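
A hedged sketch of how a cubic easing curve can map size classes to
starting indices (constants and names hypothetical; the real function
lives in libc's GPU allocator):

```cpp
#include <cstdint>

// Map size class i of num_classes onto [0, num_slots). The ease-out cubic
// is steep near t = 0, so small, common size classes get wide gaps while
// large, rare ones are packed closer together.
uint32_t start_index(uint32_t i, uint32_t num_classes, uint32_t num_slots) {
  double t = double(i) / double(num_classes);
  double eased = 1.0 - (1.0 - t) * (1.0 - t) * (1.0 - t);
  return uint32_t(eased * double(num_slots));
}
```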
2025-07-28 07:54:48 -05:00
Anchu Rajendran S
9d642b0ec8 [flang][MLIR][OpenMP][llvm]Atomic Control Support (#150860) 2025-07-28 05:46:10 -07:00
Florian Hahn
6ccc9e559d [AArch64] Add taildup test with computed gotos.
Add a test case showing missed optimizations from early taildup with
computed gotos for https://github.com/llvm/llvm-project/pull/150911.
2025-07-28 13:26:51 +01:00
Luke Lau
92d09245d6 [VPlan] Fall back to scalar epilogue if possible when EVL isn't legal (#150908)
When enabling predicated vectorization by default on RISC-V, there are a
bunch of performance regressions on llvm-test-suite's LoopInterleaving
microbenchmarks:
https://lnt.lukelau.me/db_default/v4/nts/788?show_delta=yes&show_previous=yes&show_stddev=yes&show_mad=yes&show_all=yes&show_all_samples=yes&show_sample_counts=yes&show_small_diff=yes&num_comparison_runs=0&test_filter=&test_min_value_filter=&aggregation_fn=min&MW_confidence_lv=0.05&compare_to=791&baseline=730&submit=Update

Most of these regressions stem from the interleave_count pragma, which
causes EVL tail folding interleaving to be unsupported (since we don't
support unrolling with EVL).

Currently if DataWithEVL isn't legal we fall back to DataWithoutLaneMask
as the tail folding style, but this is very slow on RISC-V.

The order of performance roughly is something like:

DataWithEVL > None (scalar-epilogue) > Data[WithoutLaneMask]

So this patch tries to prevent the regressions by falling back to a
scalar epilogue where possible, i.e. the existing vectorization we have
today. Note we may still need to fall back to DataWithoutLaneMask, e.g.
if the trip count is low or it's forced by
-prefer-predicate-over-epilogue=predicate-dont-vectorize.
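
For reference, a hedged sketch of the kind of loop hint that hits this
path (the pragma is Clang's; the loop itself is hypothetical):

```cpp
void saxpy(float *x, float *y, float a, int n) {
  // An explicit interleave count rules out EVL tail folding, since
  // unrolling with EVL is unsupported; with this patch we fall back to a
  // scalar epilogue instead of DataWithoutLaneMask where possible.
  #pragma clang loop interleave_count(4)
  for (int i = 0; i < n; i++)
    y[i] = a * x[i] + y[i];
}
```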
2025-07-28 20:10:36 +08:00