10s looks not enough. With highly parallel test
execution on VMs it's very possible that Asan
report will have no enough time to produce output.
I can reproduce locally 1s is not always enough,
but likely my workstation is faster then buildbot.
Additionally, don't use puts/CHECK to validate
timeout. We can exit with 0 and it should violate
"not" expectation.
Follow up to #131756.
This shows up in the extension UI and in the VSCode extension
marketplace.
I used the llvm/llvm-www/blob/main/img/LLVMWyvernSmall.png for the
image.
Here is what this looks like locally:
<img width="1020" alt="Screenshot 2025-04-28 at 12 30 33 PM"
src="https://github.com/user-attachments/assets/cd615161-06db-4a37-8f8a-df65ca620a67"
/>
Move out the logic to prepare for vectorization to a separate transform,
before creating loop regions. This was discussed as follow-up
in https://github.com/llvm/llvm-project/pull/136455.
This just moves the existing code around slightly and will simplify
follow-up patches to include the exiting edges during initial VPlan
construction.
The OpenMP implementation of the ATOMIC construct will change in the
near future to accommodate atomic conditional-update and conditional-
update-capture operations. This patch separates the shared implemen-
tations to avoid interfering with OpenACC.
The part that verifies the declare attributes are preserved in the
verifier can fail easily during the FIR lowering pipeline. For example,
during FIR lowering to FIRCG, fir.declare can be removed. Thus, any
fir.declare that has acc.declare attributes will cause a verifier
failure. Since the declare attribute only existed to simplify the effort
of locating acc declare enter and exit points, which can be easily
replaced by a def-use chain traversal, we are considering removing the
verification of declare attributes in this MR.
Example:
```
%1 = fir.alloca !fir.array<10xf32> {bindc_name = "arr", uniq_name = "_QMmmFsubEarr"}
%2 = fir.shape %c10 : (index) -> !fir.shape<1>
%3 = fir.declare %1(%2) {acc.declare = #acc.declare<dataClause = acc_create>, uniq_name = "_QMmmFsubEarr"} : (!fir.ref<!fir.array<10xf32>>, !fir.shape<1>) -> !fir.ref<!fir.array<10xf32>>
%4 = acc.create varPtr(%3 : !fir.ref<!fir.array<10xf32>>) -> !fir.ref<!fir.array<10xf32>> {name = "arr"}
%5 = acc.declare_enter dataOperands(%4 : !fir.ref<!fir.array<10xf32>>)
```
the acc.declare_enter itself is enough to identify when the data region
starts.
The cost model in the past returned -1 for unknown costs, but over time
this has largely been removed. This cleans up some of the uses that have
remained. It uses 0/free for the cost of an insert and 1/basic for the
cost of anything that is unknown.
This patch adds the following Dense Math Facility 16-bit half-precision
floating-point calculation instructions: dmxvf16gerx2, dmxvf16gerx2pp,
dmxvf16gerx2pn, dmxvf16gerx2np, dmxvf16gerx2nn, pmdmxvf16gerx2,
pmdmxvf16gerx2pp, pmdmxvf16gerx2pn, pmdmxvf16gerx2np, pmdmxvf16gerx2nn,
along with their corresponding intrinsics and tests.
While working on #134845 I was trying to understand the difference of
how the reinterpret_cast and subview operators see the input memref, but
it was not clear to me.
I did a couple of experiments in which I learned that the subview takes
into account the view of the input memref to create the view of the
output memref, while the reinterpret_cast just uses the underlying
memory of the input memref.
I thought it would help future readers to see these two experiements as
examples in the documentation to quickly figure out the difference
between these two operators.
@matthias-springer
@joker-eph
@sahas3
@Hanumanth04
@dixinzhou
@rafaelubalmw
---------
Co-authored-by: Ivan Garcia <igarcia@vdi-ah2ddp-178.dhcp.mathworks.com>
This PR introduces a Metadata Node Kind allowlist. The purpose is to
prevent newer Metadata Node Kinds to be used and inserted into the
outputted DXIL module. Only the metadata kinds that are accepted in the
DXIL Validator are on the allowlist. The Github DXC validator doesn't
support these newer Metadata Node Kinds, so we need to filter them out.
We introduce this restrictive allowlist into LLVM and strip all metadata
that isn't found in the list.
The accompanying test would add the `llvm.loop.mustprogress` metadata
node kind, but thanks to the allowlist, filters it out, and so the
whitelist is proven to work.
The test also has two separate metadata kinds that are on the allowlist,
and remain after the DXIL Prepare pass.
This PR makes `CompilerInvocation` the sole owner of the
`AnalyzerOptions` instance. Clients can no longer become co-owners by
doing something like this:
```c++
void shareOwnership(CompilerInvocation &CI) {
IntrusiveRefCntPtr Shared = &CI.getAnalyzerOpts();
}
```
The motivation for this is given here:
https://github.com/llvm/llvm-project/pull/133467#issuecomment-2762065443
Make sure all BUILD.bazel files under llvm-libc are specifying
package-level default_visibility and licenses.
Add top-level license notice and comment to new BUILD.bazel for
complex.h functions.
By calling this function after every link, we introduce quadratic
behavior during full LTO links that primarily affects links involving
large numbers of constant arrays, such as the full LTO part of a
ThinLTO link which will involve large numbers of vtable constants.
Removing this call resulted in a 1.3x speedup in the indexing phase
of a large internal Google binary.
This call was originally introduced to reduce memory consumption during
full LTO (see commit dab999d54f), but it
doesn't seem to be the case that this helps any more. I ran 3 stage2
links of clang with full LTO and ThinLTO with and without this change
and measured the max RSS. The results were as follows (all in KB):
```
FullLTO before 22362512 22383524 22387456
after 22383496 22365356 22364352
ThinLTO before 4391404 4478192 4383468
after 4399220 4363100 4417688
```
As you can see, any max RSS differences are in the noise.
Reviewers: nikic, teresajohnson
Reviewed By: teresajohnson, nikic
Pull Request: https://github.com/llvm/llvm-project/pull/137081
Previously the assert macro took one argument named "e", but this led to
possible errors if the caller had commas in their input. C23 changed the
definition of assert to use `__VA_ARGS__` to ensure comma cases are
handled properly. This patch doesn't introduce the enforcement function
mentioned in the standard update, though that may be done in a followup.
Fixes#136184
Currently, if there is already noalias metadata present on loads and
stores, lower module lds pass is generating a more conservative aliasing
set. This results in inhibiting scheduling intrinsics that would have
otherwise generated a better pipelined instruction.
The fix is not to always intersect already existing noalias metadata
with noalias created for lowering of LDS. But to intersect only if
noalias scopes are from the same domain, otherwise concatenate exising
noalias sets with LDS noalias.
There a few patches that have come for scopedAA in the past. Following
three should be enough background information.
https://reviews.llvm.org/D91576https://reviews.llvm.org/D108315https://reviews.llvm.org/D110049
Essentially, after a pass that might change aliasing info, one should
check if that pass results in change number of MayAlias or ModRef using
the following:
`opt -S -aa-pipeline=basic-aa,scoped-noalias-aa -passes=aa-eval
-evaluate-aa-metadata -print-all-alias-modref-info -disable-output`
This reverts commit 8dd160f476.
The recommit contains an adjustment to planContainsAdditionalSimplifications,
which considers changes to the original predicate for compares.
Original commit message:
Add simplification to fold negation into a compare, if the negation is
the only user of the compare. This removes a number of redundant
negations.
Alive2 Proofs for FPCMP test changes: https://alive2.llvm.org/ce/z/WGDz9U
PR: https://github.com/llvm/llvm-project/pull/129430
Allow `-f[no]-sanitize-address-use-after-scope` to take effect under
kernel-address sanitizer (`-fsanitize=kernel-address`). `use-after-scope` is
now enabled by default under kernel-address sanitizer.
Previously, users may have enabled `use-after-scope` checks for kernel-address
sanitizer via `-mllvm -asan-use-after-scope=true`. While this may have worked
for optimization levels > O0, the required lifetime intrinsics to allow for
`use-after-scope` detection were not emitted under O0. This commit ensures
the required lifetime intrinsics are emitted under O0 with kernel-address
sanitizer.
**Edit:**
I suggest we avoid diagnosing initializers for `std::array` type. The
fixit provided is incorrect as observed in **#133715.** The only
workaround would require C99-style array designators which don’t really
align with the purpose of this check. This would also generate extra
compiler warnings.
Fixes#133715
The debugserver code predates modern C++, but with C++11 and later
there's no need to have something like PThreadMutex. This migrates
PThreadEvent away from PThreadMutex in preparation for removing it.
This patch changes the progress count formatting show thousands
separator making it easier to read big numbers.
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
Details:
When we have the following scenario:
- lib_a re-exports lib_b
- lib_b re-exports @rpath/lib_c
+ lib_b contains LC_RPATH
Previously, lld-macho cannot find lib_c because it was attempting to
resolve the '@rpath' from lib_b (which had no LC_RPATH defined). The
change here is to also consider all the LC_RPATH rom lib_b when trying
to find lib_c.
Inspired by real-life example when linking with
libXCTestSwiftSupport.dylib (which re-exports XCTest, which re-exports
XCTestCore)
Current implementation tries to fold the operand before
rematerialization because it can reduce one register usage. But if there
is a physical register available we can still rematerialize it without
causing high register pressure.
This patch do this check to find the better choice. Then we can produce
xorps %xmm1, %xmm1
ucomiss %xmm1, %xmm0
instead of
ucomiss LCPI0_1(%rip), %xmm0
This makes it easier to build a new Docker image in the CI. Since the
new image is not used automatically it's safe to commit these changes
directly to main. Then use a PR to test the new image.
The debugserver code predates modern C++, but with C++11 and later
there's no need to have something like PThreadMutex. This migrates
DNBTimer away from that class in preparation for removing PThreadMutex.
This makes them more consistent with the checks performed by regular loads. We can't simply add IsNonExtLoad to the existing atomic_load_8/16/32/64 as that would affect out of tree targets.
The __spirv_GenericCastToPtrExplicit_To* builtins and its equivalent
OpenCL builtins (to_global, to_local and to_private) were mapped to
OpGenericCastToPtr instead of OpGenericCastToPtrExplicit.
The patch now uses OpGenericCastToPtrExplicit for these builtins.
When the SPV_INTEL_arbitrary_precision_integers extension is allowed to
be used, the backend will unconditionnally add it to the module used
extensions.
The patch prevent SPV_INTEL_arbitrary_precision_integers from being
declared if unneeded.