Extends changes from
[ff687af](ff687af04f).
Fixes https://github.com/llvm/llvm-project/issues/131476.
This patch adds a DAG combine to replace an `AND` of an `ATOMIC_LOAD`
with a full-bit mask (e.g. `0xFF`, `0xFFFF`, etc.) which is generated as
a result of `(zext (atomic_load))`, by a zero-extended load, provided
the atomic operation is monotonic or weaker.
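As a rough illustration of the precondition described above, the following sketch models the check outside of SelectionDAG. The `Ordering` enum and both function names are hypothetical and exist only for this example; they are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for atomic orderings, listed weakest first.
enum class Ordering { NotAtomic, Unordered, Monotonic, Acquire, SeqCst };

// True if `mask` has exactly the low `memBits` bits set
// (0xFF for 8 bits, 0xFFFF for 16 bits, and so on).
bool isFullBitMask(uint64_t mask, unsigned memBits) {
  assert(memBits > 0 && memBits <= 64 && "memory width must be 1..64 bits");
  uint64_t expected = memBits == 64 ? ~0ULL : ((1ULL << memBits) - 1);
  return mask == expected;
}

// Models the guard: (and (atomic_load), mask) may become a zero-extending
// load only for a full-width mask and a monotonic-or-weaker ordering.
bool canFoldMaskIntoZextLoad(uint64_t mask, unsigned memBits, Ordering order) {
  return isFullBitMask(mask, memBits) && order <= Ordering::Monotonic;
}
```

With this model, `0xFF` on an 8-bit monotonic load qualifies, while a partial mask such as `0x7F` or an acquire load does not.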
This is an NFC patch.
Added error and promote tests for the fake16 flow. This includes two parts:
1. "*vop1_t16_err-fake16.s" is renamed to "*vop1_fake16_err.s"
2. added missing "fake16-promote.s" and other "*fake16_err.s" files
These tests are about promoting the instruction encoding to 64 bits if
the used registers are not encodable in the 32-bit form.
The `qc.c.mienter` and `qc.c.mienter.nest` instructions broadly save only
the argument and temporary registers. The exceptions are that they
also save `fp` (`s0`) to construct a frame chain from the signal handler
to the frame below, and they also save `ra`. They are designed this way
so that (if needed) push and pop instructions can be used to save the
callee-saved registers.
This patch implements this optimisation, constructing the following
rather than a long sequence of `sw` and `lw` instructions for saving the
callee-saved registers:
```asm
qc.c.mienter
qc.cm.push {ra, s0-sN}, -M
...
qc.cm.pop {ra, s0-sN}, M
qc.c.mileaveret
```
There are some carefully-worked-out details here, especially around CFI
information. For any register saved by both `qc.c.mienter(.nest)` and
the push (which is `ra` and `s0` at most), we point the CFI information
at the version saved by `qc.c.mienter(.nest)`. This ensures the CFI
points at the same `fp` copy as a frame pointer unwinder would find.
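The CFI rule above can be modelled roughly as follows. This is only a sketch: `SaveSlot` and `cfiSlotFor` are hypothetical names, and registers are represented as plain strings rather than anything from the RISC-V backend.

```cpp
#include <cassert>
#include <set>
#include <string>

// Which saved copy of a register the CFI should describe.
enum class SaveSlot { None, MiEnter, Push };

// Hypothetical model of the rule: if both qc.c.mienter(.nest) and the push
// save a register, the copy saved by qc.c.mienter(.nest) wins.
SaveSlot cfiSlotFor(const std::string &reg,
                    const std::set<std::string> &savedByMiEnter,
                    const std::set<std::string> &savedByPush) {
  if (savedByMiEnter.count(reg))
    return SaveSlot::MiEnter;
  if (savedByPush.count(reg))
    return SaveSlot::Push;
  return SaveSlot::None;
}
```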
The 'loop' emission for OpenACC is particularly involved, so it makes
sense for it to live in its own file. This patch splits it out into its
own file, along with the clause emitter code (which loop is going to
require).
This PR hides the reference-counted pointer that holds `TargetOptions`
from the public API of `CompilerInvocation`. This gives
`CompilerInvocation` exclusive control over the lifetime of this
member, which will eventually be leveraged to implement copy-on-write
behavior.
There are two clients that currently share ownership of that pointer:
* `TargetInfo` - This was refactored to hold a non-owning reference to
`TargetOptions`. The options object is typically owned by the
`CompilerInvocation` or by the new `CompilerInstance::AuxTargetOpts` for
the auxiliary target. This needed a bit of care in `ASTUnit::Parse()` to
keep the `CompilerInvocation` alive.
* `clangd::PreambleData` - This was refactored to exclusively own the
`TargetOptions` that get moved out of the `CompilerInvocation`.
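The two ownership models can be sketched with simplified stand-in types. None of the types below are the real Clang classes; `Options`, `Invocation`, `Target`, and `Preamble` are hypothetical and only mirror the shape of the refactoring described above.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct Options { std::string triple; };   // stand-in for TargetOptions

class Invocation {                         // stand-in for CompilerInvocation
  std::shared_ptr<Options> opts = std::make_shared<Options>();
public:
  // Only a reference is exposed; the owning pointer stays private.
  Options &getOptions() { return *opts; }
};

struct Target {                            // stand-in for TargetInfo
  const Options &opts;                     // non-owning: the owner must outlive us
  explicit Target(const Options &o) : opts(o) {}
};

struct Preamble {                          // stand-in for clangd::PreambleData
  std::unique_ptr<Options> opts;           // exclusive ownership
  // Moves the options out of the invocation into our own storage.
  explicit Preamble(Invocation &inv)
      : opts(std::make_unique<Options>(std::move(inv.getOptions()))) {}
};
```

In this model a `Target` is only valid while its `Invocation` is alive, which is the kind of lifetime constraint that needed care in `ASTUnit::Parse()`.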
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum`
instructions.
These mirror the `llvm.maximum.*` and `llvm.minimum.*` intrinsics, but
are atomic and use IEEE 754-2019 handling for NaNs, which is different to
`fmax` and `fmin`. See:
https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic
for more details.
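To make the NaN difference concrete, here is a small scalar sketch of the IEEE 754-2019 `maximum` operation next to `std::fmax`. `ieeeMaximum` is a hypothetical helper written for this description, not code from the patch, and it ignores atomicity entirely.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Sketch of IEEE 754-2019 `maximum`: the result is NaN if either input is
// NaN, and +0.0 is treated as larger than -0.0.
double ieeeMaximum(double a, double b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<double>::quiet_NaN();
  if (a == b)                        // also covers +0.0 versus -0.0
    return std::signbit(a) ? b : a;  // prefer the operand without the sign bit
  return a > b ? a : b;
}
```

By contrast, `std::fmax(1.0, NaN)` returns `1.0`, because `fmax` treats a NaN operand as missing data.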
Future changes will allow this LLVM IR to be lowered to specialised
assembler instructions on suitable targets, such as AArch64.
The current implementation of the ATOMIC construct handles these clauses
individually, and this change does not have an observable effect. At the
same time these clauses are unique as per the OpenMP spec, and this
patch reflects that in the OMP.td file.
As a follow-up to 3c4dff3ac6, I audited all
uses of 'process clause and use additive methods' and added explicit
functions to the construct to make it easier for the next project to
attempt to use this mechanism (vs. constructing all operands etc. in
advance, then adding them all at once).
I've only done the ones that I have attempted to use so far (as a
catch-up, so no var-list clauses, no constructs that can't be used
without a var-list, no loop, and no compound constructs). I intend to do
those "as I go" with the lowering of each of those things instead.
---------
Co-authored-by: Andy Kaylor <akaylor@nvidia.com>
Instead of always iterating over all GlobalVariables in the Module to
find the case where both Caller and Callee are using the same GV heavily,
first scan Callee (only if less than 200 instructions) for all GVs used
more than 10 times, and then do the counting for the Caller for just
those relevant GVs.
The limit of 200 instructions makes sense as this aims to inline a
relatively small function that uses a GV more than 10 times.
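The two-step scan can be sketched as below. This is a simplified model, not the actual inliner code: `Func`, `heavilySharedGlobals`, and the constant names are hypothetical, a function is reduced to its instruction count plus one string per global use, and applying the same "more than 10" threshold to the caller is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a function: instruction count plus one entry per
// use of a global variable.
struct Func {
  std::size_t numInsts;
  std::vector<std::string> gvUses;
};

constexpr std::size_t MaxCalleeInsts = 200; // limits taken from the description
constexpr unsigned MinHeavyUses = 10;

// Returns the globals used more than MinHeavyUses times in both functions.
std::vector<std::string> heavilySharedGlobals(const Func &caller,
                                              const Func &callee) {
  std::vector<std::string> shared;
  if (callee.numInsts >= MaxCalleeInsts)
    return shared; // callee too large: skip the heuristic entirely

  // Step 1: count uses in the (small) callee only.
  std::map<std::string, unsigned> calleeUses;
  for (const std::string &gv : callee.gvUses)
    ++calleeUses[gv];

  // Step 2: count caller uses, but only for globals hot in the callee.
  std::map<std::string, unsigned> callerUses;
  for (const std::string &gv : caller.gvUses) {
    auto it = calleeUses.find(gv);
    if (it != calleeUses.end() && it->second > MinHeavyUses)
      ++callerUses[gv];
  }

  for (const auto &entry : callerUses)
    if (entry.second > MinHeavyUses)
      shared.push_back(entry.first);
  return shared;
}
```

The point of the ordering is that the module-wide list of globals is never walked; the cost is bounded by the sizes of the two functions involved.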
This resolves the compile-time problem with zig: compared to removing
the heuristic, main shows a 380% increase, whereas with this change the
increase is <0.5% (total user compile time with opt).
Fixes #134714.
When floating-point operations are legalized to operations of a higher
precision (e.g. f16 fadd being legalized to f32 fadd) then we get
narrowing then widening operations between each operation. With the
appropriate fast math flags (nnan ninf contract) we can eliminate these
casts.
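The effect of the intermediate casts can be seen in a small sketch that uses `float` as a stand-in for the narrow type (f16) and `double` for the wide type (f32). The function names are hypothetical. Because dropping the inner narrow-then-widen pair can change the rounded result, it seems reasonable that the fold is gated on fast-math flags.

```cpp
#include <cassert>

// What legalisation produces: every intermediate result is narrowed and
// then widened again before the next operation.
double addWithRoundTrips(float a, float b, float c) {
  float t = static_cast<float>(static_cast<double>(a) + static_cast<double>(b));
  float r = static_cast<float>(static_cast<double>(t) + static_cast<double>(c));
  return static_cast<double>(r);
}

// With the casts between the two additions eliminated: the intermediate
// value stays in the wide type and only the final result is narrowed.
double addWithoutInnerCasts(float a, float b, float c) {
  double t = static_cast<double>(a) + static_cast<double>(b);
  float r = static_cast<float>(t + static_cast<double>(c));
  return static_cast<double>(r);
}
```

For example, with `a = 1`, `b = 2^-30` and `c = -1`, the first version loses `b` when `1 + 2^-30` is rounded to `float`, while the second version keeps it.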
Add a new reduction recurrence kind for reductions with
minimumnum/maximumnum. Such reductions can be vectorized without
nsz/nnan, same as reductions with maximum/minimum intrinsics.
Note that a new reduction kind is needed to make sure partial reductions
are also combined with minimumnum/maximumnum.
Note that the final reduction to a scalar value is performed with
vector.reduce.fmin/fmax. This should be fine, as the results of the
partial reductions with maximumnum/minimumnum silence any sNaNs.
In-loop reductions and reductions in SLP are not supported yet, as
there's no reduction version of maximumnum/minimumnum yet and fmax may
be incorrect.
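The shape of such a reduction can be sketched in scalar code. This is only a model under stated assumptions: `maximumNum` and `reduceMaximumNum` are hypothetical helpers, the vector factor of 4 is arbitrary, `std::fmax` stands in for `vector.reduce.fmax`, and the start value is assumed not to be NaN.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Sketch of maximumnum: a NaN operand is ignored unless both are NaN.
double maximumNum(double a, double b) {
  if (std::isnan(a))
    return std::isnan(b) ? std::numeric_limits<double>::quiet_NaN() : b;
  if (std::isnan(b))
    return a;
  if (a == b)
    return std::signbit(a) ? b : a; // treat +0.0 as larger than -0.0
  return a > b ? a : b;
}

// Models a vectorised reduction: each lane keeps a partial maximum, then
// the lanes are reduced to a single scalar at the end.
double reduceMaximumNum(const std::vector<double> &xs, double start) {
  constexpr std::size_t VF = 4; // hypothetical vector factor
  assert(!std::isnan(start) && "sketch assumes a non-NaN start value");
  double lanes[VF] = {start, start, start, start};
  for (std::size_t i = 0; i < xs.size(); ++i)
    lanes[i % VF] = maximumNum(lanes[i % VF], xs[i]);
  double result = lanes[0];
  for (std::size_t l = 1; l < VF; ++l)
    result = std::fmax(result, lanes[l]); // stands in for vector.reduce.fmax
  return result;
}
```

Since every lane has already been through `maximumNum`, no lane holds a NaN by the time the final horizontal step runs in this model.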
PR: https://github.com/llvm/llvm-project/pull/137335
[llvm-exegesis][AArch64] Recommit: Disable pauth and ldgm as unsupported instructions.
Skip the AUT and LDGM opcode variants, which currently throw an "illegal
instruction" error.
The previous pull request
[#132346](https://github.com/llvm/llvm-project/pull/132346) was reviewed
and merged, but a build bot failed. The failure occurred because
`PR_PAC_SET_ENABLED_KEYS` was used but is not defined on x86,
resulting in a build failure.
This is a follow-up that re-lands those changes with the following fixes
for the build failure.
Changes:
- Fixed the architecture-specific check around the `prctl` library
import.
- Defined `PR_PAC_SET_ENABLED_KEYS` if it is undefined.
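The two fixes follow a common pattern, sketched below. This is not the exact code from the patch: the guard conditions and the `pacKeyControlApplies` helper are assumptions for illustration. The fallback value 60 matches the definition in the Linux UAPI `prctl.h` header.

```cpp
#include <cassert>

// Only pull in the prctl header on the platform where it is relevant.
#if defined(__linux__) && defined(__aarch64__)
#include <sys/prctl.h>
#endif

// Older or non-AArch64 headers may lack the constant, so provide it.
#ifndef PR_PAC_SET_ENABLED_KEYS
#define PR_PAC_SET_ENABLED_KEYS 60
#endif

// Hypothetical helper: true only where pointer-authentication key control
// can apply at all.
constexpr bool pacKeyControlApplies() {
#if defined(__linux__) && defined(__aarch64__)
  return true;
#else
  return false;
#endif
}
```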
This patch makes the frame-format variables introduced in
https://github.com/llvm/llvm-project/pull/131836 also work when no
debug-info is available. Previously, we assumed `sc.function` was
available, but without debug-info we might only have `sc.symbol`. We
don't really need the `sc.function` apart from when formatting
arguments.
For the function arguments case I added a fallback that will just print
the arguments we get from the demangler (which is what LLDB does for
stacktraces with no debug-info anyway). Ideally we'd have a separate
`FormatEntity::Entry::Type::FunctionArguments` that will just print the
arguments from the demangler and have something like the following in
the `plugin.cplusplus.display.function-name-format`:
```
{ ${function.formatted-arguments} || ${function.arguments} }
```
I.e., when we can't format the arguments, print the ones from the
demangler. But the frame-format language doesn't have a `||` operator
yet.
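The fallback itself amounts to the following logic. This is a sketch in plain C++ rather than LLDB code; `argumentsToPrint` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model of the fallback: prefer arguments formatted from debug
// info; otherwise use the argument text taken from the demangled name.
std::string argumentsToPrint(const std::optional<std::string> &formatted,
                             const std::string &fromDemangler) {
  return formatted ? *formatted : fromDemangler;
}
```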
To perform constant folding in math operations, the implementation of
the ConstantFolding Analysis relies on the use of the math functions
from the host's libm. In particular, it relies on checking the value of
errno and IEEE exceptions to determine when an operation is safe to be
constant-folded.
On some platforms, such as BSD or Darwin, math library functions don't
set errno, so the ConstantFolding check depends only on the value of
IEEE exceptions. As the FP exception behaviour is set to `ignore` by
default, the compiler can perform optimisations that would get in the
way of such checks being performed correctly.
This patch sets the FP exception behaviour to `strict` when compiling
the `ConstantFolding.cpp` source file, ensuring the value of IEEE
exceptions can be reliably used by its implementation.
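The kind of check being protected looks roughly like the sketch below. It is modelled on the description above rather than copied from `ConstantFolding.cpp`; `tryFoldUnary` is a hypothetical name, and the `volatile` local is only there to discourage the compiler from evaluating the call at compile time.

```cpp
#include <cassert>
#include <cerrno>
#include <cfenv>
#include <cmath>
#include <optional>

// Evaluates fn(x) with the host libm and refuses to "fold" the result if
// the call reported a problem via errno or an IEEE exception flag.
std::optional<double> tryFoldUnary(double (*fn)(double), double x) {
  std::feclearexcept(FE_ALL_EXCEPT);
  errno = 0;
  volatile double input = x;
  double result = fn(input);
  bool fpError = std::fetestexcept(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW |
                                   FE_UNDERFLOW) != 0;
  bool errnoError = errno == EDOM || errno == ERANGE;
  std::feclearexcept(FE_ALL_EXCEPT);
  errno = 0;
  if (fpError || errnoError)
    return std::nullopt;
  return result;
}
```

On a platform whose libm does not set `errno`, only the `fetestexcept` half of this check can fire, so it matters that the compiler does not optimise around the exception flags.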
On the buildbots:
```
user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/tools/debugserver/source/DNBLog.cpp
/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/tools/debugserver/source/DNBLog.cpp:66:15: error: no type named 'recursive_mutex' in namespace 'std'
static std::recursive_mutex g_LogThreadedMutex;
~~~~~^
/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/tools/debugserver/source/DNBLog.cpp:67:8: error: no member named 'lock_guard' in namespace 'std'
std::lock_guard<std::recursive_mutex> guard(g_LogThreadedMutex);
~~~~~^
/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/tools/debugserver/source/DNBLog.cpp:67:24: error: no member named 'recursive_mutex' in namespace 'std'
std::lock_guard<std::recursive_mutex> guard(g_LogThreadedMutex);
~~~~~^
```
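Diagnostics of this form are what one would expect when `<mutex>` has not been included, since that header declares both `std::recursive_mutex` and `std::lock_guard`. A minimal sketch that compiles, assuming that is the cause here (`lockedDepth` is a hypothetical helper, not part of debugserver):

```cpp
#include <cassert>
#include <mutex> // declares std::recursive_mutex and std::lock_guard

static std::recursive_mutex g_LogThreadedMutex;

// Re-locks the same recursive mutex on every nested call and returns the
// nesting depth reached.
int lockedDepth(int depth) {
  std::lock_guard<std::recursive_mutex> guard(g_LogThreadedMutex);
  return depth == 0 ? 0 : 1 + lockedDepth(depth - 1);
}
```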
Some of the changes in the patch include:
1. Using iterators instead of instruction pointers when applicable.
2. Modifying Polly functions to accept iterators instead of inst
pointers.
3. Updating API usages, such as using `begin` instead of `front`.
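The general shape of these changes can be illustrated with a standard container. This is a generic sketch, not Polly or LLVM code: `Block` is a hypothetical stand-in for an instruction list, and `insertBefore` is an illustrative helper.

```cpp
#include <cassert>
#include <list>
#include <string>

using Block = std::list<std::string>; // stand-in for an instruction list

// Takes an iterator as the insertion point instead of a pointer to an
// element, so it also works at end() or on an empty block, where there is
// no element to point at.
Block::iterator insertBefore(Block &block, Block::iterator pos,
                             const std::string &inst) {
  return block.insert(pos, inst);
}
```

A caller then passes `block.begin()` where it previously would have taken the address of `block.front()`.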
The `let constructor` field is legacy (do not use it in tree!), since the
TableGen backend emits most of the glue logic needed to build a pass.
Note: The following constructor has been retired:
```cpp
std::unique_ptr<Pass> createAsyncParallelForPass(bool asyncDispatch,
int32_t numWorkerThreads,
int32_t minTaskSize);
```
To update your codebase, replace it with the new options-based API:
```cpp
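// asyncDispatch, numWorkerThreads and minTaskSize are the caller's existing
// values, i.e. the same three arguments the retired constructor took.
AsyncParallelForPassOptions options{asyncDispatch, numWorkerThreads, minTaskSize};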
createAsyncParallelForPass(options);
```