If I take the command from the page and add my triple like so:
$ cmake -G Ninja -S llvm -B build \
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DLLVM_ENABLE_PROJECTS="clang" \ # Configure
-DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind;compiler-rt" \
-DLLVM_RUNTIME_TARGETS="aarch64-unknown-linux-gnu"
CMake Warning:
Ignoring extra path from command line:
" "
<...>
-- Build files have been written to: /home/david.spickett/llvm-project/build
-bash: -DLLVM_ENABLE_RUNTIMES=libcxx;libcxxabi;libunwind;compiler-rt: command not found
As the comment is after the backslash, the backslash escapes the
following space rather than the newline. CMake receives a stray " "
argument, the comment ends the command there, and the shell then tries
to run the ENABLE_RUNTIMES and RUNTIME_TARGETS lines as a separate
command.
To fix this, put the comment before the configure command.
I also moved the other inline comments (which are fine) closer to the
text since they don't have to line up with the configure one anymore.
This patch makes use of aggressive interleaving options for the A320
subtarget. This is done by adding a new local parameter to the
AArch64Subtarget class. With this enabled we see an aggregate uplift of
0.7% on internal benchmark suites with up to 51% uplift on individual
benchmark workloads.
This fixes an error on our Armv8 bot:
```
<...>/RemoteJITUtils.cpp:132:24: error: use of undeclared identifier 'DynamicThreadPoolTaskDispatcher'
132 | std::make_unique<DynamicThreadPoolTaskDispatcher>(std::nullopt),
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
These examples require LLVM_ENABLE_THREADS to be ON, and cannot run
otherwise. As a comment says elsewhere:
```
// Out of process mode using SimpleRemoteEPC depends on threads.
```
#164405 added specializations of `for_each` that didn't do the ranges
call shenanigans, but instead just did what the classic algorithms have
to do. This updates the calls to work for the ranges overloads as well.
This change adds support for mixed precision floating point
arithmetic for `f16` and `bf16` where the following patterns:
```
%fh = fpext half %h to float
%resfh = fp-operation(%fh, ...)
...
%fb = fpext bfloat %b to float
%resfb = fp-operation(%fb, ...)
where the fp-operation can be any of:
- fadd
- fsub
- llvm.fma.f32
- llvm.nvvm.add(/fma).*
```
are lowered to the corresponding mixed precision instructions which
combine the conversion and operation into one instruction from
`sm_100` onwards.
This also adds the following intrinsics to complete support for
all variants of the floating point `add/fma` operations in order
to support the corresponding mixed-precision instructions:
- `llvm.nvvm.add.(rn/rz/rm/rp){.ftz}.sat.f`
- `llvm.nvvm.fma.(rn/rz/rm/rp){.ftz}.sat.f`
We lower `fneg` followed by one of the above addition
intrinsics to the corresponding `sub` instruction.
Tests are added in `fp-arith-sat.ll`, `fp-fold-sub.ll`, and
`builtins-nvptx.c` for the newly added intrinsics and builtins, and in
`mixed-precision-fp.ll` for the mixed precision instructions.
PTX spec reference for mixed precision instructions:
https://docs.nvidia.com/cuda/parallel-thread-execution/#mixed-precision-floating-point-instructions
In an effort to get rid of VPUnrollPartAccessor and directly unroll
recipes, start by directly unrolling VectorPointerRecipe, allowing for
VPlan-based simplifications and simplification of the corresponding
execute.
The `__parent_pointer` type alias was marked to be removed in
d163ab3323.
At that time, <map> still had uses of `__parent_pointer` as a local
variable type in operator[] and at().
Those uses were removed in 4a2dd31f16,
which refactored `__find_equal` to return a pair instead of using an out
parameter.
However, the typedef in <map> and the alias in __tree were left behind.
This patch removes the unused typedef from <map> and the
`__parent_pointer` alias from __tree.
Signed-off-by: Krechals <topala.andrei@gmail.com>
RegBankLegalize using trivial mapping helper, assigns same reg bank
to all operands, vgpr or sgpr.
Uncovers multiple codegen and regbank combiner regressions related to
looking through sgpr to vgpr copies.
Skip regbankselect-concat-vector.mir since agprs are not yet supported.
That PR added an include to `LLVMOps.td` without adding a target
providing that file. Curiously, this does not break the official builds
but it *does* break my bazel build.
Signed-off-by: Ingo Müller <ingomueller@google.com>
This is needed so the llvm-cgdata tool properly builds with
`LLVM_BUILD_LLVM_DYLIB` so LLVM can be built as a DLL on Windows.
This effort is tracked in #109483.
This solves a common issue where users have to manually add the
`.cache/clangd/index/` folder to their `.gitignore`. I got this idea
from [ruff](https://github.com/astral-sh/ruff), which creates
`.ruff_cache/.gitignore`. Doing the same here would greatly improve the
user experience for everyone, without requiring per-computer
configuration and without any significant cost.
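
For illustration, the idea can be sketched in standalone C++ as below. This is not the code from this patch: the helper name `ensureIgnoredCacheDir` is hypothetical, and writing a single `*` pattern (ignore everything in the directory) is my assumption about what such a file would contain.

```cpp
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper, not part of clangd: creates CacheDir if needed and
// drops a .gitignore next to the cache files so git ignores the directory.
// Assumption: a lone "*" pattern is the desired file content.
inline bool ensureIgnoredCacheDir(const fs::path &CacheDir) {
  std::error_code EC;
  fs::create_directories(CacheDir, EC);
  if (EC)
    return false;
  const fs::path Ignore = CacheDir / ".gitignore";
  // Leave an existing file alone so user edits are preserved.
  if (fs::exists(Ignore, EC))
    return true;
  std::ofstream Out(Ignore);
  Out << "*\n";
  return static_cast<bool>(Out);
}
```

Because the file lives inside the cache directory itself, nothing has to be configured per repository or per machine.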
In concept checking, we need to transform SubstNTTPExpr when evaluating
constraints.
The value category is initially computed during parameter mapping,
possibly with a dependent expression. However, during instantiation it
wasn't recomputed, and the stale category was propagated into parent
expressions. So we could end up with an 'out-of-thin-air' reference
type, which broke the evaluation.
We now call BuildSubstNonTypeTemplateParmExpr in TreeTransform, in which
the value category is recomputed.
The issue was introduced by both 078e99e and the concept normalization
patch, which are not released yet, so no release note.
Fixes https://github.com/llvm/llvm-project/issues/170856
This adds support for lowering smaller-than-legal masks such as:
```
<vscale x 8 x i1> @llvm.loop.dependence.war.mask.nxv8i1(ptr %a, ptr %b, i64 1)
```
to a whilewr + unpack. It also slightly simplifies the lowering.
This is basically the same change as #162653, but for InstSimplify
instead of ConstantFolding.
It folds `icmp (ptrtoaddr x, ptrtoaddr y)` to `icmp (x, y)` and `icmp
(ptrtoaddr x, C)` to `icmp (x, inttoptr C)`.
The fold is restricted to the case where the result type is the address
type, as icmp only compares the address bits. As in the other PR, I think
in practice all the folds are also going to work if the ptrtoint result
type is larger than the address size, but it's unclear how to justify
this in general.
A buildbot failed for the original patch.
https://github.com/llvm/llvm-project/pull/171835 addresses the issue
raised by the buildbot.
After the fix is merged, the original patch is reapplied without any
change.
Custom error types (ErrorInfoBase subclasses) should use ErrorExtends as
of 8f51da369e. Adding a static_assert allows us to enforce that at
compile-time.
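
As a rough sketch of how such a compile-time check can work, the standalone example below uses toy types. `ToyErrorInfoBase`, `ToyErrorExtends`, `UsesErrorExtends` and `makeToyError` are all hypothetical names for illustration, and the mechanism (a tag typedef compared against the concrete type) is one possible approach, not necessarily what this patch does.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// Toy stand-ins for illustration only; these are NOT the real classes.
struct ToyErrorInfoBase {
  virtual ~ToyErrorInfoBase() = default;
};

template <typename ThisT, typename ParentT = ToyErrorInfoBase>
struct ToyErrorExtends : ParentT {
  using ExtendsTag = ThisT; // records the class the helper was used for
};

// True only if T itself was declared via ToyErrorExtends<T, ...>.
template <typename T, typename = void>
struct UsesErrorExtends : std::false_type {};
template <typename T>
struct UsesErrorExtends<T, std::void_t<typename T::ExtendsTag>>
    : std::is_same<typename T::ExtendsTag, T> {};

template <typename T, typename... Args>
std::unique_ptr<ToyErrorInfoBase> makeToyError(Args &&...As) {
  static_assert(UsesErrorExtends<T>::value,
                "custom error types must derive via ToyErrorExtends<T, Parent>");
  return std::make_unique<T>(std::forward<Args>(As)...);
}

struct GoodError : ToyErrorExtends<GoodError> {};
struct BadError : ToyErrorInfoBase {};  // skips the helper entirely
struct SneakyError : GoodError {};      // inherits GoodError's tag, still caught
```

With this shape, `makeToyError<BadError>()` and `makeToyError<SneakyError>()` fail to compile, while `makeToyError<GoodError>()` is accepted.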
This builds on #169858 to fix the divergence in codegen
(https://godbolt.org/z/a9az3h6oq) between two very similar
functions initially observed in #137447 (represented in the diff by test
cases `@transpose_splat_constants` and `@transpose_constants_splat`):
```
int8x16_t f(int8_t x)
{
return (int8x16_t) { x, 0, x, 1, x, 2, x, 3,
x, 4, x, 5, x, 6, x, 7 };
}
int8x16_t g(int8_t x)
{
return (int8x16_t) { 0, x, 1, x, 2, x, 3, x,
4, x, 5, x, 6, x, 7, x };
}
```
The PR uses an additional `isTRNMask` call in
`AArch64TTIImpl::getShuffleCost` to ensure that we treat shuffle masks
as transpose masks even if `isTransposeMask` fails to recognise them
(meaning that `Kind == TTI::SK_Transpose` cannot be relied upon).
Follow-up work could consider modifying `isTransposeMask`, but that
would also impact other backends than AArch64.
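
To make the notion of a transpose mask concrete, here is a simplified, standalone sketch of a TRN1/TRN2 mask check. `classifyTrnMask` is a hypothetical name, and this version ignores undef (`-1`) elements and swapped operand order, which the real `isTRNMask` may well handle; it only shows the index pattern.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Simplified illustration, not the AArch64 backend's isTRNMask.
// Mask indices below NumElts select from the first shuffle operand, the
// rest from the second. Returns 0 for a TRN1-style mask (0, N, 2, N+2, ...),
// 1 for a TRN2-style mask (1, N+1, 3, N+3, ...), nullopt otherwise.
inline std::optional<unsigned> classifyTrnMask(const std::vector<int> &Mask) {
  const std::size_t NumElts = Mask.size();
  if (NumElts == 0 || NumElts % 2 != 0)
    return std::nullopt;
  for (unsigned Which = 0; Which < 2; ++Which) {
    bool Match = true;
    for (std::size_t I = 0; I < NumElts && Match; I += 2) {
      Match = Mask[I] == static_cast<int>(I + Which) &&
              Mask[I + 1] == static_cast<int>(I + NumElts + Which);
    }
    if (Match)
      return Which;
  }
  return std::nullopt;
}
```

For a 4-element shuffle, `{0, 4, 2, 6}` is classified as TRN1 and `{1, 5, 3, 7}` as TRN2, while an identity mask such as `{0, 1, 2, 3}` is rejected.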
The ORC runtime needs to work in diverse codebases, both with and
without C++ exceptions enabled (e.g. most LLVM projects compile with
exceptions turned off, but regular C++ codebases will typically have
them turned on). This introduces a tension in the ORC runtime: If a C++
exception is thrown (e.g. by a client-supplied callback) it can't be
ignored, but orc_rt::Error values will assert if not handled prior to
destruction. That makes the following pattern fundamentally unsafe in
the ORC runtime:
```
if (auto Err = orc_rt_operation(...)) {
log("failure, bailing out"); // <- may throw if exceptions enabled
// Exception unwinds stack before Error is handled, triggers Error-not-checked
// assertion here.
return Err;
}
```
We can resolve this tension by preventing any exceptions from unwinding
through ORC runtime stack frames. We can do this while preserving
exception *values* by catching all exceptions (using `catch (...)`) and
capturing their values as a std::exception_ptr into an Error.
This patch adds APIs to simplify conversion between C++ exceptions and
Errors. These APIs are available only when the ORC runtime is
configured with ORC_RT_ENABLE_EXCEPTIONS=On (the default).
- `ExceptionError` wraps a std::exception_ptr.
- `runCapturingExceptions` takes a T() callback and converts any
exceptions thrown by the body into Errors. If T is Expected or Error
already then runCapturingExceptions returns the same type. If T is void
then runCapturingExceptions returns an Error (returning Error::success()
if no exception is thrown). If T is any other type then
runCapturingExceptions returns an Expected<T>.
- A new Error::throwOnFailure method is added that converts failing
values into thrown exceptions according to the following rules:
1. If the Error is of type ExceptionError then std::rethrow_exception is
called on the contained std::exception_ptr to rethrow the original
exception value.
2. If the Error is of any other type then std::unique_ptr<T> is thrown
where T is the dynamic type of the Error.
These rules allow exceptions to be propagated through the ORC runtime as
Errors, and for ORC runtime errors to be converted to exceptions by
clients.
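
The capture-and-rethrow round trip described above can be sketched with toy types as follows. `ToyError`, `runCapturing` and `describe` are hypothetical names for illustration, not the ORC runtime's API, and the sketch handles only `void()` callbacks and only the `ExceptionError`-like case (rule 1).

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>

// Toy stand-in, NOT orc_rt::Error: an empty exception_ptr means success.
class ToyError {
public:
  ToyError() = default;
  explicit ToyError(std::exception_ptr P) : Payload(std::move(P)) {}
  // True on failure.
  explicit operator bool() const { return static_cast<bool>(Payload); }
  // Rethrows the captured exception; does nothing on success.
  void throwOnFailure() const {
    if (Payload)
      std::rethrow_exception(Payload);
  }

private:
  std::exception_ptr Payload;
};

// Runs a void() callback and converts anything it throws into a ToyError,
// so no exception unwinds through the caller's stack frame.
template <typename Fn> ToyError runCapturing(Fn &&F) noexcept {
  static_assert(std::is_void_v<std::invoke_result_t<Fn &>>,
                "this sketch only handles void() callbacks");
  try {
    F();
    return ToyError();
  } catch (...) {
    return ToyError(std::current_exception());
  }
}

// Client-side view: turns the error back into an exception and reports it.
// Only std::exception-derived payloads are caught here.
inline std::string describe(const ToyError &E) {
  try {
    E.throwOnFailure();
    return "success";
  } catch (const std::exception &Ex) {
    return Ex.what();
  }
}
```

The original exception object survives the trip: the `what()` string seen in `describe` is the one supplied where the exception was first thrown.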
To gain better control over the functions that go into the output file
and their order, introduce `BinaryContext::getOutputBinaryFunctions()`.
The new API returns a modifiable list of functions in output order.
This list is filled by a new `PopulateOutputFunctions` pass and includes
emittable functions from the input file, plus functions added by BOLT
(injected functions).
The new functionality makes it possible to freely intermix input
functions with injected ones in the output, which will be used in new PRs.
The new function replaces `BinaryContext::getSortedFunctions()`, but
unlike its predecessor, it includes injected functions in the returned
list.
When -DORC_RT_ENABLE_EXCEPTIONS=On and -DORC_RT_ENABLE_RTTI=On are
passed we need to ensure that the resulting compiler flags (e.g.
-fexceptions, -frtti for clang/GCC) are appended so that we override any
inherited options (e.g. -fno-exceptions, -fno-rtti) from LLVM.
Updates unit tests to ensure that these compiler options are applied to
them too.
`std::views::FOO` should in almost all cases be preferred over
`std::ranges::FOO_view`. For a detailed explanation of why that is, see
https://brevzin.github.io/c++/2023/03/14/prefer-views-meow/. The TLDR is
that it's shorter to spell (which is obvious) and can in certain cases
be more efficient (which is less obvious; see the article if curious).