#164405 added specializations of `for_each` that didn't do the ranges
call shenanigans, but instead just did what the classic algorithms have
to do. This updates the calls to work for the ranges overloads as well.
This change adds support for mixed precision floating point
arithmetic for `f16` and `bf16` where the following patterns:
```
%fh = fpext half %h to float
%resfh = fp-operation(%fh, ...)
...
%fb = fpext bfloat %b to float
%resfb = fp-operation(%fb, ...)
```
(where the fp-operation can be any of `fadd`, `fsub`, `llvm.fma.f32`, or
`llvm.nvvm.add(/fma).*`)
are lowered to the corresponding mixed precision instructions which
combine the conversion and operation into one instruction from
`sm_100` onwards.
This also adds the following intrinsics to complete support for
all variants of the floating point `add/fma` operations in order
to support the corresponding mixed-precision instructions:
- `llvm.nvvm.add.(rn/rz/rm/rp){.ftz}.sat.f`
- `llvm.nvvm.fma.(rn/rz/rm/rp){.ftz}.sat.f`
We lower `fneg` followed by one of the above addition
intrinsics to the corresponding `sub` instruction.
Tests are added in `fp-arith-sat.ll`, `fp-fold-sub.ll`, and
`builtins-nvptx.c` for the newly added intrinsics and builtins, and in
`mixed-precision-fp.ll` for the mixed precision instructions.
PTX spec reference for mixed precision instructions:
https://docs.nvidia.com/cuda/parallel-thread-execution/#mixed-precision-floating-point-instructions
In an effort to get rid of VPUnrollPartAccessor and directly unroll
recipes, start by directly unrolling VectorPointerRecipe, allowing
VPlan-based simplifications and simplifying the corresponding `execute`
implementation.
The `__parent_pointer` type alias was marked to be removed in
d163ab3323.
At that time, <map> still had uses of `__parent_pointer` as a local
variable type in `operator[]` and `at()`.
Those uses were removed in 4a2dd31f16, which refactored `__find_equal`
to return a pair instead of using an out parameter.
However, the typedef in <map> and the alias in `__tree` were left
behind.
This patch removes the unused typedef from <map> and the
`__parent_pointer` alias from `__tree`.
Signed-off-by: Krechals <topala.andrei@gmail.com>
RegBankLegalize, using the trivial mapping helper, assigns the same
register bank (vgpr or sgpr) to all operands.
This uncovers multiple codegen and regbank-combiner regressions related
to looking through sgpr-to-vgpr copies.
Skip regbankselect-concat-vector.mir since agprs are not yet supported.
That PR added an include to `LLVMOps.td` without adding a target
providing that file. Curiously, this does not break the official builds
but it *does* break my bazel build.
Signed-off-by: Ingo Müller <ingomueller@google.com>
This is needed so the llvm-cgdata tool properly builds with
`LLVM_BUILD_LLVM_DYLIB` so LLVM can be built as a DLL on Windows.
This effort is tracked in #109483.
This solves a common issue where users have to manually add the
`.cache/clangd/index/` folder to their `.gitignore`. I got this idea
from [ruff](https://github.com/astral-sh/ruff), which creates
`.ruff_cache/.gitignore`. This greatly improves the user experience
without requiring per-computer configuration and without any
significant cost.
In concept checking, we need to transform SubstNTTPExpr when evaluating
constraints.
The value category is initially computed during parameter mapping,
possibly from a dependent expression. However, it wasn't recomputed
during instantiation, and the stale category was propagated into parent
expressions, so we could end up with an 'out-of-thin-air' reference
type, which breaks the evaluation.
We now call BuildSubstNonTypeTemplateParmExpr in TreeTransform, where
the value category is recomputed.
The issue was introduced by both 078e99e and the concept normalization
patch, neither of which has been released yet, so no release note is
needed.
Fixes https://github.com/llvm/llvm-project/issues/170856
This adds support for lowering smaller-than-legal masks such as:
```
<vscale x 8 x i1> @llvm.loop.dependence.war.mask.nxv8i1(ptr %a, ptr %b, i64 1)
```
to a `whilewr` + unpack. It also slightly simplifies the lowering.
This is basically the same change as #162653, but for InstSimplify
instead of ConstantFolding.
It folds `icmp (ptrtoaddr x, ptrtoaddr y)` to `icmp (x, y)` and `icmp
(ptrtoaddr x, C)` to `icmp (x, inttoptr C)`.
The fold is restricted to the case where the result type is the address
type, as the icmp then only compares the address bits. As in the other PR, I think
in practice all the folds are also going to work if the ptrtoint result
type is larger than the address size, but it's unclear how to justify
this in general.
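As a plain-C++ illustration of the equivalence the fold relies on (this is not the InstSimplify code, and the helper names are made up): when the integer type has exactly the address width, converting both pointers and comparing the integers agrees with comparing the pointers directly, which is what lets the conversions be folded away.

```cpp
#include <cassert>
#include <cstdint>

// Illustration only: with an integer type of exactly the address width
// (uintptr_t on common targets), the "convert then compare" form and the
// direct pointer comparison produce the same result.
bool eqViaAddr(const void *A, const void *B) {
  return reinterpret_cast<std::uintptr_t>(A) ==
         reinterpret_cast<std::uintptr_t>(B);
}

bool eqDirect(const void *A, const void *B) { return A == B; }
```

If the integer type were wider than the address, the comparison would also involve bits that the address does not determine, which is the case the PR deliberately leaves out.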
A buildbot failed for the original patch.
https://github.com/llvm/llvm-project/pull/171835 addresses the issue
raised by the buildbot.
After the fix is merged, the original patch is reapplied without any
change.
Custom error types (ErrorInfoBase subclasses) should use ErrorExtends as
of 8f51da369e. Adding a static_assert allows us to enforce that at
compile-time.
This builds on #169858 to fix the divergence in codegen
(https://godbolt.org/z/a9az3h6oq) between two very similar
functions initially observed in #137447 (represented in the diff by test
cases `@transpose_splat_constants` and `@transpose_constants_splat`):
```
int8x16_t f(int8_t x)
{
return (int8x16_t) { x, 0, x, 1, x, 2, x, 3,
x, 4, x, 5, x, 6, x, 7 };
}
int8x16_t g(int8_t x)
{
return (int8x16_t) { 0, x, 1, x, 2, x, 3, x,
4, x, 5, x, 6, x, 7, x };
}
```
The PR uses an additional `isTRNMask` call in
`AArch64TTIImpl::getShuffleCost` to ensure that we treat shuffle masks
as transpose masks even if `isTransposeMask` fails to recognise them
(meaning that `Kind == TTI::SK_Transpose` cannot be relied upon).
Follow-up work could consider modifying `isTransposeMask`, but that
would also impact backends other than AArch64.
The ORC runtime needs to work in diverse codebases, both with and
without C++ exceptions enabled (e.g. most LLVM projects compile with
exceptions turned off, but regular C++ codebases will typically have
them turned on). This introduces a tension in the ORC runtime: If a C++
exception is thrown (e.g. by a client-supplied callback) it can't be
ignored, but orc_rt::Error values will assert if not handled prior to
destruction. That makes the following pattern fundamentally unsafe in
the ORC runtime:
```
if (auto Err = orc_rt_operation(...)) {
  log("failure, bailing out"); // <- may throw if exceptions enabled
  // The exception unwinds the stack before Err is handled, triggering
  // the Error-not-checked assertion here.
  return Err;
}
```
We can resolve this tension by preventing any exceptions from unwinding
through ORC runtime stack frames. We can do this while preserving
exception *values* by catching all exceptions (using `catch (...)`) and
capturing their values as a std::exception_ptr into an Error.
This patch adds APIs to simplify conversion between C++ exceptions and
Errors. These APIs are available only when the ORC runtime is
configured with ORC_RT_ENABLE_EXCEPTIONS=On (the default).
- `ExceptionError` wraps a std::exception_ptr.
- `runCapturingExceptions` takes a T() callback and converts any
exceptions thrown by the body into Errors. If T is Expected or Error
already then runCapturingExceptions returns the same type. If T is void
then runCapturingExceptions returns an Error (returning Error::success()
if no exception is thrown). If T is any other type then
runCapturingExceptions returns an Expected<T>.
- A new Error::throwOnFailure method is added that converts failing
values into thrown exceptions according to the following rules:
1. If the Error is of type ExceptionError then std::rethrow_exception is
called on the contained std::exception_ptr to rethrow the original
exception value.
2. If the Error is of any other type then std::unique_ptr<T> is thrown
where T is the dynamic type of the Error.
These rules allow exceptions to be propagated through the ORC runtime as
Errors, and for ORC runtime errors to be converted to exceptions by
clients.
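A minimal self-contained sketch of the capture technique described above; the names (`runCapturingExceptionsSketch`, `rethrowIfCaptured`) and signatures here are illustrative assumptions, not the actual orc_rt API:

```cpp
#include <cassert>
#include <exception>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Sketch of the technique: run a callback and, instead of letting a C++
// exception unwind through runtime frames, capture its value as a
// std::exception_ptr that the caller can hand back or rethrow later.
template <typename Fn>
std::optional<std::exception_ptr> runCapturingExceptionsSketch(Fn &&F) {
  try {
    std::forward<Fn>(F)();
    return std::nullopt;             // no exception: report success
  } catch (...) {
    return std::current_exception(); // capture the in-flight exception value
  }
}

// Counterpart to throwOnFailure: rethrow the captured exception at a
// point where unwinding is known to be safe.
inline void rethrowIfCaptured(const std::optional<std::exception_ptr> &EP) {
  if (EP)
    std::rethrow_exception(*EP);
}
```

In this shape an exception thrown by the callback never unwinds past the capturing frame; it travels as a value that a caller can rethrow once it is safe to do so, mirroring the ExceptionError / throwOnFailure round-trip described above.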
To gain better control over the functions that go into the output file
and their order, introduce `BinaryContext::getOutputBinaryFunctions()`.
The new API returns a modifiable list of functions in output order.
This list is filled by a new `PopulateOutputFunctions` pass and includes
emittable functions from the input file, plus functions added by BOLT
(injected functions).
The new functionality allows input functions to be freely intermixed
with injected ones in the output, which will be used in follow-up PRs.
The new function replaces `BinaryContext::getSortedFunctions()`, but
unlike its predecessor, it includes injected functions in the returned
list.
When -DORC_RT_ENABLE_EXCEPTIONS=On and -DORC_RT_ENABLE_RTTI=On are
passed we need to ensure that the resulting compiler flags (e.g.
-fexceptions, -frtti for clang/GCC) are appended so that we override any
inherited options (e.g. -fno-exceptions, -fno-rtti) from LLVM.
Updates unit tests to ensure that these compiler options are applied to
them too.
`std::views::FOO` should in almost all cases be preferred over
`std::ranges::FOO_view`. For a detailed explanation of why that is, see
https://brevzin.github.io/c++/2023/03/14/prefer-views-meow/. The TLDR is
that it's shorter to spell (which is obvious) and can in certain cases
be more efficient (which is less obvious; see the article if curious).
Flag changes are reverted, as those require the X86 target to be enabled.
I don't have time to test fixes as I need to go to sleep, so I will revert for now.
Reverts: 423919d31f
Friendlier wrapper for transform.foreach.
To facilitate that friendliness, makes it so that OpResult.owner returns
the relevant OpView instead of Operation. For good measure, also changes
Value.owner to return OpView instead of Operation, thereby ensuring
consistency. That is, it makes it so that all op-returning .owner
accessors return OpView (and thereby give access to all goodies
available on registered OpViews).
Reland of #171544 due to fixup for integration test.
Implementation files using the Intel syntax typically explicitly specify it.
Do the same for the few files using AT&T syntax.
This enables building LLVM with `-mllvm -x86-asm-syntax=intel` in one's Clang config files
(i.e. a global preference for Intel syntax).
Pass backedge values directly to VPFirstOrderRecurrencePHIRecipe and
VPReductionPHIRecipe, as they must be provided and available.
Split off from https://github.com/llvm/llvm-project/pull/168291.
Friendlier wrapper for `transform.foreach`.
To facilitate that friendliness, makes it so that `OpResult.owner`
returns the relevant `OpView` instead of `Operation`. For good measure,
also changes `Value.owner` to return `OpView` instead of `Operation`,
thereby ensuring consistency. That is, it makes it so that all
op-returning `.owner` accessors return `OpView` (and thereby give
access to all goodies available on registered `OpView`s).