This builds on #169858 to fix the codegen divergence
(https://godbolt.org/z/a9az3h6oq) between two very similar
functions initially observed in #137447 (represented in the diff by the
test cases `@transpose_splat_constants` and `@transpose_constants_splat`):
```
int8x16_t f(int8_t x)
{
  return (int8x16_t) { x, 0, x, 1, x, 2, x, 3,
                       x, 4, x, 5, x, 6, x, 7 };
}

int8x16_t g(int8_t x)
{
  return (int8x16_t) { 0, x, 1, x, 2, x, 3, x,
                       4, x, 5, x, 6, x, 7, x };
}
```
The PR uses an additional `isTRNMask` call in
`AArch64TTIImpl::getShuffleCost` to ensure that we treat shuffle masks
as transpose masks even if `isTransposeMask` fails to recognise them
(meaning that `Kind == TTI::SK_Transpose` cannot be relied upon).
Follow-up work could consider modifying `isTransposeMask`, but that
would also affect backends other than AArch64.
The ORC runtime needs to work in diverse codebases, both with and
without C++ exceptions enabled (e.g. most LLVM projects compile with
exceptions turned off, but regular C++ codebases will typically have
them turned on). This introduces a tension in the ORC runtime: If a C++
exception is thrown (e.g. by a client-supplied callback) it can't be
ignored, but orc_rt::Error values will assert if not handled prior to
destruction. That makes the following pattern fundamentally unsafe in
the ORC runtime:
```
if (auto Err = orc_rt_operation(...)) {
  log("failure, bailing out"); // <- may throw if exceptions enabled
  // Exception unwinds stack before Error is handled, triggers
  // Error-not-checked assertion here.
  return Err;
}
```
We can resolve this tension by preventing any exceptions from unwinding
through ORC runtime stack frames. We can do this while preserving
exception *values* by catching all exceptions (using `catch (...)`) and
capturing their values as a std::exception_ptr into an Error.
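For reference, the underlying standard C++ mechanism looks roughly like this
(a minimal illustration, not the ORC runtime's actual implementation):
```cpp
#include <exception>
#include <stdexcept>

// Catch everything and preserve the in-flight exception's value as a
// std::exception_ptr instead of letting it unwind further.
std::exception_ptr captureException() {
  try {
    throw std::runtime_error("thrown by a client-supplied callback");
  } catch (...) {
    return std::current_exception();
  }
}

// Later (e.g. back on the client side) the captured value can be rethrown
// unchanged with std::rethrow_exception.
void rethrowIfAny(std::exception_ptr EP) {
  if (EP)
    std::rethrow_exception(EP);
}
```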
This patch adds APIs to simplify conversion between C++ exceptions and
Errors. These APIs are available only when the ORC runtime is configured
with ORC_RT_ENABLE_EXCEPTIONS=On (the default).
- `ExceptionError` wraps a `std::exception_ptr`.
- `runCapturingExceptions` takes a `T()` callback and converts any
  exceptions thrown by the body into Errors. If `T` is `Expected` or
  `Error` already then `runCapturingExceptions` returns the same type.
  If `T` is `void` then `runCapturingExceptions` returns an `Error`
  (returning `Error::success()` if no exception is thrown). If `T` is any
  other type then `runCapturingExceptions` returns an `Expected<T>`.
- A new `Error::throwOnFailure` method is added that converts failing
  values into thrown exceptions according to the following rules:
  1. If the Error is of type `ExceptionError` then `std::rethrow_exception`
     is called on the contained `std::exception_ptr` to rethrow the
     original exception value.
  2. If the Error is of any other type then `std::unique_ptr<T>` is thrown,
     where `T` is the dynamic type of the Error.
These rules allow exceptions to be propagated through the ORC runtime as
Errors, and for ORC runtime errors to be converted to exceptions by
clients.
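As a rough usage sketch (the exact signatures, namespaces, and the helper
`clientCallbackThatMayThrow` are assumptions for illustration, not taken
from the patch, so this may not compile as written):
```cpp
// Hypothetical sketch only; the precise API spellings may differ.
void clientCallbackThatMayThrow(); // assumed client-provided function

orc_rt::Error doRuntimeOperation() {
  // Any exception thrown inside the callback is captured into the returned
  // Error (as an ExceptionError wrapping the std::exception_ptr), so it
  // never unwinds through ORC runtime frames.
  return orc_rt::runCapturingExceptions([] { clientCallbackThatMayThrow(); });
}

void clientEntryPoint() {
  // On the client side a failing Error can be converted back into a thrown
  // exception: an ExceptionError rethrows the original exception value;
  // any other Error of dynamic type T is thrown as std::unique_ptr<T>.
  doRuntimeOperation().throwOnFailure();
}
```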
To gain better control over the functions that go into the output file
and their order, introduce `BinaryContext::getOutputBinaryFunctions()`.
The new API returns a modifiable list of functions in output order.
This list is filled by a new `PopulateOutputFunctions` pass and includes
emittable functions from the input file, plus functions added by BOLT
(injected functions).
The new functionality makes it possible to freely intermix input
functions with injected ones in the output, which will be used in
follow-up PRs.
The new function replaces `BinaryContext::getSortedFunctions()`, but
unlike its predecessor, it includes injected functions in the returned
list.
When -DORC_RT_ENABLE_EXCEPTIONS=On and -DORC_RT_ENABLE_RTTI=On are
passed, we need to ensure that the resulting compiler flags (e.g.
-fexceptions and -frtti for clang/GCC) are appended so that they override
any inherited options (e.g. -fno-exceptions, -fno-rtti) from LLVM.
This also updates the unit tests to ensure that these compiler options
are applied to them.
`std::views::FOO` should in almost all cases be preferred over
`std::ranges::FOO_view`. For a detailed explanation of why that is, see
https://brevzin.github.io/c++/2023/03/14/prefer-views-meow/. The TLDR is
that it's shorter to spell (which is obvious) and can in certain cases
be more efficient (which is less obvious; see the article if curious).
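As a quick illustration of the two spellings (using `filter` as an
arbitrary example):
```cpp
#include <ranges>
#include <vector>

void example(std::vector<int> &V) {
  auto IsEven = [](int X) { return X % 2 == 0; };

  // Preferred: the range adaptor object, which is shorter and composes with '|'.
  auto A = V | std::views::filter(IsEven);

  // More verbose: naming the view type directly.
  auto B = std::ranges::filter_view(V, IsEven);

  (void)A;
  (void)B;
}
```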
Flag changes reverted, as those require the X86 target to be enabled.
I don't have time to test fixes as I need to go to sleep, so I will
revert for now.
Reverts: 423919d31f
Friendlier wrapper for transform.foreach.
To facilitate that friendliness, makes it so that OpResult.owner returns
the relevant OpView instead of Operation. For good measure, also changes
Value.owner to return OpView instead of Operation, thereby ensuring
consistency. That is, it makes it so that all op-returning .owner
accessors return OpView (and thereby give access to all goodies
available on registered OpViews).
Reland of #171544, with a fixup for the integration test.
Implementation files using the Intel syntax typically explicitly specify it.
Do the same for the few files using AT&T syntax.
This enables building LLVM with `-mllvm -x86-asm-syntax=intel` in one's Clang config files
(i.e. a global preference for Intel syntax).
Pass backedge values directly to VPFirstOrderRecurrencePHIRecipe and
VPReductionPHIRecipe, as they must be provided and available.
Split off from https://github.com/llvm/llvm-project/pull/168291.
Friendlier wrapper for `transform.foreach`.
To facilitate that friendliness, makes it so that `OpResult.owner`
returns the relevant `OpView` instead of `Operation`. For good measure,
also changes `Value.owner` to return `OpView` instead of `Operation`,
thereby ensuring consistency. That is, it makes it so that all
op-returning `.owner` accessors return `OpView` (and thereby give access
to all goodies available on registered `OpView`s).
Use SCEV to simplify all live-ins during VPlan0 construction. This
enables us to remove special SCEV queries when constructing
VPWidenRecipes and improves results in some cases.
This leads to simplifications in a number of cases in real-world
applications (~250 files changed across LLVM, SPEC, and ffmpeg).
PR: https://github.com/llvm/llvm-project/pull/155304
These functions from the C++ ABI are defined in
compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp and are supposed to
replace the implementations from libstdc++/libc++abi.
We need to export them for the same reason we need to export other
interceptors and TSan runtime functions - e.g. if a dlopen-ed shared
library depends on `__cxa_guard_acquire`, it needs to pick up the
exported definition from the TSan runtime that was linked into the main
executable calling dlopen().
However, because the `__cxa_guard_` functions don't use the traditional
interceptor machinery, they are omitted from the auto-generated
`libclang_rt.tsan.a.syms` files. Fix this by adding them to the
tsan.syms.extra file explicitly.
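For context, these symbols are emitted by the compiler for thread-safe
initialization of function-local statics (Itanium C++ ABI), so a dlopen-ed
library can reference them without ever naming them in source. A minimal
example:
```cpp
int computeOnce() { return 42; }

int &cachedValue() {
  // The one-time initialization of this local static is guarded by
  // compiler-emitted calls to __cxa_guard_acquire/__cxa_guard_release.
  static int Value = computeOnce();
  return Value;
}
```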
Co-authored-by: Vitaly Buka <vitalybuka@google.com>
While figuring out how to perform an atomic exchange on a memref, I
tried the generic atomic rmw with the yielded value captured from the
enclosing scope (instead of a plain atomic_rmw with
`arith::AtomicRMWKind::assign`). This PR changes the pass to produce an
error, instead of segfaulting, when the result is not found in the
region's IR map.
It might be more useful to give a suggestion to the user, but giving an
error message instead of a crash is at least an improvement, I think.
See: #172184
Add an AssumeSingleUse (default = false) argument to mayFoldIntoVector
to allow us to match combineVectorSizedSetCCEquality behaviour with
AssumeSingleUse=true.
Hopefully we can drop AssumeSingleUse entirely soon, but there are a
number of messy test regressions that need handling first.
Typedef/using declarations in structs and classes were not created with
the native PDB plugin. The following would only create `Foo` and
`Foo::Bar`:
```cpp
struct Foo {
  struct Bar {};
  using Baz = Bar;
  using Int = int;
};
```
With this PR, they're created. One complication is that typedefs and
nested types show up identically. The example from above gives:
```
0x1006 | LF_FIELDLIST [size = 40, hash = 0x2E844]
- LF_NESTTYPE [name = `Bar`, parent = 0x1002]
- LF_NESTTYPE [name = `Baz`, parent = 0x1002]
- LF_NESTTYPE [name = `Int`, parent = 0x0074 (int)]
```
To distinguish nested types from typedefs, we check whether the parent
of a type is equal to the current one (`parent(0x1002) == 0x1006`) and
whether the basename matches the nested type name.
Commit c6f501d479 fixed an issue where some tests were incorrectly
marked as unsupported for a bootstrapping build. This exposed, in our
'slow' full-bootstrap qemu-system CI, that fdr-mode.cpp fails on
RISC-V. We mark it as unsupported. I believe _xray_ArgLoggerEntry needs
to be implemented in xray_trampoline_risc*.S for this to work.
Noticed while trying to replace the IsVectorBitCastCheap helper with
mayFoldIntoVector (still some work to do, as we have a number of
multiuse cases) - technically it's possible for an extload to reach this
point.
This is the last of the generic instructions created from MVE
intrinsics. It was a little more awkward than the others because it
takes a Type as one of the arguments. This adds a new function to
create the intrinsic we need.
Plugins appear to be broken on AIX; CI fails. There is logic in
CMakeLists for plugins+AIX, but it was never tested before...
Note: when plugins work, also enable tests in Examples/IRTransforms.
There's no Sparc support for JIT tests, so disable the JIT tests in the
examples (copied from ExecutionEngine/lit.local.cfg).
Following the suggestions in #170061, I replaced `SmallVector<SDValue>`
with `std::array<SDValue, NumPatterns>` and `SmallBitVector` with
`Bitset<NumPatterns>`.
I had to make some changes to the `collectLeaves` and
`reassociatableMatchHelper` functions. In `collectLeaves` specifically,
I changed the return type so I could propagate a failure in case the
number of found leaves is greater than the number of expected patterns.
I also added a new unit test that, together with the one already present
in the previous line, checks that matching fails when the number of
patterns is smaller or larger than the number of leaves.
I don't think this is going to completely address the increased compile
time reported in #169644, but hopefully it leads to an improvement.
For some reason, the current `std::barrier`'s wait implementation polls
the underlying atomic in a loop with sleeps instead of using the native
wait.
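To illustrate the difference (a simplified sketch, not libc++'s actual
implementation):
```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Polling: repeatedly re-read the value and sleep in between.
void waitByPolling(std::atomic<int> &Phase, int Old) {
  while (Phase.load(std::memory_order_acquire) == Old)
    std::this_thread::sleep_for(std::chrono::microseconds(100));
}

// Native wait: block until the value is observed to change, via C++20
// std::atomic::wait (typically futex-based on Linux).
void waitNatively(std::atomic<int> &Phase, int Old) {
  Phase.wait(Old, std::memory_order_acquire);
}
```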
This change should also indirectly fix the performance issue of
`std::barrier` described in
https://github.com/llvm/llvm-project/issues/123855.
Fixes #123855
Its current effect is to use the `rootUri` provided by the client as the
working directory for fallback commands.
Future effects will include other behaviors appropriate for cases
where clangd is being run in the context of editing a single
workspace, such as using the `compile_commands.json` file in the
workspace root for all opened files.
The flag is hidden until other behaviors are implemented and they
constitute a cohesive "mode" for users.
Original PR in #155905, which was reverted due to a UBSan failure
that is fixed in this version.
It looks like these were copied from fp16 tests without updating the
intrinsic types. Also remove some old definitions that are no longer
required.
This reverts commit 2612dc9b5f but keeps
`Predicates = [HasNDD]` removed.
Two issues were identified with the change. One is that INSERT_SUBREG
cannot guarantee that the source and dest are the same register; this
mostly happens at O0. The other is that zero_extend is not a chain node,
so as a result we lose the chain for SETZUCCr.
If we do not collect the diagnostics from the
CollectDiagnosticsToStringScope, then even when the named_sequence
applied successfully, the Scope object's destructor will assert (with an
unhelpful message).
This comment was written 12 years ago. It's no longer correct to say
that diagnostic reporting is under heavy development, and we seem to be
doing just fine without tablegenned IDs, so I think we can simply remove
it.