Namely, these are the functions enabled: rint*, lrint*, llrint*, lround*,
llround*, nearbyint*. They were previously not enabled because they
required rounding mode and FP exception support. Now that rounding mode
and FP exception support is available for Aarch64, they can be enabled.
These patterns move vector.bitcast ops to be before
insert ops or after extract ops where suitable.
With them, bitcast will happen on smaller vectors
and there are more chances to share extract/insert
ops.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D96040
libunwind ARM EHABI does not support _Unwind_ForcedUnwind yet.
In addition, ARM EHABI makes `_Unwind_Exception` a typedef so
`struct _Unwind_Exception*` cannot be used.
Currently the emscripten frontend driver injects this when building
with thread support. Moving this into the clang driver itself makes
the emscripten python driver less magical.
Differential Revision: https://reviews.llvm.org/D96171
Currently, the SmallPtrSet type allows inserting elements but it does
not support inserting elements with a positional hint. The lack of this
signature means that you cannot use SmallPtrSet with
std::insert_iterator or std::inserter(), which makes some code
constructs more awkward. This adds an overload of insert() that can be
used in these scenarios.
The positional hint is unused by SmallPtrSet and the call is equivalent
to calling insert() without a hint.
When a function or a file is excluded using -fprofile-list= option,
don't emit coverage mapping as doing so confuses users since those
functions would always have zero count. This also reduces the binary
size considerably in cases where only a few functions or files are
being instrumented.
Differential Revision: https://reviews.llvm.org/D96000
Switch StdTuple printer from python 2-style "next" to python 3.
Nested iteration changed enough to make the original bitset iteration
code a bit trickier than it needs to be, so unnest.
The end node of a map iterator is sometimes hard to detect in isolation,
don't fail in that case.
Differential Revision: https://reviews.llvm.org/D96167
This does roughly the same as the manual implementation, but checks
a slightly different set of environment variables and has a more
appropriate fallback if no environment variables are available
(/tmp isn't a very useful fallback on windows).
Differential Revision: https://reviews.llvm.org/D91175
This works just fine for windows, as all the functions it calls
are implemented and wrapped for windows.
Differential Revision: https://reviews.llvm.org/D91173
- Quality-of-implementation: Avoid calling __unwrap_iter in constexpr contexts.
The user might conceivably write a contiguous iterator where normal iterator
arithmetic is constexpr-friendly but `std::to_address(it)` isn't.
- Bugfix: When you pass contiguous iterators to `std::copy`, you should get
back your contiguous iterator type, not a raw pointer. That means that
libc++ can't `__unwrap_iter` unless it also does `__rewrap_iter`.
Fortunately, this is implementable.
- Improve test coverage of the new `contiguous_iterator` test iterator.
This catches the bug described above.
- Tests: Stop testing that we can `std::copy` //into// an `input_iterator`.
Our test iterators may currently support that, but it seems nonsensical to me.
Differential Revision: https://reviews.llvm.org/D95983
For -fgpu-rdc, shadow variables should not be internalized, otherwise
they cannot be accessed by other TUs. This is necessary because
the shadow variable of external device variables are always
emitted as undefined symbols, which need to resolve to a global
symbols.
Managed variables need to be emitted as undefined symbols
in device compilations.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D95901
These passes are causing numerical discrepancies after being added to
the pipeline. Disable while investigating.
Reviewed By: rupprecht
Differential Revision: https://reviews.llvm.org/D96166
Adds documentation around libc++'s policy to add noexcept to things that cannot throw but aren't marked as noexcept.
Refs LWG 3518 and D95251.
Differential Revision: https://reviews.llvm.org/D95821
This patch adds a pass to replace calls to vector intrinsics
(i.e., LLVM intrinsics operating on vector operands) with
calls to a vector library.
Currently, calls to LLVM intrinsics are only replaced with
calls to vector libraries when scalar calls to intrinsics are
vectorized by the Loop- or SLP-Vectorizer.
With this pass, it is now possible to replace calls to LLVM
intrinsics already operating on vector operands, e.g., if
such code was generated by MLIR. For the replacement,
information from the TargetLibraryInfo, e.g., as specified
via -vector-library is used.
Differential Revision: https://reviews.llvm.org/D95373
__builtin_isnan currently generates a floating-point compare operation
which triggers a trap when faced with a signaling NaN in StrictFP mode.
This commit uses integer operations instead to not generate any trap in
such a case.
Reviewed By: kpn
Differential Revision: https://reviews.llvm.org/D95948
Make sure scalable property is preserved by using getVectorElementCount().
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D95967
Previously the code split the string at the first '<', which
incorrectly truncated names like `operator<`.
Differential Revision: https://reviews.llvm.org/D95893
This is a part of https://reviews.llvm.org/D95835.
This change is to address two problems
1) When recording stacks in origin tracking, libunwind is not async signal safe. Inside signal callbacks, we need
to use fast unwind. Fast unwind needs threads
2) StackDepot used by origin tracking is not async signal safe, we set a flag per thread inside
a signal callback to prevent from using it.
The thread registration is similar to ASan and MSan.
Related MSan changes are
* 98f5ea0dba
* f653cda269
* 5a7c364343
Some changes in the diff are used in the next diffs
1) The test case pthread.c is not very interesting for now. It will be
extended to test origin tracking later.
2) DFsanThread::InSignalHandler will be used by origin tracking later.
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D95963
The current diagnostic has confused users. The new wording is adapted from one suggested by Ian Lance Taylor.
Differential Revision: https://reviews.llvm.org/D95917
This reverts commit b6ffece320.
The bug is now fixed (it was a stupid cut-and-paste kind of error),
and the regression test added. The new patch is also simpler than the old one!
Differential Revision: https://reviews.llvm.org/D96084
The new trace event includes what's already in the queue when adding.
For tracers that follow contexts, the trace event will span the time that the action
spends in the queue.
For tracers that follow threads, the trace will be a tiny span on the enqueuing thread.
Differential Revision: https://reviews.llvm.org/D96027
The Python code generated by SWIG is compatible with both Python 2 and
Python 3. The -py3 option enables Python 2 incompatible features such as
function annotations and abstract base classes.
Differential revision: https://reviews.llvm.org/D96096
Those types are not needed any longer since LLVM dialect
has migrated to using MLIR's I1, I8, I32 types directly.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D96127
- The failures are all cc1-based tests due to the missing `-aux-triple` options,
which is always prepared by the driver in CUDA/HIP compilation.
- Add extra check on the missing aux-targetinfo to prevent crashing.
[hip][cuda] Enable extended lambda support on Windows.
- On Windows, extended lambda has extra issues due to the numbering
schemes are different between the host compilation (Microsoft C++ ABI)
and the device compilation (Itanium C++ ABI. Additional device side
lambda number is required per lambda for the host compilation to
correctly mangle the device-side lambda name.
- A hybrid numbering context `MSHIPNumberingContext` is introduced to
number a lambda for both host- and device-compilations.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D69322
This reverts commit 4874ff0241.
Summary:
This resolves an issue posted on Bugzilla. https://bugs.llvm.org/show_bug.cgi?id=48764
In this issue, the loop had multiple exit blocks, which resulted in the
function getExitBlock to return a nullptr, which resulted in hitting the assert.
This patch ensures that loops which only have one exit block as allowed to be
unrolled and jammed.
Reviewed By: Whitney, Meinersbur, dmgreen
Differential Revision: https://reviews.llvm.org/D95806
This patch adds possibility to define OpenCL C 3.0 feature macros
via command line option or target setting.
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D95776
This patch introduces a few more straightforward patterns
to convert vector ops operating on 1-4 element vectors
to their corresponding SPIR-V counterparts.
This patch also enables converting vector<1xT> to T.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D96042
emitting retainRV or claimRV calls in the IR
This reapplies 3fe3946d9a without the
changes made to lib/IR/AutoUpgrade.cpp, which was violating layering.
Original commit message:
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
Dynamic allocas that still exist have been verified to be only used
'locally' not accross a suspend point.
rdar://73903220
Differential Revision: https://reviews.llvm.org/D96071
Use IgnoreUnlessSpelledInSource to make the matcher code smaller and
more visibly-related to the code.
Differential Revision: https://reviews.llvm.org/D91303