* The reference type of `common_input_iterator<const int*>` can't be `int&`, because an lvalue of type `const int` _can't_ bind to an `int&`. Fix by changing the return type of `operator*` to `decltype(auto)` to make it fully generic.
* `range.zip/iterator/compare.pass.cpp` verifies that the iterators of a `zip_view` don't support `<=>` when the underlying iterators do not; this is not true after LWG-3692.
* libc++ doesn't yet implement P2165R4 "Compatibility between tuple, pair and tuple-like objects", so the tests expect `zip_view` to use `pair` in places where the working draft requires `tuple`.
Differential Revision: https://reviews.llvm.org/D141216
Implement double precision log10 function correctly rounded for all
rounding modes. This implementation currently needs FMA instructions for
correctness.
Use 2 passes:
Fast pass:
- 1 step range reduction with a lookup table of `2^7 = 128` elements to reduce the ranges to `[-2^-7, 2^-7]`.
- Use a degree-7 minimax polynomial generated by Sollya, evaluated using a mixed of double-double and double precisions.
- Apply Ziv's test for accuracy.
Accurate pass:
- Apply 5 more range reduction steps to reduce the ranges further to [-2^-27, 2^-27].
- Use a degree-4 minimax polynomial generated by Sollya, evaluated using 192-bit precisions.
- By the result of Lefevre (add quote), this is more than enough for correct rounding to all rounding modes.
In progress: Adding detail documentations about the algorithm.
Depend on: https://reviews.llvm.org/D136799
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D139846
This patch introduces a new AA `AAUnderlyingObjects`. It is basically like a wrapper
AA of the function `AA::getAssumedUnderlyingObjects`, but it can recursively do
query if the underlying object is an indirect access, such as a phi node or a select
instruction.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D141164
This patch adds a `SymbolTableAnalysis` that can be used with the
analysis manager. It contains a symbol table collection. This analysis
allows symbol tables to be preserved across passes so that they do not
need to be recomputed. The analysis assumes it remains valid because
most transformations automatically keep symbol tables up-to-date using
its `insert` and `erase` methods.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D139666
This commit adds compiler-rt cmake option COMPILER_RT_DISABLE_AARCH64_FMV
which, when enabled, doesn't include function multiversioning features
initilization code in 'builtins' build.
Differential Revision: https://reviews.llvm.org/D141199
This pseudo-instruction stores two small (8-bit) registers into one wide
(16-bit) register. But apparently the order matters a lot to the
register allocator.
This patch changes the order of inserting the registers to optimize for
the best register allocation in the tests of shift32.ll. It might be
detrimental in other cases, but keeping the registers in the same
physical register seems like it would be a common case.
Differential Revision: https://reviews.llvm.org/D140573
This optimization turns shifts of almost a multiple of 8 into a shift
into the opposite direction. Unfortunately it doesn't compose well with
the other optimizations (I've tried) so it's separate from them.
Differential Revision: https://reviews.llvm.org/D140572
This uses a complicated shift sequence that avr-gcc also uses, but
extended to work over any number of bytes and in both directions
(logical shift left and logical shift right). Unfortunately it can't be
used for an arithmetic shift right: I've tried to come up with a
sequence but couldn't.
Differential Revision: https://reviews.llvm.org/D140571
This patch optimizes 32-bit constant shifts by renaming registers. This
is very effective as the compiler would otherwise need to do a lot of
single bit shift instructions. Instead, the registers are renamed at the
SSA level which means the register allocator will insert the necessary
mov instructions.
Unfortunately, the register allocator will insert some unnecessary movs
with the current code. This will be fixed in a later patch.
Differential Revision: https://reviews.llvm.org/D140570
32-bit shift instructions were previously expanded using the default
SelectionDAG expander, which meant it used 16-bit constant shifts and
ORed them together. This works, but is far from optimal.
I've optimized 32-bit shifts on AVR using a custom inserter. This is
done using three new pseudo-instructions that take the upper and lower
bits of the value in two separate 16-bit registers and outputs two
16-bit registers.
This is the first commit in a series. When completed, shift instructions
will take around 31% less instructions on average for constant 32-bit
shifts, and is in all cases equal or better than the old behavior. It
also tends to match or outperform avr-gcc: the only cases where avr-gcc
does better is when it uses a loop to shift, or when the LLVM register
allocator inserts some unnecessary movs. But it even outperforms avr-gcc
in some cases where avr-gcc does not use a loop.
As a side effect, non-constant 32-bit shifts also become more efficient.
For some real-world differences: the build of compiler-rt I use in
TinyGo becomes 2.7% smaller and the build of picolibc I use becomes 0.9%
smaller. I think picolibc is a better representation of real-world code,
but even a ~1% reduction in code size is really significant.
The current patch just lays the groundwork. The result is actually a
regression in code size. Later patches will use this as a basis to
optimize these shift instructions.
Differential Revision: https://reviews.llvm.org/D140569
Integer legalization already supported splitting the output integer of
llround and llrint, but did not support this for lround and lrint yet.
This is not a problem for 32-bit architectures, but for 8/16-bit
architectures like AVR it results in a crash like this:
ExpandIntegerResult #0: t7: i32 = lround t6
LLVM ERROR: Do not know how to expand the result of this operator!
This patch simply add lrint/lround to the list of ISD opcodes to expand.
Fixes https://github.com/llvm/llvm-project/issues/59573.
Differential Revision: https://reviews.llvm.org/D140822
These two symbols are declared in object files to indicate whether .data
needs to be copied from flash or .bss needs to be cleared. They are
supported on avr-gcc and reduce firmware size a bit, which is especially
important on very small chips.
I checked the behavior of avr-gcc and matched it as well as possible.
From my investigation, it seems to work as follows:
__do_copy_data is set when the compiler finds a data symbol:
* without a section name
* with a section name starting with ".data" or ".gnu.linkonce.d"
* with a section name starting with ".rodata" or ".gnu.linkonce.r" and
flash and RAM are in the same address space
__do_clear_bss is set when the compiler finds a data symbol:
* without a section name
* with a section name that starts with .bss
Simply checking whether the calculated section name starts with ".data",
".rodata" or ".bss" should result in the same behavior.
Fixes: https://github.com/llvm/llvm-project/issues/58857
Differential Revision: https://reviews.llvm.org/D140830
Follow-up to:
6c39a3aae1
That converted a pattern with ashr directly to icmp+zext, and
this updates the pattern that we used to convert to.
This canonicalizes to icmp for better analysis in the minimum case
and shortens patterns where the source type is not the same as dest type:
https://alive2.llvm.org/ce/z/tpXJ64https://alive2.llvm.org/ce/z/dQ405O
This requires an adjustment to an icmp transform to avoid infinite looping.
This patch moves the OpenMPOffloadMappingFlags enum definiition from Clang codegen to OMPConstants.h
Differential Revision: https://reviews.llvm.org/D140292
Ensure the negation required when lowering negative power-of-two
divides uses the scalable vector container type with the fixed
length result extracted from it.
Fixes: #59647
Differential Revision: https://reviews.llvm.org/D140563
[module.import/6] last sentence:
A header unit shall not contain a definition of a non-inline function or
variable whose name has external linkage.
Differential Revision: https://reviews.llvm.org/D140261
Implement https://cplusplus.github.io/CWG/issues/2631.html.
Immediate calls in default arguments and defaults members
are not evaluated.
Instead, we evaluate them when constructing a
`CXXDefaultArgExpr`/`BuildCXXDefaultInitExpr`.
The immediate calls are executed by doing a
transform on the initializing expression.
Note that lambdas are not considering subexpressions so
we do not need to transform them.
As a result of this patch, unused default member
initializers are not considered odr-used, and
errors about members binding to local variables
in an outer scope only surface at the point
where a constructor is defined.
Reviewed By: aaron.ballman, #clang-language-wg, rupprecht
Differential Revision: https://reviews.llvm.org/D136554
This is a dirty, dirty hack to workaround bot failures at
-O0. Currently these fields are only used by OpenCL features and
evidently the HIP runtime isn't expecting to see them in HIP
programs. The code objects should be language agnostic, so just force
optimize these out until the runtime is fixed.
This patch clang-formats AppendPathComponents in PathMappingList.cpp.
Without this patch, clang-format would indent the body of the
following function by four spaces.