This adds legalization, notably libcall lowering for fpowi. It differs
slightly from other libcall lowerings because the function takes both a
float and an integer register. Otherwise, all vectors are scalarized and
fp16 is promoted to fp32.
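For readers unfamiliar with the operation, here is a minimal sketch of what
a powi-style routine computes: a float base raised to an integer exponent.
The name `powi_sketch` is hypothetical and this is not the libcall's actual
implementation; it only illustrates why the call mixes a float operand with
an integer operand.

```cpp
// Hypothetical helper, for illustration only: float base, int exponent.
float powi_sketch(float base, int exp) {
  // Use an unsigned magnitude so that negating INT_MIN cannot overflow.
  unsigned n = exp < 0 ? 0u - static_cast<unsigned>(exp)
                       : static_cast<unsigned>(exp);
  float result = 1.0f;
  // Square-and-multiply over the bits of the exponent.
  while (n != 0) {
    if (n & 1u)
      result *= base;
    base *= base;
    n >>= 1;
  }
  // A negative exponent yields the reciprocal.
  return exp < 0 ? 1.0f / result : result;
}
```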
[LV] Change loops' interleave count computation
A set of microbenchmarks in llvm-test-suite (https://github.com/llvm/llvm-test-suite/pull/56), when tested on an AArch64 platform, demonstrates that loop interleaving is beneficial when the vector loop runs at least twice or when the epilogue loop trip count (TC) is minimal. Therefore, we choose the interleave count (IC) between TC/VF and TC/(2*VF) (VF = vectorization factor) such that the remainder TC for the epilogue loop is minimal, preferring the larger IC when the remainder TC is the same for both.
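One possible reading of this heuristic is sketched below. It is not the
code from the patch: `pickInterleaveCount`, `floorPow2`, the `maxIC` cap, and
the rounding of candidates down to a power of two are all assumptions made
so the example is self-contained.

```cpp
#include <algorithm>

// Largest power of two that is <= x. Assumes x >= 1.
unsigned floorPow2(unsigned x) {
  unsigned p = 1;
  while (p <= x / 2)
    p *= 2;
  return p;
}

// Hypothetical sketch: compare the two candidate interleave counts derived
// from TC/VF and TC/(2*VF) and keep the one leaving the smaller epilogue
// trip count; on a tie, keep the larger count. Assumes vf >= 1.
unsigned pickInterleaveCount(unsigned tc, unsigned vf, unsigned maxIC) {
  unsigned ub = floorPow2(std::max(1u, std::min(tc / vf, maxIC)));
  unsigned lb = floorPow2(std::max(1u, std::min(tc / (2 * vf), maxIC)));
  unsigned remUB = tc % (vf * ub); // iterations left for the epilogue
  unsigned remLB = tc % (vf * lb);
  return remUB <= remLB ? ub : lb;
}
```

With TC = 24, VF = 4 and a cap of 8, the candidates are 4 and 2; the
remainders are 8 and 0, so the sketch picks 2. With TC = 32 both remainders
are 0 and the larger candidate, 8, is kept.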
The initial tests for this change were submitted in PRs:
https://github.com/llvm/llvm-project/pull/70272 and https://github.com/llvm/llvm-project/pull/74689.
This patch cleans up the duplicate code for folding commutative binops
over `select/phi/minmax`.
Related commits:
+ select support:
88cc35b27e
+ phi support:
8674a023bc
+ minmax support:
624973806c
We can use RISCVISD::VMERGE_VL with an undef passthru operand.
I had to rewrite the FMA patterns to handle both undef and non-undef
cases so we can get the tail policy.
Make it so that PDL in pattern rewrites can be optionally disabled.
PDL is still enabled by default and is not optional in Bazel. So this
should be a NOP for most folks, while enabling others to disable it.
This only works with tests disabled. With tests enabled this still
compiles but tests fail as there is no lit config to disable tests that
depend on PDL rewrites yet.
As suggested in https://github.com/llvm/llvm-project/pull/76210, this
patch reorganizes the MC tests for APX promoted instructions so that
instruction tests under the same CPUID are listed in one test.
Also add explicit {evex} prefix tests and 8-bit displacement memory
tests, since promoted instructions need to set No_CD8 to avoid the
AVX512 compressed encoding.
Uses machine analyses to emit PGOAnalysisMap into the bb-addr-map ELF
section. Implements FileCheck tests to verify that the new fields are
emitted.
This patch emits optional PGO-related analyses into the bb-addr-map ELF
section during AsmPrinter. This currently supports Function Entry Count,
Machine Block Frequencies, and Machine Branch Probabilities. Each is
independently enabled via the `feature` byte of `bb-addr-map` for the given
function.
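As an illustration of "independently enabled via the `feature` byte", here
is a minimal sketch of a bit-flag encoding. The struct name and the specific
bit positions are assumptions for the example and may not match the encoding
used by the patch.

```cpp
#include <cstdint>

// Hypothetical layout: one bit per optional analysis.
struct PGOFeaturesSketch {
  bool funcEntryCount = false;
  bool bbFreq = false;
  bool brProb = false;

  // Pack the three switches into a single byte.
  std::uint8_t encode() const {
    return static_cast<std::uint8_t>((funcEntryCount ? 1u : 0u) |
                                     (bbFreq ? 2u : 0u) |
                                     (brProb ? 4u : 0u));
  }

  // Recover the switches from a byte; unknown bits are ignored here.
  static PGOFeaturesSketch decode(std::uint8_t byte) {
    PGOFeaturesSketch f;
    f.funcEntryCount = (byte & 1u) != 0;
    f.bbFreq = (byte & 2u) != 0;
    f.brProb = (byte & 4u) != 0;
    return f;
  }
};
```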
A part of [RFC - PGO Accuracy Metrics: Emitting and Evaluating Branch and Block Analysis](https://discourse.llvm.org/t/rfc-pgo-accuracy-metrics-emitting-and-evaluating-branch-and-block-analysis/73902).
On most hardware, FCSR.ABS2008 is set to the same value as FCSR.NAN2008.
Let's use this behavior by default.
With this commit, `clang -target mips -mnan=2008 -c fabs.c` will imply
`-mabs=2008`.
And of course, `clang -mnan=2008 -mabs=legacy` continues to work as
before.
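The defaulting rule can be sketched as below. `Mode` and `resolveAbs` are
hypothetical names and this is not the Clang driver code; it only shows
that an explicit `-mabs=` wins and that the `-mnan=` value is used otherwise.

```cpp
#include <optional>

enum class Mode { Legacy, IEEE2008 };

// Hypothetical helper: an explicit -mabs= value takes precedence;
// otherwise the abs mode follows the nan mode.
Mode resolveAbs(Mode nan, std::optional<Mode> explicitAbs) {
  return explicitAbs.value_or(nan);
}
```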
Co-authored-by: YunQiang Su <yunqiang.su@cipunited.com>
The GCC build is producing the following diagnostic:
llvm-project/libc/src/signal/linux/signal_utils.h: In member function ‘__llvm_libc_18_0_0_git::KernelSigaction& __llvm_libc_18_0_0_git::KernelSigaction::operator=(const sigaction&)’:
llvm-project/libc/src/signal/linux/signal_utils.h:38:20: warning: cast between incompatible function types from ‘void (*)(int, siginfo_t*, void*)’ to ‘void (*)(int)’ [-Wcast-function-type]
38 | sa_handler = reinterpret_cast<HandlerType *>(sa.sa_sigaction);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
llvm-project/libc/src/signal/linux/signal_utils.h: In member function ‘__llvm_libc_18_0_0_git::KernelSigaction::operator sigaction() const’:
llvm-project/libc/src/signal/linux/signal_utils.h:51:25: warning: cast between incompatible function types from ‘void (*)(int)’ to ‘void (*)(int, siginfo_t*, void*)’ [-Wcast-function-type]
51 | sa.sa_sigaction = reinterpret_cast<SiginfoHandlerType *>(sa_handler);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Two issues here:
1. Clang supports -Wcast-function-type, but not as part of the -Wextra
group.
2. The existing implementation tried to work around the oddity that is
the kernel's struct sigaction != POSIX via reinterpret_cast in a way
that's not compatible with -Wcast-function-type. Just use a union, which
is well defined (and two function pointers are the same size).
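A minimal sketch of the union approach follows. The type and member names
mirror the diagnostic above, but `KernelSigactionSketch`, `recordSignal`, and
the simplified handler signature (`void *` in place of `siginfo_t *`) are
assumptions made to keep the example self-contained. The sketch stores and
reads the same member, so it needs no cast at all.

```cpp
using HandlerType = void(int);
// Simplified stand-in for the siginfo-style handler signature.
using SiginfoHandlerType = void(int, void *, void *);

// Both handler flavours share storage, so no reinterpret_cast between
// incompatible function pointer types is needed to fill the slot.
struct KernelSigactionSketch {
  union {
    HandlerType *sa_handler;
    SiginfoHandlerType *sa_sigaction;
  };
};

// The sharing relies on both pointer types having the same size.
static_assert(sizeof(HandlerType *) == sizeof(SiginfoHandlerType *),
              "function pointer types are expected to have equal size");

int lastSignal = 0;
void recordSignal(int sig) { lastSignal = sig; }
```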
Link: https://github.com/llvm/llvm-project/issues/76872
Fixes: https://github.com/llvm/llvm-project/issues/74617
WebAssembly doesn't have a single virtual memory space the way other object
formats or architectures do, so "addresses" mean different things depending
on the context.
Function symbol addresses in object files are offsets from the start of the code
section. This is good for linking and relocation. However when dealing with
linked binaries, offsets from the start of the file/module are more often
used (e.g. for stack traces in browsers), and are more useful for use
cases like binary size attribution. This PR changes Object to use
the file offset instead of the section offset for function symbols, but
only for linked (non-DSO) files.
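The relationship between the two kinds of address can be sketched as
follows. `functionSymbolAddress` and its parameters are hypothetical names
and do not correspond to the Object library's API.

```cpp
#include <cstdint>

// Hypothetical helper: for linked (non-DSO) files, report the offset from
// the start of the file; otherwise keep the code-section-relative offset.
std::uint64_t functionSymbolAddress(std::uint64_t offsetInCodeSection,
                                    std::uint64_t codeSectionFileOffset,
                                    bool isLinkedNonDSO) {
  return isLinkedNonDSO ? codeSectionFileOffset + offsetInCodeSection
                        : offsetInCodeSection;
}
```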
This is a reland of fc5f51cf with a fix for the MSan failure (it was not caused
by this change, but it was revealed by the new tests).
This patch removes the redundant RegisterInitialValues parameter from
assembleToStream and friends as it is included within the BenchmarkKey
struct that is also passed to all the functions that need this
information.
Consistently treat packed 16-bit operands as 32-bit values, because
that's really what they are. The attempt to treat them differently was
ultimately incorrect and led to miscompiles, e.g. when using non-splat
constants such as (1, 0) as operands.
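To make the "packed 16-bit operands are 32-bit values" view concrete, here
is a small sketch. The helper names are hypothetical, and placing element 0
in the low half is an assumption of the example.

```cpp
#include <cstdint>

// Hypothetical helper: view a pair of 16-bit elements as one 32-bit value,
// with element 0 in the low half (an assumption of this sketch).
constexpr std::uint32_t packV2I16(std::uint16_t lo, std::uint16_t hi) {
  return static_cast<std::uint32_t>(lo) |
         (static_cast<std::uint32_t>(hi) << 16);
}

// A splat has identical low and high halves.
constexpr bool isSplatV2I16(std::uint32_t packed) {
  return (packed & 0xFFFFu) == (packed >> 16);
}
```

Under these assumptions, the non-splat constant (1, 0) is simply the 32-bit
value 0x00000001, while (0, 1) is 0x00010000.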
Recognize 32-bit float constants for i/u16 instructions. This is a bit
odd conceptually, but it matches HW behavior and SP3.
Remove isFoldableLiteralV216; there was too much magic in the dependency
between it and its use in SIFoldOperands. Instead, we now simply rely on
checking whether a constant is an inline constant, and trying a bunch of
permutations of the low and high halves. This is more obviously correct
and leads to some new cases where inline constants are used as shown by
tests.
Move the logic for switching packed add vs. sub into SIFoldOperands.
This has two benefits: all logic that optimizes for inline constants in
packed math is now in one place; and it applies to both SelectionDAG and
GISel paths.
Disable the use of opsel with v_dot* instructions on gfx11. They are
documented to ignore opsel on src0 and src1. It may be interesting to
re-enable the use of opsel on src2 as a future optimization.
A similar "proper" fix of what inline constants mean could potentially
be applied to unpacked 16-bit ops. However, it's less clear what the
benefit would be, and there are surely places where we'd have to
carefully audit whether values are properly sign- or zero-extended. It
is best to keep such a change separate.
Fixes: Corruption in FSR 2.0 (latent bug exposed by an LLPC change)
BreakpointResolverAddress optionally can include the module name related
to the address that gets resolved. Currently this will never work
because it sets the name to itself (which is empty).
As in the title, a8f4397426 broke CI because the calling convention is
not available on certain targets.
This patch uses a simpler calling convention and enables the test only
when the attribute exists. It has been verified that this test crashes
the compiler before a8f4397426, so it has the same effect as the
previous test. Disabling the test on platforms that don't have the
calling convention is fine because it's guarding against a frontend bug.
The upstream test relies on jump-tables, which are lowered in
dramatically different ways with later arm64e/ptrauth patches.
Concretely, it's failing for at least two reasons:
- ptrauth removes x16/x17 from tcGPR64 to prevent indirect tail-calls
from using either register as the callee, conflicting with their usage
as scratch for the tail-call LR auth checking sequence. In the
1/2_available_regs_left tests, this causes the MI scheduler to move
the load up across some of the inlineasm register clobbers.
- ptrauth adds an x16/x17-using pseudo for jump-table dispatch, which
looks somewhat different from the regular jump-table dispatch codegen
by itself, but also prevents compression currently.
They seem like sensible changes. But they mean the tests aren't really
testing what they're intended to, because there's always an implicit
x16/x17 clobber when using jump-tables.
This updates the test in a way that should work identically regardless
of ptrauth support, with one exception, the first item above, which
merely reorders the load/inlineasm w.r.t. each other.
I verified the tests still fail the live-reg assertions when
applicable.
Expand the copying of attributes on GPU kernel arguments during LLVM
lowering.
Support copying attributes from values that are already LLVM pointers.
Support copying attributes, like `noundef`, that aren't specific to (the
pointer parts of) arguments.
After 47a1704ac9 we are able to
reassociate a disjoint Or used as a GEP index to get the constant
closer to a load to fold it. This is shown by the first test.
We are not able to do this if the GEP created a shift left to scale
the index as the second test shows.
To make this work, we need to preserve the disjoint flag when pulling
the Or through the shift.
Keep the haveNoCommonBitsSet check because we haven't started inferring
the flag yet.
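The arithmetic behind this can be checked with a small sketch. The helper
names are hypothetical; the code only demonstrates that operands with no
common bits stay disjoint after a left shift, so the Or still behaves like
an Add once it is pulled through the shift.

```cpp
#include <cstdint>

// True when the operands share no set bits, i.e. (a | b) == a + b.
constexpr bool haveNoCommonBits(std::uint32_t a, std::uint32_t b) {
  return (a & b) == 0;
}

// Shift applied after the Or.
constexpr std::uint32_t orThenShl(std::uint32_t a, std::uint32_t b,
                                  unsigned s) {
  return (a | b) << s;
}

// Or pulled through the shift. Assumes s < 32.
constexpr std::uint32_t shlThenOr(std::uint32_t a, std::uint32_t b,
                                  unsigned s) {
  return (a << s) | (b << s);
}
```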
I've added tests for two transforms, but these are not the only
transforms that use isADDLike.
This diagnoses unexpanded packs in the _unqualified-id_ of a function
template specialization's _declarator-id_. For example:
```cpp
template<typename... Ts>
struct A
{
    template<typename U>
    void f();

    template<>
    void f<Ts>(); // error: explicit specialization contains unexpanded parameter pack 'Ts'
};
```
I moved the handling of template-ids so it happens right after we
determine whether we are declaring a function template or a function
template specialization, so that diagnostics are issued in lexical order.
vectorized.
If the insertelement instruction is vectorized, and the extractelement
instruction from that insertelement is also vectorized as part of the same
tree, we need to extract from the corresponding vectorized value for the insertelement rather than from the original insertelement instruction.
Make it so that PDL in pattern rewrites can be optionally disabled.
PDL is still enabled by default and is not optional in Bazel. So this
should be a NOP for most folks, while enabling others to disable it.
This is piped through mlir-tblgen invocation and that could be
changed/avoided by splitting up the passes file instead.
This only works with tests disabled. With tests enabled this still
compiles but tests fail as there is no lit config to disable tests that
depend on PDL rewrites yet.
The change in c1eab57673 fixed the
behavior of `getDiscardableAttrDictionary` for ops that are not using
properties to only return discardable attributes. AsmPrinter was relying
on the wrong behavior when printing such ops in the generic form,
assuming all attributes are discardable.
This PR adds API `makeReproducer` and cl::opt flag
`--mlir-generate-reproducer=<filename>` in order to allow for mlir
reproducer dumps even when the pipeline doesn't crash.
This PR also decouples the code that handles generation of an MLIR
reproducer from the crash recovery portion. The purpose is to allow for
generating reproducers outside of the context of a compiler crash.
This will be useful for frameworks and runtimes that use MLIR where it
is needed to reproduce the pipeline behavior for reasons outside of
diagnosing crashes. An example is for diagnosing performance issues
using offline tools, where being able to dump the reproducer from a
runtime compiler would be helpful.