When changing codegen (e.g. in #130809), offsets in binaries produced by
LTO tests might change. We do not need to match concrete offset values,
it's enough to ensure that hex values in particular places are
identical.
---------
Co-authored-by: Anatoly Trosinenko <atrosinenko@accesssoftek.com>
The hardware min/max follow the IR rules with IEEE mode disabled,
so we can avoid the canonicalizes of the input. We lose the quieting
of a signaling nan if both inputs are nans, but we only require that
with strictfp.
There is a builtin __spirv_SpecConstant that the SPIR-V backend expands
into a specialization constant. However, it is currently only enable for
OpenCL shaders, and not the graphic shaders.
We want to use it for specialization constants coming from HLSL, so we
are enabling it for graphic shaders as well.
Implements https://github.com/llvm/wg-hlsl/pull/287
Fixes https://github.com/llvm/llvm-project/issues/142991
When implementing the vectorization, we potentially need to add shuffles
for external users. In such cases, we may be shuffling a smaller vector
into a larger vector. When this happens `ResizeToVF` will just build a
poison padded identity vector. Then the to build the final shuffle, we
just use the `SK_InsertSubvector` mask.
This is possibly clearer by looking at the included test in
SLPVectorizer/AMDGPU/external-shuffle.ll
In the exit block we have a bunch of shuffles to glue the vectorized
tree match the `InsertElement` users. `TMP25` holds the result of
resizing the v2i16 vectorized sequence to match the `InsertElement` size
v16i16. Then `TMP26` is the final shuffle which replaces the
`InsertElement` sequence. This is just an insertsubvector.
However, when calculating the cost for these shuffles, we aren't
modelling this correctly. `ResizeToVF` will indicate to
`performExtractsShuffleAction` that we cannot use the original mask due
to the resize shuffle. The consequence is that the cost calculation uses
a different shuffle mask than what is ultimately used.
Going back to the included test, we can consider again `TMP26`. Clearly
we can see the shuffle uses a mask {0, 1, 2, 3, 16, 17, poison ..}.
However, we will currently calculate the cost with a mask {0, 1, 2, 3,
20, 21, ...} we have replaced 16 and 17 with 20 and 21 (Index + Vector
Size). Queries like BasicTTImpl::improveShuffleKindFromMask will not
recognize this as an `SK_InsertSubvector` mask, and targets which have
reduced costs for `SK_InsertSubvector` will not accurately calculate the
cost.
Seeing how we can't generate any debug intrinsics any more: delete a
variety of codepaths where they're handled. For the most part these are
plain deletions, in others I've tweaked comments to remain coherent, or
added a type to (what was) type-generic-lambdas.
This isn't all the DbgInfoIntrinsic call sites but it's most of the
simple scenarios.
Co-authored-by: Nikita Popov <github@npopov.com>
At the MLIR level unsigned integer and signless integers are different
types. Indeed when looking up the two types in type definition cache
they do not match.
Hence when translating a SPIR-V module which contains both usign and
signless integers will contain the same type declaration twice
(something like OpTypeInt 32 0) which is not permitted in SPIR-V and
such generated modules fail validation.
This patch solves the problem by mapping unisgned integer types to
singless integer types before looking up in the type definition cache.
---------
Signed-off-by: Davide Grohmann <davide.grohmann@arm.com>
We can now let a mixture of combineConcatVectorOps and target shuffle combining handle this instead of creating ISD::CONCAT_VECTORS nodes and hoping they will merge properly.
In the horizontal-sum.ll test changes we were creating a ISD::CONCAT_VECTORS node that was being split shortly after, but not before causing issues with HADD folding due to additional uses.
Update SimplifyICmpOperands to only try subtracting 1 from RHS first, if
RHS is an op we can fold the subtract directly into. Otherwise try
adding to LHS first, as we can preserve NUW flags.
This improves results in a few cases, including the modified test case
from berkeley-abc and new code to be added in
https://github.com/llvm/llvm-project/pull/128061.
Note that there are more cases where the results can be improved by
better ordering here which I'll try to investigate as follow-up.
PR: https://github.com/llvm/llvm-project/pull/144404
Currently when jitting expressions, LLDB scans the IR instructions of
the `$__lldb_expr` and will insert a call to a utility function for each
load/store instruction. The purpose of the utility funciton is to
dereference the load/store operand. If that operand was an invalid
pointer the utility function would trap and LLDB asks the IR checker
whether it was responsible for the trap, in which case it prints out an
error message saying the expression dereferenced an invalid pointer.
This is a lot of setup for not much gain. In fact, creating/running this
utility expression shows up as ~2% of the expression evaluation time
(though we cache them for subsequent expressions). And the error message
we get out of it is arguably less useful than if we hadn't instrumented
the IR. It was also untested.
Before:
```
(lldb) expr int a = *returns_invalid_ptr()
error: Execution was interrupted, reason: Attempted to dereference an invalid pointer..
The process has been returned to the state before expression evaluation.
```
After:
```
(lldb) expr int a = *returns_invalid_ptr()
error: Expression execution was interrupted: EXC_BAD_ACCESS (code=1, address=0x5).
The process has been returned to the state before expression evaluation.
```
This patch removes this IR checker.
Call continuation logic relies on assumptions about fall-through origin:
- the branch is external to the function,
- fall-through start is at the beginning of the block,
- the block is not an entry point or a landing pad.
Leverage trace information to explicitly check whether the origin is a
return instruction, and defer to checks above only in case of
DSO-external branch source.
This covers both regular and BAT cases, addressing call continuation
fall-through undercounting in the latter mode, which improves BAT
profile quality metrics. For example, for one large binary:
- CFG discontinuity 21.83% -> 0.00%,
- CFG flow imbalance 10.77%/100.00% -> 3.40%/13.82% (weighted/worst)
- CG flow imbalance 8.49% —> 8.49%.
Depends on #143289.
Test Plan: updated callcont-fallthru.s
TryDecode/CheckPredicate/SoftFail MCD ops are not used by many targets.
Track the set of opcodes that were emitted and emit code for handling
TryDecode/CheckPredicate/SoftFail ops when decoding only if there were
emitted. This is purely eliminating dead code in the generated
`decodeInstruction` function.
This results in the following reduction in the size of the Disassembler
.so files with a release x86_64 release build on Linux:
```
Target Old Size New Size % reduction
build/lib/libLLVMAArch64Disassembler.so.21.0git 256656 256656 0.00
build/lib/libLLVMAMDGPUDisassembler.so.21.0git 813000 808168 0.59
build/lib/libLLVMARCDisassembler.so.21.0git 44816 43536 2.86
build/lib/libLLVMARMDisassembler.so.21.0git 281744 278808 1.04
build/lib/libLLVMAVRDisassembler.so.21.0git 36040 34496 4.28
build/lib/libLLVMBPFDisassembler.so.21.0git 26248 23168 11.73
build/lib/libLLVMCSKYDisassembler.so.21.0git 55960 53632 4.16
build/lib/libLLVMHexagonDisassembler.so.21.0git 115952 113416 2.19
build/lib/libLLVMLanaiDisassembler.so.21.0git 24360 21008 13.76
build/lib/libLLVMLoongArchDisassembler.so.21.0git 58584 56168 4.12
build/lib/libLLVMM68kDisassembler.so.21.0git 57264 53880 5.91
build/lib/libLLVMMSP430Disassembler.so.21.0git 28896 28440 1.58
build/lib/libLLVMMipsDisassembler.so.21.0git 123128 120568 2.08
build/lib/libLLVMPowerPCDisassembler.so.21.0git 80656 78096 3.17
build/lib/libLLVMRISCVDisassembler.so.21.0git 154080 150200 2.52
build/lib/libLLVMSparcDisassembler.so.21.0git 42040 39568 5.88
build/lib/libLLVMSystemZDisassembler.so.21.0git 97056 94552 2.58
build/lib/libLLVMVEDisassembler.so.21.0git 83944 81352 3.09
build/lib/libLLVMWebAssemblyDisassembler.so.21.0git 25280 25280 0.00
build/lib/libLLVMX86Disassembler.so.21.0git 2920624 2920624 0.00
build/lib/libLLVMXCoreDisassembler.so.21.0git 48320 44288 8.34
build/lib/libLLVMXtensaDisassembler.so.21.0git 42248 35840 15.17
```
On unix systems, we were trying to determine the terminal width using
the `COULMNS` environment variable. Unfortunately, `COLUMNS` is not
exported by all shells and thus not available on some systems.
We were previously using `ioctl()` for this; fall back to doing so if `COLUMNS`
does not exist or does not store a positive integer.
This essentially reverts a3eb3d3d92 and
parts of https://reviews.llvm.org/D61326.
For more information, see #139499.
Fixes#139499.
libstdc++ 15 uses the non-constexpr function
std::__glibcxx_assert_fail() to trigger compilation errors when the
__glibcxx_assert(cond) macro is used in a constantly evaluated context.
Compilation fails when using code from the libstdc++ (such as
std::array) on device code, since these assertions invoke a
non-constexpr host function from device code.
This patch proposes a cuda wrapper header "bits/c++config.h" which adds
a __device__ version of std::__glibcxx_assert_fail().
Solves SWDEV-518041
I believe the new variable names better convey their purpose. However, I
also believe that function is more complex than it needs to be, and this
tiny patch should be seen as a first step towards (maybe) further
refactoring.
The previous names were very generic (Size, Sz, Cnt, StartIdx). This
made it easy to get confused given that the vecotrizeStores() function
is already complex enough.
My hope would be to eventually have a function concise enough to clearly
see what are the different strategies being attempted to vectorise a
group of related store instructions.
Whereas backward-slice matching provides support to limit traversal by
specifying the desired depth level, this pull request introduces support
for limiting traversal with a nested matcher (adding forward-slice
also). It also adds support for variadic operators, including `anyOf`
and `allOf`. Rather than simply stopping traversal when an operation
named foo is encountered, one can now define a matcher that specifies
different exit conditions. Variadic support implementation within
mlir-query is very similar to clang-query.
We have been confusingly, and arguably incorrectly, lowering `m**imumf`
atomic RMW operations in the MemRef dialect to `fm**` atomic RMW
operations in the LLVM dialect, which have different NaN-propagation
semantics: `m**imumf` propagates NaNs from either operand whereas
`fm**`, which lowers to the `fm**num` intrinsic returns the non-NaN
operand. This also contradicts the lowering of `arith.m**imumf` and
`arith.m**numf` operations.
Change the lowering to match the terminology in arith.
Add tests for these lowerings.
Keep a debug message in case of surprising behavior downstream (the code
may be producing more NaNs now).
This patch corrects the state of the error node generated by the
core.DivideZero checker when it detects potential division by zero
involving a tainted denominator.
The checker split in
91ac5ed10a
started to introduce a conflicting assumption about the denominator into
the error node:
Node with the Bug Report "Division by a tainted value, possibly zero"
has an assumption "denominator != 0".
This has been done as a shortcut to continue analysis with the correct
assumption *after* the division - if we proceed, we can only assume the
denominator was not zero. However, this assumption is introduced
one-node too soon, leading to a self-contradictory error node.
In this patch, I make the error node with assumption of zero denominator
fatal, but allow analysis to continue on the second half of the state
split with the assumption of non-zero denominator.
---
CPP-6376
We call clear when checking for potential constant expressions, but that
used to free all the chunks. Keep the last one so we don't have to
re-allocate it.
While migrating towards MemorySSA, account for the memory state modeled
by MemorySSA by hashing it, when computing the symbolic expressions for
the memory operations. Likewise, when phi-translating while walking the
CFG for PRE possibilities, see if the value number of an operand may be
refined with one of the value from the incoming edges of the MemoryPhi
associated to the current phi.
Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>
ConstantInt vectors utilise DAG.getConstant() when constructing the
initial DAG. This can have the effect of legalising the constant before
the DAG combiner is run, significant altering the generated code. To
mitigate this (hopefully as a temporary measure) we instead try to
construct the DAG in the same way as shufflevector based splats.
This patch adds support for the `libmvec` vector library on AArch64
targets. Currently, all `libmvec` functions in GLIBC version 2.40 are
supported. The full list of math functions enabled can be found
[here](96abd59bf2/sysdeps/aarch64/fpu/Versions)
(up to GLIBC 2.40).
Previously, `libmvec` was only supported on x86_64 targets. Attempts to
use it on AArch64 resulted in the following error from Clang:
`unsupported option 'libmvec' for target 'aarch64'`.
Visual Studio has an argument to ignore all PCH related switches.
clang-cl has also support option /Y-. Having the same option in clang
would be helpful. This commit is to add support for ignoring PCH options
(-ignore-pch).
The commit includes:
1. Implement -ignore-pch as a Driver option.
2. Add a Driver test and a PCH test.
3. Add a section of -ignore-pch to user manual.
4. Add a release note for the new option '-ignore-pch'.
The change since the original landing:
1. preprocessing-only mode doesn't imply that -include-pch is disabled.
Co-authored-by: Matheus Izvekov <mizvekov@gmail.com>
This commit converts NullabilityChecker to the new checker family
framework that was introduced in the recent commit
6833076a5d
This commit removes the dummy checker `nullability.NullabilityBase`
because it was hidden from the users and didn't have any useful role
except for helping the registration of the checker parts in the old
ad-hoc system (which is replaced by the new standardized framework).
Except for the removal of this dummy checker, no functional changes
intended.
The `RISCVIndirectBranchTracking` pass inserts `lpad` instruction and
could change the basic block alignment, so this should not happen after
the branch relaxation as the adjusted offset is possible to exceed the
branch range.