When compiling compiler-rt with -fsanitize=undefined and running testcases you
end up with the following warning:
UBSan:/repo/uabkaka/llvm-project/compiler-rt/lib/builtins/int_mulo_impl.inc:24:23: signed integer overflow: -1 * -2147483648 cannot be represented in type 'si_int' (aka 'long')
This can be avoided by doing the multiplication in a matching unsigned variant
of the type.
This was found in an out of tree target.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D146623
The 'LPM' instruction has three forms:
------------------------
| form | feature |
| ---------- | --------|
| LPM | hasLPM |
| LPM Rd, Z | hasLPMX |
| LPM Rd, Z+ | hasLPMX |
------------------------
The second form is always selected in ISelDAGToDAG, even on devices
without FeatureLPMX. This patch emits "LPM + MOV" on devices with
only FeatureLPM.
Reviewed By: jacquesguan
Differential Revision: https://reviews.llvm.org/D141246
Quite a few vectoriser tests were using a trip count of 1024,
which meant:
1. For fixed-length VFs we would never actually tail-fold, e.g.
see Transforms/LoopVectorize/RISCV/uniform-load-store.ll. This
is because we can prove at compile-time there will never be a
scalar tail.
2. As of D146199 the same optimisation mentioned above will also
apply to scalable VFs too.
I've changed all such trip counts to be 1025 instead.
Differential Revision: https://reviews.llvm.org/D146219
Regenerate the style documentation, requires some minor sphinx changes to avoid warnings
Reviewed By: klimek
Differential Revision: https://reviews.llvm.org/D146704
This first version lays the foundations for AArch32 support in JITLink. ELFLinkGraphBuilder_aarch32 processes REL-type relocations and populates LinkGraphs from ELF object files for both big- and little-endian systems. The ArmCfg member controls subarchitecture-specific details throughout the linking process (i.e. it's passed to ELFJITLinker_aarch32).
Relocation types follow the ABI documentation's division into classes: Data (endian-sensitive), Arm (32-bit little-endian) and Thumb (2x 16-bit little-endian, "Thumb32" in the docs). The implementation of instruction encoding/decoding for relocation resolution is implemented symmetrically and is testable in isolation (see AArch32 category in JITLinkTests).
Callable Thumb functions are marked with a ThumbSymbol target-flag and stored in the LinkGraph with their real addresses. The thumb-bit is added back in when the owning JITDylib requests the address for such a symbol.
The StubsManager can generate (absolute) Thumb-state stubs for branch range extensions on v7+ targets. Proper GOT/PLT handling is not yet implemented.
This patch is based on the backend implementation in ez-clang and has just enough functionality to model the infrastructure and link a Thumb function `main()` that calls `printf()` to dump "Hello Arm!" on Armv7a. It was tested on Raspberry Pi with 32-bit Raspbian OS.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D144083
This patch extends the `createConst` method so that it can generate
constant vectors (it can already generate scalars). This change is
required to be able to apply the converter for `arith.floordivsi`
(i.e. `FloorDivSIOpConverter`) to vectors.
While `arith.floordivsi` is my main motivation for this change, this
patch should also allow other Arith ops to be converted in vector cases.
In my example, the Linalg vectorizer updates `arith.floordivsi` to
operate on vectors and hence the need for this change.
Differential Revision: https://reviews.llvm.org/D146741
When LLVM_NATIVE_TOOL_DIR was introduced in
d3da9067d1 / D131052, it consisted
of refactoring a couple cases of manual logic for tools in
clang-tools-extra/clang-tidy, clang-tools-extra/pseudo/include
and mlir/tools/mlir-linalg-ods-gen. The former two had the same
consistent behaviour while the latter was slightly different, so
the refactoring would end up slightly adjusting one or the other.
The difference was that the clang-tools-extra tools respected the
external variable for setting the tool name, regardless of the
LLVM_USE_HOST_TOOLS variable, while mlir-linalg-ods-gen tool
only checked its external variable if LLVM_USE_HOST_TOOLS was set.
LLVM_USE_HOST_TOOLS is supposed to be enabled automatically whenever
cross compiling, so this shouldn't have been an issue.
In https://github.com/llvm/llvm-project/issues/60784, it seems like
some users do cross compile LLVM, without CMake knowing about it
(without CMAKE_CROSSCOMPILING being set). In these cases, their
build broke, as the variables for pointing to external host tools
no longer were being respected.
The fact that CMAKE_CROSSCOMPILING wasn't set stems from a
non-obvious behaviour of CMake; CMAKE_CROSSCOMPILING isn't supposed
to be set by the user (and if it was, it gets overridden), but one
has to set CMAKE_SYSTEM_NAME to indicate that one is cross compiling,
even if the target OS is the same as the current host.
Skip the checks for LLVM_USE_HOST_TOOLS and always respect the
variables for pointing to external tools (both the old tool specific
variables, and the new LLVM_NATIVE_TOOL_DIR), if they're set. This
makes the logic within setup_host_tool more exactly match the
logic for the clang-tools-extra tools from before the refactoring
in d3da9067d1. This makes the behaviour
consistent with that of the tablegen executables, which also respect
the externally set variables regardless of LLVM_USE_HOST_TOOLS.
This fixes
https://github.com/llvm/llvm-project/issues/60784.
Differential Revision: https://reviews.llvm.org/D146666
This revealed a test case that wasn't hitting the intended branch
because the inlinees had no function definition.
Depends on D146628
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D146633
This test somewhat unconventionally assembles both aarch64 and
x86 object files.
This fixes test failures in build configurations with the aarch64
target enabled but x86 target disabled.
According to the Google docs, the convention is
TEST(TestSuiteName, TestName). Apply that convention to the
source code, test and documentation of the check.
Differential Revision: https://reviews.llvm.org/D146713
The new algorithm is:
1. Find all multilibs with flags that are a subset of the requested
flags.
2. If more than one multilib matches, choose the last.
In addition a new selection mechanism is permitted via an overload of
MultilibSet::select() for which multiple multilibs are returned.
This allows layering multilibs on top of each other.
Since multilibs are now ordered within a list, they no longer need a
Priority field.
The new algorithm is different to the old algorithm, but in practise
the old algorithm was always used in such a way that the effect is the
same.
The old algorithm was to find the set intersection of the requested
flags (with the first character of each removed) with each multilib's
flags (ditto), and for that intersection check whether the first
character matched. However, ignoring the first characters, the
requested flags were always a superset of all the multilibs flags.
Therefore the new algorithm can be used as a drop-in replacement.
The exception is Fuchsia, which needs adjusting slightly to set both
fexceptions and fno-exceptions flags.
Differential Revision: https://reviews.llvm.org/D142905
This patch adds tests for umax(x, 1u).
This patch fixes:
https://github.com/llvm/llvm-project/issues/60233
It turns out that commit 86b4d8645f on
Feb 8, 2023 already performs the instcombine transformation proposed
in the issue, so the issue requires no change on the codegen side.
In this file, most of the line don't have trailing spaces,
but some of them have. To keep consistent, remove the trailing
spaces.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D146697
By making the 64 bit integer literals unsigned. Otherwise some of them
are unexpectedly sign extended (and the compiler rightly diagnosed this
with warnings)
Initially added in D80506.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D146667
On targets without ADDCARRY or ADDE, we need to emit a separate
SETCC to determine carry from the low half to the high half.
The high half is calculated by a series of ADDs.
When RHSLo and RHSHi are -1, without this patch, we get:
Hi = (add (add LHSHi,(setult Lo, LHSLo), -1)
Where as with the patch we get:
Hi = (sub LHSHi, (seteq LHSLo, 0))
Only RHSLo is -1 we can instead do (setne Lo, 0).
Similar to gcc: https://godbolt.org/z/M83f6rz39
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D146635
Following up on the comments in https://reviews.llvm.org/D144108 this
patch refactors the im2col conversion patterns for `linalg.conv_2d_nhwc_hwcf`
and `linalg.conv_2d_nchw_fchw` convolutions to use gather semantics for the im2col
packing `linalg.generic`.
Follow up work can include a similar pattern for depthwise convolutions
and a generalization of the patterns here to work with any `LinalgOp` as
well.
Differential Revision: https://reviews.llvm.org/D144678
Plumbing from the language level to the assume intrinsics with
separate_storage operand bundles.
Patch by David Goldblatt (davidtgoldblatt)
Differential Revision: https://reviews.llvm.org/D136515
Misc. cleanups for `WebAssemblyDebugValueManager`.
- Use `Register` for registers
- Simpler for loop iteration
- Rename a variable
- Reorder methods
- Reduce `SmallVector` size for `DBG_VALUE`s to 1; one def usually have
a single `DBG_VALUE` attached to it in most cases
- Add a few more lines of comments
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D146743
Sometimes the clang driver will receive a target triple where the
deployment version is too low to support the platform + arch. In those
cases, the compiler upgrades the final minOS which is what gets recorded
ultimately by the linker in LC_BUILD_VERSION. TextAPI should also reuse
this logic for capturing minOS in recorded TBDv5 files.
Reviewed By: ributzka
Differential Revision: https://reviews.llvm.org/D145690
It's possible to segfault in `DevirtModule::applyICallBranchFunnel` when
attempting to call `getCaller` on a call base that was erased in a prior
iteration. This can occur when attempting to find devirtualizable calls
via `findDevirtualizableCallsForTypeTest` if the vtable passed to
llvm.type.test is a global and not a local. The function works by taking
the first argument of the llvm.type.test call (which is a vtable),
iterating through all uses of it, and adding any relevant all uses that
are calls associated with that intrinsic call to a vector. For most
cases where the vtable is actually a *local*, this wouldn't be an issue.
Take for example:
```
define i32 @fn(ptr %obj) #0 {
%vtable = load ptr, ptr %obj
%p = call i1 @llvm.type.test(ptr %vtable, metadata !"typeid2")
call void @llvm.assume(i1 %p)
%fptr = load ptr, ptr %vtable
%result = call i32 %fptr(ptr %obj, i32 1)
ret i32 %result
}
```
`findDevirtualizableCallsForTypeTest` will check the call base ` %result
= call i32 %fptr(ptr %obj, i32 1)`, find that it is associated with a
virtualizable call from `%vtable`, find all loads for `%vtable`, and add
any instances those load results are called into a vector. Now consider
the case where instead `%vtable` was the global itself rather than a
local:
```
define i32 @fn(ptr %obj) #0 {
%p = call i1 @llvm.type.test(ptr @vtable, metadata !"typeid2")
call void @llvm.assume(i1 %p)
%fptr = load ptr, ptr @vtable
%result = call i32 %fptr(ptr %obj, i32 1)
ret i32 %result
}
```
`findDevirtualizableCallsForTypeTest` should work normally and add one
unique call instance to a vector. However, if there are multiple
instances where this same global is used for llvm.type.test, like with:
```
define i32 @fn(ptr %obj) #0 {
%p = call i1 @llvm.type.test(ptr @vtable, metadata !"typeid2")
call void @llvm.assume(i1 %p)
%fptr = load ptr, ptr @vtable
%result = call i32 %fptr(ptr %obj, i32 1)
ret i32 %result
}
define i32 @fn2(ptr %obj) #0 {
%p = call i1 @llvm.type.test(ptr @vtable, metadata !"typeid2")
call void @llvm.assume(i1 %p)
%fptr = load ptr, ptr @vtable
%result = call i32 %fptr(ptr %obj, i32 1)
ret i32 %result
}
```
Then each call base `%result = call i32 %fptr(ptr %obj, i32 1)` will be
added to the vector twice. This is because for either call base `%result
= call i32 %fptr(ptr %obj, i32 1) `, we determine it is associated with
a virtualizable call from `@vtable`, and then we iterate through all the
uses of `@vtable`, which is used across multiple functions. So when
scanning the first `%result = call i32 %fptr(ptr %obj, i32 1)`, then
both call bases will be added to the vector, but when scanning the
second one, both call bases are added again, resulting in duplicate call
bases in the CSInfo.CallSites vector.
Note this is actually accounted for in every other instance WPD iterates
over CallSites. What everything else does is actually add the call base
to the `OptimizedCalls` set and just check if it's already in the set.
We can't reuse that particular set since it serves a different purpose
marking which calls where devirtualized which `applyICallBranchFunnel`
explicitly says it doesn't. For this fix, we can just account for
duplicates with a map and do the actual replacements afterwards by
iterating over the map.
Differential Revision: https://reviews.llvm.org/D146267
Summary:
The previous patch fixed how we handle emitting atomics for targeting
NVPTX directly. This is the only other file that really does that and
has atomics and I forgot to update it.
Since Clang 16.0.0 users can target the `NVPTX` architecture directly
via `--target=nvptx64-nvidia-cuda`. However, this does not set the
atomic inlining size correctly. This leads to spurious warnings and
emission of runtime atomics that are never implemented. This patch
ensures that we set this to the appropriate pointer width. This will
always be 64 in the future as `nvptx64` will only be supported moving
forward.
Fixes: https://github.com/llvm/llvm-project/issues/61410
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D146750