Similar to D97976.
On Linux, most GCC installations are configured with
`--enable-gnu-unique-object` and such GCC emits `@gnu_unique_object` assembly.
The feature is highly controversial and disliked by many folks.
(On glibc DF_1_NODELETE is implicitly enabled and makes dlclose a no-op).
In llvm-project STB_GNU_UNIQUE is assembly only. Clang does not use STB_GNU_UNIQUE.
Use ELFOSABI_GNU to match GNU as behavior and avoid collision with other
OSABI binding values.
Reviewed By: jrtc27
Differential Revision: https://reviews.llvm.org/D107861
The gn build doesn't support xray, so there's no reason to make the xray
headers available. Some CMake checks check if xray includes are
available to determine if xray is usable. Since we don't build the xray
runtime, there are link errors.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D108737
Add a new option PackConstructorInitializers and deprecate the
related options ConstructorInitializerAllOnOneLineOrOnePerLine and
AllowAllConstructorInitializersOnNextLine. Below is the mapping:
PackConstructorInitializers ConstructorInitializer... AllowAll...
Never - -
BinPack false -
CurrentLine true false
NextLine true true
The option value Never fixes PR50549 by always placing each
constructor initializer on its own line.
Differential Revision: https://reviews.llvm.org/D108752
We currently complain "could not open /LTCG: no such file or directory",
which isn't very useful. We could emit a warning when we see this flag, but
just ignoring it seems fine.
Final missing part of PR38799.
Differential Revision: https://reviews.llvm.org/D108799
MallocOverflow works in two phases:
1) Collects suspicious malloc calls, whose argument is a multiplication
2) Filters the aggregated list of suspicious malloc calls by iterating
over the BasicBlocks of the CFG looking for comparison binary
operators over the variable constituting in any suspicious malloc.
Consequently, it suppressed true-positive cases when the comparison
check was after the malloc call.
In this patch the checker will consider the relative position of the
relation check to the malloc call.
E.g.:
```lang=C++
void *check_after_malloc(int n, int x) {
int *p = NULL;
if (x == 42)
p = malloc(n * sizeof(int)); // Previously **no** warning, now it
// warns about this.
// The check is after the allocation!
if (n > 10) {
// Do something conditionally.
}
return p;
}
```
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D107804
We try to forward a stored-once-constant-value from one global access
to another, but that's not safe if the constant value is an expression
that can trap.
The tests are reduced from the miscompile examples in:
https://llvm.org/PR47578
Differential Revision: https://reviews.llvm.org/D108771
For vectors that are exactly equal to getMaxSVEVectorSizeInBits, just use
AArch64SVEPredPattern::all, which can enable the use of unpredicated ptrue when available.
TestPlan: check-llvm
Differential Revision: https://reviews.llvm.org/D108706
This patch solely moves convert operation between SVE predicate pattern
and element number into two small functions. It's pre-commit patch for optimize
pture with known sve register width.
Differential Revision: https://reviews.llvm.org/D108705
Lets wavefront size be 32 for amdgpu openmp, as well as 64.
Fixes up as little as possible to pass that through the libraries. This change
is end to end, as opposed to updating clang/devicertl/plugin separately. It can
be broken up for review/commit if preferred. Posting as-is so that others with
a gfx10 can try it out. It works roughly as well as gfx9 for me, but there are
probably bugs remaining as well as the todo: for letting grid values vary more.
Reviewed By: ronlieb
Differential Revision: https://reviews.llvm.org/D108708
x86_fp80 format allows values that do not fit any of IEEE-754 category.
Previously they were recognized by intrinsic __builtin_isnan as NaNs.
Now this intrinsic is implemented using instruction FXAM, which
distinguish between NaNs and unsupported values. It can make some
programs behave differently.
As a solution, this fix changes lowering of the intrinsic. If floating
point exceptions are ignored, llvm.isnan is lowered into unordered
comparison, as __buildtin_isnan was implemented earlier. In strictfp
functions the intrinsic is lowered using FXAM, which does not raise
exceptions even for signaling NaN, as required by IEEE-754 and C
standards.
Differential Revision: https://reviews.llvm.org/D108037
There are no paths into LowerFormalParms that have already specified
the sret register. We always materialize a virtual and then assign it
to the physical reg at the point of the return.
Differential Revision: https://reviews.llvm.org/D108762
Without this change only the preferred fusion opcode is tested
when attempting to combine FMA operations.
If both FMA and FMAD are available then FMA ops formed prior to
legalization will not be merged post legalization as FMAD becomes
the preferred fusion opcode.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D108619
Fixes PR51626.
The M68k requires that all instruction, word and long word reads are
aligned to word boundaries. From the 68020 onwards, there is a
performance benefit from aligning long words to long word boundaries.
The M68k uses the same data layout for pointers and integers.
In line with this, this commit updates the pointer data layout to
match the layout already set for 32-bit integers: 32:16:32.
Differential Revision: https://reviews.llvm.org/D108792
Exegesis is faulty and sometimes when measuring throughput^-1
produces snippets that have loop-carried dependencies,
which must be what caused me to incorrectly measure it originally.
After looking much more carefully, the inverse throughput should match
that of the MULX w/ reg op.
As per llvm-exegesis measurements.
Not only global variables can hold references to dead stack variables.
Consider this example:
void write_stack_address_to(char **q) {
char local;
*q = &local;
}
void test_stack() {
char *p;
write_stack_address_to(&p);
}
The address of 'local' is assigned to 'p', which becomes a dangling
pointer after 'write_stack_address_to()' returns.
The StackAddrEscapeChecker was looking for bindings in the store which
referred to variables of the popped stack frame, but it only considered
global variables in this regard. This patch relaxes this, catching
stack variable bindings as well.
---
This patch also works for temporary objects like:
struct Bar {
const int &ref;
explicit Bar(int y) : ref(y) {
// Okay.
} // End of the constructor call, `ref` is dangling now. Warning!
};
void test() {
Bar{33}; // Temporary object, so the corresponding memregion is
// *not* a VarRegion.
}
---
The return value optimization aka. copy-elision might kick in but that
is modeled by passing an imaginary CXXThisRegion which refers to the
parent stack frame which is supposed to be the 'return slot'.
Objects residing in the 'return slot' outlive the scope of the inner
call, thus we should expect no warning about them - except if we
explicitly disable copy-elision.
Reviewed By: NoQ, martong
Differential Revision: https://reviews.llvm.org/D107078
Currently, it is a bit buried in the file even if this is
pretty important for distro.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D108533
In a future patch, a new set of amdgpu-no-* attributes will be
introduced to indicate when a function does not need an implicitly
passed input. This pass introduces new instances of these intrinsic
calls, and should remove the attributes if they were present before.
Switch to using BitIntegerState for each of the inputs, and invert
their meanings.
This now diverges more from the old AMDGPUAnnotateKernelFeatures, but
this isn't used yet anyway.
We only really want this to add the custom attributes. Theoretically
the regular transforms were already run at this point. Touching
undefined behavior breaks a lot of tests when this is enabled by
default, many of which are expecting to test handling of undef
operations.
amdgpu-calls and amdgpu-stack-objects don't really belong as
attributes, and are currently a hacky way of passing an analysis into
the DAG. These don't really belong in the IR, and don't really fit in
with the other attributes. Remove these to facilitate inverting the
pass.
I don't exactly understand the indirect call test changes. These tests
are using calls which are trivially replacable with a direct call, so
I'm not sure what the point is.
When doing Emscritpen EH, if SjLj is also enabled and used and if the
thrown exception has a possiblity being a longjmp instead of an
exception, we shouldn't swallow it; we should rethrow, or relay it. It
was done in D106525 and the code is here:
8441a8eea8/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp (L858-L898)
Here is the pseudocode of that part: (copied from comments)
```
if (%__THREW__.val == 0 || %__THREW__.val == 1)
goto %tail
else
goto %longjmp.rethrow
longjmp.rethrow: ;; This is longjmp. Rethrow it
%__threwValue.val = __threwValue
emscripten_longjmp(%__THREW__.val, %__threwValue.val);
tail: ;; Nothing happened or an exception is thrown
... Continue exception handling ...
```
If the current BB (where the `invoke` is created) has successors that
has the current BB as its PHI incoming node, now that has to change to
`tail` in the pseudocode, because `tail` is the latest BB that is
connected with the next BB, but this was missing.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D108785
qsort can reuse qsort_r if available.
bsearch always passes key as the first comparator argument, so we
can use it to wrap the original comparator.
Differential Revision: https://reviews.llvm.org/D108751