This is a follow-up for #81790. This patch basically extends:
* test/Integration/Dialect/Linalg/CPU/mmt4d.mlir
with pack/unpack ops so that the overall computation is a matrix
multiplication (as opposed to linalg.mmt4d). For comparison (and to make
it easier to verify correctness), linalg.matmul is also included in the
test.
Adds improved bitwidth analysis for udiv/urem instructions. The
analysis is based on a similar version in InstCombiner.
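The kind of bound such an analysis can derive follows from two facts about unsigned arithmetic: a quotient never exceeds its dividend, and a remainder is strictly smaller than its divisor. A minimal sketch of that reasoning, assuming the operands' maximum active bit counts are already known. All function names here are hypothetical, and this is not the code from the patch:

```cpp
#include <algorithm>
#include <cstdint>

// Number of bits needed to represent V (0 when V == 0).
unsigned activeBits(uint64_t V) {
  unsigned Bits = 0;
  for (; V != 0; V >>= 1)
    ++Bits;
  return Bits;
}

// x / y <= x for any y >= 1, so the quotient needs no more bits than x.
unsigned udivBitsBound(unsigned LHSBits, unsigned /*RHSBits*/) {
  return LHSBits;
}

// x % y <= x and x % y < y, so the remainder fits in the narrower operand.
unsigned uremBitsBound(unsigned LHSBits, unsigned RHSBits) {
  return std::min(LHSBits, RHSBits);
}

// Exhaustively checks both bounds for all operands below Limit (divisor != 0).
bool boundsHoldBelow(uint64_t Limit) {
  for (uint64_t X = 0; X < Limit; ++X)
    for (uint64_t Y = 1; Y < Limit; ++Y) {
      unsigned XB = activeBits(X), YB = activeBits(Y);
      if (activeBits(X / Y) > udivBitsBound(XB, YB))
        return false;
      if (activeBits(X % Y) > uremBitsBound(XB, YB))
        return false;
    }
  return true;
}
```

A tighter udiv bound is possible when a lower bound on the divisor is known; the sketch deliberately leaves that out.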
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/85928
To authenticate pointers, CodeGen needs access to the key and
discriminators that were used to sign the pointer. That information is
sometimes known from the context, but not always, which is why `Address`
needs to hold that information.
This patch adds methods and data members to `Address`, which will be
needed in subsequent patches to authenticate signed pointers, and uses
the newly added methods throughout CodeGen. Although this patch isn't
strictly NFC as it causes CodeGen to use different code paths in some
cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any
changes in functionality as it doesn't add any information needed for
authentication.
In addition to the changes mentioned above, this patch introduces class
`RawAddress`, which contains a pointer that we know is unsigned, and
adds several new functions for creating `Address` and `LValue` objects.
This reapplies d9a685a9dd, which was
reverted because it broke ubsan bots. There seems to be a bug in
coroutine code-gen, which is causing EmitTypeCheck to use the wrong
alignment. For now, pass alignment zero to EmitTypeCheck so that it can
compute the correct alignment based on the passed type (see function
EmitCXXMemberOrOperatorMemberCallExpr).
Similar to 3f46e5453d, this patch allows
the backend to produce a faster access sequence for the local-exec TLS
model, where loading from the TOC can be avoided, for local-exec TLS
variables that are annotated with the "aix-small-tls" attribute.
The expectation is for local-exec TLS variables to be set with this
attribute through PGO. Furthermore, the optimized access sequence is
only generated for local-exec TLS variables that are annotated with
"aix-small-tls" and are less than ~32KB in size.
Adds support for fusing two scf.for loops occurring in the same block.
Uses the rudimentary checks already in place for scf.forall (like the
target loop's operands being dominated by the source loop).
- Fixes a bug in the dominance check whereby it was checked that values
in the target loop themselves dominated the source loop rather than the
ops that define these operands.
- Renames the LoopFuseSibling op to LoopFuseSiblingOp.
- Updates LoopFuseSiblingOp's description.
- Adds tests for using LoopFuseSiblingOp on scf.for loops, including one
which fails without the fix for the dominance check.
- Adds tests checking the different failure modes of the dominance
checker.
- Adds test for case whereby scf.yield is automatically generated when
there are no loop-carried variables.
This upstreams https://github.com/apple/llvm-project/pull/8063.
If module FooCore is re-exported through module Foo (by using
`export_as` in the modulemap), look for attributes of FooCore symbols in
the Foo.apinotes file.
Swift bundles a `std.apinotes` file that adds Swift-specific attributes to
the C++ stdlib symbols. In recent versions of libc++, module std got
split into multiple top-level modules, each of which is re-exported
through std. This change allows us to keep using a single modulemap file
for all supported C++ stdlibs.
rdar://121680760
There were several functions, mostly reduction-related, that were only
called from OpenMP.cpp. Remove them from OpenMP.h, and make them local
in OpenMP.cpp:
- genOpenMPReduction
- findReductionChain
- getConvertFromReductionOp
- updateReduction
- removeStoreOp
Also, move the function bodies out of the "public" section.
Failure with testcase toc-conf.c observed when building with
LLVM_REVERSE_ITERATION=ON.
Change from llvm::StringSet to std::set<llvm::StringRef> to
ensure iteration order is deterministic. Note: the functionality of the
feature does not require a specific iteration order; however, this
makes testing consistent.
From llvm docs:
The advantages of std::set are that its iterators are stable (deleting
or inserting an element from the set does not affect iterators or
pointers to other elements) and that iteration over the set is
guaranteed to be in sorted order.
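The sorted-order guarantee can be illustrated with the standard library alone. This is a sketch that uses `std::string_view` as a stand-in for `llvm::StringRef` so that it builds without LLVM headers; `sortedNames` is a hypothetical helper, not part of the patch:

```cpp
#include <set>
#include <string>
#include <string_view>
#include <vector>

// Collects names into a std::set and returns them in iteration order.
// std::set iterates in sorted order, so the result does not depend on the
// order in which the names were inserted.
std::vector<std::string> sortedNames(const std::vector<std::string_view> &Names) {
  std::set<std::string_view> Unique(Names.begin(), Names.end());
  std::vector<std::string> Result;
  for (std::string_view Name : Unique)
    Result.emplace_back(Name);
  return Result;
}
```

A hash-based set gives no such guarantee, which is why its iteration order can change under LLVM_REVERSE_ITERATION.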
Fixes #86757
We failed to handle the invalid case when substituting into the
parameter mapping of a constraint during normalization.
The constructor of `InstantiatingTemplate` will bail out (no
`CodeSynthesisContext` will be added to the instantiation stack) if
there was a fatal error, consequently we should stop doing any further
template instantiations.
Branch weights from sample-based PGO may be inaccurate due to
sampling. If the loop body must be executed, then the original loop's
back edge weight must not be less than the exit weight.
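One way to state that invariant in code, as a sketch only: the struct and function below are hypothetical, and raising the back edge weight to the exit weight is an assumed repair strategy, not necessarily what the patch does.

```cpp
#include <cstdint>

// Hypothetical pair of branch weights on a loop latch.
struct LoopBranchWeights {
  uint64_t BackEdge;
  uint64_t Exit;
};

// If the body is known to execute, a back edge weight below the exit weight
// is inconsistent; this sketch raises it to the exit weight (an assumption).
LoopBranchWeights enforceBackEdgeInvariant(LoopBranchWeights W,
                                           bool BodyMustExecute) {
  if (BodyMustExecute && W.BackEdge < W.Exit)
    W.BackEdge = W.Exit;
  return W;
}
```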
Summary:
Debug builds don't optimize out certain parts of the code that end up
making the GPU backend crash. This results in regular builds not being
successful just to build the testing objects. Disable them for now in
debug mode.
Change param names to recommended upper case format for static methods
in CmpInst for consistency
Implement suggestion from @dtcxzyw.
cc @dtcxzyw @tschuett
The color methods in formatted_raw_ostream were forwarding directly to
the underlying stream without considering existing buffered output. This
would cause incorrect colored output for buffered uses of
formatted_raw_ostream.
Fix this issue by applying the color to the formatted_raw_ostream itself
and temporarily disabling scanning of any color related output so as not
to affect the position tracking.
This fix means that workarounds that forced formatted_raw_ostream
buffering to be disabled can be removed. In the case of llvm-objdump,
this can improve disassembly performance when redirecting to a file by
more than an order of magnitude on both Windows and Linux. This
improvement restores the disassembly performance when redirecting to a
file to a level similar to before color support was added.
The __kmpc_omp_taskwait_deps_51 allocates a kmp_depnode_t node on its
stack, and there is currently a race condition where another thread
might still be accessing that node after the function has returned and
its stack frame was released.
While the function does wait until the node's npredecessors count has
reached zero before exiting, there is still a window where the function
that last decremented the npredecessors count assumes the node is still
accessible.
For heap-allocated kmp_depnode_t nodes, this normally works via a
separate ndeps count that only reaches zero at the point where no
accesses to the node are expected at all; in fact, at this point the
heap allocation will be freed.
For this case of a stack-allocated kmp_depnode_t node, it therefore
makes sense to similarly respect the ndeps count; we need to wait until
this reaches 1 (not 0, because it is not heap-allocated so there's
always one extra count to prevent it from being freed), before we can
safely deallocate our stack frame.
As this is expected to be a short race window of only a few
instructions, it should be fine to just use a busy wait loop checking
the ndeps count.
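The two-stage wait described above can be sketched with standard C++ atomics. This is a simplified model under stated assumptions, not the runtime's code: `DepNodeSketch` is a hypothetical stand-in for `kmp_depnode_t`, each predecessor is assumed to hold exactly one reference, and the loops yield instead of spinning tightly.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for kmp_depnode_t; only the two counters matter here.
struct DepNodeSketch {
  std::atomic<int> npredecessors;
  std::atomic<int> ndeps; // one per predecessor plus one for the stack owner
  explicit DepNodeSketch(int Preds) : npredecessors(Preds), ndeps(Preds + 1) {}
};

// A finishing predecessor drops the predecessor count first and its own
// reference last, so it still touches the node between the two decrements.
void releasePredecessor(DepNodeSketch &Node) {
  Node.npredecessors.fetch_sub(1);
  Node.ndeps.fetch_sub(1);
}

// Waits on a stack-allocated node; returns the reference count seen once it
// is safe to leave the frame.
int waitOnStackNode(int NumPreds) {
  DepNodeSketch Node(NumPreds);
  std::vector<std::thread> Workers;
  for (int I = 0; I < NumPreds; ++I)
    Workers.emplace_back(releasePredecessor, std::ref(Node));

  // Stage 1: wait until all predecessors have completed.
  while (Node.npredecessors.load() > 0)
    std::this_thread::yield();
  // Stage 2: wait until only the owner's reference remains (1, not 0).
  while (Node.ndeps.load() > 1)
    std::this_thread::yield();

  int Remaining = Node.ndeps.load();
  for (std::thread &T : Workers)
    T.join(); // the sketch joins its helper threads before returning
  return Remaining;
}
```

Returning after stage 1 alone would reproduce the window described above: a predecessor could still be about to decrement `ndeps` on a node whose stack frame is gone.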
Fixes: https://github.com/llvm/llvm-project/issues/85963
Hi, I spotted a problem when running benchmarking programs on a RISCV64
device.
## Issue
Segmentation faults only occurred while running the programs compiled
with `GlobalISel` enabled.
Here is a small but complete example (adapted from [Google's
benchmark
framework](95a9f0d0b4/MicroBenchmarks/libs/benchmark/src/colorprint.cc (L85-L119)))
to reproduce the issue:
```cpp
#include <cstdarg>
#include <cstdio>
#include <iostream>
#include <memory>
#include <string>

std::string FormatString(const char* msg, va_list args) {
  // we might need a second shot at this, so pre-emptively make a copy
  va_list args_cp;
  va_copy(args_cp, args);
  std::size_t size = 256;
  char local_buff[256];
  auto ret = vsnprintf(local_buff, size, msg, args_cp);
  va_end(args_cp);
  // currently there is no error handling for failure, so this is a hack.
  // BM_CHECK(ret >= 0);
  if (ret == 0)  // handle empty expansion
    return {};
  else if (static_cast<size_t>(ret) < size)
    return local_buff;
  else {
    // we did not provide a long enough buffer on our first attempt.
    size = static_cast<size_t>(ret) + 1;  // + 1 for the null byte
    std::unique_ptr<char[]> buff(new char[size]);
    ret = vsnprintf(buff.get(), size, msg, args);
    // BM_CHECK(ret > 0 && (static_cast<size_t>(ret)) < size);
    return buff.get();
  }
}

std::string FormatString(const char* msg, ...) {
  va_list args;
  va_start(args, msg);
  auto tmp = FormatString(msg, args);
  va_end(args);
  return tmp;
}

int main() {
  std::string Str =
      FormatString("%-*s %13s %15s %12s", static_cast<int>(20),
                   "Benchmark", "Time", "CPU", "Iterations");
  std::cout << Str << std::endl;
}
```
Use `clang++ -fglobal-isel -o main main.cpp` to compile it.
## Cause
I examined the MIR, which shows that these segmentation faults result
from a small mistake in legalizing the intrinsic function
`llvm.va_copy`.
36e74cfdbd/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.cpp (L451-L453)
`DstLst` and `Tmp` are placed in the wrong order.
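For context, on RISC-V `va_list` is a single pointer into the save area, so `va_copy(dst, src)` amounts to loading that pointer from the source list and storing it into the destination list. The sketch below shows the operand order the store needs, written in plain C++ rather than MIR. Only `DstLst` and `Tmp` are names from the legalizer code; `VaListModel`, `SrcLst`, and both functions are hypothetical.

```cpp
// Models va_list as a pointer into an integer save area (an assumption made
// to keep the sketch self-contained).
using VaListModel = const int *;

// va_copy: load the current position from the source list into Tmp, then
// store Tmp into the destination list. Swapping the store's operands would
// treat the loaded value as the address to write to.
void lowerVaCopy(VaListModel *DstLst, const VaListModel *SrcLst) {
  VaListModel Tmp = *SrcLst; // load from the source va_list
  *DstLst = Tmp;             // store the value Tmp to the address DstLst
}

// Copies a list positioned at Offset and reads the next argument via the copy.
int nextArgViaCopy(int Offset) {
  static const int SaveArea[] = {20, 13, 15, 12};
  VaListModel Src = SaveArea + Offset;
  VaListModel Dst = nullptr;
  lowerVaCopy(&Dst, &Src);
  return *Dst;
}
```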
## Changes
I have tweaked the test case `CodeGen/RISCV/GlobalISel/vararg.ll` so
that `s0` is used as the frame pointer (not in all checks) which points
to the starting address of the save area. I believe that it helps reason
about how `llvm.va_copy` is handled.
We check DAG.haveNoCommonBitsSet so the operands will be known to be
disjoint.
I couldn't think of a codegen test case since most targets aren't
checking hasDisjoint yet, apart from RISCV in the or_is_add pattern, but
it also falls back to computeKnownBits.
This is currently only used in one place, but I'm working on a patch
that will use this from a second place. And I think this already
improves the readability of the one place this is used so far.
If you write a 32- and a 64-bit LDR instruction that both refer to the
same constant or symbol using the = syntax:
```
ldr w0, =something
ldr x1, =something
```
then the first call to `ConstantPool::addEntry` will insert the constant
into its cache of existing entries, and the second one will find the
cache entry and reuse it. This results in a 64-bit load from a 32-bit
constant, reading nonsense into the other half of the target register.
In this patch I've done the simplest fix: include the size of the
constant pool entry as part of the key used to index the cache. So now
32- and 64-bit constant loads will never share a constant pool entry.
There's scope for doing this better, in principle: you could imagine
merging the two slots with appropriate overlap, so that the 32-bit load
loads the LSW of the 64-bit value. But that's much more complicated: you
have to take endianness into account, and maybe also adjust the size of
an existing entry. This is the simplest fix that restores correctness.
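The idea of keying the cache on both the value and the entry size can be sketched as follows. `ConstantPoolSketch` and `loadsShareSlot` are hypothetical and use a string in place of the real expression type; this is not the actual `ConstantPool` implementation.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical constant pool whose cache key is (value, size in bytes).
class ConstantPoolSketch {
  std::map<std::pair<std::string, unsigned>, unsigned> Cache;
  unsigned NextSlot = 0;

public:
  // Returns the slot for Value at the given Size, reusing an existing slot
  // only when both the value and the size match.
  unsigned addEntry(const std::string &Value, unsigned Size) {
    auto Key = std::make_pair(Value, Size);
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second;
    unsigned Slot = NextSlot++;
    Cache.emplace(Key, Slot);
    return Slot;
  }
};

// Usage: two loads of the same symbol, e.g. `ldr w0, =something` (4 bytes)
// and `ldr x1, =something` (8 bytes).
bool loadsShareSlot(unsigned SizeA, unsigned SizeB) {
  ConstantPoolSketch Pool;
  return Pool.addEntry("something", SizeA) == Pool.addEntry("something", SizeB);
}
```

With the size in the key, the 4-byte and 8-byte loads get separate slots, while two loads of the same width still share one.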
Follow on from #84915 which adds the DbgRecord function variants. The C API
changes were reviewed in #85657.
# C API
Update the LLVMDIBuilderInsert... functions to insert DbgRecords instead
of debug intrinsics.
LLVMDIBuilderInsertDeclareBefore
LLVMDIBuilderInsertDeclareAtEnd
LLVMDIBuilderInsertDbgValueBefore
LLVMDIBuilderInsertDbgValueAtEnd
Calling these functions will now cause an assertion if the module is in the
wrong debug info format. They should only be used when the module is in "new
debug format".
Use LLVMIsNewDbgInfoFormat to query and LLVMSetIsNewDbgInfoFormat to change the
debug info format of a module.
Please see https://llvm.org/docs/RemoveDIsDebugInfo.html#c-api-change
(RemoveDIsDebugInfo.md) for more info.
# OCaml bindings
Add set_is_new_dbg_info_format and is_new_dbg_info_format to the OCaml bindings.
These can be used to set and query the current debug info mode. These will
eventually be removed, but are useful while we're transitioning between old and
new debug info formats.
Add string_of_lldbgrecord, like string_of_llvalue but prints DbgRecords.
In test dbginfo.ml, unconditionally set the module debug info to the new mode
and update CHECK lines to check for DbgRecords. Without this change the test
crashes because it attempts to insert DbgRecords (new default behaviour of
llvm_dibuild_insert_declare_...) into a module that is in the old debug info
mode.
Fixes #85406.
- Set the invalid bit for alias template decls that have multiple
written template parameter lists (as the AST node is ill-formed)
- Don't perform CTAD for invalid alias template decls
This PR improves type inference in general and the deduction of types of
composite data structures in particular. It also adds a way to insert a
bitcast to make a function call valid when argument types mismatch due
to opaque pointer type inference.
The attached test `pointers/nested-struct-opaque-pointers.ll`
demonstrates the new capabilities: the SPIRV code emitted for this test is
now (1) valid in the sense of data field types and (2) accepted by
`spirv-val`.
Stricter LIT checks, support for more composite data structures, and
improved type correctness of function calls are the main TODOs at the
moment.
Two fixes related to the callee/inlinee profile:
1. Fix a bug where the matching results were not distributed to
the callee profiles (the results should be passed by reference).
2. Narrow imported function matching to checksum-mismatched functions.
More context: previously we ran matching for all imported functions, even
when their checksums matched. After fixing 1), however, we saw a
regression, likely because matching is not a no-op for checksum-matched
functions. To be consistent, we now run matching only for
checksum-mismatched (imported) functions. Since the
metadata (pseudo_probe_desc) is dropped for imported functions, we
leverage the function attribute mechanism and add a new function
attribute (`profile-checksum-mismatch`) to transfer the info from
pre-link to post-link.
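Fix 1) is an instance of a common C++ pitfall: a container passed by value is copied, so updates made by the callee never reach the caller. A generic sketch with hypothetical names (not the profile matcher's actual types or functions):

```cpp
#include <cstddef>
#include <map>
#include <string>

using MatchResults = std::map<std::string, int>;

// Buggy shape: Results is a copy, so the caller never sees the new entry.
void recordMatchByValue(MatchResults Results, const std::string &Callee) {
  Results[Callee] = 1;
}

// Fixed shape: Results refers to the caller's map.
void recordMatchByRef(MatchResults &Results, const std::string &Callee) {
  Results[Callee] = 1;
}

// Number of entries the caller observes after each variant.
std::size_t entriesAfterByValue() {
  MatchResults Results;
  recordMatchByValue(Results, "callee");
  return Results.size();
}

std::size_t entriesAfterByRef() {
  MatchResults Results;
  recordMatchByRef(Results, "callee");
  return Results.size();
}
```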
And fix `if (hasRelocationAddend())` to `usesRela` to properly treat
SHT_LLVM_CALL_GRAPH_PROFILE as SHT_REL. The incorrect check does not
cause a problem because the synthesized SHT_LLVM_CALL_GRAPH_PROFILE has
zero addends.
This reverts commit 52431fdb1a.
The PR assumed `__threwValue` couldn't be 0, but it could be when the
thrown thing is not a longjmp but an exception, so that `if` check was
actually necessary.
Hello,
So, I worked on the fmaximum and fminimum functions recently and the
reviewers suggested the structure:
```
if (bitsx ...)
return ...;
if (bitsy ..)
return
...
return ...;
```
So I went ahead and did the same for fmin and fmax. I hope this isn't an
issue for you all. Thanks.
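As an illustration of that early-return shape, here is a sketch of an fmax-like function. It is an assumption-laden stand-in: it uses `std::isnan` and `std::signbit` rather than the bit-level helpers hinted at by `bitsx`/`bitsy`, the name `fmaxSketch` is hypothetical, and the exact conditions in the real implementation are not shown in this description.

```cpp
#include <cmath>

// fmax-style maximum written as a chain of early returns.
double fmaxSketch(double X, double Y) {
  if (std::isnan(X))
    return Y; // a NaN operand is treated as missing data
  if (std::isnan(Y))
    return X;
  if (std::signbit(X) != std::signbit(Y))
    return std::signbit(X) ? Y : X; // signs differ: the non-negative one wins
  return X > Y ? X : Y;
}
```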
---------
Co-authored-by: Job Hernandez <h93@protonmail.com>
- The direct use case (in [1]) is to add `llvm::IntervalMap` [2] and the allocator required by the IntervalMap ctor [3]
to class `InstrProfSymtab` as owned members. The allocator class doesn't have a move-assignment operator,
and implementing one so that the enclosing class is movable would take considerable effort.
- There is only one use of the compiler-generated move-assignment operator in the repo, which is in
CoverageMappingReader.cpp. Luckily it's possible to use std::unique_ptr<InstrProfSymtab> there instead, so this patch makes that change.
[1] https://github.com/llvm/llvm-project/pull/66825
[2] 4c2f68840e/llvm/include/llvm/ADT/IntervalMap.h (L936)
[3] 4c2f68840e/llvm/include/llvm/ADT/IntervalMap.h (L1041)
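The trade-off can be sketched with stand-in types. `AllocatorSketch`, `SymtabSketch`, and `ReaderSketch` are hypothetical and only mimic the relevant property (a member that cannot be move-assigned); they are not the LLVM classes.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// Stand-in for an allocator that can be neither copied nor moved.
struct AllocatorSketch {
  AllocatorSketch() = default;
  AllocatorSketch(const AllocatorSketch &) = delete;
  AllocatorSketch &operator=(const AllocatorSketch &) = delete;
};

// Owning the allocator directly makes this class non-move-assignable.
struct SymtabSketch {
  AllocatorSketch Allocator;
  int NumSymbols = 0;
};

// Holding the symtab through std::unique_ptr keeps the owner move-assignable:
// moving the owner transfers the pointer and never touches the allocator.
struct ReaderSketch {
  std::unique_ptr<SymtabSketch> Symtab = std::make_unique<SymtabSketch>();
};

// Move-assigns one reader into another and reads a field through the result.
int symbolsAfterMoveAssign(int NumSymbols) {
  ReaderSketch Source;
  Source.Symtab->NumSymbols = NumSymbols;
  ReaderSketch Target;
  Target = std::move(Source);
  return Target.Symtab->NumSymbols;
}
```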
The CloudABI (removed from Clang Driver) change from
https://reviews.llvm.org/D29982 does not make sense. GNU ld and gold
don't create dynamic sections for a non-PIC static link when
--export-dynamic is specified.
Creating dynamic sections is harmful in this scenario because we would
consider undefined weak symbols preemptible and generate GLOB_DAT
relocations, breaking the expectation that non-PIC static links only
contain IRELATIVE relocations.
In addition, there are other options that export symbols
(--export-dynamic-symbol, --dynamic-list, etc). It does not make sense
to special case --export-dynamic.