494189 Commits

Author SHA1 Message Date
Andrzej Warzyński
d7753989ea [mlir][linalg] Add e2e test for linalg.mmt4d + pack/unpack (#84964)
This is a follow-up for #81790. This patch basically extends:

  * test/Integration/Dialect/Linalg/CPU/mmt4d.mlir

with pack/unpack ops so that the overall computation is a matrix
multiplication (as opposed to linalg.mmt4d). For comparison (and to make
it easier to verify correctness), linalg.matmul is also included in the
test.
2024-03-28 14:52:08 +00:00
Alexey Bataev
d7975c9d93 [SLP]Add better minbitwidth analysis for udiv/urem instructions.
Adds improved bitwidth analysis for udiv/urem instructions. The
analysis is based on a similar version in InstCombiner.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/85928
2024-03-28 10:35:15 -04:00
Alfie Richards
ff870aeeb7 [ARM] Add reference to ARMAsmParser in ARMOperand (#86110)
2024-03-28 14:06:40 +00:00
Yingwei Zheng
a515ea553f [OCaml] Fix buildbot failure caused by caa2258. NFC.
Closes #86944.
2024-03-28 22:00:04 +08:00
Akira Hatanaka
84780af4b0 [CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86923)
To authenticate pointers, CodeGen needs access to the key and
discriminators that were used to sign the pointer. That information is
sometimes known from the context, but not always, which is why `Address`
needs to hold that information.

This patch adds methods and data members to `Address`, which will be
needed in subsequent patches to authenticate signed pointers, and uses
the newly added methods throughout CodeGen. Although this patch isn't
strictly NFC as it causes CodeGen to use different code paths in some
cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any
changes in functionality as it doesn't add any information needed for
authentication.

In addition to the changes mentioned above, this patch introduces class
`RawAddress`, which contains a pointer that we know is unsigned, and
adds several new functions for creating `Address` and `LValue` objects.

This reapplies d9a685a9dd, which was
reverted because it broke ubsan bots. There seems to be a bug in
coroutine code-gen, which is causing EmitTypeCheck to use the wrong
alignment. For now, pass alignment zero to EmitTypeCheck so that it can
compute the correct alignment based on the passed type (see function
EmitCXXMemberOrOperatorMemberCallExpr).
2024-03-28 06:54:36 -07:00
Amy Kwan
a3efc53f16 [AIX][TLS] Produce a faster local-exec access sequence for the "aix-small-tls" global variable attribute (#83053)
Similar to 3f46e5453d, this patch allows
the backend to produce a faster access sequence for the local-exec TLS
model, where loading from the TOC can be avoided, for local-exec TLS
variables that are annotated with the "aix-small-tls" attribute.

The expectation is for local-exec TLS variables to be set with this
attribute through PGO. Furthermore, the optimized access sequence is
generated only for local-exec TLS variables that are annotated with
"aix-small-tls" and are less than ~32KB in size.
2024-03-28 09:18:45 -04:00
Rolf Morel
eacda36c7d [SCF][Transform] Add support for scf.for in LoopFuseSibling op (#81495)
Adds support for fusing two scf.for loops occurring in the same block.
Uses the rudimentary checks already in place for scf.forall (like the
target loop's operands being dominated by the source loop).

- Fixes a bug in the dominance check whereby it was checked that values
in the target loop themselves dominated the source loop rather than the
ops that define these operands.
- Renames the LoopFuseSibling op to LoopFuseSiblingOp.
- Updates LoopFuseSiblingOp's description.
- Adds tests for using LoopFuseSiblingOp on scf.for loops, including one
which fails without the fix for the dominance check.
- Adds tests checking the different failure modes of the dominance
checker.
- Adds test for case whereby scf.yield is automatically generated when
there are no loop-carried variables.
2024-03-28 14:13:08 +01:00
Oleksandr "Alex" Zinenko
91856b34e3 [mlir] move MatchOpInterface under Transform/Interfaces (#86899)
This is similar to the TransformOpInterface move.
2024-03-28 14:00:22 +01:00
Egor Zhdan
96c8e2e88c [APINotes] For a re-exported module, look for APINotes in the re-exporting module's apinotes file
This upstreams https://github.com/apple/llvm-project/pull/8063.

If module FooCore is re-exported through module Foo (by using
`export_as` in the modulemap), look for attributes of FooCore symbols in
Foo.apinotes file.

Swift bundles `std.apinotes` file that adds Swift-specific attributes to
the C++ stdlib symbols. In recent versions of libc++, module std got
split into multiple top-level modules, each of which is re-exported
through std. This change allows us to keep using a single modulemap file
for all supported C++ stdlibs.

rdar://121680760
2024-03-28 12:59:57 +00:00
Leandro Lupori
a2982a29fd Revert "[compiler-rt] Allow building builtins.a without a libc (#86737)"
This reverts commit 8669225863.

Reverting due to buildbot failures.
2024-03-28 09:56:14 -03:00
Krzysztof Parzyszek
e8e80d07c8 [OpenMP] Apply post-commit review comments in PR86289, NFC (#86828)
Fix include guard name, fix typo, add comments with OpenMP spec
sections.
2024-03-28 07:52:47 -05:00
VitaNuo
56a10a3c79 [clangd][trace] Fix comment to mention that trace spans are measured … (#86938)
…in milliseconds rather than seconds.
2024-03-28 13:48:09 +01:00
Krzysztof Parzyszek
79199753fd [flang][OpenMP] Make several function local to OpenMP.cpp, NFC (#86726)
There were several functions, mostly reduction-related, that were only
called from OpenMP.cpp. Remove them from OpenMP.h, and make them local
in OpenMP.cpp:
- genOpenMPReduction
- findReductionChain
- getConvertFromReductionOp
- updateReduction
- removeStoreOp

Also, move the function bodies out of the "public" section.
2024-03-28 07:46:01 -05:00
Zaara Syeda
4ddd4ed7fe [AIX][TOC] -mtocdata/-mno-tocdata fix non deterministic iteration order (#86840)
Failure with testcase toc-conf.c observed when building with
LLVM_REVERSE_ITERATION=ON.
Changing from using llvm::StringSet to std::set<llvm::StringRef> to
ensure iteration order is deterministic. Note: the functionality of the
feature does not require a specific iteration order, however, this will
allow testing to be consistent.
From llvm docs:
The advantages of std::set are that its iterators are stable (deleting
or inserting an element from the set does not affect iterators or
pointers to other elements) and that iteration over the set is
guaranteed to be in sorted order.
2024-03-28 08:37:25 -04:00
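The ordering guarantee that 4ddd4ed7fe relies on can be illustrated with the standard library alone. This is a minimal sketch, assuming `std::string` in place of `llvm::StringRef` and `std::unordered_set` as a stand-in for a hash-based set such as `llvm::StringSet`; the helper names `sortedNames` and `hashedNames` are hypothetical and do not come from the patch.

```cpp
#include <set>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: collects names through an ordered set, so the result
// is always in sorted order regardless of insertion order.
std::vector<std::string> sortedNames(const std::vector<std::string> &Names) {
  std::set<std::string> Ordered(Names.begin(), Names.end());
  return std::vector<std::string>(Ordered.begin(), Ordered.end());
}

// For contrast: a hash-based set holds the same elements, but its iteration
// order is unspecified and may differ between implementations.
std::vector<std::string> hashedNames(const std::vector<std::string> &Names) {
  std::unordered_set<std::string> Hashed(Names.begin(), Names.end());
  return std::vector<std::string>(Hashed.begin(), Hashed.end());
}
```

A test that compares output against a fixed expected sequence can depend on `sortedNames`, but should only check membership or size for `hashedNames`.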
Haojian Wu
a042fcbe45 [clang] Bailout when the substitution of template parameter mapping is invalid. (#86869)
Fixes #86757

We failed to handle the invalid case when substituting into the
parameter mapping of a constraint during normalization.
The constructor of `InstantiatingTemplate` will bail out (no
`CodeSynthesisContext` will be added to the instantiation stack) if
there was a fatal error, consequently we should stop doing any further
template instantiations.
2024-03-28 13:10:02 +01:00
Haojian Wu
fb8cccf88c [AST] Print the "aggregate" for aggregate deduction guide decl. (#84018)
I found this useful for debugging purposes, to identify the different
kinds of deduction guide decl.
2024-03-28 13:07:58 +01:00
Haohai Wen
896037c75a [LoopRotate] Set loop back edge weight to not less than exit weight (#86496)
Branch weights from sample-based PGO may be inaccurate due to
sampling. If the loop body must be executed, then the original loop back
edge weight must not be less than the exit weight.
2024-03-28 20:07:15 +08:00
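The constraint described in 896037c75a can be sketched as a simple clamp. This is an illustration only, not the LoopRotate code: the function names `clampBackEdgeWeight` and `remainingBackEdgeWeight` and the 32-bit weight type are assumptions made for the example. The second function shows why an unclamped subtraction is a problem in unsigned arithmetic.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper, not the LoopRotate implementation: per the commit
// message, the back-edge weight must not be less than the exit weight when
// the loop body is known to execute, so a smaller sampled value is raised.
std::uint32_t clampBackEdgeWeight(std::uint32_t BackEdgeWeight,
                                  std::uint32_t ExitWeight) {
  return std::max(BackEdgeWeight, ExitWeight);
}

// Without the clamp, subtracting in unsigned arithmetic wraps around modulo
// 2^32 instead of going negative when BackEdgeWeight < ExitWeight.
std::uint32_t remainingBackEdgeWeight(std::uint32_t BackEdgeWeight,
                                      std::uint32_t ExitWeight) {
  return BackEdgeWeight - ExitWeight;
}
```

Clamping first keeps the later subtraction in range: `remainingBackEdgeWeight(clampBackEdgeWeight(3, 10), 10)` yields 0 rather than a wrapped value.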
Joseph Huber
daa755ba7b [libc] Disable testing for NVPTX debug builds (#86856)
Summary:
Debug builds don't optimize out certain parts of the code that end up
making the GPU backend crash. This results in regular builds not being
successful just to build the testing objects. Disable them for now in
debug mode.
2024-03-28 06:49:15 -05:00
Marc Auberer
9d61f7ea66 [flang] Remove duplicate call to va_end() (#86865)
Fixes #86825
2024-03-28 12:42:44 +01:00
Marc Auberer
a495cfbf7d [IR][NFC] Cleanup CmpInst signatures / code docs (#86441)
Change param names to the recommended upper case format for static
methods in CmpInst, for consistency.
Implements a suggestion from @dtcxzyw.

cc @dtcxzyw @tschuett
2024-03-28 12:42:02 +01:00
Andrew Ng
c9db031c48 [Support] Fix color handling in formatted_raw_ostream (#86700)
The color methods in formatted_raw_ostream were forwarding directly to
the underlying stream without considering existing buffered output. This
would cause incorrect colored output for buffered uses of
formatted_raw_ostream.

Fix this issue by applying the color to the formatted_raw_ostream itself
and temporarily disabling scanning of any color related output so as not
to affect the position tracking.

This fix means that workarounds that forced formatted_raw_ostream
buffering to be disabled can be removed. In the case of llvm-objdump,
this can improve disassembly performance when redirecting to a file by
more than an order of magnitude on both Windows and Linux. This
improvement restores the disassembly performance when redirecting to a
file to a level similar to before color support was added.
2024-03-28 11:41:49 +00:00
Matt Arsenault
c13556c0b0 AMDGPU: Document more backend recognized attributes (#80239)
2024-03-28 14:27:14 +03:00
Ulrich Weigand
b999e631c0 [OpenMP] Fix node destruction race in __kmpc_omp_taskwait_deps_51 (#86130)
The __kmpc_omp_taskwait_deps_51 function allocates a kmp_depnode_t node on its
stack, and there is currently a race condition where another thread
might still be accessing that node after the function has returned and
its stack frame was released.

While the function does wait until the node's npredecessors count has
reached zero before exiting, there is still a window where the function
that last decremented the npredecessors count assumes the node is still
accessible.

For heap-allocated kmp_depnode_t nodes, this normally works via a
separate ndeps count that only reaches zero at the point where no
accesses to the node are expected at all; in fact, at this point the
heap allocation will be freed.

For this case of a stack-allocated kmp_depnode_t node, it therefore
makes sense to similarly respect the ndeps count; we need to wait until
this reaches 1 (not 0, because it is not heap-allocated so there's
always one extra count to prevent it from being freed), before we can
safely deallocate our stack frame.

As this is expected to be a short race window of only a few
instructions, it should be fine to just use a busy wait loop checking
the ndeps count.

Fixes: https://github.com/llvm/llvm-project/issues/85963
2024-03-28 12:15:39 +01:00
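The waiting scheme described in b999e631c0 can be sketched with a plain atomic counter. This is a simplified model, not the OpenMP runtime code: `DepNode`, `acquireRef`, `releaseRef`, and `waitUntilSoleOwner` are hypothetical names, and the real `kmp_depnode_t` has more fields and its own atomics.

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-in for a stack-allocated dependency node.
struct DepNode {
  // Starts at 1: the extra count that keeps a non-heap node from being freed.
  std::atomic<int> ndeps{1};
};

// Another thread takes a count while it may still touch the node, and drops
// it once it is completely done with the node.
void acquireRef(DepNode &Node) {
  Node.ndeps.fetch_add(1, std::memory_order_relaxed);
}
void releaseRef(DepNode &Node) {
  Node.ndeps.fetch_sub(1, std::memory_order_release);
}

// Busy-wait until only the owner's extra count is left (1, not 0). After
// this returns, no other holder references the node, so the owner's stack
// frame can safely be released.
void waitUntilSoleOwner(const DepNode &Node) {
  while (Node.ndeps.load(std::memory_order_acquire) != 1)
    std::this_thread::yield();
}
```

A spin loop is reasonable here only because, as the commit message notes, the window is expected to last a few instructions; a longer wait would call for a blocking primitive.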
Dmitri Gribenko
28b196e7fc [llvm] Write temporary test files into %t
... instead of the source tree
2024-03-28 11:55:46 +01:00
Freddy Ye
36b4b9d988 [X86] Support immediate folding for CCMP/CTEST (#86616)
E.g.
%0:gr32 = MOV32ri 81
CTEST32rr %0, %1, 2, 10, implicit-def $eflags, implicit $eflags
=>
CTEST32ri %1, 81, 2, 10, implicit-def $eflags, implicit $eflags
2024-03-28 18:54:32 +08:00
Shan Huang
79ba323bdd [Debuginfo][GVNHoist] Fix #86227: update the debug location of the hoisted GEP (#86236)
This PR fixes #86227.
2024-03-28 18:43:03 +08:00
J. Ryan Stinnett
8a7f021f9e [GitHub] Fix typos in automation (#86886)
2024-03-28 10:37:31 +00:00
Shan Huang
8963a476cc Fix #86269: remove unused variable (#86927)
Remove the unused variable `BI` introduced in #86269.
2024-03-28 11:24:18 +01:00
bvlgah
e640d9e725 [RISCV][GlobalISel] Fix legalizing ‘llvm.va_copy’ intrinsic (#86863)
Hi, I spotted a problem when running benchmarking programs on a RISCV64
device.

## Issue

Segmentation faults only occurred while running the programs compiled
with `GlobalISel` enabled.

Here is a small but complete example (adapted from [Google's
benchmark
framework](95a9f0d0b4/MicroBenchmarks/libs/benchmark/src/colorprint.cc (L85-L119)))
that reproduces the issue:

```cpp
#include <cstdarg>
#include <cstdio>
#include <iostream>
#include <memory>
#include <string>

std::string FormatString(const char* msg, va_list args) {
  // we might need a second shot at this, so pre-emptively make a copy
  va_list args_cp;
  va_copy(args_cp, args);

  std::size_t size = 256;
  char local_buff[256];
  auto ret = vsnprintf(local_buff, size, msg, args_cp);

  va_end(args_cp);

  // currently there is no error handling for failure, so this is a hack.
  // BM_CHECK(ret >= 0);

  if (ret == 0)  // handle empty expansion
    return {};
  else if (static_cast<size_t>(ret) < size)
    return local_buff;
  else {
    // we did not provide a long enough buffer on our first attempt.
    size = static_cast<size_t>(ret) + 1;  // + 1 for the null byte
    std::unique_ptr<char[]> buff(new char[size]);
    ret = vsnprintf(buff.get(), size, msg, args);
    // BM_CHECK(ret > 0 && (static_cast<size_t>(ret)) < size);
    return buff.get();
  }
}

std::string FormatString(const char* msg, ...) {
  va_list args;
  va_start(args, msg);
  auto tmp = FormatString(msg, args);
  va_end(args);
  return tmp;
}

int main() {
  std::string Str =
      FormatString("%-*s %13s %15s %12s", static_cast<int>(20),
                   "Benchmark", "Time", "CPU", "Iterations");
  std::cout << Str << std::endl;
}
```

Use `clang++ -fglobal-isel -o main main.cpp` to compile it.

## Cause

I have examined the MIR, and it shows that these segmentation faults
result from a small mistake in legalizing the intrinsic function
`llvm.va_copy`.


36e74cfdbd/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.cpp (L451-L453)

`DstLst` and `Tmp` are placed in the wrong order.

## Changes

I have tweaked the test case `CodeGen/RISCV/GlobalISel/vararg.ll` so
that `s0` is used as the frame pointer (not in all checks) which points
to the starting address of the save area. I believe that it helps reason
about how `llvm.va_copy` is handled.
2024-03-28 13:09:18 +03:00
Luke Lau
856e815ca1 [DAGCombiner] Set disjoint flag in add->or and xor->or combines (#86925)
We check DAG.haveNoCommonBitsSet so the operands will be known to be
disjoint.

I couldn't think of a codegen test case since most targets aren't
checking hasDisjoint yet, apart from RISCV in the or_is_add pattern, but
it also falls back to computeKnownBits.
2024-03-28 18:08:59 +08:00
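The arithmetic fact behind 856e815ca1 is that an add (or xor) of two values with no common set bits gives the same result as an or. The following sketch checks that on plain 32-bit integers; `haveNoCommonBits` and `addOrXorAgree` are hypothetical names chosen for the example and are not SelectionDAG APIs.

```cpp
#include <cstdint>

// Plain-integer analogue of the disjointness check: true when no bit
// position is set in both operands.
bool haveNoCommonBits(std::uint32_t A, std::uint32_t B) {
  return (A & B) == 0;
}

// With disjoint operands no bit position produces a carry, so addition, OR
// and XOR all yield the same value.
bool addOrXorAgree(std::uint32_t A, std::uint32_t B) {
  return (A + B) == (A | B) && (A ^ B) == (A | B);
}
```

For example, `0xF0` and `0x0F` are disjoint and all three operations give `0xFF`, whereas `3` and `1` share bit 0, so `3 + 1` is `4` while `3 | 1` is `3`.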
Shan Huang
912e2c4758 [Debuginfo][TailCallElim] Fix #86262: drop the debug location of entry branch (#86269)
This PR fixes #86262.

---------

Co-authored-by: Stephen Tozer <Melamoto@gmail.com>
2024-03-28 17:37:33 +08:00
martinboehme
8d77d362af [clang][dataflow] Introduce a helper class for handling record initializer lists. (#86675)
This is currently only used in one place, but I'm working on a patch
that will use this from a second place. And I think this already
improves the readability of the one place this is used so far.
2024-03-28 10:12:45 +01:00
Luke Lau
eff4593a64 [RISCV] Add test case for missed vwaddu.vv due to add->or combine. NFC
We should be able to recover this with combineBinOp_VLToVWBinOp_VL if we
check that the or has the disjoint flag set.
2024-03-28 16:58:52 +08:00
Simon Tatham
88b10f3e3a [MC][AArch64] Segregate constant pool caches by size. (#86832)
If you write a 32- and a 64-bit LDR instruction that both refer to the
same constant or symbol using the = syntax:

```
  ldr w0, =something
  ldr x1, =something
```

then the first call to `ConstantPool::addEntry` will insert the constant
into its cache of existing entries, and the second one will find the
cache entry and reuse it. This results in a 64-bit load from a 32-bit
constant, reading nonsense into the other half of the target register.

In this patch I've done the simplest fix: include the size of the
constant pool entry as part of the key used to index the cache. So now
32- and 64-bit constant loads will never share a constant pool entry.

There's scope for doing this better, in principle: you could imagine
merging the two slots with appropriate overlap, so that the 32-bit load
loads the LSW of the 64-bit value. But that's much more complicated: you
have to take endianness into account, and maybe also adjust the size of
an existing entry. This is the simplest fix that restores correctness.
2024-03-28 08:57:27 +00:00
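The fix described in 88b10f3e3a, making the entry size part of the cache key, can be modelled in a few lines. This is a hypothetical sketch: `PoolModel` is not the MC `ConstantPool` class, and plain integers stand in for the constant expressions that the real cache stores.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical model of a constant pool with a lookup cache.
class PoolModel {
  // Key is (value, entry size in bytes); the mapped value is the entry index.
  std::map<std::pair<std::int64_t, unsigned>, std::size_t> Cache;
  std::size_t NumEntries = 0;

public:
  // Returns the index of the entry for Value at the given size. An existing
  // entry is reused only if both the value AND the size match.
  std::size_t addEntry(std::int64_t Value, unsigned Size) {
    const std::pair<std::int64_t, unsigned> Key(Value, Size);
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second;
    Cache.emplace(Key, NumEntries);
    return NumEntries++;
  }

  std::size_t size() const { return NumEntries; }
};
```

With the size in the key, a 4-byte and an 8-byte request for the same value get separate entries, while a repeated request at the same size still hits the cache.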
Orlando Cazalet-Hyams
2a2fd488b6 [RemoveDIs] Update DIBuilder C API and OCaml bindings [2/2] (#86529)
Follow on from #84915 which adds the DbgRecord function variants. The C API
changes were reviewed in #85657.

# C API

Update the LLVMDIBuilderInsert... functions to insert DbgRecords instead
of debug intrinsics.

    LLVMDIBuilderInsertDeclareBefore
    LLVMDIBuilderInsertDeclareAtEnd
    LLVMDIBuilderInsertDbgValueBefore
    LLVMDIBuilderInsertDbgValueAtEnd

Calling these functions will now cause an assertion if the module is in the
wrong debug info format. They should only be used when the module is in "new
debug format".

Use LLVMIsNewDbgInfoFormat to query and LLVMSetIsNewDbgInfoFormat to change the
debug info format of a module.

Please see https://llvm.org/docs/RemoveDIsDebugInfo.html#c-api-change
(RemoveDIsDebugInfo.md) for more info.

# OCaml bindings

Add set_is_new_dbg_info_format and is_new_dbg_info_format to the OCaml bindings.
These can be used to set and query the current debug info mode. These will
eventually be removed, but are useful while we're transitioning between old and
new debug info formats.

Add string_of_lldbgrecord, like string_of_llvalue but prints DbgRecords.

In test dbginfo.ml, unconditionally set the module debug info to the new mode
and update CHECK lines to check for DbgRecords. Without this change the test
crashes because it attempts to insert DbgRecords (new default behaviour of
llvm_dibuild_insert_declare_...) into a module that is in the old debug info
mode.
2024-03-28 08:54:27 +00:00
Haojian Wu
63ea5a4088 [clang] Invalidate the alias template decl if it has multiple written template parameter lists. (#85413)
Fixes #85406.

- Set the invalid bit for alias template decl where it has multiple
written template parameter lists (as the AST node is ill-formed)
- don't perform CTAD for invalid alias template decls
2024-03-28 09:13:26 +01:00
Haohai Wen
38f5596fed [LoopRotate] Add test to track update for inaccurate branch weight (#86495)
Branch weights from sample-based PGO may be inaccurate due to
sampling. This test tracks such a case, where unsigned arithmetic in
updateBranchWeights wraps around.
2024-03-28 15:33:01 +08:00
Vyacheslav Levytskyy
b7ac8fddb5 [SPIR-V] Improve type inference: deduce types of composite data structures (#86782)
This PR improves type inference in general and deduces types of
composite data structures in particular. It also adds a way to insert a
bitcast to make a function call valid when argument types mismatch due
to opaque pointer type inference.

The attached test `pointers/nested-struct-opaque-pointers.ll`
demonstrates the new capabilities: the SPIR-V code emitted for this test
is now (1) valid in the sense of data field types and (2) accepted by
`spirv-val`.

Stricter LIT checks, support for more composite data structures, and
improved type correctness of function calls are the main TODOs at the
moment.
2024-03-28 08:08:06 +01:00
Petr Hosek
e5b9399494 [libc] Move baremetal write_to_stderr implementation to io.cpp (#86890)
This is required to avoid a multiple-definition error.
2024-03-27 23:59:24 -07:00
hchandel
5dfc446d75 [RISCV] Remove Unnecessary Semicolon. NFC (#86911)
Removes Unnecessary Semicolon

Co-authored-by: Harsh Chandel <hchandel@hu-hchandel-hyd.qualcomm.com>
2024-03-27 23:13:47 -07:00
Kazu Hirata
ed801ab460 [Transforms] Fix an unused variable warning
llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h:89:28:
  error: private field 'LTOPhase' is not used
  [-Werror,-Wunused-private-field]
2024-03-27 23:11:16 -07:00
Lei Wang
f8bab38b6d [CSSPGO] Fix the issue of missing callee profile matches (#85715)
Two fixes related to the callee/inlinee profile:

1. Fix a bug where the matching results were not distributed to the
callee profiles (they should be passed by reference).
2. Narrow imported function matching to checksum-mismatched functions.

More context: previously we ran matching for all imported functions, even
when their checksums matched. After fixing 1), however, we got a
regression, likely because matching is not a no-op for checksum-matched
functions, so for consistency we now run matching only for
checksum-mismatched (imported) functions. Since the
metadata (pseudo_probe_desc) is dropped for imported functions, we
leverage the function attribute mechanism and add a new function
attribute (`profile-checksum-mismatch`) to transfer the info from
pre-link to post-link.
2024-03-27 22:27:22 -07:00
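The first fix in f8bab38b6d is a pass-by-value versus pass-by-reference bug. The general shape of that mistake can be sketched as follows; `MatchMap`, `recordMatchByValue`, and `recordMatchByReference` are hypothetical names for illustration and do not appear in the profile matcher.

```cpp
#include <map>
#include <string>

using MatchMap = std::map<std::string, int>;

// Buggy shape: Results is a copy, so the new entry is written to the copy
// and the caller's map is left unchanged.
void recordMatchByValue(MatchMap Results, const std::string &Callee) {
  Results[Callee] = 1;
}

// Fixed shape: Results refers to the caller's map, so the entry persists.
void recordMatchByReference(MatchMap &Results, const std::string &Callee) {
  Results[Callee] = 1;
}
```

Both versions compile and run without warnings, which is why this kind of bug tends to surface only as missing results further down the pipeline.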
Fangrui Song
a41bfea5c0 [MC] Simplify ELFObjectWriter. NFC
And fix `if (hasRelocationAddend())` to `usesRela` to properly treat
SHT_LLVM_CALL_GRAPH_PROFILE as SHT_REL. The incorrect check does not
cause a problem because the synthesized SHT_LLVM_CALL_GRAPH_PROFILE has
zero addends.
2024-03-27 22:10:11 -07:00
Heejin Ahn
6b7ecc7979 Revert "[WebAssembly] Remove threwValue comparison after __wasm_setjmp_test (#86633)"
This reverts commit 52431fdb1a.

The PR assumed `__threwValue` couldn't be 0, but it could be when the
thrown thing is not a longjmp but an exception, so that `if` check was
actually necessary.
2024-03-28 04:41:29 +00:00
Owen Pan
d9e3e11ae5 [clang-format] Exit clang-format-diff only after all diffs are printed (#86776)
See
https://github.com/llvm/llvm-project/pull/70883#issuecomment-2020811077.
2024-03-27 21:23:37 -07:00
Owen Pan
e766f87b92 [clang-format] Handle C++ Core Guidelines suppression tags (#86458)
Fixes #86451.
2024-03-27 21:22:57 -07:00
Job Henandez Lara
056b404354 [libc][NFC] refactor fmin and fmax (#86718)
Hello,

So, I worked on the fmaximum and fminimum functions recently and the
reviewers suggested the structure:

```
if (bitsx ...)
  return ...;
if (bitsy ..)
  return 
...
return ...;
```
So I went ahead and did the same for fmin and fmax. I hope this isn't an
issue for you all. Thanks.

---------

Co-authored-by: Job Hernandez <h93@protonmail.com>
2024-03-27 23:55:12 -04:00
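The early-return structure described in 056b404354 can be sketched for a minimum function as shown below. This is an assumption-laden illustration, not the libc implementation: `fminSketch` is a hypothetical name, and `std::isnan` and `std::signbit` from `<cmath>` stand in for the bit-level checks the commit message abbreviates as `bitsx` and `bitsy`.

```cpp
#include <cmath>

// Hypothetical sketch of the early-return shape: each special case returns
// immediately, and the ordinary comparison is the final fall-through.
double fminSketch(double X, double Y) {
  if (std::isnan(X))
    return Y; // a NaN operand is treated as missing data
  if (std::isnan(Y))
    return X;
  if (std::signbit(X) != std::signbit(Y))
    return std::signbit(X) ? X : Y; // the negative operand is the smaller one
  return X < Y ? X : Y;
}
```

The sign check is what makes `fminSketch(0.0, -0.0)` return `-0.0`, since a plain `<` comparison treats the two zeros as equal.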
Mingming Liu
2c7610cc43 [nfc]Make InstrProfSymtab non-copyable and non-movable (#86882)
- The direct use case (in [1]) is to add `llvm::IntervalMap` [2] and the
  allocator required by the IntervalMap ctor [3] to class `InstrProfSymtab`
  as owned members. The allocator class doesn't have a move-assignment
  operator, and it would take considerable effort to implement one so that
  the enclosing class is movable.
- There is only one use of the compiler-generated move-assignment operator
  in the repo, in CoverageMappingReader.cpp. Luckily it's possible to use
  std::unique_ptr<InstrProfSymtab> there instead, so this patch makes that
  change.

[1] https://github.com/llvm/llvm-project/pull/66825
[2] 4c2f68840e/llvm/include/llvm/ADT/IntervalMap.h (L936)
[3] 4c2f68840e/llvm/include/llvm/ADT/IntervalMap.h (L1041)
2024-03-27 20:40:01 -07:00
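The pattern used in 2c7610cc43, pinning a class in place and holding it through `std::unique_ptr` where the owner must stay movable, looks roughly like this. The sketch uses hypothetical stand-ins: `Symtab` is not the real `InstrProfSymtab`, and `Reader` is not the real coverage mapping reader.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a class that owns a member with no
// move-assignment operator, so copying and moving are disabled outright.
class Symtab {
public:
  Symtab() = default;
  Symtab(const Symtab &) = delete;
  Symtab &operator=(const Symtab &) = delete;
  Symtab(Symtab &&) = delete;
  Symtab &operator=(Symtab &&) = delete;

  int NumSymbols = 0;
};

// An owner that still needs to be movable holds the object through a
// std::unique_ptr: moving the owner transfers the pointer, not the object.
struct Reader {
  std::unique_ptr<Symtab> Table = std::make_unique<Symtab>();
};

static_assert(!std::is_move_assignable<Symtab>::value, "Symtab must not move");
static_assert(std::is_move_assignable<Reader>::value, "Reader stays movable");
```

After `B = std::move(A)` on two `Reader` objects, `B.Table` points at the `Symtab` that `A` used to own and `A.Table` is null; the `Symtab` itself never changes address.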
Fangrui Song
070d7af0c5 [ELF] --export-dynamic: don't create dynamic sections for non-PIC static links
The CloudABI (removed from Clang Driver) change from
https://reviews.llvm.org/D29982 does not make sense. GNU ld and gold
don't create dynamic sections for a non-PIC static link when
--export-dynamic is specified.

Creating dynamic sections is harmful in this scenario because we would
consider undefined weak symbols preemptible and generate GLOB_DAT
relocations, breaking the expectation that non-PIC static links only
contain IRELATIVE relocations.

In addition, there are other options that export symbols
(--export-dynamic-symbol, --dynamic-list, etc). It does not make sense
to special case --export-dynamic.
2024-03-27 20:04:59 -07:00
Fangrui Song
443baed56c [ELF,test] Update tests that depend on --export-dynamic creating dynamic sections
The CloudABI change from https://reviews.llvm.org/D30175 does not make sense.
Update tests not to rely on the --export-dynamic behavior.
2024-03-27 20:01:30 -07:00