In the process of creating the MDNodes for the TBAA tag operations
we used to produce incomplete MDNodes like:
```
@__tbaa::@tbaa_tag_4 => !{!null, !null, i64 0}
@__tbaa::@tbaa_tag_7 => !{!null, !null, i64 0}
```
This caused the two tags to map to the same incomplete MDNode due to uniquing.
To prevent this, we have to use temporary MDNodes instead of !null's.
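The collision can be illustrated with a toy model of uniquing. This is only a sketch: `Node`, `Context`, `get_uniqued` and `get_temporary` are hypothetical names for illustration, not the LLVM API. It shows that nodes keyed by their operands collapse when the placeholders are null, and stay distinct when each placeholder is a fresh temporary.

```python
class Node:
    """Toy metadata node; hashing and equality use object identity."""

    def __init__(self, operands):
        self.operands = list(operands)


class Context:
    """Toy context that uniques nodes by their operand tuple (hypothetical)."""

    def __init__(self):
        self._uniqued = {}

    def get_uniqued(self, operands):
        # Nodes with identical operand tuples share a single object.
        key = tuple(operands)
        if key not in self._uniqued:
            self._uniqued[key] = Node(operands)
        return self._uniqued[key]

    def get_temporary(self):
        # A temporary is never uniqued, so every call yields a distinct node.
        return Node([])


ctx = Context()

# Null placeholders: both incomplete tags have the same operands and collapse.
null_tag_4 = ctx.get_uniqued([None, None, 0])
null_tag_7 = ctx.get_uniqued([None, None, 0])

# Temporary placeholders: the operand tuples differ, so the tags stay distinct.
temp_tag_4 = ctx.get_uniqued([ctx.get_temporary(), ctx.get_temporary(), 0])
temp_tag_7 = ctx.get_uniqued([ctx.get_temporary(), ctx.get_temporary(), 0])

print(null_tag_4 is null_tag_7)
print(temp_tag_4 is temp_tag_7)
```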
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D141726
Before D135808, loop interchange could be applied endlessly: the
profitability checks had no defined priority, so any check that reported
a profit could trigger an interchange. With this patch, the checks have a
defined priority and there is no endless interchange. The order of
decision is the 'Cache cost' check, then 'InstrOrderCost', then
'Vectorization'. This patch also corrects the dependency checking inside
isProfitableForVectorization() and the checking of bad-order loops in
isProfitablePerInstrOrderCost().
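A prioritized decision of this kind can be sketched as follows. This is an illustration under assumptions, not the pass's code: it assumes each check answers True (profitable), False (not profitable) or None (no opinion), and that the first check with an opinion decides. The function name `is_profitable` and the lambdas are hypothetical.

```python
from typing import Callable, Optional, Sequence

# A check returns True/False when it has an opinion, or None when undecided.
Check = Callable[[], Optional[bool]]


def is_profitable(checks: Sequence[Check]) -> bool:
    """Return the decision of the first check that has an opinion."""
    for check in checks:
        decision = check()
        if decision is not None:
            return decision
    # No check had an opinion: do not interchange.
    return False


# Ordered as in the message: cache cost, instruction order cost, vectorization.
checks = [
    lambda: None,   # cache cost: undecided
    lambda: False,  # instruction order cost: not profitable
    lambda: True,   # vectorization: profitable, but has lower priority
]
print(is_profitable(checks))
```

Because a lower-priority check cannot override a higher-priority one, two checks that disagree can no longer flip the loop order back and forth.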
Reviewed By: Meinersbur, bmahjour, #loopoptwg
Differential Revision: https://reviews.llvm.org/D135808
The parser of gpu.launch_func was incorrectly rejecting SSA values with
result numbers (`%0#0`) in the list of function arguments by using the
`parseArgument` function intended for region argument declarations, not
operands. Fix this by directly parsing comma-separated operands and
types.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D141851
Much like the changes in D141859, this patch allows the `nvptx-arch`
tool to be built and provided with every distribution of LLVM / Clang.
This will make it more reliable for our toolchains to depend on. The
changes here configure a version that dynamically loads CUDA if it was
not found at build time.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D141861
We use the `amdgpu-arch` tool to query the installed GPUs at runtime.
One problem is that this tool is currently not built if the person
building the LLVM binary does not have the HSA runtime on their system.
This means that if someone built and distributed an installation of LLVM
without HSA, then users will not be able to use the tool even if they
have HSA on their own system.
This patch makes us build this tool unconditionally and adds extra logic
to dynamically load HSA if it's present.
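The load-at-runtime idea can be sketched with `ctypes`. This is a minimal illustration, not the tool's implementation: the helper `try_load` is hypothetical, and the library name `libhsa-runtime64.so` is an assumption about how the HSA runtime is named on a Linux system.

```python
import ctypes
from typing import Optional


def try_load(library_name: str) -> Optional[ctypes.CDLL]:
    """Try to load a shared library at runtime; return None if unavailable."""
    try:
        return ctypes.CDLL(library_name)
    except OSError:
        # The library is not installed on this machine.
        return None


# Assumed library name; the binary itself has no link-time dependency on it.
hsa = try_load("libhsa-runtime64.so")
if hsa is None:
    print("HSA runtime not found; reporting no GPUs")
else:
    print("HSA runtime loaded dynamically")
```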
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D141859
This patch adds an option to the method that fuses a producer with a
tiled consumer, to also yield from the tiled loops a value that can be
used to replace the original producer. This is only valid if it can be
ascertained that the slice of the producer computed within each
iteration of the tiled loop nest does not compute slices of the
producer redundantly. The analysis to derive this is very involved, so
this is left to the caller to ascertain. A test is added that mimics
the `scf::tileConsumerAndFuseProducersGreedilyUsingSCFForOp`, but also
yields the values of all fused producers. This can be used as a
reference for how a caller could use this functionality.
Differential Revision: https://reviews.llvm.org/D141028
I'm helping with the remaining regressions on D127115, and one of my candidate fixes caused some regressions with MVE interleaved shuffles due to poor handling of 'truncation' style shuffle masks (0,2,4,6,...).
This patch attempts to use the ARMISD::MVETRUNC node to handle these cases, based off existing code in LowerTruncate.
It handles both (0,2,4,6,...) and (1,3,5,7,....) 'top' style patterns (assuming no endian problems). I shift down the 'top' patterns - a basic search of ARM docs suggests MVE has some top/bottom truncation/narrowing instructions but I don't seem to be able to get them to be used.
Differential Revision: https://reviews.llvm.org/D141791
(A s>> (BW - 1)) + (zext (A s> 0)) --> (A s>> (BW - 1)) | (zext (A != 0))
https://alive2.llvm.org/ce/z/V-nM8N
This is not the form that we currently match as m_Signum(),
but I'm not sure if one is better than the other, so there's
a follow-up patch needed either way.
For this patch, it should be better for analysis to use a
not-null test and bitwise logic rather than >0 with add.
Codegen doesn't seem significantly different on any targets
that I looked at.
Also note that none of these variants is shown in issue #60012 -
those generally include at least one 'select', so that's likely
where these patterns will end up.
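The fold at the top of this message can also be checked exhaustively for a small bit width. The sketch below assumes BW = 8 and models signed 8-bit values with Python integers, whose `>>` is an arithmetic shift; `before` and `after` are illustrative names for the two sides of the fold.

```python
BW = 8  # assumed bit width for the exhaustive check


def before(a: int) -> int:
    # (A s>> (BW - 1)) + (zext (A s> 0))
    return (a >> (BW - 1)) + int(a > 0)


def after(a: int) -> int:
    # (A s>> (BW - 1)) | (zext (A != 0))
    return (a >> (BW - 1)) | int(a != 0)


# Every signed 8-bit value; both sides yield -1, 0 or 1.
values = range(-(1 << (BW - 1)), 1 << (BW - 1))
mismatches = [a for a in values if before(a) != after(a)]
print(mismatches)
```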
For each symbol in a /delayloaded library, lld injects a small piece of
code to handle the symbol lazy loading. This code doesn't have unwind
information, which may be troublesome.
Provide this information for AMD64.
Thanks to Yannis Juglaret <yjuglaret@mozilla.com> for contributing the
unwinding info and for his support while crafting this patch.
Fix #59639
Differential Revision: https://reviews.llvm.org/D141691
This fixes the remaining errors when building the llvm-project
with `LLVM_ENABLE_MODULES=ON` (and `LLVM_ENABLE_LOCAL_SUBMODULE_VISIBILITY=ON`,
which currently is the LLVM default).
Previously this would fail in the `CXX_SUPPORTS_MODULES` check.
Differential Revision: https://reviews.llvm.org/D141833
The default extensions would be better added in the TargetParser, not by
the driver. This removes the addition of +i8mm and +bf16 features in the
driver as they are already added in 8.6/9.1 architectures. AEK_MOPS and
AEK_HBC have been added to 8.8/9.3 architectures to replace the need for
+hbc and +mops features.
Differential Revision: https://reviews.llvm.org/D141518
This will probably be the first in a series of patches that tries to
enable code generation for ARM SME (extension of SVE).
Since SME's core operation is the outer product instruction, I figured
that it would probably be a good idea to enable the outer product
operation to properly accept and generate scalable vectors.
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D138718
Public getters are provided for other similar members of both the CIE
and FDE, and these fields are also displayed by the llvm-dwarfdump tool,
so not exposing them was likely an oversight.
These are needed for tools based on LLVM that need access to all the
parsed DWARF data.
Differential Revision: https://reviews.llvm.org/D141475
All `apply...` functions now return a LogicalResult indicating whether the iterative process converged or not.
Differential Revision: https://reviews.llvm.org/D141845
1. Make explicit that the folder in which a subproject is built in stand-alone mode cannot be the same folder where LLVM was built.
2. Add a cut 'n paste example for building stand-alone `clang`.
Differential Revision: https://reviews.llvm.org/D141825
This patch fixes a regression from D122875. X86 has fpext instructions
supporting the rmb form, which benefit more from fpext(splat(X)) than
from splat(fpext(X)).
Reviewed By: RKSimon, skan
Differential Revision: https://reviews.llvm.org/D141657
With codegen prior to this patch, truly indirect arguments -- i.e.
those that are not `byval` -- can have their debug information lost even
at O0. Because indirect arguments are passed by pointer, and this
pointer is likely placed in a register as per the function call ABI,
debug information is lost as soon as the register gets clobbered.
This patch solves the issue by storing the address of the parameter on
the stack, using a similar strategy employed when C++ references are
passed. In other words, this patch changes codegen from:
```
define @foo(ptr %arg) {
call void @llvm.dbg.declare(%arg, [...], metadata !DIExpression())
```
To:
```
define @foo(ptr %arg) {
%ptr_storage = alloca ptr
store ptr %arg, ptr %ptr_storage
call void @llvm.dbg.declare(%ptr_storage, [...], metadata !DIExpression(DW_OP_deref))
```
Some common cases where this may happen with C or C++ function calls:
1. "Big enough" trivial structures passed by value under the ARM ABI.
2. Structures that are non-trivial for the purposes of call (as per
the Itanium ABI) when passed by value.
A few tests were matching the wrong alloca (matching against the new
alloca, instead of the old one), so they were updated to either match
both allocas or include a `,` right after the alloca type, to prevent
matching against a pointer type.
Differential Revision: https://reviews.llvm.org/D141381
__config defines _LIBCPP_DEPRECATED_ABI_DISABLE_PAIR_TRIVIAL_COPY_CTOR
on FreeBSD, which conflicts with a command-line definition used by the
non_trivial_copy_move_ABI test.
Add -Wno-macro-redefined to ADDITIONAL_COMPILE_FLAGS in this test.
Reviewed By: philnik
Differential Revision: https://reviews.llvm.org/D141774
generation code from DWARFLinker. It adds command line option:
  --build-accelerator [none,DWARF]
    Build accelerator tables (default: none)
      =none - Do not build accelerators
      =DWARF - Build accelerator tables according to the resulting DWARF version
        DWARFv4: .debug_pubnames and .debug_pubtypes
        DWARFv5: .debug_names
Differential Revision: https://reviews.llvm.org/D139638
There is no need to update the DT here, because there must be a unique
latch. Hence if the latch is not exiting it must directly branch back
to the original loop header and does not dominate any nodes.
Skipping a DT update here simplifies D141487.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D141810
Currently we only reduce vector.reduce.add to sdot if the vectors are either <8 x i8> or <16 x i8>.
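As a reminder of why the reduction maps onto a dot product, the sketch below models a signed dot product in which each 32-bit lane accumulates the products of four i8 elements, and compares the sum of the lanes with a plain add reduction over the element-wise products. The helper names are hypothetical, the input values are arbitrary, and i32 wraparound is ignored.

```python
def sdot(acc, a, b):
    """Model: each 32-bit lane accumulates the products of four i8 elements."""
    assert len(a) == len(b) == 4 * len(acc)
    return [
        acc[i] + sum(a[4 * i + j] * b[4 * i + j] for j in range(4))
        for i in range(len(acc))
    ]


def reduce_add_of_products(a, b):
    """Plain add reduction over the element-wise products."""
    return sum(x * y for x, y in zip(a, b))


# Arbitrary <16 x i8> inputs within the signed 8-bit range.
a = [i - 8 for i in range(16)]
b = [3 - i for i in range(16)]

lanes = sdot([0, 0, 0, 0], a, b)
print(sum(lanes) == reduce_add_of_products(a, b))
```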
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D141692
Jump threading can replace a select and an unconditional branch with a
conditional branch, but doing so loses profile information.
This destructive transform can eventually lead to a performance
degradation due to folding of branches in
shouldFoldCondBranchesToCommonDestination as branch probabilities
are no longer known.
The first version was reverted due to an assertion failure caused by
i32 overflow, which is fixed in this version.
Patch by Roman Paukner!
Differential Revision: https://reviews.llvm.org/D138132
Reviewed By: mkazantsev
Adds an instruction mapping to SMEInstrFormats which matches SME
pseudos with the real instructions they are transformed to.
A new flag is also added to AArch64Inst (SMEMatrixType), which is
used to indicate the base register required when emitting many
of the SME instructions.
This reduces the number of pseudos handled by the switch statement
in EmitInstrWithCustomInserter.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D136856
`opencl-c-base.h` always defines 5 particular feature macros for
SPIR-V, making it impossible to disable those features.
To allow disabling any of those features, let the header recognize
`__undef_<feature>` macros. The user can then pass the
`-D__undef_<feature>` flag on the command line to disable a specific
feature. The __undef macro could potentially also be set from
`-cl-ext=-feature`, but for now only change the header and only
provide __undef macros for the 5 features that are always enabled in
`opencl-c-base.h`.
Differential Revision: https://reviews.llvm.org/D141297