In the presence of an invalid structured binding decomposition, some
binding packs may be invalid and trying to transform them would produce
a recovery expression that does not contains a pack, leading to
assertions in places where we would expect a pack at that stage.
Fixes#125165
Some of the functions in `#functions` may have several inlined
instances, but also an out-of-line definition.
Therefore, for complex enough DWARF input, `#functions` - `#inlined
functions` would not give us the number of out-of-line function
definitions.
`llvm-dwarfdump`, however, already keeps track of those; print it as
part of the statistics, as this number is useful in certain scenarios.
...when there are invalid constraints.
When attaching a `TypeConstraint`, in case of error, the trailing
pointer that is supposed to point to the constraint is left
uninitialized.
Sometimes the uninitialized value will be a `nullptr`, but at other
times it will not. If we traverse the AST (for instance, dumping it, or
when writing the BMI), we may get a crash depending on the value that
was left. The serialization may also contain a bogus value.
In this commit, we always initialize the `PlaceholderTypeConstraint`
with `nullptr`, to avoid accessing this uninitialized memory.
This does not affect only modules, but it causes a segfault more
consistently when they are involved.
The test case was reduced from `mp-units`.
---------
Co-authored-by: Erich Keane <ekeane@nvidia.com>
During my previous cleanup (#127403), I did not notice that we defined a
type alias for the base class. This type alias allows us to use the
shorter form Base::Base, and this PR switches to that.
This patch implements support for the UNROLL_AND_JAM directive to enable
or disable unrolling and jamming on a `DO LOOP`.
It must be placed immediately before a `DO LOOP` and applies only to the
loop that follows. N is an integer that specifying the unrolling factor.
This is done by adding an attribute to the branch into the loop in LLVM
to indicate that the loop should unrolled and jammed.
As linalg.batch_matmul has been moved into tablegen from OpDSL, its
derived python wrapper no longer exist.This patch adds the required
python wrapper.
Also refactors the BatchmatmulOp printer to make it consistent with its
parser.
Update isNoWrap to make the IR Ptr argument optional. This allows using
isNoWrap when dealing with things like pointer-selects, where a select
is translated to multiple pointer SCEV expressions, but there is no IR
value that can be used. We don't try to retrieve pointer values for the
pointer SCEVs and using info from the IR would not be safe. For example,
we cannot use inbounds, because the pointer may never be accessed.
PR: https://github.com/llvm/llvm-project/pull/127410
These member types were deprecated in C++17 by P0174R2 and removed in
C++20 by P0619R4, but the changes in `<variant>` seem missing.
Drive-by: Replace one `_NOEXCEPT` with `noexcept` as the `hash`
specialization is C++17-and-later only.
This patch adds intrinsics for tcgen05.cp and
tcgen05.shift instructions.
lit tests are added and verified with a
ptxas-12.8 executable.
Docs are updated in the NVPTXUsage.rst file.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
During a recent change, the build system accidentally dropped the
(theoretical) support for the CLC builtins library to build
target-specific builtins from the 'amdgpu' directory, due to a change in
variable names. This functionality wasn't being used but was spotted
during another code review.
This commit takes the opportunity to clean up and better document the
code that manages the list of directories to search for builtin
implementations.
While fixing this, some references to now-removed SOURCES files were
discovered which have been cleaned up.
The motivation is #123622 and the fact that is hard to fine the last
line entry in a given range. `FindLineEntryByAddress(range_end-1)` is
the best we have, but it's not ideal because it has a magic -1 and that
it relies on there existing a line entry at that address (generally, it
should be there, but if for some case it isn't, we might end up ignoring
the entries that are there (or -- like my incorrect fix in #123622 did
-- iterating through the entire line table).
What we really want is to get the last entry that exists in the given
range. Or, equivalently (and more STL-like) the first entry after that
range. This is what these functions do. I've used the STL names since
they do pretty much exactly what the standard functions do (the main
head-scratcher comes from the fact that our entries represent ranges
rather than single values).
The functions can also be used to simplify the maze of `if` statements
in `FindLineEntryByAddress`, but I'm keeping that as a separate patch.
For now, I'm just adding some unit testing for that function to gain
more confidence that the patch does not change the function behavior.
---------
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
Enable optional ISA extensions on Grace when mcpu=grace
is used: sve2-sm4, sve2-aes, sve2-sha3.
Grace is no longer an alias, but a separate CPU definition.
Part of the DECLARE REDUCTION was already supported by the parser, but
the semantics to add the reduction identifier wasn't implemented.
The semantics would not accept the name given by the reduction, so a few
lines added to support that.
Some tests were in place but not quite working, so fixed those up too.
Adding new tests for unparsing and parse-tree, as well as checking the
symbolic name being generated.
Lowering of DECLARE REDUCTION is not supported in this patch, and a test
that it hits the relevant TODO is in this patch (most of this was
already existing, but not actually testing the TODO message).
This commit improves the behaviour of (__clc_)nextafter around zero.
Specifically, the nextafter value of very small negative numbers in the
positive direction is now negative zero. Previously we'd return positive
zero.
This behaviour is not required as far as OpenCL is concerned: at least,
the CTS isn't testing for it. However, this change does bring our
implementation into bit-equivalence with (libstdc++'s implementation of)
std::nextafter, tested on all possible values of 32-bit float towards
both positive and negative INFINITY.
Furthermore, since the implementation of libclc's floating-point 'rtp'
and 'rtz' conversions use __clc_nextafter, the previous behaviour was
resulting in CTS validation issues. For example, when converting float
-0x1.000002p-25 to half, rounding towards zero or positive infinity,
nextafter was returning +0.0, whereas the correct conversion requires us
to return -0.0.
We could work around this issue in the conversion functions, but since
the change to nextafter is small enough and the behaviour around zero
matches libstdc++, the fix feels at home there.
This commit also converts several variables to unsigned types to avoid
undefined behaviour surrounding signed underflow on the subtractions.
It also converts some variables to be kept in floating-point types, using
fabs to get the absolute value rather than by bit-hacking.
This will allow vectorizing these calls (after a few more patches). This
should not change the codegen for targets that enable the use of AA
during the codegen (in `TargetSubtargetInfo::useAA()`). This includes
targets such as AArch64. This notably does not include x86 but can be
worked around by passing `-mllvm -combiner-global-alias-analysis=true`
to clang.
Follow up to #114086.
This also includes comparing the two ImpliedDo
Details
- For ArrayConstructor, check if x and y have the same
elements and type
- For ImpliedDo, check if x and y have the same lower,
upper, stride and values
Fixes: https://github.com/llvm/llvm-project/issues/104526
gfx940 and gfx941 are no longer supported. This is the last one of a
series of PRs to remove them from the code base.
The ISA documentation still contains a lot of links and file names with
the "gfx940" identifier. Changing them to "gfx942" is probably not worth
the cost of breaking all URLs to these pages that users might have saved
in the past.
For SWDEV-512631
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.
This PR removes all documentation occurrences of gfx940/gfx941 except
for the gfx940 ISA description, which will be the subject of a separate
PR.
For SWDEV-512631
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.
This PR removes all non-documentation occurrences of gfx940/gfx941 from
the llvm directory, and the remaining occurrences in clang.
Documentation changes will follow.
For SWDEV-512631
The standard libcalls for half to float and float to half conversion are
__extendhfsf2 and __truncsfhf2. However, LLVM currently uses
__gnu_h2f_ieee and __gnu_f2h_ieee instead. As far as I can tell, these
libcalls are an ARM-ism and only provided by libgcc on that platform.
compiler-rt always provides both libcalls.
Use the standard libcalls by default, and only use the __gnu libcalls on
ARM.
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.
This PR removes all occurrences of gfx940/gfx941 from clang that can be
removed without changes in the llvm directory. The
target-invalid-cpu-note/amdgcn.c test is not included here since it
tests a list of targets that is defined in
llvm/lib/TargetParser/TargetParser.cpp.
For SWDEV-512631
Handles both BWI and non-BWI cases (skips PMOV*XBW without BWI).
The vector-interleaved-store-i8-stride-8.ll VPTERNLOG diffs are due to
better value tracking now recognizing the zero-extension patterns where
before it was any-extension
Delete `equivalenceAnalysis`, which has been incorporated into the
`getAliasingValues` API. Also add an additional test case to ensure that
equivalence is properly propagated across function boundaries.
Issue: Compilation abnormally terminates in parallel default(private)
Documentation reference:
A threadprivate variable must not appear as the base variable of a list
item in any clause except for the copyin and copyprivate clauses
Explanation:
From the reference, the threadprivate symbols cannot be used in the DSA
clauses, which in turn means, the symbol can be skipped for default
privatization
Fixes#123535
This patch adds handling of the RISCVISD::VCPOP_VL node in
RISCVTargetLowering::computeKnownBitsForTargetNode. It eliminates
redundant zero-extension instructions.
CaptureTracking considers insertions into aggregates and vectors as
captures. As such, extractions from aggregates and vectors are escape
sources. A non-escaping identified local cannot alias with the result of
an extractvalue/extractelement.
Fixes https://github.com/llvm/llvm-project/issues/126670.
This patch fixes:
clang/tools/c-index-test/c-index-test.c:1240:15: error: mixing
declarations and code is a C99 extension
[-Werror,-Wdeclaration-after-statement]
clang/tools/c-index-test/c-index-test.c:1367:14: error: mixing
declarations and code is a C99 extension
[-Werror,-Wdeclaration-after-statement]
clang/tools/c-index-test/c-index-test.c:1468:14: error: mixing
declarations and code is a C99 extension
[-Werror,-Wdeclaration-after-statement]