Re-apply https://github.com/llvm/llvm-project/pull/87550 with fixes.
Details:
Some tests in fuchsia failed because of the newly added assertion.
This was because `GetExceptionBreakpoint()` could be called before
`g_dap.debugger` was initted.
The fix here is to just lazily populate the list in
GetExceptionBreakpoint() rather than assuming it's already been initted.
(There is some nuisance here because we can't simply just populate it in
DAP::DAP(), which is a global ctor and is called before
`SBDebugger::Initialize()` is called. )
For regex patterns that produce zero-length matches, there is one
(imaginary) match in-between every character in the sequence being
searched (as well as before the first character and after the last
character). It's easiest to demonstrate using replacement:
`std::regex_replace("abc"s, "!", "")` should produce `!a!b!c!`, where
each exclamation mark makes a zero-length match visible.
Currently our implementation doesn't correctly set the prefix of each
zero-length match, "swallowing" the characters separating the imaginary
matches -- e.g. when going through zero-length matches within `abc`, the
corresponding prefixes should be `{'', 'a', 'b', 'c'}`, but before this
patch they will all be empty (`{'', '', '', ''}`). This happens in the
implementation of `regex_iterator::operator++`. Note that the Standard
spells out quite explicitly that the prefix might need to be adjusted
when dealing with zero-length matches in
[`re.regiter.incr`](http://eel.is/c++draft/re.regiter.incr):
> In all cases in which the call to `regex_search` returns `true`,
`match.prefix().first` shall be equal to the previous value of
`match[0].second`... It is unspecified how the implementation makes
these adjustments.
[Reproduction example](https://godbolt.org/z/8ve6G3dav)
```cpp
#include <iostream>
#include <regex>
#include <string>
int main() {
std::string str = "abc";
std::regex empty_matching_pattern("");
{ // The underlying problem is that `regex_iterator::operator++` doesn't update
// the prefix correctly.
std::sregex_iterator i(str.begin(), str.end(), empty_matching_pattern), e;
std::cout << "\"";
for (; i != e; ++i) {
const std::ssub_match& prefix = i->prefix();
std::cout << prefix.str();
}
std::cout << "\"\n";
// Before the patch: ""
// After the patch: "abc"
}
{ // `regex_replace` makes the problem very visible.
std::string replaced = std::regex_replace(str, empty_matching_pattern, "!");
std::cout << "\"" << replaced << "\"\n";
// Before the patch: "!!!!"
// After the patch: "!a!b!c!"
}
}
```
Fixes#64451
rdar://119912002
The "Emulated" sub-directories under "ArmSVE" and
"ArmSME" have been removed. Associated tests
have been moved up a directory and now include
the "REQUIRES" constraint for the arm-emulator.
This patch integrates CallStackRadixTreeBuilder into the V3 format,
reducing the profile size to about 27% of the V2 profile size.
- Serialization: writeMemProfCallStackArray just needs to write out
the radix tree array prepared by CallStackRadixTreeBuilder.
Mappings from CallStackIds to LinearCallStackIds are moved by new
function CallStackRadixTreeBuilder::takeCallStackPos.
- Deserialization: Deserializing a call stack is the same as
deserializing an array encoded in the obvious manner -- the length
followed by the payload, except that we need to follow a pointer to
the parent to take advantage of common prefixes once in a while.
This patch teaches LinearCallStackIdConverter to how to handle those
pointers.
Parameter "Version" is confusing in deserializeV012 and deserializeV3
because we also have member variable "Version". Fortunately,
parameter "Version" and member variable "Version" always have the same
value because IndexedMemProfReader::deserialize initializes the member
variable and passes it to deserializeV012 and deserializeV3.
This patch removes the parameter.
This PR depends on https://github.com/llvm/llvm-project/pull/90260
We changed the order in which functions are outlined in Machine
Outliner.
The formula for priority is found via a black-box Bayesian optimization
toolbox. Using this formula for sorting consistently reduces the
uncompressed size of large real-world mobile apps. We also ran a few
benchmarks using LLVM test suites, and showed that sorting by priority
consistently reduces the text segment size.
|run (CTMark/) |baseline (1)|priority (2)|diff (1 -> 2)|
|----------------|------------|------------|-------------|
|lencod |349624 |349264 |-0.1030% |
|SPASS |219672 |219480 |-0.0874% |
|kc |271956 |251200 |-7.6321% |
|sqlite3 |223920 |223708 |-0.0947% |
|7zip-benchmark |405364 |402624 |-0.6759% |
|bullet |139820 |139500 |-0.2289% |
|consumer-typeset|295684 |290196 |-1.8560% |
|pairlocalalign |72236 |72092 |-0.1993% |
|tramp3d-v4 |189572 |189292 |-0.1477% |
This is part of an enhanced version of machine outliner -- see
[RFC](https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-1-fulllto-part-2-thinlto-nolto-to-come/78732).
Summary:
The utilities `nvptx-arch` and `amdgpu-arch` are used to support
`--offload-arch=native` among other utilities in clang. However, these
rely on the GPU drivers to query the features. In certain cases these
drivers can become locked up, which will lead to indefinate hangs on any
compiler jobs running in the meantime.
This patch adds a ten second timeout period for these utilities before
it kills the job and errors out.
This change is an implementation of
https://github.com/llvm/llvm-project/issues/87367's investigation on
supporting IEEE math operations as intrinsics.
Which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
This PR is just for Tan.
Now that x86 tan backend landed:
https://github.com/llvm/llvm-project/pull/90503 we can add other
backends since the shared pieces are in tree now.
Changes:
- `llvm/include/llvm/Analysis/VecFuncs.def` - vectorization of tan for
arm64 backends.
- `llvm/lib/Target/AArch64/AArch64FastISel.cpp` - Add tan to the libcall
table
- `llvm/lib/Target/AArch64/AArch64ISelLowering.cpp` - Add tan expansion
for f128, f16, and vector\neon operations
- `llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp` define
`G_FTAN` as a legal arm64 instruction
resolves#94755
After the `output_shape` field was added to `expand_shape` ops,
dynamically sized expand shapes are now possible, but this was not
accounted for in the folder. This PR tightens the constraints of the
folder to fix this.
This PR adds fusion by collapsing and fusion by expansion patterns for
`tensor.pad` ops in ElementwiseOpFusion. Pad ops can be expanded or
collapsed as long as none of the padded dimensions will be expanded or
collapsed.
Extends delayed privatization support to `taraget .. private(..)`. With
this PR, `private` is support for `target` **only** is delayed
privatization mode.
The file OMP.td is becoming tedious to update by hand due to the
seemingly random ordering of various items in it. This patch brings
order to it by sorting most of the contents.
The clause definitions are sorted alphabetically with respect to the
spelling of the clause.[1]
The directive definitions are split into two leaf directives and
compound directives.[2] Within each, definitions are sorted
alphabetically with respect to the spelling, with the exception that
"end xyz" directives are placed immediately following the definition of
"xyz".[3]
Within each directive definition, the lists of clauses are also sorted
alphabetically.
[1] All spellings are made of lowercase letters, _, or space. Ordering
that includes non-letters follows the order assumed by the `sort`
utility.
[2] Compound directives refer to the consituent leaf directives, hence
the leaf definitions must come first.
[3] Some of the "end xyz" directives have properties derived from the
corresponding "xyz" directive. This exception guarantees that "xyz"
precedes the "end xyz".
Whole quad mode requires inserting a copy of the initial EXEC mask. In a
function that also uses llvm.amdgcn.init.exec, insert the COPY after
initializing EXEC.
Following of https://github.com/llvm/llvm-project/pull/86912
The motivation of the patch series is that, for a module interface unit
`X`, when the dependent modules of `X` changes, if the changes is not
relevant with `X`, we hope the BMI of `X` won't change. For the specific
patch, we hope if the changes was about irrelevant declaration changes,
we hope the BMI of `X` won't change. **However**, I found the patch
itself is not very useful in practice, since the adding or removing
declarations, will change the state of identifiers and types in most
cases.
That said, for the most simple example,
```
// partA.cppm
export module m:partA;
// partA.v1.cppm
export module m:partA;
export void a() {}
// partB.cppm
export module m:partB;
export void b() {}
// m.cppm
export module m;
export import :partA;
export import :partB;
// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
b();
}
```
the BMI of `onlyUseB` will change after we change the implementation of
`partA.cppm` to `partA.v1.cppm`. Since `partA.v1.cppm` introduces new
identifiers and types (the function prototype).
So in this patch, we have to write the tests as:
```
// partA.cppm
export module m:partA;
export int getA() { ... }
export int getA2(int) { ... }
// partA.v1.cppm
export module m:partA;
export int getA() { ... }
export int getA(int) { ... }
export int getA2(int) { ... }
// partB.cppm
export module m:partB;
export void b() {}
// m.cppm
export module m;
export import :partA;
export import :partB;
// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
b();
}
```
so that the new introduced declaration `int getA(int)` doesn't introduce
new identifiers and types, then the BMI of `onlyUseB` can keep
unchanged.
While it looks not so great, the patch should be the base of the patch
to erase the transitive change for identifiers and types since I don't
know how can we introduce new types and identifiers without introducing
new declarations. Given how tightly the relationship between
declarations, types and identifiers, I think we can only reach the ideal
state after we made the series for all of the three entties.
The design of the patch is similar to
https://github.com/llvm/llvm-project/pull/86912, which extends the
32-bit DeclID to 64-bit and use the higher bits to store the module file
index and the lower bits to store the Local Decl ID.
A slight difference is that we only use 48 bits to store the new DeclID
since we try to use the higher 16 bits to store the module ID in the
prefix of Decl class. Previously, we use 32 bits to store the module ID
and 32 bits to store the DeclID. I don't want to allocate additional
space so I tried to make the additional space the same as 64 bits. An
potential interesting thing here is about the relationship between the
module ID and the module file index. I feel we can get the module file
index by the module ID. But I didn't prove it or implement it. Since I
want to make the patch itself as small as possible. We can make it in
the future if we want.
Another change in the patch is the new concept Decl Index, which means
the index of the very big array `DeclsLoaded` in ASTReader. Previously,
the index of a loaded declaration is simply the Decl ID minus
PREDEFINED_DECL_NUMs. So there are some places they got used
ambiguously. But this patch tried to split these two concepts.
As https://github.com/llvm/llvm-project/pull/86912 did, the change will
increase the on-disk PCM file sizes. As the declaration ID may be the
most IDs in the PCM file, this can have the biggest impact on the size.
In my experiments, this change will bring 6.6% increase of the on-disk
PCM size. No compile-time performance regression observed. Given the
benefits in the motivation example, I think the cost is worthwhile.
This "small" set grows quite large and it's more performant to store
whether a node has been combined before in the node itself.
As this information is only relevant for nodes that are currently not in
the worklist, add a second state to the CombinerWorklistIndex (-2) to
indicate that a node is currently not in a worklist, but was combined
before.
This brings a substantial performance improvement.
Both `reverseBranchCondition` and `replaceBranchTarget` return a success boolean. But all-but-one caller ignores the return value, and the exception emits a fatal error on failure.
Thus, just return nothing.
This change seeks to add support for vendor flavoured SPIRV - more
specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that
carries some extra bits of information that are only usable by AMDGCN
targets, forfeiting absolute genericity to obtain greater expressiveness
for target features:
- AMDGCN inline ASM is allowed/supported, under the assumption that the
[SPV_INTEL_inline_assembly](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc)
extension is enabled/used
- AMDGCN target specific builtins are allowed/supported, under the
assumption that e.g. the `--spirv-allow-unknown-intrinsics` option is
enabled when using the downstream translator
- the featureset matches the union of AMDGCN targets' features
- the datalayout string is overspecified to affix both the program
address space and the alloca address space, the latter under the
assumption that the
[SPV_INTEL_function_pointers](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc)
extension is enabled/used, case in which the extant SPIRV datalayout
string would lead to pointers to function pointing to the private
address space, which would be wrong.
Existing AMDGCN tests are extended to cover this new target. It is
currently dormant / will require some additional changes, but I thought
I'd rather put it up for review to get feedback as early as possible. I
will note that an alternative option is to place this under AMDGPU, but
that seems slightly less natural, since this is still SPIRV, albeit
relaxed in terms of preconditions & constrained in terms of
postconditions, and only guaranteed to be usable on AMDGCN targets (it
is still possible to obtain pristine portable SPIRV through usage of the
flavoured target, though).
When using the -mframe-chain=aapcs or -mframe-chain=aapcs-leaf options,
we cannot use r11 as an allocatable register, even if
-fomit-frame-pointer is also used. This is so that r11 will always point
to a valid frame record, even if we don't create one in every function.
Regenerate these with --check-globals. The manual global CHECKS
get dropped during regeneration otherwise.
Annoyingly UTC insists on putting the globals directly before the
first function, so the first comment is a bit out of place now.
Currently, during a loop pipelining transformation, operations may be
hoisted out without any checks on the loop bounds, which leads to
incorrect transformations and unexpected behaviour. The following [issue
](https://github.com/llvm/llvm-project/issues/90870) describes the
problem more extensively, including an example.
The proposed fix adds some check in the loop bounds before and applies
the maximum hoisting.
This operation extracts a number of bits at a given offset and sign or
zero extends them, which is done by emitting it as a left shift followed
by a right shift.
This is being added for use in clang for C++ structured bindings of
bitfields that have offset or size that aren't a byte multiple. A new
operation is being added, instead of shifts being used directly, as it
makes correctly handling it in optimisations (which will be done in a
later patch) much easier.
As noted on #94466, NEON has ABDS/ABDU instructions but only handles them via intrinsics, plus some VABDL custom patterns.
This patch flags basic ABDS/ABDU for neon types as legal and updates all tablegen patterns to use abds/abdu instead.
Fixes#94466