
intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-01-24 17:01:00 +08:00

Author	SHA1	Message	Date
Vy Nguyen	35fa2ded2a	Reapply PR/87550 (#94625) Re-apply https://github.com/llvm/llvm-project/pull/87550 with fixes. Details: Some tests in fuchsia failed because of the newly added assertion. This was because `GetExceptionBreakpoint()` could be called before `g_dap.debugger` was initialized. The fix here is to lazily populate the list in `GetExceptionBreakpoint()` rather than assuming it has already been initialized. (There is some nuisance here because we can't simply populate it in `DAP::DAP()`, which is a global ctor and is called before `SBDebugger::Initialize()` is called.)	2024-06-07 11:27:52 -04:00
Konstantin Varlamov	e9adcc488f	[libc++][regex] Correctly adjust match prefix for zero-length matches. (#94550) For regex patterns that produce zero-length matches, there is one (imaginary) match in-between every character in the sequence being searched (as well as before the first character and after the last character). It's easiest to demonstrate using replacement: `std::regex_replace("abc"s, std::regex(""), "!")` should produce `!a!b!c!`, where each exclamation mark makes a zero-length match visible. Currently our implementation doesn't correctly set the prefix of each zero-length match, "swallowing" the characters separating the imaginary matches -- e.g. when going through zero-length matches within `abc`, the corresponding prefixes should be `{'', 'a', 'b', 'c'}`, but before this patch they will all be empty (`{'', '', '', ''}`). This happens in the implementation of `regex_iterator::operator++`. Note that the Standard spells out quite explicitly that the prefix might need to be adjusted when dealing with zero-length matches in [`re.regiter.incr`](http://eel.is/c++draft/re.regiter.incr): > In all cases in which the call to `regex_search` returns `true`, `match.prefix().first` shall be equal to the previous value of `match[0].second`... It is unspecified how the implementation makes these adjustments. [Reproduction example](https://godbolt.org/z/8ve6G3dav)
```cpp
#include <iostream>
#include <regex>
#include <string>

int main() {
  std::string str = "abc";
  std::regex empty_matching_pattern("");

  {
    // The underlying problem is that `regex_iterator::operator++` doesn't update
    // the prefix correctly.
    std::sregex_iterator i(str.begin(), str.end(), empty_matching_pattern), e;
    std::cout << "\"";
    for (; i != e; ++i) {
      const std::ssub_match& prefix = i->prefix();
      std::cout << prefix.str();
    }
    std::cout << "\"\n";
    // Before the patch: ""
    // After the patch: "abc"
  }

  {
    // `regex_replace` makes the problem very visible.
    std::string replaced = std::regex_replace(str, empty_matching_pattern, "!");
    std::cout << "\"" << replaced << "\"\n";
    // Before the patch: "!!!!"
    // After the patch: "!a!b!c!"
  }
}
```
Fixes #64451 rdar://119912002	2024-06-07 11:15:02 -04:00
Florian Hahn	4f9c0fa223	[LV] Add test with dead load and vector pointer.	2024-06-07 16:14:02 +01:00
David Green	f7018ba0ee	[AArch64] Add patterns for add(uzp1(x,y), uzp2(x, y)) -> addp. If we are extracting the even lanes and the odd lanes and adding them, we can use an addp instruction.	2024-06-07 16:09:57 +01:00
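The identity behind the pattern above can be checked with a small lane-level model. This is only a sketch under assumptions: four 32-bit lanes per vector, and the helper names `lane`, `uzp1`, `uzp2`, `add`, `addp` and `checkEquivalence` are illustrative functions written for this example, not LLVM or ACLE APIs. `uzp1` takes the even lanes of the concatenation `x:y`, `uzp2` takes the odd lanes, and `addp` adds adjacent pairs of the same concatenation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 4-lane vector type used only for this illustration.
using Vec4 = std::array<std::uint32_t, 4>;

// Lane i of the 8-lane concatenation x:y (x supplies lanes 0-3).
static std::uint32_t lane(const Vec4 &x, const Vec4 &y, std::size_t i) {
  return i < 4 ? x[i] : y[i - 4];
}

// Even lanes of the concatenation.
Vec4 uzp1(const Vec4 &x, const Vec4 &y) {
  Vec4 r{};
  for (std::size_t i = 0; i < 4; ++i) r[i] = lane(x, y, 2 * i);
  return r;
}

// Odd lanes of the concatenation.
Vec4 uzp2(const Vec4 &x, const Vec4 &y) {
  Vec4 r{};
  for (std::size_t i = 0; i < 4; ++i) r[i] = lane(x, y, 2 * i + 1);
  return r;
}

// Lane-wise add (unsigned, so wrap-around is well defined).
Vec4 add(const Vec4 &a, const Vec4 &b) {
  Vec4 r{};
  for (std::size_t i = 0; i < 4; ++i) r[i] = a[i] + b[i];
  return r;
}

// Pairwise add: result lane i is the sum of lanes 2i and 2i+1 of x:y.
Vec4 addp(const Vec4 &x, const Vec4 &y) {
  Vec4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = lane(x, y, 2 * i) + lane(x, y, 2 * i + 1);
  return r;
}

// Asserts that both forms agree for the given inputs.
void checkEquivalence(const Vec4 &x, const Vec4 &y) {
  assert(add(uzp1(x, y), uzp2(x, y)) == addp(x, y));
}
```

Calling `checkEquivalence` on any pair of vectors should pass, since both sides compute the sum of lanes `2i` and `2i+1` for each result lane.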
Jake Egan	790992dd40	[libc++][test][AIX] Only XFAIL atomic tests for before clang 19 (#94646 ) These tests pass on 64-bit. They were fixed by `5fdd094837` on 32-bit. So XFAIL only for 32-bit before clang 19.	2024-06-07 11:06:42 -04:00
c8ef	b25b1db819	[KnownBits] Remove `hasConflict()` assertions (#94568 ) Allow KnownBits to represent "always poison" values via conflict. close: #94436	2024-06-07 17:01:22 +02:00
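To illustrate what a "conflict" means here, the following is a toy 8-bit model, not the real `llvm::KnownBits` class. The struct `ToyKnownBits` and the helpers `admits` and `countAdmitted` are hypothetical names made up for this sketch. A bit that is recorded as both known-zero and known-one cannot be satisfied by any concrete value, which is why such a state can stand for a value with no valid result.

```cpp
#include <cassert>
#include <cstdint>

// Toy model: each mask bit records a bit known to be 0 (Zero) or 1 (One).
struct ToyKnownBits {
  std::uint8_t Zero;
  std::uint8_t One;

  // A bit claimed to be both 0 and 1 is a conflict.
  bool hasConflict() const { return (Zero & One) != 0; }

  // True if the concrete value v is consistent with the known bits.
  bool admits(std::uint8_t v) const {
    return (v & Zero) == 0 && (static_cast<std::uint8_t>(~v) & One) == 0;
  }
};

// Counts how many of the 256 possible values are consistent with k.
unsigned countAdmitted(const ToyKnownBits &k) {
  unsigned n = 0;
  for (unsigned v = 0; v < 256; ++v)
    if (k.admits(static_cast<std::uint8_t>(v))) ++n;
  // A conflicting state admits no concrete value at all.
  assert(!k.hasConflict() || n == 0);
  return n;
}
```

In this model a state with three unknown bits admits eight values, while any conflicting state admits none.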
Nico Weber	fc95645e37	[gn] port `cb7690af09` (ntdll dep)	2024-06-07 10:47:51 -04:00
Nico Weber	d099d6c76b	[gn] port `33a6ce1837` (check-clang obj2yaml dep)	2024-06-07 10:43:06 -04:00
Mubashar Ahmad	7d69095fd5	[mlir][vector] Remove Emulated Sub-directory (#94742 ) The "Emulated" sub-directories under "ArmSVE" and "ArmSME" have been removed. Associated tests have been moved up a directory and now include the "REQUIRES" constraint for the arm-emulator.	2024-06-07 15:38:50 +01:00
Kazu Hirata	c348e265bd	[memprof] Use CallStackRadixTreeBuilder in the V3 format (#94708) This patch integrates CallStackRadixTreeBuilder into the V3 format, reducing the profile size to about 27% of the V2 profile size.
- Serialization: writeMemProfCallStackArray just needs to write out the radix tree array prepared by CallStackRadixTreeBuilder. Mappings from CallStackIds to LinearCallStackIds are moved by the new function CallStackRadixTreeBuilder::takeCallStackPos.
- Deserialization: Deserializing a call stack is the same as deserializing an array encoded in the obvious manner -- the length followed by the payload -- except that once in a while we need to follow a pointer to the parent to take advantage of common prefixes. This patch teaches LinearCallStackIdConverter how to handle those pointers.	2024-06-07 07:19:36 -07:00
jeanPerier	55bdb36e39	[flang] lower SIZE and SIZEOF for assumed-ranks (#94684 )	2024-06-07 16:09:56 +02:00
Kazu Hirata	eb33e462ba	[memprof] Clean up IndexedMemProfReader (NFC) (#94710 ) Parameter "Version" is confusing in deserializeV012 and deserializeV3 because we also have member variable "Version". Fortunately, parameter "Version" and member variable "Version" always have the same value because IndexedMemProfReader::deserialize initializes the member variable and passes it to deserializeV012 and deserializeV3. This patch removes the parameter.	2024-06-07 07:04:17 -07:00
Xuan Zhang	3b16630c26	[MachineOutliner] Sort by Benefit to Cost Ratio (#90264) This PR depends on https://github.com/llvm/llvm-project/pull/90260 We changed the order in which functions are outlined in Machine Outliner. The formula for priority is found via a black-box Bayesian optimization toolbox. Using this formula for sorting consistently reduces the uncompressed size of large real-world mobile apps. We also ran a few benchmarks using LLVM test suites, and showed that sorting by priority consistently reduces the text segment size.

| run (CTMark/)    | baseline (1) | priority (2) | diff (1 -> 2) |
|------------------|--------------|--------------|---------------|
| lencod           | 349624       | 349264       | -0.1030%      |
| SPASS            | 219672       | 219480       | -0.0874%      |
| kc               | 271956       | 251200       | -7.6321%      |
| sqlite3          | 223920       | 223708       | -0.0947%      |
| 7zip-benchmark   | 405364       | 402624       | -0.6759%      |
| bullet           | 139820       | 139500       | -0.2289%      |
| consumer-typeset | 295684       | 290196       | -1.8560%      |
| pairlocalalign   | 72236        | 72092        | -0.1993%      |
| tramp3d-v4       | 189572       | 189292       | -0.1477%      |

This is part of an enhanced version of machine outliner -- see [RFC](https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-1-fulllto-part-2-thinlto-nolto-to-come/78732).	2024-06-07 06:50:13 -07:00
Liao Chunyu	2afea72968	[RISCV] Codegen support for XCVmem extension (#76916 ) All post-Increment load/store, register-register load/store spec: https://github.com/openhwgroup/cv32e40p/blob/master/docs/source/instruction_set_extensions.rst Contributors: @CharKeaney, @jeremybennett, @lewis-revill, @NandniJamnadas, @PaoloS02, @serkm, @simonpcook, @xingmingjie, @realqhc	2024-06-07 21:45:49 +08:00
Joseph Huber	2981f3a284	[Clang] Add timeout for GPU detection utilities (#94751) Summary: The utilities `nvptx-arch` and `amdgpu-arch` are used to support `--offload-arch=native`, among other uses in clang. However, these rely on the GPU drivers to query the features. In certain cases these drivers can become locked up, which will lead to indefinite hangs on any compiler jobs running in the meantime. This patch adds a ten-second timeout for these utilities, after which it kills the job and errors out.	2024-06-07 08:45:35 -05:00
David Green	c5fcc2ea55	[AArch64] Add addp from shuffles tests. NFC	2024-06-07 14:42:22 +01:00
Farzon Lotfi	2f0308ed02	[arm64] Add tan intrinsic lowering (#94545) This change is an implementation of https://github.com/llvm/llvm-project/issues/87367's investigation on supporting IEEE math operations as intrinsics, which was discussed in this RFC: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 This PR is just for Tan. Now that the x86 tan backend has landed (https://github.com/llvm/llvm-project/pull/90503), we can add other backends since the shared pieces are in tree now. Changes:
- `llvm/include/llvm/Analysis/VecFuncs.def` - vectorization of tan for arm64 backends.
- `llvm/lib/Target/AArch64/AArch64FastISel.cpp` - Add tan to the libcall table.
- `llvm/lib/Target/AArch64/AArch64ISelLowering.cpp` - Add tan expansion for f128, f16, and vector/neon operations.
- `llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp` - Define `G_FTAN` as a legal arm64 instruction.

Resolves #94755	2024-06-07 09:42:06 -04:00
David Green	ac02168990	[ARM] Clean up neon_vabd.ll, vaba.ll and vabd.ll tests a bit. NFC Change the target triple to remove some unnecessary instructions.	2024-06-07 14:31:15 +01:00
Max191	2117677e30	[mlir] Fix bugs in expand_shape patterns after semantics changes (#94631 ) After the `output_shape` field was added to `expand_shape` ops, dynamically sized expand shapes are now possible, but this was not accounted for in the folder. This PR tightens the constraints of the folder to fix this.	2024-06-07 09:09:51 -04:00
Max191	c886d66da0	[mlir] Add reshape propagation patterns for tensor.pad (#94489 ) This PR adds fusion by collapsing and fusion by expansion patterns for `tensor.pad` ops in ElementwiseOpFusion. Pad ops can be expanded or collapsed as long as none of the padded dimensions will be expanded or collapsed.	2024-06-07 09:09:06 -04:00
Ryan Holt	5b2f7a1971	[mlir][linalg] Support lowering unpack with outer_dims_perm (#94477 ) This commit adds support for lowering `tensor.unpack` with a non-identity `outer_dims_perm`. This was previously left as a not-yet-implemented case.	2024-06-07 09:00:28 -04:00
Joseph Huber	2c3723d321	[libc] Correctly pass the C++ standard to NVPTX internal builds Summary: The NVPTX build wasn't getting the `C++20` standard necessary for a few files.	2024-06-07 07:55:06 -05:00
Kareem Ergawy	913a8244fe	[flang][OpenMP] Lower `target .. private(..)` to `omp.private` ops (#94195) Extends delayed privatization support to `target .. private(..)`. With this PR, `private` is supported for `target` only in delayed privatization mode.	2024-06-07 14:44:01 +02:00
Krzysztof Parzyszek	acc927ac23	[Frontend][OpenMP] Sort all the things in OMP.td, NFC (#94653) The file OMP.td is becoming tedious to update by hand due to the seemingly random ordering of various items in it. This patch brings order to it by sorting most of the contents. The clause definitions are sorted alphabetically with respect to the spelling of the clause.[1] The directive definitions are split into two groups: leaf directives and compound directives.[2] Within each, definitions are sorted alphabetically with respect to the spelling, with the exception that "end xyz" directives are placed immediately following the definition of "xyz".[3] Within each directive definition, the lists of clauses are also sorted alphabetically. [1] All spellings are made of lowercase letters, _, or space. Ordering that includes non-letters follows the order assumed by the `sort` utility. [2] Compound directives refer to the constituent leaf directives, hence the leaf definitions must come first. [3] Some of the "end xyz" directives have properties derived from the corresponding "xyz" directive. This exception guarantees that "xyz" precedes the "end xyz".	2024-06-07 07:31:37 -05:00
Jay Foad	df6750eaa8	[AMDGPU] Fix interaction between WQM and llvm.amdgcn.init.exec (#93680 ) Whole quad mode requires inserting a copy of the initial EXEC mask. In a function that also uses llvm.amdgcn.init.exec, insert the COPY after initializing EXEC.	2024-06-07 13:23:15 +01:00
Chuanqi Xu	5a0181f568	[serialization] no transitive decl change (#92083) Follow-up to https://github.com/llvm/llvm-project/pull/86912 The motivation of the patch series is that, for a module interface unit `X`, when the modules that `X` depends on change in ways that are not relevant to `X`, we hope the BMI of `X` won't change. For this specific patch, we hope that if the changes only touch irrelevant declarations, the BMI of `X` won't change. However, I found the patch itself is not very useful in practice, since adding or removing declarations will change the state of identifiers and types in most cases. That said, for the simplest example,
```
// partA.cppm
export module m:partA;

// partA.v1.cppm
export module m:partA;
export void a() {}

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() { b(); }
```
the BMI of `onlyUseB` will change after we change the implementation of `partA.cppm` to `partA.v1.cppm`, since `partA.v1.cppm` introduces new identifiers and types (the function prototype). So in this patch, we have to write the tests as:
```
// partA.cppm
export module m:partA;
export int getA() { ... }
export int getA2(int) { ... }

// partA.v1.cppm
export module m:partA;
export int getA() { ... }
export int getA(int) { ... }
export int getA2(int) { ... }

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() { b(); }
```
so that the newly introduced declaration `int getA(int)` doesn't introduce new identifiers and types, and the BMI of `onlyUseB` can stay unchanged. 
While it doesn't look great, this patch should be the base for the patches that erase the transitive change for identifiers and types, since I don't know how we can introduce new types and identifiers without introducing new declarations. Given how tight the relationship between declarations, types and identifiers is, I think we can only reach the ideal state after we have done the series for all three entities. The design of the patch is similar to https://github.com/llvm/llvm-project/pull/86912, which extends the 32-bit DeclID to 64 bits and uses the higher bits to store the module file index and the lower bits to store the local Decl ID. A slight difference is that we only use 48 bits to store the new DeclID, since we try to use the higher 16 bits to store the module ID in the prefix of the Decl class. Previously, we used 32 bits to store the module ID and 32 bits to store the DeclID. I didn't want to allocate additional space, so I tried to keep the combined size at 64 bits. A potentially interesting thing here is the relationship between the module ID and the module file index. I feel we can get the module file index from the module ID, but I didn't prove it or implement it, since I want to keep the patch itself as small as possible. We can do that in the future if we want. Another change in the patch is the new concept of a Decl Index, which means the index into the very big array `DeclsLoaded` in ASTReader. Previously, the index of a loaded declaration was simply the Decl ID minus PREDEFINED_DECL_NUMs, so in some places the two were used interchangeably. This patch tries to separate these two concepts. As https://github.com/llvm/llvm-project/pull/86912 did, the change will increase the on-disk PCM file sizes. As declaration IDs may be the most numerous IDs in the PCM file, this can have the biggest impact on the size. In my experiments, this change brings a 6.6% increase in the on-disk PCM size. No compile-time performance regression was observed. 
Given the benefits in the motivation example, I think the cost is worthwhile.	2024-06-07 20:21:55 +08:00
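The bit layout described in this commit message can be sketched as follows. The message only states that the new DeclID fits in 48 bits, with the module file index in the higher bits and the local Decl ID in the lower bits, leaving the top 16 bits of the 64-bit word free. The exact split inside those 48 bits is not given, so the 16/32 split below is an assumption, and the names `makeDeclID`, `getModuleFileIndex` and `getLocalDeclID` are hypothetical helpers rather than Clang APIs.

```cpp
#include <cassert>
#include <cstdint>

// Assumed split of the 48-bit DeclID (not taken from the patch):
// bits 32..47 hold the module file index, bits 0..31 the local Decl ID.
constexpr unsigned kLocalBits = 32;
constexpr unsigned kIndexBits = 16;

// Packs a module file index and a local ID into the low 48 bits.
std::uint64_t makeDeclID(std::uint32_t moduleFileIndex, std::uint32_t localID) {
  assert(moduleFileIndex < (1u << kIndexBits) && "index must fit in 16 bits");
  return (static_cast<std::uint64_t>(moduleFileIndex) << kLocalBits) | localID;
}

// Recovers the module file index from the higher bits of the DeclID.
std::uint32_t getModuleFileIndex(std::uint64_t declID) {
  return static_cast<std::uint32_t>(declID >> kLocalBits) &
         ((1u << kIndexBits) - 1);
}

// Recovers the local Decl ID from the lower bits of the DeclID.
std::uint32_t getLocalDeclID(std::uint64_t declID) {
  return static_cast<std::uint32_t>(declID & 0xFFFFFFFFu);
}
```

With this layout the top 16 bits of the packed value are always zero, which is what would leave room for a separate 16-bit module ID in the same 64-bit word.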
Timm Bäder	b8cc85b318	[clang][Interp] Limit lambda capture lazy visiting to actual captures Check this by looking at the VarDecl.	2024-06-07 13:29:23 +02:00
Timm Bäder	9eb8a130c5	[clang][Interp][NFC] Fix a const-correctness warning	2024-06-07 13:29:23 +02:00
Timm Bäder	9ece3eb145	[clang][Interp] Check ConstantExpr results for initialization They need to be fully initialized, similar to global variables.	2024-06-07 13:29:23 +02:00
aengelke	74d62c2f73	[CodeGen][SDAG] Remove CombinedNodes SmallPtrSet (#94609 ) This "small" set grows quite large and it's more performant to store whether a node has been combined before in the node itself. As this information is only relevant for nodes that are currently not in the worklist, add a second state to the CombinerWorklistIndex (-2) to indicate that a node is currently not in a worklist, but was combined before. This brings a substantial performance improvement.	2024-06-07 13:17:27 +02:00
Nathan Sidwell	3fefb3c598	[BOLT][NFC] Infallible fns return void (#92018) Both `reverseBranchCondition` and `replaceBranchTarget` return a success boolean. But all callers except one ignore the return value, and the exception emits a fatal error on failure. Thus, just return nothing.	2024-06-07 06:59:52 -04:00
Alex Voicu	88e2bb4092	[clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (#89796) This change seeks to add support for vendor flavoured SPIRV - more specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that carries some extra bits of information that are only usable by AMDGCN targets, forfeiting absolute genericity to obtain greater expressiveness for target features:
- AMDGCN inline ASM is allowed/supported, under the assumption that the [SPV_INTEL_inline_assembly](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc) extension is enabled/used
- AMDGCN target specific builtins are allowed/supported, under the assumption that e.g. the `--spirv-allow-unknown-intrinsics` option is enabled when using the downstream translator
- the featureset matches the union of AMDGCN targets' features
- the datalayout string is overspecified to affix both the program address space and the alloca address space, the latter under the assumption that the [SPV_INTEL_function_pointers](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc) extension is enabled/used, in which case the extant SPIRV datalayout string would lead to pointers to function pointing to the private address space, which would be wrong.

Existing AMDGCN tests are extended to cover this new target. It is currently dormant / will require some additional changes, but I thought I'd rather put it up for review to get feedback as early as possible. I will note that an alternative option is to place this under AMDGPU, but that seems slightly less natural, since this is still SPIRV, albeit relaxed in terms of preconditions & constrained in terms of postconditions, and only guaranteed to be usable on AMDGCN targets (it is still possible to obtain pristine portable SPIRV through usage of the flavoured target, though).	2024-06-07 11:50:23 +01:00
Haojian Wu	6fe5428ecb	[Flang] Handle the newly-added "Reserved" FramePointerKind for `1a5239251e`	2024-06-07 12:49:41 +02:00
LLVM GN Syncbot	d3e531cf37	[gn build] Port `e622996edd`	2024-06-07 10:42:32 +00:00
David Spickett	54c5dbe7c3	[clang][test] Skip interpreter value test on Arm 32 bit https://github.com/llvm/llvm-project/pull/89811 caused this test to fail, somehow. I think it may not be at fault, but actually be exposing some existing undefined behaviour, see https://github.com/llvm/llvm-project/issues/94741. Skipping this for now to get the bots green again.	2024-06-07 10:38:25 +00:00
WANG Rui	537165bb02	[NFC][LoongArch] Update test for #94590	2024-06-07 18:29:30 +08:00
Jonathan Thackray	917afa8832	[ARM] Add support for Cortex-R52+ (#94633 ) Cortex-R52+ is an Armv8-R AArch32 CPU. Technical Reference Manual for Cortex-R52+: https://developer.arm.com/documentation/102199/latest/	2024-06-07 11:03:32 +01:00
Eisuke Kawashima	fd45dcca26	fix(mlir/**.py): fix comparison to None (#94019 ) from PEP8 (https://peps.python.org/pep-0008/#programming-recommendations): > Comparisons to singletons like None should always be done with is or is not, never the equality operators. Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>	2024-06-07 12:03:07 +02:00
Simon Pilgrim	af3ffff34f	[DAG] Always allow folding XOR patterns to ABS pre-legalization (#94601 ) Removes residual ARM handling for vXi64 ABS nodes to prevent infinite loops.	2024-06-07 11:02:50 +01:00
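For readers unfamiliar with XOR-based absolute-value idioms, here is a small sketch of one well-known form: with `y` equal to the sign bit of `x` broadcast to all bits (an arithmetic shift right by the bit width minus one), `xor(add(x, y), y)` computes `abs(x)`. Whether this is the exact pattern the patch touches is an assumption. The function names `absViaXor` and `verifyFold` are hypothetical, and the arithmetic shift is modelled with a conditional to keep the code portable.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Computes abs(x) as xor(add(x, y), y), where y is all ones for negative x
// and zero otherwise (this models an arithmetic shift right by 31).
std::uint32_t absViaXor(std::int32_t x) {
  const std::uint32_t ux = static_cast<std::uint32_t>(x);
  const std::uint32_t y = x < 0 ? 0xFFFFFFFFu : 0u;
  return (ux + y) ^ y; // unsigned arithmetic wraps, so INT32_MIN maps to 2^31
}

// Compares the idiom against std::abs where std::abs is well defined.
void verifyFold(std::int32_t x) {
  if (x != INT32_MIN)
    assert(absViaXor(x) == static_cast<std::uint32_t>(std::abs(x)));
}
```

For negative `x`, adding all ones subtracts one and the XOR then flips every bit, which together negate the value in two's complement. For non-negative `x` both steps are no-ops.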
Oliver Stannard	1a5239251e	[ARM] r11 is reserved when using -mframe-chain=aapcs (#86951 ) When using the -mframe-chain=aapcs or -mframe-chain=aapcs-leaf options, we cannot use r11 as an allocatable register, even if -fomit-frame-pointer is also used. This is so that r11 will always point to a valid frame record, even if we don't create one in every function.	2024-06-07 10:58:10 +01:00
Mubashar Ahmad	b87a80d4eb	[mlir][vector] Add n-d deinterleave lowering (#94237) This patch implements the lowering for vector deinterleave for vectors of n dimensions. The process involves unrolling the n-d vector into a series of one-dimensional vectors. The deinterleave operation is then used on these vectors. From:
```
%0, %1 = vector.deinterleave %a : vector<2x8xi8> -> vector<2x4xi8>
```
To:
```
%cst = arith.constant dense<0> : vector<2x4xi32>
%0 = vector.extract %arg0[0] : vector<8xi32> from vector<2x8xi32>
%res1, %res2 = vector.deinterleave %0 : vector<8xi32> -> vector<4xi32>
%1 = vector.insert %res1, %cst [0] : vector<4xi32> into vector<2x4xi32>
%2 = vector.insert %res2, %cst [0] : vector<4xi32> into vector<2x4xi32>
%3 = vector.extract %arg0[1] : vector<8xi32> from vector<2x8xi32>
%res1_0, %res2_1 = vector.deinterleave %3 : vector<8xi32> -> vector<4xi32>
%4 = vector.insert %res1_0, %1 [1] : vector<4xi32> into vector<2x4xi32>
%5 = vector.insert %res2_1, %2 [1] : vector<4xi32> into vector<2x4xi32>
...etc.
```
	2024-06-07 10:57:00 +01:00
Nikita Popov	8719cb88e3	[SimplifyCFG] Regenerate switch to lookup tests (NFC) Regenerate these with --check-globals. The manual global CHECKS get dropped during regeneration otherwise. Annoyingly UTC insists on putting the globals directly before the first function, so the first comment is a bit out of place now.	2024-06-07 11:51:51 +02:00
Nikita Popov	1934c1aa36	[SimplifyCFG] Remove bogus UTC line from test (NFC) The check lines in this test were clearly not generated by UTC.	2024-06-07 11:51:51 +02:00
Timm Bäder	3a31eaeac8	[clang][Interp] Fix refers_to_enclosing_variable_or_capture DREs They do not count into lambda captures, so visit them lazily.	2024-06-07 11:48:44 +02:00
Timm Bäder	5d6acf8196	[clang][Interp][NFC] Properly assign block pointer Pointee	2024-06-07 11:48:44 +02:00
Fotis Kounelis	192cd68512	Add checks before hoisting out in loop pipelining (#90872) Currently, during a loop pipelining transformation, operations may be hoisted out without any checks on the loop bounds, which leads to incorrect transformations and unexpected behaviour. The following [issue](https://github.com/llvm/llvm-project/issues/90870) describes the problem more extensively, including an example. The proposed fix adds a check on the loop bounds before hoisting and then applies the maximum hoisting.	2024-06-07 11:46:01 +02:00
John Brawn	1721c14e8e	[DebugInfo] Add DW_OP_LLVM_extract_bits (#93990 ) This operation extracts a number of bits at a given offset and sign or zero extends them, which is done by emitting it as a left shift followed by a right shift. This is being added for use in clang for C++ structured bindings of bitfields that have offset or size that aren't a byte multiple. A new operation is being added, instead of shifts being used directly, as it makes correctly handling it in optimisations (which will be done in a later patch) much easier.	2024-06-07 10:38:23 +01:00
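The "left shift followed by a right shift" approach described above can be sketched on a 64-bit value. This is an illustration only: the function names `extractBitsZExt` and `extractBitsSExt` are hypothetical, the offset is assumed to count from the least significant bit, and the code is not the DWARF expression the patch emits.

```cpp
#include <cassert>
#include <cstdint>

// Zero-extending extract: shift the field up to the top of the word,
// then shift it back down with a logical (unsigned) right shift.
std::uint64_t extractBitsZExt(std::uint64_t value, unsigned offset,
                              unsigned size) {
  assert(size > 0 && offset + size <= 64);
  const std::uint64_t top = value << (64 - offset - size);
  return top >> (64 - size);
}

// Sign-extending extract: same left shift, then an arithmetic right shift.
// Signed >> is arithmetic on mainstream compilers and guaranteed from C++20.
std::int64_t extractBitsSExt(std::uint64_t value, unsigned offset,
                             unsigned size) {
  assert(size > 0 && offset + size <= 64);
  const std::uint64_t top = value << (64 - offset - size);
  return static_cast<std::int64_t>(top) >> (64 - size);
}
```

The left shift discards the bits above the field and the right shift discards the bits below it, so the choice between a logical and an arithmetic right shift is the only difference between the zero-extending and sign-extending variants.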
Tom Stellard	0d1b3671a9	[CMake][Release] Use the TXZ cpack generator for binaries (#90138 )	2024-06-07 02:27:59 -07:00
Simon Pilgrim	c0b468523c	[ARM] Add NEON support for ISD::ABDS/ABDU nodes. (#94504 ) As noted on #94466, NEON has ABDS/ABDU instructions but only handles them via intrinsics, plus some VABDL custom patterns. This patch flags basic ABDS/ABDU for neon types as legal and updates all tablegen patterns to use abds/abdu instead. Fixes #94466	2024-06-07 10:18:45 +01:00
Chen Zheng	3453dedfaf	[PowerPC] return correct frame address for frameaddress intrinsic	2024-06-07 05:17:22 -04:00


501089 Commits