intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-02-08 17:28:30 +08:00

Author	SHA1	Message	Date
Andrzej Warzyński	d7753989ea	[mlir][linalg] Add e2e test for linalg.mmt4d + pack/unpack (#84964 ) This is a follow-up for #81790. This patch basically extends: * test/Integration/Dialect/Linalg/CPU/mmt4d.mlir with pack/unpack ops so that the overall computation is a matrix multiplication (as opposed to linalg.mmt4d). For comparison (and to make it easier to verify correctness), linalg.matmul is also included in the test.	2024-03-28 14:52:08 +00:00
Alexey Bataev	d7975c9d93	[SLP]Add better minbitwidth analysis for udiv/urem instructions. Adds improved bitwidth analysis for udiv/urem instructions. The analysis is based on similar version in InstCombiner. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/85928	2024-03-28 10:35:15 -04:00
Alfie Richards	ff870aeeb7	[ARM] Add reference to `ARMAsmParser` in `ARMOperand` (#86110 )	2024-03-28 14:06:40 +00:00
Yingwei Zheng	a515ea553f	[OCaml] Fix buildbot failure caused by `caa2258`. NFC. Closes #86944.	2024-03-28 22:00:04 +08:00
Akira Hatanaka	84780af4b0	[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86923 ) To authenticate pointers, CodeGen needs access to the key and discriminators that were used to sign the pointer. That information is sometimes known from the context, but not always, which is why `Address` needs to hold that information. This patch adds methods and data members to `Address`, which will be needed in subsequent patches to authenticate signed pointers, and uses the newly added methods throughout CodeGen. Although this patch isn't strictly NFC as it causes CodeGen to use different code paths in some cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any changes in functionality as it doesn't add any information needed for authentication. In addition to the changes mentioned above, this patch introduces class `RawAddress`, which contains a pointer that we know is unsigned, and adds several new functions for creating `Address` and `LValue` objects. This reapplies `d9a685a9dd`, which was reverted because it broke ubsan bots. There seems to be a bug in coroutine code-gen, which is causing EmitTypeCheck to use the wrong alignment. For now, pass alignment zero to EmitTypeCheck so that it can compute the correct alignment based on the passed type (see function EmitCXXMemberOrOperatorMemberCallExpr).	2024-03-28 06:54:36 -07:00
Amy Kwan	a3efc53f16	[AIX][TLS] Produce a faster local-exec access sequence for the "aix-small-tls" global variable attribute (#83053 ) Similar to `3f46e5453d`, this patch allows the backend to produce a faster access sequence for the local-exec TLS model, where loading from the TOC can be avoided, for local-exec TLS variables that are annotated with the "aix-small-tls" attribute. The expectation is for local-exec TLS variables to be set with this attribute through PGO. Furthermore, the optimized access sequence is only generated for local-exec TLS variables annotated with "aix-small-tls", only if they are less than ~32KB in size.	2024-03-28 09:18:45 -04:00
Rolf Morel	eacda36c7d	[SCF][Transform] Add support for scf.for in LoopFuseSibling op (#81495 ) Adds support for fusing two scf.for loops occurring in the same block. Uses the rudimentary checks already in place for scf.forall (like the target loop's operands being dominated by the source loop). - Fixes a bug in the dominance check whereby it was checked that values in the target loop themselves dominated the source loop rather than the ops that define these operands. - Renames the LoopFuseSibling op to LoopFuseSiblingOp. - Updates LoopFuseSiblingOp's description. - Adds tests for using LoopFuseSiblingOp on scf.for loops, including one which fails without the fix for the dominance check. - Adds tests checking the different failure modes of the dominance checker. - Adds test for case whereby scf.yield is automatically generated when there are no loop-carried variables.	2024-03-28 14:13:08 +01:00
Oleksandr "Alex" Zinenko	91856b34e3	[mlir] move MatchOpInterface under Transform/Interfaces (#86899 ) This is similar to the TransformOpInterface move.	2024-03-28 14:00:22 +01:00
Egor Zhdan	96c8e2e88c	[APINotes] For a re-exported module, look for APINotes in the re-exporting module's apinotes file This upstreams https://github.com/apple/llvm-project/pull/8063. If module FooCore is re-exported through module Foo (by using `export_as` in the modulemap), look for attributes of FooCore symbols in Foo.apinotes file. Swift bundles `std.apinotes` file that adds Swift-specific attributes to the C++ stdlib symbols. In recent versions of libc++, module std got split into multiple top-level modules, each of them is re-exported through std. This change allows us to keep using a single modulemap file for all supported C++ stdlibs. rdar://121680760	2024-03-28 12:59:57 +00:00
Leandro Lupori	a2982a29fd	Revert "[compiler-rt] Allow building builtins.a without a libc (#86737 )" This reverts commit `8669225863`. Reverting due to buildbot failures.	2024-03-28 09:56:14 -03:00
Krzysztof Parzyszek	e8e80d07c8	[OpenMP] Apply post-commit review comments in PR86289, NFC (#86828 ) Fix include guard name, fix typo, add comments with OpenMP spec sections.	2024-03-28 07:52:47 -05:00
VitaNuo	56a10a3c79	[clangd][trace] Fix comment to mention that trace spans are measured … (#86938 ) …in milliseconds rather than seconds.	2024-03-28 13:48:09 +01:00
Krzysztof Parzyszek	79199753fd	[flang][OpenMP] Make several function local to OpenMP.cpp, NFC (#86726 ) There were several functions, mostly reduction-related, that were only called from OpenMP.cpp. Remove them from OpenMP.h, and make them local in OpenMP.cpp: - genOpenMPReduction - findReductionChain - getConvertFromReductionOp - updateReduction - removeStoreOp Also, move the function bodies out of the "public" section.	2024-03-28 07:46:01 -05:00
Zaara Syeda	4ddd4ed7fe	[AIX][TOC] -mtocdata/-mno-tocdata fix non deterministic iteration order (#86840 ) Failure with testcase toc-conf.c observed when building with LLVM_REVERSE_ITERATION=ON. Changing from using llvm::StringSet to std::set<llvm::StringRef> to ensure iteration order is deterministic. Note: the functionality of the feature does not require a specific iteration order, however, this will allow testing to be consistent. From llvm docs: The advantages of std::set are that its iterators are stable (deleting or inserting an element from the set does not affect iterators or pointers to other elements) and that iteration over the set is guaranteed to be in sorted order.	2024-03-28 08:37:25 -04:00
Haojian Wu	a042fcbe45	[clang] Bailout when the substitution of template parameter mapping is invalid. (#86869 ) Fixes #86757 We failed to handle the invalid case when substituting into the parameter mapping of a constraint during normalization. The constructor of `InstantiatingTemplate` will bail out (no `CodeSynthesisContext` will be added to the instantiation stack) if there was a fatal error, consequently we should stop doing any further template instantiations.	2024-03-28 13:10:02 +01:00
Haojian Wu	fb8cccf88c	[AST] Print the "aggregate" for aggregate deduction guide decl. (#84018 ) I found this useful for debugging purposes, to identify the different kinds of deduction guide decl.	2024-03-28 13:07:58 +01:00
Haohai Wen	896037c75a	[LoopRotate] Set loop back edge weight to not less than exit weight (#86496 ) Branch weight from sample-based PGO may be inaccurate due to sampling. If the loop body must be executed, then the original loop back edge weight must not be less than the exit weight.	2024-03-28 20:07:15 +08:00
Joseph Huber	daa755ba7b	[libc] Disable testing for NVPTX debug builds (#86856 ) Summary: Debug builds don't optimize out certain parts of the code that end up making the GPU backend crash. This results in regular builds not being successful just to build the testing objects. Disable them for now in debug mode.	2024-03-28 06:49:15 -05:00
Marc Auberer	9d61f7ea66	[flang] Remove duplicate call to va_end() (#86865 ) Fixes #86825	2024-03-28 12:42:44 +01:00
Marc Auberer	a495cfbf7d	[IR][NFC] Cleanup CmpInst signatures / code docs (#86441 ) Change param names to recommended upper case format for static methods in CmpInst for consistency Implement suggestion from @dtcxzyw. cc @dtcxzyw @tschuett	2024-03-28 12:42:02 +01:00
Andrew Ng	c9db031c48	[Support] Fix color handling in formatted_raw_ostream (#86700 ) The color methods in formatted_raw_ostream were forwarding directly to the underlying stream without considering existing buffered output. This would cause incorrect colored output for buffered uses of formatted_raw_ostream. Fix this issue by applying the color to the formatted_raw_ostream itself and temporarily disabling scanning of any color related output so as not to affect the position tracking. This fix means that workarounds that forced formatted_raw_ostream buffering to be disabled can be removed. In the case of llvm-objdump, this can improve disassembly performance when redirecting to a file by more than an order of magnitude on both Windows and Linux. This improvement restores the disassembly performance when redirecting to a file to a level similar to before color support was added.	2024-03-28 11:41:49 +00:00
Matt Arsenault	c13556c0b0	AMDGPU: Document more backend recognized attributes (#80239 )	2024-03-28 14:27:14 +03:00
Ulrich Weigand	b999e631c0	[OpenMP] Fix node destruction race in __kmpc_omp_taskwait_deps_51 (#86130 ) The __kmpc_omp_taskwait_deps_51 allocates a kmp_depnode_t node on its stack, and there is currently a race condition where another thread might still be accessing that node after the function has returned and its stack frame was released. While the function does wait until the node's npredecessors count has reached zero before exiting, there is still a window where the function that last decremented the npredecessors count assumes the node is still accessible. For heap-allocated kmp_depnode_t nodes, this normally works via a separate ndeps count that only reaches zero at the point where no accesses to the node are expected at all; in fact, at this point the heap allocation will be freed. For this case of a stack-allocated kmp_depnode_t node, it therefore makes sense to similarly respect the ndeps count; we need to wait until this reaches 1 (not 0, because it is not heap-allocated so there's always one extra count to prevent it from being freed), before we can safely deallocate our stack frame. As this is expected to be a short race window of only a few instructions, it should be fine to just use a busy wait loop checking the ndeps count. Fixes: https://github.com/llvm/llvm-project/issues/85963	2024-03-28 12:15:39 +01:00
Dmitri Gribenko	28b196e7fc	[llvm] Write temporary test files into %t ... instead of the source tree	2024-03-28 11:55:46 +01:00
Freddy Ye	36b4b9d988	[X86] Support immediate folding for CCMP/CTEST (#86616 ) E.g. %0:gr32 = MOV32ri 81 CTEST32rr %0, %1, 2, 10, implicit-def $eflags, implicit $eflags => CTEST32ri %1, 81, 2, 10, implicit-def $eflags, implicit $eflags	2024-03-28 18:54:32 +08:00
Shan Huang	79ba323bdd	[Debuginfo][GVNHoist] Fix #86227 : update the debug location of the hoisted GEP (#86236 ) This PR fixes #86227.	2024-03-28 18:43:03 +08:00
J. Ryan Stinnett	8a7f021f9e	[GitHub] Fix typos in automation (#86886 )	2024-03-28 10:37:31 +00:00
Shan Huang	8963a476cc	Fix #86269 : remove unused variable (#86927 ) Remove the unused variable `BI` introduced in #86269.	2024-03-28 11:24:18 +01:00
bvlgah	e640d9e725	[RISCV][GlobalISel] Fix legalizing ‘llvm.va_copy’ intrinsic (#86863 ) Hi, I spotted a problem when running benchmarking programs on a RISCV64 device. ## Issue Segmentation faults only occurred while running the programs compiled with `GlobalISel` enabled. Here is a small but complete example (it is adopted from [Google's benchmark framework](`95a9f0d0b4/MicroBenchmarks/libs/benchmark/src/colorprint.cc (L85-L119)`) to reproduce the issue, ```cpp #include <cstdarg> #include <cstdio> #include <iostream> #include <memory> #include <string> std::string FormatString(const char* msg, va_list args) { // we might need a second shot at this, so pre-emptivly make a copy va_list args_cp; va_copy(args_cp, args); std::size_t size = 256; char local_buff[256]; auto ret = vsnprintf(local_buff, size, msg, args_cp); va_end(args_cp); // currently there is no error handling for failure, so this is hack. // BM_CHECK(ret >= 0); if (ret == 0) // handle empty expansion return {}; else if (static_cast<size_t>(ret) < size) return local_buff; else { // we did not provide a long enough buffer on our first attempt. size = static_cast<size_t>(ret) + 1; // + 1 for the null byte std::unique_ptr<char[]> buff(new char[size]); ret = vsnprintf(buff.get(), size, msg, args); // BM_CHECK(ret > 0 && (static_cast<size_t>(ret)) < size); return buff.get(); } } std::string FormatString(const char* msg, ...) { va_list args; va_start(args, msg); auto tmp = FormatString(msg, args); va_end(args); return tmp; } int main() { std::string Str = FormatString("%-*s %13s %15s %12s", static_cast<int>(20), "Benchmark", "Time", "CPU", "Iterations"); std::cout << Str << std::endl; } ``` Use `clang++ -fglobal-isel -o main main.cpp` to compile it. ## Cause I have examined MIR, it shows that these segmentation faults resulted from a small mistake about legalizing the intrinsic function `llvm.va_copy`. 
`36e74cfdbd/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.cpp (L451-L453)` `DstLst` and `Tmp` are placed in the wrong order. ## Changes I have tweaked the test case `CodeGen/RISCV/GlobalISel/vararg.ll` so that `s0` is used as the frame pointer (not in all checks) which points to the starting address of the save area. I believe that it helps reason about how `llvm.va_copy` is handled.	2024-03-28 13:09:18 +03:00
Luke Lau	856e815ca1	[DAGCombiner] Set disjoint flag in add->or and xor->or combines (#86925 ) We check DAG.haveNoCommonBitsSet so the operands will be known to be disjoint. I couldn't think of a codegen test case since most targets aren't checking hasDisjoint yet, apart from RISCV in the or_is_add pattern, but it also falls back to computeKnownBits.	2024-03-28 18:08:59 +08:00
Shan Huang	912e2c4758	[Debuginfo][TailCallElim] Fix #86262 : drop the debug location of entry branch (#86269 ) This pr fixes #86262. --------- Co-authored-by: Stephen Tozer <Melamoto@gmail.com>	2024-03-28 17:37:33 +08:00
martinboehme	8d77d362af	[clang][dataflow] Introduce a helper class for handling record initializer lists. (#86675 ) This is currently only used in one place, but I'm working on a patch that will use this from a second place. And I think this already improves the readability of the one place this is used so far.	2024-03-28 10:12:45 +01:00
Luke Lau	eff4593a64	[RISCV] Add test case for missed vwaddu.vv due to add->or combine. NFC We should be able to recover this with combineBinOp_VLToVWBinOp_VL if we check that the or has the disjoint flag set.	2024-03-28 16:58:52 +08:00
Simon Tatham	88b10f3e3a	[MC][AArch64] Segregate constant pool caches by size. (#86832 ) If you write a 32- and a 64-bit LDR instruction that both refer to the same constant or symbol using the = syntax: ``` ldr w0, =something ldr x1, =something ``` then the first call to `ConstantPool::addEntry` will insert the constant into its cache of existing entries, and the second one will find the cache entry and reuse it. This results in a 64-bit load from a 32-bit constant, reading nonsense into the other half of the target register. In this patch I've done the simplest fix: include the size of the constant pool entry as part of the key used to index the cache. So now 32- and 64-bit constant loads will never share a constant pool entry. There's scope for doing this better, in principle: you could imagine merging the two slots with appropriate overlap, so that the 32-bit load loads the LSW of the 64-bit value. But that's much more complicated: you have to take endianness into account, and maybe also adjust the size of an existing entry. This is the simplest fix that restores correctness.	2024-03-28 08:57:27 +00:00
Orlando Cazalet-Hyams	2a2fd488b6	[RemoveDIs] Update DIBuilder C API and OCaml bindings [2/2] (#86529 ) Follow on from #84915 which adds the DbgRecord function variants. The C API changes were reviewed in #85657. # C API Update the LLVMDIBuilderInsert... functions to insert DbgRecords instead of debug intrinsics. LLVMDIBuilderInsertDeclareBefore LLVMDIBuilderInsertDeclareAtEnd LLVMDIBuilderInsertDbgValueBefore LLVMDIBuilderInsertDbgValueAtEnd Calling these functions will now cause an assertion if the module is in the wrong debug info format. They should only be used when the module is in "new debug format". Use LLVMIsNewDbgInfoFormat to query and LLVMSetIsNewDbgInfoFormat to change the debug info format of a module. Please see https://llvm.org/docs/RemoveDIsDebugInfo.html#c-api-change (RemoveDIsDebugInfo.md) for more info. # OCaml bindings Add set_is_new_dbg_info_format and is_new_dbg_info_format to the OCaml bindings. These can be used to set and query the current debug info mode. These will eventually be removed, but are useful while we're transitioning between old and new debug info formats. Add string_of_lldbgrecord, like string_of_llvalue but prints DbgRecords. In test dbginfo.ml, unconditionally set the module debug info to the new mode and update CHECK lines to check for DbgRecords. Without this change the test crashes because it attempts to insert DbgRecords (new default behaviour of llvm_dibuild_insert_declare_...) into a module that is in the old debug info mode.	2024-03-28 08:54:27 +00:00
Haojian Wu	63ea5a4088	[clang] Invalidate the alias template decl if it has multiple written template parameter lists. (#85413 ) Fixes #85406. - Set the invalid bit for alias template decl where it has multiple written template parameter lists (as the AST node is ill-formed) - don't perform CTAD for invalid alias template decls	2024-03-28 09:13:26 +01:00
Haohai Wen	38f5596fed	[LoopRotate] Add test to track update for inaccurate branch weight (#86495 ) Branch weight from sample-based PGO may be inaccurate due to sampling. This test tracks such a case, where updateBranchWeights wraps unsigned.	2024-03-28 15:33:01 +08:00
Vyacheslav Levytskyy	b7ac8fddb5	[SPIR-V] Improve type inference: deduce types of composite data structures (#86782 ) This PR improves type inference in general and deduces types of composite data structures in particular. It also adds a way to insert a bitcast to make a function call valid when argument types mismatch due to opaque pointer type inference. The attached test `pointers/nested-struct-opaque-pointers.ll` demonstrates the new capabilities: the SPIR-V code emitted for this test is now (1) valid in the sense of data field types and (2) accepted by `spirv-val`. Stricter LIT checks, support for more composite data structures, and improvement of function calls from the perspective of type correctness are the main TODOs at the moment.	2024-03-28 08:08:06 +01:00
Petr Hosek	e5b9399494	[libc] Move baremetal write_to_stderr implementation to io.cpp (#86890 ) This is required to avoid multiple definitions error.	2024-03-27 23:59:24 -07:00
hchandel	5dfc446d75	[RISCV] Remove Unnecessary Semicolon. NFC (#86911 ) Removes Unnecessary Semicolon Co-authored-by: Harsh Chandel <hchandel@hu-hchandel-hyd.qualcomm.com>	2024-03-27 23:13:47 -07:00
Kazu Hirata	ed801ab460	[Transforms] Fix an unused variable warning llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h:89:28: error: private field 'LTOPhase' is not used [-Werror,-Wunused-private-field]	2024-03-27 23:11:16 -07:00
Lei Wang	f8bab38b6d	[CSSPGO] Fix the issue of missing callee profile matches (#85715 ) Two fixes related to the callee/inlinee profile: 1. Fix the bug where the matching results were not distributed to the callee profiles (they should be passed by reference). 2. Narrow imported function matching to checksum-mismatched functions. More context: previously we ran matching for all imported functions even when checksums matched. However, after fixing 1) we got a regression, likely because matching is not a no-op for checksum-matched functions, so we want to make it consistent and only run matching for checksum-mismatched (imported) functions. Since the metadata (pseudo_probe_desc) is dropped for imported functions, we leverage the function attribute mechanism and add a new function attribute (`profile-checksum-mismatch`) to transfer the info from pre-link to post-link.	2024-03-27 22:27:22 -07:00
Fangrui Song	a41bfea5c0	[MC] Simplify ELFObjectWriter. NFC And fix `if (hasRelocationAddend())` to `usesRela` to properly treat SHT_LLVM_CALL_GRAPH_PROFILE as SHT_REL. The incorrect check does not cause a problem because the synthesized SHT_LLVM_CALL_GRAPH_PROFILE has zero addends.	2024-03-27 22:10:11 -07:00
Heejin Ahn	6b7ecc7979	Revert "[WebAssembly] Remove threwValue comparison after __wasm_setjmp_test (#86633 )" This reverts commit `52431fdb1a`. The PR assumed `__threwValue` couldn't be 0, but it could be when the thrown thing is not a longjmp but an exception, so that `if` check was actually necessary.	2024-03-28 04:41:29 +00:00
Owen Pan	d9e3e11ae5	[clang-format] Exit clang-format-diff only after all diffs are printed (#86776 ) See https://github.com/llvm/llvm-project/pull/70883#issuecomment-2020811077.	2024-03-27 21:23:37 -07:00
Owen Pan	e766f87b92	[clang-format] Handle C++ Core Guidelines suppression tags (#86458 ) Fixes #86451.	2024-03-27 21:22:57 -07:00
Job Henandez Lara	056b404354	[libc][NFC] refactor fmin and fmax (#86718 ) Hello, So, I worked on the fmaximum and fminimum functions recently and the reviewers suggested the structure: ``` if (bitsx ...) return ...; if (bitsy ..) return ... return ...; ``` So I went ahead and did the same for fmin and fmax. I hope this isn't an issue for you all. Thanks. --------- Co-authored-by: Job Hernandez <h93@protonmail.com>	2024-03-27 23:55:12 -04:00
Mingming Liu	2c7610cc43	[nfc]Make InstrProfSymtab non-copyable and non-movable (#86882 ) - The direct use case (in [1]) is to add `llvm::IntervalMap` [2] and the allocator required by IntervalMap ctor [3] to class `InstrProfSymtab` as owned members. The allocator class doesn't have a move-assignment operator; and it's going to take much effort to implement move-assignment operator for the allocator class such that the enclosing class is movable. - There is only one use of compiler-generated move-assignment operator in the repo, which is in CoverageMappingReader.cpp. Luckily it's possible to use std::unique_ptr<InstrProfSymtab> instead, so did the change. [1] https://github.com/llvm/llvm-project/pull/66825 [2] `4c2f68840e/llvm/include/llvm/ADT/IntervalMap.h (L936)` [3] `4c2f68840e/llvm/include/llvm/ADT/IntervalMap.h (L1041)`	2024-03-27 20:40:01 -07:00
Fangrui Song	070d7af0c5	[ELF] --export-dynamic: don't create dynamic sections for non-PIC static links The CloudABI (removed from Clang Driver) change from https://reviews.llvm.org/D29982 does not make sense. GNU ld and gold don't create dynamic sections for a non-PIC static link when --export-dynamic is specified. Creating dynamic sections is harmful in this scenario because we would consider undefined weak symbols preemptible and generate GLOB_DAT relocations, breaking the expectation that non-PIC static links only contain IRELATIVE relocations. In addition, there are other options that export symbols (--export-dynamic-symbol, --dynamic-list, etc). It does not make sense to special case --export-dynamic.	2024-03-27 20:04:59 -07:00
Fangrui Song	443baed56c	[ELF,test] Update tests that depend on --export-dynamic creating dynamic sections The CloudABI change from https://reviews.llvm.org/D30175 does not make sense. Update tests not to rely on the --export-dynamic behavior.	2024-03-27 20:01:30 -07:00
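
The constant pool fix described in commit `88b10f3e3a` above (making the entry size part of the cache key) can be illustrated with a small C++ sketch. The names `ToyEntry`, `ToyConstantPool`, `sizeOf`, and `numEntries` are hypothetical and exist only for this illustration. This is not LLVM's `ConstantPool` code, only a minimal model that assumes the cache maps a constant to a pool slot.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model of a constant pool entry (hypothetical, not LLVM's type).
struct ToyEntry {
  std::string Value; // the symbol or literal written after '='
  unsigned Size;     // bytes the load reads: 4 for a wN load, 8 for an xN load
};

// Toy constant pool whose cache is keyed by (value, size), so loads of
// different widths never share a slot.
class ToyConstantPool {
  std::vector<ToyEntry> Entries;
  std::map<std::pair<std::string, unsigned>, std::size_t> Cache;

public:
  // Returns the slot index for this value/size pair, creating it if needed.
  std::size_t addEntry(const std::string &Value, unsigned Size) {
    auto Key = std::make_pair(Value, Size);
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second; // same value AND same size: safe to reuse
    Entries.push_back(ToyEntry{Value, Size});
    std::size_t Index = Entries.size() - 1;
    Cache.emplace(Key, Index);
    return Index;
  }

  unsigned sizeOf(std::size_t Index) const { return Entries[Index].Size; }
  std::size_t numEntries() const { return Entries.size(); }
};
```

With a cache keyed by value alone, the second `addEntry("something", 8)` call would return the 4-byte slot, which is the 64-bit load from a 32-bit constant that the commit message describes. Keying on the pair gives up slot sharing across widths in exchange for correctness, which matches the "simplest fix" trade-off stated in the message.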

494189 Commits