
intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-02-09 01:52:26 +08:00

Author	SHA1	Message	Date
Amir Ayupov	15fa3ba547	[BOLT][YAML] Allow unknown keys in the input (#100824 ) This ensures forward compatibility, where old BOLT versions can consume the profile created by newer versions with extra keys. Test Plan: added yaml-unknown-keys.test	2024-09-03 11:27:57 -07:00
Maksim Panchenko	abd69b3653	[BOLT] Handle internal calls in ValidateInternalCalls (#105736 ) Move handling of all internal calls into the designated pass. Preserve NOPs and mark functions as non-simple on non-X86 platforms.	2024-08-27 11:31:32 -07:00
Amir Ayupov	a79cf0228e	[MC][NFC] Use vector for GUIDProbeFunctionMap Replace unordered_map with a vector. Pre-parse the section to statically allocate storage. Use BumpPtrAllocator for FuncName strings, keep StringRef in FuncDesc. Reduces peak RSS of pseudo probe parsing from 9.08 GiB to 8.89 GiB as part of perf2bolt with a large binary. Test Plan: ``` bin/llvm-lit -sv test/tools/llvm-profgen ``` Reviewers: wlei-llvm, rafaelauler, dcci, maksfb, ayermolo Reviewed By: wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/102905	2024-08-26 09:15:53 -07:00
Amir Ayupov	ee09f7d1fc	[MC][NFC] Reduce Address2ProbesMap size Replace the map from addresses to list of probes with a flat vector containing probe references sorted by their addresses. Reduces pseudo probe parsing time from 9.56s to 8.59s and peak RSS from 9.66 GiB to 9.08 GiB as part of perf2bolt processing a large binary. Test Plan: ``` bin/llvm-lit -sv test/tools/llvm-profgen ``` Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm Reviewed By: wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/102904	2024-08-26 09:14:35 -07:00
Amir Ayupov	04ebd1907c	[MC][NFC] Statically allocate storage for decoded pseudo probes and function records Use #102774 to allocate storage for decoded probes (`PseudoProbeVec`) and function records (`InlineTreeVec`). Leverage that to also shrink sizes of `MCDecodedPseudoProbe`: - Drop Guid since it's accessible via `InlineTree`. `MCDecodedPseudoProbeInlineTree`: - Keep track of probes and inlinees using `ArrayRef`s now that probes and function records belonging to the same function are allocated contiguously. This reduces peak RSS from 13.7 GiB to 9.7 GiB and pseudo probe parsing time (as part of perf2bolt) from 15.3s to 9.6s for a large binary with 400MiB .pseudo_probe section containing 43M probes and 25M function records. Depends on: #102774 #102787 #102788 Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm Reviewed By: wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/102789	2024-08-26 09:09:13 -07:00
Amir Ayupov	121ed07975	[MC][NFC] Count pseudo probes and function records Pre-parse pseudo probes section counting the number of probes and function records. These numbers are used in follow-up diff to pre-allocate vectors for decoded probes and inline tree nodes. Additional benefit is avoiding error handling during parsing. This pre-parsing is fast: for a 404MiB .pseudo_probe section with 43373881 probes and 25228770 function records, it only takes 0.68±0.01s. The total time of buildAddress2ProbeMap is 21s. Reviewers: dcci, maksfb, rafaelauler, wlei-llvm, ayermolo Reviewed By: wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/102774	2024-08-26 09:05:34 -07:00
Harini0924	7f3793207b	[BOLT][test] Removed the use of parentheses in BOLT tests with lit internal shell (#105720 ) This patch addresses compatibility issues with the lit internal shell by removing the use of subshell execution (parentheses and subshell syntax) in the `BOLT` tests. The lit internal shell does not support parentheses, so the tests have been refactored to use separate command invocations, with outputs redirected to temporary files where necessary. This change is relevant for enabling the lit internal shell by default, as outlined in [[RFC] Enabling the Lit Internal Shell by Default](https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179) fixes: #102401	2024-08-23 08:20:11 -07:00
ShatianWang	cbd302410e	[BOLT] Improve BinaryFunction::inferFallThroughCounts() (#105450 ) This PR improves how basic block execution count is updated when using the BOLT option `-infer-fall-throughs`. Previously, if a 0-count fall-through edge is assigned a positive inferred count N, then the successor block's execution count will be incremented by N. Since the successor's execution count is calculated using information besides inflow sum (such as outflow sum), it likely is already correct, and incrementing it by an additional N would be wrong. This PR improves how the successor's execution count is updated by using the max over its current count and N.	2024-08-21 00:35:07 -04:00
Maksim Panchenko	8f3050684e	[BOLT] Reduce CFI warning verbosity (#105336 ) CFI programs may have more saves than restores and this is completely benign from BOLT's perspective. Reduce the verbosity and print the warning only under `-v=1` and above.	2024-08-20 13:41:19 -07:00
Harini0924	4f5d866af7	[llvm-lit] Add REQUIRES: shell to BOLT permission test for lit internal shell (#103012 ) This patch adds the `REQUIRES: shell` directive to the BOLT permission test to ensure it only runs in environments with a full-featured Unix-like shell. This change is necessary because the test relies on advanced shell capabilities that are not supported by lit's internal shell. Reasoning: The BOLT permission test uses features like running commands in the background with `&`, performing arithmetic operations, and handling special number formats (octal). These features require a more capable shell than what lit's internal shell provides. Without a proper shell, the test could fail or behave unpredictably. This change is relevant for enabling the lit internal shell by default, as outlined in [[RFC] Enabling the Lit Internal Shell by Default](https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179)	2024-08-13 19:58:59 -07:00
Connie	887f7002b6	[NFC][bolt][test] Change '\|&' to '2>&1 \|' for lit internal shell support (#102402 ) This patch changes all references to '\|&' in bolt tests to instead use the '2>&1 \|' syntax for better consistency across testing and so that lit's internal shell can be used to run these tests. This addresses a suggestion made in the comments of this RFC: https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179. Fixes https://github.com/llvm/llvm-project/issues/102388	2024-08-12 17:18:17 -07:00
Peter Jung	c1912b4dd7	[BOLT][docs] Fix typo (#98640 ) Typo: `chwon` --> `chown` Signed-off-by: Peter Jung <admin@ptr1337.dev>	2024-08-08 18:05:41 -07:00
Sayhaan Siddiqui	6aad62cf5b	[BOLT][DWARF] Add parallelization for processing of DWO debug information (#100282 ) Enables parallelization for the processing of DWO CUs.	2024-08-08 16:41:51 -07:00
Davide Italiano	e49549ff19	Revert "[BOLT] Abort on out-of-section symbols in GOT (#100801 )" This reverts commit `a4900f0d93`.	2024-08-07 20:52:19 -07:00
Vladislav Khmelevsky	445023f173	Revert "[BOLT] Move ADRRelaxationPass (#101371 )" (#102333 ) This reverts commit `750b12f06b`. The pass should run after splitting phase, but before nop removal	2024-08-07 21:03:51 +04:00
Sayhaan Siddiqui	62e894e0d7	[BOLT][DWARF][NFC] Move Arch assignment out of createBinaryContext (#102054 ) Moves the assignment of Arch out of createBinaryContext to prevent data races when parallelized.	2024-08-07 16:55:39 +00:00
Vladislav Khmelevsky	a4900f0d93	[BOLT] Abort on out-of-section symbols in GOT (#100801 ) This patch aborts BOLT execution if it finds an out-of-section (section end) symbol in the GOT table. In order to handle such situations properly in the future, we would need an arch-dependent way to analyze relocations or their sequences, e.g., for ARM it would probably be ADRP + LDR analysis in order to get the GOT entry address. Currently, it is also challenging because GOT-related relocation symbols are replaced with __BOLT_got_zero. Anyway, it seems to be quite a rare case, which seems to be related only to static binaries. For the most part, it seems that it should be handled at the linker stage, since a static binary should not have a GOT table at all. The LLD linker with relaxations enabled would replace instruction addresses from GOT directly with target symbols, which eliminates the problem. Anyway, in order to achieve detection of such cases, this patch fixes a few things in BOLT: 1. For the end symbols, we're now using the section provided by the ELF binary. Previously it would be tied to a wrong section found by symbol address. 2. The end symbols would have limited registration: we would only add them to the name->data GlobalSymbols map, since using the address->data BinaryDataMap map would likely be impossible due to the address duality of such symbols. 3. The outdated BD->getSection (currently returning a reference, not a pointer) check in postProcessSymbolTable is replaced by a getSize check in order to allow zero-sized top-level symbols if they are located in zero-sized sections. For the most part, such things could only be found in tests, but I don't see a reason not to handle such cases. 4. Updated the section-end-sym test and removed the x86_64 requirement since there is no reason for it (tested on aarch64 linux). The test was provided by peterwaller-arm (thank you) in #100096 and slightly modified by me.	2024-08-07 16:26:12 +04:00
Vladislav Khmelevsky	097ddd3565	[BOLT] Fix relocations handling (#100890 ) After porting BOLT to RISCV some of the relocations were broken on both AArch64 and X86. On AArch64 an example of broken relocations would be GOT: when handling them, we should replace the symbol with __BOLT_got_zero in order to address the GOT entry, not the symbol that addresses this entry. This is done further in the code, so it is too early to add the relocation here. On X86 it is a mistake to add relocations without an addend. This is the exact problem that is raised in #97937. Due to different code generation I had to use a gcc-generated yaml test, since with clang I wasn't able to reproduce the problem. Added tests for both architectures and made the problematic condition RISC-V-specific.	2024-08-07 16:25:46 +04:00
Vladislav Khmelevsky	25acc16fe2	[BOLT][RUNTIME][NFC] Fix aarch64 match (#100866 ) One of the problems related to #93151 is probably that aarch64 target might have different names in different env, so extend aarch64 cmake cpu match with different name aliases.	2024-08-07 16:23:57 +04:00
Vladislav Khmelevsky	750b12f06b	[BOLT] Move ADRRelaxationPass (#101371 ) For non-simple functions we need the nop instruction to be present in order to transform ADR into an ADRP+ADD sequence, so run this pass before the remove-nops pass.	2024-08-07 16:23:38 +04:00
sinan	6c8933e1a0	[BOLT] Skip PLT search for zero-value weak reference symbols (#69136 ) Take a common weak reference pattern for example ``` __attribute__((weak)) void undef_weak_fun(); if (&undef_weak_fun) undef_weak_fun(); ``` In this case, an undefined weak symbol `undef_weak_fun` has an address of zero, and Bolt incorrectly changes the relocation for the corresponding symbol to symbol@PLT, leading to incorrect runtime behavior.	2024-08-07 18:02:42 +08:00
sinan	734c0488b6	[BOLT] Support map other function entry address (#101466 ) Allow BOLT to map the old address to a new binary address if the old address is the entry of the function.	2024-08-07 15:57:25 +08:00
Amir Ayupov	f83a89c1b1	[BOLT] Turn non-empty CFI StateStack assert into a warning (#102216 ) clang-15 can produce binaries with mismatched RememberState/RestoreState CFIs. This is benign for unwinding, so replace an assert with a warning.	2024-08-06 17:23:43 -07:00
Amir Ayupov	3f51bec466	[BOLT][NFC] Print timers in perf2bolt invocation When BOLT is run in AggregateOnly mode (perf2bolt), it exits with code zero so destructors are not run thus TimerGroup never prints the timers. Add explicit printing just before the exit to honor options requesting timers (`--time-rewrite`, `--time-aggr`). Test Plan: updated bolt/test/timers.c Reviewers: ayermolo, maksfb, rafaelauler, dcci Reviewed By: dcci Pull Request: https://github.com/llvm/llvm-project/pull/101270	2024-07-31 22:14:52 -07:00
Amir Ayupov	fb97b4f962	[BOLT][NFC] Add timers for MetadataManager invocations Test Plan: added bolt/test/timers.c Reviewers: ayermolo, maksfb, rafaelauler, dcci Reviewed By: dcci Pull Request: https://github.com/llvm/llvm-project/pull/101267	2024-07-31 22:12:34 -07:00
Sayhaan Siddiqui	910012e7c5	[BOLT][DWARF][NFC] Split DIEBuilder::finish (#101244 ) Split DIEBuilder::finish so that code updating .debug_names is in a separate function.	2024-07-31 13:41:38 -07:00
Sayhaan Siddiqui	33960ce5a8	[BOLT][DWARF] Sort GDBIndexTUEntryVector (#101264 ) Sorts GDBIndexTUEntryVector in decreasing order by hash to ensure determinism when parallelized.	2024-07-31 11:35:38 -07:00
Sayhaan Siddiqui	79dcd93b70	[BOLT][DWARF] Remove option to write to DWP (#100771 ) Remove the --write-dwp option as well as related code and tests.	2024-07-30 16:58:01 -07:00
Vladislav Khmelevsky	803eaf2926	[BOLT][NFC] Fix test requirement (#100867 ) Tests that are using instrumentation should have bolt-runtime in requirements	2024-07-27 18:44:58 +04:00
Sayhaan Siddiqui	9a3e66e314	[BOLT][DWARF][NFC] Fix DebugStrOffsetsWriter (#100672 ) Fix DebugStrOffsetsWriter so updateAddressMap can't be called after it is finalized.	2024-07-26 18:58:25 -07:00
Sayhaan Siddiqui	b33ef5bd68	[BOLT][DWARF][NFC] Add mc opt to DWARFRewriter.cpp (#100800 ) Running into an error with removing DWP where the assertion `RelaxAllView && "RegisterMCTargetOptionsFlags not created."'` failed. This is a result of DWP bringing the mc::RegisterMCTargetOptionsFlags option in, and the option being removed with DWP. The need for this option didn't originally exist because we didn't use MC in DWARFRewriter, but we switched to using DWARFStreamer which needed the option. https://reviews.llvm.org/D75579 https://reviews.llvm.org/D106417	2024-07-26 14:09:46 -07:00
Tristan Ross	5909979869	[BOLT] Fix archive output directory for standalone on Mac (#100643 ) CC @gulfemsavrun Fixes a line which wasn't changed in #97130	2024-07-25 13:29:38 -07:00
Tristan Ross	ffd6240248	[BOLT] Update Docker to use Ubuntu 24.04 (#99421 ) Updates the Dockerfile to use Ubuntu 24.04 due to CMake wanting a newer version. Can be tested by trying to build the Docker image currently in main and then try building the Docker image in this PR.	2024-07-25 08:20:57 -07:00
Tristan Ross	abc2eae682	[BOLT] Enable standalone build (#97130 ) Continue from #87196 as author did not have much time, I have taken over working on this PR. We would like to have this so it'll be easier to package for Nix. Can be tested by copying cmake, bolt, third-party, and llvm directories out into their own directory with this PR applied and then build bolt. --------- Co-authored-by: pca006132 <john.lck40@gmail.com>	2024-07-25 08:18:14 -07:00
Amir Ayupov	4d19676de4	[BOLT] Add profile-use-pseudo-probes option Move pseudo probe profile generation under --profile-use-pseudo-probes option. Note that updating pseudo probes is independent from this flag. Test Plan: updated pseudoprobe-decoding-inline.test Reviewers: maksfb, rafaelauler, ayermolo, dcci, WenleiHe Reviewed By: WenleiHe Pull Request: https://github.com/llvm/llvm-project/pull/100299	2024-07-24 07:31:01 -07:00
Amir Ayupov	9d2dd009b6	[BOLT] Support more than two jump table parents Multi-way splitting can cause multiple fragments to access the same jump table. Relax the assumption that a jump table can only have up to two parents. Test Plan: added bolt/test/X86/three-way-split-jt.s Reviewers: ayermolo, dcci, rafaelauler, maksfb Reviewed By: rafaelauler, dcci Pull Request: https://github.com/llvm/llvm-project/pull/99988	2024-07-24 07:16:39 -07:00
Amir Ayupov	83ea7ce3a1	[BOLT][NFC] Track fragment relationships using EquivalenceClasses Three-way splitting can create references between split fragments (warm to cold or vice versa) that are not handled by `isChildOf/isParentOf/isChildOrParentOf`. Generalize fragment relationships to allow checking if two functions belong to one group, potentially in presence of ICF which can join multiple groups. Test Plan: NFC for existing tests Reviewers: maksfb, ayermolo, rafaelauler, dcci Reviewed By: rafaelauler Pull Request: https://github.com/llvm/llvm-project/pull/99979	2024-07-24 07:15:10 -07:00
Sayhaan Siddiqui	ea4a348098	[BOLT][DWARF][NFC] Move initialization of DWOName outside of lambda (#99728 ) Followup to the splitting of processUnitDIE, moves code that accesses common resource to be outside of the function that will be parallelized. Followup to #99957	2024-07-23 17:30:54 -07:00
Sayhaan Siddiqui	7cd7a1eab4	[BOLT][DWARF][NFC] Split processUnitDIE into two lambdas (#99957 ) Split processUnitDIE into two lambdas to separate the processing of DWO CUs and CUs in the main binary.	2024-07-23 12:59:40 -07:00
Jordan Brantner	d251a328b8	[BOLT] Fix typo from alterantive to alternative (#99704 ) Fix typo from `alterantive` -> `alternative` Signed-off-by: Jordan Brantner <brantnej@oregonstate.edu>	2024-07-22 18:35:20 -07:00
Sayhaan Siddiqui	bdee9b05de	Revert "[BOLT][DWARF][NFC] Split processUnitDIE into two lambdas" (#99904 ) Reverts llvm/llvm-project#99225	2024-07-22 12:31:51 -07:00
Fangrui Song	867faeec05	[MC] Migrate to createAsmStreamer without unused bool parameters In bolt/lib/Passes/AsmDump.cpp, the MCInstPrinter is created with false AsmVerbose. The AsmVerbose argument to createAsmStreamer is unused. Deprecate the legacy Target::createAsmStreamer overload, which might be used by downstream.	2024-07-21 09:44:16 -07:00
Fangrui Song	86e21e1af2	[BOLT] Remove unused bool arguments from createMCObjectStreamer callers	2024-07-20 21:30:49 -07:00
Sayhaan Siddiqui	6747f12931	[BOLT][DWARF][NFC] Split processUnitDIE into two lambdas (#99225 ) Split processUnitDIE into two lambdas to separate the processing of DWO CUs and CUs in the main binary.	2024-07-19 17:52:49 -07:00
Eisuke Kawashima	8bc02bf5c6	fix(bolt/**.py): fix comparison to None (#94012 ) from PEP8 (https://peps.python.org/pep-0008/#programming-recommendations): > Comparisons to singletons like None should always be done with is or is not, never the equality operators. Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>	2024-07-19 16:59:56 -07:00
klensy	1ee8238f0e	[BOLT][test] Fix Filecheck typos (#93979 ) Fixes a few FileCheck typos in tests and adds a missing(?) FileCheck call in a test. Co-authored-by: klensy <nightouser@gmail.com>	2024-07-19 16:57:14 -07:00
Itis-hard2name	7f563232d6	[bolt][Docs] fix missing option in cmake of stage3 in OptimizingClang.md (#93684 ) Fixes #93681	2024-07-19 16:55:21 -07:00
Daniel Hill	b686600a57	[BOLT] Skip instruction shortening (#93032 ) Add the ability to disable the instruction shortening pass through --shorten-instructions=false	2024-07-19 16:52:01 -07:00
Sayhaan Siddiqui	d54ec64f67	[BOLT][DWARF] Remove deprecated opt (#99575 ) Remove deprecated DeterministicDebugInfo option and its uses.	2024-07-19 14:03:50 -07:00
Shaw Young	296a956369	[BOLT] Match functions with call graph (#98125 ) Implemented call graph function matching. First, two call graphs are constructed for both profiled and binary functions. Then functions are hashed based on the names of their callee/caller functions. Finally, functions are matched based on these neighbor hashes and the longest common prefix of their names. The `match-with-call-graph` flag turns this matching on. Test Plan: Added match-with-call-graph.test. Matched 164 functions in a large binary with 10171 profiled functions.	2024-07-19 14:00:28 -07:00
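
The call graph matching described in commit 296a956369 (hash each function by the names of its callers and callees, then break ties by longest common name prefix) can be sketched roughly as follows. This is a minimal Python illustration, not BOLT's C++ implementation: the function names `neighbor_hash` and `match_functions`, the graph representation, and the choice of SHA-1 are all assumptions made for the example.

```python
import hashlib
from os.path import commonprefix


def neighbor_hash(callers, callees):
    """Hash a function by the sorted names of its callers and callees."""
    key = "|".join(sorted(callers)) + "#" + "|".join(sorted(callees))
    return hashlib.sha1(key.encode()).hexdigest()


def match_functions(profile_graph, binary_graph):
    """Match profiled functions to binary functions (hypothetical helper).

    Each graph maps a function name to a (callers, callees) pair of name sets.
    Returns a dict from profiled function name to matched binary function name.
    """
    # Bucket binary functions by their neighbor hash.
    buckets = {}
    for name in sorted(binary_graph):
        callers, callees = binary_graph[name]
        buckets.setdefault(neighbor_hash(callers, callees), []).append(name)

    matches = {}
    for name, (callers, callees) in profile_graph.items():
        candidates = buckets.get(neighbor_hash(callers, callees), [])
        if not candidates:
            continue
        # Among functions with identical neighbors, prefer the one whose
        # name shares the longest common prefix with the profiled name.
        matches[name] = max(
            candidates, key=lambda c: len(commonprefix([name, c]))
        )
    return matches
```

For instance, a profiled function `foo.llvm.123` whose only caller is `main` and only callee is `bar` would be matched to a binary function `foo.llvm.456` with the same neighbors, even though the suffixes differ. A real implementation would also need a policy for functions with no neighbors, which all hash identically in this sketch.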
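
Commit cbd302410e changes how a successor block's execution count is updated when a zero-count fall-through edge receives an inferred count N under `-infer-fall-throughs`: instead of adding N, the new count is the maximum of the current count and N. The difference can be shown with a small Python sketch; both function names are hypothetical and the real logic lives in `BinaryFunction::inferFallThroughCounts()` in C++.

```python
def update_successor_old(successor_count, inferred_edge_count):
    """Previous behavior as described in the commit: always add N."""
    return successor_count + inferred_edge_count


def update_successor_new(successor_count, inferred_edge_count):
    """New behavior as described in the commit: max of current count and N."""
    return max(successor_count, inferred_edge_count)


# A successor whose count (100) was already derived from its outflow sum,
# reached by a fall-through edge with inferred count N = 100.
print(update_successor_old(100, 100))  # 200: the edge is counted twice
print(update_successor_new(100, 100))  # 100: the existing count is kept
```

With the max-based update, a successor whose count is still too low is raised to N, while one that is already correct is left alone.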
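
Commit ee09f7d1fc replaces a map from addresses to lists of probes with a flat vector of probe references sorted by address. The general idea, one sorted array plus binary search in place of a per-key container, can be sketched in Python as below. The class name `FlatAddressMap` and its `find` method are assumptions for illustration and do not mirror the actual `Address2ProbesMap` interface.

```python
from bisect import bisect_left, bisect_right


class FlatAddressMap:
    """Probes stored in one flat list, sorted by address (illustrative)."""

    def __init__(self, probes):
        # probes: iterable of (address, probe) pairs in any order.
        # sorted() is stable, so probes at the same address keep their order.
        ordered = sorted(probes, key=lambda pair: pair[0])
        self._addresses = [address for address, _ in ordered]
        self._probes = [probe for _, probe in ordered]

    def find(self, address):
        """Return all probes at the given address via binary search."""
        lo = bisect_left(self._addresses, address)
        hi = bisect_right(self._addresses, address)
        return self._probes[lo:hi]
```

A lookup such as `FlatAddressMap([(0x20, "c"), (0x10, "a"), (0x10, "b")]).find(0x10)` returns the contiguous slice `["a", "b"]`. Storing everything contiguously avoids per-entry node and list allocations, which is consistent with the lower peak RSS the commit reports.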


2296 Commits