Commit Graph

2296 Commits

Author SHA1 Message Date
Amir Ayupov
15fa3ba547 [BOLT][YAML] Allow unknown keys in the input (#100824)
This ensures forward compatibility: older BOLT versions can consume
profiles created by newer versions that contain extra keys.
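
A minimal sketch of the forward-compatibility idea follows. It assumes a hypothetical flat `key: value` header (`ProfileHeader`, `parseHeader`, and the key names are illustrative); it is not BOLT's YAML profile schema or LLVM's YAML I/O library. Keys the reader does not recognize are skipped instead of rejected.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical profile header with the keys an older reader knows about.
struct ProfileHeader {
  uint32_t Version = 0;
  std::string Binary;
};

// Parses "key: value" lines. Unknown keys are ignored rather than treated as
// errors, so input produced by a newer writer with extra keys still loads.
inline ProfileHeader parseHeader(const std::string &Text) {
  ProfileHeader H;
  std::istringstream In(Text);
  std::string Line;
  while (std::getline(In, Line)) {
    size_t Colon = Line.find(':');
    if (Colon == std::string::npos)
      continue;
    std::string Key = Line.substr(0, Colon);
    std::string Value = Line.substr(Colon + 1);
    if (!Value.empty() && Value.front() == ' ')
      Value.erase(0, 1);
    if (Key == "version")
      H.Version = static_cast<uint32_t>(std::stoul(Value));
    else if (Key == "binary")
      H.Binary = Value;
    // Any other key: skipped for forward compatibility.
  }
  return H;
}
```

For example, input containing an extra `new-field: 42` line still yields the version and binary name.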

Test Plan: added yaml-unknown-keys.test
2024-09-03 11:27:57 -07:00
Maksim Panchenko
abd69b3653 [BOLT] Handle internal calls in ValidateInternalCalls (#105736)
Move handling of all internal calls into the designated pass. Preserve
NOPs and mark functions as non-simple on non-X86 platforms.
2024-08-27 11:31:32 -07:00
Amir Ayupov
a79cf0228e [MC][NFC] Use vector for GUIDProbeFunctionMap
Replace unordered_map with a vector. Pre-parse the section to statically
allocate storage. Use BumpPtrAllocator for FuncName strings, keep
StringRef in FuncDesc.
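
The data-structure change can be sketched as follows, with hypothetical names (`FuncDesc` fields, `GUIDFuncMap`) that are not the actual MC API. A GUID-sorted vector with binary search stands in for the hash map. Storage is reserved to a pre-counted size, and names are non-owning views, as a `StringRef` into allocator-owned bytes would be.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical, simplified descriptor; Name does not own its characters.
struct FuncDesc {
  uint64_t GUID;
  std::string_view Name;
};

// Flat replacement for an unordered_map keyed by GUID.
class GUIDFuncMap {
  std::vector<FuncDesc> Descs;
  bool Finalized = false;

public:
  // The entry count is known from a pre-parse, so storage is reserved once.
  explicit GUIDFuncMap(size_t ExpectedCount) { Descs.reserve(ExpectedCount); }

  void add(uint64_t GUID, std::string_view Name) {
    assert(!Finalized && "add() after finalize()");
    Descs.push_back({GUID, Name});
  }

  // Sorts by GUID; must be called once before find().
  void finalize() {
    std::sort(Descs.begin(), Descs.end(),
              [](const FuncDesc &A, const FuncDesc &B) { return A.GUID < B.GUID; });
    Finalized = true;
  }

  // Binary search; returns nullptr if the GUID is unknown.
  const FuncDesc *find(uint64_t GUID) const {
    assert(Finalized && "find() before finalize()");
    auto It = std::lower_bound(
        Descs.begin(), Descs.end(), GUID,
        [](const FuncDesc &D, uint64_t G) { return D.GUID < G; });
    if (It == Descs.end() || It->GUID != GUID)
      return nullptr;
    return &*It;
  }
};
```

Compared with a node-based hash map, this avoids one heap allocation per entry, at the cost of requiring all entries before the first lookup.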

Reduces peak RSS of pseudo probe parsing from 9.08 GiB to 8.89 GiB when
running perf2bolt on a large binary.

Test Plan:
```
bin/llvm-lit -sv test/tools/llvm-profgen
```

Reviewers: wlei-llvm, rafaelauler, dcci, maksfb, ayermolo

Reviewed By: wlei-llvm

Pull Request: https://github.com/llvm/llvm-project/pull/102905
2024-08-26 09:15:53 -07:00
Amir Ayupov
ee09f7d1fc [MC][NFC] Reduce Address2ProbesMap size
Replace the map from addresses to lists of probes with a flat vector
containing probe references sorted by their addresses.
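
A minimal sketch of such a flat address index, assuming a hypothetical `Probe` record (only `Address` and `Index` fields) rather than the real `MCDecodedPseudoProbe`. References are sorted once by address, and all probes at one address are found with `std::equal_range`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical decoded probe with only the fields this sketch needs.
struct Probe {
  uint64_t Address;
  uint32_t Index;
};

// One vector of references sorted by address, instead of a map from each
// address to a separately allocated list of probes.
class AddressProbeIndex {
  std::vector<std::reference_wrapper<const Probe>> Refs;

public:
  // Probes must outlive the index, since only references are stored.
  explicit AddressProbeIndex(const std::vector<Probe> &Probes) {
    Refs.reserve(Probes.size());
    for (const Probe &P : Probes)
      Refs.emplace_back(P);
    // stable_sort keeps probes that share an address in their original order.
    std::stable_sort(Refs.begin(), Refs.end(),
                     [](const Probe &A, const Probe &B) { return A.Address < B.Address; });
  }

  // All probes located at Addr, as a [first, second) iterator pair.
  auto find(uint64_t Addr) const {
    struct Cmp {
      bool operator()(const Probe &P, uint64_t A) const { return P.Address < A; }
      bool operator()(uint64_t A, const Probe &P) const { return A < P.Address; }
    };
    return std::equal_range(Refs.begin(), Refs.end(), Addr, Cmp{});
  }
};
```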

Reduces pseudo probe parsing time from 9.56s to 8.59s and peak RSS from
9.66 GiB to 9.08 GiB as part of perf2bolt processing a large binary.

Test Plan:
```
bin/llvm-lit -sv test/tools/llvm-profgen
```

Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm

Reviewed By: wlei-llvm

Pull Request: https://github.com/llvm/llvm-project/pull/102904
2024-08-26 09:14:35 -07:00
Amir Ayupov
04ebd1907c [MC][NFC] Statically allocate storage for decoded pseudo probes and function records
Use #102774 to allocate storage for decoded probes (`PseudoProbeVec`)
and function records (`InlineTreeVec`).

Leverage that to also shrink sizes of `MCDecodedPseudoProbe`:
- Drop Guid since it's accessible via `InlineTree`.

`MCDecodedPseudoProbeInlineTree`:
- Keep track of probes and inlinees using `ArrayRef`s now that probes
  and function records belonging to the same function are allocated
  contiguously.

This reduces peak RSS from 13.7 GiB to 9.7 GiB and pseudo probe parsing
time (as part of perf2bolt) from 15.3s to 9.6s for a large binary with
400MiB .pseudo_probe section containing 43M probes and 25M function
records.
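
The contiguous layout can be sketched as follows. The sketch assumes hypothetical `DecodedProbe`/`InlineTreeNode` types and a hand-written `ArrayView` standing in for `llvm::ArrayRef`. Because the shared probe vector is reserved to its final size, appending never reallocates. Each node can therefore hold a pointer-and-length slice instead of owning its own container.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal non-owning view (stand-in for llvm::ArrayRef).
template <typename T> struct ArrayView {
  const T *Data = nullptr;
  size_t Size = 0;
  const T *begin() const { return Data; }
  const T *end() const { return Data + Size; }
  size_t size() const { return Size; }
};

// Hypothetical probe without a GUID: the owning node identifies the function.
struct DecodedProbe {
  uint64_t Address;
  uint32_t Index;
};

struct InlineTreeNode {
  uint64_t Guid;
  ArrayView<DecodedProbe> Probes; // slice of the shared probe vector
};

struct ProbeStorage {
  std::vector<DecodedProbe> ProbeVec;
  std::vector<InlineTreeNode> Nodes;

  // Sizes come from a pre-parse, so both vectors are allocated once.
  ProbeStorage(size_t NumProbes, size_t NumFuncs) {
    ProbeVec.reserve(NumProbes);
    Nodes.reserve(NumFuncs);
  }

  // Appends one function's probes back-to-back and records the slice.
  void addFunction(uint64_t Guid, const std::vector<DecodedProbe> &Probes) {
    assert(ProbeVec.size() + Probes.size() <= ProbeVec.capacity() &&
           "exceeding reserved capacity would invalidate existing slices");
    const DecodedProbe *First = ProbeVec.data() + ProbeVec.size();
    ProbeVec.insert(ProbeVec.end(), Probes.begin(), Probes.end());
    Nodes.push_back({Guid, {First, Probes.size()}});
  }
};
```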

Depends on:
#102774
#102787
#102788

Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm

Reviewed By: wlei-llvm

Pull Request: https://github.com/llvm/llvm-project/pull/102789
2024-08-26 09:09:13 -07:00
Amir Ayupov
121ed07975 [MC][NFC] Count pseudo probes and function records
Pre-parse the pseudo probe section, counting the number of probes and
function records. These numbers are used in a follow-up diff to
pre-allocate vectors for decoded probes and inline tree nodes.

An additional benefit is avoiding error handling during parsing.

This pre-parsing is fast: for a 404MiB .pseudo_probe section with
43373881 probes and 25228770 function records, it only takes 0.68±0.01s.
The total time of buildAddress2ProbeMap is 21s.
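
The counting pre-pass can be illustrated on a made-up, heavily simplified record encoding. This is not the actual `.pseudo_probe` format, and `Counts`, `countRecord`, and `countSection` are hypothetical names. A single recursive walk validates the input and counts probes and records. A later decoding pass can then reserve exact capacities and skip error checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified encoding of one function record:
//   [NumProbes][probe byte x NumProbes][NumInlinees][inlinee record x NumInlinees]
struct Counts {
  size_t Probes = 0;
  size_t Records = 0;
};

// Validates the record at Pos and accumulates counts; false on truncation.
static bool countRecord(const std::vector<uint8_t> &Buf, size_t &Pos, Counts &C) {
  if (Pos >= Buf.size())
    return false;
  size_t NumProbes = Buf[Pos++];
  if (Buf.size() - Pos < NumProbes)
    return false;
  Pos += NumProbes;
  C.Probes += NumProbes;
  ++C.Records;
  if (Pos >= Buf.size())
    return false;
  size_t NumInlinees = Buf[Pos++];
  for (size_t I = 0; I < NumInlinees; ++I)
    if (!countRecord(Buf, Pos, C))
      return false;
  return true;
}

// Walks all top-level records. On success, the caller can reserve exactly
// C.Probes and C.Records slots before decoding.
static bool countSection(const std::vector<uint8_t> &Buf, Counts &C) {
  size_t Pos = 0;
  while (Pos < Buf.size())
    if (!countRecord(Buf, Pos, C))
      return false;
  return true;
}
```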

Reviewers: dcci, maksfb, rafaelauler, wlei-llvm, ayermolo

Reviewed By: wlei-llvm

Pull Request: https://github.com/llvm/llvm-project/pull/102774
2024-08-26 09:05:34 -07:00
Harini0924
7f3793207b [BOLT][test] Removed the use of parentheses in BOLT tests with lit internal shell (#105720)
This patch addresses compatibility issues with the lit internal shell by
removing the use of subshell execution (parentheses and subshell syntax)
in the `BOLT` tests. The lit internal shell does not support
parentheses, so the tests have been refactored to use separate command
invocations, with outputs redirected to temporary files where necessary.

This change is relevant for enabling the lit internal shell by default,
as outlined in [[RFC] Enabling the Lit Internal Shell by
Default](https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179)

fixes: #102401
2024-08-23 08:20:11 -07:00
ShatianWang
cbd302410e [BOLT] Improve BinaryFunction::inferFallThroughCounts() (#105450)
This PR improves how basic block execution counts are updated when using
the BOLT option `-infer-fall-throughs`. Previously, if a 0-count
fall-through edge was assigned a positive inferred count N, the
successor block's execution count would be incremented by N. Since the
successor's execution count is calculated using information besides the
inflow sum (such as the outflow sum), it is likely already correct, and
incrementing it by an additional N would be wrong. This PR instead sets
the successor's execution count to the maximum of its current count and N.
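
The change to the update rule can be shown in isolation. The names are hypothetical (`Block`, `updateSuccessorOld`, `updateSuccessorNew`) and are not BOLT's `BinaryBasicBlock` API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical block record; only the execution count matters here.
struct Block {
  uint64_t ExecCount;
};

// Previous rule: add the inferred fall-through count on top of the existing
// count, which can double-count flow already reflected in it.
inline void updateSuccessorOld(Block &Succ, uint64_t InferredFT) {
  Succ.ExecCount += InferredFT;
}

// New rule: the successor executes at least InferredFT times, but an already
// larger count (derived from other information, such as outflow) is kept.
inline void updateSuccessorNew(Block &Succ, uint64_t InferredFT) {
  Succ.ExecCount = std::max(Succ.ExecCount, InferredFT);
}
```

With a successor count of 100 and N = 40, the old rule gives 140 while the new rule keeps 100. With a count of 10, the new rule raises it to 40.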
2024-08-21 00:35:07 -04:00
Maksim Panchenko
8f3050684e [BOLT] Reduce CFI warning verbosity (#105336)
CFI programs may have more saves than restores and this is completely
benign from BOLT's perspective. Reduce the verbosity and print the
warning only under `-v=1` and above.
2024-08-20 13:41:19 -07:00
Harini0924
4f5d866af7 [llvm-lit] Add REQUIRES: shell to BOLT permission test for lit internal shell (#103012)
This patch adds the `REQUIRES: shell` directive to the BOLT permission
test to ensure it only runs in environments with a full-featured
Unix-like shell. This change is necessary because the test relies on
advanced shell capabilities that are not supported by lit's internal
shell.

**Reasoning:** The BOLT permission test uses features like running
commands in the background with `&`, performing arithmetic operations,
and handling special number formats (octal). These features require a
more capable shell than what lit's internal shell provides. Without a
proper shell, the test could fail or behave unpredictably.

This change is relevant for enabling the lit internal shell by default,
as outlined in [[RFC] Enabling the Lit Internal Shell by
Default](https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179)
2024-08-13 19:58:59 -07:00
Connie
887f7002b6 [NFC][bolt][test] Change '|&' to '2>&1 |' for lit internal shell support (#102402)
This patch changes all references to '|&' in bolt tests to instead use
the '2>&1 |' syntax for better consistency across testing and so that
lit's internal shell can be used to run these tests. This addresses a
suggestion made in the comments of this RFC:
https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179.

Fixes https://github.com/llvm/llvm-project/issues/102388
2024-08-12 17:18:17 -07:00
Peter Jung
c1912b4dd7 [BOLT][docs] Fix typo (#98640)
Typo:

`chwon` --> `chown`

Signed-off-by: Peter Jung <admin@ptr1337.dev>
2024-08-08 18:05:41 -07:00
Sayhaan Siddiqui
6aad62cf5b [BOLT][DWARF] Add parallelization for processing of DWO debug information (#100282)
Enables parallelization for the processing of DWO CUs.
2024-08-08 16:41:51 -07:00
Davide Italiano
e49549ff19 Revert "[BOLT] Abort on out-of-section symbols in GOT (#100801)"
This reverts commit a4900f0d93.
2024-08-07 20:52:19 -07:00
Vladislav Khmelevsky
445023f173 Revert "[BOLT] Move ADRRelaxationPass (#101371)" (#102333)
This reverts commit 750b12f06b.
The pass should run after the splitting phase but before NOP removal.
2024-08-07 21:03:51 +04:00
Sayhaan Siddiqui
62e894e0d7 [BOLT][DWARF][NFC] Move Arch assignment out of createBinaryContext (#102054)
Moves the assignment of Arch out of createBinaryContext to prevent data
races when parallelized.
2024-08-07 16:55:39 +00:00
Vladislav Khmelevsky
a4900f0d93 [BOLT] Abort on out-of-section symbols in GOT (#100801)
This patch aborts BOLT execution if it finds an out-of-section (section
end) symbol in the GOT. To handle such situations properly in the
future, we would need an arch-dependent way to analyze relocations or
their sequences; e.g., for ARM it would probably be ADRP + LDR analysis
in order to get the GOT entry address. This is currently also
challenging because GOT-related relocation symbols are replaced with
__BOLT_got_zero. In any case, this seems to be quite a rare case,
possibly related only to static binaries. For the most part, it seems
that it should be handled at the linker stage, since a static binary
should not have a GOT at all. The LLD linker with relaxations enabled
would replace instruction references to GOT entries with direct
references to the target symbols, which eliminates the problem.

In order to detect such cases, this patch fixes a few things in BOLT:
1. For end symbols, we now use the section provided by the ELF
binary. Previously, they would be tied to the wrong section found by
symbol address.
2. End symbols now get limited registration: we only add them to the
name->data GlobalSymbols map, since adding them to the address->data
BinaryDataMap map would likely be impossible due to the address duality
of such symbols.
3. The outdated BD->getSection check (which currently returns a
reference, not a pointer) in postProcessSymbolTable is replaced by a
getSize check in order to allow zero-sized top-level symbols if they are
located in zero-sized sections. For the most part, such things could
only be found in tests, but I don't see a reason not to handle such
cases.
4. Updated the section-end-sym test and removed the x86_64 requirement,
since it is not needed (tested on AArch64 Linux).

The test was provided by peterwaller-arm (thank you) in #100096 and
slightly modified by me.
2024-08-07 16:26:12 +04:00
Vladislav Khmelevsky
097ddd3565 [BOLT] Fix relocations handling (#100890)
After BOLT was ported to RISC-V, some relocations were broken on both
AArch64 and X86.
On AArch64, GOT relocations are an example: when handling them, we
should replace the symbol with __BOLT_got_zero in order to address the
GOT entry, not the symbol that this entry refers to. This is done later
in the code, so it is too early to add the relocation here.
On X86, it is a mistake to add relocations without an addend. This is
the exact problem raised in #97937. Due to differences in code
generation, I had to use a GCC-generated YAML test, since I wasn't able
to reproduce the problem with Clang.
Added tests for both architectures and made the problematic condition
RISC-V-specific.
2024-08-07 16:25:46 +04:00
Vladislav Khmelevsky
25acc16fe2 [BOLT][RUNTIME][NFC] Fix aarch64 match (#100866)
One of the problems related to #93151 is probably that the aarch64 target
may have different names in different environments, so extend the
aarch64 CMake CPU match with different name aliases.
2024-08-07 16:23:57 +04:00
Vladislav Khmelevsky
750b12f06b [BOLT] Move ADRRelaxationPass (#101371)
For non-simple functions, we need a NOP instruction to be present to
transform ADR into an ADRP+ADD sequence, so run this pass before the NOP
removal pass.
2024-08-07 16:23:38 +04:00
sinan
6c8933e1a0 [BOLT] Skip PLT search for zero-value weak reference symbols (#69136)
Take, for example, a common weak reference pattern:
```
__attribute__((weak)) void undef_weak_fun();

void call_if_linked(void) { // hypothetical caller, for illustration only
  if (&undef_weak_fun)      // address is zero if no definition is linked in
    undef_weak_fun();
}
```

In this case, the undefined weak symbol `undef_weak_fun` has an address
of zero, and BOLT incorrectly changes the relocation for the
corresponding symbol to symbol@PLT, leading to incorrect runtime
behavior.
2024-08-07 18:02:42 +08:00
sinan
734c0488b6 [BOLT] Support map other function entry address (#101466)
Allow BOLT to map the old address to a new binary address if the old
address is the entry of the function.
2024-08-07 15:57:25 +08:00
Amir Ayupov
f83a89c1b1 [BOLT] Turn non-empty CFI StateStack assert into a warning (#102216)
clang-15 can produce binaries with mismatched RememberState/RestoreState
CFIs. This is benign for unwinding, so replace an assert with a warning.
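
The following sketch shows what "warn instead of assert" means for a remember/restore state stack. It assumes hypothetical simplified opcodes (`CFIOp`, `CFIInst`) in which the only tracked state is a CFA offset. It is not BOLT's CFI handling code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical, simplified CFI opcodes.
enum class CFIOp { RememberState, RestoreState, SetCFAOffset };

struct CFIInst {
  CFIOp Op;
  int Offset = 0; // used by SetCFAOffset only
};

// Replays a function's CFI program and returns the number of unmatched
// RememberState entries. Leftover entries do not affect unwinding, so they
// produce a warning instead of a failed assertion. (An unmatched
// RestoreState is simply ignored in this sketch.)
size_t replayCFI(const std::vector<CFIInst> &Program, const std::string &FuncName) {
  std::vector<int> StateStack;
  int CFAOffset = 0;
  for (const CFIInst &I : Program) {
    switch (I.Op) {
    case CFIOp::RememberState:
      StateStack.push_back(CFAOffset);
      break;
    case CFIOp::RestoreState:
      if (!StateStack.empty()) {
        CFAOffset = StateStack.back();
        StateStack.pop_back();
      }
      break;
    case CFIOp::SetCFAOffset:
      CFAOffset = I.Offset;
      break;
    }
  }
  if (!StateStack.empty())
    std::fprintf(stderr, "warning: non-empty CFI state stack at end of %s\n",
                 FuncName.c_str());
  return StateStack.size();
}
```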
2024-08-06 17:23:43 -07:00
Amir Ayupov
3f51bec466 [BOLT][NFC] Print timers in perf2bolt invocation
When BOLT is run in AggregateOnly mode (perf2bolt), it exits with code
zero, so destructors are not run and TimerGroup never prints the timers.

Add explicit printing just before the exit to honor options requesting
timers (`--time-rewrite`, `--time-aggr`).

Test Plan: updated bolt/test/timers.c

Reviewers: ayermolo, maksfb, rafaelauler, dcci

Reviewed By: dcci

Pull Request: https://github.com/llvm/llvm-project/pull/101270
2024-07-31 22:14:52 -07:00
Amir Ayupov
fb97b4f962 [BOLT][NFC] Add timers for MetadataManager invocations
Test Plan: added bolt/test/timers.c

Reviewers: ayermolo, maksfb, rafaelauler, dcci

Reviewed By: dcci

Pull Request: https://github.com/llvm/llvm-project/pull/101267
2024-07-31 22:12:34 -07:00
Sayhaan Siddiqui
910012e7c5 [BOLT][DWARF][NFC] Split DIEBuilder::finish (#101244)
Split DIEBuilder::finish so that code updating .debug_names is in a
separate function.
2024-07-31 13:41:38 -07:00
Sayhaan Siddiqui
33960ce5a8 [BOLT][DWARF] Sort GDBIndexTUEntryVector (#101264)
Sorts GDBIndexTUEntryVector in decreasing order by hash to ensure
determinism when parallelized.
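
A minimal sketch of the deterministic ordering, assuming a hypothetical entry shape (`TypeHash`, `TUOffset`) rather than BOLT's actual `GDBIndexTUEntry` fields. The tie-break on offset is an addition of this sketch. It keeps the order fully deterministic even when two entries share a hash.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical .gdb_index type-unit entry.
struct GDBIndexTUEntry {
  uint64_t TypeHash;
  uint64_t TUOffset;
};

// Entries appended by parallel workers arrive in scheduling-dependent order.
// Sorting afterwards (largest hash first, ties by ascending offset) makes the
// emitted order independent of thread timing.
inline void sortTUEntries(std::vector<GDBIndexTUEntry> &Entries) {
  std::sort(Entries.begin(), Entries.end(),
            [](const GDBIndexTUEntry &A, const GDBIndexTUEntry &B) {
              if (A.TypeHash != B.TypeHash)
                return A.TypeHash > B.TypeHash;
              return A.TUOffset < B.TUOffset;
            });
}
```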
2024-07-31 11:35:38 -07:00
Sayhaan Siddiqui
79dcd93b70 [BOLT][DWARF] Remove option to write to DWP (#100771)
Remove the --write-dwp option as well as related code and tests.
2024-07-30 16:58:01 -07:00
Vladislav Khmelevsky
803eaf2926 [BOLT][NFC] Fix test requirement (#100867)
Tests that use instrumentation should have bolt-runtime in their
requirements.
2024-07-27 18:44:58 +04:00
Sayhaan Siddiqui
9a3e66e314 [BOLT][DWARF][NFC] Fix DebugStrOffsetsWriter (#100672)
Fix DebugStrOffsetsWriter so updateAddressMap can't be called after it
is finalized.
2024-07-26 18:58:25 -07:00
Sayhaan Siddiqui
b33ef5bd68 [BOLT][DWARF][NFC] Add mc opt to DWARFRewriter.cpp (#100800)
We ran into an error when removing DWP where the assertion
`RelaxAllView && "RegisterMCTargetOptionsFlags not created."` failed.
This happened because DWP brought in the mc::RegisterMCTargetOptionsFlags
option, and the option was removed along with DWP. This option was not
originally needed because we didn't use MC in DWARFRewriter, but we
switched to using DWARFStreamer, which needs the option.

https://reviews.llvm.org/D75579
https://reviews.llvm.org/D106417
2024-07-26 14:09:46 -07:00
Tristan Ross
5909979869 [BOLT] Fix archive output directory for standalone on Mac (#100643)
CC @gulfemsavrun

Fixes a line which wasn't changed in #97130
2024-07-25 13:29:38 -07:00
Tristan Ross
ffd6240248 [BOLT] Update Docker to use Ubuntu 24.04 (#99421)
Updates the Dockerfile to use Ubuntu 24.04 because CMake requires a newer
version. This can be tested by building the Docker image currently in
main and then building the Docker image from this PR.
2024-07-25 08:20:57 -07:00
Tristan Ross
abc2eae682 [BOLT] Enable standalone build (#97130)
This continues #87196: since the original author did not have much time,
I have taken over work on this PR. We would like to have this so it'll
be easier to package for Nix.

This can be tested by copying the cmake, bolt, third-party, and llvm
directories into their own directory with this PR applied and then
building bolt.

---------

Co-authored-by: pca006132 <john.lck40@gmail.com>
2024-07-25 08:18:14 -07:00
Amir Ayupov
4d19676de4 [BOLT] Add profile-use-pseudo-probes option
Move pseudo probe profile generation under --profile-use-pseudo-probes
option. Note that updating pseudo probes is independent of this flag.

Test Plan: updated pseudoprobe-decoding-inline.test

Reviewers: maksfb, rafaelauler, ayermolo, dcci, WenleiHe

Reviewed By: WenleiHe

Pull Request: https://github.com/llvm/llvm-project/pull/100299
2024-07-24 07:31:01 -07:00
Amir Ayupov
9d2dd009b6 [BOLT] Support more than two jump table parents
Multi-way splitting can cause multiple fragments to access the same jump
table. Relax the assumption that a jump table can only have up to two
parents.

Test Plan: added bolt/test/X86/three-way-split-jt.s

Reviewers: ayermolo, dcci, rafaelauler, maksfb

Reviewed By: rafaelauler, dcci

Pull Request: https://github.com/llvm/llvm-project/pull/99988
2024-07-24 07:16:39 -07:00
Amir Ayupov
83ea7ce3a1 [BOLT][NFC] Track fragment relationships using EquivalenceClasses
Three-way splitting can create references between split fragments (warm
to cold or vice versa) that are not handled by
`isChildOf/isParentOf/isChildOrParentOf`. Generalize fragment
relationships to allow checking if two functions belong to one group,
potentially in presence of ICF which can join multiple groups.
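
The grouping idea can be sketched with a standalone union-find used as a stand-in for an equivalence-class container. `FragmentGroups` and the fragment names are hypothetical. Every "related" fact (parent/child, warm/cold, or ICF folding) merges two groups. Membership in one group is then a single query, however the fragments were connected.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Union-find over fragment names (keyed by name only for this sketch).
class FragmentGroups {
  std::unordered_map<std::string, std::string> Parent;

  // Returns the group representative, creating a singleton group on first use.
  std::string find(const std::string &F) {
    auto It = Parent.find(F);
    if (It == Parent.end()) {
      Parent.emplace(F, F);
      return F;
    }
    if (It->second == F)
      return F;
    std::string Next = It->second;
    std::string Root = find(Next);
    Parent[F] = Root; // path compression
    return Root;
  }

public:
  // Records that two fragments are related; their groups merge.
  void join(const std::string &A, const std::string &B) {
    std::string RA = find(A);
    std::string RB = find(B);
    if (RA != RB)
      Parent[RA] = RB;
  }

  // True if A and B belong to one group, directly or transitively.
  bool sameGroup(const std::string &A, const std::string &B) {
    return find(A) == find(B);
  }
};
```

Unlike pairwise parent/child checks, this answers "warm fragment vs. cold fragment" queries, and stays correct after ICF merges two groups.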

Test Plan: NFC for existing tests

Reviewers: maksfb, ayermolo, rafaelauler, dcci

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/99979
2024-07-24 07:15:10 -07:00
Sayhaan Siddiqui
ea4a348098 [BOLT][DWARF][NFC] Move initialization of DWOName outside of lambda (#99728)
Follow-up to the splitting of processUnitDIE: moves code that accesses a
common resource outside of the function that will be parallelized.

Follow-up to #99957
2024-07-23 17:30:54 -07:00
Sayhaan Siddiqui
7cd7a1eab4 [BOLT][DWARF][NFC] Split processUnitDIE into two lambdas (#99957)
Split processUnitDIE into two lambdas to separate the processing of DWO
CUs and CUs in the main binary.
2024-07-23 12:59:40 -07:00
Jordan Brantner
d251a328b8 [BOLT] Fix typo from alterantive to alternative (#99704)
Fix typo from `alterantive` -> `alternative`

Signed-off-by: Jordan Brantner <brantnej@oregonstate.edu>
2024-07-22 18:35:20 -07:00
Sayhaan Siddiqui
bdee9b05de Revert "[BOLT][DWARF][NFC] Split processUnitDIE into two lambdas" (#99904)
Reverts llvm/llvm-project#99225
2024-07-22 12:31:51 -07:00
Fangrui Song
867faeec05 [MC] Migrate to createAsmStreamer without unused bool parameters
In bolt/lib/Passes/AsmDump.cpp, the MCInstPrinter is created with false
AsmVerbose. The AsmVerbose argument to createAsmStreamer is unused.

Deprecate the legacy Target::createAsmStreamer overload, which might be
used by downstream projects.
2024-07-21 09:44:16 -07:00
Fangrui Song
86e21e1af2 [BOLT] Remove unused bool arguments from createMCObjectStreamer callers
2024-07-20 21:30:49 -07:00
Sayhaan Siddiqui
6747f12931 [BOLT][DWARF][NFC] Split processUnitDIE into two lambdas (#99225)
Split processUnitDIE into two lambdas to separate the processing of DWO
CUs and CUs in the main binary.
2024-07-19 17:52:49 -07:00
Eisuke Kawashima
8bc02bf5c6 fix(bolt/**.py): fix comparison to None (#94012)
from PEP8
(https://peps.python.org/pep-0008/#programming-recommendations):

> Comparisons to singletons like None should always be done with is or
is not, never the equality operators.

Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>
2024-07-19 16:59:56 -07:00
klensy
1ee8238f0e [BOLT][test] Fix Filecheck typos (#93979)
Fixes few FileCheck typos in tests and add missing(?) filecheck call in
test.

Co-authored-by: klensy <nightouser@gmail.com>
2024-07-19 16:57:14 -07:00
Itis-hard2name
7f563232d6 [bolt][Docs] fix missing option in cmake of stage3 in OptimizingClang.md (#93684)
Fixes #93681
2024-07-19 16:55:21 -07:00
Daniel Hill
b686600a57 [BOLT] Skip instruction shortening (#93032)
Add the ability to disable the instruction shortening pass through
--shorten-instructions=false
2024-07-19 16:52:01 -07:00
Sayhaan Siddiqui
d54ec64f67 [BOLT][DWARF] Remove deprecated opt (#99575)
Remove deprecated DeterministicDebugInfo option and its uses.
2024-07-19 14:03:50 -07:00
Shaw Young
296a956369 [BOLT] Match functions with call graph (#98125)
Implemented call graph function matching. First, two call graphs are
constructed for both profiled and binary functions. Then functions are
hashed based on the names of their callee/caller functions. Finally,
functions are matched based on these neighbor hashes and the 
longest common prefix of their names. The `match-with-call-graph` 
flag turns this matching on.
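
A simplified sketch of the matching scheme follows. Names and the hash combine are assumptions, not BOLT's implementation: `CallGraph`, `neighborHash`, and `matchWithCallGraph` are hypothetical, and callers and callees are merged into one neighbor set. Each function is hashed by its neighbors' names. Profiled functions are matched to binary functions with an equal hash, and ties are broken by the longest common name prefix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>

// Hypothetical call graph: function name -> names of its callers and callees.
using CallGraph = std::map<std::string, std::set<std::string>>;

// Hashes a function by its neighbours' names (std::set iterates them sorted).
static size_t neighborHash(const CallGraph &CG, const std::string &F) {
  size_t H = 0;
  auto It = CG.find(F);
  if (It == CG.end())
    return H;
  for (const std::string &N : It->second)
    H = H * 31 + std::hash<std::string>{}(N);
  return H;
}

static size_t commonPrefixLen(const std::string &A, const std::string &B) {
  size_t N = std::min(A.size(), B.size()), I = 0;
  while (I < N && A[I] == B[I])
    ++I;
  return I;
}

// Maps each profiled function to the binary function with an equal neighbour
// hash and the longest common name prefix; unmatched functions are omitted.
static std::map<std::string, std::string>
matchWithCallGraph(const CallGraph &Profile, const CallGraph &Binary) {
  std::multimap<size_t, std::string> BinaryByHash;
  for (const auto &KV : Binary)
    BinaryByHash.emplace(neighborHash(Binary, KV.first), KV.first);

  std::map<std::string, std::string> Matches;
  for (const auto &KV : Profile) {
    auto Range = BinaryByHash.equal_range(neighborHash(Profile, KV.first));
    const std::string *Best = nullptr;
    size_t BestLen = 0;
    for (auto It = Range.first; It != Range.second; ++It) {
      size_t Len = commonPrefixLen(KV.first, It->second);
      if (!Best || Len > BestLen) {
        Best = &It->second;
        BestLen = Len;
      }
    }
    if (Best)
      Matches[KV.first] = *Best;
  }
  return Matches;
}
```

This lets a function renamed between the profiled and the current build (say `foo_v1` to `foo_v2`) still match, as long as its callers and callees kept their names.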

Test Plan: Added match-with-call-graph.test. Matched 164 functions 
in a large binary with 10171 profiled functions.
2024-07-19 14:00:28 -07:00