Commit Graph

299 Commits

Author SHA1 Message Date
Maksim Panchenko
adaca1348e [BOLT] Introduce getOutputBinaryFunctions(). NFCI (#172174)
To gain better control over the functions that go into the output file
and their order, introduce `BinaryContext::getOutputBinaryFunctions()`.

The new API returns a modifiable list of functions in output order.

This list is filled by a new `PopulateOutputFunctions` pass and includes
emittable functions from the input file, plus functions added by BOLT
(injected functions).

The new functionality allows to freely intermix input functions with
injected ones in the output, which will be used in new PRs.

The new function replaces `BinaryContext::getSortedFunctions()`, but
unlike its predecessor, it includes injected functions in the returned
list.
2025-12-14 16:29:01 -08:00
Maksim Panchenko
28ff941a2c [BOLT][AArch64] Always cover veneers in lite mode (#171534)
If a veneer is not disassembled in lite mode, the veneer elimination
pass will not recognize it as such and the call to such veneer will
remain unchanged.

Later, we may need to insert a new veneer for such code ending up with a
double veneer.

To avoid such suboptimal code generation, always disassemble veneers and
guarantee that they are converted to direct calls in BOLT.
2025-12-10 09:54:37 -08:00
Jinjie Huang
33e0301b07 [BOLT] Add validation for direct call/branch targets (#165406)
In some edge cases, a binary may contain direct `branch` or `call`
instructions whose target do not point to a valid executable
instruction. This can occur due to compiler bugs, hand-written assembly,
obfuscation technique, **or when control flow targets a data by
mistake.**

We also encountered the problems as described in this
[issue](https://github.com/llvm/llvm-project/issues/149382), where "data
in code" within OpenSSL's hand-written assembly was misidentified as
instructions(island identification seems fail due to the absence of a
corresponding data symbol). The problem occurred because a data sequence
was incorrectly disassembled as a "jb" instruction.

The point here is that the data should not be pointed to by any edge, so
this patch tries to address this by validating the destination address
for **direct branches and calls**. If the target instruction is
invalid(implies a corrupted control flow), this function will be set
ignored.

Although this approach appears helpful for addressing the 'data in code'
problem, its validation might be compromised if the data can be
disassembled as normal instruction.
2025-12-09 16:17:19 +08:00
Vasily Leonenko
c4def4631c [BOLT] Allow missing DT_FINI{,_ARRAY} if instrumentation-sleep-time is used (#170086)
This PR allows instrument binaries without the .fini and .fini_array
entries if the user passes the `instrumentation-sleep-time` option.
The `.fini` or `.fini_array` entries are used to hook into the process
finalization process and write a profile during finalization. However,
with the `instrumentation-sleep-time` option, the profile should be
written periodically, without the need for it to be written at
finalization.

Co-authored-by: Vasily Leonenko <vasily.leonenko@huawei.com>
2025-12-02 10:41:12 +03:00
Vasily Leonenko
a751ed97ac [BOLT] Support runtime library hook via DT_INIT_ARRAY (#167467)
Major part of this PR is commit implementing support for DT_INIT_ARRAY
for BOLT runtime libraries initialization. Also, it adds related
hook-init test & fixes couple of X86 instrumentation tests.

This commit follows implementation of instrumentation hook via
DT_FINI_ARRAY (https://github.com/llvm/llvm-project/pull/67348) and
extends it for BOLT runtime libraries (including instrumentation
library) initialization hooking.

Initialization has has differences compared to finalization:
- Executables always use ELF entry point address. Update code checks it
and updates init_array entry if ELF is shared library (have no interp
entry) and have no DT_INIT entry. Also this commit introduces
"runtime-lib-init-hook" option to select primary initialization hook
(entry_point, init, init_array) with fall back to next available hook in
input binary. e.g. in case of libc we can explicitly set it to
init_array.
- Shared library init_array entries relocations usually has
R_AARCH64_ABS64 type on AArch64 binaries. We check relocation type and
adjust methods for reading init_array relocations in discovery and
update methods.

---------

Co-authored-by: Vasily Leonenko <vasily.leonenko@huawei.com>
2025-12-01 10:55:00 +03:00
Maksim Panchenko
5490bcf4aa [BOLT] Add missing new line. NFC 2025-11-25 00:05:13 -08:00
YongKang Zhu
ac6daa8181 [BOLT][print] Add option '--print-only-file' (NFC) (#168023)
With this option we can pass to BOLT names of functions to be printed
through a file instead of specifying them all on command line.
2025-11-14 10:26:21 -08:00
Maksim Panchenko
f2c50f9305 [BOLT] Support restartable sequences in tcmalloc (#167195)
Add `RSeqRewriter` to detect code references from `__rseq_cs` section
and ignore function referenced from that section. Code references are
detected via relocations (static or dynamic).

Note that the abort handler is preceded by a 4-byte signature byte
sequence and we cannot relocate the handler without that the signature,
otherwise the application may crash. Thus we are ignoring the function,
i.e. making sure it's not separated from its signature.
2025-11-09 12:43:50 -08:00
Maksim Panchenko
af456dfa11 [BOLT] Refactor tracking internals of BinaryFunction. NFCI (#167074)
In addition to tracking offsets inside a `BinaryFunction` that are
referenced by data relocations, we need to track those relocations too.
Plus, we will need to map symbols referenced by such relocations back to
the containing function.

This change introduces `BinaryFunction::InternalRefDataRelocations` to
track the aforementioned relocations and expands
`BinaryContext::SymbolToFunctionMap` to include local/temp symbols
involved in relocation processing.

There is no functional change introduced that should affect the output.
Future PRs will use the new tracking capabilities.
2025-11-08 00:31:03 -08:00
Kazu Hirata
bddab8359e [BOLT] Remove redundant declarations (NFC) (#166893)
In C++17, static constexpr members are implicitly inline, so they no
longer require an out-of-line definition.

Identified with readability-redundant-declaration.
2025-11-07 07:58:24 -08:00
Maksim Panchenko
cd27741c11 [BOLT] Remove CreatePastEnd parameter in getOrCreateLocalLabel(). NFC (#165065)
CreatePastEnd parameter had no effect on the label creation. Remove it.
2025-10-25 22:16:15 -07:00
Jakub Kuderski
d86da4efee [ADT] Prepare for deprecation of StringSwitch cases with 4+ args. NFC. (#164173)
Update `.Cases` and `.CasesLower` with 4+ args to use the
`initializer_list` overload. The deprecation of these functions will
come in a separate PR.

For more context, see: https://github.com/llvm/llvm-project/pull/163405.
2025-10-20 12:03:46 -04:00
Christian Clauss
0fc05aa1c6 [bolt] Fix typos discovered by codespell (#124726)
https://github.com/codespell-project/codespell
```bash
codespell bolt --skip="*.yaml,Maintainers.txt" --write-changes \
    --ignore-words-list=acount,alledges,ans,archtype,defin,iself,mis,mmaped,othere,outweight,vas
```
2025-10-14 14:45:40 +02:00
Amir Ayupov
ab897ae2e0 [BOLT] Support fragment symbol mapped to the parent address (#162727)
Observed in GCC-produced binary. Emit a warning for the user.

Test Plan: added bolt/test/X86/fragment-alias.s
2025-10-09 15:24:37 -07:00
Maksim Panchenko
6fe878a0e2 [BOLT] Modify warning when --use-old-text fails. NFC (#162731)
When --use-old-text fails, we are emitting all code meant for the
original `.text` section into the new section. This could be more bytes
compared to those emitted under no `--use-old-text`, especially under
`--lite`. As a result, `--use-old-text` results in a larger binary, not
smaller which could be confusing to the user.

Add more information to the warning, including recommendation to rebuild
without `--use-old-text` for smaller binary size.
2025-10-09 14:40:48 -07:00
Gergely Bálint
889bfd9172 Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening" (#162353) (#162435)
Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing
binaries with pac-ret hardening (#120064)" (#162353)

This reverts commit c7d776b068.

#120064 was reverted for breaking builders.

Fix: changed the mismatched type in MarkRAStates.cpp to `auto`.

---

Original message:

OpNegateRAState is an AArch64-specific DWARF CFI used to change the value
of the RA_SIGN_STATE pseudoregister. The RA_SIGN_STATE register records
whether the current return address has been signed with PAC.

OpNegateRAState requires special handling in BOLT because its placement
depends on the function layout. Since BOLT reorders basic blocks during
optimization, these CFIs must be regenerated after layout is finalized.

This patch introduces two new passes:

- MarkRAStates (runs before optimizations): assigns a signedness annotation to each
  instruction based on OpNegateRAState CFIs in the input binary.

- InsertNegateRAStates (runs after optimizations): reads the annotations and emits
  new OpNegateRAState CFIs where RA state changes between instructions.

Design details are described in: `bolt/docs/PacRetDesign.md`.
2025-10-08 11:05:41 +02:00
Gergely Bálint
c7d776b068 Revert "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening" (#162353)
Reverts llvm/llvm-project#120064.

@gulfemsavrun reported that the patch broke toolchain builders.
2025-10-07 21:59:18 +02:00
Gergely Bálint
32eaf5b59c [BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening (#120064)
OpNegateRAState is an AArch64-specific DWARF CFI used to change the value
of the RA_SIGN_STATE pseudoregister. The RA_SIGN_STATE register records
if the current return address has been signed with PAC.

OpNegateRAState requires special handling in BOLT because its placement
depends on the function layout. Since BOLT reorders basic blocks during
optimization, these CFIs must be regenerated after layout is finalized.

This patch introduces two new passes:

- MarkRAStates (runs before optimizations): assigns a signedness annotation to each
  instruction based on OpNegateRAState CFIs in the input binary.

- InsertNegateRAStates (runs after optimizations): reads the annotations and emits
  new OpNegateRAState CFIs where RA state changes between instructions.

Design details are described in: `bolt/docs/PacRetDesign.md`.
2025-10-07 10:22:14 +02:00
Maksim Panchenko
de9b3cabb2 [BOLT] Always treat function entry as code (#160161)
If an address has both, a data marker "$d" and a function symbol
associated with it, treat it as code.
2025-10-06 14:29:52 -07:00
Paschalis Mpeis
224a7176dc [BOLT][AArch64] Refuse to run CDSplit pass (#159351)
LongJmp does not support warm blocks.
On builds without assertions, this may lead to unexpected crashes.

This patch exits with a clear message.
2025-10-03 09:13:09 +01:00
Gergely Bálint
441f0c7c9a [BOLT] Add GNUPropertyRewriter and warn on AArch64 BTI note (#161206)
This commit adds the GNUPropertyRewriter, which parses features from the
.note.gnu.property section.

Currently we only read the bit indicating BTI support
(GNU_PROPERTY_AARCH64_FEATURE_1_BTI).

As BOLT does not add BTI landing pads to targets of indirect
branches/calls, we have to emit a warning that the output binary may be
corrupted.
2025-10-03 09:45:56 +02:00
Maksim Panchenko
d39095b193 [BOLT] Remove unused parameter. NFC (#161617)
`Skip` parameter not used/set inside `analyzeRelocation()`.
2025-10-01 22:21:12 -07:00
YongKang Zhu
5923004f81 [BOLT] Don't check address past end of function for data/code marker annotation (#159210)
We want to annotate function with data and code markers
whose addresses fall within the range of the function, so
setting `CheckPastEnd` to false.
2025-09-25 13:14:02 -07:00
YongKang Zhu
a9c1ae8672 [BOLT][AArch64] Fix another cause of extra entry point misidentification (#155055) 2025-08-27 00:15:46 -07:00
YafetBeyene
fda24dbc16 [BOLT] Add dump-dot-func option for selective function CFG dumping (#153007)
## Change:
* Added `--dump-dot-func` command-line option that allows users to dump
CFGs only for specific functions instead of dumping all functions (the
current only available option being `--dump-dot-all`)

## Usage:
* Users can now specify function names or regex patterns (e.g.,
`--dump-dot-func=main,helper` or `--dump-dot-func="init.*`") to generate
.dot files only for functions of interest
* Aims to save time when analysing specific functions in large binaries
(e.g., only dumping graphs for performance-critical functions identified
through profiling) and we can now avoid reduce output clutter from
generating thousands of unnecessary .dot files when analysing large
binaries

## Testing
The introduced test `dump-dot-func.test` confirms the new option does
the following:

- [x] 1. `dump-dot-func` can correctly filter a specified functions
- [x] 2. Can achieve the above with regexes
- [x] 3. Can do 1. with a list of functions
- [x] No option specified creates no dot files
- [x] Passing in a non-existent function generates no dumping messages
- [x] `dump-dot-all` continues to work as expected
2025-08-22 10:51:09 +01:00
YongKang Zhu
5c4f506cca [BOLT] Validate extra entry point by querying data marker symbols (#154611)
Look up marker symbols and decide whether candidate is really
extra entry point in `adjustFunctionBoundaries()`.
2025-08-20 14:18:56 -07:00
Maksim Panchenko
0d9b9d1eef [BOLT] Keep X86 HLT instruction as a terminator in user mode (#154402)
This is a follow-up to #150963. X86 HLT instruction may appear in the
user-level code, in which case we should treat it as a terminator.
Handle it as a non-terminator in the Linux kernel mode.
2025-08-19 14:41:13 -07:00
Maksim Panchenko
1e0edb072a [BOLT][AArch64] Compensate for missing code markers (#151060)
Code written in assembly can have missing code markers. In BOLT, we can
compensate by recognizing that a function entry point should start a
code sequence.

Seen such code in lua jit library.
2025-07-29 12:01:06 -07:00
Amir Ayupov
a850912de1 [BOLT] Require CFG in BAT mode (#150488)
`getFallthroughsInTrace` requires CFG for functions not covered by BAT,
even in BAT/fdata mode. BAT-covered functions go through special
handling in fdata (`BAT->getFallthroughsInTrace`) and YAML
(`DataAggregator::writeBATYAML`) modes.

Since all modes (BAT/no-BAT, YAML/fdata) now need disassembly/CFG
construction:
- drop special BAT/fdata handling that omitted disassembly/CFG in
  `RewriteInstance::run`, enabling *CFG for all non-BAT functions*,
- switch `getFallthroughsInTrace` to check if a function has CFG,
- which *allows emitting profile for non-simple functions* in all modes.

Previously, traces in non-simple functions were reported as invalid/
mismatching disassembled function contents. This change reduces the
number of such invalid traces and increases the number of profiled
functions. These functions may participate in function reordering via
call graph profile.

Test Plan: updated unclaimed-jt-entries.s
2025-07-25 13:54:37 +02:00
Maksim Panchenko
d5d94ba8bc [BOLT] More refactoring of PHDR handling. NFC (#148932)
Replace ad-hoc adjustment of the program header count with info from the
new segment list.
2025-07-24 11:47:27 -07:00
Maksim Panchenko
218fd69261 [BOLT] Decouple new segment creation from PHDR rewrite. NFCI (#146111)
Refactor handling of PHDR table rewrite to make modifications easier.
2025-07-02 11:22:12 -07:00
Maksim Panchenko
ad7d675991 [BOLT] Refactor mapCodeSections(). NFC (#146434)
Factor out non-relocation specific code into a separate function.
2025-06-30 17:09:41 -07:00
Maksim Panchenko
fb24b4d46a [BOLT] Push code to higher addresses under options (#146180)
When --hot-functions-at-end is used in combination with --use-old-text,
allocate code at the highest possible addresses withing old .text.

This feature is mostly useful for HHVM, where it is beneficial to have
hot static code placed as close as possible to jitted code.
2025-06-28 13:53:56 -07:00
Maksim Panchenko
d00c83ef22 [BOLT] Skip creation of new segments (#146023)
When all section contents are updated in-place, we can skip creation of
new segment(s), save disk space, and free up low memory addresses.

Currently, this feature only works with --use-gnu-stack.
2025-06-27 09:12:08 -07:00
Maksim Panchenko
4308292d1e [BOLT] Refactor NewTextSegmentAddress handling (#145950)
Refactor the code for NewTextSegmentAddress to correctly point at the
true start of the segment when PHDR table is placed at the beginning. We
used to offset NewTextSegmentAddress by PHDR table plus cache line
alignment.

NFC for proper binaries. Some YAML binaries from our tests will diverge
due to bad segment address/offset alignment.
2025-06-26 12:09:11 -07:00
Amir Ayupov
f0d32575a1 [BOLT][NFCI] Use FileSymbols for local symbol disambiguation (#89088)
Remove SymbolToFileName mapping from every local symbol to its
containing FILE symbol name, and reuse FileSymbols to disambiguate
local symbols instead.

Also removes the check for `ld-temp.o` file symbol which was added to
prevent LTO build mode from affecting the disambiguated name. This may
cause incompatibility when using the profile collected on a binary built
in a different mode than the input binary.

Addresses #90661.

Speeds up discover file objects by 5-10% for large binaries:
- binary with ~1.2M symbols: 12.6422s -> 12.0297s
- binary with ~4.5M symbols: 48.8851s -> 43.7315s
2025-06-20 14:29:32 -07:00
Amir Ayupov
4959e8a1da [BOLT][NFCI] Use heuristic for matching split global functions (#90429)
This change speeds up fragment matching for large BOLTed binaries where
all fragments of global parent functions are put under `bolt-pseudo.o`
file symbol:
- before: iterating over symbols under `bolt-pseudo.o` only to fail
  to find a parent,
- after: bail out immediately and use a global parent by name.

Test Plan: NFC, updated register-fragments-bolt-symbols.s
2025-06-20 12:46:56 -07:00
Maksim Panchenko
06f13f8684 [BOLT] Fix references in ignored functions in CFG state (#140678)
When we call setIgnored() on functions that already have CFG built,
these functions are not going to get emitted and we risk missing
external function references being updated.

To mitigate the potential issues, run scanExternalRefs() on such
functions to create patches/relocations.

Since scanExternalRefs() relies on function relocations, we have to
preserve relocations until the function is emitted. As a result, the
memory overhead without debug info update could reach up to 2%.
2025-06-02 12:33:54 -07:00
Maksim Panchenko
c9022a29b4 [BOLT][AArch64] Detect veneers with missing data markers (#142069)
The linker may omit data markers for long absolute veneers causing BOLT
to treat data as code. Detect such veneers and introduce data markers
artificially before BOLT's disassembler kicks in.
2025-05-29 19:24:34 -07:00
Kazu Hirata
d08833f23f [BOLT] Remove unused local variables (NFC) (#140421)
While I'm at it, this patch removes GetExecutablePath, which becomes
unused after removing the sole use.
2025-05-17 17:43:29 -07:00
Amir Ayupov
9d5d715330 [BOLT][heatmap] Add synthetic hot text section (#139824)
In heatmap mode, report samples and utilization of the section(s)
between hot text markers `[__hot_start, __hot_end)`.

The intended use is with multi-way splitting where there are several
sections that contain "hot" code (e.g. `.text.warm` with CDSplit).

Addresses the comment on #139193

https://github.com/llvm/llvm-project/pull/139193#pullrequestreview-2835274682

Test Plan: updated heatmap-preagg.test
2025-05-14 09:47:14 -07:00
Amir Ayupov
0289ca09be [BOLT] Print heatmap from perf2bolt (#139194)
Add perf2bolt `--heatmap` option to produce heatmaps during profile
aggregation.

Distinguish exclusive mode (`llvm-bolt-heatmap`) and optional mode 
(`perf2bolt --heatmap`), which impacts perf.data handling:
exclusive mode covers all addresses, whereas optional mode consumes
attached profile only covering function addresses.

Test Plan: updated per2bolt tests:
- pre-aggregated-perf.test: pre-aggregated data,
- bolt-address-translation-yaml.test: pre-aggregated + BOLTed input,
- perf_test.test: no-LBR perf data.
2025-05-13 13:23:18 -07:00
Amir Ayupov
7f4febde10 [BOLT][heatmap] Compute section utilization and partition score (#139193)
Heatmap groups samples into buckets of configurable size (`--block-size`
flag with 64 bytes as the default =X86 cache line size). Buckets are
mapped to containing sections; for buckets that cover multiple sections,
they are attributed to the first overlapping section. Buckets not mapped
to a section are reported as unmapped.

Heatmap reports **section hotness** which is a percentage of samples
attributed to the section.

Define **section utilization** as a percentage of buckets with non-zero
samples relative to the total number of section buckets.

Also define section **partition score** as a product of section hotness
(where total excludes unmapped buckets) and mapped utilization, ranging 
from 0 to 1 (higher is better).

The intended use of new metrics is with **production profile** collected
from BOLT-optimized binary. In this case the partition score of .text
(hot text if function splitting is enabled) reflects **optimization
profile** representativeness and the quality of hot-cold splitting.
Partition score of 1 means that all samples fall into hot text, and all
buckets (cache lines) in hot text are exercised, equivalent to perfect
hot-cold splitting.

Test Plan: updated heatmap-preagg.test
2025-05-13 13:20:13 -07:00
Kazu Hirata
d5b170c39b [BOLT] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#139403) 2025-05-10 13:39:15 -07:00
YongKang Zhu
316a6ff3d0 [BOLT][RelVTable] Skip special handling on non virtual function pointer relocations (#137406)
Besides virtual function pointers vtable could contain other kinds of
entries like those for RTTI data that also require relocations. We need
to skip special handling on relocations for non virtual function pointers
in relative vtable.

Co-authored-by: Maksim Panchenko <maks@meta.com>
2025-04-29 08:13:44 -07:00
Rafael Auler
3bcb724903 [BOLT] Add --custom-allocation-vma flag (#136385)
Add an advanced-user flag so we are able to rewrite binaries when we fail
to identify a suitable location to put new code. User then can supply a
custom location via --custom-allocation-vma. This happens more obviously if the
binary has segments mapped to very high addresses.
2025-04-18 21:02:09 -07:00
Rafael Auler
5c4e6c6113 [BOLT] Don't choke on nobits symbols (#136384) 2025-04-18 17:29:24 -07:00
wangjue
dbb79c30c9 [BOLT][Instrumentation] Initial instrumentation support for RISCV64 (#133882)
This patch adds code generation for RISCV64 instrumentation.The work
    involved includes the following three points:

a) Implements support for instrumenting direct function call and jump
    on RISC-V which relies on , Atomic instructions
    (used to increment counters) are only available on RISC-V when the A
    extension is used.

b) Implements support for instrumenting direct function inderect call
    by implementing the createInstrumentedIndCallHandlerEntryBB and
createInstrumentedIndCallHandlerExitBB interfaces. In this process, we
    need to accurately record the target address and IndCallID to ensure
    the correct recording of the indirect call counters.

c)Implemented the RISCV64 Bolt runtime library, implemented some system
call interfaces through embedded assembly. Get the difference between
runtime addrress of .text section andstatic address in section header
table, which in turn can be used to search for indirect call
description.

However, the community code currently has problems with relocation in
    some scenarios, but this has nothing to do with instrumentation. We
    may continue to submit patches to fix the related bugs.
2025-04-16 23:01:00 -07:00
alekuz01
38faf32d23 [BOLT] Enable hugify for AArch64 (#117158)
Add required hugify instrumentation and runtime libraries support for AArch64.
Fixes #58226
Unblocks #62695
2025-04-15 12:59:05 +01:00
YongKang Zhu
2a83c0cc13 [BOLT] Support relative vtable (#135449)
To handle relative vftable, which is enabled with clang option
`-fexperimental-relative-c++-abi-vtables`, we look for PC relative
relocations whose fixup locations fall in vtable address ranges.
For such relocations, actual target is just virtual function itself,
and the addend is to record the distance between vtable slot for
target virtual function and the first virtual function slot in vtable,
which is to match generated code that calls virtual function. So
we can skip the logic of handling "function + offset" and directly
save such relocations for future fixup after new layout is known.
2025-04-14 10:24:47 -07:00