2725 Commits

Author SHA1 Message Date
Gergely Bálint
29fef3a51e [BOLT] Improve DWARF CFI generation for pac-ret binaries (#163381)
During the InsertNegateRAState pass, we check the annotations on
instructions to decide where to generate the OpNegateRAState CFIs in the
output binary.

As only instructions in the input binary were annotated, we have to make
a judgement on instructions generated by other BOLT passes.
Incorrect placement may cause issues when an (async) unwind request
is received during the new "unknown" instructions.

This patch adds more logic to make a more informed decision by taking
into account:
- Unknown instructions in a BasicBlock with other instructions have the
  same RAState. Previously, if the BasicBlock started with an unknown
  instruction, the RAState was copied from the preceding block. Now, the
  RAState is copied from the succeeding instructions in the same block.
- Some BasicBlocks may only contain instructions with unknown RAState.
  As explained in issue #160989, these blocks already have incorrect
  unwind info. Because of this, the last known RAState based on the
  layout order is copied.
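
The two rules above can be sketched as follows. This is a simplified model, not BOLT's implementation: each instruction's RAState is `True` (signed), `False` (unsigned) or `None` (unknown), and the names `fill_unknown_ra_states` and `initial_state` are hypothetical. Letting unknown instructions that follow a known one inherit the closest preceding known state in the same block is also an assumption of this sketch.

```python
from typing import List, Optional


def fill_unknown_ra_states(
    blocks: List[List[Optional[bool]]], initial_state: bool = False
) -> List[List[bool]]:
    """Resolve unknown (None) RAStates, walking blocks in layout order."""
    filled_blocks = []
    last_known = initial_state  # last known state seen so far in layout order
    for block in blocks:
        first_known = next((s for s in block if s is not None), None)
        if first_known is None:
            # Block contains only unknown instructions: copy the last known
            # state based on the layout order.
            filled = [last_known] * len(block)
        else:
            # Leading unknown instructions take the state of the succeeding
            # known instruction in the same block.
            current = first_known
            filled = []
            for state in block:
                if state is not None:
                    current = state
                filled.append(current)
            last_known = filled[-1]
        filled_blocks.append(filled)
    return filled_blocks
```

For example, a block `[None, True, None]` resolves to all-signed without consulting the preceding block, and an all-unknown block after it inherits that signed state.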

Updated bolt/docs/PacRetDesign.md to reflect changes.
2025-12-01 12:00:31 +01:00
Gergely Bálint
8e6fb0ee84 Reapply "[BOLT][BTI] Skip inlining BasicBlocks containing indirect tailcalls" (#169881) (#169929)
This reapplies commit 5d6d74359d.

Fix: added assertions to the requirements of the test

--------

Original commit message:

In the Inliner pass, tailcalls are converted to calls in the inlined
BasicBlock. If the tailcall is indirect, the `BR` is converted to `BLR`.

These instructions require different BTI landing pads at their targets.

As the targets of indirect tailcalls are unknown, inlining such blocks
is unsound for BTI: they should be skipped instead.
2025-12-01 10:20:23 +01:00
Vasily Leonenko
a751ed97ac [BOLT] Support runtime library hook via DT_INIT_ARRAY (#167467)
The major part of this PR is a commit implementing support for
DT_INIT_ARRAY for BOLT runtime library initialization. It also adds a
related hook-init test and fixes a couple of X86 instrumentation tests.

This commit follows implementation of instrumentation hook via
DT_FINI_ARRAY (https://github.com/llvm/llvm-project/pull/67348) and
extends it for BOLT runtime libraries (including instrumentation
library) initialization hooking.

Initialization has differences compared to finalization:
- Executables always use the ELF entry point address. The update code
  checks it and updates the init_array entry if the ELF is a shared
  library (has no interp entry) and has no DT_INIT entry. This commit
  also introduces the "runtime-lib-init-hook" option to select the
  primary initialization hook (entry_point, init, init_array), with
  fallback to the next available hook in the input binary. E.g., in the
  case of libc we can explicitly set it to init_array.
- Shared library init_array entry relocations usually have the
  R_AARCH64_ABS64 type on AArch64 binaries. We check the relocation type
  and adjust the methods for reading init_array relocations in the
  discovery and update methods.
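
A minimal sketch of the hook selection with fallback described in the first bullet. It is a Python model with hypothetical names (`select_init_hook`, `HOOK_ORDER`, and the boolean parameters); it assumes that fallback proceeds in the order the hooks are listed above, which the commit message implies but does not spell out.

```python
from typing import Optional

# Hook names as listed for the "runtime-lib-init-hook" option.
HOOK_ORDER = ["entry_point", "init", "init_array"]


def select_init_hook(
    preferred: str, has_interp: bool, has_dt_init: bool, has_dt_init_array: bool
) -> Optional[str]:
    """Return the first available hook, starting from the preferred one."""
    available = {
        # Assumption: the entry point hook applies to executables, which are
        # identified here by the presence of an interp entry.
        "entry_point": has_interp,
        "init": has_dt_init,
        "init_array": has_dt_init_array,
    }
    start = HOOK_ORDER.index(preferred)
    for hook in HOOK_ORDER[start:]:
        if available[hook]:
            return hook
    return None  # no usable hook in the input binary
```

Under these assumptions, a shared library without DT_INIT falls through to `init_array` even when `entry_point` was requested.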

---------

Co-authored-by: Vasily Leonenko <vasily.leonenko@huawei.com>
2025-12-01 10:55:00 +03:00
Gergely Bálint
9bffb10e8b Revert "[BOLT][BTI] Skip inlining BasicBlocks containing indirect tailcalls" (#169881)
Reverts llvm/llvm-project#168403

The attached lit test is failing in some build configurations.
2025-11-28 10:10:53 +01:00
Alexey Moksyakov
ad605bdad7 [bolt][aarch64] Change indirect call instrumentation snippet
The indirect call instrumentation snippet uses the x16 register in the
exit handler to jump to the destination target:

    __bolt_instr_ind_call_handler_func:
            msr  nzcv, x1
            ldp  x0, x1, [sp], #16
            ldr  x16, [sp], #16
            ldp  x0, x1, [sp], #16
            br   x16	<-----

This patch changes the instrumentation snippet to call the instrumentation
runtime library through an indirect call instruction, and adds a wrapper
that stores/loads the target value and the register used by the original
indirect instruction.

Example:
            mov x16, foo

    indirectCall:
            adrp x8, Label
            add  x8, x8, #:lo12:Label
            blr x8

Before:

    Instrumented indirect call:
            stp     x0, x1, [sp, #-16]!
            mov     x0, x8
            movk    x1, #0x0, lsl #48
            movk    x1, #0x0, lsl #32
            movk    x1, #0x0, lsl #16
            movk    x1, #0x0
            stp     x0, x1, [sp, #-16]!
            adrp    x0, __bolt_instr_ind_call_handler_func
            add     x0, x0, #:lo12:__bolt_instr_ind_call_handler_func
            blr     x0

    __bolt_instr_ind_call_handler:  (exit snippet)
            msr     nzcv, x1
            ldp     x0, x1, [sp], #16
            ldr     x16, [sp], #16
            ldp     x0, x1, [sp], #16
            br      x16    <- overwrites the original value in X16

    __bolt_instr_ind_call_handler_func:  (entry snippet)
            stp     x0, x1, [sp, #-16]!
            mrs     x1, nzcv
            adrp    x0, __bolt_instr_ind_call_handler
            add     x0, x0, #:lo12:__bolt_instr_ind_call_handler
            ldr     x0, [x0]
            cmp     x0, #0x0
            b.eq    __bolt_instr_ind_call_handler
            str     x30, [sp, #-16]!
            blr     x0     <--- runtime lib store/load all regs
            ldr     x30, [sp], #16
            b       __bolt_instr_ind_call_handler

_________________________________________________________________________

After:

            mov     x16, foo
    indirectCall:
            adrp    x8, Label
            add     x8, x8, #:lo12:Label
            blr     x8

    Instrumented indirect call:
            stp     x0, x1, [sp, #-16]!
            mov     x0, x8
            movk    x1, #0x0, lsl #48
            movk    x1, #0x0, lsl #32
            movk    x1, #0x0, lsl #16
            movk    x1, #0x0
            stp     x0, x30, [sp, #-16]!
            adrp    x8, __bolt_instr_ind_call_handler_func
            add     x8, x8, #:lo12:__bolt_instr_ind_call_handler_func
            blr     x8       <--- call trampoline instr lib
            ldp     x0, x30, [sp], #16
            mov     x8, x0   <---- restore original target
            ldp     x0, x1, [sp], #16
            blr     x8       <--- original indirect call instruction

    // don't touch regs besides x0, x1
    __bolt_instr_ind_call_handler:  (exit snippet)
            ret     <---- return to original function with indirect call

    __bolt_instr_ind_call_handler_func: (entry snippet)
            adrp    x0, __bolt_instr_ind_call_handler
            add     x0, x0, #:lo12:__bolt_instr_ind_call_handler
            ldr     x0, [x0]
            cmp     x0, #0x0
            b.eq    __bolt_instr_ind_call_handler
            str     x30, [sp, #-16]!
            blr     x0     <--- runtime lib store/load all regs
            ldr     x30, [sp], #16
            b       __bolt_instr_ind_call_handler
2025-11-27 23:48:10 +03:00
Gergely Bálint
5d6d74359d [BOLT][BTI] Skip inlining BasicBlocks containing indirect tailcalls (#168403)
In the Inliner pass, tailcalls are converted to calls in the inlined
BasicBlock. If the tailcall is indirect, the `BR` is converted to `BLR`.

These instructions require different BTI landing pads at their targets.

As the targets of indirect tailcalls are unknown, inlining such blocks
is unsound for BTI: they should be skipped instead.
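
To illustrate why the `BR` to `BLR` conversion matters, here is a small Python model of which BTI landing pads accept which indirect branch, based on the Arm BTI rules for guarded pages. The names `ACCEPTED_PADS`, `accepts` and `should_skip_inlining`, and the tuple-based instruction representation, are hypothetical and are not BOLT's API.

```python
from typing import Iterable, Tuple

# Landing pads that accept each kind of indirect branch in a guarded page.
ACCEPTED_PADS = {
    "BLR": {"c", "jc"},
    "BR_X16_X17": {"c", "j", "jc"},  # BR through x16 or x17
    "BR_OTHER": {"j", "jc"},         # BR through any other register
}


def accepts(branch_kind: str, pad: str) -> bool:
    """True if a `bti <pad>` landing pad accepts the given branch kind."""
    return pad in ACCEPTED_PADS[branch_kind]


def should_skip_inlining(block: Iterable[Tuple[str, bool]]) -> bool:
    """Skip blocks with an indirect tail call: its target, and therefore
    the landing pad at the target, is unknown at rewrite time.

    Each instruction is modeled as (mnemonic, is_tail_call).
    """
    return any(mnemonic == "BR" and is_tail for mnemonic, is_tail in block)
```

A target that starts with `bti j` accepts the original `BR` but would fault on the converted `BLR`, which is the unsound case the commit avoids.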
2025-11-27 16:50:38 +01:00
Gergely Bálint
cca66a21c2 [BOLT][BTI] Add MCPlusBuilder::updateBTIVariant (#167308)
Checks if an instruction is a BTI, and updates the immediate value to the
newly requested variant.

This can be used in situations when the compiler already inserted a BTI
landing pad to a location, but BOLT needs to update it to a different
variant.
Example: br x0 to a location with a BTI c.
2025-11-26 17:48:34 +01:00
Gergely Bálint
de4e12849b [BOLT] Fix assertion test (#169635)
The AArch64_BTI MCPlusBuilder unittest was failing in no-assertion
builds. Add `#ifndef NDEBUG` to exclude the assertion test from
no-assertion builds.
2025-11-26 15:26:45 +01:00
Maksim Panchenko
6c48fbc1dc [BOLT][Tests] Use AT&T assembler syntax only for X86 tests (#169541)
Enabling AT&T syntax for all tests is broken when X86 target is not
enabled as reported in #167225.
2025-11-25 11:15:24 -08:00
Gergely Bálint
4533699245 [BOLT][BTI] Add MCPlusBuilder::isBTILandingPad (#167306)
- takes both implicit and explicit BTIs into account
- fix related comment in llvm/lib/Target/AArch64/AArch64BranchTargets.cpp
2025-11-25 18:37:30 +01:00
Gergely Bálint
ed95c4d6ec [BOLT][BTI] Add MCPlusBuilder::createBTI (#167305)
- creates a BTI j|c landing pad MCInst.
- create getBTIHintNum utility in AArch64/Utils, to make sure BOLT
  generates BTI immediates the same way as LLVM.
- add MCPlusBuilder unittests to cover new function.
2025-11-25 09:51:40 +01:00
Maksim Panchenko
5490bcf4aa [BOLT] Add missing new line. NFC 2025-11-25 00:05:13 -08:00
Gergely Bálint
bab1c2971a [BOLT] Extend Inliner to work on functions with Pointer Authentication (#162458)
The inliner uses DirectSP to check if a function has instructions that
modify the SP. Exceptions are stack Push and Pop instructions.

We can also allow pointer signing and authenticating instructions.

The inliner removes the Return instructions from the inlined functions.
If it is a fused pointer-authentication-and-return (e.g. RETAA), we have
to generate a new authentication instruction.
2025-11-24 18:00:58 +01:00
Raul Tambre
58d9e47672 [NFCI][bolt][test] Use AT&T syntax explicitly (#167225)
This enables building LLVM with `-mllvm -x86-asm-syntax=intel` in one's
Clang config files (i.e. a global preference for Intel syntax).

`-masm=att` is insufficient as it doesn't override a specification of `-mllvm -x86-asm-syntax`.
2025-11-19 09:41:13 +02:00
YongKang Zhu
ac6daa8181 [BOLT][print] Add option '--print-only-file' (NFC) (#168023)
With this option we can pass to BOLT names of functions to be printed
through a file instead of specifying them all on command line.
2025-11-14 10:26:21 -08:00
Amir Ayupov
4c3e0320a1 [BOLT] Move call probe information to CallSiteInfo
Pseudo probe matching (#100446) needs callee information for call probes.
Embed call probe information (probe id, inline tree node, indirect flag)
into CallSiteInfo. As a consequence:
- Remove call probes from PseudoProbeInfo to avoid duplication, making
  it only contain block probes.
- Probe grouping across inline tree nodes becomes more potent and allows
  unambiguously eliding block id 1 (common case).

Block mask (blx) encoding becomes a low-ROI optimization and will be
replaced by a more compact encoding leveraging simplified PseudoProbeInfo
in #166680.

The size increase is ~3% for an XL profile (461->475MB). Compact block
probe encoding shrinks it by ~6%.

Test Plan: updated pseudoprobe-decoding-{inline,noinline}.test

Reviewers: paschalis-mpeis, ayermolo, yota9, yozhu, rafaelauler, maksfb

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/165490
2025-11-11 11:55:36 -08:00
Liu Ke
dee0afa048 [BOLT][DWARF] Slice .debug_str from the DWP for each CU (#159540)
Slice .debug_str from the DWP for each CU using .debug_str_offsets and
emit it, instead of directly copying the global .debug_str, in order to
address the bloat issue of DWO after updates (more details in #155766).
2025-11-11 11:46:34 +08:00
YongKang Zhu
4cd16f2a0c [BOLT][AArch64] Add more heuristics on epilogue determination (#167077)
Add more heuristics to check if a basic block is an AArch64 epilogue. We
treat instructions that load from the stack or adjust the stack pointer
as a valid epilogue code sequence if and only if they immediately precede
the branch instruction that ends the basic block.
2025-11-10 09:50:44 -08:00
Gergely Bálint
cd68056d13 [BOLT] Simplify RAState helpers (NFCI) (#162820)
- unify isRAStateSigned and isRAStateUnsigned to a common getRAState,
- unify setRASigned and setRAUnsigned into setRAState(MCInst, bool),
- update users of these to match the new implementations.
2025-11-10 16:45:39 +01:00
Maksim Panchenko
f2c50f9305 [BOLT] Support restartable sequences in tcmalloc (#167195)
Add `RSeqRewriter` to detect code references from the `__rseq_cs` section
and ignore functions referenced from that section. Code references are
detected via relocations (static or dynamic).

Note that the abort handler is preceded by a 4-byte signature byte
sequence and we cannot relocate the handler without the signature,
otherwise the application may crash. Thus we are ignoring the function,
i.e. making sure it's not separated from its signature.
2025-11-09 12:43:50 -08:00
Kazu Hirata
7b1a74cd79 [BOLT] Use DenseMap::contains (NFC) (#167169)
Identified with readability-container-contains.
2025-11-08 14:44:40 -08:00
Maksim Panchenko
af456dfa11 [BOLT] Refactor tracking internals of BinaryFunction. NFCI (#167074)
In addition to tracking offsets inside a `BinaryFunction` that are
referenced by data relocations, we need to track those relocations too.
Plus, we will need to map symbols referenced by such relocations back to
the containing function.

This change introduces `BinaryFunction::InternalRefDataRelocations` to
track the aforementioned relocations and expands
`BinaryContext::SymbolToFunctionMap` to include local/temp symbols
involved in relocation processing.

There is no functional change introduced that should affect the output.
Future PRs will use the new tracking capabilities.
2025-11-08 00:31:03 -08:00
Maksim Panchenko
7af2b56dd5 [BOLT] Refactor undefined symbols handling. NFCI (#167075)
Remove internal undefined symbol tracking and instead rely on the
emission state of `MCSymbol` while processing data-to-code relocations.

Note that `CleanMCState` pass resets the state of all `MCSymbol`s prior
to code emission.
2025-11-07 19:42:05 -08:00
Kazu Hirata
bddab8359e [BOLT] Remove redundant declarations (NFC) (#166893)
In C++17, static constexpr members are implicitly inline, so they no
longer require an out-of-line definition.

Identified with readability-redundant-declaration.
2025-11-07 07:58:24 -08:00
YongKang Zhu
6fce53af84 [BOLT][AArch64] Skip as many zeros as possible in padding validation (#166467)
We were skipping four zeros at a time when validating code padding, in
case the next zero was part of an instruction or constant island. For
functions that have a large amount of padding (e.g. due to hugify), this
could be very slow. We now change the validation to skip as many zeros as
possible, as long as the count is an exact multiple of 4. No valid
instruction is encoded as 0x00000000, and even if we stumble into a
constant island, the API `BinaryFunction::isInConstantIsland()` has been
made to find the size between the queried address and the end of the
island (#164037), so this should be safe.
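
The skipping rule can be sketched like this. It is a simplified Python model over a byte buffer; the name `skip_zero_padding` is hypothetical, and the constant island handling mentioned above is left out.

```python
def skip_zero_padding(data: bytes, offset: int) -> int:
    """Return the offset reached after skipping the longest run of zero
    bytes starting at `offset` whose length is an exact multiple of 4."""
    end = offset
    while end < len(data) and data[end] == 0:
        end += 1
    run = end - offset
    # Round down to a whole number of 4-byte AArch64 instruction slots.
    return offset + (run - run % 4)
```

With ten zero bytes followed by a non-zero word, the function advances by eight bytes in one step rather than in two 4-byte steps.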
2025-11-06 09:38:25 -08:00
Ádám Kallai
a24eac88eb [BOLT] Adding a unittest that covers Arm SPE PBT aggregation (#160095)
When the SPE Previous Branch Target address (FEAT_SPE_PBT) feature is
available, an SPE sample combined with the PBT feature has two entries.
Arm SPE records the SRC/DEST addresses of the latest sampled branch
operation and stores them in the first entry. PBT records the target
address of the most recently taken branch in program order before the
sampled operation and places it in the second entry. Together they form
a chain of two consecutive branches.

Where:
- The previous branch operation (PBT) is always taken.
- In the SPE entry, the current source branch (SRC) may be either
  fall-through or taken, and the target address (DEST) of the recorded
  branch operation is always what was architecturally executed.

However, PBT doesn't provide as much information as SPE does. It lacks
information such as the address of the source branch, the branch type,
and the prediction bit. These fields are always filled with zero in the
PBT entry. Therefore BOLT cannot evaluate the prediction and source
branch fields, and it leaves them zero during the aggregation process.

The tests include a fully expanded example.
2025-11-06 09:54:44 +00:00
Maksim Panchenko
5f1b9023a8 [BOLT][AArch64] Fix printing of relocation types (#166621)
Enumeration of relocation types is not always sequential, e.g. on
AArch64 the first real relocation type is 0x101. As such, the existing
code in `Relocation::print()` was crashing while printing AArch64
relocations. Fix it by using `llvm::object::getELFRelocationTypeName()`.
2025-11-05 12:36:57 -08:00
YongKang Zhu
b0ae054a56 [BOLT][AArch64] Fix LDR relocation type in ADRP+LDR sequence (#166391)
`R_AARCH64_ADD_ABS_LO12_NC` is for the `ADD` instruction in the
`ADRP+ADD` sequence. For `ADRP+LDR` sequence generated in LDR
relaxation, relocation type for `LDR` should be
`R_AARCH64_LDST64_ABS_LO12_NC` if it is 64-bit integer load or
`R_AARCH64_LDST32_ABS_LO12_NC` if 32-bit.
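
The selection rule can be written down as a small lookup. This Python sketch only models the choice of relocation name; `ldr_lo12_relocation` is a hypothetical helper, not a BOLT function.

```python
def ldr_lo12_relocation(load_size_bits: int) -> str:
    """Pick the low-12-bit relocation for the LDR in an ADRP+LDR pair."""
    if load_size_bits == 64:
        return "R_AARCH64_LDST64_ABS_LO12_NC"
    if load_size_bits == 32:
        return "R_AARCH64_LDST32_ABS_LO12_NC"
    # Other widths are outside the two cases named in the commit message.
    raise ValueError(f"unsupported load size: {load_size_bits}")
```

`R_AARCH64_ADD_ABS_LO12_NC` is deliberately absent here, since it belongs to the `ADD` of an `ADRP+ADD` pair.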

Sorry should have included this in #165787.
2025-11-05 12:01:58 -08:00
Elvina Yakubova
338fb02c98 [BOLT][NFC] Rename functions with _negative suffix to _unknown when th… (#166536)
…e size is unknown

Keep _negative suffix only for test cases when the size is negative
2025-11-05 15:28:31 +00:00
Elvina Yakubova
a65867ac31 [BOLT][AArch64] Fix search to proceed upwards from memcpy call (#166182)
The search should proceed from CallInst to the beginning of BB since X2
can be rewritten and we need to catch the most recent write before the
call.

Patch by Yafet Beyene alulayafet@gmail.com
2025-11-05 10:51:31 +00:00
Amir Ayupov
1d0aa6c2ad [BOLT] Fix impute-fall-throughs (#166305)
BOLT expects pre-aggregated profile entries to be unique, which holds
for externally aggregated traces (or branches+fall-through ranges).
Therefore, BOLT doesn't merge duplicate entries for faster processing.
However, such traces are not expressly prohibited and could come from
concatenated pre-aggregated profiles or otherwise.

Relax the assumption about no duplicate (branch-only) traces in fall-
through imputing.
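
To show how duplicates can appear, the sketch below concatenates two pre-aggregated profiles and combines entries with the same key by summing their counts. This is only an illustration: the tuple layout and the name `merge_duplicate_traces` are hypothetical, and BOLT's fall-through imputing may treat duplicates differently.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Trace = Tuple[int, int]  # (branch source, branch target), branch-only trace


def merge_duplicate_traces(entries: Iterable[Tuple[Trace, int]]) -> Dict[Trace, int]:
    """Sum the counts of entries that describe the same trace."""
    merged: Counter = Counter()
    for trace, count in entries:
        merged[trace] += count
    return dict(merged)
```

Concatenating a profile containing `((0x10, 0x40), 3)` with another containing `((0x10, 0x40), 5)` yields two entries for the same trace, which merge into a single count of 8.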

Test Plan: updated callcont-fallthru.s
2025-11-04 17:01:25 -08:00
YongKang Zhu
718a3b268f [BOLT][AArch64] Run LDR relaxation (#165787)
Replace the current `ADRRelaxationPass` with `AArch64RelaxationPass`,
which, besides the existing ADR relaxation, will also run LDR relaxation
that for now only handles these two forms of LDR instructions:
`ldr Xt, [label]` and `ldr Wt, [label]`.
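
For context on why these forms may need relaxation: the literal form of `LDR` encodes a signed 19-bit word offset, so it only reaches labels within about ±1 MiB of the instruction. The Python check below models that range; the name `ldr_literal_reaches` is hypothetical, and whether the pass tests the range exactly this way is an assumption.

```python
LDR_LITERAL_RANGE = 1 << 20  # 2**18 words of 4 bytes each


def ldr_literal_reaches(pc: int, target: int) -> bool:
    """True if `ldr Xt, [label]` at `pc` can address `target` directly."""
    offset = target - pc
    # The offset must be word aligned and fit in a signed 19-bit word count.
    return offset % 4 == 0 and -LDR_LITERAL_RANGE <= offset < LDR_LITERAL_RANGE
```

When the check fails, the load has to be expressed differently, for example as an `ADRP+LDR` sequence.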
2025-11-04 06:49:04 -08:00
Jinjie Huang
f7be258c28 [BOLT][NFC] Clean up the outdated option --write-dwp in doc (#166150)
Since the "--write-dwp" option has been removed in
[PR](https://github.com/llvm/llvm-project/pull/100771), this patch also
cleans up the corresponding document and test to avoid misleading
issues.
2025-11-04 18:27:53 +08:00
Rafael Auler
285b57b1a6 Update BOLT's README.md example optimization flag (#166251)
Drop hfsort in favor of a more modern function reordering algorithm.
2025-11-03 15:11:29 -08:00
YongKang Zhu
562e3bfcd4 [BOLT] Add an option for constant island cloning (#165778)
Avoiding constant island cloning helps to reduce app size, especially for
BOLT optimization, in which cloning happens when a function is split
into multiple fragments. Add an option to make the cloning optional. We
will introduce a new pass to handle the reference-too-far error that
may result from disabling constant island cloning (#165787).
2025-11-03 14:44:05 -08:00
Maksim Panchenko
97660c1094 [BOLT] Issue error on unclaimed PC-relative relocation (#166098)
Replace the assert with an error and improve the report when an unclaimed
PC-relative relocation is left in strict mode.
2025-11-03 09:19:33 -08:00
Jakub Kuderski
4c21d0cb14 [ADT] Prepare to deprecate variadic StringSwitch::Cases. NFC. (#166020)
Update all uses of variadic `.Cases` to use the initializer list
overload instead. I plan to mark variadic `.Cases` as deprecated in a
followup PR.

For more context, see https://github.com/llvm/llvm-project/pull/163117.
2025-11-02 00:12:33 +00:00
Kazu Hirata
03d044971e [ADT] Use a dedicated empty type for StringSet (NFC) (#165967)
This patch introduces StringSetTag, a dedicated empty struct to serve
as the "value type" for llvm::StringSet.  This change is part of an
effort to reduce the use of std::nullopt_t outside the context of
std::optional.
2025-11-01 10:41:47 -07:00
Maksim Panchenko
7c01a90545 [BOLT] Refactor handling of branch targets. NFCI (#165828)
Refactor code that verifies external branch destinations and creates
secondary entry points.
2025-10-31 08:56:30 -07:00
Jinjie Huang
6ba2127a5c [BOLT] Add constant island check in scanExternalRefs() (#165577)
The [previous patch](https://github.com/llvm/llvm-project/pull/163418)
added a check to prevent adding an entry point into a constant
island, but only for successfully disassembled functions.

Because scanExternalRefs() is also called when a function fails to be
disassembled or is skipped, it can still attempt to add an entry point
at constant islands. Without a check there, the same issue may occur.

So, this patch complements the 'constant island' check in
scanExternalRefs().
2025-10-31 10:29:00 +08:00
Amir Ayupov
04e78b4ddc [BOLT][NFC] Drop unused profile staleness stats (#165489)
Equal number of blocks in a function/instructions in a block between
stale profile and the binary isn't used in the matching.

Remove these stats to declutter the output.

Test Plan: NFC
2025-10-29 00:31:56 -07:00
Gergely Bálint
e12e0d39a7 [BOLT] Fix thread-safety of MarkRAStates (#165368)
The pass calls setIgnored() on functions in parallel, but setIgnored is
not thread safe. This patch adds a std::mutex to guard setIgnored calls.
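
The locking pattern looks roughly like this. The sketch uses Python's `threading.Lock` as a stand-in for `std::mutex`; the `Function` class, the `has_inconsistent_ra_state` field and the helper names are hypothetical and only mimic the shape of the fix.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List


@dataclass
class Function:
    name: str
    has_inconsistent_ra_state: bool
    ignored: bool = False


ignored_names: List[str] = []    # shared state touched by set_ignored
ignored_lock = threading.Lock()  # guards every set_ignored call


def set_ignored(func: Function) -> None:
    """Models a setter that mutates shared state and is not thread safe."""
    func.ignored = True
    ignored_names.append(func.name)


def mark_ra_states(func: Function) -> None:
    """Per-function work that may run on any worker thread."""
    if func.has_inconsistent_ra_state:
        with ignored_lock:  # serialize access to the shared state
            set_ignored(func)


def run_in_parallel(functions: List[Function]) -> None:
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(mark_ra_states, functions))
```

Only the call to the unsafe setter is placed under the lock, so the rest of the per-function work still runs in parallel.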

Fixes: #165362
2025-10-28 12:43:52 +01:00
Liu Ke
8ee5c40fcf [DebugInfo] Support to get TU for hash from .debug_types.dwo section in DWARF4. (#161067)
Using the DWP's cu_index/tu_index only loads the DWO units from the
.debug_info.dwo section for hash, which works fine in DWARF5. However,
tu_index points to .debug_types.dwo section in DWARF4, which can cause
the type unit to be lost due to the incorrect loading target. (Related
discussion in
[811b60f](811b60f0b9))

This patch adds support for getting the type unit for hash from the
.debug_types.dwo section in DWARF4.
2025-10-28 11:53:08 +08:00
Maksim Panchenko
cd27741c11 [BOLT] Remove CreatePastEnd parameter in getOrCreateLocalLabel(). NFC (#165065)
CreatePastEnd parameter had no effect on the label creation. Remove it.
2025-10-25 22:16:15 -07:00
YongKang Zhu
b35c93ffe3 [BOLT] Avoid extra function dump on invalid BBs found by UCE (NFC) (#165111) 2025-10-25 11:24:21 -07:00
Paschalis Mpeis
ae6cb98b29 [BOLT] Add --ba flag to deprecate --nl (#164257)
The `--nl` flag, originally for Non-LBR mode, is deprecated and will be
replaced by `--basic-events` (alias `--ba`).

`--nl` remains as a deprecated alias for backward compatibility.
2025-10-23 10:13:28 +01:00
YongKang Zhu
e1ae126401 [BOLT][AArch64] Validate code padding (#164037)
Check whether AArch64 function code padding is valid,
and add an option to treat invalid code padding as error.
2025-10-22 20:25:06 -07:00
Asher Dobrescu
2bbc4ae850 [BOLT] Check entry point address is not in constant island (#163418)
There are cases where `addEntryPointAtOffset` is called with a given
`Offset` that points to an address within a constant island. This
triggers `assert(!isInConstantIsland(EntryPointAddress))` and causes BOLT
to crash. This patch adds a check which ignores functions that would add
such entry points and warns the user.
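
A minimal model of the added check, assuming constant islands are given as half-open `(start, end)` offset ranges. The function name, its parameters and the warning text are hypothetical; only the behaviour (warn and refuse instead of asserting) follows the description above.

```python
import logging
from typing import Iterable, Set, Tuple

logger = logging.getLogger("bolt-sketch")


def try_add_entry_point(
    offset: int, islands: Iterable[Tuple[int, int]], entry_points: Set[int]
) -> bool:
    """Add an entry point unless it falls inside a constant island.

    Returns False when the offset is rejected, so the caller can mark the
    function as ignored instead of crashing on an assertion.
    """
    if any(start <= offset < end for start, end in islands):
        logger.warning("entry point at offset 0x%x is in a constant island", offset)
        return False
    entry_points.add(offset)
    return True
```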
2025-10-21 11:08:10 +01:00
Jakub Kuderski
d86da4efee [ADT] Prepare for deprecation of StringSwitch cases with 4+ args. NFC. (#164173)
Update `.Cases` and `.CasesLower` with 4+ args to use the
`initializer_list` overload. The deprecation of these functions will
come in a separate PR.

For more context, see: https://github.com/llvm/llvm-project/pull/163405.
2025-10-20 12:03:46 -04:00
Paschalis Mpeis
96688d4b3c [BOLT][NFC] Use brstack in guides and user outputs (#163950)
Update guides to use brstack, with a mention to BRBE for AArch64. Use
brstack in user-facing outputs.

---------

Co-authored-by: Amir Ayupov <aaupov@fb.com>
2025-10-20 09:30:06 +00:00