During InsertNegateRAState pass we check the annotations on
instructions,
to decide where to generate the OpNegateRAState CFIs in the output
binary.
As only instructions in the input binary were annotated, we have to make
a judgement on instructions generated by other BOLT passes.
Incorrect placement may cause issues when an (async) unwind request
is received during the new "unknown" instructions.
This patch adds more logic to make a more informed decision on by taking
into account:
- unknown instructions in a BasicBlock with other instruction have the
same RAState. Previously, if the BasicBlock started with an unknown
instruction,
the RAState was copied from the preceding block. Now, the RAState is
copied from
the succeeding instructions in the same block.
- Some BasicBlocks may only contain instructions with unknown RAState,
As explained in issue #160989, these blocks already have incorrect
unwind info. Because of this, the last known RAState based on the layout order
is copied.
Updated bolt/docs/PacRetDesign.md to reflect changes.
This reapplies commit 5d6d74359d.
Fix: added assertions to the requirements of the test
--------
Original commit message:
In the Inliner pass, tailcalls are converted to calls in the inlined
BasicBlock. If the tailcall is indirect, the `BR` is converted to `BLR`.
These instructions require different BTI landing pads at their targets.
As the targets of indirect tailcalls are unknown, inlining such blocks
is unsound for BTI: they should be skipped instead.
Major part of this PR is commit implementing support for DT_INIT_ARRAY
for BOLT runtime libraries initialization. Also, it adds related
hook-init test & fixes couple of X86 instrumentation tests.
This commit follows implementation of instrumentation hook via
DT_FINI_ARRAY (https://github.com/llvm/llvm-project/pull/67348) and
extends it for BOLT runtime libraries (including instrumentation
library) initialization hooking.
Initialization has has differences compared to finalization:
- Executables always use ELF entry point address. Update code checks it
and updates init_array entry if ELF is shared library (have no interp
entry) and have no DT_INIT entry. Also this commit introduces
"runtime-lib-init-hook" option to select primary initialization hook
(entry_point, init, init_array) with fall back to next available hook in
input binary. e.g. in case of libc we can explicitly set it to
init_array.
- Shared library init_array entries relocations usually has
R_AARCH64_ABS64 type on AArch64 binaries. We check relocation type and
adjust methods for reading init_array relocations in discovery and
update methods.
---------
Co-authored-by: Vasily Leonenko <vasily.leonenko@huawei.com>
In the Inliner pass, tailcalls are converted to calls in the inlined
BasicBlock. If the tailcall is indirect, the `BR` is converted to `BLR`.
These instructions require different BTI landing pads at their targets.
As the targets of indirect tailcalls are unknown, inlining such blocks
is unsound for BTI: they should be skipped instead.
Checks if an instruction is BTI, and updates the immediate value to the
newly requested variant.
This can be used in situations when the compiler already inserted a BTI
landing pad to a location, but BOLT needs to update it to a different
variant.
Example: br x0 to a location with a BTI c.
The AArch64_BTI MCPlusBuilder unittest was failing in no assertion
builds. Add `#ifndef NDEBUG` to exclude the assertion test from
no assertion builds.
- creates a BTI j|c landing pad MCInst.
- create getBTIHintNum utility in AArch64/Utils, to make sure BOLT
generates BTI immediates the same way as LLVM.
- add MCPlusBuilder unittests to cover new function.
The inliner uses DirectSP to check if a function has instructions that
modify the SP. Exceptions are stack Push and Pop instructions.
We can also allow pointer signing and authenticating instructions.
The inliner removes the Return instructions from the inlined functions.
If it is a fused pointer-authentication-and-return (e.g. RETAA), we have
to generate a new authentication instruction.
This enables building LLVM with `-mllvm -x86-asm-syntax=intel` in one's
Clang config files (i.e. a global preference for Intel syntax).
`-masm=att` is insufficient as it doesn't override a specification of `-mllvm -x86-asm-syntax`.
Pseudo probe matching (#100446) needs callee information for call probes.
Embed call probe information (probe id, inline tree node, indirect flag)
into CallSiteInfo. As a consequence:
- Remove call probes from PseudoProbeInfo to avoid duplication, making
it only contain block probes.
- Probe grouping across inline tree nodes becomes more potent + allows
to unambiguously elide block id 1 (common case).
Block mask (blx) encoding becomes a low-ROI optimization and will be
replaced by a more compact encoding leveraging simplified PseudoProbeInfo
in #166680.
The size increase is ~3% for an XL profile (461->475MB). Compact block
probe encoding shrinks it by ~6%.
Test Plan: updated pseudoprobe-decoding-{inline,noinline}.test
Reviewers: paschalis-mpeis, ayermolo, yota9, yozhu, rafaelauler, maksfb
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/165490
Slice .debug_str from the DWP for each CU using .debug_str_offsets and
emit it, instead of directly copying the global .debug_str, in order to
address the bloat issue of DWO after updates. (more details here -
#155766 )
Add more heuristics to check if a basic block is an AArch64 epilogue. We
assume instructions that load from stack or adjust stack pointer as
valid epilogue code sequence if and only if they immediately precede the
branch instruction that ends the basic block.
- unify isRAStateSigned and isRAStateUnsigned to a common getRAState,
- unify setRASigned and setRAUnsigned into setRAState(MCInst, bool),
- update users of these to match the new implementations.
Add `RSeqRewriter` to detect code references from `__rseq_cs` section
and ignore function referenced from that section. Code references are
detected via relocations (static or dynamic).
Note that the abort handler is preceded by a 4-byte signature byte
sequence and we cannot relocate the handler without that the signature,
otherwise the application may crash. Thus we are ignoring the function,
i.e. making sure it's not separated from its signature.
In addition to tracking offsets inside a `BinaryFunction` that are
referenced by data relocations, we need to track those relocations too.
Plus, we will need to map symbols referenced by such relocations back to
the containing function.
This change introduces `BinaryFunction::InternalRefDataRelocations` to
track the aforementioned relocations and expands
`BinaryContext::SymbolToFunctionMap` to include local/temp symbols
involved in relocation processing.
There is no functional change introduced that should affect the output.
Future PRs will use the new tracking capabilities.
Remove internal undefined symbol tracking and instead rely on the
emission state of `MCSymbol` while processing data-to-code relocations.
Note that `CleanMCState` pass resets the state of all `MCSymbol`s prior
to code emission.
In C++17, static constexpr members are implicitly inline, so they no
longer require an out-of-line definition.
Identified with readability-redundant-declaration.
We are skipping four zero's at a time when validating code padding in
case that the next zero would be part of an instruction or constant
island, and for functions that have large amount of padding (like due to
hugify), this could be very slow. We now change the validation to skip
as many as possible but still need to be 4's exact multiple number of
zero's. No valid instruction has encoding as 0x00000000 and even if we
stumble into some constant island, the API
`BinaryFunction::isInConstantIsland()` has been made to find the size
between the asked address and the end of island (#164037), so this
should be safe.
When the SPE Previous Branch Target address (FEAT_SPE_PBT) feature is
available, an SPE sample by combining this PBT feature, has two entries.
Arm SPE records SRC/DEST addresses of the latest sampled branch
operation, and it stores into the first entry. PBT records the target
address of most recently taken branch in program order before the
sampled operation, it places into the second entry. They are formed a
chain of two consecutive branches.
Where:
- The previous branch operation (PBT) is always taken.
- In SPE entry, the current source branch (SRC) may be either
fall-through or taken, and the target address (DEST) of the recorded
branch operation is always what was architecturally executed.
However PBT doesn't provide as much information as SPE does. It lacks
those information such as the address of source branch, branch type, and
prediction bit. These information are always filled with zero in PBT
entry. Therefore Bolt cannot evaluate the prediction, and source branch
fields, it leaves them zero during the aggregation process.
Tests includes a fully expanded example.
Enumeration of relocation types is not always sequential, e.g. on
AArch64 the first real relocation type is 0x101. As such, the existing
code in `Relocation::print()` was crashing while printing AArch64
relocations. Fix it by using `llvm::object::getELFRelocationTypeName()`.
`R_AARCH64_ADD_ABS_LO12_NC` is for the `ADD` instruction in the
`ADRP+ADD` sequence. For `ADRP+LDR` sequence generated in LDR
relaxation, relocation type for `LDR` should be
`R_AARCH64_LDST64_ABS_LO12_NC` if it is 64-bit integer load or
`R_AARCH64_LDST32_ABS_LO12_NC` if 32-bit.
Sorry should have included this in #165787.
The search should proceed from CallInst to the beginning of BB since X2
can be rewritten and we need to catch the most recent write before the
call.
Patch by Yafet Beyene alulayafet@gmail.com
BOLT expects pre-aggregated profile entries to be unique, which holds
for externally aggregated traces (or branches+fall-through ranges).
Therefore, BOLT doesn't merge duplicate entries for faster processing.
However, such traces are not expressly prohibited and could come from
concatenated pre-aggregated profiles or otherwise.
Relax the assumption about no duplicate (branch-only) traces in fall-
through imputing.
Test Plan: updated callcont-fallthru.s
Replace the current `ADRRelaxationPass` with `AArch64RelaxationPass`,
which, besides the existing ADR relaxation, will also run LDR relaxation
that for now only handles these two forms of LDR instructions:
`ldr Xt, [label]` and `ldr Wt, [label]`.
Since the "--write-dwp" option has been removed in
[PR](https://github.com/llvm/llvm-project/pull/100771), this patch also
cleans up the corresponding document and test to avoid misleading
issues.
Avoid cloning constant island helps to reduce app size, especially for
BOLT optimization in which cloning would happen when a function is split
into multiple fragments. Add an option to make the cloning optional, and
we will introduce a new pass to handle the reference too far error that
may result from disabling constant island cloning (#165787).
Update all uses of variadic `.Cases` to use the initializer list
overload instead. I plan to mark variadic `.Cases` as deprecated in a
followup PR.
For more context, see https://github.com/llvm/llvm-project/pull/163117.
This patch introduces StringSetTag, a dedicated empty struct to serve
as the "value type" for llvm::StringSet. This change is part of an
effort to reduce the use of std::nullopt_t outside the context of
std::optional.
The [previous patch](https://github.com/llvm/llvm-project/pull/163418)
has added a check to prevent adding an entry point into a constant
island, but only for successfully disassembled functions.
Because scanExternalRefs() is also called when a function fails to be
disassembled or is skipped, it can still attempt to add an entry point
at constant islands. The same issue may occur if without a check for it
So, this patch complements the 'constant island' check in
scanExternalRefs().
Equal number of blocks in a function/instructions in a block between
stale profile and the binary isn't used in the matching.
Remove these stats to declutter the output.
Test Plan: NFC
The pass calls setIgnored() on functions in parallel, but setIgnored is
not thread safe. This patch adds a std::mutex to guard setIgnored calls.
Fixes: #165362
Using the DWP's cu_index/tu_index only loads the DWO units from the
.debug_info.dwo section for hash, which works fine in DWARF5. However,
tu_index points to .debug_types.dwo section in DWARF4, which can cause
the type unit to be lost due to the incorrect loading target. (Related
discussion in
[811b60f](811b60f0b9))
This patch supports to get the type unit for hash from .debug_types.dwo
section in DWARF4.
The `--nl` flag, originally for Non-LBR mode, is deprecated and will be
replaced by `--basic-events` (alias `--ba`).
`--nl` remains as a deprecated alias for backward compatibility.
There are cases where `addEntryPointAtOffset` is called with a given
`Offset` that points to an address within a constant island. This
triggers `assert(!isInConstantIsland(EntryPointAddress)` and causes BOLT
to crash. This patch adds a check which ignores functions that would add
such entry points and warns the user.
Update `.Cases` and `.CasesLower` with 4+ args to use the
`initializer_list` overload. The deprecation of these functions will
come in a separate PR.
For more context, see: https://github.com/llvm/llvm-project/pull/163405.
Update guides to use brstack, with a mention to BRBE for AArch64. Use
brstack in user-facing outputs.
---------
Co-authored-by: Amir Ayupov <aaupov@fb.com>