Commit Graph

56 Commits

Author SHA1 Message Date
Rafael Auler
0d23cbaa52 [BOLT] Initial experimental instrumentation pass
Summary:
An instrumentation pass that modifies the input binary to
generate a profile after execution finishes. It modifies branches to
increment counters stored in the process memory and injects a new
function that dumps this data to an fdata file, readable by BOLT.

This instrumentation is experimental and currently uses a naive
approach where every branch is instrumented. This is not ideal for
runtime performance, but should be good enough for us to
evaluate/debug LBR profile quality against instrumentation.

Does not support instrumenting indirect calls yet, only direct
calls, direct branches and indirect local branches.

(cherry picked from FBD15998096)
2019-06-19 20:10:49 -07:00
Rafael Auler
db02a1a142 [BOLT] Ignore empty funcs in relocation mode
Summary:
Make BOLT ignore empty functions (those containing no instructions,
despite having some space allocated to it filled with zeroes).

(cherry picked from FBD15981683)
2019-06-24 20:23:22 -07:00
Maksim Panchenko
9894de0094 [BOLT] Check instruction boundaries while populating jump tables
Summary:
Now that we populate jump tables after all functions are disassembled,
we can check for instruction boundaries corresponding to jump table
entries. No need to delegate this task to postProcessJumpTables().

(cherry picked from FBD15814762)
2019-06-13 15:31:30 -07:00
Maksim Panchenko
9e2ad3f593 [BOLT] Delay populating jump tables
Summary:
During the initial disassembly pass, only identify jump tables
without populating the contents. Later, after all functions have been
disassembled, we have a better idea of jump table boundaries and can do
a better job of populating their entries.

As a result, we no longer have embedded jump tables (i.e. a jump table
that is parter of another jump table). If we ever need to keep
sequential jump tables inseparable during the output, we can always
add such functionality later.

Fixes facebookincubator/BOLT#56.

(cherry picked from FBD15800427)
2019-06-12 18:21:02 -07:00
Maksim Panchenko
fac6a89c23 [BOLT] Better handling of address references
Summary:
We used to handle PC-relative address references differently from direct
address references. As a result, some cases, such as escaped function
label address, were not handled when dealing with absolute (non-PIC)
code. This diff moves processing of an address reference into
BinaryContext::handleAddressRef() which is called for both PIC and
non-PIC code.

(cherry picked from FBD15643535)
2019-06-04 15:30:22 -07:00
Rafael Auler
21f4303bfd Support data collection in bolted binaries
Summary:
Similarly to how the compiler relies on DWARF to map samples, so
it is possible to collect profile data in binaries optimized by PGO
techniques and retrofit data to be used in a representation of the program
that was not optimized by PGO, this diff implements an option in BOLT to
encode a table in the output binary that allows us to map data collected
in optimized binaries back to the address space used in the input binary
(where the profile is useful, since we do not support running BOLT on a
binary already optimized by BOLT). The goal is to offer an option to
support BOLT in scenarios where it is not easy to run a special deployment of
the binary with a version that was not optimized by BOLT just for data
collection.

This feature is enabled with the -enable-bat flag. BAT stands for BOLT
Address Translation, which refers to the process of mapping output to
input addresses.

(cherry picked from FBD15531860)
2019-04-12 17:33:46 -07:00
Laith Sakka
3df2c9ea1f Update SDT locations after bolt reordering
Summary: Update SDT locations in .note section to match the new location after bolt reorder the code.

(cherry picked from FBD15427779)
2019-05-17 07:58:27 -07:00
Maksim Panchenko
9ef9a7b1be [BOLT] Use regex matching for function names passed on command line
Summary:
Options such as `-print-only`, `-skip-funcs`, etc. now take regular
expressions. Internally, the option is converted to '^funcname$' form
prior to regex matching. This ensures that names without special
symbols will match exactly, i.e. "foo" will not match "foo123".

(cherry picked from FBD15551930)
2019-05-29 18:33:09 -07:00
Maksim Panchenko
e5b1d9cd8c [BOLT][NFC] Fix white space
(cherry picked from FBD15485688)
2019-05-23 15:49:36 -07:00
Maksim Panchenko
f57d3c00fc [BOLT] Better verification of jump tables
Summary:
Run analyzeIndirectBranch() using basic block boundaries instead of
running ad-hoc validation of the jump table assumptions.

(cherry picked from FBD15465034)
2019-05-22 18:14:34 -07:00
Maksim Panchenko
be344c8de7 [BOLT] Refactor handling of interproc refs
Summary:
Move handling of interprocedural references to BinaryContext.

Post-process indirect branches immediately after the CFG is built.

This is almost NFC. Since indirect branches are now post-processed
before the profile data is processed it interferes with the way the
profile data in YAML format is handled.

(cherry picked from FBD15456003)
2019-05-22 11:26:58 -07:00
Laith Saed Sakka
ca659e4336 Preserve nops that are SDT markers in binaries and disable SDT conflicting optimizations
Summary:
SDT markers that appears as nops in the assembly, are preserved and not eliminated.
Functions with SDT markers are also flagged. Inlining and folding are disabled for
functions that have SDT markers.

(cherry picked from FBD15379799)
2019-05-16 12:46:32 -07:00
Maksim Panchenko
fee61231ef [BOLT] Move JumpTable management to BinaryContext
Summary:
Make BinaryContext responsible for creation and management of
JumpTables. This will be used for detection and resolution of jump table
conflicts across functions.

(cherry picked from FBD15196017)
2019-05-02 17:42:06 -07:00
Maksim Panchenko
310b32fbe5 [BOLT] Limit jump table size by containing object
Summary:
While checking for a size of a jump table, we've used containing
section as a boundary. This worked for most cases as typically jump
tables are not marked with symbol table entries. However, the compiler
may generate objects for indirect goto's.

(cherry picked from FBD15158905)
2019-04-30 15:47:10 -07:00
Maksim Panchenko
f1dfd38dec [BOLT][NFC] Move DynoStats out of BinaryFunction
Summary: Move DynoStats into separate source files.

(cherry picked from FBD15138883)
2019-04-29 12:51:10 -07:00
Maksim Panchenko
99ef4c90c1 [BOLT] Basic support for split functions
Summary:
This adds very basic and limited support for split functions.
In non-relocation mode, split functions are ignored, while their debug
info is properly updated. No support in the relocation mode yet.

Split functions consist of a main body and one or more fragments.
For fragments, the main part is called their parent. Any fragment
could only be entered via its parent or another fragment.

The short-term goal is to correctly update debug information for split
functions, while the long-term goal is to have a complete support
including full optimization. Note that if we don't detect split
bodies, we would have to add multiple entry points via tail calls,
which we would rather avoid.

Parent functions and fragments are represented by a `BinaryFunction`
and are marked accordingly. For now they are marked as non-simple, and
thus only supported in non-relocation mode. Once we start building a
CFG, it should be a common graph (i.e. the one that includes all
fragments) in the parent function.

The function discovery is unchanged, except for the detection of
`\.cold\.` pattern in the function name, which automatically marks the
function as a fragment of another function.

Because of the local function name ambiguity, we cannot rely on the
function name to establish child fragment and parent relationship.
Instead we rely on disassembly processing.

`BinaryContext::getBinaryFunctionContainingAddress()` now returns a
parent function if an address from its fragment is passed.

There's no jump table support at the moment. Jump tables can have
source and destinations in both fragment and parent.

Parent functions that enter their fragments via C++ exception handling
mechanism are not yet supported.

(cherry picked from FBD14970569)
2019-04-16 10:24:34 -07:00
Rafael Auler
31fc56b313 [BOLT] Fix adjustFunctionBoundaries w.r.t. entry points
Summary:
Don't consider symbols in another section when processing
additional entry points for a function.

(cherry picked from FBD14962853)
2019-04-16 14:35:29 -07:00
Maksim Panchenko
8f98268518 [BOLT] Reduce warnings for non-simple functions
Summary:
If a function was already marked as non-simple, there's no reason to
issue a warning that it has a reference in the middle of an
instruction. Besides, sometimes there wouldn't be instructions
disassembled at a given entry, and the warning would be incorrect.

(cherry picked from FBD14938227)
2019-04-15 11:56:55 -07:00
Maksim Panchenko
315ae74de3 [BOLT] Include <numeric> for std::iota
Summary: Some compilers require <numeric> header.

(cherry picked from FBD14868132)
2019-04-09 21:22:41 -07:00
Maksim Panchenko
88375d311e [BOLT] Sort basic block successors for printing
Summary:
For easier analysis of the hottest targets of jump tables it helps to
have basic block successors sorted based on the taken frequency.

(cherry picked from FBD14856640)
2019-04-09 11:27:23 -07:00
Maksim Panchenko
c8a927696c [BOLT] Detect internal references into a middle of instruction
Summary:
Some instructions in assembly-written functions could reference 8-byte
constants from another instructions using 4-byte offsets, presumably to
save a couple of bytes.

Detect such cases, and skip processing such functions until we teach
BOLT how to handle references into a middle of instruction.

(cherry picked from FBD14768212)
2019-04-03 22:31:12 -07:00
Maksim Panchenko
8894853f42 [BOLT][DWARF] Dedup .debug_abbrev section patches
Summary:
When we patch .debug_abbrev we issue many duplicate patches. Instead of
storing these patches as a vector, use a hash map. This saves some
processing time and memory.

(cherry picked from FBD14691292)
2019-03-29 14:22:54 -07:00
Maksim Panchenko
297d1a4e1a [BOLT] Do not write jump table section headers
Summary:
In non-relocation mode we were accidentally emitting section headers for
every single jump table. This happened with default
`-jump-tables=basic`.

(cherry picked from FBD14653282)
2019-03-27 13:58:31 -07:00
Maksim Panchenko
17cd2034f3 [BOLT] Fix debug line info emission
Summary:
GDB does not like if the first entry in the line info table after
end_sequence entry is not marked with is_stmt. If this happens, it will
not print the correct line number information for such address. Note
that everything works fine starting with the first address marked
with is_stmt.

This could happen if the first instruction in the cold section wasn't
marked with is_stmt.

The fix is to always emit debug line info for the first instruction
in any function fragment with is_stmt flag.

(cherry picked from FBD14516629)
2019-03-18 19:22:26 -07:00
Rafael Auler
c593563d1f Do not assert on addresses read from processIndirectBranch
Summary: As part of our heuristics to decode an indirect branch, if we
suspect the branch is an indirect tail call, we add its probable target
to the BC::InterproceduralReferences vector to detect functions with
more than one entry point. However, if this probable target is not in an
allocatable section, we were asserting. Remove this assertion and
change the code to conditionally store to InterproceduralReferences
instead. The probable target could be garbage at this point because
of analyzeIndirectBranch failing to identify the load instruction that
has the memory address of the target, so we should tolerate this.

(cherry picked from FBD14432821)
2019-03-12 16:36:35 -07:00
Maksim Panchenko
ff6e21290f [BOLT] New inliner implementation
Summary:
Addresses correctness issues related to inlining.
Inlining heuristics are not part of this diff.

(cherry picked from FBD13796888)
2019-01-31 11:23:02 -08:00
Maksim Panchenko
365bd1f1c8 [BOLT] For non-simple functions always update jump tables in-place
Summary:
For non-simple function we can miss a reference to a jump table or
to an indirect goto table. If we move the jump table, the missed
reference will not get updated, and the corresponding indirect jump
will end up in the old (wrong) location. Updating the original jump
table in-place should take care of the issue.

(cherry picked from FBD13849776)
2019-01-28 13:46:18 -08:00
Maksim Panchenko
b0f7fddd35 [BOLT] Add method for better function size estimation
Summary:
Add BinaryContext::calculateEmittedSize() that ephemerally emits code
to allow precise estimation of the function size. Relaxation and
macro-op alignment adjustments are taken into account.

(cherry picked from FBD13092139)
2018-11-15 16:02:16 -08:00
Maksim Panchenko
e1b8fade7f [BOLT] Add branch priority policy for blocks with 2 successors
Summary:
On x86 the difference between long and short jump instructions could be
either 4 or 3 bytes, depending if it's a conditional jump or not.
For a basic block with 2 jump instructions, if we know that one of
the successors is in a different code region, then we can make it
a target of an unconditional jump instruction. This will save 1 byte
in case the conditional jump happens to be a short one.

(cherry picked from FBD13078139)
2018-11-14 14:43:59 -08:00
Maksim Panchenko
ce508b58c6 [BOLT] Support relocations without symbols
Summary:
lld may generate relocations without associated symbols. Instead of
rejecting binaries with such relocations, we can re-create the symbol
the relocation is against based on the extracted value.

(cherry picked from FBD10054576)
2018-09-21 12:00:20 -07:00
Rafael Auler
bd0b99c45d [BOLT] Change stub-insertion pass for AArch64
Summary:
Previously, we were expanding eligible branches with stubs. After
expansion, we were computing which stubs were unnecessary and removing them,
assuming ranges were shortening as code is removed. The problem with this
approach is that for branches that refer to code that is not managed by
BOLT, the distance to that location can increase and we can end up with an
out-of-range branch.

This rewrites the pass to be simpler, only increasing size and expanding code
with stubs as needed after each iteration, stopping when code stops increasing.
Besides this rewrite, the stub-insertion pass now supports stubs grouping
similar to what the linker does, allowing different functions to share the
same veneer that jumps to a common callee. It also fixes a bug in the previous
implementation that, in very large functions that use TBZ/TBNZ (+-32KB range),
it would mistakenly try to reuse a local stub BB that is out of range.

This includes a change to allow hot functions to be put at the end of the
.text section, closer to the heap, requiring no veneers to jump to JITted
code. And finally it enables eliminate veneers pass by default.

(cherry picked from FBD10023158)
2018-09-17 13:36:59 -07:00
Maksim Panchenko
53b72d0f2e [BOLT] Ignore symbols from non-allocatable sections
Summary:
While creating BinaryData objects we used to process all symbol table
entries. However, some symbols could belong to non-allocatable sections,
and thus we have to ignore them for the purpose of analyzing in-memory
data.

(cherry picked from FBD9666511)
2018-09-05 14:36:52 -07:00
Maksim Panchenko
708a550084 [BOLT] Fix profile after ICP
Summary:
After optimizing a target of a jump table, ICP was not updating edge
counts corresponding to that target. As a result the edge could be left
hot and negatively influence the code layout.

(cherry picked from FBD9524396)
2018-08-23 22:47:46 -07:00
Maksim Panchenko
c35dc2a386 [BOLT] Detect and handle fixed indirect branches
Summary:
Sometimes GCC can generate code where one of jump table entries
is being used by an indirect branch with a fixed memory reference,
such as "jmp *(JT+8)". If we don't convert such branches to direct ones
and move jump tables, then the indirect branch will reference the old
table value and will end up at the non-updated destination, possibly
causing a runtime crash.

This fix converts such indirect branches into direct ones.

For now we mark functions containing indirect branches with fixed
destination as non-simple to prevent unreachable code elimination
problem triggered by related dead/unreachable jump table.

(cherry picked from FBD9192363)
2018-08-06 11:22:45 -07:00
Maksim Panchenko
771d976543 [BOLT][NFC] Minor code refactoring
(cherry picked from FBD8882632)
2018-07-12 10:13:03 -07:00
Laith Saed Sakka
27f3032447 Add initial function injection support
Summary:
This diff have the API needed to inject functions using bolt.
In relocation mode injected functions are emitted between the cold and the hot functions,
In non-reloc mode injected functions are emitted a next text section.

(cherry picked from FBD8715965)
2018-07-08 12:14:08 -07:00
Rafael Auler
7aee0adbf9 [BOLT-AArch64] Create cold symbols on demand
Summary:
Rework the logic we use for managing references to constant
islands. Defer the creation of the cold versions to when we split the
function and will need them.

(cherry picked from FBD8228803)
2018-05-31 10:33:53 -07:00
Rafael Auler
12380b8b06 Fix assembly after adding entry points
Summary:
When a given function B, located after function A, references
one of A's basic blocks, it registers a new global symbol at the
reference address and update A's Labels vector via
BinaryFunction::addEntryPoint(). However, we don't update A's branch
targets at this point. So we end up with an inconsistent CFG, where the
basic block names are global symbols, but the internal branch operands
are still referencing the old local name of the corresponding blocks
that got promoted to an entry point. This patch fix this by detecting
this situation in addEntryPoint and iterating over all instructions,
looking for references to the old symbol and replacing them to use the
new global symbol (since this is now an entry point).

Fixes facebookincubator/BOLT#26

(cherry picked from FBD8728407)
2018-07-03 11:57:46 -07:00
Rafael Auler
544d1577c1 Avoid removing BBs referenced by JTs
Summary:
While removing unreachable blocks, we may decide to remove a
block that is listed as a target in a jump table entry. If we do that,
this label will be then undefined and LLVM assembler will crash.
Mitigate this for now by not removing such blocks, as we don't support
removing unnecessary jump tables yet.

Fixes facebookincubator/BOLT#20

(cherry picked from FBD8730269)
2018-07-03 17:02:33 -07:00
Laith Saed Sakka
b6c4d8e924 -- Adding Veneer elimination pass and Veneer count to dyno stats.
Summary: Create a pass that performs veneers elimination .

(cherry picked from FBD8359299)
2018-06-07 11:10:37 -07:00
Maksim Panchenko
a6a37995d9 [BOLT] Reject processing of PIE binaries
Summary:
Check if the input binary ELF type. Reject any binary not of
ET_EXEC type, including position-independent executables (PIEs).

Also print the first function containing PIC jump table.

(cherry picked from FBD8707274)
2018-06-29 21:12:55 -07:00
Maksim Panchenko
6802948028 [BOLT] Allow jump tables with 2 entries
Summary:
GCC 8 can generate jump tables with just 2 entries. Modify our heuristic
to accept it. We still assert that there's more than one entry.

(cherry picked from FBD8709416)
2018-06-30 13:30:47 -07:00
Rafael Auler
8835f90d1e [X86] Support a subset of internal calls
Summary:
Add support for functions with internal calls, necessary for
handling Intel MKL library and some code observed in google core dumper
library.

This is not optimizing these functions, but only identifying them,
running analyses to assure we will not break those functions if we move
them, and then "freezing" these functions (marking as not simple so Bolt
will not try to reorder it or touch it in any way).

(cherry picked from FBD8364381)
2018-06-11 13:18:44 -07:00
Facebook Github Bot
07353e9590 [BOLT][PR] In some cases DB could be nullptr
Summary:
When processing binary with -debug mode in some cases, BD could be nullptr. It will be better to fail later on assert than here with segfault.
Closes https://github.com/facebookincubator/BOLT/pull/18
GitHub Author: Alexander Gryanko <xpahos@gmail.com>

(cherry picked from FBD8650719)
2018-06-26 17:02:00 -07:00
Maksim Panchenko
3ab2929b36 [BOLT] Fix support for PIC jump tables
Summary:
BOLT heuristics failed to work if false PIC jump table entries were
accepted when they were pointing inside a function, but not at
an instruction boundary.

This fix checks if the destination falls at instruction boundary, and
if it does not, it truncates the jump table. This, of course, still does not
guarantee that the entry corresponds to a real destination, and we can
have "false positive" entry(ies). However, it shouldn't affect
correctness of the function, but the CFG may have edges that are never
taken. We may update an incorrect jump table entry, corresponding to an
unrelated data, and for that reason we force moving of jump tables if a
PIC jump table was detected.

(cherry picked from FBD8559588)
2018-06-20 21:43:22 -07:00
Rafael Auler
35c09dc4dd [BOLT] Add a user friendly error reporting message
Summary:
In case we fail to disassemble or to build the CFG for a
function, print instructions on bug reporting.

(cherry picked from FBD8549737)
2018-06-20 12:03:24 -07:00
Maksim Panchenko
a7d025139f Revert "[Bolt][NFC] Change capitalization s/BOLT/Bolt/g"
Summary:

(cherry picked from FBD8431879)
2018-06-14 14:27:20 -07:00
Maksim Panchenko
789162276d [Bolt][NFC] Change capitalization s/BOLT/Bolt/g
(cherry picked from FBD8373789)
2018-06-11 19:46:40 -07:00
Rafael Auler
42e6512241 [BOLT-AArch64] Detect linker stubs and address them
Summary:
In AArch64, when the binary gets large, the linker inserts
stubs with 3 instructions: ADRP to load the PC-relative address of
a page; ADD to add the offset of the page; and a branch instruction
to do an indirect jump to the contents of X16 (the linker-reserved
reg). The problem is that the linker does not issue a relocation for
this (since this is not code coming from the assembler), so BOLT has
no idea what is the real target, unless it recognizes these instructions
and extract the target by combining the operands of the instructions
from the stub. This diff does exactly that.

(cherry picked from FBD7882653)
2018-04-30 14:47:32 -07:00
Maksim Panchenko
929b0908f7 [BOLT][NFC] Move ICF pass into a separate file
Summary:
Consolidate code used by identical code folding under
Passes/IdenticalCodeFolding.cpp.

(cherry picked from FBD8109916)
2018-05-22 15:52:21 -07:00