Commit Graph

27 Commits

Author SHA1 Message Date
Maksim Panchenko
ce508b58c6 [BOLT] Support relocations without symbols
Summary:
lld may generate relocations without associated symbols. Instead of
rejecting binaries with such relocations, we can re-create the symbol
the relocation is against based on the extracted value.

(cherry picked from FBD10054576)
2018-09-21 12:00:20 -07:00
Rafael Auler
bd0b99c45d [BOLT] Change stub-insertion pass for AArch64
Summary:
Previously, we were expanding eligible branches with stubs. After
expansion, we were computing which stubs were unnecessary and removing them,
assuming ranges were shortening as code is removed. The problem with this
approach is that for branches that refer to code that is not managed by
BOLT, the distance to that location can increase and we can end up with an
out-of-range branch.

This rewrites the pass to be simpler, only increasing size and expanding code
with stubs as needed after each iteration, stopping when code stops increasing.
Besides this rewrite, the stub-insertion pass now supports stubs grouping
similar to what the linker does, allowing different functions to share the
same veneer that jumps to a common callee. It also fixes a bug in the previous
implementation that, in very large functions that use TBZ/TBNZ (+-32KB range),
it would mistakenly try to reuse a local stub BB that is out of range.

This includes a change to allow hot functions to be put at the end of the
.text section, closer to the heap, requiring no veneers to jump to JITted
code. And finally it enables eliminate veneers pass by default.

(cherry picked from FBD10023158)
2018-09-17 13:36:59 -07:00
Maksim Panchenko
53b72d0f2e [BOLT] Ignore symbols from non-allocatable sections
Summary:
While creating BinaryData objects we used to process all symbol table
entries. However, some symbols could belong to non-allocatable sections,
and thus we have to ignore them for the purpose of analyzing in-memory
data.

(cherry picked from FBD9666511)
2018-09-05 14:36:52 -07:00
Maksim Panchenko
708a550084 [BOLT] Fix profile after ICP
Summary:
After optimizing a target of a jump table, ICP was not updating edge
counts corresponding to that target. As a result the edge could be left
hot and negatively influence the code layout.

(cherry picked from FBD9524396)
2018-08-23 22:47:46 -07:00
Maksim Panchenko
c35dc2a386 [BOLT] Detect and handle fixed indirect branches
Summary:
Sometimes GCC can generate code where one of jump table entries
is being used by an indirect branch with a fixed memory reference,
such as "jmp *(JT+8)". If we don't convert such branches to direct ones
and move jump tables, then the indirect branch will reference the old
table value and will end up at the non-updated destination, possibly
causing a runtime crash.

This fix converts such indirect branches into direct ones.

For now we mark functions containing indirect branches with fixed
destination as non-simple to prevent unreachable code elimination
problem triggered by related dead/unreachable jump table.

(cherry picked from FBD9192363)
2018-08-06 11:22:45 -07:00
Maksim Panchenko
771d976543 [BOLT][NFC] Minor code refactoring
(cherry picked from FBD8882632)
2018-07-12 10:13:03 -07:00
Laith Saed Sakka
27f3032447 Add initial function injection support
Summary:
This diff have the API needed to inject functions using bolt.
In relocation mode injected functions are emitted between the cold and the hot functions,
In non-reloc mode injected functions are emitted a next text section.

(cherry picked from FBD8715965)
2018-07-08 12:14:08 -07:00
Rafael Auler
7aee0adbf9 [BOLT-AArch64] Create cold symbols on demand
Summary:
Rework the logic we use for managing references to constant
islands. Defer the creation of the cold versions to when we split the
function and will need them.

(cherry picked from FBD8228803)
2018-05-31 10:33:53 -07:00
Rafael Auler
12380b8b06 Fix assembly after adding entry points
Summary:
When a given function B, located after function A, references
one of A's basic blocks, it registers a new global symbol at the
reference address and update A's Labels vector via
BinaryFunction::addEntryPoint(). However, we don't update A's branch
targets at this point. So we end up with an inconsistent CFG, where the
basic block names are global symbols, but the internal branch operands
are still referencing the old local name of the corresponding blocks
that got promoted to an entry point. This patch fix this by detecting
this situation in addEntryPoint and iterating over all instructions,
looking for references to the old symbol and replacing them to use the
new global symbol (since this is now an entry point).

Fixes facebookincubator/BOLT#26

(cherry picked from FBD8728407)
2018-07-03 11:57:46 -07:00
Rafael Auler
544d1577c1 Avoid removing BBs referenced by JTs
Summary:
While removing unreachable blocks, we may decide to remove a
block that is listed as a target in a jump table entry. If we do that,
this label will be then undefined and LLVM assembler will crash.
Mitigate this for now by not removing such blocks, as we don't support
removing unnecessary jump tables yet.

Fixes facebookincubator/BOLT#20

(cherry picked from FBD8730269)
2018-07-03 17:02:33 -07:00
Laith Saed Sakka
b6c4d8e924 -- Adding Veneer elimination pass and Veneer count to dyno stats.
Summary: Create a pass that performs veneers elimination .

(cherry picked from FBD8359299)
2018-06-07 11:10:37 -07:00
Maksim Panchenko
a6a37995d9 [BOLT] Reject processing of PIE binaries
Summary:
Check if the input binary ELF type. Reject any binary not of
ET_EXEC type, including position-independent executables (PIEs).

Also print the first function containing PIC jump table.

(cherry picked from FBD8707274)
2018-06-29 21:12:55 -07:00
Maksim Panchenko
6802948028 [BOLT] Allow jump tables with 2 entries
Summary:
GCC 8 can generate jump tables with just 2 entries. Modify our heuristic
to accept it. We still assert that there's more than one entry.

(cherry picked from FBD8709416)
2018-06-30 13:30:47 -07:00
Rafael Auler
8835f90d1e [X86] Support a subset of internal calls
Summary:
Add support for functions with internal calls, necessary for
handling Intel MKL library and some code observed in google core dumper
library.

This is not optimizing these functions, but only identifying them,
running analyses to assure we will not break those functions if we move
them, and then "freezing" these functions (marking as not simple so Bolt
will not try to reorder it or touch it in any way).

(cherry picked from FBD8364381)
2018-06-11 13:18:44 -07:00
Facebook Github Bot
07353e9590 [BOLT][PR] In some cases DB could be nullptr
Summary:
When processing binary with -debug mode in some cases, BD could be nullptr. It will be better to fail later on assert than here with segfault.
Closes https://github.com/facebookincubator/BOLT/pull/18
GitHub Author: Alexander Gryanko <xpahos@gmail.com>

(cherry picked from FBD8650719)
2018-06-26 17:02:00 -07:00
Maksim Panchenko
3ab2929b36 [BOLT] Fix support for PIC jump tables
Summary:
BOLT heuristics failed to work if false PIC jump table entries were
accepted when they were pointing inside a function, but not at
an instruction boundary.

This fix checks if the destination falls at instruction boundary, and
if it does not, it truncates the jump table. This, of course, still does not
guarantee that the entry corresponds to a real destination, and we can
have "false positive" entry(ies). However, it shouldn't affect
correctness of the function, but the CFG may have edges that are never
taken. We may update an incorrect jump table entry, corresponding to an
unrelated data, and for that reason we force moving of jump tables if a
PIC jump table was detected.

(cherry picked from FBD8559588)
2018-06-20 21:43:22 -07:00
Rafael Auler
35c09dc4dd [BOLT] Add a user friendly error reporting message
Summary:
In case we fail to disassemble or to build the CFG for a
function, print instructions on bug reporting.

(cherry picked from FBD8549737)
2018-06-20 12:03:24 -07:00
Maksim Panchenko
a7d025139f Revert "[Bolt][NFC] Change capitalization s/BOLT/Bolt/g"
Summary:

(cherry picked from FBD8431879)
2018-06-14 14:27:20 -07:00
Maksim Panchenko
789162276d [Bolt][NFC] Change capitalization s/BOLT/Bolt/g
(cherry picked from FBD8373789)
2018-06-11 19:46:40 -07:00
Rafael Auler
42e6512241 [BOLT-AArch64] Detect linker stubs and address them
Summary:
In AArch64, when the binary gets large, the linker inserts
stubs with 3 instructions: ADRP to load the PC-relative address of
a page; ADD to add the offset of the page; and a branch instruction
to do an indirect jump to the contents of X16 (the linker-reserved
reg). The problem is that the linker does not issue a relocation for
this (since this is not code coming from the assembler), so BOLT has
no idea what is the real target, unless it recognizes these instructions
and extract the target by combining the operands of the instructions
from the stub. This diff does exactly that.

(cherry picked from FBD7882653)
2018-04-30 14:47:32 -07:00
Maksim Panchenko
929b0908f7 [BOLT][NFC] Move ICF pass into a separate file
Summary:
Consolidate code used by identical code folding under
Passes/IdenticalCodeFolding.cpp.

(cherry picked from FBD8109916)
2018-05-22 15:52:21 -07:00
Maksim Panchenko
3af3537383 [BOLT] Properly handle non-standard function refs
Summary:
Application code can reference functions in a non-standard way, e.g.
using arithmetic and bitmask operations on them. One example is if a
program checks if a function is below a certain address or within
a certain address range to perform a low-level optimization or generate
a proper code (JIT).

Instead of relying on a relocation value (symbol+addend), we use only
the symbol value, and then check if the value is inside the function.
If it is, we treat it as a code reference against location within the
function, otherwise we handle it as a non-standard function reference
and issue a warning.

(cherry picked from FBD7996274)
2018-05-14 11:10:26 -07:00
Maksim Panchenko
56b38a14c5 [BOLT] Fix dyno-stats for PLT calls
Summary:
To accurately account for PLT optimization, each PLT call should be
counted as an extra indirect call instruction, which in turn is
a load, a call, an indirect call, and instruction entry in dyno stats.

(cherry picked from FBD7978980)
2018-05-11 15:30:56 -07:00
spupyrev
e4f39bda51 adjusting cache stats for non-simple functions
Summary:
While working with a binary in non-relocations mode, I realized
some cache metrics are not computed correctly. Hence, this fix.
In addition, logging the number of functions with modified ordering of
basic blocks, which is helpful for analysis.

(cherry picked from FBD7975392)
2018-05-11 12:03:19 -07:00
Bill Nell
729da2da22 [BOLT] Static data reordering pass.
Summary:
Enable BOLT to reorder data sections in a binary based on memory
profiling data.

This diff adds a new pass to BOLT that can reorder data sections for
better locality based on memory profiling data.  For now, the algorithm
to order data is primitive and just relies on the frequency of loads to
order the contents of a section.  We could probably do a lot better by
looking at what functions use the hot data and grouping together hot
data that is used by a single function (or cluster of functions).
Block ordering might give some hints on how to order the data better as
well.

The new pass has two basic modes: inplace and split (when inplace is
false).  The default is split since inplace hasn't really been tested
much.  When splitting is on, the cold data is copied to a "cold" version
of the section while the hot data is kept in the original section, e.g.
for .rodata, .rodata will contain the hot data and .bolt.org.rodata will
contain the cold bits.  In inplace mode, the section contents are
reordered inplace.  In either mode, all relocations to data within that
section are updated to reflect new data locations.

Things to improve:
- The current algorithm is really dumb and doesn't seem to lead to any
  wins.  It certainly could use some improvement.
- Private symbols can have data that leaks over to an adjacent symbol,
  e.g. a string that has a common suffix can start in one symbol and
  leak over (with the common suffix) into the next.  For now, we punt on
  adjacent private symbols.
- Handle ambiguous relocations better.  Section relocations that point
  to the boundary of two symbols will prevent the adjacent symbols from
  being moved because we can't tell which symbol the relocation is for.
- Handle jump tables.  Right now jump table support must be basic if
  data reordering is enabled.
- Being able to handle TLS.  A good amount of data access in some
  binaries are happening in TLS. It would be worthwhile to be able to
  reorder any TLS sections too.
- Handle sections with writeable data.  This hasn't been tested so
  probably won't work.  We could try to prevent false sharing in
  writeable sections as well.
- A pie in the sky goal would be to use DWARF info to reorder types.

(cherry picked from FBD6792876)
2018-04-20 20:03:31 -07:00
Maksim Panchenko
bdf21f7617 [BOLT] Align basic blocks based on execution count
Summary:
The default is not changing, i.e. we are not aligning code within a
function by default.

New meaning of options for aligning basic blocks:

  -align-blocks
      triggers basic block alignment based on profile

  -preserve-blocks-alignment
      tries to preserve basic block alignment seen on input

Tuning options for "-align-blocks":
  -align-blocks-min-size=<uint>
      blocks smaller than the specified size wouldn't be aligned

  -align-blocks-threshold=<uint>
      align only blocks with frequency larger than containing function
      execution frequency specified in percent. E.g. 1000 means aligning
      blocks that are 10 times more frequently executed than the containing
      function.

(cherry picked from FBD7921980)
2017-11-07 15:42:28 -08:00
Maksim Panchenko
9c6f965616 [BOLT] Getting open-source ready
Summary:
BOLT sources are being moved under tools/llvm-bolt/src
and tools/llvm-bolt will contain more files such as LICENSE.txt,
README.txt, etc.

Remove trailing white spaces from our sources.

Create llvm.patch by running

  > git diff f137ed238db11440f03083b1c88b7ffc0f4af65e include lib > \
    tools/llvm-bolt/llvm.patch

README.txt has instructions on checking out sources and applying the
patch.

(cherry picked from FBD7878380)
2018-05-04 10:10:41 -07:00