Commit Graph

67 Commits

Author SHA1 Message Date
Maksim Panchenko
db4642d0a6 [BOLT] Support -hot-text in lite mode
Summary: Update special symbol references in functions that are not emitted.

(cherry picked from FBD22120995)
2020-06-18 11:10:41 -07:00
Maksim Panchenko
0ce0bce9e7 [BOLT] Support for lite mode with relocations
Summary:
Add '-lite' support for relocations for improved processing time,
memory consumption, and more resilient processing of binaries with
embedded assembly code.

In lite relocation mode, BOLT will skip full processing of functions
without a profile. It will run scanExternalRefs() on such functions
to discover external references and to create internal relocations
to update references to optimized functions.

Note that we could have relied on the compiler/linker to provide
relocations for function references. However, there's no assurance
that all such references are reported. E.g., the compiler can resolve
inter-procedural references internally, leaving no relocations
for the linker.

The scan process takes about <10 seconds per 100MB of code on modern
hardware. It's a reasonable overhead to live with considering the
flexibility it provides.

If BOLT fails to scan or disassemble a function, .e.g., due to a data
object embedded in code, or an unsupported instruction, it enables a
patching mode to guarantee that the failed function will call
optimized/moved versions of functions. The patching happens at original
function entry points.

'-skip=<func1,func2,...>' option now can be used to skip processing of
arbitrary functions in the relocation mode.

With '-use-old-text' or '-strict' we require all functions to be
processed. As such, it is incompatible with '-lite' option,
and '-skip' option will only disable optimizations of listed
functions, not their disassembly and emission.

(cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
Alexander Shaposhnikov
0823882d47 Link functions on MachO
Summary: Add first bits for linking functions on MachO.

(cherry picked from FBD21991721)
2020-06-12 20:16:27 -07:00
Maksim Panchenko
8729171182 [BOLT] Refactor profile-handling code
Summary:
This diff handles several issues related to profile reading and
handling:
  * Unifies interface used by 3 profile readers in ProfileReaderBase.
  * Adds automatic detection of the profile file contents.
  * Removes reader-specific fields from BinaryFunction and BinaryData.
    All the information is stored in instruction annotations.
  * Removes implicit memory dependencies in annotations on profile
    reader instance.
  * Adds lite mode support to YAML reader.
  * Moves profile reading code out of BinaryFunction.

(cherry picked from FBD21601411)
2020-05-07 23:00:29 -07:00
Maksim Panchenko
b62a1774af [BOLT] Cover PIC jump table reference in non-strict mode
Summary:
In non-strict relocation mode it was possible to miss a jump table
reference leading to incorrect code.

(cherry picked from FBD21251467)
2020-04-26 17:51:07 -07:00
Maksim Panchenko
ac36e17a73 [BOLT][BFC] Refactor code for adding secondary function entries
Summary:
In non-relocation mode, the code for marking a function non-simple was
decoupled from the code that added new entry points.  Fix that.

(cherry picked from FBD21264247)
2020-04-27 13:40:53 -07:00
Maksim Panchenko
5296b6d12a [BOLT] Change symbol handling for secondary function entries
Summary:
Some functions could be called at an address inside their function body.
Typically, these functions are written in assembly as C/C++ does not
have a multi-entry function concept. The addresses inside a function
body that could be referenced from outside are called secondary entry
points.

In BOLT we support processing functions with secondary/multiple entry
points. We used to mark basic blocks representing those entry points
with a special flag. There was only one problem - each basic block has
exactly one MCSymbol associated with it, and for the most efficient
processing we prefer that symbol to be local/temporary. However, in
certain scenarios, e.g. when running in non-relocation mode, we need
the entry symbol to be global/non-temporary.

We could create global symbols for secondary points ahead of time when
the entry point is marked in the symbol table. But not all such entries
are properly marked. This means that potentially we could discover an
entry point only after disassembling the code that references it, and
it could happen after a local label was already created at the same
location together with all its references. Replacing the local symbol
and updating the references turned out to be an error-prone process.

This diff takes a different approach. All basic blocks are created with
permanently local symbols. Whenever there's a need to add a secondary
entry point, we create an extra global symbol or use an existing one
at that location. Containing BinaryFunction maps a local symbol of a
basic block to the global symbol representing a secondary entry point.
This way we can tell if the basic block is a secondary entry point,
and we emit both symbols for all secondary entry points. Since secondary
entry points are quite rare, the overhead of this approach is minimal.

Note that the same location could be referenced via local symbol from
inside a function and via global entry point symbol from outside.
This is true for both primary and secondary entry points.

(cherry picked from FBD21150193)
2020-04-19 22:29:54 -07:00
Maksim Panchenko
abda7dc6a7 [BOLT] Fix ICF non-determinism in non-relocation mode
Summary:
ICF may fold functions in arbitrary order when running multi-threaded.
This is fine in relocation mode as we end up with just one function
holding all function symbols.

However, in non-relocation mode we keep all function bodies, and if we
keep merging profiles in non-deterministic order, we end up with
functions with non deterministic profiles. The fix for non-relocation
mode is to not merge profiles as the factual new profile could be
different from the merged one since both function instances are
potentially callable.

Additionally, emit extra symbols for ICF functions in non-relocation
mode to make it possible to track the folding.

(cherry picked from FBD20889866)
2020-04-04 20:12:38 -07:00
Maksim Panchenko
1f3e351a9c [BOLT] Refactor code and data emission code
Summary:
Consolidate code and data emission code in ELF-independent
BinaryEmitter. The high-level interface includes only two
functions emitBinaryContext() and emitFunctionBody() used
by RewriteInstance and BinaryContext respectively.

(cherry picked from FBD20332901)
2020-03-06 15:06:37 -08:00
Maksim Panchenko
cb9c991dcb [BOLT] Remove allow-section-relocations option
Summary: The option is not used. Remove all related code.

(cherry picked from FBD20237859)
2020-03-03 15:51:24 -08:00
Maksim Panchenko
c7e012e145 [BOLT][NFC] Get rid of BestFit parameter
Summary: The parameter is no longer used.

(cherry picked from FBD20236516)
2020-03-03 14:28:42 -08:00
Alexander Shaposhnikov
b0cbb60165 [BOLT] Fix begin decrementing
Summary: Fix begin decrementing.

(cherry picked from FBD20232474)
2020-03-03 13:36:32 -08:00
Rafael Auler
a9d85413ac [BOLT] Emit long nops by default
Summary:
Change our X86 target to use long nops by default. In general,
BOLT does not put nops into the instruction stream that is going to be
executed, since it doesn't align basic blocks, only functions. Since we
rebased BOLT, our relationship with MCAssembler changed because it
stopped using multibyte nops and we never needed to revisit that. But it
makes a difference if we want to mitigate perf issues with the Intel
JCC erratum, since the nops inserted are going to be decoded and
executed. To make MCAssembler emit long nops again, we need to explictly
set mattr (Features) of the X86 target.

(cherry picked from FBD19987277)
2020-02-19 16:13:58 -08:00
Maksim Panchenko
9711286858 [BOLT] Get rid of BinarySection::IsLocal
Summary: The flag is no longer used/needed.

(cherry picked from FBD19951571)
2020-02-18 09:20:17 -08:00
Alexander Shaposhnikov
c3c4b15a2e [BOLT] Remove BinaryContext::getFunctionData
Summary:
In this diff we refactor the code around getting the original binary encoding of function's body.
The main changes are: remove BinaryContext::getFunctionData, remove the parameter of the method BinaryFunction::disassemble, introduce BinaryFunction::getData.

(cherry picked from FBD19824368)
2020-02-10 15:35:11 -08:00
Maksim Panchenko
d57513e4ab [BOLT] Fix symbol table issue with ICF
Summary: Not all symbol table entries were updated after ICF.

(cherry picked from FBD19319685)
2020-01-08 13:32:59 -08:00
Maksim Panchenko
ac697b7d3a [BOLT] Replace list of Names with Symbols for BinaryFunction
Summary:
BinaryFunction used to have a list of Names associated with its main
entry point. However, the function is primarily identified by its
corresponding symbol or symbols, and these symbols are available as we
are creating them for a corresponding BinaryData object.

There's also no reason to emit symbols for alternative function names
(aliases), so change the code to only emit needed symbols.

When we emit a cold fragment for a function, only emit one cold symbol
for the fragment instead of one per every main entry symbol/name.

When we match a symbol to an entry point in the function, with this
change we can first go through the list of main entry symbols (now that
they are available).

(cherry picked from FBD19426709)
2020-01-13 11:56:59 -08:00
Alexander Shaposhnikov
7a59783d7a [BOLT] Move createBinaryContext to BinaryContext
Summary:
1. Move createBinaryContext to BinaryContext.
1. Add support for nonlinux triples in createBinaryContext.
2. Remove unnecessary std::move in DWARFRewriter.cpp.

(cherry picked from FBD19421314)
2020-01-15 15:23:45 -08:00
Maksim Panchenko
0283271f29 [BOLT] Do no report error on mismatched instruction encoding
Summary:
When the validation of instruction encoding fails but we are able to
continue processing the binary, do no report an error. Report encoding
format only under `-v=1`.

(cherry picked from FBD19376531)
2020-01-13 11:24:10 -08:00
Maksim Panchenko
45b27d7b44 [BOLT] Get rid of Names in BinaryData
Summary:
For BinaryData, we used to maintain a vector of StringRef names and also
a vector of pointers to MCSymbol's associated with the data. There was
an unnecessary duplication of information and an associated overhead of
keeping it in sync. Fix it by removing Names and using Symbols wherever
Names were used.

Also merge two variants of registerNameAtAddress() and remove
unreachable/dead code in the process.

(cherry picked from FBD19359123)
2020-01-10 16:17:47 -08:00
Maksim Panchenko
088e3c032a [BOLT] Improve handling of secondary function entry points
Summary:
"Fix symbol table entries for secondary entries" diff broke the inliner.

Fix the breakage and make the discovery of secondary entry points more
accurate.

Add ability to BinaryContext::getFunctionForSymbol() to return an entry
point discriminator and use it instead of calling getEntryForSymbol()
and isSecondaryEntry(). This is the preferred way since
getFunctionForSymbol() is thread-safe.

(cherry picked from FBD19295983)
2020-01-06 14:57:15 -08:00
Rafael Auler
de284bc510 [BOLT] Fix symbol table entries for secondary entries
Summary:
Commit "Support full instrumentation" changed the map
SymbolToFunction in BinaryContext to map secondary entries of functions
too. This introduced unexpected behavior in our symbol table rewriting
logic, which caused it to mistakenly write them with the address of the
original function. Fix the behavior of getBinaryFunctionAtAddress to
correct this. Also fix other users of SymbolToFunction to ensure they
are not accidentally using secondary entries when they shouldn't.

(cherry picked from FBD19168319)
2019-12-18 12:14:42 -08:00
Maksim Panchenko
3cc4fc267b [BOLT] Proper support for -trap-avx512 option
Summary:
If -trap-avx512 option is not set, verify that we correctly encode
AVX-512 instructions and treat them as ordinary instructions.

(cherry picked from FBD18666427)
2019-11-22 14:53:20 -08:00
Maksim Panchenko
a09659fd54 [BOLT] Refactor markAmbiguousRelocations()
Summary:
Refactor markAmbiguousRelocations() code and move it to BinaryContext.

Also remove a redundant check.

(cherry picked from FBD18623815)
2019-11-18 14:08:17 -08:00
Maksim Panchenko
658f270417 [BOLT] Refactor data PC relocations in BinaryContext
Summary:
We only use locations of PC relocations and ignore the rest of the data.
There's no need to store type and value.

(cherry picked from FBD18623280)
2019-11-19 18:52:08 -08:00
Maksim Panchenko
6796b7216b [BOLT] Fix jump table analysis for non-simple functions
Summary:
When we disassemble functions, we add discovered jump tables to a global
container in BinaryContext. Later, we analyze and verify all jump
tables. However, analysis for non-simple functions might fail for numerous
reasons, e.g. there would be no instruction at a destination. Since we
are not overwriting non-simple functions, it is not a critical error.
Thus, we can safely skip jump table analysis for non-simple functions.

(cherry picked from FBD18422997)
2019-11-10 21:09:01 -08:00
Maksim Panchenko
d5ddb320ef [BOLT] Free memory for CFG after emission
Summary:
Once we emit function code, we no longer need CFG for next phases
that use basic blocks for address-translation and symbol update
purposes. We free memory used by CFG and instructions. The freed
memory gets reused by later phases resulting in overall memory usage
reduction.

We can probably improve memory consumption even further by replacing
BinaryBasicBlocks with more compact data structures.

(cherry picked from FBD18408954)
2019-10-31 16:54:48 -07:00
Maksim Panchenko
103b0a77cc [BOLT] Fix non-determinism while reading debug info
Summary:
When reading debug info in parallel, CUs for functions were populated in
parallel and the order was non-deterministic. We used the first CU from
the non-deterministically-ordered list to set the line number resulting
in different outputs.

The fix is to sort the list after it's been created and before assigning
the line table unit.

(cherry picked from FBD17946889)
2019-10-14 17:57:36 -07:00
Maksim Panchenko
e9c6c73bb8 [BOLT][non-reloc] Change function splitting in non-relocation mode
Summary:
This diff applies to non-relocation mode mostly. In this mode, we are
limited by original function boundaries, i.e. if a function becomes
larger after optimizations (e.g. because of the newly introduced
branches) then we might not be able to write the optimized version,
unless we split the function. At the same time, we do not benefit from
function splitting as we do in the relocation mode since we are not
moving functions/fragments, and the hot code does not become more
compact.

For the reasons described above, we used to execute multiple re-write
attempts to optimize the binary and we would only split functions that
were too large to fit into their original space.

After the first attempt, we would know functions that did not fit
into their original space. Then we would re-run all our passes again
feeding back the function information and forcefully splitting
such functions. Some functions still wouldn't fit even after the
splitting (mostly because of the branch relaxation for conditional tail
calls that does not happen in non-relocation mode). Yet we have emitted
debug info as if they were successfully overwritten. That's why we had
one more stage to write the functions again, marking failed-to-emit
functions non-simple. Sadly, there was a bug in the way 2nd and 3rd
attempts interacted, and we were not splitting the functions correctly
and as a result we were emitting less optimized code.

One of the reasons we had the multi-pass rewrite scheme in place, was
that we did not have an ability to precisely estimate the code size
before the actual code emission. Recently, BinaryContext obtained such
functionality, and now we can use it instead of relying on the
multi-pass rewrite. This eliminates redundant work of re-running
the same function passes multiple times.

Because function splitting runs before a number of optimization passes
that run on post-CFG state (those rely on the splitting pass), we
cannot estimate the non-split code size with 100% accuracy. However,
it is good enough for over 99% of the cases to extract most of the
performance gains for the binary.

As a result of eliminating the multi-pass rewrite, the processing time
in non-relocation mode with `-split-functions=2` is greatly reduced.
With debug info update, it is less than half of what it used to be.

New semantics for `-split-functions=<n>`:

  -split-functions - split functions into hot and cold regions
    =0 -   do not split any function
    =1 -   in non-relocation mode only split functions too large to fit
           into original code space
    =2 -   same as 1 (backwards compatibility)
    =3 -   split all functions

(cherry picked from FBD17362607)
2019-09-11 15:42:22 -07:00
Rafael Auler
cc4b2fb614 [BOLT] Efficient edge profiling in instrumented mode
Summary:
Change our edge profiling technique when using instrumentation
to do not instrument every edge. Instead, build the spanning tree
for the CFG and omit instrumentation for edges in the spanning tree.
Infer the edge count for these edges when writing the profile during
run time. The inference works with a bottom-up traversal of the spanning
tree and establishes the value of the edge connecting to the parent based
on a simple flow equation involving output and input edges, where the
only unknown variable is the parent edge.

This requires some engineering in the runtime lib to support dynamic
allocation for building these graphs at runtime.

(cherry picked from FBD17062773)
2019-08-07 16:09:50 -07:00
Maksim Panchenko
f588d7a6ea [BOLT] Tighter control of jump table detection
Summary:
We were too permissive by allowing more jump tables during the
preliminary scan of memory. This allowed for jump tables to be
falsely detected. And since we didn't have a way to backtrack
the jump table creation, we had to assert.

This diff refactors the code that analyzes jump table contents.
Preliminary and final passes share the same code. The only difference
should be the detection of instruction boundaries that are available
during the final pass.

This should affect strict relocation mode only.

(cherry picked from FBD16923335)
2019-08-19 14:06:36 -07:00
Maksim Panchenko
8d5854ef09 [BOLT] Add option to verify instruction encoder/decoder
Summary:
Add option `-check-encoding` to verify if the input to LLVM disassembler
matches the output of the assembler. When set, the verification runs on
every instruction in processed functions.

I'm not enabling the option by default as it could be quite noisy on x86
where instruction encoding is ambiguous and can include redundant
prefixes.

(cherry picked from FBD16595415)
2019-07-31 16:03:49 -07:00
Maksim Panchenko
a9b9aa1e02 [BOLT] Add code padding verification
Summary:
In non-relocation mode, we allow data objects to be embedded in the
code. Such objects could be unmarked, and could occupy an area between
functions, the area which is considered to be code padding.

When we disassemble code, we detect references into the padding area
and adjust it, so that it is not overwritten during the code emission.
We assume the reference to be pointing to the beginning of the object.

However, assembly-written functions may reference the middle of an
object and use negative offsets to reference data fields. Thus,
conservatively, we reduce the possibly-overwritten padding area to
a minimum if the object reference was detected.

Since we also allow functions with unknown code in non-relocation mode,
it is possible that we miss references to some objects in code.
To cover such cases, we need to verify the padding area before we
allow to overwrite it.

(cherry picked from FBD16477787)
2019-07-23 20:48:41 -07:00
laith sakka
744a2417dd Run findSubprograms in preprocessDebugInfo in parallel
Summary:
While reading debug info the function findSubprograms
runs on each compilation unit. This diff parallelize that loop
reducing its runtime duration by 70%.

(cherry picked from FBD16362867)
2019-07-17 20:54:53 -07:00
laith sakka
9977b03fea Run reorder blocks in parallel
Summary:
This diff change reorderBasicBlocks pass to run in parallel,
it does so by adding locks to the fix branches function,
and creating temporary MCCodeEmitters when estimating basic block code size.

(cherry picked from FBD16161149)
2019-07-08 12:32:58 -07:00
Rafael Auler
1169f1fdd8 [BOLT] Support duplicating jump tables
Summary:
If two indirect branches use the same jump table, we need to
detect this and duplicate dump tables so we can modify this CFG
correctly. This is necessary for instrumentation and shrink wrapping.
For the latter, we only detect this and bail, fixing this old known
issue with shrink wrapping.

Other minor changes to support better instrumentation: add an option
to instrument only hot functions, add LOCK prefix to instrumentation
increment instruction, speed up splitting critical edges by avoiding
calling recomputeLandingPads() unnecessarily.

(cherry picked from FBD16101312)
2019-07-02 16:56:41 -07:00
Rafael Auler
8880969ced [BOLT] Restrict creation of jump tables
Summary:
Heuristic that creates a jump table for every memory access,
including those we do not match against a pattern in an indirect jump,
is too permissive and has false positives. Guard this logic under
strict mode until we figure out a better strategy.

(cherry picked from FBD16192205)
2019-07-10 15:41:34 -07:00
Maksim Panchenko
e89ad0db4b [BOLT] Introduce strict relocation mode
Summary:
In strict relocation mode we rely on relocations to represent all
possible entry points into a function. Most of the code generated by
tested compilers (gcc and clang) will result in relocations against
any internal labels for jump tables and for computed goto tables.

In situations where we cannot properly reconstruct a jump table, or when
we cannot determine a table that guides an indirect jump, e.g. when
multiple computed goto tables are used, we conservatively assume that
the indirect jump can end up at any possible basic block referenced by
relocations.

In strict mode, simple functions may include the aforementioned
instructions with unknown control flow with a conservative list of
destinations added to the containing basic block. This allows us to
expand coverage of simple functions and to enable code reordering
optimizations for more functions.

The strict mode is recommended when BOLT is used with a well-formed
code generated by a compiler.

To use the strict mode, add "-strict" on the command line.

Another effect of this diff, is that with relocations, we will always
replace the immediate operand of an instruction with a symbol if the
relocation exists against this operand.

Also this diff fixes issues with Clang compiled with -fpic.

(cherry picked from FBD15872849)
2019-06-28 09:21:27 -07:00
Maksim Panchenko
06e7a1e059 [BOLT] Ignore false function references
Summary:
A relocation can have an addend that makes it look as the relocated
value is in a different section from the symbol being relocated.
E.g., a relocation against a variable in .rodata could have a negative
offset that will make it look like it is against a symbol in .text
(a section that typically precedes .rodata).

Unless the relocation is against a section symbol, we know
exactly the symbol that is being relocated and there is no issue.
However, when the linker leaves only a section relocation (i.e. a
relocation against a section symbol when a temporary original symbol
gets deleted), we have to guess the relocated symbol, and can falsely
detect a function reference in the case described above.

The fix is to keep a section relocation if the corresponding
relocated value falls into a different section, and to detect and
ignore false function reference.

(cherry picked from FBD16030791)
2019-06-27 03:20:17 -07:00
laith sakka
1ec091e6f5 Parallelize ICF Pass
Summary:
ICF consumes 10-15% of bolt runtime, for HHVM that is around 45 seconds.
this diff perform some parallelization for the pass to make it faster.
A 60% reduction in the ICF runtime  is measured on the parallel version for HHVM.

(cherry picked from FBD15589515)
2019-05-31 16:45:31 -07:00
Maksim Panchenko
9894de0094 [BOLT] Check instruction boundaries while populating jump tables
Summary:
Now that we populate jump tables after all functions are disassembled,
we can check for instruction boundaries corresponding to jump table
entries. No need to delegate this task to postProcessJumpTables().

(cherry picked from FBD15814762)
2019-06-13 15:31:30 -07:00
Maksim Panchenko
9e2ad3f593 [BOLT] Delay populating jump tables
Summary:
During the initial disassembly pass, only identify jump tables
without populating the contents. Later, after all functions have been
disassembled, we have a better idea of jump table boundaries and can do
a better job of populating their entries.

As a result, we no longer have embedded jump tables (i.e. a jump table
that is parter of another jump table). If we ever need to keep
sequential jump tables inseparable during the output, we can always
add such functionality later.

Fixes facebookincubator/BOLT#56.

(cherry picked from FBD15800427)
2019-06-12 18:21:02 -07:00
Maksim Panchenko
fac6a89c23 [BOLT] Better handling of address references
Summary:
We used to handle PC-relative address references differently from direct
address references. As a result, some cases, such as escaped function
label address, were not handled when dealing with absolute (non-PIC)
code. This diff moves processing of an address reference into
BinaryContext::handleAddressRef() which is called for both PIC and
non-PIC code.

(cherry picked from FBD15643535)
2019-06-04 15:30:22 -07:00
Maksim Panchenko
be344c8de7 [BOLT] Refactor handling of interproc refs
Summary:
Move handling of interprocedural references to BinaryContext.

Post-process indirect branches immediately after the CFG is built.

This is almost NFC. Since indirect branches are now post-processed
before the profile data is processed it interferes with the way the
profile data in YAML format is handled.

(cherry picked from FBD15456003)
2019-05-22 11:26:58 -07:00
Maksim Panchenko
fee61231ef [BOLT] Move JumpTable management to BinaryContext
Summary:
Make BinaryContext responsible for creation and management of
JumpTables. This will be used for detection and resolution of jump table
conflicts across functions.

(cherry picked from FBD15196017)
2019-05-02 17:42:06 -07:00
Rafael Auler
4e4d39c21c [BOLT] Update symbols for secondary entry points
Summary:
Update the output ELF symbol table for symbols representing
secondary entry points for functions. Previously, those were left
unchanged in the symtab.

(cherry picked from FBD15010517)
2019-04-18 16:32:22 -07:00
Brian Gesiak
eba1a67730 Fix casting issues on macOS
Summary:
`size_t` is platform-dependent, and on macOS it is defined as
`unsigned long long`. This is not the same type as is used in many calls
to templated functions that expect the same type. As a result, on macOS,
calls to `std::max` fail because a template function that takes
`uint64_t, unsigned long long` cannot be found.

To work around the issue:

* Specify explicit `std::max` and `std::min` functions where necessary,
  to work around the compiler trying (and failing) to find a suitable
  instantiation.
* For lambda return types, specify an explicit return type where necessary.
* For `operator ==()` calls, use an explicit cast where necessary.

(cherry picked from FBD15030283)
2019-04-22 11:27:50 -04:00
Maksim Panchenko
99ef4c90c1 [BOLT] Basic support for split functions
Summary:
This adds very basic and limited support for split functions.
In non-relocation mode, split functions are ignored, while their debug
info is properly updated. No support in the relocation mode yet.

Split functions consist of a main body and one or more fragments.
For fragments, the main part is called their parent. Any fragment
could only be entered via its parent or another fragment.

The short-term goal is to correctly update debug information for split
functions, while the long-term goal is to have a complete support
including full optimization. Note that if we don't detect split
bodies, we would have to add multiple entry points via tail calls,
which we would rather avoid.

Parent functions and fragments are represented by a `BinaryFunction`
and are marked accordingly. For now they are marked as non-simple, and
thus only supported in non-relocation mode. Once we start building a
CFG, it should be a common graph (i.e. the one that includes all
fragments) in the parent function.

The function discovery is unchanged, except for the detection of
`\.cold\.` pattern in the function name, which automatically marks the
function as a fragment of another function.

Because of the local function name ambiguity, we cannot rely on the
function name to establish child fragment and parent relationship.
Instead we rely on disassembly processing.

`BinaryContext::getBinaryFunctionContainingAddress()` now returns a
parent function if an address from its fragment is passed.

There's no jump table support at the moment. Jump tables can have
source and destinations in both fragment and parent.

Parent functions that enter their fragments via C++ exception handling
mechanism are not yet supported.

(cherry picked from FBD14970569)
2019-04-16 10:24:34 -07:00
Maksim Panchenko
a8e05d067d [BOLT] Add interface to extract values from static addresses
(cherry picked from FBD14858028)
2019-04-09 12:29:40 -07:00
Maksim Panchenko
7d89b113d8 [BOLT][NFC] Indentation fix
(cherry picked from FBD14856700)
2019-04-09 11:31:45 -07:00