Commit Graph

711 Commits

Author SHA1 Message Date
Rafael Auler
f91d121eee [BOLT] Add option to tag version
Summary:
Add a dummy option in BOLT to allow us to write any string in
the bolt info section. This is accomplished since we record the complete
argv vector. This string used to tag this binary with any ID that can
later be associated with a specific BOLT invocation.

(cherry picked from FBD21441902)
2020-05-06 17:31:25 -07:00
Maksim Panchenko
689447bf10 [BOLT] Change .debug_line emission for non-simple functions
Summary:
We use a special routine to emit line info for functions that we do not
overwrite. The resulting DWARF was not quite efficient as we were
advancing addresses using a larger than needed opcodes. Since there were
only a few functions that we didn't emit/overwrite, it was not a big
issue.

However, in lite mode the majority of functions are not overwritten and
as a result, the inefficiency in debug line encoding got exposed and
binaries were getting larger than expected .debug_line sections.

Fix it by using more conventional line table opcodes for address
advancing.

(cherry picked from FBD21423074)
2020-05-05 23:56:50 -07:00
Maksim Panchenko
96c4168ddc [BOLT] Ignore kernel interrupts by default
(cherry picked from FBD21431563)
2020-05-06 11:52:16 -07:00
Xun Li
7b61bdf8ea Check runtime lib format within archiver
Summary: We only support linking ELF runtime library right now. If the library is an archiver, we check that each individual library inside the archiver is an ELF library.

(cherry picked from FBD21388672)
2020-05-04 13:57:21 -07:00
Maksim Panchenko
924d0bdb08 [BOLT] Introduce lite processing mode without relocations
Summary:
When optimizing a binary without relocations, we can skip processing
functions without profile (cold functions). By skipping processing of
cold functions, we reduce the processing time and memory. We call
such mode a lite mode, and it is enabled by default.

Some processing is still done for functions without profile even in lite
mode. scanExternalRefs() function is used to detect secondary entry
points to functions that are not marked in the symbol table.

Note that the no-relocation requirement is a temporary limitation
of the lite mode.

(cherry picked from FBD21366567)
2020-05-03 15:49:58 -07:00
Maksim Panchenko
04c5d4fcab [BOLT] Introduce isIgnored() function attribute
Summary:
Whenever a function is not meant for processing, e.g. when the user
requests to optimize only a subset of functions, mark the function as
ignored. Use this attribute instead of opts::shouldProcess().

(cherry picked from FBD21374806)
2020-05-03 13:54:45 -07:00
Maksim Panchenko
4e69764c65 [BOLT] Fix dyno stats after ICF in non-reloc mode
Summary:
The commit that fixed ICF determinism in non-relocation mode disabled
profile merging for functions. Dyno stats output needs to be updated to
reflect the lack of merge.

(cherry picked from FBD21366046)
2020-05-01 17:51:43 -07:00
Maksim Panchenko
b62a1774af [BOLT] Cover PIC jump table reference in non-strict mode
Summary:
In non-strict relocation mode it was possible to miss a jump table
reference leading to incorrect code.

(cherry picked from FBD21251467)
2020-04-26 17:51:07 -07:00
Maksim Panchenko
ac36e17a73 [BOLT][BFC] Refactor code for adding secondary function entries
Summary:
In non-relocation mode, the code for marking a function non-simple was
decoupled from the code that added new entry points.  Fix that.

(cherry picked from FBD21264247)
2020-04-27 13:40:53 -07:00
Maksim Panchenko
5296b6d12a [BOLT] Change symbol handling for secondary function entries
Summary:
Some functions could be called at an address inside their function body.
Typically, these functions are written in assembly as C/C++ does not
have a multi-entry function concept. The addresses inside a function
body that could be referenced from outside are called secondary entry
points.

In BOLT we support processing functions with secondary/multiple entry
points. We used to mark basic blocks representing those entry points
with a special flag. There was only one problem - each basic block has
exactly one MCSymbol associated with it, and for the most efficient
processing we prefer that symbol to be local/temporary. However, in
certain scenarios, e.g. when running in non-relocation mode, we need
the entry symbol to be global/non-temporary.

We could create global symbols for secondary points ahead of time when
the entry point is marked in the symbol table. But not all such entries
are properly marked. This means that potentially we could discover an
entry point only after disassembling the code that references it, and
it could happen after a local label was already created at the same
location together with all its references. Replacing the local symbol
and updating the references turned out to be an error-prone process.

This diff takes a different approach. All basic blocks are created with
permanently local symbols. Whenever there's a need to add a secondary
entry point, we create an extra global symbol or use an existing one
at that location. Containing BinaryFunction maps a local symbol of a
basic block to the global symbol representing a secondary entry point.
This way we can tell if the basic block is a secondary entry point,
and we emit both symbols for all secondary entry points. Since secondary
entry points are quite rare, the overhead of this approach is minimal.

Note that the same location could be referenced via local symbol from
inside a function and via global entry point symbol from outside.
This is true for both primary and secondary entry points.

(cherry picked from FBD21150193)
2020-04-19 22:29:54 -07:00
Maksim Panchenko
ac1af09e82 [BOLT][NFC] Change wording while reporting functions stats
Summary:

(cherry picked from FBD21242167)
2020-04-24 16:36:22 -07:00
Maksim Panchenko
fbca177a83 [BOLT] Speedup PLT processing
Summary:
With larger PLT sizes, linear PLT symbol name lookup becomes a
bottleneck.

(cherry picked from FBD21223695)
2020-04-23 21:29:10 -07:00
Maksim Panchenko
0ea98d1f0b [BOLT] Option to fail if invalid profile detected
Summary:
Add an option to fail processing of the input binary if the profile
is not accurate:

  -stale-threshold=<uint>
    - maximum percentage of stale functions to tolerate (default: 100)

Default (100) means never to fail.

A function profile is considered stale if any branch in its profile
has invalid source or destination.

Use `-stale-threshold=0` to fail if any staleness is detected in the
profile.

(cherry picked from FBD21189036)
2020-04-22 15:09:49 -07:00
Maksim Panchenko
33e0b2aa58 [BOLT] Do not emit old .eh_frame in relocation mode
Summary:
In relocation mode, there is no use for old .eh_frame section. Moreover,
it can interfere with new EH frames via .eh_frame_hdr when the original
.text is reused.

(cherry picked from FBD21120070)
2020-04-19 12:55:43 -07:00
Maksim Panchenko
23edb3ed9c [BOLT] Option to control .text alignment
Summary:
Add option `-align-text=<n>` to control .text alignment within a
segment. Set to page size by default.

(cherry picked from FBD21120063)
2020-04-19 15:02:50 -07:00
Maksim Panchenko
10245b5c5b [BOLT] Emit ICF symbols for large functions
Summary:
In non-relocation mode, make sure we emit extra symbols for a folded
function even if the function was not overwritten due to its large
size.

(cherry picked from FBD21080467)
2020-04-16 00:05:01 -07:00
Maksim Panchenko
606532bdf1 [BOLT] Fix .eh_frame update with ICF in non-relocation mode
Summary:
In a rare case, we may fold a function and fail to emit it in
non-relocation mode due to a function size increase. At the same time,
the function that the original function was folded into could have been
successfully emitted, e.g. because it was split in the presence of a
profile information.

Later, because the function was not emitted, we have to use its original
.eh_frame entry in the preserved .eh_frame section. However, that entry
is no longer referencing the original function, but the function that
the original was folded into. This happens since the original symbol gets
emitted at the other function location. As a result, .eh_frame entry for
the folded function is missing.

To prevent incorrect update of the original .eh_frame, create
relocations against absolute values. This guarantees preservation of the
section contents while updating pc-relative references.

(cherry picked from FBD21061130)
2020-04-16 00:02:35 -07:00
Maksim Panchenko
1be7a82540 [BOLT] Speedup RTDyld external symbol resolution
Summary:
RuntimeDyldImpl::resolveExternalSymbols() some time ago used to call
getSymbolAddress() while in the second loop. That call could have
modified the contents of ExternalSymbolRelocations that the loop was
iterating over. Thus the code was written in a way that erased the
processed entry on every loop iteration and reset the map
iterator. With large number of entries in ExternalSymbolRelocations the
loop code becomes a performance bottleneck.

Since getSymbolAddress() is no longer used, the
ExternalSymbolRelocations could be iterated in a straightforward way and
the map cleared before the function exit.

(cherry picked from FBD21057058)
2019-11-11 13:29:46 -08:00
Rafael Auler
6dbd15bc01 [BOLT-X86] Fix instrumentation issue with indirect calls
Summary:
Indirect calls that use RSP to compute the target address would
break in instrumentation mode because we were adding instructions that
changed the stack pointer. Fix this.

(cherry picked from FBD20883791)
2020-04-06 17:38:11 -07:00
Maksim Panchenko
401fa5b493 [BOLT] Further speedup ICF
Summary:
Further speedup ICF by applying stricter rules for congruent functions.
While checking symbolic operands in congruent functions, consider
operands congruent only if they are equal or reference functions
with identical hashes, i.e. potentially foldable functions.
Note that jump table operands are handled as a special case.

(cherry picked from FBD20912054)
2020-04-07 22:10:12 -07:00
Maksim Panchenko
ee0371ad97 [BOLT] Speedup ICF by better function hashing
Summary:
Too many hash collisions may cause ICF to run slowly.

We used to hash BinaryFunction only looking at instruction opcodes,
ignoring instruction operands. With many almost identical functions,
such approach may lead to long ICF processing time. By including
operands into the hash, we reduce the number of collisions and
improve the runtime often by a factor of 2 or more.

(cherry picked from FBD20888957)
2020-04-07 00:21:37 -07:00
Maksim Panchenko
abda7dc6a7 [BOLT] Fix ICF non-determinism in non-relocation mode
Summary:
ICF may fold functions in arbitrary order when running multi-threaded.
This is fine in relocation mode as we end up with just one function
holding all function symbols.

However, in non-relocation mode we keep all function bodies, and if we
keep merging profiles in non-deterministic order, we end up with
functions with non deterministic profiles. The fix for non-relocation
mode is to not merge profiles as the factual new profile could be
different from the merged one since both function instances are
potentially callable.

Additionally, emit extra symbols for ICF functions in non-relocation
mode to make it possible to track the folding.

(cherry picked from FBD20889866)
2020-04-04 20:12:38 -07:00
Maksim Panchenko
b08d82d91b [BOLT] Verify exceptions action table equivalence in ICF
Summary:
Some functions may have exactly the same code and exception handlers.
However, their action tables could be different leading to mismatching
semantics. We should verify their equivalence while running ICF.

(cherry picked from FBD20889035)
2020-03-30 19:08:24 -07:00
Maksim Panchenko
58b0d9e7b0 [BOLT][DWARF] Add support for base address in DWARF location lists
Summary:
The version of LLVM that we are based on lacks the support for base
address in DWARF location lists. Add the missing pieces.

(cherry picked from FBD20640784)
2020-03-24 22:05:37 -07:00
Maksim Panchenko
bbbf679b42 [BOLT] Refactor ELF symbol table rewriting code
Summary:
Make ELF symbol table rewriting code more structured. While at it,
remove symbols from non-allocatable sections.

(cherry picked from FBD20243386)
2020-02-26 20:43:18 -08:00
Maksim Panchenko
a07f1a26e7 [BOLT] Refactor section prefixes
(cherry picked from FBD20400886)
2020-03-11 15:51:32 -07:00
Maksim Panchenko
1f3e351a9c [BOLT] Refactor code and data emission code
Summary:
Consolidate code and data emission code in ELF-independent
BinaryEmitter. The high-level interface includes only two
functions emitBinaryContext() and emitFunctionBody() used
by RewriteInstance and BinaryContext respectively.

(cherry picked from FBD20332901)
2020-03-06 15:06:37 -08:00
Maksim Panchenko
74a2777c54 [BOLT] Refactor ELF parts of instrumentation code
Summary:
This is a prerequisite for larger emitter refactoring.

Since .dynamic is read unconditionally, add an error message if the
section is missing, or the size of the section is zero.

(cherry picked from FBD20331735)
2020-03-08 19:04:39 -07:00
Maksim Panchenko
af553124d3 [BOLT] Refactor emission of original .eh_frame
Summary:
There is no need to treat the emission of the original `.eh_frame`
section as a special case.

(cherry picked from FBD20323360)
2020-03-07 11:19:09 -08:00
Alexander Shaposhnikov
e3654fc274 [BOLT] Uniquify names of local symbols
Summary:
1. Uniquify names of local symbols.
2. Handle aliases.

(cherry picked from FBD20270196)
2020-03-04 18:36:44 -08:00
Alexander Shaposhnikov
842a25f785 [BOLT] Mark functions containing data as non-simple
Summary: Temporarily mark functions containing data as non-simple.

(cherry picked from FBD20213279)
2020-03-02 22:41:12 -08:00
Maksim Panchenko
cb9c991dcb [BOLT] Remove allow-section-relocations option
Summary: The option is not used. Remove all related code.

(cherry picked from FBD20237859)
2020-03-03 15:51:24 -08:00
Maksim Panchenko
c7e012e145 [BOLT][NFC] Get rid of BestFit parameter
Summary: The parameter is no longer used.

(cherry picked from FBD20236516)
2020-03-03 14:28:42 -08:00
Alexander Shaposhnikov
b0cbb60165 [BOLT] Fix begin decrementing
Summary: Fix begin decrementing.

(cherry picked from FBD20232474)
2020-03-03 13:36:32 -08:00
Maksim Panchenko
d89bb53afa [BOLT][NFC] Factor out relocation processing
(cherry picked from FBD20087297)
2020-02-24 17:10:02 -08:00
Rafael Auler
340da8f294 [BOLT] Fix shrink wrapping to check pops
Summary:
Shrink wrapping has a mode where it will directly move push
pop pairs, instead of replacing them with stores/loads. This is an
ambitious mode that is triggered sometimes, but whenever matching with
a push, it would operate with the assumption that the restoring
instruction was a pop, not a load, otherwise it would assert. Fix this
assertion to bail nicely back to non-pushpop mode (use regular store and
load instructions).

(cherry picked from FBD20085905)
2020-02-18 16:00:40 -08:00
Maksim Panchenko
2df4e7b99e [BOLT][NFC] Minor refactoring of RewriteInstance
(cherry picked from FBD20087424)
2020-02-24 17:12:41 -08:00
Maksim Panchenko
495761dc70 [BOLT][NFC] Remove unused BinarySection member functions
(cherry picked from FBD20087243)
2020-02-24 16:56:45 -08:00
Maksim Panchenko
3b45212e84 [BOLT] Delete ExecutableFileMemoryManager::registerNoteSection()
Summary: The interface is no longer in use.

(cherry picked from FBD20070558)
2020-02-24 09:40:32 -08:00
Alexander Shaposhnikov
01b7c90242 [BOLT] Add missing override
Summary: Add missing override in X86MCPlusBuilder.cpp.

(cherry picked from FBD20064222)
2020-02-23 22:27:28 -08:00
Maksim Panchenko
be43f89c4f [BOLT][llvm] Update llvm.patch
Summary:

(cherry picked from FBD20063562)
2020-02-23 19:51:33 -08:00
Alexander Shaposhnikov
76aa1c26aa [BOLT] Enable reversing the order of basic blocks
Summary: Enable reversing the order of basic blocks.

(cherry picked from FBD19943692)
2020-02-17 13:35:09 -08:00
Alexander Shaposhnikov
4ad5048393 [BOLT] Add first bits to build CFG
Summary: Add first bits to build CFG.

(cherry picked from FBD19943472)
2020-02-17 12:18:42 -08:00
Alexander Shaposhnikov
5b64bf2128 [BOLT] Disassemble functions from a MachO binary
Summary: Add first bits to disassemble functions from a MachO binary.

(cherry picked from FBD19900493)
2020-02-11 14:30:33 -08:00
Rafael Auler
a9d85413ac [BOLT] Emit long nops by default
Summary:
Change our X86 target to use long nops by default. In general,
BOLT does not put nops into the instruction stream that is going to be
executed, since it doesn't align basic blocks, only functions. Since we
rebased BOLT, our relationship with MCAssembler changed because it
stopped using multibyte nops and we never needed to revisit that. But it
makes a difference if we want to mitigate perf issues with the Intel
JCC erratum, since the nops inserted are going to be decoded and
executed. To make MCAssembler emit long nops again, we need to explictly
set mattr (Features) of the X86 target.

(cherry picked from FBD19987277)
2020-02-19 16:13:58 -08:00
Maksim Panchenko
9711286858 [BOLT] Get rid of BinarySection::IsLocal
Summary: The flag is no longer used/needed.

(cherry picked from FBD19951571)
2020-02-18 09:20:17 -08:00
Alexander Shaposhnikov
16630f5c58 [BOLT] Factor out NameResolver from RewriteInstance
Summary: Factor out the helper class NameResolver from the class RewriteInstance.

(cherry picked from FBD19943916)
2020-02-17 14:37:46 -08:00
Alexander Shaposhnikov
754b6569f6 [BOLT] Add missing std::move
Summary: Add missing std::move in the method BinaryFunction::addAlternativeName

(cherry picked from FBD19944661)
2020-02-17 17:53:12 -08:00
Alexander Shaposhnikov
36cf37c4c1 [BOLT] Add initial bits for parsing MachO files
Summary: Start adding initial bits for MachO, this diff contains some small preparations for finding functions inside a MachO binary, this will be done in the next diff. The concept of a section in the MachO world is quite different from ELF, nevertheless, for functions for now it more or less fits into the current picture (in BOLT), but things will diverge more significantly a bit later.

(cherry picked from FBD19648161)
2020-01-30 13:10:48 -08:00
Rafael Auler
58a129a602 [BOLT] Move peepholes pass after sctc
Summary:
There are two peephole subpasses, remove-double-jumps and
remove-useless-conditional-branches, that operates by reading branches
directly, which makes them tricky to run before fix-branches. In the
case of remove-double-jumps, it will even lead to suboptimal code if
the patched branch was going to be removed by fix-branches when the
target is the fall-through. If the final target is a tail call, it will
lead to a broken CFG in the worst case. Fix this by moving these passes
after SCTC, which already produces CFGs with conditional tail calls.

(cherry picked from FBD18795592)
2019-12-03 12:28:22 -08:00