122 Commits

Author SHA1 Message Date
Amir Ayupov
12e9fec697 Rebase: [BOLT] DebugFission Support
Summary:
Implemented support for Debug Fission.
For the most part it doesn't impact Monolithic execution path.
One area that changed is the DW_AT_low_pc/DW_AT_high_pc conversion. Previously the pair was converted to DW_AT_ranges/DW_AT_low_pc; now DW_AT_low_pc is kept in the same place.
Another, more visible impact is in the Skeleton CU: DW_AT_low_pc is replaced with DW_AT_ranges_base if the latter is not originally present and BOLT converted ranges inside the .dwo units.

The output is multiple .dwo files with updated debug information.

(cherry picked from FBD29569788)
2021-04-01 11:43:00 -07:00
Maksim Panchenko
ba6fdb8113 [BOLT] Preserve original jump table relocations
Summary:
Remove relocations against internal function labels, e.g. jump table
relocations, only when overwriting them.

While reading an input file with relocations, we create internal
relocations against code references (we skip PIC relocations).
Later, when we discover jump tables, we remove corresponding relocations
with the assumption that original relocations will either be ignored or
replaced by new relocations. However, it is possible to miss some
references to the jump table, in which case the original entries will
not be ignored. While such a situation is abnormal, it is still a
better/safer approach to preserve relocations if we are not replacing
them with new ones.

(cherry picked from FBD28406628)
2021-05-12 23:35:10 -07:00
Maksim Panchenko
fe37f1870e [BOLT][NFC] Follow LLVM variable initialization style
(cherry picked from FBD28417604)
2021-05-13 10:50:47 -07:00
Alexey Moksyakov
ce84e9607a [PR] Fix bb reordering optimization
Summary:
The reorder-blocks optimization pass doesn't take into account that
the available offset for legacy Jcc instructions (for example,
JRCXZ, which takes an 8-bit operand) has to fit in a signed 8-bit
displacement (-128 to 127 bytes). This is a rare case, so an extra
check was added to exclude functions with such unsupported
instructions from the optimization passes.
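
A rough illustration of the constraint, not BOLT's actual check: JRCXZ
encodes its target as a signed 8-bit displacement measured from the end
of the instruction, so a block layout is only valid if that displacement
stays within [-128, 127]. The function name and the default 2-byte
instruction length are assumptions made for this sketch.

```python
def fits_rel8(branch_addr: int, target_addr: int, insn_len: int = 2) -> bool:
    """Return True if a short branch at branch_addr can reach target_addr.

    The displacement is measured from the end of the branch instruction
    and must fit in a signed 8-bit field.
    """
    displacement = target_addr - (branch_addr + insn_len)
    return -128 <= displacement <= 127


# A target 100 bytes ahead is reachable; one 300 bytes ahead is not.
print(fits_rel8(0x1000, 0x1000 + 100))
print(fits_rel8(0x1000, 0x1000 + 300))
```

A reordering pass could run a check like this on every block that ends
in such a branch and skip the function if any target falls out of range.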

Alexey Moksyakov
Advanced Software Technology Lab, Huawei

(cherry picked from FBD28264117)
2021-04-23 11:34:40 +03:00
Amir Ayupov
eb99a6665c Rebase: [BOLT][NFC] Remove unneeded includes with include-what-you-use
Summary:
Ran iwyu multiple times, manually picked header remove lines.
Reached fixed point wrt removal: iwyu doesn't automatically remove
any more headers or forward declarations.

(cherry picked from FBD29569221)
2021-04-30 13:54:02 -07:00
Amir Ayupov
c7306cc219 Rebase: [BOLT][NFC] Expand auto types
Summary:
Expanded auto types across BOLT semi-automatically with the aid
of clangd LSP

(cherry picked from FBD33289309)
2021-04-08 00:19:26 -07:00
Maksim Panchenko
e7169be93f [BOLT] Do not assert on jump table heuristic failure
Summary:
During the initial indirect jump analysis, we used to assert that the
discovered jump table type matched the pattern of the corresponding
instruction sequence. E.g., for PIC jump table memory we expected the
PIC jump table instruction sequence. The assertions were too
conservative, as in the case of a mismatch we can mark the indirect jump
as having an unknown control flow. That should be sufficient to either
skip the function processing or rely on relocation information for
possible recovery of the control flow.

(cherry picked from FBD27255816)
2021-03-23 13:41:41 -07:00
Rafael Auler
b3c34d568a [BOLT] Fix instrumentation bug in duplicated JTs
Summary:
Fix a bug with instrumentation when trying to instrument
functions that share a jump table with multiple indirect
jumps. Usually, each indirect jump that uses a JT will have its own
copy of it. When this does not happen, we need to duplicate the jump
table safely, so we can split the edges correctly (each copy of the
jump table may have different split edges). For this to happen, we
need to correctly match the sequence of instructions that perform the
indirect jump to identify the base address of the jump table and patch
it to point to the new cloned JT. A case was reported to us in
which the compiler generated suboptimal code for an indirect jump
that our matcher failed to identify.

Fixes facebookincubator/BOLT#126

(cherry picked from FBD27065579)
2021-03-15 16:34:25 -07:00
Rafael Auler
16521f1f79 [BOLT] Update license headers
Summary: Update license and fix headers for some files.

(cherry picked from FBD28112041)
2021-03-15 18:04:18 -07:00
Amir Ayupov
1c5d3a056c Rebase: Merge BOLT codebase in monorepo
Summary:
This commit is the first step in rebasing all of BOLT
history in the LLVM monorepo. It also solves trivial build issues
by updating BOLT codebase to use current LLVM. There is still work
left in rebasing some BOLT features and in making sure everything
is working as intended.

History has been rewritten to put BOLT in the /bolt folder, as
opposed to /tools/llvm-bolt.

(cherry picked from FBD33289252)
2020-12-01 16:29:39 -08:00
Rafael Auler
e3898d5969 [BOLT] Add threshold options for lite mode
Summary:
Add options for trading processing speed for binary performance.

  -lite-threshold-pct=<uint>
    Threshold (in percent) for selecting functions to process in lite
    mode. Higher threshold means fewer functions to process.
    E.g., a threshold of 90 means only the top 10 percent of functions
    with a profile will be processed.

  -lite-threshold-count=<uint>
    Similar to '-lite-threshold-pct' but specifies the threshold using
    an absolute function call count. I.e., limit processing to functions
    executed at least the specified number of times.

  -no-scan
    Do not scan cold functions for external references (may result in
    slower binary).
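
The two thresholds can be pictured with a small sketch. This is only an
illustration of the semantics described above, not BOLT's selection
code: the function names, the dictionary of execution counts, and the
rounding behavior are all assumptions.

```python
def select_by_pct(exec_counts: dict[str, int], threshold_pct: int) -> list[str]:
    """Keep the hottest (100 - threshold_pct) percent of profiled functions."""
    profiled = sorted(
        (name for name, count in exec_counts.items() if count > 0),
        key=lambda name: exec_counts[name],
        reverse=True,
    )
    keep = len(profiled) * (100 - threshold_pct) // 100
    return profiled[:keep]


def select_by_count(exec_counts: dict[str, int], threshold_count: int) -> list[str]:
    """Keep functions executed at least threshold_count times."""
    return [name for name, count in exec_counts.items() if count >= threshold_count]


# f0 has no profile; f1..f10 have execution counts 1..10.
counts = {f"f{i}": i for i in range(11)}
print(select_by_pct(counts, 90))     # the hottest 10 percent of 10 functions
print(select_by_count(counts, 8))    # functions executed at least 8 times
```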

(cherry picked from FBD24739092)
2020-12-30 12:23:58 -08:00
Amir Ayupov
157129b751 [BOLT] Debug logging in analyzeJumpTable
Summary:
Added debug logging in/around `analyzeJumpTable`:
- Dump jump table entries as they are being processed:
```
BOLT-DEBUG: analyzeJumpTable in read_encoded_value_with_base/2(*2)
  * Checking 0x428ff40 -> OK: real entry
  * Checking 0x428ff44 -> OK: real entry
  * Checking 0x428ff48 -> OK: real entry
  * Checking 0x428ff4c -> OK: real entry
  * Checking 0x428ff50 -> OK: real entry
  * Checking 0x428ff54 -> OK: address in split fragment
  * Checking 0x428ff58 -> OK: address in split fragment
  * Checking 0x428ff5c -> OK: address in split fragment
  * Checking 0x428ff60 -> OK: address in split fragment
  * Checking 0x428ff64 -> OK: real entry
  * Checking 0x428ff68 -> OK: real entry
  * Checking 0x428ff6c -> OK: real entry
  * Checking 0x428ff70 -> OK: real entry
BOLT-DEBUG: analyzeJumpTable in classify_object_over_fdes/1(*2)
  * Checking 0x428ff74 -> OK: real entry
  ...
```
- Dump skipped functions:
```
Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) family
Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2)
Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2.cold.3/1(*2)
Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode family
Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode
Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.cold.4/1(*2)
```
- Dump values of unclaimed PC-relative relocations in data.

(cherry picked from FBD24898172)
2020-11-12 11:54:38 -08:00
Amir Ayupov
c36b71686c Improve cold fragment name matching
Summary:
Fix cold fragment name matching by replacing the existing
regexes `.*\.cold\..*` and `.*\.cold` with a single combined regex,
`.*\.cold(\.\d)?`, applied to the restored name (with BOLT-added
suffixes stripped).

This allows matching names like "execute_stack_op.cold/1", which
previously weren't recognized.
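
A minimal sketch of the matching, assuming the pattern must match the
whole restored name and that a BOLT-added suffix is everything from the
first `/` onward. The helper `restore_name` is a hypothetical stand-in,
not BOLT's implementation.

```python
import re

# Combined pattern from the summary.
COLD_FRAGMENT_RE = re.compile(r".*\.cold(\.\d)?")


def restore_name(name: str) -> str:
    """Hypothetical stand-in: drop a BOLT-added '/N' suffix, if any."""
    return name.split("/", 1)[0]


def is_cold_fragment(name: str) -> bool:
    # fullmatch anchors the pattern at both ends (an assumption here).
    return COLD_FRAGMENT_RE.fullmatch(restore_name(name)) is not None


for name in ["execute_stack_op.cold/1", "apply.part.2.cold.3/1", "execute_stack_op/1"]:
    print(name, is_cold_fragment(name))
```

Stripping the suffix first is what lets a name like
"execute_stack_op.cold/1" match, since the pattern itself has no
provision for the trailing "/1".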

(cherry picked from FBD24804880)
2020-11-09 12:38:51 -08:00
Maksim Panchenko
f15532c2aa [BOLT][DWARF] Streamline processing of DWARF unit DIEs
Summary:
Do not store processed DWARF DIEs, but instead process them while
reading one at a time.

Reduces memory consumption when updating debug info by 10%-25%.

(cherry picked from FBD24327029)
2020-10-16 00:11:24 -07:00
Maksim Panchenko
53bd88c7fe [BOLT] Refactor reading of debug line info
Summary:
Match BinaryFunction to a DWARFUnit based on the unit's address ranges
skipping the parsing of DIEs.

(cherry picked from FBD24269325)
2020-10-12 21:04:42 -07:00
Maksim Panchenko
0465d952cc [BOLT] Refactor PatchEntries pass
Summary:
Use injected functions with fixed addresses to patch original function
entries.

(cherry picked from FBD24133890)
2020-10-09 16:06:27 -07:00
Amir Ayupov
d1ec11b28f postProcessEntryPoints: return after setIgnored and setSimple are set
Summary:
This patch fixes the assertion failure during instrumentation.

The assertion is raised by `getInstructionAtOffset`, which expects `CurrentState` to be either `Disassembled` or `CFG`.

The function is called from `postProcessEntryPoints`, which goes over Labels and performs a series of checks. The checks call BinaryFunction methods `setSimple(false)` or `setIgnored()`.
However, if `setIgnored` is invoked, it resets the state to `Empty`. Thus, a subsequent call to `getInstructionAtOffset` will fail.
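
The failure mode and the fix can be modeled with a toy state machine.
This is a sketch under assumptions: the class, the `labels` mapping, and
the validity check are made up for illustration and only mirror the
state transitions described above.

```python
from enum import Enum, auto


class State(Enum):
    EMPTY = auto()
    DISASSEMBLED = auto()
    CFG = auto()


class ToyBinaryFunction:
    """Toy stand-in for BinaryFunction; only the pieces needed here."""

    def __init__(self, labels):
        self.state = State.DISASSEMBLED
        self.labels = labels          # offset -> True if the entry is valid
        self.ignored = False

    def set_ignored(self):
        self.ignored = True
        self.state = State.EMPTY      # the state is reset

    def get_instruction_at_offset(self, offset):
        assert self.state in (State.DISASSEMBLED, State.CFG)
        return f"insn@{offset:#x}"

    def post_process_entry_points(self):
        for offset, valid in self.labels.items():
            self.get_instruction_at_offset(offset)
            if not valid:
                self.set_ignored()
                return                # stop before the next lookup


func = ToyBinaryFunction({0x10: True, 0x20: False, 0x30: True})
func.post_process_entry_points()
print(func.ignored, func.state)
```

Without the early `return`, the loop would move on to offset 0x30 and
trip the assertion, because the state is already `EMPTY`.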

(cherry picked from FBD24005197)
2020-09-29 19:37:47 -07:00
Maksim Panchenko
a82cff0f52 [BOLT] Eliminate "shallow" function lookup
Summary:
Whenever we search for a function based on its address in the input
binary, we now always return a corresponding fragment for split
functions. If the user needs an access to the main fragment, they can
call getTopmostFragment().

(cherry picked from FBD23670311)
2020-09-14 15:48:32 -07:00
takh
0033a7612d Linux kernel marker to update special sections
Summary: This diff adds an LK (Linux kernel) marker, similar to an SDT marker, to update special LK sections.

(cherry picked from FBD22932157)
2020-08-04 13:50:00 -07:00
Rafael Auler
6c8fc28892 Revert "[BOLT] Add the FeatureMiner pass to extract Calder's features."
This reverts commit 2476f46af02ccce04e9ed456462dd098460e4e1f.

Reviewed By: maks

(cherry picked from FBD28111787)
2020-07-16 17:35:55 -07:00
Rafael Auler
170f73ac9e [BOLT] Fix fix-branches in presence of JRCXZ and friends
Summary:
Do not fail/assert when trying to reorder blocks that terminate
with JRCXZ/JECXZ/LOOP instructions. We cannot invert the condition of
these instructions, so just treat them accordingly in fixBranches().

(cherry picked from FBD22487107)
2020-07-15 23:02:58 -07:00
Angélica Moreira
181327d763 [BOLT] Add the FeatureMiner pass to extract Calder's features.
(cherry picked from FBD19844247)
2020-07-07 23:01:22 -07:00
Maksim Panchenko
13baf47a3c [BOLT] Add '-force-patch' to forcefully patch old entries
Summary:
The option is useful for debugging.

Also, print personality function when dumping a function.

(cherry picked from FBD22169482)
2020-06-22 13:08:28 -07:00
Maksim Panchenko
0403adde32 [BOLT] Fixes for scanExternalRefs()
Summary:
In my previous commit, I've accidentally reverted the condition while
evaluating a branch target.

Also, do not emit instruction for relocation purposes in
scanExternalRefs() if there was no TargetSymbol set and we have not
produced new relocations.

(cherry picked from FBD22169317)
2020-06-22 12:50:49 -07:00
Maksim Panchenko
8374e8e3fe [BOLT] Properly register symbols at secondary entry points
Summary:
We may end up with a secondary entry symbol set to zero if there was no
symbol in the input file at the entry point address, and if we skipped
the function emission, e.g. if it was ignored. In that case, the symbol
should be properly initialized with a proper address.

(cherry picked from FBD22169167)
2020-06-22 12:37:48 -07:00
Maksim Panchenko
15fffe2824 [BOLT] Fix memory error
Summary: Fix for double-free I've introduced earlier.

(cherry picked from FBD22132595)
2020-06-18 20:59:01 -07:00
Maksim Panchenko
db4642d0a6 [BOLT] Support -hot-text in lite mode
Summary: Update special symbol references in functions that are not emitted.

(cherry picked from FBD22120995)
2020-06-18 11:10:41 -07:00
Maksim Panchenko
e7c3464226 [BOLT] Disable trapping on AVX-512 by default
Summary:

(cherry picked from FBD22118562)
2020-06-18 09:55:05 -07:00
Maksim Panchenko
0ce0bce9e7 [BOLT] Support for lite mode with relocations
Summary:
Add '-lite' support for relocations for improved processing time,
memory consumption, and more resilient processing of binaries with
embedded assembly code.

In lite relocation mode, BOLT will skip full processing of functions
without a profile. It will run scanExternalRefs() on such functions
to discover external references and to create internal relocations
to update references to optimized functions.

Note that we could have relied on the compiler/linker to provide
relocations for function references. However, there's no assurance
that all such references are reported. E.g., the compiler can resolve
inter-procedural references internally, leaving no relocations
for the linker.

The scan process takes less than 10 seconds per 100MB of code on modern
hardware. It's a reasonable overhead to live with considering the
flexibility it provides.

If BOLT fails to scan or disassemble a function, e.g., due to a data
object embedded in code, or an unsupported instruction, it enables a
patching mode to guarantee that the failed function will call
optimized/moved versions of functions. The patching happens at original
function entry points.

'-skip=<func1,func2,...>' option now can be used to skip processing of
arbitrary functions in the relocation mode.

With '-use-old-text' or '-strict' we require all functions to be
processed. As such, it is incompatible with '-lite' option,
and '-skip' option will only disable optimizations of listed
functions, not their disassembly and emission.

(cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
Alexander Shaposhnikov
cd067ae1e8 Emit functions on MachO
Summary: Start emitting functions (for MachO input binaries).

(cherry picked from FBD21721586)
2020-05-26 04:21:04 -07:00
Maksim Panchenko
8729171182 [BOLT] Refactor profile-handling code
Summary:
This diff handles several issues related to profile reading and
handling:
  * Unifies interface used by 3 profile readers in ProfileReaderBase.
  * Adds automatic detection of the profile file contents.
  * Removes reader-specific fields from BinaryFunction and BinaryData.
    All the information is stored in instruction annotations.
  * Removes implicit memory dependencies in annotations on profile
    reader instance.
  * Adds lite mode support to YAML reader.
  * Moves profile reading code out of BinaryFunction.

(cherry picked from FBD21601411)
2020-05-07 23:00:29 -07:00
Maksim Panchenko
cce49b9522 [BOLT] Remove StringRef from IndirectCallProfile
Summary:
IndirectCallProfile was holding on to a StringRef from a profile reader,
creating an implicit dependency on the reader.

(cherry picked from FBD21587101)
2020-05-14 17:34:20 -07:00
Maksim Panchenko
924d0bdb08 [BOLT] Introduce lite processing mode without relocations
Summary:
When optimizing a binary without relocations, we can skip processing
functions without profile (cold functions). By skipping processing of
cold functions, we reduce the processing time and memory. We call
such mode a lite mode, and it is enabled by default.

Some processing is still done for functions without profile even in lite
mode. scanExternalRefs() function is used to detect secondary entry
points to functions that are not marked in the symbol table.

Note that the no-relocation requirement is a temporary limitation
of the lite mode.

(cherry picked from FBD21366567)
2020-05-03 15:49:58 -07:00
Maksim Panchenko
04c5d4fcab [BOLT] Introduce isIgnored() function attribute
Summary:
Whenever a function is not meant for processing, e.g. when the user
requests to optimize only a subset of functions, mark the function as
ignored. Use this attribute instead of opts::shouldProcess().

(cherry picked from FBD21374806)
2020-05-03 13:54:45 -07:00
Maksim Panchenko
ac36e17a73 [BOLT][BFC] Refactor code for adding secondary function entries
Summary:
In non-relocation mode, the code for marking a function non-simple was
decoupled from the code that added new entry points.  Fix that.

(cherry picked from FBD21264247)
2020-04-27 13:40:53 -07:00
Maksim Panchenko
5296b6d12a [BOLT] Change symbol handling for secondary function entries
Summary:
Some functions could be called at an address inside their function body.
Typically, these functions are written in assembly as C/C++ does not
have a multi-entry function concept. The addresses inside a function
body that could be referenced from outside are called secondary entry
points.

In BOLT we support processing functions with secondary/multiple entry
points. We used to mark basic blocks representing those entry points
with a special flag. There was only one problem - each basic block has
exactly one MCSymbol associated with it, and for the most efficient
processing we prefer that symbol to be local/temporary. However, in
certain scenarios, e.g. when running in non-relocation mode, we need
the entry symbol to be global/non-temporary.

We could create global symbols for secondary points ahead of time when
the entry point is marked in the symbol table. But not all such entries
are properly marked. This means that potentially we could discover an
entry point only after disassembling the code that references it, and
it could happen after a local label was already created at the same
location together with all its references. Replacing the local symbol
and updating the references turned out to be an error-prone process.

This diff takes a different approach. All basic blocks are created with
permanently local symbols. Whenever there's a need to add a secondary
entry point, we create an extra global symbol or use an existing one
at that location. Containing BinaryFunction maps a local symbol of a
basic block to the global symbol representing a secondary entry point.
This way we can tell if the basic block is a secondary entry point,
and we emit both symbols for all secondary entry points. Since secondary
entry points are quite rare, the overhead of this approach is minimal.

Note that the same location could be referenced via local symbol from
inside a function and via global entry point symbol from outside.
This is true for both primary and secondary entry points.
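
The mapping described above can be sketched as follows. This is a toy
model, not BOLT's data structures: the class, the method names, and the
`__ENTRY_` naming scheme for newly created global symbols are
assumptions made for illustration.

```python
class ToyFunction:
    """Toy model: blocks keep local labels; entries get an extra global symbol."""

    def __init__(self, name):
        self.name = name
        self.secondary_entries = {}   # local block label -> global entry symbol

    def add_secondary_entry(self, block_label, existing_global=None):
        # Reuse a global symbol already at that location, else create one.
        symbol = existing_global or f"{self.name}__ENTRY_{block_label}"
        return self.secondary_entries.setdefault(block_label, symbol)

    def is_secondary_entry(self, block_label):
        return block_label in self.secondary_entries

    def symbols_to_emit(self, block_label):
        # Every block emits its local label; entry blocks also emit the global one.
        symbols = [block_label]
        if block_label in self.secondary_entries:
            symbols.append(self.secondary_entries[block_label])
        return symbols


func = ToyFunction("memcpy")
func.add_secondary_entry(".LBB2")
print(func.symbols_to_emit(".LBB1"))
print(func.symbols_to_emit(".LBB2"))
```

Because the local label never changes, references created before the
entry point was discovered stay valid; only the side table grows.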

(cherry picked from FBD21150193)
2020-04-19 22:29:54 -07:00
Maksim Panchenko
606532bdf1 [BOLT] Fix .eh_frame update with ICF in non-relocation mode
Summary:
In a rare case, we may fold a function and fail to emit it in
non-relocation mode due to a function size increase. At the same time,
the function that the original function was folded into could have been
successfully emitted, e.g. because it was split in the presence of
profile information.

Later, because the function was not emitted, we have to use its original
.eh_frame entry in the preserved .eh_frame section. However, that entry
is no longer referencing the original function, but the function that
the original was folded into. This happens since the original symbol gets
emitted at the other function location. As a result, .eh_frame entry for
the folded function is missing.

To prevent incorrect update of the original .eh_frame, create
relocations against absolute values. This guarantees preservation of the
section contents while updating pc-relative references.

(cherry picked from FBD21061130)
2020-04-16 00:02:35 -07:00
Maksim Panchenko
ee0371ad97 [BOLT] Speedup ICF by better function hashing
Summary:
Too many hash collisions may cause ICF to run slowly.

We used to hash BinaryFunction only looking at instruction opcodes,
ignoring instruction operands. With many almost identical functions,
such approach may lead to long ICF processing time. By including
operands into the hash, we reduce the number of collisions and
improve the runtime often by a factor of 2 or more.
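
The effect of including operands can be seen in a small sketch. The
instruction lists below are made up, and Python's built-in `hash` stands
in for whatever hash function BOLT uses; the point is only how many
functions land in the same bucket and then need a full comparison.

```python
from collections import defaultdict

# Each function is a list of (opcode, operands) pairs; the data is made up.
functions = {
    "get_a": [("mov", ("rax", "[rdi+0]")), ("ret", ())],
    "get_b": [("mov", ("rax", "[rdi+8]")), ("ret", ())],
    "get_c": [("mov", ("rax", "[rdi+16]")), ("ret", ())],
    "get_a_copy": [("mov", ("rax", "[rdi+0]")), ("ret", ())],
}


def hash_opcodes(body):
    return hash(tuple(opcode for opcode, _ in body))


def hash_with_operands(body):
    return hash(tuple(body))


def buckets(hash_fn):
    """Group function names by hash; each bucket needs pairwise comparison."""
    groups = defaultdict(list)
    for name, body in functions.items():
        groups[hash_fn(body)].append(name)
    return sorted(sorted(group) for group in groups.values())


print(buckets(hash_opcodes))         # all four functions collide
print(buckets(hash_with_operands))   # only the true duplicates share a bucket
```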

(cherry picked from FBD20888957)
2020-04-07 00:21:37 -07:00
Maksim Panchenko
58b0d9e7b0 [BOLT][DWARF] Add support for base address in DWARF location lists
Summary:
The version of LLVM that we are based on lacks the support for base
address in DWARF location lists. Add the missing pieces.

(cherry picked from FBD20640784)
2020-03-24 22:05:37 -07:00
Maksim Panchenko
1f3e351a9c [BOLT] Refactor code and data emission code
Summary:
Consolidate code and data emission code in ELF-independent
BinaryEmitter. The high-level interface includes only two
functions emitBinaryContext() and emitFunctionBody() used
by RewriteInstance and BinaryContext respectively.

(cherry picked from FBD20332901)
2020-03-06 15:06:37 -08:00
Alexander Shaposhnikov
c3c4b15a2e [BOLT] Remove BinaryContext::getFunctionData
Summary:
In this diff we refactor the code around getting the original binary encoding of function's body.
The main changes are: remove BinaryContext::getFunctionData, remove the parameter of the method BinaryFunction::disassemble, introduce BinaryFunction::getData.

(cherry picked from FBD19824368)
2020-02-10 15:35:11 -08:00
Rafael Auler
0080d74506 [BOLT] Fix issue with strict and builtin_unreachable
Summary:
In strict mode, a jump table with targets generated by
builtin_unreachable (located at the very end of the function) was
asserting when being recreated by postProcessIndirectBranches. Fix
this.

(cherry picked from FBD19614981)
2020-01-28 18:38:10 -08:00
Maksim Panchenko
ac697b7d3a [BOLT] Replace list of Names with Symbols for BinaryFunction
Summary:
BinaryFunction used to have a list of Names associated with its main
entry point. However, the function is primarily identified by its
corresponding symbol or symbols, and these symbols are available as we
are creating them for a corresponding BinaryData object.

There's also no reason to emit symbols for alternative function names
(aliases), so change the code to only emit needed symbols.

When we emit a cold fragment for a function, only emit one cold symbol
for the fragment instead of one per every main entry symbol/name.

When we match a symbol to an entry point in the function, with this
change we can first go through the list of main entry symbols (now that
they are available).

(cherry picked from FBD19426709)
2020-01-13 11:56:59 -08:00
Rafael Auler
961d3d02d8 [BOLT] Move postProcessEntryPoints after disassembly
Summary:
Call postProcessEntryPoints only after all functions have been
disassembled and all interprocedural references have been processed,
when all possible entry points have been accounted for. This makes our
detection of bad entries more robust as it does not depend on the order
of the functions any more.

(cherry picked from FBD19404767)
2020-01-14 17:12:03 -08:00
Maksim Panchenko
088e3c032a [BOLT] Improve handling of secondary function entry points
Summary:
"Fix symbol table entries for secondary entries" diff broke the inliner.

Fix the breakage and make the discovery of secondary entry points more
accurate.

Add ability to BinaryContext::getFunctionForSymbol() to return an entry
point discriminator and use it instead of calling getEntryForSymbol()
and isSecondaryEntry(). This is the preferred way since
getFunctionForSymbol() is thread-safe.

(cherry picked from FBD19295983)
2020-01-06 14:57:15 -08:00
Rafael Auler
de284bc510 [BOLT] Fix symbol table entries for secondary entries
Summary:
Commit "Support full instrumentation" changed the map
SymbolToFunction in BinaryContext to map secondary entries of functions
too. This introduced unexpected behavior in our symbol table rewriting
logic, which caused it to mistakenly write them with the address of the
original function. Fix the behavior of getBinaryFunctionAtAddress to
correct this. Also fix other users of SymbolToFunction to ensure they
are not accidentally using secondary entries when they shouldn't.

(cherry picked from FBD19168319)
2019-12-18 12:14:42 -08:00
Rafael Auler
16a497c627 [BOLT] Support full instrumentation
Summary:
Add full instrumentation support (branches, direct and
indirect calls). Add output statistics to show how many hot bytes
were split from cold ones in functions. Add -cold-threshold option
to allow splitting warm code (non-zero count). Add option in
bolt-diff to report missing functions in profile 2.

In instrumentation, fini hooks are fixed to run proper finalization
code after the program finishes. Startup hooks are added to set up
the runtime structures that need initialization, such as indirect call
hash tables.

Add support for automatically dumping profile data every N seconds by
forking a watcher process during runtime.

(cherry picked from FBD17644396)
2019-12-13 17:27:03 -08:00
Maksim Panchenko
3cc4fc267b [BOLT] Proper support for -trap-avx512 option
Summary:
If -trap-avx512 option is not set, verify that we correctly encode
AVX-512 instructions and treat them as ordinary instructions.

(cherry picked from FBD18666427)
2019-11-22 14:53:20 -08:00
Maksim Panchenko
7350d40404 [BOLT][NFC] Refactor BinaryFunction::addEntryPoint()
Summary:
There is no need to support existing functionality of adding entry
points after the CFG is built as the function is only called in empty or
disassembled state. Previously we used to run disassemble+buildCFG per
function, but now these phases are decoupled.

Also, remove a couple of redundant checks.

(cherry picked from FBD18622822)
2019-11-11 17:02:37 -08:00
Maksim Panchenko
72b52edcbb [BOLT] Free more memory in BinaryFunction::releaseCFG()
Summary:
Free more lists in BinaryFunction::releaseCFG().

Release BinaryFunction::Relocations after disassembly.

Do not populate BinaryFunction::MoveRelocations as we are not using them
currently.

Also remove PCRelativeRelocationOffsets that weren't used.

(cherry picked from FBD18413256)
2019-11-08 14:41:31 -08:00