2015-11-23 17:54:18 -08:00
//===--- RewriteInstance.cpp - Interface for machine-level function -------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
//===----------------------------------------------------------------------===//

#include "RewriteInstance.h"
#include "BinaryBasicBlock.h"
#include "BinaryContext.h"
#include "BinaryEmitter.h"
#include "BinaryFunction.h"
#include "BinaryPassManager.h"
#include "BoltAddressTranslation.h"
#include "CacheMetrics.h"
#include "DWARFRewriter.h"
#include "DataAggregator.h"
#include "DataReader.h"
#include "Exceptions.h"
#include "ExecutableFileMemoryManager.h"
#include "MCPlusBuilder.h"
#include "ParallelUtilities.h"
#include "Passes/ReorderFunctions.h"
#include "Relocation.h"
Adding automatic huge page support
Summary:
This patch enables automated hugify for Bolt.
When running Bolt against a binary with -hugify specified, Bolt will inject a call to a runtime library function at the entry of the binary. The runtime library calls madvise to map the hot code region into a 2M huge page. We support both newer kernels with THP support and older kernels. For kernels with THP support we simply make a madvise call, while for older kernels, we first copy the code out, remap the memory with huge pages, and then copy the code back.
With this change, we no longer need to manually call into hugify_self and precompile it with --hot-text. Instead, we can simply combine the --hugify option with existing optimizations, and at runtime it will automatically move hot code into 2M pages.
Some details around the changes made:
1. Add a command-line option to support --hugify. --hugify will automatically turn on --hot-text to get the proper hot code symbols. However, running with both --hugify and --hot-text is not allowed, since --hot-text is used on binaries that have a precompiled call to hugify_self, which contradicts the purpose of --hugify.
2. Moved the common utility functions out of instr.cpp to common.h, which will also be used by hugify.cpp. Added a few new system call definitions.
3. Added a new class that inherits RuntimeLibrary, and implemented the necessary emit and link logic for hugify.
4. Added a simple test for hugify.
(cherry picked from FBD21384529)
2020-05-02 11:14:38 -07:00
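The commit message above describes asking the kernel to back the hot code region with 2M huge pages via madvise. The following is a minimal sketch of the address arithmetic and of the madvise call for THP-capable kernels. The helper names (`alignToHugePage`, `adviseHugePages`) and the choice to round the range outward to 2M boundaries are assumptions made for illustration; this is not the runtime library's actual code, and the copy-and-remap path for older kernels is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#if defined(__linux__)
#include <sys/mman.h>
#endif

// 2M huge page size, as mentioned in the commit message.
constexpr std::uintptr_t HugePageBytes = 2u * 1024 * 1024;

// Hypothetical helper: widen [Begin, End) outward to huge-page boundaries.
// Rounding outward is an assumption made for this sketch.
std::pair<std::uintptr_t, std::uintptr_t>
alignToHugePage(std::uintptr_t Begin, std::uintptr_t End) {
  const std::uintptr_t Mask = HugePageBytes - 1;
  return {Begin & ~Mask, (End + Mask) & ~Mask};
}

// Hypothetical helper: ask a THP-capable kernel to use huge pages for the
// aligned range. Returns true on success, and false on failure or when
// MADV_HUGEPAGE is not available on this platform.
bool adviseHugePages(std::uintptr_t Begin, std::uintptr_t End) {
#if defined(__linux__) && defined(MADV_HUGEPAGE)
  const auto Range = alignToHugePage(Begin, End);
  return madvise(reinterpret_cast<void *>(Range.first),
                 Range.second - Range.first, MADV_HUGEPAGE) == 0;
#else
  (void)Begin;
  (void)End;
  return false;
#endif
}
```

In a real runtime the range would come from the hot text start and end symbols that --hot-text generates; here the caller supplies it directly.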
#include "RuntimeLibs/HugifyRuntimeLibrary.h"
#include "Utils.h"
#include "YAMLProfileReader.h"
#include "YAMLProfileWriter.h"
#include "llvm/ADT/Optional.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/BinaryFormat/Magic.h"
#include "llvm/DebugInfo/DWARF/DWARFContext.h"
#include "llvm/ExecutionEngine/Orc/LambdaResolver.h"
[BOLT rebase] Rebase fixes on top of LLVM Feb2018
Summary:
This commit includes all code necessary to make BOLT work again
after the rebase. This includes a redesign of the EHFrame work,
cherry-pick of the 3dnow disassembly work, compilation error fixes,
and port of the debug_info work. The macroop fusion feature is not
ported yet.
The rebased version has minor changes to the "executed instructions"
dynostats counter because REP prefixes are considered a part of the
instruction it applies to. Also, some X86 instructions had the "mayLoad"
tablegen property removed, which BOLT uses to identify and account
for loads, thus reducing the total number of loads reported by
dynostats. This was observed in X86::MOVDQUmr. TRAP instructions are
not terminators anymore, changing our CFG. This commit adds compensation
to preserve this old behavior and minimize test changes. debug_info
sections are now slightly larger. The discriminator field in the line
table is slightly different due to a change upstream. New profiles
generated with the other bolt are incompatible with this version
because of different hash values calculated for functions, so they will
be considered 100% stale. This commit changes the corresponding test
to XFAIL so it can be updated. The hash function changes because it
relies on raw opcode values, which change according to the opcodes
described in the X86 tablegen files. When processing HHVM, bolt was
observed to be using about 800MB more memory in the rebased version
and being about 5% slower.
(cherry picked from FBD7078072)
2018-02-06 15:00:23 -08:00
#include "llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h"
#include "llvm/ExecutionEngine/RTDyldMemoryManager.h"
#include "llvm/MC/MCAsmBackend.h"
#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCAsmLayout.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCDisassembler/MCDisassembler.h"
#include "llvm/MC/MCInstPrinter.h"
#include "llvm/MC/MCInstrAnalysis.h"
#include "llvm/MC/MCInstrInfo.h"
#include "llvm/MC/MCObjectFileInfo.h"
#include "llvm/MC/MCObjectStreamer.h"
#include "llvm/MC/MCObjectWriter.h"
#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/MC/MCStreamer.h"
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/MC/MCSymbol.h"
#include "llvm/Object/Archive.h"
#include "llvm/Object/ObjectFile.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/DataExtractor.h"
#include "llvm/Support/Errc.h"
#include "llvm/Support/ManagedStatic.h"
#include "llvm/Support/TargetRegistry.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Support/Timer.h"
#include "llvm/Support/ToolOutputFile.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetMachine.h"
#include <algorithm>
#include <fstream>
#include <stack>
#include <system_error>
#include <thread>

#undef DEBUG_TYPE
#define DEBUG_TYPE "bolt"

using namespace llvm;
using namespace object;
using namespace bolt;

[BOLT] Decoder cache friendly alignment wrt Intel JCC Erratum
Summary:
This diff ports reviews.llvm.org/D70157 to our LLVM tree, which
makes the integrated assembler able to align X86 control-flow changing
instructions in a way to reduce the performance impact of the ucode
update on Intel processors that implement the JCC erratum mitigation.
See white paper "Mitigations for Jump Conditional Code Erratum" by Intel
published November 2019.
To port this patch, I changed classifySecondInstInMacroFusion to analyze
instruction opcodes directly instead of analyzing the CondCond operand
(in more recent versions of LLVM, all conditional branches share the
same opcode, but with a different conditional operand). I also pulled to
our tree Alignment.h as a dependency, and the macroop analyzing helpers.
-x86-align-branch-boundary and -x86-align-branch are the two flags that
control nop insertion to avoid disabling the decoder cache, following
the original patch. In BOLT, I added the flag
-x86-align-branch-boundary-hot-only to request the alignment to only be
applied to hot code, which is turned on by default. The reason is that
such alignment is expensive to perform on large modules, but if
we limit it to hot code, the relaxation pass runtime becomes tolerable.
(cherry picked from FBD19828850)
2020-02-10 18:50:53 -08:00
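The alignment described in the commit message above is aimed at branch instructions that cross, or end exactly on, an alignment boundary. The check itself is simple address arithmetic, sketched below. The function name `crossesOrEndsAtBoundary` is hypothetical, and the 32-byte boundary used in the usage note is an assumption suggested by the `X86AlignBranchWithin32BBoundaries` option name; the real logic lives in the assembler backend and is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if an instruction occupying
// [Addr, Addr + Size) crosses a Boundary-aligned address or ends exactly on
// one. Boundary must be nonzero.
bool crossesOrEndsAtBoundary(std::uint64_t Addr, std::uint64_t Size,
                             std::uint64_t Boundary) {
  if (Size == 0)
    return false;
  const std::uint64_t FirstChunk = Addr / Boundary;
  // Chunk that holds the byte right after the instruction.
  const std::uint64_t EndChunk = (Addr + Size) / Boundary;
  return FirstChunk != EndChunk;
}
```

With a 32-byte boundary, a 2-byte branch at offset 30 ends on the boundary and a 4-byte branch at offset 30 crosses it, so both would need padding in front of them; a 6-byte branch at offset 0 would not.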
extern cl::opt<uint32_t> X86AlignBranchBoundary;
extern cl::opt<bool> X86AlignBranchWithin32BBoundaries;

namespace opts {

extern bool HeatmapMode;
Generate heatmap for linux kernel
Summary:
This diff handles several challenges related to heatmap generation for Linux kernel (vmlinux elf file):
- If the input binary elf file contains the section `__ksymtab`, this diff assumes that this is the linux kernel `vmlinux` file and enables an extra flag `LinuxKernelMode`
- In `LinuxKernelMode`, we only support heat map generation right now, therefore it ensures that current BOLT mode is heat map generation. Otherwise, it exits with error.
- For some Linux symbol and section combinations, BOLT may not be able to find the section for a symbol (especially symbols that specify the end of a section). In such cases, we show a warning message instead of exiting, which was the previous behavior.
- The Linux kernel ELF file does not contain a dynamic section; therefore, we don't exit when no dynamic section is found for a Linux kernel binary.
- The current `ParseMMap` logic does not work with the Linux kernel. MMap entries for the Linux kernel use the `PERF_RECORD_MMAP` format instead of the typical `PERF_RECORD_MMAP2` format. Since Linux kernel address mapping is absolute (same as specified in the ELF file), we avoid calling `ParseMMap` in Linux kernel mode.
- Linux kernel entries are registered with PID -1; therefore, `BinaryMMapInfo` lookup is not required for Linux kernel entries. Similarly, `adjustLBR` is also not required.
- The default max address in Linux kernel mode is the highest unsigned 64-bit integer instead of the current 4GB.
- Added a new heatmap parameter, `MinAddress`, which is `KernelBaseAddress` in Linux kernel mode and 0 otherwise. While registering heatmap sample counts from LBR entries, any address lower than `MinAddress` is ignored.
- `IgnoreInterruptLBR` is disabled in Linux kernel mode to ensure that kernel entries are processed.
Currently, the Linux kernel heat map also includes the heat map for Linux kernel modules that are not part of the vmlinux ELF file. This is intentional, to identify other potential optimization opportunities. If reviewers think those modules should be omitted, I will disable those modules based on the highest end address of a vmlinux ELF section.
(cherry picked from FBD21992765)
2020-06-10 23:00:39 -07:00
extern bool LinuxKernelMode;

extern cl::OptionCategory BoltCategory;
extern cl::OptionCategory BoltDiffCategory;
extern cl::OptionCategory BoltOptCategory;
extern cl::OptionCategory BoltOutputCategory;
extern cl::OptionCategory AggregatorCategory;

extern cl::opt<MacroFusionType> AlignMacroOpFusion;
extern cl::opt<bool> Hugify;
Refactor runtime library
Summary:
As we are adding more types of runtime libraries, it would be better to move the runtime library out of RewriteInstance so that it can grow separately. This also requires splitting the current implementation of Instrumentation.cpp into two separate pieces, one as a normal Pass and one as the runtime library. The Instrumentation Pass passes the generated data over to the runtime library, which uses it to emit the binary and perform linking.
This patch does the following:
1. Turn Instrumentation class into an optimization pass. Register the pass in the pass manager instead of in RewriteInstance.
2. Split all the data that are generated by Instrumentation that's needed by runtime library into a separate data structure called InstrumentationSummary. At the creation of Instrumentation pass, we create an instance of such data structure, which will be moved over to the runtime at the end of the pass.
3. Added a runtime library member to BinaryContext. Set the member at the end of Instrumentation pass.
4. In BinaryEmitter, make BinaryContext also emit the runtime library binary.
5. Created a base class RuntimeLibrary, that defines the interface of a runtime library, along with a few common helper functions.
6. Created InstrumentationRuntimeLibrary which inherits from RuntimeLibrary, that does all the work (mostly copied over) for emit and linking.
7. Added a new directory called RuntimeLibs, and put all the runtime library related files into it.
(cherry picked from FBD21694762)
2020-05-21 14:28:47 -07:00
extern cl::opt<bool> Instrument;
extern cl::opt<JumpTableSupportLevel> JumpTables;
extern cl::list<std::string> ReorderData;
extern cl::opt<bolt::ReorderFunctions::ReorderType> ReorderFunctions;
extern cl::opt<bool> TimeBuild;

cl::opt<unsigned>
AlignText("align-text",
  cl::desc("alignment of .text section"),
  cl::ZeroOrMore,
  cl::Hidden,
  cl::cat(BoltCategory));

static cl::opt<bool>
ForceToDataRelocations("force-data-relocations",
  cl::desc("force relocations to data sections to always be processed"),
  cl::init(false),
  cl::Hidden,
  cl::ZeroOrMore,
  cl::cat(BoltCategory));

cl::opt<bool>
PrintCacheMetrics("print-cache-metrics",
  cl::desc("calculate and print various metrics for instruction cache"),
  cl::init(false),
  cl::ZeroOrMore,
  cl::cat(BoltOptCategory));

cl::opt<std::string>
OutputFilename("o",
  cl::desc("<output file>"),
  cl::Optional,
  cl::cat(BoltOutputCategory));

cl::opt<std::string>
BoltID("bolt-id",
  cl::desc("add any string to tag this execution in the "
           "output binary via bolt info section"),
  cl::ZeroOrMore,
  cl::cat(BoltCategory));

cl::opt<bool>
AllowStripped("allow-stripped",
  cl::desc("allow processing of stripped binaries"),
  cl::Hidden,
  cl::cat(BoltCategory));

cl::opt<bool>
DumpDotAll("dump-dot-all",
  cl::desc("dump function CFGs to graphviz format after each stage"),
  cl::ZeroOrMore,
  cl::Hidden,
  cl::cat(BoltCategory));

static cl::opt<bool>
DumpEHFrame("dump-eh-frame",
  cl::desc("dump parsed .eh_frame (debugging)"),
  cl::ZeroOrMore,
  cl::Hidden,
  cl::cat(BoltCategory));

static cl::list<std::string>
ForceFunctionNames("funcs",
  cl::CommaSeparated,
  cl::desc("limit optimizations to functions from the list"),
  cl::value_desc("func1,func2,func3,..."),
  cl::Hidden,
  cl::cat(BoltCategory));
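
The `cl::CommaSeparated` modifier on the `funcs` list lets a single occurrence of the option carry several names, in the `func1,func2,func3,...` form shown in its value description. The sketch below illustrates that splitting with the standard library only. `splitCommaSeparated` is a hypothetical stand-in, not an LLVM API, and keeping empty entries is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the splitting that a comma-separated list option
// performs on its value; not an LLVM API. Empty entries are kept as-is.
std::vector<std::string> splitCommaSeparated(const std::string &Value) {
  std::vector<std::string> Parts;
  std::string::size_type Start = 0;
  while (true) {
    const std::string::size_type Pos = Value.find(',', Start);
    if (Pos == std::string::npos) {
      // Last (or only) entry: everything after the final comma.
      Parts.push_back(Value.substr(Start));
      break;
    }
    Parts.push_back(Value.substr(Start, Pos - Start));
    Start = Pos + 1;
  }
  return Parts;
}
```

Each resulting entry would then be appended to the list, so a value of `func1,func2,func3` yields three separate function names.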

static cl::opt<std::string>
FunctionNamesFile("funcs-file",
  cl::desc("file with list of functions to optimize"),
  cl::Hidden,
  cl::cat(BoltCategory));

cl::opt<bool>
HotFunctionsAtEnd(
  "hot-functions-at-end",
  cl::desc(
    "if reorder-functions is used, order functions putting hottest last"),
  cl::ZeroOrMore,
  cl::cat(BoltCategory));

cl::opt<bool> HotText(
    "hot-text",
    cl::desc(
        "Generate hot text symbols. Apply this option to a precompiled binary "
        "that manually calls into hugify, such that at runtime hugify call "
        "will put hot code into 2M pages. This requires relocation."),
    cl::ZeroOrMore, cl::cat(BoltCategory));

static cl::list<std::string>
HotTextMoveSections("hot-text-move-sections",
  cl::desc("list of sections containing functions used for hugifying hot text. "
           "BOLT makes sure these functions are not placed on the same page as "
           "the hot text. (default=\'.stub,.mover\')."),
  cl::value_desc("sec1,sec2,sec3,..."),
  cl::CommaSeparated,
  cl::ZeroOrMore,
  cl::cat(BoltCategory));

cl::opt<bool>
HotData("hot-data",
  cl::desc("hot data symbols support (relocation mode)"),
  cl::ZeroOrMore,
  cl::cat(BoltCategory));

cl::opt<bool>
KeepTmp("keep-tmp",
  cl::desc("preserve intermediate .o file"),
  cl::Hidden,
  cl::cat(BoltCategory));

static cl::opt<bool>
Lite("lite",
  cl::desc("skip processing of cold functions"),
  cl::init(false),
  cl::ZeroOrMore,
  cl::cat(BoltCategory));

static cl::opt<unsigned>
MaxFunctions("max-funcs",
  cl::desc("maximum number of functions to process"),
  cl::ZeroOrMore,
  cl::Hidden,
  cl::cat(BoltCategory));

static cl::opt<unsigned>
MaxDataRelocations("max-data-relocations",
  cl::desc("maximum number of data relocations to process"),
  cl::ZeroOrMore,
  cl::Hidden,
  cl::cat(BoltCategory));

cl::opt<bool>
PrintAll("print-all",
  cl::desc("print functions after each stage"),
  cl::ZeroOrMore,
  cl::Hidden,
  cl::cat(BoltCategory));

cl::opt<bool>
PrintCFG("print-cfg",
  cl::desc("print functions after CFG construction"),
  cl::ZeroOrMore,
  cl::Hidden,
  cl::cat(BoltCategory));

cl::opt<bool> PrintDisasm("print-disasm",
  cl::desc("print function after disassembly"),
  cl::ZeroOrMore,
  cl::Hidden,
  cl::cat(BoltCategory));

static cl::opt<bool>
PrintGlobals("print-globals",
  cl::desc("print global symbols after disassembly"),
  cl::ZeroOrMore,
  cl::Hidden,
  cl::cat(BoltCategory));

extern cl::opt<bool> PrintSections;

static cl::opt<bool>
PrintLoopInfo("print-loops",
  cl::desc("print loop related information"),
  cl::ZeroOrMore,
  cl::Hidden,
  cl::cat(BoltCategory));

static cl::opt<bool>
PrintSDTMarkers("print-sdt",
  cl::desc("print all SDT markers"),
  cl::ZeroOrMore,
  cl::Hidden,
  cl::cat(BoltCategory));

static cl::opt<cl::boolOrDefault>
RelocationMode("relocs",
  cl::desc("use relocations in the binary (default=autodetect)"),
  cl::ZeroOrMore,
  cl::cat(BoltCategory));

static cl::opt<std::string>
SaveProfile("w",
  cl::desc("save recorded profile to a file"),
  cl::cat(BoltOutputCategory));

static cl::list<std::string>
SkipFunctionNames("skip-funcs",
  cl::CommaSeparated,
  cl::desc("list of functions to skip"),
  cl::value_desc("func1,func2,func3,..."),
  cl::Hidden,
  cl::cat(BoltCategory));

static cl::opt<std::string>
SkipFunctionNamesFile("skip-funcs-file",
  cl::desc("file with list of functions to skip"),
  cl::Hidden,
  cl::cat(BoltCategory));

cl::opt<bool>
SplitEH("split-eh",
  cl::desc("split C++ exception handling code"),
  cl::ZeroOrMore,
  cl::Hidden,
  cl::cat(BoltOptCategory));

cl::opt<bool>
StrictMode("strict",
  cl::desc("trust the input to be from a well-formed source"),
  cl::init(false),
  cl::ZeroOrMore,
  cl::cat(BoltCategory));

cl::opt<bool>
TrapOldCode("trap-old-code",
  cl::desc("insert traps in old function bodies (relocation mode)"),
  cl::Hidden,
  cl::cat(BoltCategory));

cl::opt<bool>
UpdateDebugSections("update-debug-sections",
  cl::desc("update DWARF debug sections of the executable"),
  cl::ZeroOrMore,
  cl::cat(BoltCategory));

cl::opt<bool>
EnableBAT("enable-bat",
  cl::desc("write BOLT Address Translation tables"),
  cl::init(false),
  cl::ZeroOrMore,
  cl::cat(BoltCategory));

static cl::opt<bool>
UseGnuStack("use-gnu-stack",
  cl::desc("use GNU_STACK program header for new segment (workaround for "
           "issues with strip/objcopy)"),
  cl::ZeroOrMore,
  cl::cat(BoltCategory));

cl::opt<bool>
UseOldText("use-old-text",
  cl::desc("re-use space in old .text if possible (relocation mode)"),
  cl::ZeroOrMore,
  cl::cat(BoltCategory));

// The default verbosity level (0) is pretty terse, level 1 is fairly
// verbose and usually prints some informational message for every
// function processed. Level 2 is for the noisiest of messages and
// often prints a message per basic block.
// Error messages should never be suppressed by the verbosity level.
// Only warnings and info messages should be affected.
//
// The rationale behind stream usage is as follows:
// outs() for info and debugging controlled by command line flags.
// errs() for errors and warnings.
// dbgs() for output within DEBUG().
cl::opt<unsigned>
Verbosity("v",
  cl::desc("set verbosity level for diagnostic output"),
  cl::init(0),
  cl::ZeroOrMore,
  cl::cat(BoltCategory),
  cl::sub(*cl::AllSubCommands));
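
The comment above `Verbosity` sets out a policy: info and warning messages are gated by the verbosity level, errors are never suppressed, and output is routed to different streams. A minimal sketch of that policy follows. `DiagLogger` and the `BOLT-INFO:`, `BOLT-WARNING:`, and `BOLT-ERROR:` prefixes are assumptions made for illustration, and `std::ostream` stands in for `llvm::raw_ostream` so the example stays self-contained.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical logger illustrating the policy in the comment above.
struct DiagLogger {
  unsigned Verbosity;
  std::ostream &Out; // plays the role of outs()
  std::ostream &Err; // plays the role of errs()

  // Info messages are printed only when the verbosity level is high enough.
  void info(unsigned Level, const std::string &Msg) {
    if (Verbosity >= Level)
      Out << "BOLT-INFO: " << Msg << '\n';
  }

  // Warnings go to the error stream but may be suppressed by the level.
  void warning(unsigned Level, const std::string &Msg) {
    if (Verbosity >= Level)
      Err << "BOLT-WARNING: " << Msg << '\n';
  }

  // Errors are never suppressed by the verbosity level.
  void error(const std::string &Msg) { Err << "BOLT-ERROR: " << Msg << '\n'; }
};
```

At the default level 0, a level-1 info message or warning produces no output, while `error` always writes to the error stream.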

cl::opt<bool>
AggregateOnly("aggregate-only",
  cl::desc("exit after writing aggregated data file"),
  cl::Hidden,
  cl::cat(AggregatorCategory));

cl::opt<bool>
DiffOnly("diff-only",
  cl::desc("stop processing once we have enough to compare two binaries"),
  cl::Hidden,
  cl::cat(BoltDiffCategory));

static cl::opt<bool>
TimeRewrite("time-rewrite",
  cl::desc("print time spent in rewriting passes"),
  cl::ZeroOrMore,
  cl::Hidden,
  cl::cat(BoltCategory));

static cl::opt<bool>
SequentialDisassembly("sequential-disassembly",
  cl::desc("performs disassembly sequentially"),
  cl::init(false),
  cl::cat(BoltOptCategory));

static cl::opt<bool>
WriteBoltInfoSection("bolt-info",
  cl::desc("write bolt info section in the output binary"),
  cl::init(true),
  cl::ZeroOrMore,
  cl::Hidden,
  cl::cat(BoltOutputCategory));
2019-03-15 13:43:36 -07:00
|
|
|
bool isHotTextMover(const BinaryFunction &Function) {
|
|
|
|
|
for (auto &SectionName : opts::HotTextMoveSections) {
|
|
|
|
|
if (Function.getOriginSectionName() == SectionName)
|
|
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
2020-06-25 16:29:17 -07:00
|
|
|
/// Return true if we should process all functions in the binary.
|
[BOLT] Support for lite mode with relocations
Summary:
Add '-lite' support for relocations for improved processing time,
memory consumption, and more resilient processing of binaries with
embedded assembly code.
In lite relocation mode, BOLT will skip full processing of functions
without a profile. It will run scanExternalRefs() on such functions
to discover external references and to create internal relocations
to update references to optimized functions.
Note that we could have relied on the compiler/linker to provide
relocations for function references. However, there's no assurance
that all such references are reported. E.g., the compiler can resolve
inter-procedural references internally, leaving no relocations
for the linker.
The scan process takes less than 10 seconds per 100MB of code on modern
hardware. It's a reasonable overhead to live with considering the
flexibility it provides.
If BOLT fails to scan or disassemble a function, e.g., due to a data
object embedded in code, or an unsupported instruction, it enables a
patching mode to guarantee that the failed function will call
optimized/moved versions of functions. The patching happens at original
function entry points.
The '-skip=<func1,func2,...>' option can now be used to skip processing of
arbitrary functions in relocation mode.
With '-use-old-text' or '-strict' we require all functions to be
processed. As such, these options are incompatible with the '-lite' option,
and the '-skip' option will only disable optimizations of listed
functions, not their disassembly and emission.
(cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
|
|
|
bool processAllFunctions() {
|
2020-06-25 16:29:17 -07:00
|
|
|
if (opts::AggregateOnly)
|
|
|
|
|
return false;
|
|
|
|
|
|
2020-06-15 00:15:47 -07:00
|
|
|
if (UseOldText || StrictMode)
|
|
|
|
|
return true;
|
|
|
|
|
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
2015-11-23 17:54:18 -08:00
|
|
|
} // namespace opts
|
|
|
|
|
|
2017-05-16 09:27:34 -07:00
|
|
|
constexpr const char *RewriteInstance::SectionsToOverwrite[];
|
2019-04-26 15:30:12 -07:00
|
|
|
constexpr const char *RewriteInstance::DebugSectionsToOverwrite[];
|
2016-07-22 20:52:57 -07:00
|
|
|
|
[BOLT rebase] Rebase fixes on top of LLVM Feb2018
Summary:
This commit includes all code necessary to make BOLT work again
after the rebase. This includes a redesign of the EHFrame work,
cherry-pick of the 3dnow disassembly work, compilation error fixes,
and port of the debug_info work. The macroop fusion feature is not
ported yet.
The rebased version has minor changes to the "executed instructions"
dynostats counter because REP prefixes are considered a part of the
instruction it applies to. Also, some X86 instructions had the "mayLoad"
tablegen property removed, which BOLT uses to identify and account
for loads, thus reducing the total number of loads reported by
dynostats. This was observed in X86::MOVDQUmr. TRAP instructions are
not terminators anymore, changing our CFG. This commit adds compensation
to preserve the old behavior and minimize test changes. debug_info
sections are now slightly larger. The discriminator field in the line
table is slightly different due to a change upstream. New profiles
generated with the other bolt are incompatible with this version
because of different hash values calculated for functions, so they will
be considered 100% stale. This commit changes the corresponding test
to XFAIL so it can be updated. The hash function changes because it
relies on raw opcode values, which change according to the opcodes
described in the X86 tablegen files. When processing HHVM, bolt was
observed to be using about 800MB more memory in the rebased version
and being about 5% slower.
(cherry picked from FBD7078072)
2018-02-06 15:00:23 -08:00
|
|
|
const char RewriteInstance::TimerGroupName[] = "rewrite";
|
|
|
|
|
const char RewriteInstance::TimerGroupDesc[] = "Rewrite passes";
|
2017-11-27 18:00:24 -08:00
|
|
|
|
2017-05-24 14:14:16 -07:00
|
|
|
namespace llvm {
|
|
|
|
|
namespace bolt {
|
2015-11-23 17:54:18 -08:00
|
|
|
|
2020-01-30 13:10:48 -08:00
|
|
|
extern const char *BoltRevision;
|
2015-11-23 17:54:18 -08:00
|
|
|
|
2020-01-30 13:10:48 -08:00
|
|
|
} // namespace bolt
|
|
|
|
|
} // namespace llvm
|
2017-10-20 12:11:34 -07:00
|
|
|
|
2017-11-14 20:05:11 -08:00
|
|
|
namespace {
|
2018-04-20 20:03:31 -07:00
|
|
|
|
|
|
|
|
bool refersToReorderedSection(ErrorOr<BinarySection &> Section) {
|
|
|
|
|
auto Itr = std::find_if(opts::ReorderData.begin(),
|
|
|
|
|
opts::ReorderData.end(),
|
|
|
|
|
[&](const std::string &SectionName) {
|
|
|
|
|
return (Section &&
|
|
|
|
|
Section->getName() == SectionName);
|
|
|
|
|
});
|
|
|
|
|
return Itr != opts::ReorderData.end();
|
|
|
|
|
}
|
|
|
|
|
|
2016-08-11 14:23:54 -07:00
|
|
|
} // namespace
|
|
|
|
|
|
2020-05-07 23:00:29 -07:00
|
|
|
RewriteInstance::RewriteInstance(ELFObjectFileBase *File, const int Argc,
|
2019-07-24 14:03:43 -07:00
|
|
|
const char *const *Argv, StringRef ToolPath)
|
2020-05-07 23:00:29 -07:00
|
|
|
: InputFile(File), Argc(Argc), Argv(Argv), ToolPath(ToolPath),
|
2020-01-15 15:23:45 -08:00
|
|
|
BC(BinaryContext::createBinaryContext(
|
2020-05-07 23:00:29 -07:00
|
|
|
File,
|
2018-02-06 15:00:23 -08:00
|
|
|
DWARFContext::create(*File, nullptr,
|
|
|
|
|
DWARFContext::defaultErrorHandler, "", false))),
|
2019-04-12 17:33:46 -07:00
|
|
|
BAT(llvm::make_unique<BoltAddressTranslation>(*BC)),
|
2019-04-03 15:52:01 -07:00
|
|
|
SHStrTab(StringTableBuilder::ELF) {
|
|
|
|
|
if (opts::UpdateDebugSections) {
|
|
|
|
|
DebugInfoRewriter = llvm::make_unique<DWARFRewriter>(*BC, SectionPatchers);
|
|
|
|
|
}
|
Adding automatic huge page support
Summary:
This patch enables automated hugify for Bolt.
When running Bolt against a binary with -hugify specified, Bolt will inject a call to a runtime library function at the entry of the binary. The runtime library calls madvise to map the hot code region into a 2M huge page. We support both new kernels with THP support and older kernels. For kernels with THP support we simply make a madvise call, while for older kernels we first copy the code out, remap the memory with huge pages, and then copy the code back.
With this change, we no longer need to manually call into hugify_self and precompile it with --hot-text. Instead, we can simply combine the --hugify option with existing optimizations, and at runtime it will automatically move hot code into 2M pages.
Some details around the changes made:
1. Add a command-line option to support --hugify. --hugify will automatically turn on --hot-text to get the proper hot code symbols. However, running with both --hugify and --hot-text is not allowed, since --hot-text is used on binaries that have a precompiled call to hugify_self, which contradicts the purpose of --hugify.
2. Moved the common utility functions out of instr.cpp to common.h, which will also be used by hugify.cpp. Added a few new system call definitions.
3. Added a new class that inherits from RuntimeLibrary, and implemented the necessary emit and link logic for hugify.
4. Added a simple test for hugify.
(cherry picked from FBD21384529)
2020-05-02 11:14:38 -07:00
|
|
|
if (opts::Hugify) {
|
|
|
|
|
BC->setRuntimeLibrary(llvm::make_unique<HugifyRuntimeLibrary>());
|
|
|
|
|
}
|
2019-04-03 15:52:01 -07:00
|
|
|
}
|
2015-11-23 17:54:18 -08:00
|
|
|
|
|
|
|
|
RewriteInstance::~RewriteInstance() {}
|
|
|
|
|
|
2020-05-07 23:00:29 -07:00
|
|
|
Error RewriteInstance::setProfile(StringRef Filename) {
|
|
|
|
|
if (!sys::fs::exists(Filename))
|
|
|
|
|
return errorCodeToError(make_error_code(errc::no_such_file_or_directory));
|
|
|
|
|
|
|
|
|
|
if (ProfileReader) {
|
|
|
|
|
// Already exists
|
|
|
|
|
return make_error<StringError>(
|
|
|
|
|
Twine("multiple profiles specified: ") + ProfileReader->getFilename() +
|
|
|
|
|
" and " + Filename, inconvertibleErrorCode());
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Spawn a profile reader based on file contents.
|
|
|
|
|
if (DataAggregator::checkPerfDataMagic(Filename)) {
|
|
|
|
|
ProfileReader = llvm::make_unique<DataAggregator>(Filename);
|
|
|
|
|
} else if (YAMLProfileReader::isYAML(Filename)) {
|
|
|
|
|
ProfileReader = llvm::make_unique<YAMLProfileReader>(Filename);
|
|
|
|
|
} else {
|
|
|
|
|
ProfileReader = llvm::make_unique<DataReader>(Filename);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return Error::success();
|
|
|
|
|
}
|
|
|
|
|
|
2020-06-15 00:15:47 -07:00
|
|
|
/// Return true if the function \p BF should be disassembled.
|
|
|
|
|
static bool shouldDisassemble(const BinaryFunction &BF) {
|
|
|
|
|
if (BF.isPseudo())
|
2019-01-15 23:43:40 -08:00
|
|
|
return false;
|
|
|
|
|
|
2020-06-15 00:15:47 -07:00
|
|
|
if (opts::processAllFunctions())
|
|
|
|
|
return true;
|
|
|
|
|
|
|
|
|
|
return !BF.isIgnored();
|
2019-01-15 23:43:40 -08:00
|
|
|
}
|
|
|
|
|
|
2016-02-08 10:02:48 -08:00
|
|
|
void RewriteInstance::discoverStorage() {
|
2018-02-06 15:00:23 -08:00
|
|
|
NamedRegionTimer T("discoverStorage", "discover storage", TimerGroupName,
|
|
|
|
|
TimerGroupDesc, opts::TimeRewrite);
|
2017-11-14 16:51:24 -08:00
|
|
|
|
2017-09-20 10:43:01 -07:00
|
|
|
// Stubs are harmful because RuntimeDyld may try to increase the size of
|
|
|
|
|
// sections accounting for stubs when we need those sections to match the
|
|
|
|
|
// same size seen in the input binary, in case this section is a copy
|
|
|
|
|
// of the original one seen in the binary.
|
2019-12-17 11:17:31 -08:00
|
|
|
BC->EFMM.reset(new ExecutableFileMemoryManager(*BC, /*AllowStubs*/ false));
|
2017-01-17 15:49:59 -08:00
|
|
|
|
2016-03-03 10:13:11 -08:00
|
|
|
auto ELF64LEFile = dyn_cast<ELF64LEObjectFile>(InputFile);
|
2016-02-08 10:02:48 -08:00
|
|
|
if (!ELF64LEFile) {
|
|
|
|
|
errs() << "BOLT-ERROR: only 64-bit LE ELF binaries are supported\n";
|
|
|
|
|
exit(1);
|
|
|
|
|
}
|
|
|
|
|
auto Obj = ELF64LEFile->getELFFile();
|
2018-06-29 21:12:55 -07:00
|
|
|
if (Obj->getHeader()->e_type != ELF::ET_EXEC) {
|
2018-08-14 13:24:44 -07:00
|
|
|
outs() << "BOLT-INFO: shared object or position-independent executable "
|
|
|
|
|
"detected\n";
|
|
|
|
|
BC->HasFixedLoadAddress = false;
|
2018-06-29 21:12:55 -07:00
|
|
|
}
|
2016-02-08 10:02:48 -08:00
|
|
|
|
2020-03-08 19:04:39 -07:00
|
|
|
BC->StartFunctionAddress = Obj->getHeader()->e_entry;
|
2016-09-27 19:09:38 -07:00
|
|
|
|
2016-02-08 10:02:48 -08:00
|
|
|
NextAvailableAddress = 0;
|
2016-02-12 19:01:53 -08:00
|
|
|
uint64_t NextAvailableOffset = 0;
|
2018-02-06 15:00:23 -08:00
|
|
|
auto PHs = cantFail(Obj->program_headers(), "program_headers() failed");
|
|
|
|
|
for (const auto &Phdr : PHs) {
|
2016-02-08 10:02:48 -08:00
|
|
|
if (Phdr.p_type == ELF::PT_LOAD) {
|
2018-10-02 17:16:26 -07:00
|
|
|
BC->FirstAllocAddress = std::min(BC->FirstAllocAddress,
|
|
|
|
|
static_cast<uint64_t>(Phdr.p_vaddr));
|
2016-02-08 10:02:48 -08:00
|
|
|
NextAvailableAddress = std::max(NextAvailableAddress,
|
|
|
|
|
Phdr.p_vaddr + Phdr.p_memsz);
|
2016-02-12 19:01:53 -08:00
|
|
|
NextAvailableOffset = std::max(NextAvailableOffset,
|
|
|
|
|
Phdr.p_offset + Phdr.p_filesz);
|
2017-01-17 15:49:59 -08:00
|
|
|
|
2020-06-26 16:52:07 -07:00
|
|
|
BC->SegmentMapInfo[Phdr.p_vaddr] = SegmentInfo{Phdr.p_vaddr,
|
|
|
|
|
Phdr.p_memsz,
|
|
|
|
|
Phdr.p_offset,
|
|
|
|
|
Phdr.p_filesz,
|
|
|
|
|
Phdr.p_align};
|
2016-02-08 10:02:48 -08:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2016-09-27 19:09:38 -07:00
|
|
|
for (const auto &Section : InputFile->sections()) {
|
|
|
|
|
StringRef SectionName;
|
|
|
|
|
Section.getName(SectionName);
|
|
|
|
|
if (SectionName == ".text") {
|
2017-09-20 10:43:01 -07:00
|
|
|
BC->OldTextSectionAddress = Section.getAddress();
|
|
|
|
|
BC->OldTextSectionSize = Section.getSize();
|
2020-02-24 17:12:41 -08:00
|
|
|
|
|
|
|
|
StringRef SectionContents;
|
|
|
|
|
Section.getContents(SectionContents);
|
2017-09-20 10:43:01 -07:00
|
|
|
BC->OldTextSectionOffset =
|
2016-09-27 19:09:38 -07:00
|
|
|
SectionContents.data() - InputFile->getData().data();
|
2017-02-07 15:31:14 -08:00
|
|
|
}
|
|
|
|
|
|
2020-07-16 17:35:55 -07:00
|
|
|
if (!opts::HeatmapMode &&
|
2019-04-12 17:33:46 -07:00
|
|
|
!(opts::AggregateOnly && BAT->enabledFor(InputFile)) &&
|
2020-03-11 15:51:32 -07:00
|
|
|
(SectionName.startswith(getOrgSecPrefix()) ||
|
|
|
|
|
SectionName == getBOLTTextSectionName())) {
|
2017-02-07 15:31:14 -08:00
|
|
|
errs() << "BOLT-ERROR: input file was processed by BOLT. "
|
|
|
|
|
"Cannot re-optimize.\n";
|
|
|
|
|
exit(1);
|
2016-09-27 19:09:38 -07:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2016-02-12 19:01:53 -08:00
|
|
|
assert(NextAvailableAddress && NextAvailableOffset &&
|
|
|
|
|
"no PT_LOAD pheader seen");
|
2016-02-08 10:02:48 -08:00
|
|
|
|
2016-09-02 14:15:29 -07:00
|
|
|
outs() << "BOLT-INFO: first alloc address is 0x"
|
2018-10-02 17:16:26 -07:00
|
|
|
<< Twine::utohexstr(BC->FirstAllocAddress) << '\n';
|
2016-02-08 10:02:48 -08:00
|
|
|
|
2016-02-12 19:01:53 -08:00
|
|
|
FirstNonAllocatableOffset = NextAvailableOffset;
|
|
|
|
|
|
2018-09-24 20:58:31 -07:00
|
|
|
NextAvailableAddress = alignTo(NextAvailableAddress, BC->PageAlign);
|
|
|
|
|
NextAvailableOffset = alignTo(NextAvailableOffset, BC->PageAlign);
|
2016-02-12 19:01:53 -08:00
|
|
|
|
|
|
|
|
if (!opts::UseGnuStack) {
|
|
|
|
|
// This is where the black magic happens. Creating PHDR table in a segment
|
|
|
|
|
// other than that containing ELF header is tricky. Some loaders and/or
|
|
|
|
|
// parts of loaders will apply e_phoff from ELF header assuming both are in
|
|
|
|
|
// the same segment, while others will do the proper calculation.
|
|
|
|
|
// We create the new PHDR table in such a way that both of the methods
|
|
|
|
|
// of loading and locating the table work. There's a slight file size
|
|
|
|
|
// overhead because of that.
|
2016-03-03 10:13:11 -08:00
|
|
|
//
|
|
|
|
|
// NB: bfd's strip command cannot do the above and will corrupt the
|
|
|
|
|
// binary during the process of stripping non-allocatable sections.
|
2018-10-02 17:16:26 -07:00
|
|
|
if (NextAvailableOffset <= NextAvailableAddress - BC->FirstAllocAddress) {
|
|
|
|
|
NextAvailableOffset = NextAvailableAddress - BC->FirstAllocAddress;
|
2016-02-12 19:01:53 -08:00
|
|
|
} else {
|
2018-10-02 17:16:26 -07:00
|
|
|
NextAvailableAddress = NextAvailableOffset + BC->FirstAllocAddress;
|
2016-02-12 19:01:53 -08:00
|
|
|
}
|
2018-10-02 17:16:26 -07:00
|
|
|
assert(NextAvailableOffset == NextAvailableAddress - BC->FirstAllocAddress
|
|
|
|
|
&& "PHDR table address calculation error");
|
2016-02-08 10:02:48 -08:00
|
|
|
|
2016-09-02 14:15:29 -07:00
|
|
|
outs() << "BOLT-INFO: creating new program header table at address 0x"
|
2016-02-12 19:01:53 -08:00
|
|
|
<< Twine::utohexstr(NextAvailableAddress) << ", offset 0x"
|
|
|
|
|
<< Twine::utohexstr(NextAvailableOffset) << '\n';
|
2016-02-08 10:02:48 -08:00
|
|
|
|
2016-02-12 19:01:53 -08:00
|
|
|
PHDRTableAddress = NextAvailableAddress;
|
|
|
|
|
PHDRTableOffset = NextAvailableOffset;
|
2016-02-08 10:02:48 -08:00
|
|
|
|
2016-02-12 19:01:53 -08:00
|
|
|
// Reserve space for 3 extra pheaders.
|
|
|
|
|
unsigned Phnum = Obj->getHeader()->e_phnum;
|
|
|
|
|
Phnum += 3;
|
2016-02-08 10:02:48 -08:00
|
|
|
|
2016-02-12 19:01:53 -08:00
|
|
|
NextAvailableAddress += Phnum * sizeof(ELFFile<ELF64LE>::Elf_Phdr);
|
|
|
|
|
NextAvailableOffset += Phnum * sizeof(ELFFile<ELF64LE>::Elf_Phdr);
|
|
|
|
|
}
|
2016-02-08 10:02:48 -08:00
|
|
|
|
2016-02-12 19:01:53 -08:00
|
|
|
// Align at cache line.
|
2018-02-06 15:00:23 -08:00
|
|
|
NextAvailableAddress = alignTo(NextAvailableAddress, 64);
|
|
|
|
|
NextAvailableOffset = alignTo(NextAvailableOffset, 64);
|
2016-02-08 10:02:48 -08:00
|
|
|
|
|
|
|
|
NewTextSegmentAddress = NextAvailableAddress;
|
|
|
|
|
NewTextSegmentOffset = NextAvailableOffset;
|
2017-08-31 11:45:37 -07:00
|
|
|
BC->LayoutStartAddress = NextAvailableAddress;
|
2020-06-26 16:52:07 -07:00
|
|
|
|
|
|
|
|
// Tools such as objcopy can strip section contents but leave header
|
|
|
|
|
// entries. Check that at least .text is mapped in the file.
|
|
|
|
|
if (!getFileOffsetForAddress(BC->OldTextSectionAddress)) {
|
|
|
|
|
errs() << "BOLT-ERROR: input binary is not a valid ELF executable as its "
|
|
|
|
|
"text section is not mapped to a valid segment\n";
|
|
|
|
|
exit(1);
|
|
|
|
|
}
|
2016-02-08 10:02:48 -08:00
|
|
|
}
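The PHDR table placement in discoverStorage() can be hard to follow from the comments alone. The following is a minimal standalone sketch of just that arithmetic, under the assumption that the goal is an (address, offset) pair satisfying Offset == Address - FirstAllocAddress after page alignment. The names alignUp, Placement, and placePHDRTable are hypothetical and do not exist in BOLT.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of BOLT): round Value up to a multiple of
// Align. Align is assumed to be non-zero.
static uint64_t alignUp(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Illustrative result type for the sketch.
struct Placement {
  uint64_t Address;
  uint64_t Offset;
};

// Pick an address and file offset for a new PHDR table so that
// Offset == Address - FirstAllocAddress holds. Whichever of the two
// values is "behind" gets bumped forward to match the other.
static Placement placePHDRTable(uint64_t NextAddress, uint64_t NextOffset,
                                uint64_t FirstAllocAddress,
                                uint64_t PageAlign) {
  NextAddress = alignUp(NextAddress, PageAlign);
  NextOffset = alignUp(NextOffset, PageAlign);
  if (NextOffset <= NextAddress - FirstAllocAddress)
    NextOffset = NextAddress - FirstAllocAddress; // pad the file
  else
    NextAddress = NextOffset + FirstAllocAddress; // skip address space
  return {NextAddress, NextOffset};
}
```

Bumping the offset forward is what causes the "slight file size overhead" mentioned in the comment: the file is padded so that both ways of locating the table agree.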
|
|
|
|
|
|
2019-05-15 17:19:18 -07:00
|
|
|
void RewriteInstance::parseSDTNotes() {
|
|
|
|
|
if (!SDTSection)
|
|
|
|
|
return;
|
|
|
|
|
|
|
|
|
|
StringRef Buf = SDTSection->getContents();
|
|
|
|
|
auto DE = DataExtractor(Buf, BC->AsmInfo->isLittleEndian(),
|
|
|
|
|
BC->AsmInfo->getCodePointerSize());
|
|
|
|
|
uint32_t Offset = 0;
|
|
|
|
|
|
|
|
|
|
while (DE.isValidOffset(Offset)) {
|
|
|
|
|
auto NameSz = DE.getU32(&Offset);
|
|
|
|
|
DE.getU32(&Offset); // skip over DescSz
|
|
|
|
|
auto Type = DE.getU32(&Offset);
|
|
|
|
|
Offset = alignTo(Offset, 4);
|
|
|
|
|
|
|
|
|
|
if (Type != 3)
|
|
|
|
|
errs() << "BOLT-WARNING: SDT note type \"" << Type
|
|
|
|
|
<< "\" is not expected\n";
|
|
|
|
|
|
|
|
|
|
if (NameSz == 0)
|
|
|
|
|
errs() << "BOLT-WARNING: SDT note has empty name\n";
|
|
|
|
|
|
|
|
|
|
StringRef Name = DE.getCStr(&Offset);
|
|
|
|
|
|
|
|
|
|
if (!Name.equals("stapsdt"))
|
|
|
|
|
errs() << "BOLT-WARNING: SDT note name \"" << Name
|
|
|
|
|
<< "\" is not expected\n";
|
|
|
|
|
|
|
|
|
|
// Parse description
|
|
|
|
|
SDTMarkerInfo Marker;
|
2019-05-17 07:58:27 -07:00
|
|
|
Marker.PCOffset = Offset;
|
2019-05-15 17:19:18 -07:00
|
|
|
Marker.PC = DE.getU64(&Offset);
|
|
|
|
|
Marker.Base = DE.getU64(&Offset);
|
|
|
|
|
Marker.Semaphore = DE.getU64(&Offset);
|
|
|
|
|
Marker.Provider = DE.getCStr(&Offset);
|
|
|
|
|
Marker.Name = DE.getCStr(&Offset);
|
|
|
|
|
Marker.Args = DE.getCStr(&Offset);
|
|
|
|
|
Offset = alignTo(Offset, 4);
|
2019-05-16 12:46:32 -07:00
|
|
|
BC->SDTMarkers[Marker.PC] = Marker;
|
2019-05-15 17:19:18 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (opts::PrintSDTMarkers)
|
|
|
|
|
printSDTMarkers();
|
|
|
|
|
}
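As an illustration of the descriptor layout that parseSDTNotes() reads (three 64-bit values followed by three NUL-terminated strings), here is a small standalone sketch that does not depend on LLVM's DataExtractor. It assumes little-endian data. ProbeDesc, readU64LE, readCStr, and parseProbeDesc are hypothetical names, and the bounds checks are an addition of the sketch rather than something the function above performs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Illustrative record; the fields mirror the marker fields read above.
struct ProbeDesc {
  uint64_t PC = 0;
  uint64_t Base = 0;
  uint64_t Semaphore = 0;
  std::string Provider, Name, Args;
};

// Read a little-endian u64 at Offset and advance it; false if out of bounds.
static bool readU64LE(const std::string &Buf, size_t &Offset, uint64_t &Out) {
  if (Buf.size() < 8 || Offset > Buf.size() - 8)
    return false;
  Out = 0;
  for (int I = 0; I < 8; ++I)
    Out |= static_cast<uint64_t>(static_cast<unsigned char>(Buf[Offset + I]))
           << (8 * I);
  Offset += 8;
  return true;
}

// Read a NUL-terminated string at Offset and advance past the terminator.
static bool readCStr(const std::string &Buf, size_t &Offset, std::string &Out) {
  size_t End = Buf.find('\0', Offset);
  if (End == std::string::npos)
    return false;
  Out = Buf.substr(Offset, End - Offset);
  Offset = End + 1;
  return true;
}

// Parse one descriptor: PC, Base, Semaphore, then Provider, Name, Args.
static bool parseProbeDesc(const std::string &Desc, ProbeDesc &Out) {
  size_t Offset = 0;
  return readU64LE(Desc, Offset, Out.PC) &&
         readU64LE(Desc, Offset, Out.Base) &&
         readU64LE(Desc, Offset, Out.Semaphore) &&
         readCStr(Desc, Offset, Out.Provider) &&
         readCStr(Desc, Offset, Out.Name) && readCStr(Desc, Offset, Out.Args);
}
```

On failure the output record may be partially filled, so callers of such a helper should only use it when the function returns true.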
|
|
|
|
|
|
|
|
|
|
void RewriteInstance::printSDTMarkers() {
|
|
|
|
|
outs() << "BOLT-INFO: Number of SDT markers is " << BC->SDTMarkers.size()
|
|
|
|
|
<< "\n";
|
2019-05-16 12:46:32 -07:00
|
|
|
  for (const auto &It : BC->SDTMarkers) {
|
|
|
|
|
auto &Marker = It.second;
|
2019-05-15 17:19:18 -07:00
|
|
|
outs() << "BOLT-INFO: PC: " << utohexstr(Marker.PC)
|
|
|
|
|
<< ", Base: " << utohexstr(Marker.Base)
|
|
|
|
|
<< ", Semaphore: " << utohexstr(Marker.Semaphore)
|
|
|
|
|
<< ", Provider: " << Marker.Provider << ", Name: " << Marker.Name
|
|
|
|
|
<< ", Args: " << Marker.Args << "\n";
|
|
|
|
|
}
|
|
|
|
|
}

void RewriteInstance::parseBuildID() {
  if (!BuildIDSection)
    return;

  StringRef Buf = BuildIDSection->getContents();

  // Reading notes section (see Portable Formats Specification, Version 1.1,
  // pg 2-5, section "Note Section").
  DataExtractor DE = DataExtractor(Buf, true, 8);
  uint32_t Offset = 0;
  if (!DE.isValidOffset(Offset))
    return;
  uint32_t NameSz = DE.getU32(&Offset);
  if (!DE.isValidOffset(Offset))
    return;
  uint32_t DescSz = DE.getU32(&Offset);
  if (!DE.isValidOffset(Offset))
    return;
  uint32_t Type = DE.getU32(&Offset);

  DEBUG(dbgs() << "NameSz = " << NameSz << "; DescSz = " << DescSz
               << "; Type = " << Type << "\n");

  // Type 3 is a GNU build-id note section
  if (Type != 3)
    return;

  StringRef Name = Buf.slice(Offset, Offset + NameSz);
  Offset = alignTo(Offset + NameSz, 4);
  if (Name.substr(0, 3) != "GNU")
    return;

  BuildID = Buf.slice(Offset, Offset + DescSz);
}

Optional<std::string> RewriteInstance::getPrintableBuildID() const {
  if (BuildID.empty())
    return NoneType();

  std::string Str;
  raw_string_ostream OS(Str);
  auto CharIter = BuildID.bytes_begin();
  while (CharIter != BuildID.bytes_end()) {
    if (*CharIter < 0x10)
      OS << "0";
    OS << Twine::utohexstr(*CharIter);
    ++CharIter;
  }
  return OS.str();
}
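
The build-id handling above (read the three 32-bit note header words, require type 3 and the "GNU" name, skip the name padded to 4 bytes, then print the descriptor as two hex digits per byte) can be sketched outside of LLVM as follows. This is a minimal illustration under stated assumptions, not BOLT's implementation: `readU32` and `extractBuildIDHex` are hypothetical names, the buffer is assumed to start at the note header, the note is assumed to use the host byte order, and lowercase hex is an arbitrary choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: read one 32-bit word in host byte order and advance
// Offset. Returns false if fewer than 4 bytes remain.
static bool readU32(const std::vector<uint8_t> &Buf, size_t &Offset,
                    uint32_t &Out) {
  if (Offset > Buf.size() || Buf.size() - Offset < 4)
    return false;
  std::memcpy(&Out, Buf.data() + Offset, 4);
  Offset += 4;
  return true;
}

// Hypothetical helper: return the build-id as lowercase hex, or an empty
// string if Buf does not start with a well-formed GNU build-id note.
std::string extractBuildIDHex(const std::vector<uint8_t> &Buf) {
  size_t Offset = 0;
  uint32_t NameSz = 0, DescSz = 0, Type = 0;
  if (!readU32(Buf, Offset, NameSz) || !readU32(Buf, Offset, DescSz) ||
      !readU32(Buf, Offset, Type))
    return "";

  // Type 3 marks a GNU build-id note.
  if (Type != 3)
    return "";

  // The owner name must fit in the buffer and begin with "GNU".
  if (NameSz > Buf.size() - Offset || NameSz < 3 ||
      std::memcmp(Buf.data() + Offset, "GNU", 3) != 0)
    return "";

  // The descriptor starts after the name, padded to a 4-byte boundary.
  Offset = (Offset + NameSz + 3) & ~static_cast<size_t>(3);
  if (Offset > Buf.size() || DescSz > Buf.size() - Offset)
    return "";

  // Emit two hex digits per byte so that values below 0x10 keep their
  // leading zero.
  static const char Digits[] = "0123456789abcdef";
  std::string Hex;
  for (size_t I = 0; I < DescSz; ++I) {
    uint8_t Byte = Buf[Offset + I];
    Hex += Digits[Byte >> 4];
    Hex += Digits[Byte & 0xF];
  }
  return Hex;
}
```

Unlike the `StringRef::slice` calls above, which clamp out-of-range offsets, the sketch rejects a truncated note explicitly; that is a design choice of the example rather than a statement about the original code.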

void RewriteInstance::patchBuildID() {
  auto &OS = Out->os();

  if (BuildID.empty())
    return;

  size_t IDOffset = BuildIDSection->getContents().rfind(BuildID);
  assert(IDOffset != StringRef::npos && "failed to patch build-id");

  auto FileOffset = getFileOffsetForAddress(BuildIDSection->getAddress());
  if (!FileOffset) {
    errs() << "BOLT-WARNING: Non-allocatable build-id will not be updated.\n";
    return;
  }

  char LastIDByte = BuildID[BuildID.size() - 1];
  LastIDByte ^= 1;
  OS.pwrite(&LastIDByte, 1, FileOffset + IDOffset + BuildID.size() - 1);

  outs() << "BOLT-INFO: patched build-id (flipped last bit)\n";
}

void RewriteInstance::run() {
  if (!BC) {
    errs() << "BOLT-ERROR: failed to create a binary context\n";
    return;
  }

[BOLT][non-reloc] Change function splitting in non-relocation mode
Summary:
This diff applies to non-relocation mode mostly. In this mode, we are
limited by original function boundaries, i.e. if a function becomes
larger after optimizations (e.g. because of the newly introduced
branches) then we might not be able to write the optimized version,
unless we split the function. At the same time, we do not benefit from
function splitting as we do in the relocation mode since we are not
moving functions/fragments, and the hot code does not become more
compact.
For the reasons described above, we used to execute multiple re-write
attempts to optimize the binary and we would only split functions that
were too large to fit into their original space.
After the first attempt, we would know functions that did not fit
into their original space. Then we would re-run all our passes again
feeding back the function information and forcefully splitting
such functions. Some functions still wouldn't fit even after the
splitting (mostly because of the branch relaxation for conditional tail
calls that does not happen in non-relocation mode). Yet we have emitted
debug info as if they were successfully overwritten. That's why we had
one more stage to write the functions again, marking failed-to-emit
functions non-simple. Sadly, there was a bug in the way 2nd and 3rd
attempts interacted, and we were not splitting the functions correctly
and as a result we were emitting less optimized code.
One of the reasons we had the multi-pass rewrite scheme in place, was
that we did not have an ability to precisely estimate the code size
before the actual code emission. Recently, BinaryContext obtained such
functionality, and now we can use it instead of relying on the
multi-pass rewrite. This eliminates redundant work of re-running
the same function passes multiple times.
Because function splitting runs before a number of optimization passes
that run on post-CFG state (those rely on the splitting pass), we
cannot estimate the non-split code size with 100% accuracy. However,
it is good enough for over 99% of the cases to extract most of the
performance gains for the binary.
As a result of eliminating the multi-pass rewrite, the processing time
in non-relocation mode with `-split-functions=2` is greatly reduced.
With debug info update, it is less than half of what it used to be.
New semantics for `-split-functions=<n>`:
-split-functions - split functions into hot and cold regions
=0 - do not split any function
=1 - in non-relocation mode only split functions too large to fit
into original code space
=2 - same as 1 (backwards compatibility)
=3 - split all functions
(cherry picked from FBD17362607)
2019-09-11 15:42:22 -07:00
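
The `-split-functions=<n>` semantics listed in the commit message above can be sketched, for non-relocation mode only, as a small predicate. The function name, its parameters and the size comparison are assumptions made for illustration; they are not taken from BOLT's sources, and the sketch says nothing about relocation mode.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of BOLT: decides whether a function is split
// in non-relocation mode for a given -split-functions value.
// EstimatedSize stands in for the pre-emission code size estimate and
// OriginalSize for the space the function occupies in the input binary.
bool shouldSplitNonReloc(unsigned Mode, uint64_t EstimatedSize,
                         uint64_t OriginalSize) {
  assert(Mode <= 3 && "unknown -split-functions value");
  switch (Mode) {
  case 0:
    // =0: do not split any function.
    return false;
  case 1:
  case 2:
    // =1 and =2 (kept for backwards compatibility): split only functions
    // that are too large to fit into their original code space.
    return EstimatedSize > OriginalSize;
  default:
    // =3: split all functions.
    return true;
  }
}
```

For example, `shouldSplitNonReloc(1, 200, 100)` selects an oversized function for splitting, while `shouldSplitNonReloc(1, 80, 100)` leaves a function that still fits untouched.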
  outs() << "BOLT-INFO: Target architecture: "
         << Triple::getArchTypeName(
                (llvm::Triple::ArchType)InputFile->getArch())
         << "\n";

  discoverStorage();
  readSpecialSections();
  adjustCommandLineOptions();
  discoverFileObjects();

  // Skip disassembling if we have a translation table and we are running an
  // aggregation job.
  if (opts::AggregateOnly && BAT->enabledFor(InputFile)) {
    preprocessProfileData();
    processProfileData();
    return;
  }

  preprocessProfileData();

  selectFunctionsToProcess();

  readDebugInfo();

  disassembleFunctions();

  processProfileDataPreCFG();

  buildFunctionsCFG();

  processProfileData();

  postProcessFunctions();

  if (opts::DiffOnly)
    return;

  runOptimizationPasses();

  emitAndLink();

  updateMetadata();

  // Rewrite allocatable contents and copy non-allocatable parts with mods.
  rewriteFile();
}

void RewriteInstance::discoverFileObjects() {
[BOLT rebase] Rebase fixes on top of LLVM Feb2018
Summary:
This commit includes all code necessary to make BOLT working again
after the rebase. This includes a redesign of the EHFrame work,
cherry-pick of the 3dnow disassembly work, compilation error fixes,
and port of the debug_info work. The macroop fusion feature is not
ported yet.
The rebased version has minor changes to the "executed instructions"
dynostats counter because REP prefixes are considered a part of the
instruction it applies to. Also, some X86 instructions had the "mayLoad"
tablegen property removed, which BOLT uses to identify and account
for loads, thus reducing the total number of loads reported by
dynostats. This was observed in X86::MOVDQUmr. TRAP instructions are
not terminators anymore, changing our CFG. This commit adds compensation
to preserve this old behavior and minimize tests changes. debug_info
sections are now slightly larger. The discriminator field in the line
table is slightly different due to a change upstream. New profiles
generated with the other bolt are incompatible with this version
because of different hash values calculated for functions, so they will
be considered 100% stale. This commit changes the corresponding test
to XFAIL so it can be updated. The hash function changes because it
relies on raw opcode values, which change according to the opcodes
described in the X86 tablegen files. When processing HHVM, bolt was
observed to be using about 800MB more memory in the rebased version
and being about 5% slower.
(cherry picked from FBD7078072)
2018-02-06 15:00:23 -08:00
  NamedRegionTimer T("discoverFileObjects", "discover file objects",
                     TimerGroupName, TimerGroupDesc, opts::TimeRewrite);
  FileSymRefs.clear();
  BC->getBinaryFunctions().clear();
  BC->clearBinaryData();

  // For local symbols we want to keep track of associated FILE symbol name for
  // disambiguation by combined name.
  StringRef FileSymbolName;
  bool SeenFileName = false;
  struct SymbolRefHash {
    size_t operator()(SymbolRef const &S) const {
      return std::hash<decltype(DataRefImpl::p)>{}(S.getRawDataRefImpl().p);
    }
  };
  std::unordered_map<SymbolRef, StringRef, SymbolRefHash> SymbolToFileName;
  for (const auto &Symbol : InputFile->symbols()) {
    auto NameOrError = Symbol.getName();
    if (NameOrError && NameOrError->startswith("__asan_init")) {
      errs() << "BOLT-ERROR: input file was compiled or linked with sanitizer "
                "support. Cannot optimize.\n";
      exit(1);
    }
    if (NameOrError && NameOrError->startswith("__llvm_coverage_mapping")) {
      errs() << "BOLT-ERROR: input file was compiled or linked with coverage "
                "support. Cannot optimize.\n";
      exit(1);
    }

    if (Symbol.getFlags() & SymbolRef::SF_Undefined)
      continue;

    if (cantFail(Symbol.getType()) == SymbolRef::ST_File) {
      auto Name =
          cantFail(std::move(NameOrError), "cannot get symbol name for file");
      // Ignore Clang LTO artificial FILE symbol as it is not always generated,
      // and this uncertainty is causing havoc in function name matching.
2018-02-06 15:00:23 -08:00
|
|
|
if (Name == "ld-temp.o")
|
2017-09-25 18:05:37 -07:00
|
|
|
continue;
|
2018-02-06 15:00:23 -08:00
|
|
|
FileSymbolName = Name;
|
2016-07-11 18:51:13 -07:00
|
|
|
SeenFileName = true;
|
2015-11-23 17:54:18 -08:00
|
|
|
continue;
|
|
|
|
|
}
|
2016-09-29 11:19:06 -07:00
|
|
|
if (!FileSymbolName.empty() &&
|
|
|
|
|
!(Symbol.getFlags() & SymbolRef::SF_Global)) {
|
|
|
|
|
SymbolToFileName[Symbol] = FileSymbolName;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2018-09-05 14:36:52 -07:00
|
|
|
// Sort symbols in the file by value. Ignore symbols from non-allocatable
|
|
|
|
|
// sections.
|
|
|
|
|
auto isSymbolInMemory = [this](const SymbolRef &Sym) {
|
|
|
|
|
if (cantFail(Sym.getType()) == SymbolRef::ST_File)
|
|
|
|
|
return false;
|
|
|
|
|
if (Sym.getFlags() & SymbolRef::SF_Absolute)
|
|
|
|
|
return true;
|
|
|
|
|
if (Sym.getFlags() & SymbolRef::SF_Undefined)
|
|
|
|
|
return false;
|
|
|
|
|
BinarySection Section(*BC, *cantFail(Sym.getSection()));
|
|
|
|
|
return Section.isAllocatable();
|
|
|
|
|
};
|
|
|
|
|
std::vector<SymbolRef> SortedFileSymbols;
|
|
|
|
|
std::copy_if(InputFile->symbol_begin(), InputFile->symbol_end(),
|
|
|
|
|
std::back_inserter(SortedFileSymbols),
|
|
|
|
|
isSymbolInMemory);
|
|
|
|
|
|
2016-09-29 11:19:06 -07:00
|
|
|
std::stable_sort(SortedFileSymbols.begin(), SortedFileSymbols.end(),
|
|
|
|
|
[](const SymbolRef &A, const SymbolRef &B) {
|
[BOLT] Basic support for split functions
Summary:
This adds very basic and limited support for split functions.
In non-relocation mode, split functions are ignored, while their debug
info is properly updated. No support in the relocation mode yet.
Split functions consist of a main body and one or more fragments.
For fragments, the main part is called their parent. Any fragment
could only be entered via its parent or another fragment.
The short-term goal is to correctly update debug information for split
functions, while the long-term goal is to have a complete support
including full optimization. Note that if we don't detect split
bodies, we would have to add multiple entry points via tail calls,
which we would rather avoid.
Parent functions and fragments are represented by a `BinaryFunction`
and are marked accordingly. For now they are marked as non-simple, and
thus only supported in non-relocation mode. Once we start building a
CFG, it should be a common graph (i.e. the one that includes all
fragments) in the parent function.
The function discovery is unchanged, except for the detection of
`\.cold\.` pattern in the function name, which automatically marks the
function as a fragment of another function.
Because of the local function name ambiguity, we cannot rely on the
function name to establish child fragment and parent relationship.
Instead we rely on disassembly processing.
`BinaryContext::getBinaryFunctionContainingAddress()` now returns a
parent function if an address from its fragment is passed.
There's no jump table support at the moment. Jump tables can have
source and destinations in both fragment and parent.
Parent functions that enter their fragments via C++ exception handling
mechanism are not yet supported.
(cherry picked from FBD14970569)
2019-04-16 10:24:34 -07:00
|
|
|
// FUNC symbols have the highest precedence, while SECTIONs
|
|
|
|
|
// have the lowest.
|
2018-02-06 15:00:23 -08:00
|
|
|
auto AddressA = cantFail(A.getAddress());
|
|
|
|
|
auto AddressB = cantFail(B.getAddress());
|
2019-04-16 10:24:34 -07:00
|
|
|
if (AddressA != AddressB)
|
|
|
|
|
return AddressA < AddressB;
|
|
|
|
|
|
|
|
|
|
auto AType = cantFail(A.getType());
|
|
|
|
|
auto BType = cantFail(B.getType());
|
|
|
|
|
if (AType == SymbolRef::ST_Function &&
|
|
|
|
|
BType != SymbolRef::ST_Function)
|
|
|
|
|
return true;
|
|
|
|
|
if (BType == SymbolRef::ST_Debug &&
|
|
|
|
|
AType != SymbolRef::ST_Debug)
|
|
|
|
|
return true;
|
|
|
|
|
|
|
|
|
|
return false;
|
2016-09-29 11:19:06 -07:00
|
|
|
});
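The ordering used by the `std::stable_sort` call above can be illustrated in isolation. The following is a minimal sketch, not BOLT code: `SketchSymbol`, `SymbolKind`, `symbolLess`, and `sortSymbols` are hypothetical names, and `SymbolKind::Section` stands in for the `SymbolRef::ST_Debug` check. It sorts by address first; at equal addresses, functions come first and section symbols come last.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for SymbolRef and its type; not part of BOLT or LLVM.
enum class SymbolKind { Function, Other, Section };

struct SketchSymbol {
  std::string Name;
  uint64_t Address;
  SymbolKind Kind;
};

// Order by address; at the same address, functions sort first and
// section symbols sort last. Everything else keeps its relative order.
bool symbolLess(const SketchSymbol &A, const SketchSymbol &B) {
  if (A.Address != B.Address)
    return A.Address < B.Address;
  if (A.Kind == SymbolKind::Function && B.Kind != SymbolKind::Function)
    return true;
  if (B.Kind == SymbolKind::Section && A.Kind != SymbolKind::Section)
    return true;
  return false;
}

// Returns a sorted copy; stable_sort preserves the input order of ties.
std::vector<SketchSymbol> sortSymbols(std::vector<SketchSymbol> Symbols) {
  std::stable_sort(Symbols.begin(), Symbols.end(), symbolLess);
  return Symbols;
}
```

Because the sort is stable, two symbols with the same address and the same kind stay in symbol-table order, which is presumably why the original uses `std::stable_sort` rather than `std::sort`.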
|
|
|
|
|
|
2017-11-22 16:17:36 -08:00
|
|
|
// For aarch64, the ABI defines mapping symbols so we identify data in the
|
|
|
|
|
// code section (see IHI0056B). $d identifies data contents.
|
2019-10-08 11:03:33 -07:00
|
|
|
auto LastSymbol = SortedFileSymbols.end() - 1;
|
2018-03-20 14:34:58 -07:00
|
|
|
if (BC->isAArch64()) {
|
2019-10-08 11:03:33 -07:00
|
|
|
LastSymbol = std::stable_partition(
|
2017-11-22 16:17:36 -08:00
|
|
|
SortedFileSymbols.begin(), SortedFileSymbols.end(),
|
|
|
|
|
[](const SymbolRef &Symbol) {
|
2018-02-06 15:00:23 -08:00
|
|
|
StringRef Name = cantFail(Symbol.getName());
|
|
|
|
|
return !(cantFail(Symbol.getType()) == SymbolRef::ST_Unknown &&
|
|
|
|
|
(Name == "$d" || Name == "$x"));
|
2017-11-22 16:17:36 -08:00
|
|
|
});
|
2019-10-08 11:03:33 -07:00
|
|
|
--LastSymbol;
|
2017-11-22 16:17:36 -08:00
|
|
|
}
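The AArch64 step above moves the `$d` and `$x` mapping symbols to the end of the sorted list without disturbing the order of anything else. A minimal sketch of that idea follows, under the assumption that a plain struct with a `HasType` flag can stand in for the `ST_Unknown` check; `MarkerSketchSymbol` and `partitionMappingSymbols` are hypothetical names, not BOLT API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for SymbolRef: a name plus a flag saying whether the
// symbol has a known type (false models SymbolRef::ST_Unknown).
struct MarkerSketchSymbol {
  std::string Name;
  bool HasType;
};

// Moves untyped "$d"/"$x" markers to the back while keeping the relative
// order of both groups. Returns the index of the first marker.
std::size_t partitionMappingSymbols(std::vector<MarkerSketchSymbol> &Symbols) {
  auto FirstMarker = std::stable_partition(
      Symbols.begin(), Symbols.end(), [](const MarkerSketchSymbol &S) {
        const bool IsMarker = !S.HasType && (S.Name == "$d" || S.Name == "$x");
        return !IsMarker;
      });
  return static_cast<std::size_t>(FirstMarker - Symbols.begin());
}
```

The returned index plays the role of `MarkersBegin` in the loop that follows: regular symbols occupy the range before it, markers the range from it onward.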
|
|
|
|
|
|
2017-11-14 20:05:11 -08:00
|
|
|
auto getNextAddress = [&](std::vector<SymbolRef>::const_iterator Itr) {
|
2019-10-08 11:03:33 -07:00
|
|
|
const auto SymbolSection = cantFail(Itr->getSection());
|
|
|
|
|
const auto SymbolAddress = cantFail(Itr->getAddress());
|
|
|
|
|
const auto SymbolEndAddress = SymbolAddress + ELFSymbolRef(*Itr).getSize();
|
2017-11-14 20:05:11 -08:00
|
|
|
|
|
|
|
|
// absolute sym
|
2019-10-08 11:03:33 -07:00
|
|
|
if (SymbolSection == InputFile->section_end())
|
2017-11-14 20:05:11 -08:00
|
|
|
return SymbolEndAddress;
|
|
|
|
|
|
2019-10-08 11:03:33 -07:00
|
|
|
while (Itr != LastSymbol &&
|
|
|
|
|
cantFail(std::next(Itr)->getSection()) == SymbolSection &&
|
|
|
|
|
cantFail(std::next(Itr)->getAddress()) == SymbolAddress) {
|
2017-11-14 20:05:11 -08:00
|
|
|
++Itr;
|
|
|
|
|
}
|
|
|
|
|
|
2019-10-08 11:03:33 -07:00
|
|
|
if (Itr != LastSymbol &&
|
|
|
|
|
cantFail(std::next(Itr)->getSection()) == SymbolSection)
|
2017-11-14 20:05:11 -08:00
|
|
|
return cantFail(std::next(Itr)->getAddress());
|
|
|
|
|
|
2019-10-08 11:03:33 -07:00
|
|
|
const auto SymbolSectionEndAddress =
|
|
|
|
|
SymbolSection->getAddress() + SymbolSection->getSize();
|
|
|
|
|
if ((ELFSectionRef(*SymbolSection).getFlags() & ELF::SHF_TLS) ||
|
|
|
|
|
SymbolEndAddress > SymbolSectionEndAddress)
|
2017-11-14 20:05:11 -08:00
|
|
|
return SymbolEndAddress;
|
|
|
|
|
|
2019-10-08 11:03:33 -07:00
|
|
|
return SymbolSectionEndAddress;
|
2017-11-14 20:05:11 -08:00
|
|
|
};
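The `getNextAddress` lambda above estimates where a symbol ends when the symbol table gives no size. A simplified sketch of the core idea follows. It assumes symbols are already sorted by address, models sections as indices into a vector, and deliberately omits the absolute-symbol and TLS special cases of the original; `SizedSymbol`, `SketchSection`, and `nextAddress` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified models; not BOLT or LLVM types.
struct SizedSymbol {
  uint64_t Address;
  uint64_t Size;
  std::size_t Section; // index into the section list
};

struct SketchSection {
  uint64_t Address;
  uint64_t Size;
};

// Symbols must be sorted by address. Returns the address at which the symbol
// at Index is assumed to end: the next distinct symbol address in the same
// section, or else the end of the section (or of the symbol, if it overruns).
uint64_t nextAddress(const std::vector<SizedSymbol> &Symbols,
                     const std::vector<SketchSection> &Sections,
                     std::size_t Index) {
  assert(Index < Symbols.size() && "symbol index out of range");
  const SizedSymbol &Sym = Symbols[Index];
  const uint64_t SymbolEnd = Sym.Address + Sym.Size;

  // Skip aliases: later symbols in the same section at the same address.
  std::size_t I = Index;
  while (I + 1 < Symbols.size() && Symbols[I + 1].Section == Sym.Section &&
         Symbols[I + 1].Address == Sym.Address)
    ++I;

  if (I + 1 < Symbols.size() && Symbols[I + 1].Section == Sym.Section)
    return Symbols[I + 1].Address;

  const SketchSection &Sec = Sections[Sym.Section];
  const uint64_t SectionEnd = Sec.Address + Sec.Size;
  return SymbolEnd > SectionEnd ? SymbolEnd : SectionEnd;
}
```

Subtracting the symbol's own address from the result gives a tentative size, which is how the caller uses it when `ELFSymbolRef::getSize()` reports zero.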
|
|
|
|
|
|
2016-09-27 19:09:38 -07:00
|
|
|
BinaryFunction *PreviousFunction = nullptr;
|
2017-11-14 20:05:11 -08:00
|
|
|
unsigned AnonymousId = 0;
|
|
|
|
|
|
2019-10-08 11:03:33 -07:00
|
|
|
const auto MarkersBegin = std::next(LastSymbol);
|
2017-11-22 16:17:36 -08:00
|
|
|
for (auto ISym = SortedFileSymbols.begin(); ISym != MarkersBegin; ++ISym) {
|
|
|
|
|
const auto &Symbol = *ISym;
|
2016-09-29 11:19:06 -07:00
|
|
|
// Keep undefined symbols for pretty printing?
|
|
|
|
|
if (Symbol.getFlags() & SymbolRef::SF_Undefined)
|
|
|
|
|
continue;
|
|
|
|
|
|
2019-10-08 11:03:33 -07:00
|
|
|
const auto SymbolType = cantFail(Symbol.getType());
|
|
|
|
|
|
|
|
|
|
if (SymbolType == SymbolRef::ST_File)
|
2016-09-29 11:19:06 -07:00
|
|
|
continue;
|
|
|
|
|
|
2018-02-06 15:00:23 -08:00
|
|
|
StringRef SymName = cantFail(Symbol.getName(), "cannot get symbol name");
|
|
|
|
|
uint64_t Address =
|
|
|
|
|
cantFail(Symbol.getAddress(), "cannot get symbol address");
|
2015-11-23 17:54:18 -08:00
|
|
|
if (Address == 0) {
|
2019-10-08 11:03:33 -07:00
|
|
|
if (opts::Verbosity >= 1 && SymbolType == SymbolRef::ST_Function)
|
2016-02-05 14:42:04 -08:00
|
|
|
errs() << "BOLT-WARNING: function with 0 address seen\n";
|
2015-11-23 17:54:18 -08:00
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
|
2017-11-22 16:17:36 -08:00
|
|
|
FileSymRefs[Address] = Symbol;
|
2015-11-23 17:54:18 -08:00
|
|
|
|
2016-07-11 18:51:13 -07:00
|
|
|
/// It is possible we are seeing a globalized local. LLVM might treat it as
|
|
|
|
|
/// a local if it has a "private global" prefix, e.g. ".L". Thus we have to
|
|
|
|
|
/// change the prefix to enforce global scope of the symbol.
|
2018-02-06 15:00:23 -08:00
|
|
|
std::string Name = SymName.startswith(BC->AsmInfo->getPrivateGlobalPrefix())
|
|
|
|
|
? "PG" + std::string(SymName)
|
|
|
|
|
: std::string(SymName);
|
2016-07-11 18:51:13 -07:00
|
|
|
|
2015-11-23 17:54:18 -08:00
|
|
|
// Disambiguate all local symbols before adding to symbol table.
|
2016-07-11 18:51:13 -07:00
|
|
|
// Since we don't know if we will see a global with the same name,
|
2015-11-23 17:54:18 -08:00
|
|
|
// always modify the local name.
|
2016-07-11 18:51:13 -07:00
|
|
|
//
|
|
|
|
|
// NOTE: the naming convention for local symbols should match
|
|
|
|
|
// the one we use for profile data.
|
2015-11-23 17:54:18 -08:00
|
|
|
std::string UniqueName;
|
2016-07-11 18:51:13 -07:00
|
|
|
std::string AlternativeName;
|
2017-11-14 20:05:11 -08:00
|
|
|
if (Name.empty()) {
|
2019-08-26 15:03:38 -07:00
|
|
|
// Symbols that will be registered by disassemblePLT()
|
|
|
|
|
if ((PLTSection && PLTSection->getAddress() == Address) ||
|
|
|
|
|
(PLTGOTSection && PLTGOTSection->getAddress() == Address)) {
|
2017-11-14 20:05:11 -08:00
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
UniqueName = "ANONYMOUS." + std::to_string(AnonymousId++);
|
|
|
|
|
} else if (Symbol.getFlags() & SymbolRef::SF_Global) {
|
|
|
|
|
assert(!BC->getBinaryDataByName(Name) && "global name not unique");
|
2016-07-11 18:51:13 -07:00
|
|
|
UniqueName = Name;
|
2015-11-23 17:54:18 -08:00
|
|
|
} else {
|
2016-07-11 18:51:13 -07:00
|
|
|
// If we have a local file name, we should create 2 variants for the
|
|
|
|
|
// function name. The reason is that perf profile might have been
|
|
|
|
|
// collected on a binary that did not have the local file name (e.g. as
|
|
|
|
|
// a side effect of stripping debug info from the binary):
|
|
|
|
|
//
|
|
|
|
|
// primary: <function>/<id>
|
|
|
|
|
// alternative: <function>/<file>/<id2>
|
|
|
|
|
//
|
|
|
|
|
// The <id> field is used for disambiguation of local symbols since there
|
|
|
|
|
// could be identical function names coming from identical file names
|
|
|
|
|
// (e.g. from different directories).
|
|
|
|
|
std::string AltPrefix;
|
2016-09-29 11:19:06 -07:00
|
|
|
auto SFI = SymbolToFileName.find(Symbol);
|
2019-10-08 11:03:33 -07:00
|
|
|
if (SymbolType == SymbolRef::ST_Function &&
|
|
|
|
|
SFI != SymbolToFileName.end()) {
|
|
|
|
|
AltPrefix = Name + "/" + std::string(SFI->second);
|
2016-09-29 11:19:06 -07:00
|
|
|
}
|
2016-07-11 18:51:13 -07:00
|
|
|
|
2020-02-17 14:37:46 -08:00
|
|
|
UniqueName = NR.uniquify(Name);
|
2016-07-11 18:51:13 -07:00
|
|
|
if (!AltPrefix.empty())
|
2020-02-17 14:37:46 -08:00
|
|
|
AlternativeName = NR.uniquify(AltPrefix);
|
2015-11-23 17:54:18 -08:00
|
|
|
}
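The `<function>/<id>` and `<function>/<file>/<id2>` naming scheme described in the comments above can be sketched with a per-name counter. This is an illustration only: the type behind `NR` is not shown in this excerpt, so `NameCounter` is a hypothetical class, and starting the ID at 1 is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper, not the class behind `NR`: appends "/<id>" to a name,
// where <id> counts how many times that exact name has been seen so far.
class NameCounter {
  std::map<std::string, unsigned> Counters;

public:
  std::string uniquify(const std::string &Name) {
    const unsigned ID = ++Counters[Name]; // assumption: IDs start at 1
    return Name + "/" + std::to_string(ID);
  }
};
```

With this sketch, two local functions named `foo` would become `foo/1` and `foo/2`, and an alternative prefix such as `foo/bar.c` would get its own independent counter, giving `foo/bar.c/1`.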
|
|
|
|
|
|
2017-11-14 20:05:11 -08:00
|
|
|
uint64_t SymbolSize = ELFSymbolRef(Symbol).getSize();
|
2019-10-08 11:03:33 -07:00
|
|
|
uint64_t TentativeSize = SymbolSize ? SymbolSize
|
|
|
|
|
: getNextAddress(ISym) - Address;
|
2017-11-14 20:05:11 -08:00
|
|
|
uint64_t SymbolAlignment = Symbol.getAlignment();
|
2018-04-20 20:03:31 -07:00
|
|
|
unsigned SymbolFlags = Symbol.getFlags();
|
2017-11-14 20:05:11 -08:00
|
|
|
|
|
|
|
|
auto registerName = [&](uint64_t FinalSize) {
|
|
|
|
|
// Register names even if it's not a function, e.g. for an entry point.
|
2018-04-20 20:03:31 -07:00
|
|
|
BC->registerNameAtAddress(UniqueName, Address, FinalSize,
|
|
|
|
|
SymbolAlignment, SymbolFlags);
|
2017-11-14 20:05:11 -08:00
|
|
|
if (!AlternativeName.empty())
|
|
|
|
|
BC->registerNameAtAddress(AlternativeName, Address, FinalSize,
|
2018-04-20 20:03:31 -07:00
|
|
|
SymbolAlignment, SymbolFlags);
|
2017-11-14 20:05:11 -08:00
|
|
|
};
|
2015-11-23 17:54:18 -08:00
|
|
|
|
2018-02-06 15:00:23 -08:00
|
|
|
section_iterator Section =
|
|
|
|
|
cantFail(Symbol.getSection(), "cannot get symbol section");
|
2016-03-03 10:13:11 -08:00
|
|
|
if (Section == InputFile->section_end()) {
|
2015-11-23 17:54:18 -08:00
|
|
|
// Could be an absolute symbol. Could record for pretty printing.
|
2017-11-14 20:05:11 -08:00
|
|
|
DEBUG(if (opts::Verbosity > 1) {
|
|
|
|
|
dbgs() << "BOLT-INFO: absolute sym " << UniqueName << "\n";
|
|
|
|
|
});
|
|
|
|
|
registerName(TentativeSize);
|
2015-11-23 17:54:18 -08:00
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
|
2016-09-29 11:19:06 -07:00
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: considering symbol " << UniqueName
|
|
|
|
|
<< " for function\n");
|
|
|
|
|
|
|
|
|
|
if (!Section->isText()) {
|
2019-10-08 11:03:33 -07:00
|
|
|
assert(SymbolType != SymbolRef::ST_Function &&
|
2016-09-29 11:19:06 -07:00
|
|
|
"unexpected function inside non-code section");
|
|
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: rejecting as symbol is not in code\n");
|
2017-11-14 20:05:11 -08:00
|
|
|
registerName(TentativeSize);
|
2016-09-29 11:19:06 -07:00
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Assembly functions could be ST_NONE with 0 size. Check that the
|
|
|
|
|
// corresponding section is a code section and they are not inside any
|
|
|
|
|
// other known function to consider them.
|
|
|
|
|
//
|
|
|
|
|
// Sometimes assembly functions are not marked as functions and neither are
|
|
|
|
|
// their local labels. The only way to tell them apart is to look at
|
|
|
|
|
// symbol scope - global vs local.
|
2019-10-08 11:03:33 -07:00
|
|
|
if (PreviousFunction && SymbolType != SymbolRef::ST_Function) {
|
|
|
|
|
if (PreviousFunction->containsAddress(Address)) {
|
|
|
|
|
if (PreviousFunction->isSymbolValidInScope(Symbol, SymbolSize)) {
|
|
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: symbol is a function local symbol\n");
|
|
|
|
|
} else if (Address == PreviousFunction->getAddress() && !SymbolSize) {
|
|
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: ignoring symbol as a marker\n");
|
|
|
|
|
} else if (opts::Verbosity > 1) {
|
|
|
|
|
errs() << "BOLT-WARNING: symbol " << UniqueName
|
|
|
|
|
<< " seen in the middle of function "
|
|
|
|
|
<< *PreviousFunction << ". Could be a new entry.\n";
|
2016-09-29 11:19:06 -07:00
|
|
|
}
|
2019-10-08 11:03:33 -07:00
|
|
|
registerName(SymbolSize);
|
|
|
|
|
continue;
|
|
|
|
|
} else if (PreviousFunction->getSize() == 0 &&
|
|
|
|
|
PreviousFunction->isSymbolValidInScope(Symbol, SymbolSize)) {
|
|
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: symbol is a function local symbol\n");
|
|
|
|
|
registerName(SymbolSize);
|
|
|
|
|
continue;
|
2016-09-29 11:19:06 -07:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (PreviousFunction &&
|
|
|
|
|
PreviousFunction->containsAddress(Address) &&
|
2016-09-27 19:09:38 -07:00
|
|
|
PreviousFunction->getAddress() != Address) {
|
|
|
|
|
if (PreviousFunction->isSymbolValidInScope(Symbol, SymbolSize)) {
|
|
|
|
|
if (opts::Verbosity >= 1) {
|
2020-06-22 16:16:08 -07:00
|
|
|
outs() << "BOLT-DEBUG: skipping possibly another entry for function "
|
2016-09-27 19:09:38 -07:00
|
|
|
<< *PreviousFunction << " : " << UniqueName << '\n';
|
|
|
|
|
}
|
|
|
|
|
} else {
|
|
|
|
|
outs() << "BOLT-INFO: using " << UniqueName << " as another entry to "
|
|
|
|
|
<< "function " << *PreviousFunction << '\n';
|
|
|
|
|
|
2020-06-22 16:16:08 -07:00
|
|
|
registerName(0);
|
|
|
|
|
|
2016-09-27 19:09:38 -07:00
|
|
|
PreviousFunction->
|
|
|
|
|
addEntryPointAtOffset(Address - PreviousFunction->getAddress());
|
|
|
|
|
|
|
|
|
|
// Remove the symbol from FileSymRefs so that we can skip it
|
|
|
|
|
// in the future.
|
|
|
|
|
auto SI = FileSymRefs.find(Address);
|
|
|
|
|
assert(SI != FileSymRefs.end() && "symbol expected to be present");
|
|
|
|
|
assert(SI->second == Symbol && "wrong symbol found");
|
|
|
|
|
FileSymRefs.erase(SI);
|
2016-09-29 11:19:06 -07:00
|
|
|
}
|
2017-11-14 20:05:11 -08:00
|
|
|
registerName(SymbolSize);
|
2016-09-29 11:19:06 -07:00
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
|
2016-03-11 11:09:34 -08:00
|
|
|
// Check for conflicts with function data from FDEs.
|
|
|
|
|
bool IsSimple = true;
|
|
|
|
|
auto FDEI = CFIRdWrt->getFDEs().lower_bound(Address);
|
|
|
|
|
if (FDEI != CFIRdWrt->getFDEs().end()) {
|
2018-02-14 12:06:17 -08:00
|
|
|
const auto &FDE = *FDEI->second;
|
2016-03-11 11:09:34 -08:00
|
|
|
if (FDEI->first != Address) {
|
|
|
|
|
// There's no matching starting address in FDE. Make sure the previous
|
|
|
|
|
// FDE does not contain this address.
|
|
|
|
|
if (FDEI != CFIRdWrt->getFDEs().begin()) {
|
|
|
|
|
--FDEI;
|
|
|
|
|
auto &PrevFDE = *FDEI->second;
|
|
|
|
|
auto PrevStart = PrevFDE.getInitialLocation();
|
|
|
|
|
auto PrevLength = PrevFDE.getAddressRange();
|
2016-09-15 15:47:10 -07:00
|
|
|
if (Address > PrevStart && Address < PrevStart + PrevLength) {
|
2016-09-27 19:09:38 -07:00
|
|
|
errs() << "BOLT-ERROR: function " << UniqueName
|
|
|
|
|
<< " is in conflict with FDE ["
|
|
|
|
|
<< Twine::utohexstr(PrevStart) << ", "
|
|
|
|
|
<< Twine::utohexstr(PrevStart + PrevLength)
|
|
|
|
|
<< "). Skipping.\n";
|
2016-03-11 11:09:34 -08:00
|
|
|
IsSimple = false;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
} else if (FDE.getAddressRange() != SymbolSize) {
|
2016-09-15 15:47:10 -07:00
|
|
|
if (SymbolSize) {
|
|
|
|
|
// Function addresses match but sizes differ.
|
2017-06-02 18:41:31 -07:00
|
|
|
errs() << "BOLT-WARNING: sizes differ for function " << UniqueName
|
2016-09-27 19:09:38 -07:00
|
|
|
<< ". FDE : " << FDE.getAddressRange()
|
2017-06-02 18:41:31 -07:00
|
|
|
<< "; symbol table : " << SymbolSize << ". Using max size.\n";
|
2016-09-02 14:15:29 -07:00
|
|
|
}
|
2016-03-11 11:09:34 -08:00
|
|
|
SymbolSize = std::max(SymbolSize, FDE.getAddressRange());
|
2017-11-14 20:05:11 -08:00
|
|
|
if (BC->getBinaryDataAtAddress(Address)) {
|
|
|
|
|
BC->setBinaryDataSize(Address, SymbolSize);
|
|
|
|
|
} else {
|
2018-02-14 12:06:17 -08:00
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: No BD @ 0x"
|
|
|
|
|
<< Twine::utohexstr(Address) << "\n");
|
2017-11-14 20:05:11 -08:00
|
|
|
}
|
2016-03-11 11:09:34 -08:00
|
|
|
}
|
|
|
|
|
}
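The FDE conflict check above relies on `lower_bound` over an address-keyed map, then steps back one entry to see whether an earlier range covers the address. A self-contained sketch of that lookup follows, using `std::map` and a hypothetical `AddressRange` struct in place of the FDE map returned by `CFIRdWrt->getFDEs()`. One deliberate difference: the sketch also inspects the last range when `lower_bound` returns `end()`, whereas the code above only runs its check when an entry at or after the address exists.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for an FDE: a start address and a byte length.
struct AddressRange {
  uint64_t Start;
  uint64_t Length;
};

// Returns true if Address lies strictly inside a range that starts before it.
// An address that is exactly the start of a range is never a conflict.
bool insideEarlierRange(const std::map<uint64_t, AddressRange> &Ranges,
                        uint64_t Address) {
  auto It = Ranges.lower_bound(Address);
  if (It != Ranges.end() && It->first == Address)
    return false; // a range starts exactly here
  if (It == Ranges.begin())
    return false; // no range starts before Address
  --It; // closest range starting below Address
  const AddressRange &Prev = It->second;
  return Address > Prev.Start && Address < Prev.Start + Prev.Length;
}
```

A function symbol for which this returns true would correspond to the "is in conflict with FDE" case, where the original marks the function as non-simple.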
|
2019-04-16 14:35:29 -07:00
|
|
|
|
2016-08-11 14:23:54 -07:00
|
|
|
BinaryFunction *BF{nullptr};
|
2019-04-03 15:52:01 -07:00
|
|
|
// Since the function may not yet have its real size, do a search
|
|
|
|
|
// using the list of registered functions instead of calling
|
|
|
|
|
// getBinaryFunctionAtAddress().
|
|
|
|
|
auto BFI = BC->getBinaryFunctions().find(Address);
|
|
|
|
|
if (BFI != BC->getBinaryFunctions().end()) {
|
2016-08-11 14:23:54 -07:00
|
|
|
BF = &BFI->second;
|
2019-04-03 15:52:01 -07:00
|
|
|
// Duplicate the function name. Make sure everything matches before we add
|
2016-06-10 17:13:05 -07:00
|
|
|
// an alternative name.
|
2016-09-15 15:47:10 -07:00
|
|
|
if (SymbolSize != BF->getSize()) {
|
|
|
|
|
if (opts::Verbosity >= 1) {
|
|
|
|
|
if (SymbolSize && BF->getSize()) {
|
|
|
|
|
errs() << "BOLT-WARNING: size mismatch for duplicate entries "
|
|
|
|
|
<< *BF << " and " << UniqueName << '\n';
|
|
|
|
|
}
|
|
|
|
|
outs() << "BOLT-INFO: adjusting size of function " << *BF
|
|
|
|
|
<< " old " << BF->getSize() << " new " << SymbolSize << "\n";
|
|
|
|
|
}
|
|
|
|
|
BF->setSize(std::max(SymbolSize, BF->getSize()));
|
2017-11-14 20:05:11 -08:00
|
|
|
BC->setBinaryDataSize(Address, BF->getSize());
|
2016-06-10 17:13:05 -07:00
|
|
|
}
|
2016-08-11 14:23:54 -07:00
|
|
|
BF->addAlternativeName(UniqueName);
|
2016-06-10 17:13:05 -07:00
|
|
|
} else {
|
2018-02-01 16:33:43 -08:00
|
|
|
auto Section = BC->getSectionForAddress(Address);
|
Generate heatmap for linux kernel
Summary:
This diff handles several challenges related to heatmap generation for the Linux kernel (vmlinux ELF file):
- If the input binary ELF file contains the section `__ksymtab`, this diff assumes that it is the Linux kernel `vmlinux` file and enables an extra flag, `LinuxKernelMode`.
- In `LinuxKernelMode`, only heat map generation is supported right now, so BOLT ensures that the current mode is heat map generation. Otherwise, it exits with an error.
- For some Linux symbol and section combinations, BOLT may not be able to find the section for a symbol (especially symbols that specify the end of a section). In such cases, we show a warning message instead of exiting, which was the previous behavior.
- The Linux kernel ELF file does not contain a dynamic section, so we don't exit when no dynamic section is found for a Linux kernel binary.
- The current `ParseMMap` logic does not work with the Linux kernel. MMap entries for the Linux kernel use the `PERF_RECORD_MMAP` format instead of the typical `PERF_RECORD_MMAP2` format. Since Linux kernel address mapping is absolute (same as specified in the ELF file), we avoid calling `ParseMMap` in Linux kernel mode.
- Linux kernel entries are registered with PID -1, so a `BinaryMMapInfo` lookup is not required for them. Similarly, `adjustLBR` is not required.
- The default max address in Linux kernel mode is the highest unsigned 64-bit integer instead of the current 4 GB.
- Added a new heatmap parameter, `MinAddress`, which is `KernelBaseAddress` in Linux kernel mode and 0 otherwise. While registering heatmap sample counts from LBR entries, any address lower than `MinAddress` is ignored.
- `IgnoreInterruptLBR` is disabled in Linux kernel mode to ensure that kernel entries are processed.
Currently, the Linux kernel heat map also includes the heat map for Linux kernel modules that are not part of the vmlinux ELF file. This is intentional, to identify other potential optimization opportunities. If reviewers think those modules should be omitted, I will disable them based on the highest end address of a vmlinux ELF section.
(cherry picked from FBD21992765)
2020-06-10 23:00:39 -07:00
|
|
|
// Skip symbols from invalid sections
|
|
|
|
|
if (!Section) {
|
|
|
|
|
errs() << "BOLT-WARNING: " << UniqueName << " (0x"
|
|
|
|
|
<< Twine::utohexstr(Address)
|
|
|
|
|
<< ") does not have any section\n";
|
|
|
|
|
continue;
|
|
|
|
|
}
|
2019-06-27 03:20:17 -07:00
|
|
|
assert(Section && "section for functions must be registered");
|
2019-06-26 11:06:46 -07:00
|
|
|
|
2019-06-27 03:20:17 -07:00
|
|
|
// Skip symbols from zero-sized sections.
|
|
|
|
|
if (!Section->getSize())
|
2019-06-26 11:06:46 -07:00
|
|
|
continue;
|
2019-07-12 07:25:50 -07:00
|
|
|
|
[BOLT] Support for lite mode with relocations
Summary:
Add '-lite' support for relocations for improved processing time,
memory consumption, and more resilient processing of binaries with
embedded assembly code.
In lite relocation mode, BOLT will skip full processing of functions
without a profile. It will run scanExternalRefs() on such functions
to discover external references and to create internal relocations
to update references to optimized functions.
Note that we could have relied on the compiler/linker to provide
relocations for function references. However, there's no assurance
that all such references are reported. E.g., the compiler can resolve
inter-procedural references internally, leaving no relocations
for the linker.
The scan process takes less than 10 seconds per 100MB of code on modern
hardware. It's a reasonable overhead to live with considering the
flexibility it provides.
If BOLT fails to scan or disassemble a function, e.g., due to a data
object embedded in code, or an unsupported instruction, it enables a
patching mode to guarantee that the failed function will call
optimized/moved versions of functions. The patching happens at original
function entry points.
'-skip=<func1,func2,...>' option now can be used to skip processing of
arbitrary functions in the relocation mode.
With '-use-old-text' or '-strict' we require all functions to be
processed. As such, it is incompatible with '-lite' option,
and '-skip' option will only disable optimizations of listed
functions, not their disassembly and emission.
(cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
|
|
|
BF = BC->createBinaryFunction(UniqueName, *Section, Address, SymbolSize);
|
|
|
|
|
if (!IsSimple)
|
|
|
|
|
BF->setSimple(false);
|
2016-06-10 17:13:05 -07:00
|
|
|
}
|
2016-07-11 18:51:13 -07:00
|
|
|
if (!AlternativeName.empty())
|
2016-08-11 14:23:54 -07:00
|
|
|
BF->addAlternativeName(AlternativeName);
|
2016-09-29 11:19:06 -07:00
|
|
|
|
2017-11-14 20:05:11 -08:00
|
|
|
registerName(SymbolSize);
|
2016-09-29 11:19:06 -07:00
|
|
|
PreviousFunction = BF;
|
2016-07-11 18:51:13 -07:00
|
|
|
}
|
|
|
|
|
|
2017-08-04 11:21:05 -07:00
|
|
|
// Process PLT section.
|
2017-08-24 14:37:35 -07:00
|
|
|
if (BC->TheTriple->getArch() == Triple::x86_64)
|
|
|
|
|
disassemblePLT();
|
2017-08-04 11:21:05 -07:00
|
|
|
|
2016-09-27 19:09:38 -07:00
|
|
|
// See if we missed any functions marked by FDE.
|
|
|
|
|
for (const auto &FDEI : CFIRdWrt->getFDEs()) {
|
|
|
|
|
const auto Address = FDEI.first;
|
|
|
|
|
const auto *FDE = FDEI.second;
|
2019-04-03 15:52:01 -07:00
|
|
|
const auto *BF = BC->getBinaryFunctionAtAddress(Address);
|
|
|
|
|
if (BF)
|
|
|
|
|
continue;
|
|
|
|
|
|
|
|
|
|
BF = BC->getBinaryFunctionContainingAddress(Address);
|
|
|
|
|
if (BF) {
|
|
|
|
|
errs() << "BOLT-WARNING: FDE [0x" << Twine::utohexstr(Address) << ", 0x"
|
|
|
|
|
<< Twine::utohexstr(Address + FDE->getAddressRange())
|
|
|
|
|
<< ") conflicts with function " << *BF << '\n';
|
|
|
|
|
continue;
|
2017-06-02 18:41:31 -07:00
|
|
|
}
|
2019-04-03 15:52:01 -07:00
|
|
|
|
|
|
|
|
if (opts::Verbosity >= 1) {
|
|
|
|
|
errs() << "BOLT-WARNING: FDE [0x" << Twine::utohexstr(Address)
|
|
|
|
|
<< ", 0x" << Twine::utohexstr(Address + FDE->getAddressRange())
|
|
|
|
|
<< ") has no corresponding symbol table entry\n";
|
|
|
|
|
}
|
|
|
|
|
auto Section = BC->getSectionForAddress(Address);
|
|
|
|
|
assert(Section && "cannot get section for address from FDE");
|
|
|
|
|
std::string FunctionName =
|
|
|
|
|
"__BOLT_FDE_FUNCat" + Twine::utohexstr(Address).str();
|
|
|
|
|
BC->createBinaryFunction(FunctionName, *Section, Address,
|
2020-06-15 00:15:47 -07:00
|
|
|
FDE->getAddressRange());
|
2017-06-02 18:41:31 -07:00
|
|
|
}
|
|
|
|
|
|
2020-05-07 23:00:29 -07:00
|
|
|
BC->setHasSymbolsWithFileName(SeenFileName);
|
2016-09-29 11:19:06 -07:00
|
|
|
|
|
|
|
|
// Now that all the functions have been created, adjust their boundaries.
|
|
|
|
|
adjustFunctionBoundaries();
|
2016-09-27 19:09:38 -07:00
|
|
|
|
2017-11-22 16:17:36 -08:00
|
|
|
// Annotate functions with code/data markers in AArch64
|
|
|
|
|
for (auto ISym = MarkersBegin; ISym != SortedFileSymbols.end(); ++ISym) {
|
|
|
|
|
const auto &Symbol = *ISym;
|
[BOLT rebase] Rebase fixes on top of LLVM Feb2018
Summary:
This commit includes all code necessary to make BOLT work again
after the rebase. This includes a redesign of the EHFrame work,
cherry-pick of the 3dnow disassembly work, compilation error fixes,
and port of the debug_info work. The macroop fusion feature is not
ported yet.
The rebased version has minor changes to the "executed instructions"
dynostats counter because REP prefixes are considered a part of the
instruction they apply to. Also, some X86 instructions had the "mayLoad"
tablegen property removed, which BOLT uses to identify and account
for loads, thus reducing the total number of loads reported by
dynostats. This was observed in X86::MOVDQUmr. TRAP instructions are
not terminators anymore, changing our CFG. This commit adds compensation
to preserve this old behavior and minimize test changes. debug_info
sections are now slightly larger. The discriminator field in the line
table is slightly different due to a change upstream. New profiles
generated with the other bolt are incompatible with this version
because of different hash values calculated for functions, so they will
be considered 100% stale. This commit changes the corresponding test
to XFAIL so it can be updated. The hash function changes because it
relies on raw opcode values, which change according to the opcodes
described in the X86 tablegen files. When processing HHVM, bolt was
observed to be using about 800MB more memory in the rebased version
and being about 5% slower.
(cherry picked from FBD7078072)
2018-02-06 15:00:23 -08:00
|
|
|
uint64_t Address =
|
|
|
|
|
cantFail(Symbol.getAddress(), "cannot get symbol address");
|
2017-11-22 16:17:36 -08:00
|
|
|
auto SymbolSize = ELFSymbolRef(Symbol).getSize();
|
2019-04-03 15:52:01 -07:00
|
|
|
auto *BF = BC->getBinaryFunctionContainingAddress(Address, true, true);
|
2017-11-22 16:17:36 -08:00
|
|
|
if (!BF) {
|
|
|
|
|
// Stray marker
|
|
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
const auto EntryOffset = Address - BF->getAddress();
|
|
|
|
|
if (BF->isCodeMarker(Symbol, SymbolSize)) {
|
|
|
|
|
BF->markCodeAtOffset(EntryOffset);
|
|
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
if (BF->isDataMarker(Symbol, SymbolSize)) {
|
|
|
|
|
BF->markDataAtOffset(EntryOffset);
|
|
|
|
|
BC->AddressToConstantIslandMap[Address] = BF;
|
|
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
llvm_unreachable("Unknown marker");
|
|
|
|
|
}
|
|
|
|
|
|
2016-09-27 19:09:38 -07:00
|
|
|
// Read all relocations now that we have binary functions mapped.
|
2020-02-24 17:10:02 -08:00
|
|
|
processRelocations();
|
2016-09-29 11:19:06 -07:00
|
|
|
}
|
|
|
|
|
|
2017-08-04 11:21:05 -07:00
|
|
|
void RewriteInstance::disassemblePLT() {
|
2019-08-26 15:03:38 -07:00
|
|
|
// Used to analyze both the .plt section (most common) and the less common
|
|
|
|
|
// .plt.got created by the BFD linker.
|
|
|
|
|
auto analyzeOnePLTSection = [&](BinarySection &Section,
|
2020-04-23 21:29:10 -07:00
|
|
|
const BinarySection &RelocsSection,
|
2019-08-26 15:03:38 -07:00
|
|
|
uint64_t RelocType, uint64_t EntrySize) {
|
|
|
|
|
const auto PLTAddress = Section.getAddress();
|
|
|
|
|
StringRef PLTContents = Section.getContents();
|
|
|
|
|
ArrayRef<uint8_t> PLTData(
|
|
|
|
|
reinterpret_cast<const uint8_t *>(PLTContents.data()),
|
|
|
|
|
Section.getSize());
|
2020-04-23 21:29:10 -07:00
|
|
|
const auto PtrSize = BC->AsmInfo->getCodePointerSize();
|
2019-08-26 15:03:38 -07:00
|
|
|
|
2020-04-23 21:29:10 -07:00
|
|
|
// The runtime linker will put the value of an external symbol at the location
|
|
|
|
|
// referenced by the relocation. Map the address to the name of the symbol.
|
|
|
|
|
std::unordered_map<uint64_t, StringRef> RelAddrToNameMap;
|
|
|
|
|
for (const auto &Rel : RelocsSection.getSectionRef().relocations()) {
|
|
|
|
|
if (Rel.getType() != RelocType)
|
|
|
|
|
continue;
|
|
|
|
|
const auto SymbolIter = Rel.getSymbol();
|
|
|
|
|
assert(SymbolIter != InputFile->symbol_end() &&
|
|
|
|
|
"non-null symbol expected");
|
|
|
|
|
RelAddrToNameMap[Rel.getOffset()] = cantFail((*SymbolIter).getName());
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
for (uint64_t Offset = 0; Offset < Section.getSize(); Offset += EntrySize) {
|
2019-08-26 15:03:38 -07:00
|
|
|
uint64_t InstrSize;
|
|
|
|
|
MCInst Instruction;
|
|
|
|
|
const uint64_t InstrAddr = PLTAddress + Offset;
|
|
|
|
|
if (!BC->DisAsm->getInstruction(Instruction, InstrSize,
|
|
|
|
|
PLTData.slice(Offset), InstrAddr, nulls(),
|
|
|
|
|
nulls())) {
|
|
|
|
|
errs() << "BOLT-ERROR: unable to disassemble instruction in PLT "
|
|
|
|
|
"section "
|
|
|
|
|
<< Section.getName() << " at offset 0x"
|
|
|
|
|
<< Twine::utohexstr(Offset) << '\n';
|
|
|
|
|
exit(1);
|
|
|
|
|
}
|
2017-08-04 11:21:05 -07:00
|
|
|
|
2019-08-26 15:03:38 -07:00
|
|
|
if (!BC->MIB->isIndirectBranch(Instruction))
|
|
|
|
|
continue;
|
2017-08-04 11:21:05 -07:00
|
|
|
|
2019-08-26 15:03:38 -07:00
|
|
|
uint64_t TargetAddress;
|
|
|
|
|
if (!BC->MIB->evaluateMemOperandTarget(Instruction, TargetAddress,
|
|
|
|
|
InstrAddr, InstrSize)) {
|
|
|
|
|
errs() << "BOLT-ERROR: error evaluating PLT instruction at address 0x"
|
|
|
|
|
<< Twine::utohexstr(InstrAddr) << '\n';
|
|
|
|
|
exit(1);
|
|
|
|
|
}
|
2020-05-04 13:57:21 -07:00
|
|
|
|
2020-04-23 21:29:10 -07:00
|
|
|
auto NI = RelAddrToNameMap.find(TargetAddress);
|
|
|
|
|
if (NI == RelAddrToNameMap.end())
|
|
|
|
|
continue;
|
2017-08-04 11:21:05 -07:00
|
|
|
|
2020-04-23 21:29:10 -07:00
|
|
|
StringRef SymbolName = NI->second;
|
|
|
|
|
auto *BF = BC->createBinaryFunction(SymbolName.str() + "@PLT", Section,
|
2020-06-15 00:15:47 -07:00
|
|
|
InstrAddr, 0, EntrySize,
|
|
|
|
|
PLTAlignment);
|
2020-04-23 21:29:10 -07:00
|
|
|
MCSymbol *TargetSymbol =
|
|
|
|
|
BC->registerNameAtAddress(SymbolName.str() + "@GOT",
|
|
|
|
|
TargetAddress, PtrSize, PLTAlignment);
|
|
|
|
|
BF->setPLTSymbol(TargetSymbol);
|
2017-08-04 11:21:05 -07:00
|
|
|
}
|
2019-08-26 15:03:38 -07:00
|
|
|
};
|
|
|
|
|
|
|
|
|
|
if (PLTSection) {
|
|
|
|
|
// Pseudo function for the start of PLT. The table could have a matching
|
|
|
|
|
// FDE that we want to match to the pseudo function.
|
2020-06-15 00:15:47 -07:00
|
|
|
auto *BF = BC->createBinaryFunction("__BOLT_PLT_PSEUDO", *PLTSection,
|
|
|
|
|
PLTSection->getAddress(), 0, PLTSize,
|
|
|
|
|
PLTAlignment);
|
|
|
|
|
BF->setPseudo(true);
|
2020-04-23 21:29:10 -07:00
|
|
|
if (RelaPLTSection) {
|
|
|
|
|
analyzeOnePLTSection(*PLTSection, *RelaPLTSection,
|
|
|
|
|
ELF::R_X86_64_JUMP_SLOT, PLTSize);
|
|
|
|
|
}
|
2017-08-04 11:21:05 -07:00
|
|
|
}
|
|
|
|
|
|
2018-01-23 15:10:24 -08:00
|
|
|
if (PLTGOTSection) {
|
2020-04-23 21:29:10 -07:00
|
|
|
if (RelaDynSection) {
|
|
|
|
|
analyzeOnePLTSection(*PLTGOTSection, *RelaDynSection,
|
|
|
|
|
ELF::R_X86_64_GLOB_DAT, /*Size=*/8);
|
|
|
|
|
}
|
2019-08-26 15:03:38 -07:00
|
|
|
// If we did not register any function at PLTGOT start, we may be missing
|
|
|
|
|
// relocs. Add a function at the start to mark this section.
|
|
|
|
|
if (BC->getBinaryFunctions().find(PLTGOTSection->getAddress()) ==
|
|
|
|
|
BC->getBinaryFunctions().end()) {
|
2020-06-15 00:15:47 -07:00
|
|
|
auto *BF =
|
|
|
|
|
BC->createBinaryFunction("__BOLT_PLTGOT_PSEUDO", *PLTGOTSection,
|
|
|
|
|
PLTGOTSection->getAddress(), 0,
|
|
|
|
|
/*SymbolSize*/ 8, PLTAlignment);
|
|
|
|
|
BF->setPseudo(true);
|
2017-08-04 11:21:05 -07:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2016-09-29 11:19:06 -07:00
|
|
|
void RewriteInstance::adjustFunctionBoundaries() {
|
2019-04-03 15:52:01 -07:00
|
|
|
for (auto BFI = BC->getBinaryFunctions().begin(),
|
|
|
|
|
BFE = BC->getBinaryFunctions().end();
|
2017-10-10 14:54:09 -07:00
|
|
|
BFI != BFE; ++BFI) {
|
|
|
|
|
auto &Function = BFI->second;
|
2019-06-28 11:53:34 -07:00
|
|
|
const BinaryFunction *NextFunction{nullptr};
|
|
|
|
|
if (std::next(BFI) != BFE)
|
|
|
|
|
NextFunction = &std::next(BFI)->second;
|
2017-10-10 14:54:09 -07:00
|
|
|
|
[BOLT] Basic support for split functions
Summary:
This adds very basic and limited support for split functions.
In non-relocation mode, split functions are ignored, while their debug
info is properly updated. No support in the relocation mode yet.
Split functions consist of a main body and one or more fragments.
For fragments, the main part is called their parent. Any fragment
could only be entered via its parent or another fragment.
The short-term goal is to correctly update debug information for split
functions, while the long-term goal is to have a complete support
including full optimization. Note that if we don't detect split
bodies, we would have to add multiple entry points via tail calls,
which we would rather avoid.
Parent functions and fragments are represented by a `BinaryFunction`
and are marked accordingly. For now they are marked as non-simple, and
thus only supported in non-relocation mode. Once we start building a
CFG, it should be a common graph (i.e. the one that includes all
fragments) in the parent function.
The function discovery is unchanged, except for the detection of
`\.cold\.` pattern in the function name, which automatically marks the
function as a fragment of another function.
Because of the local function name ambiguity, we cannot rely on the
function name to establish child fragment and parent relationship.
Instead we rely on disassembly processing.
`BinaryContext::getBinaryFunctionContainingAddress()` now returns a
parent function if an address from its fragment is passed.
There's no jump table support at the moment. Jump tables can have
sources and destinations in both the fragment and the parent.
Parent functions that enter their fragments via the C++ exception handling
mechanism are not yet supported.
(cherry picked from FBD14970569)
2019-04-16 10:24:34 -07:00
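The name-based fragment detection described in the commit message above can be illustrated with a small stand-alone sketch. It is not BOLT code: `looksLikeColdFragment` is a hypothetical helper, it uses `std::regex` rather than LLVM's regex support, and it assumes the patterns are matched against the whole symbol name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of BOLT: returns true when a symbol name
// looks like a compiler-generated cold fragment, i.e. it contains ".cold."
// or ends in ".cold". Whole-name matching is an assumption of this sketch.
inline bool looksLikeColdFragment(const std::string &Name) {
  static const std::regex ColdInfix(".*\\.cold\\..*");
  static const std::regex ColdSuffix(".*\\.cold");
  return std::regex_match(Name, ColdInfix) ||
         std::regex_match(Name, ColdSuffix);
}
```

As the commit message notes, a name match only marks a function as a fragment; it does not identify the parent, which has to be established by other means.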
|
|
|
// Check if it's a fragment of a function.
|
2020-01-13 11:56:59 -08:00
|
|
|
auto FragName = Function.hasNameRegex(".*\\.cold\\..*");
|
2019-05-29 18:33:09 -07:00
|
|
|
if (!FragName)
|
|
|
|
|
FragName = Function.hasNameRegex(".*\\.cold");
|
|
|
|
|
if (FragName) {
|
2019-04-16 10:24:34 -07:00
|
|
|
static bool PrintedWarning = false;
|
|
|
|
|
if (BC->HasRelocations && !PrintedWarning) {
|
|
|
|
|
errs() << "BOLT-WARNING: split function detected on input : "
|
2019-08-26 15:03:38 -07:00
|
|
|
<< *FragName << ". The support is limited in relocation mode.\n";
|
2019-04-16 10:24:34 -07:00
|
|
|
PrintedWarning = true;
|
|
|
|
|
}
|
|
|
|
|
Function.IsFragment = true;
|
|
|
|
|
}
|
|
|
|
|
|
2017-10-10 14:54:09 -07:00
|
|
|
// Check if there's a symbol or a function with a larger address in the
|
|
|
|
|
// same section. If there is, it determines the maximum size for the
|
|
|
|
|
// current function. Otherwise, it is the size of the containing section
|
|
|
|
|
// that defines it.
|
2016-09-29 11:19:06 -07:00
|
|
|
//
|
|
|
|
|
// NOTE: ignore some symbols that could be tolerated inside the body
|
|
|
|
|
// of a function.
|
|
|
|
|
auto NextSymRefI = FileSymRefs.upper_bound(Function.getAddress());
|
|
|
|
|
while (NextSymRefI != FileSymRefs.end()) {
|
|
|
|
|
auto &Symbol = NextSymRefI->second;
|
2019-06-28 11:53:34 -07:00
|
|
|
const auto SymbolAddress = NextSymRefI->first;
|
|
|
|
|
const auto SymbolSize = ELFSymbolRef(Symbol).getSize();
|
|
|
|
|
|
|
|
|
|
if (NextFunction && SymbolAddress >= NextFunction->getAddress())
|
|
|
|
|
break;
|
2016-09-29 11:19:06 -07:00
|
|
|
|
|
|
|
|
if (!Function.isSymbolValidInScope(Symbol, SymbolSize))
|
|
|
|
|
break;
|
|
|
|
|
|
|
|
|
|
// This is potentially another entry point into the function.
|
|
|
|
|
auto EntryOffset = NextSymRefI->first - Function.getAddress();
|
2017-11-22 16:17:36 -08:00
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: adding entry point to function " << Function
|
|
|
|
|
<< " at offset 0x" << Twine::utohexstr(EntryOffset) << '\n');
|
|
|
|
|
Function.addEntryPointAtOffset(EntryOffset);
|
2016-09-29 11:19:06 -07:00
|
|
|
|
|
|
|
|
++NextSymRefI;
|
|
|
|
|
}
|
|
|
|
|
|
2017-10-10 14:54:09 -07:00
|
|
|
// Function runs at most till the end of the containing section.
|
2018-01-23 15:10:24 -08:00
|
|
|
uint64_t NextObjectAddress = Function.getSection().getEndAddress();
|
2017-10-10 14:54:09 -07:00
|
|
|
// Or till the next object marked by a symbol.
|
|
|
|
|
if (NextSymRefI != FileSymRefs.end()) {
|
|
|
|
|
NextObjectAddress = std::min(NextSymRefI->first, NextObjectAddress);
|
|
|
|
|
}
|
|
|
|
|
// Or till the next function not marked by a symbol.
|
2019-06-28 11:53:34 -07:00
|
|
|
if (NextFunction) {
|
2019-08-26 15:03:38 -07:00
|
|
|
NextObjectAddress =
|
|
|
|
|
std::min(NextFunction->getAddress(), NextObjectAddress);
|
2016-09-29 11:19:06 -07:00
|
|
|
}
|
|
|
|
|
|
2017-10-10 14:54:09 -07:00
|
|
|
const auto MaxSize = NextObjectAddress - Function.getAddress();
|
2016-09-29 11:19:06 -07:00
|
|
|
if (MaxSize < Function.getSize()) {
|
2016-09-27 19:09:38 -07:00
|
|
|
errs() << "BOLT-ERROR: symbol seen in the middle of the function "
|
|
|
|
|
<< Function << ". Skipping.\n";
|
2016-09-29 11:19:06 -07:00
|
|
|
Function.setSimple(false);
|
2016-09-27 19:09:38 -07:00
|
|
|
Function.setMaxSize(Function.getSize());
|
2016-09-29 11:19:06 -07:00
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
Function.setMaxSize(MaxSize);
|
2016-09-27 19:09:38 -07:00
|
|
|
if (!Function.getSize() && Function.isSimple()) {
|
2016-09-29 11:19:06 -07:00
|
|
|
// Some assembly functions have their size set to 0, use the max
|
|
|
|
|
// size as their real size.
|
|
|
|
|
if (opts::Verbosity >= 1) {
|
2019-08-26 15:03:38 -07:00
|
|
|
outs() << "BOLT-INFO: setting size of function " << Function << " to "
|
|
|
|
|
<< Function.getMaxSize() << " (was 0)\n";
|
2016-09-29 11:19:06 -07:00
|
|
|
}
|
|
|
|
|
Function.setSize(Function.getMaxSize());
|
|
|
|
|
}
|
|
|
|
|
}
|
2015-11-23 17:54:18 -08:00
|
|
|
}
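The maximum-size rule used in adjustFunctionBoundaries() above (the function extends at most to the end of its section, the next symbol, or the next function, whichever comes first) can be sketched in isolation. `computeMaxSize` and its parameters are hypothetical names for illustration, not BOLT API, and the sketch assumes every candidate address lies above the function's start address.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-alone helper, not BOLT API. Starts from the end of the
// containing section and clamps to the lowest candidate boundary (next
// symbol, next function). Assumes all candidates are above FunctionAddress.
inline uint64_t
computeMaxSize(uint64_t FunctionAddress, uint64_t SectionEndAddress,
               const std::vector<uint64_t> &NextObjectCandidates) {
  uint64_t NextObjectAddress = SectionEndAddress;
  for (uint64_t Candidate : NextObjectCandidates)
    NextObjectAddress = std::min(Candidate, NextObjectAddress);
  return NextObjectAddress - FunctionAddress;
}
```

For example, a function at 0x1000 in a section ending at 0x2000, with a next symbol at 0x1800 and a next function at 0x1400, would get a maximum size of 0x400 under this sketch.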
|
|
|
|
|
|
2016-11-11 14:33:34 -08:00
|
|
|
void RewriteInstance::relocateEHFrameSection() {
|
2018-01-23 15:10:24 -08:00
|
|
|
assert(EHFrameSection && "non-empty .eh_frame section expected");
|
2016-11-11 14:33:34 -08:00
|
|
|
|
2018-02-06 15:00:23 -08:00
|
|
|
DWARFDebugFrame EHFrame(true, EHFrameSection->getAddress());
|
2018-02-01 16:33:43 -08:00
|
|
|
DWARFDataExtractor DE(EHFrameSection->getContents(),
|
|
|
|
|
BC->AsmInfo->isLittleEndian(),
|
[BOLT rebase] Rebase fixes on top of LLVM Feb2018
2018-02-06 15:00:23 -08:00
|
|
|
BC->AsmInfo->getCodePointerSize());
|
2016-11-11 14:33:34 -08:00
|
|
|
auto createReloc = [&](uint64_t Value, uint64_t Offset, uint64_t DwarfType) {
|
|
|
|
|
if (DwarfType == dwarf::DW_EH_PE_omit)
|
|
|
|
|
return;
|
|
|
|
|
|
2020-04-16 00:02:35 -07:00
|
|
|
// Only fix references that are relative to other locations.
|
2016-11-11 14:33:34 -08:00
|
|
|
if (!(DwarfType & dwarf::DW_EH_PE_pcrel) &&
|
|
|
|
|
!(DwarfType & dwarf::DW_EH_PE_textrel) &&
|
|
|
|
|
!(DwarfType & dwarf::DW_EH_PE_funcrel) &&
|
|
|
|
|
!(DwarfType & dwarf::DW_EH_PE_datarel)) {
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (!(DwarfType & dwarf::DW_EH_PE_sdata4))
|
|
|
|
|
return;
|
|
|
|
|
|
|
|
|
|
uint64_t RelType;
|
|
|
|
|
switch (DwarfType & 0x0f) {
|
|
|
|
|
default:
|
|
|
|
|
llvm_unreachable("unsupported DWARF encoding type");
|
|
|
|
|
case dwarf::DW_EH_PE_sdata4:
|
|
|
|
|
case dwarf::DW_EH_PE_udata4:
|
|
|
|
|
RelType = ELF::R_X86_64_PC32;
|
[BOLT rebase] Rebase fixes on top of LLVM Feb2018
2018-02-06 15:00:23 -08:00
|
|
|
Offset -= 4;
|
2016-11-11 14:33:34 -08:00
|
|
|
break;
|
|
|
|
|
case dwarf::DW_EH_PE_sdata8:
|
|
|
|
|
case dwarf::DW_EH_PE_udata8:
|
|
|
|
|
RelType = ELF::R_X86_64_PC64;
|
[BOLT rebase] Rebase fixes on top of LLVM Feb2018
2018-02-06 15:00:23 -08:00
|
|
|
Offset -= 8;
|
2016-11-11 14:33:34 -08:00
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
|
2020-04-16 00:02:35 -07:00
|
|
|
// Create a relocation against an absolute value since the goal is to
|
|
|
|
|
// preserve the contents of the section independent of the new values
|
|
|
|
|
// of referenced symbols.
|
|
|
|
|
EHFrameSection->addRelocation(Offset, nullptr, RelType, Value);
|
2016-11-11 14:33:34 -08:00
|
|
|
};
|
|
|
|
|
|
|
|
|
|
EHFrame.parse(DE, createReloc);
|
|
|
|
|
}
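
The encoding checks in `createReloc` above can be illustrated outside of BOLT. What follows is a minimal, simplified sketch, not the code BOLT uses: it hard-codes the standard `DW_EH_PE_*` values instead of taking them from LLVM's `dwarf` namespace, it classifies the value format by the low nibble only, and the names `EHPointerInfo`, `classifyEHPointer`, and `checkExamples` are hypothetical. It splits an encoding byte into its application bits (mask `0x70`) and its value format (mask `0x0f`), then reports whether a fixed-width relative pointer would need a relocation and how wide it is.

```cpp
#include <cassert>
#include <cstdint>

// Standard DWARF exception-header pointer encodings (subset).
constexpr uint8_t DW_EH_PE_omit   = 0xff;
constexpr uint8_t DW_EH_PE_udata4 = 0x03;
constexpr uint8_t DW_EH_PE_udata8 = 0x04;
constexpr uint8_t DW_EH_PE_sdata4 = 0x0b;
constexpr uint8_t DW_EH_PE_sdata8 = 0x0c;
constexpr uint8_t DW_EH_PE_pcrel  = 0x10;

// Hypothetical result type for this sketch.
struct EHPointerInfo {
  bool NeedsReloc; // true for a relative, fixed-width 4- or 8-byte value
  unsigned Width;  // size of the encoded value in bytes, 0 if not handled
};

// Hypothetical helper: classify one pointer-encoding byte.
inline EHPointerInfo classifyEHPointer(uint8_t Encoding) {
  if (Encoding == DW_EH_PE_omit)
    return {false, 0};

  // pcrel (0x10), textrel (0x20), datarel (0x30) and funcrel (0x40) all
  // live in the 0x70 bits; a zero value there means an absolute pointer.
  const bool IsRelative = (Encoding & 0x70) != 0;
  if (!IsRelative)
    return {false, 0};

  // The low nibble selects the value format.
  switch (Encoding & 0x0f) {
  case DW_EH_PE_udata4:
  case DW_EH_PE_sdata4:
    return {true, 4};
  case DW_EH_PE_udata8:
  case DW_EH_PE_sdata8:
    return {true, 8};
  default:
    // LEB128 and 2-byte formats are outside the scope of this sketch.
    return {false, 0};
  }
}

// Small self-check showing the intended use.
inline void checkExamples() {
  const EHPointerInfo Info = classifyEHPointer(DW_EH_PE_pcrel | DW_EH_PE_sdata4);
  assert(Info.NeedsReloc && Info.Width == 4);
  assert(!classifyEHPointer(DW_EH_PE_omit).NeedsReloc);
}
```

Because a fixed-width value ends `Width` bytes before the offset that follows it, the width is what lets a caller step back to the start of the value, as the `Offset -= 4` and `Offset -= 8` adjustments in the lambda do.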
|
|
|
|
|
|
2018-04-20 20:03:31 -07:00
|
|
|
ArrayRef<uint8_t> RewriteInstance::getLSDAData() {
|
|
|
|
|
return ArrayRef<uint8_t>(LSDASection->getData(),
|
|
|
|
|
LSDASection->getContents().size());
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
uint64_t RewriteInstance::getLSDAAddress() {
|
|
|
|
|
return LSDASection->getAddress();
|
|
|
|
|
}
|
|
|
|
|
|
2015-11-23 17:54:18 -08:00
|
|
|
void RewriteInstance::readSpecialSections() {
|
[BOLT rebase] Rebase fixes on top of LLVM Feb2018
2018-02-06 15:00:23 -08:00
|
|
|
NamedRegionTimer T("readSpecialSections", "read special sections",
|
|
|
|
|
TimerGroupName, TimerGroupDesc, opts::TimeRewrite);
|
2017-11-27 18:00:24 -08:00
|
|
|
|
2017-03-22 22:05:50 -07:00
|
|
|
bool HasTextRelocations = false;
|
2019-04-26 15:30:12 -07:00
|
|
|
bool HasDebugInfo = false;
|
2017-03-22 22:05:50 -07:00
|
|
|
|
2015-11-23 17:54:18 -08:00
|
|
|
// Process special sections.
|
2016-03-03 10:13:11 -08:00
|
|
|
for (const auto &Section : InputFile->sections()) {
|
2015-11-23 17:54:18 -08:00
|
|
|
StringRef SectionName;
|
|
|
|
|
check_error(Section.getName(SectionName), "cannot get section name");
|
2016-07-21 12:45:35 -07:00
|
|
|
|
2018-02-01 16:33:43 -08:00
|
|
|
// Only register sections with names.
|
2018-04-20 20:03:31 -07:00
|
|
|
if (!SectionName.empty()) {
|
2018-02-01 16:33:43 -08:00
|
|
|
BC->registerSection(Section);
|
|
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: registering section " << SectionName
|
|
|
|
|
<< " @ 0x" << Twine::utohexstr(Section.getAddress()) << ":0x"
|
|
|
|
|
<< Twine::utohexstr(Section.getAddress() + Section.getSize())
|
|
|
|
|
<< "\n");
|
2019-04-26 15:30:12 -07:00
|
|
|
if (isDebugSection(SectionName))
|
|
|
|
|
HasDebugInfo = true;
|
Generate heatmap for linux kernel
Summary:
This diff handles several challenges related to heatmap generation for the Linux kernel (vmlinux ELF file):
- If the input binary ELF file contains the section `__ksymtab`, this diff assumes that it is the Linux kernel `vmlinux` file and enables an extra flag, `LinuxKernelMode`.
- In `LinuxKernelMode`, only heatmap generation is supported right now, so BOLT checks that the current mode is heatmap generation and exits with an error otherwise.
- For some Linux symbol and section combinations, BOLT may not be able to find the section for a symbol (especially symbols that mark the end of a section). In such cases, we show a warning message instead of exiting, which was the previous behavior.
- The Linux kernel ELF file does not contain a dynamic section, so we don't exit when no dynamic section is found for a Linux kernel binary.
- The current `ParseMMap` logic does not work with the Linux kernel. MMap entries for the Linux kernel use the `PERF_RECORD_MMAP` format instead of the typical `PERF_RECORD_MMAP2` format. Since Linux kernel address mapping is absolute (same as specified in the ELF file), we avoid calling `ParseMMap` in Linux kernel mode.
- Linux kernel entries are registered with PID -1, so a `BinaryMMapInfo` lookup is not required for them. Similarly, `adjustLBR` is not required.
- The default max address in Linux kernel mode is the highest unsigned 64-bit integer instead of the current 4GB.
- Added a new heatmap parameter, `MinAddress`, which is `KernelBaseAddress` in Linux kernel mode and 0 otherwise. While registering heatmap sample counts from LBR entries, any address lower than `MinAddress` is ignored.
- `IgnoreInterruptLBR` is disabled in Linux kernel mode to ensure that kernel entries are processed.
Currently, the Linux kernel heatmap also includes the heatmap for Linux kernel modules that are not part of the vmlinux ELF file. This is intentional, to identify other potential optimization opportunities. If reviewers think those modules should be omitted, I will disable them based on the highest end address of a vmlinux ELF section.
(cherry picked from FBD21992765)
2020-06-10 23:00:39 -07:00
|
|
|
if (isKSymtabSection(SectionName))
|
|
|
|
|
opts::LinuxKernelMode = true;
|
2018-02-01 16:33:43 -08:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
Generate heatmap for linux kernel
2020-06-10 23:00:39 -07:00
|
|
|
if (opts::LinuxKernelMode && !opts::HeatmapMode) {
|
|
|
|
|
errs() << "BOLT-ERROR: input binary seems like the vmlinux binary"
|
|
|
|
|
<< " as it has linux kernel symbol information, for which we"
|
|
|
|
|
<< " only support heatmap generation right now!!!\n";
|
|
|
|
|
exit(1);
|
|
|
|
|
}
|
|
|
|
|
|
2019-04-12 17:33:46 -07:00
|
|
|
if (HasDebugInfo && !opts::UpdateDebugSections && !opts::AggregateOnly) {
|
2019-04-26 15:30:12 -07:00
|
|
|
errs() << "BOLT-WARNING: debug info will be stripped from the binary. "
|
|
|
|
|
"Use -update-debug-sections to keep it.\n";
|
|
|
|
|
}
|
|
|
|
|
|
2018-04-20 20:03:31 -07:00
|
|
|
HasTextRelocations = (bool)BC->getUniqueSectionByName(".rela.text");
|
|
|
|
|
LSDASection = BC->getUniqueSectionByName(".gcc_except_table");
|
|
|
|
|
EHFrameSection = BC->getUniqueSectionByName(".eh_frame");
|
|
|
|
|
PLTSection = BC->getUniqueSectionByName(".plt");
|
|
|
|
|
GOTPLTSection = BC->getUniqueSectionByName(".got.plt");
|
|
|
|
|
PLTGOTSection = BC->getUniqueSectionByName(".plt.got");
|
|
|
|
|
RelaPLTSection = BC->getUniqueSectionByName(".rela.plt");
|
2019-08-26 15:03:38 -07:00
|
|
|
RelaDynSection = BC->getUniqueSectionByName(".rela.dyn");
|
2018-08-08 17:55:24 -07:00
|
|
|
BuildIDSection = BC->getUniqueSectionByName(".note.gnu.build-id");
|
2019-05-15 17:19:18 -07:00
|
|
|
SDTSection = BC->getUniqueSectionByName(".note.stapsdt");
|
2018-04-20 20:03:31 -07:00
|
|
|
|
2019-04-12 17:33:46 -07:00
|
|
|
if (auto BATSec =
|
|
|
|
|
BC->getUniqueSectionByName(BoltAddressTranslation::SECTION_NAME)) {
|
2019-10-11 13:32:14 -07:00
|
|
|
// Do not read BAT when plotting a heatmap
|
|
|
|
|
if (!opts::HeatmapMode) {
|
|
|
|
|
if (std::error_code EC = BAT->parse(BATSec->getContents())) {
|
|
|
|
|
errs() << "BOLT-ERROR: failed to parse BOLT address translation "
|
|
|
|
|
"table.\n";
|
|
|
|
|
exit(1);
|
|
|
|
|
}
|
2019-04-12 17:33:46 -07:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2018-02-01 16:33:43 -08:00
|
|
|
if (opts::PrintSections) {
|
|
|
|
|
outs() << "BOLT-INFO: Sections from original binary:\n";
|
|
|
|
|
BC->printSections(outs());
|
2015-11-23 17:54:18 -08:00
|
|
|
}
|
|
|
|
|
|
2017-12-09 21:40:39 -08:00
|
|
|
if (opts::RelocationMode == cl::BOU_TRUE && !HasTextRelocations) {
|
2017-03-22 22:05:50 -07:00
|
|
|
errs() << "BOLT-ERROR: relocations against code are missing from the input "
|
|
|
|
|
"file. Cannot proceed in relocations mode (-relocs).\n";
|
|
|
|
|
exit(1);
|
|
|
|
|
}
|
|
|
|
|
|
2017-12-09 21:40:39 -08:00
|
|
|
BC->HasRelocations = HasTextRelocations &&
|
|
|
|
|
(opts::RelocationMode != cl::BOU_FALSE);
|
2019-07-12 07:25:50 -07:00
|
|
|
|
2019-06-26 11:06:46 -07:00
|
|
|
// Force non-relocation mode for heatmap generation
|
|
|
|
|
if (opts::HeatmapMode) {
|
|
|
|
|
BC->HasRelocations = false;
|
|
|
|
|
}
|
2019-07-12 07:25:50 -07:00
|
|
|
|
2018-04-09 13:47:43 -07:00
|
|
|
if (BC->HasRelocations) {
|
2019-06-28 09:21:27 -07:00
|
|
|
outs() << "BOLT-INFO: enabling " << (opts::StrictMode ? "strict " : "")
|
|
|
|
|
<< "relocation mode\n";
|
2018-04-09 13:47:43 -07:00
|
|
|
}
|
2017-12-09 21:40:39 -08:00
|
|
|
|
2015-11-23 17:54:18 -08:00
|
|
|
// Process debug sections.
|
2016-02-25 16:57:07 -08:00
|
|
|
EHFrame = BC->DwCtx->getEHFrame();
|
2015-11-23 17:54:18 -08:00
|
|
|
if (opts::DumpEHFrame) {
|
2016-11-15 10:40:00 -08:00
|
|
|
outs() << "BOLT-INFO: Dumping original binary .eh_frame\n";
|
2018-03-30 15:49:34 -07:00
|
|
|
EHFrame->dump(outs(), &*BC->MRI, NoneType());
|
2015-11-23 17:54:18 -08:00
|
|
|
}
|
2016-11-14 16:39:55 -08:00
|
|
|
CFIRdWrt.reset(new CFIReaderWriter(*EHFrame));
|
2018-08-08 17:55:24 -07:00
|
|
|
|
|
|
|
|
// Parse build-id
|
|
|
|
|
parseBuildID();
|
2020-05-07 23:00:29 -07:00
|
|
|
if (auto FileBuildID = getPrintableBuildID()) {
|
|
|
|
|
BC->setFileBuildID(*FileBuildID);
|
2018-08-08 17:55:24 -07:00
|
|
|
}
|
2019-05-15 17:19:18 -07:00
|
|
|
|
|
|
|
|
parseSDTNotes();
|
2020-03-08 19:04:39 -07:00
|
|
|
|
2020-06-26 16:52:07 -07:00
|
|
|
// Read .dynamic/PT_DYNAMIC.
|
|
|
|
|
readELFDynamic();
|
2015-11-23 17:54:18 -08:00
|
|
|
}
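
The way `readSpecialSections` settles on relocation mode can be summarized as a small pure function. This is a hedged sketch rather than BOLT's implementation: `TriState` stands in for the tri-state `-relocs` option, `decideRelocationMode` is a hypothetical name, and the fatal case is reported as an empty `std::optional` (C++17) instead of calling `exit(1)`.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for a tri-state command-line flag such as -relocs.
enum class TriState { Unset, True, False };

// Returns std::nullopt for the fatal case: relocation mode was explicitly
// requested, but the input has no relocations against code.
inline std::optional<bool> decideRelocationMode(TriState Requested,
                                                bool HasTextRelocations,
                                                bool HeatmapMode) {
  if (Requested == TriState::True && !HasTextRelocations)
    return std::nullopt;

  // Use relocations when they are present unless the user opted out.
  bool HasRelocations = HasTextRelocations && Requested != TriState::False;

  // Heatmap generation always runs in non-relocation mode.
  if (HeatmapMode)
    HasRelocations = false;

  return HasRelocations;
}

// Small self-check showing the intended use.
inline void checkRelocationModeExamples() {
  assert(decideRelocationMode(TriState::Unset, true, false) == true);
  assert(!decideRelocationMode(TriState::True, false, false).has_value());
}
```

Keeping the decision free of side effects makes each combination of flags easy to check in isolation, at the cost of leaving the error message and process exit to the caller.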
|
|
|
|
|
|
2018-04-13 15:46:19 -07:00
|
|
|
void RewriteInstance::adjustCommandLineOptions() {
|
2018-09-21 12:00:20 -07:00
|
|
|
if (BC->isAArch64() && !BC->HasRelocations) {
|
2018-04-13 15:46:19 -07:00
|
|
|
errs() << "BOLT-WARNING: non-relocation mode for AArch64 is not fully "
|
|
|
|
|
"supported\n";
|
|
|
|
|
}
|
|
|
|
|
|
Adding automatic huge page support
Summary:
This patch enables automated hugify for Bolt.
When running Bolt against a binary with -hugify specified, Bolt will inject a call to a runtime library function at the entry of the binary. The runtime library calls madvise to map the hot code region into a 2M huge page. We support both new kernels with THP support and old kernels. For kernels with THP support we simply make a madvise call, while for old kernels, we first copy the code out, remap the memory with huge pages, and then copy the code back.
With this change, we no longer need to manually call into hugify_self and precompile it with --hot-text. Instead, we can simply combine the --hugify option with existing optimizations, and at runtime it will automatically move hot code into 2M pages.
Some details around the changes made:
1. Add a command line option to support --hugify. --hugify will automatically turn on --hot-text to get the proper hot code symbols. However, running with both --hugify and --hot-text is not allowed, since --hot-text is used on binaries that have a precompiled call to hugify_self, which contradicts the purpose of --hugify.
2. Moved the common utility functions out of instr.cpp to common.h, which will also be used by hugify.cpp. Added a few new system call definitions.
3. Added a new class that inherits RuntimeLibrary, and implemented the necessary emit and link logic for hugify.
4. Added a simple test for hugify.
(cherry picked from FBD21384529)
2020-05-02 11:14:38 -07:00
|
|
|
if (auto *RtLibrary = BC->getRuntimeLibrary()) {
|
|
|
|
|
RtLibrary->adjustCommandLineOptions(*BC);
|
2019-06-19 20:10:49 -07:00
|
|
|
}
|
|
|
|
|
|
2018-04-13 15:46:19 -07:00
|
|
|
if (opts::AlignMacroOpFusion != MFT_NONE && !BC->isX86()) {
|
|
|
|
|
outs() << "BOLT-INFO: disabling -align-macro-fusion on non-x86 platform\n";
|
|
|
|
|
opts::AlignMacroOpFusion = MFT_NONE;
|
|
|
|
|
}
|
2019-03-15 13:43:36 -07:00
|
|
|
|
[BOLT] Decoder cache friendly alignment wrt Intel JCC Erratum
Summary:
This diff ports reviews.llvm.org/D70157 to our LLVM tree, which
makes the integrated assembler able to align X86 control-flow changing
instructions in a way to reduce the performance impact of the ucode
update on Intel processors that implement the JCC erratum mitigation.
See white paper "Mitigations for Jump Conditional Code Erratum" by Intel
published November 2019.
To port this patch, I changed classifySecondInstInMacroFusion to analyze
instruction opcodes directly instead of analyzing the CondCond operand
(in more recent versions of LLVM, all conditional branches share the
same opcode, but with a different conditional operand). I also pulled to
our tree Alignment.h as a dependency, and the macroop analyzing helpers.
x86-align-branch-boundary and -x86-align-branch are the two flags that
control nop insertion to avoid disabling the decoder cache, following
the original patch. In BOLT, I added the flag
x86-align-branch-boundary-hot-only to request the alignment to only be
applied to hot code, which is turned on by default. The reason is
that such alignment is expensive to perform on large modules, but if
we limit it to hot code, the relaxation pass runtime becomes tolerable.
(cherry picked from FBD19828850)
2020-02-10 18:50:53 -08:00
|
|
|
if ((X86AlignBranchWithin32BBoundaries || X86AlignBranchBoundary != 0) &&
|
|
|
|
|
BC->isX86()) {
|
|
|
|
|
if (!BC->HasRelocations) {
|
|
|
|
|
errs() << "BOLT-ERROR: cannot apply mitigations for Intel JCC erratum in "
|
|
|
|
|
"non-relocation mode\n";
|
|
|
|
|
exit(1);
|
|
|
|
|
}
|
|
|
|
|
outs() << "BOLT-WARNING: using mitigation for Intel JCC erratum, layout "
|
|
|
|
|
"may take several minutes\n";
|
|
|
|
|
opts::AlignMacroOpFusion = MFT_NONE;
|
|
|
|
|
}
|
|
|
|
|
|
2020-04-19 15:02:50 -07:00
|
|
|
if (opts::AlignMacroOpFusion != MFT_NONE && !BC->HasRelocations) {
|
2018-04-13 15:46:19 -07:00
|
|
|
outs() << "BOLT-INFO: disabling -align-macro-fusion in non-relocation "
|
|
|
|
|
"mode\n";
|
|
|
|
|
opts::AlignMacroOpFusion = MFT_NONE;
|
|
|
|
|
}
|
2019-03-15 13:43:36 -07:00
|
|
|
|
2018-06-25 14:55:48 -07:00
|
|
|
if (opts::SplitEH && !BC->HasRelocations) {
|
2019-04-25 17:00:05 -07:00
|
|
|
errs() << "BOLT-WARNING: disabling -split-eh in non-relocation mode\n";
|
2018-06-25 14:55:48 -07:00
|
|
|
opts::SplitEH = false;
|
|
|
|
|
}
|
2019-03-15 13:43:36 -07:00
|
|
|
|
2019-06-28 09:21:27 -07:00
|
|
|
if (opts::StrictMode && !BC->HasRelocations) {
|
|
|
|
|
errs() << "BOLT-WARNING: disabling strict mode (-strict) in non-relocation "
|
|
|
|
|
"mode\n";
|
|
|
|
|
opts::StrictMode = false;
|
|
|
|
|
}
|
|
|
|
|
|
2019-06-11 13:24:10 -07:00
|
|
|
if (BC->HasRelocations && opts::AggregateOnly &&
|
|
|
|
|
!opts::StrictMode.getNumOccurrences()) {
|
2019-11-14 16:07:11 -08:00
|
|
|
outs() << "BOLT-INFO: enabling strict relocation mode for aggregation "
|
2019-06-11 13:24:10 -07:00
|
|
|
"purposes\n";
|
|
|
|
|
opts::StrictMode = true;
|
|
|
|
|
}
|
|
|
|
|
|
2018-04-13 15:46:19 -07:00
|
|
|
if (BC->isX86() && BC->HasRelocations &&
|
2020-05-07 23:00:29 -07:00
|
|
|
opts::AlignMacroOpFusion == MFT_HOT && !ProfileReader) {
|
2018-04-13 15:46:19 -07:00
|
|
|
outs() << "BOLT-INFO: enabling -align-macro-fusion=all since no profile "
|
|
|
|
|
"was specified\n";
|
|
|
|
|
opts::AlignMacroOpFusion = MFT_ALL;
|
|
|
|
|
}
|
2019-03-14 18:51:05 -07:00
|
|
|
|
2019-04-25 17:00:05 -07:00
|
|
|
if (!BC->HasRelocations &&
|
|
|
|
|
opts::ReorderFunctions != ReorderFunctions::RT_NONE) {
|
|
|
|
|
errs() << "BOLT-ERROR: function reordering only works when "
|
|
|
|
|
<< "relocations are enabled\n";
|
|
|
|
|
exit(1);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (opts::ReorderFunctions != ReorderFunctions::RT_NONE &&
|
|
|
|
|
!opts::HotText.getNumOccurrences()) {
|
|
|
|
|
opts::HotText = true;
|
|
|
|
|
} else if (opts::HotText && !BC->HasRelocations) {
|
|
|
|
|
errs() << "BOLT-WARNING: hot text is disabled in non-relocation mode\n";
|
2019-03-14 18:51:05 -07:00
|
|
|
opts::HotText = false;
|
|
|
|
|
}
|
2019-03-15 13:43:36 -07:00
|
|
|
|
|
|
|
|
if (opts::HotText && opts::HotTextMoveSections.getNumOccurrences() == 0) {
|
|
|
|
|
opts::HotTextMoveSections.addValue(".stub");
|
|
|
|
|
opts::HotTextMoveSections.addValue(".mover");
|
2019-04-16 10:39:05 -07:00
|
|
|
opts::HotTextMoveSections.addValue(".never_hugify");
|
2019-03-15 13:43:36 -07:00
|
|
|
}
|
2020-02-24 17:12:41 -08:00
|
|
|
|
|
|
|
|
if (opts::UseOldText && !BC->OldTextSectionAddress) {
|
|
|
|
|
errs() << "BOLT-WARNING: cannot use old .text as the section was not found"
|
|
|
|
|
"\n";
|
|
|
|
|
opts::UseOldText = false;
|
|
|
|
|
}
|
[BOLT] Support for lite mode with relocations
Summary:
Add '-lite' support for relocations for improved processing time,
memory consumption, and more resilient processing of binaries with
embedded assembly code.
In lite relocation mode, BOLT will skip full processing of functions
without a profile. It will run scanExternalRefs() on such functions
to discover external references and to create internal relocations
to update references to optimized functions.
Note that we could have relied on the compiler/linker to provide
relocations for function references. However, there's no assurance
that all such references are reported. E.g., the compiler can resolve
inter-procedural references internally, leaving no relocations
for the linker.
The scan process takes under 10 seconds per 100MB of code on modern
hardware. It's a reasonable overhead to live with considering the
flexibility it provides.
If BOLT fails to scan or disassemble a function, e.g., due to a data
object embedded in code, or an unsupported instruction, it enables a
patching mode to guarantee that the failed function will call
optimized/moved versions of functions. The patching happens at original
function entry points.
'-skip=<func1,func2,...>' option now can be used to skip processing of
arbitrary functions in the relocation mode.
With '-use-old-text' or '-strict' we require all functions to be
processed. As such, it is incompatible with '-lite' option,
and '-skip' option will only disable optimizations of listed
functions, not their disassembly and emission.
(cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
|
|
|
if (opts::UseOldText && !BC->HasRelocations) {
|
|
|
|
|
errs() << "BOLT-WARNING: cannot use old .text in non-relocation mode\n";
|
|
|
|
|
opts::UseOldText = false;
|
|
|
|
|
}
|
|
|
|
|
|
2020-04-19 15:02:50 -07:00
|
|
|
|
|
|
|
|
if (!opts::AlignText.getNumOccurrences()) {
|
|
|
|
|
opts::AlignText = BC->PageAlign;
|
|
|
|
|
}
|
2020-05-03 15:49:58 -07:00
|
|
|
|
[BOLT] Support for lite mode with relocations
2020-06-15 00:15:47 -07:00
|
|
|
if (opts::Lite.getNumOccurrences() == 0 && !BC->HasRelocations) {
|
2020-05-03 15:49:58 -07:00
|
|
|
opts::Lite = true;
|
[BOLT] Support for lite mode with relocations
2020-06-15 00:15:47 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (opts::Lite && opts::UseOldText) {
|
|
|
|
|
errs() << "BOLT-WARNING: cannot combine -lite with -use-old-text. "
|
|
|
|
|
"Disabling -use-old-text.\n";
|
|
|
|
|
opts::UseOldText = false;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (opts::StrictMode && opts::Lite) {
|
|
|
|
|
errs() << "BOLT-ERROR: -strict and -lite cannot be used at the same time\n";
|
|
|
|
|
exit(1);
|
2020-05-03 15:49:58 -07:00
|
|
|
}
|
2018-04-13 15:46:19 -07:00
|
|
|
}
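
Several checks in `adjustCommandLineOptions` apply a default only when the user did not pass the flag, which is what the `getNumOccurrences()` tests are for. The sketch below illustrates that pattern for the `-lite`, `-use-old-text`, and `-strict` interaction. It is an assumption-laden miniature, not BOLT code: `BoolOption`, `Options`, and `reconcileLiteOptions` are hypothetical names, and the fatal conflict throws an exception where the function above prints an error and exits.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical miniature of a command-line flag that remembers whether the
// user set it explicitly.
struct BoolOption {
  bool Value = false;
  unsigned NumOccurrences = 0;

  // Simulates the flag appearing on the command line.
  void set(bool V) {
    Value = V;
    ++NumOccurrences;
  }
};

struct Options {
  BoolOption Lite;
  BoolOption UseOldText;
  BoolOption Strict;
};

// Hypothetical helper mirroring the order of the checks: default first, then
// the soft conflict, then the hard conflict.
inline void reconcileLiteOptions(Options &Opts, bool HasRelocations) {
  // Default to lite mode only if the user expressed no preference.
  if (Opts.Lite.NumOccurrences == 0 && !HasRelocations)
    Opts.Lite.Value = true;

  // Soft conflict: keep lite mode and drop the other option.
  if (Opts.Lite.Value && Opts.UseOldText.Value)
    Opts.UseOldText.Value = false;

  // Hard conflict: report an error to the caller.
  if (Opts.Strict.Value && Opts.Lite.Value)
    throw std::invalid_argument(
        "-strict and -lite cannot be used at the same time");
}

// Small self-check showing the intended use.
inline void checkLiteDefault() {
  Options Opts;
  reconcileLiteOptions(Opts, /*HasRelocations=*/false);
  assert(Opts.Lite.Value);
}
```

Tracking occurrences separately from the value is what lets an explicit `-lite=0` survive the defaulting step, whereas a plain boolean could not distinguish it from an unset flag.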
|
|
|
|
|
|
2016-09-27 19:09:38 -07:00
|
|
|
namespace {
|
|
|
|
|
template <typename ELFT>
|
|
|
|
|
int64_t getRelocationAddend(const ELFObjectFile<ELFT> *Obj,
|
|
|
|
|
const RelocationRef &RelRef) {
|
|
|
|
|
int64_t Addend = 0;
|
|
|
|
|
const ELFFile<ELFT> &EF = *Obj->getELFFile();
|
|
|
|
|
DataRefImpl Rel = RelRef.getRawDataRefImpl();
|
[BOLT rebase] Rebase fixes on top of LLVM Feb2018
Summary:
This commit includes all code necessary to make BOLT work again
after the rebase. This includes a redesign of the EHFrame work,
cherry-pick of the 3dnow disassembly work, compilation error fixes,
and port of the debug_info work. The macroop fusion feature is not
ported yet.
The rebased version has minor changes to the "executed instructions"
dynostats counter because REP prefixes are considered a part of the
instruction it applies to. Also, some X86 instructions had the "mayLoad"
tablegen property removed, which BOLT uses to identify and account
for loads, thus reducing the total number of loads reported by
dynostats. This was observed in X86::MOVDQUmr. TRAP instructions are
not terminators anymore, changing our CFG. This commit adds compensation
to preserve this old behavior and minimize test changes. debug_info
sections are now slightly larger. The discriminator field in the line
table is slightly different due to a change upstream. New profiles
generated with the other bolt are incompatible with this version
because of different hash values calculated for functions, so they will
be considered 100% stale. This commit changes the corresponding test
to XFAIL so it can be updated. The hash function changes because it
relies on raw opcode values, which change according to the opcodes
described in the X86 tablegen files. When processing HHVM, bolt was
observed to be using about 800MB more memory in the rebased version
and being about 5% slower.
(cherry picked from FBD7078072)
2018-02-06 15:00:23 -08:00
|
|
|
const auto *RelocationSection = cantFail(EF.getSection(Rel.d.a));
|
2016-09-27 19:09:38 -07:00
|
|
|
switch (RelocationSection->sh_type) {
|
|
|
|
|
default: llvm_unreachable("unexpected relocation section type");
|
|
|
|
|
case ELF::SHT_REL:
|
|
|
|
|
break;
|
|
|
|
|
case ELF::SHT_RELA: {
|
|
|
|
|
const auto *RelA = Obj->getRela(Rel);
|
|
|
|
|
Addend = RelA->r_addend;
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return Addend;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
int64_t getRelocationAddend(const ELFObjectFileBase *Obj,
|
|
|
|
|
const RelocationRef &Rel) {
|
|
|
|
|
if (auto *ELF32LE = dyn_cast<ELF32LEObjectFile>(Obj))
|
|
|
|
|
return getRelocationAddend(ELF32LE, Rel);
|
|
|
|
|
if (auto *ELF64LE = dyn_cast<ELF64LEObjectFile>(Obj))
|
|
|
|
|
return getRelocationAddend(ELF64LE, Rel);
|
|
|
|
|
if (auto *ELF32BE = dyn_cast<ELF32BEObjectFile>(Obj))
|
|
|
|
|
return getRelocationAddend(ELF32BE, Rel);
|
|
|
|
|
auto *ELF64BE = cast<ELF64BEObjectFile>(Obj);
|
|
|
|
|
return getRelocationAddend(ELF64BE, Rel);
|
|
|
|
|
}
|
|
|
|
|
} // anonymous namespace
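The REL/RELA distinction handled by getRelocationAddend() can be shown with a reduced model. The sketch below assumes a hypothetical `RelocEntry` record instead of the LLVM object-file types. The numeric section-type values are the standard ELF ones.

```cpp
#include <cassert>
#include <cstdint>

// ELF section types as defined by the ELF specification.
constexpr uint32_t ShtRela = 4; // entries carry an explicit r_addend
constexpr uint32_t ShtRel = 9;  // addend is implicit (stored at the site)

// Simplified, hypothetical relocation record; not the LLVM type.
struct RelocEntry {
  uint64_t Offset;
  int64_t Addend; // meaningful only for RELA sections
};

// Mirrors the switch above: REL sections yield 0, RELA sections yield
// the addend stored in the entry.
int64_t getAddend(uint32_t SectionType, const RelocEntry &Entry) {
  switch (SectionType) {
  case ShtRel:
    return 0;
  case ShtRela:
    return Entry.Addend;
  default:
    assert(false && "unexpected relocation section type");
    return 0;
  }
}
```

For a REL section the addend has to be recovered from the bytes at the relocation site, which is why the caller also extracts the value stored at the relocation offset.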
|
|
|
|
|
|
2018-01-24 05:42:11 -08:00
|
|
|
bool RewriteInstance::analyzeRelocation(const RelocationRef &Rel,
|
2019-04-11 17:11:08 -07:00
|
|
|
uint64_t RType,
|
2018-01-24 05:42:11 -08:00
|
|
|
std::string &SymbolName,
|
2018-09-21 12:00:20 -07:00
|
|
|
bool &IsSectionRelocation,
|
2018-01-24 05:42:11 -08:00
|
|
|
uint64_t &SymbolAddress,
|
|
|
|
|
int64_t &Addend,
|
|
|
|
|
uint64_t &ExtractedValue) const {
|
2019-04-11 17:11:08 -07:00
|
|
|
if (!Relocation::isSupported(RType))
|
2018-01-24 05:42:11 -08:00
|
|
|
return false;
|
|
|
|
|
|
2018-03-20 14:34:58 -07:00
|
|
|
const bool IsAArch64 = BC->isAArch64();
|
2018-02-06 15:00:23 -08:00
|
|
|
|
2019-04-11 17:11:08 -07:00
|
|
|
const auto RelSize = Relocation::getSizeForType(RType);
|
2019-04-09 12:29:40 -07:00
|
|
|
|
|
|
|
|
auto Value = BC->getUnsignedValueAtAddress(Rel.getOffset(), RelSize);
|
|
|
|
|
assert(Value && "failed to extract relocated value");
|
|
|
|
|
ExtractedValue = *Value;
|
2018-01-24 05:42:11 -08:00
|
|
|
if (IsAArch64) {
|
2019-04-11 17:11:08 -07:00
|
|
|
ExtractedValue = Relocation::extractValue(RType,
|
2018-01-24 05:42:11 -08:00
|
|
|
ExtractedValue,
|
|
|
|
|
Rel.getOffset());
|
|
|
|
|
}
|
|
|
|
|
|
2018-09-21 12:00:20 -07:00
|
|
|
Addend = getRelocationAddend(InputFile, Rel);
|
|
|
|
|
|
2019-04-11 17:11:08 -07:00
|
|
|
const auto IsPCRelative = Relocation::isPCRelative(RType);
|
2018-02-06 15:00:23 -08:00
|
|
|
const auto PCRelOffset = IsPCRelative && !IsAArch64 ? Rel.getOffset() : 0;
|
2018-09-21 12:00:20 -07:00
|
|
|
bool SkipVerification = false;
|
|
|
|
|
auto SymbolIter = Rel.getSymbol();
|
|
|
|
|
if (SymbolIter == InputFile->symbol_end()) {
|
2019-06-27 03:20:17 -07:00
|
|
|
SymbolAddress = ExtractedValue - Addend + PCRelOffset;
|
2018-09-21 12:00:20 -07:00
|
|
|
auto *RelSymbol = BC->getOrCreateGlobalSymbol(SymbolAddress, "RELSYMat");
|
|
|
|
|
SymbolName = RelSymbol->getName();
|
|
|
|
|
IsSectionRelocation = false;
|
|
|
|
|
} else {
|
|
|
|
|
const auto &Symbol = *SymbolIter;
|
|
|
|
|
SymbolName = cantFail(Symbol.getName());
|
|
|
|
|
SymbolAddress = cantFail(Symbol.getAddress());
|
|
|
|
|
SkipVerification = (cantFail(Symbol.getType()) == SymbolRef::ST_Other);
|
|
|
|
|
// Section symbols are marked as ST_Debug.
|
|
|
|
|
IsSectionRelocation = (cantFail(Symbol.getType()) == SymbolRef::ST_Debug);
|
2019-06-27 03:20:17 -07:00
|
|
|
}
|
2019-11-14 16:07:11 -08:00
|
|
|
// For PIE or dynamic libs, the linker may choose not to put the relocation
|
|
|
|
|
// result at the address if it is an R_X86_64_64 one because it will emit a
|
|
|
|
|
// dynamic relocation (R_X86_64_RELATIVE) for the dynamic linker and loader to
|
|
|
|
|
// resolve it at run time. The static relocation result goes as the addend
|
|
|
|
|
// of the dynamic relocation in this case. We can't verify these cases.
|
|
|
|
|
// FIXME: perhaps we can try to find if it really emitted a corresponding
|
|
|
|
|
// RELATIVE relocation at this offset with the correct value as the addend.
|
|
|
|
|
if (!BC->HasFixedLoadAddress && RelSize == 8)
|
|
|
|
|
SkipVerification = true;
|
2019-06-27 03:20:17 -07:00
|
|
|
|
|
|
|
|
if (IsSectionRelocation && !IsAArch64) {
|
|
|
|
|
auto Section = BC->getSectionForAddress(SymbolAddress);
|
|
|
|
|
assert(Section && "section expected for section relocation");
|
|
|
|
|
SymbolName = "section " + std::string(Section->getName());
|
|
|
|
|
// Convert section symbol relocations to regular relocations inside
|
|
|
|
|
// non-section symbols.
|
|
|
|
|
if (Section->containsAddress(ExtractedValue) && !IsPCRelative) {
|
|
|
|
|
SymbolAddress = ExtractedValue;
|
|
|
|
|
Addend = 0;
|
|
|
|
|
} else {
|
|
|
|
|
Addend = ExtractedValue - (SymbolAddress - PCRelOffset);
|
2018-09-21 12:00:20 -07:00
|
|
|
}
|
|
|
|
|
}
|
2018-01-24 05:42:11 -08:00
|
|
|
|
|
|
|
|
// If no symbol has been found or if it is a relocation requiring the
|
|
|
|
|
// creation of a GOT entry, do not link against the symbol but against
|
|
|
|
|
// whatever address was extracted from the instruction itself. We are
|
|
|
|
|
// not creating a GOT entry as this was already processed by the linker.
|
2018-10-11 18:12:09 -07:00
|
|
|
// For GOT relocs, do not subtract addend as the addend does not refer
|
|
|
|
|
// to this instruction's target, but it refers to the target in the GOT
|
|
|
|
|
// entry.
|
2019-04-11 17:11:08 -07:00
|
|
|
if (Relocation::isGOT(RType)) {
|
2018-10-11 18:12:09 -07:00
|
|
|
Addend = 0;
|
|
|
|
|
SymbolAddress = ExtractedValue + PCRelOffset;
|
|
|
|
|
} else if (!SymbolAddress) {
|
2018-09-21 12:00:20 -07:00
|
|
|
assert(!IsSectionRelocation);
|
2018-10-11 18:12:09 -07:00
|
|
|
if (ExtractedValue || Addend == 0 || IsPCRelative) {
|
2020-06-30 19:58:43 -07:00
|
|
|
SymbolAddress = truncateToSize(ExtractedValue - Addend + PCRelOffset,
|
|
|
|
|
RelSize);
|
2018-01-24 05:42:11 -08:00
|
|
|
} else {
|
|
|
|
|
// This is a weird case. The extracted value is zero but the addend is
|
|
|
|
|
// non-zero and the relocation is not pc-rel. Using the previous logic,
|
|
|
|
|
// the SymbolAddress would end up as a huge number. Seen in
|
|
|
|
|
// exceptions_pic.test.
|
2017-11-14 20:05:11 -08:00
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: relocation @ 0x"
|
2018-01-24 05:42:11 -08:00
|
|
|
<< Twine::utohexstr(Rel.getOffset())
|
|
|
|
|
<< " value does not match addend for "
|
2017-11-14 20:05:11 -08:00
|
|
|
<< "relocation to undefined symbol.\n");
|
2018-01-24 05:42:11 -08:00
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2018-07-30 10:29:47 -07:00
|
|
|
auto verifyExtractedValue = [&]() {
|
2018-09-21 12:00:20 -07:00
|
|
|
if (SkipVerification)
|
|
|
|
|
return true;
|
|
|
|
|
|
2018-07-30 10:29:47 -07:00
|
|
|
if (IsAArch64)
|
|
|
|
|
return true;
|
|
|
|
|
|
|
|
|
|
if (SymbolName == "__hot_start" || SymbolName == "__hot_end")
|
|
|
|
|
return true;
|
|
|
|
|
|
2019-04-11 17:11:08 -07:00
|
|
|
if (Relocation::isTLS(RType))
|
2018-07-30 10:29:47 -07:00
|
|
|
return true;
|
|
|
|
|
|
2020-06-30 19:58:43 -07:00
|
|
|
if (RType == ELF::R_X86_64_PLT32)
|
|
|
|
|
return true;
|
|
|
|
|
|
2018-07-30 10:29:47 -07:00
|
|
|
return truncateToSize(ExtractedValue, RelSize) ==
|
|
|
|
|
truncateToSize(SymbolAddress + Addend - PCRelOffset, RelSize);
|
|
|
|
|
};
|
2018-01-24 05:42:11 -08:00
|
|
|
|
2018-07-30 10:29:47 -07:00
|
|
|
assert(verifyExtractedValue() && "mismatched extracted relocation value");
|
2018-01-24 05:42:11 -08:00
|
|
|
|
|
|
|
|
return true;
|
|
|
|
|
}
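The final check in analyzeRelocation() compares the bytes found at the relocation site with the value predicted from the symbol address, the addend, and the PC-relative offset, all modulo the relocation width. A minimal sketch of that comparison follows. The `truncateToSize` here is a stand-in written for this example (the real helper is not shown in this section), and `valueMatches` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in helper: keep only the low `Bytes` bytes of Value.
uint64_t truncateToSize(uint64_t Value, unsigned Bytes) {
  if (Bytes >= 8)
    return Value;
  return Value & ((uint64_t{1} << (Bytes * 8)) - 1);
}

// True when the extracted bytes equal SymbolAddress + Addend - PCRelOffset
// once both sides are truncated to the relocation field width.
bool valueMatches(uint64_t Extracted, uint64_t SymbolAddress, int64_t Addend,
                  uint64_t PCRelOffset, unsigned RelSize) {
  return truncateToSize(Extracted, RelSize) ==
         truncateToSize(SymbolAddress + Addend - PCRelOffset, RelSize);
}
```

Truncating both sides matters for backward references: a negative 32-bit displacement is stored as its two's-complement bit pattern, so the comparison only holds after the 64-bit result is cut down to 4 bytes.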
|
|
|
|
|
|
2020-02-24 17:10:02 -08:00
|
|
|
void RewriteInstance::processRelocations() {
|
|
|
|
|
if (!BC->HasRelocations)
|
|
|
|
|
return;
|
|
|
|
|
|
2020-06-23 12:22:58 -07:00
|
|
|
// Read dynamic relocations first as their presence affects the way we process
|
|
|
|
|
// static relocations. E.g. we will ignore a static relocation at an address
|
|
|
|
|
// that is subject to dynamic relocation processing.
|
2020-02-24 17:10:02 -08:00
|
|
|
for (const auto &Section : InputFile->sections()) {
|
2020-06-23 12:22:58 -07:00
|
|
|
if (Section.relocation_begin() != Section.relocation_end() &&
|
|
|
|
|
BinarySection(*BC, Section).isAllocatable()) {
|
|
|
|
|
readDynamicRelocations(Section);
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
for (const auto &Section : InputFile->sections()) {
|
|
|
|
|
if (Section.getRelocatedSection() != InputFile->section_end() &&
|
|
|
|
|
!BinarySection(*BC, Section).isAllocatable()) {
|
2020-02-24 17:10:02 -08:00
|
|
|
readRelocations(Section);
|
2020-06-23 12:22:58 -07:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
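The ordering in processRelocations() matters because the static pass later skips any address already covered by a dynamic relocation. The sketch below models that dependency with plain containers. `Reloc` and `filterStatic` are hypothetical names, and the real code keeps the dynamic relocations in the BinaryContext instead of a local set.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified relocation record.
struct Reloc {
  uint64_t Offset;
};

// Pass 1 records every address covered by a dynamic relocation. Pass 2
// keeps only the static relocations whose offset was not recorded.
std::vector<Reloc> filterStatic(const std::vector<Reloc> &Dynamic,
                                const std::vector<Reloc> &Static) {
  std::set<uint64_t> DynamicOffsets;
  for (const Reloc &R : Dynamic)
    DynamicOffsets.insert(R.Offset);

  std::vector<Reloc> Kept;
  for (const Reloc &R : Static)
    if (!DynamicOffsets.count(R.Offset))
      Kept.push_back(R);
  return Kept;
}
```

If the two passes ran in the opposite order, the static pass would have no record of the dynamic relocations and could not skip the overlapping addresses.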
|
|
|
|
|
|
|
|
|
|
void RewriteInstance::readDynamicRelocations(const SectionRef &Section) {
|
|
|
|
|
if (!BC->DynamicRelocationsAddress || !BC->DynamicRelocationsSize)
|
|
|
|
|
return;
|
|
|
|
|
|
|
|
|
|
assert(BinarySection(*BC, Section).isAllocatable() && "allocatable expected");
|
|
|
|
|
|
|
|
|
|
if (Section.getAddress() < *BC->DynamicRelocationsAddress ||
|
|
|
|
|
Section.getAddress() >=
|
|
|
|
|
*BC->DynamicRelocationsAddress + *BC->DynamicRelocationsSize)
|
|
|
|
|
return;
|
|
|
|
|
|
|
|
|
|
assert(Section.getAddress() + Section.getSize() <=
|
|
|
|
|
*BC->DynamicRelocationsAddress + *BC->DynamicRelocationsSize &&
|
|
|
|
|
"dynamic relocations section runs over ELF dynamic boundaries");
|
|
|
|
|
|
|
|
|
|
StringRef SectionName;
|
|
|
|
|
Section.getName(SectionName);
|
|
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: reading relocations for section "
|
|
|
|
|
<< SectionName << ":\n");
|
|
|
|
|
|
|
|
|
|
for (const auto &Rel : Section.relocations()) {
|
|
|
|
|
auto SymbolIter = Rel.getSymbol();
|
|
|
|
|
|
|
|
|
|
StringRef SymbolName = "<none>";
|
|
|
|
|
MCSymbol *Symbol = nullptr;
|
|
|
|
|
uint64_t SymbolAddress = 0;
|
|
|
|
|
const uint64_t Addend = getRelocationAddend(InputFile, Rel);
|
|
|
|
|
|
|
|
|
|
if (SymbolIter != InputFile->symbol_end()) {
|
|
|
|
|
SymbolName = cantFail(SymbolIter->getName());
|
|
|
|
|
auto *BD = BC->getBinaryDataByName(SymbolName);
|
|
|
|
|
Symbol = BD ? BD->getSymbol() : nullptr;
|
|
|
|
|
SymbolAddress = cantFail(SymbolIter->getAddress());
|
|
|
|
|
(void)SymbolAddress;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
DEBUG(
|
|
|
|
|
SmallString<16> TypeName;
|
|
|
|
|
Rel.getTypeName(TypeName);
|
|
|
|
|
dbgs() << "BOLT-DEBUG: dynamic relocation at 0x"
|
|
|
|
|
<< Twine::utohexstr(Rel.getOffset()) << " : " << TypeName
|
|
|
|
|
<< " : " << SymbolName << " : " << Twine::utohexstr(SymbolAddress)
|
|
|
|
|
<< " : + 0x" << Twine::utohexstr(Addend) << '\n'
|
|
|
|
|
);
|
|
|
|
|
|
|
|
|
|
BC->addDynamicRelocation(Rel.getOffset(), Symbol, Rel.getType(), Addend);
|
2020-02-24 17:10:02 -08:00
|
|
|
}
|
|
|
|
|
}
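readDynamicRelocations() only accepts sections that start inside the dynamic relocation table described by the ELF dynamic section. The range test is a half-open interval check, sketched here with a hypothetical free function and plain integers.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when a section starting at SectionAddress begins inside
// the half-open range [TableAddress, TableAddress + TableSize).
bool startsInDynamicRelocTable(uint64_t SectionAddress, uint64_t TableAddress,
                               uint64_t TableSize) {
  return SectionAddress >= TableAddress &&
         SectionAddress < TableAddress + TableSize;
}
```

The source additionally asserts that the section's end does not run past the end of the table. That check is omitted from this sketch.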
|
|
|
|
|
|
|
|
|
|
void RewriteInstance::readRelocations(const SectionRef &Section) {
|
2016-09-27 19:09:38 -07:00
|
|
|
StringRef SectionName;
|
|
|
|
|
Section.getName(SectionName);
|
2019-06-28 09:21:27 -07:00
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: reading relocations for section "
|
2016-09-27 19:09:38 -07:00
|
|
|
<< SectionName << ":\n");
|
2018-09-21 12:00:20 -07:00
|
|
|
if (BinarySection(*BC, Section).isAllocatable()) {
|
2016-09-27 19:09:38 -07:00
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: ignoring runtime relocations\n");
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
auto SecIter = Section.getRelocatedSection();
|
|
|
|
|
assert(SecIter != InputFile->section_end() && "relocated section expected");
|
|
|
|
|
auto RelocatedSection = *SecIter;
|
|
|
|
|
|
|
|
|
|
StringRef RelocatedSectionName;
|
|
|
|
|
RelocatedSection.getName(RelocatedSectionName);
|
|
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: relocated section is "
|
|
|
|
|
<< RelocatedSectionName << '\n');
|
|
|
|
|
|
2018-09-21 12:00:20 -07:00
|
|
|
if (!BinarySection(*BC, RelocatedSection).isAllocatable()) {
|
2016-09-27 19:09:38 -07:00
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: ignoring relocations against "
|
|
|
|
|
<< "non-allocatable section\n");
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
const bool SkipRelocs = StringSwitch<bool>(RelocatedSectionName)
|
2019-06-28 09:21:27 -07:00
|
|
|
.Cases(".plt", ".rela.plt", ".got.plt", ".eh_frame", ".gcc_except_table",
|
|
|
|
|
true)
|
2016-09-27 19:09:38 -07:00
|
|
|
.Default(false);
|
|
|
|
|
if (SkipRelocs) {
|
|
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: ignoring relocations against known section\n");
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
|
2018-03-20 14:34:58 -07:00
|
|
|
const bool IsAArch64 = BC->isAArch64();
|
2018-01-24 05:42:11 -08:00
|
|
|
const bool IsFromCode = RelocatedSection.isText();
|
2016-09-27 19:09:38 -07:00
|
|
|
|
2017-11-14 20:05:11 -08:00
|
|
|
auto printRelocationInfo = [&](const RelocationRef &Rel,
|
|
|
|
|
StringRef SymbolName,
|
|
|
|
|
uint64_t SymbolAddress,
|
|
|
|
|
uint64_t Addend,
|
|
|
|
|
uint64_t ExtractedValue) {
|
|
|
|
|
SmallString<16> TypeName;
|
|
|
|
|
Rel.getTypeName(TypeName);
|
|
|
|
|
const auto Address = SymbolAddress + Addend;
|
|
|
|
|
auto Section = BC->getSectionForAddress(SymbolAddress);
|
|
|
|
|
dbgs() << "Relocation: offset = 0x"
|
|
|
|
|
<< Twine::utohexstr(Rel.getOffset())
|
2018-07-12 10:13:03 -07:00
|
|
|
<< "; type = " << TypeName
|
2017-11-14 20:05:11 -08:00
|
|
|
<< "; value = 0x" << Twine::utohexstr(ExtractedValue)
|
|
|
|
|
<< "; symbol = " << SymbolName
|
|
|
|
|
<< " (" << (Section ? Section->getName() : "") << ")"
|
|
|
|
|
<< "; symbol address = 0x" << Twine::utohexstr(SymbolAddress)
|
|
|
|
|
<< "; addend = 0x" << Twine::utohexstr(Addend)
|
|
|
|
|
<< "; address = 0x" << Twine::utohexstr(Address)
|
|
|
|
|
<< "; in = ";
|
2019-04-03 15:52:01 -07:00
|
|
|
if (auto *Func = BC->getBinaryFunctionContainingAddress(Rel.getOffset(),
|
|
|
|
|
false,
|
|
|
|
|
IsAArch64)) {
|
2017-11-14 20:05:11 -08:00
|
|
|
dbgs() << Func->getPrintName() << "\n";
|
|
|
|
|
} else {
|
|
|
|
|
dbgs() << BC->getSectionForAddress(Rel.getOffset())->getName() << "\n";
|
|
|
|
|
}
|
|
|
|
|
};
|
|
|
|
|
|
2016-09-27 19:09:38 -07:00
|
|
|
for (const auto &Rel : Section.relocations()) {
|
|
|
|
|
SmallString<16> TypeName;
|
|
|
|
|
Rel.getTypeName(TypeName);
|
2019-04-11 17:11:08 -07:00
|
|
|
auto RType = Rel.getType();
|
|
|
|
|
|
|
|
|
|
// Adjust the relocation type as the linker might have skewed it.
|
|
|
|
|
if (BC->isX86() && (RType & ELF::R_X86_64_converted_reloc_bit)) {
|
|
|
|
|
if (opts::Verbosity >= 1) {
|
|
|
|
|
dbgs() << "BOLT-WARNING: ignoring R_X86_64_converted_reloc_bit\n";
|
|
|
|
|
}
|
|
|
|
|
RType &= ~ELF::R_X86_64_converted_reloc_bit;
|
|
|
|
|
}
|
2018-01-24 05:42:11 -08:00
|
|
|
|
2019-06-28 09:21:27 -07:00
|
|
|
// No special handling required for TLS relocations.
|
|
|
|
|
if (Relocation::isTLS(RType))
|
|
|
|
|
continue;
|
|
|
|
|
|
2020-06-23 12:22:58 -07:00
|
|
|
if (BC->getDynamicRelocationAt(Rel.getOffset())) {
|
|
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: address 0x"
|
|
|
|
|
<< Twine::utohexstr(Rel.getOffset())
|
|
|
|
|
<< " has a dynamic relocation against it. Ignoring static "
|
|
|
|
|
"relocation.\n");
|
|
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
|
2018-01-24 05:42:11 -08:00
|
|
|
std::string SymbolName;
|
|
|
|
|
uint64_t SymbolAddress;
|
|
|
|
|
int64_t Addend;
|
|
|
|
|
uint64_t ExtractedValue;
|
2018-09-21 12:00:20 -07:00
|
|
|
bool IsSectionRelocation;
|
2018-01-24 05:42:11 -08:00
|
|
|
if (!analyzeRelocation(Rel,
|
2019-04-11 17:11:08 -07:00
|
|
|
RType,
|
2018-01-24 05:42:11 -08:00
|
|
|
SymbolName,
|
2018-09-21 12:00:20 -07:00
|
|
|
IsSectionRelocation,
|
2018-01-24 05:42:11 -08:00
|
|
|
SymbolAddress,
|
|
|
|
|
Addend,
|
|
|
|
|
ExtractedValue)) {
|
|
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: skipping relocation @ offset = 0x"
|
2018-09-21 12:00:20 -07:00
|
|
|
<< Twine::utohexstr(Rel.getOffset())
|
|
|
|
|
<< "; type name = " << TypeName
|
|
|
|
|
<< '\n');
|
2017-09-13 11:21:47 -07:00
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
|
2018-01-24 05:42:11 -08:00
|
|
|
const auto Address = SymbolAddress + Addend;
|
2016-09-27 19:09:38 -07:00
|
|
|
|
2019-06-28 09:21:27 -07:00
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: ";
|
|
|
|
|
printRelocationInfo(Rel,
|
|
|
|
|
SymbolName,
|
|
|
|
|
SymbolAddress,
|
|
|
|
|
Addend,
|
|
|
|
|
ExtractedValue));
|
2016-09-27 19:09:38 -07:00
|
|
|
|
|
|
|
|
BinaryFunction *ContainingBF = nullptr;
|
|
|
|
|
if (IsFromCode) {
|
2018-01-24 05:42:11 -08:00
|
|
|
ContainingBF =
|
2019-04-03 15:52:01 -07:00
|
|
|
BC->getBinaryFunctionContainingAddress(Rel.getOffset(),
|
|
|
|
|
/*CheckPastEnd*/ false,
|
2020-06-15 00:15:47 -07:00
|
|
|
/*UseMaxSize*/ true);
|
2016-09-27 19:09:38 -07:00
|
|
|
assert(ContainingBF && "cannot find function for address in code");
|
2020-06-15 00:15:47 -07:00
|
|
|
if (!IsAArch64 && !ContainingBF->containsAddress(Rel.getOffset())) {
|
|
|
|
|
if (opts::Verbosity >= 1) {
|
|
|
|
|
outs() << "BOLT-INFO: " << *ContainingBF
|
|
|
|
|
<< " has relocations in padding area\n";
|
|
|
|
|
}
|
|
|
|
|
ContainingBF->setSize(ContainingBF->getMaxSize());
|
|
|
|
|
ContainingBF->setSimple(false);
|
|
|
|
|
continue;
|
|
|
|
|
}
|
2016-09-27 19:09:38 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// PC-relative relocations from data to code are tricky since the original
|
|
|
|
|
// information is typically lost after linking even with '--emit-relocs'.
|
|
|
|
|
// They are normally used by PIC-style jump tables and reference both
|
|
|
|
|
// the jump table and jump destination by computing the difference
|
|
|
|
|
// between the two. If we blindly apply the relocation it will appear
|
|
|
|
|
// that it references an arbitrary location in the code, possibly even
|
|
|
|
|
// in a different function from that containing the jump table.
|
2019-04-11 17:11:08 -07:00
|
|
|
if (!IsAArch64 && Relocation::isPCRelative(RType)) {
|
2016-09-27 19:09:38 -07:00
|
|
|
// Just register the fact that we have PC-relative relocation at a given
|
|
|
|
|
// address. The actual referenced label/address cannot be determined
|
|
|
|
|
// from linker data alone.
|
2019-11-08 14:41:31 -08:00
|
|
|
if (!IsFromCode) {
|
2019-11-19 18:52:08 -08:00
|
|
|
BC->addPCRelativeDataRelocation(Rel.getOffset());
|
2016-09-27 19:09:38 -07:00
|
|
|
}
|
2018-01-24 05:42:11 -08:00
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: not creating PC-relative relocation at 0x"
|
2018-04-20 20:03:31 -07:00
|
|
|
<< Twine::utohexstr(Rel.getOffset()) << " for " << SymbolName
|
2018-01-24 05:42:11 -08:00
|
|
|
<< "\n");
|
2016-09-27 19:09:38 -07:00
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
|
2020-06-18 11:10:41 -07:00
|
|
|
bool ForceRelocation = BC->forceSymbolRelocations(SymbolName);
|
2018-07-12 10:13:03 -07:00
|
|
|
|
2019-04-11 17:11:08 -07:00
|
|
|
if (BC->isAArch64() && RType == ELF::R_AARCH64_ADR_GOT_PAGE)
|
2018-07-12 10:13:03 -07:00
|
|
|
ForceRelocation = true;
|
|
|
|
|
|
2018-01-24 05:42:11 -08:00
|
|
|
auto RefSection = BC->getSectionForAddress(SymbolAddress);
|
2016-09-27 19:09:38 -07:00
|
|
|
if (!RefSection && !ForceRelocation) {
|
|
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: cannot determine referenced section.\n");
|
|
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
|
2018-01-24 05:42:11 -08:00
|
|
|
const bool IsToCode = RefSection && RefSection->isText();
|
2016-09-27 19:09:38 -07:00
|
|
|
|
|
|
|
|
// Occasionally we may see a reference past the last byte of the function
|
|
|
|
|
// typically as a result of __builtin_unreachable(). Check it here.
|
2019-04-03 15:52:01 -07:00
|
|
|
auto *ReferencedBF = BC->getBinaryFunctionContainingAddress(
|
2018-04-12 10:07:11 -07:00
|
|
|
Address, /*CheckPastEnd*/ true, /*UseMaxSize*/ IsAArch64);
|
2018-05-14 11:10:26 -07:00
|
|
|
|
|
|
|
|
if (!IsSectionRelocation) {
|
2019-04-03 15:52:01 -07:00
|
|
|
if (auto *BF = BC->getBinaryFunctionContainingAddress(SymbolAddress)) {
|
2018-05-14 11:10:26 -07:00
|
|
|
if (BF != ReferencedBF) {
|
|
|
|
|
// It's possible we are referencing a function without referencing any
|
|
|
|
|
// code, e.g. when taking a bitmask action on a function address.
|
|
|
|
|
errs() << "BOLT-WARNING: non-standard function reference (e.g. "
|
|
|
|
|
"bitmask) detected against function " << *BF;
|
|
|
|
|
if (IsFromCode) {
|
|
|
|
|
errs() << " from function " << *ContainingBF << '\n';
|
|
|
|
|
} else {
|
|
|
|
|
errs() << " from data section at 0x"
|
|
|
|
|
<< Twine::utohexstr(Rel.getOffset()) << '\n';
|
|
|
|
|
}
|
|
|
|
|
DEBUG(printRelocationInfo(Rel,
|
|
|
|
|
SymbolName,
|
|
|
|
|
SymbolAddress,
|
|
|
|
|
Addend,
|
|
|
|
|
ExtractedValue)
|
|
|
|
|
);
|
|
|
|
|
ReferencedBF = BF;
|
|
|
|
|
}
|
|
|
|
|
}
|
2019-06-27 03:20:17 -07:00
|
|
|
} else if (ReferencedBF) {
|
|
|
|
|
assert(RefSection && "section expected for section relocation");
|
|
|
|
|
if (ReferencedBF->getSection() != *RefSection) {
|
|
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: ignoring false function reference\n");
|
|
|
|
|
ReferencedBF = nullptr;
|
|
|
|
|
}
|
2018-05-14 11:10:26 -07:00
|
|
|
}
|
|
|
|
|
|
2019-09-17 14:24:31 -07:00
|
|
|
// Workaround for a member function pointer de-virtualization bug. We check
|
|
|
|
|
// if a non-pc-relative relocation in the code is pointing to (fptr - 1).
|
|
|
|
|
if (IsToCode && ContainingBF && !Relocation::isPCRelative(RType) &&
|
|
|
|
|
(!ReferencedBF || (ReferencedBF->getAddress() != Address))) {
|
|
|
|
|
if (const auto *RogueBF = BC->getBinaryFunctionAtAddress(Address + 1)) {
|
|
|
|
|
// Do an extra check that the function was referenced previously.
|
|
|
|
|
// It's a linear search, but it should rarely happen.
|
|
|
|
|
bool Found{false};
|
|
|
|
|
for (const auto &RelKV : ContainingBF->Relocations) {
|
|
|
|
|
const auto &Rel = RelKV.second;
|
|
|
|
|
if (Rel.Symbol == RogueBF->getSymbol() &&
|
|
|
|
|
!Relocation::isPCRelative(Rel.Type)) {
|
|
|
|
|
Found = true;
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (Found) {
|
|
|
|
|
errs() << "BOLT-WARNING: detected possible compiler "
|
|
|
|
|
"de-virtualization bug: -1 addend used with "
|
|
|
|
|
"non-pc-relative relocation against function "
|
|
|
|
|
<< *RogueBF << " in function " << *ContainingBF << '\n';
|
|
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2016-09-27 19:09:38 -07:00
|
|
|
MCSymbol *ReferencedSymbol = nullptr;
|
|
|
|
|
if (ForceRelocation) {
|
2019-04-11 17:11:08 -07:00
|
|
|
auto Name = Relocation::isGOT(RType) ? "Zero" : SymbolName;
|
2017-11-14 20:05:11 -08:00
|
|
|
ReferencedSymbol = BC->registerNameAtAddress(Name, 0, 0, 0);
|
2018-01-24 05:42:11 -08:00
|
|
|
SymbolAddress = 0;
|
2019-04-11 17:11:08 -07:00
|
|
|
if (Relocation::isGOT(RType))
|
2018-08-28 18:15:13 -07:00
|
|
|
Addend = Address;
|
2018-07-12 10:13:03 -07:00
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: forcing relocation against symbol "
|
|
|
|
|
<< SymbolName << " with addend " << Addend << '\n');
|
2016-09-27 19:09:38 -07:00
|
|
|
} else if (ReferencedBF) {
|
2018-05-14 11:10:26 -07:00
|
|
|
ReferencedSymbol = ReferencedBF->getSymbol();
|
2019-11-18 14:08:17 -08:00
|
|
|
uint64_t RefFunctionOffset = 0;
|
2018-05-14 11:10:26 -07:00
|
|
|
|
|
|
|
|
// Adjust the point of reference to a code location inside a function.
|
|
|
|
|
if (ReferencedBF->containsAddress(Address, /*UseMaxSize = */true)) {
|
|
|
|
|
RefFunctionOffset = Address - ReferencedBF->getAddress();
|
|
|
|
|
if (RefFunctionOffset) {
|
2020-06-22 13:05:13 -07:00
|
|
|
if (ContainingBF && ContainingBF != ReferencedBF) {
|
|
|
|
|
ReferencedSymbol =
|
|
|
|
|
ReferencedBF->addEntryPointAtOffset(RefFunctionOffset);
|
|
|
|
|
} else {
|
|
|
|
|
ReferencedSymbol =
|
|
|
|
|
ReferencedBF->getOrCreateLocalLabel(Address,
|
|
|
|
|
/*CreatePastEnd =*/ true);
|
|
|
|
|
ReferencedBF->registerReferencedOffset(RefFunctionOffset);
|
|
|
|
|
}
|
2019-06-28 09:21:27 -07:00
|
|
|
if (opts::Verbosity > 1 && !RelocatedSection.isReadOnly()) {
|
|
|
|
|
dbgs() << "BOLT-WARNING: writable reference into the middle of "
|
|
|
|
|
<< "the function " << *ReferencedBF
|
|
|
|
|
<< " detected at address 0x"
|
|
|
|
|
<< Twine::utohexstr(Rel.getOffset()) << '\n';
|
|
|
|
|
}
|
2018-05-14 11:10:26 -07:00
|
|
|
}
|
|
|
|
|
SymbolAddress = Address;
|
|
|
|
|
Addend = 0;
|
2016-09-27 19:09:38 -07:00
|
|
|
}
|
2018-05-14 11:10:26 -07:00
|
|
|
DEBUG(
|
|
|
|
|
dbgs() << " referenced function " << *ReferencedBF;
|
|
|
|
|
if (Address != ReferencedBF->getAddress())
|
|
|
|
|
dbgs() << " at offset 0x" << Twine::utohexstr(RefFunctionOffset);
|
|
|
|
|
dbgs() << '\n'
|
|
|
|
|
);
|
2016-09-27 19:09:38 -07:00
|
|
|
} else {
|
2019-06-28 09:21:27 -07:00
|
|
|
if (IsToCode && SymbolAddress) {
|
2016-09-27 19:09:38 -07:00
|
|
|
// This can happen e.g. with PIC-style jump tables.
|
|
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: no corresponding function for "
|
|
|
|
|
"relocation against code\n");
|
|
|
|
|
}
|
2017-11-14 20:05:11 -08:00
|
|
|
|
2018-03-20 14:34:58 -07:00
|
|
|
// In AArch64 there are zero reasons to keep a reference to the
|
|
|
|
|
// "original" symbol plus addend. The original symbol is probably just a
|
|
|
|
|
// section symbol. If we are here, this means we are probably accessing
|
|
|
|
|
// data, so it is imperative to keep the original address.
|
|
|
|
|
if (IsAArch64) {
|
|
|
|
|
SymbolName = ("SYMBOLat0x" + Twine::utohexstr(Address)).str();
|
|
|
|
|
SymbolAddress = Address;
|
|
|
|
|
Addend = 0;
|
|
|
|
|
}
|
|
|
|
|
|
2020-03-03 15:51:24 -08:00
|
|
|
if (auto *BD = BC->getBinaryDataContainingAddress(SymbolAddress)) {
|
2018-03-20 14:34:58 -07:00
|
|
|
// Note: this assertion is trying to check sanity of BinaryData objects
|
|
|
|
|
// but AArch64 has inferred and incomplete object locations coming from
|
|
|
|
|
// GOT/TLS or any other non-trivial relocation (that requires creation
|
|
|
|
|
// of sections and whose symbol address is not really what should be
|
|
|
|
|
// encoded in the instruction). So we essentially disabled this check
|
|
|
|
|
// for AArch64 and live with bogus names for objects.
|
2018-04-20 20:03:31 -07:00
|
|
|
assert((IsAArch64 ||
|
|
|
|
|
IsSectionRelocation ||
|
|
|
|
|
BD->nameStartsWith(SymbolName) ||
|
|
|
|
|
BD->nameStartsWith("PG" + SymbolName) ||
|
|
|
|
|
(BD->nameStartsWith("ANONYMOUS") &&
|
|
|
|
|
(BD->getSectionName().startswith(".plt") ||
|
|
|
|
|
BD->getSectionName().endswith(".plt")))) &&
|
2018-06-14 14:27:20 -07:00
|
|
|
"BOLT symbol names of all non-section relocations must match "
|
2018-04-20 20:03:31 -07:00
|
|
|
"up with symbol names referenced in the relocation");
|
|
|
|
|
|
2020-03-03 15:51:24 -08:00
|
|
|
if (IsSectionRelocation) {
|
2019-11-18 14:08:17 -08:00
|
|
|
BC->markAmbiguousRelocations(*BD, Address);
|
2018-04-20 20:03:31 -07:00
|
|
|
}
|
|
|
|
|
|
2017-11-14 20:05:11 -08:00
|
|
|
ReferencedSymbol = BD->getSymbol();
|
|
|
|
|
Addend += (SymbolAddress - BD->getAddress());
|
|
|
|
|
SymbolAddress = BD->getAddress();
|
|
|
|
|
assert(Address == SymbolAddress + Addend);
|
|
|
|
|
} else {
|
|
|
|
|
// These are mostly local data symbols but undefined symbols
|
|
|
|
|
// in relocation sections can get through here too, from .plt.
|
2018-04-20 20:03:31 -07:00
|
|
|
assert((IsAArch64 ||
|
|
|
|
|
IsSectionRelocation ||
|
|
|
|
|
BC->getSectionNameForAddress(SymbolAddress)->startswith(".plt"))
|
|
|
|
|
&& "known symbols should not resolve to anonymous locals");
|
|
|
|
|
|
2018-09-21 12:00:20 -07:00
|
|
|
if (IsSectionRelocation) {
|
|
|
|
|
ReferencedSymbol = BC->getOrCreateGlobalSymbol(SymbolAddress,
|
|
|
|
|
"SYMBOLat");
|
|
|
|
|
} else {
|
|
|
|
|
auto Symbol = *Rel.getSymbol();
|
|
|
|
|
const uint64_t SymbolSize =
|
|
|
|
|
IsAArch64 ? 0 : ELFSymbolRef(Symbol).getSize();
|
|
|
|
|
const uint64_t SymbolAlignment =
|
|
|
|
|
IsAArch64 ? 1 : Symbol.getAlignment();
|
|
|
|
|
const auto SymbolFlags = Symbol.getFlags();
|
2017-11-14 20:05:11 -08:00
|
|
|
std::string Name;
|
2018-09-21 12:00:20 -07:00
|
|
|
if (SymbolFlags & SymbolRef::SF_Global) {
|
2017-11-14 20:05:11 -08:00
|
|
|
Name = SymbolName;
|
2018-03-20 14:34:58 -07:00
|
|
|
} else {
|
2019-10-08 11:03:33 -07:00
|
|
|
if (StringRef(SymbolName).startswith(
|
|
|
|
|
BC->AsmInfo->getPrivateGlobalPrefix())) {
|
2020-02-17 14:37:46 -08:00
|
|
|
Name = NR.uniquify("PG" + SymbolName);
|
2019-10-08 11:03:33 -07:00
|
|
|
} else {
|
2020-02-17 14:37:46 -08:00
|
|
|
Name = NR.uniquify(SymbolName);
|
2019-10-08 11:03:33 -07:00
|
|
|
}
|
2018-03-20 14:34:58 -07:00
|
|
|
}
|
2017-11-14 20:05:11 -08:00
|
|
|
ReferencedSymbol = BC->registerNameAtAddress(Name,
|
|
|
|
|
SymbolAddress,
|
|
|
|
|
SymbolSize,
|
2018-04-20 20:03:31 -07:00
|
|
|
SymbolAlignment,
|
|
|
|
|
SymbolFlags);
|
|
|
|
|
}
|
|
|
|
|
|
2020-03-03 15:51:24 -08:00
|
|
|
if (IsSectionRelocation) {
|
2018-04-20 20:03:31 -07:00
|
|
|
auto *BD = BC->getBinaryDataByName(ReferencedSymbol->getName());
|
2019-11-18 14:08:17 -08:00
|
|
|
BC->markAmbiguousRelocations(*BD, Address);
|
2017-11-14 20:05:11 -08:00
|
|
|
}
|
|
|
|
|
}
|
2016-09-27 19:09:38 -07:00
|
|
|
}
|
|
|
|
|
|
2017-11-14 20:05:11 -08:00
|
|
|
auto checkMaxDataRelocations = [&]() {
|
|
|
|
|
++NumDataRelocations;
|
|
|
|
|
if (opts::MaxDataRelocations &&
|
|
|
|
|
NumDataRelocations + 1 == opts::MaxDataRelocations) {
|
|
|
|
|
dbgs() << "BOLT-DEBUG: processing ending on data relocation "
|
|
|
|
|
<< NumDataRelocations << ": ";
|
|
|
|
|
printRelocationInfo(Rel,
|
|
|
|
|
ReferencedSymbol->getName(),
|
|
|
|
|
SymbolAddress,
|
|
|
|
|
Addend,
|
|
|
|
|
ExtractedValue);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return (!opts::MaxDataRelocations ||
|
|
|
|
|
NumDataRelocations < opts::MaxDataRelocations);
|
|
|
|
|
};
|
|
|
|
|
|
2019-06-28 09:21:27 -07:00
|
|
|
if ((RefSection && refersToReorderedSection(RefSection)) ||
|
2018-07-12 10:13:03 -07:00
|
|
|
(opts::ForceToDataRelocations && checkMaxDataRelocations()))
|
|
|
|
|
ForceRelocation = true;
|
|
|
|
|
|
2016-09-27 19:09:38 -07:00
|
|
|
if (IsFromCode) {
|
2019-06-28 09:21:27 -07:00
|
|
|
ContainingBF->addRelocation(Rel.getOffset(),
|
|
|
|
|
ReferencedSymbol,
|
|
|
|
|
RType,
|
|
|
|
|
Addend,
|
|
|
|
|
ExtractedValue);
|
2018-07-12 10:13:03 -07:00
|
|
|
} else if (IsToCode || ForceRelocation) {
|
2019-06-28 09:21:27 -07:00
|
|
|
BC->addRelocation(Rel.getOffset(), ReferencedSymbol, RType, Addend,
|
|
|
|
|
ExtractedValue);
|
2016-09-27 19:09:38 -07:00
|
|
|
} else {
|
|
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: ignoring relocation from data to data\n");
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
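The data-object branch above re-expresses a reference as "start of the containing object plus addend" and then asserts that the target address is unchanged. The following is a minimal standalone sketch of that arithmetic only. `DataObject`, `SymbolPlusAddend`, and `rebaseOntoObject` are hypothetical names introduced for illustration and are not part of BOLT.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a data object known to the rewriter.
struct DataObject {
  uint64_t Address; // start address of the object
  uint64_t Size;    // size of the object in bytes
};

// Hypothetical result type: a reference expressed as symbol + addend.
struct SymbolPlusAddend {
  uint64_t SymbolAddress;
  int64_t Addend;
};

// Re-express a reference so that it is relative to the start of Object.
// The referenced target (SymbolAddress + Addend) must stay the same.
inline SymbolPlusAddend rebaseOntoObject(const DataObject &Object,
                                         uint64_t SymbolAddress,
                                         int64_t Addend) {
  const uint64_t Target = SymbolAddress + static_cast<uint64_t>(Addend);

  // Fold the distance from the object start into the addend, then point
  // the symbol at the object start.
  Addend += static_cast<int64_t>(SymbolAddress - Object.Address);
  SymbolAddress = Object.Address;

  // Mirrors the invariant asserted in the code above.
  assert(Target == SymbolAddress + static_cast<uint64_t>(Addend) &&
         "rebasing must preserve the referenced address");
  return {SymbolAddress, Addend};
}
```

In this sketch, a reference to address 0x1010 with addend 8 inside an object starting at 0x1000 becomes the object start plus addend 0x18.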
|
|
|
|
|
|
2020-05-03 13:54:45 -07:00
|
|
|
void RewriteInstance::selectFunctionsToProcess() {
|
|
|
|
|
// Extend the list of functions to process or skip from a file.
|
|
|
|
|
auto populateFunctionNames = [](cl::opt<std::string> &FunctionNamesFile,
|
|
|
|
|
cl::list<std::string> &FunctionNames) {
|
|
|
|
|
if (FunctionNamesFile.empty())
|
|
|
|
|
return;
|
|
|
|
|
std::ifstream FuncsFile(FunctionNamesFile, std::ios::in);
|
|
|
|
|
std::string FuncName;
|
|
|
|
|
while (std::getline(FuncsFile, FuncName)) {
|
|
|
|
|
FunctionNames.push_back(FuncName);
|
|
|
|
|
}
|
|
|
|
|
};
|
|
|
|
|
populateFunctionNames(opts::FunctionNamesFile, opts::ForceFunctionNames);
|
|
|
|
|
populateFunctionNames(opts::SkipFunctionNamesFile, opts::SkipFunctionNames);
|
|
|
|
|
|
|
|
|
|
if (!opts::ForceFunctionNames.empty() && !opts::SkipFunctionNames.empty()) {
|
|
|
|
|
errs() << "BOLT-ERROR: cannot select functions to process and skip at the "
|
|
|
|
|
"same time. Please use only one type of selection.\n";
|
|
|
|
|
exit(1);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
uint64_t NumFunctionsToProcess{0};
|
|
|
|
|
|
|
|
|
|
auto shouldProcess = [&](const BinaryFunction &Function) {
|
[BOLT] Support for lite mode with relocations
Summary:
Add '-lite' support for relocations for improved processing time,
memory consumption, and more resilient processing of binaries with
embedded assembly code.
In lite relocation mode, BOLT will skip full processing of functions
without a profile. It will run scanExternalRefs() on such functions
to discover external references and to create internal relocations
to update references to optimized functions.
Note that we could have relied on the compiler/linker to provide
relocations for function references. However, there's no assurance
that all such references are reported. E.g., the compiler can resolve
inter-procedural references internally, leaving no relocations
for the linker.
The scan process takes under 10 seconds per 100MB of code on modern
hardware. It's a reasonable overhead to live with considering the
flexibility it provides.
If BOLT fails to scan or disassemble a function, e.g., due to a data
object embedded in code, or an unsupported instruction, it enables a
patching mode to guarantee that the failed function will call
optimized/moved versions of functions. The patching happens at original
function entry points.
'-skip=<func1,func2,...>' option now can be used to skip processing of
arbitrary functions in the relocation mode.
With '-use-old-text' or '-strict' we require all functions to be
processed. As such, it is incompatible with '-lite' option,
and '-skip' option will only disable optimizations of listed
functions, not their disassembly and emission.
(cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
|
|
|
if (opts::MaxFunctions && NumFunctionsToProcess > opts::MaxFunctions) {
|
2020-05-03 13:54:45 -07:00
|
|
|
return false;
|
2020-06-15 00:15:47 -07:00
|
|
|
}
|
2020-05-03 13:54:45 -07:00
|
|
|
|
|
|
|
|
// If the list is not empty, only process functions from the list.
|
|
|
|
|
if (!opts::ForceFunctionNames.empty()) {
|
|
|
|
|
for (auto &Name : opts::ForceFunctionNames) {
|
|
|
|
|
if (Function.hasNameRegex(Name)) {
|
|
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
for (auto &Name : opts::SkipFunctionNames) {
|
|
|
|
|
if (Function.hasNameRegex(Name)) {
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2020-05-03 15:49:58 -07:00
|
|
|
if (opts::Lite) {
|
2020-05-07 23:00:29 -07:00
|
|
|
if (ProfileReader && !ProfileReader->mayHaveProfileData(Function)) {
|
2020-05-03 15:49:58 -07:00
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2020-05-03 13:54:45 -07:00
|
|
|
return true;
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
for (auto &BFI : BC->getBinaryFunctions()) {
|
|
|
|
|
auto &Function = BFI.second;
|
|
|
|
|
|
2020-06-15 00:15:47 -07:00
|
|
|
// Pseudo functions are explicitly marked by us not to be processed.
|
|
|
|
|
if (Function.isPseudo()) {
|
|
|
|
|
Function.IsIgnored = true;
|
|
|
|
|
Function.HasExternalRefRelocations = true;
|
2020-05-03 13:54:45 -07:00
|
|
|
continue;
|
2020-06-15 00:15:47 -07:00
|
|
|
}
|
2020-05-03 13:54:45 -07:00
|
|
|
|
|
|
|
|
if (!shouldProcess(Function)) {
|
|
|
|
|
DEBUG(dbgs() << "BOLT-INFO: skipping processing of function " << Function
|
|
|
|
|
<< " per user request\n");
|
2020-06-15 00:15:47 -07:00
|
|
|
Function.setIgnored();
|
2020-05-03 13:54:45 -07:00
|
|
|
} else {
|
|
|
|
|
++NumFunctionsToProcess;
|
2020-06-15 00:15:47 -07:00
|
|
|
if (opts::MaxFunctions && NumFunctionsToProcess == opts::MaxFunctions) {
|
|
|
|
|
outs() << "BOLT-INFO: processing ending on " << Function << '\n';
|
|
|
|
|
}
|
2020-05-03 13:54:45 -07:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
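The order of the checks in the selection lambda above (force list first, then skip list, then lite mode) can be sketched in isolation. This is a simplified illustration under stated assumptions, not BOLT's implementation: `SelectionOptions`, `matchesAny`, and `shouldProcessSketch` are hypothetical names, `std::regex_match` stands in for `hasNameRegex`, a plain `HasProfile` flag stands in for the profile reader query, and the `MaxFunctions` cap is omitted.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical, simplified view of the options consulted during selection.
struct SelectionOptions {
  std::vector<std::string> ForceNames; // regexes: process only these
  std::vector<std::string> SkipNames;  // regexes: never process these
  bool Lite = false;                   // skip functions without a profile
};

// Returns true if Name fully matches at least one of the patterns.
static bool matchesAny(const std::string &Name,
                       const std::vector<std::string> &Patterns) {
  for (const std::string &Pattern : Patterns)
    if (std::regex_match(Name, std::regex(Pattern)))
      return true;
  return false;
}

// Same decision order as the lambda above, minus the MaxFunctions cap.
bool shouldProcessSketch(const std::string &Name, bool HasProfile,
                         const SelectionOptions &Opts) {
  // A non-empty force list is exclusive: anything not listed is rejected.
  if (!Opts.ForceNames.empty())
    return matchesAny(Name, Opts.ForceNames);

  // Otherwise, reject functions that the user asked to skip.
  if (matchesAny(Name, Opts.SkipNames))
    return false;

  // In lite mode, functions without profile data are not processed.
  if (Opts.Lite && !HasProfile)
    return false;

  return true;
}
```

Because a non-empty force list returns before the skip list is consulted, combining the two would be ambiguous for the user, which is consistent with the function above rejecting that combination with an error.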
|
|
|
|
|
|
2016-03-14 18:48:05 -07:00
|
|
|
void RewriteInstance::readDebugInfo() {
|
[BOLT rebase] Rebase fixes on top of LLVM Feb2018
Summary:
This commit includes all code necessary to make BOLT work again
after the rebase. This includes a redesign of the EHFrame work,
cherry-pick of the 3dnow disassembly work, compilation error fixes,
and port of the debug_info work. The macroop fusion feature is not
ported yet.
The rebased version has minor changes to the "executed instructions"
dynostats counter because REP prefixes are considered a part of the
instruction it applies to. Also, some X86 instructions had the "mayLoad"
tablegen property removed, which BOLT uses to identify and account
for loads, thus reducing the total number of loads reported by
dynostats. This was observed in X86::MOVDQUmr. TRAP instructions are
not terminators anymore, changing our CFG. This commit adds compensation
to preserve this old behavior and minimize test changes. debug_info
sections are now slightly larger. The discriminator field in the line
table is slightly different due to a change upstream. New profiles
generated with the other bolt are incompatible with this version
because of different hash values calculated for functions, so they will
be considered 100% stale. This commit changes the corresponding test
to XFAIL so it can be updated. The hash function changes because it
relies on raw opcode values, which change according to the opcodes
described in the X86 tablegen files. When processing HHVM, bolt was
observed to be using about 800MB more memory in the rebased version
and being about 5% slower.
(cherry picked from FBD7078072)
2018-02-06 15:00:23 -08:00
|
|
|
NamedRegionTimer T("readDebugInfo", "read debug info", TimerGroupName,
|
|
|
|
|
TimerGroupDesc, opts::TimeRewrite);
|
2016-03-14 18:48:05 -07:00
|
|
|
if (!opts::UpdateDebugSections)
|
|
|
|
|
return;
|
|
|
|
|
|
2019-04-03 15:52:01 -07:00
|
|
|
BC->preprocessDebugInfo();
|
2016-03-14 18:48:05 -07:00
|
|
|
}
|
|
|
|
|
|
2019-01-15 23:43:40 -08:00
|
|
|
void RewriteInstance::preprocessProfileData() {
|
2020-05-07 23:00:29 -07:00
|
|
|
if (!ProfileReader)
|
|
|
|
|
return;
|
|
|
|
|
|
2019-01-15 23:43:40 -08:00
|
|
|
NamedRegionTimer T("preprocessprofile", "pre-process profile data",
|
|
|
|
|
TimerGroupName, TimerGroupDesc, opts::TimeRewrite);
|
2020-05-07 23:00:29 -07:00
|
|
|
|
|
|
|
|
outs() << "BOLT-INFO: pre-processing profile using "
|
|
|
|
|
<< ProfileReader->getReaderName() << '\n';
|
|
|
|
|
|
|
|
|
|
if (BAT->enabledFor(InputFile)) {
|
|
|
|
|
outs() << "BOLT-INFO: profile collection done on a binary already "
|
|
|
|
|
"processed by BOLT\n";
|
|
|
|
|
ProfileReader->setBAT(&*BAT);
|
2020-05-03 15:49:58 -07:00
|
|
|
}
|
|
|
|
|
|
2020-05-07 23:00:29 -07:00
|
|
|
if (auto E = ProfileReader->preprocessProfile(*BC.get()))
|
|
|
|
|
report_error("cannot pre-process profile", std::move(E));
|
|
|
|
|
|
|
|
|
|
if (!BC->hasSymbolsWithFileName() &&
|
|
|
|
|
ProfileReader->hasLocalsWithFileName() &&
|
|
|
|
|
!opts::AllowStripped) {
|
|
|
|
|
errs() << "BOLT-ERROR: input binary does not have local file symbols "
|
|
|
|
|
"but profile data includes function names with embedded file "
|
|
|
|
|
"names. It appears that the input binary was stripped while a "
|
|
|
|
|
"profiled binary was not. If you know what you are doing and "
|
|
|
|
|
"wish to proceed, use -allow-stripped option.\n";
|
|
|
|
|
exit(1);
|
2019-04-12 17:33:46 -07:00
|
|
|
}
|
2019-01-15 23:43:40 -08:00
|
|
|
}
|
|
|
|
|
|
2020-05-07 23:00:29 -07:00
|
|
|
void RewriteInstance::processProfileDataPreCFG() {
|
|
|
|
|
if (!ProfileReader)
|
|
|
|
|
return;
|
|
|
|
|
|
|
|
|
|
NamedRegionTimer T("processprofile-precfg", "process profile data pre-CFG",
|
|
|
|
|
TimerGroupName, TimerGroupDesc, opts::TimeRewrite);
|
|
|
|
|
|
|
|
|
|
if (auto E = ProfileReader->readProfilePreCFG(*BC.get()))
|
|
|
|
|
report_error("cannot read profile pre-CFG", std::move(E));
|
|
|
|
|
}
|
|
|
|
|
|
2017-12-13 23:12:01 -08:00
|
|
|
void RewriteInstance::processProfileData() {
|
2020-05-07 23:00:29 -07:00
|
|
|
if (!ProfileReader)
|
|
|
|
|
return;
|
|
|
|
|
|
2019-01-15 23:43:40 -08:00
|
|
|
NamedRegionTimer T("processprofile", "process profile data", TimerGroupName,
|
|
|
|
|
TimerGroupDesc, opts::TimeRewrite);
|
2020-05-03 15:49:58 -07:00
|
|
|
|
2020-05-07 23:00:29 -07:00
|
|
|
if (auto E = ProfileReader->readProfile(*BC.get()))
|
|
|
|
|
report_error("cannot read profile", std::move(E));
|
2017-07-17 11:22:22 -07:00
|
|
|
|
2017-12-13 23:12:01 -08:00
|
|
|
if (!opts::SaveProfile.empty()) {
|
2020-05-07 23:00:29 -07:00
|
|
|
YAMLProfileWriter PW(opts::SaveProfile);
|
2018-04-09 19:10:19 -07:00
|
|
|
PW.writeProfile(*this);
|
2017-07-17 11:22:22 -07:00
|
|
|
}
|
2020-05-07 23:00:29 -07:00
|
|
|
|
|
|
|
|
// Release memory used by profile reader.
|
|
|
|
|
ProfileReader.reset();
|
|
|
|
|
|
|
|
|
|
if (opts::AggregateOnly) {
|
|
|
|
|
exit(0);
|
|
|
|
|
}
|
2017-07-17 11:22:22 -07:00
|
|
|
}
|
|
|
|
|
|
2015-11-23 17:54:18 -08:00
|
|
|
void RewriteInstance::disassembleFunctions() {
|
2018-02-06 15:00:23 -08:00
|
|
|
NamedRegionTimer T("disassembleFunctions", "disassemble functions",
|
|
|
|
|
TimerGroupName, TimerGroupDesc, opts::TimeRewrite);
|
2019-04-03 15:52:01 -07:00
|
|
|
for (auto &BFI : BC->getBinaryFunctions()) {
|
2015-11-23 17:54:18 -08:00
|
|
|
BinaryFunction &Function = BFI.second;
|
|
|
|
|
|
2020-02-10 15:35:11 -08:00
|
|
|
auto FunctionData = Function.getData();
|
2017-10-20 12:11:34 -07:00
|
|
|
if (!FunctionData) {
|
2016-09-27 19:09:38 -07:00
|
|
|
errs() << "BOLT-ERROR: corresponding section is non-executable or "
|
|
|
|
|
<< "empty for function " << Function << '\n';
|
2020-05-03 15:49:58 -07:00
|
|
|
exit(1);
|
2015-11-23 17:54:18 -08:00
|
|
|
}
|
|
|
|
|
|
2016-09-15 15:47:10 -07:00
|
|
|
// Treat zero-sized functions as non-simple ones.
|
|
|
|
|
if (Function.getSize() == 0) {
|
|
|
|
|
Function.setSimple(false);
|
|
|
|
|
continue;
|
2015-11-23 17:54:18 -08:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Offset of the function in the file.
|
2017-11-28 09:57:21 -08:00
|
|
|
const auto *FileBegin =
|
2017-10-20 12:11:34 -07:00
|
|
|
reinterpret_cast<const uint8_t*>(InputFile->getData().data());
|
|
|
|
|
Function.setFileOffset(FunctionData->begin() - FileBegin);
|
2015-11-23 17:54:18 -08:00
|
|
|
|
2020-05-03 15:49:58 -07:00
|
|
|
if (!shouldDisassemble(Function)) {
|
|
|
|
|
NamedRegionTimer T("scan", "scan functions", "buildfuncs",
|
|
|
|
|
"Scan Binary Functions", opts::TimeBuild);
|
|
|
|
|
Function.scanExternalRefs();
|
|
|
|
|
Function.setSimple(false);
|
|
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
|
2020-06-15 00:15:47 -07:00
|
|
|
if (!Function.disassemble()) {
|
|
|
|
|
if (opts::processAllFunctions()) {
|
|
|
|
|
BC->exitWithBugReport("function cannot be properly disassembled. "
|
|
|
|
|
"Unable to continue in relocation mode.",
|
|
|
|
|
Function);
|
|
|
|
|
}
|
|
|
|
|
if (opts::Verbosity >= 1) {
|
|
|
|
|
outs() << "BOLT-INFO: could not disassemble function " << Function
|
|
|
|
|
<< ". Will ignore.\n";
|
|
|
|
|
}
|
|
|
|
|
// Forcefully ignore the function.
|
|
|
|
|
Function.setIgnored();
|
|
|
|
|
continue;
|
2016-09-27 19:09:38 -07:00
|
|
|
}
|
|
|
|
|
|
2015-11-23 17:54:18 -08:00
|
|
|
if (opts::PrintAll || opts::PrintDisasm)
|
2016-09-02 14:15:29 -07:00
|
|
|
Function.print(outs(), "after disassembly", true);
|
2015-11-23 17:54:18 -08:00
|
|
|
|
2020-06-15 00:15:47 -07:00
|
|
|
BC->processInterproceduralReferences(Function);
|
2018-02-14 12:06:17 -08:00
|
|
|
}
|
|
|
|
|
|
2019-06-12 18:21:02 -07:00
|
|
|
BC->populateJumpTables();
|
|
|
|
|
|
|
|
|
|
for (auto &BFI : BC->getBinaryFunctions()) {
|
|
|
|
|
BinaryFunction &Function = BFI.second;
|
|
|
|
|
|
|
|
|
|
if (!shouldDisassemble(Function))
|
|
|
|
|
continue;
|
|
|
|
|
|
2020-01-14 17:12:03 -08:00
|
|
|
Function.postProcessEntryPoints();
|
2019-06-12 18:21:02 -07:00
|
|
|
Function.postProcessJumpTables();
|
|
|
|
|
}
|
|
|
|
|
|
[BOLT] Support for lite mode with relocations
Summary:
Add '-lite' support for relocations for improved processing time,
memory consumption, and more resilient processing of binaries with
embedded assembly code.
In lite relocation mode, BOLT will skip full processing of functions
without a profile. It will run scanExternalRefs() on such functions
to discover external references and to create internal relocations
to update references to optimized functions.
Note that we could have relied on the compiler/linker to provide
relocations for function references. However, there's no assurance
that all such references are reported. E.g., the compiler can resolve
inter-procedural references internally, leaving no relocations
for the linker.
The scan process takes less than 10 seconds per 100MB of code on modern
hardware. It's a reasonable overhead to live with considering the
flexibility it provides.
If BOLT fails to scan or disassemble a function, e.g., due to a data
object embedded in code, or an unsupported instruction, it enables a
patching mode to guarantee that the failed function will call
optimized/moved versions of functions. The patching happens at original
function entry points.
The '-skip=<func1,func2,...>' option can now be used to skip processing of
arbitrary functions in the relocation mode.
With '-use-old-text' or '-strict' we require all functions to be
processed. As such, these options are incompatible with the '-lite' option,
and the '-skip' option will only disable optimizations of listed
functions, not their disassembly and emission.
(cherry picked from FBD22040717)
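The option rules in this summary can be sketched as a small decision function. This is only an illustration under stated assumptions: `LiteOptions`, `Treatment`, `optionsAreCompatible`, and `classifyFunction` are hypothetical names that do not exist in BOLT, and treating a '-skip' function as "scan references only" in relocation mode is a guess rather than something the summary states.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for the command-line flags named in the summary.
struct LiteOptions {
  bool Lite = false;          // '-lite'
  bool UseOldText = false;    // '-use-old-text'
  bool Strict = false;        // '-strict'
  std::set<std::string> Skip; // '-skip=<func1,func2,...>'
};

// Hypothetical labels for how a function ends up being handled.
enum class Treatment {
  FullProcessing, // disassemble, optimize, and emit
  ScanRefsOnly,   // only scan for external references
  EmitUnoptimized // disassemble and emit, but run no optimizations
};

// '-lite' cannot be combined with options that require every function
// to be processed.
inline bool optionsAreCompatible(const LiteOptions &O) {
  return !(O.Lite && (O.UseOldText || O.Strict));
}

inline Treatment classifyFunction(const LiteOptions &O,
                                  const std::string &Name, bool HasProfile) {
  assert(optionsAreCompatible(O) &&
         "-lite conflicts with -use-old-text and -strict");
  const bool Skipped = O.Skip.count(Name) != 0;
  // With '-use-old-text' or '-strict', '-skip' only disables optimizations.
  if (O.UseOldText || O.Strict)
    return Skipped ? Treatment::EmitUnoptimized : Treatment::FullProcessing;
  // Assumption: a skipped function is still scanned so that its references
  // to moved functions can be updated.
  if (Skipped || (O.Lite && !HasProfile))
    return Treatment::ScanRefsOnly;
  return Treatment::FullProcessing;
}
```

With `Lite` set, a function without a profile maps to `ScanRefsOnly`, while a profiled one still gets `FullProcessing`.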
2020-06-15 00:15:47 -07:00
|
|
|
BC->adjustCodePadding();
|
[BOLT] Add code padding verification
Summary:
In non-relocation mode, we allow data objects to be embedded in the
code. Such objects could be unmarked, and could occupy an area between
functions, an area that is otherwise considered to be code padding.
When we disassemble code, we detect references into the padding area
and adjust it, so that it is not overwritten during the code emission.
We assume the reference to be pointing to the beginning of the object.
However, assembly-written functions may reference the middle of an
object and use negative offsets to reference data fields. Thus,
conservatively, we reduce the possibly-overwritten padding area to
a minimum if an object reference was detected.
Since we also allow functions with unknown code in non-relocation mode,
it is possible that we miss references to some objects in code.
To cover such cases, we need to verify the padding area before we
allow it to be overwritten.
(cherry picked from FBD16477787)
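One way to picture the verification step is a byte-level check of the gap between two functions. The sketch below is an assumption, not the check BOLT performs: `isFillerByte` and `isSafePadding` are hypothetical helpers, and the choice of filler bytes (single-byte x86 NOP, INT3, and zero) is deliberately simplistic, since real padding may also contain multi-byte NOPs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: only these single-byte fillers count as padding
// (0x90 = x86 NOP, 0xCC = x86 INT3, 0x00 = zero fill).
inline bool isFillerByte(std::uint8_t Byte) {
  return Byte == 0x90 || Byte == 0xCC || Byte == 0x00;
}

// Returns true if every byte in [Begin, End) of Code looks like filler,
// i.e. the gap can be treated as padding and overwritten. Any other byte
// is taken as a hint that an unmarked data object lives in the gap.
inline bool isSafePadding(const std::vector<std::uint8_t> &Code,
                          std::size_t Begin, std::size_t End) {
  if (Begin > End || End > Code.size())
    return false; // malformed range: refuse to overwrite
  for (std::size_t I = Begin; I < End; ++I)
    if (!isFillerByte(Code[I]))
      return false;
  return true;
}
```

A failed check would leave the gap untouched, which matches the conservative stance described in the summary.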
2019-07-23 20:48:41 -07:00
|
|
|
|
2019-04-03 15:52:01 -07:00
|
|
|
for (auto &BFI : BC->getBinaryFunctions()) {
|
2018-02-14 12:06:17 -08:00
|
|
|
BinaryFunction &Function = BFI.second;
|
|
|
|
|
|
2019-01-15 23:43:40 -08:00
|
|
|
if (!shouldDisassemble(Function))
|
2018-02-14 12:06:17 -08:00
|
|
|
continue;
|
|
|
|
|
|
|
|
|
|
if (!Function.isSimple()) {
|
|
|
|
|
assert((!BC->HasRelocations || Function.getSize() == 0) &&
|
|
|
|
|
"unexpected non-simple function in relocation mode");
|
|
|
|
|
continue;
|
|
|
|
|
}
|
2015-11-23 17:54:18 -08:00
|
|
|
|
|
|
|
|
// Fill in CFI information for this function
|
2018-02-14 12:06:17 -08:00
|
|
|
if (!Function.trapsOnEntry()) {
|
2016-09-27 19:09:38 -07:00
|
|
|
if (!CFIRdWrt->fillCFIInfoFor(Function)) {
|
2018-02-14 12:06:17 -08:00
|
|
|
if (BC->HasRelocations) {
|
2018-06-20 12:03:24 -07:00
|
|
|
BC->exitWithBugReport("unable to fill CFI.", Function);
|
2018-02-14 12:06:17 -08:00
|
|
|
} else {
|
|
|
|
|
errs() << "BOLT-WARNING: unable to fill CFI for function "
|
|
|
|
|
<< Function << ". Skipping.\n";
|
|
|
|
|
Function.setSimple(false);
|
|
|
|
|
continue;
|
|
|
|
|
}
|
2016-02-22 18:25:43 -08:00
|
|
|
}
|
2015-11-23 17:54:18 -08:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Parse LSDA.
|
2018-02-14 12:06:17 -08:00
|
|
|
if (Function.getLSDAAddress() != 0)
|
2018-04-20 20:03:31 -07:00
|
|
|
Function.parseLSDA(getLSDAData(), getLSDAAddress());
|
2020-05-07 23:00:29 -07:00
|
|
|
}
|
|
|
|
|
}
|
2015-11-23 17:54:18 -08:00
|
|
|
|
2020-05-07 23:00:29 -07:00
|
|
|
void RewriteInstance::buildFunctionsCFG() {
|
|
|
|
|
NamedRegionTimer T("buildCFG", "buildCFG", "buildfuncs",
|
|
|
|
|
"Build Binary Functions", opts::TimeBuild);
|
2019-07-12 07:25:50 -07:00
|
|
|
|
2020-05-07 23:00:29 -07:00
|
|
|
// Create annotation indices to allow lock-free execution
|
|
|
|
|
BC->MIB->getOrCreateAnnotationIndex("Offset");
|
|
|
|
|
BC->MIB->getOrCreateAnnotationIndex("JTIndexReg");
|
2019-07-12 07:25:50 -07:00
|
|
|
|
2020-05-07 23:00:29 -07:00
|
|
|
ParallelUtilities::WorkFuncWithAllocTy WorkFun =
|
|
|
|
|
[&](BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId) {
|
|
|
|
|
if (!BF.buildCFG(AllocId))
|
|
|
|
|
return;
|
2019-07-12 07:25:50 -07:00
|
|
|
|
2020-05-07 23:00:29 -07:00
|
|
|
if (opts::PrintAll)
|
|
|
|
|
BF.print(outs(), "while building cfg", true);
|
|
|
|
|
};
|
2017-11-28 09:57:21 -08:00
|
|
|
|
2020-05-07 23:00:29 -07:00
|
|
|
ParallelUtilities::PredicateTy SkipPredicate =
|
|
|
|
|
[&](const BinaryFunction &BF) {
|
|
|
|
|
return !shouldDisassemble(BF) || !BF.isSimple();
|
|
|
|
|
};
|
2019-07-12 07:25:50 -07:00
|
|
|
|
2020-05-07 23:00:29 -07:00
|
|
|
ParallelUtilities::runOnEachFunctionWithUniqueAllocId(
|
|
|
|
|
*BC, ParallelUtilities::SchedulingPolicy::SP_INST_LINEAR, WorkFun,
|
|
|
|
|
SkipPredicate, "disassembleFunctions-buildCFG",
|
|
|
|
|
/*ForceSequential*/ opts::SequentialDisassembly || opts::PrintAll);
|
2018-06-06 03:17:32 -07:00
|
|
|
|
|
|
|
|
BC->postProcessSymbolTable();
|
2017-11-28 09:57:21 -08:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void RewriteInstance::postProcessFunctions() {
|
|
|
|
|
BC->TotalScore = 0;
|
|
|
|
|
BC->SumExecutionCount = 0;
|
2019-04-03 15:52:01 -07:00
|
|
|
for (auto &BFI : BC->getBinaryFunctions()) {
|
2017-11-28 09:57:21 -08:00
|
|
|
BinaryFunction &Function = BFI.second;
|
|
|
|
|
|
|
|
|
|
if (Function.empty())
|
|
|
|
|
continue;
|
|
|
|
|
|
|
|
|
|
Function.postProcessCFG();
|
|
|
|
|
|
2015-11-23 17:54:18 -08:00
|
|
|
if (opts::PrintAll || opts::PrintCFG)
|
2016-09-02 14:15:29 -07:00
|
|
|
Function.print(outs(), "after building cfg", true);
|
2015-11-23 17:54:18 -08:00
|
|
|
|
2016-07-01 08:40:56 -07:00
|
|
|
if (opts::DumpDotAll)
|
|
|
|
|
Function.dumpGraphForPass("build-cfg");
|
|
|
|
|
|
2016-05-26 10:58:01 -07:00
|
|
|
if (opts::PrintLoopInfo) {
|
|
|
|
|
Function.calculateLoopInfo();
|
2016-09-02 14:15:29 -07:00
|
|
|
Function.printLoopInfo(outs());
|
2016-05-26 10:58:01 -07:00
|
|
|
}
|
|
|
|
|
|
2017-11-28 09:57:21 -08:00
|
|
|
BC->TotalScore += Function.getFunctionScore();
|
2017-05-01 16:52:54 -07:00
|
|
|
BC->SumExecutionCount += Function.getKnownExecutionCount();
|
2016-01-16 14:58:22 -08:00
|
|
|
}
|
2017-11-14 20:05:11 -08:00
|
|
|
|
|
|
|
|
if (opts::PrintGlobals) {
|
|
|
|
|
outs() << "BOLT-INFO: Global symbols:\n";
|
|
|
|
|
BC->printGlobalSymbols(outs());
|
|
|
|
|
}
|
2015-11-23 17:54:18 -08:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void RewriteInstance::runOptimizationPasses() {
|
[BOLT rebase] Rebase fixes on top of LLVM Feb2018
Summary:
This commit includes all code necessary to make BOLT work again
after the rebase. This includes a redesign of the EHFrame work,
a cherry-pick of the 3dnow disassembly work, compilation error fixes,
and a port of the debug_info work. The macroop fusion feature is not
ported yet.
The rebased version has minor changes to the "executed instructions"
dynostats counter because a REP prefix is now considered part of the
instruction it applies to. Also, some X86 instructions had the "mayLoad"
tablegen property removed, which BOLT uses to identify and account
for loads, thus reducing the total number of loads reported by
dynostats. This was observed in X86::MOVDQUmr. TRAP instructions are
not terminators anymore, changing our CFG. This commit adds compensation
to preserve the old behavior and minimize test changes. debug_info
sections are now slightly larger. The discriminator field in the line
table is slightly different due to a change upstream. New profiles
generated with the other bolt are incompatible with this version
because of different hash values calculated for functions, so they will
be considered 100% stale. This commit changes the corresponding test
to XFAIL so it can be updated. The hash function changes because it
relies on raw opcode values, which change according to the opcodes
described in the X86 tablegen files. When processing HHVM, bolt was
observed to use about 800MB more memory in the rebased version
and to be about 5% slower.
(cherry picked from FBD7078072)
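The point about hashes depending on raw opcode values can be illustrated with a toy hash. This is a sketch under assumptions: `hashOpcodes` is a hypothetical helper, and FNV-1a is a stand-in chosen for the example, not the hash BOLT actually uses.

```cpp
#include <cstdint>
#include <vector>

// FNV-1a over the little-endian bytes of each opcode number.
// Stand-in hash for illustration only; not BOLT's function hash.
inline std::uint64_t hashOpcodes(const std::vector<std::uint32_t> &Opcodes) {
  std::uint64_t Hash = 14695981039346656037ULL; // FNV-1a 64-bit offset basis
  for (std::uint32_t Opcode : Opcodes) {
    for (int Shift = 0; Shift < 32; Shift += 8) {
      Hash ^= (Opcode >> Shift) & 0xFFu;
      Hash *= 1099511628211ULL; // FNV-1a 64-bit prime
    }
  }
  return Hash;
}
```

If a tablegen update renumbers even one opcode, the same instruction sequence produces a different opcode list and therefore a different hash, which is why profiles keyed on the old hash no longer match.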
2018-02-06 15:00:23 -08:00
|
|
|
NamedRegionTimer T("runOptimizationPasses", "run optimization passes",
|
|
|
|
|
TimerGroupName, TimerGroupDesc, opts::TimeRewrite);
|
2019-04-03 15:52:01 -07:00
|
|
|
BinaryFunctionPassManager::runAllPasses(*BC);
|
2015-11-23 17:54:18 -08:00
|
|
|
}
|
|
|
|
|
|
2017-05-24 18:40:29 -07:00
|
|
|
namespace {
|
|
|
|
|
|
2015-11-23 17:54:18 -08:00
|
|
|
template <typename T>
|
|
|
|
|
std::vector<T> singletonSet(T t) {
|
|
|
|
|
std::vector<T> Vec;
|
|
|
|
|
Vec.push_back(std::move(t));
|
|
|
|
|
return Vec;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
} // anonymous namespace
|
|
|
|
|
|
2019-07-24 14:03:43 -07:00
|
|
|
void RewriteInstance::emitAndLink() {
|
|
|
|
|
NamedRegionTimer T("emitAndLink", "emit and link", TimerGroupName,
|
2018-02-06 15:00:23 -08:00
|
|
|
TimerGroupDesc, opts::TimeRewrite);
|
2015-11-23 17:54:18 -08:00
|
|
|
std::error_code EC;
|
|
|
|
|
|
|
|
|
|
// This is an object file, which we keep for debugging purposes.
|
|
|
|
|
// Once we decide it's useless, we should create it in memory.
|
2018-02-06 15:00:23 -08:00
|
|
|
std::unique_ptr<ToolOutputFile> TempOut =
|
|
|
|
|
llvm::make_unique<ToolOutputFile>(opts::OutputFilename + ".bolt.o",
|
|
|
|
|
EC, sys::fs::F_None);
|
2015-11-23 17:54:18 -08:00
|
|
|
check_error(EC, "cannot create output object file");
|
|
|
|
|
|
|
|
|
|
std::unique_ptr<buffer_ostream> BOS =
|
|
|
|
|
make_unique<buffer_ostream>(TempOut->os());
|
|
|
|
|
raw_pwrite_stream *OS = BOS.get();
|
|
|
|
|
|
|
|
|
|
// Implicitly MCObjectStreamer takes ownership of MCAsmBackend (MAB)
|
|
|
|
|
// and MCCodeEmitter (MCE). ~MCObjectStreamer() will delete these
|
|
|
|
|
// two instances.
|
|
|
|
|
auto MCE = BC->TheTarget->createMCCodeEmitter(*BC->MII, *BC->MRI, *BC->Ctx);
|
2018-02-06 15:00:23 -08:00
|
|
|
auto MAB =
|
|
|
|
|
BC->TheTarget->createMCAsmBackend(*BC->STI, *BC->MRI, MCTargetOptions());
|
|
|
|
|
std::unique_ptr<MCStreamer> Streamer(BC->TheTarget->createMCObjectStreamer(
|
|
|
|
|
*BC->TheTriple, *BC->Ctx, std::unique_ptr<MCAsmBackend>(MAB), *OS,
|
|
|
|
|
std::unique_ptr<MCCodeEmitter>(MCE), *BC->STI,
|
|
|
|
|
/* RelaxAll */ false,
|
|
|
|
|
/* IncrementalLinkerCompatible */ false,
|
|
|
|
|
/* DWARFMustBeAtTheEnd */ false));
|
2015-11-23 17:54:18 -08:00
|
|
|
|
2018-01-23 15:10:24 -08:00
|
|
|
if (EHFrameSection) {
|
2020-06-15 00:15:47 -07:00
|
|
|
if (opts::UseOldText || opts::StrictMode) {
|
2020-04-19 12:55:43 -07:00
|
|
|
// The section is going to be regenerated from scratch.
|
|
|
|
|
// Empty the contents, but keep the section reference.
|
2020-05-21 16:25:05 -07:00
|
|
|
EHFrameSection->clearContents();
|
2020-04-19 12:55:43 -07:00
|
|
|
} else {
|
|
|
|
|
// Make .eh_frame relocatable.
|
|
|
|
|
relocateEHFrameSection();
|
|
|
|
|
}
|
2016-11-11 14:33:34 -08:00
|
|
|
}
|
|
|
|
|
|
2020-03-11 15:51:32 -07:00
|
|
|
emitBinaryContext(*Streamer, *BC, getOrgSecPrefix());
|
2020-03-06 15:06:37 -08:00
|
|
|
|
2015-11-23 17:54:18 -08:00
|
|
|
Streamer->Finish();
|
|
|
|
|
|
2016-02-08 10:02:48 -08:00
|
|
|
//////////////////////////////////////////////////////////////////////////////
|
2017-05-08 22:51:36 -07:00
|
|
|
// Assign addresses to new sections.
|
2016-02-08 10:02:48 -08:00
|
|
|
//////////////////////////////////////////////////////////////////////////////
|
|
|
|
|
|
2016-03-02 18:40:10 -08:00
|
|
|
if (opts::UpdateDebugSections) {
|
|
|
|
|
// Compute offsets of tables in .debug_line for each compile unit.
|
2019-04-03 15:52:01 -07:00
|
|
|
DebugInfoRewriter->updateLineTableOffsets();
|
2016-03-02 18:40:10 -08:00
|
|
|
}
|
|
|
|
|
|
2015-11-23 17:54:18 -08:00
|
|
|
// Get output object as ObjectFile.
|
|
|
|
|
std::unique_ptr<MemoryBuffer> ObjectMemBuffer =
|
|
|
|
|
MemoryBuffer::getMemBuffer(BOS->str(), "in-memory object file", false);
|
2018-02-06 15:00:23 -08:00
|
|
|
std::unique_ptr<object::ObjectFile> Obj = cantFail(
|
|
|
|
|
object::ObjectFile::createObjectFile(ObjectMemBuffer->getMemBufferRef()),
|
|
|
|
|
"error creating in-memory object");
|
2015-11-23 17:54:18 -08:00
|
|
|
|
2018-03-30 15:49:34 -07:00
|
|
|
auto Resolver = orc::createLegacyLookupResolver(
|
|
|
|
|
[&](const std::string &Name) -> JITSymbol {
|
|
|
|
|
DEBUG(dbgs() << "BOLT: looking for " << Name << "\n");
|
2019-12-17 11:17:31 -08:00
|
|
|
if (BC->EFMM->ObjectsLoaded) {
|
2019-08-02 11:20:13 -07:00
|
|
|
auto Result = OLT->findSymbol(Name, false);
|
|
|
|
|
if (cantFail(Result.getAddress()) == 0) {
|
2019-08-07 16:09:50 -07:00
|
|
|
// Resolve to a PLT entry if possible
|
|
|
|
|
if (auto *I = BC->getBinaryDataByName(Name + "@PLT"))
|
|
|
|
|
return JITSymbol(I->getAddress(), JITSymbolFlags());
|
|
|
|
|
|
|
|
|
|
errs() << "BOLT-ERROR: symbol not found required by runtime "
|
|
|
|
|
"library: "
|
|
|
|
|
<< Name << "\n";
|
2019-08-02 11:20:13 -07:00
|
|
|
exit(1);
|
|
|
|
|
}
|
|
|
|
|
return Result;
|
2019-07-24 14:03:43 -07:00
|
|
|
}
|
2018-04-20 20:03:31 -07:00
|
|
|
if (auto *I = BC->getBinaryDataByName(Name)) {
|
|
|
|
|
const uint64_t Address = I->isMoved() && !I->isJumpTable()
|
|
|
|
|
? I->getOutputAddress()
|
|
|
|
|
: I->getAddress();
|
2019-07-24 14:03:43 -07:00
|
|
|
DEBUG(dbgs() << "Resolved to address 0x" << Twine::utohexstr(Address)
|
|
|
|
|
<< "\n");
|
2018-04-20 20:03:31 -07:00
|
|
|
return JITSymbol(Address, JITSymbolFlags());
|
|
|
|
|
}
|
2019-07-24 14:03:43 -07:00
|
|
|
DEBUG(dbgs() << "Resolved to address 0x0\n");
|
2018-03-30 15:49:34 -07:00
|
|
|
return JITSymbol(nullptr);
|
|
|
|
|
},
|
|
|
|
|
[](Error Err) { cantFail(std::move(Err), "lookup failed"); });
|
2016-09-27 19:09:38 -07:00
|
|
|
Resolver->setAllowsZeroSymbols(true);
|
|
|
|
|
|
2017-05-08 22:51:36 -07:00
|
|
|
MCAsmLayout FinalLayout(
|
2016-09-27 19:09:38 -07:00
|
|
|
static_cast<MCObjectStreamer *>(Streamer.get())->getAssembler());
|
2015-11-23 17:54:18 -08:00
|
|
|
|
2018-03-30 15:49:34 -07:00
|
|
|
SSP.reset(new decltype(SSP)::element_type());
|
|
|
|
|
ES.reset(new decltype(ES)::element_type(*SSP));
|
2019-07-24 14:03:43 -07:00
|
|
|
// Key for our main object created out of the input binary
|
|
|
|
|
auto K = ES->allocateVModule();
|
2018-02-06 15:00:23 -08:00
|
|
|
OLT.reset(new decltype(OLT)::element_type(
|
2018-03-30 15:49:34 -07:00
|
|
|
*ES,
|
|
|
|
|
[this, &Resolver](orc::VModuleKey Key) {
|
|
|
|
|
orc::RTDyldObjectLinkingLayer::Resources R;
|
2019-12-17 11:17:31 -08:00
|
|
|
R.MemMgr = BC->EFMM;
|
2018-03-30 15:49:34 -07:00
|
|
|
R.Resolver = Resolver;
|
2018-02-06 15:00:23 -08:00
|
|
|
// Get memory manager
|
2018-03-30 15:49:34 -07:00
|
|
|
return R;
|
2018-02-06 15:00:23 -08:00
|
|
|
},
|
2018-03-14 15:07:16 -07:00
|
|
|
// Loaded notifier
|
2018-03-30 15:49:34 -07:00
|
|
|
[&](orc::VModuleKey Key, const object::ObjectFile &Obj,
|
2018-02-06 15:00:23 -08:00
|
|
|
const RuntimeDyld::LoadedObjectInfo &) {
|
2019-07-24 14:03:43 -07:00
|
|
|
// Assign addresses to all sections. If key corresponds to the object
|
|
|
|
|
// created by ourselves, call our regular mapping function. If we are
|
|
|
|
|
// loading additional objects as part of runtime libraries for
|
|
|
|
|
// instrumentation, treat them as extra sections.
|
|
|
|
|
if (Key == K) {
|
|
|
|
|
mapFileSections(Key);
|
|
|
|
|
} else {
|
|
|
|
|
mapExtraSections(Key);
|
|
|
|
|
}
|
2018-03-14 15:07:16 -07:00
|
|
|
},
|
|
|
|
|
// Finalized notifier
|
2018-03-30 15:49:34 -07:00
|
|
|
[&](orc::VModuleKey Key) {
|
2018-02-06 15:00:23 -08:00
|
|
|
// Update output addresses based on the new section map and
|
2019-07-24 14:03:43 -07:00
|
|
|
// layout. Only do this for the object created by ourselves.
|
|
|
|
|
if (Key == K)
|
|
|
|
|
updateOutputValues(FinalLayout);
|
2018-02-06 15:00:23 -08:00
|
|
|
}));
|
|
|
|
|
|
|
|
|
|
OLT->setProcessAllSections(true);
|
2018-03-30 15:49:34 -07:00
|
|
|
cantFail(OLT->addObject(K, std::move(ObjectMemBuffer)));
|
|
|
|
|
cantFail(OLT->emitAndFinalize(K));
|
2016-09-27 19:09:38 -07:00
|
|
|
|
Refactor runtime library
Summary:
As we are adding more types of runtime libraries, it is better to move the runtime library out of RewriteInstance so that it can grow separately. This also requires splitting the current implementation of Instrumentation.cpp into two separate pieces: one a normal pass, the other the runtime library. The Instrumentation pass hands the generated data over to the runtime library, which uses it to emit the binary and perform linking.
This patch does the following:
1. Turn Instrumentation class into an optimization pass. Register the pass in the pass manager instead of in RewriteInstance.
2. Split all the data generated by Instrumentation that the runtime library needs into a separate data structure called InstrumentationSummary. When the Instrumentation pass is created, we create an instance of this data structure, which is moved over to the runtime library at the end of the pass.
3. Added a runtime library member to BinaryContext. Set the member at the end of the Instrumentation pass.
4. In BinaryEmitter, make BinaryContext also emit the runtime library binary.
5. Created a base class RuntimeLibrary that defines the interface of a runtime library, along with a few common helper functions.
6. Created InstrumentationRuntimeLibrary, which inherits from RuntimeLibrary and does all the work (mostly copied over) for emission and linking.
7. Added a new directory called RuntimeLibs and put all the runtime-library-related files into it.
(cherry picked from FBD21694762)
2020-05-21 14:28:47 -07:00
|
|
|
if (auto *RtLibrary = BC->getRuntimeLibrary()) {
|
|
|
|
|
RtLibrary->link(*BC, ToolPath, *ES, *OLT);
|
2019-12-13 17:27:03 -08:00
|
|
|
}
|
2019-07-24 14:03:43 -07:00
|
|
|
|
2019-03-14 18:51:05 -07:00
|
|
|
// Once the code is emitted, we can rename function sections to actual
|
|
|
|
|
// output sections and de-register sections used for emission.
|
|
|
|
|
if (!BC->HasRelocations) {
|
2019-04-03 15:52:01 -07:00
|
|
|
for (auto &BFI : BC->getBinaryFunctions()) {
|
2019-03-14 18:51:05 -07:00
|
|
|
auto &Function = BFI.second;
|
|
|
|
|
if (auto Section = Function.getCodeSection())
|
|
|
|
|
BC->deregisterSection(*Section);
|
|
|
|
|
Function.CodeSectionName = Function.getOriginSectionName();
|
|
|
|
|
if (Function.isSplit()) {
|
|
|
|
|
if (auto ColdSection = Function.getColdCodeSection())
|
|
|
|
|
BC->deregisterSection(*ColdSection);
|
|
|
|
|
Function.ColdCodeSectionName = ".bolt.text";
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2017-10-16 16:53:50 -07:00
|
|
|
if (opts::PrintCacheMetrics) {
|
2017-11-14 16:51:24 -08:00
|
|
|
outs() << "BOLT-INFO: cache metrics after emitting functions:\n";
|
2019-04-03 15:52:01 -07:00
|
|
|
CacheMetrics::printAll(BC->getSortedFunctions());
|
2017-10-16 16:53:50 -07:00
|
|
|
}
|
|
|
|
|
|
2016-09-27 19:09:38 -07:00
|
|
|
if (opts::KeepTmp)
|
|
|
|
|
TempOut->keep();
|
|
|
|
|
}
|
2015-11-23 17:54:18 -08:00
|
|
|
|
2019-11-03 21:57:15 -08:00
|
|
|
void RewriteInstance::updateMetadata() {
|
|
|
|
|
updateSDTMarkers();
|
|
|
|
|
|
2020-05-07 23:00:29 -07:00
|
|
|
if (opts::UpdateDebugSections) {
|
|
|
|
|
NamedRegionTimer T("updateDebugInfo", "update debug info", TimerGroupName,
|
|
|
|
|
TimerGroupDesc, opts::TimeRewrite);
|
|
|
|
|
DebugInfoRewriter->updateDebugInfo();
|
|
|
|
|
}
|
[BOLT][non-reloc] Change function splitting in non-relocation mode
Summary:
This diff applies to non-relocation mode mostly. In this mode, we are
limited by original function boundaries, i.e. if a function becomes
larger after optimizations (e.g. because of the newly introduced
branches) then we might not be able to write the optimized version,
unless we split the function. At the same time, we do not benefit from
function splitting as we do in the relocation mode since we are not
moving functions/fragments, and the hot code does not become more
compact.
For the reasons described above, we used to execute multiple re-write
attempts to optimize the binary and we would only split functions that
were too large to fit into their original space.
After the first attempt, we would know functions that did not fit
into their original space. Then we would re-run all our passes again
feeding back the function information and forcefully splitting
such functions. Some functions still wouldn't fit even after the
splitting (mostly because of the branch relaxation for conditional tail
calls that does not happen in non-relocation mode). Yet we emitted
debug info as if they were successfully overwritten. That's why we had
one more stage to write the functions again, marking failed-to-emit
functions non-simple. Sadly, there was a bug in the way the 2nd and 3rd
attempts interacted, so we were not splitting the functions correctly
and, as a result, we were emitting less optimized code.
One of the reasons we had the multi-pass rewrite scheme in place was
that we did not have the ability to precisely estimate the code size
before the actual code emission. Recently, BinaryContext obtained such
functionality, and now we can use it instead of relying on the
multi-pass rewrite. This eliminates redundant work of re-running
the same function passes multiple times.
Because function splitting runs before a number of optimization passes
that run on post-CFG state (those rely on the splitting pass), we
cannot estimate the non-split code size with 100% accuracy. However,
it is good enough for over 99% of the cases to extract most of the
performance gains for the binary.
As a result of eliminating the multi-pass rewrite, the processing time
in non-relocation mode with `-split-functions=2` is greatly reduced.
With debug info update, it is less than half of what it used to be.
New semantics for `-split-functions=<n>`:
-split-functions - split functions into hot and cold regions
=0 - do not split any function
=1 - in non-relocation mode only split functions too large to fit
into original code space
=2 - same as 1 (backwards compatibility)
=3 - split all functions
(cherry picked from FBD17362607)
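The `-split-functions=<n>` levels listed above can be read as a small decision function. The following is a minimal sketch for non-relocation mode only; `shouldSplitNonReloc`, `EstimatedSize`, and `MaxSize` are hypothetical names used for illustration, not BOLT's API. Levels above 3 are not described in the message, so the sketch rejects them with an assertion.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of BOLT): decide whether a function gets
// split in non-relocation mode for a given -split-functions=<n> level.
// EstimatedSize is the estimated size of the optimized, unsplit function;
// MaxSize is the space available at its original location.
bool shouldSplitNonReloc(unsigned Level, uint64_t EstimatedSize,
                         uint64_t MaxSize) {
  // Assumption: only levels 0..3 exist, as listed in the commit message.
  assert(Level <= 3 && "level not described in the commit message");
  switch (Level) {
  case 0:
    return false; // never split
  case 1:
  case 2: // same as 1, kept for backwards compatibility
    return EstimatedSize > MaxSize; // split only if it would not fit
  default:
    return true; // level 3: split all functions
  }
}
```

For instance, `shouldSplitNonReloc(1, 200, 100)` returns true because the function would not fit in its original 100 bytes, while `shouldSplitNonReloc(1, 80, 100)` returns false.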
2019-09-11 15:42:22 -07:00
|
|
|
|
2020-05-07 23:00:29 -07:00
|
|
|
if (opts::WriteBoltInfoSection) {
|
|
|
|
|
addBoltInfoSection();
|
|
|
|
|
}
|
2019-09-11 15:42:22 -07:00
|
|
|
}
|
|
|
|
|
|
2019-11-03 21:57:15 -08:00
|
|
|
void RewriteInstance::updateSDTMarkers() {
|
|
|
|
|
NamedRegionTimer T("updateSDTMarkers", "update SDT markers", TimerGroupName,
|
|
|
|
|
TimerGroupDesc, opts::TimeRewrite);
|
|
|
|
|
|
|
|
|
|
SectionPatchers[".note.stapsdt"] = llvm::make_unique<SimpleBinaryPatcher>();
|
|
|
|
|
auto *SDTNotePatcher = static_cast<SimpleBinaryPatcher *>(
|
|
|
|
|
SectionPatchers[".note.stapsdt"].get());
|
|
|
|
|
for (auto &SDTInfoKV : BC->SDTMarkers) {
|
|
|
|
|
const auto OriginalAddress = SDTInfoKV.first;
|
|
|
|
|
auto &SDTInfo = SDTInfoKV.second;
|
|
|
|
|
const auto *F = BC->getBinaryFunctionContainingAddress(OriginalAddress);
|
|
|
|
|
if (!F)
|
|
|
|
|
continue;
|
|
|
|
|
const auto NewAddress = F->translateInputToOutputAddress(OriginalAddress);
|
|
|
|
|
SDTNotePatcher->addLE64Patch(SDTInfo.PCOffset, NewAddress);
|
|
|
|
|
}
|
|
|
|
|
}
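updateSDTMarkers() records one 64-bit little-endian patch per marker, replacing the stored address with the translated output address. The sketch below shows what applying such a patch to a section's bytes could look like. `withLE64Patch` and `readLE64` are hypothetical helpers written for this illustration; they are not the SimpleBinaryPatcher implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Return a copy of Bytes in which the 8 bytes starting at Offset hold Value,
// least-significant byte first (little-endian).
std::vector<uint8_t> withLE64Patch(std::vector<uint8_t> Bytes,
                                   std::size_t Offset, uint64_t Value) {
  assert(Offset + 8 <= Bytes.size() && "patch must stay inside the section");
  for (std::size_t I = 0; I < 8; ++I)
    Bytes[Offset + I] = static_cast<uint8_t>(Value >> (8 * I));
  return Bytes;
}

// Read a little-endian 64-bit value back from Bytes at Offset.
uint64_t readLE64(const std::vector<uint8_t> &Bytes, std::size_t Offset) {
  assert(Offset + 8 <= Bytes.size() && "read must stay inside the section");
  uint64_t Value = 0;
  for (std::size_t I = 0; I < 8; ++I)
    Value |= static_cast<uint64_t>(Bytes[Offset + I]) << (8 * I);
  return Value;
}
```

Bytes outside the patched range are left untouched, which is why a patcher can be applied to a copy of the original section contents.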
|
|
|
|
|
|
2018-03-30 15:49:34 -07:00
|
|
|
void RewriteInstance::mapFileSections(orc::VModuleKey Key) {
|
2019-03-14 20:32:04 -07:00
|
|
|
mapCodeSections(Key);
|
2018-04-20 20:03:31 -07:00
|
|
|
mapDataSections(Key);
|
|
|
|
|
}
|
2018-06-20 12:03:24 -07:00
|
|
|
|
2019-03-15 13:43:36 -07:00
|
|
|
std::vector<BinarySection *>
|
|
|
|
|
RewriteInstance::getCodeSections() {
|
|
|
|
|
std::vector<BinarySection *> CodeSections;
|
|
|
|
|
for (auto &Section : BC->textSections()) {
|
|
|
|
|
if (Section.hasValidSectionID())
|
|
|
|
|
CodeSections.emplace_back(&Section);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
auto compareSections = [&](const BinarySection *A, const BinarySection *B) {
|
|
|
|
|
// Place movers before anything else.
|
|
|
|
|
if (A->getName() == BC->getHotTextMoverSectionName())
|
|
|
|
|
return true;
|
|
|
|
|
if (B->getName() == BC->getHotTextMoverSectionName())
|
|
|
|
|
return false;
|
|
|
|
|
|
|
|
|
|
// Depending on the option, put main text at the beginning or at the end.
|
|
|
|
|
if (opts::HotFunctionsAtEnd) {
|
|
|
|
|
return B->getName() == BC->getMainCodeSectionName();
|
|
|
|
|
} else {
|
|
|
|
|
return A->getName() == BC->getMainCodeSectionName();
|
|
|
|
|
}
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
// Determine the order of sections.
|
|
|
|
|
std::stable_sort(CodeSections.begin(), CodeSections.end(), compareSections);
|
|
|
|
|
|
|
|
|
|
return CodeSections;
|
|
|
|
|
}
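The ordering that getCodeSections() establishes (hot text mover first, main text at the beginning or at the end depending on the option, everything else in its existing order) can be sketched on plain section names. This is a rank-based variant rather than the comparator above: ranks give a strict weak ordering even if a name appears twice. The names `.stub` and `.text` and the function `orderSections` are placeholders assumed for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Placeholder names (assumptions); the real ones come from
// BC->getHotTextMoverSectionName() and BC->getMainCodeSectionName().
static const std::string HotTextMoverName = ".stub";
static const std::string MainCodeName = ".text";

// Sort section names: mover first, main code section first or last
// depending on HotFunctionsAtEnd, all other sections keep relative order.
std::vector<std::string> orderSections(std::vector<std::string> Names,
                                       bool HotFunctionsAtEnd) {
  auto Rank = [&](const std::string &Name) {
    if (Name == HotTextMoverName)
      return 0;
    if (Name == MainCodeName)
      return HotFunctionsAtEnd ? 3 : 1;
    return 2;
  };
  auto Less = [&](const std::string &A, const std::string &B) {
    return Rank(A) < Rank(B);
  };
  // stable_sort preserves the input order of sections with equal rank.
  std::stable_sort(Names.begin(), Names.end(), Less);
  assert(std::is_sorted(Names.begin(), Names.end(), Less));
  return Names;
}
```

With unique section names this produces the same order as the comparator in getCodeSections().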
|
|
|
|
|
|
2019-03-14 20:32:04 -07:00
|
|
|
void RewriteInstance::mapCodeSections(orc::VModuleKey Key) {
|
2019-03-21 21:13:45 -07:00
|
|
|
auto TextSection = BC->getUniqueSectionByName(BC->getMainCodeSectionName());
|
|
|
|
|
assert(TextSection && ".text section not found in output");
|
|
|
|
|
|
2017-12-09 21:40:39 -08:00
|
|
|
if (BC->HasRelocations) {
|
2019-03-21 21:13:45 -07:00
|
|
|
assert(TextSection->hasValidSectionID() && ".text section should be valid");
|
|
|
|
|
|
2019-03-14 20:32:04 -07:00
|
|
|
// Populate the list of sections to be allocated.
|
2019-03-15 13:43:36 -07:00
|
|
|
auto CodeSections = getCodeSections();
|
2019-03-21 21:13:45 -07:00
|
|
|
DEBUG(dbgs() << "Code sections in the order of output:\n";
|
2019-03-14 20:32:04 -07:00
|
|
|
for (const auto *Section : CodeSections) {
|
|
|
|
|
dbgs() << Section->getName() << '\n';
|
2019-03-14 18:51:05 -07:00
|
|
|
});
|
2016-09-27 19:09:38 -07:00
|
|
|
|
2019-03-21 21:13:45 -07:00
|
|
|
uint64_t PaddingSize{0}; // size of padding required at the end
|
|
|
|
|
|
|
|
|
|
// Allocate sections starting at a given Address.
|
|
|
|
|
auto allocateAt = [&](uint64_t Address) {
|
|
|
|
|
for (auto *Section : CodeSections) {
|
|
|
|
|
Address = alignTo(Address, Section->getAlignment());
|
|
|
|
|
Section->setOutputAddress(Address);
|
|
|
|
|
Address += Section->getOutputSize();
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Make sure we allocate enough space for huge pages.
|
|
|
|
|
if (opts::HotText) {
|
|
|
|
|
auto HotTextEnd = TextSection->getOutputAddress() +
|
|
|
|
|
TextSection->getOutputSize();
|
|
|
|
|
HotTextEnd = alignTo(HotTextEnd, BC->PageAlign);
|
|
|
|
|
if (HotTextEnd > Address) {
|
|
|
|
|
PaddingSize = HotTextEnd - Address;
|
|
|
|
|
Address = HotTextEnd;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return Address;
|
|
|
|
|
};
|
2019-03-14 20:32:04 -07:00
|
|
|
|
|
|
|
|
// Check if we can fit code in the original .text
|
2019-03-21 21:13:45 -07:00
|
|
|
bool AllocationDone{false};
|
2019-03-14 20:32:04 -07:00
|
|
|
if (opts::UseOldText) {
|
2019-03-21 21:13:45 -07:00
|
|
|
const auto CodeSize = allocateAt(BC->OldTextSectionAddress) -
|
|
|
|
|
BC->OldTextSectionAddress;
|
2019-03-14 20:32:04 -07:00
|
|
|
|
|
|
|
|
if (CodeSize <= BC->OldTextSectionSize) {
|
|
|
|
|
outs() << "BOLT-INFO: using original .text for new code with 0x"
|
2020-04-19 15:02:50 -07:00
|
|
|
<< Twine::utohexstr(opts::AlignText) << " alignment\n";
|
2019-03-21 21:13:45 -07:00
|
|
|
AllocationDone = true;
|
|
|
|
|
} else {
|
2018-09-24 20:58:31 -07:00
|
|
|
errs() << "BOLT-WARNING: original .text too small to fit the new code"
|
2020-04-19 15:02:50 -07:00
|
|
|
<< " using 0x" << Twine::utohexstr(opts::AlignText)
|
|
|
|
|
<< " alignment. " << CodeSize
|
2018-09-24 20:58:31 -07:00
|
|
|
<< " bytes needed, have " << BC->OldTextSectionSize
|
|
|
|
|
<< " bytes available.\n";
|
|
|
|
|
opts::UseOldText = false;
|
2016-09-27 19:09:38 -07:00
|
|
|
}
|
2019-03-14 18:51:05 -07:00
|
|
|
}
|
|
|
|
|
|
2019-03-21 21:13:45 -07:00
|
|
|
if (!AllocationDone) {
|
|
|
|
|
NextAvailableAddress = allocateAt(NextAvailableAddress);
|
2019-03-14 20:32:04 -07:00
|
|
|
}
|
2019-03-14 18:51:05 -07:00
|
|
|
|
2019-03-21 21:13:45 -07:00
|
|
|
// Do the mapping for ORC layer based on the allocation.
|
|
|
|
|
for (auto *Section : CodeSections) {
|
|
|
|
|
DEBUG(dbgs() << "BOLT: mapping " << Section->getName()
|
|
|
|
|
<< " at 0x" << Twine::utohexstr(Section->getAllocAddress())
|
|
|
|
|
<< " to 0x" << Twine::utohexstr(Section->getOutputAddress())
|
|
|
|
|
<< '\n');
|
|
|
|
|
OLT->mapSectionAddress(Key, Section->getSectionID(),
|
|
|
|
|
Section->getOutputAddress());
|
[BOLT] Support for lite mode with relocations
Summary:
Add '-lite' support for relocations for improved processing time,
memory consumption, and more resilient processing of binaries with
embedded assembly code.
In lite relocation mode, BOLT will skip full processing of functions
without a profile. It will run scanExternalRefs() on such functions
to discover external references and to create internal relocations
to update references to optimized functions.
Note that we could have relied on the compiler/linker to provide
relocations for function references. However, there's no assurance
that all such references are reported. E.g., the compiler can resolve
inter-procedural references internally, leaving no relocations
for the linker.
The scan process takes less than 10 seconds per 100MB of code on modern
hardware. It's a reasonable overhead to live with considering the
flexibility it provides.
If BOLT fails to scan or disassemble a function, e.g., due to a data
object embedded in code, or an unsupported instruction, it enables a
patching mode to guarantee that the failed function will call
optimized/moved versions of functions. The patching happens at original
function entry points.
'-skip=<func1,func2,...>' option now can be used to skip processing of
arbitrary functions in the relocation mode.
With '-use-old-text' or '-strict' we require all functions to be
processed. As such, these options are incompatible with the '-lite' option,
and the '-skip' option will only disable optimizations of listed
functions, not their disassembly and emission.
(cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
|
|
|
Section->setOutputFileOffset(
|
2019-03-21 21:13:45 -07:00
|
|
|
getFileOffsetForAddress(Section->getOutputAddress()));
|
2016-11-09 11:19:02 -08:00
|
|
|
}
|
|
|
|
|
|
2019-03-21 21:13:45 -07:00
|
|
|
// Check if we need to insert a padding section for hot text.
|
|
|
|
|
if (PaddingSize && !opts::UseOldText) {
|
|
|
|
|
outs() << "BOLT-INFO: padding code to 0x"
|
|
|
|
|
<< Twine::utohexstr(NextAvailableAddress)
|
|
|
|
|
<< " to accommodate hot text\n";
|
2018-07-08 12:14:08 -07:00
|
|
|
}
|
|
|
|
|
|
2019-03-21 21:13:45 -07:00
|
|
|
return;
|
|
|
|
|
}
|
2016-10-07 09:34:16 -07:00
|
|
|
|
2019-03-21 21:13:45 -07:00
|
|
|
// Processing in non-relocation mode.
|
|
|
|
|
auto NewTextSectionStartAddress = NextAvailableAddress;
|
2016-03-11 11:30:30 -08:00
|
|
|
|
2019-03-21 21:13:45 -07:00
|
|
|
// Prepare .text section for injected functions
|
|
|
|
|
if (TextSection->hasValidSectionID()) {
|
|
|
|
|
uint64_t NewTextSectionOffset = 0;
|
|
|
|
|
auto Padding = OffsetToAlignment(NewTextSectionStartAddress,
|
|
|
|
|
BC->PageAlign);
|
|
|
|
|
NextAvailableAddress += Padding;
|
|
|
|
|
NewTextSectionStartAddress = NextAvailableAddress;
|
|
|
|
|
NewTextSectionOffset = getFileOffsetForAddress(NextAvailableAddress);
|
|
|
|
|
NextAvailableAddress += Padding + TextSection->getOutputSize();
|
|
|
|
|
TextSection->setOutputAddress(NewTextSectionStartAddress);
|
2020-06-15 00:15:47 -07:00
|
|
|
TextSection->setOutputFileOffset(NewTextSectionOffset);
|
2017-01-17 15:49:59 -08:00
|
|
|
|
2019-03-21 21:13:45 -07:00
|
|
|
DEBUG(dbgs() << "BOLT: mapping .text 0x"
|
|
|
|
|
<< Twine::utohexstr(TextSection->getAllocAddress())
|
|
|
|
|
<< " to 0x" << Twine::utohexstr(NewTextSectionStartAddress)
|
|
|
|
|
<< '\n');
|
|
|
|
|
OLT->mapSectionAddress(Key, TextSection->getSectionID(),
|
|
|
|
|
NewTextSectionStartAddress);
|
|
|
|
|
}
|
2016-09-27 19:09:38 -07:00
|
|
|
|
2019-04-03 15:52:01 -07:00
|
|
|
for (auto &BFI : BC->getBinaryFunctions()) {
|
2019-03-21 21:13:45 -07:00
|
|
|
auto &Function = BFI.second;
|
2020-05-03 13:54:45 -07:00
|
|
|
if (!Function.isEmitted())
|
2019-03-21 21:13:45 -07:00
|
|
|
continue;
|
2016-09-27 19:09:38 -07:00
|
|
|
|
2019-03-21 21:13:45 -07:00
|
|
|
auto TooLarge = false;
|
|
|
|
|
auto FuncSection = Function.getCodeSection();
|
|
|
|
|
assert(FuncSection && "cannot find section for function");
|
|
|
|
|
FuncSection->setOutputAddress(Function.getAddress());
|
|
|
|
|
DEBUG(dbgs() << "BOLT: mapping 0x"
|
|
|
|
|
<< Twine::utohexstr(FuncSection->getAllocAddress())
|
|
|
|
|
<< " to 0x" << Twine::utohexstr(Function.getAddress())
|
|
|
|
|
<< '\n');
|
|
|
|
|
OLT->mapSectionAddress(Key, FuncSection->getSectionID(),
|
|
|
|
|
Function.getAddress());
|
|
|
|
|
Function.setImageAddress(FuncSection->getAllocAddress());
|
|
|
|
|
Function.setImageSize(FuncSection->getOutputSize());
|
|
|
|
|
if (Function.getImageSize() > Function.getMaxSize()) {
|
|
|
|
|
TooLarge = true;
|
|
|
|
|
FailedAddresses.emplace_back(Function.getAddress());
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Map jump tables if updating in-place.
|
|
|
|
|
if (opts::JumpTables == JTS_BASIC) {
|
|
|
|
|
for (auto &JTI : Function.JumpTables) {
|
|
|
|
|
auto *JT = JTI.second;
|
|
|
|
|
auto &Section = JT->getOutputSection();
|
|
|
|
|
Section.setOutputAddress(JT->getAddress());
|
|
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: mapping " << Section.getName()
|
|
|
|
|
<< " to 0x" << Twine::utohexstr(JT->getAddress())
|
|
|
|
|
<< '\n');
|
|
|
|
|
OLT->mapSectionAddress(Key, Section.getSectionID(),
|
|
|
|
|
JT->getAddress());
|
|
|
|
|
}
|
2016-09-27 19:09:38 -07:00
|
|
|
}
|
2019-03-21 21:13:45 -07:00
|
|
|
|
|
|
|
|
if (!Function.isSplit())
|
|
|
|
|
continue;
|
|
|
|
|
|
|
|
|
|
auto ColdSection = Function.getColdCodeSection();
|
|
|
|
|
assert(ColdSection && "cannot find section for cold part");
|
|
|
|
|
// Cold fragments are aligned at 16 bytes.
|
|
|
|
|
NextAvailableAddress = alignTo(NextAvailableAddress, 16);
|
|
|
|
|
auto &ColdPart = Function.cold();
|
|
|
|
|
if (TooLarge) {
|
|
|
|
|
// The corresponding FDE will refer to address 0.
|
|
|
|
|
ColdPart.setAddress(0);
|
|
|
|
|
ColdPart.setImageAddress(0);
|
|
|
|
|
ColdPart.setImageSize(0);
|
|
|
|
|
ColdPart.setFileOffset(0);
|
|
|
|
|
} else {
|
|
|
|
|
ColdPart.setAddress(NextAvailableAddress);
|
|
|
|
|
ColdPart.setImageAddress(ColdSection->getAllocAddress());
|
|
|
|
|
ColdPart.setImageSize(ColdSection->getOutputSize());
|
|
|
|
|
ColdPart.setFileOffset(getFileOffsetForAddress(NextAvailableAddress));
|
|
|
|
|
ColdSection->setOutputAddress(ColdPart.getAddress());
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
DEBUG(dbgs() << "BOLT: mapping cold fragment 0x"
|
|
|
|
|
<< Twine::utohexstr(ColdPart.getImageAddress())
|
|
|
|
|
<< " to 0x"
|
|
|
|
|
<< Twine::utohexstr(ColdPart.getAddress())
|
|
|
|
|
<< " with size "
|
|
|
|
|
<< Twine::utohexstr(ColdPart.getImageSize()) << '\n');
|
|
|
|
|
OLT->mapSectionAddress(Key, ColdSection->getSectionID(),
|
|
|
|
|
ColdPart.getAddress());
|
|
|
|
|
|
|
|
|
|
NextAvailableAddress += ColdPart.getImageSize();
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Add the new text section aggregating all existing code sections.
|
|
|
|
|
// This is a pseudo-section that serves the purpose of creating a corresponding
|
|
|
|
|
// entry in section header table.
|
|
|
|
|
auto NewTextSectionSize = NextAvailableAddress - NewTextSectionStartAddress;
|
|
|
|
|
if (NewTextSectionSize) {
|
|
|
|
|
const auto Flags = BinarySection::getFlags(/*IsReadOnly=*/true,
|
|
|
|
|
/*IsText=*/true,
|
|
|
|
|
/*IsAllocatable=*/true);
|
2020-03-11 15:51:32 -07:00
|
|
|
auto &Section =
|
|
|
|
|
BC->registerOrUpdateSection(getBOLTTextSectionName(),
|
|
|
|
|
ELF::SHT_PROGBITS,
|
|
|
|
|
Flags,
|
|
|
|
|
/*Data=*/nullptr,
|
|
|
|
|
NewTextSectionSize,
|
|
|
|
|
16);
|
2019-03-21 21:13:45 -07:00
|
|
|
Section.setOutputAddress(NewTextSectionStartAddress);
|
2020-06-15 00:15:47 -07:00
|
|
|
Section.setOutputFileOffset(
|
2019-03-21 21:13:45 -07:00
|
|
|
getFileOffsetForAddress(NewTextSectionStartAddress));
|
2016-02-12 19:01:53 -08:00
|
|
|
}
|
2018-04-20 20:03:31 -07:00
|
|
|
}
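The relocation-mode layout in mapCodeSections() is a bump allocator: each section is aligned, placed, and the cursor advances by its size. With hot text enabled, the cursor is then padded up to the page-aligned end of the main text section. The sketch below models that on plain integers. `SectionDesc`, `Allocation`, `alignUp`, and the `MainTextIndex` parameter are assumptions made for the illustration, not BOLT types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct SectionDesc {
  uint64_t Alignment; // required alignment of the section
  uint64_t Size;      // output size of the section
};

struct Allocation {
  std::vector<uint64_t> Addresses; // assigned address per section
  uint64_t End;                    // first address past the allocation
  uint64_t PaddingSize;            // padding added for hot text
};

// Round Value up to the next multiple of Align.
uint64_t alignUp(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && "alignment must be non-zero");
  return (Value + Align - 1) / Align * Align;
}

// Lay out Sections in order starting at Address.
Allocation allocateAt(uint64_t Address,
                      const std::vector<SectionDesc> &Sections, bool HotText,
                      std::size_t MainTextIndex, uint64_t PageAlign) {
  Allocation Result{{}, 0, 0};
  for (const SectionDesc &S : Sections) {
    Address = alignUp(Address, S.Alignment);
    Result.Addresses.push_back(Address);
    Address += S.Size;
  }
  if (HotText) {
    assert(MainTextIndex < Sections.size() && "no main text section");
    // Reserve space up to the page-aligned end of the main text section.
    uint64_t HotTextEnd = alignUp(
        Result.Addresses[MainTextIndex] + Sections[MainTextIndex].Size,
        PageAlign);
    if (HotTextEnd > Address) {
      Result.PaddingSize = HotTextEnd - Address;
      Address = HotTextEnd;
    }
  }
  Result.End = Address;
  return Result;
}
```

The "use old text" check then amounts to calling the allocator at the old `.text` address and comparing `End` minus that start address with the old section size.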
|
2015-12-18 17:00:46 -08:00
|
|
|
|
2018-04-20 20:03:31 -07:00
|
|
|
void RewriteInstance::mapDataSections(orc::VModuleKey Key) {
|
2015-12-18 17:00:46 -08:00
|
|
|
// Map special sections to their addresses in the output image.
|
2016-09-27 19:09:38 -07:00
|
|
|
// These are the sections that we generate via MCStreamer.
|
|
|
|
|
// The order is important.
|
2019-06-19 20:10:49 -07:00
|
|
|
std::vector<std::string> Sections = {
|
2020-03-11 15:51:32 -07:00
|
|
|
".eh_frame", Twine(getOrgSecPrefix(), ".eh_frame").str(),
|
2020-05-02 11:14:38 -07:00
|
|
|
".gcc_except_table", ".rodata", ".rodata.cold"};
|
|
|
|
|
if (auto *RtLibrary = BC->getRuntimeLibrary()) {
|
|
|
|
|
RtLibrary->addRuntimeLibSections(Sections);
|
|
|
|
|
}
|
2016-02-12 19:01:53 -08:00
|
|
|
for (auto &SectionName : Sections) {
|
2018-02-01 16:33:43 -08:00
|
|
|
auto Section = BC->getUniqueSectionByName(SectionName);
|
|
|
|
|
if (!Section || !Section->isAllocatable() || !Section->isFinalized())
|
2016-09-16 15:54:32 -07:00
|
|
|
continue;
|
2018-02-01 16:33:43 -08:00
|
|
|
NextAvailableAddress = alignTo(NextAvailableAddress,
|
|
|
|
|
Section->getAlignment());
|
2016-09-16 15:54:32 -07:00
|
|
|
DEBUG(dbgs() << "BOLT: mapping section " << SectionName << " (0x"
|
2018-02-01 16:33:43 -08:00
|
|
|
<< Twine::utohexstr(Section->getAllocAddress())
|
2016-09-16 15:54:32 -07:00
|
|
|
<< ") to 0x" << Twine::utohexstr(NextAvailableAddress)
|
2018-04-20 20:03:31 -07:00
|
|
|
<< ":0x" << Twine::utohexstr(NextAvailableAddress +
|
|
|
|
|
Section->getOutputSize())
|
2016-09-16 15:54:32 -07:00
|
|
|
<< '\n');
|
|
|
|
|
|
2018-03-30 15:49:34 -07:00
|
|
|
OLT->mapSectionAddress(Key, Section->getSectionID(), NextAvailableAddress);
|
2019-03-14 18:51:05 -07:00
|
|
|
Section->setOutputAddress(NextAvailableAddress);
|
2020-06-15 00:15:47 -07:00
|
|
|
Section->setOutputFileOffset(getFileOffsetForAddress(NextAvailableAddress));
|
2016-09-16 15:54:32 -07:00
|
|
|
|
2018-02-01 16:33:43 -08:00
|
|
|
NextAvailableAddress += Section->getOutputSize();
|
2015-11-23 17:54:18 -08:00
|
|
|
}
|
2015-12-18 17:00:46 -08:00
|
|
|
|
2016-09-27 19:09:38 -07:00
|
|
|
// Handling for sections with relocations.
|
2020-06-15 00:15:47 -07:00
|
|
|
for (auto &Section : BC->sections()) {
|
|
|
|
|
if (!Section.hasSectionRef())
|
2018-01-23 15:10:24 -08:00
|
|
|
continue;
|
|
|
|
|
|
|
|
|
|
StringRef SectionName = Section.getName();
|
2018-02-01 16:33:43 -08:00
|
|
|
auto OrgSection =
|
2020-03-11 15:51:32 -07:00
|
|
|
BC->getUniqueSectionByName((getOrgSecPrefix() + SectionName).str());
|
2018-02-01 16:33:43 -08:00
|
|
|
if (!OrgSection ||
|
|
|
|
|
!OrgSection->isAllocatable() ||
|
2020-06-15 00:15:47 -07:00
|
|
|
!OrgSection->isFinalized() ||
|
|
|
|
|
!OrgSection->hasValidSectionID())
|
2016-09-27 19:09:38 -07:00
|
|
|
continue;
|
2016-04-06 18:03:44 -07:00
|
|
|
|
2019-03-14 18:51:05 -07:00
|
|
|
if (OrgSection->getOutputAddress()) {
|
2016-09-27 19:09:38 -07:00
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: section " << SectionName
|
|
|
|
|
<< " is already mapped at 0x"
|
2019-03-14 18:51:05 -07:00
|
|
|
<< Twine::utohexstr(OrgSection->getOutputAddress()) << '\n');
|
2016-09-27 19:09:38 -07:00
|
|
|
continue;
|
2016-03-28 17:45:22 -07:00
|
|
|
}
|
2016-09-27 19:09:38 -07:00
|
|
|
DEBUG(dbgs() << "BOLT: mapping original section " << SectionName << " (0x"
|
2018-02-01 16:33:43 -08:00
|
|
|
<< Twine::utohexstr(OrgSection->getAllocAddress())
|
2016-09-27 19:09:38 -07:00
|
|
|
<< ") to 0x" << Twine::utohexstr(Section.getAddress())
|
|
|
|
|
<< '\n');
|
2016-03-28 17:45:22 -07:00
|
|
|
|
2018-03-30 15:49:34 -07:00
|
|
|
OLT->mapSectionAddress(Key, OrgSection->getSectionID(),
|
[BOLT rebase] Rebase fixes on top of LLVM Feb2018
Summary:
This commit includes all code necessary to make BOLT work again
after the rebase. This includes a redesign of the EHFrame work,
cherry-pick of the 3dnow disassembly work, compilation error fixes,
and port of the debug_info work. The macroop fusion feature is not
ported yet.
The rebased version has minor changes to the "executed instructions"
dynostats counter because REP prefixes are considered a part of the
instruction it applies to. Also, some X86 instructions had the "mayLoad"
tablegen property removed, which BOLT uses to identify and account
for loads, thus reducing the total number of loads reported by
dynostats. This was observed in X86::MOVDQUmr. TRAP instructions are
not terminators anymore, changing our CFG. This commit adds compensation
to preserve this old behavior and minimize test changes. debug_info
sections are now slightly larger. The discriminator field in the line
table is slightly different due to a change upstream. New profiles
generated with the other bolt are incompatible with this version
because of different hash values calculated for functions, so they will
be considered 100% stale. This commit changes the corresponding test
to XFAIL so it can be updated. The hash function changes because it
relies on raw opcode values, which change according to the opcodes
described in the X86 tablegen files. When processing HHVM, bolt was
observed to be using about 800MB more memory in the rebased version
and being about 5% slower.
(cherry picked from FBD7078072)
2018-02-06 15:00:23 -08:00
|
|
|
Section.getAddress());
|
2016-09-27 19:09:38 -07:00
|
|
|
|
2019-03-14 18:51:05 -07:00
|
|
|
OrgSection->setOutputAddress(Section.getAddress());
|
2020-06-15 00:15:47 -07:00
|
|
|
OrgSection->setOutputFileOffset(Section.getContents().data() -
|
|
|
|
|
InputFile->getData().data());
|
2016-09-27 19:09:38 -07:00
|
|
|
}
|
2015-11-23 17:54:18 -08:00
|
|
|
}
|
|
|
|
|
|
2019-07-24 14:03:43 -07:00
|
|
|
void RewriteInstance::mapExtraSections(orc::VModuleKey Key) {
|
|
|
|
|
for (auto &Section : BC->allocatableSections()) {
|
|
|
|
|
if (Section.getOutputAddress() || !Section.hasValidSectionID())
|
|
|
|
|
continue;
|
|
|
|
|
NextAvailableAddress =
|
|
|
|
|
alignTo(NextAvailableAddress, Section.getAlignment());
|
|
|
|
|
Section.setOutputAddress(NextAvailableAddress);
|
|
|
|
|
NextAvailableAddress += Section.getOutputSize();
|
|
|
|
|
|
|
|
|
|
DEBUG(dbgs() << "BOLT: (extra) mapping " << Section.getName()
|
|
|
|
|
<< " at 0x" << Twine::utohexstr(Section.getAllocAddress())
|
|
|
|
|
<< " to 0x" << Twine::utohexstr(Section.getOutputAddress())
|
|
|
|
|
<< '\n');
|
|
|
|
|
|
|
|
|
|
OLT->mapSectionAddress(Key, Section.getSectionID(),
|
|
|
|
|
Section.getOutputAddress());
|
2020-06-15 00:15:47 -07:00
|
|
|
Section.setOutputFileOffset(
|
2019-07-24 14:03:43 -07:00
|
|
|
getFileOffsetForAddress(Section.getOutputAddress()));
|
|
|
|
|
}
|
|
|
|
|
}
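The address assignment in mapExtraSections() follows a simple bump-allocator pattern: align the running address, place the section there, then advance by the section's size. The sketch below illustrates that pattern in isolation. `SectionSketch`, `alignToSketch` and `mapUnplacedSections` are hypothetical names introduced only for this illustration; they are not BOLT or LLVM APIs, and the real code additionally registers the mapping with the JIT layer and records a file offset.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a binary section; not BOLT's BinarySection.
struct SectionSketch {
  std::string Name;
  uint64_t Alignment;     // required alignment in bytes, must be non-zero
  uint64_t Size;          // output size in bytes
  uint64_t OutputAddress; // 0 means "not mapped yet"
};

// Round Value up to the next multiple of Align.
inline uint64_t alignToSketch(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && "alignment must be non-zero");
  return (Value + Align - 1) / Align * Align;
}

// Assign addresses to every section that has none yet, starting at
// NextAvailableAddress. Returns the first address past the last placed section.
inline uint64_t mapUnplacedSections(std::vector<SectionSketch> &Sections,
                                    uint64_t NextAvailableAddress) {
  for (SectionSketch &S : Sections) {
    if (S.OutputAddress != 0)
      continue; // already mapped, leave it where it is
    NextAvailableAddress = alignToSketch(NextAvailableAddress, S.Alignment);
    S.OutputAddress = NextAvailableAddress;
    NextAvailableAddress += S.Size;
  }
  return NextAvailableAddress;
}
```

Because each section is aligned individually before placement, gaps can appear between consecutive sections; the returned value is what a caller would carry forward as the next free address.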
|
|
|
|
|
|
2017-05-08 22:51:36 -07:00
|
|
|
void RewriteInstance::updateOutputValues(const MCAsmLayout &Layout) {
|
2019-04-03 15:52:01 -07:00
|
|
|
for (auto &BFI : BC->getBinaryFunctions()) {
|
2018-07-08 12:14:08 -07:00
|
|
|
auto &Function = BFI.second;
|
2019-11-03 21:57:15 -08:00
|
|
|
Function.updateOutputValues(Layout);
|
2018-07-08 12:14:08 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
for (auto *InjectedFunction : BC->getInjectedBinaryFunctions()) {
|
2019-11-03 21:57:15 -08:00
|
|
|
InjectedFunction->updateOutputValues(Layout);
|
2017-05-08 22:51:36 -07:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2016-03-03 10:13:11 -08:00
|
|
|
void RewriteInstance::patchELFPHDRTable() {
|
|
|
|
|
auto ELF64LEFile = dyn_cast<ELF64LEObjectFile>(InputFile);
|
2016-02-08 10:02:48 -08:00
|
|
|
if (!ELF64LEFile) {
|
|
|
|
|
errs() << "BOLT-ERROR: only 64-bit LE ELF binaries are supported\n";
|
|
|
|
|
exit(1);
|
|
|
|
|
}
|
|
|
|
|
auto Obj = ELF64LEFile->getELFFile();
|
|
|
|
|
auto &OS = Out->os();
|
|
|
|
|
|
2016-02-12 19:01:53 -08:00
|
|
|
// Write/re-write program headers.
|
2016-03-03 10:13:11 -08:00
|
|
|
Phnum = Obj->getHeader()->e_phnum;
|
2016-02-12 19:01:53 -08:00
|
|
|
if (PHDRTableOffset) {
|
|
|
|
|
// Writing new pheader table.
|
|
|
|
|
Phnum += 1; // only adding one new segment
|
|
|
|
|
// Segment size includes the size of the PHDR area.
|
|
|
|
|
NewTextSegmentSize = NextAvailableAddress - PHDRTableAddress;
|
|
|
|
|
} else {
|
|
|
|
|
assert(!PHDRTableAddress && "unexpected address for program header table");
|
|
|
|
|
// Update existing table.
|
|
|
|
|
PHDRTableOffset = Obj->getHeader()->e_phoff;
|
|
|
|
|
NewTextSegmentSize = NextAvailableAddress - NewTextSegmentAddress;
|
|
|
|
|
}
|
|
|
|
|
OS.seek(PHDRTableOffset);
|
2016-02-08 10:02:48 -08:00
|
|
|
|
2016-02-12 19:01:53 -08:00
|
|
|
bool ModdedGnuStack = false;
|
2017-05-25 10:29:38 -07:00
|
|
|
(void)ModdedGnuStack;
|
2016-02-12 19:01:53 -08:00
|
|
|
bool AddedSegment = false;
|
2017-05-25 10:29:38 -07:00
|
|
|
(void)AddedSegment;
|
2016-02-08 10:02:48 -08:00
|
|
|
|
2020-06-26 16:52:07 -07:00
|
|
|
auto createNewTextPhdr = [&]() {
|
|
|
|
|
ELFFile<ELF64LE>::Elf_Phdr NewPhdr;
|
|
|
|
|
NewPhdr.p_type = ELF::PT_LOAD;
|
|
|
|
|
if (PHDRTableAddress) {
|
|
|
|
|
NewPhdr.p_offset = PHDRTableOffset;
|
|
|
|
|
NewPhdr.p_vaddr = PHDRTableAddress;
|
|
|
|
|
NewPhdr.p_paddr = PHDRTableAddress;
|
|
|
|
|
} else {
|
|
|
|
|
NewPhdr.p_offset = NewTextSegmentOffset;
|
|
|
|
|
NewPhdr.p_vaddr = NewTextSegmentAddress;
|
|
|
|
|
NewPhdr.p_paddr = NewTextSegmentAddress;
|
|
|
|
|
}
|
|
|
|
|
NewPhdr.p_filesz = NewTextSegmentSize;
|
|
|
|
|
NewPhdr.p_memsz = NewTextSegmentSize;
|
|
|
|
|
NewPhdr.p_flags = ELF::PF_X | ELF::PF_R;
|
|
|
|
|
// FIXME: Currently instrumentation is experimental and the runtime data
|
|
|
|
|
// is emitted with code, thus everything needs to be writable
|
|
|
|
|
if (opts::Instrument)
|
|
|
|
|
NewPhdr.p_flags |= ELF::PF_W;
|
|
|
|
|
NewPhdr.p_align = BC->PageAlign;
|
|
|
|
|
|
|
|
|
|
return NewPhdr;
|
|
|
|
|
};
|
|
|
|
|
|
2016-02-08 10:02:48 -08:00
|
|
|
// Copy existing program headers with modifications.
|
2018-02-06 15:00:23 -08:00
|
|
|
for (auto &Phdr : cantFail(Obj->program_headers())) {
|
2016-02-12 19:01:53 -08:00
|
|
|
auto NewPhdr = Phdr;
|
|
|
|
|
if (PHDRTableAddress && Phdr.p_type == ELF::PT_PHDR) {
|
2016-02-08 10:02:48 -08:00
|
|
|
NewPhdr.p_offset = PHDRTableOffset;
|
|
|
|
|
NewPhdr.p_vaddr = PHDRTableAddress;
|
|
|
|
|
NewPhdr.p_paddr = PHDRTableAddress;
|
|
|
|
|
NewPhdr.p_filesz = sizeof(NewPhdr) * Phnum;
|
|
|
|
|
NewPhdr.p_memsz = sizeof(NewPhdr) * Phnum;
|
|
|
|
|
} else if (Phdr.p_type == ELF::PT_GNU_EH_FRAME) {
|
2018-02-01 16:33:43 -08:00
|
|
|
auto EHFrameHdrSec = BC->getUniqueSectionByName(".eh_frame_hdr");
|
|
|
|
|
if (EHFrameHdrSec &&
|
|
|
|
|
EHFrameHdrSec->isAllocatable() &&
|
|
|
|
|
EHFrameHdrSec->isFinalized()) {
|
2020-06-15 00:15:47 -07:00
|
|
|
NewPhdr.p_offset = EHFrameHdrSec->getOutputFileOffset();
|
2019-03-14 18:51:05 -07:00
|
|
|
NewPhdr.p_vaddr = EHFrameHdrSec->getOutputAddress();
|
|
|
|
|
NewPhdr.p_paddr = EHFrameHdrSec->getOutputAddress();
|
2018-02-01 16:33:43 -08:00
|
|
|
NewPhdr.p_filesz = EHFrameHdrSec->getOutputSize();
|
|
|
|
|
NewPhdr.p_memsz = EHFrameHdrSec->getOutputSize();
|
2016-07-12 16:43:53 -07:00
|
|
|
}
|
2016-02-12 19:01:53 -08:00
|
|
|
} else if (opts::UseGnuStack && Phdr.p_type == ELF::PT_GNU_STACK) {
|
2020-06-26 16:52:07 -07:00
|
|
|
NewPhdr = createNewTextPhdr();
|
2016-02-12 19:01:53 -08:00
|
|
|
ModdedGnuStack = true;
|
|
|
|
|
} else if (!opts::UseGnuStack && Phdr.p_type == ELF::PT_DYNAMIC) {
|
2020-06-26 16:52:07 -07:00
|
|
|
// Insert the new header before DYNAMIC.
|
|
|
|
|
auto NewTextPhdr = createNewTextPhdr();
|
2016-02-12 19:01:53 -08:00
|
|
|
OS.write(reinterpret_cast<const char *>(&NewTextPhdr),
|
|
|
|
|
sizeof(NewTextPhdr));
|
|
|
|
|
AddedSegment = true;
|
2016-02-08 10:02:48 -08:00
|
|
|
}
|
2016-02-12 19:01:53 -08:00
|
|
|
OS.write(reinterpret_cast<const char *>(&NewPhdr), sizeof(NewPhdr));
|
2015-11-23 17:54:18 -08:00
|
|
|
}
|
|
|
|
|
|
2020-06-26 16:52:07 -07:00
|
|
|
if (!opts::UseGnuStack && !AddedSegment) {
|
|
|
|
|
// Append the new header to the end of the table.
|
|
|
|
|
auto NewTextPhdr = createNewTextPhdr();
|
|
|
|
|
OS.write(reinterpret_cast<const char *>(&NewTextPhdr),
|
|
|
|
|
sizeof(NewTextPhdr));
|
|
|
|
|
}
|
|
|
|
|
|
2016-02-12 19:01:53 -08:00
|
|
|
assert((!opts::UseGnuStack || ModdedGnuStack) &&
|
|
|
|
|
"could not find GNU_STACK program header to modify");
|
2016-03-03 10:13:11 -08:00
|
|
|
}
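The createNewTextPhdr lambda above fills in a PT_LOAD program header for the new text segment. As a rough, standalone illustration of what those fields mean, the sketch below builds the same kind of header without any LLVM types. `Phdr64Sketch`, `makeTextPhdr` and the `*_SKETCH` constants are names made up for this example; the field layout and numeric values follow the ELF-64 format (PT_LOAD = 1, PF_X = 1, PF_W = 2, PF_R = 4), and the `Writable` parameter stands in for the `opts::Instrument` check.

```cpp
#include <cassert>
#include <cstdint>

// Field layout of a 64-bit ELF program header (same order as Elf64_Phdr).
struct Phdr64Sketch {
  uint32_t p_type;
  uint32_t p_flags;
  uint64_t p_offset;
  uint64_t p_vaddr;
  uint64_t p_paddr;
  uint64_t p_filesz;
  uint64_t p_memsz;
  uint64_t p_align;
};
static_assert(sizeof(Phdr64Sketch) == 56, "a 64-bit program header is 56 bytes");

// Values from the ELF specification, renamed to avoid clashing with <elf.h>.
constexpr uint32_t PT_LOAD_SKETCH = 1;
constexpr uint32_t PF_X_SKETCH = 1;
constexpr uint32_t PF_W_SKETCH = 2;
constexpr uint32_t PF_R_SKETCH = 4;

// Build a loadable, executable segment header. The file and memory sizes are
// equal because the whole segment is backed by file contents.
inline Phdr64Sketch makeTextPhdr(uint64_t FileOffset, uint64_t Address,
                                 uint64_t Size, uint64_t PageAlign,
                                 bool Writable) {
  Phdr64Sketch P{};
  P.p_type = PT_LOAD_SKETCH;
  P.p_offset = FileOffset;
  P.p_vaddr = Address;
  P.p_paddr = Address;
  P.p_filesz = Size;
  P.p_memsz = Size;
  P.p_flags = PF_X_SKETCH | PF_R_SKETCH;
  if (Writable)
    P.p_flags |= PF_W_SKETCH; // e.g. when runtime data lives next to code
  P.p_align = PageAlign;
  return P;
}
```

A caller would then write the struct's bytes at the program header table offset, which is what the `OS.write(reinterpret_cast<const char *>(...))` calls above do for the real header type.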
|
|
|
|
|
|
2016-09-16 15:54:32 -07:00
|
|
|
namespace {
|
2017-04-06 10:49:59 -07:00
|
|
|
|
|
|
|
|
/// Write padding to \p OS such that its current \p Offset becomes aligned
|
|
|
|
|
/// at \p Alignment. Return new (aligned) offset.
|
|
|
|
|
uint64_t appendPadding(raw_pwrite_stream &OS,
|
|
|
|
|
uint64_t Offset,
|
|
|
|
|
uint64_t Alignment) {
|
2017-05-16 17:29:31 -07:00
|
|
|
if (!Alignment)
|
|
|
|
|
return Offset;
|
|
|
|
|
|
2017-04-06 10:49:59 -07:00
|
|
|
const auto PaddingSize = OffsetToAlignment(Offset, Alignment);
|
|
|
|
|
for (unsigned I = 0; I < PaddingSize; ++I)
|
2016-09-16 15:54:32 -07:00
|
|
|
OS.write((unsigned char)0);
|
2017-04-06 10:49:59 -07:00
|
|
|
return Offset + PaddingSize;
|
2016-09-16 15:54:32 -07:00
|
|
|
}
|
2017-04-06 10:49:59 -07:00
|
|
|
|
2016-09-16 15:54:32 -07:00
|
|
|
}
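The appendPadding() helper above writes zero bytes until the stream offset reaches the requested alignment. The following sketch reproduces that logic against a standard `std::ostream` so it can be tried outside of LLVM. `offsetToAlignmentSketch` and `appendPaddingSketch` are illustrative names, and the modulo-based computation is an assumption about what LLVM's `OffsetToAlignment` returns (the distance to the next multiple of the alignment, or zero if already aligned).

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>

// Number of bytes needed to bring Offset up to a multiple of Alignment.
inline uint64_t offsetToAlignmentSketch(uint64_t Offset, uint64_t Alignment) {
  assert(Alignment != 0 && "alignment must be non-zero");
  return (Alignment - Offset % Alignment) % Alignment;
}

// Write zero bytes to OS so that Offset becomes a multiple of Alignment.
// An Alignment of 0 means "no alignment requirement". Returns the new offset.
inline uint64_t appendPaddingSketch(std::ostream &OS, uint64_t Offset,
                                    uint64_t Alignment) {
  if (Alignment == 0)
    return Offset;
  const uint64_t PaddingSize = offsetToAlignmentSketch(Offset, Alignment);
  for (uint64_t I = 0; I < PaddingSize; ++I)
    OS.put('\0');
  return Offset + PaddingSize;
}
```

With a `std::ostringstream` that already holds three bytes, calling `appendPaddingSketch(OS, 3, 8)` appends five zero bytes and returns 8. The early return for a zero alignment mirrors the original, which guards against ELF sections whose `sh_addralign` is 0.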
|
|
|
|
|
|
2016-03-03 10:13:11 -08:00
|
|
|
void RewriteInstance::rewriteNoteSections() {
|
|
|
|
|
auto ELF64LEFile = dyn_cast<ELF64LEObjectFile>(InputFile);
|
|
|
|
|
if (!ELF64LEFile) {
|
|
|
|
|
errs() << "BOLT-ERROR: only 64-bit LE ELF binaries are supported\n";
|
|
|
|
|
exit(1);
|
|
|
|
|
}
|
|
|
|
|
auto Obj = ELF64LEFile->getELFFile();
|
|
|
|
|
auto &OS = Out->os();
|
2016-02-08 10:02:48 -08:00
|
|
|
|
2017-01-17 15:49:59 -08:00
|
|
|
uint64_t NextAvailableOffset = getFileOffsetForAddress(NextAvailableAddress);
|
2016-02-12 19:01:53 -08:00
|
|
|
assert(NextAvailableOffset >= FirstNonAllocatableOffset &&
|
|
|
|
|
"next available offset calculation failure");
|
2016-03-03 10:13:11 -08:00
|
|
|
OS.seek(NextAvailableOffset);
|
|
|
|
|
|
|
|
|
|
// Copy over non-allocatable section contents and update file offsets.
|
2018-02-06 15:00:23 -08:00
|
|
|
for (auto &Section : cantFail(Obj->sections())) {
|
2016-03-03 10:13:11 -08:00
|
|
|
if (Section.sh_type == ELF::SHT_NULL)
|
|
|
|
|
continue;
|
|
|
|
|
if (Section.sh_flags & ELF::SHF_ALLOC)
|
|
|
|
|
continue;
|
2016-02-08 10:02:48 -08:00
|
|
|
|
2019-10-29 14:49:49 -07:00
|
|
|
StringRef SectionName =
|
|
|
|
|
cantFail(Obj->getSectionName(&Section), "cannot get section name");
|
|
|
|
|
|
|
|
|
|
if (shouldStrip(Section, SectionName))
|
2016-09-27 19:09:38 -07:00
|
|
|
continue;
|
|
|
|
|
|
2016-03-03 10:13:11 -08:00
|
|
|
// Insert padding as needed.
|
2017-04-06 10:49:59 -07:00
|
|
|
NextAvailableOffset =
|
|
|
|
|
appendPadding(OS, NextAvailableOffset, Section.sh_addralign);
|
2016-03-03 10:13:11 -08:00
|
|
|
|
2016-05-16 17:02:17 -07:00
|
|
|
// New section size.
|
2016-03-11 11:30:30 -08:00
|
|
|
uint64_t Size = 0;
|
|
|
|
|
|
2016-11-11 14:33:34 -08:00
|
|
|
// Copy over section contents unless it's one of the sections we overwrite.
|
2018-02-06 15:00:23 -08:00
|
|
|
if (!willOverwriteSection(SectionName)) {
|
2016-03-11 11:30:30 -08:00
|
|
|
Size = Section.sh_size;
|
Update subroutine address ranges in binary.
Summary:
[WIP] Update DWARF info for function address ranges.
This diff currently does not work for unknown reasons,
but I'm describing the current state here.
According to both llvm-dwarfdump and readelf our output seems correct,
but GDB does not interpret it as expected. All details go below in
hope I missed something.
I couldn't actually track the whole change that introduced support for
what we need in gdb yet, but I think I can get to it
(2007-12-04: Support lexical blocks and function bodies that occupy
non-contiguous address ranges). I have reasons to believe gdb at least at some
The set of introduced changes was basically this:
- After disassembly, iterate over the DIEs in .debug_info and find the
ones that correspond to each BinaryFunction.
- Refactor DebugArangesWriter to also write addresses of functions to
.debug_ranges and track the offsets of function address ranges there
- Add some infrastructure to facilitate patching the binary in
simple ways (BinaryPatcher.h)
- In RewriteInstance, after writing .debug_ranges already with
function address ranges, for each function do:
-- Find the abbreviation corresponding to the function
-- Patch .debug_abbrev to replace DW_AT_low_pc with DW_AT_ranges and
DW_AT_high_pc with DW_AT_producer (I'll explain this hack below).
Also patch the corresponding forms to DW_FORM_sec_offset and
DW_FORM_string (null-terminated in-place string).
-- Patch debug_info with the .debug_ranges offset in place of
the first 4 bytes of DW_AT_low_pc (DW_AT_ranges only occupies 4
bytes whereas low_pc occupies 8), and write an arbitrary string
in-place in the other 12 bytes that were the 4 MSB of low_pc
and the 8 bytes of high_pc before the patch. This depends on
low_pc and high_pc being put consecutively by the compiler, but
it serves to validate the idea. I tried another way of doing it
that does not rely on this but it didn't work either and I believe
the reason for either not working is the same (and still unknown,
but unrelated to them. I might be wrong though, and if I find yet
another way of doing it I may try it). The other way was to
use a form of DW_FORM_data8 for the section offset. This is
disallowed by the specification, but I doubt gdb validates this,
as it's just easier to store it as 64-bit anyway as this is even
necessary to support 64-bit DWARF (which is not what gcc generates
by default apparently).
I still need to make changes to the diff to make it production-ready,
but first I want to figure out why it doesn't work as expected.
By looking at the output of llvm-dwarfdump or readelf, all of
.debug_ranges, .debug_abbrev and .debug_info seem to have been
correctly updated. However, gdb seems to have serious problems with
what we write.
(In fact, readelf --debug-dump=Ranges shows some funny warning messages
of the form ("Warning: There is a hole [0x100 - 0x120] in .debug_ranges"),
but I played around with this and it seems it's just because no
compile unit was using these ranges. Changing .debug_info apparently
changes these warnings, so they seem to be unrelated to the section
itself. Also looking at the hex dump of the section doesn't help,
as everything seems fine. llvm-dwarfdump doesn't say anything.
So I think .debug_ranges is fine.)
The result is that gdb not only doesn't show the function name as we
wanted, but it also stops showing line number information.
Apparently it's not reading/interpreting the address ranges at all,
and so the functions now have no associated address ranges, only the
symbol value which allows one to put a breakpoint in the function,
but not to show source code.
As this left me without more ideas of what to try to feed gdb with,
I believe the most promising next trial is to try to debug gdb itself,
unless someone spots anything I missed.
I found where the interesting part of the code lies for this
case (gdb/dwarf2read.c and some other related files, but mainly that one).
It seems in some parts gdb uses DW_AT_ranges for only getting
its lowest and highest addresses and setting that as low_pc and
high_pc (see dwarf2_get_pc_bounds in gdb's code and where it's called).
I really hope this is not actually the case for
function address ranges. I'll investigate this further. Otherwise
I don't think any changes we make will make it work as initially
intended, as we'll simply need gdb to support it and in that case it
doesn't.
(cherry picked from FBD3073641)
2016-03-16 18:08:29 -07:00
|
|
|
std::string Data = InputFile->getData().substr(Section.sh_offset, Size);
|
2018-02-06 15:00:23 -08:00
|
|
|
auto SectionPatchersIt = SectionPatchers.find(SectionName);
|
Update subroutine address ranges in binary.
Summary:
[WIP] Update DWARF info for function address ranges.
This diff currently does not work for unknown reasons,
but I'm describing here what's the current state.
According to both llvm-dwarf and readelf our output seems correct,
but GDB does not interpret it as expected. All details go below in
hope I missed something.
I couldn't actually track the whole change that introduced support for
what we need in gdb yet, but I think I can get to it
(2007-12-04: Support lexical blocks and function bodies that occupy
non-contiguous address ranges). I have reasons to believe gdb at least at some
The set of introduced changes was basically this:
- After disassembly, iterate over the DIEs in .debug_info and find the
ones that correspond to each BinaryFunction.
- Refactor DebugArangesWriter to also write addresses of functions to
.debug_ranges and track the offsets of function address ranges there
- Add some infrastructure to facilitate patching the binary in
simple ways (BinaryPatcher.h)
- In RewriteInstance, after writing .debug_ranges already with
function address ranges, for each function do:
-- Find the abbreviation corresponding to the function
-- Patch .debug_abbrev to replace DW_AT_low_pc with DW_AT_ranges and
DW_AT_high_pc with DW_AT_producer (I'll explain this hack below).
Also patch the corresponding forms to DW_FORM_sec_offset and
DW_FORM_string (null-terminated in-place string).
-- Patch debug_info with the .debug_ranges offset in place of
the first 4 bytes of DW_AT_low_pc (DW_AT_ranges only occupies 4
bytes whereas low_pc occupies 8), and write an arbitrary string
in-place in the other 12 bytes that were the 4 most significant bytes of low_pc
and the 8 bytes of high_pc before the patch. This depends on
low_pc and high_pc being put consecutively by the compiler, but
it serves to validate the idea. I tried another way of doing it
that does not rely on this but it didn't work either and I believe
the reason for either not working is the same (and still unknown,
but unrelated to them. I might be wrong though, and if I find yet
another way of doing it I may try it). The other way was to
use a form of DW_FORM_data8 for the section offset. This is
disallowed by the specification, but I doubt gdb validates this,
as it's just easier to store it as 64-bit anyway as this is even
necessary to support 64-bit DWARF (which is not what gcc generates
by default apparently).
I still need to make changes to the diff to make it production-ready,
but first I want to figure out why it doesn't work as expected.
By looking at the output of llvm-dwarfdump or readelf, all of
.debug_ranges, .debug_abbrev and .debug_info seem to have been
correctly updated. However, gdb seems to have serious problems with
what we write.
(In fact, readelf --debug-dump=Ranges shows some funny warning messages
of the form ("Warning: There is a hole [0x100 - 0x120] in .debug_ranges"),
but I played around with this and it seems it's just because no
compile unit was using these ranges. Changing .debug_info apparently
changes these warnings, so they seem to be unrelated to the section
itself. Also looking at the hex dump of the section doesn't help,
as everything seems fine. llvm-dwarfdump doesn't say anything.
So I think .debug_ranges is fine.)
The result is that gdb not only doesn't show the function name as we
wanted, but it also stops showing line number information.
Apparently it's not reading/interpreting the address ranges at all,
and so the functions now have no associated address ranges, only the
symbol value which allows one to put a breakpoint in the function,
but not to show source code.
As this left me without more ideas of what to try to feed gdb with,
I believe the most promising next trial is to try to debug gdb itself,
unless someone spots anything I missed.
I found where the interesting part of the code lies for this
case (gdb/dwarf2read.c and some other related files, but mainly that one).
It seems in some parts gdb uses DW_AT_ranges for only getting
its lowest and highest addresses and setting that as low_pc and
high_pc (see dwarf2_get_pc_bounds in gdb's code and where it's called).
I really hope this is not actually the case for
function address ranges. I'll investigate this further. Otherwise
I don't think any changes we make will make it work as initially
intended, as we'll simply need gdb to support it and in that case it
doesn't.
(cherry picked from FBD3073641)
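The in-place .debug_info patch described in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this commit: `patchLowHighPC` and the filler string are hypothetical, and the sketch assumes a little-endian target, 8-byte addresses, 32-bit DWARF, and that low_pc and high_pc are stored consecutively.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: overwrite the 16 bytes that held DW_AT_low_pc (8 bytes)
// and DW_AT_high_pc (8 bytes) with a 4-byte .debug_ranges offset followed by
// a 12-byte null-terminated filler string.
void patchLowHighPC(std::vector<uint8_t> &DebugInfo, size_t LowPCOffset,
                    uint32_t RangesOffset) {
  assert(LowPCOffset + 16 <= DebugInfo.size() && "patch exceeds section");

  // Little-endian 4-byte section offset (the new DW_AT_ranges value).
  for (unsigned I = 0; I < 4; ++I)
    DebugInfo[LowPCOffset + I] =
        static_cast<uint8_t>((RangesOffset >> (8 * I)) & 0xFF);

  // 11 filler characters plus the terminating null fill the remaining
  // 12 bytes (upper half of low_pc and all of high_pc).
  static const char Filler[12] = "BOLT-PATCH!";
  std::memcpy(&DebugInfo[LowPCOffset + 4], Filler, sizeof(Filler));
}
```

Because the attribute sizes before and after the patch add up to the same 16 bytes, nothing after the patched DIE has to move, which is what makes the hack workable without re-emitting the section.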
2016-03-16 18:08:29 -07:00
|
|
|
if (SectionPatchersIt != SectionPatchers.end()) {
|
|
|
|
|
(*SectionPatchersIt->second).patchBinary(Data);
|
|
|
|
|
}
|
|
|
|
|
OS << Data;
|
2017-04-06 10:49:59 -07:00
|
|
|
|
|
|
|
|
// Add padding as the section extension might rely on the alignment.
|
|
|
|
|
Size = appendPadding(OS, Size, Section.sh_addralign);
|
2016-03-11 11:30:30 -08:00
|
|
|
}
|
2016-03-03 10:13:11 -08:00
|
|
|
|
2016-03-09 16:06:41 -08:00
|
|
|
// Perform section post-processing.
|
2018-02-01 16:33:43 -08:00
|
|
|
auto BSec = BC->getUniqueSectionByName(SectionName);
|
|
|
|
|
uint8_t *SectionData = nullptr;
|
|
|
|
|
if (BSec && !BSec->isAllocatable()) {
|
|
|
|
|
assert(BSec->getAlignment() <= Section.sh_addralign &&
|
2016-03-03 10:13:11 -08:00
|
|
|
"alignment exceeds value in file");
|
2016-03-09 16:06:41 -08:00
|
|
|
|
2018-02-01 16:33:43 -08:00
|
|
|
if (BSec->getAllocAddress()) {
|
|
|
|
|
SectionData = BSec->getOutputData();
|
2017-04-05 09:29:24 -07:00
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: " << (Size ? "appending" : "writing")
|
2016-05-16 17:02:17 -07:00
|
|
|
<< " contents to section "
|
2018-02-06 15:00:23 -08:00
|
|
|
<< SectionName << '\n');
|
2018-02-01 16:33:43 -08:00
|
|
|
OS.write(reinterpret_cast<char *>(SectionData),
|
|
|
|
|
BSec->getOutputSize());
|
|
|
|
|
Size += BSec->getOutputSize();
|
2016-03-09 16:06:41 -08:00
|
|
|
}
|
|
|
|
|
|
[BOLT] Support for lite mode with relocations
Summary:
Add '-lite' support for relocations for improved processing time,
memory consumption, and more resilient processing of binaries with
embedded assembly code.
In lite relocation mode, BOLT will skip full processing of functions
without a profile. It will run scanExternalRefs() on such functions
to discover external references and to create internal relocations
to update references to optimized functions.
Note that we could have relied on the compiler/linker to provide
relocations for function references. However, there's no assurance
that all such references are reported. E.g., the compiler can resolve
inter-procedural references internally, leaving no relocations
for the linker.
The scan process takes less than 10 seconds per 100MB of code on modern
hardware. It's a reasonable overhead to live with considering the
flexibility it provides.
If BOLT fails to scan or disassemble a function, e.g., due to a data
object embedded in code, or an unsupported instruction, it enables a
patching mode to guarantee that the failed function will call
optimized/moved versions of functions. The patching happens at original
function entry points.
'-skip=<func1,func2,...>' option now can be used to skip processing of
arbitrary functions in the relocation mode.
With '-use-old-text' or '-strict' we require all functions to be
processed. As such, it is incompatible with '-lite' option,
and '-skip' option will only disable optimizations of listed
functions, not their disassembly and emission.
(cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
|
|
|
BSec->setOutputFileOffset(NextAvailableOffset);
|
|
|
|
|
BSec->flushPendingRelocations(OS,
|
|
|
|
|
[this] (const MCSymbol *S) {
|
|
|
|
|
return getNewValueForSymbol(S->getName());
|
|
|
|
|
});
|
2016-03-03 10:13:11 -08:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Set/modify section info.
|
2018-02-01 16:33:43 -08:00
|
|
|
auto &NewSection =
|
|
|
|
|
BC->registerOrUpdateNoteSection(SectionName,
|
|
|
|
|
SectionData,
|
|
|
|
|
Size,
|
|
|
|
|
Section.sh_addralign,
|
|
|
|
|
BSec ? BSec->isReadOnly() : false,
|
|
|
|
|
BSec ? BSec->getELFType()
|
2020-02-18 09:20:17 -08:00
|
|
|
: ELF::SHT_PROGBITS);
|
2019-03-14 18:51:05 -07:00
|
|
|
NewSection.setOutputAddress(0);
|
2020-06-15 00:15:47 -07:00
|
|
|
NewSection.setOutputFileOffset(NextAvailableOffset);
|
2016-03-03 10:13:11 -08:00
|
|
|
|
|
|
|
|
NextAvailableOffset += Size;
|
|
|
|
|
}
|
2017-05-16 17:29:31 -07:00
|
|
|
|
|
|
|
|
// Write new note sections.
|
2017-11-14 20:05:11 -08:00
|
|
|
for (auto &Section : BC->nonAllocatableSections()) {
|
2020-06-15 00:15:47 -07:00
|
|
|
if (Section.getOutputFileOffset() || !Section.getAllocAddress())
|
2017-05-16 17:29:31 -07:00
|
|
|
continue;
|
|
|
|
|
|
2018-02-01 16:33:43 -08:00
|
|
|
assert(!Section.hasPendingRelocations() && "cannot have pending relocs");
|
2017-05-16 17:29:31 -07:00
|
|
|
|
2018-02-01 16:33:43 -08:00
|
|
|
NextAvailableOffset = appendPadding(OS, NextAvailableOffset,
|
|
|
|
|
Section.getAlignment());
|
2020-06-15 00:15:47 -07:00
|
|
|
Section.setOutputFileOffset(NextAvailableOffset);
|
2017-05-16 17:29:31 -07:00
|
|
|
|
2018-02-01 16:33:43 -08:00
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: writing out new section "
|
|
|
|
|
<< Section.getName() << " of size " << Section.getOutputSize()
|
2020-06-15 00:15:47 -07:00
|
|
|
<< " at offset 0x"
|
|
|
|
|
<< Twine::utohexstr(Section.getOutputFileOffset()) << '\n');
|
2017-05-16 17:29:31 -07:00
|
|
|
|
2018-02-01 16:33:43 -08:00
|
|
|
OS.write(Section.getOutputContents().data(), Section.getOutputSize());
|
|
|
|
|
NextAvailableOffset += Section.getOutputSize();
|
2017-05-16 17:29:31 -07:00
|
|
|
}
|
2016-03-03 10:13:11 -08:00
|
|
|
}
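Both loops in the function above pad the output to the section alignment before advancing the file offset. A minimal sketch of that padding step is shown here, assuming a plain `std::string` as the output buffer; `padToAlignment` is a hypothetical stand-in and not the actual `appendPadding` implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for appendPadding(): append zero bytes to Out until
// Offset is a multiple of Alignment, and return the padded offset.
uint64_t padToAlignment(std::string &Out, uint64_t Offset, uint64_t Alignment) {
  // An alignment of 0 or 1 imposes no constraint.
  if (Alignment <= 1)
    return Offset;

  const uint64_t Remainder = Offset % Alignment;
  if (Remainder == 0)
    return Offset;

  const uint64_t PaddingSize = Alignment - Remainder;
  Out.append(PaddingSize, '\0');
  return Offset + PaddingSize;
}
```

Returning the new offset lets the caller keep a running position (as `Size` and `NextAvailableOffset` do above) without querying the stream.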
|
|
|
|
|
|
2017-02-07 12:20:46 -08:00
|
|
|
template <typename ELFT>
|
2017-05-16 17:29:31 -07:00
|
|
|
void RewriteInstance::finalizeSectionStringTable(ELFObjectFile<ELFT> *File) {
|
2017-02-07 12:20:46 -08:00
|
|
|
auto *Obj = File->getELFFile();
|
|
|
|
|
|
|
|
|
|
// Pre-populate section header string table.
|
2018-02-06 15:00:23 -08:00
|
|
|
for (auto &Section : cantFail(Obj->sections())) {
|
|
|
|
|
StringRef SectionName =
|
|
|
|
|
cantFail(Obj->getSectionName(&Section), "cannot get section name");
|
|
|
|
|
SHStrTab.add(SectionName);
|
2019-03-14 18:51:05 -07:00
|
|
|
auto OutputSectionName = getOutputSectionName(Obj, Section);
|
|
|
|
|
if (OutputSectionName != SectionName) {
|
|
|
|
|
AllSHStrTabStrings.emplace_back(SHStrTabPool.intern(OutputSectionName));
|
2018-02-06 15:00:23 -08:00
|
|
|
SHStrTab.add(*AllSHStrTabStrings.back());
|
|
|
|
|
}
|
2017-02-07 12:20:46 -08:00
|
|
|
}
|
2017-11-14 20:05:11 -08:00
|
|
|
for (const auto &Section : BC->sections()) {
|
|
|
|
|
SHStrTab.add(Section.getName());
|
2017-05-16 17:29:31 -07:00
|
|
|
}
|
2018-02-06 15:00:23 -08:00
|
|
|
SHStrTab.finalize();
|
2017-02-07 12:20:46 -08:00
|
|
|
|
2018-02-06 15:00:23 -08:00
|
|
|
const auto SHStrTabSize = SHStrTab.getSize();
|
2017-05-16 17:29:31 -07:00
|
|
|
uint8_t *DataCopy = new uint8_t[SHStrTabSize];
|
2018-02-06 15:00:23 -08:00
|
|
|
memset(DataCopy, 0, SHStrTabSize);
|
|
|
|
|
SHStrTab.write(DataCopy);
|
2018-02-01 16:33:43 -08:00
|
|
|
BC->registerOrUpdateNoteSection(".shstrtab",
|
|
|
|
|
DataCopy,
|
|
|
|
|
SHStrTabSize,
|
|
|
|
|
/*Alignment=*/1,
|
|
|
|
|
/*IsReadOnly=*/true,
|
|
|
|
|
ELF::SHT_STRTAB);
|
2017-02-07 12:20:46 -08:00
|
|
|
}
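The function above collects every section name, finalizes the table, and later looks names up by offset when filling in `sh_name`. The idea can be illustrated with a deliberately simplified table, sketched below. `SimpleStrTab` is hypothetical: unlike LLVM's `StringTableBuilder`, it assigns offsets eagerly and does not share string suffixes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical, simplified section header string table. Offset 0 always holds
// the empty string, as ELF string tables require.
class SimpleStrTab {
  std::string Data = std::string(1, '\0');
  std::map<std::string, size_t> Offsets;

public:
  // Add Name if it is not present yet and return its offset in the table.
  size_t add(const std::string &Name) {
    auto It = Offsets.find(Name);
    if (It != Offsets.end())
      return It->second;

    size_t Offset = 0; // the empty name maps to the leading null byte
    if (!Name.empty()) {
      Offset = Data.size();
      Data += Name;
      Data.push_back('\0');
    }
    Offsets[Name] = Offset;
    return Offset;
  }

  // Look up a previously added name; this is the value stored in sh_name.
  size_t getOffset(const std::string &Name) const {
    auto It = Offsets.find(Name);
    assert(It != Offsets.end() && "string was never added");
    return It->second;
  }

  const std::string &contents() const { return Data; }
};
```

Adding both the original and the renamed section names up front, as the function above does, guarantees that every later `getOffset` lookup succeeds.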
|
|
|
|
|
|
2018-08-08 17:55:24 -07:00
|
|
|
void RewriteInstance::addBoltInfoSection() {
|
|
|
|
|
std::string DescStr;
|
|
|
|
|
raw_string_ostream DescOS(DescStr);
|
2017-05-24 14:14:16 -07:00
|
|
|
|
2018-08-08 17:55:24 -07:00
|
|
|
DescOS << "BOLT revision: " << BoltRevision << ", "
|
|
|
|
|
<< "command line:";
|
|
|
|
|
for (auto I = 0; I < Argc; ++I) {
|
|
|
|
|
DescOS << " " << Argv[I];
|
2017-05-24 14:14:16 -07:00
|
|
|
}
|
2018-08-08 17:55:24 -07:00
|
|
|
DescOS.flush();
|
|
|
|
|
|
2019-08-02 11:20:13 -07:00
|
|
|
// Encode as GNU GOLD VERSION so it is easily printable by 'readelf -n'
|
2018-08-08 17:55:24 -07:00
|
|
|
const auto BoltInfo =
|
2019-08-02 11:20:13 -07:00
|
|
|
BinarySection::encodeELFNote("GNU", DescStr, 4 /*NT_GNU_GOLD_VERSION*/);
|
2018-08-08 17:55:24 -07:00
|
|
|
BC->registerOrUpdateNoteSection(".note.bolt_info", copyByteArray(BoltInfo),
|
|
|
|
|
BoltInfo.size(),
|
|
|
|
|
/*Alignment=*/1,
|
|
|
|
|
/*IsReadOnly=*/true, ELF::SHT_NOTE);
|
2017-05-24 14:14:16 -07:00
|
|
|
}
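For reference, an ELF note record consists of three 4-byte words (name size, descriptor size, type) followed by the null-terminated name and the descriptor, each padded to a 4-byte boundary. The sketch below builds such a record with the standard library only. `encodeNote` is a hypothetical illustration assuming a little-endian target, and is not claimed to match `BinarySection::encodeELFNote` byte for byte.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical sketch of ELF note encoding for a little-endian target.
std::string encodeNote(const std::string &Name, const std::string &Desc,
                       uint32_t Type) {
  std::string Out;

  // Append a 32-bit value in little-endian byte order.
  auto Write32 = [&Out](uint32_t Value) {
    for (unsigned I = 0; I < 4; ++I)
      Out.push_back(static_cast<char>((Value >> (8 * I)) & 0xFF));
  };
  // Pad the buffer with zero bytes up to the next 4-byte boundary.
  auto PadTo4 = [&Out]() {
    while (Out.size() % 4 != 0)
      Out.push_back('\0');
  };

  Write32(static_cast<uint32_t>(Name.size() + 1)); // namesz counts the null
  Write32(static_cast<uint32_t>(Desc.size()));     // descsz
  Write32(Type);                                   // note type

  Out += Name;
  Out.push_back('\0');
  PadTo4();

  Out += Desc;
  PadTo4();
  return Out;
}
```

With the name "GNU" and type 4, as used above, the descriptor is treated by `readelf -n` as a linker version string, which is why the BOLT revision and command line show up in readable form.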
|
|
|
|
|
|
2019-04-12 17:33:46 -07:00
|
|
|
void RewriteInstance::addBATSection() {
|
|
|
|
|
BC->registerOrUpdateNoteSection(BoltAddressTranslation::SECTION_NAME, nullptr,
|
|
|
|
|
0,
|
|
|
|
|
/*Alignment=*/1,
|
|
|
|
|
/*IsReadOnly=*/true, ELF::SHT_NOTE);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void RewriteInstance::encodeBATSection() {
|
|
|
|
|
std::string DescStr;
|
|
|
|
|
raw_string_ostream DescOS(DescStr);
|
|
|
|
|
|
|
|
|
|
BAT->write(DescOS);
|
|
|
|
|
DescOS.flush();
|
|
|
|
|
|
2019-08-02 11:20:13 -07:00
|
|
|
const auto BoltInfo =
|
|
|
|
|
BinarySection::encodeELFNote("BOLT", DescStr, BinarySection::NT_BOLT_BAT);
|
2019-04-12 17:33:46 -07:00
|
|
|
BC->registerOrUpdateNoteSection(BoltAddressTranslation::SECTION_NAME,
|
|
|
|
|
copyByteArray(BoltInfo), BoltInfo.size(),
|
|
|
|
|
/*Alignment=*/1,
|
|
|
|
|
/*IsReadOnly=*/true, ELF::SHT_NOTE);
|
|
|
|
|
}
|
|
|
|
|
|
2019-03-14 18:51:05 -07:00
|
|
|
template<typename ELFObjType, typename ELFShdrTy>
|
|
|
|
|
std::string RewriteInstance::getOutputSectionName(const ELFObjType *Obj,
|
|
|
|
|
const ELFShdrTy &Section) {
|
|
|
|
|
if (Section.sh_type == ELF::SHT_NULL)
|
|
|
|
|
return "";
|
|
|
|
|
|
|
|
|
|
StringRef SectionName =
|
|
|
|
|
cantFail(Obj->getSectionName(&Section), "cannot get section name");
|
|
|
|
|
|
2019-04-26 15:30:12 -07:00
|
|
|
if ((Section.sh_flags & ELF::SHF_ALLOC) && willOverwriteSection(SectionName))
|
2020-03-11 15:51:32 -07:00
|
|
|
return (getOrgSecPrefix() + SectionName).str();
|
2019-03-14 18:51:05 -07:00
|
|
|
|
|
|
|
|
return SectionName;
|
|
|
|
|
}
|
|
|
|
|
|
2019-10-29 14:49:49 -07:00
|
|
|
template <typename ELFShdrTy>
|
|
|
|
|
bool RewriteInstance::shouldStrip(const ELFShdrTy &Section,
|
|
|
|
|
StringRef SectionName) {
|
|
|
|
|
// Strip non-allocatable relocation sections.
|
|
|
|
|
if (!(Section.sh_flags & ELF::SHF_ALLOC) && Section.sh_type == ELF::SHT_RELA)
|
|
|
|
|
return true;
|
|
|
|
|
|
|
|
|
|
// Strip debug sections if not updating them.
|
|
|
|
|
if (isDebugSection(SectionName) && !opts::UpdateDebugSections)
|
|
|
|
|
return true;
|
|
|
|
|
|
|
|
|
|
return false;
|
|
|
|
|
}
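The stripping rule above can be exercised in isolation with a small stand-alone sketch. Everything here is a simplified assumption: `SectionHeader` stands in for the ELF section header type, `shouldStripSketch` takes the option as a parameter instead of reading `opts::UpdateDebugSections`, and the debug-section check only looks for the `.debug_` prefix, which may be narrower than the real `isDebugSection`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Constant values taken from the ELF specification.
constexpr uint64_t kShfAlloc = 0x2; // SHF_ALLOC
constexpr uint32_t kShtRela = 4;    // SHT_RELA

// Hypothetical, reduced section header with only the fields the rule needs.
struct SectionHeader {
  uint32_t Type;
  uint64_t Flags;
};

bool startsWith(const std::string &S, const std::string &Prefix) {
  return S.compare(0, Prefix.size(), Prefix) == 0;
}

bool shouldStripSketch(const SectionHeader &Section, const std::string &Name,
                       bool UpdateDebugSections) {
  // Strip non-allocatable relocation sections.
  if (!(Section.Flags & kShfAlloc) && Section.Type == kShtRela)
    return true;

  // Strip debug sections if they are not being updated.
  if (startsWith(Name, ".debug_") && !UpdateDebugSections)
    return true;

  return false;
}
```

Allocatable relocation sections such as `.rela.dyn` are kept because the dynamic loader still needs them at run time.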
|
|
|
|
|
|
2017-06-27 16:25:59 -07:00
|
|
|
template <typename ELFT, typename ELFShdrTy>
|
2019-03-14 18:51:05 -07:00
|
|
|
std::vector<ELFShdrTy> RewriteInstance::getOutputSections(
|
|
|
|
|
ELFObjectFile<ELFT> *File, std::vector<uint32_t> &NewSectionIndex) {
|
2016-09-27 19:09:38 -07:00
|
|
|
auto *Obj = File->getELFFile();
|
2018-02-06 15:00:23 -08:00
|
|
|
auto Sections = cantFail(Obj->sections());
|
2016-02-12 19:01:53 -08:00
|
|
|
|
2019-03-14 18:51:05 -07:00
|
|
|
// Keep track of section header entries together with their name.
|
|
|
|
|
std::vector<std::pair<std::string, ELFShdrTy>> OutputSections;
|
|
|
|
|
auto addSection = [&](const std::string &Name, const ELFShdrTy &Section) {
|
|
|
|
|
auto NewSection = Section;
|
|
|
|
|
NewSection.sh_name = SHStrTab.getOffset(Name);
|
|
|
|
|
OutputSections.emplace_back(std::make_pair(Name, std::move(NewSection)));
|
|
|
|
|
};
|
2016-09-16 15:54:32 -07:00
|
|
|
|
2019-03-14 18:51:05 -07:00
|
|
|
// Copy over entries for original allocatable sections using modified name.
|
2018-02-06 15:00:23 -08:00
|
|
|
for (auto &Section : Sections) {
|
2016-02-12 19:01:53 -08:00
|
|
|
// Always ignore this section.
|
|
|
|
|
if (Section.sh_type == ELF::SHT_NULL) {
|
2019-03-14 18:51:05 -07:00
|
|
|
OutputSections.emplace_back(std::make_pair("", Section));
|
2016-02-12 19:01:53 -08:00
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
|
2016-03-03 10:13:11 -08:00
|
|
|
if (!(Section.sh_flags & ELF::SHF_ALLOC))
|
2016-09-27 19:09:38 -07:00
|
|
|
continue;
|
2016-02-12 19:01:53 -08:00
|
|
|
|
2019-03-14 18:51:05 -07:00
|
|
|
addSection(getOutputSectionName(Obj, Section), Section);
|
2016-03-03 10:13:11 -08:00
|
|
|
}
|
2016-02-12 19:01:53 -08:00
|
|
|
|
2019-03-14 18:51:05 -07:00
|
|
|
for (const auto &Section : BC->allocatableSections()) {
|
2017-11-14 20:05:11 -08:00
|
|
|
if (!Section.isFinalized())
|
2018-02-01 16:33:43 -08:00
|
|
|
continue;
|
|
|
|
|
|
2020-03-11 15:51:32 -07:00
|
|
|
if (Section.getName().startswith(getOrgSecPrefix()) ||
|
2020-03-06 15:06:37 -08:00
|
|
|
Section.isAnonymous()) {
|
2016-09-27 19:09:38 -07:00
|
|
|
if (opts::Verbosity)
|
2019-03-27 13:58:31 -07:00
|
|
|
outs() << "BOLT-INFO: not writing section header for section "
|
2019-03-14 18:51:05 -07:00
|
|
|
<< Section.getName() << '\n';
|
2016-03-03 10:13:11 -08:00
|
|
|
continue;
|
2016-09-02 14:15:29 -07:00
|
|
|
}
|
2017-06-27 16:25:59 -07:00
|
|
|
|
2016-09-27 19:09:38 -07:00
|
|
|
if (opts::Verbosity >= 1)
|
2018-02-01 16:33:43 -08:00
|
|
|
outs() << "BOLT-INFO: writing section header for "
|
2019-03-14 18:51:05 -07:00
|
|
|
<< Section.getName() << '\n';
|
2017-06-27 16:25:59 -07:00
|
|
|
ELFShdrTy NewSection;
|
2016-03-03 10:13:11 -08:00
|
|
|
NewSection.sh_type = ELF::SHT_PROGBITS;
|
2019-03-14 18:51:05 -07:00
|
|
|
NewSection.sh_addr = Section.getOutputAddress();
|
[BOLT] Support for lite mode with relocations
Summary:
Add '-lite' support for relocations for improved processing time,
memory consumption, and more resilient processing of binaries with
embedded assembly code.
In lite relocation mode, BOLT will skip full processing of functions
without a profile. It will run scanExternalRefs() on such functions
to discover external references and to create internal relocations
to update references to optimized functions.
Note that we could have relied on the compiler/linker to provide
relocations for function references. However, there's no assurance
that all such references are reported. E.g., the compiler can resolve
inter-procedural references internally, leaving no relocations
for the linker.
The scan process takes under 10 seconds per 100MB of code on modern
hardware. It's a reasonable overhead to live with considering the
flexibility it provides.
If BOLT fails to scan or disassemble a function, e.g., due to a data
object embedded in code, or an unsupported instruction, it enables a
patching mode to guarantee that the failed function will call
optimized/moved versions of functions. The patching happens at original
function entry points.
'-skip=<func1,func2,...>' option now can be used to skip processing of
arbitrary functions in the relocation mode.
With '-use-old-text' or '-strict' we require all functions to be
processed. As such, it is incompatible with '-lite' option,
and '-skip' option will only disable optimizations of listed
functions, not their disassembly and emission.
(cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
|
|
|
NewSection.sh_offset = Section.getOutputFileOffset();
|
2019-03-14 18:51:05 -07:00
|
|
|
NewSection.sh_size = Section.getOutputSize();
|
2016-03-03 10:13:11 -08:00
|
|
|
NewSection.sh_entsize = 0;
|
2019-03-14 18:51:05 -07:00
|
|
|
NewSection.sh_flags = Section.getELFFlags();
|
2016-03-03 10:13:11 -08:00
|
|
|
NewSection.sh_link = 0;
|
|
|
|
|
NewSection.sh_info = 0;
|
2019-03-14 18:51:05 -07:00
|
|
|
NewSection.sh_addralign = Section.getAlignment();
|
|
|
|
|
addSection(Section.getName(), NewSection);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Sort all allocatable sections by their offset.
|
|
|
|
|
std::stable_sort(OutputSections.begin(), OutputSections.end(),
|
|
|
|
|
[] (const std::pair<std::string, ELFShdrTy> &A,
|
|
|
|
|
const std::pair<std::string, ELFShdrTy> &B) {
|
|
|
|
|
return A.second.sh_offset < B.second.sh_offset;
|
|
|
|
|
});
|
|
|
|
|
|
|
|
|
|
// Fix section sizes to prevent overlapping.
|
|
|
|
|
for (uint32_t Index = 1; Index < OutputSections.size(); ++Index) {
|
|
|
|
|
auto &PrevSection = OutputSections[Index - 1].second;
|
|
|
|
|
auto &Section = OutputSections[Index].second;
|
|
|
|
|
|
|
|
|
|
// Skip TBSS section size adjustment.
|
|
|
|
|
if (PrevSection.sh_type == ELF::SHT_NOBITS &&
|
|
|
|
|
(PrevSection.sh_flags & ELF::SHF_TLS))
|
|
|
|
|
continue;
|
|
|
|
|
|
|
|
|
|
if (PrevSection.sh_addr + PrevSection.sh_size > Section.sh_addr) {
|
|
|
|
|
if (opts::Verbosity > 1) {
|
|
|
|
|
outs() << "BOLT-INFO: adjusting size for section "
|
|
|
|
|
<< OutputSections[Index - 1].first << '\n';
|
|
|
|
|
}
|
|
|
|
|
PrevSection.sh_size = Section.sh_addr > PrevSection.sh_addr ?
|
|
|
|
|
Section.sh_addr - PrevSection.sh_addr : 0;
|
|
|
|
|
}
|
2016-03-03 10:13:11 -08:00
|
|
|
}
|
2016-02-12 19:01:53 -08:00
|
|
|
|
2017-05-16 17:29:31 -07:00
|
|
|
uint64_t LastFileOffset = 0;
|
2016-02-12 19:01:53 -08:00
|
|
|
|
2016-03-03 10:13:11 -08:00
|
|
|
// Copy over entries for non-allocatable sections performing necessary
|
2016-09-27 19:09:38 -07:00
|
|
|
// adjustments.
|
2018-02-06 15:00:23 -08:00
|
|
|
for (auto &Section : Sections) {
|
2016-03-03 10:13:11 -08:00
|
|
|
if (Section.sh_type == ELF::SHT_NULL)
|
|
|
|
|
continue;
|
|
|
|
|
if (Section.sh_flags & ELF::SHF_ALLOC)
|
|
|
|
|
continue;
|
2016-09-27 19:09:38 -07:00
|
|
|
|
2018-04-20 20:03:31 -07:00
|
|
|
StringRef SectionName =
|
|
|
|
|
cantFail(Obj->getSectionName(&Section), "cannot get section name");
|
|
|
|
|
|
2019-10-29 14:49:49 -07:00
|
|
|
if (shouldStrip(Section, SectionName))
|
2019-04-26 15:30:12 -07:00
|
|
|
continue;
|
|
|
|
|
|
2018-02-01 16:33:43 -08:00
|
|
|
auto BSec = BC->getUniqueSectionByName(SectionName);
|
|
|
|
|
assert(BSec && "missing section info for non-allocatable section");
|
2016-02-12 19:01:53 -08:00
|
|
|
|
2016-03-03 10:13:11 -08:00
|
|
|
auto NewSection = Section;
|
2020-06-15 00:15:47 -07:00
|
|
|
NewSection.sh_offset = BSec->getOutputFileOffset();
|
2018-02-01 16:33:43 -08:00
|
|
|
NewSection.sh_size = BSec->getOutputSize();
|
2016-02-12 19:01:53 -08:00
|
|
|
|
2018-10-22 18:48:12 -07:00
|
|
|
if (NewSection.sh_type == ELF::SHT_SYMTAB) {
|
|
|
|
|
NewSection.sh_info = NumLocalSymbols;
|
|
|
|
|
}
|
|
|
|
|
|
2019-03-14 18:51:05 -07:00
|
|
|
addSection(SectionName, NewSection);
|
2017-05-16 17:29:31 -07:00
|
|
|
|
2020-06-15 00:15:47 -07:00
|
|
|
LastFileOffset = BSec->getOutputFileOffset();
|
2016-02-12 19:01:53 -08:00
|
|
|
}
|
|
|
|
|
|
2017-05-16 17:29:31 -07:00
|
|
|
// Create entries for new non-allocatable sections.
|
2017-11-14 20:05:11 -08:00
|
|
|
for (auto &Section : BC->nonAllocatableSections()) {
|
2020-06-15 00:15:47 -07:00
|
|
|
if (Section.getOutputFileOffset() <= LastFileOffset)
|
2017-05-16 17:29:31 -07:00
|
|
|
continue;
|
2017-02-07 12:20:46 -08:00
|
|
|
|
2018-02-01 16:33:43 -08:00
|
|
|
if (opts::Verbosity >= 1) {
|
|
|
|
|
outs() << "BOLT-INFO: writing section header for "
|
|
|
|
|
<< Section.getName() << '\n';
|
|
|
|
|
}
|
2017-06-27 16:25:59 -07:00
|
|
|
ELFShdrTy NewSection;
|
2018-02-01 16:33:43 -08:00
|
|
|
NewSection.sh_type = Section.getELFType();
|
2017-05-16 17:29:31 -07:00
|
|
|
NewSection.sh_addr = 0;
|
2020-06-15 00:15:47 -07:00
|
|
|
NewSection.sh_offset = Section.getOutputFileOffset();
|
2018-02-01 16:33:43 -08:00
|
|
|
NewSection.sh_size = Section.getOutputSize();
|
2017-05-16 17:29:31 -07:00
|
|
|
NewSection.sh_entsize = 0;
|
2018-02-01 16:33:43 -08:00
|
|
|
NewSection.sh_flags = Section.getELFFlags();
|
2017-05-16 17:29:31 -07:00
|
|
|
NewSection.sh_link = 0;
|
|
|
|
|
NewSection.sh_info = 0;
|
2018-02-01 16:33:43 -08:00
|
|
|
NewSection.sh_addralign = Section.getAlignment();
|
2019-03-14 18:51:05 -07:00
|
|
|
|
|
|
|
|
addSection(Section.getName(), NewSection);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Assign indices to sections.
|
|
|
|
|
std::unordered_map<std::string, uint64_t> NameToIndex;
|
|
|
|
|
for (uint32_t Index = 1; Index < OutputSections.size(); ++Index) {
|
|
|
|
|
const auto &SectionName = OutputSections[Index].first;
|
|
|
|
|
NameToIndex[SectionName] = Index;
|
|
|
|
|
if (auto Section = BC->getUniqueSectionByName(SectionName))
|
|
|
|
|
Section->setIndex(Index);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Update section index mapping
|
|
|
|
|
NewSectionIndex.clear();
|
|
|
|
|
NewSectionIndex.resize(Sections.size(), 0);
|
|
|
|
|
for (auto &Section : Sections) {
|
|
|
|
|
if (Section.sh_type == ELF::SHT_NULL)
|
|
|
|
|
continue;
|
|
|
|
|
|
|
|
|
|
auto OrgIndex = std::distance(Sections.begin(), &Section);
|
|
|
|
|
auto SectionName = getOutputSectionName(Obj, Section);
|
|
|
|
|
|
|
|
|
|
// Some sections are stripped
|
|
|
|
|
if (!NameToIndex.count(SectionName))
|
|
|
|
|
continue;
|
|
|
|
|
|
|
|
|
|
NewSectionIndex[OrgIndex] = NameToIndex[SectionName];
|
2017-05-16 17:29:31 -07:00
|
|
|
}
|
|
|
|
|
|
2019-03-14 18:51:05 -07:00
|
|
|
std::vector<ELFShdrTy> SectionsOnly(OutputSections.size());
|
|
|
|
|
std::transform(OutputSections.begin(), OutputSections.end(),
|
|
|
|
|
SectionsOnly.begin(),
|
|
|
|
|
[](std::pair<std::string, ELFShdrTy> &SectionInfo) {
|
|
|
|
|
return SectionInfo.second;
|
|
|
|
|
});
|
|
|
|
|
|
|
|
|
|
return SectionsOnly;
|
2017-06-27 16:25:59 -07:00
|
|
|
}
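The sort-then-trim step in the function above can be sketched in isolation. This is a minimal illustration, not BOLT's code: `SectionHeader`, `NamedSection`, and `sortAndTrim` are hypothetical stand-ins for `ELFShdrTy` and the `OutputSections` vector, and the `IsTBSS` flag assumes the caller has already tested for `SHT_NOBITS` combined with `SHF_TLS`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the few section header fields used here.
struct SectionHeader {
  uint64_t Addr;   // plays the role of sh_addr
  uint64_t Offset; // plays the role of sh_offset
  uint64_t Size;   // plays the role of sh_size
  bool IsTBSS;     // assumed precomputed: SHT_NOBITS together with SHF_TLS
};

using NamedSection = std::pair<std::string, SectionHeader>;

// Sort sections by file offset, then shrink any section whose address
// range runs into the section that follows it.
void sortAndTrim(std::vector<NamedSection> &Sections) {
  // stable_sort keeps the relative order of entries with equal offsets.
  std::stable_sort(Sections.begin(), Sections.end(),
                   [](const NamedSection &A, const NamedSection &B) {
                     return A.second.Offset < B.second.Offset;
                   });

  for (std::size_t Index = 1; Index < Sections.size(); ++Index) {
    SectionHeader &Prev = Sections[Index - 1].second;
    const SectionHeader &Cur = Sections[Index].second;

    // TBSS occupies no space in the image, so its size is left alone.
    if (Prev.IsTBSS)
      continue;

    if (Prev.Addr + Prev.Size > Cur.Addr)
      Prev.Size = Cur.Addr > Prev.Addr ? Cur.Addr - Prev.Addr : 0;
  }
}
```

A stable sort matters here because several entries can share an offset (for example, zero-sized sections), and their original relative order should survive the sort.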
|
|
|
|
|
|
|
|
|
|
// Rewrite section header table inserting new entries as needed. The section
|
|
|
|
|
// header table size itself may affect the offsets of other sections,
|
|
|
|
|
// so we are placing it at the end of the binary.
|
|
|
|
|
//
|
|
|
|
|
// As we rewrite entries we need to track how many sections were inserted
|
|
|
|
|
// as it changes the sh_link value. We map old indices to new ones for
|
|
|
|
|
// existing sections.
|
|
|
|
|
template <typename ELFT>
|
|
|
|
|
void RewriteInstance::patchELFSectionHeaderTable(ELFObjectFile<ELFT> *File) {
|
2020-02-26 20:43:18 -08:00
|
|
|
using ELFShdrTy = typename ELFObjectFile<ELFT>::Elf_Shdr;
|
2017-06-27 16:25:59 -07:00
|
|
|
auto &OS = Out->os();
|
|
|
|
|
auto *Obj = File->getELFFile();
|
|
|
|
|
|
2019-03-14 18:51:05 -07:00
|
|
|
std::vector<uint32_t> NewSectionIndex;
|
|
|
|
|
auto OutputSections = getOutputSections(File, NewSectionIndex);
|
2017-06-07 20:06:29 -07:00
|
|
|
DEBUG(
|
|
|
|
|
dbgs() << "BOLT-DEBUG: old to new section index mapping:\n";
|
|
|
|
|
for (uint64_t I = 0; I < NewSectionIndex.size(); ++I) {
|
|
|
|
|
dbgs() << " " << I << " -> " << NewSectionIndex[I] << '\n';
|
|
|
|
|
}
|
|
|
|
|
);
|
|
|
|
|
|
|
|
|
|
// Align starting address for section header table.
|
|
|
|
|
auto SHTOffset = OS.tell();
|
2020-02-26 20:43:18 -08:00
|
|
|
SHTOffset = appendPadding(OS, SHTOffset, sizeof(ELFShdrTy));
|
2017-06-07 20:06:29 -07:00
|
|
|
|
|
|
|
|
// Write all section header entries while patching section references.
|
2019-03-14 18:51:05 -07:00
|
|
|
for (auto &Section : OutputSections) {
|
2017-06-07 20:06:29 -07:00
|
|
|
Section.sh_link = NewSectionIndex[Section.sh_link];
|
|
|
|
|
if (Section.sh_type == ELF::SHT_REL || Section.sh_type == ELF::SHT_RELA) {
|
|
|
|
|
if (Section.sh_info)
|
|
|
|
|
Section.sh_info = NewSectionIndex[Section.sh_info];
|
|
|
|
|
}
|
|
|
|
|
OS.write(reinterpret_cast<const char *>(&Section), sizeof(Section));
|
2017-05-16 17:29:31 -07:00
|
|
|
}
|
2017-02-22 11:29:52 -08:00
|
|
|
|
2016-02-12 19:01:53 -08:00
|
|
|
// Fix ELF header.
|
|
|
|
|
auto NewEhdr = *Obj->getHeader();
|
2017-05-08 22:51:36 -07:00
|
|
|
|
2017-12-09 21:40:39 -08:00
|
|
|
if (BC->HasRelocations) {
|
Refactor runtime library
Summary:
As we are adding more types of runtime libraries, it would be better to move the runtime library out of RewriteInstance so that it could grow separately. This also requires splitting the current implementation of Instrumentation.cpp into two separate pieces, one as a normal Pass, one as the runtime library. The Instrumentation Pass would pass the generated data over to the runtime library, which will use it to emit the binary and perform linking.
This patch does the following:
1. Turn Instrumentation class into an optimization pass. Register the pass in the pass manager instead of in RewriteInstance.
2. Split all the data that are generated by Instrumentation that's needed by runtime library into a separate data structure called InstrumentationSummary. At the creation of Instrumentation pass, we create an instance of such data structure, which will be moved over to the runtime at the end of the pass.
3. Added a runtime library member to BinaryContext. Set the member at the end of Instrumentation pass.
4. In BinaryEmitter, make BinaryContext also emit the runtime library binary.
5. Created a base class RuntimeLibrary, that defines the interface of a runtime library, along with a few common helper functions.
6. Created InstrumentationRuntimeLibrary which inherits from RuntimeLibrary, that does all the work (mostly copied over) for emit and linking.
7. Added a new directory called RuntimeLibs, and put all the runtime library related files into it.
(cherry picked from FBD21694762)
2020-05-21 14:28:47 -07:00
|
|
|
if (auto *RtLibrary = BC->getRuntimeLibrary()) {
|
|
|
|
|
NewEhdr.e_entry = RtLibrary->getRuntimeStartAddress();
|
|
|
|
|
} else {
|
|
|
|
|
NewEhdr.e_entry = getNewFunctionAddress(NewEhdr.e_entry);
|
|
|
|
|
}
|
2020-06-23 12:22:58 -07:00
|
|
|
assert((NewEhdr.e_entry || !Obj->getHeader()->e_entry) &&
|
|
|
|
|
"cannot find new address for entry point");
|
2017-05-08 22:51:36 -07:00
|
|
|
}
|
2016-02-12 19:01:53 -08:00
|
|
|
NewEhdr.e_phoff = PHDRTableOffset;
|
|
|
|
|
NewEhdr.e_phnum = Phnum;
|
2016-03-03 10:13:11 -08:00
|
|
|
NewEhdr.e_shoff = SHTOffset;
|
2017-06-27 16:25:59 -07:00
|
|
|
NewEhdr.e_shnum = OutputSections.size();
|
2017-06-07 20:06:29 -07:00
|
|
|
NewEhdr.e_shstrndx = NewSectionIndex[NewEhdr.e_shstrndx];
|
2016-02-12 19:01:53 -08:00
|
|
|
OS.pwrite(reinterpret_cast<const char *>(&NewEhdr), sizeof(NewEhdr), 0);
|
2016-09-27 19:09:38 -07:00
|
|
|
}
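The old-to-new index mapping that drives the `sh_link` and `sh_info` patching above can be sketched on its own. This is a simplified illustration under stated assumptions: `buildIndexMap` and `remapLink` are hypothetical helpers, sections are identified by name only, index 0 is assumed to be the null section, and the bounds check in `remapLink` is an addition that the original code does not perform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Build a table mapping each old section index to its index in the new
// section header table. A value of 0 means the section was stripped (or is
// the null section, which is assumed to sit at index 0 in both tables).
std::vector<uint32_t> buildIndexMap(const std::vector<std::string> &OldNames,
                                    const std::vector<std::string> &NewNames) {
  std::unordered_map<std::string, uint32_t> NameToIndex;
  for (uint32_t I = 1; I < NewNames.size(); ++I)
    NameToIndex[NewNames[I]] = I;

  std::vector<uint32_t> Map(OldNames.size(), 0);
  for (std::size_t I = 1; I < OldNames.size(); ++I) {
    auto It = NameToIndex.find(OldNames[I]);
    if (It != NameToIndex.end())
      Map[I] = It->second;
  }
  return Map;
}

// Translate an old cross-section reference (such as sh_link) to the new
// numbering. Out-of-range inputs map to 0 in this sketch.
uint32_t remapLink(const std::vector<uint32_t> &Map, uint32_t OldLink) {
  return OldLink < Map.size() ? Map[OldLink] : 0;
}
```

With such a table in hand, every header field that stores a section index (`sh_link`, `sh_info` for relocation sections, `e_shstrndx`) is translated by a single lookup, which is why the function above only needs `NewSectionIndex` while it writes the headers.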
|
|
|
|
|
|
2020-02-26 20:43:18 -08:00
|
|
|
template <typename ELFT,
|
|
|
|
|
typename ELFShdrTy,
|
|
|
|
|
typename WriteFuncTy,
|
|
|
|
|
typename StrTabFuncTy>
|
|
|
|
|
void RewriteInstance::updateELFSymbolTable(
|
|
|
|
|
ELFObjectFile<ELFT> *File,
|
|
|
|
|
bool PatchExisting,
|
|
|
|
|
const ELFShdrTy &SymTabSection,
|
|
|
|
|
const std::vector<uint32_t> &NewSectionIndex,
|
|
|
|
|
WriteFuncTy Write,
|
|
|
|
|
StrTabFuncTy AddToStrTab) {
|
2016-09-27 19:09:38 -07:00
|
|
|
auto *Obj = File->getELFFile();
|
2020-02-26 20:43:18 -08:00
|
|
|
using ELFSymTy = typename ELFObjectFile<ELFT>::Elf_Sym;
|
2016-09-27 19:09:38 -07:00
|
|
|
|
2020-02-26 20:43:18 -08:00
|
|
|
auto StringSection = cantFail(Obj->getStringTableForSymtab(SymTabSection));
|
2016-09-27 19:09:38 -07:00
|
|
|
|
2020-02-26 20:43:18 -08:00
|
|
|
unsigned NumHotTextSymsUpdated = 0;
|
|
|
|
|
unsigned NumHotDataSymsUpdated = 0;
|
2018-04-20 20:03:31 -07:00
|
|
|
|
2020-02-26 20:43:18 -08:00
|
|
|
std::map<const BinaryFunction *, uint64_t> IslandSizes;
|
|
|
|
|
auto getConstantIslandSize = [&IslandSizes](const BinaryFunction &BF) {
|
|
|
|
|
auto Itr = IslandSizes.find(&BF);
|
|
|
|
|
if (Itr != IslandSizes.end())
|
|
|
|
|
return Itr->second;
|
|
|
|
|
return IslandSizes[&BF] = BF.estimateConstantIslandSize();
|
|
|
|
|
};
|
2017-11-14 20:05:11 -08:00
|
|
|
|
2020-02-26 20:43:18 -08:00
|
|
|
// Symbols for the new symbol table.
|
|
|
|
|
std::vector<ELFSymTy> Symbols;
|
|
|
|
|
|
2020-06-15 00:15:47 -07:00
|
|
|
// Add extra symbols for the function.
|
2020-06-24 12:36:15 -07:00
|
|
|
//
|
|
|
|
|
// Note that addExtraSymbols() could be called multiple times for the same
|
|
|
|
|
// function with different FunctionSymbol matching the main function entry
|
|
|
|
|
// point.
|
2020-02-26 20:43:18 -08:00
|
|
|
auto addExtraSymbols = [&](const BinaryFunction &Function,
|
|
|
|
|
const ELFSymTy &FunctionSymbol) {
|
2020-06-15 00:15:47 -07:00
|
|
|
if (Function.isPatched()) {
|
|
|
|
|
Function.forEachEntryPoint([&](uint64_t Offset, const MCSymbol *Symbol) {
|
|
|
|
|
ELFSymTy OrgSymbol = FunctionSymbol;
|
|
|
|
|
SmallVector<char, 256> Buf;
|
2020-06-24 12:36:15 -07:00
|
|
|
if (!Offset) {
|
|
|
|
|
// Use the original function symbol name. This guarantees that the
|
|
|
|
|
// name will be unique.
|
|
|
|
|
OrgSymbol.st_name = AddToStrTab(
|
|
|
|
|
Twine(cantFail(FunctionSymbol.getName(StringSection)))
|
|
|
|
|
.concat(".org.0").
|
|
|
|
|
toStringRef(Buf));
|
|
|
|
|
OrgSymbol.st_size = Function.getSize();
|
|
|
|
|
} else {
|
|
|
|
|
// It's unlikely that multiple functions with secondary entries will
|
|
|
|
|
// get folded/merged. However, in case this happens, we force local
|
|
|
|
|
// symbol visibility for secondary entries.
|
|
|
|
|
OrgSymbol.st_name = AddToStrTab(
|
|
|
|
|
Twine(Symbol->getName()).concat(".org.0").toStringRef(Buf));
|
|
|
|
|
OrgSymbol.setBindingAndType(ELF::STB_LOCAL, ELF::STT_FUNC);
|
|
|
|
|
OrgSymbol.st_size = 0;
|
|
|
|
|
}
|
2020-06-15 00:15:47 -07:00
|
|
|
OrgSymbol.st_value = Function.getAddress() + Offset;
|
|
|
|
|
OrgSymbol.st_shndx =
|
|
|
|
|
NewSectionIndex[Function.getSection().getSectionRef().getIndex()];
|
|
|
|
|
Symbols.emplace_back(OrgSymbol);
|
|
|
|
|
return true;
|
|
|
|
|
});
|
|
|
|
|
}
|
2020-04-04 20:12:38 -07:00
|
|
|
if (Function.isFolded()) {
|
|
|
|
|
auto *ICFParent = Function.getFoldedIntoFunction();
|
|
|
|
|
while (ICFParent->isFolded())
|
|
|
|
|
ICFParent = ICFParent->getFoldedIntoFunction();
|
|
|
|
|
auto ICFSymbol = FunctionSymbol;
|
|
|
|
|
SmallVector<char, 256> Buf;
|
|
|
|
|
ICFSymbol.st_name =
|
|
|
|
|
AddToStrTab(Twine(cantFail(FunctionSymbol.getName(StringSection)))
|
|
|
|
|
.concat(".icf.0")
|
|
|
|
|
.toStringRef(Buf));
|
|
|
|
|
ICFSymbol.st_value = ICFParent->getOutputAddress();
|
|
|
|
|
ICFSymbol.st_size = ICFParent->getOutputSize();
|
2020-06-09 19:12:06 -07:00
|
|
|
ICFSymbol.st_shndx = ICFParent->getCodeSection()->getIndex();
|
2020-04-04 20:12:38 -07:00
|
|
|
Symbols.emplace_back(ICFSymbol);
|
|
|
|
|
}
|
|
|
|
|
if (Function.isSplit() && Function.cold().getAddress()) {
|
2020-02-26 20:43:18 -08:00
|
|
|
auto NewColdSym = FunctionSymbol;
|
|
|
|
|
SmallVector<char, 256> Buf;
|
|
|
|
|
NewColdSym.st_name =
|
2020-04-04 20:12:38 -07:00
|
|
|
AddToStrTab(Twine(cantFail(FunctionSymbol.getName(StringSection)))
|
|
|
|
|
.concat(".cold.0")
|
|
|
|
|
.toStringRef(Buf));
|
2020-02-26 20:43:18 -08:00
|
|
|
NewColdSym.st_shndx = Function.getColdCodeSection()->getIndex();
|
|
|
|
|
NewColdSym.st_value = Function.cold().getAddress();
|
|
|
|
|
NewColdSym.st_size = Function.cold().getImageSize();
|
|
|
|
|
NewColdSym.setBindingAndType(ELF::STB_LOCAL, ELF::STT_FUNC);
|
|
|
|
|
Symbols.emplace_back(NewColdSym);
|
|
|
|
|
}
|
|
|
|
|
if (Function.hasConstantIsland()) {
|
|
|
|
|
auto DataMark = Function.getOutputDataAddress();
|
|
|
|
|
auto CISize = getConstantIslandSize(Function);
|
|
|
|
|
auto CodeMark = DataMark + CISize;
|
|
|
|
|
auto DataMarkSym = FunctionSymbol;
|
|
|
|
|
DataMarkSym.st_name = AddToStrTab("$d");
|
|
|
|
|
DataMarkSym.st_value = DataMark;
|
|
|
|
|
DataMarkSym.st_size = 0;
|
|
|
|
|
DataMarkSym.setType(ELF::STT_NOTYPE);
|
|
|
|
|
DataMarkSym.setBinding(ELF::STB_LOCAL);
|
|
|
|
|
auto CodeMarkSym = DataMarkSym;
|
|
|
|
|
CodeMarkSym.st_name = AddToStrTab("$x");
|
|
|
|
|
CodeMarkSym.st_value = CodeMark;
|
|
|
|
|
Symbols.emplace_back(DataMarkSym);
|
|
|
|
|
Symbols.emplace_back(CodeMarkSym);
|
|
|
|
|
}
|
|
|
|
|
if (Function.hasConstantIsland() && Function.isSplit()) {
|
|
|
|
|
auto DataMark = Function.getOutputColdDataAddress();
|
|
|
|
|
auto CISize = getConstantIslandSize(Function);
|
|
|
|
|
auto CodeMark = DataMark + CISize;
|
|
|
|
|
auto DataMarkSym = FunctionSymbol;
|
|
|
|
|
DataMarkSym.st_name = AddToStrTab("$d");
|
|
|
|
|
DataMarkSym.st_value = DataMark;
|
|
|
|
|
DataMarkSym.st_size = 0;
|
|
|
|
|
DataMarkSym.setType(ELF::STT_NOTYPE);
|
|
|
|
|
DataMarkSym.setBinding(ELF::STB_LOCAL);
|
|
|
|
|
auto CodeMarkSym = DataMarkSym;
|
|
|
|
|
CodeMarkSym.st_name = AddToStrTab("$x");
|
|
|
|
|
CodeMarkSym.st_value = CodeMark;
|
|
|
|
|
Symbols.emplace_back(DataMarkSym);
|
|
|
|
|
Symbols.emplace_back(CodeMarkSym);
|
|
|
|
|
}
|
|
|
|
|
};
|
2018-07-08 12:14:08 -07:00
|
|
|
|
2020-02-26 20:43:18 -08:00
|
|
|
// For regular (non-dynamic) symbol table, exclude symbols referring
|
|
|
|
|
// to non-allocatable sections.
|
|
|
|
|
auto shouldStrip = [&](const ELFSymTy &Symbol) {
|
|
|
|
|
if (Symbol.isAbsolute() || !Symbol.isDefined())
|
|
|
|
|
return false;
|
|
|
|
|
|
|
|
|
|
// If we cannot link the symbol to a section, leave it as is.
|
|
|
|
|
auto Section = Obj->getSection(Symbol.st_shndx);
|
|
|
|
|
if (!Section)
|
|
|
|
|
return false;
|
|
|
|
|
|
|
|
|
|
// Remove the section symbol iff the corresponding section was stripped.
|
|
|
|
|
if (Symbol.getType() == ELF::STT_SECTION) {
|
|
|
|
|
if (!NewSectionIndex[Symbol.st_shndx])
|
|
|
|
|
return true;
|
|
|
|
|
return false;
|
2018-07-08 12:14:08 -07:00
|
|
|
}
|
|
|
|
|
|
2020-02-26 20:43:18 -08:00
|
|
|
// Symbols in non-allocatable sections are typically remnants of relocations
|
|
|
|
|
// emitted under "-emit-relocs" linker option. Delete those as we delete
|
|
|
|
|
// relocations against non-allocatable sections.
|
|
|
|
|
if (!((*Section)->sh_flags & ELF::SHF_ALLOC))
|
|
|
|
|
return true;
|
|
|
|
|
|
|
|
|
|
return false;
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
for (const ELFSymTy &Symbol : cantFail(Obj->symbols(&SymTabSection))) {
|
|
|
|
|
// For regular (non-dynamic) symbol table strip unneeded symbols.
|
|
|
|
|
if (!PatchExisting && shouldStrip(Symbol))
|
|
|
|
|
continue;
|
|
|
|
|
|
|
|
|
|
const auto *Function = BC->getBinaryFunctionAtAddress(Symbol.st_value,
|
|
|
|
|
/*Shallow=*/true);
|
|
|
|
|
// Ignore false function references, e.g. when the section address matches
|
|
|
|
|
// the address of the function.
|
|
|
|
|
if (Function && Symbol.getType() == ELF::STT_SECTION)
|
|
|
|
|
Function = nullptr;
|
|
|
|
|
|
|
|
|
|
// For non-dynamic symtab, make sure the symbol section matches that of
|
|
|
|
|
// the function. It can mismatch e.g. if the symbol is a section marker
|
|
|
|
|
// in which case we treat the symbol separately from the function.
|
|
|
|
|
// For dynamic symbol table, the section index could be wrong on the input,
|
|
|
|
|
// and its value is ignored by the runtime if it's different from
|
|
|
|
|
// SHN_UNDEF and SHN_ABS.
|
|
|
|
|
if (!PatchExisting && Function &&
|
|
|
|
|
Symbol.st_shndx != Function->getSection().getSectionRef().getIndex())
|
|
|
|
|
Function = nullptr;
|
|
|
|
|
|
|
|
|
|
// Create a new symbol based on the existing symbol.
|
|
|
|
|
auto NewSymbol = Symbol;
|
|
|
|
|
|
2020-04-16 00:05:01 -07:00
|
|
|
if (Function) {
|
2020-06-09 19:12:06 -07:00
|
|
|
// If the symbol matched a function that was not emitted, update the
|
|
|
|
|
// corresponding section index but otherwise leave it unchanged.
|
2020-04-16 00:05:01 -07:00
|
|
|
if (Function->isEmitted()) {
|
|
|
|
|
NewSymbol.st_value = Function->getOutputAddress();
|
|
|
|
|
NewSymbol.st_size = Function->getOutputSize();
|
|
|
|
|
NewSymbol.st_shndx = Function->getCodeSection()->getIndex();
|
2020-06-09 19:12:06 -07:00
|
|
|
} else if (Symbol.st_shndx < ELF::SHN_LORESERVE) {
|
|
|
|
|
NewSymbol.st_shndx = NewSectionIndex[Symbol.st_shndx];
|
2020-04-16 00:05:01 -07:00
|
|
|
}
|
2020-02-26 20:43:18 -08:00
|
|
|
|
|
|
|
|
// Add new symbols to the symbol table if necessary.
|
|
|
|
|
if (!PatchExisting)
|
|
|
|
|
addExtraSymbols(*Function, NewSymbol);
|
2020-04-16 00:05:01 -07:00
|
|
|
} else {
|
2020-02-26 20:43:18 -08:00
|
|
|
// Check if the function symbol matches address inside a function, i.e.
|
|
|
|
|
// it marks a secondary entry point.
|
|
|
|
|
Function = (Symbol.getType() == ELF::STT_FUNC)
|
|
|
|
|
? BC->getBinaryFunctionContainingAddress(Symbol.st_value,
|
|
|
|
|
/*CheckPastEnd=*/false,
|
|
|
|
|
/*UseMaxSize=*/true,
|
|
|
|
|
/*Shallow=*/true)
|
|
|
|
|
: nullptr;
|
|
|
|
|
|
|
|
|
|
if (Function && Function->isEmitted()) {
|
|
|
|
|
const auto OutputAddress =
|
|
|
|
|
Function->translateInputToOutputAddress(Symbol.st_value);
|
|
|
|
|
|
|
|
|
|
NewSymbol.st_value = OutputAddress;
|
|
|
|
|
// Force secondary entry points to have zero size.
|
|
|
|
|
NewSymbol.st_size = 0;
|
|
|
|
|
NewSymbol.st_shndx = OutputAddress >= Function->cold().getAddress() &&
|
|
|
|
|
OutputAddress < Function->cold().getAddress() + Function->cold().getImageSize()
|
|
|
|
|
? Function->getColdCodeSection()->getIndex()
|
|
|
|
|
: Function->getCodeSection()->getIndex();
|
2016-09-27 19:09:38 -07:00
|
|
|
} else {
|
2020-02-26 20:43:18 -08:00
|
|
|
// Check if the symbol belongs to moved data object and update it.
|
|
|
|
|
BinaryData *BD = opts::ReorderData.empty()
|
|
|
|
|
? nullptr
|
|
|
|
|
: BC->getBinaryDataAtAddress(Symbol.st_value);
|
|
|
|
|
if (BD && BD->isMoved() && !BD->isJumpTable()) {
|
|
|
|
|
assert((!BD->getSize() || !Symbol.st_size ||
|
|
|
|
|
Symbol.st_size == BD->getSize()) &&
|
2018-04-20 20:03:31 -07:00
|
|
|
"sizes must match");
|
|
|
|
|
|
|
|
|
|
auto &OutputSection = BD->getOutputSection();
|
2019-03-14 18:51:05 -07:00
|
|
|
assert(OutputSection.getIndex());
|
2018-04-20 20:03:31 -07:00
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: moving " << BD->getName() << " from "
|
2020-02-26 20:43:18 -08:00
|
|
|
<< *BC->getSectionNameForAddress(Symbol.st_value)
|
|
|
|
|
<< " (" << Symbol.st_shndx << ") to "
|
2018-04-20 20:03:31 -07:00
|
|
|
<< OutputSection.getName() << " ("
|
2019-03-14 18:51:05 -07:00
|
|
|
<< OutputSection.getIndex() << ")\n");
|
|
|
|
|
NewSymbol.st_shndx = OutputSection.getIndex();
|
2018-04-20 20:03:31 -07:00
|
|
|
NewSymbol.st_value = BD->getOutputAddress();
|
2020-02-26 20:43:18 -08:00
|
|
|
} else {
|
|
|
|
|
// Otherwise just update the section for the symbol.
|
|
|
|
|
if (Symbol.st_shndx < ELF::SHN_LORESERVE) {
|
|
|
|
|
NewSymbol.st_shndx = NewSectionIndex[Symbol.st_shndx];
|
|
|
|
|
}
|
2016-09-27 19:09:38 -07:00
|
|
|
}
|
2018-04-20 20:03:31 -07:00
|
|
|
|
2017-06-27 16:25:59 -07:00
|
|
|
// Detect local syms in the text section that we didn't update
|
2020-02-26 20:43:18 -08:00
|
|
|
// and that were preserved by the linker to support relocations against
|
|
|
|
|
// .text. Remove them from the symtab.
|
|
|
|
|
if (Symbol.getType() == ELF::STT_NOTYPE &&
|
|
|
|
|
Symbol.getBinding() == ELF::STB_LOCAL &&
|
|
|
|
|
Symbol.st_size == 0) {
|
|
|
|
|
if (BC->getBinaryFunctionContainingAddress(Symbol.st_value,
|
|
|
|
|
/*CheckPastEnd=*/false,
|
|
|
|
|
/*UseMaxSize=*/true,
|
|
|
|
|
/*Shallow=*/true)) {
|
|
|
|
|
// Can only delete the symbol if not patching. Such symbols should
|
|
|
|
|
// not exist in the dynamic symbol table.
|
|
|
|
|
assert(!PatchExisting && "cannot delete symbol");
|
2019-04-26 19:52:36 -07:00
|
|
|
continue;
|
2020-02-26 20:43:18 -08:00
|
|
|
}
|
2017-06-16 20:04:43 -07:00
|
|
|
}
|
2016-09-27 19:09:38 -07:00
|
|
|
}
|
2020-02-26 20:43:18 -08:00
|
|
|
}
|
2016-09-27 19:09:38 -07:00
|
|
|
|
2020-02-26 20:43:18 -08:00
|
|
|
// Handle special symbols based on their name.
|
|
|
|
|
auto SymbolName = Symbol.getName(StringSection);
|
|
|
|
|
assert(SymbolName && "cannot get symbol name");
|
2018-04-20 20:03:31 -07:00
|
|
|
|
2020-02-26 20:43:18 -08:00
|
|
|
auto updateSymbolValue = [&](const StringRef Name, unsigned &IsUpdated) {
|
|
|
|
|
NewSymbol.st_value = getNewValueForSymbol(Name);
|
|
|
|
|
NewSymbol.st_shndx = ELF::SHN_ABS;
|
|
|
|
|
outs() << "BOLT-INFO: setting " << Name << " to 0x"
|
|
|
|
|
<< Twine::utohexstr(NewSymbol.st_value) << '\n';
|
|
|
|
|
++IsUpdated;
|
|
|
|
|
};
|
2018-04-20 20:03:31 -07:00
|
|
|
|
2020-02-26 20:43:18 -08:00
|
|
|
if (opts::HotText && (*SymbolName == "__hot_start" ||
|
|
|
|
|
*SymbolName == "__hot_end"))
|
|
|
|
|
updateSymbolValue(*SymbolName, NumHotTextSymsUpdated);
|
2018-04-20 20:03:31 -07:00
|
|
|
|
2020-02-26 20:43:18 -08:00
|
|
|
if (opts::HotData && (*SymbolName == "__hot_data_start" ||
|
|
|
|
|
*SymbolName == "__hot_data_end"))
|
|
|
|
|
updateSymbolValue(*SymbolName, NumHotDataSymsUpdated);
|
2018-04-20 20:03:31 -07:00
|
|
|
|
2020-06-18 11:10:41 -07:00
|
|
|
if (*SymbolName == "_end") {
|
|
|
|
|
unsigned Ignored = 0;
|
|
|
|
|
updateSymbolValue(*SymbolName, Ignored);
|
2020-02-26 20:43:18 -08:00
|
|
|
}
|
2016-09-27 19:09:38 -07:00
|
|
|
|
2020-02-26 20:43:18 -08:00
|
|
|
if (PatchExisting) {
|
|
|
|
|
Write((&Symbol - cantFail(Obj->symbols(&SymTabSection)).begin()) *
|
|
|
|
|
sizeof(ELFSymTy),
|
|
|
|
|
NewSymbol);
|
|
|
|
|
} else {
|
|
|
|
|
Symbols.emplace_back(NewSymbol);
|
2016-09-27 19:09:38 -07:00
|
|
|
}
|
2020-02-26 20:43:18 -08:00
|
|
|
}
|
2017-10-10 18:06:45 -07:00
|
|
|
|
2020-02-26 20:43:18 -08:00
|
|
|
if (PatchExisting) {
|
|
|
|
|
assert(Symbols.empty());
|
|
|
|
|
return;
|
|
|
|
|
}
|
2019-03-19 13:46:21 -07:00
|
|
|
|
2020-02-26 20:43:18 -08:00
|
|
|
// Add symbols of injected functions
|
|
|
|
|
for (BinaryFunction *Function : BC->getInjectedBinaryFunctions()) {
|
|
|
|
|
ELFSymTy NewSymbol;
|
|
|
|
|
NewSymbol.st_shndx = Function->getCodeSection()->getIndex();
|
|
|
|
|
NewSymbol.st_value = Function->getOutputAddress();
|
|
|
|
|
NewSymbol.st_name = AddToStrTab(Function->getOneName());
|
|
|
|
|
NewSymbol.st_size = Function->getOutputSize();
|
|
|
|
|
NewSymbol.st_other = 0;
|
|
|
|
|
NewSymbol.setBindingAndType(ELF::STB_LOCAL, ELF::STT_FUNC);
|
|
|
|
|
Symbols.emplace_back(NewSymbol);
|
|
|
|
|
|
|
|
|
|
if (Function->isSplit()) {
|
|
|
|
|
auto NewColdSym = NewSymbol;
|
|
|
|
|
NewColdSym.setType(ELF::STT_NOTYPE);
|
|
|
|
|
SmallVector<char, 256> Buf;
|
|
|
|
|
NewColdSym.st_name = AddToStrTab(
|
|
|
|
|
Twine(Function->getPrintName()).concat(".cold.0").toStringRef(Buf));
|
|
|
|
|
NewColdSym.st_value = Function->cold().getAddress();
|
|
|
|
|
NewColdSym.st_size = Function->cold().getImageSize();
|
|
|
|
|
Symbols.emplace_back(NewColdSym);
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
assert((!NumHotTextSymsUpdated || NumHotTextSymsUpdated == 2) &&
|
|
|
|
|
"either none or both __hot_start/__hot_end symbols were expected");
|
|
|
|
|
assert((!NumHotDataSymsUpdated || NumHotDataSymsUpdated == 2) &&
|
|
|
|
|
"either none or both __hot_data_start/__hot_data_end symbols were "
|
|
|
|
|
"expected");
|
|
|
|
|
|
|
|
|
|
auto addSymbol = [&](const std::string &Name) {
|
|
|
|
|
ELFSymTy Symbol;
|
|
|
|
|
Symbol.st_value = getNewValueForSymbol(Name);
|
|
|
|
|
Symbol.st_shndx = ELF::SHN_ABS;
|
|
|
|
|
Symbol.st_name = AddToStrTab(Name);
|
|
|
|
|
Symbol.st_size = 0;
|
|
|
|
|
Symbol.st_other = 0;
|
|
|
|
|
Symbol.setBindingAndType(ELF::STB_WEAK, ELF::STT_NOTYPE);
|
|
|
|
|
|
|
|
|
|
outs() << "BOLT-INFO: setting " << Name << " to 0x"
|
|
|
|
|
<< Twine::utohexstr(Symbol.st_value) << '\n';
|
|
|
|
|
|
|
|
|
|
Symbols.emplace_back(Symbol);
|
|
|
|
|
};
|
2018-04-20 20:03:31 -07:00
|
|
|
|
2020-02-26 20:43:18 -08:00
|
|
|
if (opts::HotText && !NumHotTextSymsUpdated) {
|
|
|
|
|
addSymbol("__hot_start");
|
|
|
|
|
addSymbol("__hot_end");
|
|
|
|
|
}
|
2018-04-20 20:03:31 -07:00
|
|
|
|
2020-02-26 20:43:18 -08:00
|
|
|
if (opts::HotData && !NumHotDataSymsUpdated) {
|
|
|
|
|
addSymbol("__hot_data_start");
|
|
|
|
|
addSymbol("__hot_data_end");
|
|
|
|
|
}
|
2017-10-10 18:06:45 -07:00
|
|
|
|
2020-02-26 20:43:18 -08:00
|
|
|
// Put local symbols at the beginning.
|
|
|
|
|
std::stable_sort(Symbols.begin(), Symbols.end(),
|
|
|
|
|
[](const ELFSymTy &A, const ELFSymTy &B) {
|
|
|
|
|
if (A.getBinding() == ELF::STB_LOCAL &&
|
|
|
|
|
B.getBinding() != ELF::STB_LOCAL)
|
|
|
|
|
return true;
|
|
|
|
|
return false;
|
|
|
|
|
});
|
2018-04-20 20:03:31 -07:00
|
|
|
|
2020-02-26 20:43:18 -08:00
|
|
|
for (const auto &Symbol : Symbols) {
|
|
|
|
|
Write(0, Symbol);
|
|
|
|
|
}
|
|
|
|
|
}
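The "put local symbols at the beginning" step above can be looked at in isolation. What follows is a minimal sketch, not BOLT code: `Sym`, `Binding`, and `localsFirst` are hypothetical stand-ins for `ELFSymTy` and its binding accessors. It relies only on the ELF convention that local symbols precede all others in a symbol table, and it uses `std::stable_partition`, which for this two-way split should produce the same order as a `std::stable_sort` with a locals-first comparator.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an ELF symbol; only the fields the sketch needs.
enum class Binding { Local, Global, Weak };

struct Sym {
  std::string Name;
  Binding Bind;
};

// Moves local symbols to the front while preserving the relative order
// inside each group, and returns the number of local symbols.
std::size_t localsFirst(std::vector<Sym> &Symbols) {
  auto FirstNonLocal = std::stable_partition(
      Symbols.begin(), Symbols.end(),
      [](const Sym &S) { return S.Bind == Binding::Local; });
  return static_cast<std::size_t>(FirstNonLocal - Symbols.begin());
}
```

The returned count plays the role that `NumLocalSymbols` plays in `patchELFSymTabs`: a section header writer can use it to tell where the non-local symbols start.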
|
2019-03-19 13:46:21 -07:00
|
|
|
|
2020-02-26 20:43:18 -08:00
|
|
|
template <typename ELFT>
|
|
|
|
|
void RewriteInstance::patchELFSymTabs(ELFObjectFile<ELFT> *File) {
|
|
|
|
|
auto *Obj = File->getELFFile();
|
|
|
|
|
using ELFShdrTy = typename ELFObjectFile<ELFT>::Elf_Shdr;
|
|
|
|
|
using ELFSymTy = typename ELFObjectFile<ELFT>::Elf_Sym;
|
2019-03-19 13:46:21 -07:00
|
|
|
|
2020-02-26 20:43:18 -08:00
|
|
|
// Compute a preview of how section indices will change after rewriting, so
|
|
|
|
|
// we can properly update the symbol table based on new section indices.
|
|
|
|
|
std::vector<uint32_t> NewSectionIndex;
|
|
|
|
|
getOutputSections(File, NewSectionIndex);
|
|
|
|
|
|
|
|
|
|
// Set pointer at the end of the output file, so we can pwrite old symbol
|
|
|
|
|
// tables if we need to.
|
|
|
|
|
uint64_t NextAvailableOffset = getFileOffsetForAddress(NextAvailableAddress);
|
|
|
|
|
assert(NextAvailableOffset >= FirstNonAllocatableOffset &&
|
|
|
|
|
"next available offset calculation failure");
|
|
|
|
|
Out->os().seek(NextAvailableOffset);
|
2016-09-27 19:09:38 -07:00
|
|
|
|
|
|
|
|
// Update dynamic symbol table.
|
2020-02-26 20:43:18 -08:00
|
|
|
const ELFShdrTy *DynSymSection = nullptr;
|
2018-10-22 18:48:12 -07:00
|
|
|
for (const auto &Section : cantFail(Obj->sections())) {
|
2016-09-27 19:09:38 -07:00
|
|
|
if (Section.sh_type == ELF::SHT_DYNSYM) {
|
|
|
|
|
DynSymSection = &Section;
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
}
|
2020-06-26 16:52:07 -07:00
|
|
|
assert((DynSymSection || BC->IsStaticExecutable) &&
|
|
|
|
|
"dynamic symbol table expected");
|
|
|
|
|
if (DynSymSection) {
|
|
|
|
|
updateELFSymbolTable(
|
|
|
|
|
File,
|
|
|
|
|
/*PatchExisting=*/true,
|
|
|
|
|
*DynSymSection,
|
|
|
|
|
NewSectionIndex,
|
|
|
|
|
[&](size_t Offset, const ELFSymTy &Sym) {
|
|
|
|
|
Out->os().pwrite(reinterpret_cast<const char *>(&Sym),
|
|
|
|
|
sizeof(ELFSymTy),
|
|
|
|
|
DynSymSection->sh_offset + Offset);
|
|
|
|
|
},
|
|
|
|
|
[](StringRef) -> size_t { return 0; });
|
|
|
|
|
}
|
2017-06-27 16:25:59 -07:00
|
|
|
|
|
|
|
|
// (re)create regular symbol table.
|
2020-02-26 20:43:18 -08:00
|
|
|
const ELFShdrTy *SymTabSection = nullptr;
|
[BOLT rebase] Rebase fixes on top of LLVM Feb2018
Summary:
This commit includes all code necessary to make BOLT working again
after the rebase. This includes a redesign of the EHFrame work,
cherry-pick of the 3dnow disassembly work, compilation error fixes,
and port of the debug_info work. The macroop fusion feature is not
ported yet.
The rebased version has minor changes to the "executed instructions"
dynostats counter because REP prefixes are considered a part of the
instruction it applies to. Also, some X86 instructions had the "mayLoad"
tablegen property removed, which BOLT uses to identify and account
for loads, thus reducing the total number of loads reported by
dynostats. This was observed in X86::MOVDQUmr. TRAP instructions are
not terminators anymore, changing our CFG. This commit adds compensation
to preserve this old behavior and minimize tests changes. debug_info
sections are now slightly larger. The discriminator field in the line
table is slightly different due to a change upstream. New profiles
generated with the other bolt are incompatible with this version
because of different hash values calculated for functions, so they will
be considered 100% stale. This commit changes the corresponding test
to XFAIL so it can be updated. The hash function changes because it
relies on raw opcode values, which change according to the opcodes
described in the X86 tablegen files. When processing HHVM, bolt was
observed to be using about 800MB more memory in the rebased version
and being about 5% slower.
(cherry picked from FBD7078072)
2018-02-06 15:00:23 -08:00
|
|
|
for (const auto &Section : cantFail(Obj->sections())) {
|
2016-09-27 19:09:38 -07:00
|
|
|
if (Section.sh_type == ELF::SHT_SYMTAB) {
|
|
|
|
|
SymTabSection = &Section;
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
if (!SymTabSection) {
|
|
|
|
|
errs() << "BOLT-WARNING: no symbol table found\n";
|
|
|
|
|
return;
|
|
|
|
|
}
|
2017-06-27 16:25:59 -07:00
|
|
|
|
2020-02-26 20:43:18 -08:00
|
|
|
const ELFShdrTy *StrTabSection =
|
2018-02-06 15:00:23 -08:00
|
|
|
cantFail(Obj->getSection(SymTabSection->sh_link));
|
2017-06-27 16:25:59 -07:00
|
|
|
std::string NewContents;
|
|
|
|
|
std::string NewStrTab =
|
|
|
|
|
File->getData().substr(StrTabSection->sh_offset, StrTabSection->sh_size);
|
2018-02-06 15:00:23 -08:00
|
|
|
auto SecName = cantFail(Obj->getSectionName(SymTabSection));
|
|
|
|
|
auto StrSecName = cantFail(Obj->getSectionName(StrTabSection));
|
2017-06-27 16:25:59 -07:00
|
|
|
|
2018-10-22 18:48:12 -07:00
|
|
|
NumLocalSymbols = 0;
|
2020-02-26 20:43:18 -08:00
|
|
|
updateELFSymbolTable(
|
|
|
|
|
File,
|
|
|
|
|
/*PatchExisting=*/false,
|
|
|
|
|
*SymTabSection,
|
|
|
|
|
NewSectionIndex,
|
|
|
|
|
[&](size_t Offset, const ELFSymTy &Sym) {
|
|
|
|
|
if (Sym.getBinding() == ELF::STB_LOCAL)
|
|
|
|
|
++NumLocalSymbols;
|
|
|
|
|
NewContents.append(reinterpret_cast<const char *>(&Sym),
|
|
|
|
|
sizeof(ELFSymTy));
|
|
|
|
|
},
|
|
|
|
|
[&](StringRef Str) {
|
|
|
|
|
size_t Idx = NewStrTab.size();
|
|
|
|
|
NewStrTab.append(Str.data(), Str.size());
|
|
|
|
|
NewStrTab.append(1, '\0');
|
|
|
|
|
return Idx;
|
|
|
|
|
});
|
2017-06-27 16:25:59 -07:00
|
|
|
|
2018-02-01 16:33:43 -08:00
|
|
|
BC->registerOrUpdateNoteSection(SecName,
|
|
|
|
|
copyByteArray(NewContents),
|
|
|
|
|
NewContents.size(),
|
|
|
|
|
/*Alignment=*/1,
|
|
|
|
|
/*IsReadOnly=*/true,
|
|
|
|
|
ELF::SHT_SYMTAB);
|
|
|
|
|
|
|
|
|
|
BC->registerOrUpdateNoteSection(StrSecName,
|
|
|
|
|
copyByteArray(NewStrTab),
|
|
|
|
|
NewStrTab.size(),
|
|
|
|
|
/*Alignment=*/1,
|
|
|
|
|
/*IsReadOnly=*/true,
|
|
|
|
|
ELF::SHT_STRTAB);
|
2016-09-27 19:09:38 -07:00
|
|
|
}
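The string-table callback passed to `updateELFSymbolTable` above follows a common pattern: append the name plus a terminating NUL and hand back the offset where the name starts, which then becomes `st_name`. A minimal standalone sketch of that pattern, assuming a plain `std::string` as the table; the free function `addToStrTab` is a hypothetical name used only for illustration:

```cpp
#include <cstddef>
#include <string>

// Appends Str plus a terminating NUL to the string table and returns the
// offset at which Str starts. Hypothetical helper, for illustration only.
std::size_t addToStrTab(std::string &StrTab, const std::string &Str) {
  const std::size_t Offset = StrTab.size();
  StrTab.append(Str.data(), Str.size());
  StrTab.push_back('\0');
  return Offset;
}
```

Because the new names are only ever appended, offsets of names already in the copied table stay valid, so existing `st_name` values do not need to be rewritten.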
|
|
|
|
|
|
|
|
|
|
template <typename ELFT>
|
2018-08-16 16:53:14 -07:00
|
|
|
void
|
|
|
|
|
RewriteInstance::patchELFAllocatableRelaSections(ELFObjectFile<ELFT> *File) {
|
2016-09-27 19:09:38 -07:00
|
|
|
auto &OS = Out->os();
|
|
|
|
|
|
2018-08-16 16:53:14 -07:00
|
|
|
for (auto &RelaSection : BC->allocatableRelaSections()) {
|
|
|
|
|
for (const auto &Rel : RelaSection.getSectionRef().relocations()) {
|
|
|
|
|
if (Rel.getType() == ELF::R_X86_64_IRELATIVE ||
|
|
|
|
|
Rel.getType() == ELF::R_X86_64_RELATIVE) {
|
|
|
|
|
DataRefImpl DRI = Rel.getRawDataRefImpl();
|
|
|
|
|
const auto *RelA = File->getRela(DRI);
|
|
|
|
|
auto Address = RelA->r_addend;
|
|
|
|
|
auto NewAddress = getNewFunctionAddress(Address);
|
|
|
|
|
if (!NewAddress)
|
|
|
|
|
continue;
|
|
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: patching (I)RELATIVE "
|
|
|
|
|
<< RelaSection.getName() << " entry 0x"
|
|
|
|
|
<< Twine::utohexstr(Address) << " with 0x"
|
|
|
|
|
<< Twine::utohexstr(NewAddress) << '\n');
|
|
|
|
|
auto NewRelA = *RelA;
|
|
|
|
|
NewRelA.r_addend = NewAddress;
|
|
|
|
|
OS.pwrite(reinterpret_cast<const char *>(&NewRelA), sizeof(NewRelA),
|
|
|
|
|
reinterpret_cast<const char *>(RelA) - File->getData().data());
|
|
|
|
|
}
|
2016-09-27 19:09:38 -07:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
template <typename ELFT>
|
|
|
|
|
void RewriteInstance::patchELFGOT(ELFObjectFile<ELFT> *File) {
|
|
|
|
|
auto &OS = Out->os();
|
|
|
|
|
|
|
|
|
|
SectionRef GOTSection;
|
|
|
|
|
for (const auto &Section : File->sections()) {
|
|
|
|
|
StringRef SectionName;
|
|
|
|
|
Section.getName(SectionName);
|
|
|
|
|
if (SectionName == ".got") {
|
|
|
|
|
GOTSection = Section;
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
if (!GOTSection.getObject()) {
|
|
|
|
|
errs() << "BOLT-INFO: no .got section found\n";
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
StringRef GOTContents;
|
|
|
|
|
GOTSection.getContents(GOTContents);
|
|
|
|
|
for (const uint64_t *GOTEntry =
|
|
|
|
|
reinterpret_cast<const uint64_t *>(GOTContents.data());
|
|
|
|
|
GOTEntry < reinterpret_cast<const uint64_t *>(GOTContents.data() +
|
|
|
|
|
GOTContents.size());
|
|
|
|
|
++GOTEntry) {
|
|
|
|
|
if (auto NewAddress = getNewFunctionAddress(*GOTEntry)) {
|
|
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: patching GOT entry 0x"
|
|
|
|
|
<< Twine::utohexstr(*GOTEntry) << " with 0x"
|
|
|
|
|
<< Twine::utohexstr(NewAddress) << '\n');
|
|
|
|
|
OS.pwrite(reinterpret_cast<const char *>(&NewAddress), sizeof(NewAddress),
|
|
|
|
|
reinterpret_cast<const char *>(GOTEntry) - File->getData().data());
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
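The GOT walk above treats the section as an array of 8-byte entries and rewrites those that match a moved function. A minimal sketch of the same idea on an in-memory buffer, under stated assumptions: `patchSlots` and the `Moved` map are hypothetical, the map stands in for `getNewFunctionAddress`, and the buffer is patched in place, whereas the real code writes to the output file with `pwrite`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

// Rewrites every 8-byte slot whose value is a key of Moved with the mapped
// value and returns the number of slots changed. Hypothetical helper.
std::size_t
patchSlots(std::vector<unsigned char> &Table,
           const std::unordered_map<std::uint64_t, std::uint64_t> &Moved) {
  std::size_t Patched = 0;
  for (std::size_t Off = 0; Off + sizeof(std::uint64_t) <= Table.size();
       Off += sizeof(std::uint64_t)) {
    // memcpy avoids unaligned or type-punned reads of the raw bytes.
    std::uint64_t Entry;
    std::memcpy(&Entry, Table.data() + Off, sizeof(Entry));
    auto It = Moved.find(Entry);
    if (It == Moved.end())
      continue;
    std::memcpy(Table.data() + Off, &It->second, sizeof(It->second));
    ++Patched;
  }
  return Patched;
}
```

The sketch reads and writes entries in host byte order, so it only models the case where host and target endianness agree.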
|
|
|
|
|
|
|
|
|
|
template <typename ELFT>
|
|
|
|
|
void RewriteInstance::patchELFDynamic(ELFObjectFile<ELFT> *File) {
|
2020-06-26 16:52:07 -07:00
|
|
|
if (BC->IsStaticExecutable)
|
|
|
|
|
return;
|
|
|
|
|
|
2016-09-27 19:09:38 -07:00
|
|
|
auto *Obj = File->getELFFile();
|
|
|
|
|
auto &OS = Out->os();
|
|
|
|
|
|
|
|
|
|
using Elf_Phdr = typename ELFFile<ELFT>::Elf_Phdr;
|
|
|
|
|
using Elf_Dyn = typename ELFFile<ELFT>::Elf_Dyn;
|
|
|
|
|
|
|
|
|
|
// Locate DYNAMIC by looking through program headers.
|
|
|
|
|
uint64_t DynamicOffset = 0;
|
|
|
|
|
const Elf_Phdr *DynamicPhdr = nullptr;
|
2018-02-06 15:00:23 -08:00
|
|
|
for (auto &Phdr : cantFail(Obj->program_headers())) {
|
2016-09-27 19:09:38 -07:00
|
|
|
if (Phdr.p_type == ELF::PT_DYNAMIC) {
|
|
|
|
|
DynamicOffset = Phdr.p_offset;
|
|
|
|
|
DynamicPhdr = &Phdr;
|
|
|
|
|
assert(Phdr.p_memsz == Phdr.p_filesz && "dynamic sizes should match");
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
assert(DynamicPhdr && "missing dynamic in ELF binary");
|
|
|
|
|
|
2017-08-04 11:21:05 -07:00
|
|
|
bool ZNowSet = false;
|
|
|
|
|
|
2016-09-27 19:09:38 -07:00
|
|
|
// Go through all dynamic entries and patch function addresses with
|
|
|
|
|
// new ones.
|
2018-02-06 15:00:23 -08:00
|
|
|
const Elf_Dyn *DTB = cantFail(Obj->dynamic_table_begin(DynamicPhdr),
|
|
|
|
|
"error accessing dynamic table");
|
|
|
|
|
const Elf_Dyn *DTE = cantFail(Obj->dynamic_table_end(DynamicPhdr),
|
|
|
|
|
"error accessing dynamic table");
|
|
|
|
|
for (auto *DE = DTB; DE != DTE; ++DE) {
|
2016-09-27 19:09:38 -07:00
|
|
|
auto NewDE = *DE;
|
|
|
|
|
bool ShouldPatch = true;
|
|
|
|
|
switch (DE->getTag()) {
|
|
|
|
|
default:
|
|
|
|
|
ShouldPatch = false;
|
|
|
|
|
break;
|
|
|
|
|
case ELF::DT_INIT:
|
|
|
|
|
case ELF::DT_FINI:
|
2017-12-09 21:40:39 -08:00
|
|
|
if (BC->HasRelocations) {
|
2017-08-04 11:21:05 -07:00
|
|
|
if (auto NewAddress = getNewFunctionAddress(DE->getPtr())) {
|
|
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: patching dynamic entry of type "
|
|
|
|
|
<< DE->getTag() << '\n');
|
|
|
|
|
NewDE.d_un.d_ptr = NewAddress;
|
|
|
|
|
}
|
|
|
|
|
}
|
Refactor runtime library
Summary:
As we are adding more types of runtime libraries, it would be better to move the runtime library out of RewriteInstance so that it could grow separately. This also requires splitting the current implementation of Instrumentation.cpp into two separate pieces: one a normal pass, the other the runtime library. The Instrumentation pass passes the generated data over to the runtime library, which uses it to emit the binary and perform linking.
This patch does the following:
1. Turn Instrumentation class into an optimization pass. Register the pass in the pass manager instead of in RewriteInstance.
2. Split all the data that are generated by Instrumentation that's needed by runtime library into a separate data structure called InstrumentationSummary. At the creation of Instrumentation pass, we create an instance of such data structure, which will be moved over to the runtime at the end of the pass.
3. Added a runtime library member to BinaryContext. Set the member at the end of Instrumentation pass.
4. In BinaryEmitter, make BinaryContext also emit the runtime library binary.
5. Created a base class RuntimeLibrary, that defines the interface of a runtime library, along with a few common helper functions.
6. Created InstrumentationRuntimeLibrary which inherits from RuntimeLibrary, that does all the work (mostly copied over) for emit and linking.
7. Added a new directory called RuntimeLibs, and put all the runtime library related files into it.
(cherry picked from FBD21694762)
2020-05-21 14:28:47 -07:00
|
|
|
if (DE->getTag() == ELF::DT_FINI) {
|
|
|
|
|
if (auto *RtLibrary = BC->getRuntimeLibrary()) {
|
2020-05-02 11:14:38 -07:00
|
|
|
if (auto Addr = RtLibrary->getRuntimeFiniAddress()) {
|
|
|
|
|
NewDE.d_un.d_ptr = Addr;
|
|
|
|
|
}
|
2020-05-21 14:28:47 -07:00
|
|
|
}
|
|
|
|
|
}
|
2017-08-04 11:21:05 -07:00
|
|
|
break;
|
|
|
|
|
case ELF::DT_FLAGS:
|
|
|
|
|
if (BC->RequiresZNow) {
|
|
|
|
|
NewDE.d_un.d_val |= ELF::DF_BIND_NOW;
|
|
|
|
|
ZNowSet = true;
|
|
|
|
|
}
|
|
|
|
|
break;
|
|
|
|
|
case ELF::DT_FLAGS_1:
|
|
|
|
|
if (BC->RequiresZNow) {
|
|
|
|
|
NewDE.d_un.d_val |= ELF::DF_1_NOW;
|
|
|
|
|
ZNowSet = true;
|
2016-09-27 19:09:38 -07:00
|
|
|
}
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
if (ShouldPatch) {
|
|
|
|
|
OS.pwrite(reinterpret_cast<const char *>(&NewDE), sizeof(NewDE),
|
2018-02-06 15:00:23 -08:00
|
|
|
DynamicOffset + (DE - DTB) * sizeof(*DE));
|
2016-09-27 19:09:38 -07:00
|
|
|
}
|
|
|
|
|
}
|
2017-08-04 11:21:05 -07:00
|
|
|
|
|
|
|
|
if (BC->RequiresZNow && !ZNowSet) {
|
|
|
|
|
errs() << "BOLT-ERROR: output binary requires immediate relocation "
|
|
|
|
|
"processing which depends on DT_FLAGS or DT_FLAGS_1 presence in "
|
|
|
|
|
".dynamic. Please re-link the binary with -znow.\n";
|
|
|
|
|
exit(1);
|
|
|
|
|
}
|
2016-09-27 19:09:38 -07:00
|
|
|
}
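The flag-patching part of the loop above can be sketched in isolation. This is a minimal illustration, not the BOLT implementation: `DynEntry` and `forceBindNow` are hypothetical names, the table is a plain vector rather than an LLVM `Elf_Dyn` range, and the numeric constants are the standard ELF values for `DT_FLAGS`, `DT_FLAGS_1`, `DF_BIND_NOW`, and `DF_1_NOW`. It shows how each patched entry's file offset follows from the table offset plus the entry index times the entry size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 64-bit ELF dynamic entry (tag plus value).
struct DynEntry {
  int64_t Tag;
  uint64_t Val;
};

// Standard ELF constants (gABI and GNU extension values).
constexpr int64_t kDT_FLAGS = 30;
constexpr int64_t kDT_FLAGS_1 = 0x6ffffffb;
constexpr uint64_t kDF_BIND_NOW = 0x8;
constexpr uint64_t kDF_1_NOW = 0x1;

// Sets the "bind now" bit in every DT_FLAGS / DT_FLAGS_1 entry and returns
// the file offsets of the entries that would have to be rewritten.
// ZNowSet reports whether at least one flags entry was found.
std::vector<uint64_t> forceBindNow(std::vector<DynEntry> &Table,
                                   uint64_t DynamicOffset, bool &ZNowSet) {
  std::vector<uint64_t> Patched;
  ZNowSet = false;
  for (size_t I = 0; I < Table.size(); ++I) {
    DynEntry &DE = Table[I];
    if (DE.Tag == kDT_FLAGS)
      DE.Val |= kDF_BIND_NOW;
    else if (DE.Tag == kDT_FLAGS_1)
      DE.Val |= kDF_1_NOW;
    else
      continue; // Entry is not of interest; leave it untouched.
    ZNowSet = true;
    // Same arithmetic as DynamicOffset + (DE - DTB) * sizeof(*DE).
    Patched.push_back(DynamicOffset + I * sizeof(DynEntry));
  }
  return Patched;
}
```

If no flags entry exists, `ZNowSet` stays false, which corresponds to the case where the code above reports an error and asks for a re-link with `-znow`.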
|
|
|
|
|
|
2019-12-13 17:27:03 -08:00
|
|
|
template <typename ELFT>
|
|
|
|
|
void RewriteInstance::readELFDynamic(ELFObjectFile<ELFT> *File) {
|
|
|
|
|
auto *Obj = File->getELFFile();
|
|
|
|
|
|
|
|
|
|
using Elf_Phdr = typename ELFFile<ELFT>::Elf_Phdr;
|
|
|
|
|
using Elf_Dyn = typename ELFFile<ELFT>::Elf_Dyn;
|
|
|
|
|
|
|
|
|
|
// Locate DYNAMIC by looking through program headers.
|
|
|
|
|
const Elf_Phdr *DynamicPhdr = nullptr;
|
|
|
|
|
for (auto &Phdr : cantFail(Obj->program_headers())) {
|
|
|
|
|
if (Phdr.p_type == ELF::PT_DYNAMIC) {
|
|
|
|
|
DynamicPhdr = &Phdr;
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2020-06-26 16:52:07 -07:00
|
|
|
if (!DynamicPhdr) {
|
|
|
|
|
outs() << "BOLT-INFO: static input executable detected\n";
|
|
|
|
|
BC->IsStaticExecutable = true;
|
|
|
|
|
return;
|
2020-03-08 19:04:39 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
assert(DynamicPhdr->p_memsz == DynamicPhdr->p_filesz &&
|
|
|
|
|
"dynamic section sizes should match");
|
|
|
|
|
|
|
|
|
|
// Go through all dynamic entries to locate entries of interest.
|
2019-12-13 17:27:03 -08:00
|
|
|
const Elf_Dyn *DTB = cantFail(Obj->dynamic_table_begin(DynamicPhdr),
|
|
|
|
|
"error accessing dynamic table");
|
|
|
|
|
const Elf_Dyn *DTE = cantFail(Obj->dynamic_table_end(DynamicPhdr),
|
|
|
|
|
"error accessing dynamic table");
|
|
|
|
|
for (auto *DE = DTB; DE != DTE; ++DE) {
|
2020-06-23 12:22:58 -07:00
|
|
|
switch (DE->getTag()) {
|
|
|
|
|
case ELF::DT_FINI:
|
|
|
|
|
BC->FiniFunctionAddress = DE->getPtr();
|
|
|
|
|
break;
|
|
|
|
|
case ELF::DT_RELA:
|
|
|
|
|
BC->DynamicRelocationsAddress = DE->getPtr();
|
|
|
|
|
break;
|
|
|
|
|
case ELF::DT_RELASZ:
|
|
|
|
|
BC->DynamicRelocationsSize = DE->getVal();
|
|
|
|
|
break;
|
|
|
|
|
}
|
2019-12-13 17:27:03 -08:00
|
|
|
}
|
|
|
|
|
}
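The first step of the function above, finding the PT_DYNAMIC program header and treating its absence as a static executable, can be sketched on its own. This is an illustrative sketch under assumptions: `ProgramHeader` and `findDynamicSegment` are hypothetical names standing in for the LLVM `Elf_Phdr` type and the inline loop, and only the standard value of `PT_DYNAMIC` (2) is taken from the ELF specification.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, reduced program header: only the fields used here.
struct ProgramHeader {
  uint32_t Type;
  uint64_t Offset;
};

constexpr uint32_t kPT_DYNAMIC = 2; // Standard ELF segment type value.

// Returns the first PT_DYNAMIC header, or nullptr when the input has none
// (the case the code above reports as a static executable).
const ProgramHeader *
findDynamicSegment(const std::vector<ProgramHeader> &Phdrs) {
  for (const ProgramHeader &Phdr : Phdrs)
    if (Phdr.Type == kPT_DYNAMIC)
      return &Phdr;
  return nullptr;
}
```

Returning a pointer rather than a copy keeps the sketch close to the original, which stores the address of the matching header and tests it against null afterwards.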
|
|
|
|
|
|
|
|
|
|
|
2016-09-27 19:09:38 -07:00
|
|
|
uint64_t RewriteInstance::getNewFunctionAddress(uint64_t OldAddress) {
|
[BOLT] Basic support for split functions
Summary:
This adds very basic and limited support for split functions.
In non-relocation mode, split functions are ignored, while their debug
info is properly updated. No support in the relocation mode yet.
Split functions consist of a main body and one or more fragments.
For fragments, the main part is called their parent. Any fragment
could only be entered via its parent or another fragment.
The short-term goal is to correctly update debug information for split
functions, while the long-term goal is to have a complete support
including full optimization. Note that if we don't detect split
bodies, we would have to add multiple entry points via tail calls,
which we would rather avoid.
Parent functions and fragments are represented by a `BinaryFunction`
and are marked accordingly. For now they are marked as non-simple, and
thus only supported in non-relocation mode. Once we start building a
CFG, it should be a common graph (i.e. the one that includes all
fragments) in the parent function.
The function discovery is unchanged, except for the detection of
`\.cold\.` pattern in the function name, which automatically marks the
function as a fragment of another function.
Because of the local function name ambiguity, we cannot rely on the
function name to establish child fragment and parent relationship.
Instead we rely on disassembly processing.
`BinaryContext::getBinaryFunctionContainingAddress()` now returns a
parent function if an address from its fragment is passed.
There's no jump table support at the moment. Jump tables can have
source and destinations in both fragment and parent.
Parent functions that enter their fragments via C++ exception handling
mechanism are not yet supported.
(cherry picked from FBD14970569)
2019-04-16 10:24:34 -07:00
|
|
|
const auto *Function = BC->getBinaryFunctionAtAddress(OldAddress,
|
|
|
|
|
/*Shallow=*/true);
|
2016-09-27 19:09:38 -07:00
|
|
|
if (!Function)
|
|
|
|
|
return 0;
|
2017-05-08 22:51:36 -07:00
|
|
|
return Function->getOutputAddress();
|
2016-02-08 10:02:48 -08:00
|
|
|
}
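The lookup contract above (exact match on the old entry address, 0 when nothing is found) can be illustrated with a small sketch. The names `FunctionInfo` and `lookupNewAddress` are hypothetical, and a `std::map` stands in for whatever container BOLT uses internally; only the "return 0 on miss" convention is taken from the code above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical record: just the address a function was emitted at.
struct FunctionInfo {
  uint64_t OutputAddress;
};

// Maps an original function entry address to its new address.
// Returns 0 when no function starts exactly at OldAddress.
uint64_t lookupNewAddress(const std::map<uint64_t, FunctionInfo> &Functions,
                          uint64_t OldAddress) {
  auto It = Functions.find(OldAddress);
  if (It == Functions.end())
    return 0;
  return It->second.OutputAddress;
}
```

Callers such as the DT_INIT/DT_FINI handling can then use the result directly in an `if`, since 0 doubles as the "not found" value.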
|
2015-11-23 17:54:18 -08:00
|
|
|
|
|
|
|
|
void RewriteInstance::rewriteFile() {
|
2020-05-07 23:00:29 -07:00
|
|
|
std::error_code EC;
|
|
|
|
|
Out = llvm::make_unique<ToolOutputFile>(opts::OutputFilename, EC,
|
|
|
|
|
sys::fs::F_None, 0777);
|
|
|
|
|
check_error(EC, "cannot create output executable file");
|
|
|
|
|
|
2016-09-27 19:09:38 -07:00
|
|
|
auto &OS = Out->os();
|
|
|
|
|
|
2020-05-07 23:00:29 -07:00
|
|
|
// Copy allocatable part of the input.
|
|
|
|
|
OS << InputFile->getData().substr(0, FirstNonAllocatableOffset);
|
|
|
|
|
|
2016-02-08 10:02:48 -08:00
|
|
|
// We obtain an asm-specific writer so that we can emit nops in an
|
|
|
|
|
// architecture-specific way at the end of the function.
|
2015-11-23 17:54:18 -08:00
|
|
|
auto MCE = BC->TheTarget->createMCCodeEmitter(*BC->MII, *BC->MRI, *BC->Ctx);
|
2018-02-06 15:00:23 -08:00
|
|
|
auto MAB =
|
|
|
|
|
BC->TheTarget->createMCAsmBackend(*BC->STI, *BC->MRI, MCTargetOptions());
|
|
|
|
|
std::unique_ptr<MCStreamer> Streamer(BC->TheTarget->createMCObjectStreamer(
|
|
|
|
|
*BC->TheTriple, *BC->Ctx, std::unique_ptr<MCAsmBackend>(MAB), OS,
|
|
|
|
|
std::unique_ptr<MCCodeEmitter>(MCE), *BC->STI,
|
|
|
|
|
/* RelaxAll */ false,
|
|
|
|
|
/*IncrementalLinkerCompatible */ false,
|
|
|
|
|
/* DWARFMustBeAtTheEnd */ false));
|
2016-03-11 11:30:30 -08:00
|
|
|
|
2015-11-23 17:54:18 -08:00
|
|
|
auto &Writer = static_cast<MCObjectStreamer *>(Streamer.get())
|
|
|
|
|
->getAssembler()
|
|
|
|
|
.getWriter();
|
|
|
|
|
|
2016-02-12 19:01:53 -08:00
|
|
|
// Make sure output stream has enough reserved space, otherwise
|
|
|
|
|
// pwrite() will fail.
|
2017-01-17 15:49:59 -08:00
|
|
|
auto Offset = OS.seek(getFileOffsetForAddress(NextAvailableAddress));
|
2017-05-25 10:29:38 -07:00
|
|
|
(void)Offset;
|
2017-01-17 15:49:59 -08:00
|
|
|
assert(Offset == getFileOffsetForAddress(NextAvailableAddress) &&
|
2016-02-08 10:02:48 -08:00
|
|
|
"error resizing output file");
|
2015-11-23 17:54:18 -08:00
|
|
|
|
2017-12-09 21:40:39 -08:00
|
|
|
if (!BC->HasRelocations) {
|
2016-09-27 19:09:38 -07:00
|
|
|
// Overwrite functions in the output file.
|
|
|
|
|
uint64_t CountOverwrittenFunctions = 0;
|
|
|
|
|
uint64_t OverwrittenScore = 0;
|
2019-04-03 15:52:01 -07:00
|
|
|
for (auto &BFI : BC->getBinaryFunctions()) {
|
2016-09-27 19:09:38 -07:00
|
|
|
auto &Function = BFI.second;
|
2015-11-23 17:54:18 -08:00
|
|
|
|
2016-09-27 19:09:38 -07:00
|
|
|
if (Function.getImageAddress() == 0 || Function.getImageSize() == 0)
|
|
|
|
|
continue;
|
2016-04-05 19:35:45 -07:00
|
|
|
|
2016-09-27 19:09:38 -07:00
|
|
|
if (Function.getImageSize() > Function.getMaxSize()) {
|
|
|
|
|
if (opts::Verbosity >= 1) {
|
|
|
|
|
errs() << "BOLT-WARNING: new function size (0x"
|
|
|
|
|
<< Twine::utohexstr(Function.getImageSize())
|
|
|
|
|
<< ") is larger than maximum allowed size (0x"
|
|
|
|
|
<< Twine::utohexstr(Function.getMaxSize())
|
|
|
|
|
<< ") for function " << Function << '\n';
|
|
|
|
|
}
|
|
|
|
|
FailedAddresses.emplace_back(Function.getAddress());
|
|
|
|
|
continue;
|
2016-09-02 14:15:29 -07:00
|
|
|
}
|
2015-11-23 17:54:18 -08:00
|
|
|
|
2016-09-27 19:09:38 -07:00
|
|
|
if (Function.isSplit() && (Function.cold().getImageAddress() == 0 ||
|
|
|
|
|
Function.cold().getImageSize() == 0))
|
|
|
|
|
continue;
|
2016-09-08 14:52:26 -07:00
|
|
|
|
2016-09-27 19:09:38 -07:00
|
|
|
OverwrittenScore += Function.getFunctionScore();
|
|
|
|
|
// Overwrite function in the output file.
|
|
|
|
|
if (opts::Verbosity >= 2) {
|
2018-06-14 14:27:20 -07:00
|
|
|
outs() << "BOLT: rewriting function \"" << Function << "\"\n";
|
2016-09-27 19:09:38 -07:00
|
|
|
}
|
2017-01-17 15:49:59 -08:00
|
|
|
OS.pwrite(reinterpret_cast<char *>(Function.getImageAddress()),
|
2017-05-08 22:51:36 -07:00
|
|
|
Function.getImageSize(),
|
|
|
|
|
Function.getFileOffset());
|
2016-09-27 19:09:38 -07:00
|
|
|
|
|
|
|
|
// Write nops at the end of the function.
|
2017-01-17 15:49:59 -08:00
|
|
|
auto Pos = OS.tell();
|
|
|
|
|
OS.seek(Function.getFileOffset() + Function.getImageSize());
|
2016-09-27 19:09:38 -07:00
|
|
|
MAB->writeNopData(Function.getMaxSize() - Function.getImageSize(),
|
|
|
|
|
&Writer);
|
2017-01-17 15:49:59 -08:00
|
|
|
OS.seek(Pos);
|
|
|
|
|
|
|
|
|
|
// Write jump tables if updating in-place.
|
|
|
|
|
if (opts::JumpTables == JTS_BASIC) {
|
|
|
|
|
for (auto &JTI : Function.JumpTables) {
|
2017-11-14 20:05:11 -08:00
|
|
|
auto *JT = JTI.second;
|
2018-04-20 20:03:31 -07:00
|
|
|
auto &Section = JT->getOutputSection();
|
[BOLT] Support for lite mode with relocations
Summary:
Add '-lite' support for relocations for improved processing time,
memory consumption, and more resilient processing of binaries with
embedded assembly code.
In lite relocation mode, BOLT will skip full processing of functions
without a profile. It will run scanExternalRefs() on such functions
to discover external references and to create internal relocations
to update references to optimized functions.
Note that we could have relied on the compiler/linker to provide
relocations for function references. However, there's no assurance
that all such references are reported. E.g., the compiler can resolve
inter-procedural references internally, leaving no relocations
for the linker.
The scan process takes under 10 seconds per 100MB of code on modern
hardware. It's a reasonable overhead to live with considering the
flexibility it provides.
If BOLT fails to scan or disassemble a function, e.g., due to a data
object embedded in code, or an unsupported instruction, it enables a
patching mode to guarantee that the failed function will call
optimized/moved versions of functions. The patching happens at original
function entry points.
'-skip=<func1,func2,...>' option now can be used to skip processing of
arbitrary functions in the relocation mode.
With '-use-old-text' or '-strict' we require all functions to be
processed. As such, it is incompatible with '-lite' option,
and '-skip' option will only disable optimizations of listed
functions, not their disassembly and emission.
(cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
|
|
|
Section.setOutputFileOffset(
|
|
|
|
|
getFileOffsetForAddress(JT->getAddress()));
|
|
|
|
|
assert(Section.getOutputFileOffset() && "no matching offset in file");
|
2017-11-14 20:05:11 -08:00
|
|
|
OS.pwrite(reinterpret_cast<const char*>(Section.getOutputData()),
|
|
|
|
|
Section.getOutputSize(),
|
2020-06-15 00:15:47 -07:00
|
|
|
Section.getOutputFileOffset());
|
2017-01-17 15:49:59 -08:00
|
|
|
}
|
|
|
|
|
}
|
2016-09-27 19:09:38 -07:00
|
|
|
|
|
|
|
|
if (!Function.isSplit()) {
|
|
|
|
|
++CountOverwrittenFunctions;
|
|
|
|
|
if (opts::MaxFunctions &&
|
|
|
|
|
CountOverwrittenFunctions == opts::MaxFunctions) {
|
2018-06-14 14:27:20 -07:00
|
|
|
outs() << "BOLT: maximum number of functions reached\n";
|
2016-09-27 19:09:38 -07:00
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
continue;
|
|
|
|
|
}
|
2015-11-23 17:54:18 -08:00
|
|
|
|
2016-09-27 19:09:38 -07:00
|
|
|
// Write cold part
|
|
|
|
|
if (opts::Verbosity >= 2) {
|
2018-06-14 14:27:20 -07:00
|
|
|
outs() << "BOLT: rewriting function \"" << Function
|
2016-09-27 19:09:38 -07:00
|
|
|
<< "\" (cold part)\n";
|
|
|
|
|
}
|
2017-05-08 22:51:36 -07:00
|
|
|
OS.pwrite(reinterpret_cast<char*>(Function.cold().getImageAddress()),
|
|
|
|
|
Function.cold().getImageSize(),
|
|
|
|
|
Function.cold().getFileOffset());
|
2016-09-27 19:09:38 -07:00
|
|
|
|
|
|
|
|
// FIXME: write nops after cold part too.
|
2015-11-23 17:54:18 -08:00
|
|
|
|
|
|
|
|
++CountOverwrittenFunctions;
|
|
|
|
|
if (opts::MaxFunctions &&
|
|
|
|
|
CountOverwrittenFunctions == opts::MaxFunctions) {
|
2018-06-14 14:27:20 -07:00
|
|
|
outs() << "BOLT: maximum number of functions reached\n";
|
2015-11-23 17:54:18 -08:00
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2016-09-27 19:09:38 -07:00
|
|
|
// Print function statistics.
|
|
|
|
|
outs() << "BOLT: " << CountOverwrittenFunctions
|
2019-04-03 15:52:01 -07:00
|
|
|
<< " out of " << BC->getBinaryFunctions().size()
|
2016-09-27 19:09:38 -07:00
|
|
|
<< " functions were overwritten.\n";
|
2017-11-28 09:57:21 -08:00
|
|
|
if (BC->TotalScore != 0) {
|
|
|
|
|
double Coverage = OverwrittenScore / (double) BC->TotalScore * 100.0;
|
2019-09-03 22:24:06 -07:00
|
|
|
outs() << format("BOLT-INFO: rewritten functions cover %.2lf", Coverage)
|
2016-09-27 19:09:38 -07:00
|
|
|
<< "% of the execution count of simple functions of "
|
2019-09-03 22:24:06 -07:00
|
|
|
"this binary\n";
|
2015-11-23 17:54:18 -08:00
|
|
|
}
|
|
|
|
|
}
|
2015-12-18 17:00:46 -08:00
|
|
|
|
2017-12-09 21:40:39 -08:00
|
|
|
if (BC->HasRelocations && opts::TrapOldCode) {
|
2017-01-17 15:49:59 -08:00
|
|
|
auto SavedPos = OS.tell();
|
2016-09-27 19:09:38 -07:00
|
|
|
// Overwrite function body to make sure we never execute these instructions.
|
2019-04-03 15:52:01 -07:00
|
|
|
for (auto &BFI : BC->getBinaryFunctions()) {
|
2016-09-27 19:09:38 -07:00
|
|
|
auto &BF = BFI.second;
|
2020-06-15 00:15:47 -07:00
|
|
|
if (!BF.getFileOffset() || !BF.isEmitted())
|
2016-09-27 19:09:38 -07:00
|
|
|
continue;
|
2017-01-17 15:49:59 -08:00
|
|
|
OS.seek(BF.getFileOffset());
|
2016-09-27 19:09:38 -07:00
|
|
|
for (unsigned I = 0; I < BF.getMaxSize(); ++I)
|
2017-01-17 15:49:59 -08:00
|
|
|
OS.write((unsigned char)
|
2016-09-27 19:09:38 -07:00
|
|
|
Streamer->getContext().getAsmInfo()->getTrapFillValue());
|
|
|
|
|
}
|
2017-01-17 15:49:59 -08:00
|
|
|
OS.seek(SavedPos);
|
2016-03-03 10:13:11 -08:00
|
|
|
}
|
2015-12-18 17:00:46 -08:00
|
|
|
|
2017-01-17 15:49:59 -08:00
|
|
|
// Write all non-local sections, i.e. those not emitted with the function.
|
2017-11-14 20:05:11 -08:00
|
|
|
for (auto &Section : BC->allocatableSections()) {
|
2020-02-18 09:20:17 -08:00
|
|
|
if (!Section.isFinalized() || !Section.getOutputData())
|
2016-02-08 10:02:48 -08:00
|
|
|
continue;
|
2020-02-18 09:20:17 -08:00
|
|
|
|
2016-09-02 14:15:29 -07:00
|
|
|
if (opts::Verbosity >= 1) {
|
2018-06-14 14:27:20 -07:00
|
|
|
outs() << "BOLT: writing new section " << Section.getName()
|
[BOLT][Refactoring] Isolate changes to MC layer
Summary:
Changes that we made to MCInst, MCOperand, MCExpr, etc. are now all
moved into tools/llvm-bolt. That required a change to the way we handle
annotations and any extra operands for MCInst.
Any MCPlus information is now attached via an extra operand of type
MCInst with an opcode ANNOTATION_LABEL. Since this operand is MCInst, we
attach extra info as operands to this instruction. For first-level
annotations use functions to access the information, such as
getConditionalTailCall() or getEHInfo(), etc. For the rest, optional or
second-class annotations, use a general named-annotation interface such
as getAnnotationAs<uint64_t>(Inst, "Count").
I did a test on the HHVM binary, and memory consumption went down a little
bit while the runtime remained the same.
(cherry picked from FBD7405412)
2018-03-19 18:32:12 -07:00
|
|
|
<< "\n data at 0x" << Twine::utohexstr(Section.getAllocAddress())
|
|
|
|
|
<< "\n of size " << Section.getOutputSize()
|
2020-06-15 00:15:47 -07:00
|
|
|
<< "\n at offset " << Section.getOutputFileOffset() << '\n';
|
2016-09-02 14:15:29 -07:00
|
|
|
}
|
2018-02-01 16:33:43 -08:00
|
|
|
OS.pwrite(reinterpret_cast<const char*>(Section.getOutputData()),
|
|
|
|
|
Section.getOutputSize(),
|
2020-06-15 00:15:47 -07:00
|
|
|
Section.getOutputFileOffset());
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
for (auto &Section : BC->allocatableSections()) {
|
|
|
|
|
Section.flushPendingRelocations(OS,
|
|
|
|
|
[this] (const MCSymbol *S) {
|
|
|
|
|
return getNewValueForSymbol(S->getName());
|
|
|
|
|
});
|
2016-02-08 10:02:48 -08:00
|
|
|
}
|
|
|
|
|
|
2016-11-11 14:33:34 -08:00
|
|
|
// If .eh_frame is present create .eh_frame_hdr.
|
2018-02-01 16:33:43 -08:00
|
|
|
if (EHFrameSection && EHFrameSection->isFinalized()) {
|
|
|
|
|
writeEHFrameHeader();
|
2015-12-18 17:00:46 -08:00
|
|
|
}
|
2015-11-23 17:54:18 -08:00
|
|
|
|
2019-04-12 17:33:46 -07:00
|
|
|
// Add BOLT Address Translation maps to allow profile collection to
|
|
|
|
|
// happen in the output binary
|
|
|
|
|
if (opts::EnableBAT)
|
|
|
|
|
addBATSection();
|
|
|
|
|
|
2016-03-03 10:13:11 -08:00
|
|
|
// Patch program header table.
|
|
|
|
|
patchELFPHDRTable();
|
2016-02-08 10:02:48 -08:00
|
|
|
|
2017-05-16 17:29:31 -07:00
|
|
|
// Finalize memory image of section string table.
|
|
|
|
|
finalizeSectionStringTable();
|
|
|
|
|
|
2017-09-20 10:43:01 -07:00
|
|
|
// Update symbol tables.
|
|
|
|
|
patchELFSymTabs();
|
2017-06-27 16:25:59 -07:00
|
|
|
|
2018-08-08 17:55:24 -07:00
|
|
|
patchBuildID();
|
|
|
|
|
|
2019-04-12 17:33:46 -07:00
|
|
|
if (opts::EnableBAT)
|
|
|
|
|
encodeBATSection();
|
|
|
|
|
|
2016-03-03 10:13:11 -08:00
|
|
|
// Copy non-allocatable sections once allocatable part is finished.
|
|
|
|
|
rewriteNoteSections();
|
|
|
|
|
|
2017-08-04 11:21:05 -07:00
|
|
|
// Patch dynamic section/segment.
|
|
|
|
|
patchELFDynamic();
|
2016-09-27 19:09:38 -07:00
|
|
|
|
2017-12-09 21:40:39 -08:00
|
|
|
if (BC->HasRelocations) {
|
2018-08-16 16:53:14 -07:00
|
|
|
patchELFAllocatableRelaSections();
|
2017-01-17 15:49:59 -08:00
|
|
|
patchELFGOT();
|
|
|
|
|
}
|
2016-09-27 19:09:38 -07:00
|
|
|
|
2016-03-03 10:13:11 -08:00
|
|
|
// Update ELF book-keeping info.
|
|
|
|
|
patchELFSectionHeaderTable();
|
2015-11-23 17:54:18 -08:00
|
|
|
|
2018-02-01 16:33:43 -08:00
|
|
|
if (opts::PrintSections) {
|
|
|
|
|
outs() << "BOLT-INFO: Sections after processing:\n";
|
|
|
|
|
BC->printSections(outs());
|
|
|
|
|
}
|
|
|
|
|
|
2015-11-23 17:54:18 -08:00
|
|
|
Out->keep();
|
2016-11-15 10:40:00 -08:00
|
|
|
|
|
|
|
|
// If requested, reopen the binary we just wrote and dump its .eh_frame.
|
|
|
|
|
if (opts::DumpEHFrame) {
|
[BOLT rebase] Rebase fixes on top of LLVM Feb2018
Summary:
This commit includes all code necessary to make BOLT work again
after the rebase. This includes a redesign of the EHFrame work,
cherry-pick of the 3dnow disassembly work, compilation error fixes,
and port of the debug_info work. The macroop fusion feature is not
ported yet.
The rebased version has minor changes to the "executed instructions"
dynostats counter because REP prefixes are considered a part of the
instruction it applies to. Also, some X86 instructions had the "mayLoad"
tablegen property removed, which BOLT uses to identify and account
for loads, thus reducing the total number of loads reported by
dynostats. This was observed in X86::MOVDQUmr. TRAP instructions are
not terminators anymore, changing our CFG. This commit adds compensation
to preserve this old behavior and minimize test changes. debug_info
sections are now slightly larger. The discriminator field in the line
table is slightly different due to a change upstream. New profiles
generated with the other bolt are incompatible with this version
because of different hash values calculated for functions, so they will
be considered 100% stale. This commit changes the corresponding test
to XFAIL so it can be updated. The hash function changes because it
relies on raw opcode values, which change according to the opcodes
described in the X86 tablegen files. When processing HHVM, bolt was
observed to be using about 800MB more memory in the rebased version
and being about 5% slower.
(cherry picked from FBD7078072)
2018-02-06 15:00:23 -08:00
|
|
|
Expected<OwningBinary<Binary>> BinaryOrErr =
|
2016-11-15 10:40:00 -08:00
|
|
|
createBinary(opts::OutputFilename);
|
2018-02-06 15:00:23 -08:00
|
|
|
if (auto E = BinaryOrErr.takeError())
|
|
|
|
|
report_error(opts::OutputFilename, std::move(E));
|
2016-11-15 10:40:00 -08:00
|
|
|
Binary &Binary = *BinaryOrErr.get().getBinary();
|
|
|
|
|
|
|
|
|
|
if (auto *E = dyn_cast<ELFObjectFileBase>(&Binary)) {
|
2018-02-06 15:00:23 -08:00
|
|
|
auto DwCtx = DWARFContext::create(*E);
|
|
|
|
|
const auto &EHFrame = DwCtx->getEHFrame();
|
2016-11-15 10:40:00 -08:00
|
|
|
outs() << "BOLT-INFO: Dumping rewritten .eh_frame\n";
|
2018-03-30 15:49:34 -07:00
|
|
|
EHFrame->dump(outs(), &*BC->MRI, NoneType());
|
2016-11-15 10:40:00 -08:00
|
|
|
}
|
|
|
|
|
}
|
2015-11-23 17:54:18 -08:00
|
|
|
}
|
2016-03-02 18:40:10 -08:00
|
|
|
|
2018-02-01 16:33:43 -08:00
|
|
|
void RewriteInstance::writeEHFrameHeader() {
|
2019-03-14 18:51:05 -07:00
|
|
|
DWARFDebugFrame NewEHFrame(true, EHFrameSection->getOutputAddress());
|
2018-02-01 16:33:43 -08:00
|
|
|
NewEHFrame.parse(DWARFDataExtractor(EHFrameSection->getOutputContents(),
|
|
|
|
|
BC->AsmInfo->isLittleEndian(),
|
|
|
|
|
BC->AsmInfo->getCodePointerSize()));
|
|
|
|
|
|
2020-04-19 12:55:43 -07:00
|
|
|
uint64_t OldEHFrameAddress{0};
|
|
|
|
|
StringRef OldEHFrameContents;
|
2020-03-07 11:19:09 -08:00
|
|
|
auto OldEHFrameSection =
|
2020-03-11 15:51:32 -07:00
|
|
|
BC->getUniqueSectionByName(Twine(getOrgSecPrefix(), ".eh_frame").str());
|
2020-04-19 12:55:43 -07:00
|
|
|
if (OldEHFrameSection) {
|
|
|
|
|
OldEHFrameAddress = OldEHFrameSection->getOutputAddress();
|
|
|
|
|
OldEHFrameContents = OldEHFrameSection->getOutputContents();
|
|
|
|
|
}
|
|
|
|
|
DWARFDebugFrame OldEHFrame(true, OldEHFrameAddress);
|
|
|
|
|
OldEHFrame.parse(DWARFDataExtractor(OldEHFrameContents,
|
2018-02-01 16:33:43 -08:00
|
|
|
BC->AsmInfo->isLittleEndian(),
|
|
|
|
|
BC->AsmInfo->getCodePointerSize()));
|
2016-11-11 14:33:34 -08:00
|
|
|
|
|
|
|
|
DEBUG(dbgs() << "BOLT: writing a new .eh_frame_hdr\n");
|
|
|
|
|
|
2017-04-06 10:49:59 -07:00
|
|
|
NextAvailableAddress =
|
|
|
|
|
appendPadding(Out->os(), NextAvailableAddress, EHFrameHdrAlign);
|
2016-11-11 14:33:34 -08:00
|
|
|
|
2019-03-14 18:51:05 -07:00
|
|
|
const auto EHFrameHdrOutputAddress = NextAvailableAddress;
|
2018-02-01 16:33:43 -08:00
|
|
|
const auto EHFrameHdrFileOffset =
|
|
|
|
|
getFileOffsetForAddress(NextAvailableAddress);
|
2016-11-11 14:33:34 -08:00
|
|
|
|
|
|
|
|
auto NewEHFrameHdr =
|
|
|
|
|
CFIRdWrt->generateEHFrameHeader(OldEHFrame,
|
|
|
|
|
NewEHFrame,
|
2019-03-14 18:51:05 -07:00
|
|
|
EHFrameHdrOutputAddress,
|
2016-11-11 14:33:34 -08:00
|
|
|
FailedAddresses);
|
|
|
|
|
|
2018-02-01 16:33:43 -08:00
|
|
|
assert(Out->os().tell() == EHFrameHdrFileOffset && "offset mismatch");
|
|
|
|
|
Out->os().write(NewEHFrameHdr.data(), NewEHFrameHdr.size());
|
2016-11-11 14:33:34 -08:00
|
|
|
|
2018-02-01 16:33:43 -08:00
|
|
|
const auto Flags = BinarySection::getFlags(/*IsReadOnly=*/true,
|
|
|
|
|
/*IsText=*/false,
|
|
|
|
|
/*IsAllocatable=*/true);
|
|
|
|
|
auto &EHFrameHdrSec = BC->registerOrUpdateSection(".eh_frame_hdr",
|
|
|
|
|
ELF::SHT_PROGBITS,
|
|
|
|
|
Flags,
|
|
|
|
|
nullptr,
|
|
|
|
|
NewEHFrameHdr.size(),
|
|
|
|
|
/*Alignment=*/1);
|
[BOLT] Support for lite mode with relocations
Summary:
Add '-lite' support for relocations for improved processing time,
memory consumption, and more resilient processing of binaries with
embedded assembly code.
In lite relocation mode, BOLT will skip full processing of functions
without a profile. It will run scanExternalRefs() on such functions
to discover external references and to create internal relocations
to update references to optimized functions.
Note that we could have relied on the compiler/linker to provide
relocations for function references. However, there's no assurance
that all such references are reported. E.g., the compiler can resolve
inter-procedural references internally, leaving no relocations
for the linker.
The scan process takes under 10 seconds per 100MB of code on modern
hardware. It's a reasonable overhead to live with considering the
flexibility it provides.
If BOLT fails to scan or disassemble a function, e.g., due to a data
object embedded in code, or an unsupported instruction, it enables a
patching mode to guarantee that the failed function will call
optimized/moved versions of functions. The patching happens at original
function entry points.
'-skip=<func1,func2,...>' option now can be used to skip processing of
arbitrary functions in the relocation mode.
With '-use-old-text' or '-strict' we require all functions to be
processed. As such, these options are incompatible with the '-lite' option,
and the '-skip' option will only disable optimizations of listed
functions, not their disassembly and emission.
(cherry picked from FBD22040717)
2020-06-15 00:15:47 -07:00
|
|
|
EHFrameHdrSec.setOutputFileOffset(EHFrameHdrFileOffset);
|
2019-03-14 18:51:05 -07:00
|
|
|
EHFrameHdrSec.setOutputAddress(EHFrameHdrOutputAddress);
|
2016-11-11 14:33:34 -08:00
|
|
|
|
2018-02-01 16:33:43 -08:00
|
|
|
NextAvailableAddress += EHFrameHdrSec.getOutputSize();
|
2016-11-11 14:33:34 -08:00
|
|
|
|
2020-03-07 11:19:09 -08:00
|
|
|
// Merge new .eh_frame with original so that gdb can locate all FDEs.
|
2020-04-19 12:55:43 -07:00
|
|
|
if (OldEHFrameSection) {
|
|
|
|
|
const auto EHFrameSectionSize = (OldEHFrameSection->getOutputAddress() +
|
|
|
|
|
OldEHFrameSection->getOutputSize() -
|
|
|
|
|
EHFrameSection->getOutputAddress());
|
|
|
|
|
EHFrameSection =
|
|
|
|
|
BC->registerOrUpdateSection(".eh_frame",
|
|
|
|
|
EHFrameSection->getELFType(),
|
|
|
|
|
EHFrameSection->getELFFlags(),
|
|
|
|
|
EHFrameSection->getOutputData(),
|
|
|
|
|
EHFrameSectionSize,
|
|
|
|
|
EHFrameSection->getAlignment());
|
|
|
|
|
BC->deregisterSection(*OldEHFrameSection);
|
|
|
|
|
}
|
2018-02-01 16:33:43 -08:00
|
|
|
|
2016-11-11 14:33:34 -08:00
|
|
|
DEBUG(dbgs() << "BOLT-DEBUG: size of .eh_frame after merge is "
|
2018-02-01 16:33:43 -08:00
|
|
|
<< EHFrameSection->getOutputSize() << '\n');
|
2016-11-11 14:33:34 -08:00
|
|
|
}
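The size computed for the merged .eh_frame in writeEHFrameHeader() is the distance from the start of the new section to the end of the original one. A minimal sketch of that arithmetic follows; `mergedSectionSize` and `checkMergedSectionSize` are hypothetical names, and the sketch assumes the original section ends at or after the start of the new section in the output, which the subtraction above appears to rely on.

```cpp
#include <cassert>
#include <cstdint>

// Size of the address range that starts at the new section and ends where
// the original section ends. Assumption: the original section does not end
// before the new section starts, otherwise the subtraction would wrap.
uint64_t mergedSectionSize(uint64_t NewAddress, uint64_t OldAddress,
                           uint64_t OldSize) {
  assert(OldAddress + OldSize >= NewAddress &&
         "original section must end at or after the new section's start");
  return OldAddress + OldSize - NewAddress;
}

// Usage check: call from main() to exercise the computation.
void checkMergedSectionSize() {
  // New section at 0x1000; original section occupies [0x1800, 0x1C00).
  assert(mergedSectionSize(0x1000, 0x1800, 0x400) == 0xC00);
  // Original section directly follows a 0x200-byte new section.
  assert(mergedSectionSize(0x2000, 0x2200, 0x100) == 0x300);
}
```

Any gap between the two sections is counted in the merged size, so the result can be larger than the sum of the two section sizes.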
|
|
|
|
|
|
2020-06-22 16:16:08 -07:00
|
|
|
uint64_t RewriteInstance::getNewValueForSymbol(const StringRef Name) {
|
|
|
|
|
uint64_t Value = cantFail(OLT->findSymbol(Name, false).getAddress(),
|
|
|
|
|
"findSymbol() failed");
|
|
|
|
|
if (Value != 0)
|
|
|
|
|
return Value;
|
|
|
|
|
|
|
|
|
|
// Return the original value if we haven't emitted the symbol.
|
|
|
|
|
auto *BD = BC->getBinaryDataByName(Name);
|
|
|
|
|
if (!BD)
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
|
|
return BD->getAddress();
|
|
|
|
|
}
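The lookup order in getNewValueForSymbol() (emitted address first, original address as a fallback, 0 for an unknown symbol) can be sketched with two plain maps. `SymbolTable`, `newValueForSymbol`, and both table arguments are hypothetical stand-ins for the emitted-symbol lookup and the BinaryData registry; they are not BOLT APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a name -> address table.
using SymbolTable = std::map<std::string, uint64_t>;

// Prefer the address of the newly emitted symbol, fall back to the address
// recorded for the original binary, and return 0 when neither table knows
// the name.
uint64_t newValueForSymbol(const SymbolTable &Emitted,
                           const SymbolTable &Original,
                           const std::string &Name) {
  auto EmittedIt = Emitted.find(Name);
  // An address of 0 is treated as "not emitted", as in the function above.
  if (EmittedIt != Emitted.end() && EmittedIt->second != 0)
    return EmittedIt->second;

  auto OriginalIt = Original.find(Name);
  if (OriginalIt == Original.end())
    return 0;
  return OriginalIt->second;
}

// Usage check: call from main() to exercise all three outcomes.
void checkNewValueForSymbol() {
  SymbolTable Emitted, Original;
  Emitted["hot"] = 0x800000;
  Original["hot"] = 0x400100;
  Original["cold"] = 0x400200;
  assert(newValueForSymbol(Emitted, Original, "hot") == 0x800000);
  assert(newValueForSymbol(Emitted, Original, "cold") == 0x400200);
  assert(newValueForSymbol(Emitted, Original, "missing") == 0);
}
```

Because 0 doubles as the "not found" result, a caller cannot distinguish an unknown symbol from one whose address really is 0.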
|
|
|
|
|
|
2017-01-17 15:49:59 -08:00
|
|
|
uint64_t RewriteInstance::getFileOffsetForAddress(uint64_t Address) const {
|
|
|
|
|
// Check if it's possibly part of the new segment.
|
|
|
|
|
if (Address >= NewTextSegmentAddress) {
|
|
|
|
|
return Address - NewTextSegmentAddress + NewTextSegmentOffset;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Find an existing segment that matches the address.
|
2020-06-26 16:52:07 -07:00
|
|
|
const auto SegmentInfoI = BC->SegmentMapInfo.upper_bound(Address);
|
|
|
|
|
if (SegmentInfoI == BC->SegmentMapInfo.begin())
|
2017-01-17 15:49:59 -08:00
|
|
|
return 0;
|
|
|
|
|
|
|
|
|
|
const auto &SegmentInfo = std::prev(SegmentInfoI)->second;
|
|
|
|
|
if (Address < SegmentInfo.Address ||
|
|
|
|
|
Address >= SegmentInfo.Address + SegmentInfo.FileSize)
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
|
|
return SegmentInfo.FileOffset + Address - SegmentInfo.Address;
|
|
|
|
|
}
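The existing-segment branch of getFileOffsetForAddress() is a standard "greatest key not above the address" lookup on an ordered map. A self-contained sketch of that pattern follows; the `SegmentInfo` struct, the `SegmentMap` alias, and `lookupFileOffset` are hypothetical and only mirror the field names used above. Unlike the original, the sketch reports failure through a `bool` rather than returning 0.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical per-segment record; field names mirror the function above.
struct SegmentInfo {
  uint64_t Address;    // Start address of the segment in memory.
  uint64_t FileSize;   // Number of bytes backed by the file.
  uint64_t FileOffset; // Offset of the segment in the file.
};

// Segments keyed by their start address.
using SegmentMap = std::map<uint64_t, SegmentInfo>;

// Translate Address into a file offset. Returns false if no segment covers
// the address; Offset is left untouched in that case.
bool lookupFileOffset(const SegmentMap &Segments, uint64_t Address,
                      uint64_t &Offset) {
  // upper_bound() yields the first segment starting strictly after Address,
  // so the candidate is the entry just before it, if there is one.
  auto It = Segments.upper_bound(Address);
  if (It == Segments.begin())
    return false;

  const SegmentInfo &Seg = std::prev(It)->second;
  if (Address < Seg.Address || Address >= Seg.Address + Seg.FileSize)
    return false;

  Offset = Seg.FileOffset + (Address - Seg.Address);
  return true;
}

// Usage check: call from main() to exercise hits, gaps, and misses.
void checkLookupFileOffset() {
  SegmentMap Segments;
  Segments[0x400000] = {0x400000, 0x1000, 0x0};
  Segments[0x600000] = {0x600000, 0x200, 0x2000};

  uint64_t Offset = 0;
  assert(lookupFileOffset(Segments, 0x400010, Offset) && Offset == 0x10);
  assert(lookupFileOffset(Segments, 0x600100, Offset) && Offset == 0x2100);
  assert(!lookupFileOffset(Segments, 0x3fffff, Offset)); // Below all segments.
  assert(!lookupFileOffset(Segments, 0x401000, Offset)); // In a gap.
}
```

Returning a `bool` sidesteps the ambiguity of using 0 as an error value, since 0 is also a valid file offset for the first segment.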
|
|
|
|
|
|
2017-02-07 12:20:46 -08:00
|
|
|
bool RewriteInstance::willOverwriteSection(StringRef SectionName) {
|
2017-09-20 10:43:01 -07:00
|
|
|
for (const auto &OverwriteName : SectionsToOverwrite) {
|
|
|
|
|
if (SectionName == OverwriteName)
|
|
|
|
|
return true;
|
2016-05-16 17:02:17 -07:00
|
|
|
}
|
2019-04-26 15:30:12 -07:00
|
|
|
for (const auto &OverwriteName : DebugSectionsToOverwrite) {
|
|
|
|
|
if (SectionName == OverwriteName)
|
|
|
|
|
return true;
|
|
|
|
|
}
|
2019-04-12 17:33:46 -07:00
|
|
|
|
2018-02-01 16:33:43 -08:00
|
|
|
auto Section = BC->getUniqueSectionByName(SectionName);
|
|
|
|
|
return Section && Section->isAllocatable() && Section->isFinalized();
|
2016-05-16 17:02:17 -07:00
|
|
|
}
|
2019-04-26 15:30:12 -07:00
|
|
|
|
|
|
|
|
bool RewriteInstance::isDebugSection(StringRef SectionName) {
|
|
|
|
|
if (SectionName.startswith(".debug_") || SectionName == ".gdb_index")
|
|
|
|
|
return true;
|
|
|
|
|
|
|
|
|
|
return false;
|
|
|
|
|
}
|
Generate heatmap for linux kernel
Summary:
This diff handles several challenges related to heatmap generation for the Linux kernel (the vmlinux ELF file):
- If the input ELF binary contains the section `__ksymtab`, this diff assumes that it is the Linux kernel `vmlinux` file and enables an extra flag, `LinuxKernelMode`.
- In `LinuxKernelMode`, only heatmap generation is supported right now, so BOLT checks that the current mode is heatmap generation and exits with an error otherwise.
- For some Linux symbol and section combinations, BOLT may not be able to find a section for a symbol (especially symbols that specify the end of a section). In such cases, we now show a warning message instead of exiting, which was the previous behavior.
- The Linux kernel ELF file does not contain a dynamic section, so we don't exit when no dynamic section is found for a Linux kernel binary.
- The current `ParseMMap` logic does not work with the Linux kernel. MMap entries for the Linux kernel use the `PERF_RECORD_MMAP` format instead of the typical `PERF_RECORD_MMAP2` format. Since the Linux kernel address mapping is absolute (the same as specified in the ELF file), we avoid calling `ParseMMap` in Linux kernel mode.
- Linux kernel entries are registered with PID -1, so a `BinaryMMapInfo` lookup is not required for them. Similarly, `adjustLBR` is not required.
- The default max address in Linux kernel mode is the highest unsigned 64-bit integer instead of the current 4 GB.
- Added a new heatmap parameter, `MinAddress`, which is `KernelBaseAddress` in Linux kernel mode and 0 otherwise. While registering heatmap sample counts from LBR entries, any address lower than `MinAddress` is ignored.
- `IgnoreInterruptLBR` is disabled in Linux kernel mode to ensure that kernel entries are processed.
Currently, the Linux kernel heatmap also includes heatmaps for Linux kernel modules that are not part of the vmlinux ELF file. This is intentional, to identify other potential optimization opportunities. If reviewers think those modules should be omitted, I will disable them based on the highest end address of a vmlinux ELF section.
(cherry picked from FBD21992765)
2020-06-10 23:00:39 -07:00
|
|
|
|
|
|
|
|
bool RewriteInstance::isKSymtabSection(StringRef SectionName) {
|
|
|
|
|
if (SectionName.startswith("__ksymtab"))
|
|
|
|
|
return true;
|
|
|
|
|
|
|
|
|
|
return false;
|
|
|
|
|
}
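Both isDebugSection() and isKSymtabSection() are plain name-prefix predicates, so each `if (...) return true; return false;` pair can be collapsed into a single return of the condition. A standalone sketch using `std::string` in place of `StringRef` follows; `startsWith`, `isDebugSectionName`, and `isKSymtabSectionName` are hypothetical helpers written for illustration.

```cpp
#include <cassert>
#include <string>

// True if Name begins with Prefix. compare() clamps the substring length,
// so a Name shorter than Prefix simply compares unequal.
bool startsWith(const std::string &Name, const std::string &Prefix) {
  return Name.compare(0, Prefix.size(), Prefix) == 0;
}

// Mirrors the debug-section test above: ".debug_*" or exactly ".gdb_index".
bool isDebugSectionName(const std::string &Name) {
  return startsWith(Name, ".debug_") || Name == ".gdb_index";
}

// Mirrors the kernel symbol table test above: any "__ksymtab*" section.
bool isKSymtabSectionName(const std::string &Name) {
  return startsWith(Name, "__ksymtab");
}

// Usage check: call from main() to exercise both predicates.
void checkSectionPredicates() {
  assert(isDebugSectionName(".debug_info"));
  assert(isDebugSectionName(".gdb_index"));
  assert(!isDebugSectionName(".text"));
  assert(!isDebugSectionName(".debug")); // Shorter than the prefix.
  assert(isKSymtabSectionName("__ksymtab"));
  assert(isKSymtabSectionName("__ksymtab_gpl"));
  assert(!isKSymtabSectionName(".symtab"));
}
```

Note that the prefix test also matches names such as `__ksymtab_strings`, which is consistent with the `startswith("__ksymtab")` check above.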
|