Summary:
This commit is the first step in rebasing all of BOLT
history in the LLVM monorepo. It also solves trivial build issues
by updating BOLT codebase to use current LLVM. There is still work
left in rebasing some BOLT features and in making sure everything
is working as intended.
History has been rewritten to put BOLT in the /bolt folder, as
opposed to /tools/llvm-bolt.
(cherry picked from FBD33289252)
Summary:
Add '-lite' support for relocations for improved processing time,
memory consumption, and more resilient processing of binaries with
embedded assembly code.
In lite relocation mode, BOLT will skip full processing of functions
without a profile. It will run scanExternalRefs() on such functions
to discover external references and to create internal relocations
to update references to optimized functions.
Note that we could have relied on the compiler/linker to provide
relocations for function references. However, there's no assurance
that all such references are reported. E.g., the compiler can resolve
inter-procedural references internally, leaving no relocations
for the linker.
The scan process takes about <10 seconds per 100MB of code on modern
hardware. It's a reasonable overhead to live with considering the
flexibility it provides.
If BOLT fails to scan or disassemble a function, .e.g., due to a data
object embedded in code, or an unsupported instruction, it enables a
patching mode to guarantee that the failed function will call
optimized/moved versions of functions. The patching happens at original
function entry points.
'-skip=<func1,func2,...>' option now can be used to skip processing of
arbitrary functions in the relocation mode.
With '-use-old-text' or '-strict' we require all functions to be
processed. As such, it is incompatible with '-lite' option,
and '-skip' option will only disable optimizations of listed
functions, not their disassembly and emission.
(cherry picked from FBD22040717)
Summary:
In a rare case, we may fold a function and fail to emit it in
non-relocation mode due to a function size increase. At the same time,
the function that the original function was folded into could have been
successfully emitted, e.g. because it was split in the presence of a
profile information.
Later, because the function was not emitted, we have to use its original
.eh_frame entry in the preserved .eh_frame section. However, that entry
is no longer referencing the original function, but the function that
the original was folded into. This happens since the original symbol gets
emitted at the other function location. As a result, .eh_frame entry for
the folded function is missing.
To prevent incorrect update of the original .eh_frame, create
relocations against absolute values. This guarantees preservation of the
section contents while updating pc-relative references.
(cherry picked from FBD21061130)
Summary:
Avoid directly allocating string and description tables in
binary's static data region, since they are not needed during runtime
except when writing the profile at exit. Change the runtime library to
open the tables on disk and read only when necessary.
(cherry picked from FBD16626030)
Summary:
This refactoring makes it easier to create new code sections and control
code placement. As an example, cold code is being placed into
".text.cold" which is emitted independently from ".text", and the final
address assignment becomes more flexible.
Previously, in non-relocation mode we used to emit temporary section
name into .shstrtab. This resulted in unnecessary bloat of this section.
There was unnecessary padding emitted at the end of text section. After
fixing this, the output binary becomes smaller.
I had to change the way exception handling tables are re-written
as the current infra does not support cross-section label difference.
This means we have to emit absolute landing pad addresses, which might
not work for PIE binaries. I'm going to address this once I investigate
the current exception handling issues in PIEs.
This diff temporarily disables "-hot-functions-at-end" option.
(cherry picked from FBD14475693)
Summary:
This diff replaces the addresses in all the {SYMBOLat,HOLEat,DATAat} symbols with hash values based on the data contained in the symbol. It should make the profiling data for anonymous symbols robust to address changes.
The only small problem with this approach is that the hashed name for padding symbols of the same size collide frequently. This shouldn't be a big deal since it would be weird if those symbols were hot.
On a test run with hhvm there were 26 collisions (out of ~338k symbols). Most of the collisions were from small (2,4,8 byte) objects.
(cherry picked from FBD7134261)
Summary:
Enable BOLT to reorder data sections in a binary based on memory
profiling data.
This diff adds a new pass to BOLT that can reorder data sections for
better locality based on memory profiling data. For now, the algorithm
to order data is primitive and just relies on the frequency of loads to
order the contents of a section. We could probably do a lot better by
looking at what functions use the hot data and grouping together hot
data that is used by a single function (or cluster of functions).
Block ordering might give some hints on how to order the data better as
well.
The new pass has two basic modes: inplace and split (when inplace is
false). The default is split since inplace hasn't really been tested
much. When splitting is on, the cold data is copied to a "cold" version
of the section while the hot data is kept in the original section, e.g.
for .rodata, .rodata will contain the hot data and .bolt.org.rodata will
contain the cold bits. In inplace mode, the section contents are
reordered inplace. In either mode, all relocations to data within that
section are updated to reflect new data locations.
Things to improve:
- The current algorithm is really dumb and doesn't seem to lead to any
wins. It certainly could use some improvement.
- Private symbols can have data that leaks over to an adjacent symbol,
e.g. a string that has a common suffix can start in one symbol and
leak over (with the common suffix) into the next. For now, we punt on
adjacent private symbols.
- Handle ambiguous relocations better. Section relocations that point
to the boundary of two symbols will prevent the adjacent symbols from
being moved because we can't tell which symbol the relocation is for.
- Handle jump tables. Right now jump table support must be basic if
data reordering is enabled.
- Being able to handle TLS. A good amount of data access in some
binaries are happening in TLS. It would be worthwhile to be able to
reorder any TLS sections too.
- Handle sections with writeable data. This hasn't been tested so
probably won't work. We could try to prevent false sharing in
writeable sections as well.
- A pie in the sky goal would be to use DWARF info to reorder types.
(cherry picked from FBD6792876)
Summary:
BOLT sources are being moved under tools/llvm-bolt/src
and tools/llvm-bolt will contain more files such as LICENSE.txt,
README.txt, etc.
Remove trailing white spaces from our sources.
Create llvm.patch by running
> git diff f137ed238db11440f03083b1c88b7ffc0f4af65e include lib > \
tools/llvm-bolt/llvm.patch
README.txt has instructions on checking out sources and applying the
patch.
(cherry picked from FBD7878380)