Files
llvm/bolt/src/BinarySection.cpp

125 lines
3.8 KiB
C++
Raw Normal View History

//===--- BinarySection.cpp - Interface for object file section -----------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
//===----------------------------------------------------------------------===//
#include "BinarySection.h"
[BOLT] Static data reordering pass. Summary: Enable BOLT to reorder data sections in a binary based on memory profiling data. This diff adds a new pass to BOLT that can reorder data sections for better locality based on memory profiling data. For now, the algorithm to order data is primitive and just relies on the frequency of loads to order the contents of a section. We could probably do a lot better by looking at what functions use the hot data and grouping together hot data that is used by a single function (or cluster of functions). Block ordering might give some hints on how to order the data better as well. The new pass has two basic modes: inplace and split (when inplace is false). The default is split since inplace hasn't really been tested much. When splitting is on, the cold data is copied to a "cold" version of the section while the hot data is kept in the original section, e.g. for .rodata, .rodata will contain the hot data and .bolt.org.rodata will contain the cold bits. In inplace mode, the section contents are reordered inplace. In either mode, all relocations to data within that section are updated to reflect new data locations. Things to improve: - The current algorithm is really dumb and doesn't seem to lead to any wins. It certainly could use some improvement. - Private symbols can have data that leaks over to an adjacent symbol, e.g. a string that has a common suffix can start in one symbol and leak over (with the common suffix) into the next. For now, we punt on adjacent private symbols. - Handle ambiguous relocations better. Section relocations that point to the boundary of two symbols will prevent the adjacent symbols from being moved because we can't tell which symbol the relocation is for. - Handle jump tables. Right now jump table support must be basic if data reordering is enabled. - Being able to handle TLS. A good amount of data access in some binaries are happening in TLS. It would be worthwhile to be able to reorder any TLS sections too. - Handle sections with writeable data. This hasn't been tested so probably won't work. We could try to prevent false sharing in writeable sections as well. - A pie in the sky goal would be to use DWARF info to reorder types. (cherry picked from FBD6792876)
2018-04-20 20:03:31 -07:00
#include "BinaryContext.h"
#include "llvm/Support/CommandLine.h"
#undef DEBUG_TYPE
[BOLT] Static data reordering pass. Summary: Enable BOLT to reorder data sections in a binary based on memory profiling data. This diff adds a new pass to BOLT that can reorder data sections for better locality based on memory profiling data. For now, the algorithm to order data is primitive and just relies on the frequency of loads to order the contents of a section. We could probably do a lot better by looking at what functions use the hot data and grouping together hot data that is used by a single function (or cluster of functions). Block ordering might give some hints on how to order the data better as well. The new pass has two basic modes: inplace and split (when inplace is false). The default is split since inplace hasn't really been tested much. When splitting is on, the cold data is copied to a "cold" version of the section while the hot data is kept in the original section, e.g. for .rodata, .rodata will contain the hot data and .bolt.org.rodata will contain the cold bits. In inplace mode, the section contents are reordered inplace. In either mode, all relocations to data within that section are updated to reflect new data locations. Things to improve: - The current algorithm is really dumb and doesn't seem to lead to any wins. It certainly could use some improvement. - Private symbols can have data that leaks over to an adjacent symbol, e.g. a string that has a common suffix can start in one symbol and leak over (with the common suffix) into the next. For now, we punt on adjacent private symbols. - Handle ambiguous relocations better. Section relocations that point to the boundary of two symbols will prevent the adjacent symbols from being moved because we can't tell which symbol the relocation is for. - Handle jump tables. Right now jump table support must be basic if data reordering is enabled. - Being able to handle TLS. A good amount of data access in some binaries are happening in TLS. It would be worthwhile to be able to reorder any TLS sections too. - Handle sections with writeable data. This hasn't been tested so probably won't work. We could try to prevent false sharing in writeable sections as well. - A pie in the sky goal would be to use DWARF info to reorder types. (cherry picked from FBD6792876)
2018-04-20 20:03:31 -07:00
#define DEBUG_TYPE "binary-section"
using namespace llvm;
using namespace bolt;
namespace opts {
extern cl::opt<bool> PrintRelocations;
}
BinarySection::~BinarySection() {
[BOLT] Static data reordering pass. Summary: Enable BOLT to reorder data sections in a binary based on memory profiling data. This diff adds a new pass to BOLT that can reorder data sections for better locality based on memory profiling data. For now, the algorithm to order data is primitive and just relies on the frequency of loads to order the contents of a section. We could probably do a lot better by looking at what functions use the hot data and grouping together hot data that is used by a single function (or cluster of functions). Block ordering might give some hints on how to order the data better as well. The new pass has two basic modes: inplace and split (when inplace is false). The default is split since inplace hasn't really been tested much. When splitting is on, the cold data is copied to a "cold" version of the section while the hot data is kept in the original section, e.g. for .rodata, .rodata will contain the hot data and .bolt.org.rodata will contain the cold bits. In inplace mode, the section contents are reordered inplace. In either mode, all relocations to data within that section are updated to reflect new data locations. Things to improve: - The current algorithm is really dumb and doesn't seem to lead to any wins. It certainly could use some improvement. - Private symbols can have data that leaks over to an adjacent symbol, e.g. a string that has a common suffix can start in one symbol and leak over (with the common suffix) into the next. For now, we punt on adjacent private symbols. - Handle ambiguous relocations better. Section relocations that point to the boundary of two symbols will prevent the adjacent symbols from being moved because we can't tell which symbol the relocation is for. - Handle jump tables. Right now jump table support must be basic if data reordering is enabled. - Being able to handle TLS. A good amount of data access in some binaries are happening in TLS. It would be worthwhile to be able to reorder any TLS sections too. - Handle sections with writeable data. This hasn't been tested so probably won't work. We could try to prevent false sharing in writeable sections as well. - A pie in the sky goal would be to use DWARF info to reorder types. (cherry picked from FBD6792876)
2018-04-20 20:03:31 -07:00
if (isReordered()) {
delete[] getData();
return;
}
if (!isAllocatable() &&
(!hasSectionRef() ||
OutputContents.data() != getContents(Section).data())) {
delete[] getOutputData();
}
}
void BinarySection::print(raw_ostream &OS) const {
OS << getName() << ", "
<< "0x" << Twine::utohexstr(getAddress()) << ", "
<< getSize()
<< " (0x" << Twine::utohexstr(getFileAddress()) << ", "
<< getOutputSize() << ")"
<< ", data = " << getData()
<< ", output data = " << getOutputData();
if (isAllocatable())
OS << " (allocatable)";
[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)
2017-11-14 20:05:11 -08:00
if (isVirtual())
OS << " (virtual)";
if (isTLS())
OS << " (tls)";
if (opts::PrintRelocations) {
for (auto &R : relocations())
OS << "\n " << R;
}
}
[BOLT] Static data reordering pass. Summary: Enable BOLT to reorder data sections in a binary based on memory profiling data. This diff adds a new pass to BOLT that can reorder data sections for better locality based on memory profiling data. For now, the algorithm to order data is primitive and just relies on the frequency of loads to order the contents of a section. We could probably do a lot better by looking at what functions use the hot data and grouping together hot data that is used by a single function (or cluster of functions). Block ordering might give some hints on how to order the data better as well. The new pass has two basic modes: inplace and split (when inplace is false). The default is split since inplace hasn't really been tested much. When splitting is on, the cold data is copied to a "cold" version of the section while the hot data is kept in the original section, e.g. for .rodata, .rodata will contain the hot data and .bolt.org.rodata will contain the cold bits. In inplace mode, the section contents are reordered inplace. In either mode, all relocations to data within that section are updated to reflect new data locations. Things to improve: - The current algorithm is really dumb and doesn't seem to lead to any wins. It certainly could use some improvement. - Private symbols can have data that leaks over to an adjacent symbol, e.g. a string that has a common suffix can start in one symbol and leak over (with the common suffix) into the next. For now, we punt on adjacent private symbols. - Handle ambiguous relocations better. Section relocations that point to the boundary of two symbols will prevent the adjacent symbols from being moved because we can't tell which symbol the relocation is for. - Handle jump tables. Right now jump table support must be basic if data reordering is enabled. - Being able to handle TLS. A good amount of data access in some binaries are happening in TLS. It would be worthwhile to be able to reorder any TLS sections too. - Handle sections with writeable data. This hasn't been tested so probably won't work. We could try to prevent false sharing in writeable sections as well. - A pie in the sky goal would be to use DWARF info to reorder types. (cherry picked from FBD6792876)
2018-04-20 20:03:31 -07:00
std::set<Relocation> BinarySection::reorderRelocations(bool Inplace) const {
assert(PendingRelocations.empty() &&
"reodering pending relocations not supported");
std::set<Relocation> NewRelocations;
for (const auto &Rel : relocations()) {
auto RelAddr = Rel.Offset + getAddress();
auto *BD = BC.getBinaryDataContainingAddress(RelAddr);
BD = BD->getAtomicRoot();
assert(BD);
if ((!BD->isMoved() && !Inplace) || BD->isJumpTable())
continue;
auto NewRel(Rel);
auto RelOffset = RelAddr - BD->getAddress();
NewRel.Offset = BD->getOutputOffset() + RelOffset;
assert(NewRel.Offset < getSize());
DEBUG(dbgs() << "BOLT-DEBUG: moving " << Rel << " -> " << NewRel << "\n");
auto Res = NewRelocations.emplace(std::move(NewRel));
(void)Res;
assert(Res.second && "Can't overwrite existing relocation");
}
return NewRelocations;
}
void BinarySection::reorderContents(const std::vector<BinaryData *> &Order,
bool Inplace) {
IsReordered = true;
Relocations = reorderRelocations(Inplace);
std::string Str;
raw_string_ostream OS(Str);
auto *Src = Contents.data();
DEBUG(dbgs() << "BOLT-DEBUG: reorderContents for " << Name << "\n");
for (auto *BD : Order) {
assert((BD->isMoved() || !Inplace) && !BD->isJumpTable());
assert(BD->isAtomic() && BD->isMoveable());
const auto SrcOffset = BD->getAddress() - getAddress();
assert(SrcOffset < Contents.size());
assert(SrcOffset == BD->getOffset());
while (OS.tell() < BD->getOutputOffset()) {
OS.write((unsigned char)0);
}
DEBUG(dbgs() << "BOLT-DEBUG: " << BD->getName()
<< " @ " << OS.tell() << "\n");
OS.write(&Src[SrcOffset], BD->getOutputSize());
}
if (Relocations.empty()) {
// If there are no existing relocations, tack a phony one at the end
// of the reordered segment to force LLVM to recognize and map this
// section.
auto *ZeroSym = BC.registerNameAtAddress("Zero", 0, 0, 0);
addRelocation(OS.tell(), ZeroSym, ELF::R_X86_64_64, 0xdeadbeef);
uint64_t Zero = 0;
OS.write(reinterpret_cast<const char *>(&Zero), sizeof(Zero));
}
auto *NewData = reinterpret_cast<char *>(copyByteArray(OS.str()));
Contents = OutputContents = StringRef(NewData, OS.str().size());
OutputSize = Contents.size();
}