mirror of
https://github.com/intel/llvm.git
synced 2026-01-14 20:10:50 +08:00
Summary: A new block reordering algorithm, cache+, designed to optimize i-cache performance.

At a high level, the algorithm is a greedy heuristic that merges clusters (ordered sequences) of basic blocks, similar to OptimizeCacheReorderAlgorithm. There are two important differences: (a) the metric that is optimized, and (b) how two clusters are merged.

Initially, every cluster is a single basic block. On every iteration, we pick the pair of clusters whose merging yields the largest increase in the ExtTSP metric (see CacheMetrics.cpp for the exact implementation), which models how i-cache "friendly" a specific cluster is. The pair giving the maximum gain is merged into a new cluster. The procedure stops when only one cluster is left or when no merge increases ExtTSP. In the latter case, the remaining clusters are sorted by density.

An important aspect is the way two clusters are merged. Unlike earlier algorithms (e.g., OptimizeCacheReorderAlgorithm or Pettis-Hansen), two clusters, X and Y, are first split into three: X1, X2, and Y. We then consider all possible ways of gluing the three clusters together (X1YX2, X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one with the largest score. This improves the quality of the final result (the search space is larger) while keeping the implementation sufficiently fast.

(cherry picked from FBD6466264)
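The merge step described above can be sketched as follows. This is a minimal illustration, not the code from this commit: `Cluster`, `ScoreFn`, `glue`, and `bestMerge` are hypothetical names, the score is passed in as an arbitrary callback rather than the real ExtTSP computation, and trying every split point of X is an assumption (the summary does not say how split points are chosen).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical types for illustration; not BOLT's actual data structures.
using Cluster = std::vector<int>;                       // ordered basic-block ids
using ScoreFn = std::function<double(const Cluster &)>; // higher is better

// Concatenate three pieces in the given order.
static Cluster glue(const Cluster &A, const Cluster &B, const Cluster &C) {
  Cluster R;
  R.reserve(A.size() + B.size() + C.size());
  R.insert(R.end(), A.begin(), A.end());
  R.insert(R.end(), B.begin(), B.end());
  R.insert(R.end(), C.begin(), C.end());
  return R;
}

// Split X into X1 and X2 at every position (an assumption), try all six
// orderings of {X1, X2, Y}, and return the layout with the largest score.
Cluster bestMerge(const Cluster &X, const Cluster &Y, const ScoreFn &Score) {
  Cluster Best = glue(X, Y, {}); // plain concatenation as the baseline
  double BestScore = Score(Best);
  for (std::size_t Split = 0; Split <= X.size(); ++Split) {
    const Cluster X1(X.begin(), X.begin() + Split);
    const Cluster X2(X.begin() + Split, X.end());
    const Cluster Candidates[] = {
        glue(X1, Y, X2), glue(X1, X2, Y), glue(X2, X1, Y),
        glue(X2, Y, X1), glue(Y, X1, X2), glue(Y, X2, X1)};
    for (const Cluster &C : Candidates) {
      const double S = Score(C);
      if (S > BestScore) {
        BestScore = S;
        Best = C;
      }
    }
  }
  return Best;
}
```

With a toy score that counts adjacent pairs of consecutive ids, merging X = {1, 3} with Y = {2} selects the X1YX2 layout, which plain concatenation (XY or YX) cannot produce. That is the sense in which the larger search space can improve the result.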
39 lines
1.3 KiB
C++
//===- CacheMetrics.h - Interface for instruction cache evaluation --------===//
//
// Functions to show metrics of cache lines
//
//
//===----------------------------------------------------------------------===//
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_TOOLS_LLVM_BOLT_CACHEMETRICS_H
#define LLVM_TOOLS_LLVM_BOLT_CACHEMETRICS_H

#include "BinaryFunction.h"
#include <cstdint>
#include <vector>

namespace llvm {
namespace bolt {
namespace CacheMetrics {

/// Calculate various metrics related to instruction cache performance.
void printAll(const std::vector<BinaryFunction *> &BinaryFunctions);

/// Calculate Extended-TSP metric, which quantifies the expected number of
/// i-cache misses for a given pair of basic blocks. The parameters are:
/// - SrcAddr is the address of the source block;
/// - SrcSize is the size of the source block;
/// - DstAddr is the address of the destination block;
/// - Count is the number of jumps between the pair of blocks.
double extTSPScore(uint64_t SrcAddr,
                   uint64_t SrcSize,
                   uint64_t DstAddr,
                   uint64_t Count);

} // namespace CacheMetrics
} // namespace bolt
} // namespace llvm

#endif // LLVM_TOOLS_LLVM_BOLT_CACHEMETRICS_H
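The header only declares `extTSPScore`; its definition lives in CacheMetrics.cpp, which is not shown here. As a rough sketch of how a score with this signature might behave, the example below rewards fall-throughs fully and gives short forward or backward jumps a reduced, distance-dependent share of `Count`. The function name `extTSPScoreSketch`, the weights, and the distance thresholds are all assumptions chosen for illustration, and the sketch follows the summary's reading that a larger score is better.

```cpp
#include <cassert>
#include <cstdint>

// Assumed tuning constants, for illustration only.
constexpr double ForwardWeight = 0.1;      // weight of a short forward jump
constexpr double BackwardWeight = 0.1;     // weight of a short backward jump
constexpr uint64_t ForwardDistance = 1024; // max rewarded forward distance
constexpr uint64_t BackwardDistance = 640; // max rewarded backward distance

// Hypothetical stand-in for CacheMetrics::extTSPScore.
double extTSPScoreSketch(uint64_t SrcAddr, uint64_t SrcSize,
                         uint64_t DstAddr, uint64_t Count) {
  const uint64_t SrcEnd = SrcAddr + SrcSize;

  // Fall-through: the destination starts right after the source block.
  if (SrcEnd == DstAddr)
    return static_cast<double>(Count);

  // Forward jump: the reward decays linearly with distance.
  if (SrcEnd < DstAddr) {
    const uint64_t Dist = DstAddr - SrcEnd;
    if (Dist > ForwardDistance)
      return 0.0;
    const double Prob = 1.0 - static_cast<double>(Dist) / ForwardDistance;
    return ForwardWeight * Prob * static_cast<double>(Count);
  }

  // Backward jump: same idea with its own weight and threshold.
  const uint64_t Dist = SrcEnd - DstAddr;
  if (Dist > BackwardDistance)
    return 0.0;
  const double Prob = 1.0 - static_cast<double>(Dist) / BackwardDistance;
  return BackwardWeight * Prob * static_cast<double>(Count);
}
```

Under these assumed constants, a fall-through edge taken 100 times contributes its full count, while the same edge laid out as a jump more than 1024 bytes forward contributes nothing. This is why a layout that turns hot jumps into fall-throughs scores higher.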