Summary:
We extended DynoStats to dump the histogram per instruction opcode. By
default the dump is turned off. Use '-print-dyno-opcode-stats' to enable
the dump.
BOLT also dumps for each instruction opcode the maximum execution count and
corresponding function name and basic block offsets where the instruction
occurs. Below is a sample of the dump:
Opcode, Execution Count, Max Exec Count, Function Name:Offset
SHR8rCL, 232, 232, _ZNK5folly14AsyncSSLSocket4goodEv:53
VPADDDYrr, 13956, 388, chacha20_encrypt_bytes.part.0/3:736
PMOVSXBWrr, 4, 2, ares_expand_name/1:264
VMOVAPSmr, 1082, 43, chacha20_encrypt_bytes.part.0/3:2864
VPSHUFBrr, 9540, 1667, chacha20_encrypt_bytes.part.0/3:4416
VPUNPCKLDQYrr, 1102, 188, jsimd_ycc_rgb_convert_avx2/1:125
VPBROADCASTQYrm, 39, 39, chacha20_encrypt_bytes.part.0/3:400
PMOVSXWDrr, 8, 2, ares_expand_name/1:264
VPORrr, 817, 129, jsimd_idct_islow_avx2/1:41
PSLLDri, 8690752, 65644, blockmix_salsa8_xor/1:1424
(cherry picked from FBD28859624)
Summary:
Ran iwyu multiple times, manually picked header remove lines.
Reached fixed point wrt removal: iwyu doesn't automatically remove
any more headers or forward declarations.
(cherry picked from FBD29569221)
Summary:
This commit is the first step in rebasing all of BOLT
history in the LLVM monorepo. It also solves trivial build issues
by updating BOLT codebase to use current LLVM. There is still work
left in rebasing some BOLT features and in making sure everything
is working as intended.
History has been rewritten to put BOLT in the /bolt folder, as
opposed to /tools/llvm-bolt.
(cherry picked from FBD33289252)
Summary:
Introduce new BinaryFunction flag `IsCanonicalCFG`, which gets
unset by SCTC pass. Make DynoStats collection conditional on this
new flag.
SCTC leaves CFG in a state where branch counters of BBs with tail
calls/conditional tail calls are not available (except via annotations,
which get stripped by `lower-annotations`). Without branch
counters, DynoStats are invalid.
(cherry picked from FBD24558050)
Summary:
The commit that fixed ICF determinism in non-relocation mode disabled
profile merging for functions. Dyno stats output needs to be updated to
reflect the lack of merge.
(cherry picked from FBD21366046)
Summary:
After stripping annotations, conditional tail calls no longer can be
identified by their corresponding tag. We can check the number of basic
block successors instead.
Fixesfacebookincubator/BOLT#58.
(cherry picked from FBD16444718)
Summary:
In strict relocation mode we rely on relocations to represent all
possible entry points into a function. Most of the code generated by
tested compilers (gcc and clang) will result in relocations against
any internal labels for jump tables and for computed goto tables.
In situations where we cannot properly reconstruct a jump table, or when
we cannot determine a table that guides an indirect jump, e.g. when
multiple computed goto tables are used, we conservatively assume that
the indirect jump can end up at any possible basic block referenced by
relocations.
In strict mode, simple functions may include the aforementioned
instructions with unknown control flow with a conservative list of
destinations added to the containing basic block. This allows us to
expand coverage of simple functions and to enable code reordering
optimizations for more functions.
The strict mode is recommended when BOLT is used with a well-formed
code generated by a compiler.
To use the strict mode, add "-strict" on the command line.
Another effect of this diff, is that with relocations, we will always
replace the immediate operand of an instruction with a symbol if the
relocation exists against this operand.
Also this diff fixes issues with Clang compiled with -fpic.
(cherry picked from FBD15872849)