This patch adds the `REQUIRES: shell` directive to the BOLT permission
test to ensure it only runs in environments with a full-featured
Unix-like shell. This change is necessary because the test relies on
advanced shell capabilities that are not supported by lit's internal
shell.
**Reasoning:** The BOLT permission test uses features like running
commands in the background with `&`, performing arithmetic operations,
and handling special number formats (octal). These features require a
more capable shell than what lit's internal shell provides. Without a
proper shell, the test could fail or behave unpredictably.
This change is relevant for enabling the lit internal shell by default,
as outlined in [[RFC] Enabling the Lit Internal Shell by
Default](https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179)
This patch aborts BOLT execution if it finds out-of-section (section
end) symbol in GOT table. In order to handle such situations properly in
future, we would need to have an arch-dependent way to analyze
relocations or its sequences, e.g., for ARM it would probably be ADRP +
LDR analysis in order to get GOT entry address. Currently, it is also
challenging because GOT-related relocation symbols are replaced to
__BOLT_got_zero. Anyway, it seems to be quite a rare case, which seems
to be only? related to static binaries. For the most part, it seems that
it should be handled on the linker stage, since static binary should not
have GOT table at all. LLD linker with relaxations enabled would replace
instruction addresses from GOT directly to target symbols, which
eliminates the problem.
Anyway, in order to achieve detection of such cases, this patch fixes a
few things in BOLT:
1. For the end symbols, we're now using the section provided by ELF
binary. Previously it would be tied with a wrong section found by symbol
address.
2. The end symbols would have limited registration we would only
add them in name->data GlobalSymbols map, since using address->data
BinaryDataMap map would likely be impossible due to address duality of
such symbols.
3. The outdated BD->getSection (currently returning refence, not
pointer) check in postProcessSymbolTable is replaced by getSize check in
order to allow zero-sized top-level symbols if they are located in
zero-sized sections. For the most part, such things could only be found
in tests, but I don't see a reason not to handle such cases.
4. Updated section-end-sym test and removed x86_64 requirement since
there is no reason for this (tested on aarch64 linux)
The test was provided by peterwaller-arm (thank you) in #100096 and
slightly modified by me.
After porting BOLT to RISCV some of the relocations were broken on both
AArch64 and X86.
On AArch64 the example of broken relocations would be GOT, during
handling them, we should replace the symbol to __BOLT_got_zero in order
to address GOT entry, not the symbol that addresses this entry. This is
done further in code, so it is too early to add rel here.
On X86 it is a mistake to add relocations without addend. This is the
exact problem that is raised on #97937. Due to different code generation
I had to use gcc-generated yaml test, since with clang I wasn't able to
reproduce problem.
Added tests for both architectures and made the problematic condition
riscV-specific.
One of the problems related to #93151 is probably that aarch64 target
might have different names in different env, so extend aarch64 cmake cpu
match with different name aliases.
Take a common weak reference pattern for example
```
__attribute__((weak)) void undef_weak_fun();
if (&undef_weak_fun)
undef_weak_fun();
```
In this case, an undefined weak symbol `undef_weak_fun` has an address
of zero, and Bolt incorrectly changes the relocation for the
corresponding symbol to symbol@PLT, leading to incorrect runtime
behavior.
When BOLT is run in AggregateOnly mode (perf2bolt), it exits with code
zero so destructors are not run thus TimerGroup never prints the timers.
Add explicit printing just before the exit to honor options requesting
timers (`--time-rewrite`, `--time-aggr`).
Test Plan: updated bolt/test/timers.c
Reviewers: ayermolo, maksfb, rafaelauler, dcci
Reviewed By: dcci
Pull Request: https://github.com/llvm/llvm-project/pull/101270
Running into an error with removing DWP where the assertion
`RelaxAllView &&
"RegisterMCTargetOptionsFlags not created."'` failed. This is a result
of DWP bringing the mc::RegisterMCTargetOptionsFlags option in, and the
option being removed with DWP. The need for this option didn't
originally exist because we didn't use MC in DWARFRewriter, but we
switched to using DWARFStreamer which needed the option.
https://reviews.llvm.org/D75579https://reviews.llvm.org/D106417
Updates the Dockerfile to use Ubuntu 24.04 due to CMake wanting a newer
version. Can be tested by trying to build the Docker image currently in
main and then try building the Docker image in this PR.
Continue from #87196 as author did not have much time, I have taken over
working on this PR. We would like to have this so it'll be easier to
package for Nix.
Can be tested by copying cmake, bolt, third-party, and llvm directories
out into their own directory with this PR applied and then build bolt.
---------
Co-authored-by: pca006132 <john.lck40@gmail.com>
Multi-way splitting can cause multiple fragments to access the same jump
table. Relax the assumption that a jump table can only have up to two
parents.
Test Plan: added bolt/test/X86/three-way-split-jt.s
Reviewers: ayermolo, dcci, rafaelauler, maksfb
Reviewed By: rafaelauler, dcci
Pull Request: https://github.com/llvm/llvm-project/pull/99988
Three-way splitting can create references between split fragments (warm
to cold or vice versa) that are not handled by
`isChildOf/isParentOf/isChildOrParentOf`. Generalize fragment
relationships to allow checking if two functions belong to one group,
potentially in presence of ICF which can join multiple groups.
Test Plan: NFC for existing tests
Reviewers: maksfb, ayermolo, rafaelauler, dcci
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/99979
Followup to the splitting of processUnitDIE, moves code that accesses
common resource to be outside of the function that will be parallelized.
Followup to #99957
In bolt/lib/Passes/AsmDump.cpp, the MCInstPrinter is created with false
AsmVerbose. The AsmVerbose argument to createAsmStreamer is unused.
Deprecate the legacy Target::createAsmStreamer overload, which might be
used by downstream.
Implemented call graph function matching. First, two call graphs are
constructed for both profiled and binary functions. Then functions are
hashed based on the names of their callee/caller functions. Finally,
functions are matched based on these neighbor hashes and the
longest common prefix of their names. The `match-with-call-graph`
flag turns this matching on.
Test Plan: Added match-with-call-graph.test. Matched 164 functions
in a large binary with 10171 profiled functions.
Read pseudo probes in regular and BAT YAML profile generation, and
attach them to YAML profile basic blocks. This exposes GUID, probe id,
and probe type in profile for future use in stale profile matching.
Test Plan: updated pseudoprobe-decoding-inline.test
Reviewers: dcci, rafaelauler, ayermolo, maksfb
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/99554
Add a BinaryFunction field for pseudo probe function GUID.
Populate it during pseudo probe section parsing, and emit it in YAML
profile (both regular and BAT), along with function checksum.
To be used for stale function matching.
Test Plan: update pseudoprobe-decoding-inline.test
Detect and support fixed PIC indirect jumps of the following form:
```
movslq En(%rip), %r1
leaq PIC_JUMP_TABLE(%rip), %r2
addq %r2, %r1
jmpq *%r1
```
with PIC_JUMP_TABLE that looks like following:
```
JT: ----------
E1:| L1 - JT |
|----------|
E2:| L2 - JT |
|----------|
| |
......
En:| Ln - JT |
----------
```
The code could be produced by compilers, see
https://github.com/llvm/llvm-project/issues/91648.
Test Plan: updated jump-table-fixed-ref-pic.test
Reviewers: maksfb, ayermolo, dcci, rafaelauler
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/91667
On clang 14 the build is failing with:
reference to local binding 'ParentName' declared in enclosing function
'llvm::bolt::RewriteInstance::registerFragments'
With aggressive ICF, it's possible to have different local symbols
(under different FILE symbols) to be mapped to the same address.
FileSymRefs only keeps a single SymbolRef per address, which prevents
fragment matching from finding the correct symbol to perform parent
function lookup.
Work around this issue by switching FileSymRefs to a multimap. In
future, uses of FileSymRefs can be replaced with SortedSymbols which
keeps essentially the same information.
Test Plan: added ambiguous_fragment.test
Reviewers: dcci, ayermolo, maksfb, rafaelauler
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/98992