cuEntries was sorted indirectly through a separate `cuIndices`.
Eliminate cuIndices for simplicity.
Linking chromium_framework from `#48001` with `-no_uuid` gives identical
executable using this patch.
Further to
https://github.com/llvm/llvm-project/pull/147134#discussion_r2337246489,
switch to use the madvise() api to page in mmap'd files and
1) All new code compiled in #if LLVM_ENABLE_THREADS is set so it can be
seen where the changes were from this PR.
2) The new PR moves to use madvise() instead of the ad-hoc page
referencing code I wrote which should avoid SIGSEGVs if the buffer is
deallocated.
3) A new property SerialBackgroundQueue().stopAllWork to be used to stop
background workers when there is no further call for them. Usually the
background "page-in" threads have completed first but it seems with this
troublesome test this is not always the case and buffers stored in the
static input file cache are being deallocated while being referenced.
---------
Co-authored-by: James Henderson <James.Henderson@sony.com>
I noticed that we had a hardcoded value of 4 for the pcrel section
relocations, which seems like an issue given that we recently added
support for 1-byte branch relocations in
https://github.com/llvm/llvm-project/pull/164439. The code included an
assert that the relevant relocation had the BYTE4 attribute, but that is
actually not enough to use a hardcoded value of 4: we need to assert
that the *other* `BYTE<n>` attributes are not set either.
However, since we did not support local branch relocations, that doesn't
seem to have mattered in practice. That said, local branch relocations
can be emitted by compilers, and ld64 does handle the 4-byte version of
them, so I've added support for it here.
ld64 actually seems to reject 1-byte section relocations, so the
questionable code is actually probably fine (minus the incorrect
assert). So we have two options: add an equivalent check in LLD, or just
support 1-byte local branch relocations. Supporting it actually requires
less code, so I've gone with that option here.
Currently, if multiple external weak symbols are defined at the same
address in an object file (e.g., by using the .set assembler directive
to alias them to a single weak variable), ld64.lld treats them as a
single unit. When any one of these symbols is overridden by a strong
definition, all of the original weak symbols resolve to the strong
definition.
This patch changes the behavior in `transplantSymbolsAtOffset`. When a
weak symbol is being replaced by a strong one, only non-external (local)
symbols at the same offset are moved to the new symbol's section. Other
*external* symbols are no longer transplanted.
This allows each external weak symbol to be overridden independently.
This behavior is consistent with Apple's ld-classic, but diverges from
ld-prime in one case, as noted on
https://github.com/llvm/llvm-project/issues/167262 (this discrepancy has
recently been reported to Apple).
### Backward Compatibility
This change alters linker behavior for a specific scenario. The creation
of multiple external weak symbols aliased to the same address via
assembler directives is primarily an advanced technique. It's unlikely
that existing builds rely on the current behavior of all aliases being
overridden together.
If there are concerns, this could be put behind a linker option, but the
new default seems more correct, less surprising, and is consistent with
ld-classic.
### Testing
The new lit test `test/MachO/weak-alias-override.s` verifies this
behavior using llvm-nm.
Fixes#167262
https://github.com/llvm/llvm-project/pull/140307 added support for
cstring hashes in the orderfile to layout cstrings in a specific order,
but only when `--deduplicate-strings` is used. This PR supports cstring
ordering when `--no-deduplicate-strings` is used.
1. Create `cStringPriorities`, separate from `priorities`, to hold only
priorities for cstring pieces. This allows us to lookup by hash
directly, instead of first converting to a string. It also fixes a
contrived bug where we want to order a symbol named `CSTR;12345` rather
than a cstring.
2. Rather than calling `buildCStringPriorities()` which always
constructs and returns a vector, we use `forEachStringPiece()` to
efficiently iterate over cstring pieces without creating a new vector if
no cstring is ordered.
3. Create `SymbolPriorityEntry::{get,set}Priority()` helper functions to
simplify code.
Ran into a use case where we had a MachO object file with a section
symbol which did not have a section associated with it segfaults during
linking. This patch aims to handle such cases gracefully and avoid the
linker from crashing.
---------
Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>
Since equalsVariable runs a lot more times, we want to minimize the work
it
needs to do. Anything not dependent on the icfEqClass values should get
hoisted
out.
With this change, ICF runs ~1.7% faster when linking clang.
Benchmarking approach:
cbdr sample -b ~/extract-icf-time.sh ~/old/ld64.lld bin/ld64.lld
--timeout=300s | cbdr analyze -s 95
`extract-icf-time.sh` runs the clang link command with the `--icf=all
--time-trace` flags, then parses out the ICF duration from the resulting
time
trace using `jq`:
jq '{ICF: (.traceEvents[] | select(.name == "Fold Identical Code
Sections") | .dur)}'
Output:
</Users/jezng/extract-icf-time.sh ["/Users/jezng/old/ld64.lld"]>
</Users/jezng/extract-icf-time.sh ["bin/ld64.lld"]> difference (95% CI)
ICF 83678.207 ± 1502.778 82234.751 ± 1290.984 [ -2.0% .. -1.4%]
samples 208 225
Update all uses of variadic `.Cases` to use the initializer list
overload instead. I plan to mark variadic `.Cases` as deprecated in a
followup PR.
For more context, see https://github.com/llvm/llvm-project/pull/163117.
We typically shouldn't get this, but when we do (e.g. in #139439) we
should error out gracefully instead of crashing.
Note that we are stricter than ld64 here; ld64 appears to be able to
handle section offsets that point outside literal sections if the end
result is a valid pointer to another section in the input object file.
Supporting this would probably be a pain given our current design, and
it seems like enough of an edge case that it's onot worth it.
`Reloc::length` actually encodes the log2 of the length. Thanks @int3
for pointing this out in
https://github.com/llvm/llvm-project/issues/160894#issuecomment-3416250179.
This code computes hashes of relocations. With the correct length, the
hashes should more accurately represent the relocation. In my testing of
some large binaries, the compressed size change is negligable.
Also expand the #ifdef to remove unused code in this configuration. As
suggested in
https://github.com/llvm/llvm-project/pull/147134#issuecomment-3328612158.
I have also:
* Expanded the error message to explain why it's not allowed.
* Added a test for the error.
* Marked the original test as unsupported when threads are disabled.
Fixes issues we have had on Armv8 with threading disabled where this
test would crash every so often.
This change will hopefully be superseded by #157917, but that has been
in review a long time and I want to make the bot stable again.
I could just disable the test, but I'd like lld to function properly in
general in the meantime too.
Co-authored-by: John Holdsworth <github@johnholdsworth.com>
Add the flag `--tail-merge-strings` to enable tail merging of cstrings.
For example, if we have strings `mystring\0` and `ring\0`, we could
place `mystring\0` at address `0x1000` and `ring\0` at address `0x1004`
and have them share the same underlying data.
It turns out that many ObjC method names can be tail merged. For
example, `error:` and `doFoo:error:`. On a large iOS binary, we saw
nearly a 15% size improvement in the `__TEXT__objc_methname` section and
negligible impact on link time.
```
$ bloaty --domain=vm merged.o.stripped -- base.o.stripped
VM SIZE
--------------
+95% +5.85Ki [__TEXT]
-2.4% -239Ki __TEXT,__cstring
-14.5% -710Ki __TEXT,__objc_methname
-1.0% -944Ki TOTAL
```
Tail merging for MachO was originally removed in
7c269db779.
The previous implementation used `StringTableBuilder`, but that was
removed in
4308f031cd
to ensure deduplicated strings are aligned correctly. This
implementation ensures that tail merged strings are also aligned
correctly.
Special thanks to nocchijiang for pointing this out in
https://github.com/llvm/llvm-project/pull/158720#issuecomment-3310416030.
Depends on https://github.com/llvm/llvm-project/pull/161253.
Use `llvm::Align` instead of directly storing the shift amount for
clarity. Also remove the `DeduplicatedCStringSection::StringOffset` in
favor of simply storing the `uint64_t` offset since `trailingZeros` is
not used outside of `finalizeContents()`. These two changes allow us to
refactor `finalizeContents()`.
No function change intended.
Depends on https://github.com/llvm/llvm-project/pull/161241.
Move `llvm::countr_zero()` into a helper function to reduce code
duplication between `CStringSection` and `DeduplicatedCStringSection`.
More importantly, this moves a giant comment to that helper function
since it pertains to both classes.
The test name had a typo from
https://github.com/llvm/llvm-project/pull/140307. Fix it.
I realized cstring ordering is not supported when string deduplication
is turned off. We could easily call `buildCStringPriorities()` in
`CStringSection::finalizeContents()`, but I worry it might harm build
performance since it creates multiple vectors and searches though maps.
If users are not deduplicating strings, they probably won't care to
order them, but it would be good to support this.
Add the `--{no-}separate-cstring-literal-sections` option to emit
cstring literals into sections defined by their section name. This
allows for changes like https://github.com/swiftlang/swift/pull/84300
and https://github.com/swiftlang/swift/pull/84236 to actually have an
affect. The default behavior has not changed.
The reason this is useful is because strings in different sections might
have different access patterns at runtime. By splitting these strings
into separate sections, we may reduce the number of page faults during
startup. For example, the ObjC runtime accesses all strings in
`__objc_classname` before main.
Silence the following warning with latest MSVC:
```
C:\git\llvm-project\lld\MachO\Driver.cpp(358): warning C4189: 't': local variable is initialized but not referenced
```
The parsing of the --read-workers argument v is implemented like this:
unsigned threads = 0
if (!llvm::to_integer(v, threads, 0) || threads < 0) {
...
As reported by a compiler warning, the value of the "threads < 0"
expession is never going to be true. It could only evaluate to true if v
represents a negative number, but in this case llvm::to_integer returns
false since threads is unsigned and hence the second operand of the ||
operator will not be evaluated.
This patch removes the useless || operand to silence compiler warnings.
Since I had to first find out if --read-workers=0 is valid or not (it is),
I also added a test to document the valid values for the option and I adjusted
the error message on invalid values to clearly state that 0 is valid.
If an export trie is encoded incorrectly, and one of the children
offsets points back to one of the nodes earlier in the serialization,
the current code will end up in an infinite recursion, and eventually
fail exhausting the available memory.
The failure can be avoided if, before recursing, one checks that the
offset is valid, that is, that the offset is beyond the current
position. This is similar to a check done by llvm-objdump which reports
the trie being corrupted.
This PR adds a new option to lld `--read-workers=20` that defers all
disk I/o then performs it multithreaded so the process is never stalled
waiting for the I/o of the page-in of mapped input files. This results
in a saving of elapsed time. For a large link (iterating on Chromium)
these are the baseline linkage times saving a single file and rebuilding
(seconds inside Xcode):
26.01, 25.84, 26.15, 26.03, 27.10, 25.90, 25.86, 25.81, 25.80, 25.87
With the proposed code change, and using the `--read-workers=20` option,
the linking times reduce to the following:
21.13, 20.35, 20.01, 20.01, 20.30, 20.39, 19.97, 20.23, 20.17, 20.23
The secret sauce is in the new function `multiThreadedPageIn()` in
Driver.cpp. Without the option lld behaves as before.
Edit: with subsequent commits I've taken this novel i/o approach to its
full potential. Latest linking times are now:
13.2, 11.9, 12.12, 12.01, 11.99, 13.11, 11.93, 11.95, 12.18, 11.97
Chrome is still linking and running so it doesn't look like anything is
broken. Despite being multi-threaded all memory access is readonly and
the original code paths are not changed. All that is happening is the
system is being asked to proactively page in files rather than waiting
for processing to page fault which would otherwise stall the process.
---------
Co-authored-by: Daniel Rodríguez Troitiño <drodrigueztroitino@gmail.com>
Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>
The processing of `-oso_prefix` uses `llvm::sys::fs::real_path` from the
user value, but it is later tried to be matched with the result of
`make_absolute`. While `real_path` resolves special symbols like `.`,
`..` and `~`, and resolves symlinks along the path, `make_absolute` does
neither, causing an incompatibility in some situations.
In macOS, temporary directories would normally be reported as
`/var/folders/<random>`, but `/var` is in fact a symlink to
`private/var`. If own is working on a temporary directory and uses
`-oso_prefix .`, it will be expanded to `/private/var/folder/<random>`,
while `make_absolute` will expand to `/var/folder/<random>` instead, and
`-oso_prefix` will fail to remove the prefix from the `N_OSO` entries,
leaving absolute paths to the temporary directory in the resulting file.
This would happen in any situation in which the working directory
includes a symlink, not only in temporary directories.
One can change the usage of `make_absolute` to use `real_path` as well,
but `real_path` will mean checking the file system for each `N_OSO`
entry. The other solution is stop using `real_path` when processing
`-oso_prefix` and manually expand an input of `.` like `make_absolute`
will do.
This second option is the one implemented here, since it is the closest
to the visible behaviour of ld64 (like the removed comment notes), so it
is the better one for compatibility. This means that a test that checked
the usage of the tilde as `-oso_prefix` needs to be removed (since it
was done by using `real_path`), and two new tests are provided checking
that symlinks do not affect the result. The second test checks a change
in behaviour, in which if one provides the input files with a prefix of
`./`, even when using `-oso_prefix .` because the matching is textual,
the `./` prefix will stay in the `N_OSO` entries. This matches the
observed behaviour of ld64.
The backend emits `.loh` directives for arm64_32 as well. Our pass
already handles 32-bit pointer loads correctly (there was an extraneous
sanity check for 8-byte pointer sizes, I removed that here), so we can
enable them for all arm64 subtargets, including our upcoming arm64e
support.
Moving it away from the arm64 `TargetInfo` class will let us enable it
more easily for arm64_32 and the soon-to-be-added arm64e target as well.
This is the NFC part of #148964
This is causing use-after-frees due to references getting invalidating
after the loadedDylibs map grows, see comments on the PR.
This reverts commit 475a8a47ea.
Expand the `-order_file` also accept cstrings to order.
The purpose is to order hot cstrings for performance (implemented in
this diff), and then later on we can also order cold cstrings for
compression size win.
Due to the speciality of cstrings, there's no way to pass in symbol
names in the order file as the existing -order_file, so we expect `<hash
of cstring literal content>` to represent/identify each cstring.
```
// An order file has one entry per line, in the following format:
//
// <cpu>:<object file>:[<symbol name> | CStringEntryPrefix <cstring hash>]
//
// <cpu> and <object file> are optional.
// If not specified, then that entry tries to match either,
//
// 1) any symbol of the <symbol name>;
// Parsing this format is not quite straightforward because the symbol name
// itself can contain colons, so when encountering a colon, we consider the
// preceding characters to decide if it can be a valid CPU type or file path.
// If a symbol is matched by multiple entries, then it takes the
// lowest-ordered entry (the one nearest to the front of the list.)
//
// or 2) any cstring literal with the given hash, if the entry has the
// CStringEntryPrefix prefix defined below in the file. <cstring hash> is the
// hash of cstring literal content.
//
// Cstring literals are not symbolized, we can't identify them by name
// However, cstrings are deduplicated, hence unique, so we use the hash of
// the content of cstring literals to identify them and assign priority to it.
// We use the same hash as used in StringPiece, i.e. 31 bit:
// xxh3_64bits(string) & 0x7fffffff
//
```
The ordering of cstring has to happen during/before the finalizing of
the cstring section content in the `finalizeContents()` function, which
happens before the writer is run
---------
Co-authored-by: Sharon Xu <sharonxu@fb.com>
```
/// Symbols can be appended with "(.__uniq.xxxx)?.llvm.yyyy" where "xxxx" and
/// "yyyy" are numbers that could change between builds. We need to use the root
/// symbol name before this suffix so these symbols can be matched with profiles
/// which may have different suffixes.
```
Just like what we are doing in BP,
https://github.com/llvm/llvm-project/blob/main/lld/MachO/BPSectionOrderer.cpp#L127
the patch removes the suffixes when parsing the order file and getting
the symbol priority to have a better symbol match.
---------
Co-authored-by: Sharon Xu <sharonxu@fb.com>
Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>
With an upcoming change to llvm/ProfileData headers, this file ends up
with both llvm::Reloc in scope as well as the lld::macho::Reloc type
intended for use here. Qualify the use in this file with the intended
namespace to avoid compilation errors.
In `applyAdrpAddLdr()` we make a transformation that is identical to the
one in `applyAdrpAdd()`, so lets reuse that code. Also refactor
`forEachHint()` to use more `ArrayRef` and move around some lines for
consistancy.
Framework load paths can be either the top level framework name, or
subpaths of the framework bundle pointing to specific framework binary
versions. Extend the framework lookup logic to handle the latter case.
Enhance branch extension logic to handle __objc_stubs identically to
__stubs
The branch extension algorithm currently has specific handling for the
`__stubs` section:
1. It ensures all `__stubs` content is directly reachable via branches
from the text section.
2. It calculates the latest text section address that might require
thunks to reach the end of `__stubs`.
The `__objc_stubs` section requires precisely the same handling due to
its similar nature, but this was not implemented.
This commit generalizes the existing logic so it applies consistently to
both the `__stubs` and `__objc_stubs` sections, ensuring correct
reachability and thunk placement for both. Without this change it's
possible to get relocation errors during linking in scenarios where the
`__objc_stubs` section is large enough.
When loading frameworks it is possible to have load commands for the
same framework through symlinks and the real path. To avoid loading
these multiple times find the real path before checking the dylib cache.
Details:
When we have the following scenario:
- lib_a re-exports lib_b
- lib_b re-exports @rpath/lib_c
+ lib_b contains LC_RPATH
Previously, lld-macho cannot find lib_c because it was attempting to
resolve the '@rpath' from lib_b (which had no LC_RPATH defined). The
change here is to also consider all the LC_RPATH rom lib_b when trying
to find lib_c.
Inspired by real-life example when linking with
libXCTestSwiftSupport.dylib (which re-exports XCTest, which re-exports
XCTestCore)