so that we can remove the global `ctx` from toString implementations.
Rename lld::toString (to lld::elf::toStr) to simplify name lookup (we
have many llvm::toString and another lld::toString(const llvm::opt::Arg
&)).
The current diagnostic functions log/warn/error/fatal lack a context
argument and call the global `lld::errorHandler()`, which prevents
multiple lld instances in one process.
This patch introduces context-aware replacements:
* log => Log(ctx)
* warn => Warn(ctx)
* errorOrWarn => Err(ctx)
* error => ErrAlways(ctx)
* fatal => Fatal(ctx)
Example: `errorOrWarn(toString(f) + "xxx")` => `Err(ctx) << f << "xxx"`.
(`toString(f)` is shortened to `f` as a bonus and may access `ctx`
without accessing the global variable (see `Target.cpp`)).
`ctx.e = &context->e;` can be replaced with a non-global Errorhandler
when `ctx` becomes a local variable.
(For the ELF port, the long term goal is to eliminate `error`. Most can
be straightforwardly converted to `Err(ctx)`.)
Remove the global variable `symtab` and add a member variable
(`std::unique_ptr<SymbolTable>`) to `Ctx` instead.
This is one step toward eliminating global states.
Pull Request: https://github.com/llvm/llvm-project/pull/109612
Ctx was introduced in March 2022 as a more suitable place for such
singletons.
llvm/Support/thread.h includes <thread>, which transitively includes
sstream in libc++ and uses ios_base::in, so we cannot use `#define in ctx.sec`.
`symtab, config, ctx` are now the only variables using
LLVM_LIBRARY_VISIBILITY.
Fix the use-after-free bug and re-apply
https://github.com/llvm/llvm-project/pull/106193
* Without the fix, the string referenced by `objSym.Name` could be
destroyed even if string saver keeps a copy of the referenced string.
This caused use-after-free.
* The fix ([latest
commit](9776ed44cf))
updates `objSym.Name` to reference (via `StringRef`) the string saver's
copy.
Test:
1. For `lld/test/ELF/lto/asmundef.ll`, its test failure is reproducible
with `-DLLVM_USE_SANITIZER=Address` and gone with the fix.
3. Run all tests by following
https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild#try-local-changes.
* Without the fix, `ELF/lto/asmundef.ll` aborted the multi-stage test at
`@@@BUILD_STEP stage2/asan_ubsan check@@@`, defined
[here](https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_fast.sh#L30)
* With the fix, the [multi-stage
test](https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_fast.sh)
pass stage2 {asan, ubsan, masan}. This is also the test used by
https://lab.llvm.org/buildbot/#/builders/169
**Original commit message**
`StringMap<T>` creates a [copy of the
string](d4c519e7b2/llvm/include/llvm/ADT/StringMapEntry.h (L55-L58))
for entry insertions and intentionally keep copies [since the
implementation optimizes string memory
usage](d4c519e7b2/llvm/include/llvm/ADT/StringMap.h (L124)).
On the other hand, linker keeps copies of symbol names [1] in
`lld::elf::parseFiles` [2] before invoking `compileBitcodeFiles` [3].
This change proposes to optimize away string copies inside
[LTO::GlobalResolutions](24e791b416/llvm/include/llvm/LTO/LTO.h (L409)),
which will make LTO indexing more memory efficient for ELF. There are
similar opportunities for other (COFF, wasm, MachO) formats.
The optimization takes place for lld (ELF) only. For the rest of use
cases (gold plugin, `llvm-lto2`, etc), LTO owns a string saver to keep
copies and use global resolution key for de-duplication.
Together with @kazutakahirata's work to make `ComputeCrossModuleImport`
more memory efficient, we see a ~20% peak memory usage reduction in a
binary where peak memory usage needs to go down. Thanks to the
optimization in
329ba523cc,
the max (as opposed to the sum) of `ComputeCrossModuleImport` or
`GlobalResolution` shows up in peak memory usage.
* Regarding correctness, the set of
[resolved](80c47ad3ae/llvm/lib/LTO/LTO.cpp (L739))
[per-module
symbols](80c47ad3ae/llvm/include/llvm/LTO/LTO.h (L188-L191))
is a subset of
[llvm::lto::InputFile::Symbols](80c47ad3ae/llvm/include/llvm/LTO/LTO.h (L120)).
And bitcode symbol parsing saves symbol name when iterating
`obj->symbols` in `BitcodeFile::parse` already. This change updates
`BitcodeFile::parseLazy` to keep copies of per-module undefined symbols.
* Presumably the undefined symbols in a LTO unit (copied in this patch
in linker unique saver) is a small set compared with the set of symbols
in global-resolution (copied before this patch), making this a
worthwhile trade-off. Benchmarking this change alone shows measurable
memory savings across various benchmarks.
[1] ELF
1cea5c2138/lld/ELF/InputFiles.cpp (L1748)
[2]
ef7b18a53c/lld/ELF/Driver.cpp (L2863)
[3]
ef7b18a53c/lld/ELF/Driver.cpp (L2995)
`StringMap<T>` creates a [copy of the
string](d4c519e7b2/llvm/include/llvm/ADT/StringMapEntry.h (L55-L58))
for entry insertions and intentionally keep copies [since the
implementation optimizes string memory
usage](d4c519e7b2/llvm/include/llvm/ADT/StringMap.h (L124)).
On the other hand, linker keeps copies of symbol names [1] in
`lld::elf::parseFiles` [2] before invoking `compileBitcodeFiles` [3].
This change proposes to optimize away string copies inside
[LTO::GlobalResolutions](24e791b416/llvm/include/llvm/LTO/LTO.h (L409)),
which will make LTO indexing more memory efficient for ELF. There are
similar opportunities for other (COFF, wasm, MachO) formats.
The optimization takes place for lld (ELF) only. For the rest of use
cases (gold plugin, `llvm-lto2`, etc), LTO owns a string saver to keep
copies and use global resolution key for de-duplication.
Together with @kazutakahirata's work to make `ComputeCrossModuleImport`
more memory efficient, we see a ~20% peak memory usage reduction in a
binary where peak memory usage needs to go down. Thanks to the
optimization in
329ba523cc,
the max (as opposed to the sum) of `ComputeCrossModuleImport` or
`GlobalResolution` shows up in peak memory usage.
* Regarding correctness, the set of
[resolved](80c47ad3ae/llvm/lib/LTO/LTO.cpp (L739))
[per-module
symbols](80c47ad3ae/llvm/include/llvm/LTO/LTO.h (L188-L191))
is a subset of
[llvm::lto::InputFile::Symbols](80c47ad3ae/llvm/include/llvm/LTO/LTO.h (L120)).
And bitcode symbol parsing saves symbol name when iterating
`obj->symbols` in `BitcodeFile::parse` already. This change updates
`BitcodeFile::parseLazy` to keep copies of per-module undefined symbols.
* Presumably the undefined symbols in a LTO unit (copied in this patch
in linker unique saver) is a small set compared with the set of symbols
in global-resolution (copied before this patch), making this a
worthwhile trade-off. Benchmarking this change alone shows measurable
memory savings across various benchmarks.
[1] ELF
1cea5c2138/lld/ELF/InputFiles.cpp (L1748)
[2]
ef7b18a53c/lld/ELF/Driver.cpp (L2863)
[3]
ef7b18a53c/lld/ELF/Driver.cpp (L2995)
lld ELF
[BitcodeFile](a527248a3c/lld/ELF/InputFiles.h (L328))
uses [string
saver](a527248a3c/lld/include/lld/Common/CommonLinkerContext.h (L57))
to keep copies of bitcode symbols. Symbol duplication is very common
when compiling application binaries.
This change proposes to introduce a UniqueStringSaver in lld context and
use it for bitcode symbol parsing. The implementation covers ELF only.
Similar opportunities should exist on other (COFF, MachO, wasm) formats.
For an internal production binary where lto indexing takes ~10GiB
originally, this changes optimizes away ~800MiB (~7.8%), measured by
https://github.com/google/pprof. Flame graph breaks down memory by usage
call stacks and agrees with this measurement.
Previously, we selected the Thumb2 PLT sequences if any input object is
marked as not supporting the ARM ISA, which then causes assertion
failures when calls from ARM code in other objects are seen. I think the
intention here was to only use Thumb PLTs when the target does not have
the ARM ISA available, signalled by no objects being marked as having it
available. To do that we need to track which ISAs we have seen as we
parse the build attributes, and defer the decision about PLTs until all
input objects have been parsed.
This bug was triggered by real code in picolibc, which have some
versions of string.h functions built with Thumb2-only build attributes,
so that they are compatible with v7-A, v7-R and v7-M.
Fixes#99008.
... using the temporary section type code 0x40000020
(`clang -c -Wa,--crel,--allow-experimental-crel`). LLVM will change the
code and break compatibility (Clang and lld of different versions are
not guaranteed to cooperate, unlike other features). CREL with implicit
addends are not supported.
---
Introduce `RelsOrRelas::crels` to iterate over SHT_CREL sections and
update users to check `crels`.
(The decoding performance is critical and error checking is difficult.
Follow `skipLeb` and `R_*LEB128` handling, do not use
`llvm::decodeULEB128`, whichs compiles to a lot of code.)
A few users (e.g. .eh_frame, LLDDwarfObj, s390x) require random access. Pass
`/*supportsCrel=*/false` to `relsOrRelas` to allocate a buffer and
convert CREL to RELA (`relas` instead of `crels` will be used). Since
allocating a buffer increases, the conversion is only performed when
absolutely necessary.
---
Non-alloc SHT_CREL sections may be created in -r and --emit-relocs
links. SHT_CREL and SHT_RELA components need reencoding since
r_offset/r_symidx/r_type/r_addend may change. (r_type may change because
relocations referencing a symbol in a discarded section are converted to
`R_*_NONE`).
* SHT_CREL components: decode with `RelsOrRelas` and re-encode (`OutputSection::finalizeNonAllocCrel`)
* SHT_RELA components: convert to CREL (`relToCrel`). An output section can only have one relocation section.
* SHT_REL components: print an error for now.
SHT_REL to SHT_CREL conversion for -r/--emit-relocs is complex and
unsupported yet.
Link: https://discourse.llvm.org/t/rfc-crel-a-compact-relocation-format-for-elf/77600
Pull Request: https://github.com/llvm/llvm-project/pull/98115
SmallPtrSet.h and TimeProfiler.h are unused. CommandLine.h is only
needed for the UseNewDbgInfoFormat declare, which can be moved to the
places that need it.
So long as ld -r links using bitcode always result in an ELF object, and
not a merged bitcode object, the output form a relocatable link using
FatLTO objects should not have a .llvm.lto section. Prior to this, using
the object code sections would cause the bitcode section in the output
of a relocatable link to be corrupted, by concatenating all the
.llvm.lto
sections together.
This patch discards SHT_LLVM_LTO sections when not using
--fat-lto-objects, so that the relocatable ELF output won't contain
inalid bitcode.
GNU ld's relocatable linking behaviors:
* Sections with the `SHF_GROUP` flag are handled like sections matched
by the `--unique=pattern` option. They are processed like orphan
sections and ignored by input section descriptions.
* Section groups' (usually named `.group`) content is updated as the
section indexes are updated. Section groups can be discarded with
`/DISCARD/ : { *(.group) }`.
`-r --force-group-allocation` discards section groups and allows
sections with the `SHF_GROUP` flag to be matched like normal sections.
If two section group members are placed into the same output section,
their relocation sections (if present) are combined as well.
This behavior can be useful when -r output is used as a pseudo shared
object (e.g., FreeBSD's amd64 kernel modules, CHERIoT compartments).
This patch implements --force-group-allocation:
* Input SHT_GROUP sections are discarded.
* Input sections do not get the SHF_GROUP flag, so `addInputSec`
will combine relocation sections if their relocated section group
members are combined.
The default behavior is:
* Input SHT_GROUP sections are retained.
* Input SHF_GROUP sections can be matched (unlike GNU ld)
* Input SHF_GROUP sections keep the SHF_GROUP flag, so `addInputSec`
will create different OutputDesc copies.
GNU ld provides the `FORCE_GROUP_ALLOCATION` command, which is not
implemented.
Pull Request: https://github.com/llvm/llvm-project/pull/94704
This reverts commit 7832769d32.
This was reverted prior due to a test failure on the windows builder. I
think this was because we didn't specify the triple and assumed windows.
The other tests use the full triple specifying linux, so we follow suite
here.
---
We are using PLTs for cortex-m33 which only supports thumb. More
specifically, this is for a very restricted use case. There's no MMU so
there's no sharing of virtual addresses between two processes, but this
is fine. The MCU is used for running [chre
nanoapps](https://android.googlesource.com/platform/system/chre/+/HEAD/doc/nanoapp_overview.md)
for android. Each nanoapp is a shared library (but effectively acts as
an executable containing a test suite) that is loaded and run on the MCU
one binary at a time and there's only one process running at a time, so
we ensure that the same text segment cannot be shared by two different
running executables. GNU LD supports thumb PLTs but we want to migrate
to a clang toolchain and use LLD, so thumb PLTs are needed.
We are using PLTs for cortex-m33 which only supports thumb. More
specifically, this is for a very restricted use case. There's no MMU so
there's no sharing of virtual addresses between two processes, but this
is fine. The MCU is used for running [chre
nanoapps](https://android.googlesource.com/platform/system/chre/+/HEAD/doc/nanoapp_overview.md)
for android. Each nanoapp is a shared library (but effectively acts as
an executable containing a test suite) that is loaded and run on the MCU
one binary at a time and there's only one process running at a time, so
we ensure that the same text segment cannot be shared by two different
running executables. GNU LD supports thumb PLTs but we want to migrate
to a clang toolchain and use LLD, so thumb PLTs are needed.
For a DSO with all DT_NEEDED entries accounted for, if it contains an
undefined non-weak symbol that shares a name with a non-exported
definition (hidden visibility or localized by a version script), and
there is no DSO definition, we should report an error.
#70769 implemented the error when we see `ref.so def-hidden.so`. This patch
implementes the error when we see `def-hidden.so ref.so`, matching GNU
ld.
Close#86777
`TargetEndianness` is long and unwieldy. "Target" in the name is confusing. Rename it to "Endianness".
I cannot find noticeable out-of-tree users of `TargetEndianness`, but
keep `TargetEndianness` to make this patch safer. `TargetEndianness`
will be removed by a subsequent change.