The linker was crashing due to a stack overflow when parsing ':ALIGN' in
an output section description. This commit fixes the linker script
parser so that the crash does not happen.
The root cause of the stack overflow is how we parse expressions
(readExpr) in linker scripts, combined with the behavior of the
ScriptLexer::expect(...) utility. ScriptLexer::expect does nothing if errors have already
been encountered during linker script parsing. In particular, it never
increments the current token position in the script file, even if the
current token is the same as the expected token. This causes an infinite
call cycle on parsing an expression such as '(4096)' when an error has
already been encountered.
```
readExpr() calls readPrimary()
readPrimary() calls readParenExpr()
readParenExpr():
  expect("("); // no-op, current token still points to '('
  Expression *E = readExpr(); // the cycle continues...
```
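A self-contained model of the cycle and of a guard that breaks it (simplified stand-ins for lld's ScriptParser, not the actual upstream code):
```
#include <cstdio>
#include <string>
#include <vector>

// Simplified stand-in for ScriptLexer: expect() makes no progress once an
// error has been recorded, which is the root cause described above.
struct Lexer {
  std::vector<std::string> tokens;
  size_t pos = 0;
  bool errored = false;
  void expect(const std::string &tok) {
    if (errored)
      return; // no-op after an error: the token is never consumed
    if (pos < tokens.size() && tokens[pos] == tok)
      ++pos;
    else
      errored = true;
  }
};

int readExpr(Lexer &lx);

// readParenExpr -> readExpr -> readParenExpr: without the guard below,
// "(4096)" would recurse forever once `errored` is set.
int readParenExpr(Lexer &lx) {
  lx.expect("(");
  if (lx.errored)
    return 0; // assumed guard: stop descending once an error was reported
  int v = readExpr(lx);
  lx.expect(")");
  return v;
}

int readExpr(Lexer &lx) { return readParenExpr(lx); } // readPrimary elided

int main() {
  Lexer lx{{"(", "4096", ")"}};
  lx.errored = true; // an earlier script error was already reported
  std::printf("%d\n", readExpr(lx)); // terminates thanks to the guard
}
```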
Closes #146722
Signed-off-by: Parth Arora <partaror@qti.qualcomm.com>
This change removes unnecessary tune args to the AArch64 backend. The
AArch64 backend automatically handles `tune-cpu` and adds the necessary
features based on the models from TableGen.
It follows this fix: https://github.com/llvm/llvm-project/pull/146260
where updating a subtarget feature didn't cause the frontend test to
fail, because the toolchain and the test shared the same error.
This test uses lit substitutions in a response file, and the substituted
paths may contain spaces. This caused a failure on a build bot where the
system Python executable was "C:\Program Files\Python310\python.exe", as
reported in #142757.
Add appropriate quoting to fix the issue.
PR #141106 changed the debuginfo metadata to allow dynamic bit offsets
and sizes. This caused a crash in lld when using LTO.
The problem is that lazyLoadOneMetadata assumes that the metadata in
question can be cast to MDNode; but in the typical case where the offset
is a constant, this is not true.
This patch changes this spot to allow non-MDNodes through.
The included test case comes from the report in #141106.
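For illustration, a hedged sketch of the pattern (not the exact upstream diff): the fix amounts to using dyn_cast instead of an unconditional cast<MDNode> so non-MDNode metadata passes through.
```
#include "llvm/IR/Metadata.h"
using namespace llvm;

// Sketch: only MDNodes participate in lazy forward-reference resolution;
// other Metadata (e.g. ConstantAsMetadata holding a constant bit offset)
// is allowed through instead of tripping cast<MDNode>.
void processLazyMetadata(Metadata *MD) {
  if (auto *N = dyn_cast<MDNode>(MD)) {
    // ... resolve the node's operands as before ...
    (void)N;
  }
  // Non-MDNode metadata: nothing to do.
}
```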
This patch introduces support for Integrated Distributed ThinLTO (DTLTO)
in ELF LLD.
DTLTO enables the distribution of ThinLTO backend compilations via
external distribution systems, such as Incredibuild, during the
traditional link step: https://llvm.org/docs/DTLTO.html.
It is expected that users will invoke DTLTO through the compiler driver
(e.g., Clang) rather than calling LLD directly. A Clang-side interface
for DTLTO will be added in a follow-up patch.
Note: Bitcode members of archives (thin or non-thin) are not currently
supported. This will be addressed in a future change. As a consequence
of this lack of support, this patch is not sufficient to allow for
self-hosting an LLVM build with DTLTO. Theoretically,
--start-lib/--end-lib could be used instead of archives in a self-host
build. However, it's unclear how --start-lib/--end-lib can be easily
used with the LLVM build system.
Testing:
- ELF LLD `lit` test coverage has been added, using a mock distributor
to avoid requiring Clang.
- Cross-project `lit` tests cover integration with Clang.
For the design discussion of the DTLTO feature, see: #126654.
This is a workaround for
https://github.com/llvm/llvm-project/issues/82050: skip the `DllMain`
symbol if it is seen in an import library. When this situation occurs, a
warning is now displayed; it can be silenced with
`/ignore:exporteddllmain`.
Support TLSDESC to initial-exec or local-exec optimizations. Introduce a
new hook, RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC, and use the existing
R_RELAX_TLS_GD_TO_IE_ABS to support TLSDESC => IE; use the existing
R_RELAX_TLS_GD_TO_LE to support TLSDESC => LE.
In the normal or medium code model, there are two forms of code sequences:
* pcalau12i $a0, %desc_pc_hi20(sym_desc)
* addi.d $a0, $a0, %desc_pc_lo12(sym_desc)
* ld.d $ra, $a0, %desc_ld(sym_desc)
* jirl $ra, $ra, %desc_call(sym_desc)
------
* pcaddi $a0, %desc_pcrel_20(sym_desc)
* ld.d $ra, $a0, %desc_ld(sym_desc)
* jirl $ra, $ra, %desc_call(sym_desc)
Convert to IE:
* pcalau12i $a0, %ie_pc_hi20(sym_ie)
* ld.[wd] $a0, $a0, %ie_pc_lo12(sym_ie)
Convert to LE:
* lu12i.w $a0, %le_hi20(sym_le) # le_hi20 != 0, otherwise NOP
* ori $a0, src, %le_lo12(sym_le) # le_hi20 != 0, src = $a0, otherwise src = $zero
For simplicity, in both tlsdescToIe and tlsdescToLe we always convert
the preceding instructions to NOPs, because both forms of the code
sequence (corresponding to the relocation combinations
R_LARCH_TLS_DESC_PC_HI20+R_LARCH_TLS_DESC_PC_LO12 and
R_LARCH_TLS_DESC_PCREL20_S2) share the same processing.
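A hedged sketch of that uniform handling (relocation type names abbreviated, instruction encoders stubbed; the real logic lives in lld/ELF/Arch/LoongArch.cpp):
```
#include "llvm/Support/Endian.h"
#include <cstdint>

using llvm::support::endian::write32le;

enum RelType { DESC_PC_HI20, DESC_PC_LO12, DESC_PCREL20_S2, DESC_LD, DESC_CALL };
constexpr uint32_t NOP = 0x03400000; // andi $zero, $zero, 0

// Stubbed encoders (assumed): the real code emits lu12i.w / ori encodings.
uint32_t encodeLu12iW(uint64_t val) { (void)val; return NOP; /* placeholder */ }
uint32_t encodeOri(uint64_t val) { (void)val; return NOP; /* placeholder */ }

void tlsdescToLe(uint8_t *loc, RelType type, uint64_t val) {
  switch (type) {
  case DESC_PC_HI20:
  case DESC_PC_LO12:
  case DESC_PCREL20_S2:
    write32le(loc, NOP); // leading insns of either sequence form become NOPs
    break;
  case DESC_LD: // ld.d becomes lu12i.w when le_hi20 != 0, otherwise NOP
    write32le(loc, (val >> 12) ? encodeLu12iW(val) : NOP);
    break;
  case DESC_CALL: // jirl becomes ori, with src = $a0 or $zero per le_hi20
    write32le(loc, encodeOri(val));
    break;
  }
}
```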
TODO: When relaxation is enabled, the redundant NOPs can be removed.
This will be implemented in a future patch.
Note: different forms of TLSDESC code sequences should not appear
interleaved across the normal, medium, or extreme code models; compilers
do not generate such a mix and lld does not support it. This is thanks
to the guard in PostRASchedulerList.cpp in llvm:
```
Calls are not scheduling boundaries before register allocation,
but post-ra we don't gain anything by scheduling across calls
since we don't need to worry about register pressure.
```
For non-SHF_ALLOC sections, sh_addr is set to 0.
Skip sections without the SHF_ALLOC flag so that `minVA` is not clamped
to 0 by non-SHF_ALLOC sections and their sizes do not contribute to
`maxVA`.
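A minimal sketch of the loop described above (structure assumed; not the exact lld code):
```
#include <algorithm>
#include <cstdint>
#include <vector>

struct Sec { uint64_t flags, addr, size; };
constexpr uint64_t SHF_ALLOC = 0x2; // ELF: section occupies memory at run time

void computeVARange(const std::vector<Sec> &sections,
                    uint64_t &minVA, uint64_t &maxVA) {
  minVA = UINT64_MAX;
  maxVA = 0;
  for (const Sec &s : sections) {
    if (!(s.flags & SHF_ALLOC))
      continue; // sh_addr is 0 here and would drag minVA down to 0
    minVA = std::min(minVA, s.addr);
    maxVA = std::max(maxVA, s.addr + s.size);
  }
}
```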
Fixed an assertion failure when reading .eh_frame sections, and added
.eh_frame sections to the tests.
This reverts commit 1e95349dbe.
Original commit message follows:
When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.
Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.
The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:
CFI enabled: +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]
The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.
This optimization is implemented for AArch64 and X86_64 only.
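A rough sketch of the rewrite, with assumed helper names and a simplified body (the real applyBranchToBranchOpt lives in lld's target-specific relocation code):
```
// If the branch target's section itself starts with an unconditional
// branch, point the original relocation straight at that branch's
// destination, skipping the intermediate function (e.g. a CFI jump table
// entry). decodeDirectBranch() is an assumed helper.
void applyBranchToBranchOpt(Relocation &r) {
  auto *d = dyn_cast<Defined>(r.sym);
  if (!d)
    return;
  auto *sec = dyn_cast_or_null<InputSection>(d->section);
  if (!sec)
    return;
  if (const Relocation *inner = decodeDirectBranch(*sec, d->value))
    r.sym = inner->sym; // branch directly to the final target
}
```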
lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:
```
N Min Max Median Avg Stddev
x 512 1.2264546 1.3481076 1.2970261 1.2965788 0.018620888
+ 512 1.2561196 1.3839965 1.3214632 1.3209327 0.019443971
Difference at 95.0% confidence
0.0243538 +/- 0.00233202
1.87831% +/- 0.179859%
(Student's t, pooled s = 0.0190369)
```
[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057
Reviewers: zmodem, MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/145579
If any of the Windows SDK (and MSVC)-related arguments are passed on the
command line, they should take priority over environment variables like
`INCLUDE` or `LIB` set by vcvarsall from the Visual Studio Developer
Environment on Windows.
These changes ensure that all of the arguments related to VC Tools and
the Windows SDK cause the driver to ignore the environment.
A good proxy to estimate the number of page faults during startup is the
total size of startup functions. Assuming profiles are up-to-date, we
can measure this total size pretty easily. Note that if profile data is
old, this number could be wrong.
This caused assertion failures in applyBranchToBranchOpt():
```
llvm/include/llvm/Support/Casting.h:578:
decltype(auto) llvm::cast(From*)
[with To = lld::elf::InputSection; From = lld::elf::InputSectionBase]:
Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
```
See comment on the PR (https://github.com/llvm/llvm-project/pull/138366)
This reverts commit 491b82a5ec.
This also reverts the follow-up "[lld] Use llvm::partition_point (NFC) (#145209)"
This reverts commit 2ac293f5ac.
The string table is too big for large binaries when the symbol table is
enabled. Some strings in the strtab are identical, so they can be
reused.
This patch revives 9ffeaaa, authored by mstorsjo, with the prioritized
string table builder to fix the debug section name issue (see 4d2eda2
for more details).
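For reference, a small self-contained example of the dedup mechanism (the patch builds on LLVM's StringTableBuilder; its integration into lld/COFF is more involved than shown here):
```
#include "llvm/MC/StringTableBuilder.h"
#include <cstdio>
using namespace llvm;

int main() {
  StringTableBuilder strtab(StringTableBuilder::WinCOFF);
  strtab.add("longCommonSymbolName");
  strtab.add("longCommonSymbolName"); // identical string, stored only once
  strtab.finalize();                  // lays out (and can tail-merge) strings
  std::printf("offset=%zu size=%zu\n",
              strtab.getOffset("longCommonSymbolName"), strtab.getSize());
}
```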
---------
Co-authored-by: Wen Haohai <whh108@live.com>
Co-authored-by: James Henderson <James.Henderson@sony.com>
When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.
Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.
The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:
CFI enabled: +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]
The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.
This optimization is implemented for AArch64 and X86_64 only.
lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:
```
N Min Max Median Avg Stddev
x 512 1.2264546 1.3481076 1.2970261 1.2965788 0.018620888
+ 512 1.2561196 1.3839965 1.3214632 1.3209327 0.019443971
Difference at 95.0% confidence
0.0243538 +/- 0.00233202
1.87831% +/- 0.179859%
(Student's t, pooled s = 0.0190369)
```
[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057
Pull Request: https://github.com/llvm/llvm-project/pull/138366
Include the offset of a thunk in the ThunkSection when adding symbols.
At Thunk creation time the offset is set to 0 as we don't know where in
the ThunkSection the Thunk will end up. The symbol values are updated by
the setOffset() call in assignOffsets().
When we transform a thunk from a short one to a long one, we sometimes
add a mapping symbol. At this point the offset of the thunk is non-zero
and we need to account for that when defining the symbol, as the
setOffset() call subtracts the old offset before adding the new one.
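A self-contained model of that bookkeeping (modeled on lld's Thunk, simplified):
```
#include <cstdint>
#include <vector>

struct Defined { uint64_t value = 0; };

struct Thunk {
  std::vector<Defined *> syms;
  uint64_t offset = 0; // offset of this thunk within its ThunkSection

  // Rebase each symbol: subtract the old offset, add the new one. A symbol
  // defined after the thunk has moved must therefore include the current
  // offset in its initial value, which is the fix described above.
  void setOffset(uint64_t newOffset) {
    for (Defined *d : syms)
      d->value = d->value - offset + newOffset;
    offset = newOffset;
  }
};
```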
To test: added a second thunk that is converted to a long thunk in
aarch64-thunk-bit-multipass. This second thunk is given a non-zero
offset from the start of the ThunkSection, so we can observe the mapping
symbol being put in the wrong place when the offset is not accounted
for.
Fixes: https://github.com/llvm/llvm-project/issues/142326
When changing codegen (e.g. in #130809), offsets in binaries produced by
LTO tests might change. We do not need to match concrete offset values;
it is enough to ensure that hex values in particular places are
identical.
---------
Co-authored-by: Anatoly Trosinenko <atrosinenko@accesssoftek.com>
+ If `-z zicfilp=implicit` or the option is not specified, the output
has the ZICFILP feature enabled/disabled based on the input objects.
+ If `-z zicfilp=<never|unlabeled|func-sig>`, the output has the
ZICFILP feature forced <off|on to the "unlabeled" scheme|on to the
"func-sig" scheme>.
+ If `-z zicfiss=implicit` or the option is not specified, the output
has the ZICFISS feature enabled/disabled based on the input objects.
+ If `-z zicfiss=<never|always>`, the output has the ZICFISS feature
forced <off|on>; a sketch of this policy follows the list.
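A sketch of the forcing policy for zicfilp (zicfiss is analogous); the enum and bit values here are assumed placeholders, not the real psABI GNU property bits:
```
#include <cstdint>

enum class Zicfilp { Implicit, Never, Unlabeled, FuncSig };
constexpr uint32_t LP_UNLABELED = 1; // placeholder feature bit
constexpr uint32_t LP_FUNC_SIG = 2;  // placeholder feature bit

uint32_t mergeZicfilp(uint32_t andOfInputs, Zicfilp policy) {
  switch (policy) {
  case Zicfilp::Implicit: // follow the AND of the input objects' bits
    return andOfInputs;
  case Zicfilp::Never: // force off
    return andOfInputs & ~(LP_UNLABELED | LP_FUNC_SIG);
  case Zicfilp::Unlabeled: // force on, "unlabeled" scheme
    return (andOfInputs & ~LP_FUNC_SIG) | LP_UNLABELED;
  case Zicfilp::FuncSig: // force on, "func-sig" scheme
    return (andOfInputs & ~LP_UNLABELED) | LP_FUNC_SIG;
  }
  return andOfInputs;
}
```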
GCC in a Cygwin environment invokes the linker with
`--dll-search-prefix=cyg`.
Implementing this option makes lld-mingw invocable via `gcc -fuse-ld=lld`.
---------
Co-authored-by: jeremyd2019 <github@jdrake.com>
Previously, the AArch64 PAuth ABI core values were stored as an
ArrayRef<uint8_t>, introducing unnecessary indirection.
This patch replaces the ArrayRef with two explicit uint64_t fields:
aarch64PauthAbiPlatform and aarch64PauthAbiVersion. This simplifies the
representation and improves readability.
No functional change intended, aside from improved error messages.
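In outline (the new field names come from this description; the old field name and surrounding struct are illustrative):
```
#include <cstdint>

struct Config {
  // Before: ArrayRef<uint8_t> coreInfo; (opaque bytes, name illustrative)
  // After: explicit values, no indirection.
  uint64_t aarch64PauthAbiPlatform = 0;
  uint64_t aarch64PauthAbiVersion = 0;
};
```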
The behavior of an undefined weak reference is implementation defined.
For static -no-pie linking, dynamic relocations are generally avoided (except
IRELATIVE). -shared linking generally emits dynamic relocations.
Dynamic -no-pie linking and -pie allow flexibility. Past changes
adjusted the behavior for better consistency and a simpler internal
representation, e.g. https://reviews.llvm.org/D63003 and
https://reviews.llvm.org/D105164 (generalized to undefined non-weak in
2fcaa00d1e).
GNU ld introduced -z [no]dynamic-undefined-weak option to fine-tune the
behavior. (The option is not very effective with -no-pie, e.g. on
x86-64, `ld.bfd a.o s.so -z dynamic-undefined-weak` generates
R_X86_64_NONE relocations instead of GLOB_DAT/JUMP_SLOT.)
This patch implements -z [no]dynamic-undefined-weak option.
The effects are summarized as follows:
* Static -no-pie: no-op
* Dynamic -no-pie: nodynamic-undefined-weak suppresses GLOB_DAT/JUMP_SLOT
* Static -pie: dynamic-undefined-weak generates ABS/GLOB_DAT/JUMP_SLOT.
https://discourse.llvm.org/t/lld-weak-undefined-symbols-in-vdso-only/86749
* Dynamic -pie: nodynamic-undefined-weak suppresses ABS/GLOB_DAT/JUMP_SLOT
The -pie behavior likely stays stable while -no-pie (`!ctx.arg.isPic` in
`isStaticLinkTimeConstant`) behavior will likely change in the future.
The current default value of ctx.arg.zDynamicUndefined is selected to
prevent behavior changes.
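A hypothetical sketch of the gating (not the exact isStaticLinkTimeConstant logic; `zDynamicUndefined` is the option value mentioned above):
```
struct Sym { bool isUndefined; bool isWeak; };

// With nodynamic-undefined-weak, an undefined weak resolves to 0 at link
// time and no ABS/GLOB_DAT/JUMP_SLOT dynamic relocation is emitted for it.
bool resolvesStaticallyToZero(const Sym &s, bool zDynamicUndefined) {
  return s.isUndefined && s.isWeak && !zDynamicUndefined;
}
```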
Pull Request: https://github.com/llvm/llvm-project/pull/143831
This change is made so that, when mixing small and large text, the large
text stays out of the way of the rest of the binary.
Place large RX sections at the beginning rather than at the end so that
with `--no-rosegment`, the large text and rodata share a single PT_LOAD
segment. Place large RWX sections at the end to keep writable and
readonly sections separate.
Clang started emitting the large section flag for `.ltext` sections in
#73037.
This testcase was removed in 4cafd28b7d,
as a082f665f8 had made it no longer
trigger the error that it was meant to exercise. (The latter of those
two commits made the symbol "__rt_sdiv" be included among the potential
libcalls listed by lto::LTO::getRuntimeLibcallSymbols().)
Readd the test as a positive test, making sure that such libcalls can
get linked.
We do have preexisting test coverage for LTO libcalls overall in
libcall-archive.ll, but readd this test to cover specifically the ARM
division helper functions as well.
* Merge the special case into isStaticLinkTimeConstant
* Generalize isUndefWeak to isUndefined. Undefined non-weak is an error
case. We choose to be general, which also brings us in line with GNU ld.
Increase specificity by using the correct unit sizes. KBytes is an
abbreviation for kB (1000 bytes), and the hardware industry as well as
several operating systems have now switched to 1000-byte kBs.
If this change is acceptable: GitHub sometimes mangles merges to use the
account's original email, and $dayjob asks that contributions carry my
work email. Thanks!
The new test (derived from riscv32 openssl/test/cmp_msg_test.c) revealed
oscillation in two R_RISCV_CALL_PLT jumps:
- First jump (~2^11 bytes away): alternated between 4 and 8 bytes.
- Second jump (~2^20 bytes away): alternated between 2 and 8 bytes.
The issue is not related to alignment. In 2019, GNU ld addressed a
similar problem by reducing the relaxation allowance for cross-section
relaxation (https://sourceware.org/bugzilla/show_bug.cgi?id=25181).
This approach would result in a suboptimal layout for the tight range
tested by riscv-relax-call.s.
This patch stabilizes the process by preventing `remove` from increasing
after a few passes, similar to the integrated assembler's fragment
relaxation. (For the Android bot reproduce, `pass < 2` leads to a
non-optimal layout, while `pass < 3` and `pass < 4` outputs are
identical.)
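A sketch of the clamp (names assumed):
```
#include <algorithm>
#include <cstdint>

// After a few passes, cap each relocation's `remove` at its previous value
// so the total of deleted bytes can no longer oscillate between passes,
// mirroring the integrated assembler's fragment relaxation.
uint32_t stabilizedRemove(uint32_t computed, uint32_t previous, int pass) {
  return pass >= 2 ? std::min(computed, previous) : computed;
}
```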
Fix https://github.com/llvm/llvm-project/issues/113838
Possibly fix https://github.com/llvm/llvm-project/issues/123248 (inputs
are bitcode, subject to ever-changing code generation, not reproducible)
Pull Request: https://github.com/llvm/llvm-project/pull/142899
This is causing use-after-frees due to references getting invalidated
when the loadedDylibs map grows; see comments on the PR.
This reverts commit 475a8a47ea.
This test is failing on a couple of bots and on premerge after
a082f665f8.
That patch configures the relevant libcalls for ARM in RuntimeLibCalls.
This causes __rt_sdiv to get pulled into the LTO preopt IR. This should
happen for other builtins as well, which means that the original issue
that the patch introducing this test intended to diagnose should no
longer exist.
The compiler-generated calls to builtins mentioned in
7f9a0048fa should always have definitions,
assuming they are available in the link and are not only pulled in late
by lazily loading symbols from archives. Otherwise, we get the standard
diagnostic when they are missing.
Linking for the Cygwin target always needs -lcygwin (and -lmsys-2.0
instead for the MSYS2 target), but we should not auto-export from these
libraries, the same as for -lmingw32 on the MinGW target.
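A sketch of the exclusion, with an assumed set name, mirroring the existing MinGW runtime handling:
```
#include <set>
#include <string>

std::set<std::string> autoExportExcludeLibs = {"libmingw32"}; // existing entry

void addCygwinRuntimeExclusions() {
  autoExportExcludeLibs.insert("libcygwin");   // -lcygwin on Cygwin
  autoExportExcludeLibs.insert("libmsys-2.0"); // -lmsys-2.0 on MSYS2
}
```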
This flag has been superseded by `.option exact`, as the test updates
show.
Given the flag was always hidden, it makes sense to remove it and move
the tests that required it to use `.option exact`.