278 Commits

Author SHA1 Message Date
Jacek Caban
d6a2a2612f [LLD][COFF] Add support for DLL imports on ARM64EC (#141587)
Define additional `__imp_aux_` and mangled lazy symbols. Also allow
overriding EC aliases with lazy symbols, as we do for other lazy symbol
types.
2025-05-29 11:37:18 +02:00
Jacek Caban
6602bfa721 [LLD][COFF] Avoid forcing lazy symbols in loadMinGWSymbols during symbol table enumeration (#141593)
Forcing lazy symbols at this point may introduce new entries into the
symbol table. Avoid mutating `symTab` while iterating over it.
2025-05-29 11:35:16 +02:00
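One way to picture the problem addressed by 6602bfa721 is the sketch below. It uses hypothetical stand-in types (`Sym`, `SymTab`, `forceLazy`, `forceAllLazy`), not LLD's actual classes, and it shows only the general pattern of collecting candidates during enumeration and mutating the table afterwards; the real patch may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a symbol table entry; not LLD's real Symbol class.
struct Sym {
  bool lazy = false;
};

using SymTab = std::unordered_map<std::string, Sym>;

// Hypothetical stand-in for forcing a lazy symbol: loading the defining
// member resolves the symbol and may add new entries to the table.
inline void forceLazy(SymTab &symTab, const std::string &name) {
  symTab[name].lazy = false;
  symTab.emplace(name + ".dep", Sym()); // new entry, as a real load might add
}

// Phase 1 only reads the table; phase 2 mutates it after iteration has ended.
inline std::size_t forceAllLazy(SymTab &symTab) {
  std::vector<std::string> pending;
  for (const auto &entry : symTab)
    if (entry.second.lazy)
      pending.push_back(entry.first);
  for (const std::string &name : pending)
    forceLazy(symTab, name);
  return pending.size();
}
```

Calling `forceLazy` directly inside the first loop could rehash the map and invalidate the iterator, which is the hazard the commit message describes.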
Jacek Caban
5b0572875c [LLD][COFF] Add support for including native ARM64 objects in ARM64EC images (#137653)
The MSVC linker accepts native ARM64 object files as input with
`-machine:arm64ec`, similar to `-machine:arm64x`. This is of very
limited use; for example, neither exports nor imports are reflected in the
PE structures, so they can't work. However, their symbol tables are otherwise
functional.

Since we already have handling of multiple symbol tables implemented for
ARM64X, the required changes are mostly about adjusting relevant checks
to account for them on the ARM64EC target.

Delay-load helper handling is a bit of a shortcut. The patch never pulls
it for native object files and just ensures that the code is fine with
that. In general, I think it would be nice to adjust the driver to pull
it only when it's actually referenced, which would allow applying the
same logic to the native symbol table on ARM64EC without worrying about
pulling too much.
2025-05-15 11:38:24 +02:00
Alexandre Ganea
c75eac7c03 [LLD][COFF] Don't dllimport from static libraries (#134443)
This reverts commit 6a1bdd9 and reinstates behavior that matches what
MSVC link.exe does, that is, erroring out when trying to dllimport a symbol
from a static library.

A hint is now displayed in stdout, mentioning that the symbol should instead be dllimported
from an import library.

Fixes https://github.com/llvm/llvm-project/issues/131807
2025-04-07 11:34:24 -04:00
Jacek Caban
7c26407a20 [LLD][COFF] Clarify EC vs. native symbols in diagnostics on ARM64X (#130857)
On ARM64X, symbol names alone are ambiguous as they may refer to either
a native or an EC symbol. Append '(EC symbol)' or '(native symbol)' in
diagnostic messages to distinguish them.
2025-03-15 21:15:08 +01:00
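A minimal sketch of the tagging described in 7c26407a20 follows. The `Machine` enum and `describeSymbol` helper are hypothetical, and the exact spacing of the suffix is an assumption; only the two tag strings come from the commit message.

```cpp
#include <cassert>
#include <string>

// Hypothetical machine kinds; LLD uses its own machine type enumeration.
enum class Machine { ARM64, ARM64EC, ARM64X, AMD64 };

// Format a symbol name for a diagnostic. On an ARM64X link the name alone is
// ambiguous, so a namespace tag is appended; other targets keep the bare name.
inline std::string describeSymbol(const std::string &name, Machine imageMachine,
                                  bool isECSymbol) {
  if (imageMachine != Machine::ARM64X)
    return name;
  return name + (isECSymbol ? " (EC symbol)" : " (native symbol)");
}
```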
Jacek Caban
f2473bc31e [LLD][COFF] Support -aligncomm directives on ARM64X (#129513) 2025-03-03 22:48:20 +01:00
Jacek Caban
8616c87335 [LLD][COFF] Support alternate names in both symbol tables on ARM64X (#127619)
The `.drectve` directive applies only to the namespace in which it is
defined, while the command-line argument applies only to the EC
namespace.
2025-02-21 12:52:28 +01:00
Nico Weber
d9b8120259 [lld/COFF] Fix -start-lib / -end-lib more after reviews.llvm.org/D116434 (#124294)
This is a follow-up to #120452 in a way.

Since lld/COFF does not yet insert all defined symbols in an obj file before all
undefined ones (ELF and MachO do this, see #67445 and things linked from
there), it's possible that:

1. We add an obj file a.obj
2. a.obj contains an undefined that's in b.obj, causing b.obj to be
added
3. b.obj contains an undefined that's in a part of a.obj that's not yet
in the symbol table, causing a recursive load of a.obj, which adds the
symbols in there twice, leading to duplicate symbol errors.

For normal archives, `ArchiveFile::addMember()` has a `seen` check to
prevent this. For start-lib lazy objects, we can just check if the
archive is still lazy at the recursive call.

This bug is similar to issue #59162.

(Eventually, we'll probably want to do what the MachO and ELF ports do.)

Includes a test that caused duplicate symbol diagnostics before this
code change.
2025-01-24 13:14:21 -05:00
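The recursive-load scenario in d9b8120259 can be modeled with the sketch below. `LazyObj`, `loaded`, and `forceLoad` are hypothetical names for illustration and do not mirror LLD's real `ObjFile` or driver code; the point is only that checking and clearing the lazy flag before adding symbols turns the recursive request into a no-op.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a -start-lib object; not LLD's real ObjFile.
struct LazyObj {
  std::string name;
  bool lazy = true;
  LazyObj *dependency = nullptr; // object that defines one of our undefineds
};

std::vector<std::string> loaded; // records each time an object's symbols are added

// Load an object once. The lazy check makes a recursive request for an
// object that is already being loaded a no-op.
inline void forceLoad(LazyObj &obj) {
  if (!obj.lazy)
    return;
  obj.lazy = false; // clear before adding symbols, so re-entry is detected
  loaded.push_back(obj.name);
  if (obj.dependency)
    forceLoad(*obj.dependency);
}
```

With `a.obj` and `b.obj` depending on each other, `forceLoad` on `a.obj` records each object exactly once instead of adding the symbols of `a.obj` twice.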
Jacek Caban
a2c683b665 [LLD][COFF] Use EC symbol table for exports defined in module definition files (#123849) 2025-01-22 23:30:23 +01:00
Jacek Caban
455b3d6df2 [LLD][COFF] Separate EC and native exports for ARM64X (#123652)
Store exports in SymbolTable instead of Configuration.
2025-01-21 10:41:15 +01:00
Jacek Caban
b068f2fd0f [LLD][COFF] Process bitcode files separately for each symbol table on ARM64X (#123194) 2025-01-17 11:36:12 +01:00
Jacek Caban
1bd5f34d76 [LLD][COFF] Move getChunk to LinkerDriver (NFC) (#123103)
The `getChunk` function returns all chunks, not just those specific to a
symbol table. Move it out of the `SymbolTable` class to clarify its
scope.
2025-01-16 12:55:12 +01:00
Jacek Caban
f22af59336 [LLD][COFF] Move symbol mangling and lookup helpers to SymbolTable class (NFC) (#122836)
This refactor prepares for further ARM64X hybrid support, where these
helpers will need to work with either the native or EC symbol table
based on context.
2025-01-15 15:21:06 +01:00
Jacek Caban
251ef3f503 [LLD][COFF] Use appropriate symbol table for -include argument on ARM64X (#122554)
Move `LinkerDriver::addUndefined` to `SymbolTable` to allow its use with
both symbol tables on ARM64X and rename it to `addGCRoot` to clarify its
distinct role compared to the existing `SymbolTable::addUndefined`.

Command-line `-include` arguments now apply to the EC symbol table, with
`mainSymtab` introduced in `linkerMain`. There will be more similar
cases. For `.drectve` sections, the corresponding symbol table is used
based on the context.
2025-01-13 23:16:57 +01:00
Jacek Caban
1fa0302ba2 [LLD][COFF] Emit warnings for missing load config on EC targets (#121339)
ARM64EC and ARM64X images require a load configuration to be valid.
2025-01-02 12:06:58 +01:00
Jacek Caban
8435225374 [LLD][COFF] Move addFile implementation to LinkerDriver (NFC) (#121342)
The addFile implementation does not rely on the SymbolTable object. With
#119294, the symbol table for input files is determined during the
construction of the objects representing them. To clarify that
relationship, this change moves the implementation from the SymbolTable
class to the LinkerDriver class.
2025-01-01 19:42:49 +01:00
Jacek Caban
ff29f38c02 [LLD][COFF] Store and validate load config in SymbolTable (#120324)
Improve diagnostics for invalid load configurations.
2024-12-29 11:43:45 +01:00
Nico Weber
f8bcd93224 [lld/COFF] Fix -start-lib / -end-lib after reviews.llvm.org/D116434 (#120452)
That change forgot to set `lazy` to false before calling `addFile()` in
`forceLazy()`. As a result, `addFile()` added the file we want to
force-load as a lazy object again instead of adding it to
`ctx.objFileInstances`.

This is caught by a pretty simple test (included).
2024-12-19 11:30:54 -05:00
Nico Weber
cde996c31d [lld/COFF] Remove needless indirection
`symtab.ctx.symtab` is just `symtab`. Looks like #119296 added
this using a global find-and-replace.

This was the only instance of `symtab.ctx.symtab` in lld/.

No behavior change.
2024-12-17 16:27:16 -05:00
Jacek Caban
d3c4857179 [LLD][COFF] Store machine type in SymbolTable (NFC) (#119298)
This change prepares for hybrid ARM64X support, which requires two
`SymbolTable` instances: one for native symbols and one for EC symbols.
In such cases, `config.machine` will remain ARM64X, while the
`SymbolTable` instances will store ARM64 and ARM64EC machine types.
2024-12-15 18:43:09 +01:00
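The arrangement that d3c4857179 prepares for might look roughly like the sketch below. All names (`MachineType`, `SymbolTableModel`, `LinkContext`, `tableFor`) are hypothetical, and routing x86_64 inputs to the EC table is an assumption made for illustration, not something this commit message states.

```cpp
#include <cassert>

// Hypothetical machine kinds; LLD uses its own machine type enumeration.
enum class MachineType { ARM64, ARM64EC, ARM64X, AMD64 };

// Hypothetical stand-in for a symbol table that remembers its machine type.
struct SymbolTableModel {
  MachineType machine;
};

// The image-wide machine stays ARM64X while each table keeps its own type.
struct LinkContext {
  MachineType imageMachine = MachineType::ARM64X;
  SymbolTableModel native{MachineType::ARM64};
  SymbolTableModel ec{MachineType::ARM64EC};

  // Assumption: native ARM64 inputs use the native table, everything else
  // (ARM64EC and x86_64 inputs) uses the EC table.
  SymbolTableModel &tableFor(MachineType inputMachine) {
    return inputMachine == MachineType::ARM64 ? native : ec;
  }
};
```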
Jacek Caban
0a9810d325 [LLD][COFF] Factor out LinkerDriver::setMachine (NFC) (#119297) 2024-12-15 18:41:26 +01:00
Jacek Caban
6b493baec1 [LLD][COFF] Store reference to SymbolTable instead of COFFLinkerContext in InputFile (NFC) (#119296)
This change prepares for the introduction of separate hybrid namespaces.
Hybrid images will require two `SymbolTable` instances, making it
necessary to associate `InputFile` objects with the relevant one.
2024-12-15 12:45:34 +01:00
Fangrui Song
c7caab2238 [lld-link] Simplify some << toString 2024-12-05 20:56:19 -08:00
Fangrui Song
983f88c1ec [lld-link] Use COFFSyncStream
Add an operator<< overload for Symbol *.
2024-12-05 20:41:37 -08:00
Fangrui Song
8d225f10ef [lld-link] Replace error(...) with Err 2024-12-05 19:44:26 -08:00
Fangrui Song
4639a9a063 [lld-link] Replace log(...) with Log 2024-12-04 09:04:40 -08:00
Fangrui Song
1534f45694 [lld-link] Replace warn(...) with Warn(ctx) 2024-12-03 22:19:30 -08:00
Jacek Caban
f942949a7c [LLD][COFF] Require explicit specification of ARM64EC target (#116281)
Inferring the ARM64EC target can lead to errors. The `-machine:arm64ec`
option may include x86_64 input files, and any valid ARM64EC input is
also valid for `-machine:arm64x`. MSVC requires an explicit `-machine`
argument with informative diagnostics; this patch adopts the same
behavior.
2024-11-24 14:33:14 +01:00
Jacek Caban
cdda76a8cf [LLD][COFF] Fix handling of invalid ARM64EC function names (#116252)
Since these symbols cannot be mangled or demangled, there is no symbol
to check for conflicts in `checkLazyECPair`, nor is there an alias to
create in `addUndefined`. Attempting to create an import library with
such symbols results in an error; the patch includes a test to ensure
the error is handled correctly.

This is a follow-up to #115567.
2024-11-15 16:42:36 +01:00
Jacek Caban
56077e5ac0 [LLD][COFF] Add support for locally imported EC symbols (#114985)
Allow imported symbols to be recognized in both mangled and demangled
forms. Support __imp_aux_ symbols in addition to __imp_ symbols.
2024-11-06 12:09:22 +01:00
Jacek Caban
98bc5295ec [LLD][COFF] Check both mangled and demangled symbols before adding a lazy archive symbol to the symbol table on ARM64EC (#113284)
On ARM64EC, a function symbol may appear in both mangled and demangled
forms:
- ARM64EC archives contain only the mangled name, while the demangled
symbol is defined by the object file as an alias.
- x86_64 archives contain only the demangled name (the mangled name is
usually defined by an object referencing the symbol as an alias to a
guest exit thunk).
- ARM64EC import files contain both the mangled and demangled names for
thunks.

If more than one archive defines the same function, this could lead to
different libraries being used for the same function depending on how
they are referenced. Avoid this by checking if the paired symbol is
already defined before adding a symbol to the table.
2024-10-23 13:10:07 +02:00
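The paired-name check from 98bc5295ec can be sketched as follows. This is a simplification under stated assumptions: `pairedName` handles only C function names (where the mangled form of `func` is `#func`), the symbol table is reduced to a set of known names, and `addLazyArchiveSymbol` is a hypothetical helper rather than LLD's real interface.

```cpp
#include <cassert>
#include <set>
#include <string>

// Simplified pairing for C function names only: the ARM64EC mangled form of
// `func` is `#func`. C++ names use a different scheme that is not modeled here.
inline std::string pairedName(const std::string &name) {
  if (!name.empty() && name[0] == '#')
    return name.substr(1);
  return "#" + name;
}

// Hypothetical check: only add a lazy archive symbol when neither it nor its
// paired form is already present, so one library ends up supplying both names.
inline bool addLazyArchiveSymbol(std::set<std::string> &table,
                                 const std::string &name) {
  if (table.count(name) || table.count(pairedName(name)))
    return false;
  table.insert(name);
  return true;
}
```

If one archive has already contributed `#func`, a later archive offering `func` is rejected, so both spellings resolve to the same library.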
Jacek Caban
9b88792291 [LLD][COFF] Allow overriding EC alias symbols with lazy archive symbols (#113283)
On ARM64EC, external function calls emit a pair of weak-dependency
aliases: `func` to `#func` and `#func` to the `func` guest exit thunk
(instead of a single undefined `func` symbol, which would be emitted on
other targets). Allow such aliases to be overridden by lazy archive
symbols, just as we would for undefined symbols.
2024-10-23 12:43:38 +02:00
Jacek Caban
f1ba8943c8 [LLD][COFF] Support anti-dependency symbols (#112542)
Co-authored-by: Billy Laws <blaws05@gmail.com>

Anti-dependency symbols are allowed to be duplicated, with the first
definition taking precedence. If a regular weak alias is present, it is
preferred over an anti-dependency definition. Chaining anti-dependencies
is not allowed.
2024-10-21 11:44:31 +02:00
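The precedence rules in f1ba8943c8 can be illustrated with a small model. `WeakAlias`, `AliasTable`, `addAlias`, and `resolve` are hypothetical names, and the sketch deliberately leaves out cases the message does not cover (such as two regular weak aliases for the same name, or longer chains of regular aliases).

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical record of a weak external: its target and whether the alias
// was declared as an anti-dependency.
struct WeakAlias {
  std::string target;
  bool antiDep;
};

using AliasTable = std::map<std::string, WeakAlias>;

// Record an alias definition following the precedence rules above.
inline void addAlias(AliasTable &table, const std::string &name,
                     const WeakAlias &incoming) {
  auto it = table.find(name);
  if (it == table.end()) {
    table.emplace(name, incoming);
    return;
  }
  // A regular weak alias replaces an earlier anti-dependency definition.
  if (it->second.antiDep && !incoming.antiDep)
    it->second = incoming;
  // Otherwise the existing entry is kept, so the first anti-dependency wins.
}

// Look up the target of a name. An anti-dependency may not point at another
// anti-dependency, so such a chain yields an empty result.
inline std::string resolve(const AliasTable &table, const std::string &name) {
  auto it = table.find(name);
  if (it == table.end())
    return name;
  auto next = table.find(it->second.target);
  if (it->second.antiDep && next != table.end() && next->second.antiDep)
    return "";
  return it->second.target;
}
```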
Kazu Hirata
3c2e1d3a00 [lld] Avoid repeated hash lookups (NFC) (#112299) 2024-10-15 07:35:42 -07:00
Mike Hommey
6a1bdd9a2e [LLD][COFF] Do as many passes of resolveRemainingUndefines as necessary for undefined lazy symbols (#109082) 2024-10-03 22:53:26 +03:00
Jacek Caban
86d2abefcb [LLD][COFF] Store __imp_ symbols as Defined in InputFile (#109115) 2024-09-18 19:49:06 +02:00
Mike Hommey
5e23b66699 [LLD][COFF] Handle imported weak aliases consistently (#109105)
symTab being a DenseMap, the order in which a symbol and its
corresponding import symbol are processed is not guaranteed, and when
the latter comes first, it is left undefined.
2024-09-18 14:42:42 +03:00
Jacek Caban
6ca5c397a9 [LLD][COFF] Redirect __imp_ Symbols to __imp_aux_ on ARM64EC for x64 object files (#108608)
On ARM64EC, __imp_ symbols reference the auxiliary IAT, while __imp_aux_
symbols reference the regular IAT. However, x86_64 code expects both to
reference the regular IAT. This change adjusts the symbols accordingly,
matching the behavior observed in the MSVC linker.
2024-09-17 00:01:17 +02:00
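The mapping described in 6ca5c397a9 can be written down as a small classifier. The `Iat` enum and `importTableFor` function are hypothetical and only encode the rule stated in the message; they do not reflect how LLD represents import symbols internally.

```cpp
#include <cassert>
#include <string>

enum class Iat { Regular, Auxiliary, NotAnImport };

// Hypothetical classifier for import symbol names in an ARM64EC link.
// `fromX64Object` says whether the referencing object file is x86_64.
inline Iat importTableFor(const std::string &symbol, bool fromX64Object) {
  const std::string aux = "__imp_aux_";
  const std::string imp = "__imp_";
  if (symbol.compare(0, aux.size(), aux) == 0)
    return Iat::Regular; // __imp_aux_ refers to the regular IAT in both cases
  if (symbol.compare(0, imp.size(), imp) == 0)
    return fromX64Object ? Iat::Regular : Iat::Auxiliary;
  return Iat::NotAnImport;
}
```

The `__imp_aux_` prefix is tested first because it also begins with `__imp_`.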
JOE1994
4b27b5800f [lld] Nits on uses of raw_string_ostream (NFC)
* Don't call raw_string_ostream::flush(), which is essentially a no-op.
* Strip calls to raw_string_ostream::str(), to avoid excess layer of indirection.
2024-09-15 04:23:11 -04:00
Jacek Caban
82a36468c7 [LLD][COFF] Add support for ARM64EC auxiliary IAT (#108304)
In addition to the regular IAT, ARM64EC also includes an auxiliary IAT.
At runtime, the regular IAT is populated with the addresses of imported
functions, which may be x86_64 functions or the export thunks of ARM64EC
functions. The auxiliary IAT contains versions of functions that are
guaranteed to be directly callable by ARM64 code.

The linker fills the auxiliary IAT with the addresses of `__impchk_`
thunks. These thunks perform a call on the IAT address using
`__icall_helper_arm64ec` with the target address from the IAT. If the
imported function is an ARM64EC function, the OS may replace the address
in the auxiliary IAT with the address of the ARM64EC version of the
function (not its export thunk), avoiding the runtime call checker for
better performance.
2024-09-12 22:20:50 +02:00
Jacek Caban
99a2354993 [LLD][COFF] Add support for ARM64EC import call thunks. (#107931)
These thunks can be accessed using `__impchk_*` symbols, though they
are typically not called directly. Instead, they are used to populate the
auxiliary IAT. When the imported function is x86_64 (or an ARM64EC
function with a patched export thunk), the thunk is used to call it.
Otherwise, the OS may replace the thunk at runtime with a direct
pointer to the ARM64EC function to avoid the overhead.
2024-09-11 14:46:40 +02:00
Jacek Caban
7e0008d5ad [LLD][COFF][NFC] Create import thunks in ImportFile::parse. (#107929) 2024-09-11 12:22:36 +02:00
Jacek Caban
519b36925c [LLD][COFF][NFC] Store impSym as DefinedImportData in ImportFile. (#107162) 2024-09-04 11:49:50 +02:00
Jacek Caban
ec4d5a6658 [LLD][COFF] Preserve original symbol name when resolving weak aliases. (#105897)
Instead of replacing it with the target's name.
2024-08-26 19:20:18 +02:00
Jacek Caban
a2d8743cc8 [LLD][COFF] Generate X64 thunks for ARM64EC entry points and patchable functions. (#105499)
This implements Fast-Forward Sequences, documented in the ARM64EC
ABI (https://learn.microsoft.com/en-us/windows/arm/arm64ec-abi).

There are two cases in which the linker should generate such thunks:

- For each exported ARM64EC function.
It applies only to ARM64EC functions (we may also have pure x64
functions, for which no thunk is needed). MSVC linker creates
`EXP+<mangled export name>` symbol in those cases that points to the
thunk and uses that symbol for the export. It's observable from the
module: it's possible to reference such symbols as I did in the test.
Note that it uses export name, not name of the symbol that's exported
(as in `foo` in `/EXPORT:foo=bar`). This implies that if the same
function is exported multiple times, it will have multiple thunks. I
followed this MSVC behavior.

- For hybrid_patchable functions.
The linker tries to generate a thunk for each undefined `EXP+*` symbol
(and such symbols are created by the compiler as a target of weak alias
from the demangled name). The MSVC linker tries to find the corresponding
`*$hp_target` symbol and, if it fails to do so, outputs a cryptic error
like `LINK : fatal error LNK1000: Internal error during
IMAGE::BuildImage`. I just skip generating the thunk in such a case (which
causes an undefined reference error). The MSVC linker additionally checks that
the symbol complex type is a function (see also #102898). We generally
don't do such checks in LLD, so I made it less strict. It should be
fine: if it's some data symbol, it will not have `$hp_target` symbol, so
we will skip it anyway.
2024-08-22 22:03:05 +02:00
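The naming rule for export thunks in a2d8743cc8 can be sketched as below. `ExportSpec` and `exportThunkSymbol` are hypothetical, and prefixing the export name with `#` assumes a C function name; C++ names are mangled differently and are not modeled.

```cpp
#include <cassert>
#include <string>

// Hypothetical parsed form of an /EXPORT:exportName=symbolName argument.
struct ExportSpec {
  std::string exportName; // `foo` in /EXPORT:foo=bar
  std::string symbolName; // `bar` in /EXPORT:foo=bar
};

// Build the thunk symbol name from the export name, not the symbol name.
// The "#" prefix assumes a C function name (simplified ARM64EC mangling).
inline std::string exportThunkSymbol(const ExportSpec &spec) {
  return "EXP+#" + spec.exportName;
}
```

Because the name is derived from the export name, exporting `bar` as both `foo` and `baz` yields two distinct thunk symbols, matching the multiple-thunk behavior the message describes.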
Nikita Popov
49ae2dcf36 [PassManager] Remove some unnecessary includes (NFC) (#96175)
SmallPtrSet.h and TimeProfiler.h are unused. CommandLine.h is only
needed for the UseNewDbgInfoFormat declaration, which can be moved to the
places that need it.
2024-06-20 17:41:35 +02:00
Jacek Caban
fed8e38c19 [LLD][COFF] Add support for ARM64EC entry thunks. (#88132)
For x86_64 callable functions, ARM64EC requires an entry thunk generated
by the compiler. The linker interprets .hybmp sections to associate
function chunks with their entry points and writes an offset to thunks
preceding function section contents.

Additionally, ICF needs to be aware of entry thunks to not consider
chunks to be equal when they have different entry thunks, and GC needs
to mark entry thunks together with function chunks.

I used a new SectionChunkEC class instead of storing entry thunks in
SectionChunk, following the guideline to keep SectionChunk as compact as
possible. This way, there is no memory usage increase on non-EC targets.
2024-06-18 11:14:01 +02:00
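The ICF constraint mentioned in fed8e38c19 can be illustrated with the sketch below. `FuncChunk` and `canFold` are hypothetical, and comparing thunks by an integer identifier is a simplification chosen for illustration; LLD's real ICF compares chunks in its own way.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of a function chunk on an EC target. `entryThunkId`
// identifies the associated entry thunk, with 0 meaning "no entry thunk".
struct FuncChunk {
  std::string contents;
  int entryThunkId;
};

// Two chunks are folding candidates only when both their contents and their
// entry thunks match; identical code with different thunks stays separate.
inline bool canFold(const FuncChunk &a, const FuncChunk &b) {
  return a.contents == b.contents && a.entryThunkId == b.entryThunkId;
}
```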
Jacek Caban
8f9903db8a [LLD][COFF][NFC] Use getMachineArchType helper. (#87495)
It's similar to #87370, but for lld-link.
2024-04-04 14:41:50 +02:00
Martin Storsjö
89efffd463 [LTO] [LLD] Don't alias the __imp_func and func symbol resolutions (#71376)
Commit b963c0b658 fixed LTO compilation of
cases where one translation unit is calling a function with the
dllimport attribute, and another translation unit provides this function
locally within the same linked module (i.e. not actually dllimported);
see https://github.com/llvm/llvm-project/issues/37453 or
https://bugs.llvm.org/show_bug.cgi?id=38105 for full context.

This was fixed by aliasing their GlobalResolution structs, for the
`__imp_` prefixed and non prefixed symbols.

I believe this fix to be wrong.

This patch reverts that fix, and fixes the same issue differently,
within LLD instead.

The fix assumed that one can treat the `__imp_` prefixed and unprefixed
symbols as equal, referencing SVN r240620
(d766653534). However that referenced
commit had mistaken how this logic works, which was corrected later in
SVN r240622 (88e0f9206b); those symbols
aren't direct aliases for each other - but if there's a need for the
`__imp_` prefixed one and the other one exists, the `__imp_` prefixed
one is created, as a pointer to the other one.

However this fix only works if both translation units are compiled as
LTO; if the caller is compiled as a regular object file and the callee
is compiled as LTO, the fix fails, as the LTO compilation doesn't know
that the unprefixed symbol is needed.

The only level that knows of the potential relationship between the
`__imp_` prefixed and unprefixed symbol, across regular and bitcode
object files, is LLD itself.

Therefore, revert the original fix from
b963c0b658, and fix the issue differently
- when concluding that we can fulfill an undefined symbol starting with
`__imp_`, mark the corresponding non prefixed symbol as used in a
regular object for the LTO compilation, to make sure that this non
prefixed symbol exists after the LTO compilation, to let LLD do the
fixup of the local import.

Extend the testcase to test a regular object file calling an LTO object
file, which previously failed.

This change also fixes another issue; an object file can provide both
unprefixed and prefixed versions of the same symbol, like this:

    void importedFunc(void) {
    }
    void (*__imp_importedFunc)(void) = importedFunc;

That allows the function to be called both with and without dllimport
markings. (Automatically resolving a reference to `__imp_func` to a
locally defined `func` is only done in MSVC-style linkers, not in
GNU ld; MinGW-mode code therefore often uses this construct.)

Previously, the aliasing of global resolutions at the LTO level would
trigger a failed assert with "Multiple prevailing defs are not allowed"
for this case, as both `importedFunc` and `__imp_importedFunc` could be
prevailing. Add a case to the existing LLD test case lto-imp-prefix.ll
to test this as well.

This change (together with the previous change in
3ab6209a3f) makes LLD work with
mingw-w64-crt files (the base glue code for a mingw-w64 toolchain) built
with LTO.
2023-11-21 15:06:00 +02:00
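The LLD-side fix described in 89efffd463 can be sketched as follows. `SymState` and `resolveLocalImport` are hypothetical stand-ins, not LLD's real `Symbol` class or resolution code; the sketch only shows the idea of flagging the unprefixed symbol when an `__imp_` reference is satisfied locally.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-symbol state; LLD's real Symbol class differs.
struct SymState {
  bool defined;
  bool usedInRegularObj;
};

// Try to satisfy an undefined `__imp_<name>` from a locally defined `<name>`.
// On success the local symbol is flagged so the LTO step keeps it alive.
inline bool resolveLocalImport(std::map<std::string, SymState> &symtab,
                               const std::string &undefinedName) {
  const std::string prefix = "__imp_";
  if (undefinedName.compare(0, prefix.size(), prefix) != 0)
    return false;
  auto it = symtab.find(undefinedName.substr(prefix.size()));
  if (it == symtab.end() || !it->second.defined)
    return false;
  it->second.usedInRegularObj = true;
  return true;
}
```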
Martin Storsjö
7f9a0048fa [LLD] [COFF] Error out if new LTO objects are pulled in after the main LTO compilation (#71337)
Normally, this shouldn't happen. It can happen in exceptional
circumstances, if the compiled output of a bitcode object file
references symbols that weren't listed as undefined in the bitcode
object file itself.

This can at least happen in the following cases:
- A custom SEH personality is set via asm()
- Compiler generated calls to builtin helper functions, such as
__chkstk, or __rt_sdiv on arm

Both of these produce undefined references to symbols after compiling to
a regular object file, that aren't visible on the level of the IR object
file.

This is only an issue if the referenced symbols are provided as LTO
objects themselves; loading regular object files after the LTO
compilation works fine.

Custom SEH personalities are rare, but one CRT startup file in mingw-w64
does this. The referenced personality function is usually provided via an
import library, but for WinStore targets, a local dummy reimplementation
in C is used, which can be an LTO object.

Generated calls to builtins are very common, but the builtins aren't
usually provided as LTO objects (compiler-rt's builtins explicitly pass
-fno-lto when building), and many of the builtins are provided as raw .S
assembly files, which don't get built as LTO objects anyway, even if
built with -flto.

If this unusual but possible situation is hit, error out cleanly with
a clear message rather than crashing.
2023-11-07 11:49:40 +02:00