[lldb] fix parallel module loading deadlock for Linux DYLD (#166480)

Another attempt at resolving the deadlock issue @GeorgeHuyubo discovered
(his previous
[attempt](https://github.com/llvm/llvm-project/pull/160225)).

This change can be summarized as the following:
* Plumb through a boolean flag to force no preload in
`GetOrCreateModules` all the way through to `LoadModuleAtAddress`.
* Parallelize `Module::PreloadSymbols` separately from
`Target::GetOrCreateModule` and its caller `LoadModuleAtAddress` (this
is what avoids the deadlock).

These changes roughly maintain the performance characteristics of the
previous implementation of parallel module loading. Testing on targets
with between 5000 and 14000 modules, I saw similar numbers as before,
often more than 10% faster in the new implementation across multiple
trials for these massive targets. I think it's because we have less lock
contention with this approach.

# The deadlock

See [bt.txt](https://github.com/user-attachments/files/22524471/bt.txt)
for a sample backtrace of LLDB when the deadlock occurs.

As @GeorgeHuyubo explains in his
[PR](https://github.com/llvm/llvm-project/pull/160225), the deadlock
occurs from an ABBA deadlock that happens when a thread context-switches
out of `Module::PreloadSymbols`, goes into `Target::GetOrCreateModule`
for another module, possibly entering this block:
```
      if (!module_sp) {
        // The platform is responsible for finding and caching an appropriate
        // module in the shared module cache.
        if (m_platform_sp) {
          error = m_platform_sp->GetSharedModule(
              module_spec, m_process_sp.get(), module_sp, &search_paths,
              &old_modules, &did_create_module);
        } else {
          error = Status::FromErrorString("no platform is currently set");
        }
      }
```
`Module::PreloadSymbols` holds a module-level mutex, and then
`GetSharedModule` *attempts* to hold the mutex of the global shared
`ModuleList`. So, this thread holds the module mutex, and waits on the
global shared `ModuleList` mutex.

A competing thread may execute `Target::GetOrCreateModule`, enter the
same block as above, grabbing the global shared `ModuleList` mutex.
Then, in `ModuleList::GetSharedModule`, we eventually call
`ModuleList::FindModules` which eventually waits for the `Module` mutex
held by the first thread (via `Module::GetUUID`). Thus, we deadlock.

## Reproducing the deadlock

It might be worth noting that I've never been able to observe this
deadlock issue during live debugging (e.g. launching or attaching to
processes), however we were able to consistently reproduce this issue
with coredumps when using the following settings:
```
(lldb) settings set target.parallel-module-load true
(lldb) settings set target.preload-symbols true
(lldb) settings set symbols.load-on-demand false
(lldb) target create --core /some/core/file/here
# deadlock happens
```

## How this change avoids this deadlock

This change avoids concurrent executions of `Module::PreloadSymbols`
with `Target::GetOrCreateModule` by waiting until after the
`Target::GetOrCreateModule` executions to run `Module::PreloadSymbols`
in parallel. This avoids the ordering of holding a Module lock *then*
the ModuleList lock, as `Target::GetOrCreateModule` executions maintain
the ordering of the shared ModuleList lock first (from what I've read
and tested).

## Why not read-write lock?

Some feedback in https://github.com/llvm/llvm-project/pull/160225 was to
modify mutexes used in these components with read-write locks. This
might be a good idea overall, but I don't think it would *easily*
resolve this specific deadlock. `Module::PreloadSymbols` would probably
need a write lock to Module, so even if we had a read lock in
`Module::GetUUID` we would still contend. Maybe the `ModuleList` lock
could be a read lock that converts to a write lock if it chooses to
update the module, but it seems likely that some thread would try to
update the shared module list and then the write lock would contend
again.

Perhaps with deeper architectural changes, we could fix this issue?

# Other attempts

One downside of this approach (and the former approach of parallel
module loading) is that each DYLD would need to implement this pattern
themselves. With @clayborg's help, I looked at a few other approaches:
* In `Target::GetOrCreateModule`, backgrounding the
`Module::PreloadSymbols` call by adding it directly to the thread pool
via `Debugger::GetThreadPool().async()`. This required adding a lock to
`Module::SetLoadAddress` (probably should be one there already) since
`ObjectFileELF::SetLoadAddress` is not thread-safe (updates sections).
Unfortunately, during execution, this causes the preload symbols to run
synchronously with `Target::GetOrCreateModule`, preventing us from truly
parallelizing the execution.
* In `Module::PreloadSymbols`, backgrounding the `symtab` and `sym_file`
`PreloadSymbols` calls individually, but similar issues as the above.
* Passing a callback function like
https://github.com/swiftlang/llvm-project/pull/10746 instead of the
boolean I use in this change. It's functionally the same change IMO,
with some design tradeoffs:
* Pro: the caller doesn't need to explicitly call
`Module::PreloadSymbols` itself, and can instead call whatever function
is passed into the callback.
* Con: the caller needs to delay the execution of the callback such that
it occurs after the `GetOrCreateModule` logic, otherwise we run into the
same issue. I thought this would be trickier for the caller, requiring
some kinda condition variable or otherwise storing the calls to execute
afterwards.

# Test Plan:
```
ninja check-lldb
```

---------

Co-authored-by: Tom Yang <toyang@fb.com>
This commit is contained in:
Tom Yang
2025-11-14 15:58:43 -08:00
committed by GitHub
parent e02fdf0fce
commit 66d5f6a605
7 changed files with 50 additions and 9 deletions

View File

@@ -511,6 +511,12 @@ public:
/// Atomically swaps the contents of this module list with \a other.
void Swap(ModuleList &other);
/// For each module in this ModuleList, preload its symbols.
///
/// \param[in] parallelize
/// If true, all modules will be preloaded in parallel.
void PreloadSymbols(bool parallelize) const;
protected:
// Class typedefs.
typedef std::vector<lldb::ModuleSP>

View File

@@ -352,6 +352,7 @@ public:
protected:
// Utility methods for derived classes
/// Find a module in the target that matches the given file.
lldb::ModuleSP FindModuleViaTarget(const FileSpec &file);
/// Checks to see if the target module has changed, updates the target

View File

@@ -629,13 +629,20 @@ public:
/// or identify a matching Module already present in the Target,
/// and return a shared pointer to it.
///
/// Note that this function previously also preloaded the module's symbols
/// depending on a setting. This function no longer does any module
/// preloading because that can potentially cause deadlocks when called in
/// parallel with this function.
///
/// \param[in] module_spec
/// The criteria that must be matched for the binary being loaded.
/// e.g. UUID, architecture, file path.
///
/// \param[in] notify
/// If notify is true, and the Module is new to this Target,
/// Target::ModulesDidLoad will be called.
/// Target::ModulesDidLoad will be called. See note in
/// Target::ModulesDidLoad about thread-safety with
/// Target::GetOrCreateModule.
/// If notify is false, it is assumed that the caller is adding
/// multiple Modules and will call ModulesDidLoad with the
/// full list at the end.
@@ -931,6 +938,13 @@ public:
// the address of its previous instruction and return that address.
lldb::addr_t GetBreakableLoadAddress(lldb::addr_t addr);
/// This call may preload module symbols, and may do so in parallel depending
/// on the following target settings:
/// - TargetProperties::GetPreloadSymbols()
/// - TargetProperties::GetParallelModuleLoad()
///
/// Warning: if preloading is active and this is called in parallel with
/// Target::GetOrCreateModule, this may result in a ABBA deadlock situation.
void ModulesDidLoad(ModuleList &module_list);
void ModulesDidUnload(ModuleList &module_list, bool delete_locations);

View File

@@ -165,7 +165,8 @@ ModuleSP DynamicLoader::FindModuleViaTarget(const FileSpec &file) {
if (ModuleSP module_sp = target.GetImages().FindFirstModule(module_spec))
return module_sp;
if (ModuleSP module_sp = target.GetOrCreateModule(module_spec, false))
if (ModuleSP module_sp =
target.GetOrCreateModule(module_spec, /*notify=*/false))
return module_sp;
return nullptr;

View File

@@ -7,6 +7,7 @@
//===----------------------------------------------------------------------===//
#include "lldb/Core/ModuleList.h"
#include "lldb/Core/Debugger.h"
#include "lldb/Core/Module.h"
#include "lldb/Core/ModuleSpec.h"
#include "lldb/Core/PluginManager.h"
@@ -28,6 +29,7 @@
#include "lldb/Utility/Log.h"
#include "lldb/Utility/UUID.h"
#include "lldb/lldb-defines.h"
#include "llvm/Support/ThreadPool.h"
#if defined(_WIN32)
#include "lldb/Host/windows/PosixApi.h"
@@ -1381,3 +1383,21 @@ void ModuleList::Swap(ModuleList &other) {
m_modules_mutex, other.m_modules_mutex);
m_modules.swap(other.m_modules);
}
void ModuleList::PreloadSymbols(bool parallelize) const {
std::lock_guard<std::recursive_mutex> guard(m_modules_mutex);
if (!parallelize) {
for (const ModuleSP &module_sp : m_modules)
module_sp->PreloadSymbols();
return;
}
llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool());
for (const ModuleSP &module_sp : m_modules)
task_group.async([module_sp] {
if (module_sp)
module_sp->PreloadSymbols();
});
task_group.wait();
}

View File

@@ -469,7 +469,8 @@ void DynamicLoaderPOSIXDYLD::RefreshModules() {
}
ModuleSP module_sp = LoadModuleAtAddress(
so_entry.file_spec, so_entry.link_addr, so_entry.base_addr, true);
so_entry.file_spec, so_entry.link_addr, so_entry.base_addr,
/*base_addr_is_offset=*/true);
if (!module_sp.get())
return;
@@ -726,9 +727,8 @@ void DynamicLoaderPOSIXDYLD::LoadAllCurrentModules() {
task_group.async(load_module_fn, *I);
task_group.wait();
} else {
for (I = m_rendezvous.begin(), E = m_rendezvous.end(); I != E; ++I) {
for (I = m_rendezvous.begin(), E = m_rendezvous.end(); I != E; ++I)
load_module_fn(*I);
}
}
m_process->GetTarget().ModulesDidLoad(module_list);

View File

@@ -1855,6 +1855,9 @@ void Target::NotifyModulesRemoved(lldb_private::ModuleList &module_list) {
}
void Target::ModulesDidLoad(ModuleList &module_list) {
if (GetPreloadSymbols())
module_list.PreloadSymbols(GetParallelModuleLoad());
const size_t num_images = module_list.GetSize();
if (m_valid && num_images) {
for (size_t idx = 0; idx < num_images; ++idx) {
@@ -2509,10 +2512,6 @@ ModuleSP Target::GetOrCreateModule(const ModuleSpec &orig_module_spec,
if (symbol_file_spec)
module_sp->SetSymbolFileFileSpec(symbol_file_spec);
// Preload symbols outside of any lock, so hopefully we can do this for
// each library in parallel.
if (GetPreloadSymbols())
module_sp->PreloadSymbols();
llvm::SmallVector<ModuleSP, 1> replaced_modules;
for (ModuleSP &old_module_sp : old_modules) {
if (m_images.GetIndexForModule(old_module_sp.get()) !=