This patch improves the way lldb checks if the terminal it's opened in
(if any) supports Unicode or not.
On POSIX systems, we check if `LANG` contains `UTF-8`.
On Windows, we always return `true` since we use the `WriteToConsoleW`
api.
This is a relanding of https://github.com/llvm/llvm-project/pull/168603.
The tests failed because the bots support Unicode but the tests expect
ASCII. To avoid different outputs depending on the environment the tests
are running in, this patch always force ASCII in the tests.
When you run an expression and the result has a dynamic type that is
different from the expression's static result type, we print the result
variable using the dynamic type, but at present when you use the result
variable in an expression later on, we only give you access to the
static type. For instance:
```
(lldb) expr MakeADerivedReportABase()
(Derived *) $0 = 0x00000001007e93e0
(lldb) expr $0->method_from_derived()
^
error: no member named 'method_from_derived' in 'Base'
(lldb)
```
The static return type of that function is `Base *`, but we printed that
the result was a `Derived *` and then only used the `Base *` part of it
in subsequent expressions. That's not very helpful, and forces you to
guess and then cast the result types to their dynamic type in order to
be able to access the full type you were returned, which is
inconvenient.
This patch makes lldb retain the dynamic type of the result variable
(and ditto for persistent result variables).
It also adds more testing of expression result variables with various
types of dynamic values, to ensure we can access both the ivars and
methods of the type we print the result as.
The ObjectFile plugin interface accepts an optional DataBufferSP
argument. If the caller has the contents of the binary, it can provide
this in that DataBufferSP. The ObjectFile subclasses in their
CreateInstance methods will fill in the DataBufferSP with the actual
binary contents if it is not set.
ObjectFile base class creates an ivar DataExtractor from the
DataBufferSP passed in.
My next patch will be a caller that creates a VirtualDataExtractor with
the binary data, and needs to pass that in to the ObjectFile plugin,
instead of the bag-of-bytes DataBufferSP. It builds on the previous
patch changing ObjectFile's ivar from DataExtractor to DataExtractorSP
so I could pass in a subclass in the shared ptr. And it will be using
the VirtualDataExtractor that Jonas added in
https://github.com/llvm/llvm-project/pull/168802
No behavior is changed by the patch; we're simply moving the creation of
the DataExtractor to the caller, instead of a DataBuffer that is
immediately used to set up the ObjectFile DataExtractor. The patch is a
bit complicated because all of the ObjectFile subclasses have to
initialize their DataExtractor to pass in to the base class.
I ran the testsuite on macOS and on AArch64 Ubutnu. (btw David, I ran it
under qemu on my M4 mac with SME-no-SVE again, Ubuntu 25.10, checked
lshw(1) cpu capabilities, and qemu doesn't seem to be virtualizing the
SME, that explains why the testsuite passes)
rdar://148939795
---------
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
Correct the use_editline check in IOHandlerEditline to prevent a crash
when we have an output and/or error file, but no stream. This fixes a
regression introduced by 58279d1 that results in a crash when calling
el_init with a NULL stream.
The original code was checking the stream: GetOutputFILE and
GetErrorFILE.
```
use_editline = GetInputFILE() && GetOutputFILE() && GetErrorFILE() &&
m_input_sp && m_input_sp->GetIsRealTerminal();
```
The new code is checking the file: `m_output_sp` and `m_error_sp`.
```
use_editline = m_input_sp && m_output_sp && m_error_sp &&
m_input_sp->GetIsRealTerminal();
```
The correct check is:
```
use_editline =
m_input_sp && m_input_sp->GetIsRealTerminal() &&
m_output_sp && m_output_sp->GetUnlockedFile().GetStream() &&
m_error_sp && m_error_sp->GetUnlockedFile().GetStream();
```
We don't need to update the check for the input, because we're handling
the missing stream there correctly in the call to the constructor:
```
m_editline_up = std::make_unique<Editline>(
editline_name, m_input_sp ? m_input_sp->GetStream() : nullptr,
m_output_sp, m_error_sp, m_color);
```
We can't do the same for the output and error because we need to pass
the file, not the stream (to do proper locking).
As I was debugging this I added some more assertions. They're generally
useful so I'm keeping them.
Fixes#170891
This reverts commit d20d84fec5.
Fixes#169788, but in a different way.
In which I changed an SBError use so that when the function succeeded,
IsValid on the SBError would be true.
This seemed to make sense but SBError acts differently to other SB
classes in this respect. For something like SBMemoryRegionInfo, if
IsValid() is false, you can't do anything with it.
However for SBError, IsValid() true only means there's some underlying
error object in there. If the SBError represents a success, there's no
need to put anything in there.
You can see this intent from a lot of its methods, many have handling
for IsValid() false.
This is not a bug but a misunderstanding of what IsValid means. Yes it
does function the way I expected for most classes, but it does not for
SBError and though that's not intuitive, it is consistent with how we
describe IsValid in the documentation.
So instead of changing that method's use of SBError I'm documenting this
initially counterintuitive behaviour in the SBError header and on the
website (https://lldb.llvm.org/resources/sbapi.html).
Remove `CommandReturnObject::AppendRawError` and replace its two uses
with `AppendError`, which correctly prefixes the message with `error:`.
The comment for the method is outdated and the prefixing is clearly
desired in both situations.
Add a verbose option to the version command and include the "build
configuration" in the command output. This allows users to quickly
identify if their version of LLDB was built with support for
xml/curl/python/lua etc. This data is already available through the SB
API using SBDebugger::GetBuildConfiguration, but this makes it more
discoverable.
```
(lldb) version -v
lldb version 22.0.0git (git@github.com:llvm/llvm-project.git revision 21a2aac5e5456f9181384406f3b3fcad621a7076)
clang revision 21a2aac5e5456f9181384406f3b3fcad621a7076
llvm revision 21a2aac5e5456f9181384406f3b3fcad621a7076
editline_wchar: yes
lzma: yes
curses: yes
editline: yes
fbsdvmcore: yes
xml: yes
lua: yes
python: yes
targets: [AArch64, AMDGPU, ARM, AVR, BPF, Hexagon, Lanai, LoongArch, Mips, MSP430, NVPTX, PowerPC, RISCV, Sparc, SPIRV, SystemZ, VE, WebAssembly, X86, XCore]
curl: yes
```
Resolves#170727
Inside the LLVM codebase, const vector& should just be ArrayRef, as this
more general API works both with vectors, SmallVectors and
SmallVectorImpl, as well as with single elements.
This commit replaces two uses introduced in
https://github.com/llvm/llvm-project/pull/168797 .
Scripted frames that materialize Python functions or other non-native
code are PC-less by design, meaning they don't have valid program
counter values. Previously, these frames would display invalid addresses
(`0xffffffffffffffff`) in backtrace output.
This patch updates `FormatEntity` to detect and suppress invalid address
display for PC-less frames, adds fallback to frame methods when symbol
context is unavailable, and modifies `StackFrame::GetSymbolContext` to
skip PC-based symbol resolution for invalid addresses.
The changes enable PC-less frames to display cleanly with proper
function names, file paths, and line numbers, and allow for source
display of foreign sources (like Python). Includes comprehensive test
coverage demonstrating frames pointing to Python source files.
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
This method puts strings into the ConstString pool and vends them as
llvm::StringRefs. Most of the uses only require a `std::string` or a
`const char *`. This can be achieved without wasting memory.
Someone came up with a clever idea for a new breakpoint type, but we
couldn't figure out how to add it in an ergonomic way because
`breakpoint set` has used up all the short-option characters. And even
if they did find a way to add it, the help for `break set` is so
confusing - because of the way it is implemented - that very few people
would detect the addition.
The basic problem is that `break set` distinguishes amongst the
fundamental breakpoint types it offers by which "required option" you
provide. If you pass a `-a` you are setting an address breakpoint, if
`-n`, `-F`, etc. a symbol name based breakpoint. And so forth. That is
however pretty hard to discern from the option grouping printing from
`help break set`. `break set` also suffers from the problem that it uses
common options in different ways depending on which "required" option is
present, which makes documenting the various behaviors difficult. And as
we run out of single letters it makes extending it difficult to
impossible.
This PR fixes that situation by adding a new command for adding
breakpoints - `break add`. The new command specifies the "breakpoint
types" as subcommands of `break add` rather than distinguishing them by
their being one of the "required" options the way `break set` does. That
both makes it much clearer what the breakpoint types actually are, and
means that the option set can be dedicated to that particular breakpoint
type, and so the help for each is less cluttered, and can be documented
properly for each usage.
Instead of trying to parse the meaning of:
```
(lldb) help break set
Sets a breakpoint or set of breakpoints in the executable.
Syntax: breakpoint set <cmd-options>
Command Options Usage:
breakpoint set [-DHd] -l <linenum> [-G <boolean>] [-C <command>] [-c <expr>] [-Y <source-language>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-u <column>] [-f <filename>] [-m <boolean>] [-s <shlib-name>] [-K <boolean>]
breakpoint set [-DHd] -a <address-expression> [-G <boolean>] [-C <command>] [-c <expr>] [-Y <source-language>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-N <breakpoint-name>] [-s <shlib-name>]
breakpoint set [-DHd] -n <function-name> [-G <boolean>] [-C <command>] [-c <expr>] [-Y <source-language>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-f <filename>] [-L <source-language>] [-s <shlib-name>] [-K <boolean>]
breakpoint set [-DHd] -F <fullname> [-G <boolean>] [-C <command>] [-c <expr>] [-Y <source-language>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-f <filename>] [-L <source-language>] [-s <shlib-name>] [-K <boolean>]
breakpoint set [-DHd] -S <selector> [-G <boolean>] [-C <command>] [-c <expr>] [-Y <source-language>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-f <filename>] [-L <source-language>] [-s <shlib-name>] [-K <boolean>]
breakpoint set [-DHd] -M <method> [-G <boolean>] [-C <command>] [-c <expr>] [-Y <source-language>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-f <filename>] [-L <source-language>] [-s <shlib-name>] [-K <boolean>]
breakpoint set [-DHd] -r <regular-expression> [-G <boolean>] [-C <command>] [-c <expr>] [-Y <source-language>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-f <filename>] [-L <source-language>] [-s <shlib-name>] [-K <boolean>]
breakpoint set [-DHd] -b <function-name> [-G <boolean>] [-C <command>] [-c <expr>] [-Y <source-language>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-f <filename>] [-L <source-language>] [-s <shlib-name>] [-K <boolean>]
breakpoint set [-ADHd] -p <regular-expression> [-G <boolean>] [-C <command>] [-c <expr>] [-Y <source-language>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-N <breakpoint-name>] [-f <filename>] [-m <boolean>] [-s <shlib-name>] [-X <function-name>]
breakpoint set [-DHd] -E <source-language> [-G <boolean>] [-C <command>] [-c <expr>] [-Y <source-language>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-N <breakpoint-name>] [-h <boolean>] [-w <boolean>]
breakpoint set [-DHd] -P <python-class> [-k <none>] [-v <none>] [-G <boolean>] [-C <command>] [-c <expr>] [-Y <source-language>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-N <breakpoint-name>] [-f <filename>] [-s <shlib-name>]
breakpoint set [-DHd] -y <linespec> [-G <boolean>] [-C <command>] [-c <expr>] [-Y <source-language>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-m <boolean>] [-s <shlib-name>] [-K <boolean>]
```
We instead offer:
```
(lldb) help break add
Commands to add breakpoints of various types
Syntax: breakpoint add
Access the breakpoint search kernels built into lldb. Along with specifying the
search kernel, each breakpoint add operation can specify a common set of
"reaction" options for each breakpoint. The reaction options can also be
modified after breakpoint creation using the "breakpoint modify" command.
The following subcommands are supported:
address -- Add breakpoints by raw address
exception -- Add breakpoints on language exceptions. If no language is specified, break on exceptions for all supported languages
file -- Add breakpoints on lines in specified source files
name -- Add breakpoints matching function or symbol names
pattern -- Add breakpoints matching patterns in the source text Expects 'raw' input (see 'help raw-input'.)
scripted -- Add breakpoints using a scripted breakpoint resolver.
For more help on any particular subcommand, type 'help <command> <subcommand>'.
```
The individual subcommand helps are also easier to read. They are still
a little too verbose because they all repeat the options for the
`reactions`. A general fix for our help system would be when a command
incorporates an OptionGroup whole into the command options, help would
show the option group name - which you could separately look up. But
even without that:
```
(lldb) help br a a
Add breakpoints by raw address
Syntax: breakpoint add address <cmd-options> <address> [<address> [...]]
Command Options Usage:
breakpoint add address [-DHde] [-G <boolean>] [-C <command>] [-c <expr>] [-Y <source-language>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-N <breakpoint-name>] [-s <shlib-name>] <address> [<address> [...]]
-C <command> ( --command <command> )
A command to run when the breakpoint is hit, can be provided more than once, the commands will be run in left-to-right order.
-D ( --dummy-breakpoints )
Act on Dummy breakpoints - i.e. breakpoints set before a file is provided, which prime new targets.
-G <boolean> ( --auto-continue <boolean> )
The breakpoint will auto-continue after running its commands.
-H ( --hardware )
Require the breakpoint to use hardware breakpoints.
-N <breakpoint-name> ( --breakpoint-name <breakpoint-name> )
Adds this name to the list of names for this breakpoint. Can be specified more than once.
-T <thread-name> ( --thread-name <thread-name> )
The breakpoint stops only for the thread whose thread name matches this argument.
-Y <source-language> ( --condition-language <source-language> )
Specifies the Language to use when executing the breakpoint's condition expression.
-c <expr> ( --condition <expr> )
The breakpoint stops only if this condition expression evaluates to true.
-d ( --disable )
Disable the breakpoint.
-e ( --enable )
Enable the breakpoint.
-i <count> ( --ignore-count <count> )
Set the number of times this breakpoint is skipped before stopping.
-o <boolean> ( --one-shot <boolean> )
The breakpoint is deleted the first time it stop causes a stop.
-q <queue-name> ( --queue-name <queue-name> )
The breakpoint stops only for threads in the queue whose name is given by this argument.
-s <shlib-name> ( --shlib <shlib-name> )
Set the breakpoint at an address relative to sections in this shared library.
-t <thread-id> ( --thread-id <thread-id> )
The breakpoint stops only for the thread whose TID matches this argument. The token 'current' resolves to the current thread's ID.
-x <thread-index> ( --thread-index <thread-index> )
The breakpoint stops only for the thread whose index matches this argument.
This command takes options and free-form arguments. If your arguments resemble option specifiers (i.e., they start with a - or --), you must use ' --
' between the end of the command options and the beginning of the arguments.
```
is pretty readable.
## Description
Contribution to this topic [Rich Disassembler for
LLDB](https://discourse.llvm.org/t/rich-disassembler-for-lldb/76952),
this part.
```
The rich disassembler output should be exposed as structured data and made available through LLDB’s scripting API so more tooling could be built on top of this
```
----
This pr introduces new method `AnnotateStructured` in
`VariableAnnotator` class, which returns the result as a vector of
`VariableAnnotation` structured data, compared to original `Annotate`.
Additionally structured data is enhanced with information inferred from
`DWARFExpressionEntry` and variable declaration data.
I have moved this part of functionality form a bigger pr
https://github.com/llvm/llvm-project/pull/165163 to make it easier to
review, deliver smaller chunk faster in an incremental way.
## Testing
Run test with
```sh
./build/bin/lldb-dotest -v -p TestVariableAnnotationsDisassembler.py lldb/test/API/functionalities/disassembler-variables
```
all tests (9 existing) are passing.
<details>
<summary>screenshot 2025-11-24</summary>
<img width="1344" height="875" alt="screenshot"
src="https://github.com/user-attachments/assets/863e0fca-1e3e-43dc-bfa3-4b78ce287ae6"
/>
</details>
<details>
<summary>screenshot 2025-11-26</summary>
<img width="1851" height="865" alt="image"
src="https://github.com/user-attachments/assets/d47dacee-a679-4a49-ab22-efb5a16fe29c"
/>
</details>
<details>
<summary>screenshot 2025-12-03</summary>
<img width="1592" height="922" alt="Screenshot From 2025-12-03 22-11-30"
src="https://github.com/user-attachments/assets/957ded3d-bea1-43d0-8241-d342dfc2c7b0"
/>
</details>
---------
Signed-off-by: Nikita B <n2h9z4@gmail.com>
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
Scripted frames that materialize Python functions are PC-less by design,
meaning they don't have valid address ranges. Previously,
LineEntry::IsValid()
required both a valid address range and a line number, preventing
scripted
frames from creating valid line entries for these synthetic stack
frames.
Relaxing this requirement is necessary since
`SBSymbolContext::SetLineEntry`
will first check if the LineEntry is valid and discard it otherwise.
This change introduces an `synthetic` flag that gets set when LineEntry
objects are created or modified through the SBAPI (specifically via
SetLine).
When this flag is set, IsValid() no longer requires a valid address
range,
only a valid line number.
This is risk-free because the flag is only set for LineEntry objects
created
through the SBAPI, which are primarily used by scripted processes and
frames.
Regular debug information-derived line entries continue to require valid
address ranges.
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
This patch improves the way lldb checks if the terminal it's opened in
(if any) supports Unicode or not.
On POSIX systems, we check if `LANG` contains `UTF-8`.
On Windows, we always return `true` since we use the `WriteToConsoleW`
api.
Some months ago, the LookupInfo constructor logic was refactored to not
depend on language specific logic, and use languages plugins instead. In
this refactor, when the language type is unknown, a single LookupInfo
object will handle multiple languages. This doesn't work well, as
multiple languages might want to configure the LookupInfo object in
different ways. For example, different languages might want to set the
m_lookup_name differently from each other, but the previous
implementation would pick the first name a language provided, and
effectively ignored every other language. Other fields of the LookupInfo
object are also configured in incompatible ways.
This approach doesn't seem to be a problem upstream, since only the
C++/Objective-C language plugins are available, but it broke downstream
on the Swift fork, as adding Swift to the list of default languages when
the language type is unknown breaks C++ tests.
This patch makes it so instead of building a single LookupInfo object
for multiple languages, one LookupInfo object is built per language
instead.
rdar://159531216
When the target is being created, the target list acquires the mutex for
the duration of the target creation process. However if a module
callback is enabled and is being called in parallel there exists an
opportunity to deadlock if the callback calls into targetlist. I've
created a minimum repro
[here](https://gist.github.com/Jlalond/2557e06fa09825f338eca08b1d21884f).
```
command script import dead-lock-example (from above gist)
...
target create a.out
[hangs]
```
This looks like a straight forward fix, where `CreateTargetInternal`
doesn't access any state directly, and instead calls methods which they
themselves are thread-safe. So I've moved the lock to when we update the
list with the created target. I'm not sure if this is a comprehensive
fix, but it does fix my above example and in my (albeit limited)
testing, doesn't cause any strange change in behavior.
Suppose two threads are performing the exact same step out plan. They
will both have an internal breakpoint set at their parent frame. Now
supposed both of those breakpoints are in the same address (i.e. the
same BreakpointSite).
At the end of `ThreadPlanStepOut::DoPlanExplainsStop`, we see this:
```
// If there was only one owner, then we're done. But if we also hit
// some user breakpoint on our way out, we should mark ourselves as
// done, but also not claim to explain the stop, since it is more
// important to report the user breakpoint than the step out
// completion.
if (site_sp->GetNumberOfConstituents() == 1)
return true;
```
In other words, the plan looks at the name number of constituents of the
site to decide whether it explains the stop, the logic being that a
_user_ might have put a breakpoint there. However, the implementation is
not correct; in particular, it will fail in the situation described
above. We should only care about non-internal breakpoints that would
stop for the current thread.
It is tricky to test this, as it depends on the timing of threads, but I
was able to consistently reproduce the issue with a swift program using
concurrency.
rdar://165481473
This will allow the instruction emulation unwinder to reason about
instructions that prevent the subsequent instruction from executing.
Part of a sequence of PRs:
[lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit
#169630
[lldb][NFC] Rename forward_branch_offset to branch_offset in
UnwindAssemblyInstEmulation #169631
[lldb] Add DisassemblerLLVMC::IsBarrier API #169632
[lldb] Handle backwards branches in UnwindAssemblyInstEmulation #169633
commit-id:bb5df4aa
Remove errant `\a` command before `<directory>` in `SaveToDisk`
documentation. The `\a` Doxygen command expects a word argument, but
`<directory>` starts with `<` which Doxygen interprets as HTML. This
fixes:
```
llvm-project/lldb/include/lldb/API/SBTrace.h:60:
Warning 564: Error parsing Doxygen command a: No word followed the command. Command ignored.
```
This change makes StackFrame methods virtual to enable subclass
overrides and introduces BorrowedStackFrame, a wrapper that presents an
existing StackFrame with a different frame index.
This enables creating synthetic frame views or renumbering frames
without copying the underlying frame data, which is useful for frame
manipulation scenarios.
This also adds a new borrowed-info format entity to show what was the
original frame index of the borrowed frame.
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
Vector registers have synthetic values for display purposes. This causes
SBValue::GetExpressionPath to dispatch
to ValueObjectSynthetic instead of ValueObjectRegister, producing
incorrect results.
Fixes#147144
This is an alternative to
https://github.com/llvm/llvm-project/pull/159500, breaking that PR down
into three separate PRs, to make it easier to review.
This first PR of the three adds the basic framework for doing type
casing to the DIL code, but it does not actually do any casting: In this
PR the DIL parser only recognizes builtin type names, and the DIL
interpreter does not do anything except return the original operand (no
casting). The second and third PRs will add most of the type parsing,
and do the actual type casting, respectively.
This change adds tracking of the StackFrameList that produced each frame
by storing a weak pointer (m_frame_list_wp) in both `StackFrame` and
`ExecutionContextRef`.
When resolving frames through `ExecutionContextRef::GetFrameSP`, the
code now first attempts to use the remembered frame list instead of
immediately calling `Thread::GetStackFrameList`. This breaks circular
dependencies that can occur during frame provider initialization, where
creating a frame provider might trigger `ExecutionContext` resolution,
which would then call back into `Thread::GetStackFrameList()`, creating
an infinite loop.
The `StackFrameList` now sets m_frame_list_wp on every frame it creates,
and a new virtual method `GetOriginatingStackFrameList` allows frames to
expose their originating list.
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
This patch tries to upstream code landed downstream in
https://github.com/swiftlang/llvm-project/pull/11835.
This patch implements an instrumentation plugin for the
`-fbounds-safety` soft trap mode first implemented in
https://github.com/swiftlang/llvm-project/pull/11645 (rdar://158088757).
That functionality isn't supported in upstream Clang yet, however the
instrumented plugin can be compiled without issue so this patch tries to
upstream it. The included tests are all disabled when the clang used for
testing doesn't support `-fbounds-safety`. This means the tests will be
skipped. However, it's fairly easy to point LLDB at a clang that does
support `-fbounds-safety. I've done this and confirmed the tests pass.
To use a custom clang the following can be done:
* For API tests set the `LLDB_TEST_COMPILER` CMake cache variable to
point to appropriate compiler.
* For shell tests applying a patch like this can be used to set the
appropriate compiler:
```
--- a/lldb/test/Shell/helper/toolchain.py
+++ b/lldb/test/Shell/helper/toolchain.py
@@ -271,6 +271,7 @@ def use_support_substitutions(config):
if config.lldb_lit_tools_dir:
additional_tool_dirs.append(config.lldb_lit_tools_dir)
+ config.environment['CLANG'] = '/path/to/clang'
llvm_config.use_clang(
```
The current implementation of -fbounds-safety traps works by emitting
calls to runtime functions intended to log the occurrence of a soft
trap.
While the user could just set a breakpoint of these functions the
instrumentation plugin sets it automatically and provides several
additional features:
When debug info is available:
* It adjusts the stop reason to be the reason for trapping. This is
extracted from the artificial frame in the debug info (similar to
-fbounds-safety hard traps).
* It adjusts the selected frame to be the frame where the soft trap
occurred.
When debug info is not available:
* For the `call-with-str` soft trap mode the soft trap reason is
read from the first argument register.
* For the `call-minimal` soft trap mode the stop reason is adjusted
to note its a bounds check failure but does not give further
information because none is available.
* In this situation the selected frame is not adjusted because in
this mode the user will be looking at assembly and adjusting the
frame makes things confusing.
This patch includes shell and api tests. The shell tests seemed like the
best way to test behavior when debug info is missing because those tests
make it easy to disable building with debug info completely.
rdar://163230807
This adds a new virtual method `GetScriptedModulePath()` to
`ScriptedInterface` that allows retrieving the file path of the Python
module containing the scripted object implementation.
The Python implementation acquires the GIL and walks through the
object's `__class__.__module__` to find the module's `__file__`
attribute. This will be used by ScriptedFrame to populate the module and
compile unit for frames pointing to Python source files.
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
ObjectFile has an m_data DataExtractor ivar which may be default
constructed initially, or initialized with a DataBuffer passed in to its
ctor. If the DataExtractor does not get a DataBuffer source passed in,
the subclass will initialize it with access to the object file's data.
When a DataBuffer is passed in to the base class ctor, the DataExtractor
only has its buffer initialized; ObjectFile doesn't yet know the address
size and endianness to fully initialize the DataExtractor.
This patch changes ObjectFile to instead have a DataExtractorSP ivar
which is always initialized with at least a default-constructed
DataExtractor object in the base class ctor. The next patch I will be
writing is to change the ObjectFile ctor to take an optional
DataExtractorSP, so the caller can pass a DataExtractor subclass -- the
VirtualizeDataExtractor being added via
https://github.com/llvm/llvm-project/pull/168802
instead of a DataBuffer which is trivially saved into the DataExtractor.
The change is otherwise mechanical; all `m_data.` changed to
`m_data_sp->` and all the places where `m_data` was passed in for a
by-ref call were changed to `*m_data_sp.get()`. The shared pointer is
always initialized to contain an object.
I built & ran the testsuite on macOS and on aarch64-Ubuntu (thanks for
getting the Linux testsuite to run on SME-only systems David). All of
the ObjectFile subclasses I modifed compile cleanly, but I haven't
tested them beyond any unit tests they may have (prob breakpad).
rdar://148939795
Introduce VirtualDataExtractor, a DataExtractor subclass that enables
reading data at virtual addresses by translating them to physical buffer
offsets using a lookup table. The lookup table maps virtual address
ranges to physical offsets and enforces boundaries to prevent reads from
crossing entry limits.
The new class inherits from DataExtractor, overriding GetData and
PeekData to provide transparent virtual address translation for most of
the DataExtractor methods. The exception are the unchecked methods, that
bypass those methods and are overloaded as well.
This pr fixes#167388 .
## Description
This pr adds new method `GetArchName` to `SBTarget` so that no need to
parse triple to get arch name in client code.
## Testing
### All from `TestTargetAPI.py`
run test with
```
./build/bin/lldb-dotest -v -p TestTargetAPI.py
```
<details>
<summary>existing tests (without newly added)</summary>
<img width="1425" height="804" alt="image"
src="https://github.com/user-attachments/assets/617e4c69-5c6b-44c4-9aeb-b751a47e253c"
/>
</details>
<details>
<summary>existing tests (with newly added)</summary>
<img width="1422" height="778" alt="image"
src="https://github.com/user-attachments/assets/746990a1-df88-4348-a090-224963d3c640"
/>
</details>
### Only `test_get_arch_name`
run test with
```
./build/bin/lldb-dotest -v -p TestTargetAPI.py -f test_get_arch_name_dwarf -f test_get_arch_name_dwo -f test_get_arch_name_dsym lldb/test/API/python_api/target
```
<details>
<summary>only newly added</summary>
<img width="1422" height="778" alt="image"
src="https://github.com/user-attachments/assets/fcaafa5d-2622-4171-acee-e104ecee0652"
/>
</details>
---------
Signed-off-by: Nikita B <n2h9z4@gmail.com>
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
## Summary:
This change introduces a `DAPSessionManager` to enable multiple DAP
sessions to share debugger instances when needed, for things like child
process debugging and some scripting hooks that create dynamically new
targets.
Changes include:
- Add `DAPSessionManager` singleton to track and coordinate all active DAP
sessions
- Support attaching to an existing target via its globally unique target
ID (targetId parameter)
- Share debugger instances across sessions when new targets are created
dynamically
- Refactor event thread management to allow sharing event threads
between sessions and move event thread and event thread handlers to `EventHelpers`
- Add `eBroadcastBitNewTargetCreated` event to notify when new targets are
created
- Extract session names from target creation events
- Defer debugger initialization from 'initialize' request to
'launch'/'attach' requests. The only time the debugger is used currently
in between its creation in `InitializeRequestHandler` and the `Launch`
or `Attach` requests is during the `TelemetryDispatcher` destruction
call at the end of the `DAP::HandleObject` call, so this is safe.
This enables scenarios when new targets are created dynamically so that
the debug adapter can automatically start a new debug session for the
spawned target while sharing the debugger instance.
## Tests:
The refactoring maintains backward compatibility. All existing DAP test
cases pass.
Also added a few basic unit tests for DAPSessionManager
```
>> ninja DAPTests
>> ./tools/lldb/unittests/DAP/DAPTests
>>./bin/llvm-lit -v ../llvm-project/lldb/test/API/tools/lldb-dap/
```
show information about the signal when the user presses `process handle
<unix-signal>` i.e
```sh
(lldb) process handle SIGWINCH
NAME PASS STOP NOTIFY DESCRIPTION
=========== ===== ===== ====== ===================
SIGWINCH true false false window size changes
```
Wanted to use the existing `GetSignalDescription` but it is expected
behaviour to return the signal name if no signal code is passed. It is
used in stop info.
65c895dfe0/lldb/source/Target/StopInfo.cpp (L1192-L1195)
This patch fixes and eliminates the possibility of SupportFileSP ever
being nullptr. The support file was originally treated like a value
type, but became a polymorphic type and therefore has to be stored and
passed around as a pointer.
To avoid having all the callers check the validity of the pointer, I
introduced the invariant that SupportFileSP is never null and always
default constructed. However, without enforcement at the type level,
that's fragile and indeed, we already identified two crashes where
someone accidentally broke that invariant.
This PR introduces a NonNullSharedPtr to prevent that. NonNullSharedPtr
is a smart pointer wrapper around std::shared_ptr that guarantees the
pointer is never null. If default-constructed, it creates a
default-constructed instance of the contained type. Note that I'm using
private inheritance because you shouldn't inherit from standard library
classes due to the lack of virtual destructor. So while the new
abstraction looks like a `std::shared_ptr`, it is in fact **not** a
shared pointer. Given that our destructor is trivial, we could use
public inheritance, but currently there's no need for it.
rdar://164989579
NFC patch which moves `DiagnosticsRendering` from `Utility` to `Host`.
This refactoring is needed for
https://github.com/llvm/llvm-project/pull/168603. It adds a method to
check whether the current terminal supports Unicode or not. This will be
OS dependent and a better fit for `Host`. Since `Utility` cannot depend
on `Host`, `DiagnosticsRendering` must live in `Host` instead.
This patch adds `LLDBLog::InstrumentationRuntime` as a log channel to
provide an appropriate channel for instrumentation runtime plugins as
previously one did not exist.
A small use of the channel is added to illustrate its use. The logging
added is not intended to be comprehensive.
This is primarily motivated by an `-fbounds-safety` instrumentation
plugin (https://github.com/swiftlang/llvm-project/pull/11835).
rdar://164920875
In this PR we are proposing to change LLDB codebase so that LLDB is able
to print values of integer registers that have more than 64-bits (even
if the number of bits is not equal to 128).
---------
Co-authored-by: Matej Košík <matej.kosik@codasip.com>
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
If we open a `NativeFile` with a `FILE*`, the OpenOptions default to
`eOpenOptionReadOnly`. This is an issue in python scripts if you try to
write to one of the files like `print("Hi",
file=lldb.debugger.GetOutputFileHandle())`.
To address this, we need to specify the access mode whenever we create a
`NativeFile` from a `FILE*`. I also added an assert on the `NativeFile`
that validates the file is opened with the correct access mode and
updated `NativeFile::Read` and `NativeFile::Write` to check the access
mode.
Before these changes:
```
$ lldb -b -O 'script lldb.debugger.GetOutputFileHandle().write("abc")'
(lldb) script lldb.debugger.GetOutputFileHandle().write("abc")
Traceback (most recent call last):
File "<input>", line 1, in <module>
io.UnsupportedOperation: not writable
```
After:
```
$ lldb -b -O 'script lldb.debugger.GetOutputFileHandle().write("abc")'
(lldb) script lldb.debugger.GetOutputFileHandle().write("abc")
abc3
```
Fixes#122387
Another attempt at resolving the deadlock issue @GeorgeHuyubo discovered
(his previous
[attempt](https://github.com/llvm/llvm-project/pull/160225)).
This change can be summarized as the following:
* Plumb through a boolean flag to force no preload in
`GetOrCreateModules` all the way through to `LoadModuleAtAddress`.
* Parallelize `Module::PreloadSymbols` separately from
`Target::GetOrCreateModule` and its caller `LoadModuleAtAddress` (this
is what avoids the deadlock).
These changes roughly maintain the performance characteristics of the
previous implementation of parallel module loading. Testing on targets
with between 5000 and 14000 modules, I saw similar numbers as before,
often more than 10% faster in the new implementation across multiple
trials for these massive targets. I think it's because we have less lock
contention with this approach.
# The deadlock
See [bt.txt](https://github.com/user-attachments/files/22524471/bt.txt)
for a sample backtrace of LLDB when the deadlock occurs.
As @GeorgeHuyubo explains in his
[PR](https://github.com/llvm/llvm-project/pull/160225), the deadlock
occurs from an ABBA deadlock that happens when a thread context-switches
out of `Module::PreloadSymbols`, goes into `Target::GetOrCreateModule`
for another module, possibly entering this block:
```
if (!module_sp) {
// The platform is responsible for finding and caching an appropriate
// module in the shared module cache.
if (m_platform_sp) {
error = m_platform_sp->GetSharedModule(
module_spec, m_process_sp.get(), module_sp, &search_paths,
&old_modules, &did_create_module);
} else {
error = Status::FromErrorString("no platform is currently set");
}
}
```
`Module::PreloadSymbols` holds a module-level mutex, and then
`GetSharedModule` *attempts* to hold the mutex of the global shared
`ModuleList`. So, this thread holds the module mutex, and waits on the
global shared `ModuleList` mutex.
A competing thread may execute `Target::GetOrCreateModule`, enter the
same block as above, grabbing the global shared `ModuleList` mutex.
Then, in `ModuleList::GetSharedModule`, we eventually call
`ModuleList::FindModules` which eventually waits for the `Module` mutex
held by the first thread (via `Module::GetUUID`). Thus, we deadlock.
## Reproducing the deadlock
It might be worth noting that I've never been able to observe this
deadlock issue during live debugging (e.g. launching or attaching to
processes), however we were able to consistently reproduce this issue
with coredumps when using the following settings:
```
(lldb) settings set target.parallel-module-load true
(lldb) settings set target.preload-symbols true
(lldb) settings set symbols.load-on-demand false
(lldb) target create --core /some/core/file/here
# deadlock happens
```
## How this change avoids this deadlock
This change avoids concurrent executions of `Module::PreloadSymbols`
with `Target::GetOrCreateModule` by waiting until after the
`Target::GetOrCreateModule` executions to run `Module::PreloadSymbols`
in parallel. This avoids the ordering of holding a Module lock *then*
the ModuleList lock, as `Target::GetOrCreateModule` executions maintain
the ordering of the shared ModuleList lock first (from what I've read
and tested).
## Why not read-write lock?
Some feedback in https://github.com/llvm/llvm-project/pull/160225 was to
modify mutexes used in these components with read-write locks. This
might be a good idea overall, but I don't think it would *easily*
resolve this specific deadlock. `Module::PreloadSymbols` would probably
need a write lock to Module, so even if we had a read lock in
`Module::GetUUID` we would still contend. Maybe the `ModuleList` lock
could be a read lock that converts to a write lock if it chooses to
update the module, but it seems likely that some thread would try to
update the shared module list and then the write lock would contend
again.
Perhaps with deeper architectural changes, we could fix this issue?
# Other attempts
One downside of this approach (and the former approach of parallel
module loading) is that each DYLD would need to implement this pattern
themselves. With @clayborg's help, I looked at a few other approaches:
* In `Target::GetOrCreateModule`, backgrounding the
`Module::PreloadSymbols` call by adding it directly to the thread pool
via `Debugger::GetThreadPool().async()`. This required adding a lock to
`Module::SetLoadAddress` (probably should be one there already) since
`ObjectFileELF::SetLoadAddress` is not thread-safe (updates sections).
Unfortunately, during execution, this causes the preload symbols to run
synchronously with `Target::GetOrCreateModule`, preventing us from truly
parallelizing the execution.
* In `Module::PreloadSymbols`, backgrounding the `symtab` and `sym_file`
`PreloadSymbols` calls individually, but similar issues as the above.
* Passing a callback function like
https://github.com/swiftlang/llvm-project/pull/10746 instead of the
boolean I use in this change. It's functionally the same change IMO,
with some design tradeoffs:
* Pro: the caller doesn't need to explicitly call
`Module::PreloadSymbols` itself, and can instead call whatever function
is passed into the callback.
* Con: the caller needs to delay the execution of the callback such that
it occurs after the `GetOrCreateModule` logic, otherwise we run into the
same issue. I thought this would be trickier for the caller, requiring
some kinda condition variable or otherwise storing the calls to execute
afterwards.
# Test Plan:
```
ninja check-lldb
```
---------
Co-authored-by: Tom Yang <toyang@fb.com>