This reverts commit d9e659c538.
I could not reproduce the Mac OS ASAN failure locally but I narrowed it
down to the test `test_many_fields_same_enum`. This test shares an enum
between x0, which is 64 bit, and cpsr, which is 32 bit.
My theory is that when it does `register read x0`, an enum type is
created where the undlerying enumerators are 64 bit, matching the
register size.
Then it does `register read cpsr` which used the cached enum type, but
this register is 32 bit. This caused lldb to try to read an 8 byte value
out of a 4 byte allocation:
READ of size 8 at 0x60200014b874 thread T0
<...>
=>0x60200014b800: fa fa fd fa fa fa fd fa fa fa fd fa fa fa[04]fa
To fix this I've added the register's size in bytes to the constructed
enum type's name. This means that x0 uses:
__lldb_register_fields_enum_some_enum_8
And cpsr uses:
__lldb_register_fields_enum_some_enum_4
If any other registers use this enum and are read, they will use the
cached type as long as their size matches, otherwise we make a new type.
This teaches lldb to parse the enum XML elements sent by lldb-server,
and make use of the information in `register read` and `register info`.
The format is described in
https://sourceware.org/gdb/current/onlinedocs/gdb.html/Enum-Target-Types.html.
The target XML parser will drop any invalid enum or evalue. If we find
multiple evalue for the same value, we will use the last one we find.
The order of evalues from the XML is preserved as there may be good
reason they are not in numerical order.
This change by itself has no measurable effect on the LLDB
testsuite. I'm making it in preparation for threading through more
errors in the Swift language plugin.
Create additional helper functions for the ValueObject class, for:
- returning the value as an APSInt or APFloat
- additional type casting options
- additional ways to create ValueObjects from various types of data
- dereferencing a ValueObject
These helper functions are needed for implementing the Data Inspection
Language, described in
https://discourse.llvm.org/t/rfc-data-inspection-language/69893
The the function is doing two fairly different things, depending on how
it is called. While this allows for some code reuse, it also makes it
hard to override it correctly. Possibly for this reason
ValueObjectSynthetic overerides GetChildAtIndex instead, which forces it
to reimplement some of its functionality, most notably caching of
generated children.
Splitting this up makes it easier to move the caching to a common place
(and hopefully makes the code easier to follow in general).
UpdateFormatsIfNeeded has hardcoded the call to GetFormat with no
dynamic values. GetFormat will try to find the synthetic children of the
ValueObject, and passing the wrong one can fail, which can be bad for
performance but should not be user visible. Fix the performace bug by
passing the dynamic value type of the ValueObject.
rdar://122506593
Change the signature of `DWARFExpression::Evaluate` and
`DWARFExpressionList::Evaluate` to return an `llvm::Expected` instead of a
boolean. This eliminates the `Status` output parameter and generally improves
error handling.
Reduce false positive identification of C names as Dlang mangled names. This happens
when a C function uses the prefix `_D`.
The [Dlang ABI](https://dlang.org/spec/abi.html#name_mangling) shows that mangled names
have a length immediately following the `_D` prefix. This change checks for a digit
after the `_D` prefix, when identifying the mangling scheme of a symbol. This doesn't
prevent false positives entirely, but does make it less likely.
This reverts commit fe82a3da36.
This broke LLDB on MacOS due to a missing symbol during linking.
The fix has been applied in c6c08eee37.
Original commit message:
The terminfo dependency introduces a significant nonhermeticity into the
build. It doesn't respect `--no-undefined-version` meaning that it's not
a dependency that can be built with Clang 17+. This forces maintainers
of source-based distributions to implement patches or ignore linker
errors.
Remove it to reduce the closure size and improve portability of
LLVM-based tools. Users can still use command line arguments to toggle
color support expliticly.
Fixes#75490Closes#53294#23355
This adds new SB API calls and classes to allow a user of the SB API to obtain an address range from SBFunction and SBBlock. This is a second attempt to land the reverted PR #92014.
The terminfo dependency introduces a significant nonhermeticity into the
build. It doesn't respect `--no-undefined-version` meaning that it's not
a dependency that can be built with Clang 17+. This forces maintainers
of source-based distributions to implement patches or ignore linker
errors.
Remove it to reduce the closure size and improve portability of
LLVM-based tools. Users can still use command line arguments to toggle
color support expliticly.
Fixes#75490Closes#53294#23355
# Motivation
Individual callers of `SBDebugger::SetDestroyCallback()` might think
that they have registered their callback and expect it to be called when
the debugger is destroyed. In reality, only the last caller survives,
and all previous callers are forgotten, which might be a surprise to
them. Worse, if this is called in a race condition, which callback
survives is less predictable, which may case confusing behavior
elsewhere.
# This PR
Allows multiple destroy callbacks to be registered and all called when
the debugger is destroyed.
**EDIT**: Adds two new APIs: `AddDestroyCallback()` and
`ClearDestroyCallback()`. `SetDestroyCallback()` will first clear then
add the given callback. Tests are added for the new APIs.
## Tests
```
bin/llvm-lit -sv ../external/llvm-project/lldb/test/API/python_api/debugger/TestDebuggerAPI.py
```
## (out-dated, see comments below) Semantic change to
`SetDestroyCallback()`
~~Currently, the method overwrites the old callback with the new one.
With this PR, it will NOT overwrite. Instead, it will hold on to both.
Both callbacks get called during destroy.~~
~~**Risk**: Although the documentation of `SetDestroyCallback()` (see
[C++](https://lldb.llvm.org/cpp_reference/classlldb_1_1SBDebugger.html#afa1649d9453a376b5c95888b5a0cb4ec)
and
[python](https://lldb.llvm.org/python_api/lldb.SBDebugger.html#lldb.SBDebugger.SetDestroyCallback))
doesn't really specify the behavior, there is a risk: if existing call
sites rely on the "overwrite" behavior, they will be surprised because
now the old callback will get called. But as the above said, the current
behavior of "overwrite" itself might be unintended, so I don't
anticipate users to rely on this behavior. In short, this risk might be
less of a problem if we correct it sooner rather than later (which is
what this PR is trying to do).~~
## (out-dated, see comments below) Implementation
~~The implementation holds a `std::vector<std::pair<callback, baton>>`.
When `SetDestroyCallback()` is called, callbacks and batons are appended
to the `std::vector`. When destroy event happen, the `(callback, baton)`
pairs are invoked FIFO. Finally, the `std::vector` is cleared.~~
# (out-dated, see comments below) Alternatives considered
~~Instead of changing `SetDestroyCallback()`, a new method
`AddDestroyCallback()` can be added, which use the same
`std::vector<std::pair<>>` implementation. Together with
`ClearDestroyCallback()` (see below), they will replace and deprecate
`SetDestroyCallback()`. Meanwhile, in order to be backward compatible,
`SetDestroyCallback()` need to be updated to clear the `std::vector` and
then add the new callback. Pros: The end state is semantically more
correct. Cons: More steps to take; potentially maintaining an
"incorrect" behavior (of "overwrite").~~
~~A new method `ClearDestroyCallback()` can be added. Might be
unnecessary at this point, because workflows which need to set then
clear callbacks may exist but shouldn't be too common at least for now.
Such method can be added later when needed.~~
~~The `std::vector` may bring slight performance drawback if its
implementation doesn't handle small size efficiently. However, even if
that's the case, this path should be very cold (only used during init
and destroy). Such performance drawback should be negligible.~~
~~A different implementation was also considered. Instead of using
`std::vector`, the current `m_destroy_callback` field can be kept
unchanged. When `SetDestroyCallback()` is called, a lambda function can
be stored into `m_destroy_callback`. This lambda function will first
call the old callback, then the new one. This way, `std::vector` is
avoided. However, this implementation is more complex, thus less
readable, with not much perf to gain.~~
---------
Co-authored-by: Roy Shi <royshi@meta.com>
Removes the debugger broadcast bits from `Debugger.h` and instead uses
the enum from `lldb-enumerations.h` and adds the
`eBroadcastSymbolChange` bit to the enum in `lldb-enumerations.h`. This fixes a bug wherein the incorrect broadcast bit could be referenced due both of these enums previously existing and being out-of-sync with each other.
Adds a `show_function_display_name` parameter to
`SymbolContext::DumpStopContext`. This
parameter defaults to false, but `BreakpointLocation::GetDescription`
sets it to true.
This is NFC in mainline lldb, and will be used to modify how Swift
breakpoint locations are printed.
Give language plugins the opportunity to provide a language specific
display name.
This will be used in a follow up commit. The purpose of this change is
to ultimately display breakpoint locations with a more human friendly
demangling of Swift symbols.
Adds support for applying LLVM formatting to variables.
The reason for this is to support cases such as the following.
Let's say you have two separate bytes that you want to print as a
combined hex value. Consider the following summary string:
```
${var.byte1%x}${var.byte2%x}
```
The output of this will be: `0x120x34`. That is, a `0x` prefix is
unconditionally applied to each byte. This is unlike printf formatting
where you must include the `0x` yourself.
Currently, there's no way to do this with summary strings, instead
you'll need a summary provider in python or c++.
This change introduces formatting support using LLVM's formatter system.
This allows users to achieve the desired custom formatting using:
```
${var.byte1:x-}${var.byte2:x-}
```
Here, each variable is suffixed with `:x-`. This is passed to the LLVM
formatter as `{0:x-}`. For integer values, `x` declares the output as
hex, and `-` declares that no `0x` prefix is to be used. Further, one
could write:
```
${var.byte1:x-2}${var.byte2:x-2}
```
Where the added `2` results in these bytes being written with a minimum
of 2 digits.
An alternative considered was to add a new format specifier that would
print hex values without the `0x` prefix. The reason that approach was
not taken is because in addition to forcing a `0x` prefix, hex values
are also forced to use leading zeros. This approach lets the user have
full control over formatting.
These are hardcoded strings that are already present in the data section
of the binary, no need to immediately place them in the ConstString
StringPools. Lots of code still calls `GetBroadcasterClass` and places
the return value into a ConstString. Changing that would be a good
follow-up.
Additionally, calls to these functions are still wrapped in ConstStrings
at the SBAPI layer. This is because we must guarantee the lifetime of
all strings handed out publicly.
This member variable is completely unused. I also don't think it makes a
ton of sense since (1) The "base address" can be obtained from the first
Instruction in its InstructionList, and (2) InstructionLists may not be
a series of contiguous instructions (even though they are most of the
time).
In
commit 2f63718f85
Author: Jason Molenda <jmolenda@apple.com>
Date: Tue Mar 26 09:07:15 2024 -0700
[lldb] Don't clear a Module's UnwindTable when adding a SymbolFile
(#86603)
I stopped clearing a Module's UnwindTable when we add a SymbolFile to
avoid the memory management problems with adding a symbol file
asynchronously while the UnwindTable is being accessed on another
thread. This broke the target-symbols-add-unwind.test shell test on
Linux which removes the DWARF debub_frame section from a binary, loads
it, then loads the unstripped binary with the DWARF debug_frame section
and checks that the UnwindPlans for a function include debug_frame.
I originally decided that I was willing to sacrifice the possiblity of
additional unwind sources from a symbol file because we rely on assembly
emulation so heavily, they're rarely critical. But there are targets
where we we don't have emluation and rely on things like DWARF
debug_frame a lot more, so this probably wasn't a good choice.
This patch adds a new UnwindTable::Update method which looks for any new
sources of unwind information and adds it to the UnwindTable, and calls
that after a new SymbolFile has been added to a Module.
This implements coalescing of progress events using a timeout, as
discussed in the RFC on Discourse [1]. This PR consists of two commits
which, depending on the feedback, I may split up into two PRs. For now,
I think it's easier to review this as a whole.
1. The first commit introduces a new generic `Alarm` class. The class
lets you to schedule a function (callback) to be executed after a given
timeout expires. You can cancel and reset a callback before its
corresponding timeout expires. It achieves this with the help of a
worker thread that sleeps until the next timeout expires. The only
guarantee it provides is that your function is called no sooner than the
requested timeout. Because the callback is called directly from the
worker thread, a long running callback could potentially block the
worker thread. I intentionally kept the implementation as simple as
possible while addressing the needs for the `ProgressManager` use case.
If we want to rely on this somewhere else, we can reassess whether we
need to address those limitations.
2. The second commit uses the Alarm class to coalesce progress events.
To recap the Discourse discussion, when multiple progress events with
the same title execute in close succession, they get broadcast as one to
`eBroadcastBitProgressCategory`. The `ProgressManager` keeps track of
the in-flight progress events and when the refcount hits zero, the Alarm
class is used to schedule broadcasting the event. If a new progress
event comes in before the alarm fires, the alarm is reset (and the
process repeats when the new progress event ends). If no new event comes
in before the timeout expires, the progress event is broadcast.
[1]
https://discourse.llvm.org/t/rfc-improve-lldb-progress-reporting/75717/
Fixing a crash in lldb when `symbols.auto-download` setting is enabled.
When doing a backtrace, this feature has lldb search for a SymbolFile
for stack frames when we are backtracing, and add them either
synchoronously or asynchronously, depending on the specific setting
used.
Module::SetSymbolFileFileSpec clears the Module's UnwindTable, once we
find a new SymbolFile. We may be adding a source of unwind information
that we did not have when lldb was working only with the executable
binary.
What happens in practice is that we're using a reference to the Module's
UnwindTable, and then the other thread getting the SymbolFile clears it
and now the first thread is referring to freed memory and we can crash.
When built with address sanitizer, it crashes much more reliably.
Given that unwind information used for exception handling -- eh_frame,
compact unwind -- is present in executable binaries, the only thing
we're likely to *add* would be DWARF's `debug_frame` if that was also
available. The actual value of re-creating the UnwindTable when we have
added a SymbolFile is not large.
I also tried fixing this by changing the Module to have a shared_ptr to
the UnwindTable, so we could have two different UnwindTable's in use
simultaneously for a brief period. This would be fine TODAY, but it
introduces a very subtle bug that someone will have a heck of a time
figuring out in the future.
In the end, I believe the safest approach is to sacrifice the possible
marginal gain of reconstructing the UnwindTable once a SymbolFile has
been added, to sidestep this whole problem area.
Also, in `Module::GetUnwindTable()`, call `DownloadSymbolFileAsync`
before we create the UnwindTable for the first time, in case the symbol
file is fetched synchronously, we will have it for that possible
marginal gain.
This implements coalescing of progress events using a timeout, as
discussed in the RFC on Discourse [1]. This PR consists of two commits
which, depending on the feedback, I may split up into two PRs. For now,
I think it's easier to review this as a whole.
1. The first commit introduces a new generic `Alarm` class. The class
lets you to schedule a function (callback) to be executed after a given
timeout expires. You can cancel and reset a callback before its
corresponding timeout expires. It achieves this with the help of a
worker thread that sleeps until the next timeout expires. The only
guarantee it provides is that your function is called no sooner than the
requested timeout. Because the callback is called directly from the
worker thread, a long running callback could potentially block the
worker thread. I intentionally kept the implementation as simple as
possible while addressing the needs for the `ProgressManager` use case.
If we want to rely on this somewhere else, we can reassess whether we
need to address those limitations.
2. The second commit uses the Alarm class to coalesce progress events.
To recap the Discourse discussion, when multiple progress events with
the same title execute in close succession, they get broadcast as one to
`eBroadcastBitProgressCategory`. The `ProgressManager` keeps track of
the in-flight progress events and when the refcount hits zero, the Alarm
class is used to schedule broadcasting the event. If a new progress
event comes in before the alarm fires, the alarm is reset (and the
process repeats when the new progress event ends). If no new event comes
in before the timeout expires, the progress event is broadcast.
[1]
https://discourse.llvm.org/t/rfc-improve-lldb-progress-reporting/75717/
This is another step towards supporting DWARF5 checksums and inline
source code in LLDB. This is a reland of #85468 but without the
functional change of storing the support file from the line table (yet).
Some languages may create artificial functions that have no real user
code, even though there is line table information for them. One such
case is with coroutine code that receives the CoroSplitter
transformation in LLVM IR. That code transformation creates many
different Functions, cloning one Instruction into many Instructions in
many different Functions and copying the associated debug locations.
It would be difficult to make that pass delete debug locations of cloned
instructions in a language agnostic way (is it even possible?), but LLDB
can ignore certain locations by querying its Language APIs and having it
decide based on, for example, mangling information.
MSVC fails when there is ambiguity (multiple options) around implicit
type conversion operators.
Make ConstString's conversion operator to string_view explicit to avoid
ambiguity with one to StringRef and remove an unused local variable that
MSVC also fails on.
The ValueObjectConstResult classes that back expression result variables
play a complicated game with where the data for their values is stored.
They try to make it appear as though they are still tied to the memory
in the target into which their value was written when the expression is
run, but they also keep a copy in the Host which they use after the
value is made (expression results are "history values" so that's how we
make sure they have "the value at the time of the expression".)
However, that means that if you ask them to cast themselves to a value
bigger than their original size, they don't have a way to get more
memory for that purpose. The same thing is true of ValueObjects backed
by DataExtractors, the data extractors don't know how to get more data
than they were made with in general.
The only place where we actually ask ValueObjects to sample outside
their captured bounds is when you do ValueObject::Cast from one
structure type to a bigger structure type. In
https://reviews.llvm.org/D153657 I handled this by just disallowing
casts from one structure value to a larger one. My reasoning at the time
was that the use case for this was to support discriminator based C
inheritance schemes, and you can't directly cast values in C, only
pointers, so this was not a natural way to handle those types. It seemed
logical that since you would have had to start with pointers in the
implementation, that's how you would write your lldb introspection code
as well.
Famous last words...
Turns out there are some heavy users of the SB API's who were relying on
this working, and this is a behavior change, so this patch makes this
work in the cases where it used to work before, while still disallowing
the cases we don't know how to support.
Note that if you had done this Cast operation before with either
expression results or value objects from data extractors, lldb would not
have returned the correct results, so the cases this patch outlaws are
ones that actually produce invalid results. So nobody should be using
Cast in these cases, or if they were, this patch will point out the bug
they hadn't yet noticed.
Walter Erquinigo added optional instruction annotations for x86
instructions in 2022 for the `thread trace dump instruction` command,
and code to DisassemblerLLVMC to add annotations for instructions that
change flow control, v. https://reviews.llvm.org/D128477
This was added as an option to `disassemble`, and the trace dump command
enables it by default, but several other instruction dumpers were
changed to display them by default as well. These are only implemented
for Intel instructions, so our disassembly on other targets ends up
looking like
```
(lldb) x/5i 0x1000086e4
0x1000086e4: 0xa9be6ffc unknown stp x28, x27, [sp, #-0x20]!
0x1000086e8: 0xa9017bfd unknown stp x29, x30, [sp, #0x10]
0x1000086ec: 0x910043fd unknown add x29, sp, #0x10
0x1000086f0: 0xd11843ff unknown sub sp, sp, #0x610
0x1000086f4: 0x910c63e8 unknown add x8, sp, #0x318
```
instead of `disassemble`'s output style of
```
lldb`main:
lldb[0x1000086e4] <+0>: stp x28, x27, [sp, #-0x20]!
lldb[0x1000086e8] <+4>: stp x29, x30, [sp, #0x10]
lldb[0x1000086ec] <+8>: add x29, sp, #0x10
lldb[0x1000086f0] <+12>: sub sp, sp, #0x610
lldb[0x1000086f4] <+16>: add x8, sp, #0x318
```
Adding symbolic annotations for assembly instructions is something I'm
interested in too, because we may have users investigating a crash or
apparent-incorrect behavior who must debug optimized assembly and they
may not be familiar with the ISA they're using, so short of flipping
through a many-thousand-page PDF to understand each instruction, they're
lost. They don't write assembly or work at that level, but to understand
a bug, they have to understand what the instructions are actually doing.
But the annotations that exist today don't move us forward much on that
front - I'd argue that the flow control instructions on Intel are not
hard to understand from their names, but that might just be my personal
bias. Much trickier instructions exist in any event.
Displaying this information by default for all targets when we only have
one class of instructions on one target is not a good default.
Also, in 2011 when Greg implemented the `memory read -f i` (aka `x/i`)
command
```
commit 5009f9d501
Author: Greg Clayton <gclayton@apple.com>
Date: Thu Oct 27 17:55:14 2011 +0000
[...]
eFormatInstruction will print out disassembly with bytes and it will use the
current target's architecture. The format character for this is "i" (which
used to be being used for the integer format, but the integer format also has
"d", so we gave the "i" format to disassembly), the long format is
"instruction".
```
he had DumpDataExtractor's DumpInstructions print the bytes of the
instruction -- that's the first field we see above for the `x/5i` after
the address -- and this is only useful for people who are debugging the
disassembler itself, I would argue. I don't want this displayed by
default either.
tl;dr this patch removes both fields from `memory read -f -i` and I
think this is the right call today. While I'm really interested in
instruction annotation, I don't think `x/i` is the right place to have
it enabled by default unless it's really compelling on at least some of
our major targets.
Change GetNumChildren()/CalculateNumChildren() methods return
llvm::Expected
This is an NFC change that does not yet add any error handling or change
any code to return any errors.
This is the second big change in the patch series started with
https://github.com/llvm/llvm-project/pull/83501
A follow-up PR will wire up error handling.
Change GetNumChildren()/CalculateNumChildren() methods return
llvm::Expected
This is an NFC change that does not yet add any error handling or change
any code to return any errors.
This is the second big change in the patch series started with
https://github.com/llvm/llvm-project/pull/83501
A follow-up PR will wire up error handling.