`FuncOpVectorUnroll` contains logic that replaces function arguments with
placeholder values. This replacement also rewrites every instruction in
the function that uses an argument to use the corresponding placeholder.
The placeholders are later replaced back with function arguments (either
new ones, or the originals if they were already legal).
However, the current implementation performs this second replacement
(placeholder values back to new/legal arguments) only in the first block
of the function, not in all blocks. This can leave some instructions
using placeholder values (which for already legal arguments are just
zeroattr values that will get DCE'd) instead of the arguments, which is
incorrect.
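To illustrate the shape of the fix, here is a minimal sketch using toy stand-ins for IR. `Instr`, `Block`, `Function`, `replaceAllUses` and `hasUse` are hypothetical names invented for this example and are not MLIR APIs; the only point is that the replacement walks every block rather than just the first one.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for IR (assumption: these are NOT MLIR types).
struct Instr {
  std::vector<std::string> operands;
};
using Block = std::vector<Instr>;
using Function = std::vector<Block>;

// Replace every use of `from` with `to` in ALL blocks of the function.
// Returns the number of operands that were rewritten.
int replaceAllUses(Function &fn, const std::string &from,
                   const std::string &to) {
  int replaced = 0;
  for (Block &block : fn)
    for (Instr &instr : block)
      for (std::string &operand : instr.operands)
        if (operand == from) {
          operand = to;
          ++replaced;
        }
  return replaced;
}

// True if any instruction in any block still refers to `name`.
bool hasUse(const Function &fn, const std::string &name) {
  for (const Block &block : fn)
    for (const Instr &instr : block)
      for (const std::string &operand : instr.operands)
        if (operand == name)
          return true;
  return false;
}
```

After the replacement, `hasUse(fn, placeholder)` should be false for every placeholder, including those used only in later blocks.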
Closes #132158.
Followup to #125526. This expands the logic of the
unique-object-duplication warning so that it also works for Windows
code.
For the most part, the logic is unchanged, merely substituting "has no
import/export annotation" in place of "has hidden visibility". However,
there are some small inconsistencies between the two; namely, visibility
is propagated through nested classes, while import/export annotations
aren't.
This PR:
1. Updates the logic for the warning to account for the differences
between POSIX and Windows
2. Changes the warning message and documentation appropriately
3. Updates the tests to cover Windows, and adds new test cases for the
places where behavior differs.
This PR was tested by building Chromium (cross-compiling Linux->Windows)
with the changes in place. After accounting for the differences in
semantics, no new warnings were discovered.
These were failing on our Windows on Arm bot, or more precisely,
not even completing.
This is because Microsoft's C runtime does extra parameter validation.
So when we called _read with an invalid fd, it called an invalid
parameter handler instead of returning an error.
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/read?view=msvc-170
https://learn.microsoft.com/en-us/cpp/c-runtime-library/parameter-validation?view=msvc-170
(lldb) run
Process 8440 launched: 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\unittests\Host\HostTests.exe' (aarch64)
Process 8440 stopped
* thread #1, stop reason = Exception 0xc0000409 encountered at address 0x7ffb7453564c
frame #0: 0x00007ffb7453564c ucrtbase.dll`_get_thread_local_invalid_parameter_handler + 652
ucrtbase.dll`_get_thread_local_invalid_parameter_handler:
-> 0x7ffb7453564c <+652>: brk #0xf003
ucrtbase.dll`_invalid_parameter_noinfo:
0x7ffb74535650 <+0>: b 0x7ffb745354d8 ; _get_thread_local_invalid_parameter_handler + 280
0x7ffb74535654 <+4>: nop
0x7ffb74535658 <+8>: nop
You can override this handler but I'm assuming that this reading
after close isn't a crucial feature, so disabling the tests seems
like the way to go.
If it is crucial, we can check the fd before we use it.
Tests added by https://github.com/llvm/llvm-project/pull/143946.
Previously, device info was returned as a queue with each element having
a "Level" field indicating its nesting level. This replaces this queue
with a more traditional tree-like structure.
This should not result in a change to the output of
`llvm-offload-device-info`.
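As a rough illustration of the change in representation, the sketch below converts a flat list with per-entry nesting levels into a tree. `FlatEntry`, `InfoNode` and `buildTree` are hypothetical names for this example only; the actual liboffload data structures may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flat entry, modelling the old queue with a "Level" field.
struct FlatEntry {
  std::string key;
  std::size_t level;
};

// Hypothetical tree node, modelling the new representation.
struct InfoNode {
  std::string key;
  std::vector<std::unique_ptr<InfoNode>> children;
};

// Build a tree from entries listed in pre-order. Level 0 entries become
// children of an unnamed root node.
std::unique_ptr<InfoNode> buildTree(const std::vector<FlatEntry> &entries) {
  auto root = std::make_unique<InfoNode>();
  // path[d] is the most recent node at depth d; path[0] is the root.
  std::vector<InfoNode *> path{root.get()};
  for (const FlatEntry &entry : entries) {
    assert(entry.level < path.size() && "level skips a nesting depth");
    // Drop everything deeper than the parent of this entry.
    path.resize(entry.level + 1);
    auto node = std::make_unique<InfoNode>();
    node->key = entry.key;
    InfoNode *raw = node.get();
    path.back()->children.push_back(std::move(node));
    path.push_back(raw);
  }
  return root;
}
```

Printing such a tree in pre-order with indentation derived from the depth would give the same output as walking the original queue, which is why no change in tool output is expected.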
Adds bf16 support to SPIRV by using the `SPV_KHR_bfloat16` extension.
Only a few operations are supported, including loading from and storing
to memory, conversion to/from other types, cooperative matrix operations
(including coop matrix arithmetic ops) and dot product support.
This PR adds the type definition and implements the basic cast
operations. Arithmetic/coop matrix ops will be added in a separate PR.
Enable generation of PTRADD SelectionDAG nodes for pointer arithmetic for SI,
for now behind an internal CLI option. Also add basic patterns to match these
nodes. Optimizations will come in follow-up PRs. Basic tests for SDAG codegen
with PTRADD are in test/CodeGen/AMDGPU/ptradd-sdag.ll
Only affects 64-bit address spaces for now, since the immediate use case only
affects the flat address space.
For SWDEV-516125.
This fixes the error reported in
https://github.com/llvm/llvm-project/pull/144037.
When computing the aranges table of a CU, LLDB would currently visit all
`DW_TAG_subprogram` DIEs and check their
`DW_AT_low_pc`/`DW_AT_high_pc`/`DW_AT_ranges` attributes. If those don't
exist it would error out and spam the console. Some subprograms
(particularly forward declarations) don't have low/high pc attributes,
so it's not really an "error". See DWARFv5 spec section `3.3.3
Subroutine and Entry Point Locations`:
```
A subroutine entry may have either a DW_AT_low_pc and DW_AT_high_pc
pair of attributes or a DW_AT_ranges attribute whose values encode the
contiguous or non-contiguous address ranges, respectively, of the machine
instructions generated for the subroutine (see Section 2.17 on page 51).
...
A subroutine entry representing a subroutine declaration that is not also a
definition does not have code address or range attributes.
```
We should just ignore those DIEs.
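A minimal sketch of the intended behaviour, using a simplified stand-in for a subprogram DIE. `SubprogramDie` and `collectRanges` are hypothetical names, not LLDB classes. For brevity the sketch treats the high PC as an absolute end address and leaves out `DW_AT_ranges`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Simplified stand-in for a DW_TAG_subprogram DIE (assumption: not LLDB's
// real DIE type). A declaration has neither attribute set.
struct SubprogramDie {
  std::optional<uint64_t> lowPc;
  std::optional<uint64_t> highPc; // treated as an absolute end address here
};

using Range = std::pair<uint64_t, uint64_t>;

// Collect address ranges, silently skipping DIEs without address attributes.
std::vector<Range> collectRanges(const std::vector<SubprogramDie> &dies) {
  std::vector<Range> ranges;
  for (const SubprogramDie &die : dies) {
    // Declarations carry no code addresses: this is not an error.
    if (!die.lowPc || !die.highPc)
      continue;
    // Ignore empty or inverted ranges as well.
    if (*die.highPc <= *die.lowPc)
      continue;
    ranges.emplace_back(*die.lowPc, *die.highPc);
  }
  return ranges;
}
```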
Extends `fir.do_concurrent` to `fir.do_loop ... unordered` lowering by
adding support for lowering/inlining non-empty `init` and `dealloc`
regions.
Resolves https://github.com/llvm/llvm-project/issues/143897 (actually
handles the todo).
Verify whether the generated assembly for the following function
includes the mtvsrbmi instruction.
vector unsigned char v00FF()
{
vector unsigned char x = { 0xFF, 0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0 };
return x;
}
Fixes #142404
The parser can't tell the difference between array indexing and a
substring: that has to be done in semantics once we have types.
Substrings can only take the form string([lower]:[higher]), not
string(index) or string(lower:higher:step). I added semantic checks to
catch this for the DEPEND clause.
This patch also adds lowering for correct substrings and for complex
part references.
In PowerPC, the AtomicCmpXchgInst is lowered to
ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS. However, this node does not handle
the weak attribute of AtomicCmpXchgInst. As a result, when compiling C++
atomic_compare_exchange_weak_explicit, the generated assembly includes a
"reservation lost" loop, i.e., it branches back and retries if the
stwcx. (store-conditional) fails. This differs from GCC’s codegen, which
does not include that loop for weak compare-exchange.
Since PowerPC uses LL/SC-style atomic instructions, the patch enables
AtomicExpandImpl::expandAtomicCmpXchg for PowerPC. With this, the weak
attribute is properly respected, and the "reservation lost" loop is
removed for weak operations.
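For context, this is the kind of source code that benefits. A weak compare-exchange is allowed to fail spuriously, so callers already wrap it in their own retry loop, and an extra compiler-generated retry loop around the store-conditional is redundant. The helper name `fetchAddWithWeakCas` is made up for this sketch.

```cpp
#include <atomic>
#include <cassert>

// Atomically add `delta` to `value` using a weak compare-exchange.
// Returns the value observed immediately before the successful update.
int fetchAddWithWeakCas(std::atomic<int> &value, int delta) {
  int expected = value.load(std::memory_order_relaxed);
  // compare_exchange_weak may fail spuriously (for example when an LL/SC
  // reservation is lost), so the caller supplies the retry loop. On
  // failure, `expected` is refreshed with the current value.
  while (!value.compare_exchange_weak(expected, expected + delta,
                                      std::memory_order_acq_rel,
                                      std::memory_order_relaxed)) {
  }
  return expected;
}
```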
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
See #143580 for MR with the test commit.
Performs the following transformations:
(select c, c1, t) -> (add (czero_nez t - c1, c), c1)
(select c, t, c1) -> (add (czero_eqz t - c1, c), c1)
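The identities above can be checked with a small model. This sketch assumes the usual conditional-zero semantics (czero_eqz yields 0 when the condition is zero, czero_nez yields 0 when it is non-zero) and uses wrapping 64-bit arithmetic. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed semantics: result is 0 if cond == 0, otherwise `value`.
uint64_t czeroEqz(uint64_t value, uint64_t cond) {
  return cond == 0 ? 0 : value;
}

// Assumed semantics: result is 0 if cond != 0, otherwise `value`.
uint64_t czeroNez(uint64_t value, uint64_t cond) {
  return cond != 0 ? 0 : value;
}

// (select c, c1, t) -> (add (czero_nez t - c1, c), c1)
uint64_t selectConstOnTrue(uint64_t c, uint64_t c1, uint64_t t) {
  return czeroNez(t - c1, c) + c1;
}

// (select c, t, c1) -> (add (czero_eqz t - c1, c), c1)
uint64_t selectConstOnFalse(uint64_t c, uint64_t t, uint64_t c1) {
  return czeroEqz(t - c1, c) + c1;
}
```

When the condition selects the constant, the conditional-zero term vanishes and only `c1` remains. Otherwise `(t - c1) + c1` wraps back to `t`, even if `t - c1` underflows.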
@mgudim
Add a new __ARM_FEATURE_CSSC macro that can be utilized during the
preprocessing stage.
__ARM_FEATURE_CSSC is defined to 1 if there is hardware support for
CSSC.
Implements the ACLE change:
https://github.com/ARM-software/acle/pull/394
This patch is part of a series that adds origin-tracking to the debugify
source location coverage checks, allowing us to report symbolized stack
traces of the point where missing source locations appear.
This patch adds the configuration options needed to enable this feature,
in the form of a new CMake option that enables a flag in
`llvm-config.h`; this is not an entirely new CMake flag, but a new
option, `COVERAGE_AND_ORIGIN`, for the existing flag
`LLVM_ENABLE_DEBUGLOC_COVERAGE_TRACKING`. This patch contains
documentation, but no actual implementation for the flag itself.
Just commute with (V)BLENDPD/S like all other BLEND instructions.
This is now handled more generally by the X86FixupInstTuningPass (OptSize fold occurs even without a scheduler model).
First step towards #142972
We now bubble up the expression evaluation diagnostics to the user and
also distinguish between "expression failed to parse/run" versus other
ways in which expressions didn't complete (e.g., setup errors, etc.).
Before:
```
(lldb) memory find -e "" 0x16fdfedc0 0x16fdfede0
error: expression evaluation failed. pass a string instead
(lldb) memory find -e "invalid" 0x16fdfedc0 0x16fdfede0
error: expression evaluation failed. pass a string instead
```
After:
```
(lldb) memory find -e "" 0x16fdfedc0 0x16fdfede0
error: Expression evaluation failed:
error: No result returned from expression. Exit status: 1
(lldb) memory find -e "invalid" 0x16fdfedc0 0x16fdfede0
error: Expression evaluation failed:
error: <user expression 0>:1:1: use of undeclared identifier 'invalid'
1 | invalid
| ^~~~~~~
```
There are many places in VPlan and LoopVectorize where we use
getKnownMinValue to discover the number of elements in a vector. Where
we expect the vector to have a fixed length, I have used the stronger
getFixedValue call. I believe this is clearer and adds extra protection
in the form of an assert in getFixedValue that the vector is not
scalable.
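The difference between the two calls can be sketched with a toy class. `ToyElementCount` is a simplified stand-in written for this description and only mirrors the idea; it is not LLVM's actual element count class.

```cpp
#include <cassert>

// Simplified stand-in for a vector element count (assumption: illustration
// only). A scalable count means "MinValue x vscale" elements at runtime.
class ToyElementCount {
  unsigned MinValue;
  bool Scalable;

public:
  ToyElementCount(unsigned MinValue, bool Scalable)
      : MinValue(MinValue), Scalable(Scalable) {}

  bool isScalable() const { return Scalable; }

  // Always succeeds, but for scalable vectors this is only a lower bound.
  unsigned getKnownMinValue() const { return MinValue; }

  // Only valid for fixed-length vectors; asserts otherwise.
  unsigned getFixedValue() const {
    assert(!Scalable && "fixed element count requested for scalable vector");
    return MinValue;
  }
};
```

Code that silently assumes a fixed-length vector would keep running with the minimum count under `getKnownMinValue`, whereas `getFixedValue` makes that assumption fail loudly in builds with assertions enabled.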
While looking at VPFirstOrderRecurrencePHIRecipe::computeCost I also
took the liberty of simplifying the code.
In theory I believe this patch should be NFC, but I'm reluctant to add
that to the title in case we're just missing tests for some of the VPlan
changes. I built and ran the LLVM test suite when targeting neoverse-v1
and it seemed ok.
When looking for the common base pointer, support the case where the
type changes because the GEP goes from pointer to vector of pointers.
This was supported prior to #142958.
Reduces code size: make use of free PS<->PD domain transfers (like we do
in many other places) and replace a suitable BLENDPS mask with MOVSD if
OptSize or the scheduler prefers it.
Previously, the AArch64 PAuth ABI core values were stored as an
ArrayRef<uint8_t>, introducing unnecessary indirection.
This patch replaces the ArrayRef with two explicit uint64_t fields:
aarch64PauthAbiPlatform and aarch64PauthAbiVersion. This simplifies the
representation and improves readability.
No functional change intended, aside from improved error messages.
I did manage to turn a crash into a non-zero return code,
but on the very first build it managed to time out.
I thought I had the appetite to tweak timeouts but
on second thought, I don't want yet another test to look
out for.
The test is not wrong, but on heavily loaded machines
it's always going to be inherently unstable.
When fusing two `linalg.genericOp`, where the producer has index
semantics, invalid `affine.apply` ops can be generated where the number
of indices does not match the number of loops in the fused genericOp.
This patch fixes the issue by directly using the number of loops from
the generated fused op.
This patch updates cir.call operation and allows function calls with
aggregate arguments and return values.
It seems that C++ class support is still minimal for now. I tried to
make a call to a C++ function with an argument of aggregate type, but it
failed because the initialization of a C++ class / struct is NYI. I also
tried to fold that support into this patch, but the combined patch
quickly blows up in size and becomes unsuitable for review. Thus,
tests for calling functions with aggregate arguments are added only for
C for now.
Fixes #101162
This test did this:
* SBDebugger::Initialize
* Spawn a bunch of threads that do:
* SBDebugger::Create
* some work
* SBDebugger::Destroy
* Wait on those threads to finish then call SBDebugger::Terminate and
exit, or -
* Reach a time limit before all the threads finish, call
SBDebugger::Terminate and exit.
The problem was that in the timeout case, calling SBDebugger::Terminate
destroys data being used by threads that are still running. I suspect
this test assumed said threads would be so broken they were probably
stuck, but when the machine is just heavily loaded, one of them might
read that data before the whole program exits.
This means what should have been a timeout becomes a crash. Sometimes.
Which explains why we saw both timeouts and various signals on the
AArch64 Linux bot. It depends on the timings.
So I'm changing it not to call SBDebugger::Terminate in the timeout
case. We will have to tweak the timeout value based on what happens on
the buildbot, but we will know it's machine load not an lldb bug.
Also use _exit instead of exit, to skip more cleanup that might cause a
crash.
Inline asm that clobbers any of the z-registers when not in streaming
mode should still observe that the lower 128 bits of those registers
are clobbered.
Convert vector 64-bit lshr to 32-bit if the shift amount is known to be
>= 32. Also convert scalar 64-bit lshr to 32-bit if the shift amount is
variable but known to be >= 32.
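The arithmetic behind this narrowing can be sketched in plain C++. Once the shift amount is at least 32, the low half of the input cannot reach the result, so it is enough to shift the high 32 bits by `amount - 32` and zero-extend. `lshr64Via32` is a hypothetical helper for illustration, not the code in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Compute value >> amount for amount in [32, 63] using one 32-bit shift.
uint64_t lshr64Via32(uint64_t value, unsigned amount) {
  assert(amount >= 32 && amount < 64 && "only valid for shifts in [32, 63]");
  // Take the high 32 bits; the low half is shifted out entirely.
  uint32_t high = static_cast<uint32_t>(value >> 32);
  // 32-bit logical shift by the remaining amount, then zero-extend.
  return static_cast<uint64_t>(high >> (amount - 32));
}
```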
---------
Signed-off-by: John Lu <John.Lu@amd.com>
Added an explanation of why an is-constructible check evaluated to
false. Also fixed a problem with `ExtractTypeTraitFromExpression`: for
`std::is_xxx_v<>` with a variadic pack, it tries to get the template
argument as a type, but the expression `Arg.getAsType()` fails because
`Arg.getKind()` is `TemplateArgument::ArgKind::Pack` rather than
`TemplateArgument::ArgKind::Type`.
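For reference, this is the kind of user code where the trait receives a pack rather than a single type as its second template argument. `Widget` and `canConstruct` are hypothetical names. The sketch uses the `::value` form, which is equivalent to the `_v` shorthand available from C++17.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type with a single one-argument constructor.
struct Widget {
  explicit Widget(int) {}
};

// The constructor arguments are passed to the trait as a parameter pack.
template <typename T, typename... Args>
constexpr bool canConstruct() {
  return std::is_constructible<T, Args...>::value;
}
```

A failing check such as `static_assert(canConstruct<Widget, int, int>())` is the situation where an explanation of why the trait evaluated to false is useful.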