[lldb] Part 2 of 2 - Refactor `CommandObject::DoExecute(...)` to return
`void` instead of `bool`
Justifications:
- The code never actually acts on the `true`/`false` return values.
- The methods already pass around a `CommandReturnObject`, typically
with a `result` parameter.
- Each command return object already contains:
- A more precise status
- The error code(s) that apply to that status
Part 1 refactors the `CommandObject::Execute(...)` method.
- See https://github.com/llvm/llvm-project/pull/69989
rdar://117378957
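For illustration, a minimal sketch of the shape of the change, with simplified stand-ins for the lldb types (`SetStatusSuccess` here abbreviates the real status-setting API):
```cpp
#include <string>
#include <utility>

// Simplified stand-in for lldb's CommandReturnObject.
struct CommandReturnObject {
  bool Succeeded = false;
  std::string Error;
  void SetStatusSuccess() { Succeeded = true; }
  void AppendError(std::string Msg) { Error = std::move(Msg); Succeeded = false; }
};

// Before, DoExecute returned bool *and* filled in `result`; callers
// ignored the bool. After the refactor it returns void and reports
// success or failure exclusively through `result`.
void DoExecute(CommandReturnObject &result) {
  bool failed = false; // ... perform the command's work ...
  if (failed) {
    result.AppendError("reason for the failure");
    return;
  }
  result.SetStatusSuccess();
}
```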
Detect when we are matching a memprof profile with no column numbers,
and in that case treat all column numbers as 0 when matching. The
profiled binary might have been built with -gno-column-info, for
example.
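A minimal sketch of the matching rule (hypothetical helper name, not the actual memprof code):
```cpp
#include <cstdint>

// A profile built without column info carries column 0 everywhere, so
// zero out the symbolized column before comparing frames against it.
uint32_t columnForMatching(uint32_t Column, bool ProfileHasColumns) {
  return ProfileHasColumns ? Column : 0;
}
```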
This was a regression I introduced in 6a62707c04, where I too hastily removed the basic handling of implicit captures we currently have. That handling will be superseded by all implicit captures being added to target operations' map_info entries in a soon-to-land series of patches; however, that is not yet the case, so we must continue to do some basic handling of these captures for the time being. This patch re-adds that behaviour to avoid regressions.
Unfortunately this also means some test changes, as getUsedValuesDefinedAbove grabs constants used outside of the target region, which aren't currently handled particularly well.
Update the documentation surrounding reduction kinds. Highlight
different min/max reduction kinds for signed/unsigned integers and
floats. Update IR examples.
When stride is x0, a strided load should behave like a unit stride load,
which uses the VLDE sched class.
---------
Co-authored-by: Wang Pengcheng <wangpengcheng.pp@bytedance.com>
G_GLOBAL_VALUE should be lowered into an absolute address if `-codemodel=small` is used, or into a PC-relative address if `-codemodel=medium` is used.
PR #68380 tried to create special instructions to do this, but I don't
see why we need to do that.
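A minimal sketch of the selection rule, assuming LLVM's CodeModel enum (illustrative, not the backend code):
```cpp
#include "llvm/Support/CodeGen.h"

enum class GlobalAddrForm { Absolute, PCRelative };

// Small code model: globals are reachable via absolute addresses.
// Medium: fall back to PC-relative addressing.
GlobalAddrForm selectGlobalAddrForm(llvm::CodeModel::Model CM) {
  return CM == llvm::CodeModel::Small ? GlobalAddrForm::Absolute
                                      : GlobalAddrForm::PCRelative;
}
```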
This reverts commit 4aa12afb96.
This change introduced failures upon checking out the PR source code.
Pulling this out of tree while I investigate further.
This patch makes a couple of changes to the PR code formatting check:
- Move the `changed-files` action to before the checkout so that it pulls information from the GitHub API rather than by running `git diff`, to alleviate some performance problems.
- Check out the head of the pull request instead of its base to ensure that the PR commits are inside the checkout.
- Add an additional sparse checkout of the LLVM tools necessary to run the action, to alleviate security problems introduced by checking out the head of the pull request. Only code from the base of the pull request runs.
- Adjust the commit references to be based on `HEAD`, as GitHub doesn't give exact commit SHAs for the first commit in the PR.
Builds on #67982, which recently introduced the nneg flag on zext instructions. InstCombine is one of our largest canonicalizers of non-negative sext instructions into zext, so set the flag there.
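For illustration, a hedged sketch of carrying the fact over when building the replacement (Instruction::setNonNeg is the flag setter; the surrounding logic is simplified):
```cpp
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// When the sext's operand is known non-negative, emit a zext tagged
// nneg so later passes keep the known-sign information.
Value *sextToZextNNeg(IRBuilderBase &B, Value *Op, Type *DestTy) {
  Value *Z = B.CreateZExt(Op, DestTy);
  if (auto *ZI = dyn_cast<Instruction>(Z))
    ZI->setNonNeg(); // operand proven non-negative
  return Z;
}
```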
This patch backports a one-liner, `std::to_underlying`, that came with C++23. It is useful for refactoring unscoped enums into scoped enums, because the latter are not implicitly convertible to integer types.
I followed the libc++ implementation, but I consider their testing too heavy for us, so I wrote a simpler set of tests.
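The backport is essentially the C++23 definition; a minimal sketch:
```cpp
#include <type_traits>

namespace llvm {
// Like C++23's std::to_underlying: the enumerator's integer value.
template <typename Enum>
constexpr std::underlying_type_t<Enum> to_underlying(Enum E) {
  return static_cast<std::underlying_type_t<Enum>>(E);
}
} // namespace llvm

// Scoped enums don't convert implicitly, so callers ask explicitly:
enum class Color : unsigned char { Red = 1, Green = 2 };
static_assert(llvm::to_underlying(Color::Green) == 2);
```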
The calls to std::construct_at might overwrite the previously set __has_value_ flag in the case where the flag overlaps with the actual value or error being stored (since we use [[no_unique_address]]). To fix this issue, this patch ensures that we initialize the __has_value_ flag after we call std::construct_at.
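An illustrative reduction of the failure mode (not the libc++ code): with [[no_unique_address]] the flag may live in the stored value's tail padding, and std::construct_at begins the lifetime of a whole new object there, so the flag must be written afterwards.
```cpp
#include <memory>

struct Payload { int x; char c; }; // tail padding a bool could occupy

struct Storage {
  [[no_unique_address]] Payload value_;
  bool has_value_; // may be placed inside value_'s padding bytes

  void emplace(const Payload &p) {
    // Setting has_value_ first would risk construct_at clobbering it.
    std::construct_at(std::addressof(value_), p);
    has_value_ = true; // safe: written after construction
  }
};
```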
Fixes #68552
* Enhanced the logic of the ExpandMemCmp pass to merge contiguous subsequences in LoadSequence, based on the sizes allowed in `AllowedTailExpansions`.
* This enhancement seeks to minimize the number of basic blocks and produce optimized code when using memcmp with non-register-aligned sizes.
* Enable this feature for AArch64 with memcmp sizes modulo 8 equal to 3, 5, and 6.
Reapplication of #69942 after fixing a bug
I think it follows from the HSA spec that a write to the first byte is deemed significant to the GPU, in which case writing to the second short and reading back from it later would be safe. However, the examples for this all involve an atomic write to the first 32 bits, and it seems a credible risk that the occasional CI errors about invalid packets have as their root cause that the firmware notices the early write to packet->setup and treats that as a sign that the packet is ready to go.
That was overly paranoid; however, in passing I noticed that the code in libc is genuinely invalid. The memset writes a zero to the header byte, changing it from type_invalid (1) to type_vendor (0), at which point the GPU is free to read the 64-byte packet and interpret it as a vendor packet, which is probably why libc CI periodically errors about invalid packets.
Also a drive-by change to do the atomic store on a uint32_t consistently. I'm not sure offhand what __atomic_store_n on a uint16_t* and an int resolves to; it seems better to be unambiguous there.
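A sketch of the unambiguous form (illustrative; assumes the AQL layout of a 16-bit header followed by a 16-bit setup field, with the 4-byte alignment that 64-byte-aligned packets provide):
```cpp
#include <cstdint>

// Publish header and setup in one 32-bit atomic release store so the
// GPU can never observe a packet whose header says "ready" before the
// rest of the packet is in place.
void publishPacket(uint16_t *header, uint16_t header_bits,
                   uint16_t setup_bits) {
  uint32_t word = (uint32_t)setup_bits << 16 | header_bits;
  __atomic_store_n((uint32_t *)header, word, __ATOMIC_RELEASE);
}
```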
The location set on atomic operations in both OpenMP and OpenACC was completely off. The real location needs to be created from the source CharBlock of the parse tree node of the respective atomic statement.
This patch updates locations in lowering for atomic operations.
This is consistent with the IR verifier and SelectionDAG's getNode.
Update tests accordingly. I tried to keep some coverage of non-pow2 types where possible. X86 didn't like a G_UNMERGE_VALUES from s48 to 3 x s16 that got created when I tried s48.
TotalElementCount() was modified to return std::optional<uint64_t>, where std::nullopt means an overflow occurred. Besides the additional check in RESHAPE folding, all callers of TotalElementCount() were changed to also check for overflows.
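A minimal sketch of the overflow-aware shape (illustrative, not the exact Flang code):
```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Multiply the extents, reporting overflow as std::nullopt instead of
// silently wrapping.
std::optional<std::uint64_t>
TotalElementCount(const std::vector<std::uint64_t> &shape) {
  std::uint64_t total = 1;
  for (std::uint64_t extent : shape) {
    if (extent != 0 && total > UINT64_MAX / extent)
      return std::nullopt; // element count would overflow uint64_t
    total *= extent;
  }
  return total;
}
```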
These tests were failing on the LLDB public matrix build-bots for older
clang versions:
```
clang-7: warning: argument unused during compilation: '-nostdlib++' [-Wunused-command-line-argument]
error: invalid value 'c++20' in '-std=c++20'
note: use 'c++98' or 'c++03' for 'ISO C++ 1998 with amendments' standard
note: use 'gnu++98' or 'gnu++03' for 'ISO C++ 1998 with amendments and GNU extensions' standard
note: use 'c++11' for 'ISO C++ 2011 with amendments' standard
note: use 'gnu++11' for 'ISO C++ 2011 with amendments and GNU extensions' standard
note: use 'c++14' for 'ISO C++ 2014 with amendments' standard
note: use 'gnu++14' for 'ISO C++ 2014 with amendments and GNU extensions' standard
note: use 'c++17' for 'ISO C++ 2017 with amendments' standard
note: use 'gnu++17' for 'ISO C++ 2017 with amendments and GNU extensions' standard
note: use 'c++2a' for 'Working draft for ISO C++ 2020' standard
note: use 'gnu++2a' for 'Working draft for ISO C++ 2020 with GNU extensions' standard
make: *** [main.o] Error 1
```
The test fails because we try to compile it with `-std=c++20` (which is
required for std::chrono::{days,weeks,months,years}) on clang versions
that don't support the `-std=c++20` flag.
We could change the test to conditionally compile the C++20 parts of the test based on the `-std=` flag and have two versions of the Python tests, one for the C++11 chrono features and one for the C++20 features.
This patch instead just disables the test on older clang versions
(because it's simpler and we don't really lose important coverage).
…off support
Same as D135848. The newly added test fails with `fatal error: error in
backend: Objective-C support is unimplemented for object file format`.
The code was going in the wrong direction: it removed one path component instead of appending /Developer to it. Due to fallback mechanisms in xcrun, this never seemed to have caused any issues.
Part 1 of 3. This includes the LLVM back-end processing and profile
reading/writing components. compiler-rt changes are included.
Differential Revision: https://reviews.llvm.org/D138846
For most register sets, "enabled" meant you could use it: it was present in the process, and there was no present-but-turned-off state, so "enabled" made sense.
Then ZA came along (and soon ZT0), where ZA can be present in the hardware when you have SME, but ZA itself can be made inactive. This means that "IsZAEnabled()" doesn't mean "is ZA active", it means "do you have SME", which is very confusing when we actually want to know whether ZA is active.
So instead say "IsZAPresent" to make these checks more specific. For things that can't be made inactive, present implies active, as they're never inactive.
This test has been failing since the SPIR-V backend started failing explicitly on unsupported shader types. Switched this test to a compute shader, since compute is currently the only supported type.
These new intrinsics, `amdgcn_wmma_tied_f16_16x16x16_f16` and `amdgcn_wmma_tied_bf16_16x16x16_bf16`, explicitly tie the destination accumulator matrix to the input accumulator matrix.
The `wmma_f16` and `wmma_bf16` intrinsics only write to 16 bits of the 32-bit destination VGPRs; which half is determined via the `op_sel` argument, and the other half of the destination registers remains unchanged.
In some cases, however, we expect the destination to copy the other halves from the input accumulator, for instance when packing two separate accumulator matrices into one. In that case, the two matrices are tied to the same registers but occupy separate halves, so it is important to copy the other matrix's values to the new destination.
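A plain-C++ illustration of the tied semantics (not the intrinsic itself): the computed 16-bit result lands in the half selected by op_sel, while the other half comes from the tied input accumulator.
```cpp
#include <cstdint>

// Combine one 16-bit result with the preserved half of the tied input.
uint32_t tiedLane(uint32_t tied_in, uint16_t result, bool op_sel_high) {
  return op_sel_high ? (uint32_t(result) << 16) | (tied_in & 0xFFFFu)
                     : (tied_in & 0xFFFF0000u) | result;
}
```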
We currently have three postprocess peephole optimisations for vector
pseudos:
1) Masked pseudo with all ones mask -> unmasked pseudo
2) Merge vmerge pseudo into operand pseudo's mask
3) vmerge pseudo with all ones mask -> vmv.v.v pseudo
This patch aims to move these peepholes out of SelectionDAG and into a
separate RISCVFoldMasks MachineFunction pass.
There are a few motivations for doing this:
* The current SelectionDAG implementation operates on MachineSDNodes,
which are essentially MachineInstrs but require a bunch of logic to
reason about chain and glue operands. The RISCVII::has*Op helper
functions also don't exactly line up with the SDNode operands. Mutating
these pseudos and their operands in place becomes a good bit easier at
the MachineInstr level. For example, we would no longer need to check
for cycles in the DAG during performCombineVMergeAndVOps.
* Although it's further down the line, moving this code out of
SelectionDAG allows it to be reused by GlobalISel later on.
* In performCombineVMergeAndVOps, it may be possible to commute the
operands to enable folding in more cases (see
test/CodeGen/RISCV/rvv/vmadd-vp.ll). There is existing machinery to
commute operands in TII::commuteInstruction, but it's implemented on
MachineInstrs.
The pass runs straight after ISel, before any of the other machine SSA
optimization passes run. This is so that dead-mi-elimination can mop up
any vmsets that are no longer used (but if preferred we could try and
erase them from inside RISCVFoldMasks itself). This also means that
these peepholes are no longer run at codegen -O0, so this patch isn't
strictly NFC.
Only the performVMergeToVMv peephole is refactored in this patch; the remaining two will be implemented later. And as noted by @preames, it should be possible to move doPeepholeSExtW out of SelectionDAG as well.
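For reference, a hedged sketch of the skeleton such a pass takes (names follow the description above; this is not the exact upstream code):
```cpp
#include "llvm/ADT/STLExtras.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
using namespace llvm;

namespace {
// Post-ISel peephole pass: operating on MachineInstrs means no chain or
// glue bookkeeping and no DAG cycle checks are needed.
class RISCVFoldMasks : public MachineFunctionPass {
public:
  static char ID;
  RISCVFoldMasks() : MachineFunctionPass(ID) {}

  bool runOnMachineFunction(MachineFunction &MF) override {
    bool Changed = false;
    for (MachineBasicBlock &MBB : MF)
      for (MachineInstr &MI : make_early_inc_range(MBB))
        Changed |= convertVMergeToVMv(MI);
    return Changed;
  }

private:
  // Placeholder: the real peephole rewrites a vmerge whose mask is all
  // ones into a vmv.v.v in place.
  bool convertVMergeToVMv(MachineInstr &) { return false; }
};
} // namespace

char RISCVFoldMasks::ID = 0;
```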