This commit adds the rewrite
```
llvm.amdgcn.tensor.{load.to/store.from}.lds(
<4 x i32> %d0, <8 x i32> %d1, <4 x i32> zeroinitializer,
<4 x i32> zeroinitializer, i32 [cachepolicy])
=>
llvm.amdgcn.tensor.{load.to/store.from}.lds.d2(
<4 x i32> %d0, <8 x i32> %d1, i32 [cachepolicy])
```
This is justified because, when the short encoding that uses the NULL
SGPR for registers 2 and 3 is used, the hardware acts as if those
registers were 0, including in gather mode.
It is always safe not to run this transformation.
(Note: tests were LLM'd and then tweaked.)
If a larger-than-legal scalar integer is freely foldable to a vector
type, or is likely to be custom lowered to an operation using vectors,
attempt to convert to a vector store.
This doesn't appear to be worth it for i128 types, which can more easily
convert between types with extra insert_subvector steps etc.
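As a hypothetical illustration of the rewrite (the types here are chosen for illustration only, not taken from the patch):
```
; before: larger-than-legal scalar store
store i256 %val, ptr %p
; after: the value is reinterpreted as a vector and stored as one
%vec = bitcast i256 %val to <4 x i64>
store <4 x i64> %vec, ptr %p
```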
Fixes #170241
PR #169090 updated vector swizzle elements individually for HLSL. This
is because rawbuffer writes via rawBufferStore can cause a data race
when vectors are written all at once by concurrent writers. This means
we needed individual writes per element, which in turn means we need to
be able to GEP into an element of a vector.
The SPIRV backend did not support this pattern. SPIRV assumes composite
types (i.e. vectors, structs, and arrays) only for pointer legalization
in its store transformations via transformStore.
Fixing things at the point of pointer legalization for the store would
be too late, because we would have lost critical pointer type
information still available in LLVM IR. Instead, we need to teach
walkLogicalAccessChain, used by buildLogicalAccessChainFromGEP, to
convert a byte-offset GEP on an i8 pointer into a logical SPIR-V
composite access chain.
In short:
The fundamental issue is that walkLogicalAccessChain currently refuses
to handle vector-based logical GEPs, but Clang’s new HLSL path produces
exactly that pattern:
- you have a logical i8* GEP that represents indexing into the elements
of a <4 x i32>.
We need to teach walkLogicalAccessChain to treat a FixedVectorType
similarly to an array of its element type. So in walkLogicalAccessChain
replace the “give up on vector” part with logic that:
- Interprets the byte Offset as indexing into the vector elements.
- Uses the vector’s element type and number of elements.
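For illustration, the pattern being handled looks roughly like this hypothetical reduction, where a byte offset of 8 into a <4 x i32> selects element 2 (8 divided by the 4-byte element size):
```
%vec = alloca <4 x i32>
%elt = getelementptr inbounds i8, ptr %vec, i64 8 ; byte offset 8 => element 2
store i32 %v, ptr %elt
```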
Fixes #171049, fixes #171050
- Allow Bools for matrix type when in HLSL mode
- use ConvertTypeForMem to figure out the bool size
- Add Bool matrix types to hlsl_basic_types.h
---------
Co-authored-by: Helena Kotas <hekotas@microsoft.com>
For the mulh pattern with operands that are both signed or both
unsigned, the combine is performed automatically. However, for mulh with
one signed and one unsigned operand, we need to combine them manually,
using the same approach as for PASUB*.
Note: this is the first patch for mulh and only handles the basic high
part multiplication; followup patches will handle the rest of the
mulh-related instructions.
This was using getSigned() with an unsigned (not sign extended)
argument. Using plain get() would be correct here. We can go
one step further and use getSignMask() to avoid the issue entirely.
This PR resolves [#170867](https://github.com/llvm/llvm-project/issues/170867).
The existing code recomputes the cost of creating a shuffle instruction
even for repeating intrinsic operand pairs. This results in a higher
newCost, so the runtime decides not to fold.
This PR addresses that issue. When calculating newCost, we skip the cost
calculation for an operand pair that was already considered, and when
creating the transformed code, we reuse the already created shuffle
instruction for a repeated operand pair.
This changes the extension parsing mechanism underpinning `--spirv-ext`
to be more explicit about what it is doing and not rely on a sort. More
specifically, we partition extensions into enabled (prefixed with `+`)
and others, and then individually handle the resulting ranges.
When llvm-symbolizer is not found on PATH, TSan uses the system's
addr2line instead. On Ubuntu 22.04, addr2line can't handle DWARF v5,
which results in failures in some libarcher tests.
This PR adds the directory of the just built LLVM binaries to PATH, to
make llvm-symbolizer available to TSan.
The changes were tested on an AArch64 machine, on which
task-taskgroup-unrelated.c was flaky. Moving the test code to a separate
function, executed 10 times, solved the issue.
Fixes #170138
This patch adds the basic infrastructure for lowering an OpenMP
directive, which should enable someone to take over the OpenMP lowering
in the future. It adds the lowering entry points to CIR in the same way
as OpenACC.
Note that this does nothing with any of the directives, which will
happen in a followup patch. No infrastructure for clauses is added
either, but that will come in a followup patch as well.
Since we updated our buildbot setup, these have been failing. Ignore
them until we have time to find the real problem, which is something to
do with failing to backtrace, or missing debug info when we do.
The use of nested m_Reassociatable matchers by #169644 can result in
high compile times as the inner m_Reassociatable call is being repeated
a lot while the outer call is trying to match. Place the inner
m_ReassociatableAnd at the beginning of the pattern so it is not
repeatedly matched in recursion.
This PR adds support for selecting specific archive members in
llvm-symbolizer using the `archive.a(member.o)` syntax, with
architecture-aware member selection.
**Key features:**
1. **Archive member selection syntax**: Specify archive members using
`archive.a(member.o)` format
2. **Architecture selection via `--default-arch` flag**: Select the
appropriate member when multiple members have the same name but
different architectures
3. **Architecture selection via `:arch` suffix**: Alternative syntax
`archive.a(member.o):arch` for specifying architecture
This functionality is primarily designed for AIX big archives, which can
contain multiple members with the same name but different architectures
(32-bit and 64-bit). However, the implementation works with all archive
formats (GNU, BSD, Darwin, big archive) and handles same-named members
created with llvm-ar q.
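A hypothetical usage sketch (file names, addresses, and architecture values are illustrative only):
```
# select an archive member by name
llvm-symbolizer --obj='big.a(shr.o)' 0x1000
# same-named members disambiguated by architecture
llvm-symbolizer --obj='big.a(shr.o):ppc64' 0x1000
llvm-symbolizer --default-arch=ppc --obj='big.a(shr.o)' 0x1000
```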
---------
Co-authored-by: Midhunesh <midhuensh.p@ibm.com>
For an OpenMP loop construct, count how many loops will effectively be
contained in its associated block. For constructs that are loop-nest
associated this number should be 1. Report cases where this number is
different.
Take into account that the block associated with a loop construct can
contain compiler directives.
Without this patch DW_AT_call_target is used for all indirect call address
location expressions. The DWARF spec says:
For indirect calls or jumps where the address is not computable without use
of registers or memory locations that might be clobbered by the call the
DW_AT_call_target_clobbered attribute is used instead of the
DW_AT_call_target attribute.
This patch implements that behaviour.
If I take the command from the page and add my triple like so:
```
$ cmake -G Ninja -S llvm -B build \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DLLVM_ENABLE_PROJECTS="clang" \ # Configure
    -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind;compiler-rt" \
    -DLLVM_RUNTIME_TARGETS="aarch64-unknown-linux-gnu"
```
I get:
```
CMake Warning:
  Ignoring extra path from command line:
   " "
<...>
-- Build files have been written to: /home/david.spickett/llvm-project/build
-bash: -DLLVM_ENABLE_RUNTIMES=libcxx;libcxxabi;libunwind;compiler-rt: command not found
```
As the comment is after the backslash, it's considered part of the next
line. This comments out the ENABLE_RUNTIMES line and makes the
RUNTIME_TARGETS line look like another command.
To fix this, put the comment before the configure command.
I also moved the other inline comments (which are fine) closer to the
text since they don't have to line up with the configure one anymore.
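With the comment moved before the configure command, the invocation reads roughly like (a sketch of the fixed form; the real page may differ in detail):
```
# Configure
$ cmake -G Ninja -S llvm -B build \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DLLVM_ENABLE_PROJECTS="clang" \
    -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind;compiler-rt" \
    -DLLVM_RUNTIME_TARGETS="aarch64-unknown-linux-gnu"
```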
This patch makes use of aggressive interleaving options for the A320
subtarget. This is done by adding a new local parameter to the
AArch64Subtarget class. With this enabled we see an aggregate uplift of
0.7% on internal benchmark suites with up to 51% uplift on individual
benchmark workloads.
This fixes an error on our Armv8 bot:
```
<...>/RemoteJITUtils.cpp:132:24: error: use of undeclared identifier 'DynamicThreadPoolTaskDispatcher'
132 | std::make_unique<DynamicThreadPoolTaskDispatcher>(std::nullopt),
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
These examples require LLVM_ENABLE_THREADS to be ON, and cannot run
otherwise. As a comment says elsewhere:
```
// Out of process mode using SimpleRemoteEPC depends on threads.
```
#164405 added specializations of `for_each` that didn't do the ranges
call shenanigans, but instead just did what the classic algorithms have
to do. This updates the calls to work for the ranges overloads as well.
This change adds support for mixed precision floating point
arithmetic for `f16` and `bf16` where the following patterns:
```
%fh = fpext half %h to float
%resfh = fp-operation(%fh, ...)
...
%fb = fpext bfloat %b to float
%resfb = fp-operation(%fb, ...)
where the fp-operation can be any of:
- fadd
- fsub
- llvm.fma.f32
- llvm.nvvm.add(/fma).*
```
are lowered to the corresponding mixed precision instructions which
combine the conversion and operation into one instruction from
`sm_100` onwards.
This also adds the following intrinsics to complete support for
all variants of the floating point `add/fma` operations in order
to support the corresponding mixed-precision instructions:
- `llvm.nvvm.add.(rn/rz/rm/rp){.ftz}.sat.f`
- `llvm.nvvm.fma.(rn/rz/rm/rp){.ftz}.sat.f`
We lower `fneg` followed by one of the above addition
intrinsics to the corresponding `sub` instruction.
Tests are added in `fp-arith-sat.ll`, `fp-fold-sub.ll`, and
`builtins-nvptx.c` for the newly added intrinsics and builtins, and in
`mixed-precision-fp.ll` for the mixed precision instructions.
PTX spec reference for mixed precision instructions:
https://docs.nvidia.com/cuda/parallel-thread-execution/#mixed-precision-floating-point-instructions
In an effort to get rid of VPUnrollPartAccessor and directly unroll
recipes, start by directly unrolling VectorPointerRecipe, allowing for
VPlan-based simplifications and simplification of the corresponding
execute.
The `__parent_pointer` type alias was marked to be removed in
d163ab3323. At that time, <map> still had uses of `__parent_pointer`
as a local variable type in operator[] and at(). Those uses were
removed in 4a2dd31f16, which refactored `__find_equal` to return a
pair instead of using an out parameter. However, the typedef in <map>
and the alias in __tree were left behind. This patch removes the
unused typedef from <map> and the `__parent_pointer` alias from
__tree.
Signed-off-by: Krechals <topala.andrei@gmail.com>
RegBankLegalize, using the trivial mapping helper, assigns the same
register bank, vgpr or sgpr, to all operands.
This uncovers multiple codegen and regbank combiner regressions related
to looking through sgpr-to-vgpr copies.
Skip regbankselect-concat-vector.mir since AGPRs are not yet supported.
That PR added an include to `LLVMOps.td` without adding a target
providing that file. Curiously, this does not break the official builds
but it *does* break my bazel build.
Signed-off-by: Ingo Müller <ingomueller@google.com>
This is needed so the llvm-cgdata tool properly builds with
`LLVM_BUILD_LLVM_DYLIB` so LLVM can be built as a DLL on Windows.
This effort is tracked in #109483.
This solves a common issue where users have to manually add the
`.cache/clangd/index/` folder to their `.gitignore`. I got this idea
from [ruff](https://github.com/astral-sh/ruff), which creates
`.ruff_cache/.gitignore`. It would greatly improve the user experience
for everyone, without requiring per-computer configurations and without
any significant cost.
In concept checking, we need to transform SubstNTTPExpr when evaluating
constraints.
The value category is initially computed during parameter mapping,
possibly with a dependent expression. However during instantiation, it
wasn't recomputed, and the stale category is propagated into parent
expressions. So we may end up with an 'out-of-thin-air' reference type,
which breaks the evaluation.
We now call BuildSubstNonTypeTemplateParmExpr in TreeTransform, in which
the value category is recomputed.
The issue was introduced by both 078e99e and the concept normalization
patch, which are not released yet, so no release note.
Fixes https://github.com/llvm/llvm-project/issues/170856
This adds support for lowering smaller-than-legal masks such as:
```
<vscale x 8 x i1> @llvm.loop.dependence.war.mask.nxv8i1(ptr %a, ptr %b, i64 1)
```
to a whilewr + unpack. It also slightly simplifies the lowering.