This is intended as a fast pattern rewrite driver for cases where a
simple walk gets the job done, but where we still want to implement the
rewrite in terms of rewrite patterns (so they can be reused with the
greedy pattern rewrite driver downstream).
The new driver is inspired by the discussion in
https://github.com/llvm/llvm-project/pull/112454 and the LLVM Dev
presentation from @matthias-springer earlier this week.
This driver comes with some limitations:
* It does not iterate to a fixpoint and does not revisit ops modified
in place or newly created ops. In general, it only walks forward (in
post-order).
* `matchAndRewrite` can only erase the matched op or its descendants.
This is verified under expensive checks.
* It does not perform folding / DCE.
We could probably relax some of these in the future without sacrificing
too much performance.
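As a minimal sketch of how such a driver might be used, assuming a
`walkAndApplyPatterns` entry point (the exact name and signature may
differ):
```c++
#include "mlir/IR/PatternMatch.h"
#include "mlir/Transforms/WalkPatternRewriteDriver.h"

using namespace mlir;

void runFastRewrite(Operation *root, MLIRContext *ctx) {
  RewritePatternSet patterns(ctx);
  // Populate with patterns that only erase the matched op or its
  // descendants, and that don't rely on revisiting modified ops
  // (see the limitations above).
  FrozenRewritePatternSet frozen(std::move(patterns));
  walkAndApplyPatterns(root, frozen); // single post-order walk
}
```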
If the runtime flat address resolves to a scratch address,
64-bit atomics do not work correctly. Insert a runtime address
space check (which is quite likely to be uniform) and select between
the non-atomic and real atomic cases.
Consider noalias.addrspace metadata and avoid this expansion when
possible (we also need to consider it to avoid infinitely expanding
after adding the predication code).
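Conceptually, the inserted predication looks like this (a hedged
C++-style sketch; the real expansion operates on LLVM IR, and
`pointsToScratch` is a hypothetical stand-in for the runtime address
space test):
```c++
#include <cstdint>

bool pointsToScratch(const void *p); // hypothetical runtime check,
                                     // quite likely uniform per wave

uint64_t atomicFetchAdd64(uint64_t *flatPtr, uint64_t val) {
  if (pointsToScratch(flatPtr)) {
    // Scratch is private, so a plain read-modify-write is safe here.
    uint64_t old = *flatPtr;
    *flatPtr = old + val;
    return old;
  }
  // The real atomic case.
  return __atomic_fetch_add(flatPtr, val, __ATOMIC_SEQ_CST);
}
```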
This commit reverts commit 7c55426 (relands commit 97fb21ac), with the
additional step of ignoring a warning that we don't care about
(-Wunguarded-availability-new).
> We know that aligned_alloc will never be called on systems that do not
have aligned_alloc defined, because the client OS X won't provide it.
This function is effectively guarded on OSes older than 10.15 because
the aligned_alloc declaration won't exist on those versions, so there
will be no way to call this function from code.
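For context, the pattern the warning would flag looks roughly like this
(a hedged sketch; `allocateAligned` is illustrative):
```c++
#include <cstdlib>

// -Wunguarded-availability-new would normally ask for a
// __builtin_available(macOS 10.15, *) guard around this call, but since
// the aligned_alloc declaration doesn't exist on older SDKs, no caller
// of this function can be built for those systems in the first place.
void *allocateAligned(std::size_t alignment, std::size_t size) {
  return aligned_alloc(alignment, size);
}
```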
This patch adds basic support for `hypot`. Constant folding support will
be submitted in a subsequent patch.
Related issue: https://github.com/llvm/llvm-project/issues/113711
Note: This is my first time contributing to LLVM, with encouragement
from one of my friends, @fawdlstty. I learned a lot from
https://github.com/llvm/llvm-project/pull/99611, and thanks for that.
Note: I had previously created and merged the same PR
(https://github.com/llvm/llvm-project/pull/113724), but it was reverted
due to a merge issue. (The CI failure happened at 3 A.M. in my timezone,
so I had to go back to sleep after replying about why the issue
happened.) I have rebased onto the latest main branch and recreated the
PR, and I hope I won't have to create the same PR a third time.
I hope @arsenm can help me review the code again. I'm sorry for that.
Kenji Mouri
This change modifies `getBackwardSlice` to track values captured by the
regions of each operation that it traverses. Ignoring values captured
from a parent region may lead to an incomplete program slice. However,
there is existing logic that depends on not traversing captured values,
so this change preserves the default behavior by hiding the new
traversal behind the `omitUsesFromAbove` flag.
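A minimal sketch of opting into the new traversal, assuming the flag
lives on `BackwardSliceOptions` (details may differ):
```c++
#include "mlir/Analysis/SliceAnalysis.h"

using namespace mlir;

void computeSlice(Operation *op, SetVector<Operation *> &slice) {
  BackwardSliceOptions options;
  // false: also follow values captured from enclosing regions; the
  // default preserves the old behavior of ignoring them.
  options.omitUsesFromAbove = false;
  getBackwardSlice(op, &slice, options);
}
```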
Atomic loads are handled differently than in the DAG: they have separate
opcodes and explicit control over the extensions, like ordinary loads.
Add new patterns for these.
There's room for cleanup and improvement. d16 cases aren't handled.
Fixes #111645
This PR is one of the many PRs in the SYCL upstreaming effort focusing
on device code linking during the SYCL offload compilation process. RFC:
https://discourse.llvm.org/t/rfc-offloading-design-for-sycl-offload-kind-and-spir-targets/74088
In this PR, we introduce a new tool that will be used to perform device
code linking for the SYCL offload kind. It accepts SYCL device objects
in LLVM IR bitcode format and generates a fully linked device object
that can then be wrapped and linked into the host object.
A primary use case for this tool is to perform device code linking for
objects with the SYCL offload kind inside the clang-linker-wrapper. It
can also be invoked via the clang driver as follows:
`clang --target=spirv64 --sycl-link input.bc`
Device code linking for the SYCL offload kind has a number of known
quirks that make it difficult to use in a unified offloading setting.
Two of the primary issues are:
1. Several finalization steps must be run on the fully linked LLVM IR
bitcode to guarantee conformance to SYCL standards. This step is unique
to the SYCL offloading compilation flow.
2. The SPIR-V LLVM Translator is an external tool, and hence SPIR-V code
generation cannot be done as part of LTO. This limitation will be lifted
once the SPIR-V backend is available as a viable LLVM backend.
Hence, we introduce this new tool to provide a clean wrapper to perform
SYCL device linking.
Co-authored-by: Michael Toguchi
Thanks
---------
Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
This extends the existing fold:
```
for(i = Start; i < End; ++i)
  Rem = (i nuw+- IncrLoopInvariant) u% RemAmtLoopInvariant;
```
->
```
Rem = (Start nuw+- IncrLoopInvariant) % RemAmtLoopInvariant;
for(i = Start; i < End; ++i, ++Rem)
  Rem = Rem == RemAmtLoopInvariant ? 0 : Rem;
```
To work with a non-zero `IncrLoopInvariant`.
This is a common usage in cases such as:
```
for(i = 0; i < N; ++i)
  if ((i + 1) % X == 0)
    do_something_occasionally_but_not_first_iter();
```
Alive2 w/ i4/unrolled 6x (needs to be run locally due to timeout):
https://alive2.llvm.org/ce/z/6tgyN3
Exhaustive proof over all uint8_t combinations in C++:
https://godbolt.org/z/WYa561388
Many tests had this flag removed because of the G_BITCAST emission
issue. Now that the PR is merged, we can re-enable this additional
check.
Two tests (basic_int_types) just have the TODO removed because they are
not useful for SPIR-V as-is: SPIR-V requires reg2mem/mem2reg to run,
which removes the entire body. Integers are exercised in other SPIR-V
tests, and it seems fine to test spirv32/64 here and rely on the other
targets for logical-target coverage.
Signed-off-by: Nathan Gauër <brioche@google.com>
This shares most of its code with the scalar sincos expansion. It allows
expanding vector FSINCOS nodes to a library call from the specified
`-vector-library`. The upside is that the vectorizer only needs to
handle the sincos intrinsic, which has no memory effects, while this
expansion handles lowering the intrinsic to a call that takes output
pointers.
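For illustration, the two call shapes involved look roughly like this (a
hedged sketch; `vec2`, `sincosIntrinsic`, and `sincosLibcall` are
illustrative, and real symbol names come from the selected vector
library):
```c++
#include <cmath>
#include <tuple>
#include <utility>

struct vec2 { double lane[2]; }; // stands in for <2 x double>

// Intrinsic-like form: values in, values out, no memory effects.
std::pair<vec2, vec2> sincosIntrinsic(vec2 x) {
  vec2 s, c;
  for (int i = 0; i < 2; ++i) {
    s.lane[i] = std::sin(x.lane[i]);
    c.lane[i] = std::cos(x.lane[i]);
  }
  return {s, c};
}

// Library-call form the expansion emits: results via output pointers.
void sincosLibcall(vec2 x, vec2 *s, vec2 *c) {
  std::tie(*s, *c) = sincosIntrinsic(x);
}
```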
We can use `viota`+`vrgather` to synthesize `vdecompress` and lower an
expanding load to `vcpop`+`load`+`vdecompress`.
If `%mask` is all ones, we can lower the expanding load to a normal
unmasked load.
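For reference, the scalar semantics of an expanding load (a hedged C++
sketch; names are illustrative):
```c++
#include <cstddef>
#include <cstdint>
#include <vector>

// Consecutive elements are read from memory and placed, in order, into
// the active lanes; inactive lanes keep the passthru value. vcpop gives
// the number of elements to load; viota gives each active lane its
// index into the loaded data (used as vrgather indices).
std::vector<int32_t> expandLoad(const int32_t *mem,
                                const std::vector<bool> &mask,
                                const std::vector<int32_t> &passthru) {
  std::vector<int32_t> result(passthru);
  std::size_t memIdx = 0;
  for (std::size_t lane = 0; lane < mask.size(); ++lane)
    if (mask[lane])
      result[lane] = mem[memIdx++];
  return result;
}
```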
Fixes #101914.
This patch improves error reporting in the MLIR to LLVM IR translation
pass for the 'omp' dialect by emitting descriptive errors when
encountering clauses not yet supported by that pass.
Additionally, not-yet-implemented errors that were previously missing
for some clauses are added, to avoid silently ignoring them.
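The shape of such a check might look like this (a hedged sketch;
`checkClauseSupport` and the attribute name are illustrative, not
necessarily the actual names in the patch):
```c++
#include "mlir/IR/Operation.h"

using namespace mlir;

static LogicalResult checkClauseSupport(Operation *op) {
  // Report clauses the translation pass cannot handle yet instead of
  // silently dropping them.
  if (op->hasAttr("mergeable"))
    return op->emitError(
        "not yet implemented: unhandled clause 'mergeable' in omp.task");
  return success();
}
```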
Error messages related to inlining of `omp.private` and
`omp.declare_reduction` regions have been updated to use the same
format.
This patch unifies the handling of errors passed through the
OpenMPIRBuilder and removes some redundant error messages through the
introduction of a custom `ErrorInfo` subclass.
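A minimal sketch of such a subclass using the standard `llvm::ErrorInfo`
CRTP machinery (class name and message are illustrative):
```c++
#include "llvm/Support/Error.h"

// Carries a "not yet implemented" message through the OpenMPIRBuilder's
// llvm::Error plumbing without duplicating it at every call site.
class NotYetImplementedError
    : public llvm::ErrorInfo<NotYetImplementedError> {
public:
  static char ID;
  explicit NotYetImplementedError(std::string Msg) : Msg(std::move(Msg)) {}
  void log(llvm::raw_ostream &OS) const override { OS << Msg; }
  std::error_code convertToErrorCode() const override {
    return llvm::inconvertibleErrorCode();
  }

private:
  std::string Msg;
};
char NotYetImplementedError::ID = 0;
```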
Additionally, the current list of operations and clauses unsupported by
the MLIR to LLVM IR translation pass is added to a new Lit test to check
they are being reported to the user.
Swift ClangImporter now supports concurrency annotations on imported
declarations and their parameters/results. To make it possible to use
imported APIs in Swift safely, there has to be a way to annotate
individual parameters and result types with relevant attributes
indicating that, e.g., a block is called on a particular actor or
accepts a `Sendable` parameter.
To facilitate that, `SwiftAttr` is switched from `InheritableAttr`
(a declaration attribute) to `DeclOrTypeAttr`. To support this attribute
in type context, we need access to its "Attribute" argument, which
requires `AttributedType` to be extended to include an `Attr *` when
available instead of just `attr::Kind`; otherwise it won't be possible
to determine what attribute should be imported.
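For illustration (a hedged sketch; the `swift_attr` spelling is real
Clang syntax, but the API below is made up):
```c++
#define SWIFT_SENDABLE __attribute__((swift_attr("@Sendable")))

// With SwiftAttr as a DeclOrTypeAttr, the annotation can attach to an
// individual parameter's type; ClangImporter can then recover the
// attribute (and its string argument) from the AttributedType.
void enqueueWork(void (*SWIFT_SENDABLE callback)(int status));
```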
`DIGenericSubrange` is used when the dimensions of an array are unknown
at build time (e.g. assumed-rank arrays in Fortran). It has the same
`lowerBound`, `upperBound`, `count`, and `stride` fields as
`DISubrange`, and its translation looks quite similar as a result.
---------
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
SValBuilder::getKnownValue, getMinValue, and getMaxValue use
SValBuilder::simplifySVal.
simplifySVal does repeated simplification until a fixed-point is
reached. A single step is done by SimpleSValBuilder::simplifySValOnce,
using a Simplifier visitor. That will basically decompose SymSymExprs,
and apply constant folding using the constraints we have in the State.
Once it decomposes a SymSymExpr, it simplifies both sides and then uses
the SValBuilder::evalBinOp to reconstruct the same - but now simpler -
SymSymExpr, while applying some caching to remain performant.
This decomposition and the subsequent re-composition pose new challenges
to SValBuilder::evalBinOp, which is built to handle expressions coming
from real C/C++ code and thus applies some implicit assumptions.
One previous assumption was that nobody would form an expression like
"((int*)0) - q" (where q is an int pointer), because it doesn't really
make sense to write code like that.
However, during simplification, we may end up with a call to evalBinOp
similar to this.
To me, simplifying a SymbolRef should never result in Unknown or Undef
unless it was Unknown or Undef initially, or unless during
simplification we realize that, say, it is a division by zero once we do
the constant folding.
In the following case the simplified SVal should not become UnknownVal:
```c++
void top(char *p, char *q) {
  int diff = p - q;  // diff: reg<p> - reg<q>
  if (!p)            // p: NULL
    simplify(diff);  // diff after simplification should be: 0(loc) - reg<q>
}
```
Returning Unknown from simplifySVal can weaken analysis precision in
other places too, such as in SValBuilder::getKnownValue, getMinValue, or
getMaxValue, because we call simplifySVal before doing anything else.
For nonloc::SymbolVals, this loss of precision is critical, because for
those the SymbolRef carries an accurate type of the encoded computation;
thus we should at least have a conservative upper or lower bound that we
could return from getMinValue or getMaxValue - yet we would just return
nullptr.
```c++
const llvm::APSInt *SimpleSValBuilder::getKnownValue(ProgramStateRef state,
                                                     SVal V) {
  return getConstValue(state, simplifySVal(state, V));
}

const llvm::APSInt *SimpleSValBuilder::getMinValue(ProgramStateRef state,
                                                   SVal V) {
  V = simplifySVal(state, V);
  if (const llvm::APSInt *Res = getConcreteValue(V))
    return Res;
  if (SymbolRef Sym = V.getAsSymbol())
    return state->getConstraintManager().getSymMinVal(state, Sym);
  return nullptr;
}
```
For now, I don't plan to make the simplification bullet-proof; I'm just
explaining why I made this change and what you need to look out for in
the future if you see a similar issue.
CPP-5750
This teaches dagcombiner to fold:
`(asr (add nsw x, y), 1) -> (avgfloors x, y)`
`(lsr (add nuw x, y), 1) -> (avgflooru x, y)`
as well as combining them into a ceil variant:
`(avgfloors (add nsw x, y), 1) -> (avgceils x, y)`
`(avgflooru (add nuw x, y), 1) -> (avgceilu x, y)`
iff valid for the target.
Removes some of the ARM MVE patterns that are now dead code.
It adds the avg opcodes to `IsQRMVEInstruction` so as to preserve the
immediate splatting as before.
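The reference semantics behind these folds, with the same kind of
exhaustive uint8_t check as the linked proof (a hedged sketch):
```c++
#include <cassert>
#include <cstdint>

uint8_t avgflooru(uint8_t x, uint8_t y) {
  return static_cast<uint8_t>((static_cast<uint16_t>(x) + y) >> 1);
}
uint8_t avgceilu(uint8_t x, uint8_t y) {
  return static_cast<uint8_t>((static_cast<uint16_t>(x) + y + 1) >> 1);
}

int main() {
  for (int x = 0; x < 256; ++x)
    for (int y = 0; y < 256; ++y) {
      if (x + y <= 255) { // 'add nuw' precondition: no unsigned wrap
        // (lsr (add nuw x, y), 1) -> (avgflooru x, y)
        assert(static_cast<uint8_t>((x + y) >> 1) == avgflooru(x, y));
        // (avgflooru (add nuw x, y), 1) -> (avgceilu x, y)
        assert(avgflooru(static_cast<uint8_t>(x + y), 1) == avgceilu(x, y));
      }
    }
}
```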
The test is currently passing everywhere but this 32-bit Arm Ubuntu bot.
I don't have an easy way to debug this, so I'm skipping the test on that
platform until we get a chance to figure it out.