Dead code analysis crashed because a symbol that is called/used didn't appear in the symbol
table.
This patch fixes the crash by adding a nullptr check after the symbol table lookup.
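A minimal sketch of the guard, with a hypothetical symbol table type (the actual analysis uses its own lookup and diagnostics):

```cpp
#include <string>
#include <unordered_map>

struct Symbol {
  bool Used = false;
};

using SymbolTable = std::unordered_map<std::string, Symbol *>;

// Mark a referenced symbol as used, tolerating symbols that are called/used
// but never made it into the table.
void markUsed(SymbolTable &Table, const std::string &Name) {
  auto It = Table.find(Name);
  if (It == Table.end() || It->second == nullptr)
    return;  // nothing to mark; previously this path dereferenced a missing entry
  It->second->Used = true;
}
```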
Combine a G_MERGE_VALUES of x and undef into anyext x, e.g.:
; CHECK-NEXT: [[MV1:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[TRUNC]](s32), [[DEF]](s32)
Consider a G_MERGE_VALUES whose upper bits are padded with undef:
// %bits_8_15:_(s8) = G_IMPLICIT_DEF
// %0:_(s16) = G_MERGE_VALUES %bits_0_7(s8), %bits_8_15(s8)
%bits_8_15 is defined by undef. Its value is undefined and we can pick
an arbitrary value. For optimization, we pick anyext, which plays well
with the undefinedness.
// %0:_(s16) = G_ANYEXT %bits_0_7(s8)
The upper bits of %0 are undefined and the lower bits come from
%bits_0_7.
Preserving the case order is not strictly necessary to preserve
semantics (for example, operations like SwitchInst::removeCase will
happily swap cases around). However, I'm planning to introduce an
optional verification step for SandboxIR that will use StructuralHash to
compare the IR after a revert against the original IR to help catch tracker
bugs, and the case-order difference triggers a mismatch there.
`std::copy` doesn't use the `_AlgPolicy` for anything other than calling
itself with it, so we can just remove the argument. This also removes
the need for the `_AlgPolicy` argument in a few other algorithms that had it
only in order to call `copy`.
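As a rough illustration with made-up names (not the real libc++ internals): a template parameter that exists only to be forwarded can simply be dropped, along with the callers that only threaded it through.

```cpp
#include <algorithm>

// Before: Policy is accepted purely so it can be forwarded.
template <class Policy, class In, class Out>
Out copy_with_policy(In first, In last, Out out) {
  return std::copy(first, last, out);  // Policy is never consulted
}

template <class Policy, class In, class Out>
Out caller_before(In first, In last, Out out) {
  return copy_with_policy<Policy>(first, last, out);
}

// After: the parameter is gone, and callers no longer need to carry it.
template <class In, class Out>
Out caller_after(In first, In last, Out out) {
  return std::copy(first, last, out);
}
```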
In `staticallyExtractSubvector`, when the slice being extracted is the same
as the source vector, there is no need to emit `vector.extract_strided_slice`.
This fixes the lit test case `@vector_store_i4` in
`mlir/test/Dialect/Vector/vector-emulate-narrow-type.mlir`, where
converting from `vector<8xi4>` to `vector<4xi8>` does not need slice
extraction.
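A rough sketch of the early bail-out, assuming a simplified helper signature (the actual `staticallyExtractSubvector` takes different parameters):

```cpp
#include "mlir/Dialect/Vector/IR/VectorOps.h"

using namespace mlir;

// Simplified sketch: extract `numElemsToExtract` elements starting at `offset`
// from a 1-D `source` vector.
static Value staticallyExtractSubvectorSketch(OpBuilder &builder, Location loc,
                                              Value source, int64_t offset,
                                              int64_t numElemsToExtract) {
  auto srcType = cast<VectorType>(source.getType());
  // If the requested slice covers the whole source vector, there is nothing to
  // extract; return the source instead of emitting vector.extract_strided_slice.
  if (offset == 0 && numElemsToExtract == srcType.getNumElements())
    return source;
  return builder
      .create<vector::ExtractStridedSliceOp>(
          loc, source, /*offsets=*/ArrayRef<int64_t>{offset},
          /*sizes=*/ArrayRef<int64_t>{numElemsToExtract},
          /*strides=*/ArrayRef<int64_t>{1})
      .getResult();
}
```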
The issue was introduced in #113411 and #115070, CI failure link:
https://buildkite.com/llvm-project/github-pull-requests/builds/118845
This PR does not include a new lit test case because it is a fix and the
above-mentioned `@vector_store_i4` test already exercises the mechanism.
Signed-off-by: Alan Li <me@alanli.org>
If we linearize values (with an assertion that they are disjoint) and
then delinearize that linear index with the exact same basis, we know
that these operations are exact inverses of each other and can be
replaced with the original inputs to the linearization.
Similarly, if we take a linear index, delinearize it with some bases,
and then re-linearize it with that same basis (noting that the outputs
of the delinearization are guaranteed to be `disjoint`, even if this is
not asserted on the linearize_index operation), the re-linearization is
the inverse of the delinearization, so those two operations can also be
canceled out.
This commit adds canonicalization patterns for these simple
cancelations.
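A plain-arithmetic sketch (ordinary C++, not MLIR) of the two identities being exploited:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Row-major linearization of a multi-index against a basis of per-dimension sizes.
static int64_t linearize(const std::vector<int64_t> &idx,
                         const std::vector<int64_t> &basis) {
  int64_t linear = 0;
  for (size_t i = 0; i < idx.size(); ++i)
    linear = linear * basis[i] + idx[i];
  return linear;
}

// Inverse operation: split a linear index back into a multi-index.
static std::vector<int64_t> delinearize(int64_t linear,
                                        const std::vector<int64_t> &basis) {
  std::vector<int64_t> idx(basis.size());
  for (size_t i = basis.size(); i-- > 0;) {
    idx[i] = linear % basis[i];
    linear /= basis[i];
  }
  return idx;
}

int main() {
  std::vector<int64_t> basis = {4, 5, 6};
  std::vector<int64_t> idx = {3, 2, 5};  // in-bounds ("disjoint") multi-index
  // delinearize(linearize(x)) == x and linearize(delinearize(n)) == n,
  // which is what lets the op pairs cancel.
  assert(delinearize(linearize(idx, basis), basis) == idx);
  assert(linearize(delinearize(97, basis), basis) == 97);
  return 0;
}
```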
* Enforce that Tosa_ElementwiseUnaryOp requires output tensors to match
the input tensor's type and shape.
* Update the following ops to conform to Tosa_ElementwiseUnaryOp: clamp,
erf, sigmoid, tanh, cos, sin, abs, bitwise_not, ceil, clz, exp, floor,
log, logical_not, negate, reciprocal, rsqrt.
* Add invalid tests for each operator to ensure compliance with TOSA
v1.0 Specification.
Signed-off-by: Peng Sun <peng.sun@arm.com>
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
- `Builtins.td` - Add f16 support for libm atan2 builtin
- `CGBuiltin.cpp` - Emit constrained atan2 intrinsic for clang builtin
- `clang/test/CodeGenCXX/builtin-calling-conv.cpp` - Use erff instead of
atan2 for clang builtin to lib call calling convention check, now that
atan2 maps to an intrinsic.
- add atan2 cases to llvm.experimental.constrained tests for more
backends: ARM, PowerPC, RISCV, SystemZ.
- LangRef.rst: add llvm.experimental.constrained.atan2, revise
llvm.atan2 description.
Last part of "Implement the atan2 HLSL Function". Fixes #70096.
The -time option prints timing information for the subcommands
(compiler, linker) in a format similar to that used by gcc/gfortran.
This partially addresses requests from #89888.
After #115084 the 80-bit long double tests error if the size of long double
isn't 96 or 128 bits. This caused failures on systems where long double is
double (since long double is then 64 bits), so I've disabled the 80-bit long
double tests on systems that don't use them.
When the chain is not the entry node, there is a risk that the stores are
within a (CALLSEQ_START, CALLSEQ_END) pair, which, when the node is
expanded, will lead to nested call sequences.
It should be possible to check for this and allow more cases, but for
now, let's limit this to cases where it's definitely safe.
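A minimal sketch of the conservative check, assuming the chain is available as an `SDValue` (the exact placement in the combine is illustrative):

```cpp
#include "llvm/CodeGen/SelectionDAGNodes.h"

// Only proceed when the stores hang directly off the entry token; a chain
// that is not the entry node may place the stores inside a
// (CALLSEQ_START, CALLSEQ_END) bracket, and expanding there would nest
// call sequences.
static bool isSafeChainForExpansion(llvm::SDValue Chain) {
  return Chain.getOpcode() == llvm::ISD::EntryToken;
}
```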
Fixes #115323
Absolute thunks generated by LLD reference function addresses recorded
as data in code. Since they are generated by the linker, they don't have
relocations associated with them and thus the addresses are left
undetected. Use pattern matching to detect such thunks and handle them
in the VeneerElimination pass.
According to the Doxygen documentation,
the `relates`, `related`, `relatesalso`, and `relatedalso` commands all
have a single argument. This patch changes their classification from
`VerbatimLineCommand` to `InlineCommand` so the argument is correctly
parsed.
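For reference, a typical single-argument use of one of these commands in a doc comment (the declarations are hypothetical):

```cpp
class String;

/// Appends a suffix to a string.
///
/// \relates String
/// The command takes exactly one argument (the related class name); with this
/// change the rest of the line is parsed as ordinary comment text again.
void append(String &s, const char *suffix);
```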
Reorganize the code into phases:
* Analyze/normalize
* Create extracted function prototype
* Generate the new function's implementation
* Generate call to new function
* Connect call to original function's CFG
The motivation is #114669 to optionally clone the selected code region
into the new function instead of moving it. The current structure made
it difficult to add such functionality since there was no obvious place
to do so, not made easier by some functions doing more than their name
suggests. For instance, constructFunction modifies code outside the
constructed function, but also function properties such as
setPersonalityFn are derived somewhere else. Another example is
emitCallAndSwitchStatement, which despite its name also inserts stores
for output parameters.
Many operations also implicitly depend on the order they are applied
which this patch tries to reduce. For instance, ExtractedFuncRetVals
becomes the list of exit blocks, which also defines the return value when
leaving via that block. It is computed early such that the new
function's return instructions and the switch can be generated
independently. Also, ExtractedFuncRetVals combines the lists
ExitBlocks and OldTargets, which were not always kept consistent with
each other or NumExitBlocks. The method recomputeExitBlocks() will
update it when necessary.
The coding style partially contradicts the current coding standard. For
instance, some local variables start with lowercase letters. I updated
some, but not all, occurrences so that the diff keeps at least some lines
unchanged.
The patch [D96854](https://reviews.llvm.org/D96854) introduced some
confusion of function argument indexes; this is fixed here as well, hence
the patch is no longer NFC. Tested in modified CodeExtractorTest.cpp.
Patch [D121061](https://reviews.llvm.org/D121061) introduced
AllocationBlock, but not all allocas were inserted there.
Effectively includes the following fixes:
1. ce73b1672a
2. 4aaa925786
3. Missing allocas, still unfixed
Originally submitted as https://reviews.llvm.org/D115218
Avoid generating spurious tensor.extract_slice, follow-on for #114315.
This is best demonstrated with an example. Here's the input for
`GeneralizeOuterUnitDimsPackOpPattern`:
```mlir
%pack = tensor.pack %input
padding_value(%pad : f32)
inner_dims_pos = [1, 0]
inner_tiles = [2, %tile_dim_1]
into %output : tensor<5x1xf32> -> tensor<1x1x2x?xf32>
```
Output _before_:
```mlir
%padded = tensor.pad %arg0 low[0, 0] high[%0, 1] {
^bb0(%arg4: index, %arg5: index):
tensor.yield %arg2 : f32
} : tensor<5x1xf32> to tensor<?x2xf32>
// NOTE: skipped in the output _after_
%extracted_slice = tensor.extract_slice
%padded[0, 0] [%arg3, 2] [1, 1] :
tensor<?x2xf32> to tensor<?x2xf32>
%empty = tensor.empty(%arg3) : tensor<2x?xf32>
%transposed = linalg.transpose
ins(%extracted_slice : tensor<?x2xf32>)
outs(%empty : tensor<2x?xf32>)
permutation = [1, 0]
%inserted_slice = tensor.insert_slice %transposed
into %arg1[0, 0, 0, 0] [1, 1, 2, %arg3] [1, 1, 1, 1] :
tensor<2x?xf32> into tensor<1x1x2x?xf32>
```
Output _after_:
```mlir
%padded = tensor.pad %arg0 low[0, 0] high[%0, 1] {
^bb0(%arg4: index, %arg5: index):
tensor.yield %arg2 : f32
} : tensor<5x1xf32> to tensor<?x2xf32>
%empty = tensor.empty(%arg3) : tensor<2x?xf32>
%transposed = linalg.transpose
ins(%padded : tensor<?x2xf32>)
outs(%empty : tensor<2x?xf32>) permutation = [1, 0]
%inserted_slice = tensor.insert_slice %transposed
into %arg1[0, 0, 0, 0] [1, 1, 2, %arg3] [1, 1, 1, 1] :
tensor<2x?xf32> into tensor<1x1x2x?xf32>
```
This PR also adds a check to verify that only the last N trailing
dimensions are tiled (for some value of N). Based on the PR
discussion, this restriction seems reasonable - especially as there
are no in-tree tests requiring otherwise. For now, it also simplifies
the computation of permutations for linalg.transpose. This
restriction can be relaxed in the future if needed.
All patterns in populateVectorNarrowTypeEmulationPatterns currently
assume a 1-D vector load/store rather than an n-D vector load/store.
This assumption is evident in, for example, the following snippet from
`ConvertVectorTransferRead`:
```cpp
auto newRead = rewriter.create<vector::TransferReadOp>(
loc, VectorType::get(numElements, newElementType), adaptor.getSource(),
getValueOrCreateConstantIndexOp(rewriter, loc, linearizedIndices),
newPadding);
auto bitCast = rewriter.create<vector::BitCastOp>(
loc, VectorType::get(numElements * scale, oldElementType), newRead);
```
Both invocations of `VectorType::get()` here generate a 1-D vector.
Attempts to use these patterns with more generic cases, such as 2-D
vectors, fail. For example, trying to cast the following 2-D case to
`i32`:
```mlir
func.func @vector_maskedload_2d_i8_negative(
%idx1: index,
%idx2: index,
%num_elems: index,
%passthru: vector<2x4xi8>) -> vector<2x4xi8> {
%0 = memref.alloc() : memref<3x4xi8>
%mask = vector.create_mask %num_elems, %num_elems : vector<2x4xi1>
%1 = vector.maskedload %0[%idx1, %idx2], %mask, %passthru :
memref<3x4xi8>, vector<2x4xi1>, vector<2x4xi8> into vector<2x4xi8>
return %1 : vector<2x4xi8>
}
```
This produces:
```bash
error: 'vector.bitcast' op failed to verify that all of {source, result} have same rank
%1 = vector.maskedload %0[%idx1, %idx2], %mask, %passthru :
^
```
Instead of reworking these patterns (that would require much more
effort), I've marked them as 1-D only and extended
"TestEmulateNarrowTypePass" with an option to disable the MemRef type
converter, so that negative tests can be added (otherwise, the type
converter throws an error we can't really test for). While not ideal,
this workaround should suit a test pass.
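A sketch of the kind of rank-1 guard being added, using a plain `OpRewritePattern` for brevity rather than the actual conversion pattern:

```cpp
#include "mlir/Dialect/Vector/IR/VectorOps.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Illustrative only: bail out of the emulation when the vector is not 1-D so
// that n-D cases fail to match instead of producing rank-mismatched bitcasts.
struct ConvertVectorTransferReadSketch
    : OpRewritePattern<vector::TransferReadOp> {
  using OpRewritePattern::OpRewritePattern;

  LogicalResult matchAndRewrite(vector::TransferReadOp op,
                                PatternRewriter &rewriter) const override {
    if (op.getVectorType().getRank() != 1)
      return rewriter.notifyMatchFailure(op, "only 1-D vectors are supported");
    // ... 1-D narrow-type emulation as before (elided in this sketch) ...
    return failure();
  }
};
```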
This commit adds an intrinsic that will read from an image buffer. We
chose to match the name of the DXIL intrinsic for simplicity in clang.
We cannot reuse the existing OpenCL readimage function because that is
not a reserved name in HLSL.
I considered trying to refactor generateReadImageInst, so that we could
share code between the two implementations. However, most of the code in
generateReadImageInst is concerned with trying to figure out which type
of image read is being done. Once the code that would be common is
factored out, we end up with just a single call to the MIRBuilder in
common.
- Re-enabled ulkbits and lkbits for RISC-V
- Bumped `int_lk_t` to `signed long long` and `uint_ulk_t` to
`unsigned long long` to guarantee they both fit in 8 bytes, which is what
`long _Accum` and `unsigned long _Accum` default to on 32-bit
architectures.
This is probably inconvenient on systems that have a word size larger
than 64 bits?
#115778
- as per the definition of `select` in GlobalISel/InstructionSelector.h,
the return value is a boolean denoting whether the select was successful
- doing `Result |=` is incorrect as all inserted instructions should be
successful, hence we change to using `Result &=` (see the sketch after
this list)
- ensure that the return value of all BuildMI instructions is
propagated correctly
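A minimal sketch of the aggregation fix, with hypothetical helper names standing in for the individual pieces of the selection:

```cpp
// Hypothetical helpers standing in for the individual instruction selections.
static bool selectFirstPiece() { return true; }
static bool selectSecondPiece() { return true; }

static bool selectCombined() {
  bool Result = true;
  // Every piece must succeed; a single failure must fail the whole selection,
  // so the results are AND-ed (Result &=) rather than OR-ed (Result |=).
  Result &= selectFirstPiece();
  Result &= selectSecondPiece();
  return Result;
}
```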
Heterogeneous lookups allow us to call find with a StringRef, avoiding a
temporary heap allocation of std::string.
This patch introduces an alias:
using DiagsInGroup = std::map<std::string, GroupInfo, std::less<>>;
because the raw type is a bit of a mouthful.
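For illustration, this is the kind of call the transparent comparator enables (a minimal sketch; `GroupInfo` is left empty here):

```cpp
#include <map>
#include <string>

#include "llvm/ADT/StringRef.h"

struct GroupInfo {};  // stand-in for the real struct

// std::less<> is a transparent comparator, so find() accepts any type that
// compares with std::string, e.g. llvm::StringRef.
using DiagsInGroup = std::map<std::string, GroupInfo, std::less<>>;

static bool hasGroup(const DiagsInGroup &Groups, llvm::StringRef Name) {
  // No temporary std::string is materialized for the lookup.
  return Groups.find(Name) != Groups.end();
}
```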
The VP variants simply return the same costs as the non-VP variants.
This assumes that reductions are VL predicated, and that VL predication
has no additional cost.
Currently all implicit-def instructions are treated as part of
the bb prolog. We should only include the wwm-register
implicit definitions in the BB prolog. Implicit defs of the
other vector-class registers, when they exist at the top of
the block, might cause interference when the LR_split copy
insertion is pushed downwards. The SplitKit is very strict
about altering insertion points and will assert on such
instances.
The buildX naming convention originated when the CIRGen implementation
was planned to be substantially different from original CodeGen. CIRGen
is now a much closer adaptation of CodeGen, and the emitX to buildX
renaming just makes things more confusing, since CodeGen also has some
helper functions whose names start with build or Build, so it's not
immediately clear which CodeGen function corresponds to a CIRGen buildX
function. Rename the buildX functions back to emitX to fix this.
This reapplies #115615 without using tuples. The eager call of
`getRegion()` and `getOffset()` could cause crashes when the Store had
symbolic bindings.
Here I'm fixing the crash by lazily calling those getters.
Also, the tuple version poorly sorted the Clusters. The memory spaces
should have come before the regular clusters.
Now, that is also fixed here, demonstrated by the test.
Do not consider profile data when choosing a successor block to sink
into for optsize functions. This should result in more consistent
instruction sequences which will improve outlining and ICF. We've
observed a slight codesize improvement in a large binary. This is
similar reasoning to https://github.com/llvm/llvm-project/pull/114607.
Using profile data to select a block to sink into was originally added in
d04f7596e7.
* Do not use profile data when flipping a branch condition when
optimizing for size. This should improve outlining and ICF due to more
uniform instruction sequences.
* Refactor `optimizeBranches()` to use early `continue`s
* Use the correct debug location for `insertBranch()`
This PR makes webkit.UncountedLambdaCapturesChecker ignore trivial
functions as well as lambdas passed as an argument to a parameter with
the [[clang::noescape]] attribute. This dramatically reduces the false
positive rate for this checker.
To do this, this PR drops VisitLambdaExpr in favor of checking
lambdas via VisitDeclRefExpr and VisitCallExpr. The idea is that if a
lambda is defined but never called or stored somewhere, then capturing
whatever variable in such a lambda is harmless.
VisitCallExpr explicitly looks for direct invocation of lambdas and
registers its DeclRefExpr to be ignored in VisitDeclRefExpr. If a lambda
is being passed to a function, it checks whether the corresponding
parameter is annotated with [[clang::noescape]]. If it is not so
annotated, the lambda's captures are checked for their safety.
Because WTF::switchOn cannot be annotated with [[clang::noescape]] (its
function-type parameters form a variadic template), we hard-code this
function into the checker.
In order to check whether the "this" pointer is of a ref-counted type, we
override TraverseDecl and record the most recent method's declaration.
In addition, this PR fixes a bug in isUnsafePtr, which was erroneously
checking whether std::nullopt was returned by isUncounted and
isUnchecked rather than the actual boolean value.
Finally, this PR also converts the accompanying test to use -verify and
adds a bunch of tests.
This change splits the llvm/test/CodeGen/RISCV/branch-relaxation.ll test
which contained comments saying that different test functions were only
valid on rv32 or rv64. Not only was this confusing, but the inline
assembly in the test was being passed values wider than xlen on rv32.
Combined constructs (OpenACC 3.3 section 2.11) are a short-cut for
writing a `loop` construct immediately inside of a `compute` construct.
However, this interaction requires we do additional work to ensure that
we get the semantics between the two correct, as well as diagnostics.
This patch adds the semantic analysis for the constructs (but no
clauses), as well as the AST nodes.
Similar to other targets (AMDGPU, Mips, PowerPC, RISCV, X86, ...)
`ninja check-clang-codegen-aarch64` can be used to test this subfolder.
Pull Request: https://github.com/llvm/llvm-project/pull/115818