This should prevent the flaky failures that have been plaguing the
buildbots since the test was introduced and allow for offline
investigation without disrupting CI.
Reviewers: topperc, mshockwave
Reviewed By: mshockwave
Pull Request: https://github.com/llvm/llvm-project/pull/170014
Remove errant `\a` command before `<directory>` in `SaveToDisk`
documentation. The `\a` Doxygen command expects a word argument, but
`<directory>` starts with `<` which Doxygen interprets as HTML. This
fixes:
```
llvm-project/lldb/include/lldb/API/SBTrace.h:60:
Warning 564: Error parsing Doxygen command a: No word followed the command. Command ignored.
```
The current description mistakenly specified that an address of a local
value in some address space is returned. When testing this with Wasm
runtimes that already implement this command, it can be observed that
the value itself is returned. The value itself may be an address for
languages that use a shadow stack in Wasm linear memory, but the value of
an arbitrary local does not always contain that address.
Fixes #168737, fixes #168755
This change adds support for Matrix truncations via the
ICK_HLSL_Matrix_Truncation enum. That ends up being most of the files
changed.
It also allows Matrix as an HLSL Elementwise cast as long as the cast
does not perform a shape transformation, e.g. 3x2 to 2x3.
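For illustration, a minimal C++ sketch of the shape rule, assuming (as with HLSL's implicit matrix truncation) that a truncation keeps the upper-left block of the source. `MatrixShape`, `MatrixConvKind` and `classifyMatrixConversion` are hypothetical names, not Clang's Sema API. Note that 3x2 to 2x3 is rejected even though both shapes have six elements.

```cpp
#include <cassert>

// Hypothetical shape descriptor: rows x columns.
struct MatrixShape {
  unsigned Rows;
  unsigned Cols;
};

// Hypothetical classification of a matrix-to-matrix conversion.
enum class MatrixConvKind {
  SameShape,  // elementwise conversion only, no shape change
  Truncation, // keeps the upper-left Dst.Rows x Dst.Cols block (assumed)
  Reshape     // any other shape change, e.g. 3x2 -> 2x3; rejected
};

MatrixConvKind classifyMatrixConversion(MatrixShape Src, MatrixShape Dst) {
  assert(Src.Rows && Src.Cols && Dst.Rows && Dst.Cols && "empty matrix");
  if (Src.Rows == Dst.Rows && Src.Cols == Dst.Cols)
    return MatrixConvKind::SameShape;
  if (Dst.Rows <= Src.Rows && Dst.Cols <= Src.Cols)
    return MatrixConvKind::Truncation;
  // Equal element counts are not enough: 3x2 -> 2x3 would reorder elements.
  return MatrixConvKind::Reshape;
}
```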
Tests for the new elementwise and truncation behavior were added, as
well as Sema tests to make sure we error on the shape transformation
cast.
I am punting right now on the ConstExpr Matrix support. That will need
to be addressed later. I will file a separate issue for that if reviewers
agree it can wait.
We were not marking the `.cfi.jumptable` functions as `naked` on Windows. The referenced bug (https://llvm.org/bugs/show_bug.cgi?id=28641#c3) appears to be fixed:
```bash
build/bin/opt -S -passes=lowertypetests -mtriple=i686-pc-win32 llvm/test/Transforms/LowerTypeTests/function.ll | build/bin/llc -O0
```
```
L_.cfi.jumptable: # @.cfi.jumptable
# %bb.0: # %entry
#APP
jmp _f.cfi@PLT
int3
int3
int3
#NO_APP
#APP
jmp _g.cfi@PLT
int3
int3
int3
#NO_APP
# -- End function
.section .rdata,"dr"
.p2align 4, 0x0 # @0
```
Not seeing the spilled registers described in the bug anymore.
This PR is part of #167752. It upstreams the codegen and tests for the
shuffle builtins implemented in the incubator, including:
- `vinsert` + `insert`
- `pblend` + `blend`
- `vpermilp`
- `pshuf` + `shufp`
- `palignr`
It does NOT upstream the `perm`, `vperm2`, `vpshuf`, `shuf_i` / `shuf_f`
and `align` builtins, which are not yet implemented in the incubator.
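As a reference for what the `pshuf` codegen has to reproduce, here is a standalone sketch (not the incubator's code; `pshufdMask` is a hypothetical helper) of how a `pshufd`-style 8-bit immediate maps to a shufflevector index mask: each 128-bit lane of four i32 elements is permuted independently, with two immediate bits per destination element.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Builds the shufflevector-style index mask for a pshufd-like shuffle of
// 32-bit elements. Every group of four elements (one 128-bit lane) is
// permuted with the same immediate: bits [2i+1:2i] pick the source element
// for destination element i within that lane.
std::vector<int> pshufdMask(unsigned NumElts, uint8_t Imm) {
  assert(NumElts % 4 == 0 && "expects whole 128-bit lanes of i32");
  std::vector<int> Mask(NumElts);
  for (unsigned I = 0; I != NumElts; ++I) {
    unsigned LaneBase = I & ~3u;               // first element of this lane
    unsigned Sel = (Imm >> ((I & 3) * 2)) & 3; // 2-bit selector
    Mask[I] = static_cast<int>(LaneBase + Sel);
  }
  return Mask;
}
```

For example, an immediate of `0x1B` reverses each lane, so eight elements give the mask `3, 2, 1, 0, 7, 6, 5, 4`.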
This _is_ a large commit, but most of it is tests.
The `pshufd` / `vpermilp` builtins seem to have no test coverage in the
incubator; what should I do?
Reverts llvm/llvm-project#154069. I pointed out a number of issues
post-merge, most importantly examples of miscompiles:
https://github.com/llvm/llvm-project/pull/154069#issuecomment-3603854626.
While the motivation of the change is clear, I think the implementation
approach is flawed. It seems like the goal is to allow elements like
`load <2xi16>` and `load i32` to be vectorized together despite the
current algorithm not grouping them into the same equivalence classes. I
personally think that if we want to attempt this it should be a more
holistic approach, maybe even redefining the concept of an equivalence
class. This current solution seems like it would be really hard to do
bug-free, and even if the bugs were not present, it is only able to
merge chains that happen to be adjacent to each other after
`splitChainByContiguity`, which seems like it is leaving things up to
chance whether this optimization kicks in. But we can discuss more in
the re-land. Maybe the broader approach I'm proposing is too difficult,
and a narrow optimization is worthwhile. Regardless, this should be
reverted; it needs more iteration before it is correct.
This PR is a follow up to #167975 and replaces calls to trivial copy
constructors with `cir::CopyOp`.
---------
Co-authored-by: Andy Kaylor <akaylor@nvidia.com>
Co-authored-by: Henrich Lauko <henrich.lau@gmail.com>
Adding some new test cases (including FIXME:s) to highlight some bugs
related to lowering of llvm.objectsize.
One special case is when there are getelementptr instructions with index
types that are larger than the index type size for the pointer being
analysed. This will add a couple of tests to show what happens both when
using a smaller and larger index type, and when having out-of-bounds
indices (both too large and negative).
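A worked sketch of the index-width arithmetic these tests exercise, assuming the LangRef semantics that each GEP index is sign-extended or truncated to the pointer's index width and that the scaled offset wraps at that width. `gepOffset` and `signExtend` are hypothetical helpers, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extends the low `Bits` bits of V to a 64-bit signed value.
int64_t signExtend(uint64_t V, unsigned Bits) {
  assert(Bits > 0 && Bits <= 64 && "invalid width");
  if (Bits == 64)
    return static_cast<int64_t>(V);
  uint64_t Low = V & ((uint64_t{1} << Bits) - 1); // truncate to Bits
  uint64_t SignBit = uint64_t{1} << (Bits - 1);
  // Flip-then-subtract the sign bit: a branch-free sign extension.
  return static_cast<int64_t>((Low ^ SignBit) - SignBit);
}

// Byte offset contributed by one GEP index of element size EltSize, assuming
// the index is first truncated/sign-extended to IndexWidth bits and the
// scaled offset wraps at that width.
int64_t gepOffset(int64_t Index, uint64_t EltSize, unsigned IndexWidth) {
  int64_t Idx = signExtend(static_cast<uint64_t>(Index), IndexWidth);
  uint64_t Scaled = static_cast<uint64_t>(Idx) * EltSize; // wraps mod 2^64
  return signExtend(Scaled, IndexWidth);
}
```

With a 32-bit index width, an i64 index of `0x100000004` contributes an offset of 4, and `0x80000000` turns negative, which is why both the wider index type and the out-of-bounds cases are worth covering.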
Use standard GlobalISel error reporting with reportGISelFailure, and
have the pass return false instead of calling llvm_unreachable.
Also enables -global-isel-abort=0 or 2 for -global-isel -new-reg-bank-select.
Note: new-reg-bank-select with abort 0 or 2 runs LCSSA,
while "intended use" without abort or with abort 1 does not run LCSSA.
* Add compatibility support for DP and REPORT macros
* Define a set of predefined debug types for libomptarget
* Start to update libomptarget files (OffloadRTL.cpp, device.cpp)
This change fixes a couple of issues with static resources:
- Enables assignment to static resource or resource array variables (fixes #166458)
- Initializes static resources and resource arrays with a default constructor that sets the handle to poison
Updates `InitializeRequestArguments` to correctly follow the spec, see
https://microsoft.github.io/debug-adapter-protocol/specification#Requests_Initialize.
This should correct which fields are tracked as optional and simplify
some of the types to make sure they're meaningful (e.g. an
`optional<bool>` isn't any more helpful than a `bool`, since undefined
and false are basically equivalent, and it requires us to handle
interpreting undefined as the default value in all the places we use
the `optional<bool>`).
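A minimal sketch of resolving absent fields to their defaults once, at parse time, so later code deals only with plain `bool`s. A `std::map` stands in for the JSON object and `InitializeArgs` covers only two fields; both are hypothetical simplifications. Most capability flags default to false when omitted, but a few DAP fields, such as `linesStartAt1`, default to true, which is another reason to apply the default in one place.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for a parsed JSON object: only fields present in the request
// appear in the map.
using JsonBools = std::map<std::string, bool>;

// Hypothetical subset of the DAP "initialize" arguments, with defaults
// applied once so the rest of the code never sees "undefined".
struct InitializeArgs {
  bool linesStartAt1 = true;         // DAP default when omitted: true
  bool supportsVariableType = false; // capability flag: omitted means false
};

// Returns the field's value if present, otherwise the supplied default.
bool getBool(const JsonBools &Obj, const std::string &Key, bool Default) {
  auto It = Obj.find(Key);
  return It == Obj.end() ? Default : It->second;
}

InitializeArgs parseInitializeArgs(const JsonBools &Obj) {
  InitializeArgs Args; // starts out holding the defaults
  Args.linesStartAt1 = getBool(Obj, "linesStartAt1", Args.linesStartAt1);
  Args.supportsVariableType =
      getBool(Obj, "supportsVariableType", Args.supportsVariableType);
  return Args;
}
```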
This clause is pretty small/trivial and is a simple 'set a bool' value
on the IR node, so its implementation is quite simple. We create the
Operation with this as 'false', so the presence of 'nohost' always sets it to true.
The VPlan-based cost model assigns the forced cost once for a whole
VPInterleaveRecipe. Update the legacy cost model to match this behavior.
This fixes a cost-model divergence, and assigns the cost in a way that
matches the generated code more accurately.
PR: https://github.com/llvm/llvm-project/pull/168270
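To make the divergence concrete, a toy C++ sketch contrasting two accounting schemes. It assumes, for illustration only, that the legacy model charged the forced cost to every member of the interleave group; the description above only says it did not match the VPlan-based model. `InterleaveGroup` and both functions are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Toy interleave group: NumMembers accesses lowered together as one
// wide access plus shuffles.
struct InterleaveGroup {
  unsigned NumMembers;
};

// Assumed legacy-style accounting: the forced cost is charged once per
// member instruction of the group.
uint64_t forcedCostPerMember(const InterleaveGroup &G, uint64_t ForcedCost) {
  return static_cast<uint64_t>(G.NumMembers) * ForcedCost;
}

// Accounting that matches a single VPInterleaveRecipe: the whole group is one
// recipe, so the forced cost is charged exactly once.
uint64_t forcedCostPerRecipe(const InterleaveGroup &G, uint64_t ForcedCost) {
  assert(G.NumMembers > 0 && "an interleave group has at least one member");
  return ForcedCost;
}
```

For a three-member group with a forced cost of 2, the per-member scheme yields 6 while the per-recipe scheme yields 2.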
Commit b262785 introduced a separate `AnalysisFpExc` target to try to
work around the lack of a Bazel equivalent of single source file
properties. However, this introduces backref errors when
`--warn-backrefs` is enabled.
Instead, this change just adds the `-ftrapping-math` copt to the
entire `Analysis` target.
Fix suggested by @rocallahan.
This patch extends the OpenACC PointerLikeType interface with two new
methods for generating load and store operations, enabling
dialect-agnostic memory access patterns.
New Interface Methods:
- genLoad(builder, loc, srcPtr, valueType): Generates a load operation
from a pointer-like value. Returns the loaded value.
- genStore(builder, loc, valueToStore, destPtr): Generates a store
operation to a pointer-like value.
Implementations provided for FIR pointer-like types, memref type (rank-0
only), and LLVM pointer types.
Extended TestPointerLikeTypeInterface.cpp with 'load' and 'store' test
modes.
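A toy C++ model of the idea, not MLIR: `Value`, `Builder` and the textual ops are hypothetical stand-ins, and the `loc`/`valueType` parameters are omitted. It shows how a client such as the hypothetical `genCopy` can move data through any pointer-like type without knowing which dialect provides it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for an SSA value and an op builder (not MLIR classes).
struct Value {
  std::string Name;
};

struct Builder {
  std::vector<std::string> Ops; // textual log of the created operations
  int NextId = 0;

  Value createWithResult(const std::string &Op) {
    Value Result{"%" + std::to_string(NextId++)};
    Ops.push_back(Result.Name + " = " + Op);
    return Result;
  }
  void create(const std::string &Op) { Ops.push_back(Op); }
};

// Hypothetical analogue of the interface: each pointer-like type knows
// which operations implement a load from / store to it.
struct PointerLikeType {
  virtual ~PointerLikeType() = default;
  virtual Value genLoad(Builder &B, const Value &SrcPtr) const = 0;
  virtual void genStore(Builder &B, const Value &ValueToStore,
                        const Value &DestPtr) const = 0;
};

struct LLVMPointerLike final : PointerLikeType {
  Value genLoad(Builder &B, const Value &SrcPtr) const override {
    return B.createWithResult("llvm.load " + SrcPtr.Name);
  }
  void genStore(Builder &B, const Value &V, const Value &Dst) const override {
    B.create("llvm.store " + V.Name + ", " + Dst.Name);
  }
};

// Rank-0 memref: accessed with an empty index list.
struct MemRefRank0PointerLike final : PointerLikeType {
  Value genLoad(Builder &B, const Value &SrcPtr) const override {
    return B.createWithResult("memref.load " + SrcPtr.Name + "[]");
  }
  void genStore(Builder &B, const Value &V, const Value &Dst) const override {
    B.create("memref.store " + V.Name + ", " + Dst.Name + "[]");
  }
};

// Dialect-agnostic client: copies *Src to *Dst through the interface only.
void genCopy(Builder &B, const PointerLikeType &Ty, const Value &Src,
             const Value &Dst) {
  Value Loaded = Ty.genLoad(B, Src);
  Ty.genStore(B, Loaded, Dst);
}
```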
The 'routine' construct just adds an acc.routine element to the global
module, which contains all of the information about the directive. The
acc.routine contains a reference to the function, and the function in
turn contains a reference to the acc.routine that this generates.
This handles both the implicit-func version (where the routine is
spelled without parens, and just applies to the next function) and
the explicit-func version (where the routine is spelled with the func
name in parens).
The AST stores the directive in an OpenACCRoutineDeclAttr in the
implicit case, so we can emit that when we hit the function declaration.
The explicit case is held in an OpenACCRoutineAnnotAttr on the function;
however, when we emit the function we haven't necessarily seen the
construct yet, so we can't depend on that attribute. Instead, we save up
the list in Sema so that we can emit them all at the end.
This results in the tests getting really hard to read (because ordering
is a little awkward based on spelling, with no way to fix it), so we
instead split the tests up based on topic.
One last thing: Flang spends some time determining if the clause lists
of two routines on the same function are identical, and omits the
duplicates. However, it seems to do a poor job on this when the ordering
isn't the same, or references are slightly different. This patch doesn't
bother trying that, and instead emits all, trusting the ACC dialect to
remove duplicates/handle duplicates gracefully.
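The deferral and the no-deduplication choice can be sketched roughly as follows. This is a toy model, not Clang's Sema or CIRGen code: `PendingRoutine`, `RoutineCollector` and the emitted strings are hypothetical, and the strings are not the ACC dialect's real syntax.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of an explicit-func directive such as
//   #pragma acc routine(foo) seq
struct PendingRoutine {
  std::string FuncName;
  std::string Clauses;
};

// Toy stand-in for the state kept in Sema: explicit routines are queued
// because the named function may already have been emitted when the
// directive is seen.
class RoutineCollector {
  std::vector<PendingRoutine> Pending;

public:
  void onExplicitRoutine(PendingRoutine R) { Pending.push_back(std::move(R)); }

  // Called once at the end of the translation unit. Every queued routine is
  // emitted, duplicates included; deduplication is left to a later consumer.
  std::vector<std::string> emitAllAtEnd() {
    std::vector<std::string> Out;
    for (const PendingRoutine &R : Pending)
      Out.push_back("routine @" + R.FuncName + " " + R.Clauses);
    Pending.clear();
    return Out;
  }
};
```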
Note: this doesn't cause emission of functions that would otherwise not
be emitted, but DOES emit routine references based on which function
they are attached to.
This commit modifies how the DWARF expression evaluator handles the
deref operation for register and implicit locations on the stack. For a
typical memory location a deref operation will read the value from
memory. For register and implicit locations the deref operation will
read the value from the register or its implicit location. In lldb we
eagerly read register and implicit values and push them on the stack so
the deref operation for these becomes a "no-op" that leaves the value on
the stack and updates the tracked location kind.
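A toy C++ model of the eager-read behavior (hypothetical types, not lldb's `DWARFExpression` implementation): register and implicit entries already hold their contents, so `DW_OP_deref` only has to read memory for ordinary entries and otherwise just retags the entry.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical location kinds tracked alongside each stack entry.
enum class LocKind {
  Generic,  // ordinary value; DW_OP_deref treats it as an address
  Register, // register contents, read eagerly when pushed
  Implicit  // implicit value, also materialized eagerly
};

struct StackEntry {
  uint64_t Data;
  LocKind Kind;
};

using ToyMemory = std::map<uint64_t, uint64_t>; // address -> word

// Toy DW_OP_deref on the top stack entry.
void opDeref(std::vector<StackEntry> &Stack, const ToyMemory &Mem) {
  assert(!Stack.empty() && "DW_OP_deref needs an operand");
  StackEntry &Top = Stack.back();
  if (Top.Kind == LocKind::Generic)
    Top.Data = Mem.at(Top.Data); // memory: read through the address
  // Register/implicit: contents are already on the stack, so Data is left
  // as-is. Either way, the result is an ordinary value.
  Top.Kind = LocKind::Generic;
}
```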
The motivation for this change is to handle `DW_OP_deref*` operations on
location descriptions as described by the heterogeneous debugging
[extensions](https://rocm.docs.amd.com/projects/llvm-project/en/latest/LLVM/llvm/html/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html#a-2-5-4-4-4-register-location-description-operations).
Specifically, for register locations it states
> These operations obtain a register location. To fetch the contents of
> a register, it is necessary to use DW_OP_regval_type, use one of the
> DW_OP_breg* register-based addressing operations, or use DW_OP_deref*
> on a register location description.
My understanding is that this is the intended behavior in DWARF 5 as
well and is not a change in behavior.
The two patterns for handling vector.maskedload on AMD GPUs overlapped:
both the "scalar mask becomes an if statement" pattern and the "masked
loads become a normal load + a select on buffers" pattern could handle
a load with a broadcast mask on a fat buffer resource.
This commit adds checks to resolve the overlap.
In the MFMA rewrite pass, prevent assigning the AGPR_32 register class to
scale operands, which the instruction format does not permit.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
This change makes StackFrame methods virtual to enable subclass
overrides and introduces BorrowedStackFrame, a wrapper that presents an
existing StackFrame with a different frame index.
This enables creating synthetic frame views or renumbering frames
without copying the underlying frame data, which is useful for frame
manipulation scenarios.
This also adds a new borrowed-info format entity that shows the
original frame index of the borrowed frame.
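A simplified C++ sketch of the wrapper idea. `StackFrame` here is a minimal stand-in rather than lldb's class, and `GetBorrowedFrameIndex` is a hypothetical accessor of the kind a borrowed-info format entity could use.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Minimal stand-in for a stack frame with virtual accessors (not lldb's
// actual StackFrame interface).
class StackFrame {
public:
  StackFrame(uint32_t Index, std::string Function)
      : Index(Index), Function(std::move(Function)) {}
  virtual ~StackFrame() = default;

  virtual uint32_t GetFrameIndex() const { return Index; }
  virtual const std::string &GetFunctionName() const { return Function; }

private:
  uint32_t Index;
  std::string Function;
};

// Presents an existing frame under a new index without copying its data.
// The base subobject only records the new index; its name stays empty and
// every other accessor forwards to the borrowed frame.
class BorrowedStackFrame : public StackFrame {
public:
  BorrowedStackFrame(std::shared_ptr<StackFrame> Frame, uint32_t NewIndex)
      : StackFrame(NewIndex, std::string()), Borrowed(std::move(Frame)) {
    assert(Borrowed && "must wrap an existing frame");
  }

  const std::string &GetFunctionName() const override {
    return Borrowed->GetFunctionName();
  }

  // Hypothetical accessor for the original index, e.g. for display.
  uint32_t GetBorrowedFrameIndex() const { return Borrowed->GetFrameIndex(); }

private:
  std::shared_ptr<StackFrame> Borrowed;
};
```

Because the accessors are virtual, code holding a `StackFrame &` sees the new index and the original frame's data without knowing it is talking to a wrapper.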
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
I had a case where the frontend was generating a zero-element array in
non-shader code, so it was just crashing in a release build.
Add a real error and make it not crash.
---------
Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
In Debug builds, the names of adjusted pointers have a pointer-specific
name prefix which doesn't exist in non-debug builds.
This causes differences in the output of SROA between a Debug and a
Release compiler.
For most of our ongoing testing, we essentially use a Release+Asserts
build (basically Release but without NDEBUG defined); however, we ship a
Release compiler. Therefore, we want to be able to say with reasonable
confidence that building a large project with a Release vs. a
Release+Asserts build gives us the same output when the same compiler
version is used.
This difference, however, makes it difficult to prove that the output is
the same when using LTO builds and looking at bitcode, even if the only
difference is the names.
Hence this change is being proposed.