This commit fixes the problem of missing build dependencies between
libclc source files and their various includes (namely headers and .inc
files).
We would like to do this with compiler-generated dependency files
because then the dependencies are accurate and there are no false
positives, leading to unnecessary rebuilds. This is how regular C/C++
dependencies are usually tracked by CMake.
Note that this variable is an internal API so is not guaranteed to work,
but then again *all* of CMake's support for new languages (which we use
for CLC/LL languages) is an internal API. On balance this change is
probably worth it due to how minimally invasive it is. It should work
with all supported compilers and CMake generators.
The `ASTWriter` algorithm for computing affecting module maps uses
`SourceManager::translateFile()` to get a `FileID` from a `FileEntry`.
This is slow (O(n)) since the function performs a linear walk over
`SLocEntries` until it finds one with a matching `FileEntry`.
This patch removes this use of `SourceManager::translateFile()` by
tracking `FileID` instead of `FileEntry` in couple of places in
`ModuleMap`, giving `ASTWriter` the desired `FileID` directly. There are
no changes required for clients that still want a `FileEntry` from
`ModuleMap`: the existing APIs internally use `SourceManager` to perform
the reverse `FileID` to `FileEntry` conversion in O(1).
We could easily support the X86ISD 'float' variants of the logic ops as well, but we don't have good test coverage at the moment (they're mainly for SSE1 targets).
This is an NFC prep patch for an upcoming change which is going to
support or_is_add in a bunch more cases. Posting separately because
tblgen is not particularly my strong suit.
Previously, the clang compiler with the dxc driver would accept the
-enable-16bit-types flag without checking to see if the required
conditions are met for proper processing of the flag.
Specifically, -enable-16bit-types requires a shader model of at least
6.2 and an HLSL version of at least 2021.
This PR adds a validation check for these other options having the
required values, and emits an error if these constraints are not met.
Fixes#57876
---------
Co-authored-by: Damyan Pepper <damyanp@microsoft.com>
Co-authored-by: Chris B <cbieneman@microsoft.com>
Split them according to their implementation.
Ideally, header files should be used by only one target, but this is
hard because CMake is less strict with headers (no layering check). But
even with bazel, headers should only be exported once in the `hdrs`
attribute. Other targets may use them in the `srcs` attribute to avoid
circular dependencies.
We have more complete logic for handling `Add`, so try to use that
logic for `or disjoint` (which can definitionally be treated as
`add`).
Closes#86058
This adds test coverage for implementation limits that were defined in
WG14 N590. The original content of that paper is not available, so this
actually tests against the limits as of C23.
These non-finite math ops are supported by SPIR-V but not by WGSL.
Assume finite floating point values and expand these ops into `false`.
Previously, this worked by adding fast math flags during conversion from
arith to spirv, but this got removed in
https://github.com/llvm/llvm-project/pull/86578.
Also do some misc cleanups in the surrounding code.
The original implementation was eagerly reporting silenceable failures
from actions as definite failures. Since silenceable failures are
intended for cases when the IR has not been irreversibly modified, it's
okay to propagate them as silenceable failures of the parent op.
Fixes#86834.
…d viceversa
-- Adds folding of producer linalg transpose op with consumer unpack op,
also adds folding of producer unpack op and consumer transpose op.
-- Minor bug fixes w.r.t. to the test cases.
Update instruction locations in the __bug_table section after new code
is emitted. If an instruction with associated bug ID was deleted,
overwrite its location with zero.
On SystemZ, the outgoing argument area which is big enough for all calls
in the function is created once during the prolog, as opposed to
adjusting the stack around each call. The call-sequence instructions are
therefore not really useful any more than to compute the maximum call
frame size, which has so far been done by PEI, but can just as well be
done at an earlier point.
This patch removes the mapping of the CallFrameSetupOpcode and
CallFrameDestroyOpcode and instead computes the MaxCallFrameSize
directly after instruction selection and then removes the ADJCALLSTACK
pseudos. This removes the confusing pseudos and also avoids the problem
of having to keep the call frame size accurate when creating new MBBs.
This fixes#76618 which exposed the need to maintain the call frame size
when splitting blocks (which was not done).
Critically, we don't want to return an iterator to the end of the underlying
cpp::array "store." Add a test to catch this issue.
This will be used by __cxa_finalize to iterate backwards through a FixedVector.
Link: #85651
This patch fixes:
mlir/lib/Dialect/Tensor/Transforms/MergeConsecutiveInsertExtractSlicePatterns.cpp:158:17:
error: 'matchAndRewrite' overrides a member function but is not
marked 'override' [-Werror,-Wsuggest-override]
This is a new ld64 flag (along with `-warn_duplicate_libraries`), where
the warning is enabled by default, and it can be useful to ignore since
it can be hard to dedup library flags across large builds. This doesn't
ignore the enabling version since if someone manually passed that and
lld didn't respect it, we probably want the user to know that.
HLSL has wave operations and other kind of function which required the
control flow to either be converged, or respect certain constraints as
where and how to re-converge.
At the HLSL level, the convergence are mostly obvious: the control flow
is expected to re-converge at the end of a scope.
Once translated to IR, HLSL scopes disapear. This means we need a way to
communicate convergence restrictions down to the backend.
For this, the SPIR-V backend uses convergence intrinsics. So this commit
adds some code to generate convergence intrinsics when required.
---------
Signed-off-by: Nathan Gauër <brioche@google.com>
`TargetEndianness` is long and unwieldy. "Target" in the name is confusing. Rename it to "Endianness".
I cannot find noticeable out-of-tree users of `TargetEndianness`, but
keep `TargetEndianness` to make this patch safer. `TargetEndianness`
will be removed by a subsequent change.
Fold the `tensor.insert_slice` of `tensor.extract_slice` into
`tensor_extract_slice` when the `insert_slice` simply expand some unit
dims dropped by the `extract_slice`.
The folding of sign/zero extensions into an atomic load by specifying an
extension type is not target specific, and therefore belongs in the
DAGCombiner rather than in the SystemZ backend.
- Handle atomic loads similarly to regular loads by adding
AtomicLoadExtActions with set/get methods.
- Move SystemZ extendAtomicLoad() to DagCombiner.cpp.
The libclc project is currently only properly supported as an external
project. However, when trying to get it to also build in-tree, the CMake
configuration messages it outputs stand out amongst the rest of the LLVM
projects and sub-projects.
This commit makes all messages clear that they belong to the libclc
project, as well as turning them into 'STATUS' messages where
appropriate.
The constructor `Derived(int)` in the newly added test
`ClassDerivedFromOptionalValueConstructor` is not a template, and this
used to
cause an assertion failure in `valueOrConversionHasValue()` because
`F.getTemplateSpecializationArgs()` returns null.
(This is modeled after the `MaybeAlign(Align Value)` constructor, which
similarly causes an assertion failure in the analysis when assigning an
`Align`
to a `MaybeAlign`.)
To fix this, we can simply look at the type of the destination type
which we're
constructing or assigning to (instead of the function template
argument), and
this not only fixes this specific case but actually simplifies the
implementation.
I've added some additional tests for the case of assigning to a nested
optional
because we didn't have coverage for these and I wanted to make sure I
didn't
break anything.
Adds support for scalable vectors to patterns defined in
VectorLineralize.cpp.
Linearization is disable in 2 notable cases:
* vectors with more than 1 scalable dimension (we cannot represent
vscale^2),
* vectors initialised with arith.constant that's not a vector splat
(such arith.constant Ops cannot be flattened).
This is a follow-up for #81790. This patch basically extends:
* test/Integration/Dialect/Linalg/CPU/mmt4d.mlir
with pack/unpack ops so that to overall computation is a matrix
multiplication (as opposed to linalg.mmt4d). For comparison (and to make
it easier to verify correctness), linalg.matmul is also included in the
test.
Adds improved bitwidth analysis for udiv/urem instructions. The
analysis is based on similar version in InstCombiner.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/85928
To authenticate pointers, CodeGen needs access to the key and
discriminators that were used to sign the pointer. That information is
sometimes known from the context, but not always, which is why `Address`
needs to hold that information.
This patch adds methods and data members to `Address`, which will be
needed in subsequent patches to authenticate signed pointers, and uses
the newly added methods throughout CodeGen. Although this patch isn't
strictly NFC as it causes CodeGen to use different code paths in some
cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any
changes in functionality as it doesn't add any information needed for
authentication.
In addition to the changes mentioned above, this patch introduces class
`RawAddress`, which contains a pointer that we know is unsigned, and
adds several new functions for creating `Address` and `LValue` objects.
This reapplies d9a685a9dd, which was
reverted because it broke ubsan bots. There seems to be a bug in
coroutine code-gen, which is causing EmitTypeCheck to use the wrong
alignment. For now, pass alignment zero to EmitTypeCheck so that it can
compute the correct alignment based on the passed type (see function
EmitCXXMemberOrOperatorMemberCallExpr).
Similar to 3f46e5453d, this patch allows
the backend to produce a faster access sequence for the local-exec TLS
model, where loading from the TOC can be avoided, for local-exec TLS
variables that are annotated with the "aix-small-tls" attribute.
The expectation is for local-exec TLS variables to be set with this
attribute through PGO. Furthermore, the optimized access sequence is
only generated for local-exec TLS variables annotated with
"aix-small-tls", only if they are less than ~32KB in size.