This patch updates the mbarrier.arrive.expect_tx Op.
It also adds an Op for its arrive_drop version.
* No change in the existing inline-asm lowering.
This functionality continues to work as is.
* An optional return value is added for shared_cta space.
* The scope and semantics are added as attributes.
* Inline-PTX lowering is available when `predicate` is provided.
Otherwise, the Op lowers to intrinsics.
* lit tests are added to verify the lowering to intrinsics.
* Specific negative tests are added to check the invalid cases for
  inline-PTX lowering.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
Introduce VirtualDataExtractor, a DataExtractor subclass that enables
reading data at virtual addresses by translating them to physical buffer
offsets using a lookup table. The lookup table maps virtual address
ranges to physical offsets and enforces boundaries to prevent reads from
crossing entry limits.
The new class inherits from DataExtractor, overriding GetData and
PeekData to provide transparent virtual address translation for most of
the DataExtractor methods. The exceptions are the unchecked methods,
which bypass GetData and PeekData and are therefore overloaded as well.
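A minimal sketch of the range-translation idea, purely illustrative (the
class name and signatures below are not the actual LLDB API):

```cpp
#include <cstdint>
#include <map>

// Illustrative lookup table: maps virtual address ranges to offsets in a
// flat physical buffer, refusing reads that would cross an entry boundary.
class VirtualLookupTable {
  struct Entry { uint64_t Size; uint64_t PhysOffset; };
  std::map<uint64_t, Entry> Ranges; // keyed by virtual base address

public:
  void AddRange(uint64_t VirtBase, uint64_t Size, uint64_t PhysOffset) {
    Ranges[VirtBase] = {Size, PhysOffset};
  }

  // Returns true and sets PhysOffset iff [VirtAddr, VirtAddr + Length)
  // lies entirely within one mapped range.
  bool Translate(uint64_t VirtAddr, uint64_t Length,
                 uint64_t &PhysOffset) const {
    auto It = Ranges.upper_bound(VirtAddr);
    if (It == Ranges.begin())
      return false;
    --It; // the mapped range starting at or below VirtAddr
    uint64_t Delta = VirtAddr - It->first;
    if (Delta + Length > It->second.Size)
      return false; // read would cross the entry limit
    PhysOffset = It->second.PhysOffset + Delta;
    return true;
  }
};
```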
cc_binary now needs to be loaded from the rules_cc repo
I don't think this file is actually used, but I'm updating it to be more
syntactically correct anyway.
This removes global state from within the pass; that state is blocking
some efforts around test daemonization and is not good design practice
for LLVM in general. See
https://discourse.llvm.org/t/rfc-reducing-process-creation-overhead-in-llvm-regression-tests/88612/11
for more discussion.
This patch replaces the usage of global state with a DebugCounter, which
fixes the global-state problem and also increases the flexibility of the
option, since an explicit range can now be passed.
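For reference, the usual DebugCounter pattern looks roughly like this
(the counter name and description are illustrative, not the ones in the
patch):

```cpp
#include "llvm/Support/DebugCounter.h"

using namespace llvm;

// Illustrative counter; the real patch registers its own name/description.
DEBUG_COUNTER(MyPassCounter, "my-pass-transform",
              "Controls which transformations are performed in MyPass");

void maybeTransform() {
  // With e.g. -debug-counter=my-pass-transform=2-5, only executions 2..5
  // perform the transformation; no pass-local global state is required.
  if (!DebugCounter::shouldExecute(MyPassCounter))
    return;
  // ... perform the transformation ...
}
```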
Co-authored-by: Mingming Liu <mingmingl@google.com>
There's an `arrayType` matcher for matching `ArrayType`, but no matcher
for `ArrayTypeLoc`. This change complements it.
Note that there's already `hasElementTypeLoc` matcher, which was
declared together with the `hasElementType` matcher.
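A hypothetical usage, assuming the new matcher follows the existing naming
convention (`arrayTypeLoc`):

```cpp
// Sketch only; uses clang's ASTMatchers.
#include "clang/ASTMatchers/ASTMatchers.h"

using namespace clang::ast_matchers;

// Matches the TypeLoc of declarations like 'int a[10];', checking the
// element TypeLoc via the pre-existing hasElementTypeLoc matcher.
auto ArrayVar =
    varDecl(hasTypeLoc(arrayTypeLoc(hasElementTypeLoc(loc(asString("int"))))));
```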
The primary motivation for this is to set a fixed RNG seed for flaky
tests. This also has the bonus of adding debug logging for the seed that
gets used, which can make it much easier to reproduce issues that only
happen occasionally and are seed-dependent.
Reviewers: sjoerdmeijer, davemgreen, mshockwave
Reviewed By: davemgreen
Pull Request: https://github.com/llvm/llvm-project/pull/170013
When creating a stack frame in JSONUtils.cpp's CreateStackFrame(), the
code constructs a std::string from module.GetUUIDString(), which can
return nullptr in some cases (as documented in the implementation of
SBModule::GetUUIDString()). This causes a segmentation fault when the
nullptr is passed to the std::string constructor.
This fix adds a null check before constructing the UUID string, falling
back to an empty string if nullptr is returned. The existing empty check
ensures the moduleId field is omitted from the JSON when no UUID exists.
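A sketch of the guard, with names abbreviated from the actual code:

```cpp
#include <string>

// Sketch, assuming a module object whose GetUUIDString() may return
// nullptr (as SBModule::GetUUIDString() documents).
template <typename ModuleT>
std::string GetModuleId(ModuleT &module) {
  const char *uuid = module.GetUUIDString(); // may be nullptr
  return std::string(uuid ? uuid : "");      // fall back to empty string
}
```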
rdar://163811812
---------
Co-authored-by: Ebuka Ezike <yerimyah1@gmail.com>
When LOps.RootInsert comes after LI2, since we use LI2 as the new insert
point, we should make sure the memory region accessed by LOps isn't
modified. However, the original implementation passes the bit width
`LOps.LoadSize` as the number of bytes to be accessed, causing BasicAA
to return NoAlias:
a941e15074/llvm/lib/Analysis/BasicAliasAnalysis.cpp (L1658-L1667)
With `-aa-trace`, we get:
```
End ptr getelementptr inbounds nuw (i8, ptr @g, i64 4) @ LocationSize::precise(1), %gep1 = getelementptr i8, ptr %p, i64 4 @ LocationSize::precise(32) = NoAlias
```
This patch uses `getTypeStoreSize` to compute the correct access size
for LOps. Rather than modifying the MemoryLocation for End (i.e.,
`LOps.RootInsert`), it also uses the computed base and AATags for
correctness.
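The corrected location is built along these lines (a sketch; variable and
helper names are illustrative, not the exact code):

```cpp
#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/IR/DataLayout.h"

using namespace llvm;

// LOps.LoadSize is a width in *bits*, but MemoryLocation expects *bytes*,
// so query the DataLayout for the store size of the merged load's type.
MemoryLocation makeLOpsLocation(const Value *Base, Type *LoadTy,
                                const AAMDNodes &AATags,
                                const DataLayout &DL) {
  return MemoryLocation(
      Base, LocationSize::precise(DL.getTypeStoreSize(LoadTy)), AATags);
}
```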
Closes https://github.com/llvm/llvm-project/issues/169921.
This patch adds print and parse support for the func op so that
MLIR-standard 'attributes' are printed alongside the standard function.
This patch also seeds the initial "disallowed" list so that we don't
print things that we have custom printing for, AND will disallow them
from being parsed. I believe this list to be complete, and it passes all
tests.
This printing of attributes is necessary for testing some OpenACC things
for which extending the normal func printing seems unnecessary.
In `gcdMIVtest`, there is logic that assumes, without any checks, that
the addition(s) of `SCEVAddExpr` don't overflow. Adding overflow checks
would be fine, but this part appears to be of little use, so this patch
removes it instead.
Fix one of the tests added in #169926.
This PR re-lands #165873.
This PR extends the gpu.subgroup_mma_* ops to support fp64 type.
The extension requires special handling during the lowering to nvvm due
to the return type of the load ops for fragments a and b (they return a
scalar instead of a struct).
The original PR did not guard the new test on the required architecture
(sm80), which led to a failure on the CUDA runners with T4 GPUs.
This PR implements catch-all handling for widening the scalable
subvector operand (INSERT_SUBVECTOR) or result (EXTRACT_SUBVECTOR). It
does this via the stack using masked memory operations. With general
handling available, we can add optimizations for specific cases.
Follow-up for #169047. The previous PR moved some functions from DA to
Delinearization, but the member function declarations were not updated
accordingly. This patch removes them.
From the review in
https://github.com/llvm/llvm-project/pull/169527#discussion_r2567122387,
there are some users that want to extend or truncate a ConstantRange
only if it isn't already at the destination bitwidth. Previously this
asserted, so this PR relaxes it to just be a no-op, similar to
IRBuilder::CreateZExt and friends.
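With the relaxation, callers no longer need to special-case equal
bitwidths (sketch):

```cpp
#include "llvm/IR/ConstantRange.h"

using namespace llvm;

ConstantRange adjustWidth(const ConstantRange &CR, unsigned DestBits) {
  // Previously this asserted when CR was already DestBits wide; it is now
  // simply a no-op in that case, mirroring IRBuilder::CreateZExt.
  return CR.zextOrTrunc(DestBits);
}
```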
Python multiprocessing is limited to 60 workers at most:
6bc65c30ff/Lib/concurrent/futures/process.py (L669-L672)
Since the limit is per pool, we can work around it on Windows by using
multiple pools when we actually want to use more workers.
From OpenMP 4.0:
> When an if clause is present on a cancel construct and the if expression
> evaluates to false, the cancel construct does not activate cancellation.
> The cancellation point associated with the cancel construct is always
> encountered regardless of the value of the if expression.
This wording is retained unmodified in OpenMP 6.0.
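Concretely (illustrative code; `cond` stands in for the if expression):

```cpp
void worker(bool cond) {
#pragma omp parallel
  {
    // When cond is false, no cancellation is activated here, but the
    // cancellation point associated with this cancel construct is still
    // encountered regardless of the value of the if expression.
#pragma omp cancel parallel if (cond)
  }
}
```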
This re-opens the already approved PR #164587, which was closed by
accident. The only change is a rebase.
During the InsertNegateRAState pass, we check the annotations on
instructions to decide where to generate the OpNegateRAState CFIs in the
output binary.
As only instructions in the input binary were annotated, we have to make
a judgment call on instructions generated by other BOLT passes.
Incorrect placement may cause issues when an (async) unwind request
is received while executing the new "unknown" instructions.
This patch adds more logic to make a more informed decision by taking
into account that:
- Unknown instructions in a BasicBlock have the same RAState as the
other instructions there. Previously, if the BasicBlock started with an
unknown instruction, the RAState was copied from the preceding block.
Now, the RAState is copied from the succeeding instructions in the same
block.
- Some BasicBlocks may only contain instructions with unknown RAState.
As explained in issue #160989, these blocks already have incorrect
unwind info. Because of this, the last known RAState based on the layout
order is copied. A toy sketch of both rules follows below.
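A toy model of the two rules (hypothetical types and names, not the real
BOLT data structures):

```cpp
#include <optional>
#include <vector>

// Toy model only. RASigned is the RAState: true = signed return address,
// false = unsigned, nullopt = unknown (instruction added by another pass).
struct Inst {
  std::optional<bool> RASigned;
};

void inferRAStates(std::vector<Inst> &Block, bool &LastKnownLayoutState) {
  // Rule 1: walking backwards, an unknown instruction inherits the RAState
  // of the succeeding annotated instruction in the same block.
  std::optional<bool> Succ;
  for (auto It = Block.rbegin(); It != Block.rend(); ++It) {
    if (It->RASigned)
      Succ = It->RASigned;
    else
      It->RASigned = Succ; // stays unknown if nothing annotated follows
  }
  // Rule 2: any instruction still unknown (in particular, a fully-unknown
  // block, see issue #160989) falls back to the last known RAState in
  // layout order.
  for (auto &I : Block)
    if (!I.RASigned)
      I.RASigned = LastKnownLayoutState;
  if (!Block.empty())
    LastKnownLayoutState = *Block.back().RASigned;
}
```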
Updated bolt/docs/PacRetDesign.md to reflect changes.
When a `scf.if` directly precedes a `scf.condition` in the before region
of a `scf.while` and both share the same condition, move the if into the
after region of the loop. This helps simplify the control flow to enable
uplifting `scf.while` to `scf.for`.