This PR is one in a series of 3 that aim to add support for explicit
member mapping of allocatable components in derived types within
OpenMP+Fortran for Flang.
This PR provides all of the runtime tests that are currently
upstreamable, unfortunately some of the other tests would require
linking of the fortran runtime for offload which we currently do not do.
But regardless, this is plenty to ensure that the mapping is working in
most cases.
The corresponding enum members were only used by `EmitMOPS`, which
immediately translated them to machine opcodes. Just pass the machine
opcodes instead.
This is in preparation for adding a new optimization to the pass that
cares about the order of instructions. The existing optimization does
not care, so this just causes minor codegen differences.
Instead respect what the toolchain default is (or what the user sets via
CMAKE_CXX_FLAGS).
This fixes builds with libcxx, with mingw toolchains targeting
msvcrt.dll, after 5d8be4c036aa5ce4a94f1f37a9155d5c877e23db; after that
commit, the libcxx public headers reference symbols such as iswspace_l,
which are unavailable when targeting msvcrt.dll on older versions of
Windows (it's only available in msvcrt.dll since Windows Vista).
into zext x
LegalizerHelper has two padding strategies: undef or zero.
see LegalizerHelper:273
see LegalizerHelper:315
This PR is about zero sugar and Coke Zero.
; CHECK-NEXT: [[MV2:%[0-9]+]]:_(s64) = G_MERGE_VALUES %a(s32),
[[C]](s32)
Please continue padding merge values.
// %bits_8_15:(s8) = G_CONSTANT i8 0
// %0:(s16) = G_MERGE_VALUES %bits_0_7:(s8), %bits_8_15:(s8)
%bits_8_15 is defined by zero. For optimization, we pick zext.
// %0:_(s16) = G_ZEXT %bits_0_7:(s8)
The upper bits of %0 are zero and the lower bits come from %bits_0_7.
The runtime Assign function is not meant to initialize an array from a
scalar. For that we need to use DoAssignFromSource. Update the data
transfer from scalar to descriptor to use a new entry point that use
this function underneath.
In order to support f16 load/store we need to make load/stores with s16
register type legal. If regbank selection doesn't pick the FPR bank,
we'll be left with a GPR load or store which we don't have isel patterns
for from SelectionDAG.
In order to add the patterns we need to make i16 a legal type for the
GPR register class.
Tests are currently disabling the legality check because I haven't
update the legalizer yet.
Up until now we could only support packing of scalar elements. This
patch fixes this by implementing packing of vector elements, by
generating extractelement and insertelement instruction pairs.
This patch removes MemProf format Version 0 now that version 2 and 3
seem to be working well.
I'm not touching version 1 for now because some tests still rely on
version 1.
Note that Version 0 is identical to Version 1 except that the MemProf
section of the indexed format has a MemProf version field.
This patch further speeds up the extraction of caller-callee pairs
from the profile.
Recall that we reconstruct a call stack by traversing the radix tree
from one of its leaf nodes toward a root. The implication is that
when we decode many different call stacks, we end up visiting nodes
near the root(s) repeatedly. That in turn adds many duplicates to our
data structure:
DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> Calls;
only to be deduplicated later with sort+unique for each vector.
This patch makes the extraction process more efficient by keeping
track of indices of the radix tree array we've visited so far and
terminating traversal as soon as we encounter an element previously
visited.
Note that even with this improvement, we still add at least one
caller-callee pair to the data structure above for each call stack
because we do need to add a caller-callee pair for the leaf node with
the callee GUID being 0.
Without this patch, it takes 4 seconds to extract caller-callee pairs
from a large MemProf profile. This patch shortenes that down to
900ms.
This patch implements packing of scalar operands when the vectorizer
decides to stop vectorizing. Packing is implemented with a sequence of
InsertElement instructions.
Packing vectors requires different instructions so it's implemented in a
follow-up patch.
AMDGPU disabled the use of `byval` for struct argument passing in commit
d77c620. However, when emitting `__enqueue_kernel_basic`, Clang still
adds the
`byval` attribute by default. Emitting the `byval` attribute by default
in this
context doesn’t seem like a good idea, as argument-passing conventions
are
highly target-dependent, and assumptions here could lead to issues. This
PR
removes the addition of the `byval` attribute, aligning the behavior
with other
`__enqueue_kernel_*` functions.
We've seen bugs where we lost track of error states stored in the
functor because we passed the functor by value (that is,
std::function) as opposed to reference (llvm::function_ref).
This patch fixes a couple of places we pass functors by value.
While we are at it, this patch adds curly braces around a "for" loop
spanning multiple lines.
Since cd16b07 (IR: introduce CmpInst::isEquivalence), there is now an
isEquivalence routine in CmpInst that we can use to determine
equivalence in simplifySelectWithICmpEq. Implement this, extending the
code from integer-equalities to integer and floating-point equivalences.
I was wrong last patch. I viewed the `Visited` set purely as a possible
recursion deterrent where functions calling a callee multiple times are
handled elsewhere. This doesn't consider cases where a function is
called multiple times by different callers still part of the same call
graph. New test shows the aforementioned case.
Reapplies #111004, fixes#115562.
Move Landing Pad Creation to a new function that checks each thunk every
pass to see if it needs a landing pad. This permits a thunk to be
created without needing a landing pad, but later needing one due to
drifting out of direct branch range and requiring an indirect branch.
We record all the Thunks created so far in a new vector rather than
trying to iterate over the DenseMap as we need a deterministic order of
adding LandingPadThunks due to the short branch fall through. We cannot
use normalizeExistingThunk() either as that only iterates through live
thunks.
Fixes: https://crbug.com/377438309
Original PR: https://github.com/llvm/llvm-project/pull/108989
Sending without a new test case to fix existing test. A new regression
test will come in a separate PR as coming up with a small enough
reproducer for this case is non-trivial.
This commit adds support for handling mask constants generated by the
`arith.constant` op in the `VectorEmulateNarrowType` pattern.
Previously, this pattern would not match due to the lack of mask
constant handling in `getCompressedMaskOp`.
The changes include:
1. Updating `getCompressedMaskOp` to recognize and handle
`arith.constant` ops as mask value sources.
2. Handling cases where the mask is not aligned with the emulated load
width. The compressed mask is adjusted to account for the offset.
Limitations:
- The arith.constant op can only have 1-dimensional constant values.
Resolves: #115742
Signed-off-by: Alan Li <me@alanli.org>
This paper is about the quantum exponent of NAN, which only applies if
we support decimal floating-point types from the TS. That is why the
status changed from Unknown to No.
Fixes#115293
While the definition of ceildivsi is integer division, rounding up, most
implementations will use `-(-a / b)` for dividing `a ceildiv b` with `a`
negative and `b` positive.
Mathematically, and for most integers, these two definitions are
equivalent. However, with `a == INT_MIN`, the initial negation is a
noop, which means that, while divinding and rounding up would give a
negative result, `-((- INT_MIN) / b)` is `-(INT_MIN / b)`, which is
positive.
This commit adds a special case to ceilDivSI inference to handle this
case and bring it in line with the operational instead of the
mathematical semantics of ceiling division.
Nodes created with `getMemIntrinsicNode` have memory operands. In order
for operands to be propagated to machine instructions, the nodes should
have `SDNPMemOperand` property.
Similar to 3c8c385a.
* Have clang always append & pass System/Library/SubFrameworks when determining default sdk search paths.
* Teach clang-installapi to traverse there for framework input.
* Teach llvm-readtapi that the library files (TBD or binary) in there should be considered private.
resolves: rdar://137457006
Currently when we merge invokes as part of SimplifyCFG we apply a merge
of the invoke DILocations to the merged invoke. We also insert an
unconditional branch to the merged invoke at the positions previously
occupied by the original invokes; as this branch is part of the
substitution for the invoke it has replaced, we should propagate the
original invoke DebugLoc to it.