The `ARCHIVE` artifact kind is not valid for `install(FILES ...)`.
Additionally, install wasn't resolving the target's `TARGET_FILE`
properly and was trying to find it in the top-level build directory, rather than
in the libclc binary directory. This is because our `TARGET_FILE`
properties were being set to relative paths. The CMake behaviour they
are trying to mimic - `$<TARGET_FILE:$tgt>` - provides an absolute path.
As such, this patch updates the places where we set the `TARGET_FILE`
property so that it holds an absolute path.
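The failure mode can be illustrated with a small sketch. This is not the CMake logic itself, only a toy model of how a relative path resolves against whichever base directory the consumer happens to use; the directory names and the `resolve` helper are hypothetical.

```python
import posixpath

def resolve(path, base_dir):
    """Absolute paths are used as-is; relative ones are joined onto base_dir."""
    if posixpath.isabs(path):
        return path
    return posixpath.normpath(posixpath.join(base_dir, path))

# Hypothetical directory layout, for illustration only.
TOP_BUILD_DIR = "/build"
LIBCLC_BIN_DIR = "/build/libclc"

relative = "foo.bc"
absolute = posixpath.join(LIBCLC_BIN_DIR, relative)

# A consumer working from the top-level build directory:
wrong = resolve(relative, TOP_BUILD_DIR)   # looks in the wrong directory
right = resolve(absolute, TOP_BUILD_DIR)   # base directory no longer matters
```

Storing the absolute form removes the dependence on where the consumer runs.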
Our support for derived types used `getTypeSizeAndAlignment` to
calculate the offset of the members. The `fir.box` type was not supported in
that function, which meant that any member that required a descriptor was
not supported in a derived type.
We now convert the type into an llvm type and then use the DataLayout to
calculate the size/offset of a member. There is no longer any dependency on
`getTypeSizeAndAlignment` to get the size of the types.
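As a rough illustration of what computing member offsets from sizes and alignments involves, here is a generic struct-layout sketch. It does not use the DataLayout API; the `member_offsets` helper and the example sizes (including the 24-byte "descriptor") are assumptions made up for the example.

```python
def member_offsets(members):
    """Given (size, alignment) pairs in declaration order, return the byte
    offset of each member, rounding up to the member's alignment."""
    offset = 0
    offsets = []
    for size, align in members:
        offset = (offset + align - 1) // align * align  # round up to alignment
        offsets.append(offset)
        offset += size
    return offsets

# Hypothetical layout: a 4-byte integer, a 24-byte descriptor aligned to 8,
# and a 1-byte flag.
layout = member_offsets([(4, 4), (24, 8), (1, 1)])
```

Once every member can be lowered to a type with a known size and alignment, descriptor members need no special case.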
There are 2 other changes in this PR:
1. The `recID` field is used to handle cases where a member
references its parent type.
2. A type cache is maintained to avoid duplication. It is also needed
for the circular reference case.
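The role of the cache in the circular case can be sketched as follows. This is a toy model and not the code in this PR: `SourceType`, `DebugType`, `Converter` and the `rec_id` field are hypothetical names. The key point is that a type is registered in the cache before its members are visited, so a member that refers back to its parent finds the parent instead of recursing forever.

```python
from dataclasses import dataclass, field

@dataclass
class SourceType:
    name: str
    members: list = field(default_factory=list)  # (member_name, SourceType)

@dataclass
class DebugType:
    name: str
    rec_id: int                                  # stable id for back-references
    members: list = field(default_factory=list)  # (member_name, DebugType)

class Converter:
    def __init__(self):
        self.cache = {}   # id(SourceType) -> DebugType
        self.next_id = 0

    def convert(self, ty):
        key = id(ty)
        if key in self.cache:          # already converted, or in progress
            return self.cache[key]
        node = DebugType(ty.name, self.next_id)
        self.next_id += 1
        self.cache[key] = node         # register before visiting members
        for member_name, member_ty in ty.members:
            node.members.append((member_name, self.convert(member_ty)))
        return node

# A list node whose "next" member refers back to its own type.
integer = SourceType("integer")
node = SourceType("node")
node.members = [("value", integer), ("next", node)]

conv = Converter()
result = conv.convert(node)
```

Converting the same type twice also returns the cached entry, which covers the duplication case.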
Fixes #108001.
Change the names of the TableGen features to match the names used by
AMDGPUSubtarget. "Addressable" refers to the amount that can be accessed
by a single workgroup. Add some explanatory comments. NFC.
-- This commit extends consumer fusion to take place even if the
producer has multiple uses.
-- Multiple uses here means that, besides the consumer op in question,
the only other uses of the producer that are allowed are in:
1. scf.yield
2. tensor.parallel_insert_slice
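The condition above can be sketched as a simple predicate. This is an illustration and not the MLIR implementation: ops are modelled as plain name strings, and `can_fuse_consumer` is a hypothetical helper.

```python
ALLOWED_OTHER_USERS = {"scf.yield", "tensor.parallel_insert_slice"}

def can_fuse_consumer(user_op_names, consumer_name):
    """Return True if the producer is used by the consumer and every other
    user is one of the allowed ops."""
    others = list(user_op_names)
    if consumer_name not in others:
        return False
    others.remove(consumer_name)  # drop the consumer under consideration
    return all(name in ALLOWED_OTHER_USERS for name in others)
```

For example, a producer used by the consumer and by `scf.yield` qualifies, while one that is also used by an unrelated op does not.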
Signed-off-by: Abhishek Varma <abhvarma@amd.com>
We do not expect to see live carry-out outputs on these adds,
so add a dead flag. Split the test for the degenerate case. This
makes it more apparent that a regression in a future commit does not matter.
SimplifyCFG store speculation currently has some homegrown code to check
for a writable object, handling the alloca special case only.
Switch it to use the generic isWritableObject() API, which means that we
also support byval arguments, allocator return values, and writable
arguments.
I've adjusted isWritableObject() to also check for the noalias attribute
when handling writable. Otherwise, I don't think that we can generalize
from at-entry writability. This was not relevant for previous uses of
the function, because they'd already require noalias for other reasons
anyway.
…me lowering
SME instructions can only be used in streaming mode. PTRUE for
predicated counter and the ld/st pair can be used when:
sve2.1 is available, or
sme2 is available and the function is in streaming mode.
Previously the frame lowering only checked whether sme2 was available when
building the machine instruction.
This fix checks that sme2 is available and that the subtarget is in streaming mode.
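The condition described above reduces to a small boolean predicate. The sketch below is an illustration, not the AArch64 backend code; the function and parameter names are hypothetical.

```python
def can_use_predicate_counter_ops(has_sve2p1, has_sme2, is_streaming):
    """sve2.1 is sufficient on its own; sme2 only counts in streaming mode."""
    return has_sve2p1 or (has_sme2 and is_streaming)
```

The case that changes with this fix is sme2 without streaming mode, which no longer qualifies.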
Pass EarliestEscapeInfo to BatchAA in MemCpyOpt. This allows memcpy
elimination in cases where one of the involved pointers is captured
after the relevant memcpy/call.
This patch separates the computation of the final reduction result from
the intermediate stores of the reduction.
---------
Co-authored-by: Florian Hahn <flo@fhahn.com>
While looking into https://github.com/ChuanqiXu9/clangd-for-modules/issues/9,
I was surprised to find that the C++20 modules support doesn't handle code
completion well.
After debugging, I found two problems:
(1) We forgot to call `adjustHeaderSearchOptions` in code completion. This
was probably a simple oversight.
(2) In `CodeCompleteOptions::getClangCompleteOpts`, we may set
`LoadExternal` to false when an index is available. But we do support
modules with an index, so the two conflict. Given modules are opt-in now,
I think it makes sense to set `LoadExternal` to true when modules are
enabled.
This is a small fix and I hope it can land quickly.
The DominatorTree version is marked for deprecation, so we use the
DomTreeUpdater version. We also update sinkRegion() to iterate over
basic blocks instead of DomTreeNodes. The loop body calls
SplitBlockPredecessors. The DTU version calls
DomTreeUpdater::apply_updates(), which may call DominatorTree::reset().
This invalidates the worklist of DomTreeNodes to iterate over.
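The invalidation problem can be shown with a toy model. This is not LLVM's `DominatorTree`; `Tree`, `TreeNode` and `reset` here are hypothetical stand-ins. The sketch only shows why a worklist of node objects goes stale when the tree is rebuilt, while a worklist of blocks can be looked up again afterwards.

```python
class TreeNode:
    def __init__(self, block):
        self.block = block
        self.valid = True

class Tree:
    def __init__(self, blocks):
        self.nodes = {b: TreeNode(b) for b in blocks}

    def reset(self, blocks):
        """Rebuild from scratch: all previously handed-out nodes become stale."""
        for node in self.nodes.values():
            node.valid = False
        self.nodes = {b: TreeNode(b) for b in blocks}

blocks = ["entry", "loop", "exit"]
tree = Tree(blocks)

node_worklist = list(tree.nodes.values())  # holds node objects
block_worklist = list(blocks)              # holds block names only

# A CFG edit inside the loop body adds a block and forces a rebuild.
blocks.append("loop.split")
tree.reset(blocks)

stale = [n for n in node_worklist if not n.valid]
visited = [tree.nodes[b].block for b in block_worklist]
```

After the rebuild every entry in `node_worklist` is stale, while `block_worklist` still maps to live nodes.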
Windows uses a different path separator than Unix-based OSes. One of
the tests in debug-compilation-unit.ll didn't have the Windows-supported '\\'
variant, which broke the test suite on that OS.
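A pattern that tolerates both separators can be sketched with a character class. The snippet below is a generic regular-expression illustration rather than the actual check line in the test; the `dir` and `file.c` path components are hypothetical.

```python
import re

# [/\\] matches either a forward slash or a backslash.
PATH_PATTERN = re.compile(r"dir[/\\]file\.c$")

unix_ok = PATH_PATTERN.search("some/dir/file.c") is not None
windows_ok = PATH_PATTERN.search("some\\dir\\file.c") is not None
```

Both spellings of the path match the same pattern, so the check does not depend on the host OS.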
Replace `element_type*` handles in HLSLExternalSemaSource with
`__hlsl_resource_t` builtin type.
The handle used to be defined as `element_type*` which was used by the
provisional subscript operator implementation. Now that the handle is
`__hlsl_resource_t`, the subscript placeholder implementation was updated
to add an `element_type* e;` field to the resource struct and return a
reference to that. This field is just a temporary workaround until the
indexing is implemented properly in llvm/llvm-project#95956, at which
point the field will be removed. This seemed like a better solution than
disabling many of the existing tests that already use the `[]` operator.
One test has to be disabled nevertheless because of an error based on
interactions of const and template instantiation (a potential bug that can
be investigated once indexing is implemented the right way).
Fixes #84824
Currently callers of analyze can't get detailed information about a
missing header, e.g. its resolved path. The only way to get this is to use
the low-level walkUsed function, which is far more complicated than just calling
analyze.
This enables further analysis, e.g. when includes are spelled relative
to inner directories, the caller can still determine their path relative to
the repository root.
I'm trying to speed up the reaching def analysis by changing the
underlying data structure. Turning MBBReachingDefsInfo into a proper
class decouples the data structure and its users. This patch does not
change the existing three-dimensional vector structure.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
This PR fixes a bug in `SparseTensorDimOpRewriter` when `tensor.dim` has
an unranked tensor type. To prevent crashes, we now use
`tryGetSparseTensorType` instead of `getSparseTensorType`. Fixes
#107807.
Predicated instructions cannot be hoisted trivially, so don't treat them as
uniform values in the cost model.
This fixes a difference between the legacy and VPlan-based cost models.
Fixes https://github.com/llvm/llvm-project/issues/110295.
When using the BSD sed that ships with macOS on the object files for
comparison, every command would error with
```
sed: RE error: illegal byte sequence
```
This was potentially fixed for an older version in
6c52b02e7d but even the commands in the
example there still have this error. You can repro this with any binary:
```
$ sed s/a/b/ /bin/ls >/dev/null
sed: RE error: illegal byte sequence
```
Setting LC_CTYPE no longer appears to solve the issue:
```
$ LC_CTYPE=C sed s/a/b/ /bin/ls >/dev/null
sed: RE error: illegal byte sequence
```
But this change with LC_ALL does:
```
$ LC_ALL=C sed s/a/b/ /bin/ls >/dev/null; echo $?
0
```
It seems like the difference here is that if you have LC_ALL set to
something else, LC_CTYPE does not override it. More info:
https://stackoverflow.com/a/23584470/902968
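The precedence at play can be sketched as a lookup order: `LC_ALL` wins if set, then `LC_CTYPE`, then `LANG`. The helper below is an illustration of that order only, not how sed or libc implements it; the function name is hypothetical, and the final fallback to "C" is an assumption made for the example.

```python
def effective_ctype(env):
    """Return the locale that governs character handling for this environment."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:  # unset or empty variables are skipped
            return value
    return "C"  # assumed fallback, for illustration only
```

With `LC_ALL` already set to a UTF-8 locale in the environment, setting `LC_CTYPE=C` has no effect, while setting `LC_ALL=C` does.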