Attempt to address the following example from causing an assert or ICE:
```
subroutine test(a)
implicit none
integer :: i
real(kind=real64), dimension(:) :: a
real(kind=real64), dimension(size(a, 1)) :: b
!$omp target map(tofrom: b)
do i = 1, 10
b(i) = i
end do
!$omp end target
end subroutine
```
Where we utilise a Fortran intrinsic (size) to calculate the size of
allocatable arrays and then map it to device.
Allow utility constructs (error and nothing) to appear in the
specification part as well as the execution part. The exception is
"ERROR AT(EXECUTION)" which should only be in the execution part.
In case of ambiguity (the boundary between the specification and the
execution part), utility constructs will be parsed as belonging to the
specification part. In such cases move them to the execution part in the
OpenMP canonicalization code.
This commit updates the internal `ConversionValueMapping` data structure
in the dialect conversion driver to support 1:N replacements. This is
the last major commit for adding 1:N support to the dialect conversion
driver.
Since #116470, the infrastructure already supports 1:N replacements. But
the `ConversionValueMapping` still stored 1:1 value mappings. To that
end, the driver inserted temporary argument materializations (converting
N SSA values into 1 value). This is no longer the case. Argument
materializations are now entirely gone. (They will be deleted from the
type converter after some time, when we delete the old 1:N dialect
conversion driver.)
Note for LLVM integration: Replace all occurrences of
`addArgumentMaterialization` (except for 1:N dialect conversion passes)
with `addSourceMaterialization`.
---------
Co-authored-by: Markus Böck <markus.boeck02@gmail.com>
This already worked without /wholearchive; now it works with it too.
(Only for thin archives containing relative file names, matching the ELF
and Mach-O ports.)
This was found by modernize-use-nullptr ClangTidy check, which suggested
to pass nullptr instead of 0 to DIFileAttr.
However it looks like the intention was to pass the two 0 values for
line and scopeLine, and we should pass {} to DIFileAttr. Do that change.
This shifted `1UL` to make the mask. On 32 bit Linux UL is 32 bit,
so if Hi+1 was >= 32 then you'd get the wrong result here.
The other version of this uses 1ULL, but using the uint64_t typename
here saves someone going to check what ULL means on different platforms.
This fixes test failures seen on Linaro's 32 bit bots:
https://lab.llvm.org/buildbot/#/builders/39/builds/3700https://lab.llvm.org/buildbot/#/builders/122/builds/781
Though I cannot say exactly why this fixes them. Does not seem
like the new code was triggering this problem, but somehow it
must be.
This helps with +fp64 targets where the f64s are legal and not previously
lowered. It can treat fpextends as a shift + cvt and fptrunc can use a libcall.
In the case where a type-constraint on an NTTP contains a pack, we form
a PackExpansionType to model it. However, there are a few places
expecting it to be a non-pack expansion, and luckily only small changes
could make them work.
Fixes https://github.com/llvm/llvm-project/issues/88866
This patch,
- Added a translation support for aligned clause in SIMD directive by passing the alignment details to "llvm.assume" intrinsic.
- Updated the insertion point for llvm.assume intrinsic call in "OMPIRBuilder.cpp".
- Added a check in aligned clause MLIR lowering, to ensure that the alignment value must be a power of 2.
Some patterns (in particular horizontal style patterns) can end up with shuffles straddling both sides of a binop/cmp.
Where individually the folds aren't worth it, by merging the (oneuse) shuffles we can notably reduce the net instruction count and cost.
One of the final steps towards finally addressing #34072
When a single #embed directive is used to initialize a char array, the
case is optimized via swap of EmbedExpr to underlying StringLiteral,
resulting in better performance in AST consumers.
While browsing through the code, I realized that
7122b70cfc
which changed type of EmbedExpr made the "fast path" unreachable. This
patch fixes this unfortunate situation.
This test is failing on Windows (see e.g.
https://lab.llvm.org/buildbot/#/builders/146/builds/1983), probably due to
incomplete debugger support there (the test registers debug info in-process, so
non-Darwin builds shouldn't be expected to have the right symbols).
This PR is a follow-up to #116373 and #116439, where a Transform Dialect
(TD) operation was introduced to collect patterns for decomposing
tensor.pack. The second patch renamed the patterns and the TD Op.
Originally, adding patterns for `tensor.unpack` was marked as a TODO,
which this PR addresses.
No new tests are introduced in this PR. Instead, existing tests from:
* "decompose-tensor-unpack.mlir"
are reused. To achieve this:
* The test is updated to use the TD operation
`transform.apply_patterns.linalg.decompose_pack_unpack` instead of the
flag `--test-linalg-transform-patterns="test-decompose-tensor-unpack"`,
avoiding artificial tests created solely for the TD Op.
* The TD sequence is saved to a new file, "decompose_unpack.mlir", and
preloaded using the option.
This patch specializes `FoldTensorCastProducerOp` for `tensor::UnPackOp` by
introducing a dedicated pattern: `FoldTensorCastUnPackOp`. This mirrors a
similar update made for `tensor::PackOp` in #114559. Below is the updated
rationale tailored to `tensor::UnPackOp`.
ISSUE DESCRIPTION
Currently, `FoldTensorCastProducerOp` incorrectly folds the following:
```mlir
%cast = tensor.cast %dest : tensor<1x1x8x1xi32> to tensor<1x1x?x1xi32>
// Note: `%c8` and `?`.
%unpack = tensor.unpack %cast
inner_dims_pos = [0, 1]
inner_tiles = [%c8, 1]
into %res : tensor<1x1x?x1xi32> -> tensor<7x?xi32>
```
as:
```mlir
// Note: `%c8` and `8`.
%unpack = tensor.unpack %cast
inner_dims_pos = [0, 1]
inner_tiles = [%c8, 1]
into %res : tensor<1x1x8x1xi32> -> tensor<7x?xi32>
```
This triggers an Op verification failure because the folder does not
update the inner tile sizes in the unpack Op. This patch addresses the
issue by ensuring proper handling of inner tile sizes.
ADDITIONAL CHANGES
* invalid.mlir: Fixed a typo.
* TensorOps.cpp:
* Removed unnecessary `(void)tileSize`.
* Added comments following the discussion in PR #115772.
* Made minor updates to `FoldTensorCastPackOp` for consistency with the newly
introduced `FoldTensorCastUnPackOp`.
* Tensor/canonicalize.mlir: Ensured consistent usage of `test_attr` (e.g.,
replaced mixed use of `test_attr` and `some_attr`).
V_MAX3/MIN3_NUM_F16 are alias GFX12 instructions with V_MAX3/MIN3_F16 in
GFX11 and they should be updated together.
This fix a bug introduced in
https://github.com/llvm/llvm-project/pull/113603 such that only
V_MAX3/MIN3_F16 are replaced in true16 format. Also added GFX12 runlines
for CodeGen test
All the sources of `llvm-min-tblgen` are also used for `llvm-tblgen`,
with identical compilation flags. Reuse the object files of
`llvm-min-tblgen` for `llvm-tblgen` by applying the usual source
structure of an executable: One file per executable which named after
the executable name containing the (in this case trivial) main function,
which just calls the tblgen_main in TableGen.cpp. This should also clear
up any confusion (including mine) of where each executable's main
function is.
While this slightly reduces build time, the main motivation is ccache.
Using the hard_link
option, building the object files for `llvm-tblgen` will result in a
hard link to the same object file already used for `llvm-min-tblgen`. To
signal the build system that the file is new, ccache will update the
file's time stamp. Unfortunately, time stamps are shared between all
hard-linked files s.t. this will indirectly also update the time stamps
for the object files used for `llvm-tblgen`. At the next run, Ninja will
recognize this time stamp discrepancy to the expected stamp recorded in
`.ninja_log` and rebuild those object files for `llvm-min-tblgen`, which
again will also update the stamp for the `llvm-tblgen`... . This is
especially annoying for tablegen because it means Ninja will re-run all
tablegenning in every build.
I am using the hard_link option because it reduces the cost of having
multiple build-trees of the LLVM sources and reduces the wear to the SSD
they are stored on.
`RegisterClassInfo::getRegPressureSetLimit` is a wrapper of
`TargetRegisterInfo::getRegPressureSetLimit` with some logics to
adjust the limit by removing reserved registers.
It seems that we shouldn't use
`TargetRegisterInfo::getRegPressureSetLimit`
directly, just like the comment "This limit must be adjusted
dynamically for reserved registers" said.
Separate from https://github.com/llvm/llvm-project/pull/118787
The gcov version is set to 11.1 (compatible with gcov 9) even if
`-Xclang -coverage-version=` specified version is less than 11.1.
Therefore, we can drop producer support for version < 11.1.
Similar to a9e75b1d4d: During MachOPlatform bootstrap we need to defer actions
until essential platform functionality has been loaded, but the platform itself
may be loaded under a concurrent dispatcher so we have to guard against the
deferred actions vector being accessed concurrently.
This fixes a probablistic failure in the ORC runtime regression tests on
Darwin/x86-64 that was spotted after edca1d9bad (which turned on concurrent
linking by default in llvm-jitlink).
This optimization was introduced by #70469.
Like AArch64, we allow tail expansions for 3 on RV32 and 3/5/6
on RV64.
This can simplify the comparison and reduce the number of blocks.
It wraps the body of namespace with additional newlines, turning this code:
```
namespace N {
int function();
}
```
into the following:
```
namespace N {
int function();
}
```
---------
Co-authored-by: Owen Pan <owenpiano@gmail.com>
The debug section map was using MachO section names (with the "__" prefix), but
DWARFContext expects section names with the object format prefix stripped off.
This was preventing DWARFContext from accessing the debug_str section,
resulting in bogus source name strings.
The commit 22b7b84860
made the symbols provided by shared libraries "defined",
and thus effectively made it impossible to generate non-pie
dynamically linked executables using
--unresolved-symbols=import-dynamic.
This commit, based on https://github.com/llvm/llvm-project/pull/109249,
fixes it by checking sym->isShared() explictly.
(as a bonus, you don't need to rely on
--unresolved-symbols=import-dynamic
anymore.)
Fixes https://github.com/llvm/llvm-project/issues/107387
Change the global variable reference to a member access of another
variable `ctx`. In the future, we may pass through `ctx` to functions to
eliminate global variables.
Pull Request: https://github.com/llvm/llvm-project/pull/119835
R_RISCV_{ADD*/SUB*} relocations are kept only when feature relax
enabled. So it is better to add relax to the test, so that relocs can be
reserved for processing by the jitlink. That's what this test really
wants to test.