- P1252 ("Ranges Design Cleanup") -- deprecate
`move_iterator::operator->` starting from C++20; add range comparisons
to the `<functional>` synopsis. This restores
`move_iterator::operator->` that was incorrectly deleted in D117656;
it's still defined in the latest draft, see
http://eel.is/c++draft/depr.move.iter.elem. Note that changes to
`*_result` types from 6.1 in the paper are no longer relevant now that
these types are aliases;
- P2106 ("Alternative wording for GB315 and GB316") -- add a few
`*_result` types to the synopsis in `<algorithm>` (some algorithms are
not implemented yet and thus some of the proposal still cannot be
marked as done);
Also mark already done issues as done (or as nothing to do):
- P2091 ("Fixing Issues With Range Access CPOs") was already implemented
(this patch adds tests for some ill-formed cases);
- LWG 3247 ("`ranges::iter_move` should perform ADL-only lookup of
`iter_move`") was already implemented;
- LWG 3300 ("Non-array ssize overload is underconstrained") doesn't
affect the implementation;
- LWG 3335 ("Resolve C++20 NB comments US 273 and GB 274") was already
implemented;
- LWG 3355 ("The memory algorithms should support move-only input
iterators introduced by P1207") was already implemented (except for
testing).
Differential Revision: https://reviews.llvm.org/D126053
The pass was previously limited to LUI+ADDI being used by a single
instruction.
This patch allows the pass to optimize multiple memory operations
that use the same offset. Each of them will receive a separate %lo
relocation. My main motivation is to handle a read-modify-write
where we have a load and store to the same address, but I didn't
restrict it to that case.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D128599
The unit tests introduced in patch D128335 are causing build failures,
and the fix is non-trivial. This patch disables these tests temporarily
until a proper fix can be implemented.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D128746
Currently, in the Presburger library, we use the words "variables" and
"identifiers" interchangeably. This patch changes this to only use "variables" to
refer to the variables of PresburgerSpace.
The reasoning behind this change is that the current usage of the word "identifier"
is misleading. variables do not "identify" anything. The information attached to them is the
actual "identifier" for the variable. The word "identifier", will later be used
to refer to the information attached to each variable in space.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D128585
This patch is to enable exception handling on the z/OS platform that is compatible with the existing z/OS runtime. No functionality of libcxxabi has been changed for other platforms. With this patch the hope is we can add z/OS as a platform to perform testing on any C++ ABI changes.
There is a primary difference for the z/OS implementation. On z/OS the thrown object is added to a linked list of caught and uncaught exceptions. The unwinder uses the top one as the current exception it is trying to find the landing pad for. We have to pop the top exception after we get it’s landing pad for our unwinder to correctly get any subsequent rethrows or nested exception calls.
Differential Revision: https://reviews.llvm.org/D99913
If BIND(C) appears on an internal procedure, it must have a null binding
label, i.e. BIND(C,NAME="").
Also address conflicts with D127725 which was merged during development.
Differential Revision: https://reviews.llvm.org/D128676
This is already partially the case, but we can rely more heavily on interface libraries and how they are imported/exported in other to simplify the implementation of the mlir python functions in Cmake.
This change also makes a couple of other changes:
1) Add a new CMake function which handles "pure" sources. This was done inline previously
2) Moves the headers associated with CAPI libraries to the libraries themselves. These were previously managed in a separate source target. They can now be added directly to the CAPI libraries using DECLARED_HEADERS.
3) Cleanup some dependencies that showed up as an issue during the refactor
This is a big CMake change that should produce no impact on the build of mlir and on the produced *build tree*. However, this change fixes an issue with the *install tree* of mlir which was previously unusable for projects like torch-mlir because both the "pure" and "extension" targets were pointing to either the build or source trees.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D128230
This patch fixes the problem the bots were having with the algorithm
test not including pthreads correctly. They will likely need a manual
forced clean build for this to take effect.
Differential Revision: https://reviews.llvm.org/D128742
This pattern can kick in when the source of the broadcast has a shape
that is a prefix/suffix of the result of the shape_cast.
Differential Revision: https://reviews.llvm.org/D128734
This patch is extracted from D86539.
Current implementation of lookForDIEsToKeep() function skips types
duplications basing on the getCanonicalDIEOffset() data:
```
if (AttrSpec.Form != dwarf::DW_FORM_ref_addr && (UseOdr || IsModuleRef) &&
Info.Ctxt &&
Info.Ctxt != ReferencedCU->getInfo(Info.ParentIdx).Ctxt &&
Info.Ctxt->getCanonicalDIEOffset() && isODRAttribute(AttrSpec.Attr)) <<<<<
continue;
```
But that field is set after all compile units inside object file are processed:
```
for (auto &CurrentUnit : OptContext.CompileUnits)
lookForDIEsToKeep(.., &CurrentUnit, ..); // check CanonicalDIEOffset
DIECloner.cloneAllCompileUnits(); // set CanonicalDIEOffset
```
Thus, if the object file contains several compilation units - types would
not be deduplicated. The above solution works well for the case when the object file
contains only one compilation unit. But if the object file contains several compilation
units then types would not be deduplicated between these compilation units.
This patch changes the algorithm so that types were deduplicated between
compilation units from the same object file.
It produces binary incompatible output for the cases when several compilation units
are located inside the same object file.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D125469
Enforce the assumption made on tensor buffers explicitly. When in-place,
reuse the buffer, but fill with all zeroes for the non-update case, since
the kernel assumes all elements are written to. When not in-place, zero
out the new buffer when materializing or when no-updates occur. Copy the
original tensor value when updates occur. This prepares migrating to the
new bufferization strategy, where these assumptions must be made explicit.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D128691
I believe we already checked that the destination of the first
CMOV is only used by the second CMOV so I don't think there is any
reason we need the PHI to write the register that was used by the
first CMOV. We can directly use the second CMOV destination and
avoid the copy.
This may be a left over from when the cascaded select handling
was part of the main algorithm before it was refactored in D35685.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D128124
Looks like with https://reviews.llvm.org/D127911, Windows emits more
globals with mangled names into the IR. Relax the tests in order to
allow these mangled names.
This patch fixes a couple of issues with the lowering of user defined assignment.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: klausler
Differential Revision: https://reviews.llvm.org/D128730
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
This is a resurrection of D106421 with the change that it keeps backward-compatibility. This means decoding the previous version of `LLVM_BB_ADDR_MAP` will work. This is required as the profile mapping tool is not released with LLVM (AutoFDO). As suggested by @jhenderson we rename the original section type value to `SHT_LLVM_BB_ADDR_MAP_V0` and assign a new value to the `SHT_LLVM_BB_ADDR_MAP` section type. The new encoding adds a version byte to each function entry to specify the encoding version for that function. This patch also adds a feature byte to be used with more flexibility in the future. An use-case example for the feature field is encoding multi-section functions more concisely using a different format.
Conceptually, the new encoding emits basic block offsets and sizes as label differences between each two consecutive basic block begin and end label. When decoding, offsets must be aggregated along with basic block sizes to calculate the final offsets of basic blocks relative to the function address.
This encoding uses smaller values compared to the existing one (offsets relative to function symbol).
Smaller values tend to occupy fewer bytes in ULEB128 encoding. As a result, we get about 17% total reduction in the size of the bb-address-map section (from about 11MB to 9MB for the clang PGO binary).
The extra two bytes (version and feature fields) incur a small 3% size overhead to the `LLVM_BB_ADDR_MAP` section size.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D121346
This mostly finishes the DRs for C89, though there are still a few
outliers which remain. It also corrects some of the statuses of DRs
where it's not clear if it was fully resolved by the committee or not.
As a drive-by, it also adds -fsyntax-only to the tests which are
verifying diagnostic results. This was previously missed by accident.
This is already possible for e.g. `cstring_literals`, but the entry for zerofill was unnamed.
rdar://90336380
Differential Revision: https://reviews.llvm.org/D128654
Treat captures as a uniform list, rather than default-captures being special
snowflakes that may only appear at the start.
This accepts a larger set of (incorrect) code, and simplifies error-handling
by making this fit into the usual homogeneous-list pattern.
Differential Revision: https://reviews.llvm.org/D128708
This isn't allowed by the standard grammar but is allowed in C, and clang/GCC
permit it as an extension.
It avoids the need to determine which type of list we have in error-recovery.
While here, also support array index designators `{ [4]=1 }` which are
also legal in C, and common extensions in C++.
Differential Revision: https://reviews.llvm.org/D128687
Fix bugs relating to support for characters of different kinds. Lowering
was creating bad FIR and MLIR that crashed in conversion to LLVM IR.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128723
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
This attribute is similar to DenseElementsAttr but does not support
splat. As such it has a much simpler API and does not need any smart
iterator: it exposes direct ArrayRef access.
A new syntax is introduced so that the generic printing/parsing looks
like:
[:i64 1, -2, 3]
This attribute beings like an ArrayAttr but has a `:` token after the
opening square brace to introduce the element type (supported are I8,
I16, I32, I64, F32, F64) and the comma separated list for the data.
This is particularly convenient for attributes intended to be small,
like those referring to shapes.
For example a `transpose` operation with a `dims` attribute could be
defined as such:
let arguments = (ins AnyTensor:$input, DenseI64ArrayAttr:$dims);
let assemblyFormat = "$input `dims` `=` $dims attr-dict : type($input)";
And printed this way (the element type is elided in this case):
transpose %input dims = [0, 2, 1] : tensor<2x3x4xf32>
The C++ API for dims would just directly return an ArrayRef<int64>
RFC: https://discourse.llvm.org/t/rfc-introduce-a-new-dense-array-attribute/63279
Recommit with a custom DenseArrayBaseAttrStorage class to ensure
over-alignment of the storage to the largest type.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D123774
For the rapid triage push, just add a TODO for the degenerate POINTER
assignment case. The LHD ought to be a variable of type !fir.box, but it
is currently returning a shadow variable for the raw data pointer. More
investigation is needed there.
Make sure that conversions are applied in FORALL degenerate contexts.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128724
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Add lowering tests left behind during the upstreaming.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128721
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
The gold linker veneers are written between functions without symbols,
so we to handle it specially in BOLT.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D128082
Migrate extractelement, insertelement and shufflevector to use the
FoldXYZ rather than CreateXYZ APIs.
This is probably NFC in practice, because the places using
InstSimplifyFolder probably aren't using vector operations.
It makes sense to handle byval promotion in the same way as non-byval
but also allowing `store` instructions. However, these should
use the same checks as the `load` instructions do, i.e. be part of the
`ArgsToPromote` collection. For these instructions, the check for
interfering modifications can be disabled, though. The promotion
algorithm itself has been modified a lot: all the accesses (i.e. loads
and stores) are rewritten to the emitted `alloca` instructions. To
optimize these new `alloca`s out, the `PromoteMemToReg` function from
`Transforms/Utils/PromoteMemoryToRegister.cpp` file is invoked after
promotion.
In order to let the `PromoteMemToReg` promote as many `alloca`s as it
is possible, there should be no `GEP`s from the `alloca`s. To
eliminate the `GEP`s, its own `alloca` is generated for every argument
part because a single `alloca` for the whole argument (that
significantly simplifies the code of the pass though) unfortunately
cannot be used.
The idea comes from the following discussion:
https://reviews.llvm.org/D124514#3479676
Differential Revision: https://reviews.llvm.org/D125485
This attribute is similar to DenseElementsAttr but does not support
splat. As such it has a much simpler API and does not need any smart
iterator: it exposes direct ArrayRef access.
A new syntax is introduced so that the generic printing/parsing looks
like:
[:i64 1, -2, 3]
This attribute beings like an ArrayAttr but has a `:` token after the
opening square brace to introduce the element type (supported are I8,
I16, I32, I64, F32, F64) and the comma separated list for the data.
This is particularly convenient for attributes intended to be small,
like those referring to shapes.
For example a `transpose` operation with a `dims` attribute could be
defined as such:
let arguments = (ins AnyTensor:$input, DenseI64ArrayAttr:$dims);
let assemblyFormat = "$input `dims` `=` $dims attr-dict : type($input)";
And printed this way (the element type is elided in this case):
transpose %input dims = [0, 2, 1] : tensor<2x3x4xf32>
The C++ API for dims would just directly return an ArrayRef<int64>
RFC: https://discourse.llvm.org/t/rfc-introduce-a-new-dense-array-attribute/63279
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D123774
opportunities
There are straight forward splat load opportunities blocked by
getNormalLoadInput(), since those cases involve consecutive bitcasts.
Improve by looking through bitcasts.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D128703