This patch adds entry point in the runtime to be able to allocate
descriptors in managed memory. These entry points currently only call
`CUFAllocManaged` and `CUFFreeManaged` but could be more complicated in
the future.
`cuf.alloc` and `cuf.free` related to local descriptors are converted
into runtime calls.
@walter-erquinigo found the the [PR with testing and a fix for
DebugInfoD](https://github.com/llvm/llvm-project/pull/98344) caused an
issue when working with stripped binaries.
The issue is that when you're working with split-dwarf, there are *3*
possible files: The stripped binary the user is debugging, the
"only-keep-debug" *or* unstripped binary, plus the `.dwp` file. The
debuginfod plugin should provide the unstripped/OKD binary. However, if
the debuginfod plugin fails, the default symbol locator plugin will just
return the stripped binary, which doesn't help. So, to address that, the
SymbolVendorELF code checks to see if the SymbolLocator's
ExecutableObjectFile request returned the same file, and bails if that's
the case. You can see the specific diff as the second commit in the PR.
I'm investigating adding a test: I can't quite get a simple repro, and
I'm unwilling to make any additional changes to Makefile.rules to this
diff, for Pavlovian reasons.
Currently, the only way to see the passes that were registered is by
calling “mlir-opt --help”. However, for compilers with 500+ passes, the
help message becomes too long and sometimes hard to understand. In this
PR I add a new "--list-passes" option to mlir-opt, which can be used for
printing only the registered passes, a feature that would be extremely
useful.
This adds addressof at the required places in [input.output]. Some of
the new tests failed since string used operator& internally. These have
been fixed too.
Note the new fstream tests perform output to a basic_string instead of a
double. Using a double requires num_get specialization
num_get<CharT, istreambuf_iterator<CharT,
char_traits_operator_hijacker<CharT>>
This facet is not present in the locale database so the conversion would
fail due to a missing locale facet. Using basic_string avoids using the
locale.
As a drive-by fixes several bugs in the ofstream.cons tests. These
tested ifstream instead of ofstream with an open mode.
Implements:
- LWG3130 [input.output] needs many addressof
Closes#100246.
replaceIncomingBlockWith and removeIncomingValueIf are both
straightforward and done.
I'll defer copyIncomingBlocks until a couple of other changes that also
handle blocks go in.
Changes the loop range to match similar tests and avoids zero
iterations. The original motivation to reduce the number of iterations
was to allow the test to be executed during constant evaluation.
Fixes: https://github.com/llvm/llvm-project/issues/100502
expandVectorPredication may change code, even if the intrinsic itself
remains in the code. Report changes whenever such an intrinsic is
encountered, because code could have been changed.
Another follow-up fix for #101652 to fix expensive-checks-only failure.
- added all variations of ffma and fdiv
- will add all new headers into yaml for next patch
- only fsub is left then all basic operations for float is complete
---------
Co-authored-by: OverMighty <its.overmighty@gmail.com>
These would always be uniform anyway, but it shouldn't hurt to
mark them as always uniform. This will help use TTI::isAlwaysUniform
in place of proper uniformity analysis in trivial situations.
…heckCountedByAttrOnField'
llvm::SmallVectorImpl<TypeCoupledDeclRefInfo> &Decls is a vector of
declarations referred to by the argument of 'counted_by' attributes and
fields. 'BuildCountAttributedArrayOrPointerType' had been made
self-contained to produce the 'Decls' within itself to allow
'TreeTransform' to invoke the function without having to call
'Sema::CheckCountedByAttrOnField' again. Thus, 'Decls' produced by
`Sema::CheckCountedByAttrOnField` is never used.
A class member named by an expression in a member function that may instantiate to a static _or_ non-static member is represented by a `UnresolvedLookupExpr` in order to defer the implicit transformation to a class member access expression until instantiation. Since `ASTContext::getDecltypeType` only creates a `DecltypeType` that has a `DependentDecltypeType` as its canonical type when the operand is instantiation dependent, and since we do not transform types unless they are instantiation dependent, we need to mark the `UnresolvedLookupExpr` as instantiation dependent in order to correctly build a `DecltypeType` using the expression as its operand with a `DependentDecltypeType` canonical type. Fixes#99873.
8b4306ce05 introduced validity checks for every module flag access,
because the auto-upgrader uses named metadata before verifying the
module.
This causes overhead for all other accesses, and the check is, in fact,
only need at that single place. Change the upgrader to be careful when
accessing module flags before the module is verified and remove the
checks on all other occasions.
There are two tangential optimizations included: first, when querying a
specific flag, don't enumerate all other flags into a vector as well.
Second, don't use a Twine for getNamedMetadata(), which has
materialization overhead -- all call sites use simple strings that can
be implicitly converted to a StringRef.
This PR changes several places in the CMake scripts to make libc build
on Windows. It adds the `errno` entrypoint to the Windows target.
A mistake in the overlay build doc is also fixed.
Tests still cannot be built on Windows because of the lack of osutils.
The selection of the most constrained candidate for member function
explicit specializations introduced in #88963 does not check whether the
selected candidate is more constrained than all other candidates, which
can result in ambiguities being undiagnosed. This patch addresses the
issue.
In 2acf77f987 code was added to use the
'full' name including syntax and scope.
Instead of building up a large string for each name, add syntax and
scope checks to the value expression in tablegen.
There is already code to generate expressions for target specific
attributes. This change refactors and adds to that code to include
syntax and scope checks.
The tablegen avoids generating the complicated expression unless there
are two attributes using the same name, otherwise the case values will
be as simple as before.
Removes the currently unused attributeHasStrictIdentifierArgAtIndex
function and the related tablegen.
Treat 8th bit of version value for llvm_linux platform as signed GOT
flag.
- clang: define `PointerAuthELFGOT` LangOption and set 8th bit of
`aarch64-elf-pauthabi-version` LLVM module flag correspondingly;
- llvm-readobj: print `PointerAuthELFGOT` or `!PointerAuthELFGOT` in
version description of llvm_linux platform depending on whether the flag
is set.
Polyhedron/capacita,protein and CPU2000/facerec,wupwise showed up to
60% regression on x86 after #101421. The memcpy loops of the toAt and
fromAt arrays that are run to create the initial work item end up
being encoded as 'rep mov', and they add noticeable overhead
comparing to the total amount of work. 'rep mov' is not the best
choise for small size memcpy (e.g. when the array rank is 1 or 2,
it would be quite slow). Moreover, the rest of the stack related
setup is also noticeable for the simple cases.
I added a shortcut for the simple copy case, and also got rid
of the initial toAt/fromAt copies by allowing the CopyDescriptor
to use the external subscript storages.
This patch enables more entrypoints for riscv. The changes to the test cases are introduced to support rv32 which has long double but doesn't have int128
Currently SLP vectorizer compares cmp instructions by the type id of the
compared operands, which may failed in case of different integer types,
for example, which have same type id, but different sizes. Patch adds
comparison by type sizes to fix this.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/102132
By the OpenMP standard, `num_teams` clause can only accept one
expression (for now). In this patch, we extend it to allow to accept
multiple expressions when it is used with `target teams ompx_bare`
construct. This will allow to launch a multi-dim grid, same as CUDA/HIP.
This patch implements simple landing pad labels ([pr]). When Zicfilp
enabled, this patch inserts `lpad 0` at the beginning of basic blocks
which are possible to be landed by indirect jumps.
This patch also supports option riscv-landing-pad-label to make users
cpable to set nonzero fixed labels. Using nonzero fixed label force
setting t2 before indirect jumps. It's less portable but more strict
than original implementation.
[pr]: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/417
If any pointer operand of the non-cosencutive loads is an instructions
with the user, which is not part of the current graph, and, thus,
requires emission of the extractelement instruction, better to try to
detect if the load sequence can be repsented as strided load and
extractelement instructions for pointers are not required.
Reviewers: preames, RKSimon, topperc
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/101668
These changes aim to support cross-compilation build on Windows host for
Linux target for API tests execution. They're not final: changes will
follow for refactoring and adjustments to make all tests pass.
Chocolatey make is recommended to be used since it is maintained better
than GnuWin32 mentioned here
https://lldb.llvm.org/resources/build.html#windows (latest GnuWin32
release is dated by 2010) and helps to avoid problems with building
tests (for example, GnuWin32 make doesn't support long paths and there
are some other failures with building for Linux with it).
Co-authored-by: Pavel Labath <pavel@labath.sk>