This reverts commit 9b9d08111b.
(Accepted by Jon https://reviews.llvm.org/D118493#4178250)
libc++, libc++abi, libunwind, and compiler-rt don't add the extra DT_RUNPATH,
it's strange for OpenMP to diverge.
Some build systems want to handle DT_RUNPATH themselves (e.g.
CMAKE_INSTALL_RPATH). Some distributions (e.g. Fedora) have policies against
DT_RUNPATH and the default DT_RUNPATH for OpenMP is causing trouble.
For users who don't want to specify rpath by themselves,
https://clang.llvm.org/docs/UsersManual.html#configuration-files
can be used to specify the default rpath, e.g.
specify -frtlib-add-rpath or -Wl,-rpath in bin/clang.cfg
The support for enabling and disabling certain architectures for the
OpenMP device RTL is different between AMD and Nvidia. This patch
updates the logic to make it common. This supports the `auto` format
more generally via the `nvptx-arch` and `amdgpu-arch` options. (These
are not availible at CMake time without a runtimes build, or another
install somewhere. But that only prevents users from using auto).
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D145513
Some build bots have not been updated to the new minimal CMake version.
Reverting for now and ping the buildbot owners.
This reverts commit 44c6b905f8.
This partly undoes D137724.
This change has been discussed on discourse
https://discourse.llvm.org/t/rfc-upgrading-llvms-minimum-required-cmake-version/66193
Note this does not remove work-arounds for older CMake versions, that
will be done in followup patches.
Reviewed By: mehdi_amini, MaskRay, ChuanqiXu, to268, thieta, tschuett, phosek, #libunwind, #libc_vendors, #libc, #libc_abi, sivachandra, philnik, zibi
Differential Revision: https://reviews.llvm.org/D144509
Only generate the second def file when necessary (native Windows import
library builds).
Properly clean up .def file artifacts.
Reduce the re-generated import library build artifacts to the minimum.
Refactor the import library related portions of the script for clarity.
Tested with MSVC and MinWG/gcc12.0
Differential Revision:https://reviews.llvm.org/D144419
Summary:
Tihs patch is mostly NFC to fix some warning currently present in OpenMP
offloading plugins. Specifically this mostly removes the use of Twine
variables in favor of LLVM's small string. Twine variables are prone to
use-after-free and this is a cleaner way to concatenate a string.
The next-gen plugins did not properly set the values from
`OMP_NUM_TEAMS` and `OMP_TEAMS_THREAD_LIMIT`. This is because these
maximum values are set by each plugin to its hardware maximum. This
happens *after* the previous initialization. Move it to the correct
place and then add a test.
Fixes https://github.com/llvm/llvm-project/issues/61082
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D145105
Makes the info that is printed for kernel launches configurable for
different plugins. Adds all machinery to print the detailed launch
info that the current AMD plugin provides and includes e.g. register
spill counts.
The files msgpack.cpp, msgpack.def, and msgpack.h are copied from the old plugin
and are untouched. The contents of UtilitiesHSA.cpp and .h are copied together from
various files from the old plugin. The code was originally written by
Jon Chesterfield. I updated the function and type names visible to the outside, i.e.
in headers, to respect the LLVM conventions.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D144521
The old code didn't actually align the values, and it added padding even
when none was necessary. This approach will pad entries if necessary
and, similar to the struct case, use the host pointer as guidance.
NOTE: This does still not align them as the host has, but it's unclear
if the user really should use the alignment bits anyway. For now
this is a reasonable compromise, only if we have host alignment
information (explicitly not implicitly via the host pointer), we
could do it completely right without wasting lots of resources for
>99% of the cases.
Fixes: https://github.com/llvm/llvm-project/issues/61034
ATOMIC_VAR_INIT has a trivial definition
`#define ATOMIC_VAR_INIT(value) (value)`,
is deprecated in C17/C++20, and will be removed in newer standards in
newer GCC/Clang (e.g. https://reviews.llvm.org/D144196).
Summary:
Ever since the change to the new plugins the information messages are
common between the major plugins. This allows us to test the info.c file
generically.
Summary:
Internally we need to know the feature that was used to build the CUDA.
This used to be added when the deviceRTL was build via the OpenMP
interface, but ever since it was moved to call the packager explicitly
it was not being added. This causes failured if the user attempts to use
the library without LTO enabled.
This fix runtime problem due to generate this[:1] map info for non member
variable.
To fix this check VD, if VD is not null, it is not member from current
or base classes.
Differential Revision: https://reviews.llvm.org/D144616
This interface function does not actually need the device image type.
It's unused in the function, so it should be able to be safely removed.
The motivation for this is to facilitate downsteam porting of the
amd-stg-open RPC module into the nextgen plugin so we can delete the old
plugin entirely. For that to work we need to be able to call this
function at kernel-launch time, which doesn't have the image. Also it's
cleaner.
Reviewed By: jplehr
Differential Revision: https://reviews.llvm.org/D144436
This patch should enable the "Host" allocation using fine-grained
memory. As far as I understand, this is HSA managed memory that is
availible to the host, but can be accessed by the device as well.
The original patch that introduced these extensions just stipulated that
it's "non-migratable" memory, which is most likely true because it's
managed by the host but accessible by the device. This should work
sufficiently well for what we expect the "host" allocation to do.
Depends on D143771
Reviewed By: kevinsala
Differential Revision: https://reviews.llvm.org/D143775
Currently, the AMDGPU plugin did not support the `TARGET_ALLOC_SHARED`
allocation kind. We used the fine-grained memory allocator for the
"host" alloc when this is most likely not what is intended. Fine-grained
memory can be accessed by all agents, so it should be considered shared.
This patch removes the use of fine-grained memory for the host
allocator. A later patch will add support for this via the
`hsa_amd_memory_lock` method.
Reviewed By: kevinsala
Differential Revision: https://reviews.llvm.org/D143771
~TaskAsyncInfoWrapperTy() calls isDone. With synchronize inside isDone, we need to handle the error return from synchronize in the destructor.
The consumers of TaskAsyncInfoWrapperTy, targetDataMapper and targetKernel, both call AsyncInfo.synchronize() before exiting.
For this reason in ~TaskAsyncInfoWrapperTy(), calling synchronize() via isDone() is redundant.
This patch removes synchronize() call inside isDone() and makes it a lightweight check.
__tgt_target_nowait_query needs to call synchronize() before checking isDone().
Differential Revision: https://reviews.llvm.org/D144315
Summary:
Currently when we synchronize the asynchronous queue for the plugins, we
ignore the return value. This is problematic because we will continue on
like nothing happened if the kernel fails.
Fixes https://github.com/llvm/llvm-project/issues/60814
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D144191
06d9bf5e64 (https://reviews.llvm.org/D143431)
did a large restructuring of how the import library is created;
previously, a second step to tweak the import library was only
done for MSVC style targets, but after this commit, that logic
was applied for mingw targets too.
Since LIBOMP_GENERATED_IMP_LIB_FILENAME and LIBOMP_IMP_LIB_FILE
are equal on mingw targets (both are "libomp.dll.a", while they
are "libomp.dll.lib" and "libomp.lib" for MSVC targets), this caused
a conflict, with errors like this:
ninja: error: build.ninja:875: multiple rules generate runtime/src/libomp.dll.a [-w dupbuild=err]
Skip the logic with a second step to recreate the import library
for mingw targets. The MSVC specific logic for this relies on
running the static archiver with CMAKE_LINK_DEF_FILE_FLAG, which
with MS lib.exe (and llvm-lib) ignore the input object files and
just generates an import library - but mingw style tools don't
support this mode of operation. (By attemptinig the same, mingw tools
would generate a static library with the def file as one member.)
With mingw tools, the same can be achieved by invoking the dlltool
executable instead.
Instead of adding alternative logic for invoking dlltool, just skip
the second import library step, since neither GNU nor LLVM mingw
tools actually generate import libraries that link by ordinal - so
there's no need for a second import library.
Differential Revision: https://reviews.llvm.org/D143992
Need to assign the calculated lower bound back to temp variable,
otherwise incorrect value (upper bound instead of lower bound) might be
used.
Differential Revision: https://reviews.llvm.org/D144015
Current runtime implementation only checks for target allocator when libmemkind is
not available. This patch adds checks for target allocator regardless of the
presence of libmemkind library.
Differential Revision: https://reviews.llvm.org/D142582
than ordinal
This check-in changes the OpenMP build script to generate the Windows
import library that imports by name rather than ordinal to reduce
ordinals order dependency and promote runtime flavors compatibility
going forward. The existing ordinals ordering is preserved to maintain
backward compatibility.
Differential Revision: https://reviews.llvm.org/D143431
The GPU plugins have a dependency on the device libraries. Sometimes we
cannot build the device libraries because the user does not have a valid
`clang` to use or it was explicitly disabled. Currently this leads to a
transitive failure because we cannot meet this dependency. This patch
simply removes that dependency.
Fixes https://github.com/llvm/llvm-project/issues/60457
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D143196
With the NPM, we're now defaulting to preserving LCSSA, so a couple
of tests have changed slightly.
Differential Revision: https://reviews.llvm.org/D140982
The th_task_state was initialized from the master thread's value, or
from its memo stack, but this causes problems because neither of those
may have the right value at the right time. However, other threads in
the team are guaranteed to have the right values, so we change the
initialize the new threads' th_task_state from the th_task_state of
the last of the older threads in the hot team.
Differential Revision: https://reviews.llvm.org/D142247Fix#56307.
The memory sanitizer intercepts the memcpy() call but not the direct
assignment of last byte to 0. This leads the sanitizer to believe the
last byte of a string based on the kmp_str_buf_t type is uninitialized.
Hence, the eventual strlen() inside __kmp_env_dump() leads to an
use-of-uninitialized-value warning.
Using strncat() instead gives the sanitizer the information it needs.
Differential Revision: https://reviews.llvm.org/D143401Fixes#60501
The memory sanitizer intercepts the memcpy() call but not the direct
assignment of last byte to 0. This leads the sanitizer to believe the
last byte of a string based on the kmp_str_buf_t type is uninitialized.
Hence, the eventual strlen() inside __kmp_env_dump() leads to an
use-of-uninitialized-value warning.
Using strncat() instead gives the sanitizer the information it needs.
Differential Revision: https://reviews.llvm.org/D143401Fixes#60501
The NextGen plugins use the information regarding new mapping/unmappings to
lock/unlock the corresponding host buffer and speed up the host-device memory
transfers involving those buffers. The locking/unlocking is disabled by default
and can be enabled by the LIBOMPTARGET_LOCK_MAPPED_HOST_BUFFERS envar. The
envar accepts boolean values (on/off) and a special option:
- off: Do not lock mapped host buffers (default).
- on: Lock mapped host buffers automatically, but do not report lock
failures if the plugin fails to lock them.
- mandatory: Lock mapped host buffers automatically and treat locking failures
in the plugins as fatal errors. This option may be useful for
debugging purposes.
Differential Revision: https://reviews.llvm.org/D142514
Summary:
The previous patch also needed to apply this to the other AMDGPU plugin,
this will be removed soon but it should be correct while it's here at
least.
Previously, on non-Linux, amdgpu would get enabled whatever the CPU architecture.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D143017
While we potentially need to align partially mapped structs more than
the first member, we do not need to align past the struct itself. This
prevents us from moving the base pointer past the struct beginning too.
See https://reviews.llvm.org/D142508 for a discussion.
Reviewed By: pavelkopyl, grokos, jhuber6
Differential Revision: https://reviews.llvm.org/D142586
`check_loc` is not used if ITT is disabled or debug is off, causing a
compiler warning.
Reviewed By: jlpeyton
Differential Revision: https://reviews.llvm.org/D143004
Summary:
We added a new agent information enum in a previous commit. This was not
added to the dynamic HSA implementation so it failed to compile without
a local HSA install to use.
The next-gen plugin properly prints errors. This patch improves the
error messages by including the Node-ID of the GPU that failed as well
as a textual representation of the enumeration values.
Reviewed By: kevinsala
Differential Revision: https://reviews.llvm.org/D143192