This patch adds the `rpc_host_call` function as a GPU extension. It is
exported from the `libc` project and uses the RPC interface to call a
function pointer on the host, copying the arguments by value. The
interface supports only a single void pointer argument, much like
pthreads. The function call here is the bare-bones version of what's
required for OpenMP reverse offloading. Full support will require
interfacing with the mapping table, nowait support, etc.
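The by-value, pthread-style contract can be illustrated with a host-only C sketch. The request struct, `serve_host_call`, `add_args`, and `add_on_host` are hypothetical names for illustration. They do not reflect the actual `libc` RPC protocol or the real signature of `rpc_host_call`, and no GPU is involved.
```c
#include <stdlib.h>
#include <string.h>
/* Hypothetical request layout, for illustration only; this is not the
   wire format used by the libc RPC interface. */
typedef struct {
  void (*fn)(void *); /* pthread-style callee: one void * argument */
  const void *args;   /* argument bytes as seen by the caller */
  size_t size;        /* number of argument bytes to copy */
} host_call_request;
/* Hypothetical host-side handler: copy the argument bytes by value into
   host memory, then invoke the function pointer on that copy. */
int serve_host_call(const host_call_request *req) {
  void *copy = malloc(req->size ? req->size : 1);
  if (copy == NULL)
    return -1;
  if (req->size)
    memcpy(copy, req->args, req->size);
  req->fn(copy); /* the callee only ever sees the host copy */
  free(copy);
  return 0;
}
/* Example callee: sums two ints and records the result globally. */
typedef struct { int a, b; } add_args;
int last_sum;
void add_on_host(void *p) {
  add_args *args = p;
  last_sum = args->a + args->b;
  args->a = -1; /* modifies the copy, not the caller's struct */
}
```
Because only the copied bytes reach the callee, any pointers inside the argument struct would still refer to device memory. This is one reason full reverse offloading needs the mapping table.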
I decided to test this interface in `libomptarget` because that will be
the primary consumer, and a test in `libc` would be more difficult: its
testing infrastructure has no real concept of the "host", since tests
run directly on the GPU as if it were a CPU target.
Reviewed By: jplehr
Differential Revision: https://reviews.llvm.org/D155003
If we store a constant in an ICV, it is easier for the optimizer to
propagate it. Since we often use the full block for the thread limit and
the parallel team size, we can replace that dynamic value with a
constant that otherwise cannot occur, here 0.
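The sentinel idea can be sketched in C under the assumption of a single thread-limit ICV. The `icv_state` struct and the helper functions are hypothetical and are not the DeviceRTL's actual types. Writers store the constant 0, and readers resolve it against the dynamic block size.
```c
#include <stdint.h>
/* Hypothetical ICV state: 0 is stored instead of the block size, because
   a thread limit of 0 can never be a real value. */
typedef struct {
  uint32_t thread_limit; /* 0 means "use the full block" */
} icv_state;
/* Storing a constant lets the optimizer propagate it to later loads. */
void icv_set_full_block(icv_state *icv) { icv->thread_limit = 0; }
void icv_set_thread_limit(icv_state *icv, uint32_t n) {
  icv->thread_limit = n;
}
/* Readers translate the sentinel back into the dynamic block size. */
uint32_t icv_get_thread_limit(const icv_state *icv, uint32_t block_size) {
  return icv->thread_limit == 0 ? block_size : icv->thread_limit;
}
```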
We ended up with `llvm.assume(icmp ne ptr as(4) null, as(4) @str)`
because the string in address space 4 was not known to be non-null.
There is no need to create these assumes.
This is the only place that defines this prefix in a header file, so it
was overriding and redefining the prefix for other users. If we must use
it in a header file, we should at least respect its previous value.
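One way for a header to respect a caller's existing definition is to save and restore it with `#pragma push_macro`/`pop_macro`, which GCC, Clang, and MSVC support. The sketch below assumes the prefix macro is named `DEBUG_PREFIX`; that name, `header_prefix`, and `user_prefix` are illustrative, and the patch itself may restore the value differently.
```c
/* A user of the prefix defines it before including the header. */
#define DEBUG_PREFIX "user"
/* Begin: what the header would contain. */
#pragma push_macro("DEBUG_PREFIX") /* save the caller's definition */
#undef DEBUG_PREFIX
#define DEBUG_PREFIX "header"
static const char *header_prefix(void) { return DEBUG_PREFIX; }
#pragma pop_macro("DEBUG_PREFIX") /* restore it on the way out */
/* End of header contents. */
/* Code after the include sees the caller's original value again. */
const char *user_prefix(void) { return DEBUG_PREFIX; }
```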
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D155316
This points users to the `libc` documentation and explains the basics of
how it's used inside the runtime.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D155318
The 'RPCHandleTy' was meant to express that a specific device owns its
slot in the RPC server. However, this required creating a temporary
store to hold these pointers, which caused very strange spurious
failures due to undefined behaviour in the order of library teardown.
For example, the x64 plugin would be torn down and set this to invalid
memory, and then the CUDA plugin would crash. Rather than spend the time
to fully diagnose this problem, I found it prudent to simply remove the
failure mode.
This patch removes this indirection, so the RPC server must now always
be used with the intended device. This just requires some extra handling
for the AMDGPU indirection, where we need to store a reference to the
device.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154971
Add the CHECK_OPENMP_ENV environment variable, whose contents are added
to the environment of tests run by the check-* targets. This provides a
handy way to exercise OpenMP code with different settings during
development.
For example, to change default barrier pattern:
```
$ env CHECK_OPENMP_ENV="KMP_FORKJOIN_BARRIER_PATTERN=hier,hier \
KMP_PLAIN_BARRIER_PATTERN=hier,hier \
KMP_REDUCTION_BARRIER_PATTERN=hier,hier" \
ninja check-openmp
```
Each test can still set its own environment variables if needed, as
before.
Also, this commit adds the missing documentation on how to run tests to
the README.
Patch provided by t-msn
Differential Revision: https://reviews.llvm.org/D122645
In preparation for moving the `#include "llvm/ADT/StringExtras.h"` from
the header `llvm/Support/Error.h` to its source file, first add all the
missing includes that were previously included transitively through
this header.
This fixes all files missed in b0abd4893f and 39d8e6e22c.
Differential Revision: https://reviews.llvm.org/D154763
OpenMP 5.1 added OMP_TOOL_VERBOSE_INIT. This environment variable is
extremely helpful for understanding why loading a tool fails
unexpectedly (e.g., dlopen errors when the libc available at runtime is
older than the libc the tool was compiled against because the right gcc
module was not loaded).
This patch replicates the verbose init code from libomp, but reads a
different environment variable. Similar to CLIENT_TOOL_LIBRARIES_VAR, a
tool can set the name of this variable by defining
CLIENT_TOOL_VERBOSE_INIT_VAR before including ompt-multiplex.h.
Alternatively, a tool can define OMPT_MULTIPLEX_TOOL_NAME to specify the
tool name, which is used as the prefix for both the _TOOL_LIBRARIES and
_VERBOSE_INIT variables.
Finally, if neither macro is defined, the header prints a compiler
warning and reads OMP_TOOL_VERBOSE_INIT.
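The macro fallback chain described above can be sketched in C. This is an approximation of the logic, not the text of ompt-multiplex.h. The tool name "MYTOOL" and the helpers `verbose_init_var` and `uses_var` are hypothetical. The key mechanism is string-literal concatenation of the tool-name prefix with "_VERBOSE_INIT".
```c
#include <string.h>
/* Tool side (hypothetical name), defined before including the header. */
#define OMPT_MULTIPLEX_TOOL_NAME "MYTOOL"
/* Header side, sketched: an explicit override wins, then the tool-name
   prefix, and finally the standard variable with a compiler warning. */
#ifndef CLIENT_TOOL_VERBOSE_INIT_VAR
#ifdef OMPT_MULTIPLEX_TOOL_NAME
#define CLIENT_TOOL_VERBOSE_INIT_VAR OMPT_MULTIPLEX_TOOL_NAME "_VERBOSE_INIT"
#else
#warning "No tool name given; falling back to OMP_TOOL_VERBOSE_INIT"
#define CLIENT_TOOL_VERBOSE_INIT_VAR "OMP_TOOL_VERBOSE_INIT"
#endif
#endif
/* The environment variable name the header would consult. */
const char *verbose_init_var(void) { return CLIENT_TOOL_VERBOSE_INIT_VAR; }
/* Convenience check used to confirm which name was selected. */
int uses_var(const char *name) {
  return strcmp(CLIENT_TOOL_VERBOSE_INIT_VAR, name) == 0;
}
```
With the definitions above, the header would read `MYTOOL_VERBOSE_INIT`. Removing the `OMPT_MULTIPLEX_TOOL_NAME` line would trigger the warning and select `OMP_TOOL_VERBOSE_INIT`.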
Patch prepared by Semih Burak
Differential Revision: https://reviews.llvm.org/D112809
This patch adds initial support for running an RPC server in
libomptarget to handle host services. We interface with the library
provided by the `libc` project to stand up a basic server. We introduce
a new type that is controlled by the plugin and has each device
initialize its interface. We then run a basic server to check the RPC
buffer.
This patch does not fully implement the interface. In the future, each
plugin will want to define special handlers via the interface to support
things like malloc or H2D copies coming from RPC. We will also want to
allow the plugin to specify the number of ports. This is currently
capped in the implementation but will be adjusted soon.
Right now, the server is run by whichever thread ends up doing the
waiting. This is probably not a completely sound solution, but I am not
overly familiar with the behaviour of OpenMP tasks and what would be
required here. This works okay with synchronous regions, and somewhat
with `nowait` regions, but I've observed some weird behavior when one of
those regions calls `exit`.
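The "whichever thread waits also runs the server" approach can be sketched as a polling loop. The hook struct, `wait_and_serve`, and the toy device are hypothetical stand-ins for the plugin's queries, not libomptarget's API. The sketch shows why a kernel blocked on a host service can still make progress: each poll of the completion state also drains pending RPC work.
```c
#include <stdbool.h>
/* Hypothetical hooks standing in for the plugin's queries. */
typedef struct {
  bool (*kernel_done)(void *ctx);   /* has the device work finished? */
  int (*run_rpc_server)(void *ctx); /* service pending RPC ports once */
  void *ctx;
} wait_hooks;
/* The waiting thread alternates between checking for completion and
   servicing RPC requests the kernel may be blocked on. */
int wait_and_serve(const wait_hooks *h) {
  while (!h->kernel_done(h->ctx)) {
    int rc = h->run_rpc_server(h->ctx);
    if (rc != 0)
      return rc; /* propagate server errors to the caller */
  }
  return 0;
}
/* Toy device: it only finishes after its pending requests are served. */
typedef struct { int pending; int served; } toy_device;
bool toy_done(void *ctx) { return ((toy_device *)ctx)->pending == 0; }
int toy_serve(void *ctx) {
  toy_device *d = ctx;
  if (d->pending > 0) {
    d->pending--;
    d->served++;
  }
  return 0;
}
```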
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D154312
The OpenMP specification states that omp_test_lock and
omp_test_nest_lock dispatch OMPT callbacks with ompt_mutex_test_lock
and ompt_mutex_test_nest_lock, respectively, as their kind. Previously,
the values ompt_mutex_lock and ompt_mutex_nest_lock were used, which
could cause issues in applications relying on the kind to correctly
determine lock states. This commit changes the kind to the expected
values.
Also update callback.h and OMPT tests to reflect this change.
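Here is why the kind matters to a tool. Only the test kinds tell it that an acquire attempt will not block, and that the attempt may therefore fail without the lock ever being held. The enum below is a local stand-in that mirrors the relevant ompt_mutex_t names. Its enumerator values are illustrative rather than those from omp-tools.h, and the helper functions are hypothetical.
```c
#include <stdbool.h>
/* Local stand-in for the relevant ompt_mutex_t kinds; values are
   illustrative, not the ones defined in omp-tools.h. */
typedef enum {
  KIND_LOCK,          /* ompt_mutex_lock */
  KIND_TEST_LOCK,     /* ompt_mutex_test_lock */
  KIND_NEST_LOCK,     /* ompt_mutex_nest_lock */
  KIND_TEST_NEST_LOCK /* ompt_mutex_test_nest_lock */
} lock_kind;
/* A tool tracking lock state can only treat an acquire as potentially
   blocking if the runtime reports the non-test kinds for omp_set_*. */
bool acquire_may_block(lock_kind kind) {
  return kind == KIND_LOCK || kind == KIND_NEST_LOCK;
}
bool is_nestable(lock_kind kind) {
  return kind == KIND_NEST_LOCK || kind == KIND_TEST_NEST_LOCK;
}
```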
Patch prepared by Thyre
Differential Review: https://reviews.llvm.org/D153028
Differential Review: https://reviews.llvm.org/D153031
Differential Review: https://reviews.llvm.org/D153032
OpenMP 5.1 replaced callback ompt_callback_master_t by
ompt_callback_masked_t. In order to stick to the standard,
the implementation is updated accordingly.
Patch prepared by Semih Burak
Differential Revision: https://reviews.llvm.org/D112798
In the functions ompt_multiplex_get_own_ompt_data
and ompt_multiplex_get_client_ompt_data, not only can
data be NULL, but the void pointer field "ptr" of
"data" can also be NULL, leading to a subsequent
segfault.
This patch adds the corresponding checks.
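The two-level check can be sketched in C. The union mirrors the layout of ompt_data_t, a 64-bit value overlaid with a pointer. The `multiplex_data` struct and both getters are hypothetical simplifications of what the multiplex header stores.
```c
#include <stddef.h>
#include <stdint.h>
/* Local mirror of ompt_data_t: a 64-bit value overlaid with a pointer. */
typedef union { uint64_t value; void *ptr; } tool_data_t;
/* Hypothetical storage holding one slot per multiplexed tool. */
typedef struct { tool_data_t own; tool_data_t client; } multiplex_data;
/* Both the handle itself and its ptr field must be non-NULL before the
   storage behind ptr is dereferenced. */
tool_data_t *get_own_data(tool_data_t *data) {
  if (data == NULL || data->ptr == NULL)
    return NULL;
  return &((multiplex_data *)data->ptr)->own;
}
tool_data_t *get_client_data(tool_data_t *data) {
  if (data == NULL || data->ptr == NULL)
    return NULL;
  return &((multiplex_data *)data->ptr)->client;
}
```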
Patch prepared by Semih Burak
Differential Revision: https://reviews.llvm.org/D112806
The semantics of depend(out:omp_all_memory) are quite similar to taskwait in
that it separates all tasks (with dependency) created before an
all_memory-task from all tasks (with dependency) created after an
all_memory-task.
Only one such task can execute at a time. Similar to taskwait, we
have a CV (AllMemory[1]) in the generating task to express the dependency
sink semantic of an all_memory-task. In addition, AllMemory[0] describes the
dependency source semantic of an all_memory-task. All tasks with dependency
create an HB-arc towards the sink and terminate an HB-arc from the source.
Since we expect that few applications will use such dependencies, the
support for handling the synchronization semantic is off by default and
can be turned on using ARCHER_OPTION="all_memory=1". The most costly part
is the precautionary posting of an HB-arc towards the sink, which represents
a potentially contentious write from all concurrently executing sibling tasks.
A warning is printed at runtime when such a dependency is observed while the
option is off. In most cases, the lazy activation will still lead to false
alerts.
Differential Revision: https://reviews.llvm.org/D111895
omp_all_memory currently has no representation in OMPT.
This patch adds new dependency flags, as suggested by omp-lang issue #3007.
Differential Revision: https://reviews.llvm.org/D111788
The next-gen plugins didn't correctly configure their tests, so those
tests were never actually run. Since the old plugins were deleted, we
have not been running any `libomptarget` tests. This patch fixes the
issue and allows the targets to be built.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154619
It's time to remove the old plugins as the next-gen has already been set
to default in LLVM 16.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D142820
These plugins are unmaintained and are not in a workable state. The VE
plugin has not been touched for years and has never had any running
tests. The remote plugin is in an unfinished state and is not production
ready upstream. These will need to be ported to the new nextgen
interface in the future if they are needed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D154548
AMDGPU has provided a fixed-frequency clock for several generations.
However, the frequency varies by card and must be looked up at
runtime. This patch adds a new device environment entry for the clock
frequency so that we can use it in the same way as NVPTX. This is the
correct implementation, and the version in ASO should be replaced.
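Once the frequency is stored in the device environment, converting counter readings to time is a single division: seconds = (end - start) / frequency. The struct and function names below are hypothetical. `tick_seconds` is merely analogous to what omp_get_wtick reports, not its implementation.
```c
#include <stdint.h>
/* Hypothetical device-environment record; the host looks up the
   fixed-clock frequency at runtime and stores it here. */
typedef struct {
  uint64_t clock_frequency; /* ticks per second */
} device_environment;
/* Elapsed seconds between two readings of the fixed-frequency counter.
   Unsigned subtraction also handles a single counter wrap-around. */
double ticks_to_seconds(const device_environment *env, uint64_t start,
                        uint64_t end) {
  return (double)(end - start) / (double)env->clock_frequency;
}
/* Duration of one tick, i.e. the timer resolution in seconds. */
double tick_seconds(const device_environment *env) {
  return 1.0 / (double)env->clock_frequency;
}
```
For example, with a 100 MHz clock, 250,000,000 ticks correspond to 2.5 seconds.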
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D154456
Summary:
This code used `LIBOMPTARGET_DEBUG`, which is the name of the
environment variable rather than the macro. This caused this portion to
always be disabled. In the long run, we should aim for this to always be
available, as it's useful for other diagnostic messages.
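The bug is easy to see when the compile-time switch and the runtime switch are separated. The sketch assumes the compile-time macro is named `OMPTARGET_DEBUG`. The runtime switch is the `LIBOMPTARGET_DEBUG` environment variable. `debug_level` is a hypothetical helper. Guarding with `#ifdef LIBOMPTARGET_DEBUG` instead would compile the code out unconditionally, because nothing defines a macro with the environment variable's name.
```c
#include <stdlib.h>
/* Compile-time switch (assumed macro name: OMPTARGET_DEBUG). */
#ifdef OMPTARGET_DEBUG
#define DEBUG_SUPPORT_COMPILED 1
#else
#define DEBUG_SUPPORT_COMPILED 0
#endif
/* Runtime switch: read the LIBOMPTARGET_DEBUG environment variable,
   but only if debug support was compiled in. */
int debug_level(void) {
  if (!DEBUG_SUPPORT_COMPILED)
    return 0;
  const char *env = getenv("LIBOMPTARGET_DEBUG");
  return env ? atoi(env) : 0;
}
```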
In the case of partially mapped structs, libomptarget sometimes adds
padding to device allocations to ensure they are aligned properly.
However, without this patch, it considers that padding to be mapped to
the host, which can cause presence checks (e.g.,
`omp_target_is_present` or a `present` modifier) to misbehave for
unmapped parts of the struct. This patch keeps the padding but treats
it as unmapped. See the new test case for examples.
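One way to keep the padding in the allocation while excluding it from presence checks is to track the padded allocation start separately from the mapped range. The `map_entry` layout and helpers are hypothetical and far simpler than libomptarget's mapping table. Presence is answered from the mapped range only, and offset computations still use the padded start.
```c
#include <stdbool.h>
#include <stdint.h>
/* Hypothetical mapping-table entry. The device allocation may start
   before the mapped host range because of alignment padding. */
typedef struct {
  uintptr_t alloc_begin;  /* host address the padded allocation covers */
  uintptr_t mapped_begin; /* first byte the user actually mapped */
  uintptr_t mapped_end;   /* one past the last mapped byte */
} map_entry;
/* Presence checks consult only the mapped range, so padding bytes in
   [alloc_begin, mapped_begin) are reported as not present. */
bool is_present(const map_entry *e, uintptr_t begin, uintptr_t size) {
  return begin >= e->mapped_begin && begin + size <= e->mapped_end;
}
/* Address translation can still use the padded allocation start. */
uintptr_t device_offset(const map_entry *e, uintptr_t host_addr) {
  return host_addr - e->alloc_begin;
}
```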
Reviewed By: grokos, jdoerfert
Differential Revision: https://reviews.llvm.org/D149685
With https://reviews.llvm.org/D137524, memory scope and ordering
attributes are being used to generate the required instructions for
atomic inc/dec on AMDGPU. This patch adds the memory scope attribute to
the atomic::inc API and uses the device scope in reduction. Without
the device scope in atomic_inc, the default system scope leads to
unnecessary L2 write-backs/invalidates.
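The wrapping increment semantics of a GPU atomic inc can be shown portably with a C11 compare-and-swap loop: new = (old >= limit) ? 0 : old + 1, returning the old value. `atomic_inc_wrap` is a hypothetical name. This is a semantic sketch, not the DeviceRTL code. C11 atomics have no notion of memory scope, so the device-versus-system scope distinction that this patch is about cannot be expressed here.
```c
#include <stdatomic.h>
/* Wrapping atomic increment: stores (old >= limit) ? 0 : old + 1 and
   returns the previous value. Scope (device vs. system) is not
   expressible with C11 atomics. */
unsigned atomic_inc_wrap(_Atomic unsigned *p, unsigned limit) {
  unsigned old = atomic_load_explicit(p, memory_order_relaxed);
  unsigned desired;
  do {
    desired = old >= limit ? 0u : old + 1u;
  } while (!atomic_compare_exchange_weak_explicit(
      p, &old, desired, memory_order_acq_rel, memory_order_relaxed));
  return old;
}
```
With `limit = 2`, successive calls return 0, 1, 2 and the counter wraps back to 0, which is how a reduction can count arriving teams and reset the counter for the next use.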
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D154172
A previous patch by @arsenm adjusted these to find the `amdgpu-arch`
tool correctly if we do a `LLVM_ENABLE_PROJECTS` build. This patch
applies the same to `nvptx-arch` tool to keep it consistent.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D154107
Flang currently supports offloading for AMD GPUs. This patch establishes a test structure for Fortran offloading tests in libomptarget.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D148778
SubtargetFeature.h is currently part of MC while it doesn't depend on
anything in MC. Since some LLVM components might have the need to work
with target features without necessarily needing MC, it might be
worthwhile to move SubtargetFeature.h to a different location. This will
reduce the dependencies of said components.
Note that I chose TargetParser as the destination because that's where
Triple lives and SubtargetFeatures feels related to that.
This issue came up during a JITLink review (D149522). JITLink would
like to avoid a dependency on MC while still needing to store target
features.
Reviewed By: MaskRay, arsenm
Differential Revision: https://reviews.llvm.org/D150549