Currently the GPU build requires the `LLVM_LIBC_FULL_BUILD` option to be
set. This patch changes the logic so that it is always enabled when
targeting the GPU. Also, this patch allows `LIBC_GPU_BUILD` and
`LIBC_GPU_ARCHITECTURES` to both enable a GPU build. Now, enabling the
GPU support should only require the following CMake:
```
-DLLVM_ENABLE_RUNTIMES=libc -DLIBC_GPU_ARCHITECTURES=gfx1030
```
Reviewed By: jdoerfert, sivachandra
Differential Revision: https://reviews.llvm.org/D146979
This patch adds the necessary build infrastructure to build and run the
integration tests on NVIDIA GPUs. The NVIDIA `nvlink` linker utility is
what is ultimately used to combine these files into a single executable
image. Unfortunately, their tool does not support static libraries. So
we need to link with every object directly instead. This could be solved
by impelementing a "wrapper" utility around `nvlink` like we used to use
for OpenMP. But for now this should be sufficient.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D146861
A cast is necessary to avoid implicit narrowing warnings
when those are enabled.
Reviewed By: abrachet
Differential Revision: https://reviews.llvm.org/D146886
Move the real LLVM_LIBC_FUNCTION macro definitions to
LLVM_LIBC_FUNCTION_IMPL and make LLVM_LIBC_FUNCTION a wrapper to
expand macros in its arguments. This makes it possible to
compile libc implementation and test files with -Dfunc=othername.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D146863
These files are not used because the generic sqrt and sqrtf
functions already go through internal layers that reach the
machine-specific internal implemenations.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D146865
This patch adds the necessary code to impelement the existing RPC client
/ server interface when targeting NVPTX GPUs. This follows closely to
the implementation in the AMDGPU version. This does not yet enable unit
testing as the `nvlink` linker does not support static libraries. So
that will need to be worked around.
I am ignoring the RPC duplication between the AMDGPU and NVPTX loaders. This
will be changed completely later so there's no point unifying the code at this
stage. The implementation was tested manually with the following file and
compilation flags.
```
namespace __llvm_libc {
void write_to_stderr(const char *msg);
void quick_exit(int);
} // namespace __llvm_libc
using namespace __llvm_libc;
int main(int argc, char **argv, char **envp) {
for (int i = 0; i < argc; ++i) {
write_to_stderr(argv[i]);
write_to_stderr("\n");
}
quick_exit(255);
}
```
```
$ clang++ crt1.o rpc_client.o quick_exit.o io.o main.cpp --target=nvptx64-nvidia-cuda -march=sm_70 -o image
$ ./nvptx_loader image 1 2 3
image
1
2
3
$ echo $?
255
```
Depends on D146681
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D146846
This patch adds a loader utility targeting the CUDA driver API to launch
NVPTX images called `nvptx_loader`. This takes a GPU image on the
command line and launches the `_start` kernel with the appropriate
arguments. The `_start` kernel is provided by the already implemented
`nvptx/start.cpp`. So, an application with a `main` function can be
compiled and run as follows.
```
clang++ --target=nvptx64-nvidia-cuda main.cpp crt1.o -march=sm_70 -o image
./nvptx_loader image args to kernel
```
This implementation is not tested and does not yet support RPC. This
requires further development to work around NVIDIA specific limitations
in atomics and linking.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D146681
The option -fno-omit-frame-pointer was accidentally added to the x86_64
longjmp target. This change not only removes it, but makes it
-fomit-frame-pointer.
Summary:
A recent patch allowed us to emit a callable kernel from freestanding
NVPTX code. This allows us to move away from using the CUDA language.
This has several advantages in that it works around an entire assortment
of errors I was seeing while implementing RPC for Nvidia.
This patch implements setjmp and longjmp in riscv using inline asm. The
following changes were required:
* Omit frame pointer: otherwise gcc won't allow us to use s0
* Use __attribute__((naked)): otherwise both gcc and clang will generate
function prologue and epilogue in both functions. This doesn't happen
in x86_64, so we guard it to only riscv
Furthermore, using __attribute__((naked)) causes two problems: we
can't use `return 0` (both gcc and clang) and the function arguments in
the function body (clang only), so we had to use a0 and a1 directly.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D145584
Summary:
These stored previously used `RELEASE`. This was done originally to
ensure that the stores to the shared memory buffer were flushed prior to
signaling that the other side can begin accessing it. However, this
should be accomplished by the memory fence above the store. This change
is required because NVPTX does not support non-relaxed atomics used on
unified shared memory.
The printf and fprintf implementations use our internal implementation
to improve performance when it's available, but this patch enables using
the public FILE API for overlay mode.
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D146001
Memory fences are not handled by the NVPTX backend. We need to replace
them with a memory barrier intrinsic function. This doesn't include the
ordering, but should perform the necessary functionality, albeit slower.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D146725
Summary:
The startup code needs to include the environment pointer so we add this
to the arguments. Also we need to ensure that the `crt1.o` file is made
with `-fgpu-rdc` set so we can actually use it without undefined
reference errors.
Summary:
This startup code is only intended to be used internally, we shouldn't
export it under a conflicting name. In the future we may package this in
an exportable format.
Somehow having MBState and StructTmType in the definition for wchar was
causing test failures. This should fix those.
Differential Revision: https://reviews.llvm.org/D146476
This patch adds the wchar header, as well as the functions to convert to
and from wide chars. The header also sets up the definitions for wint
and wchar.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D145995
The bazel build is currently overlay mode only, so the FILE functions
are still out of reach for it, but sprintf only uses strings. This adds
targets for sprintf, snprintf, and all the interal printf pieces, as
well as tests.
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D146100
This patch performs the same operation to copy over the `argv` array to
the `envp` array. This allows the GPU tests to use environment
variables.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D146322
Some test code was doing loose conversions caught by compiler
warnings in the Fuchsia build. This included duplicated code
in a few tests that was reconsolidated with the existing header
file copy of the same functions.
The MemoryMatcher abstraction presumes gtest-style matcher support,
which is not available in Fuchsia's zxtest library. It's avoided
in favor of simpler memory-comparing assertions.
Reviewed By: abrachet
Differential Revision: https://reviews.llvm.org/D146343
Summary:
Fixes the lack of a dependency after changing the order of some
includes. Also we weren't running any tests as the GPU was always
disabling them. Fix the logic.
This patch enables integration tests running on the GPU. This uses the
RPC interface implemented in D145913 to compile the necessary
dependencies for the integration test object. We can then use this to
compile the objects for the GPU directly and execute them using the AMD
HSA loader combined with its RPC server. For example, the compiler is
performing the following actions to execute the integration tests.
```
$ clang++ --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -nostdlib -flto -ffreestanding \
crt1.o io.o quick_exit.o test.o rpc_client.o args_test.o -o image
$ ./amdhsa_loader image 1 2 5
args_test.cpp:24: Expected 'my_streq(argv[3], "3")' to be true, but is false
```
This currently only works with a single threaded client implementation
running on AMDGPU. Further work will implement multiple clients for AMD
and the ability to run on NVPTX as well.
Depends on D145913
Reviewed By: sivachandra, JonChesterfield
Differential Revision: https://reviews.llvm.org/D146256
This patch adds initial support for an RPC client / server architecture.
The GPU is unable to perform several system utilities on its own, so in
order to implement features like printing or memory allocation we need
to be able to communicate with the executing process. This is done via a
buffer of "sharable" memory. That is, a buffer with a unified pointer
that both the client and server can use to communicate.
The implementation here is based off of Jon Chesterfields minimal RPC
example in his work. We use an `inbox` and `outbox` to communicate
between if there is an RPC request and to signify when work is done.
We use a fixed-size buffer for the communication channel. This is fixed
size so that we can ensure that there is enough space for all
compute-units on the GPU to issue work to any of the ports. Right now
the implementation is single threaded so there is only a single buffer
that is not shared.
This implementation still has several features missing to be complete.
Such as multi-threaded support and asynchrnonous calls.
Depends on D145912
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D145913
This generalizes handling of the integration tests. We now implicitly
depend on the `libc.startup.${LIBC_TARGET_OS}.crt1` file rather than
passing it in manually. This simplifies the interface.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D146237
Summary:
The changes in D146184 made the integration tests use the inhereted
dependencies from the startup code like a normal target. For the AArch64
target this resulted in the threads depenency not being pulled in
because it was not present in the original code.
All integration tests rely on the startup code to be run. Currently we
manually include a few of these dependencies that are relevant for the
Linux target. This patch changes this to make the integration test's
dependencies include all the dependencies of the startup code. This
simplifies the code and makes it easier to support different targets.
The changes here cause the integration test to be dependent on more
targets than previously necessary, but it should be fine.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D146184
This patch enables the remaining calls from unistd.
The test cases had to be updated to:
1. Use SYS_symlinkat if SYS_symlink is not available
2. Use SYS_readlinkat if SYS_readlink is not available
3. Use SYS_unlinkat if SYS_unlink is not available
4. Use SYS_openat if SYS_open is not available
We also abort compilation if neither of the syscalls mentioned above are
available.
Differential Revision: https://reviews.llvm.org/D146161
In this patch we add support for the spawn lib in riscv.
Only small changes were required, the biggest one was to use of dup3
instead of dup2, if the latter is not available. This follows our
implementation of dup2.
Differential Revision: https://reviews.llvm.org/D146145
This patch removes some duplicated libs added to entrypoints.txt, adds
new libs supported to entrypoints.txt and updates header.txt
Differential Revision: https://reviews.llvm.org/D146065
Fixes#59277 - The main part of that bug has already been addressed. This commit
just adds documentation.
Reviewed By: jeffbailey
Differential Revision: https://reviews.llvm.org/D146115
The integration tests require the C memory functions as the compiler may
emit calls to them directly. The tests normally use the `__internal__`
variant that is built for testing, but these memory functions were
linked directly to preserve the entrypoint. Instead, we forward delcare
the internal versions and map the entrypoints to them manually inside
the integration test. This allows us to use the internal versions of
these files like the rest of the test objects.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D146177