intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-01-19 09:31:59 +08:00

Author	SHA1	Message	Date
Guillaume Chatelet	177583c914	[libc][NFC] Use SIZE_MAX instead of size_t(-1)	2023-06-29 12:21:43 +00:00
Tue Ly	de19101e33	[libc][NFC] Set rounding mode for sincosf exhaustive test.	2023-06-28 20:30:54 -04:00
Tue Ly	f320fefc4a	[libc][math] Implement erff function correctly rounded to all rounding modes. Implement correctly rounded `erff` functions. For `x >= 4`, `erff(x) = 1` for `FE_TONEAREST` or `FE_UPWARD`, `0x1.ffffep-1` for `FE_DOWNWARD` or `FE_TOWARDZERO`. For `0 <= x < 4`, we divide into 32 sub-intervals of length `1/8`, and use a degree-15 odd polynomial to approximate `erff(x)` in each sub-interval: ``` erff(x) ~ x * (c0 + c1 * x^2 + c2 * x^4 + ... + c7 * x^14). ``` For `x < 0`, we can use the same formula as above, since the odd part is factored out. Performance tested with `perf.sh` tool from the CORE-MATH project on AMD Ryzen 9 5900X: Reciprocal throughput (clock cycles / op) ``` $ ./perf.sh erff --path2 GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH reciprocal throughput -- with -march=native (with FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 11.790 + 0.182 clc/call; Median-Min = 0.154 clc/call; Max = 12.255 clc/call; -- CORE-MATH reciprocal throughput -- with -march=x86-64-v2 (without FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 14.205 + 0.151 clc/call; Median-Min = 0.159 clc/call; Max = 15.893 clc/call; -- System LIBC reciprocal throughput -- [####################] 100 % Ntrial = 20 ; Min = 45.519 + 0.445 clc/call; Median-Min = 0.552 clc/call; Max = 46.345 clc/call; -- LIBC reciprocal throughput -- with -mavx2 -mfma (with FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 9.595 + 0.214 clc/call; Median-Min = 0.220 clc/call; Max = 9.887 clc/call; -- LIBC reciprocal throughput -- with -msse4.2 (without FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 10.223 + 0.190 clc/call; Median-Min = 0.222 clc/call; Max = 10.474 clc/call; ``` and latency (clock cycles / op): ``` $ ./perf.sh erff --path2 GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with -march=native (with FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 38.566 + 0.391 
clc/call; Median-Min = 0.503 clc/call; Max = 39.170 clc/call; -- CORE-MATH latency -- with -march=x86-64-v2 (without FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 43.223 + 0.667 clc/call; Median-Min = 0.680 clc/call; Max = 43.913 clc/call; -- System LIBC latency -- [####################] 100 % Ntrial = 20 ; Min = 111.613 + 1.267 clc/call; Median-Min = 1.696 clc/call; Max = 113.444 clc/call; -- LIBC latency -- with -mavx2 -mfma (with FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 40.138 + 0.410 clc/call; Median-Min = 0.536 clc/call; Max = 40.729 clc/call; -- LIBC latency -- with -msse4.2 (without FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 44.858 + 0.872 clc/call; Median-Min = 0.814 clc/call; Max = 46.019 clc/call; ``` Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D153683	2023-06-28 13:58:37 -04:00
Guillaume Chatelet	b3b54131d0	[libc][NFC] Separate avx/no-avx x86 memcpy implementations Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D153958	2023-06-28 13:56:56 +00:00
Tue Ly	e9074d019e	[libc] Fix missing dependency and linking option for sqrtf exhaustive test.	2023-06-28 08:13:53 -04:00
Tue Ly	9532074a9d	[libc][math] Clean up exhaustive tests implementations. Clean up exhaustive tests. Let check functions return number of failures instead of passed/failed. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D153682	2023-06-28 07:58:46 -04:00
Joseph Huber	31c154881c	[libc] Allow the RPC client to be initialized via a H2D memcpy The RPC client must be initialized to set a pointer to the underlying buffer. This is currently done with the `reset` method which may not be ideal for the use-case. We want runtimes to be able to initialize this without needing to call a kernel. Recent changes allowed the `Client` type to be trivially copyable. That means we can create a client on the server side and then copy it over. To that end we take the existing externally visible symbol and initialize it to the client's pointer. Therefore we can look up the symbol and copy it over once loaded. No test currently, I tested with a demo OpenMP application but couldn't think of how to put that in-tree. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D153633	2023-06-26 10:41:32 -05:00
Tue Ly	d9428945df	[libc][Obvious] Fix docs warning.	2023-06-26 14:32:28 +00:00
Joseph Huber	6fb57f7349	[libc] Add basic utility support for timing functions on the GPU This patch adds the utilities for the clocks on the GPU. This is done prior to exporting it via some other interface and is mainly just done so they are available if we wish to do internal testing. Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D153388	2023-06-23 15:51:28 -05:00
Joseph Huber	a24a1e042f	[libc][Obvious] Use the current binary dir instead of the base one Summary: We include things off of `libc/include` so we need to use the current binary dir when setting up the directory.	2023-06-23 14:33:45 -05:00
Joseph Huber	3368a92b0f	[libc] Fix installing GPU headers The patch in D152592 changed the logic for this. We could never check if we were on the GPU as this was before the variable was defined so I moved it later. Secondly, we cannot use the `LLVM_BINARY_DIR` here, and I do not know if that works in general. The problem is that it will install the headers under a normal path outside of the `LLVM_ENABLE_RUNTIMES` build. I don't know if that's correct for the other targets, but for the GPU I need to set it back to the CMAKE_BINARY_DIR so it works. Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D153637	2023-06-23 13:47:52 -05:00
Jon Chesterfield	d4d8cd8446	[libc] Factor specifics of packet type out of process NFC. Simplifies process slightly, gives more options for testing it. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D153604	2023-06-23 03:45:23 +01:00
Jon Chesterfield	7e799342e1	[libc] Simplify access permissions, change to composition over inheritance Private member variable minimises scope of access to Process Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D153603	2023-06-23 02:54:22 +01:00
Jon Chesterfield	85c66f5d18	[libc] Instantiate and sanity check rpc class CMake plumbing cargo culted from other tests. Minor changes to Process to allow statically allocating a buffer. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D153594	2023-06-23 02:11:18 +01:00
Jon Chesterfield	65a4ce09f8	[libc] Can build amdgpu libc even if rocm is missing Clang defaults to failing to build if it can't find rocm device libs Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D153581	2023-06-22 21:18:44 +01:00
Jon Chesterfield	578d229a1a	[libc] Move fences into outbox/wait-for-ownership test Also moves the wait-until-inbox-changes test into a shared method. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D153573	2023-06-22 18:14:41 +01:00
Jun Zhang	ce378fcb9e	[libc][NFC] Simplify return value logic in set_thread_ptr() Signed-off-by: Jun Zhang <jun@junz.org> Differential Revision: https://reviews.llvm.org/D153572	2023-06-23 00:47:48 +08:00
Jon Chesterfield	ba01a2c608	[libc] Add memory fences to device-local locking calls This makes the interface less error prone. The acquire was previously forgotten. Release is currently missing if recv() is the last operation made before close. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D153571	2023-06-22 17:46:09 +01:00
Petr Hosek	f3b64887de	[libc] Place headers in the right include directory When LLVM_ENABLE_PER_TARGET_RUNTIME_DIR is enabled, place headers in `include/<target>` directory, otherwise use `include/`. Differential Revision: https://reviews.llvm.org/D152592	2023-06-22 06:22:32 +00:00
Joseph Huber	e0b487bfc0	[libc] Rename and install the RPC server interface This patch prepares the RPC interface to be installed. We place this in the existing `llvm-gpu-none` directory as it will also give us access to the generated `libc` headers for the opcodes. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D153040	2023-06-21 11:26:24 -05:00
Joseph Huber	4272d09196	[libc][NFC] Cleanup the RPC server implementation prior to installing This does some simple cleanup prior to landing the patch to install these. Differential Revision: https://reviews.llvm.org/D153439	2023-06-21 11:14:20 -05:00
Joseph Huber	1f99526d9d	[libc][NFC] Move `__has_builtin` to `LIBC_HAS_BUILTIN` Summary: These should use the common `LIBC_HAS_BUILTIN` even if we will only compile this with `clang`.	2023-06-21 09:50:40 -05:00
Guillaume Chatelet	bd1cba9f4f	Revert D148717 "[libc] Improve memcmp latency and codegen" Once integrated in our codebase the patch triggered a bunch of failing tests. We do not yet understand where the bug is but we revert it to move forward with integration. This reverts commit `5e32765c15`.	2023-06-21 12:37:14 +00:00
Petr Hosek	9fa7998555	[libc] Support for riscv32 This change adds basic support for baremetal riscv32 configuration. Differential Revision: https://reviews.llvm.org/D152563	2023-06-21 07:11:22 +00:00
Siva Chandra Reddy	75d70b7306	[libc] Make close function of the internal File class cleanup the file object. Before this change, a separate static method named cleanup was used to cleanup the file. Instead, now the close method cleans up the full file object using the platform's close function. Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D153377	2023-06-21 05:05:04 +00:00
Joseph Huber	ee6ace27e0	[libc] Remove disabled pass after performance improvement This pass used to cause huge compile time regressions. That has been addressed and can now be re-added. Differential Revision: https://reviews.llvm.org/D153374	2023-06-20 15:48:02 -05:00
Joseph Huber	964a535bfa	[libc] Remove flexible array and replace with a template Currently the implementation of the RPC interface requires a flexible struct. This caused problems when compiling the RPC server with GCC as would be required if trying to export the RPC server interface. This required that we either move to the `x[1]` workaround or make it a template parameter. While just using `x[1]` would be much less noisy, this is technically undefined behavior. For this reason I elected to use templates. The downside to using templates is that the server code must now be able to handle multiple different types at runtime. I was unable to find a good solution that didn't rely on type erasure so I simply branch off of the given value. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D153304	2023-06-20 15:22:37 -05:00
Mikhail R. Gadelha	a2df87c2b0	[libc] Fix libmath test compilation when using UInt<T> This patch: (1) adds the add_with_carry_const and sub_with_borrow_const constexpr calls to add and sub, respectively. Both add and sub are constexpr calls and were calling the non-constexpr version of add/sub_with_borrow. (2) adds explicit UIntType construct calls in some fp tests. Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D150223	2023-06-20 15:41:18 -03:00
Tue Ly	46aa659a32	[libc][math] Improve exp2f performance. Re-organize special cases and add a special case when `|x| < 2^-5`. Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D153134	2023-06-20 09:34:20 -04:00
Tue Ly	0ae409c0d7	[libc][math] Slightly improve sinhf and coshf performance. Re-order exceptional branches and slightly adjust the evaluation. Depends on https://reviews.llvm.org/D153026 . Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D153062	2023-06-20 09:27:28 -04:00
Tue Ly	5dbd5118ec	[libc][math] Improve tanhf performance. Re-order exceptional branches and slightly adjust the evaluation. Performance tested with the CORE-MATH project on AMD EPYC 7B12 (clocks/op) Reciprocal throughputs: ``` --- BEFORE --- $ CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf [####################] 100 % (with -mavx2 -mfma) Ntrial = 20 ; Min = 7.794 + 0.102 clc/call; Median-Min = 0.066 clc/call; Max = 8.267 clc/call; [####################] 100 %. (with -msse4.2) Ntrial = 20 ; Min = 10.783 + 0.172 clc/call; Median-Min = 0.144 clc/call; Max = 11.446 clc/call; [####################] 100 %. (SSE2) Ntrial = 20 ; Min = 18.926 + 0.381 clc/call; Median-Min = 0.342 clc/call; Max = 19.623 clc/call; --- AFTER --- $ CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf [####################] 100 % (with -mavx2 -mfma) Ntrial = 20 ; Min = 6.598 + 0.085 clc/call; Median-Min = 0.052 clc/call; Max = 6.868 clc/call; [####################] 100 % (with -msse4.2) Ntrial = 20 ; Min = 9.245 + 0.304 clc/call; Median-Min = 0.248 clc/call; Max = 10.675 clc/call; [####################] 100 %. (SSE2) Ntrial = 20 ; Min = 11.724 + 0.440 clc/call; Median-Min = 0.444 clc/call; Max = 12.262 clc/call; ``` Latency: ``` --- BEFORE --- $ PERF_ARGS="--latency" CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf [####################] 100 % (with -mavx2 -mfma) Ntrial = 20 ; Min = 38.821 + 0.157 clc/call; Median-Min = 0.122 clc/call; Max = 39.539 clc/call; [####################] 100 %. (with -msse4.2) Ntrial = 20 ; Min = 44.767 + 0.766 clc/call; Median-Min = 0.681 clc/call; Max = 45.951 clc/call; [####################] 100 %. 
(SSE2) Ntrial = 20 ; Min = 55.055 + 1.512 clc/call; Median-Min = 1.571 clc/call; Max = 57.039 clc/call; --- AFTER --- $ PERF_ARGS="--latency" CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf [####################] 100 % (with -mavx2 -mfma) Ntrial = 20 ; Min = 36.147 + 0.194 clc/call; Median-Min = 0.181 clc/call; Max = 36.536 clc/call; [####################] 100 % (with -msse4.2) Ntrial = 20 ; Min = 40.904 + 0.728 clc/call; Median-Min = 0.557 clc/call; Max = 42.231 clc/call; [####################] 100 %. (SSE2) Ntrial = 20 ; Min = 55.776 + 0.557 clc/call; Median-Min = 0.542 clc/call; Max = 56.551 clc/call; ``` Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D153026	2023-06-20 09:25:07 -04:00
Siva Chandra Reddy	21e1651c0c	[libc] Remove the requirement of a platform-flush operation in File abstraction. The libc flush operation is not supposed to trigger a platform level flush operation. See "Notes" on this Linux man page: https://man7.org/linux/man-pages/man3/fflush.3.html Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D153182	2023-06-19 18:38:29 +00:00
Joseph Huber	5a8fc41937	[libc] Disable atomic optimizations for `libc` AMDGPU builds Recently the AMDGPU backend automatically enables a pass to optimize atomics. This results in the LTO build taking about 10x longer in all cases. For now we disable this by default as was the case before the patch in D152649. Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D153232	2023-06-19 03:25:51 -05:00
Alfred Persson Forsberg	c32ba7d5e0	[libc] [NFC] malloc.h: fix include guard typo Differential Revision: https://reviews.llvm.org/D153231	2023-06-18 23:08:25 +01:00
Joseph Huber	70b1c3999c	[libc][Docs] Add some motivation for the GPU libc This provides some basic motivation behind the GPU libc. Suggestions are welcome. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D152028	2023-06-16 15:19:45 -05:00
Joseph Huber	d663da07e3	[libc][Obvious] Fix problem with the variable used for the jobs Summary: There was an issue with the variable we were using to conditionally set the job number for the GPU.	2023-06-16 14:11:53 -05:00
Joseph Huber	27f326334f	[libc] Add an option to use a job pool for GPU tests Currently the GPU has restrictions on how many tests can be run in parallel due to resource constraints. However, building these tests can take a long time so we want to be able to build them in parallel. This patch introduces the option `LIBC_GPU_TEST_JOBS` which is set to the number of threads to run in parallel. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D153157	2023-06-16 14:06:16 -05:00
Joseph Huber	485e2de6d5	[libc][nfc] Silence two warnings in tests These currently give warnings for unused variables or a default case where everything is covered. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D153137	2023-06-16 12:52:06 -05:00
Joseph Huber	ed34cb2cd7	[libc] Add a test for `fputs` to check using `stdout` and `stderr` This patch adds a test directly for the `fputs` function similar to the existing `puts` test. This lets us know that the default file pointers are functional and the `fputs` interface works. Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D152288	2023-06-16 11:01:55 -05:00
Alex Brachet	61c9052cec	[libc] Add LIBC_INLINE_VAR for inline variables These are the only variables I could find that use LIBC_INLINE. Note, these are namespace scoped constexpr so local linkage is implied. inline is useful here to silence clang's unused-const-variable warning. For Fuchsia, the distinction between LIBC_INLINE and LIBC_INLINE_VAR is helpful because we define LIBC_INLINE as `[[gnu::always_inline]] inline` when building with gcc. This isn't meaningful on variables. Alternatively, we could make these variables simply constexpr and also add `[[maybe_unused]]` Reviewed By: sivachandra, mcgrathr Differential Revision: https://reviews.llvm.org/D152951	2023-06-16 15:46:32 +00:00
Joseph Huber	490958b9ea	[libc][obvious] Actually return the value from `malloc` for NVPTX Switching to this interface we neglected to actually write the output from the malloc call to the RPC buffer. Fix this so the tests pass again. Differential Revision: https://reviews.llvm.org/D153069	2023-06-15 15:13:11 -05:00
Joseph Huber	7e8b0c27f2	[libc] Disable the strtod and strtold tests on NVPTX These tests have a single line that fails with a value off-by-one, see https://lab.llvm.org/buildbot/#/builders/46/builds/50055/steps/12/logs/stdio . Disable these for now so we can figure out what the error is later. Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D153056	2023-06-15 13:29:42 -05:00
Joseph Huber	dcdfc963d7	[libc] Export GPU extensions to `libc` for external use The GPU port of the LLVM C library needs to export a few extensions to the interface such that users can interface with it. This patch adds the necessary logic to define a GPU extension. Currently, this only exports a `rpc_reset_client` function. This allows us to use the server in D147054 to set up the RPC interface outside of `libc`. Depends on https://reviews.llvm.org/D147054 Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D152283	2023-06-15 11:02:24 -05:00
Joseph Huber	719d77ed28	[libc] Begin implementing a library for the RPC server This patch begins providing a generic static library that wraps around the raw `rpc.h` interface. As discussed in the corresponding RFC, https://discourse.llvm.org/t/rfc-libc-exporting-the-rpc-interface-for-the-gpu-libc/71030, we want to begin exporting RPC services to external users. In order to do this we decided to not expose the `rpc.h` header by wrapping around its functionality. This is done with a C-interface as we make heavy use of callbacks and allows us to provide a predictable interface. Reviewed By: JonChesterfield, sivachandra Differential Revision: https://reviews.llvm.org/D147054	2023-06-15 11:02:23 -05:00
Joseph Huber	fd14f7adbe	[libc] Enable conversion functions on the GPU These functions were previously removed due to problems running the tests with `errno` in them. This was resolved previously by making the internal implementation of these functions use a global `errno` so that tests can still use `errno` functionality as long as they are run with a single thread. This allows us to re-enable these tests as a previous patch has also resolved the issue where the `stdlib` tests could not be hermetic due to the dependence on system rounding functions. Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D153016	2023-06-15 09:38:12 -05:00
Joseph Huber	a09bec6459	[libc] Move the definitions of the standard IO streams to the platform This patch moves the definitions of the standard IO streams to the platform file definition. This is necessary because previously we had a level of indirection where the stream's `FILE *` was initialized based on the pointer to the internal `__llvm_libc` version. This cannot be resolved ahead of time by the linker because the address will not be known until runtime. This caused the previous implementation to emit a global constructor to initialize the pointer to the actual `FILE *`. By moving these definitions so that we can bind their address to the original file type we can avoid this global constructor. This file keeps the entrypoints, but makes them empty files only containing an external reference. This is so they still appear as entrypoints and get emitted as declarations in the generated headers. Reviewed By: lntue, sivachandra Differential Revision: https://reviews.llvm.org/D152983	2023-06-15 07:06:43 -05:00
Joseph Huber	505829eacf	[libc][obvious] Fix the FMA implementation on the GPU Summary: This doesn't include the type_traits to perform the indirection, nor does it return the value.	2023-06-14 13:33:25 -05:00
Joseph Huber	f205fbbb01	[libc] Add support for FMA in the GPU utilities This adds the generic FMA utilities for the GPU. We implement these through the builtins which map to the FMA instructions in the ISA. These may not have strict compliance with other assumptions in the `libc` such as rounding modes. I've included the relevant information on how the GPU vendors map the behaviour. This should help make it easier to implement some future generic versions. Depends on D152486 Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D152923	2023-06-14 12:59:18 -05:00
Joseph Huber	8060d96aed	[libc] Begin implementing a 'libmgpu.a' for math on the GPU This patch adds an outline to begin adding a `libmgpu.a` file for providing math on the GPU. Currently, this is most likely going to be wrapping around existing vendor libraries and placing them in a more usable format. Long term, we would like to provide our own implementations of math functions that can be used instead. This patch works by simply forwarding the calls to the standard C math library calls like `sin` to the appropriate vendor call like `__nv_sin`. Currently, we will use the vendor libraries directly and link them in via `-mlink-builtin-bitcode`. This is necessary because of bizarre interactions with the generic bitcode, `-mlink-builtin-bitcode` internalizes and only links in the used symbols, furthermore it propagates the target's default attributes and it's the only "truly" correct way to pull in these vendor bitcode libraries without error. If the vendor libraries are not available at build time, we will still create the `libmgpu.a`, but we will expect that the vendor library definitions will be provided by the user's compilation as is made possible by https://reviews.llvm.org/D152442. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D152486	2023-06-14 12:59:15 -05:00
Tue Ly	53d4057622	[libc] Fix merging issue with test/src/math/exhaustive/expm1f_test	2023-06-14 11:00:13 -04:00
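
The `erff` commit listed above (f320fefc4a) describes approximating `erff(x)` by a degree-15 odd polynomial of the form `x * (c0 + c1 * x^2 + ... + c7 * x^14)`. The following is a minimal sketch of that evaluation shape only, written in Python rather than the library's C++. The commit message does not list the per-interval coefficients, so this sketch assumes the Maclaurin coefficients of erf as stand-ins; they are reasonable only near zero, and the sketch is neither correctly rounded nor split into 32 sub-intervals. The names `COEFFS` and `erf_odd_poly` are hypothetical.

```python
import math

# Assumption: Maclaurin coefficients of erf(x)/x in powers of x^2 (c0..c7).
# The real per-interval coefficients are not given in the commit message.
COEFFS = [
    (2.0 / math.sqrt(math.pi)) * (-1) ** n / (math.factorial(n) * (2 * n + 1))
    for n in range(8)
]


def erf_odd_poly(x: float) -> float:
    """Evaluate x * (c0 + c1*x^2 + ... + c7*x^14) with Horner's scheme."""
    x2 = x * x
    acc = 0.0
    # Horner's scheme in x^2, starting from the highest-degree coefficient.
    for c in reversed(COEFFS):
        acc = acc * x2 + c
    # The odd factor x is applied last, so negative inputs reuse the same
    # even polynomial and only the final sign differs.
    return x * acc
```

Because the odd factor is multiplied in at the end, `erf_odd_poly(-x)` is exactly `-erf_odd_poly(x)`, which mirrors the commit's remark that the same formula serves `x < 0`.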

1963 Commits