This partially reverts #124200. Rather than using a CMake option to
control whether to enable `-fno-math-errno`, use LIBC_MATH_NO_ERRNO
configuration option. While there might be other cases when we want to
set `-fno-math-errno`, having LIBC_MATH_NO_ERRNO imply it should be
always safe and represents a reasonable starting point.
Reverts llvm/llvm-project#107540
This PR demonstrated improvements on micro-benchmarks but the gains did
not seem to materialize in production. We are reverting this change for
now to get more data. This PR might be reintegrated later once we're
more confident in its effects.
We can use UMAXV.4S to reduce the comparison result in a single
instruction. This improves performance by roughly 4% on Apple M1:
Summary
bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10 ran
1.01 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
1.01 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
1.01 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
1.01 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
1.02 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
1.03 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
1.03 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
1.05 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
1.05 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
1.05 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
1.05 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
(1 = original, 2 = a variant of this patch that uses UMAXV.16B, 3 = this patch)
Reviewers: michaelrj-google, gchatelet, overmighty, SchrodingerZhu
Pull Request: https://github.com/llvm/llvm-project/pull/99260
Summary:
Without slab reclaiming this interface is much simpler and it can speed
up cases with a lot of churn. Basically, wastes memory for performance.
The VSCode instructions were stale from the transition to the runtimes
directory. This updates will all the options give on the Full Host Build
page.
Tested:
Built libc target.
Closes#159614
**Changes:**
- Initial implementation of rsqrt for single precision float
**Some small unrelated style changes to this PR (that I missed in my
rsqrtf16 PR):**
- Added extra - to the top comments to make it look nicer in
libc/shared/math/rsqrtf16.h
- Put rsqrtf16 inside of libc/src/__support/math/CMakeLists.txt in
sorted order
- Rearanged libc_math_function rsqrtf16 in Bazel to match alphabetical
order
Summary:
Previous change made us no longer link `exit` by default which implied
the RPC server. This is a required symbol for the loader. This will be
fixed later when I port the loader to LLVMOffload. For now just work
around it to fix the bots.
This PR includes only one of the fxdivi functions (rdivi). It uses a
polynomial function for initial approximation followed by 4
newton-raphson iterations to calculate the reciprocal and finally
multiplies the numerator with it to get the result.
---------
Signed-off-by: Shreeyash Pandey <shreeyash335@gmail.com>
Summary:
We use the infrastructure to stand up a pretend hosted environment on
the GPU. Part of that is calling exit codes and handling the callback.
Exiting from inside a GPU region is problematic as it actually relies on
a lot of GPU magic behind the scenes. This is at least *correct* now as
we use `quick_exit` on the CPU when the GPU calls `exit`. However,
calling `quick_exit` will interfere with instrumentation or benchmarking
that expects a nice teardown order. For normal execution we should do
the friendly option and let the loader utility clean everything up
manually.
Closes https://github.com/llvm/llvm-project/issues/153666
This patch introduces a new centralized AUXV (auxiliary vector) handling
mechanism for LLVM libc on Linux, replacing the previous scattered
implementation across multiple files.
## Key Changes:
### New Files:
- **libc/src/__support/OSUtil/linux/auxv.h**: New header library
providing
a clean interface for AUXV access with:
- `auxv::Entry` struct for AUXV entries (type and value)
- `auxv::Vector` class with iterator support for traversing AUXV
- `auxv::get()` function for retrieving specific AUXV values
- Thread-safe initialization with fallback mechanisms (prctl and
/proc/self/auxv)
### Modified Files:
1. **libc/src/__support/OSUtil/linux/CMakeLists.txt**:
- Added `auxv` header library declaration with proper dependencies:
- libc.hdr.fcntl_macros
- libc.src.__support.OSUtil.osutil
- libc.src.__support.common
- libc.src.__support.CPP.optional
- libc.src.__support.threads.callonce
2. **libc/config/linux/app.h**:
- Removed `AuxEntry` struct (moved to auxv.h as `auxv::Entry`)
- Removed `auxv_ptr` from `AppProperties` struct
- Simplified application properties structure
3. **libc/src/sys/auxv/linux/getauxval.cpp**:
- Completely refactored to use new auxv.h interface
- Removed ~200 lines of complex initialization code
- Simplified to just call `auxv::get()` function
- Removed dependencies to external symbols (mman, prctl, fcntl, read,
close, open)
4. **libc/src/sys/auxv/linux/CMakeLists.txt**:
- Updated dependencies to use new auxv header library
- Removed dependencies to external symbols (prctl, mman, fcntl, unistd,
etc.)
5. **libc/startup/linux/do_start.cpp**:
- Updated to use new `auxv::Vector` interface
- Changed from pointer-based to iterator-based AUXV traversal
- Updated field names (`aux_entry->id` → `aux_entry.type`,
`aux_entry->value` → `aux_entry.val`)
- Added call to `auxv::Vector::initialize_unsafe()` for early AUXV setup
6. **libc/startup/linux/CMakeLists.txt**:
- Added dependency on `libc.src.__support.OSUtil.linux.auxv`
Summary:
This RPC call does the final exiting. The callbacks were handled on the
GPU side and this is only 'valid' in the pretend mode where we treat the
GPU like a CPU program. Doing this keeps us from crashing and burning
if people continue using the program while this is running as `exit`
would tear down the offloading library in memory and lead to segfaults.
This just drops everything where it is and lets the process manager
clean it up for us.
#160404
- Implement POSIX function "faccessat"
- Remove redundant param in facessat syscall in access implementation,
faccessat syscall does not take a flags arg
Fast strlen implementations (naive wide-reads, SIMD-based, and
x86_64/aarch64-optimized versions) all may perform
technically-out-of-bound reads, which leads to reports under ASan,
HWASan (on ARM machines), and also TSan (which also has the capability
to detect heap out-of-bound reads). So, we need to explicitly disable
instrumentation in all three cases.
Tragically, Clang didn't support `[[gnu::no_sanitize]]` syntax until
recently, and since we're supporting both GCC and Clang, we have to
revert to `__attribute__` syntax.
Summary:
This unifies the interface to just be a bunch of `load` and `store`
functions that optionally accept a mask / indices for gathers and
scatters with masks.
I had to rename this from `load` and `store` because it conflicts with
the other version in `op_generic`. I might just work around that with a
trait instead.
Summary:
The libcxx and compiler-rt already install their headers according
to the triple if this option is enabled. We should do this by default so
these don't get mixed up when people potentially combine multiple
toolchains.