This patch fixes a bug in strftime's return value when the formatted
output exactly fills the buffer, not including the null terminator. The
previous check failed to account for the null terminator in this case,
incorrectly returning the written count instead of 0.
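The corrected bounds check can be sketched as below. `strftime_result` is a hypothetical helper, not the patch's code. It shows the rule: the output fits only if the character count, excluding the null terminator, is strictly less than the buffer size. `format_abc` exercises the host `strftime` on a literal format that needs exactly 4 bytes.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper illustrating the fixed check. `written` is the number
   of characters produced, excluding the terminating '\0'. */
static size_t strftime_result(size_t written, size_t maxsize) {
  /* There must also be room for the '\0', so output that exactly fills the
     buffer is reported as 0 rather than as `written`. */
  return written < maxsize ? written : 0;
}

/* Usage check against the host strftime: "abc" needs 4 bytes with its '\0'. */
static size_t format_abc(char *buf, size_t maxsize) {
  struct tm tm = {0}; /* no conversion specifiers, so no fields are read */
  return strftime(buf, maxsize, "abc", &tm);
}
```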
This option sets a compile option when building sources inside the
string directory, and this option affects string_utils.h. But
string_utils.h is #included from more places than just the string
directory (such as from __support/CPP/string.h), leading to narrow
reads in those cases and, more seriously, to ODR violations when the
two different string_length implementations are included in the same
program.
Having this option at the top level avoids this problem.
* Add the FILE type declaration, as it should be present in `<wchar.h>`
as well as in `<stdio.h>`
* Fix the argument type in the `wcsrtombs` / `wcsnrtombs` functions: it
should be a restrict pointer to `mbstate_t`. Add the restrict qualifier
to the internal implementation as well.
This brings us closer to being able to build libcxx with wide-character
support against llvm-libc headers.
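For reference, the standard C prototype restrict-qualifies every pointer parameter of `wcsrtombs`, including the `mbstate_t *`. The sketch below calls the host `wcsrtombs` through a hypothetical helper, `convert_ascii`, assuming the default "C" locale and ASCII-only input.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Standard prototype, with every pointer restrict-qualified:
   size_t wcsrtombs(char *restrict dst, const wchar_t **restrict src,
                    size_t len, mbstate_t *restrict ps);                  */

/* Hypothetical helper: converts an ASCII-only wide string into `dst`.
   Returns the number of bytes written (excluding '\0') or (size_t)-1. */
static size_t convert_ascii(char *dst, size_t len, const wchar_t *ws) {
  mbstate_t state;
  memset(&state, 0, sizeof state); /* initial conversion state */
  const wchar_t *src = ws;
  /* dst, &src and &state refer to distinct objects, as restrict requires. */
  return wcsrtombs(dst, &src, len, &state);
}
```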
Create a POSIX `<nl_types.h>` header with `catopen`, `catclose`, and
`catgets` function declarations.
Provide stub/placeholder implementations which always return an error.
This is consistent with the way
locales are currently (un-)implemented in llvm-libc.
Notably, providing `<nl_types.h>` fixes the last remaining issue with
building libc++ against llvm-libc
(on certain configurations of x86_64 Linux) after disabling threads and
wide characters in libc++.
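Stubs of this kind might look like the sketch below. The `*_stub` names are hypothetical (chosen to avoid clashing with the host libc), and the `errno` values are illustrative assumptions, not necessarily what llvm-libc sets. The failure results follow POSIX: `catopen` returns `(nl_catd)-1`, `catclose` returns -1, and `catgets` falls back to the caller's default string `s`.

```c
#include <errno.h>
#include <nl_types.h>

/* Hypothetical stand-ins that always fail, mirroring "always return error". */
static nl_catd catopen_stub(const char *name, int flag) {
  (void)name;
  (void)flag;
  errno = EINVAL; /* assumption: illustrative errno value */
  return (nl_catd)-1;
}

static int catclose_stub(nl_catd catd) {
  (void)catd;
  errno = EBADF; /* assumption: illustrative errno value */
  return -1;
}

static char *catgets_stub(nl_catd catd, int set_id, int msg_id, const char *s) {
  (void)catd;
  (void)set_id;
  (void)msg_id;
  return (char *)s; /* POSIX: on failure, return the default message s */
}
```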
Based on the double-precision sin/cos fast path algorithm:
Step 1: Perform range reduction `y = x mod pi/8` with a target error <
2^-54.
This is because the worst case of `x mod pi/8` for single precision is
~2^-31, so to keep the error from the range reduction within 1 ULP, the
targeted error should be `2^(-31 - 23) = 2^-54`.
Step 2: Polynomial approximation
We use degree-5 and degree-4 polynomials to approximate sin and cos of
the reduced angle respectively.
Step 3: Combine the results using trig identities
```math
\begin{align*}
\sin(x) &= \sin(y) \cdot \cos(k \cdot \frac{\pi}{8}) + \cos(y) \cdot \sin(k \cdot \frac{\pi}{8}) \\
\cos(x) &= \cos(y) \cdot \cos(k \cdot \frac{\pi}{8}) - \sin(y) \cdot \sin(k \cdot \frac{\pi}{8})
\end{align*}
```
Overall errors: <= 3 ULPs for default rounding modes (tested
exhaustively).
Current limitation: large range reduction requires FMA instructions for
binary32. This restriction will be removed in a follow-up PR.
---------
Co-authored-by: Petr Hosek <phosek@google.com>
This patch adds the implementation for `inet_aton` function. Since this
function is not explicitly included in POSIX, I have marked it with
`llvm_libc_ext`. It is widely available and commonly used, and can also
be used to implement `inet_addr`, which is included in POSIX.
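The parsing rules `inet_aton` is commonly expected to follow can be sketched as below: one to four dot-separated parts, each in decimal, octal (leading `0`) or hex (leading `0x`), where the last part fills all remaining low-order bytes (so `127.1` means `127.0.0.1`). `inet_aton_sketch` is hypothetical. It returns the address in host byte order through a plain `uint32_t` instead of a `struct in_addr`, and, as a simplification, rejects any trailing characters.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: returns 1 and stores the host-order address on
   success, 0 on malformed input. */
static int inet_aton_sketch(const char *cp, uint32_t *out) {
  unsigned long parts[4];
  int n = 0;
  const char *p = cp;
  for (;;) {
    if (!isdigit((unsigned char)*p))
      return 0; /* reject empty parts, signs and leading whitespace */
    char *end;
    errno = 0;
    unsigned long v = strtoul(p, &end, 0); /* base 0: dec, 0 octal, 0x hex */
    if (errno == ERANGE || v > 0xffffffffUL)
      return 0;
    parts[n++] = v;
    if (*end == '\0')
      break;
    if (*end != '.' || n == 4)
      return 0;
    p = end + 1;
  }
  /* Leading parts are one byte each; the last part fills the rest. */
  uint32_t addr = 0;
  for (int i = 0; i < n - 1; ++i) {
    if (parts[i] > 0xff)
      return 0;
    addr |= (uint32_t)parts[i] << (24 - 8 * i);
  }
  if (parts[n - 1] > (0xffffffffUL >> (8 * (n - 1))))
    return 0;
  *out = addr | (uint32_t)parts[n - 1];
  return 1;
}
```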
Reverts llvm/llvm-project#107540
This PR demonstrated improvements on micro-benchmarks but the gains did
not seem to materialize in production. We are reverting this change for
now to get more data. This PR might be reintegrated later once we're
more confident in its effects.
We can use UMAXV.4S to reduce the comparison result in a single
instruction. This improves performance by roughly 4% on Apple M1:
```
Summary
  bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10 ran
    1.01 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.01 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.01 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.01 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.02 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.03 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.03 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.05 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.05 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.05 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.05 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
```
(1 = original, 2 = a variant of this patch that uses UMAXV.16B, 3 = this patch)
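The idea of reducing a 16-byte comparison with a single UMAXV can be sketched with ACLE NEON intrinsics: XOR the two blocks, then take one horizontal unsigned max over the four 32-bit lanes (`vmaxvq_u32`, i.e. UMAXV.4S). The result is nonzero iff any byte differs. The `.16B` variant would use `vmaxvq_u8` instead. `bcmp16_sketch` is a hypothetical name, and the `memcmp` fallback exists only so the sketch builds on non-AArch64 hosts.

```c
#include <stdint.h>
#include <string.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: returns nonzero iff the 16-byte blocks differ. */
static uint32_t bcmp16_sketch(const unsigned char *a, const unsigned char *b) {
#if defined(__aarch64__)
  uint8x16_t diff = veorq_u8(vld1q_u8(a), vld1q_u8(b)); /* 0 where equal */
  /* UMAXV.4S: a single horizontal max across four 32-bit lanes. */
  return vmaxvq_u32(vreinterpretq_u32_u8(diff));
#else
  return memcmp(a, b, 16) != 0; /* portable fallback, not the NEON path */
#endif
}
```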
Reviewers: michaelrj-google, gchatelet, overmighty, SchrodingerZhu
Pull Request: https://github.com/llvm/llvm-project/pull/99260
Summary:
Without slab reclaiming, this interface is much simpler and it can speed
up cases with a lot of churn. Basically, this trades memory for
performance.
Closes #159614
**Changes:**
- Initial implementation of rsqrt for single precision float
**Some small unrelated style changes to this PR (that I missed in my
rsqrtf16 PR):**
- Added extra `-` characters to the top comments to make them look nicer
in libc/shared/math/rsqrtf16.h
- Put rsqrtf16 in sorted order inside
libc/src/__support/math/CMakeLists.txt
- Rearranged libc_math_function rsqrtf16 in Bazel to match alphabetical
order
This PR includes only one of the fxdivi functions (rdivi). It uses a
polynomial function for the initial approximation, followed by 4
Newton-Raphson iterations to calculate the reciprocal, and finally
multiplies the numerator by it to get the result.
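The reciprocal-then-multiply scheme can be illustrated in double precision. This is an assumption-laden sketch, not the rdivi code, which works in fixed point. It normalizes the divisor to `m` in [0.5, 1), uses the classic degree-1 initial approximation `48/17 - 32/17 * m` (maximum relative error 1/17; the PR's polynomial may differ), applies 4 Newton-Raphson steps `r = r * (2 - m * r)`, each of which roughly squares the error, and then multiplies by the numerator. `div_via_newton` is a hypothetical name.

```c
#include <math.h>

/* Hypothetical illustration of n / d via a Newton-Raphson reciprocal.
   d must be nonzero. */
static double div_via_newton(int n, int d) {
  int e;
  /* Normalize |d| = m * 2^e with m in [0.5, 1). */
  double m = frexp(fabs((double)d), &e);
  /* Degree-1 initial approximation of 1/m on [0.5, 1). */
  double r = 48.0 / 17.0 - (32.0 / 17.0) * m;
  /* Four Newton-Raphson iterations for the reciprocal of m. */
  for (int i = 0; i < 4; ++i)
    r = r * (2.0 - m * r);
  double recip = ldexp(r, -e); /* 1/|d| */
  if (d < 0)
    recip = -recip;
  return (double)n * recip; /* multiply the numerator by the reciprocal */
}
```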
---------
Signed-off-by: Shreeyash Pandey <shreeyash335@gmail.com>
Closes https://github.com/llvm/llvm-project/issues/153666
This patch introduces a new centralized AUXV (auxiliary vector) handling
mechanism for LLVM libc on Linux, replacing the previous scattered
implementation across multiple files.
## Key Changes:
### New Files:
- **libc/src/__support/OSUtil/linux/auxv.h**: New header library
  providing a clean interface for AUXV access with:
  - `auxv::Entry` struct for AUXV entries (type and value)
  - `auxv::Vector` class with iterator support for traversing AUXV
  - `auxv::get()` function for retrieving specific AUXV values
  - Thread-safe initialization with fallback mechanisms (prctl and
    /proc/self/auxv)
### Modified Files:
1. **libc/src/__support/OSUtil/linux/CMakeLists.txt**:
   - Added `auxv` header library declaration with proper dependencies:
     - libc.hdr.fcntl_macros
     - libc.src.__support.OSUtil.osutil
     - libc.src.__support.common
     - libc.src.__support.CPP.optional
     - libc.src.__support.threads.callonce
2. **libc/config/linux/app.h**:
   - Removed `AuxEntry` struct (moved to auxv.h as `auxv::Entry`)
   - Removed `auxv_ptr` from `AppProperties` struct
   - Simplified application properties structure
3. **libc/src/sys/auxv/linux/getauxval.cpp**:
   - Completely refactored to use new auxv.h interface
   - Removed ~200 lines of complex initialization code
   - Simplified to just call `auxv::get()` function
   - Removed dependencies on external symbols (mman, prctl, fcntl, read,
     close, open)
4. **libc/src/sys/auxv/linux/CMakeLists.txt**:
   - Updated dependencies to use new auxv header library
   - Removed dependencies on external symbols (prctl, mman, fcntl, unistd,
     etc.)
5. **libc/startup/linux/do_start.cpp**:
   - Updated to use new `auxv::Vector` interface
   - Changed from pointer-based to iterator-based AUXV traversal
   - Updated field names (`aux_entry->id` → `aux_entry.type`,
     `aux_entry->value` → `aux_entry.val`)
   - Added call to `auxv::Vector::initialize_unsafe()` for early AUXV setup
6. **libc/startup/linux/CMakeLists.txt**:
   - Added dependency on `libc.src.__support.OSUtil.linux.auxv`
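The core lookup behind an interface like `auxv::get()` can be sketched as a walk over an `AT_NULL`-terminated array of (type, value) pairs. This is a simplified C illustration, not the patch's C++ code. `struct auxv_entry`, `auxv_lookup`, and the `SKETCH_AT_*` constants are hypothetical (the constants match Linux's `AT_NULL` = 0 and `AT_PAGESZ` = 6), and a `bool` plus out-parameter stands in for an optional return value. The thread-safe initialization and the prctl / `/proc/self/auxv` fallbacks are omitted.

```c
#include <stdbool.h>

/* Hypothetical mirror of an AUXV entry: a type and a value. */
struct auxv_entry {
  unsigned long type;
  unsigned long val;
};

/* Values matching Linux's AT_NULL (terminator) and AT_PAGESZ. */
enum { SKETCH_AT_NULL = 0, SKETCH_AT_PAGESZ = 6 };

/* Walks the AT_NULL-terminated vector and reports the first match. */
static bool auxv_lookup(const struct auxv_entry *vec, unsigned long type,
                        unsigned long *out) {
  for (const struct auxv_entry *e = vec; e->type != SKETCH_AT_NULL; ++e) {
    if (e->type == type) {
      *out = e->val;
      return true;
    }
  }
  return false;
}
```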
Summary:
This RPC call performs the final exit. The callbacks were handled on the
GPU side and this is only 'valid' in the pretend mode where we treat the
GPU like a CPU program. Doing this keeps us from crashing and burning
if people continue using the program while this is running, since `exit`
would tear down the offloading library in memory and lead to segfaults.
This just drops everything where it is and lets the process manager
clean it up for us.
#160404
- Implement POSIX function "faccessat"
- Remove the redundant parameter from the faccessat syscall in the
`access` implementation; the faccessat syscall does not take a flags
argument
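On Linux, the libc-level `faccessat` takes four arguments (`dirfd`, `pathname`, `mode`, `flags`), but the raw `faccessat` system call takes only the first three. The sketch below shows both. `raw_faccessat` is a hypothetical helper that uses the host's `syscall()` and `SYS_faccessat`, so it is Linux-specific.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>       /* AT_FDCWD */
#include <sys/syscall.h> /* SYS_faccessat */
#include <unistd.h>      /* faccessat, syscall, F_OK */

/* Hypothetical helper: the raw Linux faccessat syscall takes only
   (dirfd, pathname, mode); passing a fourth "flags" value is redundant. */
static long raw_faccessat(int dirfd, const char *path, int mode) {
  return syscall(SYS_faccessat, dirfd, path, mode);
}
```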
Fast strlen implementations (naive wide-reads, SIMD-based, and
x86_64/aarch64-optimized versions) all may perform
technically-out-of-bound reads, which leads to reports under ASan,
HWASan (on ARM machines), and also TSan (which also has the capability
to detect heap out-of-bound reads). So, we need to explicitly disable
instrumentation in all three cases.
Tragically, Clang didn't support `[[gnu::no_sanitize]]` syntax until
recently, and since we're supporting both GCC and Clang, we have to
fall back to `__attribute__` syntax.
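The sketch below shows a naive wide-read strlen with instrumentation disabled through the `__attribute__` spelling that both GCC and Clang accept. It assumes a GCC or Clang compiler. `NO_SANITIZE_OOB_ACCESS`, `aliasing_word`, and `wide_read_strlen` are hypothetical names, not llvm-libc's. Aligned word loads never cross a page boundary, but they can read past the terminator, which is exactly what ASan, HWASan, and TSan would report.

```c
#include <stddef.h>
#include <stdint.h>

/* Disable the three sanitizers mentioned above for this function. */
#define NO_SANITIZE_OOB_ACCESS \
  __attribute__((no_sanitize("address", "hwaddress", "thread")))

/* A word type allowed to alias char data. */
typedef uintptr_t __attribute__((may_alias)) aliasing_word;

NO_SANITIZE_OOB_ACCESS
static size_t wide_read_strlen(const char *s) {
  const char *p = s;
  /* Byte-wise until p is word-aligned. */
  while ((uintptr_t)p % sizeof(uintptr_t) != 0) {
    if (*p == '\0')
      return (size_t)(p - s);
    ++p;
  }
  const uintptr_t ones = (uintptr_t)-1 / 0xff; /* 0x0101...01 */
  const uintptr_t highs = ones * 0x80;         /* 0x8080...80 */
  const aliasing_word *w = (const aliasing_word *)p;
  /* Nonzero iff some byte of *w is zero; may read past the terminator. */
  while (((*w - ones) & ~*w & highs) == 0)
    ++w;
  /* Locate the exact zero byte within the final word. */
  p = (const char *)w;
  while (*p != '\0')
    ++p;
  return (size_t)(p - s);
}
```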
Summary:
This unifies the interface to just be a bunch of `load` and `store`
functions that optionally accept a mask / indices for gathers and
scatters with masks.
I had to rename this from `load` and `store` because it conflicts with
the other version in `op_generic`. I might just work around that with a
trait instead.
Summary:
This was originally kept separate so it didn't pollute the namespace,
but now I'm thinking it's just easier to bundle it in with the default
interface. This means that we'll have a bit of extra code for people
using the server.h file to handle libc opcodes, but it's minimal (3
functions) and it simplifies this.
I'm doing this because I'm hoping to move the GPU tester binary to
liboffload, which handles `libc` opcodes internally except for these.
This is easier than adding a hook to register custom handlers
there.
Summary:
I landed a change in clang that allows integral vectors to implicitly
convert to boolean ones. This means I can simplify the interface and
remove the need to cast to bool on every use. Also do some other
cleanups of the traits.