Summary:
Without slab reclaiming this interface is much simpler and it can speed
up cases with a lot of churn. Basically, this trades memory for performance.
The VSCode instructions were stale from the transition to the runtimes
directory. This updates them with all the options given on the Full Host
Build page.
Tested:
Built libc target.
Closes #159614
**Changes:**
- Initial implementation of rsqrt for single precision float
**Some small unrelated style changes to this PR (that I missed in my
rsqrtf16 PR):**
- Added extra - to the top comments to make it look nicer in
libc/shared/math/rsqrtf16.h
- Put rsqrtf16 inside of libc/src/__support/math/CMakeLists.txt in
sorted order
- Rearranged libc_math_function rsqrtf16 in Bazel to match alphabetical
order
Summary:
A previous change made us no longer link `exit` by default, which implied
the RPC server. This is a required symbol for the loader. This will be
fixed later when I port the loader to LLVMOffload. For now just work
around it to fix the bots.
This PR includes only one of the fxdivi functions (rdivi). It uses a
polynomial function for the initial approximation, followed by four
Newton-Raphson iterations to calculate the reciprocal, and finally
multiplies the numerator by it to get the result.
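
The scheme described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the PR: it uses `double` instead of fixed-point types, assumes the divisor is already normalized to [0.5, 1], and picks a simple linear polynomial for the initial guess. The names `reciprocal_sketch` and `divide_sketch` are hypothetical.

```cpp
// Hypothetical helper (not from the PR): approximates 1/d for d in [0.5, 1].
inline double reciprocal_sketch(double d) {
  // Degree-1 polynomial initial guess (an assumption of this sketch); its
  // relative error on [0.5, 1] is at most 1/17.
  double x = 48.0 / 17.0 - (32.0 / 17.0) * d;
  // Each Newton-Raphson step x <- x * (2 - d * x) squares the relative error,
  // so four steps are more than enough for double precision here.
  for (int i = 0; i < 4; ++i)
    x = x * (2.0 - d * x);
  return x;
}

// Hypothetical helper: computes n / d as n * (1 / d), with d in [0.5, 1].
inline double divide_sketch(double n, double d) {
  return n * reciprocal_sketch(d);
}
```

A fixed-point version would follow the same steps but would also have to normalize the divisor and track the scaling of every intermediate value.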
---------
Signed-off-by: Shreeyash Pandey <shreeyash335@gmail.com>
Summary:
We use the infrastructure to stand up a pretend hosted environment on
the GPU. Part of that is calling exit codes and handling the callback.
Exiting from inside a GPU region is problematic as it actually relies on
a lot of GPU magic behind the scenes. This is at least *correct* now as
we use `quick_exit` on the CPU when the GPU calls `exit`. However,
calling `quick_exit` will interfere with instrumentation or benchmarking
that expects a nice teardown order. For normal execution we should do
the friendly option and let the loader utility clean everything up
manually.
Closes https://github.com/llvm/llvm-project/issues/153666
This patch introduces a new centralized AUXV (auxiliary vector) handling
mechanism for LLVM libc on Linux, replacing the previous scattered
implementation across multiple files.
## Key Changes:
### New Files:
- **libc/src/__support/OSUtil/linux/auxv.h**: New header library providing
  a clean interface for AUXV access with:
  - `auxv::Entry` struct for AUXV entries (type and value)
  - `auxv::Vector` class with iterator support for traversing AUXV
  - `auxv::get()` function for retrieving specific AUXV values
  - Thread-safe initialization with fallback mechanisms (prctl and
    /proc/self/auxv)
### Modified Files:
1. **libc/src/__support/OSUtil/linux/CMakeLists.txt**:
   - Added `auxv` header library declaration with proper dependencies:
     - libc.hdr.fcntl_macros
     - libc.src.__support.OSUtil.osutil
     - libc.src.__support.common
     - libc.src.__support.CPP.optional
     - libc.src.__support.threads.callonce
2. **libc/config/linux/app.h**:
   - Removed `AuxEntry` struct (moved to auxv.h as `auxv::Entry`)
   - Removed `auxv_ptr` from `AppProperties` struct
   - Simplified application properties structure
3. **libc/src/sys/auxv/linux/getauxval.cpp**:
   - Completely refactored to use new auxv.h interface
   - Removed ~200 lines of complex initialization code
   - Simplified to just call `auxv::get()` function
   - Removed dependencies on external symbols (mman, prctl, fcntl, read,
     close, open)
4. **libc/src/sys/auxv/linux/CMakeLists.txt**:
   - Updated dependencies to use new auxv header library
   - Removed dependencies on external symbols (prctl, mman, fcntl, unistd,
     etc.)
5. **libc/startup/linux/do_start.cpp**:
   - Updated to use new `auxv::Vector` interface
   - Changed from pointer-based to iterator-based AUXV traversal
   - Updated field names (`aux_entry->id` → `aux_entry.type`,
     `aux_entry->value` → `aux_entry.val`)
   - Added call to `auxv::Vector::initialize_unsafe()` for early AUXV setup
6. **libc/startup/linux/CMakeLists.txt**:
   - Added dependency on `libc.src.__support.OSUtil.linux.auxv`
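
To illustrate the iterator-based traversal described above, here is a minimal standalone sketch. It is not the contents of auxv.h: the namespace `auxv_sketch`, the constants, and the free function `get` with an out-parameter are all assumptions made for the example. Only the `type`/`val` field names and the idea of an iterable vector terminated by an `AT_NULL` entry come from the description.

```cpp
namespace auxv_sketch {

// Mirrors the (type, value) pair described above.
struct Entry {
  unsigned long type;
  unsigned long val;
};

constexpr unsigned long AT_NULL_SKETCH = 0;   // AT_NULL ends the vector
constexpr unsigned long AT_PAGESZ_SKETCH = 6; // AT_PAGESZ on Linux

// Hypothetical iterable view over an AT_NULL-terminated array of entries.
class Vector {
  const Entry *first;

public:
  explicit Vector(const Entry *entries) : first(entries) {}

  class Iterator {
    const Entry *cur;

  public:
    explicit Iterator(const Entry *p) : cur(p) {}
    const Entry &operator*() const { return *cur; }
    Iterator &operator++() {
      ++cur;
      return *this;
    }
    // Sentinel-style comparison: iteration stops at the AT_NULL entry, so
    // the right-hand side is ignored.
    bool operator!=(const Iterator &) const {
      return cur->type != AT_NULL_SKETCH;
    }
  };

  Iterator begin() const { return Iterator(first); }
  Iterator end() const { return Iterator(nullptr); }
};

// Hypothetical lookup: writes the value for `type` to `out` if present.
inline bool get(const Vector &v, unsigned long type, unsigned long &out) {
  for (const Entry &e : v) {
    if (e.type == type) {
      out = e.val;
      return true;
    }
  }
  return false;
}

} // namespace auxv_sketch
```

With a view like this, startup code can scan for the entries it needs in a single range-for loop instead of advancing a raw pointer by hand.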
Summary:
This RPC call does the final exiting. The callbacks were handled on the
GPU side and this is only 'valid' in the pretend mode where we treat the
GPU like a CPU program. Doing this keeps us from crashing and burning
if people continue using the program while this is running as `exit`
would tear down the offloading library in memory and lead to segfaults.
This just drops everything where it is and lets the process manager
clean it up for us.
#160404
- Implement POSIX function "faccessat"
- Remove the redundant param in the faccessat syscall in the access
implementation; the faccessat syscall does not take a flags arg
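
For reference, a minimal usage sketch of the POSIX-level function, assuming a POSIX host. The helper name `path_exists_sketch` is hypothetical. Note that the POSIX function takes a `flag` argument even though, as noted above, the raw faccessat syscall does not.

```cpp
#include <fcntl.h>  // AT_FDCWD
#include <unistd.h> // faccessat, F_OK

// Hypothetical helper: returns true if `path` exists. Relative paths are
// resolved against the current working directory because of AT_FDCWD.
inline bool path_exists_sketch(const char *path) {
  return faccessat(AT_FDCWD, path, F_OK, 0) == 0;
}
```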
Fast strlen implementations (naive wide-reads, SIMD-based, and
x86_64/aarch64-optimized versions) all may perform
technically-out-of-bound reads, which leads to reports under ASan,
HWASan (on ARM machines), and also TSan (which also has the capability
to detect heap out-of-bound reads). So, we need to explicitly disable
instrumentation in all three cases.
Tragically, Clang didn't support `[[gnu::no_sanitize]]` syntax until
recently, and since we're supporting both GCC and Clang, we have to
revert to `__attribute__` syntax.
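
A minimal sketch of the pattern, assuming GCC or Clang. The macro name `SKETCH_NO_SANITIZE_OOB` and the function `wide_strlen_sketch` are hypothetical, and this naive word-at-a-time loop is not the libc implementation. The sketch disables only ASan and TSan instrumentation; a real build would also need to cover HWASan.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical macro: GNU-style attribute syntax, accepted by GCC and Clang.
#define SKETCH_NO_SANITIZE_OOB                                                 \
  __attribute__((no_sanitize("address", "thread")))

// Word type that is allowed to alias char data.
typedef uintptr_t __attribute__((may_alias)) word_t;

SKETCH_NO_SANITIZE_OOB
inline size_t wide_strlen_sketch(const char *s) {
  const char *p = s;
  // Advance byte by byte until p is word-aligned.
  while (reinterpret_cast<uintptr_t>(p) % sizeof(word_t) != 0) {
    if (*p == '\0')
      return static_cast<size_t>(p - s);
    ++p;
  }
  const word_t ones = static_cast<word_t>(-1) / 0xFF; // 0x0101...01
  const word_t highs = ones << 7;                     // 0x8080...80
  const word_t *w = reinterpret_cast<const word_t *>(p);
  // (x - ones) & ~x & highs is nonzero iff some byte of x is zero. The
  // aligned load may read bytes past the terminator, which is the access
  // the sanitizers would otherwise report.
  while (((*w - ones) & ~*w & highs) == 0)
    ++w;
  // Locate the exact terminator inside the final word.
  p = reinterpret_cast<const char *>(w);
  while (*p != '\0')
    ++p;
  return static_cast<size_t>(p - s);
}
```

Because every wide load is word-aligned, it cannot cross a page boundary, so the over-read is harmless in practice even though it is out of bounds for the object.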
Summary:
This unifies the interface to just be a bunch of `load` and `store`
functions that optionally accept a mask / indices for gathers and
scatters with masks.
I had to rename this from `load` and `store` because it conflicts with
the other version in `op_generic`. I might just work around that with a
trait instead.
Summary:
The libcxx and compiler-rt already install their headers according
to the triple if this option is enabled. We should do this by default so
these don't get mixed up when people potentially combine multiple
toolchains.
Summary:
This was originally kept separate so it didn't pollute the name space,
but now I'm thinking it's just easier to bundle it in with the default
interface. This means that we'll have a bit of extra code for people
using the server.h file to handle libc opcodes, but it's minimal (3
functions) and it simplifies this.
I'm doing this because I'm hoping to move the GPU tester binary to
liboffload which handles `libc` opcodes internally except these. This is
the easier option compared to adding a hook to register custom handlers
there.
Summary:
The AMDGPU hack can be removed, and we no longer need to skip 90% of the
`HandleLLVMOptions` if we work around NVPTX earlier. Simplifies the
interface by removing duplicated logic and keeps the GPU targets from
being weirdly divergent on some flags.
Summary:
I landed a change in clang that allows integral vectors to implicitly
convert to boolean ones. This means I can simplify the interface and
remove the need to cast to bool on every use. Also do some other
cleanups of the traits.
In change #146863 we moved the definitions of the preinit/init/fini arrays
to a header but unintentionally moved them outside of the namespace. Since
the namespace also controls the visibility (through LIBC_NAMESPACE_DECL),
these symbols no longer have hidden visibility, which changes the codegen
from:
```
4: 4c11 ldr r4, [pc, #0x44] @ 0x4c <__libc_init_array+0x4c>
6: 4812 ldr r0, [pc, #0x48] @ 0x50 <__libc_init_array+0x50>
8: 447c add r4, pc
a: 4478 add r0, pc
c: 1b00 subs r0, r0, r4
```
to:
```
4: 4813 ldr r0, [pc, #0x4c] @ 0x54 <__libc_init_array+0x54>
6: 4914 ldr r1, [pc, #0x50] @ 0x58 <__libc_init_array+0x58>
8: 4478 add r0, pc
a: 4479 add r1, pc
c: 6804 ldr r4, [r0]
e: 6808 ldr r0, [r1]
10: 1b00 subs r0, r0, r4
```
The extra `ldr` will trigger a fault in cases where these symbols aren't
pointing to a valid memory location, which is sometimes the case when the
array is empty.