In 42622c7959, depending on the entire
utility library resulted in the lldb binary having more symbols that it
needs, duplicating those in liblldb, leading to odr violations. Now
instead we just expose the Host library's headers, which happens to be
enough for this to compile and link, referencing the symbols from
liblldb
Make sure all BUILD.bazel files under llvm-libc are specifying
package-level default_visibility and licenses.
Add top-level license notice and comment to new BUILD.bazel for
complex.h functions.
Since 6493345c5a the utility library is
needed by the driver. Since liblldb's exports are limited with
-exports_symbols_list on macOS, some symbols like
`__ZN12SelectHelper10FDSetWriteEi` are not exported from liblldb and
therefore cause linker failures on macOS only. In the cmake the driver
now depends on the full utility library, even though that leads to some
duplication of symbols, so it should be safe for us to do in bazel as
well.
This is the recommended way in bazel to differentiate between files that
require arc and those that require it be disabled. This matters
depending on the toolchain since the order of these flags may not have
been correct and we were relying on overwriting the default.
Load/unload GPU modules in global ctors/dtors instead of each time when
launching a kernel.
Loading GPU modules is a heavy-weight operation and synchronizes the GPU
context. Now that the modules are loaded ahead of time, asynchronously
launched kernels can run concurrently, see
https://discourse.llvm.org/t/how-to-lower-the-combination-of-async-gpu-ops-in-gpu-dialect.
The implementations of `embedBinary()` and `launchKernel()` use slightly
different mechanics at the moment but I prefer to not change the latter
more than necessary as part of this PR. I will prepare a follow-up NFC
for `launchKernel()` to align them again.
Create "public_header_deps" that is a convenient way to express
dependencies of a generated headers as a single (and same) target. It's
also convenient to use it in unit tests - which is also demonstrated in
this PR by adding the BUILD.bazel placeholder for test/include unit
tests, and creating a libc_test target for one of these tests.
See issue #134780.
- Add command line option `num-to-skip-size` to parameterize the size of
`NumToSkip` bytes in the decoder table. Default value will be 2, and
targets that need larger size can use 3.
- Keep all existing targets, except AArch64, to use size 2, and change
AArch64 to use size 3 since it run into the "disassembler decoding table
too large" error with size 2.
- Additional fixes on top of earlier revert: mark `decodeNumToSkip` as
static (not necessary anymore as the generated code is now in anonymous
namespace, but doing it for consistency) and incorporate Bazel build
changes from https://github.com/llvm/llvm-project/pull/136212
- Following is a rough reduction in size for the decoder tables by
switching to size 2.
```
Target Old Size New Size % Reduction
================================================
AArch64 153254 153254 0.00
AMDGPU 471566 412805 12.46
ARC 5724 5061 11.58
ARM 84936 73831 13.07
AVR 1497 1306 12.76
BPF 2172 1927 11.28
CSKY 10064 8692 13.63
Hexagon 47967 41965 12.51
Lanai 1108 982 11.37
LoongArch 24446 21621 11.56
MSP430 4200 3716 11.52
Mips 36330 31415 13.53
PPC 31897 28098 11.91
RISCV 37979 32790 13.66
Sparc 8331 7252 12.95
SystemZ 36722 32248 12.18
VE 48296 42873 11.23
XCore 2590 2316 10.58
Xtensa 3827 3316 13.35
```
Extend Bazel rule implementation to enforce that all transitive
dependencies of libc_header_library targets (used to implement
hand-in-hand code sharing via headers) indeed only contain header files.
This fixes Bazel portion of PR #133126.
`delta_bytes % (32 ceilDiv elementBitwidth) != 0` condition is incorrect
in https://github.com/llvm/llvm-project/pull/135014
For example, last load is issued to load only one last element of fp16.
Then `delta bytes = 2`, `(32 ceildiv 16) = 2`. In this case it will be
judged as word aligned. It will send to fast path but get all zeros for
the fp16 because it cross the word boundary.
In reality the equation should be just `delta_bytes % 4` , since a word
is 4 bytes. This PR fix the bug by amending the mod target to 4.
The 1:N dialect conversion driver has been deprecated. Use the regular
dialect conversion driver instead. This commit deletes the 1:N dialect
conversion driver.
Note for LLVM integration: If you are already using the regular dialect conversion, but still have argument materializations in your code base, simply delete all `addArgumentMaterialization` calls.
For details, see
https://discourse.llvm.org/t/rfc-merging-1-1-and-1-n-dialect-conversions/82513.
libc_function_deps and deps are now identical, as we no longer need or
have special treatment for libc_function targets. Merge these attributes
passed to the libc_test macro, and fix all relevant libc_test macro
invocations. This change is a no-op.
This concludes cleanup started in
9b13d34530.
Motivation: amdgpu buffer load instruction will return all zeros when
loading sub-word values. For example, assuming the buffer size is
exactly one word and we attempt to invoke
`llvm.amdgcn.raw.ptr.buffer.load.v2i32` starting from byte 2 of the
word, we will not receive the actual value of the buffer but all zeros
for the first word. This is because the boundary has been crossed for
the first word.
This PR come up with a fix to this problem, such that, it creates a
bounds check against the buffer load instruction. It will compare the
offset + vector size to see if the upper bound of the address will
exceed the buffer size. If it does, masked transfer read will be
optimized to `vector.load` + `arith.select`, else, it will continue to
fall back to default lowering of the masked vector load.
This macro is a no-op after 90c001ac9e:
libc_function macro now produce a "regular" cc_library target, without
modifying its name, and this target is intended to only be used in
tests.
Thus, libc_internal_target macro is no longer needed, and we can safely
treat libc_function rules and libc_support_library rules identically for
test purposes.
`libc_function_deps` attribute of a `libc_test` macro can also be
cleaned up, but I plan to do this in a subsequent change.
Instead of creating hundreds of implicit "filegroup" targets to keep
track of sources and textual headers required to build each libc
function or helper library, use Bazel aspects (see
https://bazel.build/versions/8.0.0/extending/aspects), which enable
transparent collection of transitive sources / textual headers while
walking the dependency DAG, and minimizes the Starlark overhead.
Co-authored-by: Jordan Rupprecht <rupprecht@google.com>
Replaces separate x86vector named intrinsic operations with direct calls
to LLVM intrinsic functions.
This rework reduces the number of named ops leaving only high-level MLIR
equivalents of whole intrinsic classes e.g., variants of AVX512 dot on
BF16 inputs. Dialect conversion applies LLVM intrinsic name mangling
further simplifying lowering logic.
The separate conversion step translating x86vector intrinsics into LLVM
IR is also eliminated. Instead, this step is now performed by the
existing llvm dialect infrastructure.
RFC:
https://discourse.llvm.org/t/rfc-simplify-x86-intrinsic-generation/85581