ControllerAccess provides an abstract interface for bidirectional RPC
between the executor (running JIT'd code) and the controller (containing
the llvm::orc::ExecutionSession). ControllerAccess implementations are
expected to implement IPC / RPC using a concrete communication method
(shared memory, pipes, sockets, native system IPC, etc).
Calls from executor to controller are made via callController, with
"handler tags" (addresses in the executor) specifying the target handler
in the controller. A handler must be associated in the controller with
the given tag for the call to succeed. This ensures that only registered
entry points in the controller can be used, and avoids leaking
controller addresses into the executor.
Calls in both directions are to "wrapper functions" that take a buffer
of bytes as input and return a buffer of bytes as output. In the ORC
runtime these must be `orc_rt_WrapperFunction`s (see
Session::handleWrapperCall). The interpretation of the byte buffers is
up to the wrapper functions: the ORC runtime imposes no restrictions on
how the bytes are to be interpreted.
ControllerAccess objects may be detached from the Session prior to
Session shutdown, in which case no further calls may be made in either
direction, and any pending results (from calls made that haven't
returned yet) should return errors. If the ControllerAccess class is
still attached at Session shutdown time it will be detached as part of
the shutdown process. The ControllerAccess::disconnect method must
support concurrent entry on multiple threads, and all callers must block
until they can guarantee that no further calls will be received or
accepted.
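A rough sketch of the shape of this contract, in C++ with hypothetical names and signatures (the real orc-rt types differ); only the structure described above, tag-addressed calls carrying opaque byte buffers and a blocking, concurrency-safe disconnect, is taken from the text:

```cpp
// Illustrative only: names and signatures below are assumptions, not the
// actual orc-rt API. The point is the shape of the contract: calls are
// addressed by a handler tag, payloads are opaque byte buffers, and
// disconnect() must tolerate concurrent entry and block until no further
// calls can be received or accepted.
#include <cstdint>
#include <functional>
#include <vector>

struct HandlerTag {
  uintptr_t ExecutorAddr = 0; // an address in the executor, used only as a key
};

class ControllerAccessSketch {
public:
  virtual ~ControllerAccessSketch() = default;

  // Send ArgBytes to the controller handler registered for Tag. OnResult is
  // invoked with the (opaque) reply bytes, or with an error if no handler is
  // registered for Tag or the connection has been detached.
  virtual void callController(
      HandlerTag Tag, std::vector<uint8_t> ArgBytes,
      std::function<void(std::vector<uint8_t> ReplyOrError)> OnResult) = 0;

  // Detach from the session. Safe to call from multiple threads; every
  // caller blocks until no further calls will be received or accepted, and
  // pending results are completed with errors.
  virtual void disconnect() = 0;
};
```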
In the dataflow framework, the builtin transfer function currently only
handles the GLValue result case of ConditionalOperator when the
true and false expression StorageLocations are exactly the same.
Ideally, we would introduce alias sets to handle the case where the Locs
are different. However, that is a larger change to the framework
(and we may need to introduce weak updates).
For now, do something simpler to at least handle when the GLValue is
immediately cast to an RValue, by making up a distinct StorageLocation
that holds the join of the true and false expression values (when not a
record). This seems to be the most common case, so it seems worth covering.
The case when an LValue is needed and can be updated later (and
thus needs a link to the original storage locations) seems more rare,
and we currently do not handle such updates either, so this intermediate
step is no different (for that case).
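For illustration, a hypothetical pair of examples (the function names are made up):

```cpp
// Handled by this change: the glvalue produced by the conditional operator is
// immediately converted to an rvalue (LValueToRValue cast), so a made-up
// StorageLocation holding the join of the two branch values is enough.
int read_branch(bool Cond, int A, int B) {
  return Cond ? A : B;
}

// Not improved by this change: the glvalue is written through, which would
// require a link back to the original storage locations (alias sets / weak
// updates); such updates were not handled before either.
void write_branch(bool Cond, int &A, int &B) {
  (Cond ? A : B) = 42;
}
```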
This PR extends the `__scoped_atomic` builtins with inc and dec functions.
They map to the LLVM IR `atomicrmw uinc_wrap` and `atomicrmw udec_wrap`
operations. These enable implementing OpenCL-style atomic_inc / atomic_dec
with wrap semantics on targets that support scoped atomics (e.g. GPUs).
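For reference, a plain C++ sketch of the wrap behavior of the underlying atomicrmw operations as described in the LLVM LangRef (the real builtins perform these updates atomically and take a memory order and scope):

```cpp
#include <cstdint>

// atomicrmw uinc_wrap: increment, wrapping to 0 once the stored value
// reaches the operand.
uint32_t uinc_wrap(uint32_t Old, uint32_t Val) {
  return Old >= Val ? 0 : Old + 1;
}

// atomicrmw udec_wrap: decrement, wrapping to the operand when the stored
// value is 0 or greater than the operand.
uint32_t udec_wrap(uint32_t Old, uint32_t Val) {
  return (Old == 0 || Old > Val) ? Val : Old - 1;
}
```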
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Split from #158900, this adds a PerThreadContainer that can use STL-like
indexed containers, based on a slightly refactored PerThreadTable.
---------
Co-authored-by: Joseph Huber <huberjn@outlook.com>
Enable it now that all of the tests pass under the internal shell. The
internal shell is slightly faster (10-15%) and also provides a better
debugging experience.
Reviewers: petrhosek, ilovepi, kadircet, HighCommander4
Reviewed By: ilovepi
Pull Request: https://github.com/llvm/llvm-project/pull/169540
This makes all of the clangd tests work with the internal shell.
Modifications needed for each test are as follows:
1. system-include-extractor.test was using variable expansion, which is
   not supported in the internal shell. This patch rewrites it to use
   the readfile mechanism along with Python. This isn't super pretty, but
   it is readily understandable, and there are only two tests across the
   monorepo that use this construction, so making it prettier is hard to
   justify.
2. include-cleaner-batch-fix.test - Was using the $'' construction to create
   newlines in a string. Simply replace it with multiple echo commands
   to be consistent with the rest of the repository.
3. index-tools.test - Just add IndexBenchmark to the clangd test
   depends, so the test now works unconditionally. This should
   significantly increase test coverage at little cost.
Reviewers: ilovepi, HighCommander4, petrhosek, kadircet
Reviewed By: ilovepi
Pull Request: https://github.com/llvm/llvm-project/pull/169539
This PR adds codegen for the `cir.await` ready and suspend branches. One
notable difference from the classic codegen is that, in the suspend branch,
it emits an `AwaitSuspendWrapper` function (`.__await_suspend_wrapper__init`)
that wraps the suspend logic and is always inlined. Example here:
https://godbolt.org/z/rWYGcaaG4
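For context, a minimal standard-C++ coroutine sketch (not the code from the linked example): the body of `await_suspend` below is the suspend logic that gets routed through the always-inlined wrapper.

```cpp
#include <coroutine>

struct Awaitable {
  bool await_ready() const noexcept { return false; }      // "ready" branch
  void await_suspend(std::coroutine_handle<>) noexcept {}  // "suspend" branch
  void await_resume() const noexcept {}                    // resume
};

struct Task {
  struct promise_type {
    Task get_return_object() { return {}; }
    std::suspend_never initial_suspend() noexcept { return {}; }
    std::suspend_never final_suspend() noexcept { return {}; }
    void return_void() {}
    void unhandled_exception() {}
  };
};

Task demo() {
  co_await Awaitable{};  // cir.await models the ready/suspend/resume split here
}
```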
Currently, -gsplit-dwarf and -mrelax are incompatible options in Clang.
The issue is that .dwo files should not contain any relocations, as they
are not processed by the linker. However, relaxable code emits
relocations in DWARF for debug ranges that reside in the .dwo file when
DWARF fission is enabled.
This patch makes DWARF fission compatible with RISC-V relaxations. It
uses the StartxEndx DWARF forms in .debug_rnglists.dwo, which allow
referencing addresses from .debug_addr instead of using absolute
addresses. This approach eliminates relocations from .dwo files.
Some globals (e.g., fir.global) have initialization regions that may
transitively reference other globals or type descriptors. Add
getInitRegion() to GlobalVariableOpInterface to retrieve these regions,
returning Region* (nullptr if the global uses attributes for
initialization, as with memref.global).
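A hedged usage sketch (the surrounding pass scaffolding and walk are assumptions; only `getInitRegion()` and its nullptr convention come from this change):

```cpp
// Fragment: assumes the MLIR headers defining GlobalVariableOpInterface and
// the relevant dialects are available.
void collectInitReferences(mlir::ModuleOp Module) {
  Module.walk([](GlobalVariableOpInterface Global) {
    if (mlir::Region *Init = Global.getInitRegion()) {
      // Globals with an initialization region (e.g. fir.global): traverse it
      // to find transitively referenced globals or type descriptors.
      Init->walk([](mlir::Operation *Op) { /* inspect Op's symbol uses */ });
    }
    // Otherwise (e.g. memref.global) initialization is attribute-based and
    // there is nothing to traverse.
  });
}
```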
It is possible for a fork to occur while MutexTSDs is held, which can then
cause a deadlock in the forked process when something attempts to lock it
again. Instead, add it to the enable/disable list of mutexes.
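A generic illustration of the hazard and the fix pattern (not the actual code; the real change adds MutexTSDs to the existing enable/disable machinery rather than installing new handlers):

```cpp
#include <pthread.h>

static pthread_mutex_t M = PTHREAD_MUTEX_INITIALIZER; // stand-in for MutexTSDs

// If a thread forks while another thread holds M, the child inherits M in a
// locked state and deadlocks on the next lock. Acquiring M before fork and
// releasing it on both sides avoids that.
static void Prepare() { pthread_mutex_lock(&M); }
static void Parent() { pthread_mutex_unlock(&M); }
static void Child() { pthread_mutex_unlock(&M); }

void InstallForkHandlers() { pthread_atfork(Prepare, Parent, Child); }
```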
Flags are now passed on construction/cloning. Remove the unnecessary
transferFlags call, and make the code independent of VPRecipeWithIRFlags to
support additional recipes in the future.
When individual elements of a vector are updated via a vector swizzle, the update needs to be handled as separate store operations to the individual vector elements.
Clang treats vectors as one unit, so if part of a vector needs to be updated, the whole vector is loaded, some elements are modified, and then the whole vector is stored back.
In HLSL, vector elements are handled separately. We need to avoid this load/modify/store sequence to prevent overwriting other vector elements that might be getting updated in parallel.
Fixes #152815
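A plain C++ stand-in (not HLSL and not the emitted code) showing why the whole-vector sequence is a problem for elements updated elsewhere:

```cpp
struct float4 { float x, y, z, w; };

// Whole-vector update: clobbers y and w even though the swizzle only
// names x and z.
void wholeVectorUpdate(float4 *V) {
  float4 Tmp = *V;  // load the entire vector
  Tmp.x = 1.0f;
  Tmp.z = 2.0f;
  *V = Tmp;         // store the entire vector back
}

// Per-element update: separate stores touch only the swizzled elements.
void perElementUpdate(float4 *V) {
  V->x = 1.0f;
  V->z = 2.0f;
}
```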
This is noted by the specification, and should save a dynamic
instruction.
Code size should be no worse than before, as the pairs of moves can
usually be turned into two 16-bit moves, but `fmv.d` is always a 32-bit
instruction.
LLVM can look through an `FSGNJ_D_IN32X` in
`RISCVInstrInfo::isCopyInstrImpl`, which helps copy propagation.
Currently, the declarative pattern rewrite generator will always print
the [source]:[line](s) from which a pattern came. This is a useful
debugging hint, but it causes problems when absolute paths are used as
arguments to mlir-tblgen (which LLVM's build rules automatically do).
Specifically, it causes the source to be tied to the build location,
harming reproducibility and our collective ability to get ccache hits
from, say, separate worktrees.
This commit resolves the issue by replacing absolute paths in these
"Generated from:" comments with their filenames. (The alternative would
have been to implement an entire file-prefix-map the way the C compilers
do, but since this is an isolated case, I chose to resolve it
locally.)
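Roughly, the substitution amounts to the following (a sketch, not the emitter's actual code; `llvm::sys::path::filename` from llvm/Support/Path.h is the relevant helper):

```cpp
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Path.h"
#include <string>

// Keep only the filename component of the pattern's source path when
// emitting the "Generated from:" comment, so generated code no longer
// bakes in the absolute build location.
std::string locationComment(llvm::StringRef SourcePath, unsigned Line) {
  std::string Name = llvm::sys::path::filename(SourcePath).str();
  return "// Generated from: " + Name + ":" + std::to_string(Line);
}
```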
/usr/bin/ld: tools/mlir/lib/Dialect/GPU/Pipelines/CMakeFiles/obj.MLIRGPUPipelines.dir/GPUToXeVMPipeline.cpp.o: in function `mlir::gpu::buildLowerToXeVMPassPipeline(mlir::OpPassManager&, mlir::gpu::GPUToXeVMPipelineOptions const&)':
GPUToXeVMPipeline.cpp:(.text._ZN4mlir3gpu28buildLowerToXeVMPassPipelineERNS_13OpPassManagerERKNS0_24GPUToXeVMPipelineOptionsE+0x1293): undefined reference to `mlir::createConvertVectorToLLVMPass()'
On targets that require even-aligned 64-bit VGPRs, GWS operands
require even alignment of a 32-bit operand. Previously we had a hacky
post-processing step which added an implicit operand to try to manage
the constraint. This would require special casing in other passes
to avoid breaking the operand constraint. This moves the handling
into the instruction definition, so other passes no longer need
to consider this edge case. MC still does need to special case this,
to print/parse as a 32-bit register. This still ends up being
less work overall than introducing even-aligned 32-bit register classes.
This should also be applied to the image special case.
I noticed that we had a hardcoded value of 4 for the pcrel section
relocations, which seems like an issue given that we recently added
support for 1-byte branch relocations in
https://github.com/llvm/llvm-project/pull/164439. The code included an
assert that the relevant relocation had the BYTE4 attribute, but that is
actually not enough to use a hardcoded value of 4: we need to assert
that the *other* `BYTE<n>` attributes are not set either.
However, since we did not support local branch relocations, that doesn't
seem to have mattered in practice. That said, local branch relocations
can be emitted by compilers, and ld64 does handle the 4-byte version of
them, so I've added support for it here.
ld64 actually seems to reject 1-byte section relocations, so the
questionable code is probably fine (minus the incorrect assert). So we
have two options: add an equivalent check in LLD, or just support 1-byte
local branch relocations. Supporting them requires less code, so I've
gone with that option here.
This patch adds parsing, semantic handling, and diagnostics for the
OpenMP 6.1 `fb_nullify` and `fb_preserve` fallback modifiers used with
the `need_device_ptr` map modifier.
We've finished all of the clauses/etc that we're going to use this
visitor for, so we can remove the SourceLocation we used just for that,
and replace all NYI with unreachables.
This hasn't been strictly necessary since c897c13dde.
Practically this makes little difference; we still enable IPRA
by default, which implies this option. With the explicit force removed,
-enable-ipra=0 has the expected effect on the pass pipeline of removing
the DummyCGSCC runs.