Commit Graph

124 Commits

Author SHA1 Message Date
Joseph Huber
eea62159e8 [Offload] Make the RPC thread sleep briefly when idle (#168596)
Summary:
We start this thread if the RPC client symbol is detected in the loaded
binary. We should make this sleep if there's no work to avoid the thread
running at high priority when the (scarecely used) RPC call is actually
required. So, right now after 25 microseconds we will assume the server
is inactive and begin sleeping. This resets once we do find work.

AMD supports a more intelligent way to do this. HSA signals can wake a
sleeping thread from the kernel, and signals can be sent from the GPU
side. This would be nice to have and I'm planning on working with it in
the future to make this infrastructure more usable with existing AMD
workloads.
2025-11-19 15:56:25 -06:00
Kevin Sala Penades
1a86f0aae7 [Offload] Add device info for shared memory (#167817) 2025-11-13 11:00:12 -08:00
Joseph Huber
aaddd8d38a [OpenMP] Fix tests relying on the heap size variable
Summary:
I made that an unimplemented error, but forgot that it was used for this
environment variable.
2025-11-06 13:00:26 -06:00
Joseph Huber
670c453aeb [Offload] Remove handling for device memory pool (#163629)
Summary:
This was a lot of code that was only used for upstream LLVM builds of
AMDGPU offloading. We have a generic and fast `malloc` in `libc` now so
just use that. Simplifies code, can be added back if we start providing
alternate forms but I don't think there's a single use-case that would
justify it yet.
2025-11-06 10:15:18 -06:00
Robert Imschweiler
dc94f2cbad [Offload] Add device UID (#164391)
Introduced in OpenMP 6.0, the device UID shall be a unique identifier of
a device on a given system. (Not necessarily a UUID.) Since it is not
guaranteed that the (U)UIDs defined by the device vendor libraries, such
as HSA, do not overlap with those of other vendors, the device UIDs in
offload are always combined with the offload plugin name. In case the
vendor library does not specify any device UID for a given device, we
fall back to the offload-internal device ID.
The device UID can be retrieved using the `llvm-offload-device-info`
tool.
2025-11-04 20:15:47 +01:00
Nicole Aschenbrenner
16641ad8a2 [OpenMP] Adds omp_target_is_accessible routine (#138294)
Adds omp_target_is_accessible routine.
Refactors common code from omp_target_is_present to work for both
routines.

---------

Co-authored-by: Shilei Tian <i@tianshilei.me>
2025-10-22 17:35:16 +02:00
Alex Duran
45757b9284 [OFFLOAD] Remove unused init_device_info plugin interface (#162650)
This was used for the old interop code. It's dead code after #143491
2025-10-09 08:38:24 -05:00
Kevin Sala Penades
01d761a776 [Offload] Use Error for allocating/deallocating in plugins (#160811)
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-09-26 13:50:00 -05:00
Ross Brunton
e60a5733f0 [Offload] Print Image location rather than casting it (#160309)
This squishes a warning where the runtime tries to bind a StringRef to
a `%p`.
2025-09-24 10:57:55 +01:00
Alexey Sachkov
bb584644e9 [Offload][NFC] Avoid temporary string copies in InfoTreeNode (#159372) 2025-09-23 12:21:57 -05:00
Tobias Stadler
dfbd76bda0 [Remarks] Restructure bitstream remarks to be fully standalone (#156715)
Currently there are two serialization modes for bitstream Remarks:
standalone and separate. The separate mode splits remark metadata (e.g.
the string table) from actual remark data. The metadata is written into
the object file by the AsmPrinter, while the remark data is stored in a
separate remarks file. This means we can't use bitstream remarks with
tools like opt that don't generate an object file. Also, it is confusing
to post-process bitstream remarks files, because only the standalone
files can be read by llvm-remarkutil. We always need to use dsymutil
to convert the separate files to standalone files, which only works for
MachO. It is not possible for clang/opt to directly emit bitstream
remark files in standalone mode, because the string table can only be
serialized after all remarks were emitted.

Therefore, this change completely removes the separate serialization
mode. Instead, the remark string table is now always written to the end
of the remarks file. This requires us to tell the serializer when to
finalize remark serialization. This automatically happens when the
serializer goes out of scope. However, often the remark file goes out of
scope before the serializer is destroyed. To diagnose this, I have added
an assert to alert users that they need to explicitly call
finalizeLLVMOptimizationRemarks.

This change paves the way for further improvements to the remark
infrastructure, including more tooling (e.g. #159784), size optimizations
for bitstream remarks, and more.

Pull Request: https://github.com/llvm/llvm-project/pull/156715
2025-09-22 16:41:39 +01:00
Joseph Huber
23efc67e19 [Offload] Remove non-blocking allocation type (#159851)
Summary:
This was originally added in as a hack to work around CUDA's limitation
on allocation. The `libc` implementation now isn't even used for CUDA so
this code is never hit. Even if this case, this code never truly worked.

A true solution would be to use CUDA's virtual memory API instead to
allocate 2MiB slabs independenctly from the normal memory management
done in the stream.
2025-09-20 09:07:14 -05:00
Joseph Huber
51e3c3d51b [Offload] Implement 'olIsValidBinary' in offload and clean up (#159658)
Summary:
This exposes the 'isDeviceCompatible' routine for checking if a binary
*can* be loaded. This is useful if people don't want to consume errors
everywhere when figuring out which image to put to what device.

I don't know if this is a good name, I was thining like `olIsCompatible`
or whatever. Let me know what you think.

Long term I'd like to be able to do something similar to what OpenMP
does where we can conditionally only initialize devices if we need them.
That's going to be support needed if we want this to be more
generic.
2025-09-19 12:15:57 -05:00
Nick Sarnie
f74583fbe8 [offload] Fix build with debug libomptarget (#159144)
Currently get this error
```
offload/plugins-nextgen/common/src/PluginInterface.cpp:859:63: error: member reference type 'StringRef' is not a pointer; did you mean to use '.'?
```

We pass the full image binary now so we can't really print anything
useful here.

Seems introduced in https://github.com/llvm/llvm-project/pull/158748.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-09-16 18:40:02 +00:00
Joseph Huber
e7101dac9c [Offload] Copy loaded images into managed storage (#158748)
Summary:
Currently we have this `__tgt_device_image` indirection which just takes
a reference to some pointers. This was all find and good when the only
usage of this was from a section of GPU code that came from an ELF
constant section. However, we have expanded beyond that and now need to
worry about managing lifetimes. We have code that references the image
even after it was loaded internally. This patch changes the
implementation to instaed copy the memory buffer and manage it locally.

This PR reworks the JIT and other image handling to directly manage its
own memory. We now don't need to duplicate this behavior externally at
the Offload API level. Also we actually free these if the user unloads
them.

Upside, less likely to crash and burn. Downside, more latency when
loading an image.
2025-09-16 08:57:28 -05:00
Joseph Huber
5d550bf41c [OpenMP] Move `__omp_rtl_data_environment' handling to OpenMP (#157182)
Summary:
This operation is done every time we load a binary, this behavior should
be moved into OpenMP since it concerns an OpenMP specific data struct.
This is a little messy, because ideally we should only be using public
APIs, but more can be extracted later.
2025-09-08 09:58:38 -05:00
Ross Brunton
32beea0605 [OpenMP][Offload] Mark SPMD_NO_LOOP as a valid exec mode (#155990)
This was added in #154105 , but was not added to the plugin interface's
list of valid modes.
2025-09-01 11:27:24 +01:00
Ross Brunton
9e5d8bd3d1 [Offload] Improve olDestroyQueue logic (#153041)
Previously, `olDestroyQueue` would not actually destroy the queue,
instead leaving it for the device to clean up when it was destroyed.
Now, the queue is either released immediately if it is complete or put
into a list of "pending" queues if it is not. Whenever we create a new
queue, we check this list to see if any are now completed. If there are
any we release their resources and use them instead of pulling from
the pool.

This prevents long running programs that create and drop many queues
without syncing them from leaking memory all over the place.
2025-08-29 09:39:00 +01:00
Dominik Adamski
87db8e9130 [OpenMP][Offload] Add SPMD-No-Loop mode to OpenMP offload runtime (#154105)
Kernels which are marked as SPMD-No-Loop should be launched with
sufficient number of teams and threads to cover loop iteration space.

No-Loop mode is described in RFC:

https://discourse.llvm.org/t/rfc-no-loop-mode-for-openmp-gpu-kernels/87517/
2025-08-28 09:19:14 +02:00
Callum Fare
0b18d2da70 [Offload] Implement olMemFill (#154102)
Implement olMemFill to support filling device memory with arbitrary
length patterns. AMDGPU support will be added in a follow-up PR.
2025-08-22 14:31:16 +01:00
Ross Brunton
4c0c295775 [Offload] OL_EVENT_INFO_IS_COMPLETE (#153194)
A simple info query for events that returns whether the event is
complete or not.
2025-08-22 13:40:31 +01:00
Ross Brunton
2c11a83691 [Offload] Add olCalculateOptimalOccupancy (#142950)
This is equivalent to `cuOccupancyMaxPotentialBlockSize`. It is
currently
only implemented on Cuda; AMDGPU and Host return unsupported.

---------

Co-authored-by: Callum Fare <callum@codeplay.com>
2025-08-19 15:16:47 +01:00
Abhinav Gaba
79cf877627 [Offload] Introduce dataFence plugin interface. (#153793)
The purpose of this fence is to ensure that any `dataSubmit`s inserted
into a queue before a `dataFence` finish before finish before any
`dataSubmit`s
inserted after it begin.

This is a no-op for most queues, since they are in-order, and by design
any operations inserted into them occur in order.

But the interface is supposed to be functional for out-of-order queues.

The addition of the interface means that any operations that rely on
such ordering (like ATTACH map-type support in #149036) can invoke it,
without worrying about whether the underlying queue is in-order or
out-of-order.

Once a plugin supports out-of-order queues, the plugin can implement
this function, without requiring any change at the libomptarget level.

---------

Co-authored-by: Alex Duran <alejandro.duran@intel.com>
2025-08-15 11:49:35 -07:00
Ross Brunton
30c7951136 [Offload] olLaunchHostFunction (#152482)
Add an `olLaunchHostFunction` method that allows enqueueing host work
to the stream.
2025-08-15 09:39:48 +01:00
Ross Brunton
910d7e90bf [Offload] Make olLaunchKernel test thread safe (#149497)
This sprinkles a few mutexes around the plugin interface so that the
olLaunchKernel CTS test now passes when ran on multiple threads.

Part of this also involved changing the interface for device synchronise
so that it can optionally not free the underlying queue (which
introduced a race condition in liboffload).
2025-08-08 10:57:04 +01:00
Ross Brunton
a44532544b [Offload] Don't create events for empty queues (#152304)
Add a device function to check if a device queue is empty. If liboffload
tries to create an event for an empty queue, we create an "empty" event
that is already complete.

This allows `olCreateEvent`, `olSyncEvent` and `olWaitEvent` to run
quickly for empty queues.
2025-08-07 10:16:33 +01:00
hidekisaito
83e5a99ff6 [AMDGPU][Offload] Enable memory manager use for up to ~3GB allocation size in omp_target_alloc (#151882)
Enables AMD data center class GPUs to use memory manager memory pooling
up to 3GB allocation by default, up from the "1 << 13" threshold that
all plugin-nextgen devices use.
2025-08-06 14:41:20 -07:00
Alex Duran
f092b820d1 [OFFLOAD] Fix typo in assert (#152316)
Fixes an issue introduced by PR https://github.com/llvm/llvm-project/pull/143491.
2025-08-06 17:01:47 +02:00
Alex Duran
66d1c37eb6 [OFFLOAD][OPENMP] 6.0 compatible interop interface (#143491)
The following patch introduces a new interop interface implementation
with the following characteristics:

* It supports the new 6.0 prefer_type specification
* It supports both explicit objects (from interop constructs) and
implicit objects (from variant calls).
* Implements a per-thread reuse mechanism for implicit objects to reduce
overheads.
* It provides a plugin interface that allows selecting the supported
interop types, and managing all the backend related interop operations
(init, sync, ...).
* It enables cooperation with the OpenMP runtime to allow progress on
OpenMP synchronizations.
* It cleanups some vendor/fr_id mismatchs from the current query
routines.
* It supports extension to define interop callbacks for library cleanup.
2025-08-06 16:34:39 +02:00
Ross Brunton
ae44418f28 [Offload] Erase entries from JIT cache when program is destroyed (#148847)
When `unloadBinary` is called, any entries in the JITEngine's cache
for that binary will be cleared. This fixes a nasty issue with
liboffload program handles. If two handles happen to have had the same
address (after one was free'd, for example), the cache would be hit and
return the wrong program.
2025-07-25 16:11:30 +01:00
Joseph Huber
b53be5f4b2 [LLVM] Update CUDA ELF flags for their new ABI (#149534)
Summary:
We rely on these flags to do things in the runtime and print the
contents of binaries correctly. CUDA updated their ABI encoding recently
and we didn't handle that. it's a new ABI entirely so we just select on
it when it shows up.

Fixes: https://github.com/llvm/llvm-project/issues/148703
2025-07-21 14:38:03 -05:00
Ross Brunton
311847be4c [Offload] Allow "tagging" device info entries with offload keys (#147317)
When generating the device info tree, nodes can be marked with an
offload Device Info value. The nodes can also look up children based
on this value.
2025-07-18 14:27:34 +01:00
Ross Brunton
abb878438a [Offload] Allow querying the size of globals (#147698)
The `GlobalTy` helper has been extended to make both the Size and Ptr be
optional. Now `getGlobalMetadataFromDevice`/`Image` is able to write the
size of the global to the struct, instead of just verifying it.
2025-07-10 12:05:31 +01:00
Ross Brunton
8c06d0e547 [Offload] Generate OffloadInfo.inc (#147316)
This is a generated file which contains a macro for all Device Info
keys. This is visible to the plugin interface so that it can use the
definitions in a future patch.
2025-07-09 11:35:22 +01:00
Ross Brunton
8e104d69fc [Offload] Provide proper memory management for Images on host device (#146066)
The `unloadBinaryImpl` method on the host plugin is now implemented
properly (rather than just being a stub). When an image is unloaded,
it is deallocated and the library associated with it is closed.
2025-07-08 12:42:06 +01:00
Ross Brunton
4f02965ae2 [Offload] Store kernel name in GenericKernelTy (#142799)
GenericKernelTy has a pointer to the name that was used to create it.
However, the name passed in as an argument may not outlive the kernel.
Instead, GenericKernelTy now contains a std::string, and copies the
name into there.
2025-07-02 14:11:05 +01:00
Ross Brunton
0870c8838b [Offload] Add an unloadBinary interface to PluginInterface (#143873)
This allows removal of a specific Image from a Device, rather than
requiring all image data to outlive the device they were created for.

This is required for `ol_program_handle_t`s, which now specify the
lifetime of the buffer used to create the program.
2025-06-25 14:53:18 +01:00
Ross Brunton
4359e55838 [Offload] Properly report errors when jit compiling (#145498)
Previously, if a binary failed to load due to failures when jit
compiling, the function would return success with nullptr. Now it
returns a new plugin error, `COMPILE_FAILURE`.
2025-06-24 16:27:12 +01:00
Ross Brunton
f242360e15 [Offload] Add type information to device info nodes (#144535)
Rather than being "stringly typed", store values as a std::variant that
can hold various types. This means that liboffload doesn't have to do
any string parsing for integer/bool device info keys.
2025-06-20 09:05:05 -05:00
Ross Brunton
e6a3579653 [Offload] Replace device info queue with a tree (#144050)
Previously, device info was returned as a queue with each element having
a "Level" field indicating its nesting level. This replaces this queue
with a more traditional tree-like structure.

This should not result in a change to the output of
`llvm-offload-device-info`.
2025-06-13 09:22:47 -05:00
Ethan Luis McDonough
67ff66e677 [PGO][Offload] Fix offload coverage mapping (#143490)
This pull request fixes coverage mapping on GPU targets. 

- It adds an address space cast to the coverage mapping generation pass.
- It reads the profiled function names from the ELF directly. Reading it
from public globals was causing issues in cases where multiple
device-code object files are linked together.
2025-06-10 20:19:38 -05:00
Callum Fare
f44df93a9c [Offload] Explicitly create directories that contain tablegen output (#142817)
This isn't required when building with Ninja, but with the Makefile
generator these directories don't get implicitly created.
2025-06-04 13:46:19 -05:00
Callum Fare
b78bc35d16 [Offload] Don't check in generated files (#141982)
Previously we decided to check in files that we generate with tablegen.
The justification at the time was that it helped reviewers unfamiliar
with `offload-tblgen` see the actual changes to the headers in PRs.
After trying it for a while, it's ended up causing some headaches and is
also not how tablegen is used elsewhere in LLVM.

This changes our use of tablegen to be more conventional. Where
possible, files are still clang-formatted, but this is no longer a hard
requirement. Because `OffloadErrcodes.inc` is shared with libomptarget
it now gets generated in a more appropriate place.
2025-06-03 10:39:04 -05:00
Joseph Huber
eb9ed93fce [Offload] Optimistically accept SM architectures (#142399)
Summary:
We try to clamp these to ones known to work, but we should probably just
optimistically accept these. I'd prefer to update the flag check, but
since NVIDIA refuses to publish their ELF format it's too much effort to
reverse engineer.

Fixes: https://github.com/llvm/llvm-project/issues/138532
2025-06-02 14:32:05 -05:00
Ross Brunton
050892d2f8 [Offload] Use new error code handling mechanism and lower-case messages (#139275)
[Offload] Use new error code handling mechanism

This removes the old ErrorCode-less error method and requires
every user to provide a concrete error code. All calls have been
updated.

In addition, for consistency with error messages elsewhere in LLVM, all
messages have been made to start lower case.
2025-05-20 08:50:20 -05:00
Ross Brunton
1532ee6916 [Offload] Add Error Codes to PluginInterface (#138258)
A new ErrorCode enumeration is present in PluginInterface which can
be used when returning an llvm::Error from offload and PluginInterface
functions.

This enum must be kept up to sync with liboffload's ol_errc_t enum, so
both are automatically generated from liboffload's enum definition.

Some error codes have also been shuffled around to allow for future
work. Note that this patch only adds the machinery; actual error codes
will be added in a future patch.

~~Depends on #137339 , please ignore first commit of this MR.~~ This has
been merged.
2025-05-19 09:38:34 -05:00
Dhruva Chakrabarti
f965996cfb [Offload] Remove unused field IsBareKernel. (#139815) 2025-05-13 17:35:55 -07:00
Joseph Huber
92bba68634 [Offload] Fix handling of 'bare' mode when environment missing (#136794)
Summary:
We treated the missing kernel environment as a unique mode, but it was
kind of this random bool that was doing the same thing and it explicitly
expects the kernel environment to be zero. It broke after the previous
change since it used to default to SPMD and didn't handle zero in any of
the other cases despite being used. This fixes that and queries for it
without needing to consume an error.
2025-04-23 08:16:39 -05:00
Joseph Huber
25bf4e262c [Offload] Remove handling for COV4 binaries from offload/ (#131033)
Summary:
We moved from cov4 to cov5 a long time ago, and it guards simplifying
some front end code, so we should be able to move up with this.
2025-03-24 18:58:20 -05:00
Ethan Luis McDonough
c50d39f073 [PGO][Offload] Allow PGO flags to be used on GPU targets (#94268)
This pull request is the third part of an ongoing effort to extends PGO
instrumentation to GPU device code and depends on
https://github.com/llvm/llvm-project/pull/93365. This PR makes the
following changes:

- Allows PGO flags to be supplied to GPU targets
- Pulls version global from device
- Modifies `__llvm_write_custom_profile` and `lprofWriteDataImpl` to
allow the PGO version to be overridden
2025-03-19 19:01:38 -05:00