
intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-01-13 19:08:21 +08:00

Author	SHA1	Message	Date
Joseph Huber	eea62159e8	[Offload] Make the RPC thread sleep briefly when idle (#168596 ) Summary: We start this thread if the RPC client symbol is detected in the loaded binary. We should make this sleep if there's no work to avoid the thread running at high priority when the (scarcely used) RPC call is actually required. So, right now after 25 microseconds we will assume the server is inactive and begin sleeping. This resets once we do find work. AMD supports a more intelligent way to do this. HSA signals can wake a sleeping thread from the kernel, and signals can be sent from the GPU side. This would be nice to have and I'm planning on working with it in the future to make this infrastructure more usable with existing AMD workloads.	2025-11-19 15:56:25 -06:00
Kevin Sala Penades	1a86f0aae7	[Offload] Add device info for shared memory (#167817 )	2025-11-13 11:00:12 -08:00
Joseph Huber	aaddd8d38a	[OpenMP] Fix tests relying on the heap size variable Summary: I made that an unimplemented error, but forgot that it was used for this environment variable.	2025-11-06 13:00:26 -06:00
Joseph Huber	670c453aeb	[Offload] Remove handling for device memory pool (#163629 ) Summary: This was a lot of code that was only used for upstream LLVM builds of AMDGPU offloading. We have a generic and fast `malloc` in `libc` now so just use that. Simplifies code, can be added back if we start providing alternate forms but I don't think there's a single use-case that would justify it yet.	2025-11-06 10:15:18 -06:00
Robert Imschweiler	dc94f2cbad	[Offload] Add device UID (#164391 ) Introduced in OpenMP 6.0, the device UID shall be a unique identifier of a device on a given system. (Not necessarily a UUID.) Since it is not guaranteed that the (U)UIDs defined by the device vendor libraries, such as HSA, do not overlap with those of other vendors, the device UIDs in offload are always combined with the offload plugin name. In case the vendor library does not specify any device UID for a given device, we fall back to the offload-internal device ID. The device UID can be retrieved using the `llvm-offload-device-info` tool.	2025-11-04 20:15:47 +01:00
Nicole Aschenbrenner	16641ad8a2	[OpenMP] Adds omp_target_is_accessible routine (#138294 ) Adds omp_target_is_accessible routine. Refactors common code from omp_target_is_present to work for both routines. --------- Co-authored-by: Shilei Tian <i@tianshilei.me>	2025-10-22 17:35:16 +02:00
Alex Duran	45757b9284	[OFFLOAD] Remove unused init_device_info plugin interface (#162650 ) This was used for the old interop code. It's dead code after #143491	2025-10-09 08:38:24 -05:00
Kevin Sala Penades	01d761a776	[Offload] Use Error for allocating/deallocating in plugins (#160811 ) Co-authored-by: Joseph Huber <huberjn@outlook.com>	2025-09-26 13:50:00 -05:00
Ross Brunton	e60a5733f0	[Offload] Print Image location rather than casting it (#160309 ) This squishes a warning where the runtime tries to bind a StringRef to a `%p`.	2025-09-24 10:57:55 +01:00
Alexey Sachkov	bb584644e9	[Offload][NFC] Avoid temporary string copies in InfoTreeNode (#159372 )	2025-09-23 12:21:57 -05:00
Tobias Stadler	dfbd76bda0	[Remarks] Restructure bitstream remarks to be fully standalone (#156715 ) Currently there are two serialization modes for bitstream Remarks: standalone and separate. The separate mode splits remark metadata (e.g. the string table) from actual remark data. The metadata is written into the object file by the AsmPrinter, while the remark data is stored in a separate remarks file. This means we can't use bitstream remarks with tools like opt that don't generate an object file. Also, it is confusing to post-process bitstream remarks files, because only the standalone files can be read by llvm-remarkutil. We always need to use dsymutil to convert the separate files to standalone files, which only works for MachO. It is not possible for clang/opt to directly emit bitstream remark files in standalone mode, because the string table can only be serialized after all remarks were emitted. Therefore, this change completely removes the separate serialization mode. Instead, the remark string table is now always written to the end of the remarks file. This requires us to tell the serializer when to finalize remark serialization. This automatically happens when the serializer goes out of scope. However, often the remark file goes out of scope before the serializer is destroyed. To diagnose this, I have added an assert to alert users that they need to explicitly call finalizeLLVMOptimizationRemarks. This change paves the way for further improvements to the remark infrastructure, including more tooling (e.g. #159784), size optimizations for bitstream remarks, and more. Pull Request: https://github.com/llvm/llvm-project/pull/156715	2025-09-22 16:41:39 +01:00
Joseph Huber	23efc67e19	[Offload] Remove non-blocking allocation type (#159851 ) Summary: This was originally added in as a hack to work around CUDA's limitation on allocation. The `libc` implementation now isn't even used for CUDA so this code is never hit. Even in this case, this code never truly worked. A true solution would be to use CUDA's virtual memory API instead to allocate 2MiB slabs independently from the normal memory management done in the stream.	2025-09-20 09:07:14 -05:00
Joseph Huber	51e3c3d51b	[Offload] Implement 'olIsValidBinary' in offload and clean up (#159658 ) Summary: This exposes the 'isDeviceCompatible' routine for checking if a binary can be loaded. This is useful if people don't want to consume errors everywhere when figuring out which image to put to what device. I don't know if this is a good name, I was thinking like `olIsCompatible` or whatever. Let me know what you think. Long term I'd like to be able to do something similar to what OpenMP does where we can conditionally only initialize devices if we need them. That's going to be support needed if we want this to be more generic.	2025-09-19 12:15:57 -05:00
Nick Sarnie	f74583fbe8	[offload] Fix build with debug libomptarget (#159144 ) Currently get this error ``` offload/plugins-nextgen/common/src/PluginInterface.cpp:859:63: error: member reference type 'StringRef' is not a pointer; did you mean to use '.'? ``` We pass the full image binary now so we can't really print anything useful here. Seems introduced in https://github.com/llvm/llvm-project/pull/158748. --------- Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com> Co-authored-by: Joseph Huber <huberjn@outlook.com>	2025-09-16 18:40:02 +00:00
Joseph Huber	e7101dac9c	[Offload] Copy loaded images into managed storage (#158748 ) Summary: Currently we have this `__tgt_device_image` indirection which just takes a reference to some pointers. This was all fine and good when the only usage of this was from a section of GPU code that came from an ELF constant section. However, we have expanded beyond that and now need to worry about managing lifetimes. We have code that references the image even after it was loaded internally. This patch changes the implementation to instead copy the memory buffer and manage it locally. This PR reworks the JIT and other image handling to directly manage its own memory. We now don't need to duplicate this behavior externally at the Offload API level. Also we actually free these if the user unloads them. Upside, less likely to crash and burn. Downside, more latency when loading an image.	2025-09-16 08:57:28 -05:00
Joseph Huber	5d550bf41c	[OpenMP] Move `__omp_rtl_data_environment' handling to OpenMP (#157182 ) Summary: This operation is done every time we load a binary, this behavior should be moved into OpenMP since it concerns an OpenMP specific data struct. This is a little messy, because ideally we should only be using public APIs, but more can be extracted later.	2025-09-08 09:58:38 -05:00
Ross Brunton	32beea0605	[OpenMP][Offload] Mark `SPMD_NO_LOOP` as a valid exec mode (#155990 ) This was added in #154105 , but was not added to the plugin interface's list of valid modes.	2025-09-01 11:27:24 +01:00
Ross Brunton	9e5d8bd3d1	[Offload] Improve `olDestroyQueue` logic (#153041 ) Previously, `olDestroyQueue` would not actually destroy the queue, instead leaving it for the device to clean up when it was destroyed. Now, the queue is either released immediately if it is complete or put into a list of "pending" queues if it is not. Whenever we create a new queue, we check this list to see if any are now completed. If there are any we release their resources and use them instead of pulling from the pool. This prevents long running programs that create and drop many queues without syncing them from leaking memory all over the place.	2025-08-29 09:39:00 +01:00
Dominik Adamski	87db8e9130	[OpenMP][Offload] Add SPMD-No-Loop mode to OpenMP offload runtime (#154105 ) Kernels which are marked as SPMD-No-Loop should be launched with sufficient number of teams and threads to cover loop iteration space. No-Loop mode is described in RFC: https://discourse.llvm.org/t/rfc-no-loop-mode-for-openmp-gpu-kernels/87517/	2025-08-28 09:19:14 +02:00
Callum Fare	0b18d2da70	[Offload] Implement olMemFill (#154102 ) Implement olMemFill to support filling device memory with arbitrary length patterns. AMDGPU support will be added in a follow-up PR.	2025-08-22 14:31:16 +01:00
Ross Brunton	4c0c295775	[Offload] `OL_EVENT_INFO_IS_COMPLETE` (#153194 ) A simple info query for events that returns whether the event is complete or not.	2025-08-22 13:40:31 +01:00
Ross Brunton	2c11a83691	[Offload] Add olCalculateOptimalOccupancy (#142950 ) This is equivalent to `cuOccupancyMaxPotentialBlockSize`. It is currently only implemented on Cuda; AMDGPU and Host return unsupported. --------- Co-authored-by: Callum Fare <callum@codeplay.com>	2025-08-19 15:16:47 +01:00
Abhinav Gaba	79cf877627	[Offload] Introduce dataFence plugin interface. (#153793 ) The purpose of this fence is to ensure that any `dataSubmit`s inserted into a queue before a `dataFence` finish before any `dataSubmit`s inserted after it begin. This is a no-op for most queues, since they are in-order, and by design any operations inserted into them occur in order. But the interface is supposed to be functional for out-of-order queues. The addition of the interface means that any operations that rely on such ordering (like ATTACH map-type support in #149036) can invoke it, without worrying about whether the underlying queue is in-order or out-of-order. Once a plugin supports out-of-order queues, the plugin can implement this function, without requiring any change at the libomptarget level. --------- Co-authored-by: Alex Duran <alejandro.duran@intel.com>	2025-08-15 11:49:35 -07:00
Ross Brunton	30c7951136	[Offload] `olLaunchHostFunction` (#152482 ) Add an `olLaunchHostFunction` method that allows enqueueing host work to the stream.	2025-08-15 09:39:48 +01:00
Ross Brunton	910d7e90bf	[Offload] Make olLaunchKernel test thread safe (#149497 ) This sprinkles a few mutexes around the plugin interface so that the olLaunchKernel CTS test now passes when run on multiple threads. Part of this also involved changing the interface for device synchronise so that it can optionally not free the underlying queue (which introduced a race condition in liboffload).	2025-08-08 10:57:04 +01:00
Ross Brunton	a44532544b	[Offload] Don't create events for empty queues (#152304 ) Add a device function to check if a device queue is empty. If liboffload tries to create an event for an empty queue, we create an "empty" event that is already complete. This allows `olCreateEvent`, `olSyncEvent` and `olWaitEvent` to run quickly for empty queues.	2025-08-07 10:16:33 +01:00
hidekisaito	83e5a99ff6	[AMDGPU][Offload] Enable memory manager use for up to ~3GB allocation size in omp_target_alloc (#151882 ) Enables AMD data center class GPUs to use memory manager memory pooling up to 3GB allocation by default, up from the "1 << 13" threshold that all plugin-nextgen devices use.	2025-08-06 14:41:20 -07:00
Alex Duran	f092b820d1	[OFFLOAD] Fix typo in assert (#152316 ) Fixes an issue introduced by PR https://github.com/llvm/llvm-project/pull/143491.	2025-08-06 17:01:47 +02:00
Alex Duran	66d1c37eb6	[OFFLOAD][OPENMP] 6.0 compatible interop interface (#143491 ) The following patch introduces a new interop interface implementation with the following characteristics: * It supports the new 6.0 prefer_type specification * It supports both explicit objects (from interop constructs) and implicit objects (from variant calls). * Implements a per-thread reuse mechanism for implicit objects to reduce overheads. * It provides a plugin interface that allows selecting the supported interop types, and managing all the backend related interop operations (init, sync, ...). * It enables cooperation with the OpenMP runtime to allow progress on OpenMP synchronizations. * It cleans up some vendor/fr_id mismatches from the current query routines. * It supports extension to define interop callbacks for library cleanup.	2025-08-06 16:34:39 +02:00
Ross Brunton	ae44418f28	[Offload] Erase entries from JIT cache when program is destroyed (#148847 ) When `unloadBinary` is called, any entries in the JITEngine's cache for that binary will be cleared. This fixes a nasty issue with liboffload program handles. If two handles happen to have had the same address (after one was free'd, for example), the cache would be hit and return the wrong program.	2025-07-25 16:11:30 +01:00
Joseph Huber	b53be5f4b2	[LLVM] Update CUDA ELF flags for their new ABI (#149534 ) Summary: We rely on these flags to do things in the runtime and print the contents of binaries correctly. CUDA updated their ABI encoding recently and we didn't handle that. It's a new ABI entirely so we just select on it when it shows up. Fixes: https://github.com/llvm/llvm-project/issues/148703	2025-07-21 14:38:03 -05:00
Ross Brunton	311847be4c	[Offload] Allow "tagging" device info entries with offload keys (#147317 ) When generating the device info tree, nodes can be marked with an offload Device Info value. The nodes can also look up children based on this value.	2025-07-18 14:27:34 +01:00
Ross Brunton	abb878438a	[Offload] Allow querying the size of globals (#147698 ) The `GlobalTy` helper has been extended to make both the Size and Ptr be optional. Now `getGlobalMetadataFromDevice`/`Image` is able to write the size of the global to the struct, instead of just verifying it.	2025-07-10 12:05:31 +01:00
Ross Brunton	8c06d0e547	[Offload] Generate OffloadInfo.inc (#147316 ) This is a generated file which contains a macro for all Device Info keys. This is visible to the plugin interface so that it can use the definitions in a future patch.	2025-07-09 11:35:22 +01:00
Ross Brunton	8e104d69fc	[Offload] Provide proper memory management for Images on host device (#146066 ) The `unloadBinaryImpl` method on the host plugin is now implemented properly (rather than just being a stub). When an image is unloaded, it is deallocated and the library associated with it is closed.	2025-07-08 12:42:06 +01:00
Ross Brunton	4f02965ae2	[Offload] Store kernel name in GenericKernelTy (#142799 ) GenericKernelTy has a pointer to the name that was used to create it. However, the name passed in as an argument may not outlive the kernel. Instead, GenericKernelTy now contains a std::string, and copies the name into there.	2025-07-02 14:11:05 +01:00
Ross Brunton	0870c8838b	[Offload] Add an `unloadBinary` interface to PluginInterface (#143873 ) This allows removal of a specific Image from a Device, rather than requiring all image data to outlive the device they were created for. This is required for `ol_program_handle_t`s, which now specify the lifetime of the buffer used to create the program.	2025-06-25 14:53:18 +01:00
Ross Brunton	4359e55838	[Offload] Properly report errors when jit compiling (#145498 ) Previously, if a binary failed to load due to failures when jit compiling, the function would return success with nullptr. Now it returns a new plugin error, `COMPILE_FAILURE`.	2025-06-24 16:27:12 +01:00
Ross Brunton	f242360e15	[Offload] Add type information to device info nodes (#144535 ) Rather than being "stringly typed", store values as a std::variant that can hold various types. This means that liboffload doesn't have to do any string parsing for integer/bool device info keys.	2025-06-20 09:05:05 -05:00
Ross Brunton	e6a3579653	[Offload] Replace device info queue with a tree (#144050 ) Previously, device info was returned as a queue with each element having a "Level" field indicating its nesting level. This replaces this queue with a more traditional tree-like structure. This should not result in a change to the output of `llvm-offload-device-info`.	2025-06-13 09:22:47 -05:00
Ethan Luis McDonough	67ff66e677	[PGO][Offload] Fix offload coverage mapping (#143490 ) This pull request fixes coverage mapping on GPU targets. - It adds an address space cast to the coverage mapping generation pass. - It reads the profiled function names from the ELF directly. Reading it from public globals was causing issues in cases where multiple device-code object files are linked together.	2025-06-10 20:19:38 -05:00
Callum Fare	f44df93a9c	[Offload] Explicitly create directories that contain tablegen output (#142817 ) This isn't required when building with Ninja, but with the Makefile generator these directories don't get implicitly created.	2025-06-04 13:46:19 -05:00
Callum Fare	b78bc35d16	[Offload] Don't check in generated files (#141982 ) Previously we decided to check in files that we generate with tablegen. The justification at the time was that it helped reviewers unfamiliar with `offload-tblgen` see the actual changes to the headers in PRs. After trying it for a while, it's ended up causing some headaches and is also not how tablegen is used elsewhere in LLVM. This changes our use of tablegen to be more conventional. Where possible, files are still clang-formatted, but this is no longer a hard requirement. Because `OffloadErrcodes.inc` is shared with libomptarget it now gets generated in a more appropriate place.	2025-06-03 10:39:04 -05:00
Joseph Huber	eb9ed93fce	[Offload] Optimistically accept SM architectures (#142399 ) Summary: We try to clamp these to ones known to work, but we should probably just optimistically accept these. I'd prefer to update the flag check, but since NVIDIA refuses to publish their ELF format it's too much effort to reverse engineer. Fixes: https://github.com/llvm/llvm-project/issues/138532	2025-06-02 14:32:05 -05:00
Ross Brunton	050892d2f8	[Offload] Use new error code handling mechanism and lower-case messages (#139275 ) [Offload] Use new error code handling mechanism This removes the old ErrorCode-less error method and requires every user to provide a concrete error code. All calls have been updated. In addition, for consistency with error messages elsewhere in LLVM, all messages have been made to start lower case.	2025-05-20 08:50:20 -05:00
Ross Brunton	1532ee6916	[Offload] Add Error Codes to PluginInterface (#138258 ) A new ErrorCode enumeration is present in PluginInterface which can be used when returning an llvm::Error from offload and PluginInterface functions. This enum must be kept up to sync with liboffload's ol_errc_t enum, so both are automatically generated from liboffload's enum definition. Some error codes have also been shuffled around to allow for future work. Note that this patch only adds the machinery; actual error codes will be added in a future patch. ~~Depends on #137339 , please ignore first commit of this MR.~~ This has been merged.	2025-05-19 09:38:34 -05:00
Dhruva Chakrabarti	f965996cfb	[Offload] Remove unused field IsBareKernel. (#139815 )	2025-05-13 17:35:55 -07:00
Joseph Huber	92bba68634	[Offload] Fix handling of 'bare' mode when environment missing (#136794 ) Summary: We treated the missing kernel environment as a unique mode, but it was kind of this random bool that was doing the same thing and it explicitly expects the kernel environment to be zero. It broke after the previous change since it used to default to SPMD and didn't handle zero in any of the other cases despite being used. This fixes that and queries for it without needing to consume an error.	2025-04-23 08:16:39 -05:00
Joseph Huber	25bf4e262c	[Offload] Remove handling for COV4 binaries from offload/ (#131033 ) Summary: We moved from cov4 to cov5 a long time ago, and it guards simplifying some front end code, so we should be able to move up with this.	2025-03-24 18:58:20 -05:00
Ethan Luis McDonough	c50d39f073	[PGO][Offload] Allow PGO flags to be used on GPU targets (#94268 ) This pull request is the third part of an ongoing effort to extend PGO instrumentation to GPU device code and depends on https://github.com/llvm/llvm-project/pull/93365. This PR makes the following changes: - Allows PGO flags to be supplied to GPU targets - Pulls version global from device - Modifies `__llvm_write_custom_profile` and `lprofWriteDataImpl` to allow the PGO version to be overridden	2025-03-19 19:01:38 -05:00
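
The `olDestroyQueue` entry above (9e5d8bd3d1) describes releasing a finished queue immediately, parking an unfinished one on a "pending" list, and checking that list whenever a new queue is created. A minimal C++ sketch of that idea follows. It is not the liboffload implementation: `Queue`, `QueuePool`, `acquire`, `destroy` and `pendingCount` are hypothetical names made up for illustration, and completion is modelled as a plain boolean rather than a device query.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device queue (not the liboffload type).
struct Queue {
  bool Complete = true; // true once all enqueued work has finished
};

// Hypothetical pool illustrating the "pending list" idea from the commit text.
class QueuePool {
public:
  // Create a queue. First scan the pending list: a queue whose work has
  // finished since it was destroyed is reused instead of allocating anew.
  std::unique_ptr<Queue> acquire() {
    for (std::size_t I = 0; I < Pending.size(); ++I) {
      if (Pending[I]->Complete) {
        std::unique_ptr<Queue> Q = std::move(Pending[I]);
        Pending.erase(Pending.begin() + static_cast<std::ptrdiff_t>(I));
        return Q;
      }
    }
    return std::make_unique<Queue>();
  }

  // Destroy a queue. A completed queue is released right away; an
  // unfinished one is parked until a later acquire() finds it complete.
  void destroy(std::unique_ptr<Queue> Q) {
    assert(Q && "cannot destroy a null queue");
    if (Q->Complete)
      return; // Q goes out of scope here and frees the queue
    Pending.push_back(std::move(Q));
  }

  std::size_t pendingCount() const { return Pending.size(); }

private:
  std::vector<std::unique_ptr<Queue>> Pending;
};
```

In this sketch the pending list only shrinks when `acquire` is called, which mirrors the commit's description that the check happens at queue creation time; a program that destroys many unfinished queues and never creates another would still hold them until the pool itself is destroyed.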


124 Commits