intel/llvm - llvm - Gitea: Git with a cup of tea

intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-01-13 19:08:21 +08:00

Author	SHA1	Message	Date
Joseph Huber	227bc5786f	Revert "[Offload] Lazily initialize platforms in the Offloading API" (#163272 ) Summary: This causes issues with CUDA's teardown order when the init is separated from the total init scope.	2025-10-14 12:46:55 -05:00
Joseph Huber	4a35c4d38a	[Offload] Lazily initialize platforms in the Offloading API (#163272 ) Summary: The Offloading library wraps around the underlying plugins. The problem is that we currently initialize all plugins we find, even if they are not needed for the program. This is very expensive for trivial uses, as fully heterogenous usage is quite rare. In practice this means that you will always pay a 200 ms penalty for having CUDA installed. This patch changes the behavior to provide accessors into the plugins and devices that allows them to be initialized lazily. We use a once_flag, this should properly take a fast-path check while still blocking on concurrent use. Making full use of this will require a way to filter platforms more specifically. I'm thinking of what this would look like as an API. I'm thinking that we either have an extra iterate function that takes a callback on the platform, or we just provide a helper to find all the devices that can run a given image. Maybe both? Fixes: https://github.com/llvm/llvm-project/issues/159636	2025-10-14 09:35:53 -05:00
Jan Patrick Lehr	6eef045365	[Offload] Silence warning via maybe unused (NFC) (#163076 )	2025-10-12 17:28:46 +02:00
agozillon	9155b318f2	[Flang][OpenMP] Defer descriptor mapping for assumed dummy argument types (#154349 ) This PR adds deferral of descriptor maps until they are necessary for assumed dummy argument types. The intent is to avoid a problem where a user can inadvertently map a temporary local descriptor to device without their knowledge and proceed to never unmap it. This temporary local descriptor remains lodged in OpenMP device memory and the next time another variable or descriptor residing in the same stack address is mapped we incur a runtime OpenMP map error as we try to remap the same address. This fix was discussed with the OpenMP committee and applies to OpenMP 5.2 and below, future versions of OpenMP can avoid this issue via the attach semantics added to the specification.	2025-10-09 17:52:41 +02:00
Alex Duran	45757b9284	[OFFLOAD] Remove unused init_device_info plugin interface (#162650 ) This was used for the old interop code. It's dead code after #143491	2025-10-09 08:38:24 -05:00
Joseph Huber	095877c12e	[Offload] Fix isValidBinary segfault on host platform Summary: Need to verify this actually has a device. We really need to rework this to point to a real impolementation, or streamline it to handle this automatically.	2025-10-06 14:46:50 -05:00
Joseph Huber	8763812b4c	[Offload] Remove check on kernel argument sizes (#162121 ) Summary: This check is unnecessarily restrictive and currently incorrectly fires for any size less than eight bytes. Just remove it, we do sanity checks elsewhere and at some point need to trust the ABI.	2025-10-06 12:49:44 -05:00
Alex Duran	902fe02e87	[OFFLOAD] Restore interop functionality (#161429 ) This implements two pieces to restore the interop functionality (that I broke) when the 6.0 interfaces were added: * A set of wrappers that support the old interfaces on top of the new ones * The same level of interop support for the CUDA amd AMD plugins	2025-10-02 21:48:31 +02:00
Akash Banerjee	ed12dc5e30	[Flang][OpenMP] Implicitly map nested allocatable components in derived types (#160766 ) This PR adds support for nested derived types and their mappers to the MapInfoFinalization pass. - Generalize MapInfoFinalization to add child maps for arbitrarily nested allocatables when a derived object is mapped via declare mapper. - Traverse HLFIR designates rooted at the target block arg and build full coordinate_of chains; append members with correct membersIndex. This fixes #156461.	2025-10-02 16:15:16 +00:00
Joseph Huber	0fcce4fb7b	[OpenMP] Mark problematic tests as XFAIL / UNSUPPORTED (#161267 ) Summary: Several of these tests have been failing for literal years. Ideally we make efforts to fix this, but keeping these broken has had serious consequences on our testing infrastructure where failures are the norm so almost all test failures are disregarded. I made a tracking issue for the ones that have been disabled. https://github.com/llvm/llvm-project/issues/161265	2025-09-29 15:17:55 -05:00
Joseph Huber	786358a3d7	[Offload] Fix incorrect size used in llvm-offload-device-info tool Summary: This was not using the size previously queried and would fail when the implementation actually verified it.	2025-09-29 14:37:11 -05:00
Abhinav Gaba	7de73c4e9d	[OpenMP][Offload] Support `PRIVATE \| ATTACH` maps for corresponding-pointer-initialization. (#160760 ) `PRIVATE \| ATTACH` maps can be used to represent firstprivate pointers that should be initialized by doing doing the pointee's device address, if its lookup succeeds, or retain the original host pointee's address otherwise. With this, for a test like the following: ```f90 integer, pointer :: p(:) !$omp target map(p(1)) ... print, p(1) !$omp end target ``` The codegen can look like: ```llvm ; maps for p: ; &p(1), &p(1), sizeof(p(1)), TO\|FROM //(1) ; &ref_ptr(p), &p(1), sizeof(ref_ptr(p)), ATTACH //(2) ; &ref_ptr(p), &p(1), sizeof(ref_ptr(p)), PRIVATE\|ATTACH\|PARAM //(3) call... @__omp_outlined...(ptr %ref_ptr_of_p) ``` `(1)` maps the pointee `p(1)`. * `(2)` attaches it to the (previously) mapped `ref_ptr(p)`, if present. It can be controlled via OpenMP 6.1's `attach(auto/always/never)` map-type modifiers. * `(3)` privatizes and initializes the local `ref_ptr(p)`, which gets passed in as the kernel argument `%ref_ptr_of_p`. Can be skipped if p is not referenced directly within the region. While similar mapping can be used for C/C++, it's more important/useful for Fortran as we can avoid creating another argument for passing the descriptor, and use that to initialize the private copy in the body of the kernel.	2025-09-29 11:47:21 -07:00
Joseph Huber	44f392e999	[OpenMP] Fix 'libc' configuration when building OpenMP Summary: Forgot to port this option's old handling from offload. It's not way easier since they're built in the same CMake project. Also delete the leftover directory that's not used anymore, don't know how that was still there.	2025-09-29 11:59:17 -05:00
Dominik Adamski	e4d94f4f7f	[OpenMP][Flang] Fix no-loop test (#161162 ) Fortran no-loop test is supported only for GPU.	2025-09-29 16:01:52 +02:00
Piotr Balcer	23d08af3d4	[Offload][NFC] use unique ptrs for platforms (#160888 ) Currently, devices store a raw pointer to back to their owning Platform. Platforms are stored directly inside of a vector. Modifying this vector risks invalidating all the platform pointers stored in devices. This patch allocates platforms individually, and changes devices to store a reference to its platform instead of a pointer. This is safe, because platforms are guaranteed to outlive the devices they contain.	2025-09-29 07:10:26 -05:00
Kevin Sala Penades	01d761a776	[Offload] Use Error for allocating/deallocating in plugins (#160811 ) Co-authored-by: Joseph Huber <huberjn@outlook.com>	2025-09-26 13:50:00 -05:00
Dominik Adamski	83ef38a274	[Flang][OpenMP] Enable no-loop kernels (#155818 ) Enable the generation of no-loop kernels for Fortran OpenMP code. target teams distribute parallel do pragmas can be promoted to no-loop kernels if the user adds the -fopenmp-assume-teams-oversubscription and -fopenmp-assume-threads-oversubscription flags. If the OpenMP kernel contains reduction or num_teams clauses, it is not promoted to no-loop mode. The global OpenMP device RTL oversubscription flags no longer force no-loop code generation for Fortran.	2025-09-26 13:57:51 +02:00
Akash Banerjee	3e7e60ae5c	Revert "[Flang][OpenMP] Implicitly map nested allocatable components in derived types" (#160759 ) Reverts llvm/llvm-project#160116	2025-09-25 19:53:58 +01:00
Akash Banerjee	b4f1e0e5b1	[Flang][OpenMP] Implicitly map nested allocatable components in derived types (#160116 ) This PR adds support for nested derived types and their mappers to the MapInfoFinalization pass. - Generalize MapInfoFinalization to add child maps for arbitrarily nested allocatables when a derived object is mapped via declare mapper. - Traverse HLFIR designates rooted at the target block arg and build full coordinate_of chains; append members with correct membersIndex. This fixes #156461.	2025-09-24 14:30:27 +01:00
Ross Brunton	ea0e5185e2	[Offload] Add olGetMemInfo with platform-less API (#159581 )	2025-09-24 12:17:57 +01:00
Ross Brunton	e60a5733f0	[Offload] Print Image location rather than casting it (#160309 ) This squishes a warning where the runtime tries to bind a StringRef to a `%p`.	2025-09-24 10:57:55 +01:00
Alexey Sachkov	bb584644e9	[Offload][NFC] Avoid temporary string copies in InfoTreeNode (#159372 )	2025-09-23 12:21:57 -05:00
Joseph Huber	204580aa8e	[Offload] Don't add the unsupported host plugin to the list (#159642 ) Summary: The host plugin is basically OpenMP specific and doesn't work very well. Previously we were skipping over it in the list instead of just not adding it at all.	2025-09-23 08:31:35 -05:00
Ross Brunton	fcebe6bdbb	[Offload] Re-allocate overlapping memory (#159567 ) If olMemAlloc happens to allocate memory that was already allocated elsewhere (possibly by another device on another platform), it is now thrown away and a new allocation generated. A new `AllocBases` vector is now available, which is an ordered list of allocation start addresses.	2025-09-23 13:59:52 +01:00
Tobias Stadler	dfbd76bda0	[Remarks] Restructure bitstream remarks to be fully standalone (#156715 ) Currently there are two serialization modes for bitstream Remarks: standalone and separate. The separate mode splits remark metadata (e.g. the string table) from actual remark data. The metadata is written into the object file by the AsmPrinter, while the remark data is stored in a separate remarks file. This means we can't use bitstream remarks with tools like opt that don't generate an object file. Also, it is confusing to post-process bitstream remarks files, because only the standalone files can be read by llvm-remarkutil. We always need to use dsymutil to convert the separate files to standalone files, which only works for MachO. It is not possible for clang/opt to directly emit bitstream remark files in standalone mode, because the string table can only be serialized after all remarks were emitted. Therefore, this change completely removes the separate serialization mode. Instead, the remark string table is now always written to the end of the remarks file. This requires us to tell the serializer when to finalize remark serialization. This automatically happens when the serializer goes out of scope. However, often the remark file goes out of scope before the serializer is destroyed. To diagnose this, I have added an assert to alert users that they need to explicitly call finalizeLLVMOptimizationRemarks. This change paves the way for further improvements to the remark infrastructure, including more tooling (e.g. #159784), size optimizations for bitstream remarks, and more. Pull Request: https://github.com/llvm/llvm-project/pull/156715	2025-09-22 16:41:39 +01:00
Joseph Huber	23efc67e19	[Offload] Remove non-blocking allocation type (#159851 ) Summary: This was originally added in as a hack to work around CUDA's limitation on allocation. The `libc` implementation now isn't even used for CUDA so this code is never hit. Even if this case, this code never truly worked. A true solution would be to use CUDA's virtual memory API instead to allocate 2MiB slabs independenctly from the normal memory management done in the stream.	2025-09-20 09:07:14 -05:00
Joseph Huber	580860e8b7	[OpenMP][NFC] Clean up a bunch of warnings and clang-tidy messages (#159831 ) Summary: I made the GPU flags accept more of the default LLVM warnings, which triggered some new cases. Clean those up and fix some other ones while I'm at it.	2025-09-19 14:09:33 -05:00
Akash Banerjee	8afea0d0ea	[OpenMP][MLIR] Preserve to/from flags in mapper base entry for mappers (#159799 ) With declare mapper, the parent base entry was emitted as `TARGET_PARAM` only. The mapper received a map-type without `to/from`, causing components to degrade to `alloc`-only (no copies), breaking allocatable payload mapping. This PR preserves the map-type bits from the parent. This fixes #156466.	2025-09-19 19:34:09 +01:00
Joseph Huber	51e3c3d51b	[Offload] Implement 'olIsValidBinary' in offload and clean up (#159658 ) Summary: This exposes the 'isDeviceCompatible' routine for checking if a binary can be loaded. This is useful if people don't want to consume errors everywhere when figuring out which image to put to what device. I don't know if this is a good name, I was thining like `olIsCompatible` or whatever. Let me know what you think. Long term I'd like to be able to do something similar to what OpenMP does where we can conditionally only initialize devices if we need them. That's going to be support needed if we want this to be more generic.	2025-09-19 12:15:57 -05:00
Ross Brunton	f334ac6665	[Offload] Include product name in llvm-offload-device-info (#159384 )	2025-09-18 12:22:13 +01:00
Joseph Huber	dffd7f3d9a	[LLVM] Fix offload and update CUDA ABI for all SM values (#159354 ) Summary: Turns out the new CUDA ABI now applies retroactively to all the other SMs if you upgrade to CUDA 13.0. This patch changes the scheme, keeping all the SM flags consistent but using an offset. Fixes: https://github.com/llvm/llvm-project/issues/159088	2025-09-17 14:39:39 -05:00
Kareem Ergawy	c286a427b9	[NFC][flang][do concurent] Add saxpy offload tests for OpenMP mapping (#155993 ) Adds end-to-end tests for `do concurrent` offloading to the device. PR stack: - https://github.com/llvm/llvm-project/pull/155754 - https://github.com/llvm/llvm-project/pull/155987 - https://github.com/llvm/llvm-project/pull/155992 - https://github.com/llvm/llvm-project/pull/155993 ◀️ - https://github.com/llvm/llvm-project/pull/157638 - https://github.com/llvm/llvm-project/pull/156610 - https://github.com/llvm/llvm-project/pull/156837	2025-09-17 07:04:13 +02:00
Nick Sarnie	f74583fbe8	[offload] Fix build with debug libomptarget (#159144 ) Currently get this error ``` offload/plugins-nextgen/common/src/PluginInterface.cpp:859:63: error: member reference type 'StringRef' is not a pointer; did you mean to use '.'? ``` We pass the full image binary now so we can't really print anything useful here. Seems introduced in https://github.com/llvm/llvm-project/pull/158748. --------- Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com> Co-authored-by: Joseph Huber <huberjn@outlook.com>	2025-09-16 18:40:02 +00:00
Joseph Huber	e7101dac9c	[Offload] Copy loaded images into managed storage (#158748 ) Summary: Currently we have this `__tgt_device_image` indirection which just takes a reference to some pointers. This was all find and good when the only usage of this was from a section of GPU code that came from an ELF constant section. However, we have expanded beyond that and now need to worry about managing lifetimes. We have code that references the image even after it was loaded internally. This patch changes the implementation to instaed copy the memory buffer and manage it locally. This PR reworks the JIT and other image handling to directly manage its own memory. We now don't need to duplicate this behavior externally at the Offload API level. Also we actually free these if the user unloads them. Upside, less likely to crash and burn. Downside, more latency when loading an image.	2025-09-16 08:57:28 -05:00
Jan Patrick Lehr	311d78f2a1	[OpenMP] Fix force-usm test after #157182 (#159095 ) The refactoring lead to an additional data transfer. This changes the assumed transfers in the check-strings to work with that changed behavior.	2025-09-16 15:42:02 +02:00
Ross Brunton	c5474cdc27	[Offload] Make `ASSERT_ERROR` output more readable (#157653 )	2025-09-16 12:04:53 +01:00
Abhinav Gaba	5af3fa81cc	[Offload][OpenMP] Support shadow-pointer tracking for Fortran descriptors. (#158370 ) This change adds support for saving full contents of attached Fortran descriptors, and not just their pointee address, in the shadow-pointer table. With this, we now support: * comparing full contents of descriptors to check whether a previous shadow-pointer entry is stale; * restoring the full contents of descriptors And with that, we can now use ATTACH map-types (added in #149036) for mapping Fortran pointer/allocatable arrays, and array-sections on them. e.g.: ```f90 integer, allocatable :: x(:) !$omp target enter data map(to: x(:)) ``` as: ``` void* addr_of_pointee = allocated(x) ? &x(1) : nullptr; int64_t sizeof_pointee = allocated(x) ? sizeof(x(:)) : 0 addr_of_pointee, addr_of_pointee, sizeof_pointee, TO addr_of_descriptor, addr_of_pointee, size_of_descriptor, ATTACH ```	2025-09-15 10:37:38 -07:00
Michał Górny	312b5615df	[offload] Fix finding libomptarget in runtimes build (#157856 ) Per the logic in top-level CMakeLists, `libomptarget` is placed into `LLVM_LIBRARY_OUTPUT_INTDIR` when this variable is set. Adjust the test logic to include this directory in `-L` and `-Wl,-rpath` arguments as well, in order to fix finding tests when building via the `runtimes` top-level directory. Signed-off-by: Michał Górny <mgorny@gentoo.org>	2025-09-10 16:31:22 +02:00
agozillon	8f16af3c20	[Flang][OpenMP] Fix mapping of character type with LEN > 1 specified (#154172 ) Currently, there's a number of issues with mapping characters with LEN's specified (strings effectively). They're represented as a char type in FIR with a len parameter, and then later on they're expanded into an array of characters when we're translating to the LLVM dialect. However, we don't generate a bounds for these at lowering. The fix in this PR for this is to generate a bounds from the LEN parameter and attatch it to the map on lowering from FIR to the LLVM dialect when we encounter this type.	2025-09-09 16:36:04 +02:00
Joseph Huber	4294907022	[Offload] Build libcxx on the GPU libc bot (#157673 )	2025-09-09 09:35:53 -05:00
Ross Brunton	7731ecf259	[Offload] Skip most liboffload tests if no devices (#157417 ) If there are no devices available for testing on liboffload, the test will no longer throw an error when it fails to instantiate. The tests will be silently skipped, but with a warning printed to stderr.	2025-09-09 10:11:05 +01:00
Joseph Huber	6d032c4df2	[OpenMP] Fix incorrect CUDA bc path after library change (#157547 )	2025-09-08 17:27:59 -05:00
Joseph Huber	5d550bf41c	[OpenMP] Move `__omp_rtl_data_environment' handling to OpenMP (#157182 ) Summary: This operation is done every time we load a binary, this behavior should be moved into OpenMP since it concerns an OpenMP specific data struct. This is a little messy, because ideally we should only be using public APIs, but more can be extracted later.	2025-09-08 09:58:38 -05:00
Joseph Huber	3f3f7d1fd9	[Offload] Build the OpenMP device library with the AMDGPU libc bot Summary: This is missing because I forgot to add it.	2025-09-08 08:36:18 -05:00
Michał Górny	6343c9bbdf	[offload] Permit redefining OPENMP_STANDALONE_BUILD (#157253 ) Permit redefining `OPENMP_STANDALONE_BUILD` to make it possible to build offload correctly via runtimes build (i.e. build where the top-level project is `runtimes`). This follows the same logic in `openmp` component. Signed-off-by: Michał Górny <mgorny@gentoo.org>	2025-09-08 15:16:02 +02:00
Joseph Huber	be6f110bc0	[OpenMP] Change build of OpenMP device runtime to be a separate runtime (#136729 ) Summary: Currently we build the OpenMP device runtime as part of the `offload/` project. This is problematic because it has several restrictions when compared to the normal offloading runtime. It can only be built with an up-to-date clang and we need to set the target appropriately. Currently we hack around this by creating the compiler invocation manually, but this patch moves it into a separate runtimes build. This follows the same build we use for libc, libc++, compiler-rt, and flang-rt. This also moves it from `offload/` into `openmp/` because it is still the `openmp/` runtime and I feel it is more appropriate. We do want a generic `offload/` library at some point, but it would be trivial to then add that as a separate library now that we have the infrastructure that makes adding these new libraries trivial. This most importantly will require that users update their build configs, mostly adding the following lines at a minimum. I was debating whether or not I should 'auto-upgrade' this, but I just went with a warning. ``` -DLLVM_RUNTIME_TARGETS='default;amdgcn-amd-amdhsa;nvptx64-nvidia-cuda' \ -DRUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES=openmp \ -DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=openmp \ ``` This also changed where the `.bc` version of the library lives, but it's still created.	2025-09-08 07:51:52 -05:00
Julian Brown	c71da7d5e0	[OpenMP] Add tests for mapping of chained 'containing' structs (#156703 ) This PR adds several new tests for mapping of chained structures, i.e. those resembling: #pragma omp target map(tofrom: a->b->c) These are currently XFAILed, although the first two tests actually work with unified memory -- I'm not sure if it's possible to easily improve the condition on the XFAILs in question to make them more accurate. These cases are all fixed by the WIP PR https://github.com/llvm/llvm-project/pull/153683.	2025-09-08 10:30:04 +01:00
Michał Górny	7a88ddd3b1	Revert "[Offload] Run unit tests as a part of check-offload" (#157346 ) Reverts llvm/llvm-project#156675 due to regressions in standalone build and test errors without all plugins enabled (#157345).	2025-09-07 15:12:15 +00:00
Jan Patrick Lehr	05aff0eb65	[Offload] Run tests 16-way parallel on AMDGPU (#156627 ) Reduce the number of paralell tests run to align with the typical number of VMIDs provided by the kernel driver.	2025-09-05 22:23:20 +02:00
Robert Imschweiler	b2ff3e780a	[OpenMP][Offload] Restore __kmpc_* function signatures (#156104 ) Avoid altering existing function signatures of the kmpc interface to fix regressions in the runtime optimization (OpenMPOpt).	2025-09-04 10:56:42 -05:00

1 2 3 4 5 ...

480 Commits