intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-01-13 11:02:04 +08:00

Author	SHA1	Message	Date
Kevin Sala Penades	35315a84b4	[offload] Fix CUDA args size by subtracting tail padding (#172249 ) This commit makes the cuLaunchKernel call pass the total arguments size without tail padding.	2025-12-14 21:57:25 -08:00
Alex Duran	02a908c4c9	[OpenMP][Offload] Continue to update libomptarget debug messages (#170425 ) * Add support to use lambdas to output debug messages (like LDBG_OS) * Update messages for interface.cpp and omptarget.cpp	2025-12-10 16:18:01 +01:00
Akash Banerjee	b360a782ca	Reland "[Flang][OpenMP] Add lowering support for is_device_ptr clause (#169331 )" (#170851 ) Add support for OpenMP is_device_ptr clause for target directives. [MLIR][OpenMP] Add OpenMPToLLVMIRTranslation support for is_device_ptr #169367 This PR adds support for the OpenMP is_device_ptr clause in the MLIR to LLVM IR translation for target regions. The is_device_ptr clause allows device pointers (allocated via OpenMP runtime APIs) to be used directly in target regions without implicit mapping.	2025-12-05 17:38:41 +00:00
theRonShark	be79a0d90f	Revert "[Flang][OpenMP] Add lowering support for is_device_ptr clause" (#170778 ) Reverts llvm/llvm-project#169331	2025-12-04 19:38:16 -05:00
Akash Banerjee	a77c4948a5	[Flang][OpenMP] Add lowering support for is_device_ptr clause (#169331 ) Add support for OpenMP is_device_ptr clause for target directives. [MLIR][OpenMP] Add OpenMPToLLVMIRTranslation support for is_device_ptr #169367 This PR adds support for the OpenMP is_device_ptr clause in the MLIR to LLVM IR translation for target regions. The is_device_ptr clause allows device pointers (allocated via OpenMP runtime APIs) to be used directly in target regions without implicit mapping.	2025-12-04 15:57:24 +00:00
Alex Duran	ec6091f4de	[OFFLOAD][LIBOMPTARGET] Start to update debug messages in libomptarget (#170265 ) * Add compatibility support for DP and REPORT macros * Define a set of predefined Debug Type for libomptarget * Start to update libomptarget files (OffloadRTL.cpp, device.cpp)	2025-12-02 23:45:23 +01:00
Robert Imschweiler	8808beeb1a	Reland: [OpenMP] Implement omp_get_uid_from_device() / omp_get_device_from_uid() (#168554 ) Reland https://github.com/llvm/llvm-project/pull/164392 with Fortran support moved to follow-up PR	2025-12-01 14:18:31 +01:00
Jason-VanBeusekom	84d511df8d	[OpenMP][clang] Register vtables on device for indirect calls runtime (#167011 ) This is a branch off of https://github.com/llvm/llvm-project/pull/159856, and consists of the runtime portion of the changes required to support indirect function and virtual function calls on an `omp target device` when the virtual class / indirect function is mapped to the device from the host. Key Changes - Introduced a new flag OMP_DECLARE_TARGET_INDIRECT_VTABLE to mark VTable registrations - Modified setupIndirectCallTable to support both VTable entries and indirect function pointers Details: The setupIndirectCallTable implementation was modified to support this registration type by retrieving the first address of the VTable and inferring the remaining data needed to build the indirect call table. Since the Vtables / Classes registered as indirect can be larger than 8 bytes, and the vtables may not be at the first address we either need to pass the size to __llvm_omp_indirect_call_lookup and have a check at each step of the binary search, or add multiple entries to the indirect table for each address registered. The latter was chosen. Commit: a00def3f20e166d4fb9328e6f0bc0742cd0afa31 is not a part of this PR and is handled / reviewed in: https://github.com/llvm/llvm-project/pull/159856, This is PR (2/3) Register Vtable PR (1/3): https://github.com/llvm/llvm-project/pull/159856, Codegen / __llvm_omp_indirect_call_lookup PR (3/3): https://github.com/llvm/llvm-project/pull/159857	2025-11-26 17:33:26 +00:00
Alex Duran	3f22ed1152	[OFFLOAD] Add support for indexed per-thread containers (#164263 ) Split from #158900 it adds a PerThreadContainer that can use STL-like indexed containers based on a slightly refactored PerThreadTable. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>	2025-11-26 02:21:09 +01:00
Nick Sarnie	b3b83ac1e8	[offload][lit] Fix compilation of two offload tests (#169399 ) These are C tests, not C++, so no function parameters means unspecified number of parameters, not `void`. These compile fine on the current tested offload targets because an error is only [thrown](https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaDecl.cpp#L10695) if the calling convention doesn't support variadic arguments, which they happen to. When compiling this test for other targets that do not support variadic arguments, we get an error, which does not seem intentional. Just add `void` to the parameter list. --------- Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>	2025-11-25 15:16:15 +00:00
Abhinav Gaba	2f8e712875	[NFC][OpenMP] Add use_device_ptr/addr tests for when the lookup fails. (#169428 ) As per OpenMP 5.1, the pointers are expected to retain their original values when a lookup fails and there is no device pointer to translate to.	2025-11-24 16:48:23 -08:00
Jan Leyonberg	3e86f05621	[OpenMP][flang] Lowering of OpenMP custom reductions to MLIR (#168417 ) This patch adds support for lowering of custom reductions to MLIR. It also enhances the capability of the pass to automatically mark functions as "declare target" by traversing custom reduction initializers and combiners.	2025-11-24 16:00:46 -05:00
agozillon	173600880b	[Flang][OpenMP][MLIR] Initial declare target to for variables implementation (#119589 ) While the infrastructure for declare target to/enter and link for variables exists in the MLIR dialect and at the Flang level, the current lowering from MLIR -> LLVM IR isn't in place, it's only in place for variables that have the link clause applied. This PR aims to extend that lowering to an initial implementation that incorporates declare target to as well, which primarily requires changes in the OpenMPToLLVMIRTranslation phase. However, a minor addition to the OpenMP dialect was required to extend the declare target enumerator to include a default None field as well. This also requires a minor change to the Flang lowering's MapInfoFinalization.cpp pass to alter the map type for descriptors to deal with cases where a variable is marked declare to. Currently, when a descriptor variable is mapped declare target to the descriptor component can become attached, and cannot be updated, this results in issues when an unusual allocation range is specified (effectively an off-by X error). The current solution is to map the descriptor always, as we always require an up-to-date version of this data. However, this also requires an interlinked PR that adds a more intricate type of mapping of structures/record types that clang currently implements, to circumvent the overwriting of the pointer in the descriptor. 3/3 required PRs to enable declare target to mapping, this PR should pass all tests and provide an all green CI. Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com	2025-11-24 21:22:49 +01:00
agozillon	20929abb85	[MLIR][OpenMP] Introduce overlapped record type map support (#119588 ) This PR introduces a new additional type of map lowering for record types that Clang currently supports, in which a user can map a top-level record type and then individual members with different mapping, effectively creating a sort of "overlapping" mapping that we attempt to cut around. This is currently most predominantly used in Fortran, when mapping descriptors and their data, we map the descriptor and its data with separate map modifiers and "cut around" the pointer data, so that we do not overwrite it unless the runtime deems it a necessary action based on its reference counting mechanism. However, it is a mechanism that will come in handy/trigger when a user explicitly maps a record type (derived type or structure) and then explicitly maps a member with a different map type. These additions were predominantly in the OpenMPToLLVMIRTranslation.cpp file and phase, however, one Flang test that checks end-to-end IR compilation (as far as we care for now at least) was altered. 2/3 required PRs to enable declare target to mapping, should look at PR 3/3 to check for full green passes (this one will fail a number due to some dependencies). Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com	2025-11-24 21:20:29 +01:00
Alex Duran	66ddc9b3e7	[OFFLOAD] Add support for more fine grained debug messages control (#165416 ) This PR introduces new debug macros that allow finer control of which debug messages to output and introduces C++ stream style for debug messages. Changing existing messages (except a few that I changed for testing) will come in subsequent PRs. I also think that we should make debug enabling OpenMP agnostic but, for now, I prioritized maintaining the current libomptarget behavior, and we might need more changes further down the line as we decouple libomptarget.	2025-11-20 18:39:56 +01:00
Joseph Huber	eea62159e8	[Offload] Make the RPC thread sleep briefly when idle (#168596 ) Summary: We start this thread if the RPC client symbol is detected in the loaded binary. We should make this sleep if there's no work to avoid the thread running at high priority when the (scarcely used) RPC call is actually required. So, right now after 25 microseconds we will assume the server is inactive and begin sleeping. This resets once we do find work. AMD supports a more intelligent way to do this. HSA signals can wake a sleeping thread from the kernel, and signals can be sent from the GPU side. This would be nice to have and I'm planning on working with it in the future to make this infrastructure more usable with existing AMD workloads.	2025-11-19 15:56:25 -06:00
Michael Kruse	c32c1d0d21	[Runtimes] Default build must use its own output dirs (#168266 ) Post-commit fix of #164794 reported at https://github.com/llvm/llvm-project/pull/164794#issuecomment-3536253493 `LLVM_LIBRARY_OUTPUT_INTDIR` and `LLVM_RUNTIME_OUTPUT_INTDIR` are used by `AddLLVM.cmake` as output directories. Unless we are in a bootstrapping-build, they must not point to directories found by `find_package(LLVM)` which may be read-only directories. MLIR for instance sets these variables to its own build output directory, so should the runtimes.	2025-11-19 13:51:14 +01:00
Robert Imschweiler	9a0fd22da1	Revert "[OpenMP] Implement omp_get_uid_from_device() / omp_get_device_from_uid()" (#168547 ) Reverts llvm/llvm-project#164392 due to fortran issues	2025-11-18 15:10:42 +00:00
Robert Imschweiler	65c4a534bd	[OpenMP] Implement omp_get_uid_from_device() / omp_get_device_from_uid() (#164392 ) Use the implementation in libomptarget. If libomptarget is not available, always return the UID / device number of the host / the initial device.	2025-11-18 15:22:49 +01:00
Akash Banerjee	8aa7d823b0	[OpenMP][Flang] Emit default declare mappers implicitly for derived types (#140562 ) This patch adds support to emit default declare mappers for implicit mapping of derived types when not supplied by user. This especially helps tackle mapping of allocatables of derived types.	2025-11-14 15:59:48 +00:00
Kevin Sala Penades	1a86f0aae7	[Offload] Add device info for shared memory (#167817 )	2025-11-13 11:00:12 -08:00
Łukasz Plewa	1bd035d80f	[offload] defer "---> olInit" trace message (#167893 ) Tracing requires liboffload to be initialized, so calling isTracingEnabled() before olInit always returns false. This caused the first trace log to look like: ``` -> OL_SUCCESS ``` instead of: ``` ---> olInit() -> OL_SUCCESS ``` This patch moves the pre-call trace print for olInit so it is emitted only after initialization. It would be possible to add extra logic to detect whether liboffload is already initialized and only postpone the first pre-call print, but this would add unnecessary complexity, especially since this is tablegen code. The difference would matter only in the unlikely case of a crash during a second olInit call. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>	2025-11-13 15:56:38 +00:00
Ethan Luis McDonough	38cade7cc6	[PGO][Offload] Fix missing names bug in GPU PGO (#166444 ) After #163011 was merged, the tests in [`offload/test/offloading/gpupgo`](https://github.com/llvm/llvm-project/compare/main...EthanLuisMcDonough:llvm-project:gpupgo-names-fix-pr?expand=1#diff-f769f6cebd25fa527bd1c1150cc64eb585c41cb8a8b325c2bc80c690e47506a1) broke because the offload plugins were no longer able to find `__llvm_prf_nm`. This pull request explicitly makes `__llvm_prf_nm` visible to the host on GPU targets and reverses the changes made in `f7e9968a5b`.	2025-11-10 10:11:53 -06:00
Kevin Sala Penades	64ad5d976d	[Offload] Remove unused KernelArgsTy instantiation (#167197 )	2025-11-08 20:54:32 -08:00
Joseph Huber	aaddd8d38a	[OpenMP] Fix tests relying on the heap size variable Summary: I made that an unimplemented error, but forgot that it was used for this environment variable.	2025-11-06 13:00:26 -06:00
Joseph Huber	670c453aeb	[Offload] Remove handling for device memory pool (#163629 ) Summary: This was a lot of code that was only used for upstream LLVM builds of AMDGPU offloading. We have a generic and fast `malloc` in `libc` now so just use that. Simplifies code, can be added back if we start providing alternate forms but I don't think there's a single use-case that would justify it yet.	2025-11-06 10:15:18 -06:00
Robert Imschweiler	dc94f2cbad	[Offload] Add device UID (#164391 ) Introduced in OpenMP 6.0, the device UID shall be a unique identifier of a device on a given system. (Not necessarily a UUID.) Since it is not guaranteed that the (U)UIDs defined by the device vendor libraries, such as HSA, do not overlap with those of other vendors, the device UIDs in offload are always combined with the offload plugin name. In case the vendor library does not specify any device UID for a given device, we fall back to the offload-internal device ID. The device UID can be retrieved using the `llvm-offload-device-info` tool.	2025-11-04 20:15:47 +01:00
agozillon	09318c6bff	[MLIR][OpenMP] Fix and simplify bounds offset calculation for 1-D GEP offsets (#165486 ) Currently this is being calculated incorrectly and will result in incorrect index offsets in more complicated array slices. This PR tries to address it by refactoring and changing the calculation to be more correct.	2025-10-31 00:54:31 +01:00
Alex Duran	426d1fe548	[OFFLOAD] Remove weak from __kmpc_* calls and gather them in one header (#164613 ) Follow-up from #162652 --------- Co-authored-by: Michael Klemm <michael.klemm@amd.com>	2025-10-24 15:42:20 +02:00
Nicole Aschenbrenner	16641ad8a2	[OpenMP] Adds omp_target_is_accessible routine (#138294 ) Adds omp_target_is_accessible routine. Refactors common code from omp_target_is_present to work for both routines. --------- Co-authored-by: Shilei Tian <i@tianshilei.me>	2025-10-22 17:35:16 +02:00
Kaloyan Ignatov	1f7ddb61b3	[NFC][Offload][OMPT] Improve readability of liboffload OMPT tests (#163181 ) - ompt_target_data_op_t, ompt_scope_endpoint_t and ompt_target_t are now printed as strings instead of just numbers to ease debugging - some missing clang-format clauses have been added	2025-10-22 10:48:39 +02:00
Abhinav Gaba	829804724b	[NFC][OpenMP] Update a test that was failing on aarch64. (#164456 ) The failure was reported here: https://github.com/llvm/llvm-project/pull/164039#issuecomment-3425429556 The test was checking for the "bad" behavior so as to keep track of it, but there seem to be some issues with the pointer arithmetic specific to aarch64. The update for now is to not check for the "bad" behavior fully. We may need to debug further if similar issues are encountered eventually once the codegen has been fixed.	2025-10-21 21:15:52 -07:00
Ross Brunton	186182bb64	[Offload] Use `amd_signal_async_handler` for host function calls (#154131 )	2025-10-21 13:08:30 +01:00
Abhinav Gaba	f37b4459f0	[NFC][OpenMP] Add small class-member use_device_ptr/addr unit tests. (#164039 ) Two of the tests are currently asserting, and two are emitting unexpected results. The asserting tests will be fixed using the ATTACH-style codegen from #153683. The other two involve `use_device_addr` on byrefs, and need more follow-up codegen changes, that have been noted in a FIXME comment.	2025-10-20 13:14:33 -07:00
Alex Duran	9ba54ca3ee	[OFFLOAD] Interop fixes for Windows (#162652 ) On Windows, for a reason I don't fully understand boolean bits get extra padding (even when asking for packed structures) in the structures that messes the offsets between the compiler and the runtime. Also, "weak" works differently on Windows than Linux (i.e., the "local" routine has preference) which causes it to crash as we don't really have an alternate implementation of __kmpc_omp_wait_deps. Given this, it doesn't make sense to mark it as "weak" for Linux either.	2025-10-17 11:07:31 +02:00
Jan Patrick Lehr	f7e9968a5b	[Offload] XFAIL pgo tests until resolved (#163722 ) While people look into it, xfail the tests.	2025-10-16 11:43:55 +02:00
Joseph Huber	914fbe367e	[OpenMP] Disable a few more tests to get the bot green (#163614 )	2025-10-15 14:14:15 -05:00
Jan Patrick Lehr	4b84e0f3f0	[OpenMP] Add test to print interop identifiers (#161434 ) The test covers some of the identifier symbols in the interop runtime. This test, for now, is to guard against complete breakage, which was the result of the other `interop.c` test not being enabled on AMD and thus, not caught by our buildbots.	2025-10-15 20:38:33 +02:00
Joseph Huber	227bc5786f	Revert "[Offload] Lazily initialize platforms in the Offloading API" (#163272 ) Summary: This causes issues with CUDA's teardown order when the init is separated from the total init scope.	2025-10-14 12:46:55 -05:00
Joseph Huber	4a35c4d38a	[Offload] Lazily initialize platforms in the Offloading API (#163272 ) Summary: The Offloading library wraps around the underlying plugins. The problem is that we currently initialize all plugins we find, even if they are not needed for the program. This is very expensive for trivial uses, as fully heterogeneous usage is quite rare. In practice this means that you will always pay a 200 ms penalty for having CUDA installed. This patch changes the behavior to provide accessors into the plugins and devices that allows them to be initialized lazily. We use a once_flag, this should properly take a fast-path check while still blocking on concurrent use. Making full use of this will require a way to filter platforms more specifically. I'm thinking of what this would look like as an API. I'm thinking that we either have an extra iterate function that takes a callback on the platform, or we just provide a helper to find all the devices that can run a given image. Maybe both? Fixes: https://github.com/llvm/llvm-project/issues/159636	2025-10-14 09:35:53 -05:00
Jan Patrick Lehr	6eef045365	[Offload] Silence warning via maybe unused (NFC) (#163076 )	2025-10-12 17:28:46 +02:00
agozillon	9155b318f2	[Flang][OpenMP] Defer descriptor mapping for assumed dummy argument types (#154349 ) This PR adds deferral of descriptor maps until they are necessary for assumed dummy argument types. The intent is to avoid a problem where a user can inadvertently map a temporary local descriptor to device without their knowledge and proceed to never unmap it. This temporary local descriptor remains lodged in OpenMP device memory and the next time another variable or descriptor residing in the same stack address is mapped we incur a runtime OpenMP map error as we try to remap the same address. This fix was discussed with the OpenMP committee and applies to OpenMP 5.2 and below, future versions of OpenMP can avoid this issue via the attach semantics added to the specification.	2025-10-09 17:52:41 +02:00
Alex Duran	45757b9284	[OFFLOAD] Remove unused init_device_info plugin interface (#162650 ) This was used for the old interop code. It's dead code after #143491	2025-10-09 08:38:24 -05:00
Joseph Huber	095877c12e	[Offload] Fix isValidBinary segfault on host platform Summary: Need to verify this actually has a device. We really need to rework this to point to a real implementation, or streamline it to handle this automatically.	2025-10-06 14:46:50 -05:00
Joseph Huber	8763812b4c	[Offload] Remove check on kernel argument sizes (#162121 ) Summary: This check is unnecessarily restrictive and currently incorrectly fires for any size less than eight bytes. Just remove it, we do sanity checks elsewhere and at some point need to trust the ABI.	2025-10-06 12:49:44 -05:00
Alex Duran	902fe02e87	[OFFLOAD] Restore interop functionality (#161429 ) This implements two pieces to restore the interop functionality (that I broke) when the 6.0 interfaces were added: * A set of wrappers that support the old interfaces on top of the new ones * The same level of interop support for the CUDA and AMD plugins	2025-10-02 21:48:31 +02:00
Akash Banerjee	ed12dc5e30	[Flang][OpenMP] Implicitly map nested allocatable components in derived types (#160766 ) This PR adds support for nested derived types and their mappers to the MapInfoFinalization pass. - Generalize MapInfoFinalization to add child maps for arbitrarily nested allocatables when a derived object is mapped via declare mapper. - Traverse HLFIR designates rooted at the target block arg and build full coordinate_of chains; append members with correct membersIndex. This fixes #156461.	2025-10-02 16:15:16 +00:00
Joseph Huber	0fcce4fb7b	[OpenMP] Mark problematic tests as XFAIL / UNSUPPORTED (#161267 ) Summary: Several of these tests have been failing for literal years. Ideally we make efforts to fix this, but keeping these broken has had serious consequences on our testing infrastructure where failures are the norm so almost all test failures are disregarded. I made a tracking issue for the ones that have been disabled. https://github.com/llvm/llvm-project/issues/161265	2025-09-29 15:17:55 -05:00
Joseph Huber	786358a3d7	[Offload] Fix incorrect size used in llvm-offload-device-info tool Summary: This was not using the size previously queried and would fail when the implementation actually verified it.	2025-09-29 14:37:11 -05:00
Abhinav Gaba	7de73c4e9d	[OpenMP][Offload] Support `PRIVATE \| ATTACH` maps for corresponding-pointer-initialization. (#160760 ) `PRIVATE \| ATTACH` maps can be used to represent firstprivate pointers that should be initialized by using the pointee's device address, if its lookup succeeds, or retain the original host pointee's address otherwise. With this, for a test like the following: ```f90 integer, pointer :: p(:) !$omp target map(p(1)) ... print, p(1) !$omp end target ``` The codegen can look like: ```llvm ; maps for p: ; &p(1), &p(1), sizeof(p(1)), TO\|FROM //(1) ; &ref_ptr(p), &p(1), sizeof(ref_ptr(p)), ATTACH //(2) ; &ref_ptr(p), &p(1), sizeof(ref_ptr(p)), PRIVATE\|ATTACH\|PARAM //(3) call... @__omp_outlined...(ptr %ref_ptr_of_p) ``` `(1)` maps the pointee `p(1)`. * `(2)` attaches it to the (previously) mapped `ref_ptr(p)`, if present. It can be controlled via OpenMP 6.1's `attach(auto/always/never)` map-type modifiers. * `(3)` privatizes and initializes the local `ref_ptr(p)`, which gets passed in as the kernel argument `%ref_ptr_of_p`. Can be skipped if p is not referenced directly within the region. While similar mapping can be used for C/C++, it's more important/useful for Fortran as we can avoid creating another argument for passing the descriptor, and use that to initialize the private copy in the body of the kernel.	2025-09-29 11:47:21 -07:00
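
Commit 4a35c4d38a (later reverted in 227bc5786f) describes initializing plugins lazily behind a `once_flag`. The following is only a minimal sketch of that general pattern, not the code from the commit: the class `LazyPlugin` and its members are hypothetical names chosen for illustration, and the expensive plugin setup is replaced by a counter.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a plugin whose initialization is expensive.
// None of these names come from the offload sources.
class LazyPlugin {
public:
  explicit LazyPlugin(std::string Name) : Name(std::move(Name)) {}

  // Accessor that runs the initialization on first use only. Concurrent
  // callers block inside call_once until the first initialization finishes.
  const std::string &get() {
    std::call_once(InitFlag, [this] {
      ++InitCount; // the expensive setup would run here
      Initialized = true;
    });
    assert(Initialized && "call_once returns only after init completed");
    return Name;
  }

  // Number of times the initialization body has run (0 or 1).
  int initCount() const { return InitCount; }

private:
  std::string Name;
  std::once_flag InitFlag;
  bool Initialized = false;
  int InitCount = 0;
};
```

Constructing a `LazyPlugin` costs nothing beyond storing the name; the setup body runs on the first `get()` and is skipped on every later call.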

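
Commit dc94f2cbad states that device UIDs are always combined with the offload plugin name, with a fallback to the offload-internal device ID when the vendor library provides no UID. A minimal sketch of that rule might look as follows; the function name `makeDeviceUid` and the `<plugin>-<id>` string format are assumptions for illustration, since the commit message does not give the actual format.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: builds a device UID from the plugin name and the
// vendor-provided UID, falling back to the offload-internal device ID.
// The "<plugin>-<id>" format is an assumption, not the real encoding.
std::string makeDeviceUid(const std::string &PluginName,
                          const std::optional<std::string> &VendorUid,
                          int InternalDeviceId) {
  assert(!PluginName.empty() && "plugin name disambiguates vendor UIDs");
  const std::string Suffix =
      VendorUid ? *VendorUid : std::to_string(InternalDeviceId);
  return PluginName + "-" + Suffix;
}
```

Prefixing with the plugin name means two vendors that happen to report the same UID string still yield distinct device UIDs.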

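
Commit 84d511df8d chooses to add multiple entries to the indirect call table for each registered vtable, rather than passing a size to the lookup and checking ranges at each step of the binary search. The sketch below illustrates that trade-off under stated assumptions: the `Entry` type, the functions `registerVTable` and `lookup`, the 8-byte slot size, and the "return 0 when not found" convention are all hypothetical and do not reflect the runtime's real data layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical table entry: a host address and its device counterpart.
using Entry = std::pair<std::uintptr_t, std::uintptr_t>;

// Registers one entry per pointer-sized slot of a vtable (8-byte slots are
// an assumption), so any slot address can be found by exact-match search.
void registerVTable(std::vector<Entry> &Table, std::uintptr_t HostBase,
                    std::uintptr_t DeviceBase, std::size_t SizeInBytes) {
  constexpr std::size_t SlotSize = 8;
  assert(SizeInBytes % SlotSize == 0 && "size assumed to be a slot multiple");
  for (std::size_t Off = 0; Off < SizeInBytes; Off += SlotSize)
    Table.emplace_back(HostBase + Off, DeviceBase + Off);
  // Keep the table sorted by host address for the binary search below.
  std::sort(Table.begin(), Table.end());
}

// Exact-match binary search; returns 0 when the host address is unknown.
std::uintptr_t lookup(const std::vector<Entry> &Table, std::uintptr_t Host) {
  auto It = std::lower_bound(
      Table.begin(), Table.end(), Host,
      [](const Entry &E, std::uintptr_t Key) { return E.first < Key; });
  return (It != Table.end() && It->first == Host) ? It->second : 0;
}
```

With one entry per slot, the lookup stays a plain equality search and needs no size argument, at the cost of a larger table.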
518 Commits