intel/llvm - llvm - Gitea: Git with a cup of tea

intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-01-14 03:50:17 +08:00

Author	SHA1	Message	Date
Ye Luo	9f6784cc1f	[libomptarget] fix test offloading/disable_default_device.c Fixes the incorrect lit command line introduced in `536ba87726`	2025-07-09 09:52:00 -05:00
Abhinav Gaba	ae4a81e849	[NFC][OpenMP] Add tests for mapping pointers and their dereferences. (#146934 ) The output of the compile-and-run tests is incorrect. These will be used for reference in future commits that resolve the issues. Also updated the existing clang LIT test, target_map_both_pointer_pointee_codegen.cpp, with more constructs and fewer CHECKs (through more update_cc_test_checks filters).	2025-07-08 06:52:38 -04:00
Ye Luo	536ba87726	[libomptarget] Add a test for OMP_TARGET_OFFLOAD=disabled (#146385 ) closes https://github.com/llvm/llvm-project/issues/144786	2025-06-30 13:29:36 -05:00
Julian Brown	b62b58d1bb	[OpenMP] Fix crash with duplicate mapping on target directive (#146136 ) OpenMP allows duplicate mappings, i.e. in OpenMP 6.0, 7.9.6 "map Clause": Two list items of the map clauses on the same construct must not share original storage unless one of the following is true: they are the same list item [or other omitted reasons]" Duplicate mappings can arise as a result of user-defined mapper processing (which I think is a separate bug, and is not addressed here), but also in straightforward cases such as: #pragma omp target map(tofrom: s.mem[0:10]) map(tofrom: s.mem[0:10]) Both these cases cause crashes at runtime at present, due to an unfortunate interaction between reference counting behaviour and shadow pointer handling for blocks. This is what happens: 1. The member "s.mem" is copied to the target 2. A shadow pointer is created, modifying the pointer on the target 3. The member "s.mem" is copied to the target again 4. The previous shadow pointer metadata is still present, so the runtime doesn't modify the target pointer a second time. The fix is to disable step 3 if we've already done step 2 for a given block that has the "is new" flag set.	2025-06-29 22:41:24 +01:00
Ross Brunton	02d2a1646a	[Offload] Fix entry_points.td test (#145292 ) This was broken as part of #144494 , and just needs an update to the check lines.	2025-06-23 11:09:08 +01:00
Ethan Luis McDonough	daee5eee85	[Offload][PGO] Fix new GPU PGO tests (#143645 ) `pgo_atomic_teams.c` and `pgo_atomic_threads.c` currently are set to run on NVPTX despite the changes for that target not being upstreamed yet. This patch also replaces instances of `llvm-profdata` with `%profdata` in those tests.	2025-06-12 11:14:21 -05:00
Abhinav Gaba	02b6849cf1	[Clang][OpenMP] Fix mapping of arrays of structs with members with mappers (#142511 ) This builds upon #101101 from @jyu2-git, which used compiler-generated mappers when mapping an array-section of structs with members that have user-defined default mappers. Now we do the same when mapping arrays of structs.	2025-06-11 19:03:55 +00:00
Joseph Huber	051945304b	[Offload] Fix APU detection for MI300 testing (#143026 ) Summary: We have this check when the target is MI300 but it fails if this environment variable isn't set. Set a default value of '0' if not present so that will be converted to bool false.	2025-06-05 15:31:55 -05:00
Jan Patrick Lehr	e97f42e931	[OpenMP][Offload] Fix typo in error message (#142589 ) It appears that the spelling was incorrect in those test cases. At least on machines with ROCm version > 6.3. I had no chance to test with ROCm version version < 6.2 and would be interested in the result if someone has the chance.	2025-06-03 07:33:45 -05:00
Joseph Huber	b26baf1779	[Offload] Make AMDGPU plugin handle empty allocation properly (#142383 ) Summary: `malloc(0)` and `free(nullptr)` are both defined by the standard but we current trigger erros and assertions on them. Fix that so this works with empty arguments.	2025-06-02 08:12:20 -05:00
Ross Brunton	a1191b4875	[Offload] Fix broken tablegen test after #140879 (#141796 )	2025-05-28 11:30:15 -05:00
Johannes Doerfert	57a90edacd	[OpenMP][GPU][FIX] Enable generic barriers in single threaded contexts (#140786 ) The generic GPU barrier implementation checked if it was the main thread in generic mode to identify single threaded regions. This doesn't work since inside of a non-active (=sequential) parallel, that thread becomes the main thread of a team, and is not the main thread in generic mode. At least that is the implementation of the APIs today. To identify single threaded regions we now check the team size explicitly. This exposed three other issues; one is, for now, expected and not a bug, the second one is a bug and has a FIXME in the single_threaded_for_barrier_hang_1.c file, and the final one is also benign as described in the end. The non-bug issue comes up if we ever initialize a thread state. Afterwards we will never run any region in parallel. This is a little conservative, but I guess thread states are really bad for performance anyway. The bug comes up if we optimize single_threaded_for_barrier_hang_1 and execute it in Generic-SPMD mode. For some reason we loose all the updates to b. This looks very much like a compiler bug, but could also be another logic issue in the runtime. Needs to be investigated. Issue number 3 comes up if we have nested parallels inside of a target region. The clang SPMD-check logic gets confused, determines SPMD (which is fine) but picks an unreasonable thread count. This is all benign, I think, just weird: ``` #pragma omp target teams #pragma omp parallel num_threads(64) #pragma omp parallel num_threads(10) {} ``` Was launched with 10 threads, not 64.	2025-05-20 19:33:54 -07:00
Ethan Luis McDonough	1043810769	[PGO][Offload] Update PGO GPU tests (#132262 )	2025-05-14 17:17:52 -05:00
agozillon	f687ed9ff7	[Flang][OpenMP] Initial defaultmap implementation (#135226 ) This aims to implement most of the initial arguments for defaultmap aside from firstprivate and none, and some of the more recent OpenMP 6 additions which will come in subsequent updates (with the OpenMP 6 variants needing parsing/semantic support first).	2025-05-12 16:30:43 +02:00
agozillon	b291cfcad4	[Flang][OpenMP] Generate correct present checks for implicit maps of optional allocatables (#138210 ) Currently, we do not generate the appropriate checks to check if an optional allocatable argument is present before accessing relevant components of it, in particular when creating bounds, we must generate a presence check and we must make sure we do not generate/keep an load external to the presence check by utilising the raw address rather than the regular address of the info data structure. Similarly in cases for optional allocatables we must treat them like non-allocatable arguments and generate an intermediate allocation that we can have as a location in memory that we can access later in the lowering without causing segfaults when we perform "mapping" on it, even if the end result is an empty allocatable (basically, we shouldn't explode if someone tries to map a non-present optional, similar to C++ when mapping null data).	2025-05-09 13:57:45 +02:00
Callum Fare	6022a5214b	[Offload] Add check-offload-unit for liboffload unittests (#137312 ) Adds a `check-offload-unit` target for running the liboffload unit test suite. This unit test binary runs the tests for every available device. This can optionally filtered to devices from a single platform, but the check target runs on everything. The target is not part of `check-offload` and does not get propagated to the top level build. I'm not sure if either of these things are desirable, but I'm happy to look into it if we want. Also remove the `offload/unittests/Plugins` test as it's dead code and doesn't build.	2025-04-29 11:21:59 -05:00
Joseph Huber	92bba68634	[Offload] Fix handling of 'bare' mode when environment missing (#136794 ) Summary: We treated the missing kernel environment as a unique mode, but it was kind of this random bool that was doing the same thing and it explicitly expects the kernel environment to be zero. It broke after the previous change since it used to default to SPMD and didn't handle zero in any of the other cases despite being used. This fixes that and queries for it without needing to consume an error.	2025-04-23 08:16:39 -05:00
Callum Fare	800d949bb3	[Offload] Implement the remaining initial Offload API (#122106 ) Implement the complete initial version of the Offload API, to the extent that is usable for simple offloading programs. Tested with a basic SYCL program. As far as possible, these are simple wrappers over existing functionality in the plugins. * Allocating and freeing memory (host, device, shared). * Creating a program * Creating a queue (wrapper over asynchronous stream resource) * Enqueuing memcpy operations * Enqueuing kernel executions * Waiting on (optional) output events from the enqueue operations * Waiting on a queue to finish Objects created with the API have reference counting semantics to handle their lifetime. They are created with an initial reference count of 1, which can be incremented and decremented with retain and release functions. They are freed when their reference count reaches 0. Platform and device objects are not reference counted, as they are expected to persist as long as the library is in use, and it's not meaningful for users to create or destroy them. Tests have been added to `offload.unittests`, including device code for testing program and kernel related functionality. The API should still be considered unstable and it's very likely we will need to change the existing entry points.	2025-04-22 13:27:50 -05:00
Joseph Huber	5eabececb0	[Offload] Fix JIT test	2025-04-18 12:01:04 -05:00
Joseph Huber	6c5f50f186	[Offload] Fix typo on `-Xoffload-linker`	2025-04-18 10:47:45 -05:00
Joseph Huber	db0f754c5a	[OpenMP] Remove 'libomptarget.devicertl.a' fatbinary and use static library (#126143 ) Summary: Currently, we build a single `libomptarget.devicertl.a` which is a fatbinary. It is a host object file that contains the embedded archive files for both the NVIDIA and AMDGPU targets. This was done primarily as a convenience due to naming conflicts. Now that the clang driver for the GPU targets can appropriate link via the per-target runtime-dir, we can just make two separate static libraries and remove the indirection. This patch creates two new static libraries that get installed into ``` /lib/amdgcn-amd-amdhsa/libomp.a /lib/nvptx64-nvidia-cuda/libomp.a ``` for AMDGPU and NVPTX respectively. The link job created by the linker wrapper now simply needs to do `-lomp` and it will search those directories and link those static libraries. This requires far less special handling. This patch is a precursor to changing the build system entirely to be a runtimes based one. Soon this target will be a standard `add_library` and done through the GPU runtime targets. NOTE that this actually does remove an additional optimization step. Previously we merged all of the files into a single bitcode object and forcibly internalized some definitions. This, instead, just treats them like a normal static library. This may possibly affect performance for some files, but I think it's better overall to use static library semantics because it allows us to have an 'include-what-you-use' relationship with the library. Performance testing will be required. If we really need the merged blob then we can simply pack that into a new static library.	2025-04-18 07:43:31 -05:00
agozillon	b2c9a58b8f	[Flang][OpenMP][MLIR] Check for presence of Box type before emitting store in MapInfoFinalization pass (#135477 ) Currently we don't check for the presence of descriptor/BoxTypes before emitting stores which lower to memcpys, the issue with this is that users can have optional arguments, where they don't provide an input, making the argument effectively null. This can still be mapped and this causes issues at the moment as we'll emit a memcpy for function arguments to store to a local variable for certain edge cases, when we perform this memcpy on a null input, we cause a segfault at runtime. The fix to this is to simply create a branch around the store that checks if the data we're copying from is actually present. If it is, we proceed with the store, if it isn't we skip it.	2025-04-14 17:15:56 +02:00
Joseph Huber	2f41fa387d	[AMDGPU] Fix code object version not being set to 'none' (#135036 ) Summary: Previously, we removed the special handling for the code object version global. I erroneously thought that this meant we cold get rid of this weird `-Xclang` option. However, this also emits an LLVM IR module flag, which will then cause linking issues.	2025-04-10 11:31:21 -05:00
Zequan Wu	78b21ddba7	Revert "Reland "Symbolize line zero as if no source info is available (#124846 )" (#133798 )" This reverts commit `3483740289` because #128619 doesn't handle the case when we have an empty frame from `getInliningInfoForAddress` because line num is 0 which makes it non-differentiable from missing debug info. So, we end up using the base filename from symtab again. Reverting for now until that issus is solved.	2025-04-09 18:09:31 -07:00
Joel E. Denny	ad9f6d3cee	[PGO][Offload] Use %profdata in PGO tests (#135015 ) So that the wrong llvm-profdata is not picked up from PATH.	2025-04-09 10:40:46 -04:00
Jan Leyonberg	fbc8335311	[MLIR][OpenMP] Add codegen for teams reductions (#133310 ) This patch adds the lowering of teams reductions from the omp dialect to LLVM-IR. Some minor cleanup was done in clang to remove an unused parameter.	2025-04-07 12:47:16 -04:00
Zequan Wu	3483740289	Reland "Symbolize line zero as if no source info is available (#124846 )" (#133798 ) This land commits `23aca2f88d` and `1b15a89a23`. https://github.com/llvm/llvm-project/pull/128619 makes symbolizer to always use debug info when available so we can reland this chagnge.	2025-03-31 19:13:46 -04:00
Ethan Luis McDonough	0c81105373	[PGO][Offload] Disable PGO on NVPTX (#133522 )	2025-03-28 16:32:32 -05:00
macurtis-amd	21a8c63cdc	[offload] Remove bad assert in StaticLoopChunker::Distribute (#132705 ) When building with asserts enabled, this can actually cause strange miscompilations because an incorrect llvm.assume is generated at the point of the assertion.	2025-03-28 04:53:00 -05:00
Ethan Luis McDonough	c50d39f073	[PGO][Offload] Allow PGO flags to be used on GPU targets (#94268 ) This pull request is the third part of an ongoing effort to extends PGO instrumentation to GPU device code and depends on https://github.com/llvm/llvm-project/pull/93365. This PR makes the following changes: - Allows PGO flags to be supplied to GPU targets - Pulls version global from device - Modifies `__llvm_write_custom_profile` and `lprofWriteDataImpl` to allow the PGO version to be overridden	2025-03-19 19:01:38 -05:00
Krzysztof Parzyszek	f4fc2d731c	[flang][OpenMP] Map ByRef if size/alignment exceed that of a pointer (#130832 ) Improve the check for whether a type can be passed by copy. Currently, passing by copy is done via the OMP_MAP_LITERAL mapping, which can only transfer as much data as can be contained in a pointer representation.	2025-03-12 19:41:11 -05:00
Krzysztof Parzyszek	d67947162f	[flang][OpenMP] Implement HAS_DEVICE_ADDR clause (#128568 ) The HAS_DEVICE_ADDR indicates that the object(s) listed exists at an address that is a valid device address. Specifically, `has_device_addr(x)` means that (in C/C++ terms) `&x` is a device address. When entering a target region, `x` does not need to be allocated on the device, or have its contents copied over (in the absence of additional mapping clauses). Passing its address verbatim to the region for use is sufficient, and is the intended goal of the clause. Some Fortran objects use descriptors in their in-memory representation. If `x` had a descriptor, both the descriptor and the contents of `x` would be located in the device memory. However, the descriptors are managed by the compiler, and can be regenerated at various points as needed. The address of the effective descriptor may change, hence it's not safe to pass the address of the descriptor to the target region. Instead, the descriptor itself is always copied, but for objects like `x`, no further mapping takes place (as this keeps the storage pointer in the descriptor unchanged). --------- Co-authored-by: Sergio Afonso <safonsof@amd.com>	2025-03-10 08:11:01 -05:00
agozillon	f1178815d2	[Flang][OpenMP][MLIR] Implement close, present and ompx_hold modifiers for Flang maps (#129586 ) This PR adds an initial implementation for the map modifiers close, present and ompx_hold, primarily just required adding the appropriate map type flags to the map type bits. In the case of ompx_hold it required adding the map type to the OpenMP dialect. Close has a bit of a problem when utilised with the ALWAYS map type on descriptors, so it is likely we'll have to make sure close and always are not applied to the descriptor simultaneously in the future when we apply always to the descriptors to facilitate movement of descriptor information to device for consistency, however, we may find an alternative to this with further investigation. For the moment, it is a TODO/Note to keep track of it.	2025-03-07 22:22:30 +01:00
Jan Patrick Lehr	1824bb47c2	[Offload][OpenMP] Fix check-prefix (#128599 )	2025-02-25 00:32:27 +01:00
Zequan Wu	1b15a89a23	Revert "[Offload] Fix assumptions on symbols after #124846 (#126238 )" The dependency commit was reverted at `23aca2f88d`. Reverting this as well.	2025-02-24 13:30:54 -08:00
Fabian Ritter	a2f9ae1421	[AMDGPU] Replace gfx940 and gfx941 with gfx942 in offload and libclc (#125826 ) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. For SWDEV-512631 and SWDEV-512633	2025-02-19 09:56:04 +01:00
Akash Banerjee	785a5b4676	[MLIR][OpenMP] Add LLVM translation support for OpenMP UserDefinedMappers (#124746 ) This patch adds OpenMPToLLVMIRTranslation support for the OpenMP Declare Mapper directive. Since both MLIR and Clang now support custom mappers, I've changed the respective function params to no longer be optional as well. Depends on #121005	2025-02-18 17:55:48 +00:00
Joseph Huber	1435c8ed95	Reapply "[LinkerWrapper] Clean up options after proper forwarding" (#126495 ) Summary: The test failed because it no longer passed Rpass by default without LTO. I think that's desirable as it matches the standard behavior. This reverts commit `6fd99de318`.	2025-02-14 09:56:46 -06:00
Ethan Luis McDonough	52ee06d273	[PGO][Offload] Fix pgo1.c (#126864 ) pgo1.c had outdated test checks	2025-02-12 00:54:31 -06:00
Ethan Luis McDonough	9e5c136d5a	[PGO][Offload] Profile profraw generation for GPU instrumentation #76587 (#93365 ) This pull request is the second part of an ongoing effort to extends PGO instrumentation to GPU device code and depends on #76587. This PR makes the following changes: - Introduces `__llvm_write_custom_profile` to PGO compiler-rt library. This is an external function that can be used to write profiles with custom data to target-specific files. - Adds `__llvm_write_custom_profile` as weak symbol to libomptarget so that it can write the collected data to a profraw file. - Adds `PGODump` debug flag and only displays dump when the aforementioned flag is set	2025-02-11 23:30:54 -06:00
Jan Patrick Lehr	191d7d64e5	[Offload] Fix assumptions on symbols after #124846 (#126238 ) In #124846 the symbolizer was changed to ignore 0-column entries, which lead to a slightly different representation in the stack traces. This patch addresses these differences. Not sure if the difference in kernel_trap.c is also a result of this change or not. Can be tracked separate from this, after the bots are back to green.	2025-02-07 13:25:11 +01:00
David Blaikie	14d6e1ebf5	Update test for symbolizer fix	2025-02-06 19:18:20 +00:00
Michał Górny	689ef5fda0	[offload] [test] Use test compiler ID rather than host (#124408 ) Use the test compiler ID to verify whether tests can be run rather than the host compiler. This makes it possible to run tests (with Clang) while the library itself was built with GCC.	2025-02-02 15:55:39 +00:00
Christian Clauss	1f56bb3137	[Offload][NFC] Fix typos discovered by codespell (#125119 ) https://github.com/codespell-project/codespell % `codespell --ignore-words-list=archtype,hsa,identty,inout,iself,nd,te,ths,vertexes --write-changes`	2025-01-31 09:35:29 -06:00
agozillon	2428b6ec40	[Flang][MLIR][OpenMP] Fix Target Data if (present(...)) causing LLVM-IR branching error (#123771 ) Currently if we generate code for the below target data map that uses an optional mapping: !$omp target data if(present(a)) map(alloc:a) do i = 1, 10 a(i) = i end do !$omp end target data We yield an LLVM-IR error as the branch for the else path is not generated. This occurs because we enter the NoDupPriv path of the call back function when generating the else branch, however, the emitBranch function needs to be set to a block for it to functionally generate and link in a follow up branch. The NoDupPriv path currently doesn't do this, while it's not supposed to generate anything (as far as I am aware) we still need to at least set the builders placement back so that it emits the appropriate follow up branch. This avoids the missing terminator LLVM-IR verification error by correctly generating the follow up branch.	2025-01-30 17:33:36 +01:00
agozillon	e0054e984c	[MLIR][OpenMP] Emit nullary check for mapped pointer members and appropriate size select based on results (#124604 ) This PR aims to fix a mapping error when trying to map nullary elements of a record type (primary example is allocatables/pointer types in Fortran at the moment). This should be legal to map, just not write to without pointing to anything within the target region. A common Fortran OpenMP idiom/example where this is useful can be found in the added Fortran offload example. The runtime error arises when we try to map the pointer member utilising a prescribed constant size that we receive from the lowered type, resulting in mapping of data that will be non-existent when there is no allocated data. The fix in this case is to emit a runtime check to see if the data has been allocated, if it hasn't been we select a size of 0, if it has we emit the usual type size.	2025-01-29 17:51:33 +01:00
Joseph Huber	13dcc95dcd	[Offload] Rework offloading entry type to be more generic (#124018 ) Summary: The previous offloading entry type did not fit the current use-cases very well. This widens it and adds a version to prevent further annoyances. It also includes the kind to better sort who's using it. The first 64-bytes are reserved as zero so the OpenMP runtime can detect the old format for binary compatibilitry.	2025-01-28 07:26:13 -06:00
Joseph Huber	134401deea	[Offload] Move RPC server handling to a dedicated thread (#112988 ) Summary: Handling the RPC server requires running through list of jobs that the device has requested to be done. Currently this is handled by the thread that does the waiting for the kernel to finish. However, this is not sound on NVIDIA architectures and only works for async launches in the OpenMP model that uses helper threads. However, we also don't want to have this thread doing work unnnecessarily. For this reason we track the execution of kernels and cause the thread to sleep via a condition variable (usually backed by some kind of futex or other intelligent sleeping mechanism) so that the thread will be idle while no kernels are running.	2025-01-24 11:36:45 -06:00
Joseph Huber	723a3e746a	[OpenMP] Fix mispelled attribute and warning Summary: This is spelled `ompx_aligned_barrier` when used directly, but wasn't included in the list of known assumptions. Fix that so now th test works.	2025-01-20 08:40:19 -06:00
Jinsong Ji	8d1d67ec4d	[Offload][PGO] Fix dump of array in ProfData (#122039 ) Exposed by -Warray-bounds: In file included from ../../../../../../../llvm/offload/plugins-nextgen/common/src/GlobalHandler.cpp:252: ../../../../../../../llvm/llvm/include/llvm/ProfileData/InstrProfData.inc:109:1: error: array index 4 is past the end of the array (that has type 'const std::remove_const<const uint16_t>::type[4]' (aka 'const unsigned short[4]')) [-Werror,-Warray-bounds] 109 \| INSTR_PROF_DATA(const uint16_t, Int16ArrayTy, NumValueSites[IPVK_Last+1], \ \| ^ ~~~~~~~~~~~ ../../../../../../../llvm/offload/plugins-nextgen/common/src/GlobalHandler.cpp:250:15: note: expanded from macro 'INSTR_PROF_DATA' 250 \| outs() << ProfData.Name << " "; \ \| ^ ~~~~ ../../../../../../../llvm/llvm/include/llvm/ProfileData/InstrProfData.inc:109:1: note: array 'NumValueSites' declared here 109 \| INSTR_PROF_DATA(const uint16_t, Int16ArrayTy, NumValueSites[IPVK_Last+1], \ \| ^ ../../../../../../../llvm/offload/plugins-nextgen/common/include/GlobalHandler.h:62:3: note: expanded from macro 'INSTR_PROF_DATA' 62 \| std::remove_const<Type>::type Name; Avoid accessing out-of-bound data, but skip printing array data for now. As there is no simple way to do this without hardcoding the NumValueSites field. --------- Co-authored-by: Ethan Luis McDonough <ethanluismcdonough@gmail.com>	2025-01-14 15:46:27 -05:00

1 2 3

137 Commits