intel/llvm - llvm - Gitea: Git with a cup of tea

intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-01-26 12:26:52 +08:00

Author	SHA1	Message	Date
Akash Banerjee	a67b6e1635	Fix typo in test.	2024-09-06 14:23:06 +01:00
Johannes Doerfert	08533a3ee8	[Offload][NFC] Reorganize `utils::` and make Device/Host/Shared clearer (#100280 ) We had three `utils::` namespaces, all with different "meaning" (host, device, hsa_utils). We should, when we can, keep "include/Shared" accessible from host and device, thus RefCountTy has been moved to a separate header. `hsa_utils` was introduced to make `utils::` less overloaded. And common functionality was de-duplicated, e.g., `utils::advance` and `utils::advanceVoidPtr` -> `utils:advancePtr`. Type punning now checks for the size of the result to make sure it matches the source type. No functional change was intended.	2024-09-05 13:36:26 -07:00
Akash Banerjee	142433684a	[OpenMP][Flang] Fix dynamic-extent array mapping (#107247 ) This patch fixes the mapping and lowering of arrays with dynamic extents and adds a new test for the same. The fix discards the incomplete the dynamic extent information and replacing it with just the base type. When lowering to llvm later, the bounds information is used instead.	2024-09-05 12:44:10 +01:00
Jan Patrick Lehr	1a0cf245ac	[Offload] Change x86_64-pc-linux to x86_64-unknown-linux (#107023 ) It appears that the RUNTIMES build prefers the x86-64-unknown-linux-gnu triple notation for the host. This fixes runtime / test breakages when compiler-rt is used as the CLANG_DEFAULT_RTLIB.	2024-09-03 14:25:33 +02:00
WÁNG Xuěruì	9adf81182e	[Offload] Fix stray libomptarget message helper calls (#106837 ) In #92581 the `LibomptargetUitls.cmake` helpers have been removed, but only uses of `libomptarget_say` were migrated. Migrate the remaining few warning and error messages so the `check-offload` target would not fail due to missing `libomptarget_warning_say`. While at it, update the `check-offload` unavailability message to say `check-offload` instead of `check-libomptarget`. Fixes #92581	2024-08-31 07:06:41 -05:00
WÁNG Xuěruì	75545b3449	[Offload] Fix disabling of cuda target on unsupported platforms (#106835 ) The target name and the message are wrong -- both should say "cuda" for the filtering to work. Fixes commit `300e5b9114` (#93186).	2024-08-31 07:06:17 -05:00
agozillon	f4cf93fb50	[Flang][OpenMP] Align map clause generation and fix issue with non-shared allocations for assumed shape/size descriptor types (#97855 ) This PR aims to unify the map argument generation behavior across both the implicit capture (captured in a target region) and the explicit capture (process map), currently the varPtr field of the MapInfo for the same variable will be different depending on how it's captured. This PR tries to align that across the generations of MapInfoOp in the OpenMP lowering. Currently, I have opted to utilise the rawInput (input memref to a HLFIR DeclareInfoOp) as opposed to the addr field which includes more information. The side affect of this is that we have to deal with BoxTypes less often, which will result in simpler maps in these cases. The negative side affect of this is that we don't have access to the bounds information through the resulting value, however, I believe the bounds information we require in our case is still appropriately stored in the map bounds, and this seems to be the case from testing so far. The other fix is for cases where we end up with a BoxType argument into a function (certain assumed shape and sizes cases do this) that has no fir.ref wrapping it. As we need the Box to be a reference type to actually utilise the operation to access the base address stored inside and create the correct mappings we currently generate an intermediate allocation in these cases, and then store into it, and utilise this as the map argument, as opposed to the original. However, as we were not sharing the same intermediate allocation across all of the maps for a variable, this resulted in errors in certain cases when detatching/attatching the data e.g. via enter and exit. This PR adjusts this for cases Currently we only maintain tracking of all intermediate allocations for the current function scope, as opposed to module. Primarily as the only case I am aware of that this is required is in cases where we pass certain types of arguments to functions (so I opted to minimize the overhead of the pass for now). It could likely be extended to module scope if required if we find other cases where it's applicable and causing issues.	2024-08-23 19:48:43 +02:00
Ethan Luis McDonough	fde2d23ee2	[PGO][OpenMP] Instrumentation for GPU devices (Revision of #76587 ) (#102691 ) This pull request is a revised version of #76587. This pull request fixes some build issues that were present in the previous version of this change. > This pull request is the first part of an ongoing effort to extends PGO instrumentation to GPU device code. This PR makes the following changes: > > - Adds blank registration functions to device RTL > - Gives PGO globals protected visibility when targeting a supported GPU > - Handles any addrspace casts for PGO calls > - Implements PGO global extraction in GPU plugins (currently only dumps info) > > These changes can be tested by supplying `-fprofile-instrument=clang` while targeting a GPU.	2024-08-22 01:10:54 -05:00
Johannes Doerfert	3b7611594f	[Offload] Improve error reporting on memory faults (#104254 ) Since we can already track allocations, we can diagnose memory faults to some degree. If the fault happens in a prior allocation (use after free) or "close but outside" one, we can provide that information to the user. Note that the fault address might be page aligned, and not all accesses trigger a fault, especially for allocations that are backed by a MemoryManager. Still, if people disable the MemoryManager or the allocation is big enough, we can sometimes provide valueable feedback.	2024-08-21 10:01:35 -07:00
Joseph Huber	e96146cd46	[OpenMP] Temporarily disable test to keep bots green Summary: This test mysteriously fails on the bots but not locally, disable until I can figure out why.	2024-08-20 15:16:05 -05:00
Joseph Huber	e0326b668e	[OpenMP] Map `omp_default_mem_alloc` to global memory (#104790 ) Summary: Currently, we assign this to private memory. This causes failures on some SOLLVE tests. The standard isn't clear on the semantics of this allocation type, but there seems to be a consensus that it's supposed to be shared memory.	2024-08-20 12:00:41 -05:00
Fabian Mora	cfc76b6498	[llvm][offload] Move AMDGPU offload utilities to LLVM (#102487 ) This patch moves utilities from `offload/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h` to `llvm/Frontend/Offloading/Utility.h` to be reused by other projects. Concretely the following changes were made: - Rename `KernelMetaDataTy` to `AMDGPUKernelMetaData`. - Remove unused fields `KernelObject`, `KernelSegmentSize`, `ExplicitArgumentCount` and `ImplicitArgumentCount` from `AMDGPUKernelMetaData`. - Return the produced error if `ELFObj.sections()` failed instead of using `cantFail`. - Added `AGPRCount` field to `AMDGPUKernelMetaData`. - Added a default invalid value to all the fields in `AMDGPUKernelMetaData`.	2024-08-20 09:03:06 -04:00
estewart08	ea8bb4d633	[offload] - Fix issue with standalone debug offload build (#104647 ) Error: CommandLine Error: Option 'attributor-manifest-internal' registered more than once During the standalone debug build of offload the above error is seen at app runtime when using a prebuilt llvm with LLVM_LINK_LLVM_DYLIB=ON. This is caused by linking both libLLVM.so and various archives that are found via llvm_map_components_to_libnames for jit support.	2024-08-19 17:59:21 -05:00
Joseph Huber	161e250add	[OpenMP] Fix buildbot failing on allocator test	2024-08-14 13:56:12 -05:00
Joseph Huber	74d23f15b6	[OpenMP] Implement 'omp_alloc' on the device (#102526 ) Summary: The 'omp_alloc' function should be callable from a target region. This patch implemets it by simply calling `malloc` for every non-default trait value allocator. All the special access modifiers are unimplemented and return null. The null allocator returns null as the spec states it should not be usable from the target.	2024-08-14 13:38:55 -05:00
Johannes Doerfert	ff12c0061b	[Offload] Ensure to load images when the device is used (#103002 ) When we use the device, e.g., with an API that interacts with it, we need to ensure the image is loaded and the constructors are executed. Two tests are included to verify we 1) load images and run constructors when needed, and 2) we do so lazily only if the device is actually used. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>	2024-08-13 14:41:26 -07:00
Joseph Huber	dcc27ea41e	[LinkerWrapper] Always pass `-flto` if the linker supports it (#102972 ) Summary; Now that we use the linker to do LTO / device linking, we need to inform the `clang` invocation to use `-flto` so it forwards arguments like `-On` correctly.	2024-08-13 11:23:55 -05:00
Johannes Doerfert	3f9c9acedd	[Offload] Add the right paths to the CUDA lit tests (#102997 )	2024-08-12 20:34:33 -07:00
Johannes Doerfert	80525dfcde	[Offload][CUDA] Allow CUDA kernels to use LLVM/Offload (#94549 ) Through the new `-foffload-via-llvm` flag, CUDA kernels can now be lowered to the LLVM/Offload API. On the Clang side, this is simply done by using the OpenMP offload toolchain and emitting calls to `llvm` functions to orchestrate the kernel launch rather than `cuda` functions. These `llvm` functions are implemented on top of the existing LLVM/Offload API. As we are about to redefine the Offload API, this wil help us in the design process as a second offload language. We do not support any CUDA APIs yet, however, we could: https://www.osti.gov/servlets/purl/1892137 For proper host execution we need to resurrect/rebase https://tianshilei.me/wp-content/uploads/2021/12/llpp-2021.pdf (which was designed for debugging). ``` ❯❯❯ cat test.cu extern "C" { void llvm_omp_target_alloc_shared(size_t Size, int DeviceNum); void llvm_omp_target_free_shared(void DevicePtr, int DeviceNum); } __global__ void square(int A) { A = 42; } int main(int argc, char argv) { int DevNo = 0; int Ptr = reinterpret_cast<int >(llvm_omp_target_alloc_shared(4, DevNo)); Ptr = 7; printf("Ptr %p, Ptr %i\n", Ptr, Ptr); square<<<1, 1>>>(Ptr); printf("Ptr %p, Ptr %i\n", Ptr, Ptr); llvm_omp_target_free_shared(Ptr, DevNo); } ❯❯❯ clang++ test.cu -O3 -o test123 -foffload-via-llvm --offload-arch=native ❯❯❯ llvm-objdump --offloading test123 test123: file format elf64-x86-64 OFFLOADING IMAGE [0]: kind elf arch gfx90a triple amdgcn-amd-amdhsa producer openmp ❯❯❯ LIBOMPTARGET_INFO=16 ./test123 Ptr 0x155448ac8000, Ptr 7 Ptr 0x155448ac8000, Ptr 42 ```	2024-08-12 17:44:58 -07:00
jyu2-git	a43677c172	Test faild with amd. (#101781 ) Add unspport. This is relate #101101	2024-08-02 17:53:23 -07:00
jyu2-git	d8b61dd84b	[OpenMP] Generate implicit default mapper for mapping array section. (#101101 ) This is only for struct containing nested structs with user defined mappers. Add four functions: 1>buildImplicitMap: build map for default mapper 2>buildImplicitMapper: build default mapper. 3>hasUserDefinedMapper for given mapper name and mapper type, lookup user defined map, if found one return true. 4>isImplicitMapperNeeded check if Mapper is needed During create map, in checkMappableExpressionList, call isImplicitMapperNeeded when it return true, call buildImplicitMapper to generate implicit mapper and added to map clause. https://github.com/llvm/llvm-project/pull/101101	2024-08-02 17:22:40 -07:00
Johannes Doerfert	f3bfc56327	[Offload][OpenMP] Prettify error messages by "demangling" the kernel name (#101400 ) The kernel names for OpenMP are manually mangled and not ideal when we report something to the user. We demangle them now, providing the function and line number of the target region, together with the actual kernel name.	2024-08-01 15:24:15 -07:00
Johannes Doerfert	9a1013220b	[Offload] Allow to record kernel launch stack traces (#100472 ) Similar to (de)allocation traces, we can record kernel launch stack traces and display them in case of an error. However, the AMD GPU plugin signal handler, which is invoked on memroy faults, cannot pinpoint the offending kernel. Insteade print `<NUM>`, set via `OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=<NUM>`, many traces. The recoding/record uses a ring buffer of fixed size (for now 8). For `trap` errors, we print the actual kernel name, and trace if recorded.	2024-07-31 11:49:50 -07:00
Johannes Doerfert	c95abe94ae	[Offload] Implement double free (and other allocation error) reporting (#100261 ) As a first step towards a GPU sanitizer we now can track allocations and deallocations in order to report double frees, and other problems during deallocation.	2024-07-30 10:10:57 -07:00
Joseph Huber	363c1e6e51	[OpenMP] Re-enable test after correctly forwarding `mllvm`	2024-07-29 10:06:54 -05:00
Joseph Huber	dbb8b7a0f4	Reapply "[OpenMP][libc] Remove special handling for OpenMP printf (#98940 )" This reverts commit `fea5914c92`.	2024-07-26 17:21:56 -05:00
Joseph Huber	fea5914c92	Revert "[OpenMP][libc] Remove special handling for OpenMP printf (#98940 )" This reverts commit `069e8bcd82`. Summary: Some tests failing, revert this for now.	2024-07-26 16:39:12 -05:00
Joseph Huber	069e8bcd82	[OpenMP][libc] Remove special handling for OpenMP printf (#98940 ) Summary: Currently there are several layers to handle `printf`. Since we now have varargs and an implementation of `printf` this can be heavily simplified. 1. The frontend renames `printf` into `omp_vprintf` and gives it an argument buffer. Removing 1. triggered some code in the AMDGPU backend menat for HIP / OpenCL, so I hadded an exception to it. 2. Forward this to CUDA vprintf or ignore it. We no longer need special handling for it since we have varargs. So now we just forward this to CUDA vprintf if we have libc, otherwise just leave `printf` as an external function and expect that `libc` will be linked in.	2024-07-26 16:03:36 -05:00
Joseph Huber	7ebd97b852	[OpenMP] Do not define '__assert_fail' if we have the GPU libc (#100409 ) Summary: The C library is intended to provide `__assert_fail`, so in the cases that we have both we should defer to that. This means that if you build the C library for GPUs you'll get the RPC based asser, and if not you'll get the trap based one.	2024-07-26 15:18:10 -05:00
Johannes Doerfert	7102592af7	[Offload] Repair and rename `llvm-omp-device-info` (to `-offload-`) (#100309 ) The `llvm-omp-device-info` tool is very handy, but broke due to the lazy evaluation of devices. This repairs the functionality and adds a test. The tool is also renamed into `llvm-offload-device-info` as `-omp-` is going away.	2024-07-24 09:35:09 -07:00
Joseph Huber	8d8fa01a66	Reapply "[libc] Remove 'packaged' GPU build support (#100208 )" This reverts commit `550b83d658`.	2024-07-24 10:24:53 -05:00
Johannes Doerfert	3c8efd7928	[OpenMP] Ensure the actual kernel is annotated with launch bounds (#99927 ) In debug mode there is a wrapper (the kernel) around the function in which we generate the kernel code. We worked around this before to get the correct kernel name, but now we really distinguish both to attach the launch bounds to the kernel, not the inner function.	2024-07-23 09:02:47 -07:00
Joseph Huber	4854e25359	[Offload] Re-enable tests that are now passing Summary: Some recent patches made these stop failing so the XFAIL now makes the bots go red. Fixes https://github.com/llvm/llvm-project/issues/98903	2024-07-23 10:56:55 -05:00
Shilei Tian	41f6599ae1	[NFC][Offload] Move variables to where they are used (#99956 )	2024-07-22 19:52:16 -04:00
Pranav Bhandarkar	d7e185cca9	[OMPIRBuilder] - Handle dependencies in `createTarget` (#93977 ) This patch handles dependencies specified by the `depend` clause on an OpenMP target construct. It does this much the same way clang does it by materializing an OpenMP `task` that is tagged with the dependencies. The following functions are relevant to this patch - 1) `createTarget` - This function itself is largely unchanged except that it now accepts a vector of `DependData` objects that it simply forwards to `emitTargetCall` 2) `emitTargetCall` - This function has changed now to check if an outer target-task needs to be materialized (i.e if `target` construct has `nowait` or has `depend` clause). If yes, it calls `emitTargetTask` to do all the heavy lifting for creating and dispatching the task. 3) `emitTargetTask` - Bulk of the change is here. See the large comment explaining what it does at the beginning of this function	2024-07-22 10:56:45 -05:00
Jan Patrick Lehr	caaf8099ef	[Offload][OMPT] Add callbacks for (dis)associate_ptr (#99046 ) This adds the OMPT callbacks for the API functions disassociate_ptr and associate_ptr.	2024-07-17 10:15:19 +02:00
Jan Patrick Lehr	4ed0f84d38	[Offload] XFAIL four tests while working on fix (#98899 ) omp_dynamic_shared_memory_mixed_amdgpu.c omp_dynamic_shared_memory_amdgpu.c amdgcn-amd-amdhsa::bug51982.c amdgcn-amd-amdhsa::bug51781.c	2024-07-15 15:45:59 +02:00
Jinsong Ji	6556ba66b2	[Offload][test]Fix typo of requires (#98327 ) Typos in `8823448807`.	2024-07-10 10:51:47 -04:00
jyu2-git	32f7672acc	[Clang][OpenMP] This is addition fix for #92210 . (#94802 ) Fix another runtime problem when explicit map both pointer and pointee in target data region. In #92210, problem is only addressed in target region, but missing for target data region. The change just passing AreBothBasePtrAndPteeMapped in generateInfoForComponentList when processing target data. --------- Co-authored-by: Alexey Bataev <a.bataev@gmx.com>	2024-07-03 20:56:53 -07:00
Joseph Huber	3c50cbfda4	[DeviceRTL] Make defined 'libc' functions weak in OpenMP (#97356 ) Summary: These functions provide special-case implementations internal to the OpenMP device runtime. This can potentially conflict with the symbols pulled in from the actual GPU `libc`. This patch makes these weak, so in the case that the GPU libc functions exist they will be overridden. This should not impact performance in the average case because the old `-mlink-builtin-bitcode` version does internalization, deleting weak, and the new LTO path will resolve to the strong reference and then internalize it.	2024-07-02 13:23:53 -05:00
dhruvachak	946f5d111d	[OpenMP] [OMPT] Callback registration should not depend on the device init callback. (#96371 ) Even if the device init callback is not registered, a tool should be allowed to register other callbacks.	2024-07-01 10:07:05 -07:00
Gheorghe-Teodor Bercea	1a478a69bc	[OpenMP][offload] Fix dynamic schedule tracking (#97065 ) This patch fixes the dynamic schedule tracking.	2024-07-01 10:23:11 -04:00
Ethan Luis McDonough	8823448807	[Offload] Refactor offload test requirements (#95196 ) Many tests in the `offload` project have requirements defined by which targets are not supported rather than which platforms are supported. This patch aims to streamline the requirement definitions by adding four new feature tags: `host`, `gpu`, `amdgpu`, and `nvidiagpu`.	2024-06-29 00:56:18 -05:00
Ethan Luis McDonough	2c8b912f63	Revert "[PGO][OpenMP] Instrumentation for GPU devices (#76587 )" This reverts commit `5fd2af38e4`. It caused build issues and broke the buildbot.	2024-06-28 12:30:45 -05:00
Ethan Luis McDonough	5fd2af38e4	[PGO][OpenMP] Instrumentation for GPU devices (#76587 ) This pull request is the first part of an ongoing effort to extends PGO instrumentation to GPU device code. This PR makes the following changes: - Adds blank registration functions to device RTL - Gives PGO globals protected visibility when targeting a supported GPU - Handles any addrspace casts for PGO calls - Implements PGO global extraction in GPU plugins (currently only dumps info) These changes can be tested by supplying `-fprofile-instrument=clang` while targeting a GPU.	2024-06-28 10:42:19 -05:00
agozillon	aec735cf47	[Flang][OpenMP][MLIR] Fix common block mapping for regular and declare target link (#91829 ) This PR attempts to fix common block mapping for regular mapping of these types as well as when they have been marked as "declare target link". This PR should allow correct mapping of both the members of a common block and the full common block via its block symbol. The main changes were some adjustments to the Fortran OpenMP lowering to HLFIR/FIR, the lowering of the LLVM+OpenMP dialect to LLVM-IR and adjustments to the way the we handle target kernel map argument rebinding inside of the OMPIRBuilder. For the Fortran OpenMP lowering were two changes, one to prevent the implicit capture of common block members when the common block symbol itself has been marked and the other creates intermediate member access inside of the target region to be used in-place of those external to the target region, this prevents external usages breaking the IsolatedFromAbove pact. In the latter case, there was an adjustment to the size calculation for types to better handle cases where we pass an array as the type of a map (as opposed to the bounds and the type of the element), which occurs in the case of common blocks. There is also some adjustment to how handleDeclareTargetMapVar handles renaming of declare target symbols in the module to the reference pointer, now it will only apply to those within the kernel that is currently being generated and we also perform a modification to replace constants with instructions as necessary as we cannot replace these with our reference pointer (non-constant and constants do not mix nicely). In the case of the OpenMPIRBuilder some changes were made to defer global symbol rebinding to kernel arguments until all other arguments have been rebound. This makes sure we do not replace uses that may refer to the global (e.g. a GEP) but are themselves actually a separate argument that needs bound. Currently "declare target to" still needs some work, but this may be the case for all types in conjunction with "declare target to" at the moment.	2024-06-25 20:54:04 +02:00
Joseph Huber	3de162fee4	[Offload] Fix using old deprecated CUDA root variable (#96307 ) Summary: This variable isn't being set properly since we moved to the new way to find the CUDA directory. That means this variable was just unset the whole time. This patch adds it in by calculating it using the binary directory so it can be passed to `--cuda-path`.	2024-06-21 10:08:05 -05:00
Wu Yingcong	89841137fb	[offload][cmake] always define pythonize_bool macro (#96028 ) I use the following cmake config to build offload and openmp ``` cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86" -DLLVM_ENABLE_PROJECTS="clang;openmp" -DLLVM_ENABLE_RUNTIMES="offload" -DLLVM_LIT_ARGS="-vv -a" -DLLVM_ENABLE_ASSERTIONS=ON ../llvm ``` and got the following error: ``` CMake Error at /tmp/build-llvm/llvm/offload/CMakeLists.txt:321 (pythonize_bool): Unknown CMake command "pythonize_bool". ``` After some search I find out that the "correct" way to build this is putting openmp and offload to the ENABLE_RUNTIMES like ``` cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86" -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_ENABLE_RUNTIMES="openmp;offload" -DLLVM_LIT_ARGS="-vv -a" -DLLVM_ENABLE_ASSERTIONS=ON ../llvm ``` . But since we don't forbid to config them using openmp as PROJECT and offload as RUNTIME, then we probably support it. The fix is to always define the pythonize_bool macro. For cmake, it is okay to redefine a macro, it does not cause a warning or else.	2024-06-20 07:00:19 -05:00
Joseph Huber	8043356380	[Offload] Change HSA header search order (#95769 ) Summary: The HSA headers existed previously in `include/hsa.h` and were moved to `include/hsa/hsa.h` in a later ROCm version. The include headers here were originally designed to favor a newer one. However, this unintentionally prevented the dyanmic HSA's `hsa.h` from being used if both were present. This patch changes the order so it will be found first. Related to https://github.com/llvm/llvm-project/pull/95484.	2024-06-17 14:52:50 -05:00
Tim Gymnich	597d2f7662	[OpenMP] Add Environment Variable to disable Reuse of Blocks for High Loop Trip Counts (#89239 ) Sometimes it might be beneficial to spawn more thread blocks instead of reusing existing for multiple loop iterations. Alternatives considered: Make `DefaultNumBlocks` settable via an environment variable. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>	2024-06-14 07:35:23 -07:00

1 2 3

112 Commits