Reapplication of #137828, changes:
* Workaround CMAKE_Fortran_PREPROCESS_SOURCE issue for CMake < 2.24: The
issue is that `try_compile` does not forward manually-defined compiler
flang variables to the test build environment; instead of just a
negative test result, it aborts the configuration step itself. To be
fair, manually defining these variables is deprecated since at least
CMake 3.6.
* Missing flang cmd line flags for CMake < 3.28 `-target=`, `-O2`, `-O3`
* It is now possible to set FLANG_RT_ENABLED_STATIC=OFF and
FLANG_RT_ENABLE_SHARED=OFF at the same and is the default for amdgpu and
nvptx targets. In this mode, only the .mod files are compiled --
necessary for module files in
lib/clang/22/finclude/flang/(nvptx64-nvidia-cuda|amdgpu-amd-amdhsa)/*.mod
to be available.
* For compiling omp_lib.mod for nvptx and amdgpu, the module build
functionality must be hoisted out if openmp's runtime/ directory which
is only included for host targets. This PR now requires #169909.
Move building the .mod files from openmp/flang to openmp/flang-rt using
a shared mechanism. Motivations to do so are:
1. Most modules are target-dependent and need to be re-compiled for each
target separately, which is something the LLVM_ENABLE_RUNTIMES system
already does. Prime example is `iso_c_binding.mod` which encodes the
target's ABI. Constants such as [`c_long_double` also have different
values](d748c81218/flang-rt/lib/runtime/iso_c_binding.f90 (L77-L81)).
Most other modules have `#ifdef`-enclosed code as well. For instance
this caused offload targets nvptx64-nvidia-cuda/amdgpu-amd-amdhsa to use
the modules files compiled for the host which may contrain uses of the
types REAL(10) or REAL(16) not available for nvptx/amdgpu.
#146876#128015#129742#158790
3. CMake has support for Fortran that we should use. Among other things,
it automatically determines module dependencies so there is no need to
hardcode them in the CMakeLists.txt.
4. It allows using Fortran itself to implement Flang-RT. Currently, only
`iso_fortran_env_impl.f90` emits object files that are needed by Fortran
applications (#89403). The workaround of #95388 could be reverted (PR
#169525).
If using Flang for cross-compilation or target-offloading, flang-rt must
now be compiled for each target not only for the library, but also to
get the target-specific module files. For instance in a bootstrapping
runtime build, this can be done by adding:
`-DLLVM_RUNTIME_TARGETS=default;nvptx64-nvidia-cuda;amdgpu-amd-amdhsa`.
Some new dependencies come into play:
* openmp depends on flang-rt for building `lib_omp.mod` and
`lib_omp_kinds.mod`. Currently, if flang-rt is not found then the
modules are not built.
* check-flang depends on flang-rt: If not found, the majority of tests
are disabled. If not building in a bootstrpping build, the location of
the module files can be pointed to using
`-DFLANG_INTRINSIC_MODULES_DIR=<path>`, e.g. in a flang-standalone
build. Alternatively, the test needing any of the intrinsic modules
could be marked with `REQUIRES: flangrt-modules`.
* check-flang depends on openmp: Not a change; tests requiring
`lib_omp.mod` and `lib_omp_kinds.mod` those are already marked with
`openmp_runtime`.
As intrinsic are now specific to the target, their location is moved
from `include/flang` to `<resource-dir>/finclude/flang/<triple>`. The
mechnism to compute the location have been moved from flang-rt
(previously used to compute the location of `libflang_rt.*.a`) to common
locations in `cmake/GetToolchainDirs.cmake` and
`runtimes/CMakeLists.txt` so they can be used by both, openmp and
flang-rt. Potentially the mechnism could also be shared by other
libraries such as compiler-rt.
`finclude` was chosen because `gfortran` uses it as well and avoids
misuse such as `#include <flang/iso_c_binding.mod>`. The search location
is now determined by `ToolChain` in the driver, instead of by the
frontend. Another subdirectory `flang` avoids accidental inclusion of
gfortran-modules which due to compression would result in
user-unfriendly errors. Now the driver adds `-fintrinsic-module-path`
for that location to the frontend call (Just like gfortran does).
`-fintrinsic-module-path` had to be fixed for this because ironically it
was only added to `searchDirectories`, but not
`intrinsicModuleDirectories_`. Since the driver determines the location,
tests invoking `flang -fc1` and `bbc` must also be passed the location
by llvm-lit. This works like llvm-lit does for finding the include dirs
for Clang using `-print-file-name=...`.
Extracted out of #169638. The motivation is that we want to build
Fortran module files for device triples (amdgpu-amd-amdhsa and
nvptx64-nvidia-cuda) as well, but the runtimes/ directory is only
included for host devices.
This PR itself should not change anything, including that omp_lib.mod is
only built on host devices triple. Some dependencies used for building
omp_lib.mod are hoisted out of runtimes/ as well. IMHO they all make
sense to hoist, e.g. LIBOMP_VERSION_MAJOR/LIBOMP_VERSION_MINOR should be
usable in the entire OpenMP subproject.
When a critical construct has finished, it will trigger a
critical-released event. If a tool is attached, and the `mutex_released`
callback was registered, the tool with receive an event containing the
`codeptr_ra`, the return address of the callback invocation.
All the way back in 82e94a5934, this
`codeptr_ra` was implemented by calling `__ompt_load_return_address`
with a fixed global thread id of `0`. However, this approach results in
a race-condition, and can yield incorrect results to the tool.
`__ompt_load_return_address(0)` points to the current return address of
the thread 0 in `__kmp_threads`. This thread may already execute some
other construct. A tool might therefore receive the return address of
e.g. some `libomp` internals, or other parts of the user code.
Additionally, a call to `__ompt_load_return_address` resets the
`th.ompt_thread_info.return_address` to `NULL`, therefore also affecting
the return address of thread 0. Another dispatched event, e.g.
parallel-begin might therefore not transfer any `codeptr_ra`.
To fix this, replace the fixed thread id by the `global_tid`, which is
stored just before dispatching the `mutex_released` callback.
Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>
Move building the .mod files from openmp/flang to openmp/flang-rt using
a shared mechanism. Motivations to do so are:
1. Most modules are target-dependent and need to be re-compiled for each
target separately, which is something the LLVM_ENABLE_RUNTIMES system
already does. Prime example is `iso_c_binding.mod` which encodes the
target's ABI. Most other modules have `#ifdef`-enclosed code as well.
2. CMake has support for Fortran that we should use. Among other things,
it automatically determines module dependencies so there is no need to
hardcode them in the CMakeLists.txt.
3. It allows using Fortran itself to implement Flang-RT. Currently, only
`iso_fortran_env_impl.f90` emits object files that are needed by Fortran
applications (#89403). The workaround of #95388 could be reverted.
Some new dependencies come into play:
* openmp depends on flang-rt for building `lib_omp.mod` and
`lib_omp_kinds.mod`. Currently, if flang-rt is not found then the
modules are not built.
* check-flang depends on flang-rt: If not found, the majority of tests
are disabled. If not building in a bootstrpping build, the location of
the module files can be pointed to using
`-DFLANG_INTRINSIC_MODULES_DIR=<path>`, e.g. in a flang-standalone
build. Alternatively, the test needing any of the intrinsic modules
could be marked with `REQUIRES: flangrt-modules`.
* check-flang depends on openmp: Not a change; tests requiring
`lib_omp.mod` and `lib_omp_kinds.mod` those are already marked with
`openmp_runtime`.
As intrinsic are now specific to the target, their location is moved
from `include/flang` to `<resource-dir>/finclude/flang/<triple>`. The
mechnism to compute the location have been moved from flang-rt
(previously used to compute the location of `libflang_rt.*.a`) to common
locations in `cmake/GetToolchainDirs.cmake` and
`runtimes/CMakeLists.txt` so they can be used by both, openmp and
flang-rt. Potentially the mechnism could also be shared by other
libraries such as compiler-rt.
`finclude` was chosen because `gfortran` uses it as well and avoids
misuse such as `#include <flang/iso_c_binding.mod>`. The search location
is now determined by `ToolChain` in the driver, instead of by the
frontend. Now the driver adds `-fintrinsic-module-path` for that
location to the frontend call (Just like gfortran does).
`-fintrinsic-module-path` had to be fixed for this because ironically it
was only added to `searchDirectories`, but not
`intrinsicModuleDirectories_`. Since the driver determines the location,
tests invoking `flang -fc1` and `bbc` must also be passed the location
by llvm-lit. This works like llvm-lit does for finding the include dirs
for Clang using `-print-file-name=...`.
Clang is adding support for the new `OpenMP transparent` clause on
`task` and `taskloop` directives.
The parsing and semantic handling for this clause is introduced in
https://github.com/llvm/llvm-project/pull/166810 .
To allow the compiler to communicate this clause to the `OpenMP`
runtime, a dedicated bit in `kmp_tasking_flags` is required.
This patch adds a new compiler-reserved bit `transparent` to the`
kmp_tasking_flags` structure.
On AIX, it generates libomp for both static and dynamic. There is no
need to create symbolic links to libomp.so.
---------
Co-authored-by: Xing Xue <xingxue@outlook.com>
When OMPT is enabled, the stack pointer was not saved to frame pointer
register immediately after storing the frame pointer to the stack.
Therefore the frame pointers did not constitute a proper chain.
Fixes [#163352](https://github.com/llvm/llvm-project/issues/163352)
Implementation files using the Intel syntax explicitly specify it.
Do the same for the few files using AT&T syntax.
This also enables building LLVM with `-mllvm -x86-asm-syntax=intel` in one's Clang config files
(i.e. a global preference for Intel syntax).
No functional change intended.
Add support for the standalone OpenMP tile construct:
```f90
!$omp tile sizes(...)
DO i = 1, 100
...
```
This is complementary to #143715 which added support for the tile
construct as part of another loop-associated construct such as
worksharing-loop, distribute, etc.
This change implements the fuse directive, `#pragma omp fuse`, as specified in the OpenMP 6.0, along with the `looprange` clause in clang.
This change also adds minimal stubs so flang keeps compiling (a full implementation in flang of this directive is still pending).
---------
Co-authored-by: Roger Ferrer Ibanez <roger.ferrer@bsc.es>
In addition to existing C/C++ tests, add Fortran-based tests. Fortran
tests will only run if a Fortran compiler is found. The first test is
for the unroll construct added in #144785.
Fix another test impacted by #157364.
On Windows, `GetComputerNameA()`, which is what this ends up calling,
takes an `LPDWORD`, but we were handing it an `int*`; fix this by
declaring it as a `DWORD` instead.
In an LLVM_ENABLE_PROJECTS=openmp build, the LLVM build tree in which
just-built Clang is available, but in contrast to an
LLVM_ENABLE_RUNTIMES=openmp build, is not the compiler that openmp is
built with (CMAKE_CXX_COMPILER). The latter compiler (which might also
be gcc) will not look into the resource directory of just-built Clang,
where the OpenMP headers are installed. There may not even be a
just-built Clang without LLVM_ENABLE_PROJECTS=clang.
We cannot add the OpenMP header output directory to the search path
which also include's Clang's internal headers that will conflict with
CMAKE_CXX_COMPILER's internal headers. The only choice left is to use
what the OpenMP standalone build does: Use CMAKE_CURRENT_BINARY_DIR
which is added unconditionally to the header search path to compile
openmp itself.
OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the
num_threads clause on parallel directives, along with the message and
severity clauses. This commit implements necessary codegen changes.
Since GCC 15.1, libstdc++ enabled assertions/hardening by default in
non-optimized (-O0) builds [1]. That is, _GLIBCXX_ASSERTIONS is defined
in the libstdc++ headers itself so defining/undefining it on the
compiler command line no longer has an effect in non-optimized builds.
As the commit message[2] suggests, define _GLIBCXX_NO_ASSERTIONS
instead.
For libstdc++ headers before 15.1, -U_GLIBCXX_ASSERTIONS still has to be
on the command line as well.
Defining _GLIBCXX_NO_ASSERTIONS was previously proposed in #152223
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112808
[2] 361d230fd7
Flang currently doesn't build in debug mode on GCC 15 due to missing
dynamic libraries in some CMakeLists.txt files, and OpenMP doesn't link
in debug mode due to the atomic library pulling in libstdc++ despite an
incomplete attempt in the CMakeLists.txt to disable glibcxx assertions.
This PR fixes these issues and allows Flang and the OpenMP runtime to
build and link on GCC 15 in debug mode.
---------
Co-authored-by: ronlieb <ron.lieberman@amd.com>
A CMake change included in CMake 4.0 makes `AIX` into a variable
(similar to `APPLE`, etc.)
ff03db6657
However, `${CMAKE_SYSTEM_NAME}` unfortunately also expands exactly to
`AIX` and `if` auto-expands variable names in CMake. That means you get
a double expansion if you write:
`if (${CMAKE_SYSTEM_NAME} MATCHES "AIX")`
which becomes:
`if (AIX MATCHES "AIX")`
which is as if you wrote:
`if (ON MATCHES "AIX")`
You can prevent this by quoting the expansion of "${CMAKE_SYSTEM_NAME}",
due to policy
[CMP0054](https://cmake.org/cmake/help/latest/policy/CMP0054.html#policy:CMP0054)
which is on by default in 4.0+. Most of the LLVM CMake already does
this, but this PR fixes the remaining cases where we do not.
Using hex format allows to better interpret IDs:
the first digits represent the thread number, the last digits represent
the ID within a thread
The main change is in callback.h: PRIu64 -> PRIx64
The patch also guards RUN/CHECK lines in openmp/runtime/tests/ompt with clang-format on/off comments and clang-formats the directory.
---------
Co-authored-by: Kaloyan Ignatov <kaloyan.ignatov@rwth-aachen.de>
The following patch introduces a new interop interface implementation
with the following characteristics:
* It supports the new 6.0 prefer_type specification
* It supports both explicit objects (from interop constructs) and
implicit objects (from variant calls).
* Implements a per-thread reuse mechanism for implicit objects to reduce
overheads.
* It provides a plugin interface that allows selecting the supported
interop types, and managing all the backend related interop operations
(init, sync, ...).
* It enables cooperation with the OpenMP runtime to allow progress on
OpenMP synchronizations.
* It cleanups some vendor/fr_id mismatchs from the current query
routines.
* It supports extension to define interop callbacks for library cleanup.
This was updated in some earlier commits but was never updated on Darwin
because I was testing locally on Linux and it does not seem like there
are any buildbots testing this configuration. Update it since it should
be trivial and will definitely be broken otherwise.
I did not realize that these were originally in separate folders to
allow for the use of %T. Now that we have switched over to creating dirs
using %t, we can move these into a common folder and make things a
little bit more clean.
These were still passing because I did not clear all the test artifacts
in between so the old ones were still present after updating the test. I
forgot to update a lit substitution which failed on clean builds.
This patch removes all uses of %T from lit tests in OpenMP. %T has been
deprecated for years and is not reccomended given it does not create a
unique dir per test, allowing for race conditions. Remove uses of %T in
OpenMP so we can eventually remove support for it in llvm-lit.
The default build of openmp (`cmake -S <llvm-project>/runtimes
-DLLVM_ENABLE_RUNTIMES=openmp`) current fails with
```
CMake Error at /home/meinersbur/src/llvm/flangrt/_src/cmake/Modules/GetClangResourceDir.cmake:17 (string):
string sub-command REGEX, mode MATCH needs at least 5 arguments total to
command.
Call Stack (most recent call first):
/home/meinersbur/src/llvm/flangrt/_src/openmp/CMakeLists.txt:126 (get_clang_resource_dir)
```
The reason is that because it is not a bootstrapping-build, the clang
resource dir that it intends to write files such as `omp-tools.h` into,
is unavailable. Using the Clang resource dir for writing files is
conceptually broken, as that dir might be located in
`/usr/lib/clang/<version>/`. Writing to it is only intended in
bootstrapping builds where Clang is built alongside openmp.
This patch unifies the identification of being in a bootstrapping built.
The same `LLVM_TREE_AVAILABLE` definition is going to be used in
#137828. No reason for each runtime to define its own.
A lot of these only trip when using sanitizers with the library.
* Insert forgotten free()s
* Change (-1) << amount to 0xffffffffu as left shifting a negative is UB
* Fixup integer parser to return INT_MAX when parsing huge string of
digits. e.g., 452523423423423423 returns INT_MAX
* Fixup range parsing for affinity mask so integer overflow does not
occur
* Don't assert when branch bits are 0, instead warn user that is invalid
and use the default value.
* Fixup kmp_set_defaults() so the C version only uses null terminated
strings and the Fortran version uses the string + size version.
* Make sure the KMP_ALIGN_ALLOC is power of two, otherwise use
CACHE_LINE.
* Disallow ability to set KMP_TASKING=1 (task barrier) this doesn't work
and hasn't worked for a long time.
* Limit KMP_HOT_TEAMS_MAX_LEVEL to 1024, an array is allocated based on
this value.
* Remove integer values for OMP_PROC_BIND. The specification only allows
strings and CSV of strings.
* Fix setting KMP_AFFINITY=disabled + OMP_DISPLAY_AFFINITY=TRUE
Ticket lock has a yield operation (shown below) which degrades
performance on larger server machines due to an unconditional pause
operation.
```
#define KMP_YIELD(cond) \
{ \
KMP_CPU_PAUSE(); \
if ((cond) && (KMP_TRY_YIELD)) \
__kmp_yield(); \
}
```
OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the
num_threads clause on parallel directives, along with the message and
severity clauses. This commit implements necessary host runtime changes.
Reland https://github.com/llvm/llvm-project/pull/146403. After manual
testing on a gfx90a machine, I could not reproduce the failing test,
which makes it even more likely that the test has just been flaky. (Or
at least that it's not an issue related to this patch.)
OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the
num_threads clause on parallel directives, along with the message and
severity clauses. This commit implements necessary host runtime changes.