mirror of
https://github.com/intel/llvm.git
synced 2026-01-13 11:02:04 +08:00
[OpenMP][Docs] Added offloading command line reference to OpenMP FAQ
This commit adds an OpenMP offloading-specific command-line reference. The OpenMP FAQ links to the new .rst file. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D156387
Committed by antonrydahl
Parent: 239777c861
Commit: 5c0f98cd2a
openmp/docs/CommandLineArgumentReference.rst (Normal file, 187 lines added)
@@ -0,0 +1,187 @@
OpenMP Command-Line Argument Reference
======================================
Welcome to the OpenMP in LLVM command-line argument reference. This page is
not a complete list of arguments, but it covers the essential command-line
arguments you may need when compiling and linking OpenMP programs.
Section :ref:`general_command_line_arguments` lists OpenMP command-line options
for multicore programming, while :ref:`offload_command_line_arguments` lists
options relevant to OpenMP target offloading.

.. _general_command_line_arguments:

OpenMP Command-Line Arguments
-----------------------------

``-fopenmp``
^^^^^^^^^^^^
Enable the OpenMP compilation toolchain. The compiler will parse OpenMP
compiler directives and generate parallel code.

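As a minimal sketch of the kind of directive that ``-fopenmp`` makes the
compiler honor, the function below uses a ``parallel for`` loop with a
reduction. The function name ``sum_of_squares`` and the file name in the build
comment are illustrative assumptions, not part of LLVM.

```c
/* Possible build line: clang -fopenmp -c sum.c
 * (sum.c is a hypothetical file name). Without -fopenmp the pragma is
 * ignored and the loop runs serially with the same result. */
#include <assert.h>

/* Return 1^2 + 2^2 + ... + n^2 using an OpenMP parallel reduction. */
long long sum_of_squares(int n) {
    assert(n >= 0);
    long long total = 0;
    #pragma omp parallel for reduction(+ : total)
    for (int i = 1; i <= n; ++i) {
        total += (long long)i * i;
    }
    return total;
}
```

Calling ``sum_of_squares(10)`` yields 385 with or without ``-fopenmp``; the
option only changes whether the loop iterations are shared among threads.
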
``-fopenmp-extensions``
^^^^^^^^^^^^^^^^^^^^^^^
Enable all ``Clang`` extensions for OpenMP directives and clauses. A list of
current extensions and their implementation status can be found on the
`support <https://clang.llvm.org/docs/OpenMPSupport.html#openmp-extensions>`_
page.

``-fopenmp-simd``
^^^^^^^^^^^^^^^^^
This option enables OpenMP only for single instruction, multiple data
(SIMD) constructs.

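The sketch below shows a loop annotated with the ``simd`` directive, which is
the kind of construct this option is meant to enable. The function name
``saxpy`` and the file name in the build comment are illustrative assumptions.

```c
/* Possible build line: clang -O2 -fopenmp-simd -c saxpy.c
 * (saxpy.c is a hypothetical file name). */
#include <assert.h>

/* Compute y[i] = a * x[i] + y[i]; the simd directive asks the compiler
 * to vectorize the loop. */
void saxpy(int n, float a, const float *x, float *y) {
    assert(n >= 0);
    #pragma omp simd
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The function computes the same values when the pragma is ignored, so it can be
compared against a build without any OpenMP option.
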
``-static-openmp``
^^^^^^^^^^^^^^^^^^
Use the static OpenMP host runtime while linking.

``-fopenmp-version=<arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^^
Select version ``<arg>`` of the OpenMP standard. For example, you may use
``-fopenmp-version=45`` to select version 4.5 of the OpenMP standard. The
default value is ``-fopenmp-version=50`` for ``Clang`` and
``-fopenmp-version=11`` for ``flang-new``.

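One way to check which version is in effect is to inspect the ``_OPENMP``
macro, which the OpenMP specification defines as the release date (``yyyymm``)
of the supported specification. The sketch below assumes that the value chosen
with ``-fopenmp-version`` is reflected in ``_OPENMP``; the helper names
``openmp_macro_value`` and ``openmp_version_arg`` are hypothetical.

```c
/* Possible build line: clang -fopenmp -fopenmp-version=45 -c version.c
 * (version.c is a hypothetical file name). */
#include <assert.h>

/* Value of the _OPENMP macro (a yyyymm date), or 0 when OpenMP is disabled. */
long openmp_macro_value(void) {
#ifdef _OPENMP
    return (long)_OPENMP;
#else
    return 0L;
#endif
}

/* Map an _OPENMP date to the matching -fopenmp-version argument.
 * Returns 0 for dates this sketch does not know about. */
int openmp_version_arg(long date) {
    assert(date >= 0);
    switch (date) {
    case 201511L: return 45; /* OpenMP 4.5 */
    case 201811L: return 50; /* OpenMP 5.0 */
    case 202011L: return 51; /* OpenMP 5.1 */
    case 202111L: return 52; /* OpenMP 5.2 */
    default:      return 0;
    }
}
```

Printing ``openmp_version_arg(openmp_macro_value())`` from a small test program
is a quick way to confirm that the intended version was selected.
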
.. _offload_command_line_arguments:

Offloading Specific Command-Line Arguments
------------------------------------------

.. _fopenmp-targets:

``-fopenmp-targets``
^^^^^^^^^^^^^^^^^^^^
| Specify which OpenMP offloading targets should be supported. For example, you
  may specify ``-fopenmp-targets=amdgcn-amd-amdhsa,nvptx64``. This option is
  often optional when :ref:`offload_arch` is provided.
| It is also possible to offload to CPU architectures, for instance with
  ``-fopenmp-targets=x86_64-pc-linux-gnu``.

.. _offload_arch:

``--offload-arch``
^^^^^^^^^^^^^^^^^^
| Specify the device architecture for OpenMP offloading. For instance, use
  ``--offload-arch=sm_80`` to target an Nvidia Tesla A100,
  ``--offload-arch=gfx90a`` to target an AMD Instinct MI250X, or
  ``--offload-arch=sm_80,gfx90a`` to target both.
| It is also possible to specify :ref:`fopenmp-targets` without specifying
  ``--offload-arch``. In that case, the executables ``amdgpu-arch`` or
  ``nvptx-arch`` will be executed as part of the compiler driver to
  detect the device architecture automatically.
| Finally, the device architecture will also be automatically inferred with
  ``--offload-arch=native``.

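To illustrate the kind of code these options apply to, the sketch below
offloads a simple loop with a ``target`` construct. The function name
``vector_add`` and the file name in the build comment are illustrative
assumptions; the architecture value should match your own device.

```c
/* Possible build line: clang -O2 -fopenmp --offload-arch=native -c vadd.c
 * (vadd.c is a hypothetical file name). */
#include <assert.h>

/* Compute c[i] = a[i] + b[i]. With offloading enabled, the loop runs on the
 * default device; the map clauses copy the inputs in and the result out. */
void vector_add(int n, const float *a, const float *b, float *c) {
    assert(n >= 0);
    #pragma omp target teams distribute parallel for \
        map(to : a[0:n], b[0:n]) map(from : c[0:n])
    for (int i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

When the file is compiled without any OpenMP option, the pragma is ignored and
the loop runs on the host, which makes it easy to compare results.
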
``--offload-device-only``
^^^^^^^^^^^^^^^^^^^^^^^^^
Compile only the code that goes on the device. This option is mainly for
debugging purposes. It is primarily used for inspecting the intermediate
representation (IR) output when compiling for the device. It may also be used
if device-only runtimes are created.

``--offload-host-only``
^^^^^^^^^^^^^^^^^^^^^^^
Compile only the code that goes on the host. With this option enabled, the
``.llvm.offloading`` section with embedded device code will not be included in
the intermediate representation.

``--offload-host-device``
^^^^^^^^^^^^^^^^^^^^^^^^^
Compile the target regions for both the host and the device. This is the
default behavior.

``-Xopenmp-target <arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the offloading toolchain, for instance
``-Xopenmp-target -march=sm_80``.

``-Xopenmp-target=<triple> <arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the offloading toolchain for the target
``<triple>``. That is especially useful when an argument must differ for each
triple. For instance, use ``-Xopenmp-target=nvptx64 --offload-arch=sm_80
-Xopenmp-target=amdgcn --offload-arch=gfx90a`` to specify the device
architecture for each triple. Alternatively, :ref:`Xarch_host` and
:ref:`Xarch_device` can pass an argument to the host and device compilation
toolchains, respectively.

``-Xoffload-linker<triple> <arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the offloading linker for the target specified in
``<triple>``.

.. _Xarch_device:

``-Xarch_device <arg>``
^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the device compilation toolchain.

.. _Xarch_host:

``-Xarch_host <arg>``
^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the host compilation toolchain.

``-foffload-lto[=<arg>]``
^^^^^^^^^^^^^^^^^^^^^^^^^
Enable device link time optimization (LTO) and select the LTO mode ``<arg>``.
Select either ``-foffload-lto=thin`` or ``-foffload-lto=full``. Thin LTO takes
less time while still achieving some performance gains. If no argument is set,
this option defaults to ``-foffload-lto=full``.

``-fopenmp-offload-mandatory``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| This option avoids generating the host fallback code that is executed when
  offloading to the device fails. That is helpful when the target region
  contains code that cannot be compiled for the host, for instance, if it
  contains unguarded device intrinsics.
| This option can also be used to reduce compile time.
| This option should not be used to verify that the code is being offloaded to
  the device. Instead, set the environment variable
  ``OMP_TARGET_OFFLOAD='MANDATORY'`` to confirm that the code is being
  offloaded to the device.

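In addition to ``OMP_TARGET_OFFLOAD``, a program can report where a target
region actually ran. The sketch below uses the standard OpenMP routine
``omp_is_initial_device``; the function name ``target_region_ran_on_host`` and
the names in the comments are illustrative assumptions.

```c
/* Possible build line: clang -fopenmp --offload-arch=native -c where.c
 * Possible run line:   OMP_TARGET_OFFLOAD=MANDATORY ./app
 * (where.c and app are hypothetical names). */
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Return 1 if the target region executed on the host, 0 if it executed on
 * an offload device. Without OpenMP the function always returns 1. */
int target_region_ran_on_host(void) {
    int on_host = 1;
#ifdef _OPENMP
    #pragma omp target map(tofrom : on_host)
    {
        on_host = omp_is_initial_device() ? 1 : 0;
    }
#endif
    assert(on_host == 0 || on_host == 1);
    return on_host;
}
```

A return value of 1 in a build that was meant to offload suggests that the
region fell back to the host.
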
``-fopenmp-target-debug[=<arg>]``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Enable debugging in the device runtime library (RTL). Note that it is necessary
both to configure debugging in the device runtime at compile time with
``-fopenmp-target-debug=<arg>`` and to enable debugging at runtime with the
environment variable ``LIBOMPTARGET_DEVICE_RTL_DEBUG=<arg>``. Further, as of
July 2023, this is only supported for Nvidia targets. Alternatively, the
environment variable ``LIBOMPTARGET_DEBUG`` can be set to debug both Nvidia and
AMD GPU targets. For more information, see the
`debugging instructions <https://openmp.llvm.org/design/Runtimes.html#debugging>`_,
which list the supported debugging arguments.

``-fopenmp-target-jit``
^^^^^^^^^^^^^^^^^^^^^^^
| Emit code that is Just-in-Time (JIT) compiled for OpenMP offloading. Embed
  LLVM-IR for the device code in the object files rather than binary code for the
  respective target. At runtime, the LLVM-IR is optimized again and compiled for
  the target device. The optimization level can be set at runtime with
  ``LIBOMPTARGET_JIT_OPT_LEVEL``, for instance,
  ``LIBOMPTARGET_JIT_OPT_LEVEL=3`` corresponding to optimization level ``-O3``.
  See the
  `OpenMP JIT details <https://openmp.llvm.org/design/Runtimes.html#libomptarget-jit-pre-opt-ir-module>`_
  for instructions on extracting the embedded device code before or after the
  JIT and more.
| JIT for OpenMP offloading is particularly useful for debugging because
  the target IR can be extracted, modified, and injected at runtime.

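A small kernel such as the following could be used to experiment with JIT
compilation. This is only a sketch: the function name ``dot`` and the names in
the comments are hypothetical, and the exact set of options required alongside
``-fopenmp-target-jit`` may depend on your LLVM version.

```c
/* Possible build line: clang -O2 -fopenmp --offload-arch=sm_80 -fopenmp-target-jit -c dot.c
 * Possible run line:   LIBOMPTARGET_JIT_OPT_LEVEL=3 ./app
 * (dot.c and app are hypothetical names). */
#include <assert.h>

/* Dot product of x and y, reduced on the device when offloading is enabled. */
double dot(int n, const double *x, const double *y) {
    assert(n >= 0);
    double result = 0.0;
    #pragma omp target teams distribute parallel for reduction(+ : result) \
        map(to : x[0:n], y[0:n]) map(tofrom : result)
    for (int i = 0; i < n; ++i) {
        result += x[i] * y[i];
    }
    return result;
}
```

Because the device code is embedded as LLVM-IR, rerunning the same binary with
a different ``LIBOMPTARGET_JIT_OPT_LEVEL`` changes how the kernel is optimized
without recompiling the host program.
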
``--offload-new-driver``
^^^^^^^^^^^^^^^^^^^^^^^^
In upstream LLVM, OpenMP only uses the new driver. However, this option must
be enabled for experimental linking with CUDA or HIP files.

``--offload-link``
^^^^^^^^^^^^^^^^^^
Use the new offloading linker ``clang-linker-wrapper`` to perform the link job.
``clang-linker-wrapper`` is the default offloading linker for OpenMP. This option
enables the new offloading linker in toolchains that do not use it
automatically. It is necessary to enable this option when linking with CUDA or
HIP files.

``-nogpulib``
^^^^^^^^^^^^^
Do not link the device library for CUDA or HIP device compilation.

``-nogpuinc``
^^^^^^^^^^^^^
Do not include the default CUDA or HIP headers, and do not add CUDA or HIP
include paths.

@@ -52,13 +52,15 @@ All patches go through the regular `LLVM review process
 Q: How to build an OpenMP GPU offload capable compiler?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 To build an *effective* OpenMP offload capable compiler, only one extra CMake
-option, `LLVM_ENABLE_RUNTIMES="openmp"`, is needed when building LLVM (Generic
+option, ``LLVM_ENABLE_RUNTIMES="openmp"``, is needed when building LLVM (Generic
 information about building LLVM is available `here
-<https://llvm.org/docs/GettingStarted.html>`__.). Make sure all backends that
-are targeted by OpenMP to be enabled. By default, Clang will be built with all
-backends enabled. When building with `LLVM_ENABLE_RUNTIMES="openmp"` OpenMP
-should not be enabled in `LLVM_ENABLE_PROJECTS` because it is enabled by
-default.
+<https://llvm.org/docs/GettingStarted.html>`__.). Make sure all backends that
+are targeted by OpenMP are enabled. That can be done by adjusting the CMake
+option ``LLVM_TARGETS_TO_BUILD``. The corresponding targets for offloading to AMD
+and Nvidia GPUs are ``"AMDGPU"`` and ``"NVPTX"``, respectively. By default,
+Clang will be built with all backends enabled. When building with
+``LLVM_ENABLE_RUNTIMES="openmp"`` OpenMP should not be enabled in
+``LLVM_ENABLE_PROJECTS`` because it is enabled by default.

 For Nvidia offload, please see :ref:`build_nvidia_offload_capable_compiler`.
 For AMDGPU offload, please see :ref:`build_amdgpu_offload_capable_compiler`.
@@ -72,14 +74,17 @@ For AMDGPU offload, please see :ref:`build_amdgpu_offload_capable_compiler`.

 .. _build_nvidia_offload_capable_compiler:

-Q: How to build an OpenMP NVidia offload capable compiler?
+Q: How to build an OpenMP Nvidia offload capable compiler?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 The Cuda SDK is required on the machine that will execute the openmp application.

 If your build machine is not the target machine or automatic detection of the
 available GPUs failed, you should also set:

-- `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=YY` where `YY` is the numeric compute capacity of your GPU, e.g., 75.
+- ``LIBOMPTARGET_DEVICE_ARCHITECTURES=sm_<xy>,...`` where ``<xy>`` is the numeric
+  compute capability of your GPU. For instance, set
+  ``LIBOMPTARGET_DEVICE_ARCHITECTURES=sm_70,sm_80`` to target the Nvidia Volta
+  and Ampere architectures.


 .. _build_amdgpu_offload_capable_compiler:
@@ -133,6 +138,14 @@ With those libraries installed, then LLVM build and installed, try:

   clang -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa example.c -o example && ./example

+If your build machine is not the target machine or automatic detection of the
+available GPUs failed, you should also set:
+
+- ``LIBOMPTARGET_DEVICE_ARCHITECTURES=gfx<xyz>,...`` where ``<xyz>`` is the
+  shader core instruction set architecture. For instance, set
+  ``LIBOMPTARGET_DEVICE_ARCHITECTURES=gfx906,gfx90a`` to target AMD GCN5
+  and CDNA2 devices.
+
 Q: What are the known limitations of OpenMP AMDGPU offload?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 LD_LIBRARY_PATH or rpath/runpath are required to find libomp.so and libomptarget.so
@@ -349,7 +362,7 @@ create generic libraries.
 The architecture can either be specified manually using ``--offload-arch=``. If
 ``--offload-arch=`` is present no ``-fopenmp-targets=`` flag is present then the
 targets will be inferred from the architectures. Conversely, if
-``--fopenmp-targets=`` is present with no ``--offload-arch`` then the target
+``--fopenmp-targets=`` is present with no ``--offload-arch`` then the target
 architecture will be set to a default value, usually the architecture supported
 by the system LLVM was built on.

@@ -451,3 +464,25 @@ with OpenMP.

 For more information on how this is implemented in LLVM/OpenMP's offloading
 runtime, refer to the `runtime documentation <libomptarget_libc>`_.
+
+Q: What command line options can I use for OpenMP?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+We recommend taking a look at the OpenMP
+:doc:`command line argument reference <CommandLineArgumentReference>` page.
+
+Q: Why is my build taking a long time?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+When installing OpenMP and other LLVM components, the build time on multicore
+systems can be significantly reduced with parallel build jobs. As suggested in
+*LLVM Techniques, Tips, and Best Practices*, one could consider using ``ninja`` as the
+generator. This can be done with the CMake option ``cmake -G Ninja``. Afterward,
+use ``ninja install`` and specify the number of parallel jobs with ``-j``. The build
+time can also be reduced by setting the build type to ``Release`` with the
+``CMAKE_BUILD_TYPE`` option. Recompilation can also be sped up by caching previous
+compilations. Consider enabling ``Ccache`` with
+``CMAKE_CXX_COMPILER_LAUNCHER=ccache``.
+
+Q: Did this FAQ not answer your question?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Feel free to post questions or browse old threads at
+`LLVM Discourse <https://discourse.llvm.org/c/runtimes/openmp/>`__.
@@ -91,6 +91,21 @@ please refer to :doc:`remarks/OptimizationRemarks`.

    remarks/OptimizationRemarks

+OpenMP Command-Line Argument Reference
+======================================
+In addition to the
+`Clang command-line argument reference <https://clang.llvm.org/docs/ClangCommandLineReference.html>`_
+we also recommend the OpenMP
+:doc:`command-line argument reference <CommandLineArgumentReference>`
+page that offers a detailed overview of options specific to OpenMP. It also
+contains a list of OpenMP offloading related command-line arguments.
+
+
+.. toctree::
+   :hidden:
+   :maxdepth: 1
+
+   CommandLineArgumentReference
+
 Support, Getting Involved, and Frequently Asked Questions (FAQ)
 ===============================================================
