Files
llvm/offload/include/device.h

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

178 lines
6.3 KiB
C
Raw Normal View History

//===----------- device.h - Target independent OpenMP target RTL ----------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// Declarations for managing devices that are handled by RTL plugins.
//
//===----------------------------------------------------------------------===//
#ifndef _OMPTARGET_DEVICE_H
#define _OMPTARGET_DEVICE_H
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <list>
#include <map>
#include <memory>
#include <mutex>
#include <set>
#include "ExclusiveAccess.h"
#include "OffloadEntry.h"
#include "omptarget.h"
#include "rtl.h"
#include "OpenMP/Mapping.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallVector.h"
[Reland][Libomptarget] Statically link all plugin runtimes (#87009) This patch overhauls the `libomptarget` and plugin interface. Currently, we define a C API and compile each plugin as a separate shared library. Then, `libomptarget` loads these API functions and forwards its internal calls to them. This was originally designed to allow multiple implementations of a library to be live. However, since then no one has used this functionality and it prevents us from using much nicer interfaces. If the old behavior is desired it should instead be implemented as a separate plugin. This patch replaces the `PluginAdaptorTy` interface with the `GenericPluginTy` that is used by the plugins. Each plugin exports a `createPlugin_<name>` function that is used to get the specific implementation. This code is now shared with `libomptarget`. There are some notable improvements to this. 1. Massively improved lifetimes of life runtime objects 2. The plugins can use a C++ interface 3. Global state does not need to be duplicated for each plugin + libomptarget 4. Easier to use and add features and improve error handling 5. Less function call overhead / Improved LTO performance. Additional changes in this plugin are related to contending with the fact that state is now shared. Initialization and deinitialization is now handled correctly and in phase with the underlying runtime, allowing us to actually know when something is getting deallocated. Depends on https://github.com/llvm/llvm-project/pull/86971 https://github.com/llvm/llvm-project/pull/86875 https://github.com/llvm/llvm-project/pull/86868
2024-05-09 06:35:54 -05:00
#include "PluginInterface.h"
using GenericPluginTy = llvm::omp::target::plugin::GenericPluginTy;
// Forward declarations.
struct __tgt_bin_desc;
struct __tgt_target_table;
struct DeviceTy {
int32_t DeviceID;
[Reland][Libomptarget] Statically link all plugin runtimes (#87009) This patch overhauls the `libomptarget` and plugin interface. Currently, we define a C API and compile each plugin as a separate shared library. Then, `libomptarget` loads these API functions and forwards its internal calls to them. This was originally designed to allow multiple implementations of a library to be live. However, since then no one has used this functionality and it prevents us from using much nicer interfaces. If the old behavior is desired it should instead be implemented as a separate plugin. This patch replaces the `PluginAdaptorTy` interface with the `GenericPluginTy` that is used by the plugins. Each plugin exports a `createPlugin_<name>` function that is used to get the specific implementation. This code is now shared with `libomptarget`. There are some notable improvements to this. 1. Massively improved lifetimes of life runtime objects 2. The plugins can use a C++ interface 3. Global state does not need to be duplicated for each plugin + libomptarget 4. Easier to use and add features and improve error handling 5. Less function call overhead / Improved LTO performance. Additional changes in this plugin are related to contending with the fact that state is now shared. Initialization and deinitialization is now handled correctly and in phase with the underlying runtime, allowing us to actually know when something is getting deallocated. Depends on https://github.com/llvm/llvm-project/pull/86971 https://github.com/llvm/llvm-project/pull/86875 https://github.com/llvm/llvm-project/pull/86868
2024-05-09 06:35:54 -05:00
GenericPluginTy *RTL;
int32_t RTLDeviceID;
[Reland][Libomptarget] Statically link all plugin runtimes (#87009) This patch overhauls the `libomptarget` and plugin interface. Currently, we define a C API and compile each plugin as a separate shared library. Then, `libomptarget` loads these API functions and forwards its internal calls to them. This was originally designed to allow multiple implementations of a library to be live. However, since then no one has used this functionality and it prevents us from using much nicer interfaces. If the old behavior is desired it should instead be implemented as a separate plugin. This patch replaces the `PluginAdaptorTy` interface with the `GenericPluginTy` that is used by the plugins. Each plugin exports a `createPlugin_<name>` function that is used to get the specific implementation. This code is now shared with `libomptarget`. There are some notable improvements to this. 1. Massively improved lifetimes of life runtime objects 2. The plugins can use a C++ interface 3. Global state does not need to be duplicated for each plugin + libomptarget 4. Easier to use and add features and improve error handling 5. Less function call overhead / Improved LTO performance. Additional changes in this plugin are related to contending with the fact that state is now shared. Initialization and deinitialization is now handled correctly and in phase with the underlying runtime, allowing us to actually know when something is getting deallocated. Depends on https://github.com/llvm/llvm-project/pull/86971 https://github.com/llvm/llvm-project/pull/86875 https://github.com/llvm/llvm-project/pull/86868
2024-05-09 06:35:54 -05:00
DeviceTy(GenericPluginTy *RTL, int32_t DeviceID, int32_t RTLDeviceID);
// DeviceTy is not copyable
DeviceTy(const DeviceTy &D) = delete;
DeviceTy &operator=(const DeviceTy &D) = delete;
[OpenMP] Introduce target memory manager Target memory manager is introduced in this patch which aims to manage target memory such that they will not be freed immediately when they are not used because the overhead of memory allocation and free is very large. For CUDA device, cuMemFree even blocks the context switch on device which affects concurrent kernel execution. The memory manager can be taken as a memory pool. It divides the pool into multiple buckets according to the size such that memory allocation/free distributed to different buckets will not affect each other. In this version, we use the exact-equality policy to find a free buffer. This is an open question: will best-fit work better here? IMO, best-fit is not good for target memory management because computation on GPU usually requires GBs of data. Best-fit might lead to a serious waste. For example, there is a free buffer of size 1960MB, and now we need a buffer of size 1200MB. If best-fit, the free buffer will be returned, leading to a 760MB waste. The allocation will happen when there is no free memory left, and the memory free on device will take place in the following two cases: 1. The program ends. Obviously. However, there is a little problem that plugin library is destroyed before the memory manager is destroyed, leading to a fact that the call to target plugin will not succeed. 2. Device is out of memory when we request a new memory. The manager will walk through all free buffers from the bucket with largest base size, pick up one buffer, free it, and try to allocate immediately. If it succeeds, it will return right away rather than freeing all buffers in free list. Update: A threshold (8KB by default) is set such that users could control what size of memory will be managed by the manager. It can also be configured by an environment variable `LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD`. Reviewed By: jdoerfert, ye-luo, JonChesterfield Differential Revision: https://reviews.llvm.org/D81054
2020-08-19 23:12:02 -04:00
~DeviceTy();
/// Try to initialize the device and return any failure.
llvm::Error init();
/// Provide access to the mapping handler.
MappingInfoTy &getMappingInfo() { return MappingInfo; }
llvm::Expected<__tgt_device_binary> loadBinary(__tgt_device_image *Img);
// device memory allocation/deallocation routines
/// Allocates \p Size bytes on the device, host or shared memory space
/// (depending on \p Kind) and returns the address/nullptr when
/// succeeds/fails. \p HstPtr is an address of the host data which the
/// allocated target data will be associated with. If it is unknown, the
/// default value of \p HstPtr is nullptr. Note: this function doesn't do
/// pointer association. Actually, all the __tgt_rtl_data_alloc
/// implementations ignore \p HstPtr. \p Kind dictates what allocator should
/// be used (host, shared, device).
void *allocData(int64_t Size, void *HstPtr = nullptr,
int32_t Kind = TARGET_ALLOC_DEFAULT);
/// Deallocates memory which \p TgtPtrBegin points at and returns
/// OFFLOAD_SUCCESS/OFFLOAD_FAIL when succeeds/fails. p Kind dictates what
/// allocator should be used (host, shared, device).
int32_t deleteData(void *TgtPtrBegin, int32_t Kind = TARGET_ALLOC_DEFAULT);
// Data transfer. When AsyncInfo is nullptr, the transfer will be
// synchronous.
// Copy data from host to device
int32_t submitData(void *TgtPtrBegin, void *HstPtrBegin, int64_t Size,
AsyncInfoTy &AsyncInfo,
HostDataToTargetTy *Entry = nullptr,
MappingInfoTy::HDTTMapAccessorTy *HDTTMapPtr = nullptr);
// Copy data from device back to host
int32_t retrieveData(void *HstPtrBegin, void *TgtPtrBegin, int64_t Size,
AsyncInfoTy &AsyncInfo,
HostDataToTargetTy *Entry = nullptr,
MappingInfoTy::HDTTMapAccessorTy *HDTTMapPtr = nullptr);
// Return true if data can be copied to DstDevice directly
bool isDataExchangable(const DeviceTy &DstDevice);
// Copy data from current device to destination device directly
int32_t dataExchange(void *SrcPtr, DeviceTy &DstDev, void *DstPtr,
int64_t Size, AsyncInfoTy &AsyncInfo);
/// Notify the plugin about a new mapping starting at the host address
/// \p HstPtr and \p Size bytes.
int32_t notifyDataMapped(void *HstPtr, int64_t Size);
/// Notify the plugin about an existing mapping being unmapped starting at
/// the host address \p HstPtr.
int32_t notifyDataUnmapped(void *HstPtr);
// Launch the kernel identified by \p TgtEntryPtr with the given arguments.
int32_t launchKernel(void *TgtEntryPtr, void **TgtVarsPtr,
ptrdiff_t *TgtOffsets, KernelArgsTy &KernelArgs,
AsyncInfoTy &AsyncInfo);
/// Synchronize device/queue/event based on \p AsyncInfo and return
/// OFFLOAD_SUCCESS/OFFLOAD_FAIL when succeeds/fails.
int32_t synchronize(AsyncInfoTy &AsyncInfo);
/// Query for device/queue/event based completion on \p AsyncInfo in a
/// non-blocking manner and return OFFLOAD_SUCCESS/OFFLOAD_FAIL when
/// succeeds/fails. Must be called multiple times until AsyncInfo is
/// completed and AsyncInfo.isDone() returns true.
int32_t queryAsync(AsyncInfoTy &AsyncInfo);
/// Calls the corresponding print device info function in the plugin.
bool printDeviceInfo();
/// Event related interfaces.
/// {
/// Create an event.
int32_t createEvent(void **Event);
/// Record the event based on status in AsyncInfo->Queue at the moment the
/// function is called.
int32_t recordEvent(void *Event, AsyncInfoTy &AsyncInfo);
/// Wait for an event. This function can be blocking or non-blocking,
/// depending on the implementation. It is expected to set a dependence on the
/// event such that corresponding operations shall only start once the event
/// is fulfilled.
int32_t waitEvent(void *Event, AsyncInfoTy &AsyncInfo);
/// Synchronize the event. It is expected to block the thread.
int32_t syncEvent(void *Event);
/// Destroy the event.
int32_t destroyEvent(void *Event);
/// }
/// Print all offload entries to stderr.
void dumpOffloadEntries();
/// Ask the device whether the runtime should use auto zero-copy.
bool useAutoZeroCopy();
/// Check if there are pending images for this device.
bool hasPendingImages() const { return HasPendingImages; }
/// Indicate that there are pending images for this device or not.
void setHasPendingImages(bool V) { HasPendingImages = V; }
private:
/// Deinitialize the device (and plugin).
void deinit();
/// All offload entries available on this device.
using DeviceOffloadEntriesMapTy =
llvm::DenseMap<llvm::StringRef, OffloadEntryTy>;
ProtectedObj<DeviceOffloadEntriesMapTy> DeviceOffloadEntries;
/// Handler to collect and organize host-2-device mapping information.
MappingInfoTy MappingInfo;
/// Flag to indicate pending images (true after construction).
bool HasPendingImages = true;
};
#endif