Commit Graph

333 Commits

Author SHA1 Message Date
Luke Drummond
e3fbede7f3 [HIP] Add missing __hip_atomic_fetch_sub support
The rest of the fetch/op intrinsics were added in e13246a2ec but sub
was conspicuous by its absence.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D151701
2023-05-30 22:22:43 +01:00
M. Zeeshan Siddiqui
e621757365 [Clang][BFloat16] Upgrade __bf16 to arithmetic type, change mangling, and extend excess precision support
Pursuant to discussions at
https://discourse.llvm.org/t/rfc-c-23-p1467r9-extended-floating-point-types-and-standard-names/70033/22,
this commit enhances the handling of the __bf16 type in Clang.
- Firstly, it upgrades __bf16 from a storage-only type to an arithmetic
  type.
- Secondly, it changes the mangling of __bf16 to DF16b on all
  architectures except ARM. This change has been made in
  accordance with the finalization of the mangling for the
  std::bfloat16_t type, as discussed at
  https://github.com/itanium-cxx-abi/cxx-abi/pull/147.
- Finally, this commit extends the existing excess precision support to
  the __bf16 type. This applies to hardware architectures that do not
  natively support bfloat16 arithmetic.
Appropriate tests have been added to verify the effects of these
changes and ensure no regressions in other areas of the compiler.

Reviewed By: rjmccall, pengfei, zahiraam

Differential Revision: https://reviews.llvm.org/D150913
2023-05-27 13:33:50 +08:00
Artem Belevich
25708b3df6 [NVPTX, CUDA] barrier intrinsics and builtins for sm_90
Differential Revision: https://reviews.llvm.org/D151363
2023-05-25 11:57:57 -07:00
Artem Belevich
0a0bae1e9f [CUDA] plumb through new sm_90-specific builtins.
Differential Revision: https://reviews.llvm.org/D151168
2023-05-25 11:57:56 -07:00
Sergei Barannikov
f46b0e6d75 [clang] Convert a few tests to opaque pointers
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D150520
2023-05-14 21:00:15 +03:00
Matt Arsenault
bc37be1855 LangRef: Add "dynamic" option to "denormal-fp-math"
This is stricter than the default "ieee", and should probably be the
default. This patch leaves the default alone. I can change this in a
future patch.

There are non-reversible transforms I would like to perform which are
legal under IEEE denormal handling, but illegal with flushing zero
behavior. Namely, conversions between llvm.is.fpclass and fcmp with
zeroes.

Under "ieee" handling, it is legal to translate between
llvm.is.fpclass(x, fcZero) and fcmp x, 0.

Under "preserve-sign" handling, it is legal to translate between
llvm.is.fpclass(x, fcSubnormal|fcZero) and fcmp x, 0.

I would like to compile and distribute some math library functions in
a mode where it's callable from code with and without denormals
enabled, which requires not changing the compares with denormals or
zeroes.

If an IEEE function transforms an llvm.is.fpclass call into an fcmp 0,
it is no longer possible to call the function from code with denormals
enabled, or write an optimization to move the function into a denormal
flushing mode. For the original function, if x was a denormal, the
class would evaluate to false. If the function compiled with denormal
handling was converted to or called from a preserve-sign function, the
fcmp now evaluates to true.

This could also be of use for strictfp handling, where code may be
changing the denormal mode.

Alternative name could be "unknown".

Replaces the old AMDGPU custom inlining logic with more conservative
logic which tries to permit inlining for callees with dynamic handling
and avoids inlining other mismatched modes.
2023-04-29 08:44:59 -04:00
Artem Belevich
2aa90da012 [CUDA] Update cached kernel handle when the function instance changes.
Fixes clang crash caused by a stale function pointer.

The bug has been present for a pretty long time, but we were lucky not to
trigger it until  D140663.

Differential Revision: https://reviews.llvm.org/D146448
2023-03-21 15:36:12 -07:00
Nikita Popov
06621ecdaf [Clang] Convert some tests to opaque pointers (NFC) 2023-02-17 15:08:50 +01:00
Matt Arsenault
7cbbbc0a04 clang: Rename misleading test name
This is a test for attribute propagation, not metadata
2023-02-15 08:32:57 -04:00
Yaxun (Sam) Liu
f4d8b8781d [AMDGPU ASAN] Remove reference to asan bitcode library
The asan functions are now attributed as used
in the device library, no need to keep the
declaration of asan device preserve function.

Patch by: Praveen Velliengiri

Reviewed by: Yaxun Liu

Differential Revision: https://reviews.llvm.org/D143495
2023-02-14 11:52:41 -05:00
Daniele Castagna
32c26e27b6 CUDA/HIP: Use kernel name to map to symbol
Currently CGCUDANV uses an llvm::Function as a key to map kernels to a
symbol in host code.  HIP adds one level of indirection and uses the
llvm::Function to map to a global variable that will be initialized to
the kernel stub ptr.

Unfortunately there is no garantee that the llvm::Function created
by GetOrCreateLLVMFunction will be the same.  In fact, the first
time we encounter GetOrCrateLLVMFunction for a kernel, the type
might not be completed yet, and the type of llvm::Function will be
a generic {}, since the complete type is not required to get a symbol
to a function.  In this case we end up creating two global variables,
one for the llvm::Function with the incomplete type and one for the
function with the complete type. The first global variable will be
declared by not defined, resulting in a linking error.

This change uses the mangled name of the llvm::Function as key in the
KernelHandles map, in this way the same llvm::Function will be
associated to the same kernel handle even if they types are different.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D140663
2023-01-19 15:02:14 -08:00
Nikita Popov
b164b047f2 [Clang] Convert test to opaque pointers (NFC) 2023-01-17 10:08:57 +01:00
Nikita Popov
02856565ac [Clang] Emit noundef metadata next to range metadata
To preserve the previous semantics after D141386, adjust places
that currently emit !range metadata to also emit !noundef metadata.
This retains range violation as immediate undefined behavior,
rather than just poison.

Differential Revision: https://reviews.llvm.org/D141494
2023-01-12 10:03:05 +01:00
Matt Arsenault
2e9c663ab4 clang/AMDGPU: Add missing tests for some builtin
These were tested under opencl but need hip testing for the potential
addrspacecasts.
2023-01-10 13:07:01 -05:00
Pierre van Houtryve
678d8946ba [AMDGPU] Add bf16 storage support
- [Clang] Declare AMDGPU target as supporting BF16 for storage-only purposes on amdgcn
  - Add Sema & CodeGen tests cases.
  - Also add cases that D138651 would have covered as this patch replaces it.
- [AMDGPU] Add BF16 storage-only support
  - Support legalization/dealing with bf16 operations in DAGIsel.
  - bf16 as a type remains illegal and is represented as i16 for storage purposes.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D139398
2022-12-13 10:34:26 -05:00
Nikita Popov
0419465fa4 [Clang] Update some CUDA tests to opaque pointers (NFC) 2022-12-13 11:50:08 +01:00
Nikita Popov
9466b49171 [Clang] Convert various tests to opaque pointers (NFC)
These were all tests where no manual fixup was required.
2022-12-12 17:11:46 +01:00
Yaxun (Sam) Liu
262c3034bb Revert "[AMDGPU] Disable bool range metadata to workaround backend issue"
This reverts commit 107ee26130 to facilitate
investigating and fixing the root cause.

Differential Revision: https://reviews.llvm.org/D135269
2022-12-08 17:35:10 -05:00
Ron Lieberman
ca856fff1c Revert "enable code-object-version=5"
very sorry wrong repo.

This reverts commit d882ba7aea.
2022-11-29 15:21:09 -06:00
Ron Lieberman
d882ba7aea enable code-object-version=5 2022-11-29 15:11:57 -06:00
Pierre van Houtryve
c05f1639f7 [clang][cuda/hip] Allow __noinline__ lambdas
D124866 seem to have had an unintended side effect: __noinline__ on lambdas was no longer accepted.

This fixes the regression and adds a test case for it.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D137251
2022-11-04 07:33:31 +00:00
Artem Belevich
0e8a414ab3 [CUDA, NVPTX] Added basic __bf16 support for NVPTX.
Recent Clang changes expose _bf16 types for SSE2-enabled host compilations and
that makes those types visible furing GPU-side compilation, where it currently
fails with Sema complaining that __bf16 is not supported.

Considering that __bf16 is a storage-only type, enabling it for NVPTX if it's
enabled on the host should pose no issues, correctness-wise.

Recent NVIDIA GPUs have introduced bf16 support, so we'll likely grow better
support for __bf16 on NVPTX going forward.

Differential Revision: https://reviews.llvm.org/D136311
2022-10-25 11:08:06 -07:00
Artem Belevich
a10eb07d1a Do not append terminating NUL to the binary string with embedded fatbin.
Extra NUL does not impact functionality of the generated code, but it confuses
various NVIDIA tools used to examine embedded GPU binaries.

Differential Revision: https://reviews.llvm.org/D135832
2022-10-17 15:39:39 -07:00
Fangrui Song
ccd89be645 [Driver] Remove legacy cc1 alias -mlink-cuda-bitcode
r340193 added -mlink-builtin-bitcode as the canonical spelling.
2022-10-15 14:02:13 -07:00
Zahira Ammarguellat
84a9ec2ff1 Remove redundant option -menable-unsafe-fp-math.
There are currently two options that are used to tell the compiler to perform
unsafe floating-point optimizations:
'-ffast-math' and '-funsafe-math-optimizations'.

'-ffast-math' is enabled by default. It automatically enables the driver option
'-menable-unsafe-fp-math'.
Below is a table illustrating the special operations enabled automatically by
'-ffast-math', '-funsafe-math-optimizations' and '-menable-unsafe-fp-math'
respectively.

Special Operations -ffast-math	-funsafe-math-optimizations -menable-unsafe-fp-math
MathErrno	       0	         1	                    1
FiniteMathOnly         1 	         0                          0
AllowFPReassoc	       1         	 1                          1
NoSignedZero	       1                 1                          1
AllowRecip             1                 1                          1
ApproxFunc             1                 1                          1
RoundingMath	       0                 0                          0
UnsafeFPMath	       1                 0                          1
FPContract	       fast	         on	                    on

'-ffast-math' enables '-fno-math-errno', '-ffinite-math-only',
'-funsafe-math-optimzations' and sets 'FpContract' to 'fast'. The driver option
'-menable-unsafe-fp-math' enables the same special options than
'-funsafe-math-optimizations'. This is redundant.
We propose to remove the driver option '-menable-unsafe-fp-math' and use
instead, the setting of the special operations to set the function attribute
'unsafe-fp-math'. This attribute will be enabled only if those special
operations are enabled and if 'FPContract' is either 'fast' or set to the
default value.

Differential Revision: https://reviews.llvm.org/D135097
2022-10-14 10:55:29 -04:00
Yaxun (Sam) Liu
eb26baf4ad Fix test bool-range.cu
Promoting kernel arg pointer to global addr space is only
available with registered amdgcn target.

Fix test so that it does not require registered amdgcn target.
2022-10-07 11:17:28 -04:00
Yaxun (Sam) Liu
107ee26130 [AMDGPU] Disable bool range metadata to workaround backend issue
Currently there is a middle-end or backend issue
https://github.com/llvm/llvm-project/issues/58176
which causes values loaded from bool pointer incorrect when
bool range metadata is emitted. Temporarily
disable bool range metadata until the backend issue
is fixed.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D135269

Fixes: SWDEV-344137
2022-10-07 10:46:04 -04:00
Yaxun (Sam) Liu
5e25284dbc [AMDGPU] Emit module flag for all code object versions
Reviewed by: Changpeng Fang, Matt Arsenault, Brian Sumner

Differential Revision: https://reviews.llvm.org/D134355
2022-09-22 16:51:33 -04:00
Fangrui Song
74742147ee [test] Change cc1 -fvisibility to -fvisibility= 2022-09-02 12:36:44 -07:00
Johannes Doerfert
48d6f52401 [CUDA][FIX] Make shfl[_sync] for unsigned long long non-recursive
A copy-paste error caused UB in the definition of the unsigned long long
versions of the shfl intrinsics. Reported and diagnosed by @trws.

Differential Revision: https://reviews.llvm.org/D129536
2022-07-21 12:36:54 -05:00
Joseph Huber
b370be37cc [CUDA] Allow the new driver to compile CUDA in non-RDC mode
The new driver primarily allows us to support RDC-mode compilations with
proper linking. This is not needed for non-RDC mode compilation, but we
still would like the new driver to be able to handle this mode so we can
transition away from the old driver in the future. This patch adds the
necessary code to support creating a fatbinary for CUDA code generation
as well as removing old assumptions and errors about RDC-mode with the
new driver.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D129655
2022-07-13 21:49:15 -04:00
Joseph Huber
e88d53d25f [HIP] Generate offloading entries for HIP with the new driver.
This patch adds the small change required to output offloading entried
for HIP instead of CUDA. These should be placed in different sections so
because they need to be distinct to the offloading toolchain, otherwise
we'd have HIP trying to register CUDA kernels or vice-versa. This patch will
precede support for HIP in the linker wrapper.

Reviewed By: yaxunl, tra

Differential Revision: https://reviews.llvm.org/D128850
2022-07-11 15:49:21 -04:00
Nikita Popov
935570b2ad [ConstExpr] Don't create div/rem expressions
This removes creation of udiv/sdiv/urem/srem constant expressions,
in preparation for their removal. I've added a
ConstantExpr::isDesirableBinOp() predicate to determine whether
an expression should be created for a certain operator.

With this patch, div/rem expressions can still be created through
explicit IR/bitcode, forbidding them entirely will be the next step.

Differential Revision: https://reviews.llvm.org/D128820
2022-07-05 15:54:53 +02:00
Yaxun (Sam) Liu
8ad4c6e4b1 [HIP] add -fhip-kernel-arg-name
Add option -fhip-kernel-arg-name to emit kernel argument
name metadata, which is needed for certain HIP applications.

Reviewed by: Artem Belevich, Fangrui Song, Brian Sumner

Differential Revision: https://reviews.llvm.org/D128022
2022-06-24 11:15:36 -04:00
Nikita Popov
6f258c0fd3 [Clang] Don't test register allocation
This test was broken by 719658d078.

How did an assembly test get into clang/test?
2022-06-23 12:46:00 +02:00
Yaxun (Sam) Liu
af9ee3357c [HIP] fix long double size
For amdgpu target long double type is the same as double type.
The width and align of long double type was incorrectly
overridden when copying aux target properties, which
caused assertion in codegen when emitting global
variables with long double type.

This patch fix that by saving and restoring width
and align of long double type.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D127771

Fixes: SWDEV-335515
2022-06-14 21:57:56 -04:00
Nikita Popov
41d5033eb1 [IR] Enable opaque pointers by default
This enabled opaque pointers by default in LLVM. The effect of this
is twofold:

* If IR that contains *neither* explicit ptr nor %T* types is passed
  to tools, we will now use opaque pointer mode, unless
  -opaque-pointers=0 has been explicitly passed.
* Users of LLVM as a library will now default to opaque pointers.
  It is possible to opt-out by calling setOpaquePointers(false) on
  LLVMContext.

A cmake option to toggle this default will not be provided. Frontends
or other tools that want to (temporarily) keep using typed pointers
should disable opaque pointers via LLVMContext.

Differential Revision: https://reviews.llvm.org/D126689
2022-06-02 09:40:56 +02:00
Joseph Huber
1bae02b773 [Cuda] Use fallback method to mangle externalized decls if no CUID given
CUDA requires that static variables be visible to the host when
offloading. However, The standard semantics of a stiatc variable dictate
that it should not be visible outside of the current file. In order to
access it from the host we need to perform "externalization" on the
static variable on the device. This requires generating a semi-unique
name that can be affixed to the variable as to not cause linker errors.

This is currently done using the CUID functionality, an MD5 hash value
set up by the clang driver. This allows us to achieve is mostly unique
ID that is unique even between multiple compilations of the same file.
However, this is not always availible. Instead, this patch uses the
unique ID from the file to generate a unique symbol name. This will
create a unique name that is consistent between the host and device side
compilations without requiring the CUID to be entered by the driver. The
one downside to this is that we are no longer stable under multiple
compilations of the same file. However, this is a very niche use-case
and is not supported by Nvidia's CUDA compiler so it likely to be good
enough.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D125904
2022-05-26 09:18:22 -04:00
Joseph Huber
0035f7154c [CUDA] Create offloading entries when using the new driver
The changes made in D123460 generalized the code generation for OpenMP's
offloading entries. We can use the same scheme to register globals for
CUDA code. This patch adds the code generation to create these
offloading entries when compiling using the new offloading driver mode.
The offloading entries are simple structs that contain the information
necessary to register the global. The struct used is as follows:

```
Type struct __tgt_offload_entry {
  void    *addr;      // Pointer to the offload entry info.
                      // (function or global)
  char    *name;      // Name of the function or global.
  size_t  size;       // Size of the entry info (0 if it a function).
  int32_t flags;
  int32_t reserved;
};
```

Currently CUDA handles RDC code generation by deferring the registration
of globals in the current TU to a callback function containing the
modules ID. Later all the module IDs will be used to register all of the
globals at once. Rather than mimic this, offloading entries allow us to
mimic the way OpenMP registers globals. That is, we create a simple
global struct for each device global to be registered. These are placed
at a special section `cuda_offloading_entires`. Because this section is
a valid C-identifier, the linker will profide a `__start` and `__stop`
pointer that we can use to iterate and register all globals at runtime.

the registration requires a flag variable to indicate which registration
function to use. I have assigned the flags somewhat arbitrarily, but
these use the following values.

Kernel: 0
Variable: 0
Managed: 1
Surface: 2
Texture: 3

Depends on D120272

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D123471
2022-05-11 07:30:21 -04:00
Yaxun (Sam) Liu
afc9d674fe [CUDA][HIP] support __noinline__ as keyword
CUDA/HIP programs use __noinline__ like a keyword e.g.
__noinline__ void foo() {} since __noinline__ is defined
as a macro __attribute__((noinline)) in CUDA/HIP runtime
header files.

However, gcc and clang supports __attribute__((__noinline__))
the same as __attribute__((noinline)). Some C++ libraries
use __attribute__((__noinline__)) in their header files.
When CUDA/HIP programs include such header files,
clang will emit error about invalid attributes.

This patch fixes this issue by supporting __noinline__ as
a keyword, so that CUDA/HIP runtime could remove
the macro definition.

Reviewed by: Aaron Ballman, Artem Belevich

Differential Revision: https://reviews.llvm.org/D124866
2022-05-10 14:32:27 -04:00
Yaxun (Sam) Liu
11d3e31c60 [CUDA][HIP] Fix mangling number for local struct
MSVC and Itanium mangling use different mangling numbers
for function-scope structs, which causes inconsistent
mangled kernel names in device and host compilations.

This patch uses Itanium mangling number for structs
in for mangling device side names in CUDA/HIP host
compilation on Windows to fix this issue.

A state is added to ASTContext to indicate whether the
current name mangling is for device side names in host
compilation. Device and host mangling number
are encoded/decoded as upper and lower half of 32 bit
unsigned integer to fit into the original mangling number
field for AST. Diagnostic will be emitted if a manglining
number exceeds limit.

Reviewed by: Artem Belevich, Reid Kleckner

Differential Revision: https://reviews.llvm.org/D122734

Fixes: SWDEV-328515
2022-04-28 19:54:43 -04:00
Yaxun (Sam) Liu
57a210e5b7 [CUDA][HIP] Fix linkage of __clang_gpu_used_external
Different TU's may have this globl var. appending linkage can
only be used with lld recognized special variables.

Change it to internal linkage.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D124466
2022-04-26 20:43:39 -04:00
David Green
9727c77d58 [NFC] Rename Instrinsic to Intrinsic 2022-04-25 18:13:23 +01:00
Yaxun (Sam) Liu
04fb81674e [CUDA][HIP] Externalize kernels with internal linkage
This patch is a continuation of https://reviews.llvm.org/D123353.

Not only kernels in anonymous namespace, but also template
kernels with template arguments in anonymous namespace
need to be externalized.

To be more generic, this patch checks the linkage of a kernel
assuming the kernel does not have __global__ attribute. If
the linkage is internal then clang will externalize it.

This patch also fixes the postfix for externalized symbol
since nvptx does not allow '.' in symbol name.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D124189

Fixes: https://github.com/llvm/llvm-project/issues/54560
2022-04-22 17:05:36 -04:00
Yaxun (Sam) Liu
cac4e2fe25 [CUDA][HIP] Fix gpu.used.external
Rename gpu.used.external as __clang_gpu_used_external as ptxas does not
allow . in global variable name.

Fixes: https://github.com/llvm/llvm-project/issues/54934

Reviewed by: Joseph Huber, Artem Belevich

Differential Revision: https://reviews.llvm.org/D123946
2022-04-18 23:10:31 -04:00
Yaxun (Sam) Liu
0424b5115c [CUDA][HIP] Fix host used external kernel in archive
For -fgpu-rdc, a host function may call an external kernel
which is defined in an archive of bitcode. Since this external
kernel is only referenced in host function, the device
bitcode does not contain reference to this external
kernel, then the linker will not try to resolve this external
kernel in the archive.

To fix this issue, host-used external kernels and device
variables are tracked. A global array containing pointers
to these external kernels and variables is emitted which
serves as an artificial references to the external kernels
and variables used by host.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D123441
2022-04-13 10:47:16 -04:00
Fangrui Song
63fbc77121 [Driver][test] Remove unused/obsoleted REQUIRES: clang-driver
It (introduced by 556d713c70) appears to be
related to the removed dragonegg project. In addition, the feature was a bit
misnamed and may lur users to unnecessarily use it.
2022-04-12 13:29:46 -07:00
Yaxun (Sam) Liu
4ea1d43509 [CUDA][HIP] Externalize kernels in anonymous name space
kernels in anonymous name space needs to have unique name
to avoid duplicate symbols.

Fixes: https://github.com/llvm/llvm-project/issues/54560

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D123353
2022-04-10 21:56:28 -04:00
Jonas Hahnfeld
e4903d8be3 [CUDA/HIP] Remove argument from module ctor/dtor signatures
In theory, constructors can take arguments when called via .init_array
where at least glibc passes in (argc, argv, envp). This isn't used in
the generated code and if it was, the first argument should be an
integer, not a pointer. For destructors registered via atexit, the
function should never take an argument.

Differential Revision: https://reviews.llvm.org/D123370
2022-04-09 12:34:41 +02:00
Nikita Popov
b16a3b4f3b [Clang] Add -no-opaque-pointers to more tests (NFC)
This adds the flag to more tests that were not caught by the
mass-migration in 532dc62b90.
2022-04-07 12:53:29 +02:00