Commit Graph

640 Commits

Author SHA1 Message Date
Joshua Cranmer
bcad161db3 [Clang][SPIR-V] Emit target extension types for OpenCL types on SPIR-V.
Reviewed By: Anastasia

Differential Revision: https://reviews.llvm.org/D141008
2023-03-13 14:20:24 -04:00
Yaxun (Sam) Liu
37114036aa [AMDGPU] Mark mbcnt as convergent
since it depends on CFG.

Otherwise some passes will try to merge them and cause
incorrect results.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D145072
2023-03-02 11:56:32 -05:00
Nikita Popov
3d84f4268d [Clang] Convert some tests to opaque pointers (NFC) 2023-02-17 09:49:03 +01:00
Diana Picus
819dfc338b [AMDGPU] Autogenerate checks for several tests. NFCI 2023-02-16 10:54:34 +01:00
Matt Arsenault
647925648a clang/OpenCL: Apply default attributes to enqueued blocks
This was missing important environment context, like denormal-fp-math
and target-features. Curiously this seems to be losing nounwind. Note
this only fixes the actual invoke kernel. The invoke function is
already setting the default attribute set for internal
functions. However that is still buggy since it's not applying any use
function attributes (it's also missing uniform-work-group-size).

There seem to be too many different functions for setting attributes
with inconsistent behavior. The Function overload of
addDefaultFunctionAttributes seems to miss the target-cpu and
target-features. The AttrBuilder one seems to miss optnone (but that
seems to be disallowed on blocks anyway). Neither one calls
setTargetAttributes, when it probably should. uniform-work-group-size
is also set through AMDGPU code when it should be emitting generically
as a language property.

I also noticed update_cc_test_checks for attributes seem to not
connect the captured attribute variables to the attributes at the end
(although I think the numbers happen to work out correctly).
2023-01-30 15:03:15 -04:00
Matt Arsenault
d12ee4bf7c clang/OpenCL: Extend tests for enqueued block attributes
Baseline tests showing that enqueued blocks are not getting the
correct attributes applied.
2023-01-30 15:03:15 -04:00
Matt Arsenault
00f6a7f02f clang/OpenCL: Fix not setting convergent on block invoke kernels
Yet another example how convergent not being the default is dangerous
and backwards.
2023-01-30 15:03:14 -04:00
Stanislav Mekhanoshin
df0488369d [AMDGPU] Split dot7 feature
Differential Revision: https://reviews.llvm.org/D142507
2023-01-26 10:34:36 -08:00
Nikita Popov
eaea793d5e [Clang] Convert some tests to opaque pointers (NFC)
These are all tests that end up running SROA.
2023-01-26 11:33:19 +01:00
Stanislav Mekhanoshin
870b92977e [AMDGPU] Split dot8 feature
Differential Revision: https://reviews.llvm.org/D142407
2023-01-24 11:16:07 -08:00
Stanislav Mekhanoshin
4ab2246d48 [AMDGPU] Remove dot1 and dot6 features from clang for gfx11
These are unsupported.

Differential Revision: https://reviews.llvm.org/D142493
2023-01-24 10:52:42 -08:00
Sven van Haastregt
1495210914 [OpenCL] Always add nounwind attribute for OpenCL
Neither OpenCL nor C++ for OpenCL support exceptions, so add the
`nounwind` attribute unconditionally for those languages.

Differential Revision: https://reviews.llvm.org/D142033
2023-01-20 12:01:22 +00:00
Matt Arsenault
7f2f6eec3e clang/OpenCL: Check calling convention in test
update_cc_test_checks misses this, so make sure at least one block
enqueue test manually checks the calling convention for the kernel.
2023-01-12 13:39:23 -05:00
Nikita Popov
02856565ac [Clang] Emit noundef metadata next to range metadata
To preserve the previous semantics after D141386, adjust places
that currently emit !range metadata to also emit !noundef metadata.
This retains range violation as immediate undefined behavior,
rather than just poison.

Differential Revision: https://reviews.llvm.org/D141494
2023-01-12 10:03:05 +01:00
Paul Walker
eae26b6640 [IRBuilder] Use canonical i64 type for insertelement index used by vector splats.
Instcombine prefers this canonical form (see getPreferredVectorIndex),
as does IRBuilder when passing the index as an integer so we may as
well use the prefered form from creation.

NOTE: All test changes are mechanical with nothing else expected
beyond a change of index type from i32 to i64.

Differential Revision: https://reviews.llvm.org/D140983
2023-01-11 14:08:06 +00:00
Matt Arsenault
f9559b1e30 clang: Convert test to generated checks and opaque pointers 2023-01-10 20:35:49 -05:00
Matt Arsenault
81849497b4 clang/AMDGPU: Remove flat-address-space from feature map
This was only used for checking if is_shared/is_private were legal,
which we're not bothering to do anymore.

This is apparently visible to more than the target attribute (which
seems to silently ignore unrecognized features), so this has the
potential to break something (i.e. see the OpenMP test change)
2023-01-05 16:35:04 -05:00
Nikita Popov
aae20a7421 [CodeGenOpenCL] Convert some tests to opaque pointers (NFC) 2023-01-05 10:57:30 +01:00
Matt Arsenault
e630d9b299 AMDGPU/clang: Remove target features from address space test builtins
It turns out we can codegen these on targets without flat addressing,
although the runtime probably didn't put anything useful there. The
proper diagnostic would be to disallow flat pointer uses or languages
with them, not this one edge case. Allows removing one of the special
cases requiring subtarget support in the device libraries.
2022-12-29 18:46:41 -05:00
Matt Arsenault
f4bcd7f598 AMDGPU/clang: Add builtins for llvm.amdgcn.ballot
Use explicit _w32/_w64 suffixes for the wave size to be consistent
with the existing other wave dependent intrinsics. Also start
diagnosing trying to use both wave32 and wave64.

I would have preferred to avoid the +wavefrontsize64 spam on targets
where that's the only option, but avoiding this seems to be more work
than I expected.
2022-12-29 17:58:55 -05:00
Roman Lebedev
96d3c82645 Revert "[SROA] isVectorPromotionViable(): memory intrinsics operate on vectors of bytes (take 3)"
While the PPC litte-endian miscompile did get addressed
by https://reviews.llvm.org/D140046
the PPV big-endian bots are still unhappy.
https://lab.llvm.org/buildbot/#/builders/93/builds/12560

This reverts commit 7bd358bcb4e358b4351c69e02ef76939e08acdc7.
2022-12-16 22:58:41 +03:00
Roman Lebedev
cfd594f8bb [SROA] isVectorPromotionViable(): memory intrinsics operate on vectors of bytes (take 3)
* This is a recommit of 3c4d2a0396,
* which was reverted in 25f01d593c,
  because it exposed a miscompile in PPC backend,  which was resolved
  in https://reviews.llvm.org/D140089 / cb3f415cd2.
* which was a recommit of cf624b23bc,
* which was reverted in 5cfc22cafe,
  because the cut-off on the number of vector elements was not low enough,
  and it triggered both SDAG SDNode operand number assertions,
  5and caused compile time explosions in some cases.

Let's try with something really *REALLY* conservative first,
just to get somewhere, and try to bump it later.

FIXME: should this respect TTI reg width * num vec regs?

Original commit message:

Now, there's a big caveat here - these bytes
are abstract bytes, not the i8 we have in LLVM,
so strictly speaking this is not exactly legal,
see e.g. https://github.com/AliveToolkit/alive2/issues/860
^ the "bytes" "could" have been a pointer,
and loading it as an integer inserts an implicit ptrtoint.

But at the same time,
InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()`
would expand a memtransfer of 1/2/4/8 bytes
into integer-typed load+store,
so this isn't exactly a new problem.

Note that in memory, poison is byte-wise,
so we really can't widen elements,
but SROA seems to be inconsistent here.

Fixes #59116.
2022-12-16 19:27:38 +03:00
Nikita Popov
9466b49171 [Clang] Convert various tests to opaque pointers (NFC)
These were all tests where no manual fixup was required.
2022-12-12 17:11:46 +01:00
Johannes Doerfert
f6e3a89cc0 [AMDGPU] Annotate the intrinsics to be default and nocallback
Differential Revision: https://reviews.llvm.org/D135155
2022-12-07 14:25:25 -08:00
Manuel Brito
481170cb55 [Clang][CodeGen] Use poison instead of undef for extra argument in __builtin_amdgcn_mov_dpp [NFC]
Differential Revision: https://reviews.llvm.org/D138755
2022-12-06 12:40:33 +00:00
Ron Lieberman
ca856fff1c Revert "enable code-object-version=5"
very sorry wrong repo.

This reverts commit d882ba7aea.
2022-11-29 15:21:09 -06:00
Ron Lieberman
d882ba7aea enable code-object-version=5 2022-11-29 15:11:57 -06:00
Sven van Haastregt
b0e4897a1b [OpenCL] Remove arm-integer-dot-product extension pragmas
This extension only adds builtin functions and thus doesn't need to be
included as an extension.  Instead of a pragma, the builtin functions
of the extension can be exposed through enabling preprocessor defines.
2022-11-29 13:26:50 +00:00
David Stuttard
7940888c59 [AMDGPU] Intrinsic to expose s_wait_event for export ready
Differential Revision: https://reviews.llvm.org/D138216
2022-11-28 11:26:15 +00:00
Roman Lebedev
25f01d593c Revert "[SROA] isVectorPromotionViable(): memory intrinsics operate on vectors of bytes (take 2)"
TableGen is still getting miscompiled on PPC buildbots.
Sent a mail with request for help.

This reverts commit 3c4d2a0396.
2022-11-27 00:00:06 +03:00
Roman Lebedev
3c4d2a0396 [SROA] isVectorPromotionViable(): memory intrinsics operate on vectors of bytes (take 2)
This is a recommit of cf624b23bc,
which was reverted in 5cfc22cafe,
because the cut-off on the number of vector elements was not low enough,
and it triggered both SDAG SDNode operand number assertions,
and caused compile time explosions in some cases.

Let's try with something really *REALLY* conservative first,
just to get somewhere, and try to bump it (to 64/128) later.

FIXME: should this respect TTI reg width * num vec regs?

Original commit message:

Now, there's a big caveat here - these bytes
are abstract bytes, not the i8 we have in LLVM,
so strictly speaking this is not exactly legal,
see e.g. https://github.com/AliveToolkit/alive2/issues/860
^ the "bytes" "could" have been a pointer,
and loading it as an integer inserts an implicit ptrtoint.

But at the same time,
InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()`
would expand a memtransfer of 1/2/4/8 bytes
into integer-typed load+store,
so this isn't exactly a new problem.

Note that in memory, poison is byte-wise,
so we really can't widen elements,
but SROA seems to be inconsistent here.

Fixes #59116.
2022-11-26 23:19:15 +03:00
Alex Richardson
54ad4d2dd1 Drop redundant pipe to opt -instnamer in clang tests
This used to be required, but the difference between asserts/!asserts
builds no longer exists for %clang_cc1 (only for %clang), so they pass
just fine without this flag.
2022-11-25 11:34:55 +00:00
Benjamin Kramer
5cfc22cafe Revert "[SROA] isVectorPromotionViable(): memory intrinsics operate on vectors of bytes"
This reverts commit cf624b23bc. It
triggers crashes in clang, see the comments on github on the original
change.
2022-11-23 13:11:16 +01:00
Roman Lebedev
cf624b23bc [SROA] isVectorPromotionViable(): memory intrinsics operate on vectors of bytes
Now, there's a big caveat here - these bytes
are abstract bytes, not the i8 we have in LLVM,
so strictly speaking this is not exactly legal,
see e.g. https://github.com/AliveToolkit/alive2/issues/860
^ the "bytes" "could" have been a pointer,
and loading it as an integer inserts an implicit ptrtoint.

But at the same time,
InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()`
would expand a memtransfer of 1/2/4/8 bytes
into integer-typed load+store,
so this isn't exactly a new problem.

Note that in memory, poison is byte-wise,
so we really can't widen elements,
but SROA seems to be inconsistent here.

Fixes #59116.
2022-11-23 02:38:25 +03:00
Arthur Eubanks
e564f5153f [clang][test] Avoid UB in overload.cl 2022-11-13 14:02:24 -08:00
Zahira Ammarguellat
91628f0616 The handling of 'funsafe-math-optimizations' doesn't update the 'MathErrno'
flag. But the driver checks for 'fno-math-errno' before passing
'funsafe-math-optimizations' to the FE. In GCC, the option
'funsafe-math-optimizations' doesn't affect the 'fmath-errno' flag.
This patch aligns clang with GCC.

'-ffast-math' sets the FPContract to 'fast'. But 'funsafe-math-optimizations'
the driver doesn't consider the FPContract when handling the option.
Unfortunately there are places in the BE that interpret unsafe math
mode as allowing FMA. This patch makes -ffast-math' and
'funsafe-math-optimizations' behave similarly in regard to the setting of the
FPContract.

Differential Revision: https://reviews.llvm.org/D137578
2022-11-11 10:24:12 -05:00
Bjorn Pettersson
5f9a82683d [clang][test] Use opt -passes=<name> instead of opt -name
Updated the RUN line in several test cases to use the new PM syntax
  opt -passes=<pipeline>
instead of the deprecated syntax
  opt -pass1 -pass2

This was not a complete cleanup in clang/test. But just a swipe using
some simple search-and-replace. Mainly for RUN lines involving
-mem2reg, -instnamer and -early-cse.
2022-11-08 12:15:42 +01:00
Nikita Popov
304f1d59ca [IR] Switch everything to use memory attribute
This switches everything to use the memory attribute proposed in
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly
attributes are dropped. The readnone, readonly and writeonly attributes
are restricted to parameters only.

The old attributes are auto-upgraded both in bitcode and IR.
The bitcode upgrade is a policy requirement that has to be retained
indefinitely. The IR upgrade is mainly there so it's not necessary
to update all tests using memory attributes in this patch, which
is already large enough. We could drop that part after migrating
tests, or retain it longer term, to make it easier to import IR
from older LLVM versions.

High-level Function/CallBase APIs like doesNotAccessMemory() or
setDoesNotAccessMemory() are mapped transparently to the memory
attribute. Code that directly manipulates attributes (e.g. via
AttributeList) on the other hand needs to switch to working with
the memory attribute instead.

Differential Revision: https://reviews.llvm.org/D135780
2022-11-04 10:21:38 +01:00
Matt Arsenault
f59f116bd5 AMDGPU: Add __builtin_amdgcn_permlane64 2022-10-13 21:12:11 -07:00
Petar Avramovic
dcc756d03e [AMDGPU] Pattern for flat atomic fadd f64 intrinsic with local addr
Fix regression from clang opencl test in builtins-fp-atomics-gfx90a.cl
test_flat_add_local_f64 caused by D130579
Revert a3becb333d.

Differential Revision: https://reviews.llvm.org/D134568
2022-09-25 13:25:41 +02:00
Petar Avramovic
a3becb333d [clang][AMDGPU] Temporarily disable clang atomic fadd test for gfx90a
Test is broken by D130579. Temporarily disable to silence builbot failures.
2022-09-23 21:49:16 +02:00
Stanislav Mekhanoshin
e540965915 [AMDGPU] Added __builtin_amdgcn_ds_bvh_stack_rtn
Differential Revision: https://reviews.llvm.org/D133966
2022-09-16 02:42:09 -07:00
Fangrui Song
74742147ee [test] Change cc1 -fvisibility to -fvisibility= 2022-09-02 12:36:44 -07:00
Muhammad Omair Javaid
18de7c6a3b Revert "[InstCombine] Treat passing undef to noundef params as UB"
This reverts commit c911befaec.

It has broken LLDB Arm/AArch64 Linux buildbots. I dont really understand
the underlying reason. Reverting for now make buildbot green.

https://reviews.llvm.org/D133036
2022-09-02 16:09:50 +05:00
Arthur Eubanks
c911befaec [InstCombine] Treat passing undef to noundef params as UB
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D133036
2022-09-01 15:16:45 -07:00
Anastasia Stulova
6b1a04529c [OpenCL][SPIR-V] Test extern functions with a pointer arg.
Added a test case that enhances coverage of opaque pointers
particularly for the problematic case with extern functions
for which there is no solution found for type recovery.

Differential Revision: https://reviews.llvm.org/D130768
2022-09-01 10:22:47 +01:00
Yaxun (Sam) Liu
9f6cb3e9fd [AMDGPU] Add builtin s_sendmsg_rtn
Reviewed by: Brian Sumner, Artem Belevich

Differential Revision: https://reviews.llvm.org/D132140

Fixes: SWDEV-352017
2022-08-22 18:29:23 -04:00
Austin Kerbow
b0f4678b90 [AMDGPU] Add iglp_opt builtin and MFMA GEMM Opt strategy
Adds a builtin that serves as an optimization hint to apply specific optimized
DAG mutations during scheduling. This also disables any other mutations or
clustering that may interfere with the desired pipeline. The first optimization
strategy that is added here is designed to improve the performance of small gemm
kernels on gfx90a.

Reviewed By: jrbyrnes

Differential Revision: https://reviews.llvm.org/D132079
2022-08-19 15:38:36 -07:00
Austin Kerbow
f5b21680d1 [AMDGPU] Add amdgcn_sched_group_barrier builtin
This builtin allows the creation of custom scheduling pipelines on a per-region
basis. Like the sched_barrier builtin this is intended to be used either for
testing, in situations where the default scheduler heuristics cannot be
improved, or in critical kernels where users are trying to get performance that
is close to handwritten assembly. Obviously using these builtins will require
extra work from the kernel writer to maintain the desired behavior.

The builtin can be used to create groups of instructions called "scheduling
groups" where ordering between the groups is enforced by the scheduler.
__builtin_amdgcn_sched_group_barrier takes three parameters. The first parameter
is a mask that determines the types of instructions that you would like to
synchronize around and add to a scheduling group. These instructions will be
selected from the bottom up starting from the sched_group_barrier's location
during instruction scheduling. The second parameter is the number of matching
instructions that will be associated with this sched_group_barrier. The third
parameter is an identifier which is used to describe what other
sched_group_barriers should be synchronized with. Note that multiple
sched_group_barriers must be added in order for them to be useful since they
only synchronize with other sched_group_barriers. Only "scheduling groups" with
a matching third parameter will have any enforced ordering between them.

As an example, the code below tries to create a pipeline of 1 VMEM_READ
instruction followed by 1 VALU instruction followed by 5 MFMA instructions...
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 1 VALU
__builtin_amdgcn_sched_group_barrier(2, 1, 0)
// 5 MFMA
__builtin_amdgcn_sched_group_barrier(8, 5, 0)
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 3 VALU
__builtin_amdgcn_sched_group_barrier(2, 3, 0)
// 2 VMEM_WRITE
__builtin_amdgcn_sched_group_barrier(64, 2, 0)

Reviewed By: jrbyrnes

Differential Revision: https://reviews.llvm.org/D128158
2022-07-28 10:43:14 -07:00
Aaron Ballman
7068aa9841 Strengthen -Wint-conversion to default to an error
Clang has traditionally allowed C programs to implicitly convert
integers to pointers and pointers to integers, despite it not being
valid to do so except under special circumstances (like converting the
integer 0, which is the null pointer constant, to a pointer). In C89,
this would result in undefined behavior per 3.3.4, and in C99 this rule
was strengthened to be a constraint violation instead. Constraint
violations are most often handled as an error.

This patch changes the warning to default to an error in all C modes
(it is already an error in C++). This gives us better security posture
by calling out potential programmer mistakes in code but still allows
users who need this behavior to use -Wno-error=int-conversion to retain
the warning behavior, or -Wno-int-conversion to silence the diagnostic
entirely.

Differential Revision: https://reviews.llvm.org/D129881
2022-07-22 15:24:54 -04:00