SPIR-V doesn't support variadic functions, though we make an exception
for `printf`.
Without the error, we generate invalid SPIR-V: the backend has no way to
codegen vararg functions because the spec does not describe them.
We get asm like this:
```
%27 = OpFunction %6 None %7
%28 = OpFunctionParameter %4
; -- End function
```
The above asm is invalid: there is no `OpFunctionEnd`, and it
causes crashes in downstream tools like `spirv-as` and `spirv-link`.
We already have many `printf` tests confirming that this doesn't break
`printf`, which has already been handled elsewhere by the time the error
check runs.
Note the SPIR-V Translator does the same thing, see
[here](https://github.com/KhronosGroup/SPIRV-LLVM-Translator/pull/2703).
---------
Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
These are C tests, not C++, so no function parameters means unspecified
number of parameters, not `void`.
These compile fine on the currently tested offload targets because an
error is only
[thrown](https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaDecl.cpp#L10695)
if the calling convention doesn't support variadic arguments, and theirs
happen to support them.
When compiling these tests for other targets that do not support variadic
arguments, we get an error, which does not seem intentional.
Just add `void` to the parameter list.
---------
Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
This prevents the scheduler from thinking copy instructions are free. In
#167008, we saw cases where the scheduler moved ABI copies past other
instructions, creating high register pressure that caused the register
allocator to run out of registers. The copies can't be spilled because it
was the physical register's lifetime that was increased, not the virtual
register's.
Ideally, we would detect what register class the COPY is for, but for now
I've just treated it as a scalar integer copy.
VPVector(End)PointerRecipes are single-scalar if all their operands are.
This should be effectively NFC currently, but it should re-enable cost
checking for some more VPWidenMemoryRecipe after
https://github.com/llvm/llvm-project/pull/157387 as discovered by
John Brawn.
Refines the existing conversion to allow `fir.do_loop` annotated with
`unordered` to be lowered to `scf.parallel`, while other loops retain
their original lowering.
These were in TargetLibraryInfo, but missing from RuntimeLibcalls.
This only adds the cases whose non-chk variants already exist. It
copies the enabled-by-default logic from TargetLibraryInfo, which is
probably overly permissive: only isPS opts out.
This patch is a follow-up of #162306 for the reduction clause.
Inside the compute region that carries the reduction clause, a new
hlfir.declare is generated for each symbol appearing in the reduction
clause.
The input of this hlfir.declare is the acc.reduction result. The related
semantics::Symbol is remapped to the hlfir.declare result so that any
reference to the symbol inside the compute region will use this SSA
value as the starting point instead of the SSA value for the host
address.
`[[nodiscard]]` should be applied to functions where discarding the
return value is most likely a correctness issue.
- https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant
The following functions/classes have been annotated in this patch:
- [x] `bind_back`, `bind_front`, `bind`
- [x] `function`, `mem_fn`
- [x] `reference_wrapper`
The command line reality is this:
$ clang -c prog.c -fveclib=accelerate
error: invalid value 'accelerate' in '-fveclib=accelerate'
$ clang -c prog.c -fveclib=Accelerate
prog.c:1:2: warning: This is only a test [-W#warnings]
1 | #warning This is only a test
| ^
1 warning generated.
$ clang -c prog.c -fveclib=libmvec
prog.c:1:2: warning: This is only a test [-W#warnings]
1 | #warning This is only a test
| ^
1 warning generated.
$ clang -c prog.c -fveclib=LIBMVEC
error: invalid value 'LIBMVEC' in '-fveclib=LIBMVEC'
$ clang -c prog.c -fveclib=massv
error: invalid value 'massv' in '-fveclib=massv'
$ clang -c prog.c -fveclib=MASSV
prog.c:1:2: warning: This is only a test [-W#warnings]
1 | #warning This is only a test
| ^
1 warning generated.
$ clang -c prog.c -fveclib=sleef
error: invalid value 'sleef' in '-fveclib=sleef'
$ clang -c prog.c -fveclib=sleefgnuabi
error: invalid value 'sleefgnuabi' in '-fveclib=sleefgnuabi'
$ clang -c prog.c -fveclib=SLEEF
prog.c:1:2: warning: This is only a test [-W#warnings]
1 | #warning This is only a test
| ^
1 warning generated.
$ clang -c prog.c -fveclib=darwin_libsystem_m
error: invalid value 'darwin' in '-fveclib=darwin_libsystem_m'
$ clang -c prog.c -fveclib=Darwin_libsystem_m
prog.c:1:2: warning: This is only a test [-W#warnings]
1 | #warning This is only a test
| ^
1 warning generated.
$ clang -c prog.c -fveclib=armpl
error: invalid value 'armpl' in '-fveclib=armpl'
$ clang -c prog.c -fveclib=ARMPL
error: invalid value 'ARMPL' in '-fveclib=ARMPL'
$ clang -c prog.c -fveclib=ArmPL
prog.c:1:2: warning: This is only a test [-W#warnings]
1 | #warning This is only a test
| ^
1 warning generated.
$ clang -c prog.c -fveclib=amdlibm
error: invalid value 'amdlibm' in '-fveclib=amdlibm'
$ clang -c prog.c -fveclib=AMDLIBM
clang: error: unsupported option 'AMDLIBM' for target 'aarch64'
This PR adds hardware-measured latencies for all instructions defined in
Section 13 of the RVV specification: "Vector Floating-Point
Instructions" to the SpacemiT-X60 scheduling model.
This patch does the lowering for a 'declare' construct that is not at
function-local scope. It also does the lowering for 'create', which has
an entry-op of create and exit-op of delete.
Global/NS/Struct scope 'declare's emit a single 'acc_ctor' and
'acc_dtor' (except in the case of 'link') per variable referenced. The
ctor is the entry op followed by a declare_enter. The dtor is a
get_device_ptr, followed by a declare_exit, followed by a delete (the
exit op). This DOES include any necessary bounds.
This patch implements all of the above. We use a separate 'visitor' for
the clauses here since it is particularly different from the other uses,
AND there are only 4 valid clauses. Additionally, we had to split the
modifier conversion into its own 'helpers' file, which will hopefully
get some additional use in the future.
Values were parsed into an unsigned APInt with just enough bit width
to hold the number, then interpreted as signed values. This
resulted in hex, octal and binary literals being interpreted as
negative when the most significant bit is 1.
For example, `-0b11` would have a bit width of 2, would be
interpreted as -1, then negated to become 1.
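The arithmetic in that example can be reproduced with a small Python model. This is only a sketch: the helper names are hypothetical, and widening by one bit is just one assumed way to keep the sign bit clear, not necessarily what the patch does.

```python
def parse_minimal_width(digits, base):
    """Return (value, bit_width), using just enough bits to hold the value."""
    value = int(digits, base)
    return value, max(value.bit_length(), 1)


def as_signed(value, width):
    """Reinterpret an unsigned `width`-bit value as two's complement."""
    sign_bit = 1 << (width - 1)
    return value - (1 << width) if value & sign_bit else value


def buggy_negative_literal(digits, base):
    # Models the old behaviour: the minimal-width value is read as signed,
    # so a set top bit flips the sign before the negation is applied.
    value, width = parse_minimal_width(digits, base)
    return -as_signed(value, width)


def widened_negative_literal(digits, base):
    # Assumed remedy for illustration only: one extra bit keeps the sign
    # bit clear, so the magnitude survives the signed interpretation.
    value, width = parse_minimal_width(digits, base)
    return -as_signed(value, width + 1)
```

With these helpers, `buggy_negative_literal("11", 2)` yields 1 as described above, while `widened_negative_literal("11", 2)` yields -3.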
This patch updates the mbarrier.arrive.* family of Ops to include
all features added up to Blackwell.
* Update the `mbarrier.arrive` Op to include shared_cluster
memory space, cta/cluster scope and an option to lower using
relaxed semantics.
* An `arrive_drop` variant is added for both the `arrive` and
`arrive.nocomplete` operations.
* Updates for expect_tx and complete_tx operations.
* Verifier checks are added wherever appropriate.
* lit tests are added to verify the lowering to the intrinsics.
TODO:
* Updates for the remaining mbarrier family will be done in
subsequent PRs. (mainly, arrive.expect-tx, test_wait and try_waits)
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
Previously, only literal upper-bounded loops were recognized. This patch
relaxes this matching to accept any compile-time deducible constant
expression.
It would be better to rely on the SVals (values from the symbolic
domain), as those could potentially have more accurate answers, but this
one is much simpler.
Note that at the time we calculate this value, we have not yet evaluated
the sub-exprs of the condition, so we can't just query the
Environment for the folded SVal.
Because of this, the next best tool in our toolbox is evaluating the
Expr at compile time.
rdar://165363923
The 16-bit immediate operand of s_waitcnt_depctr / s_wait_alu has some
unused bits. Previously codegen would set these bits to 1, but setting
them to 0 matches the SP3 assembler behaviour better, which in turn
means that we can print them using the human readable SP3 syntax:
s_wait_alu 0xfffd ; unused bits set to 1
s_wait_alu 0xff9d ; unused bits set to 0
s_wait_alu depctr_va_vcc(0) ; unused bits set to 0, human readable
Note that the set of unused bits changed between GFX10.1 and GFX10.3.
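The relationship between the first two encodings above can be checked with a few lines of Python. This is a sketch only: the mask is derived purely from those two example immediates, and since the set of unused bits differs between targets it should not be read as the mask for any particular GFX generation. The names are hypothetical.

```python
# Bits that differ between the two example encodings (0xfffd vs 0xff9d).
# Assumption: these are exactly the unused bits for the target in the example.
ASSUMED_UNUSED_BITS = 0xFFFD ^ 0xFF9D


def clear_unused_bits(imm16, unused_mask=ASSUMED_UNUSED_BITS):
    """Return the 16-bit immediate with the unused bits forced to 0."""
    return imm16 & 0xFFFF & ~unused_mask
```

Applying `clear_unused_bits` to `0xfffd` gives `0xff9d`, and applying it again leaves the value unchanged.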
If we are given the same index in the comparator callback, simply return
false. Otherwise we will end up adding invalid items to
occludedChildren, causing extra items to get removed that should not be,
resulting in failures that manifest in different forms (assertions, asan
failures, ubsan failures, etc.).
This patch introduces a SpecificFP matcher for SelectionDAG nodes.
This includes:
- Adding SpecificFP_match() in SDPatternMatch.h.
- Adding test coverage in SelectionDAGPatternMatchTest.cpp.
Closes #165566
After this commit, DAGCombiner will have more opportunities to perform
vector folding. This patch includes several foldings, as follows:
- VANDN(x,NOT(y)) -> AND(NOT(x),NOT(y)) -> NOT(OR(x,y))
- VANDN(x, SplatVector(Imm)) -> AND(NOT(x), NOT(SplatVector(~Imm)))
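Both folds rest on plain bitwise identities, which can be checked exhaustively on a single lane. The sketch below models one 8-bit lane in Python and assumes VANDN(x, y) means AND(NOT(x), y); the function names are hypothetical and are not the DAGCombiner API.

```python
from itertools import product

LANE_MASK = 0xFF  # model a single 8-bit vector lane


def bit_not(a):
    """Bitwise NOT restricted to the lane width."""
    return ~a & LANE_MASK


def vandn(x, y):
    """Assumed semantics of VANDN: AND(NOT(x), y)."""
    return bit_not(x) & y


def check_folds():
    """Exhaustively verify both identities for every pair of lane values."""
    for x, y in product(range(256), repeat=2):
        # VANDN(x, NOT(y)) == AND(NOT(x), NOT(y)) == NOT(OR(x, y))
        assert vandn(x, bit_not(y)) == bit_not(x) & bit_not(y) == bit_not(x | y)
        # VANDN(x, imm) == AND(NOT(x), NOT(~imm)), checked per lane of the splat
        assert vandn(x, y) == bit_not(x) & bit_not(bit_not(y))
    return True
```

The first identity is De Morgan's law; the second only rewrites the immediate as the NOT of its complement so that both operands carry a NOT.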
This patch proposes to move the AVX512 CTLZ/CTTZ i256/i512 codegen to
ReplaceNodeResults to allow them to be declared as custom lowering -
this allows expansion of larger int types (e.g. i1024) to fall back to
them during their expansion.
However, to declare these i256/i512 ops as custom, we need to add
MVT::i256/i512 simple types - I'm intending to add further large integer
handling in the future, some of which will use vector register
instructions, and it's going to be much easier if this can be handled
with i128/i256/i512 types that match the vector register sizes.
This exposed a regression in NVPTX due to their use of EVT::isSimple()
to match their upper integer size bounds.
Move building the .mod files from openmp/flang to openmp/flang-rt using
a shared mechanism. Motivations to do so are:
1. Most modules are target-dependent and need to be re-compiled for each
target separately, which is something the LLVM_ENABLE_RUNTIMES system
already does. Prime example is `iso_c_binding.mod` which encodes the
target's ABI. Most other modules have `#ifdef`-enclosed code as well.
2. CMake has support for Fortran that we should use. Among other things,
it automatically determines module dependencies so there is no need to
hardcode them in the CMakeLists.txt.
3. It allows using Fortran itself to implement Flang-RT. Currently, only
`iso_fortran_env_impl.f90` emits object files that are needed by Fortran
applications (#89403). The workaround of #95388 could be reverted.
Some new dependencies come into play:
* openmp depends on flang-rt for building `lib_omp.mod` and
`lib_omp_kinds.mod`. Currently, if flang-rt is not found then the
modules are not built.
* check-flang depends on flang-rt: If not found, the majority of tests
are disabled. If not building in a bootstrapping build, the location of
the module files can be pointed to using
`-DFLANG_INTRINSIC_MODULES_DIR=<path>`, e.g. in a flang-standalone
build. Alternatively, the tests needing any of the intrinsic modules
could be marked with `REQUIRES: flangrt-modules`.
* check-flang depends on openmp: Not a change; tests requiring
`lib_omp.mod` and `lib_omp_kinds.mod` are already marked with
`openmp_runtime`.
As intrinsic modules are now specific to the target, their location is
moved from `include/flang` to `<resource-dir>/finclude/flang/<triple>`. The
mechanism to compute the location has been moved from flang-rt
(previously used to compute the location of `libflang_rt.*.a`) to common
locations in `cmake/GetToolchainDirs.cmake` and
`runtimes/CMakeLists.txt` so it can be used by both openmp and
flang-rt. Potentially the mechanism could also be shared by other
libraries such as compiler-rt.
`finclude` was chosen because `gfortran` uses it as well and avoids
misuse such as `#include <flang/iso_c_binding.mod>`. The search location
is now determined by `ToolChain` in the driver, instead of by the
frontend. The driver now adds `-fintrinsic-module-path` for that
location to the frontend call (just like gfortran does).
`-fintrinsic-module-path` had to be fixed for this because ironically it
was only added to `searchDirectories`, but not
`intrinsicModuleDirectories_`. Since the driver determines the location,
tests invoking `flang -fc1` and `bbc` must also be passed the location
by llvm-lit. This works like llvm-lit does for finding the include dirs
for Clang using `-print-file-name=...`.
The xcnt wait is actually required before any memory access that can
only be done once, so atomic stores and volatile accesses are affected.
This patch also ensures buffer instructions are handled.
In the MachineSMEABIPass, if we have a function with ZT0 state, then
there are some additional cases where we need to zero ZA and ZT0.
If the function has a private ZA interface, i.e., new ZT0 (and new ZA if
present), then ZT0/ZA must be zeroed when committing the incoming ZA
save.
If the function has a shared ZA interface, e.g. new ZA and shared ZT0,
then ZA must be zeroed on function entry (without a ZA save commit).
The logic in the ABI pass has been reworked to use an "ENTRY" state to
handle this (rather than the more specific "CALLER_DORMANT" state).
- create a BTI j|c landing pad MCInst.
- create a getBTIHintNum utility in AArch64/Utils, to make sure BOLT
generates BTI immediates the same way as LLVM.
- add MCPlusBuilder unittests to cover the new function.
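For reference, the BTI variants are encoded in the HINT space as `BTI` = 32, `BTI c` = 34, `BTI j` = 36 and `BTI jc` = 38. A minimal Python sketch of that immediate computation follows; the function name and parameters are hypothetical and do not reflect the actual signature of `getBTIHintNum`.

```python
BTI_BASE_HINT = 32  # HINT #32 is plain BTI with no target kinds


def bti_hint_num(could_call, could_jump):
    """Compute the HINT immediate for a BTI landing pad (illustrative only)."""
    hint = BTI_BASE_HINT
    if could_call:
        hint |= 2  # adds the 'c' (call) target kind
    if could_jump:
        hint |= 4  # adds the 'j' (jump) target kind
    return hint
```

A `BTI jc` landing pad, which accepts both indirect calls and indirect jumps, therefore corresponds to `bti_hint_num(True, True)`.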
The expansion of move immediate in `expandMOVImm` follows the priority
of the `MOV` alias. In addition, the selection there properly prefers
expansion based on perf optimality order. This change adds a simple
assert that `expandMOVImmSimple` expands a single optimal MOVZ/MOVK.
This patch moves the print functions from `NVVMIntrinsicUtils.h` to
`NVVMIntrinsicUtils.cpp`, a file created in the `llvm/lib/IR` directory.
Signed-off-by: Dharuni R Acharya <dharunira@nvidia.com>