Commit Graph

1354 Commits

Author SHA1 Message Date
Martin Erhart
07c079a97a [mlir][bufferization] Add lowering of bufferization.dealloc to memref.dealloc
Adds a generic lowering that suppors all cases of bufferization.dealloc
and one specialized, more efficient lowering for the simple case. Using
a helper function with for loops in the general case enables
O(|num_dealloc_memrefs|+|num_retain_memrefs|) size of the lowered code.

Depends on D155467

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D155468
2023-07-19 14:28:01 +00:00
Mikhail Goncharov
571c1292b6 [bazel] remove PythonAttr TD after 67a910bbff
Differential Revision: https://reviews.llvm.org/D155686
2023-07-19 11:46:47 +02:00
Oleg Shyshkov
c425cfa98e [mlir][bazel] Fix build. 2023-07-19 11:11:32 +02:00
Guillaume Chatelet
1f5783474f [libc][NFC] Rename files
This patch mostly renames files so it better reflects the function they declare.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D155607
2023-07-19 09:06:29 +00:00
Mikhail Goncharov
5bba176366 [bazel] fix layering check to run bazel with 16.x
Preparation to update bazel builder to use LLVM 16 release
where layering check was enabled https://reviews.llvm.org/D132779

Current setup missed some backsliding in layering check as it has
only on for projects with the check enforced.

Disabled it completely for libc and fixed for DWARFLinkerParallel.
It would be great to re-enable it for libc later.
2023-07-18 22:05:14 +02:00
Adrian Kuegel
7da6e0af1d [mlir][Bazel] Add missing dependency. 2023-07-18 16:10:48 +02:00
Adrian Kuegel
107a7d1f05 [mlir][Bazel] Add VectorToArmSME target. 2023-07-18 16:02:15 +02:00
Adrian Kuegel
7bde09cc24 [mlir][Bazel] Add missing dependency. 2023-07-18 15:51:49 +02:00
Matthias Springer
db393288ff [mlir][NVGPU][transform] Add create_async_groups transform op
This transform looks for suitable vector transfers from global memory to shared memory and converts them to async device copies.

Differential Revision: https://reviews.llvm.org/D155569
2023-07-18 14:36:41 +02:00
Guillaume Chatelet
23dcdbfba7 [libc][NFC] Split memmove implementations per platform
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D155515
2023-07-18 12:20:23 +00:00
Mikhail Goncharov
d18ce73170 [bazel] fix build of ArithUtils 2023-07-18 09:20:13 +02:00
Nicolas Vasilache
582e1d58bd [mlir][test] Fix linking error post test-lower-to-nvvm
This fixes builds for 7e78ecfe10 (both cmake and bazel) as well as trim unnecessary dependencies.

This is achieved by moving the functionality to test/lib/GPU which is a more natural landing pad.
2023-07-17 18:43:32 +02:00
Adam Paszke
fbfff1caff [MLIR][CAPI] Add C API dialect registration methods for Arith, Math, MemRef and Vector dialects
Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D155450
2023-07-17 14:45:49 +00:00
Alex Zinenko
371366ce27 [mlir][nvgpu] add simple pipelining for shared memory copies
Add a simple transform operation to the NVGPU extension that performs
software pipelining of copies to shared memory. The functionality is
extremely minimalistic in this version and only supports copies from
global to shared memory inside an `scf.for` loop with either
`vector.transfer` or `nvgpu.device_async_copy` operations when
pipelining preconditions are already satisfied in the IR. This is the
minimally useful version that uses the more general loop pipeliner in an
NVGPU-specific way. Further extensions and orthogonalizations will be
necessary.

This required a change to the loop pipeliner itself to properly
propagate errors should the predicate generator fail.

This is loosely inspired from the vesion in IREE, but has less unsafe
assumptions and more principled way of communicating decisions.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D155223
2023-07-17 14:29:12 +00:00
Matthias Springer
a4f4d82c35 [mlir][NVGPU][NFC] Clean up code structure
* Move passes to `Transforms` directory.
* Add `Utils.h` (will be utilized in a subsequent change).

Differential Revision: https://reviews.llvm.org/D155427
2023-07-17 14:15:42 +02:00
Guillaume Chatelet
b38dda74fa [libc][NFC] Split memcmp implementations per platform
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D155181
2023-07-17 11:35:31 +00:00
Guillaume Chatelet
83f3920854 [libc][NFC] Split memset implementations per platform
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D155174
2023-07-17 11:12:19 +00:00
Oleg Shyshkov
4592543a01 [mlir][bazel] Fix build. 2023-07-17 11:00:20 +02:00
Matthias Springer
88f4292a16 [mlir][bufferization] OneShotBufferizeOp: Add options to use linalg.copy
This new option allows users to specify a custom memcpy op.

Differential Revision: https://reviews.llvm.org/D155280
2023-07-14 13:34:22 +02:00
Hanhan Wang
8fc433f055 [mlir][MemRef] Move narrow type emulation common methods to MemRefUtils.
It also unifies the computation of StridedLayoutAttr. If the stride is
static known value, we can just use it.

Differential Revision: https://reviews.llvm.org/D155017
2023-07-13 14:43:21 -07:00
Guillaume Chatelet
8cc440b3e7 [libc][NFC] Split memcpy implementations per platform
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D155099
2023-07-13 10:30:38 +00:00
Guillaume Chatelet
1c4e4e03bd [libc][NFC] Split bcmp implementations per platform
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D155076
2023-07-13 10:19:00 +00:00
Sterling Augustine
39d6fe790c Add bazel support for new DebugBTF component. 2023-07-12 14:57:15 -07:00
Andrés Villegas
4f92557bfc [NFC][llvm-dwp] Switch from llvm::cl to OptTable
Switch the parse of command line options from llvm::cl to OptTable.

The motivation for this change is to continue adding llvm based tools
to the llvm driver multicall. For more information about the proposal
and motivation, please see https://discourse.llvm.org/t/rfc-llvm-busybox-proposal/58494

Reviewed By: abrachet

Differential Revision: https://reviews.llvm.org/D154642
2023-07-12 19:12:48 +00:00
Adrian Kuegel
a69b2e3d1c [clang][Bazel] Add dependency to the right target. 2023-07-12 10:19:06 +02:00
Adrian Kuegel
93e7ef5907 [clang][Bazel] Add missing dependency. 2023-07-12 10:14:14 +02:00
Sterling Augustine
8df8f01065 Fix bazel build for 5a1cdcbd86 2023-07-11 14:27:40 -07:00
Arthur Eubanks
4cca3de87e [bazel][docs] Update build documentation
Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D155004
2023-07-11 13:36:27 -07:00
Alex Zinenko
8a918c54bb [mlir] add backward dense dataflow analysis
This is the counterpart to the forward dense dataflow analysis and
integrates into the dataflow framework. The implementation follows the
structure of existing dataflow analyses.

Reviewed By: Mogball, phisiart

Differential Revision: https://reviews.llvm.org/D154713
2023-07-11 16:47:53 +00:00
Aliia Khasanova
be29fe2f98 Fix bazel build file for D154060.
Differential Revision: https://reviews.llvm.org/D154976
2023-07-11 17:33:58 +02:00
NAKAMURA Takumi
82371e68e4 [Bazel] Fixup for D153758, D153850, and D153861 (global-isel-combiner-matchtable) 2023-07-11 22:53:38 +09:00
Fangrui Song
7f7f4a6b17 [bazel] Adjust llvm:DebugInfo after D149501 (BTF.h) 2023-07-10 15:05:51 -07:00
Guillaume Chatelet
bfd94882f2 [libc][NFC] Move aligned access implementations to separate header
Follow up on https://reviews.llvm.org/D154770

Differential Revision: https://reviews.llvm.org/D154800
2023-07-09 22:17:05 +00:00
Guillaume Chatelet
dbaa5838c1 [libc][NFC] Move memfunction's byte per byte implementations to a separate header
There will be subsequent patches to move things around and make the file layout more principled.

Differential Revision: https://reviews.llvm.org/D154770
2023-07-09 07:21:58 +00:00
Alex Zinenko
9ab34689b0 [mlir] add a simple gpu barrier elimination mechanism
GPU code generation, and specifically the shared memory copy insertion
may introduce spurious barriers guarding read-after-read dependencies or
read-after-write on non-aliasing data, which degrades performance due to
unnecessary synchronization. Add a pattern and transform op that removes
such barriers by analyzing memory effects that the barrier actually
guards that are not also guarded by other barriers. The code is adapted
from the Polygeist incubator project.

Co-authored-by: William Moses <gh@wsmoses.com>
Co-authored-by: Ivan Radanov Ivanov <ivanov.i.aa@m.titech.ac.jp>

Reviewed By: nicolasvasilache, wsmoses

Differential Revision: https://reviews.llvm.org/D154720
2023-07-07 18:51:49 +00:00
Guillaume Chatelet
cb1468d3cb [libc] Adding a version of memcpy w/ software prefetching
For machines with a lot of cores, hardware prefetchers can saturate the memory bus when utilization is high.
In this case it is desirable to turn off the hardware prefetcher completely.
This has a big impact on the performance of memory functions such as `memcpy` that rely on the fact that the next cache line will be readily available.

This patch adds the 'LIBC_COPT_MEMCPY_X86_USE_SOFTWARE_PREFETCHING' compile time option that generates a version of memcpy with software prefetching. While not fully restoring the original performances it mitigates the impact to an acceptable level.

Reviewed By: rtenneti

Differential Revision: https://reviews.llvm.org/D154494
2023-07-07 10:37:32 +00:00
Haojian Wu
99074aafc3 [bazel] Port for 88e95c1e4b 2023-07-07 09:02:05 +02:00
Michael Jones
cfbcbc8f88 [libc] fix MPFR rounding problems in fuzz test
The accuracy for the MPFR numbers in the strtofloat fuzz test was set
too high, causing rounding issues when rounding to a smaller final
result.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D154150
2023-07-05 10:53:40 -07:00
Alexander Belyaev
594643177f Fix bazel build after https://reviews.llvm.org/D150578. 2023-07-05 14:33:49 +02:00
Matthias Springer
cb7bda2ace [mlir][NFC] Use getConstantIntValue instead of casting to ConstantIndexOp
`getConstantIntValue` extracts constant values from all constant-like ops, not just `arith::ConstantIndexOp`.

Differential Revision: https://reviews.llvm.org/D154356
2023-07-04 14:08:37 +02:00
Benjamin Kramer
9846b9e2ca [bazel] Add missing dependency for d9d9be63a5 2023-07-04 13:34:03 +02:00
Matthias Springer
8b8e62d3f6 [mlir][SCF] Add loop.promote_if_one_iteration transform op
This transform op promotes loops with one iteration. I.e., the loop op is replaced by just the loop body.

Differential Revision: https://reviews.llvm.org/D154361
2023-07-04 08:58:49 +02:00
Matthias Springer
fa1a23a720 [mlir][transform] Add transform.apply_licm op
This op applies loop-invariant code motion to the targeted loop-like op.

Differential Revision: https://reviews.llvm.org/D154327
2023-07-03 15:28:53 +02:00
Adrian Kuegel
630b8d36c0 [mlir][Bazel] Add missing dependencies after 564713c471 2023-07-03 13:16:28 +02:00
Matthias Springer
180f9ef8b7 [mlir][linalg] LinalgOp-anchored empty tensor elimination
This revision adds a pre-bufferization transform that can reduce the number of allocation. It is similar to `bufferization.eliminate_empty_tensors`, but specific to LinalgOp.

The transform looks for `tensor.empty` ops where the SSA use-def chain ends in an "ins" operand of a `LinalgOp`. If the same `LinalgOp` has an unused "outs" operand (and some other conditions are met), this "outs" operand can be used instead of the `tensor.empty` and the "ins" operand can be turned into an "outs" operand.

Differential Revision: https://reviews.llvm.org/D153952
2023-07-03 09:17:48 +02:00
Haojian Wu
b28296c500 [bazel] Port bazel support for 5bf8efd269 2023-07-01 08:27:26 +02:00
Guillaume Chatelet
1c814c99aa [libc] Improve memcmp latency and codegen
This is based on ideas from @nafi to:
 - use a branchless version of 'cmp' for 'uint32_t',
 - completely resolve the lexicographic comparison through vector
   operations when wide types are available. We also get rid of byte
   reloads and serializing '__builtin_ctzll'.

I did not include the suggestion to replace comparisons of 'uint16_t'
with two 'uint8_t' as it did not seem to help the codegen. This can
be revisited in sub-sequent patches.

The code been rewritten to reduce nested function calls, making the
job of the inliner easier and preventing harmful code duplication.

Reviewed By: nafi3000

Differential Revision: https://reviews.llvm.org/D148717
2023-06-30 13:00:58 +00:00
Aart Bik
6b88c852b6 [mlir][sparse] Start migration to new surface syntax for STEA
We are in the progress of migrating to a much improved surface syntax for the Sparse Tensor Encoding Attribute (STEA).

You can see a preview of this in the StableHLO RFC at

 https://github.com/openxla/stablehlo/blob/main/rfcs/20230210-sparsity.md

//**This design is courtesy Wren Romano.**//

This initial revision
(1) Introduces the first version of a new parser written by Wren Romano
(2) Introduces a simple "migration plan" using NEW_SYNTAX on the STEA, which will allow us to test the new parser with new examples, as well as migrate existing examples over without the need to rewrite them all

This first "drop" merely provides the entry points to parse the new syntax. The parser is still under active development. For example, we need to address the "lookahead" issue when parsing the lvl spec (viz. do we see l0 = d0 or a direct d0). Another larger task is to actually implement "affine" parsing (since the MLIR affine parser is not accessible in other parts of the tree).

EXAMPLE:

Currently, CSR looks like

  #CSR = #sparse_tensor.encoding<{
    lvlTypes = ["dense","compressed"],
    dimToLvl = affine_map<(i,j) -> (i,j)>
  }>

but you can "force" the new parser with

  #CSR = #sparse_tensor.encoding<{
    NEW_SYNTAX =
    (d0, d1) -> (l0 = d0 : dense, l1 = d1 : compressed)
  }>

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D153997
2023-06-29 11:32:07 -07:00
Tue Ly
f320fefc4a [libc][math] Implement erff function correctly rounded to all rounding modes.
Implement correctly rounded `erff` functions.

For `x >= 4`, `erff(x) = 1` for `FE_TONEAREST` or `FE_UPWARD`, `0x1.ffffep-1` for `FE_DOWNWARD` or `FE_TOWARDZERO`.

For `0 <= x < 4`, we divide into 32 sub-intervals of length `1/8`, and use a degree-15 odd polynomial to approximate `erff(x)` in each sub-interval:
```
  erff(x) ~ x * (c0 + c1 * x^2 + c2 * x^4 + ... + c7 * x^14).
```

For `x < 0`, we can use the same formula as above, since the odd part is factored out.

Performance tested with `perf.sh` tool from the CORE-MATH project on AMD Ryzen 9 5900X:

Reciprocal throughput (clock cycles / op)
```
$ ./perf.sh erff --path2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput --  with -march=native      (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 11.790 + 0.182 clc/call; Median-Min = 0.154 clc/call; Max = 12.255 clc/call;
-- CORE-MATH reciprocal throughput --  with -march=x86-64-v2      (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 14.205 + 0.151 clc/call; Median-Min = 0.159 clc/call; Max = 15.893 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 45.519 + 0.445 clc/call; Median-Min = 0.552 clc/call; Max = 46.345 clc/call;

-- LIBC reciprocal throughput --  with -mavx2 -mfma     (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 9.595 + 0.214 clc/call; Median-Min = 0.220 clc/call; Max = 9.887 clc/call;
-- LIBC reciprocal throughput --  with -msse4.2     (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 10.223 + 0.190 clc/call; Median-Min = 0.222 clc/call; Max = 10.474 clc/call;
```

and latency (clock cycles / op):
```
$ ./perf.sh erff --path2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency --  with -march=native      (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 38.566 + 0.391 clc/call; Median-Min = 0.503 clc/call; Max = 39.170 clc/call;
-- CORE-MATH latency --  with -march=x86-64-v2      (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 43.223 + 0.667 clc/call; Median-Min = 0.680 clc/call; Max = 43.913 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 111.613 + 1.267 clc/call; Median-Min = 1.696 clc/call; Max = 113.444 clc/call;

-- LIBC latency --  with -mavx2 -mfma     (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 40.138 + 0.410 clc/call; Median-Min = 0.536 clc/call; Max = 40.729 clc/call;
-- LIBC latency --  with -msse4.2     (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 44.858 + 0.872 clc/call; Median-Min = 0.814 clc/call; Max = 46.019 clc/call;
```

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D153683
2023-06-28 13:58:37 -04:00
Nicolas Vasilache
13f4e889c5 Revert "Revert "[mlir][Transform] Add support for mma.sync m16n8k16 f16 rewrite." and "[mlir][Transform] Introduce nvgpu transform extensions""
This reverts commit 6506692fe6.

Differential Revision: https://reviews.llvm.org/D153845
2023-06-28 06:50:05 +00:00