Commit Graph

1311 Commits

Author SHA1 Message Date
Adrian Kuegel
630b8d36c0 [mlir][Bazel] Add missing dependencies after 564713c471 2023-07-03 13:16:28 +02:00
Matthias Springer
180f9ef8b7 [mlir][linalg] LinalgOp-anchored empty tensor elimination
This revision adds a pre-bufferization transform that can reduce the number of allocation. It is similar to `bufferization.eliminate_empty_tensors`, but specific to LinalgOp.

The transform looks for `tensor.empty` ops where the SSA use-def chain ends in an "ins" operand of a `LinalgOp`. If the same `LinalgOp` has an unused "outs" operand (and some other conditions are met), this "outs" operand can be used instead of the `tensor.empty` and the "ins" operand can be turned into an "outs" operand.

Differential Revision: https://reviews.llvm.org/D153952
2023-07-03 09:17:48 +02:00
Haojian Wu
b28296c500 [bazel] Port bazel support for 5bf8efd269 2023-07-01 08:27:26 +02:00
Guillaume Chatelet
1c814c99aa [libc] Improve memcmp latency and codegen
This is based on ideas from @nafi to:
 - use a branchless version of 'cmp' for 'uint32_t',
 - completely resolve the lexicographic comparison through vector
   operations when wide types are available. We also get rid of byte
   reloads and serializing '__builtin_ctzll'.

I did not include the suggestion to replace comparisons of 'uint16_t'
with two 'uint8_t' as it did not seem to help the codegen. This can
be revisited in sub-sequent patches.

The code been rewritten to reduce nested function calls, making the
job of the inliner easier and preventing harmful code duplication.

Reviewed By: nafi3000

Differential Revision: https://reviews.llvm.org/D148717
2023-06-30 13:00:58 +00:00
Aart Bik
6b88c852b6 [mlir][sparse] Start migration to new surface syntax for STEA
We are in the progress of migrating to a much improved surface syntax for the Sparse Tensor Encoding Attribute (STEA).

You can see a preview of this in the StableHLO RFC at

 https://github.com/openxla/stablehlo/blob/main/rfcs/20230210-sparsity.md

//**This design is courtesy Wren Romano.**//

This initial revision
(1) Introduces the first version of a new parser written by Wren Romano
(2) Introduces a simple "migration plan" using NEW_SYNTAX on the STEA, which will allow us to test the new parser with new examples, as well as migrate existing examples over without the need to rewrite them all

This first "drop" merely provides the entry points to parse the new syntax. The parser is still under active development. For example, we need to address the "lookahead" issue when parsing the lvl spec (viz. do we see l0 = d0 or a direct d0). Another larger task is to actually implement "affine" parsing (since the MLIR affine parser is not accessible in other parts of the tree).

EXAMPLE:

Currently, CSR looks like

  #CSR = #sparse_tensor.encoding<{
    lvlTypes = ["dense","compressed"],
    dimToLvl = affine_map<(i,j) -> (i,j)>
  }>

but you can "force" the new parser with

  #CSR = #sparse_tensor.encoding<{
    NEW_SYNTAX =
    (d0, d1) -> (l0 = d0 : dense, l1 = d1 : compressed)
  }>

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D153997
2023-06-29 11:32:07 -07:00
Tue Ly
f320fefc4a [libc][math] Implement erff function correctly rounded to all rounding modes.
Implement correctly rounded `erff` functions.

For `x >= 4`, `erff(x) = 1` for `FE_TONEAREST` or `FE_UPWARD`, `0x1.ffffep-1` for `FE_DOWNWARD` or `FE_TOWARDZERO`.

For `0 <= x < 4`, we divide into 32 sub-intervals of length `1/8`, and use a degree-15 odd polynomial to approximate `erff(x)` in each sub-interval:
```
  erff(x) ~ x * (c0 + c1 * x^2 + c2 * x^4 + ... + c7 * x^14).
```

For `x < 0`, we can use the same formula as above, since the odd part is factored out.

Performance tested with `perf.sh` tool from the CORE-MATH project on AMD Ryzen 9 5900X:

Reciprocal throughput (clock cycles / op)
```
$ ./perf.sh erff --path2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput --  with -march=native      (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 11.790 + 0.182 clc/call; Median-Min = 0.154 clc/call; Max = 12.255 clc/call;
-- CORE-MATH reciprocal throughput --  with -march=x86-64-v2      (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 14.205 + 0.151 clc/call; Median-Min = 0.159 clc/call; Max = 15.893 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 45.519 + 0.445 clc/call; Median-Min = 0.552 clc/call; Max = 46.345 clc/call;

-- LIBC reciprocal throughput --  with -mavx2 -mfma     (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 9.595 + 0.214 clc/call; Median-Min = 0.220 clc/call; Max = 9.887 clc/call;
-- LIBC reciprocal throughput --  with -msse4.2     (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 10.223 + 0.190 clc/call; Median-Min = 0.222 clc/call; Max = 10.474 clc/call;
```

and latency (clock cycles / op):
```
$ ./perf.sh erff --path2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency --  with -march=native      (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 38.566 + 0.391 clc/call; Median-Min = 0.503 clc/call; Max = 39.170 clc/call;
-- CORE-MATH latency --  with -march=x86-64-v2      (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 43.223 + 0.667 clc/call; Median-Min = 0.680 clc/call; Max = 43.913 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 111.613 + 1.267 clc/call; Median-Min = 1.696 clc/call; Max = 113.444 clc/call;

-- LIBC latency --  with -mavx2 -mfma     (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 40.138 + 0.410 clc/call; Median-Min = 0.536 clc/call; Max = 40.729 clc/call;
-- LIBC latency --  with -msse4.2     (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 44.858 + 0.872 clc/call; Median-Min = 0.814 clc/call; Max = 46.019 clc/call;
```

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D153683
2023-06-28 13:58:37 -04:00
Nicolas Vasilache
13f4e889c5 Revert "Revert "[mlir][Transform] Add support for mma.sync m16n8k16 f16 rewrite." and "[mlir][Transform] Introduce nvgpu transform extensions""
This reverts commit 6506692fe6.

Differential Revision: https://reviews.llvm.org/D153845
2023-06-28 06:50:05 +00:00
Mehdi Amini
6506692fe6 Revert "[mlir][Transform] Add support for mma.sync m16n8k16 f16 rewrite." and "[mlir][Transform] Introduce nvgpu transform extensions"
This reverts commit 40deed40ae.
and commit 1660f2174d.

The buildbot is broken, the two tests aren't passing.
2023-06-27 08:46:18 +02:00
Benjamin Kramer
a18266473b [bazel][mlir] Add missing dependencies for 5a1cdcbd86 2023-06-27 01:24:15 +02:00
Andres Villegas
939c03512d [llvm-libtool-darwin] Switch to OptTableSummary
Switch the parse of command line options fromllvm::cl to OptTable.
The motivation for this change is to continue adding llvm based tools
to the llvm driver multicall.

Differential Revision: https://reviews.llvm.org/D153665
2023-06-26 14:37:51 -07:00
Fangrui Song
19e9b9b589 [bazel] Add includes after 5a63b2b304 2023-06-26 12:55:48 -07:00
Nicolas Vasilache
40deed40ae [mlir][Transform] Introduce nvgpu transform extensions
Mapping to NVGPU operations such as mma.sync with mixed precision and ldmatrix with transposes and
various data types involves complex matchings from low-level IR.
This is akin to raising complex patterns after unnecessarily having lost structural information.
To avoid such unnecessary complexity, introduce a direct mapping step from a matmul on memrefs
to distributed NVGPU vector abstractions.
In this context, mapping to specific mma.sync operations is trivial and consists in simply
translating the documentation into indexing expressions.

Correctness is demonstrated with an end-to-end integration test.

Differential Revision: https://reviews.llvm.org/D153420
2023-06-26 16:21:28 +00:00
Christian Sigg
9feed59a91 [Bazel][llvm] Fix after 8de9f2b 2023-06-26 14:55:03 +02:00
Benjamin Kramer
4340ef141c [bazel] Add TargetParser dep to tblgen after 8de9f2b558 2023-06-26 12:04:54 +02:00
Christian Sigg
cd482968dc [Bazel][mlir] Avoid ODR violation introduced in 7ab749c.
This change also prepares for 9119325 to land again.

Adds `mlir_c_runner_utils_hdrs` and `mlir_runner_utils_hdrs` targets which do not depend on `//llvm::Support`.

These can be used by other 'runner.so' targets if they are loaded along with the 'runner_utils.so' without calling `__mlir_execution_engine_init()` twice.
2023-06-22 08:00:50 +02:00
Guillaume Chatelet
bd1cba9f4f Revert D148717 "[libc] Improve memcmp latency and codegen"
Once integrated in our codebase the patch triggered a bunch of failing
tests. We do not yet understand where the bug is but we revert it to
move forward with integration.
This reverts commit 5e32765c15.
2023-06-21 12:37:14 +00:00
Christian Sigg
699e64c0d9 Revert "[Bazel][mlir] Fix ODR violation introduced in 7ab749c."
This reverts commit e83c8c3600.

Depending only on the support header files is not sufficient.
2023-06-21 14:29:44 +02:00
Christian Sigg
e83c8c3600 [Bazel][mlir] Fix ODR violation introduced in 7ab749c. 2023-06-21 11:15:09 +02:00
Christian Sigg
7ab749c3a8 [Bazel][mlir] Fix after bba2b65611 2023-06-20 23:00:38 +02:00
Tue Ly
46aa659a32 [libc][math] Improve exp2f performance.
Re-organize special cases and add a special case when `|x| < 2^-5`.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D153134
2023-06-20 09:34:20 -04:00
Tue Ly
5dbd5118ec [libc][math] Improve tanhf performance.
Re-order exceptional branches and slightly adjust the evaluation.

Performance tested with the CORE-MATH project on AMD EPYC 7B12 (clocks/op)

Reciprocal throughputs:
```
--- BEFORE ---

$ CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf
[####################] 100 %  (with -mavx2 -mfma)
Ntrial = 20 ; Min = 7.794 + 0.102 clc/call; Median-Min = 0.066 clc/call; Max = 8.267 clc/call;
[####################] 100 %. (with -msse4.2)
Ntrial = 20 ; Min = 10.783 + 0.172 clc/call; Median-Min = 0.144 clc/call; Max = 11.446 clc/call;
[####################] 100 %. (SSE2)
Ntrial = 20 ; Min = 18.926 + 0.381 clc/call; Median-Min = 0.342 clc/call; Max = 19.623 clc/call;

--- AFTER ---

$ CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf
[####################] 100 %  (with -mavx2 -mfma)
Ntrial = 20 ; Min = 6.598 + 0.085 clc/call; Median-Min = 0.052 clc/call; Max = 6.868 clc/call;
[####################] 100 %  (with -msse4.2)
Ntrial = 20 ; Min = 9.245 + 0.304 clc/call; Median-Min = 0.248 clc/call; Max = 10.675 clc/call;
[####################] 100 %. (SSE2)
Ntrial = 20 ; Min = 11.724 + 0.440 clc/call; Median-Min = 0.444 clc/call; Max = 12.262 clc/call;
```

Latency:
```
--- BEFORE ---

$ PERF_ARGS="--latency" CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf
[####################] 100 %  (with -mavx2 -mfma)
Ntrial = 20 ; Min = 38.821 + 0.157 clc/call; Median-Min = 0.122 clc/call; Max = 39.539 clc/call;
[####################] 100 %. (with -msse4.2)
Ntrial = 20 ; Min = 44.767 + 0.766 clc/call; Median-Min = 0.681 clc/call; Max = 45.951 clc/call;
[####################] 100 %. (SSE2)
Ntrial = 20 ; Min = 55.055 + 1.512 clc/call; Median-Min = 1.571 clc/call; Max = 57.039 clc/call;

--- AFTER ---

$ PERF_ARGS="--latency" CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf
[####################] 100 %  (with -mavx2 -mfma)
Ntrial = 20 ; Min = 36.147 + 0.194 clc/call; Median-Min = 0.181 clc/call; Max = 36.536 clc/call;
[####################] 100 %  (with -msse4.2)
Ntrial = 20 ; Min = 40.904 + 0.728 clc/call; Median-Min = 0.557 clc/call; Max = 42.231 clc/call;
[####################] 100 %. (SSE2)
Ntrial = 20 ; Min = 55.776 + 0.557 clc/call; Median-Min = 0.542 clc/call; Max = 56.551 clc/call;
```

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D153026
2023-06-20 09:25:07 -04:00
Matthias Springer
18ec203084 [mlir][transform] Add ApplyRegisteredPassOp transform op
This transform op runs a pass on the target op.

Differential Revision: https://reviews.llvm.org/D153143
2023-06-20 08:55:22 +02:00
Christian Sigg
53f6229328 [Bazel][mlir] Fix layering check after 11db162db0 2023-06-19 09:26:17 +02:00
Pranav Kant
11db162db0 [Bazel][mlir] Port ee8b8d6b58 2023-06-18 17:51:56 +00:00
Fangrui Song
6b53c35e15 [bazel] Fix clang after D148094 2023-06-16 22:19:32 -07:00
Pranav Kant
b35c3fd780 [Bazel][mlir] Port 120cd5aafc 2023-06-17 02:57:56 +00:00
Benjamin Kramer
7ae49609fd [bazel][mlir] Port 65305aeab9 2023-06-16 13:20:32 +02:00
Pranav Kant
ae7e6df15f [Bazel][mlir][tosa] Fix for 86c4972f5f 2023-06-15 19:50:19 +00:00
Kun Wu
b1c683f5c4 [mlir][sparse][gpu] enable sm80+ sparsity integration test only when explicitly set
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152966
2023-06-15 17:44:38 +00:00
Benjamin Kramer
048796b887 [bazel][bolt] Port 05634f7346 2023-06-15 15:42:08 +02:00
Pranav Kant
c731bdd6ca [Bazel] Another fix for 7a2fdc685f 2023-06-14 23:29:06 +00:00
Pranav Kant
4cfc33b8b5 [Bazel] Fix for 7a2fdc685f 2023-06-14 23:12:06 +00:00
Benoit Jacob
1c532b5e44 bazel build --incompatible_no_implicit_file_export
The Bazel build was relying, for the two files enumerated in this diff, on the legacy implicit-export semantics described here:
https://bazel.build/reference/be/functions#exports_files

This documentation page encourages migrating away from this legacy behavior, and indeed we have a user who reported a Bazel build error and it appears that they were already using the new, stricter behavior:
https://github.com/openxla/iree/pull/13982
and while examining fixes on our side and trying to get a clean Bazel build, I ran into this similar issue in the LLVM overlay.

It would arguably be cleaner (in the sense of more structured) to rely on `filegroup` to export this, but I am insufficiently familiar with the Clang build (the dependent targets seem to be below Clang) to do this myself. The present `exports_files` solution has the merit of being localized in these few lines here.

Differential Revision: https://reviews.llvm.org/D152491
2023-06-14 19:24:47 +00:00
Tue Ly
055be3c30c [libc] Enable hermetic floating point tests again.
Fixing an issue with LLVM libc's fenv.h defined rounding mode macros
differently from system libc, making get_round() return different values from
fegetround().  Also letting math tests to skip rounding modes that cannot be
set.  This should allow math tests to be run on platforms in which fenv.h is not
implemented yet.

This allows us to re-enable hermatic floating point tests in
https://reviews.llvm.org/D151123 and reverting https://reviews.llvm.org/D152742.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D152873
2023-06-14 10:53:35 -04:00
Guillaume Chatelet
9902fc8dad [libc] Enable custom logging in LibcTest
This patch mimics the behavior of Google Test and allow users to log custom messages after all flavors of ASSERT_ / EXPECT_.

Reviewed By: sivachandra, lntue

Differential Revision: https://reviews.llvm.org/D152630
2023-06-14 13:37:50 +00:00
Guillaume Chatelet
bdb07c98c4 Revert D152630 "[libc] Enable custom logging in LibcTest"
Failing buildbot https://lab.llvm.org/buildbot/#/builders/73/builds/49707
This reverts commit 9a7b4c9348.
2023-06-14 10:31:49 +00:00
Guillaume Chatelet
9a7b4c9348 [libc] Enable custom logging in LibcTest
This patch mimics the behavior of Google Test and allow users to log custom messages after all flavors of ASSERT_ / EXPECT_.

Reviewed By: sivachandra, lntue

Differential Revision: https://reviews.llvm.org/D152630
2023-06-14 10:26:18 +00:00
Pranav Kant
53e3380786 [Bazel] Fix build 2023-06-13 22:44:11 +00:00
Tue Ly
1557256ab0 [libc] Add Int<> type and fix (U)Int<128> compatibility issues.
Add Int<> and Int128 types to replace the usage of __int128_t in math
functions.  Clean up to make sure that (U)Int128 and __(u)int128_t are
interchangeable in the code base.

Reviewed By: sivachandra, mikhail.ramalho

Differential Revision: https://reviews.llvm.org/D152459
2023-06-13 09:40:48 -04:00
James Knight
c5f6a28749 [bazel] Repair clang_headers_gen when run on macOS.
The antique version of bash (3.2.57, from 2007) which is available on
macOS cannot deal with quoted slashes in a `${x/...}`
substitution. Since only prefix-removal is required here, switch to a
`${x#...}` substitution instead.

(E.g. `src="foo/bar/baz.h"; echo ${src/"foo/bar"}` echos `bar/bar/baz.h`
instead of `/baz.h` on old bash versions).

Originally broken by 459420c33a.
Fixes #63222
2023-06-12 16:30:45 -04:00
Michael Jones
d3074f16a6 [libc] Add qsort_r
This patch adds the reentrent qsort entrypoint, qsort_r. This is done by
extending the qsort functionality and moving it to a shared utility
header. For this reason the qsort_r tests focus mostly on the places
where it differs from qsort, since they share the same sorting code.

Reviewed By: sivachandra, lntue

Differential Revision: https://reviews.llvm.org/D152467
2023-06-12 11:12:17 -07:00
Guillaume Chatelet
5e32765c15 [libc] Improve memcmp latency and codegen
This is based on ideas from @nafi to:
 - use a branchless version of 'cmp' for 'uint32_t',
 - completely resolve the lexicographic comparison through vector
   operations when wide types are available. We also get rid of byte
   reloads and serializing '__builtin_ctzll'.

I did not include the suggestion to replace comparisons of 'uint16_t'
with two 'uint8_t' as it did not seem to help the codegen. This can
be revisited in sub-sequent patches.

The code been rewritten to reduce nested function calls, making the
job of the inliner easier and preventing harmful code duplication.

Reviewed By: nafi3000

Differential Revision: https://reviews.llvm.org/D148717
2023-06-12 13:47:16 +00:00
Tue Ly
a982431295 [libc] Add platform independent floating point rounding mode checks.
Many math functions need to check for floating point rounding modes to
return correct values.  Currently most of them use the internal implementation
of `fegetround`, which is platform-dependent and blocking math functions to be
enabled on platforms with unimplemented `fegetround`.  In this change, we add
platform independent rounding mode checks and switching math functions to use
them instead. https://github.com/llvm/llvm-project/issues/63016

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D152280
2023-06-12 09:36:41 -04:00
Guillaume Chatelet
1ec995cc1c Revert D148717 "[libc] Improve memcmp latency and codegen"
This broke aarch64 debug buildbot https://lab.llvm.org/buildbot/#/builders/223/builds/21703
This reverts commit bd4f978754.
2023-06-12 08:32:00 +00:00
Guillaume Chatelet
bd4f978754 [libc] Improve memcmp latency and codegen
This is based on ideas from @nafi to:
 - use a branchless version of 'cmp' for 'uint32_t',
 - completely resolve the lexicographic comparison through vector
   operations when wide types are available. We also get rid of byte
   reloads and serializing '__builtin_ctzll'.

I did not include the suggestion to replace comparisons of 'uint16_t'
with two 'uint8_t' as it did not seem to help the codegen. This can
be revisited in sub-sequent patches.

The code been rewritten to reduce nested function calls, making the
job of the inliner easier and preventing harmful code duplication.

Reviewed By: nafi3000

Differential Revision: https://reviews.llvm.org/D148717
2023-06-12 07:56:23 +00:00
Tue Ly
37458f6693 [libc][math] Move str method from FPBits class to testing utils.
str method of FPBits class is only used for pretty printing its objects
in tests.  It brings cpp::string dependency to FPBits class, which is not ideal
for embedded use case.  We move str method to a free function in test utils and
remove this dependency of FPBits class.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D152607
2023-06-10 02:50:58 -04:00
Jordan Rupprecht
261b693afd [bazel][NFC] Add Dialect/Func/Extensions library and deps
Added in D120368
2023-06-09 17:04:41 -07:00
Mikhail Goncharov
b28614c4fc [bazel] format bazel files NFC 2023-06-09 12:13:07 +02:00
Michael Jones
47fd67ec34 [libc][NFC] land long double table for printf
The Mega Table that printf uses for long doubles with some flags is too
large for the linters, and so has been split out from the main patch.
The main patch: https://reviews.llvm.org/D150399

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D152470
2023-06-08 16:14:56 -07:00
Michael Jones
688b9730d1 [libc] add options to printf decimal floats
This patch adds three options for printf decimal long doubles, and these
can also apply to doubles.

1. Use a giant table which is fast and accurate, but takes up ~5MB).
2. Use dyadic floats for approximations, which only gives ~50 digits of
   accuracy but is very fast.
3. Use large integers for approximations, which is accurate but very
   slow.

Reviewed By: sivachandra, lntue

Differential Revision: https://reviews.llvm.org/D150399
2023-06-08 14:23:15 -07:00