
intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-01-26 12:26:52 +08:00

Author	SHA1	Message	Date
Adrian Kuegel	630b8d36c0	[mlir][Bazel] Add missing dependencies after `564713c471`	2023-07-03 13:16:28 +02:00
Matthias Springer	180f9ef8b7	[mlir][linalg] LinalgOp-anchored empty tensor elimination This revision adds a pre-bufferization transform that can reduce the number of allocations. It is similar to `bufferization.eliminate_empty_tensors`, but specific to LinalgOp. The transform looks for `tensor.empty` ops where the SSA use-def chain ends in an "ins" operand of a `LinalgOp`. If the same `LinalgOp` has an unused "outs" operand (and some other conditions are met), this "outs" operand can be used instead of the `tensor.empty` and the "ins" operand can be turned into an "outs" operand. Differential Revision: https://reviews.llvm.org/D153952	2023-07-03 09:17:48 +02:00
Haojian Wu	b28296c500	[bazel] Port bazel support for `5bf8efd269`	2023-07-01 08:27:26 +02:00
Guillaume Chatelet	1c814c99aa	[libc] Improve memcmp latency and codegen This is based on ideas from @nafi to: - use a branchless version of 'cmp' for 'uint32_t', - completely resolve the lexicographic comparison through vector operations when wide types are available. We also get rid of byte reloads and serializing '__builtin_ctzll'. I did not include the suggestion to replace comparisons of 'uint16_t' with two 'uint8_t' as it did not seem to help the codegen. This can be revisited in subsequent patches. The code has been rewritten to reduce nested function calls, making the job of the inliner easier and preventing harmful code duplication. Reviewed By: nafi3000 Differential Revision: https://reviews.llvm.org/D148717	2023-06-30 13:00:58 +00:00
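For readers unfamiliar with the "branchless cmp" idea in the memcmp commit above, here is a minimal C++ sketch, assuming 4-byte blocks are loaded big-endian so that unsigned integer order matches lexicographic byte order. `load_be32`, `cmp_u32` and `sketch_memcmp` are hypothetical names, not LLVM libc's functions, and the patch's actual code (including its vector paths) may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Load 4 bytes as a big-endian integer so that comparing the integers
// compares the bytes lexicographically.
inline uint32_t load_be32(const unsigned char *p) {
  return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
         (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

// Branchless three-way compare: each comparison yields 0 or 1, so the
// difference is -1, 0 or +1 without a conditional jump.
inline int cmp_u32(uint32_t a, uint32_t b) {
  return static_cast<int>(a > b) - static_cast<int>(a < b);
}

// Hypothetical memcmp-like helper: 4-byte blocks first, then the byte tail.
int sketch_memcmp(const void *lhs, const void *rhs, std::size_t n) {
  assert((lhs != nullptr && rhs != nullptr) || n == 0);
  const auto *a = static_cast<const unsigned char *>(lhs);
  const auto *b = static_cast<const unsigned char *>(rhs);
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4)
    if (int c = cmp_u32(load_be32(a + i), load_be32(b + i)))
      return c;
  for (; i < n; ++i)
    if (int c = cmp_u32(a[i], b[i]))
      return c;
  return 0;
}
```

On little-endian targets, compilers typically turn the byte assembly in `load_be32` into a single load plus a byte swap.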
Aart Bik	6b88c852b6	[mlir][sparse] Start migration to new surface syntax for STEA We are in the process of migrating to a much improved surface syntax for the Sparse Tensor Encoding Attribute (STEA). You can see a preview of this in the StableHLO RFC at https://github.com/openxla/stablehlo/blob/main/rfcs/20230210-sparsity.md //This design is courtesy of Wren Romano.// This initial revision (1) introduces the first version of a new parser written by Wren Romano, and (2) introduces a simple "migration plan" using NEW_SYNTAX on the STEA, which will allow us to test the new parser with new examples, as well as migrate existing examples over without the need to rewrite them all. This first "drop" merely provides the entry points to parse the new syntax. The parser is still under active development. For example, we need to address the "lookahead" issue when parsing the lvl spec (viz. do we see l0 = d0 or a direct d0). Another larger task is to actually implement "affine" parsing (since the MLIR affine parser is not accessible in other parts of the tree). EXAMPLE: Currently, CSR looks like `#CSR = #sparse_tensor.encoding<{ lvlTypes = ["dense","compressed"], dimToLvl = affine_map<(i,j) -> (i,j)> }>` but you can "force" the new parser with `#CSR = #sparse_tensor.encoding<{ NEW_SYNTAX = (d0, d1) -> (l0 = d0 : dense, l1 = d1 : compressed) }>` Reviewed By: Peiming Differential Revision: https://reviews.llvm.org/D153997	2023-06-29 11:32:07 -07:00
Tue Ly	f320fefc4a	[libc][math] Implement erff function correctly rounded to all rounding modes. Implement correctly rounded `erff` function. For `x >= 4`, `erff(x) = 1` for `FE_TONEAREST` or `FE_UPWARD`, `0x1.ffffep-1` for `FE_DOWNWARD` or `FE_TOWARDZERO`. For `0 <= x < 4`, we divide into 32 sub-intervals of length `1/8`, and use a degree-15 odd polynomial to approximate `erff(x)` in each sub-interval: ``` erff(x) ~ x * (c0 + c1 * x^2 + c2 * x^4 + ... + c7 * x^14). ``` For `x < 0`, we can use the same formula as above, since the odd part is factored out. Performance tested with the `perf.sh` tool from the CORE-MATH project on AMD Ryzen 9 5900X: Reciprocal throughput (clock cycles / op) ``` $ ./perf.sh erff --path2 GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH reciprocal throughput -- with -march=native (with FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 11.790 + 0.182 clc/call; Median-Min = 0.154 clc/call; Max = 12.255 clc/call; -- CORE-MATH reciprocal throughput -- with -march=x86-64-v2 (without FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 14.205 + 0.151 clc/call; Median-Min = 0.159 clc/call; Max = 15.893 clc/call; -- System LIBC reciprocal throughput -- [####################] 100 % Ntrial = 20 ; Min = 45.519 + 0.445 clc/call; Median-Min = 0.552 clc/call; Max = 46.345 clc/call; -- LIBC reciprocal throughput -- with -mavx2 -mfma (with FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 9.595 + 0.214 clc/call; Median-Min = 0.220 clc/call; Max = 9.887 clc/call; -- LIBC reciprocal throughput -- with -msse4.2 (without FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 10.223 + 0.190 clc/call; Median-Min = 0.222 clc/call; Max = 10.474 clc/call; ``` and latency (clock cycles / op): ``` $ ./perf.sh erff --path2 GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with -march=native (with FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 38.566 + 0.391 clc/call; Median-Min = 0.503 clc/call; Max = 39.170 clc/call; -- CORE-MATH latency -- with -march=x86-64-v2 (without FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 43.223 + 0.667 clc/call; Median-Min = 0.680 clc/call; Max = 43.913 clc/call; -- System LIBC latency -- [####################] 100 % Ntrial = 20 ; Min = 111.613 + 1.267 clc/call; Median-Min = 1.696 clc/call; Max = 113.444 clc/call; -- LIBC latency -- with -mavx2 -mfma (with FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 40.138 + 0.410 clc/call; Median-Min = 0.536 clc/call; Max = 40.729 clc/call; -- LIBC latency -- with -msse4.2 (without FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 44.858 + 0.872 clc/call; Median-Min = 0.814 clc/call; Max = 46.019 clc/call; ``` Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D153683	2023-06-28 13:58:37 -04:00
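A minimal C++ sketch of the evaluation scheme described in the erff commit, covering round-to-nearest only. The commit message does not list the coefficients, so `erff_sketch` takes a hypothetical 32 x 8 table as a parameter; it illustrates the sub-interval lookup and the Horner evaluation of the odd polynomial, not the correctly rounded production code.

```cpp
#include <cassert>
#include <cmath>

// Sketch only (round-to-nearest). coeffs[k] holds c0..c7 for the
// sub-interval [k/8, (k+1)/8); the real values are not shown here.
float erff_sketch(float xf, const double coeffs[32][8]) {
  if (std::isnan(xf))
    return xf;
  double x = xf;
  double ax = std::fabs(x);
  if (ax >= 4.0)  // erf saturates; the FE_TONEAREST result is +/-1
    return std::signbit(xf) ? -1.0f : 1.0f;
  int k = static_cast<int>(ax * 8.0);  // sub-interval index, 0..31
  assert(k >= 0 && k < 32);
  const double *c = coeffs[k];
  double x2 = x * x;
  // Horner form of c0 + c1*x^2 + ... + c7*x^14.
  double p = c[7];
  for (int j = 6; j >= 0; --j)
    p = p * x2 + c[j];
  // Multiplying by x itself (not |x|) makes the result odd: erf(-x) = -erf(x).
  return static_cast<float>(x * p);
}
```

Factoring out `x` is what lets negative inputs reuse the same table: the polynomial in `x^2` is even, so the sign comes only from the final multiplication.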
Nicolas Vasilache	13f4e889c5	Revert "Revert "[mlir][Transform] Add support for mma.sync m16n8k16 f16 rewrite." and "[mlir][Transform] Introduce nvgpu transform extensions"" This reverts commit `6506692fe6`. Differential Revision: https://reviews.llvm.org/D153845	2023-06-28 06:50:05 +00:00
Mehdi Amini	6506692fe6	Revert "[mlir][Transform] Add support for mma.sync m16n8k16 f16 rewrite." and "[mlir][Transform] Introduce nvgpu transform extensions" This reverts commit `40deed40ae` and commit `1660f2174d`. The buildbot is broken: the two tests aren't passing.	2023-06-27 08:46:18 +02:00
Benjamin Kramer	a18266473b	[bazel][mlir] Add missing dependencies for `5a1cdcbd86`	2023-06-27 01:24:15 +02:00
Andres Villegas	939c03512d	[llvm-libtool-darwin] Switch to OptTable Summary: Switch the parsing of command-line options from llvm::cl to OptTable. The motivation for this change is to continue adding LLVM-based tools to the llvm driver multicall. Differential Revision: https://reviews.llvm.org/D153665	2023-06-26 14:37:51 -07:00
Fangrui Song	19e9b9b589	[bazel] Add includes after `5a63b2b304`	2023-06-26 12:55:48 -07:00
Nicolas Vasilache	40deed40ae	[mlir][Transform] Introduce nvgpu transform extensions Mapping to NVGPU operations such as mma.sync with mixed precision and ldmatrix with transposes and various data types involves complex matchings from low-level IR. This is akin to raising complex patterns after unnecessarily having lost structural information. To avoid such unnecessary complexity, introduce a direct mapping step from a matmul on memrefs to distributed NVGPU vector abstractions. In this context, mapping to specific mma.sync operations is trivial and consists in simply translating the documentation into indexing expressions. Correctness is demonstrated with an end-to-end integration test. Differential Revision: https://reviews.llvm.org/D153420	2023-06-26 16:21:28 +00:00
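To illustrate what "translating the documentation into indexing expressions" can look like, here is a C++ sketch of the per-lane coordinates of the A operand of `mma.sync.m16n8k16` with `.f16` inputs. It assumes the fragment layout given in the PTX ISA documentation (8 halves per lane, `groupID = laneid >> 2`, `threadID_in_group = laneid % 4`); `mma_m16n8k16_a_coord` is a hypothetical helper, not part of the nvgpu transform extensions, and the layout should be checked against the PTX ISA before relying on it.

```cpp
#include <cassert>
#include <utility>

// Assumed PTX ISA layout of the A fragment for mma.sync.m16n8k16 with .f16
// inputs: each of the 32 lanes holds elements a0..a7 of the 16x16 A tile.
// Returns {row, col} of element a<i> held by `lane`. Hypothetical helper.
std::pair<int, int> mma_m16n8k16_a_coord(int lane, int i) {
  assert(lane >= 0 && lane < 32 && i >= 0 && i < 8);
  int group_id = lane >> 2;  // lanes 4g..4g+3 form group g
  int thread_in_group = lane % 4;
  // a0,a1,a4,a5 sit in the top half of the tile; a2,a3,a6,a7 eight rows lower.
  int row = (i < 2 || (i >= 4 && i < 6)) ? group_id : group_id + 8;
  // Each lane holds two consecutive columns; a4..a7 are shifted right by 8.
  int col = thread_in_group * 2 + (i & 1) + (i >= 4 ? 8 : 0);
  return {row, col};
}
```

A quick sanity check for such expressions is that the 32 x 8 (lane, element) pairs must cover the 256 tile entries exactly once.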
Christian Sigg	9feed59a91	[Bazel][llvm] Fix after `8de9f2b`	2023-06-26 14:55:03 +02:00
Benjamin Kramer	4340ef141c	[bazel] Add TargetParser dep to tblgen after `8de9f2b558`	2023-06-26 12:04:54 +02:00
Christian Sigg	cd482968dc	[Bazel][mlir] Avoid ODR violation introduced in `7ab749c`. This change also prepares for `9119325` to land again. Adds `mlir_c_runner_utils_hdrs` and `mlir_runner_utils_hdrs` targets which do not depend on `//llvm:Support`. These can be used by other 'runner.so' targets if they are loaded along with the 'runner_utils.so' without calling `__mlir_execution_engine_init()` twice.	2023-06-22 08:00:50 +02:00
Guillaume Chatelet	bd1cba9f4f	Revert D148717 "[libc] Improve memcmp latency and codegen" Once integrated in our codebase the patch triggered a bunch of failing tests. We do not yet understand where the bug is but we revert it to move forward with integration. This reverts commit `5e32765c15`.	2023-06-21 12:37:14 +00:00
Christian Sigg	699e64c0d9	Revert "[Bazel][mlir] Fix ODR violation introduced in 7ab749c." This reverts commit `e83c8c3600`. Depending only on the support header files is not sufficient.	2023-06-21 14:29:44 +02:00
Christian Sigg	e83c8c3600	[Bazel][mlir] Fix ODR violation introduced in `7ab749c`.	2023-06-21 11:15:09 +02:00
Christian Sigg	7ab749c3a8	[Bazel][mlir] Fix after `bba2b65611`	2023-06-20 23:00:38 +02:00
Tue Ly	46aa659a32	[libc][math] Improve exp2f performance. Re-organize special cases and add a special case when `|x| < 2^-5`. Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D153134	2023-06-20 09:34:20 -04:00
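A hedged C++ sketch of why a dedicated `|x| < 2^-5` branch in exp2f can be cheap: in that range a short Taylor polynomial of 2^x = e^(x ln 2), evaluated in double precision, already has a truncation error far below float precision. `exp2f_small` is hypothetical and is not the polynomial used in the patch.

```cpp
#include <cassert>
#include <cmath>

// Sketch only, not the patch's polynomial: 2^x = e^(x*ln2) for |x| < 2^-5,
// using a degree-4 Taylor polynomial evaluated in double precision.
float exp2f_small(float xf) {
  assert(std::fabs(xf) < 0x1p-5f);
  const double kLn2 = 0x1.62e42fefa39efp-1;  // ln(2) rounded to double
  double t = static_cast<double>(xf) * kLn2;  // |t| < 0.0217
  // Horner form of 1 + t + t^2/2 + t^3/6 + t^4/24; the truncation error is
  // roughly |t|^5/120, about 4e-11 at most, far below a float ulp near 1.0.
  double p = 1.0 + t * (1.0 + t * (0.5 + t * (1.0 / 6.0 + t * (1.0 / 24.0))));
  return static_cast<float>(p);
}
```

Because the final rounding to float dominates the error, this sketch is accurate to within about one ulp but is not guaranteed to be correctly rounded.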
Tue Ly	5dbd5118ec	[libc][math] Improve tanhf performance. Re-order exceptional branches and slightly adjust the evaluation. Performance tested with the CORE-MATH project on AMD EPYC 7B12 (clocks/op) Reciprocal throughputs: ``` --- BEFORE --- $ CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf [####################] 100 % (with -mavx2 -mfma) Ntrial = 20 ; Min = 7.794 + 0.102 clc/call; Median-Min = 0.066 clc/call; Max = 8.267 clc/call; [####################] 100 %. (with -msse4.2) Ntrial = 20 ; Min = 10.783 + 0.172 clc/call; Median-Min = 0.144 clc/call; Max = 11.446 clc/call; [####################] 100 %. (SSE2) Ntrial = 20 ; Min = 18.926 + 0.381 clc/call; Median-Min = 0.342 clc/call; Max = 19.623 clc/call; --- AFTER --- $ CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf [####################] 100 % (with -mavx2 -mfma) Ntrial = 20 ; Min = 6.598 + 0.085 clc/call; Median-Min = 0.052 clc/call; Max = 6.868 clc/call; [####################] 100 % (with -msse4.2) Ntrial = 20 ; Min = 9.245 + 0.304 clc/call; Median-Min = 0.248 clc/call; Max = 10.675 clc/call; [####################] 100 %. (SSE2) Ntrial = 20 ; Min = 11.724 + 0.440 clc/call; Median-Min = 0.444 clc/call; Max = 12.262 clc/call; ``` Latency: ``` --- BEFORE --- $ PERF_ARGS="--latency" CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf [####################] 100 % (with -mavx2 -mfma) Ntrial = 20 ; Min = 38.821 + 0.157 clc/call; Median-Min = 0.122 clc/call; Max = 39.539 clc/call; [####################] 100 %. (with -msse4.2) Ntrial = 20 ; Min = 44.767 + 0.766 clc/call; Median-Min = 0.681 clc/call; Max = 45.951 clc/call; [####################] 100 %. (SSE2) Ntrial = 20 ; Min = 55.055 + 1.512 clc/call; Median-Min = 1.571 clc/call; Max = 57.039 clc/call; --- AFTER --- $ PERF_ARGS="--latency" CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf [####################] 100 % (with -mavx2 -mfma) Ntrial = 20 ; Min = 36.147 + 0.194 clc/call; Median-Min = 0.181 clc/call; Max = 36.536 clc/call; [####################] 100 % (with -msse4.2) Ntrial = 20 ; Min = 40.904 + 0.728 clc/call; Median-Min = 0.557 clc/call; Max = 42.231 clc/call; [####################] 100 %. (SSE2) Ntrial = 20 ; Min = 55.776 + 0.557 clc/call; Median-Min = 0.542 clc/call; Max = 56.551 clc/call; ``` Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D153026	2023-06-20 09:25:07 -04:00
Matthias Springer	18ec203084	[mlir][transform] Add ApplyRegisteredPassOp transform op This transform op runs a pass on the target op. Differential Revision: https://reviews.llvm.org/D153143	2023-06-20 08:55:22 +02:00
Christian Sigg	53f6229328	[Bazel][mlir] Fix layering check after `11db162db0`	2023-06-19 09:26:17 +02:00
Pranav Kant	11db162db0	[Bazel][mlir] Port `ee8b8d6b58`	2023-06-18 17:51:56 +00:00
Fangrui Song	6b53c35e15	[bazel] Fix clang after D148094	2023-06-16 22:19:32 -07:00
Pranav Kant	b35c3fd780	[Bazel][mlir] Port `120cd5aafc`	2023-06-17 02:57:56 +00:00
Benjamin Kramer	7ae49609fd	[bazel][mlir] Port `65305aeab9`	2023-06-16 13:20:32 +02:00
Pranav Kant	ae7e6df15f	[Bazel][mlir][tosa] Fix for `86c4972f5f`	2023-06-15 19:50:19 +00:00
Kun Wu	b1c683f5c4	[mlir][sparse][gpu] enable sm80+ sparsity integration test only when explicitly set Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D152966	2023-06-15 17:44:38 +00:00
Benjamin Kramer	048796b887	[bazel][bolt] Port `05634f7346`	2023-06-15 15:42:08 +02:00
Pranav Kant	c731bdd6ca	[Bazel] Another fix for `7a2fdc685f`	2023-06-14 23:29:06 +00:00
Pranav Kant	4cfc33b8b5	[Bazel] Fix for `7a2fdc685f`	2023-06-14 23:12:06 +00:00
Benoit Jacob	1c532b5e44	bazel build --incompatible_no_implicit_file_export The Bazel build was relying, for the two files enumerated in this diff, on the legacy implicit-export semantics described here: https://bazel.build/reference/be/functions#exports_files This documentation page encourages migrating away from this legacy behavior, and indeed we have a user who reported a Bazel build error and it appears that they were already using the new, stricter behavior: https://github.com/openxla/iree/pull/13982 and while examining fixes on our side and trying to get a clean Bazel build, I ran into this similar issue in the LLVM overlay. It would arguably be cleaner (in the sense of more structured) to rely on `filegroup` to export this, but I am insufficiently familiar with the Clang build (the dependent targets seem to be below Clang) to do this myself. The present `exports_files` solution has the merit of being localized in these few lines here. Differential Revision: https://reviews.llvm.org/D152491	2023-06-14 19:24:47 +00:00
Tue Ly	055be3c30c	[libc] Enable hermetic floating point tests again. Fix an issue where LLVM libc's fenv.h defined rounding mode macros differently from the system libc, making get_round() return different values from fegetround(). Also let math tests skip rounding modes that cannot be set. This should allow math tests to run on platforms where fenv.h is not implemented yet. This allows us to re-enable hermetic floating point tests in https://reviews.llvm.org/D151123 and revert https://reviews.llvm.org/D152742. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D152873	2023-06-14 10:53:35 -04:00
Guillaume Chatelet	9902fc8dad	[libc] Enable custom logging in LibcTest This patch mimics the behavior of Google Test and allows users to log custom messages after all flavors of ASSERT_ / EXPECT_. Reviewed By: sivachandra, lntue Differential Revision: https://reviews.llvm.org/D152630	2023-06-14 13:37:50 +00:00
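One common way to let users append `<< ...` messages to a check macro, as Google Test does, is to make the failing branch produce a temporary message object that records its contents when destroyed. The C++ sketch below assumes nothing about LibcTest's internals; `SKETCH_EXPECT_TRUE`, `FailureMessage` and `g_failures` are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a test framework's failure log.
std::vector<std::string> g_failures;

// Collects streamed text and records it when the temporary is destroyed,
// i.e. at the end of the full expression that contains the check.
class FailureMessage {
public:
  explicit FailureMessage(const char *expr) { os_ << "Expected: " << expr; }
  ~FailureMessage() { g_failures.push_back(os_.str()); }
  template <typename T> FailureMessage &operator<<(const T &value) {
    os_ << value;
    return *this;
  }

private:
  std::ostringstream os_;
};

// Hypothetical macro, not LibcTest's definition. On failure it yields a
// FailureMessage the caller can keep streaming into with `<<`.
#define SKETCH_EXPECT_TRUE(cond)                                               \
  switch (0)                                                                   \
  case 0:                                                                      \
  default:                                                                     \
    if (cond)                                                                  \
      ;                                                                        \
    else                                                                       \
      FailureMessage(#cond) << " "
```

The `switch (0) case 0: default:` prefix mirrors the trick Google Test uses to suppress ambiguous-else warnings when the macro appears inside an outer `if`; the message object is only constructed, and only logs, when the condition is false.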
Guillaume Chatelet	bdb07c98c4	Revert D152630 "[libc] Enable custom logging in LibcTest" Failing buildbot https://lab.llvm.org/buildbot/#/builders/73/builds/49707 This reverts commit `9a7b4c9348`.	2023-06-14 10:31:49 +00:00
Guillaume Chatelet	9a7b4c9348	[libc] Enable custom logging in LibcTest This patch mimics the behavior of Google Test and allows users to log custom messages after all flavors of ASSERT_ / EXPECT_. Reviewed By: sivachandra, lntue Differential Revision: https://reviews.llvm.org/D152630	2023-06-14 10:26:18 +00:00
Pranav Kant	53e3380786	[Bazel] Fix build	2023-06-13 22:44:11 +00:00
Tue Ly	1557256ab0	[libc] Add Int<> type and fix (U)Int<128> compatibility issues. Add Int<> and Int128 types to replace the usage of __int128_t in math functions. Clean up to make sure that (U)Int128 and __(u)int128_t are interchangeable in the code base. Reviewed By: sivachandra, mikhail.ramalho Differential Revision: https://reviews.llvm.org/D152459	2023-06-13 09:40:48 -04:00
James Knight	c5f6a28749	[bazel] Repair clang_headers_gen when run on macOS. The antique version of bash (3.2.57, from 2007), which is available on macOS, cannot deal with quoted slashes in a `${x/...}` substitution. Since only prefix-removal is required here, switch to a `${x#...}` substitution instead. (E.g. `src="foo/bar/baz.h"; echo ${src/"foo/bar"}` echoes `bar/bar/baz.h` instead of `/baz.h` on old bash versions). Originally broken by `459420c33a`. Fixes #63222	2023-06-12 16:30:45 -04:00
Michael Jones	d3074f16a6	[libc] Add qsort_r This patch adds the reentrant qsort entrypoint, qsort_r. This is done by extending the qsort functionality and moving it to a shared utility header. For this reason, the qsort_r tests focus mostly on the places where it differs from qsort, since they share the same sorting code. Reviewed By: sivachandra, lntue Differential Revision: https://reviews.llvm.org/D152467	2023-06-12 11:12:17 -07:00
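A minimal C++ sketch of the structure described in the qsort_r commit: one shared sorting core driven by a context-taking comparator, with `qsort` implemented by passing its plain comparator through the context pointer. It assumes the glibc-style `qsort_r` argument order (`a, b, context`); all names are hypothetical, and the insertion sort merely stands in for whatever algorithm LLVM libc actually uses.

```cpp
#include <cassert>
#include <cstddef>

// Comparator types; the context-taking form uses the glibc-style argument
// order (a, b, context). All names in this sketch are hypothetical.
using CmpWithCtx = int (*)(const void *, const void *, void *);
using Cmp = int (*)(const void *, const void *);

static void swap_bytes(unsigned char *a, unsigned char *b, std::size_t size) {
  for (std::size_t k = 0; k < size; ++k) {
    unsigned char tmp = a[k];
    a[k] = b[k];
    b[k] = tmp;
  }
}

// Shared core: a stable insertion sort over raw bytes, driven only by a
// context-taking comparator (a stand-in for the real sorting algorithm).
static void sort_core(void *base, std::size_t count, std::size_t size,
                      CmpWithCtx cmp, void *ctx) {
  assert(base != nullptr || count == 0);
  auto *arr = static_cast<unsigned char *>(base);
  for (std::size_t i = 1; i < count; ++i)
    for (std::size_t j = i;
         j > 0 && cmp(arr + (j - 1) * size, arr + j * size, ctx) > 0; --j)
      swap_bytes(arr + (j - 1) * size, arr + j * size, size);
}

void sketch_qsort_r(void *base, std::size_t count, std::size_t size,
                    CmpWithCtx cmp, void *ctx) {
  sort_core(base, count, size, cmp, ctx);
}

// Adapter: the plain comparator travels through the context pointer.
static int call_plain(const void *a, const void *b, void *ctx) {
  Cmp plain = *static_cast<Cmp *>(ctx);
  return plain(a, b);
}

void sketch_qsort(void *base, std::size_t count, std::size_t size, Cmp cmp) {
  sort_core(base, count, size, call_plain, &cmp);
}
```

Passing a pointer to the function-pointer variable, rather than casting the function pointer itself to `void *`, keeps the adapter within portable C++.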
Guillaume Chatelet	5e32765c15	[libc] Improve memcmp latency and codegen This is based on ideas from @nafi to: - use a branchless version of 'cmp' for 'uint32_t', - completely resolve the lexicographic comparison through vector operations when wide types are available. We also get rid of byte reloads and serializing '__builtin_ctzll'. I did not include the suggestion to replace comparisons of 'uint16_t' with two 'uint8_t' as it did not seem to help the codegen. This can be revisited in subsequent patches. The code has been rewritten to reduce nested function calls, making the job of the inliner easier and preventing harmful code duplication. Reviewed By: nafi3000 Differential Revision: https://reviews.llvm.org/D148717	2023-06-12 13:47:16 +00:00
Tue Ly	a982431295	[libc] Add platform independent floating point rounding mode checks. Many math functions need to check for floating point rounding modes to return correct values. Currently most of them use the internal implementation of `fegetround`, which is platform-dependent and blocks math functions from being enabled on platforms where `fegetround` is not implemented. In this change, we add platform-independent rounding mode checks and switch math functions to use them instead. https://github.com/llvm/llvm-project/issues/63016 Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D152280	2023-06-12 09:36:41 -04:00
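A minimal C++ sketch of one way to detect the rounding mode without calling `fegetround`: perform float additions whose correctly rounded results differ between modes. `detect_rounding_mode` and its helpers are hypothetical and assume IEEE-754 binary32 `float`; LLVM libc's actual checks may be written differently.

```cpp
#include <cassert>

// Hypothetical sketch; assumes float is IEEE-754 binary32.
enum class RoundingMode { Nearest, Upward, Downward, TowardZero };

// True if 1 + delta, rounded to float, is greater than 1.
static bool sum_exceeds_one(float delta) {
  volatile float one = 1.0f;
  volatile float d = delta;
  volatile float sum = one + d;  // volatile operands block constant folding
  return sum > 1.0f;
}

// True if -1 - delta, rounded to float, is less than -1.
static bool difference_below_minus_one(float delta) {
  volatile float minus_one = -1.0f;
  volatile float d = delta;
  volatile float diff = minus_one - d;
  return diff < -1.0f;
}

RoundingMode detect_rounding_mode() {
  // 2^-24 is exactly half an ulp of 1.0f: round-to-nearest-even resolves the
  // tie back to 1.0f, so only upward rounding produces a value above 1.
  if (sum_exceeds_one(0x1p-24f))
    return RoundingMode::Upward;
  // Mirror image on the negative side: only downward rounding goes below -1.
  if (difference_below_minus_one(0x1p-24f))
    return RoundingMode::Downward;
  // 1.5 * 2^-24 is 0.75 ulp: round-to-nearest rounds up, toward-zero does not.
  if (sum_exceeds_one(0x1.8p-24f))
    return RoundingMode::Nearest;
  return RoundingMode::TowardZero;
}
```

`volatile` keeps the compiler from evaluating the additions at compile time under the default round-to-nearest assumption, so the result reflects the mode in effect at run time.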
Guillaume Chatelet	1ec995cc1c	Revert D148717 "[libc] Improve memcmp latency and codegen" This broke aarch64 debug buildbot https://lab.llvm.org/buildbot/#/builders/223/builds/21703 This reverts commit `bd4f978754`.	2023-06-12 08:32:00 +00:00
Guillaume Chatelet	bd4f978754	[libc] Improve memcmp latency and codegen This is based on ideas from @nafi to: - use a branchless version of 'cmp' for 'uint32_t', - completely resolve the lexicographic comparison through vector operations when wide types are available. We also get rid of byte reloads and serializing '__builtin_ctzll'. I did not include the suggestion to replace comparisons of 'uint16_t' with two 'uint8_t' as it did not seem to help the codegen. This can be revisited in subsequent patches. The code has been rewritten to reduce nested function calls, making the job of the inliner easier and preventing harmful code duplication. Reviewed By: nafi3000 Differential Revision: https://reviews.llvm.org/D148717	2023-06-12 07:56:23 +00:00
Tue Ly	37458f6693	[libc][math] Move str method from FPBits class to testing utils. The str method of the FPBits class is only used for pretty printing its objects in tests. It brings a cpp::string dependency into the FPBits class, which is not ideal for embedded use cases. We move the str method to a free function in the test utils and remove this dependency from the FPBits class. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D152607	2023-06-10 02:50:58 -04:00
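A small C++ sketch of the separation described in the FPBits commit: the bit-view type exposes only integer accessors, and the string formatting lives in a free function that only tests include. `FloatBits` and `str` are hypothetical stand-ins (using `std::string` rather than LLVM libc's `cpp::string`), and the output format is illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical, trimmed-down FPBits-like view of a float. It has no string
// dependency, so including it stays cheap for embedded builds.
struct FloatBits {
  uint32_t bits;
  explicit FloatBits(float f) { std::memcpy(&bits, &f, sizeof bits); }
  uint32_t sign() const { return bits >> 31; }
  uint32_t biased_exponent() const { return (bits >> 23) & 0xFF; }
  uint32_t mantissa() const { return bits & 0x7FFFFF; }
};

// Test-only pretty printer kept in the test utilities, so only tests pull in
// string support.
std::string str(const FloatBits &x) {
  char buf[64];
  std::snprintf(buf, sizeof buf, "0x%08X = (S: %u, E: 0x%02X, M: 0x%06X)",
                static_cast<unsigned>(x.bits), static_cast<unsigned>(x.sign()),
                static_cast<unsigned>(x.biased_exponent()),
                static_cast<unsigned>(x.mantissa()));
  return buf;
}
```

Because `str` only uses the public accessors, moving it out of the class does not require any change to the bit-view type itself.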
Jordan Rupprecht	261b693afd	[bazel][NFC] Add Dialect/Func/Extensions library and deps Added in D120368	2023-06-09 17:04:41 -07:00
Mikhail Goncharov	b28614c4fc	[bazel] format bazel files NFC	2023-06-09 12:13:07 +02:00
Michael Jones	47fd67ec34	[libc][NFC] land long double table for printf The Mega Table that printf uses for long doubles with some flags is too large for the linters, and so has been split out from the main patch. The main patch: https://reviews.llvm.org/D150399 Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D152470	2023-06-08 16:14:56 -07:00
Michael Jones	688b9730d1	[libc] add options to printf decimal floats This patch adds three options for printf decimal long doubles, and these can also apply to doubles. 1. Use a giant table, which is fast and accurate but takes up ~5MB. 2. Use dyadic floats for approximations, which only gives ~50 digits of accuracy but is very fast. 3. Use large integers for approximations, which is accurate but very slow. Reviewed By: sivachandra, lntue Differential Revision: https://reviews.llvm.org/D150399	2023-06-08 14:23:15 -07:00


1311 Commits