Commit Graph

441864 Commits

Author SHA1 Message Date
Mahesh Ravishankar
fc367dfa67 [mlir] Remove Transforms/SideEffectUtils.h and move the methods into Interface/SideEffectInterfaces.h.
The methods in `SideEffectUtils.h` (and their implementations in
`SideEffectUtils.cpp`) seem to have similar intent to methods already
existing in `SideEffectInterfaces.h`. Move the decleration (and
implementation) from `SideEffectUtils.h` (and `SideEffectUtils.cpp`)
into `SideEffectInterfaces.h` (and `SideEffectInterface.cpp`).

Also drop the `SideEffectInterface::hasNoEffect` method in favor of
`mlir::isMemoryEffectFree` which actually recurses into the operation
instead of just relying on the `hasRecursiveMemoryEffectTrait`
exclusively.

Differential Revision: https://reviews.llvm.org/D137857
2022-11-15 20:07:35 +00:00
Shafik Yaghmour
9332ddfba6 [Clang] Extend the number of case Sema::CheckForIntOverflow covers
Currently Sema::CheckForIntOverflow misses several case that other compilers
diagnose for overflow in integral constant expressions. This includes the
arguments of a CXXConstructExpr as well as the expressions used in an
ArraySubscriptExpr, CXXNewExpr and CompoundLiteralExpr.

This fixes https://github.com/llvm/llvm-project/issues/58944

Differential Revision: https://reviews.llvm.org/D137897
2022-11-15 12:07:03 -08:00
Kazu Hirata
eba3fece88 [mlir] Fix warnings
This patch fixes:

  mlir/include/mlir/ExecutionEngine/SparseTensor/Storage.h:955:20:
  error: unused variable 'sz' [-Werror,-Wunused-variable]

  mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp:1460:2:
  error: extra ';' outside of a function is incompatible with C++98
  [-Werror,-Wc++98-compat-extra-semi]
2022-11-15 12:01:00 -08:00
bixia1
13c41757b1 [mlir][sparse] Only insert non-zero values to the result of the concatenate operation.
Modify the integration test to check number_of_entries and use it to limit for
outputing sparse tensor values.

Reviewed By: aartbik, Peiming

Differential Revision: https://reviews.llvm.org/D138046
2022-11-15 11:52:52 -08:00
Peiming Liu
c3821e1684 [mlir][sparse] fix bugs in concatenate rewriter.
Reviewed By: aartbik, bixia

Differential Revision: https://reviews.llvm.org/D138053
2022-11-15 19:49:59 +00:00
Reed
88eb3c62f2 Add FP8 E4M3 support to APFloat.
NVIDIA, ARM, and Intel recently introduced two new FP8 formats, as described in the paper: https://arxiv.org/abs/2209.05433. The first of the two FP8 dtypes, E5M2, was added in https://reviews.llvm.org/D133823. This change adds the second of the two: E4M3.

There is an RFC for adding the FP8 dtypes here: https://discourse.llvm.org/t/rfc-add-apfloat-and-mlir-type-support-for-fp8-e5m2/65279. I spoke with the RFC's author, Stella, and she gave me the go ahead to implement the E4M3 type. The name of the E4M3 type in APFloat is Float8E4M3FN, as discussed in the RFC. The "FN" means only Finite and NaN values are supported.

Unlike E5M2, E4M3 has different behavior from IEEE types in regards to Inf and NaN values. There are no Inf values, and NaN is represented when the exponent and mantissa bits are all 1s. To represent these differences in APFloat, I added an enum field, fltNonfiniteBehavior, to the fltSemantics struct. The possible enum values are IEEE754 and NanOnly. Only Float8E4M3FN has the NanOnly behavior.

After this change is submitted, I plan on adding the Float8E4M3FN type to MLIR, in the same way as E5M2 was added in https://reviews.llvm.org/D133823.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D137760
2022-11-15 20:26:42 +01:00
Roy Sundahl
b2d9e08c44 [asan][darwin] This test is x86_64 specific, not non-ios in general.
This test was unsupported in iOS when a more accurate test is that the architecture is x86_64. This "fix" is first in a series of updates intended to get asan arm64 tests fully functional.

Reviewed By: thetruestblue, vitalybuka

Differential Revision: https://reviews.llvm.org/D138001
2022-11-15 11:25:54 -08:00
Fangrui Song
6c7666a408 Revert D137574 "PEI should be able to use backward walk in replaceFrameIndicesBackward."
This reverts commit e05ce03cfa.

Caused asan use-after-poison to 4 DebugInfo/AMDGPU/ tests.
Triggered in PEI::replaceFrameIndicesBackward called llvm::MachineInstr::getNumOperands
2022-11-15 19:19:46 +00:00
Tue Ly
bc10a41080 [libc][math] Improve the performance and error printing of UInt.
Use add_with_carry builtin to improve the performance of
addition and multiplication of UInt class.  For 128-bit, it is as
fast as using __uint128_t.

Microbenchmark for addition:
https://quick-bench.com/q/-5a6xM4T8rIXBhqMTtLE-DD2h8w

Microbenchmark for multiplication:
https://quick-bench.com/q/P2muLAzJ_W-VqWCuxEJ0CU0bLDg

Microbenchmark for shift right:
https://quick-bench.com/q/N-jkKXaVsGQ4AAv3k8VpsVkua5Y

Microbenchmark for shift left:
https://quick-bench.com/q/5-RzwF8UdslC-zuhNajXtXdzLRM

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D137871
2022-11-15 14:17:59 -05:00
Fangrui Song
ea5be2571d [AVR] Fix use-of-uninitialized-value after D137520 2022-11-15 11:06:18 -08:00
Andrew Savonichev
4f9321f92c [NVPTX] Fix alignment for arguments of function pointer calls
Alignment of function arguments can be increased only if we can do
this for all call sites. Therefore we do not increase it for external
functions, and now we skip functions that have address taken, to avoid
any issues with functions pointers.

Differential Revision: https://reviews.llvm.org/D135708
2022-11-15 21:43:06 +03:00
Andrew Savonichev
69e73d076b [NVPTX] Fix pointer argument declaration for --nvptx-short-ptr
When --nvptx-short-ptr is set, local pointers are stored as 32-bit on
nvptx64 target.

Before this patch, arguments for a function declaration were always
emitted as b64 regardless of their address space, but they were set as
b32 for the corresponding call instruction:

   .extern .func test
   (
    .param .b64 test_param_0
   )
   [...]
    .param .b32 param0;
    st.param.b32 [param0+0], %r1;
    call.uni test, (param0);

This is not supported:

  ptxas: Type of argument does not match formal parameter
  'test_param_0'

Now short pointers in a function declaration are emitted as b32 if
--nvptx-short-ptr is set.

Differential Revision: https://reviews.llvm.org/D135674
2022-11-15 21:41:33 +03:00
Andrew Savonichev
c38fa7c014 [NVPTX] Fix pointer type for short 32-bit pointers
Global variables used to be printed as u64/b64 even when
-nvptx-short-ptr is set.

Differential Revision: https://reviews.llvm.org/D127668
2022-11-15 21:39:34 +03:00
Roman Lebedev
11abb7fedb [NFC][X86][Costmodel] Drop reduntant interleaved cost test coverage
These are already covered by the more general tests i've added.
2022-11-15 21:30:06 +03:00
Mehdi Amini
9b1bd2357b Apply clang-tidy fixes for modernize-loop-convert in CodegenUtils.cpp (NFC) 2022-11-15 18:14:02 +00:00
Mehdi Amini
fbfca43e6d Apply clang-tidy fixes for llvm-qualified-auto in TileUsingInterface.cpp (NFC) 2022-11-15 18:14:01 +00:00
Joseph Huber
93d1a7bfc1 [libc] Fix tablegen when using a runtimes build
When using `LLVM_ENABLE_RUNTIMES=libc` we need to perform a few extra
steps to include LLVM utilities similar to if we were performing a
standalone build. Libc depends on the tablegen utilities and the LLVM
libraries when performing a full build. When using an
`LLVM_ENABLE_PROJECTS=libc` build these are included as a part of the
greater LLVM build, but here we need to perform it maunally. This patch
should allow using `LLVM_LIBC_FULL_BUILD=ON` when building with
runtimes.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D138040
2022-11-15 12:09:52 -06:00
Roman Lebedev
8e37b53360 [X86] Rewrite getScalarizationOverhead()
All of our insert/extract ops work on 128-bit lanes.

For `Insert`, we need to extract affected 128-bit lane,
unless it's being fully overwritten (FIXME: do we need to be
careful about legalization-induced padding that we obviously don't demand?),
perform insertions, and then insert the 128-bit lane back.

But hold on. If we are operating on an 256-bit legal vector,
and thus have two 128-bit subvectors, and are fully overwriting them both,
we don't actually need to insert *both* subvectors,
only the second one, into the implicitly-widened first one.

Also, `Insert` wasn't actually querying the costs,
but just assuming them to be `1`.

`getShuffleCost(TTI::SK_ExtractSubvector)` notes:
```
  // Note that in general, the insertion starting at the beginning of a vector
  // isn't free, because we need to preserve the rest of the wide vector.
```
... so as far as i can tell, we didn't account for that.

I was hoping this would allow vectorization at a higher VF at one case i looked at,
but the subvector insertion cost is still dis-advising that.

The change for `Extract` is NFC, and is for consistency only,
i wanted to get rid of of that weird explicit discounting of insertion of 0'th element,
since the general code should already deal with that.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D137913
2022-11-15 21:07:12 +03:00
Krzysztof Parzyszek
a75bab6e60 [Hexagon] Fix even/odd word shuffling
Used the wrong shuffle instruction... -_-
2022-11-15 10:06:19 -08:00
Fangrui Song
00c22351ba [IR] Fix -Wreturn-type after D135714 "[MemProf] ThinLTO summary support" 2022-11-15 09:55:29 -08:00
Mohammed Anany
77533d79f7 [mlir][SCF] Adding custom builder to SCF::WhileOp.
This is a similar builder to the one for SCF::IfOp which allows users to pass region builders to it. Refer to the builders for IfOp.

Reviewed By: tpopp

Differential Revision: https://reviews.llvm.org/D137709
2022-11-15 18:16:49 +01:00
Guray Ozen
beaffb041c [mlir][transform] Decouple GPUDeviceMapping attribute from the GPU transfrom dialect code generator
`DeviceMappingAttrInterface` is implemented as unifiying mechanism for thread mapping. A code generator could use any attribute that implements this interface to lower `scf.foreach_thread` to device specific code. It is allowed to choose its own mapping and interpretation.

Currently, GPU transform dialect supports only `GPUThreadMapping` and `GPUBlockMapping`; however, other mappings should to be supported as well. This change addresses this issue. It decouples gpu transform dialect from the `GPUThreadMapping` and `GPUBlockMapping`. Now, they can work any other mapping.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D138020
2022-11-15 18:16:32 +01:00
Mikhail Goncharov
4a77d96903 [LegacyPM] remove unset variables in PassManagerBuilder
D137915 stopped setting this variables but NewGVN was still used and caused asan failure

Differential Revision: https://reviews.llvm.org/D138034
2022-11-15 17:57:44 +01:00
Teresa Johnson
98ed423361 Restore "[MemProf] ThinLTO summary support" with fixes
This restores 4745945500, which was
reverted in commit 452a14efc8, along with
fixes for a couple of bot failures.
2022-11-15 08:55:17 -08:00
Nico Weber
674729813e Revert "[libc++] Only include_next C library headers when they exist"
This reverts commit 226409c628.
Breaks check-clang on mac, see comments on https://reviews.llvm.org/D136683
2022-11-15 11:35:00 -05:00
Louis Dionne
bf68a595f6 [libc++] Start classifying debug mode features with more granularity
I am starting to granularize debug-mode checks so they can be controlled
more individually. The goal is for vendors to eventually be able to select
which categories of checks they want embedded in their configuration of
the library with more granularity.

Note that this patch is a bit weird on its own because it does not touch
any of the containers that implement iterator bounds checking through the
__dereferenceable check of the legacy debug mode. However, I added TODOs
to string and vector to change that.

Differential Revision: https://reviews.llvm.org/D138033
2022-11-15 11:18:22 -05:00
Paul Kirth
2b8917f8ad [pgo] Avoid introducing relocations by using private alias
Instead of using the public, interposable symbol, we can use a private
alias and avoid relocations and addends.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D137982
2022-11-15 16:05:24 +00:00
Michael Maitland
2323a4ee61 Revert "[RISCV][llvm-mca] Use LMUL Instruments to provide more accurate reports on RISCV"
This reverts commit 5e82ee5373.
2022-11-15 08:04:11 -08:00
Sanjay Patel
fe05a0a3dd [SDAG] avoid udiv/urem transform for vector/scalar type mismatches
This solves the crashing from issue #58994.
I don't know anything about VE, so I don't know if the output
is as expected or even correct.
2022-11-15 11:01:18 -05:00
Sanjay Patel
e487356e13 [SDAG] improve assert text for getSetCC type assumptions; NFC
Having identical text for these 2 conditions made it harder
to find the root problem for issue #58994.
2022-11-15 11:01:18 -05:00
Bradley Smith
daf1a1f690 [AArch64][SVE] Add instcombine to convert ptest.last/first to ptest.any
This allow for better optimization later in the backend.

This fixes the remaining missed optimizations in D137717.

Depends on D137930

Differential Revision: https://reviews.llvm.org/D137947
2022-11-15 15:59:21 +00:00
Louis Dionne
226409c628 [libc++] Only include_next C library headers when they exist
Some platforms don't provide all C library headers. In practice, libc++
only requires a few C library headers to exist, and only a few functions
on those headers. Missing functions that libc++ doesn't need for its own
implementation are handled properly by the using_if_exists attribute,
however a missing header is currently a hard error when we try to
do #include_next.

This patch should make libc++ more flexible on platforms that do not
provide C headers that libc++ doesn't actually require for its own
implementation. The only downside is that it may move some errors from
the #include_next point to later in the compilation if we actually try
to use something that isn't provided, which could be somewhat confusing.
However, these errors should be caught by folks trying to port libc++
over to a new platform (when running the libc++ test suite), not by end
users.

Differential Revision: https://reviews.llvm.org/D136683
2022-11-15 10:58:11 -05:00
Nico Weber
20b0e0a71a [gn build] Stop defining GTEST_LANG_CXX11, pass /Zc:__cplusplus with msvc
Ports:
* https://reviews.llvm.org/D84023
* https://reviews.llvm.org/rG4f5ccc72f6a6e
  (but see https://reviews.llvm.org/rG4901199f5b84b223)

No intended behavior change.
2022-11-15 10:56:53 -05:00
Michael Maitland
5e82ee5373 [RISCV][llvm-mca] Use LMUL Instruments to provide more accurate reports on RISCV
On x86 and AArch, SIMD instructions encode all of the scheduling information in the instruction
itself. For example, VADD.I16 q0, q1, q2 is a neon instruction that operates on 16-bit integer
elements stored in 128-bit Q registers, which leads to eight 16-bit lanes in parallel. This kind
of information impacts how the instruction takes to execute and what dependencies this may cause.

On RISCV however, the data that impacts scheduling is encoded in CSR registers such as vtype or
vl, in addition with the instruction itself. But MCA does not track or use the data in these
registers. This patch fixes this problem by introducing Instruments into MCA.

* Replace `CodeRegions` with `AnalysisRegions`
* Add `Instrument` and `InstrumentManager`
* Add `InstrumentRegions`
* Add RISCV Instrument and `InstrumentManager`
* Parse `Instruments` in driver
* Use instruments to override schedule class
* RISCV use lmul instrument to override schedule class
* Fix unit tests to pass empty instruments
* Add -ignore-im clopt to disable this change

Differential Revision: https://reviews.llvm.org/D137440
2022-11-15 07:54:06 -08:00
Bradley Smith
2fb3e3c46d [AArch64][SVE] Add PTEST_ANY pseudo instruction
This allow recognition of when a ptest was emitted as an any condition
and allows for extra optimization to be done later.

This addresses missing optimizations from D137716 and D137718, and
partially D137717.

Depends on D137716, D137717, D137718

Differential Revision: https://reviews.llvm.org/D137930
2022-11-15 15:46:28 +00:00
Nikita Popov
8051c1db95 [AST] Don't use WeakVH for unknown insts (NFCI)
After D138014 we do not support using AST with IR that is being
mutated. As such, we also no longer need to track unknown
instructions using WeakVH. Replace with AssertingVH to make sure
that they are not invalidated.
2022-11-15 16:46:05 +01:00
Nico Weber
1b829c3d3a [gn build] Make libcxx_enable_debug_mode work better, maybe
Refer to _LIBCPP_ENABLE_DEBUG_MODE instead of the old _LIBCPP_DEBUG
in a comment, and write that to __config_site correctly too.

See 13ea134323 and the comments in https://crbug.com/1358646.

Also change the default of libcxx_enable_debug_mode to false for now.
Since we used to not write _LIBCPP_ENABLE_DEBUG_MODE, the previous
default of true had no effect (except for compiling debug.cpp and
legacy_debug_handler.cpp, which we now no longer build by default).
So this (mostly) preserves previous behavior.
2022-11-15 10:45:41 -05:00
Teresa Johnson
452a14efc8 Revert "[MemProf] ThinLTO summary support"
This reverts commit 4745945500.

Revert while I try to fix a couple of non-Linux build failures.
2022-11-15 07:39:40 -08:00
Nikita Popov
db5855d0e4 [AST] Restrict AliasSetTracker to immutable IR
This restricts usage of AliasSetTracker to IR that does not change.
We used to use it during LICM where the underlying IR could change,
but remaining uses all use AST as part of a separate analysis phase.

This is split out from D137955, which makes use of the new guarantee
to switch to BatchAA.

Differential Revision: https://reviews.llvm.org/D138014
2022-11-15 16:27:28 +01:00
OCHyams
139e08efc5 [Assignment Tracking][23/*] Account for assignment tracking in SLP Vectorizer
The Assignment Tracking debug-info feature is outlined in this RFC:

https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir

The SLP-Vectorizer can merge a set of scalar stores into a single vectorized
store. Merge DIAssignID intrinsics from the scalar stores onto the new
vectorized store.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D133320
2022-11-15 15:20:18 +00:00
Nikita Popov
884b919f2e Reapply [Hexagon] Use default attributes for intrinsics
The issue that caused the revert has been fixed in:
44bd807512

-----

This switches Hexagon intrinsics to use the default attributes
(nosync, nofree, nocallback and willreturn). Especially willreturn
is needed to prevent optimization regressions in the future.

The only intrinsics I've excluded here are the load/store locked
intrinsics, which presumably aren't nosync.

Differential Revision: https://reviews.llvm.org/D137623
2022-11-15 16:01:14 +01:00
OCHyams
966867af73 [Assignment Tracking][22/*] Add loop-deletion test
The Assignment Tracking debug-info feature is outlined in this RFC:

https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir

This test covers the NFC-for-normal-debug-info change D133303.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D133319
2022-11-15 14:56:58 +00:00
Chuanqi Xu
de034cf313 [NFC] Fix the typo and the format in the StandardCPlusPlusModules
document
2022-11-15 22:53:43 +08:00
Krzysztof Parzyszek
44bd807512 [Hexagon] Adjust handling of stack with variable-size and extra alignment
Make the stack alignment register (AP) reserved in the given function. This
will make it available everywhere in the function, and allow aligned access
to vector register spill slots.
2022-11-15 06:48:53 -08:00
Louis Dionne
13ea134323 [libc++] Make it an error to define _LIBCPP_DEBUG
We have been transitioning off of that macro since LLVM 15.

Differential Revision: https://reviews.llvm.org/D137975
2022-11-15 09:48:13 -05:00
Fraser Cormack
cfb0d628a7 [MergeICmps][NFC] Fix a couple of typos in a comment 2022-11-15 14:46:23 +00:00
Teresa Johnson
4745945500 [MemProf] ThinLTO summary support
Implements the ThinLTO summary support for memprof related metadata.

This includes support for the assembly format, and for building the
summary from IR during ModuleSummaryAnalysis.

To reduce space in both the bitcode format and the in memory index,
we do 2 things:
1. We keep a single vector of all uniq stack id hashes, and record the
   index into this vector in the callsite and allocation memprof
   summaries.
2. When building the combined index during the LTO link, the callsite
   and allocation memprof summaries are only kept on the FunctionSummary
   of the prevailing copy.

Differential Revision: https://reviews.llvm.org/D135714
2022-11-15 06:45:12 -08:00
Ayke van Laethem
a8673b7229 [AVR] Add FeatureEIJMPCALL to FamilyAVR6
This feature was probably missed when adding FamilyAVR6, but should
definitely be there. I checked all four devices in the AVR6 family and
they all support eijmp/eicall.

Found while working on https://reviews.llvm.org/D137572.

Differential Revision: https://reviews.llvm.org/D137573
2022-11-15 15:29:37 +01:00
Ayke van Laethem
09ab9d4d11 [AVR][Clang] Implement __AVR_ARCH__ macro
This macro is defined in avr-gcc, and is very useful especially in
assembly code to check whether particular instructions are supported. It
is also the basis for other macros like __AVR_HAVE_ELPM__.

Differential Revision: https://reviews.llvm.org/D137521
2022-11-15 15:29:37 +01:00
Ayke van Laethem
d2563775cd [AVR][Clang] Move family names into MCU list
This simplifies the code by avoiding some special cases for family names
(as opposed to device names).

Differential Revision: https://reviews.llvm.org/D137520
2022-11-15 15:29:37 +01:00