Commit Graph

484489 Commits

Author SHA1 Message Date
Alexander Yermolovich
ad4cead67c [BOLT][DWARF][NFC] Initialize CloneUnitCtxMap with current partition size (#75876)
Previously, we always allocated the maximum size for the vector containing
DWARFUnitInfo. In real use cases, this meant allocating a giant vector
when processing one CU, or, in the thin-LTO case, multiple CUs.
This led to a lot of memory overhead, and a 2x BOLT processing slowdown
for at least one service built with monolithic DWARF.

For binaries built with LTO with clang, all CUs that have cross
references share an abbrev table and are processed in one batch.
The rest of the CUs are processed in batches of
--cu-processing-batch-size, which defaults to 1.

For theoretical cases where cross-CU references are present but the CUs
do not share an abbrev table, the size of CloneUnitCtxMap is increased
as each CU is being processed.
2023-12-20 16:12:52 -08:00
Valentin Clement
553748356c Revert "[mlir][openacc] Add device_type support for compute operations (#75864)"
This reverts commit 8b885eb90f.
2023-12-20 16:08:10 -08:00
Valentin Clement
e98082d90a Revert "[flang][openacc] Remove unused waitdevnum"
This reverts commit 8fdc3b98b8.
2023-12-20 16:07:57 -08:00
NAKAMURA Takumi
7c9c807fa4 [Bazel] Update llvm/Config, fixup for 476812a742 2023-12-21 08:52:43 +09:00
Vitaly Buka
3dca63a32f [symbolizer] Don't treat symbolizer API as optional (#76103)
There is an assumption that we don't need to mix the sanitizer with a
symbolizer from a different LLVM revision. If so, we can detect it by
`__sanitizer_symbolize_code` and assume that the rest is present.
2023-12-20 15:38:43 -08:00
Maksim Levental
acaff70841 [mlir][python] move transform extras (#76102) 2023-12-20 17:29:11 -06:00
Joseph Huber
f324584ae3 [Libomptarget][NFCI] Remove caching of created ELF files (#76080)
Summary:
We currently keep a cache of created ELF files from the relevant images.
This shouldn't be necessary as the entire ELF interface is generally
trivially constructable and extremely cheap. The cost of constructing
one of these objects is simply a size check and writing a pointer to the
underlying data. Given that, keeping a cache of these images should not
be necessary overall.
2023-12-20 17:13:41 -06:00
michaelrj-google
b37c0486b2 [libc][NFC] clean up printf_core and scanf_core (#74535)
Add LIBC_INLINE annotations to functions and fix variable cases within
printf_core and scanf_core.
2023-12-20 15:12:54 -08:00
Jonas Paulsson
f94adfd50c [docs] Reword the alignment implications for atomic instructions. (#75871)
Atomic instructions (load / store / atomicrmw / cmpxchg) are not
really undefined behavior if they lack natural alignment. They will
(with the AtomicExpand pass enabled) be converted into libcalls.

Update the language reference to reflect this.
2023-12-21 00:08:41 +01:00
Shilei Tian
7e4c6f6cb2 [OpenMP] Reduce the size of heap memory required by the test malloc_parallel.c (#75885)
This patch reduces the size of heap memory required by the test
`malloc_parallel.c` and `malloc.c`. The original size is too large such
that `malloc` returns `nullptr` on many threads, causing illegal
memory access.
2023-12-20 15:03:01 -08:00
Ivan R. Ivanov
39f09ec245 Invalidate analyses after running Attributor in OpenMPOpt (#74908)
Using the LoopInfo from OMPInfoCache after the Attributor ran resulted
in a crash due to it being in an invalid state.

---------

Co-authored-by: Ivan Radanov Ivanov <ivanov2@llnl.gov>
2023-12-20 15:01:21 -08:00
Ethan Luis McDonough
3c10e5b2f6 [OpenMP] Add unit tests for nextgen plugins (#74398)
This patch adds three GTest unit tests that test plugin read and write
operations. Tests can be compiled with
`ninja -C runtimes/runtimes-bins LibomptUnitTests`.
2023-12-20 14:58:56 -08:00
Cyndy Ishida
c6f29dbb59 [readtapi] Setup simple stubify support (#76075)
Stubify broadly takes either tbd files or binary dylibs and turns them
into tbd files. In future patches, stubify will also allow additional
information to be embedded into the final TBD output too.

Add Util APIs to TextAPI for common operations used by readtapi for now.
2023-12-20 14:56:53 -08:00
Mikhail Gudim
8773c9be3d [InstCombine] Extend foldICmpBinOp to add-like or. (#71396)
InstCombine canonicalizes `add` to `or` when possible, but this causes
some optimizations that apply to `add` to be missed because they don't
realize that the `or` is equivalent to `add`.

In this patch we generalize `foldICmpBinOp` to handle such cases.
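
The equivalence this relies on can be checked directly: when two operands share no set bits, the addition produces no carries, so `or` and `add` give the same result. The following is a minimal Python sketch of that arithmetic fact only. The helper names `is_add_like_or` and `check_all` are hypothetical, and this is not the InstCombine implementation.

```python
def is_add_like_or(a: int, b: int) -> bool:
    """Return True when the operands share no set bits, so `a | b == a + b`."""
    return (a & b) == 0


def check_all(bits: int = 4) -> bool:
    """Exhaustively confirm, for small unsigned values, that
    disjoint operands are exactly the case where `or` equals `add`."""
    for a in range(1 << bits):
        for b in range(1 << bits):
            if is_add_like_or(a, b) != ((a | b) == (a + b)):
                return False
    return True
```

For example, `0b1000 | 0b0011` and `0b1000 + 0b0011` are both `0b1011`, whereas `3 | 1` is `3` but `3 + 1` is `4`.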
2023-12-20 17:28:57 -05:00
Peiming Liu
cf4dd91165 [mlir][sparse] initialize slice-driven loop-related fields in one place (#76099) 2023-12-20 14:20:57 -08:00
Valentin Clement
8fdc3b98b8 [flang][openacc] Remove unused waitdevnum 2023-12-20 14:01:51 -08:00
Valentin Clement (バレンタイン クレメン)
8b885eb90f [mlir][openacc] Add device_type support for compute operations (#75864)
This patch adds representation for `device_type` clause information on
compute construct (parallel, kernels, serial).

The `device_type` clause on compute construct impacts clauses that
appear after it. The values impacted by `device_type` are now tied with
an attribute array that represents the device_type associated with them.
`DeviceType::None` is used to represent the value produced by a clause
before any `device_type`. The operands and the attribute information are
parsed/printed together.

This is an example with `vector_length` clause. The first value (64) is
not impacted by `device_type` so it will be represented with
DeviceType::None. None is not printed. The second value (128) is tied
with the `device_type(multicore)` clause.
```
!$acc parallel vector_length(64) device_type(multicore) vector_length(128)
```
```
acc.parallel vector_length(%c64 : i32, %c128 : i32 [#acc.device_type<multicore>]) {
}
```

When multiple values can be produced for a single clause like
`num_gangs` and `wait`, an extra attribute describes the number of values
belonging to each `device_type`. Values and attributes are
parsed/printed together.

```
acc.parallel num_gangs({%c2 : i32, %c4 : i32}, {%c4 : i32} [#acc.device_type<nvidia>])
```
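
One way to picture the "number of values belonging to each `device_type`" is as a flat operand list plus a parallel list of segment sizes. The Python sketch below is only an illustration of that bookkeeping. The names `flatten_groups` and `regroup`, and the use of the string `"none"` for `DeviceType::None`, are assumptions, and the actual attribute layout in the dialect is not shown in this message.

```python
from typing import List, Tuple

Group = Tuple[str, List[int]]


def flatten_groups(groups: List[Group]):
    """Flatten per-device_type value groups into
    (operands, segments, device_types)."""
    operands, segments, device_types = [], [], []
    for device_type, values in groups:
        operands.extend(values)        # all values in one flat list
        segments.append(len(values))   # how many values this group owns
        device_types.append(device_type)
    return operands, segments, device_types


def regroup(operands, segments, device_types) -> List[Group]:
    """Invert flatten_groups by slicing the flat list with the segment sizes."""
    groups, start = [], 0
    for device_type, size in zip(device_types, segments):
        groups.append((device_type, operands[start:start + size]))
        start += size
    return groups
```

Mirroring the `num_gangs` example above, `flatten_groups([("none", [2, 4]), ("nvidia", [4])])` yields the operands `[2, 4, 4]`, the segment sizes `[2, 1]`, and the device types `["none", "nvidia"]`.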

While preparing this patch, I noticed that the wait devnum is not part of
the operations and is not lowered. It will be added in a follow-up
patch.
2023-12-20 13:45:47 -08:00
Krzysztof Parzyszek
7ffad37c86 [flang][OpenMP] Avoid captures of references to structured bindings
Fixes build break caused by 400c32cbf9.
2023-12-20 15:31:49 -06:00
Krzysztof Parzyszek
400c32cbf9 [flang][OpenMP] Use llvm::enumerate in few places, NFC (#76095)
Use `llvm::enumerate` instead of iterating over a range and keeping a
separate counter.
2023-12-20 15:09:37 -06:00
Justin Bogner
1f3d70a95a [Transforms][DXIL] Basic debug output in dxil-upgrade. NFC 2023-12-20 14:06:42 -07:00
Stella Laurenzo
bbc2976868 [mlir][python] Make the Context/Operation capsule creation methods work as documented. (#76010)
This fixes a longstanding bug in the `Context._CAPICreate` method
whereby it was not taking ownership of the PyMlirContext wrapper when
casting to a Python object. The result was minimally that all such
contexts transferred in that way would leak. In addition, counter to the
documentation for the `_CAPICreate` helper (see
`mlir-c/Bindings/Python/Interop.h`) and the `forContext` /
`forOperation` methods, we were silently upgrading any unknown
context/operation pointer to steal-ownership semantics. This is
dangerous and was causing some subtle bugs downstream where this
facility is getting the most use.

This patch corrects the semantics and will only do an ownership transfer
for `_CAPICreate`, and it will further require that it is an ownership
transfer (if already transferred, it was just silently succeeding).
Removing the mis-aligned behavior made it clear where the downstream was
doing the wrong thing.

It also adds some `_testing_` functions to create unowned context and
operation capsules so that this can be fully tested upstream, reworking
the tests to verify the behavior.

In some torture testing downstream, I was not able to trigger any memory
corruption with the newly enforced semantics. When getting it wrong, a
regular exception is raised.
2023-12-20 12:18:58 -08:00
Alex Beloi
d84c640143 [mlir] Remove "Syntax:" parser where it's already provided by assemblyFormat (#76002)
See #73359

Types using `assemblyFormat` to define parsing don't need an additional
handwritten parser. So we should remove the handwritten parsers where
one provided by an `assemblyFormat` already exists to avoid confusion
and de-syncing.
2023-12-20 14:58:51 -05:00
Slava Zakharin
b4b23ff7f8 [flang][runtime] Enable more APIs in the offload build. (#75996)
This patch enables more numeric (mod, sum, matmul, etc.) APIs,
and some others.

I added new macros to disable warnings about using C++ standard library
methods, like operators of std::complex, which do not have the __device__
attribute. This may result in unresolved references if the header files'
implementation relies on libstdc++. I will need to follow up on this.
2023-12-20 11:52:51 -08:00
Abhina Sree
892862246e [SystemZ][z/OS] define HOST_NAME_MAX for z/OS (#76093)
This applies the same change made in google benchmark to define HOST_NAME_MAX
for z/OS 7b52bf7346
2023-12-20 14:29:24 -05:00
Sam Clegg
4e8cb01b01 [WebAssembly] Add symbol information for shared libraries (#75238)
The current (experimental) spec for WebAssembly shared libraries does
not include a full symbol table like the object format. This change
extracts symbol information from the normal wasm exports.

This is the first step in having the linker report undefined symbols
when linking with shared libraries. The current behaviour is to ignore
all undefined symbols when linking with `-pie` or `-shared`.

See https://github.com/emscripten-core/emscripten/issues/18198
2023-12-20 11:13:09 -08:00
Dimitry Andric
2c27013fa9 [clang] Add getClangVendor() and use it in CodeGenModule.cpp (#75935)
In 9a38a72f1d `ProductId` was assigned from the stringified value of
`CLANG_VENDOR`, if that macro was defined. However, `CLANG_VENDOR` is
supposed to be a string, as it is defined (optionally) as such in the
top-level clang `CMakeLists.txt`.

Furthermore, `CLANG_VENDOR` is only passed as a build-time define when
compiling `Version.cpp`, so add a `getClangVendor()` function to
`Version.h`, and use it in `CodeGenModule.cpp`, instead of relying on
the macro.

Fixes: 9a38a72f1d
2023-12-20 20:09:39 +01:00
Dimitry Andric
5c1a41f8ad Revert "[clang] Add getClangVendor() and use it in CodeGenModule.cpp (#75935)"
This reverts commit 9055519103, due to an
incorrectly chosen commit message.
2023-12-20 20:07:22 +01:00
Dimitry Andric
9055519103 [clang] Add getClangVendor() and use it in CodeGenModule.cpp (#75935)
In 9a38a72f1d `ProductId` was assigned from the stringified value of
`CLANG_VENDOR`, if that macro was defined. However, `CLANG_VENDOR` is
supposed to be a string, as it is defined (optionally) as such in the
top-level clang `CMakeLists.txt`.

Move the addition of `-DCLANG_VENDOR` to the compiler flags from
`clang/lib/Basic/CMakeLists.txt` to the top-level `CMakeLists.txt`, so
it is consistent across the whole clang codebase. Then remove the
stringification from `CodeGenModule.cpp`, to make it work correctly.

Fixes: 9a38a72f1d
2023-12-20 20:03:19 +01:00
Krzysztof Parzyszek
8b231d73bd [mlir] Fix build break with shared libraries
When project components are built as separate shared libraries, a lot
of errors appear about undefined symbols, e.g.

```
/usr/bin/ld: CMakeFiles/obj.MLIRGPUPipelines.dir/GPUToNVVMPipeline.cpp.o
: in function `(anonymous namespace)::buildCommonPassPipeline(mlir::OpPa
ssManager&, (anonymous namespace)::GPUToNVVMPipelineOptions const&)':
GPUToNVVMPipeline.cpp:(.text._ZN12_GLOBAL__N_123buildCommonPassPipelineE
RN4mlir13OpPassManagerERKNS_24GPUToNVVMPipelineOptionsE+0xa5): undefined
 reference to `mlir::createConvertLinalgToLoopsPass()'
```

Add the necessary dependencies to Dialect/GPU/Pipelines/CMakeLists.txt
2023-12-20 12:49:25 -06:00
Han-Chung Wang
b33a131c82 [mlir][arith] Add support for expanding arith.maxnumf/minnumf ops. (#75989)
The maxnum/minnum semantics can be found at
https://llvm.org/docs/LangRef.html#llvm-minnum-intrinsic.

The revision also updates function names in lit tests to match op name.

Take arith.maxnumf as example:

```
func.func @maxnumf(%lhs: f32, %rhs: f32) -> f32 {
  %result = arith.maxnumf %lhs, %rhs : f32
  return %result : f32
}
```

will be expanded to

```
func.func @maxnumf(%lhs: f32, %rhs: f32) -> f32 {
  %0 = arith.cmpf ugt, %lhs, %rhs : f32
  %1 = arith.select %0, %lhs, %rhs : f32
  %2 = arith.cmpf uno, %lhs, %lhs : f32
  %3 = arith.select %2, %rhs, %1 : f32
  return %3 : f32
}
```

Case 1: Both LHS and RHS are not NaN; LHS > RHS

In this case, `%1` is LHS. `%3` and `%1` have the same value, so `%3` is
LHS.

Case 2: LHS is NaN and RHS is not NaN

In this case, `%2` is true, so `%3` is always RHS.

Case 3: LHS is not NaN and RHS is NaN

In this case, `%0` is true and `%1` is LHS. `%2` is false, so `%3` and
`%1` have the same value, which is LHS.

Case 4: Both LHS and RHS are NaN

`%1` and RHS are both NaN, so the result is still NaN.
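
The four cases above can be replayed with a small Python emulation of the expanded sequence. This is a sketch for checking the case analysis, not the MLIR lowering itself. The function names are hypothetical, and the predicates follow the usual meaning of `ugt` (unordered or greater than) and `uno` (unordered).

```python
import math


def cmpf_ugt(a: float, b: float) -> bool:
    """Unordered or greater than: true if either operand is NaN or a > b."""
    return math.isnan(a) or math.isnan(b) or a > b


def cmpf_uno(a: float, b: float) -> bool:
    """Unordered: true if either operand is NaN."""
    return math.isnan(a) or math.isnan(b)


def expanded_maxnumf(lhs: float, rhs: float) -> float:
    """Mirror the expanded ops one by one."""
    c0 = cmpf_ugt(lhs, rhs)      # %0 = arith.cmpf ugt, %lhs, %rhs
    s1 = lhs if c0 else rhs      # %1 = arith.select %0, %lhs, %rhs
    c2 = cmpf_uno(lhs, lhs)      # %2 = arith.cmpf uno, %lhs, %lhs
    return rhs if c2 else s1     # %3 = arith.select %2, %rhs, %1
```

With `nan = float("nan")`, `expanded_maxnumf(2.0, 1.0)` gives `2.0` (case 1), `expanded_maxnumf(nan, 1.0)` gives `1.0` (case 2), `expanded_maxnumf(1.0, nan)` gives `1.0` (case 3), and `expanded_maxnumf(nan, nan)` gives NaN (case 4).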
2023-12-20 10:35:12 -08:00
Shoaib Meenai
e7bd673681 [runtimes] Fix test dependencies
compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp
needs llvm-lto and opt.
2023-12-20 10:19:06 -08:00
Craig Topper
b03f0c596a [RISCV] Add sifive-p450 CPU. (#75760)
This is an out of order core with no vector unit. More information:
https://www.sifive.com/cores/performance-p450-470

Scheduler model and other tuning will come in separate patches.
2023-12-20 09:52:02 -08:00
Schrodinger ZHU Yifan
7a87ff64e1 [libc] suppress stdlib explicitly for crt1.a (#76079)
[nd: updated oneline]
2023-12-20 09:42:35 -08:00
Florian Hahn
18170d0f28 [ConstraintElim] Extend AND implication logic to support OR as well. (#76044)
Extend the logic that checks whether an operand of an AND is implied by
the other to also support OR. This is done by checking if !op1 implies
op2 or vice versa.
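
The logical facts behind both checks can be illustrated by brute force over a small integer domain. This Python sketch only demonstrates the implications themselves. The predicates, the domain, and the helper names are made up for illustration, and it makes no claim about how ConstraintElimination proves or uses them.

```python
DOMAIN = range(-20, 21)  # small illustrative domain (an assumption)


def implied_everywhere(premise, conclusion, domain=DOMAIN) -> bool:
    """True if `conclusion` holds wherever `premise` holds."""
    return all(conclusion(x) for x in domain if premise(x))


def and_example() -> bool:
    """op1 implies op2, so (op1 and op2) is equivalent to op1."""
    op1 = lambda x: x > 10
    op2 = lambda x: x > 5
    assert implied_everywhere(op1, op2)
    return all((op1(x) and op2(x)) == op1(x) for x in DOMAIN)


def or_example() -> bool:
    """!op1 implies op2, so (op1 or op2) always holds."""
    op1 = lambda x: x > 5
    op2 = lambda x: x < 10
    assert implied_everywhere(lambda x: not op1(x), op2)
    return all(op1(x) or op2(x) for x in DOMAIN)
```

Both `and_example()` and `or_example()` return `True`: in the AND case the second operand adds nothing once the first is known to hold, and in the OR case the second operand holds whenever the first does not.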
2023-12-20 18:13:41 +01:00
LLVM GN Syncbot
2c257cf872 [gn build] Port 5ea15fab19 2023-12-20 16:47:21 +00:00
Cyndy Ishida
5ea15fab19 [TextAPI] Add support to convert RecordSlices -> InterfaceFile (#75007)
Introduce RecordVisitor. This is used for different clients that want to
extract information out of RecordSlice types.
The first and immediate use case is for serializing symbol information
into TBD files.
2023-12-20 08:47:10 -08:00
Schrodinger ZHU Yifan
8bbeed05c4 [libc] [startup] add cmake function to merge separated crt1 objects (#75413)
As part of startup refactoring, this patch adds a function to merge
multiple objects into a single relocatable object:
                     cc -r obj1.o obj2.o -o obj.o

A relocatable object is an object file that is not fully linked into an
executable or a shared library. It is an intermediate file format that
can be passed into the linker.

A crt object can have arch-specific code and arch-agnostic code. To
reduce code cohesion, the implementation is split into multiple
units. As a result, we need to merge them into a single relocatable
object.
2023-12-20 08:18:51 -08:00
Joseph Huber
e4f4022b70 [Libomptarget][NFC] Fix linting warnings in the plugins
Summary:
Fix some linting warnings present in the plugins.
2023-12-20 10:07:34 -06:00
Florian Hahn
b1a5ee1feb [ARM] Check all terms in emitPopInst when clearing Restored for LR. (#75527)
emitPopInst checks a single function exit MBB. If other paths also exit
the function and any of their terminators uses LR implicitly, it is not
safe to clear the Restored bit.

Check all terminators for the function before clearing Restored.

This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll
where the machine-outliner previously introduced BLs that clobbered LR
which in turn is used by the tail call return.

Alternative to #73553
2023-12-20 16:56:15 +01:00
Lucas Duarte Prates
d43fc5a6ad Reland: [AArch64] Assembly support for the Checked Pointer Arithmetic Extension (#73777)
This introduces assembly support for the Checked Pointer Arithmetic
Extension (FEAT_CPA), announced as part of the Armv9.5-A architecture
version.

The changes include:
* New subtarget feature for FEAT_CPA
* New scalar instructions for pointer arithmetic
  * ADDPT, SUBPT, MADDPT, and MSUBPT
* New SVE instructions for pointer arithmetic
  * ADDPT (vectors, predicated), ADDPT (vectors, unpredicated)
  * SUBPT (vectors, predicated), SUBPT (vectors, unpredicated)
  * MADPT and MLAPT
* New ID_AA64ISAR3_EL1 system register

More details about the extension can be found at:
* https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-developments-2023
* https://developer.arm.com/documentation/ddi0602/2023-09/

Co-authored-by: Rodolfo Wottrich <rodolfo.wottrich@arm.com>
2023-12-20 15:43:17 +00:00
Zequan Wu
688fa35df0 [Profile] Dump binary id to raw profiles on Windows. (#75618)
#74652 adds `__buildid` symbol which allows us to dump it at runtime.
2023-12-20 10:41:36 -05:00
Paul C Fuqua
11141bc68a Fix what seems to be a silly bug in gpu.set_default_device rewriting. Smoke test included. (#75756) 2023-12-20 09:35:42 -06:00
LLVM GN Syncbot
300adbee88 [gn build] Port fdd089b500 2023-12-20 15:23:16 +00:00
LLVM GN Syncbot
d2330058df [gn build] Port 3903438860 2023-12-20 15:23:15 +00:00
Simon Pilgrim
6ec350b483 [X86] SimplifyDemandedVectorEltsForTargetShuffle - don't simplify constant mask if it has multiple uses
Avoid generating extra constant vectors
2023-12-20 15:22:48 +00:00
Razvan Lupusoru
a711b042fd [acc] Initial implementation of MemoryEffects on acc operations (#75970)
The `acc` dialect operations now implement MemoryEffects interfaces in
the following ways:
- Data entry operations which may read host memory via `varPtr` are now
marked as such. The majority of them do NOT actually read the host memory.
For example, `acc.present` works on the basis of the presence of the
pointer and not necessarily what the data points to, so such operations
are not marked as reading the host memory. They still use `varPtr`, but
this dependency is reflected through SSA.
- Data clause operations which may mutate the data pointed to by
`accPtr` are marked as doing so.
- Data clause operations which update required structured or dynamic
runtime counters are marked as reading and writing the newly defined
`RuntimeCounters` resource. Some operations, like `acc.getdeviceptr` do
not actually use the runtime counters - but are marked as reading them
since the address obtained depends on the mapping operations which do
update the runtime counters. Namely, `acc.getdeviceptr` cannot be moved
across other mapping operations.
- Constructs are marked as writing to the `ConstructResource`. This may
be too strict but is needed for the following reasons: 1) Structured
constructs may not use `accPtr` and instead use `varPtr` - when this is
the case, data actions may be removed even when used. 2) Unstructured
constructs are currently used to aggregate multiple data actions. We do
not want such constructs removed or moved for now.
- Terminators are marked as `Pure` as in other dialects.

The current approach has the following limitations which may require
further improvements:
- Subsequent `acc.copyin` operations on the same data do not actually read
host memory pointed to by `varPtr` but are still marked as such.
- Two `acc.delete` operations on same data may not mutate `accPtr` until
the runtime counters are zero (but are still marked as mutating).
- The `varPtrPtr` argument, when present, points to the address of the
location of `varPtr`. When mapping to the target device, an `accPtrPtr`
needs to be computed and this memory is mutated. This effect is not
captured since the current operations do not produce `accPtrPtr`.
- Runtime counter effects are imprecise since two operations with
differing `varPtr` increment/decrement different counters. Additionally,
operations with `varPtrPtr` mutate attachment counters.
- The `ConstructResource` is too strict and likely can be relaxed with
better modeling.
2023-12-20 07:11:19 -08:00
Christian Sigg
476812a742 [bazel] Update config.h.cmake after e86a02ce89. 2023-12-20 16:07:46 +01:00
Nikita Popov
8b8f2ef06e [MergeFunc] Fix comparison of constant expressions
Functions using different constant expressions were incorrectly
merged, because a lot of state was missing from the comparison,
including the opcode, the comparison predicate, the GEP element
type, as well as the inbounds, inrange and nowrap poison flags.
2023-12-20 15:59:02 +01:00
Nico Weber
6cd296ed85 [gn] port e86a02ce89 (dladdr -> llvm-config.h)
Also set HAVE_DLADDR to 1 on non-Win instead of just on macOS.
That looked like an oversight.
2023-12-20 09:57:37 -05:00
Alexey Bataev
a13148a880 [SLP]Fix PR75995: drop wrapping flags for resized wrapped binops.
If the instruction is resized, the wrapping flags must be dropped from
the resulting vector instructions to avoid incorrect
optimizations/assumptions later.
Fixes PR75995.
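
Why a no-wrap flag cannot simply be carried over to a narrower type can be seen with plain integer arithmetic. The Python sketch below is an illustration under assumed widths (32 and 8 bits are chosen here and do not come from the patch), with hypothetical helper names. It is not the SLP vectorizer's logic.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if `value` is representable as a signed integer of `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return lo <= value <= hi


def wrap_signed(value: int, bits: int) -> int:
    """Truncate `value` to `bits` bits and reinterpret it as signed."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def add_overflows_signed(a: int, b: int, bits: int) -> bool:
    """True when a signed add at this width would wrap,
    i.e. a 'no signed wrap' claim on it would be false."""
    return not fits_signed(wrap_signed(a, bits) + wrap_signed(b, bits), bits)
```

For example, `100 + 100` does not wrap at 32 bits, so a no-signed-wrap claim holds there, but the same add at 8 bits wraps (the exact sum 200 does not fit, and `wrap_signed(200, 8)` is `-56`). A flag that was valid at the original width would therefore be a false statement after narrowing.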
2023-12-20 06:51:39 -08:00