intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-01-26 03:56:16 +08:00

Author	SHA1	Message	Date
Alexander Yermolovich	ad4cead67c	[BOLT][DWARF][NFC] Initialize CloneUnitCtxMap with current partition size (#75876 ) We would always allocate the maximum amount for the vector containing DWARFUnitInfo. In real use cases what ends up happening is we allocate a giant vector when processing one CU, or for the thin-lto case multiple CUs. This led to a lot of memory overhead, and a 2x BOLT processing slowdown for at least one service built with monolithic DWARF. For binaries built with LTO with clang, all CUs that have cross references will share an abbrev table and will be processed in one batch. The rest of the CUs are processed in batches of --cu-processing-batch-size, which defaults to 1. For theoretical cases where cross-CU references are present but the CUs do not share an abbrev table, the size of CloneUnitCtxMap will increase as each CU is processed.	2023-12-20 16:12:52 -08:00
Valentin Clement	553748356c	Revert "[mlir][openacc] Add device_type support for compute operations (#75864 )" This reverts commit `8b885eb90f`.	2023-12-20 16:08:10 -08:00
Valentin Clement	e98082d90a	Revert "[flang][openacc] Remove unused waitdevnum" This reverts commit `8fdc3b98b8`.	2023-12-20 16:07:57 -08:00
NAKAMURA Takumi	7c9c807fa4	[Bazel] Update llvm/Config, fixup for `476812a742`	2023-12-21 08:52:43 +09:00
Vitaly Buka	3dca63a32f	[symbolizer] Don't treat symbolizer API as optional (#76103 ) There is an assumption that we don't need to mix sanitizer with symbolizer from a different LLVM revision. If so, we can detect it by `__sanitizer_symbolize_code` and assume that the rest is present.	2023-12-20 15:38:43 -08:00
Maksim Levental	acaff70841	[mlir][python] move transform extras (#76102 )	2023-12-20 17:29:11 -06:00
Joseph Huber	f324584ae3	[Libomptarget][NFCI] Remove caching of created ELF files (#76080 ) Summary: We currently keep a cache of created ELF files from the relevant images. This shouldn't be necessary as the entire ELF interface is generally trivially constructable and extremely cheap. The cost of constructing one of these objects is simply a size check and writing a pointer to the underlying data. Given that, keeping a cache of these images should not be necessary overall.	2023-12-20 17:13:41 -06:00
michaelrj-google	b37c0486b2	[libc][NFC] clean up printf_core and scanf_core (#74535 ) Add LIBC_INLINE annotations to functions and fix variable cases within printf_core and scanf_core.	2023-12-20 15:12:54 -08:00
Jonas Paulsson	f94adfd50c	[docs] Reword the alignment implications for atomic instructions. (#75871 ) Atomic instructions (load / store / atomicrmw / cmpxchg) are not really undefined behavior if they lack natural alignment. They will (with AtomicExpand pass enabled) be converted into libcalls. Update the language reference to reflect this.	2023-12-21 00:08:41 +01:00
Shilei Tian	7e4c6f6cb2	[OpenMP] Reduce the size of heap memory required by the test `malloc_parallel.c` (#75885 ) This patch reduces the size of heap memory required by the test `malloc_parallel.c` and `malloc.c`. The original size is too large such that `malloc` returns `nullptr` on many threads, causing illegal memory access.	2023-12-20 15:03:01 -08:00
Ivan R. Ivanov	39f09ec245	Invalidate analyses after running Attributor in OpenMPOpt (#74908 ) Using the LoopInfo from OMPInfoCache after the Attributor ran resulted in a crash due to it being in an invalid state. --------- Co-authored-by: Ivan Radanov Ivanov <ivanov2@llnl.gov>	2023-12-20 15:01:21 -08:00
Ethan Luis McDonough	3c10e5b2f6	[OpenMP] Add unit tests for nextgen plugins (#74398 ) This patch adds three GTest unit tests that test plugin read and write operations. Tests can be compiled with `ninja -C runtimes/runtimes-bins LibomptUnitTests`.	2023-12-20 14:58:56 -08:00
Cyndy Ishida	c6f29dbb59	[readtapi] Setup simple stubify support (#76075 ) Stubify broadly takes either tbd files or binary dylibs and turns them into tbd files. In future patches, stubify will also allow additional information to be embedded into the final TBD output too. Add Util APIs to TextAPI for common operations used by readtapi for now.	2023-12-20 14:56:53 -08:00
Mikhail Gudim	8773c9be3d	[InstCombine] Extend `foldICmpBinOp` to `add`-like `or`. (#71396 ) InstCombine canonicalizes `add` to `or` when possible, but this makes some optimizations applicable to `add` to be missed because they don't realize that the `or` is equivalent to `add`. In this patch we generalize `foldICmpBinOp` to handle such cases.	2023-12-20 17:28:57 -05:00
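The equivalence this commit relies on can be checked with a minimal sketch, assuming the usual condition that an `or` is add-like when its operands have no set bits in common. The helper names below are hypothetical and are not taken from InstCombine.

```python
def is_add_like_or(x: int, y: int) -> bool:
    """Return True when x and y share no set bits, so no carries can occur."""
    return (x & y) == 0


def check_equivalence(bits: int = 6) -> bool:
    """Exhaustively confirm x | y == x + y whenever x & y == 0."""
    limit = 1 << bits
    for x in range(limit):
        for y in range(limit):
            if is_add_like_or(x, y) and (x | y) != (x + y):
                return False
    return True


if __name__ == "__main__":
    print(check_equivalence())
```

Because the two operations agree on such operands, a fold written for `add` can in principle be applied to the canonicalized `or` as well, which is the generalization the message describes.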
Peiming Liu	cf4dd91165	[mlir][sparse] initialize slice-driven loop-related fields in one place (#76099 )	2023-12-20 14:20:57 -08:00
Valentin Clement	8fdc3b98b8	[flang][openacc] Remove unused waitdevnum	2023-12-20 14:01:51 -08:00
Valentin Clement (バレンタインクレメン)	8b885eb90f	[mlir][openacc] Add device_type support for compute operations (#75864 ) This patch adds representation for `device_type` clause information on compute construct (parallel, kernels, serial). The `device_type` clause on compute construct impacts clauses that appear after it. The values impacted by `device_type` are now tied with an attribute array that represents the device_type associated with them. `DeviceType::None` is used to represent the value produced by a clause before any `device_type`. The operands and the attribute information are parsed/printed together. This is an example with `vector_length` clause. The first value (64) is not impacted by `device_type` so it will be represented with DeviceType::None. None is not printed. The second value (128) is tied with the `device_type(multicore)` clause. ``` !$acc parallel vector_length(64) device_type(multicore) vector_length(128) ``` ``` acc.parallel vector_length(%c64 : i32, %c128 : i32 [#acc.device_type<multicore>]) { } ``` When multiple values can be produced for a single clause like `num_gangs` and `wait`, an extra attribute describes the number of values belonging to each `device_type`. Values and attributes are parsed/printed together. ``` acc.parallel num_gangs({%c2 : i32, %c4 : i32}, {%c4 : i32} [#acc.device_type<nvidia>]) ``` While preparing this patch I noticed that the wait devnum is not part of the operations and is not lowered. It will be added in a follow up patch.	2023-12-20 13:45:47 -08:00
Krzysztof Parzyszek	7ffad37c86	[flang][OpenMP] Avoid captures of references to structured bindings Fixes build break caused by `400c32cbf9`.	2023-12-20 15:31:49 -06:00
Krzysztof Parzyszek	400c32cbf9	[flang][OpenMP] Use `llvm::enumerate` in few places, NFC (#76095 ) Use `llvm::enumerate` instead of iterating over a range and keeping a separate counter.	2023-12-20 15:09:37 -06:00
Justin Bogner	1f3d70a95a	[Transforms][DXIL] Basic debug output in dxil-upgrade. NFC	2023-12-20 14:06:42 -07:00
Stella Laurenzo	bbc2976868	[mlir][python] Make the Context/Operation capsule creation methods work as documented. (#76010 ) This fixes a longstanding bug in the `Context._CAPICreate` method whereby it was not taking ownership of the PyMlirContext wrapper when casting to a Python object. The result was minimally that all such contexts transferred in that way would leak. In addition, counter to the documentation for the `_CAPICreate` helper (see `mlir-c/Bindings/Python/Interop.h`) and the `forContext` / `forOperation` methods, we were silently upgrading any unknown context/operation pointer to steal-ownership semantics. This is dangerous and was causing some subtle bugs downstream where this facility is getting the most use. This patch corrects the semantics and will only do an ownership transfer for `_CAPICreate`, and it will further require that it is an ownership transfer (if already transferred, it was just silently succeeding). Removing the mis-aligned behavior made it clear where the downstream was doing the wrong thing. It also adds some `_testing_` functions to create unowned context and operation capsules so that this can be fully tested upstream, reworking the tests to verify the behavior. In some torture testing downstream, I was not able to trigger any memory corruption with the newly enforced semantics. When getting it wrong, a regular exception is raised.	2023-12-20 12:18:58 -08:00
Alex Beloi	d84c640143	[mlir] Remove "Syntax:" parser where it's already provided by `assemblyFormat` (#76002 ) See #73359 Types using `assemblyFormat` to define parsing don't need an additional handwritten parser. So we should remove the handwritten parsers where one provided by an `assemblyFormat` already exists to avoid confusion and de-syncing.	2023-12-20 14:58:51 -05:00
Slava Zakharin	b4b23ff7f8	[flang][runtime] Enable more APIs in the offload build. (#75996 ) This patch enables more numeric (mod, sum, matmul, etc.) APIs, and some others. I added new macros to disable warnings about using C++ STD methods like operators of std::complex, which do not have __device__ attribute. This may probably result in unresolved references, if the header files implementation relies on libstdc++. I will need to follow up on this.	2023-12-20 11:52:51 -08:00
Abhina Sree	892862246e	[SystemZ][z/OS] define HOST_NAME_MAX for z/OS (#76093 ) This applies the same change made in google benchmark to define HOST_NAME_MAX for z/OS `7b52bf7346`	2023-12-20 14:29:24 -05:00
Sam Clegg	4e8cb01b01	[WebAssembly] Add symbol information for shared libraries (#75238 ) The current (experimental) spec for WebAssembly shared libraries does not include a full symbol table like the object format. This change extracts symbol information from the normal wasm exports. This is the first step in having the linker report undefined symbols when linking with shared libraries. The current behaviour is to ignore all undefined symbols when linking with `-pie` or `-shared`. See https://github.com/emscripten-core/emscripten/issues/18198	2023-12-20 11:13:09 -08:00
Dimitry Andric	2c27013fa9	[clang] Add getClangVendor() and use it in CodeGenModule.cpp (#75935 ) In `9a38a72f1d` `ProductId` was assigned from the stringified value of `CLANG_VENDOR`, if that macro was defined. However, `CLANG_VENDOR` is supposed to be a string, as it is defined (optionally) as such in the top-level clang `CMakeLists.txt`. Furthermore, `CLANG_VENDOR` is only passed as a build-time define when compiling `Version.cpp`, so add a `getClangVendor()` function to `Version.h`, and use it in `CodeGenModule.cpp`, instead of relying on the macro. Fixes: `9a38a72f1d`	2023-12-20 20:09:39 +01:00
Dimitry Andric	5c1a41f8ad	Revert "[clang] Add getClangVendor() and use it in CodeGenModule.cpp (#75935 )" This reverts commit `9055519103`, due to an incorrectly chosen commit message.	2023-12-20 20:07:22 +01:00
Dimitry Andric	9055519103	[clang] Add getClangVendor() and use it in CodeGenModule.cpp (#75935 ) In `9a38a72f1d` `ProductId` was assigned from the stringified value of `CLANG_VENDOR`, if that macro was defined. However, `CLANG_VENDOR` is supposed to be a string, as it is defined (optionally) as such in the top-level clang `CMakeLists.txt`. Move the addition of `-DCLANG_VENDOR` to the compiler flags from `clang/lib/Basic/CMakeLists.txt` to the top-level `CMakeLists.txt`, so it is consistent across the whole clang codebase. Then remove the stringification from `CodeGenModule.cpp`, to make it work correctly. Fixes: `9a38a72f1d`	2023-12-20 20:03:19 +01:00
Krzysztof Parzyszek	8b231d73bd	[mlir] Fix build break with shared libraries When project components are built as separate shared libraries, a lot of errors appear about undefined symbols, e.g. ``` /usr/bin/ld: CMakeFiles/obj.MLIRGPUPipelines.dir/GPUToNVVMPipeline.cpp.o: in function `(anonymous namespace)::buildCommonPassPipeline(mlir::OpPassManager&, (anonymous namespace)::GPUToNVVMPipelineOptions const&)': GPUToNVVMPipeline.cpp:(.text._ZN12_GLOBAL__N_123buildCommonPassPipelineERN4mlir13OpPassManagerERKNS_24GPUToNVVMPipelineOptionsE+0xa5): undefined reference to `mlir::createConvertLinalgToLoopsPass()' ``` Add the necessary dependencies to Dialect/GPU/Pipelines/CMakeLists.txt	2023-12-20 12:49:25 -06:00
Han-Chung Wang	b33a131c82	[mlir][arith] Add support for expanding arith.maxnumf/minnumf ops. (#75989 ) The maxnum/minnum semantics can be found at https://llvm.org/docs/LangRef.html#llvm-minnum-intrinsic. The revision also updates function names in lit tests to match op name. Take arith.maxnumf as example: ``` func.func @maxnumf(%lhs: f32, %rhs: f32) -> f32 { %result = arith.maxnumf %lhs, %rhs : f32 return %result : f32 } ``` will be expanded to ``` func.func @maxnumf(%lhs: f32, %rhs: f32) -> f32 { %0 = arith.cmpf ugt, %lhs, %rhs : f32 %1 = arith.select %0, %lhs, %rhs : f32 %2 = arith.cmpf uno, %lhs, %lhs : f32 %3 = arith.select %2, %rhs, %1 : f32 return %3 : f32 } ``` Case 1: Both LHS and RHS are not NaN; LHS > RHS In this case, `%1` is LHS. `%3` and `%1` have the same value, so `%3` is LHS. Case 2: LHS is NaN and RHS is not NaN In this case, `%2` is true, so `%3` is always RHS. Case 3: LHS is not NaN and RHS is NaN In this case, `%0` is true and `%1` is LHS. `%2` is false, so `%3` and `%1` have the same value, which is LHS. Case 4: Both LHS and RHS are NaN: `%1` and RHS are all NaN, so the result is still NaN.	2023-12-20 10:35:12 -08:00
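The four cases listed in this commit message can be replayed with a minimal sketch that models the expanded ops on Python floats. The function names are hypothetical stand-ins for `arith.cmpf ugt`, `arith.cmpf uno`, and `arith.select`; this is a simulation of the semantics described in the message, not the MLIR pass itself.

```python
import math


def cmpf_ugt(a: float, b: float) -> bool:
    """Unordered-or-greater-than: true if either input is NaN or a > b."""
    return math.isnan(a) or math.isnan(b) or a > b


def cmpf_uno(a: float, b: float) -> bool:
    """Unordered: true if either input is NaN."""
    return math.isnan(a) or math.isnan(b)


def expanded_maxnumf(lhs: float, rhs: float) -> float:
    """Mirror the expanded sequence from the commit message step by step."""
    v0 = cmpf_ugt(lhs, rhs)      # %0 = arith.cmpf ugt, %lhs, %rhs
    v1 = lhs if v0 else rhs      # %1 = arith.select %0, %lhs, %rhs
    v2 = cmpf_uno(lhs, lhs)      # %2 = arith.cmpf uno, %lhs, %lhs
    v3 = rhs if v2 else v1       # %3 = arith.select %2, %rhs, %1
    return v3


if __name__ == "__main__":
    nan = float("nan")
    for lhs, rhs in [(3.0, 2.0), (nan, 2.0), (3.0, nan), (nan, nan)]:
        print(lhs, rhs, expanded_maxnumf(lhs, rhs))
```

The second compare uses `%lhs` for both operands, so it acts purely as an "is LHS NaN" test; that is what lets the final select fall back to RHS in Case 2.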
Shoaib Meenai	e7bd673681	[runtimes] Fix test dependencies compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp needs llvm-lto and opt.	2023-12-20 10:19:06 -08:00
Craig Topper	b03f0c596a	[RISCV] Add sifive-p450 CPU. (#75760 ) This is an out of order core with no vector unit. More information: https://www.sifive.com/cores/performance-p450-470 Scheduler model and other tuning will come in separate patches.	2023-12-20 09:52:02 -08:00
Schrodinger ZHU Yifan	7a87ff64e1	[libc] suppress stdlib explicitly for crt1.a (#76079 ) [nd: updated oneline]	2023-12-20 09:42:35 -08:00
Florian Hahn	18170d0f28	[ConstraintElim] Extend AND implication logic to support OR as well. (#76044 ) Extend the logic check if an operand of an AND is implied by the other to also support OR. This is done by checking if !op1 implies op2 or vice versa.	2023-12-20 18:13:41 +01:00
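The logical idea behind this commit can be illustrated with a minimal sketch over a small integer domain. It only shows the implications themselves and is not ConstraintElimination's implementation; the helper names and the example predicates are hypothetical.

```python
def implied_for_all(premise, conclusion, domain) -> bool:
    """True if conclusion(x) holds for every x in domain where premise(x) holds."""
    return all(conclusion(x) for x in domain if premise(x))


def and_operand_implied(op1, op2, domain) -> bool:
    """AND case: op1 implies op2, so `op1 and op2` reduces to op1."""
    return implied_for_all(op1, op2, domain)


def or_operand_implied(op1, op2, domain) -> bool:
    """OR case: (not op1) implies op2, so `op1 or op2` is always true."""
    return implied_for_all(lambda x: not op1(x), op2, domain)


if __name__ == "__main__":
    domain = range(-20, 21)
    # AND: x > 10 implies x > 5.
    print(and_operand_implied(lambda x: x > 10, lambda x: x > 5, domain))
    # OR: not (x > 5) implies x <= 10.
    print(or_operand_implied(lambda x: x > 5, lambda x: x <= 10, domain))
```

The OR case reuses the AND machinery by negating one operand first, which matches the "checking if !op1 implies op2 or vice versa" wording in the message.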
LLVM GN Syncbot	2c257cf872	[gn build] Port `5ea15fab19`	2023-12-20 16:47:21 +00:00
Cyndy Ishida	5ea15fab19	[TextAPI] Add support to convert RecordSlices -> InterfaceFile (#75007 ) Introduce RecordVisitor. This is used for different clients that want to extract information out of RecordSlice types. The first and immediate use case is for serializing symbol information into TBD files.	2023-12-20 08:47:10 -08:00
Schrodinger ZHU Yifan	8bbeed05c4	[libc] [startup] add cmake function to merge separated crt1 objects (#75413 ) As part of startup refactoring, this patch adds a function to merge multiple objects into a single relocatable object: `cc -r obj1.o obj2.o -o obj.o`. A relocatable object is an object file that is not fully linked into an executable or a shared library. It is an intermediate file format that can be passed into the linker. A crt object can have arch-specific code and arch-agnostic code. To reduce code cohesion, the implementation is split into multiple units. As a result, we need to merge them into a single relocatable object.	2023-12-20 08:18:51 -08:00
Joseph Huber	e4f4022b70	[Libomptarget][NFC] Fix linting warnings in the plugins Summary: Fix some linting warnings present in the plugins.	2023-12-20 10:07:34 -06:00
Florian Hahn	b1a5ee1feb	[ARM] Check all terms in emitPopInst when clearing Restored for LR. (#75527 ) emitPopInst checks a single function exit MBB. If other paths also exit the function and any of their terminators uses LR implicitly, it is not safe to clear the Restored bit. Check all terminators for the function before clearing Restored. This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll where the machine-outliner previously introduced BLs that clobbered LR which in turn is used by the tail call return. Alternative to #73553	2023-12-20 16:56:15 +01:00
Lucas Duarte Prates	d43fc5a6ad	Reland: [AArch64] Assembly support for the Checked Pointer Arithmetic Extension (#73777 ) This introduces assembly support for the Checked Pointer Arithmetic Extension (FEAT_CPA), announced as part of the Armv9.5-A architecture version. The changes include: * New subtarget feature for FEAT_CPA * New scalar instruction for pointer arithmetic * ADDPT, SUBPT, MADDPT, and MSUBPT * New SVE instructions for pointer arithmetic * ADDPT (vectors, predicated), ADDPT (vectors, unpredicated) * SUBPT (vectors, predicated), SUBPT (vectors, unpredicated) * MADPT and MLAPT * New ID_AA64ISAR3_EL1 system register More details about the extension can be found at: * https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-developments-2023 * https://developer.arm.com/documentation/ddi0602/2023-09/ Co-authored-by: Rodolfo Wottrich <rodolfo.wottrich@arm.com>	2023-12-20 15:43:17 +00:00
Zequan Wu	688fa35df0	[Profile] Dump binary id to raw profiles on Windows. (#75618 ) #74652 adds `__buildid` symbol which allows us to dump it at runtime.	2023-12-20 10:41:36 -05:00
Paul C Fuqua	11141bc68a	Fix what seems to be a silly bug in gpu.set_default_device rewriting. Smoke test included. (#75756 )	2023-12-20 09:35:42 -06:00
LLVM GN Syncbot	300adbee88	[gn build] Port `fdd089b500`	2023-12-20 15:23:16 +00:00
LLVM GN Syncbot	d2330058df	[gn build] Port `3903438860`	2023-12-20 15:23:15 +00:00
Simon Pilgrim	6ec350b483	[X86] SimplifyDemandedVectorEltsForTargetShuffle - don't simplify constant mask if it has multiple uses Avoid generating extra constant vectors	2023-12-20 15:22:48 +00:00
Razvan Lupusoru	a711b042fd	[acc] Initial implementation of MemoryEffects on `acc` operations (#75970 ) The `acc` dialect operations now implement MemoryEffects interfaces in the following ways: - Data entry operations which may read host memory via `varPtr` are now marked as so. The majority of them do NOT actually read the host memory. For example, `acc.present` works on the basis of presence of pointer and not necessarily what the data points to - so they are not marked as reading the host memory. They still use `varPtr` though but this dependency is reflected through ssa. - Data clause operations which may mutate the data pointed to by `accPtr` are marked as doing so. - Data clause operations which update required structured or dynamic runtime counters are marked as reading and writing the newly defined `RuntimeCounters` resource. Some operations, like `acc.getdeviceptr` do not actually use the runtime counters - but are marked as reading them since the address obtained depends on the mapping operations which do update the runtime counters. Namely, `acc.getdeviceptr` cannot be moved across other mapping operations. - Constructs are marked as writing to the `ConstructResource`. This may be too strict but is needed for the following reasons: 1) Structured constructs may not use `accPtr` and instead use `varPtr` - when this is the case, data actions may be removed even when used. 2) Unstructured constructs are currently used to aggregate multiple data actions. We do not want such constructs removed or moved for now. - Terminators are marked as `Pure` as in other dialects. The current approach has the following limitations which may require further improvements: - Subsequent `acc.copyin` operations on same data do not actually read host memory pointed to by `varPtr` but are still marked as so. - Two `acc.delete` operations on same data may not mutate `accPtr` until the runtime counters are zero (but are still marked as mutating). - The `varPtrPtr` argument, when present, points to the address of location of `varPtr`. When mapping to target device, an `accPtrPtr` needs to be computed and this memory is mutated. This effect is not captured since the current operations do not produce `accPtrPtr`. - Runtime counter effects are imprecise since two operations with differing `varPtr` increment/decrement different counters. Additionally, operations with `varPtrPtr` mutate attachment counters. - The `ConstructResource` is too strict and likely can be relaxed with better modeling.	2023-12-20 07:11:19 -08:00
Christian Sigg	476812a742	[bazel] Update config.h.cmake after `e86a02ce89`.	2023-12-20 16:07:46 +01:00
Nikita Popov	8b8f2ef06e	[MergeFunc] Fix comparison of constant expressions Functions using different constant expressions were incorrectly merged, because a lot of state was missing from the comparison, including the opcode, the comparison predicate, the GEP element type, as well as the inbounds, inrange and nowrap poison flags.	2023-12-20 15:59:02 +01:00
Nico Weber	6cd296ed85	[gn] port `e86a02ce89` (dladdr -> llvm-config.h) Also set HAVE_DLADDR to 1 on non-Win instead of just on macOS. That looked like an oversight.	2023-12-20 09:57:37 -05:00
Alexey Bataev	a13148a880	[SLP]Fix PR75995: drop wrapping flags for resized wrapped binops. If decided to resize the instruction, need to drop wrapping flags from the resulting vector instructions to avoid incorrect optimizations/assumptions later. Fixes PR75995.	2023-12-20 06:51:39 -08:00
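Why a no-wrap flag cannot simply be carried over to a resized operation can be shown with a minimal arithmetic sketch. It assumes that "resize" here means computing the same operation in a narrower integer type; the helper names and the 32-bit to 8-bit example are hypothetical and do not come from the SLP vectorizer.

```python
def wraps_signed(a: int, b: int, bits: int) -> bool:
    """True if a + b falls outside the signed range of a `bits`-wide integer."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return not (lo <= a + b <= hi)


def trunc_signed(value: int, bits: int) -> int:
    """Keep the low `bits` bits of value and reinterpret them as signed."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


if __name__ == "__main__":
    a, b = 100, 100
    # In 32 bits the add does not wrap, so a no-signed-wrap claim is valid.
    print(wraps_signed(a, b, 32))
    # In 8 bits the same add wraps, so the claim would be false there.
    print(wraps_signed(trunc_signed(a, 8), trunc_signed(b, 8), 8))
    # The low 8 bits still match the truncated wide result.
    print(trunc_signed(a + b, 8))
```

The narrow add still produces the right low bits, but only as a wrapping add; keeping the flag would let later passes assume an overflow-free result that does not hold.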

484489 Commits