Currently, -gsplit-dwarf and -mrelax are incompatible options in Clang.
The issue is that .dwo files should not contain any relocations, as they
are not processed by the linker. However, relaxable code emits
relocations in DWARF for debug ranges that reside in the .dwo file when
DWARF fission is enabled.
This patch makes DWARF fission compatible with RISC-V relaxations. It
uses the StartxEndx DWARF forms in .debug_rnglists.dwo, which allow
referencing addresses from .debug_addr instead of using absolute
addresses. This approach eliminates relocations from .dwo files.
Some globals (e.g., fir.global) have initialization regions that may
transitively reference other globals or type descriptors. Add
getInitRegion() to GlobalVariableOpInterface to retrieve these regions,
returning Region* (nullptr if the global uses attributes for
initialization, as with memref.global).
It is possible that a fork could occur while MutexTSDs is being held and
then cause a deadlock in a forked process when something attempts to
lock it again. Instead add it to the enable/disable list of mutexes.
Flags are now passed on construction/cloning. Remove unnecessary
transferFlags call, and make code independent of VPRecipeWithIRFlags, to
support additional recipes in the future.
When individual elements of a vector are updated via vector swizzle, it needs to be handled as separate store operations to the individual vector elements.
Clang treats vectors as one unit, so if a part of a vector needs to be updated, the whole vector is loaded, some elements modified, and then the whole vector is stored.
In HLSL vector elements are handled separately. We need to avoid this load/modify/store sequence to prevent overwriting other vector elements that might be getting updated in parallel.
Fixes#152815
This is noted by the specification, and should save a dynamic
instruction.
Code size should be no worse than before, as the pairs of moves can
usually be turned into two 16-bit moves, but `fmv.d` is always a 32-bit
instruction.
LLVM can look through a `FSGNJ_D_IN32X`, in
`RISCVInstrInfo::isCopyInstrImpl` which helps copy propagation.
Currently, the declarative pattern rewrite generator will always print
the [source]:[line](s) from which a pattern came. This is a useful
debugging hint, but it causes problem when absolute paths are used as
arguments to mlir-tblgen (which LLVM's build rules automatically do).
Specifially, it causes the source to be tied to the build location,
harning reproducability and our collective ability to get ccache hits
from, say, separate worktrees.
This commit resolves the issue by replacing absolute paths in thes
"Generated from:" comments with their filenames. (The alternative would
have been to implement an entire file-prefix-map the way the C compilers
do, but since this is an isolated incident, I chose to resolve it
locally.)
/usr/bin/ld: tools/mlir/lib/Dialect/GPU/Pipelines/CMakeFiles/obj.MLIRGP
UPipelines.dir/GPUToXeVMPipeline.cpp.o: in function `mlir::gpu::buildLo
werToXeVMPassPipeline(mlir::OpPassManager&, mlir::gpu::GPUToXeVMPipelin
eOptions const&)':
GPUToXeVMPipeline.cpp:(.text._ZN4mlir3gpu28buildLowerToXeVMPassPipeline
ERNS_13OpPassManagerERKNS0_24GPUToXeVMPipelineOptionsE+0x1293): undefin
ed reference to `mlir::createConvertVectorToLLVMPass()'
On targets that require even aligned 64-bit VGPRs, GWS operands
require even alignment of a 32-bit operand. Previously we had a hacky
post-processing which added an implicit operand to try to manage
the constraint. This would require special casing in other passes
to avoid breaking the operand constraint. This moves the handling
into the instruction definition, so other passes no longer need
to consider this edge case. MC still does need to special case this,
to print/parse as a 32-bit register. This also still ends up net
less work than introducing even aligned 32-bit register classes.
This also should be applied to the image special case.
I noticed that we had a hardcoded value of 4 for the pcrel section
relocations, which seems like an issue given that we recently added
support for 1-byte branch relocations in
https://github.com/llvm/llvm-project/pull/164439. The code included an
assert that the relevant relocation had the BYTE4 attribute, but that is
actually not enough to use a hardcoded value of 4: we need to assert
that the *other* `BYTE<n>` attributes are not set either.
However, since we did not support local branch relocations, that doesn't
seem to have mattered in practice. That said, local branch relocations
can be emitted by compilers, and ld64 does handle the 4-byte version of
them, so I've added support for it here.
ld64 actually seems to reject 1-byte section relocations, so the
questionable code is actually probably fine (minus the incorrect
assert). So we have two options: add an equivalent check in LLD, or just
support 1-byte local branch relocations. Supporting it actually requires
less code, so I've gone with that option here.
This patch adds parsing, semantic handling, and diagnostics for the
`OpenMP 6.1 fb_nullify` and` fb_preserve` fallback modifiers used with
the `need_device_ptr` map modifier.
…hese
We've finished all of the clauses/etc that we're going to use this
visitor for, so we can remove the SourceLocation we used just for that,
and replace all NYI with unreachables.
This hasn't been strictly necessary since c897c13dde.
Practically this makes little difference; we still enable IPRA
by default which implies this option. By removing this explicit
force, -enable-ipra=0 has the expected change in the pass pipeline
to remove the DummyCGSCC runs.
This PR enhances the CFG builder to properly handle function parameters
in lifetime analysis:
1. Added code to include parameters in the initial scope during CFG
construction for both `FunctionDecl` and `BlockDecl` types
2. Added a special case to skip reference parameters, as they don't need
automatic destruction
3. Fixed several test cases that were previously marked as "FIXME" due
to missing parameter lifetime tracking
Previously, Clang's lifetime analysis was not properly tracking the
lifetime of function parameters, causing it to miss important
use-after-return bugs when parameter values were returned by reference
or address. This change ensures that parameters are properly tracked in
the CFG, allowing the analyzer to correctly identify when stack memory
associated with parameters is returned.
Fixes https://github.com/llvm/llvm-project/issues/169014
`convert-vector-to-llvm` pass applies a set of vector transformation
patterns that are not included in the standard `convert-to-llvm` pass
interface. These additional transformations are required to properly
lower MLIR vector operations. Since not all vector ops have direct
`llvm` dialect lowering, many of them must first be progressively
rewritten into simpler or more canonical vector ops, which are then
lowered to `llvm`. Therefore, running `convert-vector-to-llvm` is
necessary to ensure a complete and correct lowering of vector operations
to the `llvm` dialect.
This patch refactors the OpenACC dialect to attach recipe symbols
directly to data operations (acc.private, acc.firstprivate,
acc.reduction)
rather than to compute constructs (acc.parallel, acc.serial, acc.loop).
Motivation:
The previous design required compute constructs to carry both the recipe
symbol and the variable reference, leading to complexity. Additionally,
recipes were required even when they could be generated automatically
through MappableType interfaces.
Changes:
- Data operations (acc.private, acc.firstprivate, acc.reduction) now
require a 'recipe' attribute referencing their respective recipe
operations
- Verifier enforces recipe attribute presence for non-MappableType
operands; MappableType operands can generate recipes on demand
- Compute constructs (acc.parallel, acc.serial, acc.loop) no longer
carry recipe symbols in their operands
- Updated flang lowering to attach recipes to data operations instead
of passing them to compute constructs
Format Migration:
Old format:
```
acc.parallel private(@recipe -> %var : !fir.ref<i32>) { ... }
```
New format:
```
%private = acc.private varPtr(%var : !fir.ref<i32>)
recipe(@recipe) -> !fir.ref<i32>
acc.parallel private(%private : !fir.ref<i32>) { ... }
```
Test Updates:
- Updated all CIR and Flang OpenACC tests to new format
- Fixed CHECK lines to verify recipe attributes on data operations
In #157388, we turned `(fmul (fneg X), Y)` into `(fneg (fmul X, Y))`.
However, we forgot to propagate SDNode flags, specifically fast math
flags, from the original FMUL to the new one. This hinders some of the
subsequent (FMA) DAG combiner patterns that relied on the contraction
flag and as a consequence, missed some of the opportunities to generate
negation FMA instructions like `fnmadd`.
This patch fixes this issue by propagating the flags.
---------
Co-authored-by: Craig Topper <craig.topper@sifive.com>
`canEvaluateTruncated` and `canEvaluateSExtd` previously rejected
multi-use values to avoid duplication. This was overly conservative, if
all users of a multi-use value are part of the transform, we can
evaluate it in a different type without duplication.
This change tracks visited values and defers decisions on multi-use
values until we verify all their users were visited.
`EvaluateInDifferentType` now memoizes multi-use values to avoid
creating duplicates.
Applied to truncation and sext. Zext unchanged due to its dual-return
nature.
The consumer of zlib in third-party/BUILD.bazel expects zlib-ng from the
BCR, if you still load this version from your WORKSPACE / MODULE.bazel
you need to use this name instead.
The 'link' clause is like the rest of the global clauses (copyin,
create, device_resident), except it only has an entry op(thus no
dtor).
This patch also removes a bunch of now stales TODOs from the tests.
Previously libcall lowering decisions were made directly
in the TargetLowering constructor. Pull these into the subtarget
to facilitate turning LibcallLoweringInfo into a separate analysis
in the future.