Previously, we were inserting za.enable/disable intrinsics for functions
with the "arm_za" attribute (at the MLIR level), rather than using the
backend attributes. This was done to avoid a dependency on the SME ABI
functions from compiler-rt (which have only recently been implemented).
Doing things this way did have correctness issues, for example, calling
a streaming-mode function from another streaming-mode function (both
with ZA enabled) would lead to ZA being disabled after returning to the
caller (where it should still be enabled). Fixing issues like this would
require re-doing the ABI work already done in the backend within MLIR.
Instead, this patch switches to use the "arm_new_za" (backend) attribute
for enabling ZA for an MLIR function. For the integration tests, this
requires some way of linking the SME ABI functions. This is done via the
`%arm_sme_abi_shlib` lit substitution. By default, this expands to a
stub implementation of the SME ABI functions, but this can be overridden
by providing the `ARM_SME_ABI_ROUTINES_SHLIB` CMake cache variable
(pointing it at an alternative implementation). For now, the ArmSME
integration tests pass with just stubs, as we don't make use of nested
ZA-enabled calls.
A future patch may add an option to compiler-rt to build the SME
builtins into a standalone shared library to allow easily
building/testing with the actual implementation.
Currently, when deleting the device functions in the second stage of filtering during MLIR to LLVM translation we can end up with invalid calls to these functions. This is because of the removal of the EarlyOutliningPass which would have otherwise gotten rid of any such calls.
This patch aims to alter the function filtering pass in the following way:
- Any host function is completely removed.
- Call to the host function are also removed and their uses replaced with Undef values.
- Any host function with target region code is marked to be removed during the the second stage.
- Calls to such functions are still removed and their uses replaced with Undef values.
Co-authored-by: Sergio Afonso <sergio.afonsofumero@amd.com>
Switch to using immediate offsets instead of the SP register to access
objects on the current stack frame in chain functions. This means we no
longer need to reserve a SP register just for accesing stack objects and
it also allows us to set the SP (when one is actually needed) to the
stack size from the very beginning.
This only works if we use a FixedObject for the ScavengeFI, which is
what we do for entry functions anyway (and we generally want to keep
chain functions close to amdgpu_cs behaviour where we don't have a good
reason to diverge).
Refactor Pointer Authentication pass in preparation for adding more
PAUTH_* pseudo instructions:
* dropped early return from runOnMachineFunction() as other PAUTH_*
instructions need expansion even when pac-ret is disabled
* refactored runOnMachineFunction() to first collect all the
instructions of interest without modifying anything and then performing
changes in the later loops. There are two types of relevant
instructions: PAUTH_* pseudos that should definitely be replaced by this
pass and tail call instructions that may require attention if pac-ret is
enabled
* made the loop iterating over all of the instructions handle
instruction bundles by itself: even though this pass still does not
support bundled TCRETURN* instructions (such as produced by KCFI) it
does not crash anymore when no support is actually required
Other implementations of the symbol graph format use zero-based indices
for source locations, which causes problems when combined with clang's
current one-based indices. This commit sets ExtractAPI's symbol graph
output to use zero-based indices to align with other implementations.
rdar://107639783
`638a8393615e911b729d5662096f60ef49f1c65e` removed the `dsym`
condition for older compiler versions which caused the `dwarf`
variants tests to XPASS. This patch reverts to only XFAIL-ing
the `dsym` variant.
`15c80852028ff4020b3f85ee13ad3a2ed4bce3be` added
`test_shadowed_static_inline_members` which isn't supported
on older compiler versions.
For -fno-pic, if an extern variable is defined in a DSO, a copy
relocation will be needed. However, loongarch*-linux does not and will
not support copy relocations.
Change Driver to default to -fno-direct-access-external-data for
LoongArch && non-PIC.
Keep Frontend conditions unchanged (-fdirect-access-external-data ||
-fno-direct-access-external-data && PIC>0 => direct access).
Fix#71645
For `{{regex}}` we don't really need a capturing group, and only add it
to properly handle cases like `{{foo|bar}}`. This is problematic,
because the use of capturing groups makes our regex implementation
slower (we have to go through the "dissect" stage, which can have
quadratic complexity).
Unfortunately, our regex implementation does not support non-capturing
groups like `(?:regex)`. So instead, avoid adding the group entirely if
the regex doesn't contain any alternations.
This causes a slight difference in escaping behavior, where previously
it was possible to write `{{{{}}` and get the same behavior as
`{{\{\{}}`. This will no longer work. I don't think this is a problem,
especially as we recently taught update_analyze_test_checks.py to emit
`{{\{\{}}`, so this shouldn't get introduced in any new tests.
For CodeGen/X86/vector-interleaved-store-i16-stride-7.ll (our slowest
X86 test) this drops FileCheck time from 6s to 5s (the remainder is
spent in a different regex issue). I expect similar speedups in other
tests using a lot of `{{}}`.
Fix the issue by not reusing the zext at all. The code already handles
creation of new zexts if more than one is needed. Always use that
code-path instead of trying to reuse the old zext in some case.
(Alternatively we could also drop poison-generating flags on the old
zext, but it seems cleaner to not reuse it at all, especially if it's
not always possible anyway.)
Fixes https://github.com/llvm/llvm-project/issues/72046.
Should avoid issues on big-endian hosts.
Note that we use aligned types because primitive integers are also
aligned. If we don't use aligned types, `HSAILProperties` ends up being
11 bytes instead of 12 (1 byte padding at the end of the struct added by
the compiler).
Technically only the first type needs to be aligned, but I just used
aligned types everywhere to be consistent.
Fixes#65280
Function and variable names are not detected when there is a
__attribute__((x)) preceding the name.
Fixes#64137.
Differential Revision: https://reviews.llvm.org/D156370
Add helper that creates a lambda similar to typeIs, but containing the
predicate check for hasStdExtF and hasStdD. We can use this with legalIf
and all to reduce the number of manual lambdas we need to write.
Use MCAsmBackend::writeNopData() interface to emit NOP instructions on
x86. There are multiple forms of NOP instruction on x86 with different
sizes. Currently, LLVM's assembly/disassembly does not support all forms
correctly which can lead to a breakage of input code semantics, e.g. if
the program relies on NOP instructions for reserving a patch space.
Add "--keep-nops" option to preserve NOP instructions.
When this option gets enabled, descriptions of stack frames will be
generated using the format provided in the launch configuration instead
of simply calling `SBFrame::GetDisplayFunctionName`. This allows
lldb-dap to show an output similar to the one in the CLI.
When a BLOCK construct is within an ASSOCIATE or related construct,
don't misinterpret an assignment to an array element of a construct
entity as being an impermissible definition of a local statement
function.
It is possible to declare the rank of an object after that object has
been used in the same specification part in a specification function
reference whose result or generic resolution may well have depended on
the object being apparently a scalar.
Catch this case, and emit a warning -- not an error, yet, due to fear of
false positives.
See the new test for examples.