This broke modff calls on 32-bit x86 Windows. See comment on the PR.
> This updates the existing modf[f|l] builtin to be lowered via the
> llvm.modf.* intrinsic (rather than directly to a library call).
>
> The legalization issues exposed by the original PR (#126750) should have
> been fixed in #128055 and #129264.
This reverts commit cd1d9a8fab.
This updates the DXV and Metal Converter actions to properly use
temporary files created by the driver. I've abstracted away a check to
determine if an action is the last in the sequence because we may have
between 1 and 3 actions depending on the arguments and environment.
FPSCR and FPEXC will be stored in FPStatusRegs, after GPRCS2 has been
saved.
- GPRCS1
- GPRCS2
- FPStatusRegs (new)
- DPRCS
- GPRCS3
- DPRCS2
FPSCR is present on all targets with a VFP, but the FPEXC register is
not present on Cortex-M devices, so different amounts of bytes are
being pushed onto the stack depending on our target, which would
affect alignment for subsequent saves.
DPRCS1 will sum up all previous bytes that were saved, and will emit
extra instructions to ensure that its alignment is correct. My
assumption is that if DPRCS1 is able to correct its alignment to be
correct, then all subsequent saves will also have correct alignment.
Avoid annotating the saving of FPSCR and FPEXC for functions marked
with the interrupt_save_fp attribute, even though this is done as part
of frame setup. Since these are status registers, there really is no
viable way of annotating this. Since these aren't GPRs or DPRs, they
can't be used with .save or .vsave directives. Instead, just record
that the intermediate registers r4 and r5 are saved to the stack
again.
Co-authored-by: Jake Vossen <jake@vossen.dev>
Co-authored-by: Alan Phipps <a-phipps@ti.com>
P2513R4 neither touched library wording nor required library
implementation to change. So it was probably a mistake to list it in
libc++'s implementation status table.
The 'bind' clause allows the renaming of a function during code
generation. There are a few rules about when this can/cannot happen,
and it takes either a string or identifier (previously mis-implemetned
as ID-expression) argument.
Note there are additional rules to this in the implicit-function routine
case, but that isn't implemented in this patch, as implicit-function
routine is not yet implemented either.
Current lowering for tosa.depthwise_conv2d assumes if both zero points
are zero then it's a floating-point operation by hardcoding the use of a
arith.addf in the lowered code. Fix code to check for the element type
to decide what add operation to use.
The SPIR-V backend is now a supported backend, and we believe it is
ready to be used by default in Clang over the SPIR-V translator.
Some IR generated by Clang today, such as those requiring SPIR-V target
address spaces, cannot be compiled by the translator for reasons in this
[RFC](https://discourse.llvm.org/t/rfc-the-spir-v-backend-should-change-its-address-space-mappings/82640),
so we expect even more programs to work as well.
Enable it by default, but keep some of the code as it is still called by
the HIP toolchain directly.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
- Separate check for hardware support of TLS register from check for support of Thumb2 encoding
- Base decision to auto enable TLS on both hardware support and Thumb2 encoding support
- Fix HW support check to correctly exclude M-Profile and include ARMV6K variants
reference:https://reviews.llvm.org/D114116
https://github.com/llvm/llvm-project/pull/129952 /
42d49a7724 added this test which is
failing on 32-bit ARM because the alignment chosen is 4 not 8. Which
would make sense if this is a 32/64 bit difference
https://lab.llvm.org/buildbot/#/builders/154/builds/13059
```
<stdin>:34:30: note: scanning from here
define dso_local void @_Z1fv(ptr dead_on_unwind noalias writable sret(%struct.B) align 4 %agg.result) #0 {
^
<stdin>:38:2: note: possible intended match here
%0 = load ptr, ptr @x, align 4
^
```
The other test does not check alignment, so I'm assuming that it is not
important here.
This change removes 189 static instances of no-op reg-reg moves (i.e.
where src == dest) across llvm-test-suite when compiled for RISC-V
rv64gc and with SPEC included.
The HAS_DEVICE_ADDR indicates that the object(s) listed exists at an
address that is a valid device address. Specifically,
`has_device_addr(x)` means that (in C/C++ terms) `&x` is a device
address.
When entering a target region, `x` does not need to be allocated on the
device, or have its contents copied over (in the absence of additional
mapping clauses). Passing its address verbatim to the region for use is
sufficient, and is the intended goal of the clause.
Some Fortran objects use descriptors in their in-memory representation.
If `x` had a descriptor, both the descriptor and the contents of `x`
would be located in the device memory. However, the descriptors are
managed by the compiler, and can be regenerated at various points as
needed. The address of the effective descriptor may change, hence it's
not safe to pass the address of the descriptor to the target region.
Instead, the descriptor itself is always copied, but for objects like
`x`, no further mapping takes place (as this keeps the storage pointer
in the descriptor unchanged).
---------
Co-authored-by: Sergio Afonso <safonsof@amd.com>
This PR addresses the bug of not throwing warnings for the following
code:
```c++
int test13(unsigned a, int *b) {
return a > ~(95 != *b); // expected-warning {{comparison of integers of different signs}}
}
```
However, in the original issue, a comment mentioned that negation,
pre-increment, and pre-decrement operators are also incorrect in this
case.
Fixes#18878
When checking the template template parameters of template template
parameters, the PartialOrdering context was not correctly propagated.
This also has a few drive-by fixes, such as checking the template
parameter lists of template template parameters, which was previously
missing and would have been it's own bug, but we need to fix it in order
to prevent crashes in error recovery in a simple way.
Fixes#130362
This caused incorrect -Wunguarded-availability warnings. See comment on
the pull request.
> We were missing a call to DiagnoseUseOfDecl when performing typename
> access.
>
> This refactors the code so that TypeDecl lookups funnel through a helper
> which performs all the necessary checks, removing some related
> duplication on the way.
>
> Fixes#58547
>
> Differential Revision: https://reviews.llvm.org/D136533
This reverts commit 4c4fd6b031.
The previous implementation wasn't maintaining a faithful IR
representation of how this really works. The value returned by
createEnqueuedBlockKernel wasn't actually used as a function, and
hacked up later to be a pointer to the runtime handle global
variable. In reality, the enqueued block is a struct where the first
field is a pointer to the kernel descriptor, not the kernel itself. We
were also relying on passing around a reference to a global using a
string attribute containing its name. It's better to base this on a
proper IR symbol reference during final emission.
This now avoids using a function attribute on kernels and avoids using
the additional "runtime-handle" attribute to populate the final
metadata. Instead, associate the runtime handle reference to the
kernel with the !associated global metadata. We can then get a final,
correctly mangled name at the end.
I couldn't figure out how to get rename-with-external-symbol behavior
using a combination of comdats and aliases, so leaves an IR pass to
externalize the runtime handles for codegen. If anything breaks, it's
most likely this, so leave avoiding this for a later step. Use a
special section name to enable this behavior. This also means it's
possible to declare enqueuable kernels in source without going through
the dedicated block syntax or other dedicated compiler support.
We could move towards initializing the runtime handle in the
compiler/linker. I have a working patch where the linker sets up the
first field of the handle, avoiding the need to export the block
kernel symbol for the runtime. We would need new relocations to get
the private and group sizes, but that would avoid the runtime's
special case handling that requires the device_enqueue_symbol metadata
field.
https://reviews.llvm.org/D141700
This clears up the printing of a NestedNameSpecifier so a trailing '::'
is not printed, unless it refers into the global scope.
This fixes a bunch of diagnostics where the trailing :: was awkward.
This also prints the NNS quoted consistenty.
There is a drive-by improvement to error recovery, where now we print
the actual type instead of `<dependent type>`.
This will clear up further uses of NNS printing in further patches.
Fixes#106846.
This is what I learned from GCC. I found that GCC does not duplicate the
BB that has indirect jumps with the jump table. I believe GCC has
provided a clear explanation here:
> Duplicate the blocks containing computed gotos. This basically
unfactors computed gotos that were factored early on in the compilation
process to speed up edge based data flow. We used to not unfactor them
again, which can seriously pessimize code with many computed jumps in
the source code, such as interpreters.
FrontendActions.cpp is currently one of the biggest compilation units in
all of flang. Measuring its compilation gives the following metrics:
User time (seconds): 139.21
System time (seconds): 4.65
Maximum resident set size (kbytes): 5891440 (5.61 GB)
This commit separates out explicit invocations of the parser into a
separate compilation unit - ParserActions.cpp - through helper functions
in order to decrease the maximum compilation time and memory usage of a
single unit.
After the split, the measurements of FrontendActions.cpp are as follows:
User time (seconds): 70.08
System time (seconds): 3.16
Maximum resident set size (kbytes): 3961492 (3.7 GB)
While the ones for the newly created ParserActions.cpp as follows:
User time (seconds): 104.33
System time (seconds): 3.37
Maximum resident set size (kbytes): 4185600 (3.99 GB)
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Fixes#129900
If `operator delete` was called after an unsuccessful constructor call
after `operator new`, we ran into undefined behaviour.
This was discovered by our malfunction tests while preparing an upgrade
to LLVM 20, that explicitly check for such kind of bugs.
The `VectorTransformsOptions` on the `ConvertVectorToLLVMPass` is
currently represented as a struct, which makes it not serialisable. This
means a pass pipeline that contains this pass cannot be represented as
textual form, which breaks reproducer generation and options such as
`--dump-pass-pipeline`.
This PR expands the `VectorTransformsOptions` struct into the two
options that are actually used by the Pass' patterns:
`vector-contract-lowering` and `vector-transpose-lowering` . The other
options present in VectorTransformOptions are not used by any patterns
in this pass.
Additionally, I have changed some interfaces to only take these specific
options over the full options struct as, again, the vector contract and
transpose lowering patterns only need one of their respective options.
Finally, I have added a simple lit test that just prints the pass
pipeline using `--dump-pass-pipeline` to ensure the options on this pass
remain serialisable.
Fixes#129046