Don't make assumptions about the lifetime of the underlying object; instead,
use the shared_ptr to participate in reference counting and extend the
object's lifetime to the end of the lexical scope.
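For illustration, a minimal sketch of the pattern (the `Widget` type and function names are placeholders, not from the surrounding code):

```cpp
#include <memory>

struct Widget {
  void doWork() {}
};

// Copy the shared_ptr into a local: the copy participates in reference
// counting, so the object stays alive until the end of this lexical scope
// even if the original owner drops its reference in the meantime.
void process(const std::shared_ptr<Widget> &widget) {
  std::shared_ptr<Widget> keepAlive = widget;
  keepAlive->doWork();
} // keepAlive releases its reference here
```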
This PR fixes incorrect alignment when lowering `set` and `getBitField`
operations to LLVM IR. The issue occurred because, during lowering, the
function was called with an alignment of 0, which caused it to default to
the alignment of the packed member. For example, if the bitfield was packed
inside a `u64i`, it would use an alignment of 8.
With this change, the generated code now matches what the classic
codegen produces.
In the assembly format, I changed it to be similar to how it's done in
loadOp. If there's a better approach, please feel free to point it out.
We may still need to keep CopyToReg even after folding uses into vector
loads, since the original register may be used in other blocks.
Partially reverts 1fdbe69849
This PR adds all VOP1 tests that haven't yet been upstreamed by copying
the relevant test files directly from downstream. Afterward, the
auto-generation script is run with the `--unique` option to deduplicate
any redundant tests that may have been introduced during the downstream
merge.
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
The `--hot-func-list` flag is used for sample profiles to dump the list
of hot functions. Add support to dump hot functions for IRPGO profiles
as well.
This also removes a `priority_queue` used for `--topn`. We can instead
store all functions and sort at the end before dumping. Since we are
storing `StringRef`s, I believe this won't consume too much memory.
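A rough sketch of the simplification (illustrative only; the real code stores `StringRef`s into existing name storage, and the names below are made up):

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Collect every (function name, max count) pair while reading the profile,
// then sort once at the end and print the top N, instead of maintaining a
// bounded priority_queue during the traversal.
void dumpTopN(std::vector<std::pair<std::string, uint64_t>> hottest,
              size_t topN) {
  std::sort(hottest.begin(), hottest.end(),
            [](const auto &a, const auto &b) { return a.second > b.second; });
  if (hottest.size() > topN)
    hottest.resize(topN);
  for (const auto &entry : hottest)
    std::printf("%s: %llu\n", entry.first.c_str(),
                static_cast<unsigned long long>(entry.second));
}
```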
Update LV to vectorize maxnum/minnum reductions without fast-math flags by
adding an extra check in the loop for whether any inputs to maxnum/minnum are
NaN, which is needed due to maxnum/minnum behavior w.r.t. signaling NaNs
(illustrated below). Signed zeros are already handled consistently by
maxnum/minnum.
If any input is NaN:
* exit the vector loop,
* compute the reduction result up to the vector iteration that contained NaN
  inputs, and
* resume in the scalar loop.
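As background (not part of the patch): for quiet NaNs, `llvm.maxnum`/`llvm.minnum` follow libm `fmax`/`fmin` and return the non-NaN operand, while signaling-NaN handling is where behavior is not uniform, which is why the vector loop bails out once a NaN is seen. A tiny standalone sketch of the quiet-NaN rule:

```cpp
#include <cmath>
#include <cstdio>

int main() {
  double qnan = std::nan("");
  // fmax (the model for llvm.maxnum with quiet NaNs) returns the other
  // operand when one input is NaN, so a NaN input does not propagate into
  // a running maximum on its own.
  std::printf("%f\n", std::fmax(qnan, 1.0)); // prints 1.000000
  std::printf("%f\n", std::fmax(2.0, qnan)); // prints 2.000000
  return 0;
}
```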
New recurrence kinds are added for reductions using maxnum/minnum
without fast-math flags.
PR: https://github.com/llvm/llvm-project/pull/148239
This gives the user an override to force selection of the VGPR form of MFMA.
Eventually we will drop this in favor of the compiler making better
decisions, but it provides a mechanism for users to address the cases
where MayNeedAGPRs favors the AGPR form and performance is degraded due
to poor RA.
A report of the following code not generating an error led to fixing two bugs in directive checking.
- We should treat CombinedConstructs as OpenACC Constructs
- We should treat DoConstruct index variables as private.
```fortran
subroutine sub(nn)
integer :: nn, ii
!$acc serial loop default(none)
do ii = 1, nn
end do
!$acc end serial loop
end subroutine
```
Here `nn` should be flagged as needing a data clause, while `ii` should
still be treated as implicitly private.
Pointer remappings unconditionally update the element byte size and
derived type of the pointer's descriptor. This is okay when the pointer
is polymorphic, but not when a monomorphic pointer is associated with a
target of an extended type.
To communicate this monomorphic case to the runtime, add a new entry
point so as to not break forward binary compatibility.
…d complex input
List-directed reads of complex values that can't go through the usual
fast path (as in this bug's test case, which uses DECIMAL='COMMA')
didn't skip spaces before the closing right parenthesis correctly.
Fixes https://github.com/llvm/llvm-project/issues/149164.
This patch avoids a trip through the work queue engine for cases on a
CPU where finalization and destruction actions during assignment can be
handled without enqueueing another task.
When gathering the headers to fix up and place in LLDB.framework, we
were previously globbing the header files from a location in the build
directory. This commit changes this to glob from the source directory
instead, since nothing guaranteed that the necessary files were actually
present in the build directory before globbing.
This change addresses a performance issue in the **--tosa-reduce-transposes** implementation by working directly with the
raw tensor data, eliminating the need to create costly intermediate attributes that lead to a bottleneck.
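As a generic illustration of the idea (this is not the pass's actual code; the helper below is hypothetical): a transpose can be applied directly to a flat row-major buffer, so no per-transpose intermediate attribute objects need to be materialized.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: permute a row-major buffer according to 'perm' by
// computing source offsets directly from the raw data, with no intermediate
// element-by-element attribute construction.
std::vector<float> transposeRaw(const std::vector<float> &src,
                                const std::vector<int64_t> &shape,
                                const std::vector<int64_t> &perm) {
  std::vector<int64_t> outShape(shape.size());
  for (size_t i = 0; i < perm.size(); ++i)
    outShape[i] = shape[perm[i]];

  // Row-major strides of the source shape.
  std::vector<int64_t> strides(shape.size(), 1);
  for (int64_t i = static_cast<int64_t>(shape.size()) - 2; i >= 0; --i)
    strides[i] = strides[i + 1] * shape[i + 1];

  std::vector<float> dst(src.size());
  std::vector<int64_t> idx(shape.size(), 0); // multi-index into the output
  for (size_t linear = 0; linear < dst.size(); ++linear) {
    // Output coordinate idx[d] corresponds to source dimension perm[d].
    int64_t srcOffset = 0;
    for (size_t d = 0; d < idx.size(); ++d)
      srcOffset += idx[d] * strides[perm[d]];
    dst[linear] = src[srcOffset];
    // Advance the output multi-index in row-major order.
    for (int64_t d = static_cast<int64_t>(idx.size()) - 1; d >= 0; --d) {
      if (++idx[d] < outShape[d])
        break;
      idx[d] = 0;
    }
  }
  return dst;
}
```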
### Context
Over a year ago, I landed support for 64b Memory ranges in Minidump
(#95312). In that patch we added the Memory64 list stream, which is
effectively a linked list on disk: a sixteen-byte header followed by
however many memory descriptors.
### The Bug
This is a classic off-by-one error, where I added 8 bytes instead of 16
for the header. This caused the first region to start 8 bytes before the
correct RVA, thus shifting all memory reads by 8 bytes. We were writing all
the regions to disk correctly, with no physical corruption, but the RVA was
defined incorrectly, meaning we were reading memory from the wrong offsets.
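For reference, the on-disk layout in question (field names follow the public Minidump format; this is an illustrative sketch, not the LLDB writer code):

```cpp
#include <cstdint>

// The Memory64 list header is sixteen bytes, not eight: a range count plus
// the BaseRva that all of the following descriptors' data is laid out after.
struct MinidumpMemory64ListHeader {
  uint64_t NumberOfMemoryRanges; // 8 bytes
  uint64_t BaseRva;              // 8 more bytes -> 16-byte header
};

// Each descriptor stores only the start address and size; its data lives at
// BaseRva plus the sizes of all preceding ranges, so an 8-byte error in the
// header shifts every 64b read by 8 bytes.
struct MinidumpMemoryDescriptor64 {
  uint64_t StartOfMemoryRange;
  uint64_t DataSize;
};
```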

### Why wasn't this caught?
One problem we've had is forcing Minidump to actually use the 64b mode:
it would be a massive waste of resources to have a test that actually
wrote >4.2 GB of IO to validate the 64b regions, so almost all
validation has been manual. As a weakness of manual testing, this issue
is pseudo-non-deterministic, as which regions end up in 64b or 32b is
decided greedily, iterating in the order they are laid out in
/proc/pid/maps. We often validated that 64b was written correctly by
hexdumping the Minidump itself, which was not corrupted (other than the
BaseRVA).

### Why is this showing up now?
During internal usage, we had a bug report that the Minidump wasn't
displaying values. I was unable to repro the issue, but during my
investigation I saw that the variables were in the 64b regions, which led
me to identify the bug.
### How do we prevent future regressions?
To prevent regressions, and honestly to save my sanity for figuring out
where 8 bytes magically came from, I've added a new API to
SBSaveCoreOptions: `SBSaveCoreOptions::GetMemoryRegionsToSave()`.
It returns the memory regions that we intend to include in the Coredump. I added this so we can compare what we intended to include versus what was actually included. Traditionally we've always had issues comparing regions because Minidump includes `/proc/pid/maps`, and it can be difficult to know whether a memory-region read failure was a genuine error or just a page that wasn't meant to be included.
We are also leveraging this API to choose the memory regions to be generated, as well as for testing which regions should be bytewise 1:1.
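A hedged sketch of how the new call might be used from the C++ SB API (the return type and the surrounding calls are my assumptions about the API shape, not copied from this patch):

```cpp
#include <cinttypes>
#include <cstdio>

#include "lldb/API/SBMemoryRegionInfo.h"
#include "lldb/API/SBMemoryRegionInfoList.h"
#include "lldb/API/SBSaveCoreOptions.h"

// Assumed shape: GetMemoryRegionsToSave() returns an SBMemoryRegionInfoList
// describing what the core writer intends to include.
void dumpPlannedRegions(lldb::SBSaveCoreOptions &options) {
  lldb::SBMemoryRegionInfoList regions = options.GetMemoryRegionsToSave();
  for (uint32_t i = 0; i < regions.GetSize(); ++i) {
    lldb::SBMemoryRegionInfo region;
    if (regions.GetMemoryRegionAtIndex(i, region))
      std::printf("[0x%" PRIx64 " - 0x%" PRIx64 ")\n", region.GetRegionBase(),
                  region.GetRegionEnd());
  }
}
```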
After much debate with @clayborg, I've moved all non-stack memory to the Memory64 List. This list doesn't incur any meaningful overhead, and Greg originally suggested doing this in the original 64b PR. This also means we're exercising the 64b path every single time we save a Minidump, preventing regressions on this feature from slipping through testing in the future.
Snippet produced by [minidump.py](https://github.com/clayborg/scripts)
```
MINIDUMP_MEMORY_LIST:
NumberOfMemoryRanges = 0x00000002
MemoryRanges[0] = [0x00007f61085ff9f0 - 0x00007f6108601000) @ 0x0003f655
MemoryRanges[1] = [0x00007ffe47e50910 - 0x00007ffe47e52000) @ 0x00040c65
MINIDUMP_MEMORY64_LIST:
NumberOfMemoryRanges = 0x000000000000002e
BaseRva = 0x0000000000042669
MemoryRanges[0] = [0x00005584162d8000 - 0x00005584162d9000)
MemoryRanges[1] = [0x00005584162d9000 - 0x00005584162db000)
MemoryRanges[2] = [0x00005584162db000 - 0x00005584162dd000)
MemoryRanges[3] = [0x00005584162dd000 - 0x00005584162ff000)
MemoryRanges[4] = [0x00007f6100000000 - 0x00007f6100021000)
MemoryRanges[5] = [0x00007f6108800000 - 0x00007f6108828000)
MemoryRanges[6] = [0x00007f6108828000 - 0x00007f610899d000)
MemoryRanges[7] = [0x00007f610899d000 - 0x00007f61089f9000)
MemoryRanges[8] = [0x00007f61089f9000 - 0x00007f6108a08000)
MemoryRanges[9] = [0x00007f6108bf5000 - 0x00007f6108bf7000)
```
### Misc
As part of this fix I had to look at LLDB logs a lot; you'll notice I added `0x` to many of the `PRIx64` `LLDB_LOGF` calls. This is so the user (or I) can directly copy-paste the address from the logs instead of adding the hex prefix themselves.
Added some SBSaveCore tests and docstrings for the new GetMemoryRegionsToSave API.
CC: @DavidSpickett, @da-viper, @labath, because we've been working together on save-core plugins. Review is optional and I didn't tag you as reviewers, but I figured you'd want to know.
Remove all the .h.def files that express nothing not already expressed
in YAML. Clean up a few YAML
files without materially changing any generated header output.
Many more .h.def files remain that need a bit of conversion in
YAML to express macro requirements and such.
The old version would prefer the `const &` overload over the `&&` one
unless the former was not allowed in the given situation. In particular,
if the function passed was `[](auto &&)`, the argument would be `const &`
even if the value passed to transformOptional was an rvalue reference.
This version improves the handling of expression categories, and the
lambda argument category will reflect the argument category in the above
scenario.
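To make the intended behavior concrete, here is a standalone sketch (my own minimal re-implementation over `std::optional`, not the LLVM code): with a generic `[](auto &&)` callback, the argument's value category should mirror that of the optional passed in.

```cpp
#include <cassert>
#include <optional>
#include <type_traits>
#include <utility>

// Forward the wrapped value with the same value category as the optional
// itself, so the callback can observe whether it got an lvalue or an rvalue.
template <typename Opt, typename F>
auto transformOpt(Opt &&in, F &&f)
    -> std::optional<decltype(f(*std::forward<Opt>(in)))> {
  if (!in)
    return std::nullopt;
  return f(*std::forward<Opt>(in));
}

int main() {
  auto probe = [](auto &&v) {
    // decltype(v) is an rvalue reference exactly when the optional itself
    // was an rvalue -- the property this change restores for [](auto &&).
    return std::is_rvalue_reference_v<decltype(v)>;
  };
  std::optional<int> o{42};
  assert(!*transformOpt(o, probe));           // lvalue optional -> lvalue arg
  assert(*transformOpt(std::move(o), probe)); // rvalue optional -> rvalue arg
  return 0;
}
```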
Fixes #148238.
When GFNI is present, custom bit reversal lowerings for scalar integers
become active. They work by swapping the bytes in the scalar value and
then reversing bits in a vector of bytes. However, the custom bit
reversal lowering for a vector of bytes is disabled if GFNI is present
in isolation, resulting in broken code.
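For readers unfamiliar with the trick, a plain C++ sketch of the equivalent scalar computation (illustrative only): a full 64-bit bit reversal is a byte swap followed by reversing the bits within each byte, and that per-byte step is the part GFNI accelerates as a vector-of-bytes bit reversal.

```cpp
#include <cstdint>

uint64_t bitreverse64(uint64_t v) {
  v = __builtin_bswap64(v); // swap the bytes
  // Reverse the bits within each byte (every shift stays inside a byte lane).
  v = ((v & 0xF0F0F0F0F0F0F0F0ULL) >> 4) | ((v & 0x0F0F0F0F0F0F0F0FULL) << 4);
  v = ((v & 0xCCCCCCCCCCCCCCCCULL) >> 2) | ((v & 0x3333333333333333ULL) << 2);
  v = ((v & 0xAAAAAAAAAAAAAAAAULL) >> 1) | ((v & 0x5555555555555555ULL) << 1);
  return v;
}
```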
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
We're going to end up repeating the operand extraction four times once
all of the routines have been updated to support both plain load/store
and vp.load/vp.store. I plan to add masked.load/masked.store in the near
future, and we'd need to add that to each of the four cases. Instead,
factor out a single copy of the operand normalization.
This happens only when the tile size used is greater than or equal to the
dimension size. In this case, it is a full slice, so it is fusible.
Such IR can be generated during the TileAndFuse process. It is hard to
fix in that driver, so we enable the naive fusion for this case.
---------
Signed-off-by: hanhanW <hanhan0912@gmail.com>
…s to replicated form
This adds a new SPIR-V dialect-level conversion pass
`ConversionToReplicatedConstantCompositePass`. This pass looks for splat
composite `spirv.Constant` or `spirv.SpecConstantComposite` and rewrites
them into `spirv.EXT.ConstantCompositeReplicate` or
`spirv.EXT.SpecConstantCompositeReplicate`, respectively.
---------
Signed-off-by: Mohammadreza Ameri Mahabadian <mohammadreza.amerimahabadian@arm.com>
Fixed a crash in the following example:
```fortran
subroutine sub()
implicit none
print *, (i, i = 1, 2) ! Problem: using undefined var in implied-do loop
end subroutine sub
```
The error message was already generated, but the compiler crashed before
it could display it.
Some of the packed build_vector patterns use vgpr_32 for i16/f16/bf16.
In gfx11, bf16 arithmetic gets promoted to f32, and this is done via a
v2i16 pack. In true16 mode this v2i16 pack is selected to a
build_vector/v_lshlrev pattern which only accepts VGPR32. This causes
ISel to insert an illegal copy "vgpr32 = copy vgpr16" between def and
use. In the end this illegal copy confuses the CSE pass and triggers
wrong code elimination.
Remove the packed build_vector pattern from true16. After removal, ISel
will use vgpr16 build_vector patterns instead.