We were using the step security fork after the tj-actions/changed-files
supply chain attack given Github disabled the repo and all our actions
were failing during that time. Switch away from the fork back to the
main repository to avoid an extra level of indirection until we can
probably just stop using this action/roll our own.
Modernized it to using `update_test_checks` which addresses an ambgiuty
in the previous test formulation, where a profile metadaat of value `i32
1` would have (incorrectly matched.
This PR adds patterns to distribute vector.step and vector.shape_cast op
from wg to sg and it also enables constant, broadcast and elementwise
ops to handle the slice attribute
There was a recent patch that added in some tests to the lit test suite
that use split-file. An explicit dependency in CMake was not added,
which led to check-lit not working if being run without doing a full
build first. This patch explicitly adds the dependency inside the CMake
file to fix this configuration.
This reverts commit 1bafd020c7.
This breaks the LLDB data formatters which means these failures show up
on every premerge run. Reverting for now until fixing the LLDB
formatters can be coordinated with a relanding.
This commit:
- Introduces a new `InParallelOpInterface`, along with the
`ParallelCombiningOpInterface`, represent the parallel updating
operations we have in a parallel loop of `scf.forall`.
- Change the name of `ParallelCombiningOpInterface` to
`InParallelOpInterface` as the naming was quite confusing.
- `ParallelCombiningOpInterface` now is used to generalize operations
that insert into shared tensors within parallel combining regions.
Previously, only `tensor.parallel_insert_slice` was supported directly
in `scf.InParallelOp` regions.
- `tensor.parallel_insert_slice` now implements
`ParallelCombiningOpInterface`.
This change enables future extensions to support additional parallel
combining operations beyond `tensor.parallel_insert_slice`, which have
different update semantics, so the `in_parallel` region can correctly
and safely represent these kinds of operation without potential mistakes
such as races.
Author credits: @qedawkins
When materializing integer ranges of splat tensors or vector as
constants, they should use DenseElementsAttr of the shaped type, not
IntegerAttrs of the element types, since this can violate the invariants
of tensor/vector ops.
Co-authored-by: Jeff Niu <jeffniu@openai.com>
This test errors when trying to append to the `%t` file when run in an
environment where the source tree is mounted read-only, since `cp`
preserves the read-only file permission.
Dependabot cannot configure the branch prefix, which means it fails
everytime it tries to run because we only allow user/ branches.
This is in preparation for using Renovate which supports custom branch
prefixes and has other advantages, like the ability to run/get setup
without any assisstance from a repository admin unlike dependabot. This
makes it significantly more hackable for the rest of the community.
Reverts llvm/llvm-project#157529
Sorry, I missed that the missed that the LLVM test was using clang -
layering dictates thats not OK. Please readjust the test case to work
like the existing test coverage (or perhaps the existing test coverage
is sufficient?) and post a new PR.
This change was motivated by CK where many VMCNT(0)'s were generated due
to instructions lacking !alias.scope metadata. The two causes of this
were:
1) LowerLDSModule not tacking on scope metadata on a single LDS variable
2) IPSCCP pass before inliner replacing noalias ptr derivative with a
global value, which made inliner unable to track it back to the noalias
ptr argument.
However, it turns out that IPSCCP losing the scope information was
largely ineffectual as ScopedNoAliasAA was able to handle asymmetric
condition, where one MemLoc was missing scope, and still return NoAlias
result.
AMDGPU however was checking for existence of scope in SIInsertWaitcnts
and conservatively treating it as aliasing all and inserted VMCNT(0)
before DS_READs, forcing it to wait for all previous LDS DMA
instructions.
Since we know that ScopedNoAliasAA can handle asymmetry, we should also
allow AA query to determine if two MIs may alias.
Passed PSDB.
Previous attempt to address the issue in IPSCCP, likely stalled:
https://github.com/llvm/llvm-project/pull/154522
This solution may be preferrable over that as issue only affects AMDGPU.
- Introduced a new method `IsNVPTX()` in `ArchSpec` to check for NVPTX
architecture.
- Implemented the corresponding method in `ArchSpec.cpp` to utilize the
existing triple architecture checks.
Make IntrinsicsToAttributesMap's func. and arg. fields be able to have
adaptive sizes based on input other than hardcoded 8bits/8bits.
This will ease the pressure for adding new intrinsics in private
downstreams.
func. attr bitsize will become 7(127/128) vs 8(255/256)
### Context
#99710 introduced `.loc_label` so we can terminate a line sequence.
However, it did not advance PC properly. This is problematic for
1-instruction functions as it will have zero-length sequence. The test
checked in that PR shows the problem:
```
# CHECK-LINE-TABLE: Address Line Column File ISA Discriminator OpIndex Flags
# CHECK-LINE-TABLE-NEXT: ------------------ ------ ------ ------ --- ------------- ------- -------------
# CHECK-LINE-TABLE-NEXT: 0x00000028: 05 DW_LNS_set_column (1)
# CHECK-LINE-TABLE-NEXT: 0x0000002a: 00 DW_LNE_set_address (0x0000000000000000)
# CHECK-LINE-TABLE-NEXT: 0x00000035: 01 DW_LNS_copy
# CHECK-LINE-TABLE-NEXT: 0x0000000000000000 1 1 1 0 0 0 is_stmt
# CHECK-LINE-TABLE-NEXT: 0x00000036: 00 DW_LNE_end_sequence
# CHECK-LINE-TABLE-NEXT: 0x0000000000000000 1 1 1 0 0 0 is_stmt end_sequence
```
Both rows having PC 0x0 is incorrect, and parsers won't be able to parse
them. See more explanation why this is wrong in #154851.
### Design
This PR attempts to fix this by advancing the PC to the next available
Label, and advance to the end of the section if no Label is available.
### Implementation
- `emitDwarfLineEndEntry` will advance PC to the `CurrLabel`
- If `CurrLabel` is null, its probably a fake LineEntry we introduced in
#110192. In that case look for the next Label
- If still not label can be found, use `null` and
`emitDwarfLineEndEntry` is smart enough to advance PC to the end of the
section
- Rename `LastLabel` to `PrevLabel`, "last" can mean "previous" or
"final", this is ambigous.
- Updated the tests to emit a correct label.
### Note
This fix should render #154986 and #154851 obsolete, they were temporary
fixes and don't resolve the root cause.
---------
Signed-off-by: Peter Rong <PeterRong@meta.com>
Now that ulimit is implemented for the internal shell, we can make sure
that the clang tests utilizing ulimit actually work. One just needs the
removal of its shell requirement while the other one needs some rework
to avoid bash for loops. These are writtein in Python for about the same
amount of complexity.
Reviewers: ilovepi, cmtice, AaronBallman, Sirraide, petrhosek
Reviewed By: ilovepi
Pull Request: https://github.com/llvm/llvm-project/pull/157977
Reverts llvm/llvm-project#157656
There are multiple reports that this is causing miscompiles in the MSan
test suite after bootstrapping and that this is causing miscompiles in
rustc. Let's revert for now, and work to capture a reproducer next week.
The neutral values for these are -1U, and 0 respectively. We already
have good arithmetic lowerings for selects with one arm equal to these
values. smin/smax are a bit harder, and will be a separate change.
Somewhat surprisingly, this looks to be a net code improvement in all of
the configurations. With both zbb, it's a clear win. With only zicond,
we still seem to come out ahead because we reduce the number of ziconds
needed (since we lower min/max to them). Without either zbb or zicond,
we're a bit more of wash, but the available arithmetic sequences are
good enough that doing the select unconditionally before using branches
for the min/max is probably still worthwhile?
C++11 allows the use of Universal Character Names (UCNs) in identifiers,
including function names.
According to the spec the behavior of std::isalpha(ch) and
std::isalnum(ch) is undefined if the argument's value is neither
representable as unsigned char nor equal to EOF. To use these functions
safely with plain chars (or signed chars), the argument should first be
converted to unsigned char.
A DependentTemplateSpecializationType (DTST) is basically just a
TemplateSpecializationType (TST) with a hardcoded DependentTemplateName
(DTN) as its TemplateName.
This removes the DTST and replaces all uses of it with a TST, removing a
lot of duplication in the implementation.
Technically the hardcoded DTN is an optimization for a most common case,
but the TST implementation is in better shape overall and with other
optimizations, so this patch ends up being an overall performance
positive:
<img width="1465" height="38" alt="image"
src="https://github.com/user-attachments/assets/084b0694-2839-427a-b664-eff400f780b5"
/>
A DTST also didn't allow a template name representing a DTN that was
substituted, such as from an alias template, while the TST does allow it
by the simple fact it can hold an arbitrary TemplateName, so this patch
also increases the amount of sugar retained, while still being faster
overall.
Example (from included test case):
```C++
template<template<class> class TT> using T1 = TT<int>;
template<class T> using T2 = T1<T::template X>;
```
Here we can now represent in the AST that `TT` was substituted for the
dependent template name `T::template X`.
This patch implements ulimit inside the lit internal shell.
Implementation wise, this functions similar to umask. But instead of
setting the limits within the lit test worker process, we set
environment variables and add a wrapper around the command to be
executed. The wrapper then sets the limits. This is because we cannot
increase the limits after lowering them, so we would otherwise end up
with a lit test worker stuck with a lower limit.
There are several tests where the use of ulimit is essential to the
semantics of the test (two in clang, ~7 in compiler-rt), so we need to
implement this in order to switch on the internal shell by default
without losing test coverage.
Reviewers: cmtice, petrhosek, ilovepi
Reviewed By: cmtice, ilovepi
Pull Request: https://github.com/llvm/llvm-project/pull/157958
If we have a phi where one of it's source blocks is an unreachable
block, we don't want to traverse back into the unreachable region. Doing
so allows e.g. finding a trivial self loop when walking back the
predecessor chain.
Add support for distributing the `vector.multi_reduction` operation
across lanes in a warp. Currently only 2D to 1D reductions are
supported. Given layouts for the source and accumulator vectors,
* If the reduction dimension is distributed across lanes, the reduction
is non-lane-local and the reduction is done using warp shuffles. Here we
simply rewrite the `MultiDimReductionOp` to a sequence of `ReductionOp`s
inside the warp op body. Actual distribution will be done by
`WarpOpReduction` pattern.
* If the reduction dimension is not distributed across lanes, the
reduction is lane-local. In this case, we yield the source and
accumulator vectors from the warp op and perform the lane-local
reduction outside the warp op using a sequence of `ReductionOp`s.
PR also adds support for distributing `vector.shape_cast` based on
layouts.
[lldb] Track CFA pointer metadata in StackID
In this commit:
9c8e716442 [lldb] Make StackID call Fix{Code,Data} pointers (#152796)
We made StackID keep track of the CFA without any pointer metadata in
it. This is necessary when comparing two StackIDs to determine which one
is "younger".
However, the CFA inside StackIDs is also used in other contexts through
the method StackID::GetCallFrameAddress. One notable case is
DWARFExpression: the computation of `DW_OP_call_frame_address` is done
using StackID. This feeds into many other places, e.g. expression
evaluation may require the address of a variable that is computed from
the CFA; to access the variable without faulting, we may need to
preserve the pointer metadata. As such, StackID must be able to provide
both versions of the CFA.
In the spirit of allowing consumers of pointers to decide what to do
with pointer metadata, this patch changes StackID to store both versions
of the cfa pointer. Two getter methods are provided, and all call sites
except DWARFExpression preserve their existing behavior (stripped
pointer). Other alternatives were considered:
* Just store the raw pointer. This would require changing the
comparisong operator `<` to also receive a Process, as the comparison
requires stripped pointers. It wasn't clear if all call-sites had a
non-null process, whereas we know we have a process when creating a
StackID.
* Store a weak pointer to the process inside the class, and then strip
metadata as needed. This would require a `weak_ptr::lock` in many
operations of LLDB, and it felt wasteful. It also prevents stripping
of the pointer if the process has gone away.
This patch also changes RegisterContextUnwind::ReadFrameAddress, which
is the method computing the CFA fed into StackID, to also preserve the
signature pointers.
This patch fixes:
llvm/lib/Transforms/Utils/SimplifyCFG.cpp:338:6: error: unused
function 'isSelectInRoleOfConjunctionOrDisjunction'
[-Werror,-Wunused-function]
Extend replicateByVF added in #142433 (aa24029319) to also explicitly
unroll replicating VPInstructions.
Now the only remaining case where we replicate for all lanes is
VPReplicateRecipes in replicate regions.
PR: https://github.com/llvm/llvm-project/pull/155102
This is necessary when using ASan, since the larger code size will lead
to errors such as:
```
JIT session error: In graph clojure_core-clojure.core$clojure_core_cpp_cast_24538-24543-jitted-objectbuffer, section .eh_frame: relocation target 0x7bffe374b000 (DW.ref.__gxx_personality_v0) is out of range of Delta32 fixup at address 0x7bffe374b000 (<anonymous block> @ 0x7fffebf48158 + 0x13)
```
Previously, `clang::Interpreter` would hard-code the usage of a small
code model. With this change, we default to small, but allow for custom
values. This related to #102858 and #135401.
There is no change to default behavior here.
@lhames for review.
The AAPCS recommends avoiding the use of x18 as it may be used for other
purposes such as a shadow call stack.
In this particular case it could just as well use x16 instead.