This reverts commit 23a5073600.
Although this patch achieved better codegen in most cases, it is really
important to accurately describe the cost of instructions. So I revert it.
During dialect conversion, target materialization is triggered to create
cast-like operations when a type mismatch occurs between the value that
replaces a rewritten operation and the type that another operations expects as
operands processed by the type conversion. First, a dummy cast is inserted to
make sure the pattern application can proceed. The decision to trigger the
user-provided materialization hook is taken later based on the result of the
dummy cast having uses. However, it only has uses if other patterns constructed
new operations using the casted value as operand. If existing (legal)
operations use the replaced value, they may have not been updated to use the
casted value yet. The conversion infra would then delete the dummy cast first,
and then would replace the uses with now-invalid (null in the bast case) value.
When deciding whether to trigger cast materialization, check for liveness the
uses not only of the casted value, but also of all the values that it replaces.
This was discovered in the finalizing bufferize pass that cleans up
mutually-cancelling casts without touching other operations. It is not
impossible that there are other scenarios where the dialect converison infra
could produce invalid operand uses because of dummy casts erased too eagerly.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D119937
This is part of the implementation of the dataflow analysis framework.
See "[RFC] A dataflow analysis framework for Clang AST" on cfe-dev.
Reviewed-by: xazax.hun
Differential Revision: https://reviews.llvm.org/D119953
Also apply the same fix on glibc. This takes the test one step closer
to passing on glibc, but it still fails on the zh_CN test (which
requires a more involved fix in libc++ itself).
Differential Revision: https://reviews.llvm.org/D119791
Expect the same NAN formatting on Windows as on Glibc. (Both MSVC and
MinGW produce the same formatting there.)
The hex float formatting tests pass on MinGW, so opt in to those tests.
Document exactly what issues are remaining in Clang-cl/MSVC
configurations. (It's easily possible to make the tests pass there too,
but it requires a whole lot of small-scope ifndefs in the test file;
around 60 ifdefs in total for those both test files. Those could
be avoided if the CI environment could run with a newer version
of UCRT, but that's nontrivial to fix right away.)
Differential Revision: https://reviews.llvm.org/D119766
Refactor createBinaryContext and RewriteInstance/MachORewriteInstance
constructors to report an error in a library and fuzzer-friendly way instead of
returning a nullptr or exiting.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D119658
Previously `gpu-kernel-outlining` pass was also doing index computation sinking into gpu.launch before actual outlining.
Split ops sinking from `gpu-kernel-outlining` pass into separate pass, so users can use theirs own sinking pass before outlining.
To achieve old behavior users will need to call both passes: `-gpu-launch-sink-index-computations -gpu-kernel-outlining`.
Differential Revision: https://reviews.llvm.org/D119932
ARMv5 and older architectures don’t support SMP and do not have atomic instructions. Still they’re in use in IoT world, where one has to stick to libgcc.
Reviewed By: mstorsjo
Differential Revision: https://reviews.llvm.org/D116088
Test a range of acceptable forms of SYNC IMAGES statements,
including combinations with and without the stat-variable
and errmsg-variable present. Also test that several invalid
forms of SYNC IMAGES call generate the correct error messages.
Differential Revision: https://reviews.llvm.org/D118933
Currently some of the nested IR building inconsistently uses `nb` and `b`, it's very easy to call wrong builder outside of the current scope, so for simplicity all builders are always called `b`, and in nested IR building regions they just shadow the "parent" builder.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D120003
Test a range of acceptable forms of SYNC ALL statements,
including combinations with and without the stat-variable
and errmsg-variable present. Also test that several invalid
forms of SYNC ALL call generate the correct error messages.
Differential Revision: https://reviews.llvm.org/D114181
Atomic store with Release semantic allows re-ordering of unordered load/store before the store.
Implement it.
Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D119844
The fortran standard views blanks in IO formats as white space in
non-string contexts. Other compilers extend this to also view horizontal
tabs as white space. Some compilers additionally add other white space
characters to this group.
Add recognition of horizontal and vertical tabs to runtime format
validation code to match what the runtime code currently does.
A very small refactoring, but a big impact on tests that expect an exact order.
This revision fixes the tests, but also makes them less brittle for similar
minor changes in the future!
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D119992
In BPF backend, BTF type generation may skip
some debuginfo types if they are the pointee
type of a struct member. For example,
struct task_struct {
...
struct mm_struct *mm;
...
};
BPF backend may generate a forward decl for
'struct mm_struct' instead of full type if
there are no other usage of 'struct mm_struct'.
The reason is to avoid bringing too much unneeded types
in BTF.
Alexei found a pruning bug where we may miss
some full type generation. The following is an illustrating
example:
struct t1 { ... }
struct t2 { struct t1 *p; };
struct t2 g;
void foo(struct t1 *arg) { ... }
In the above case, we will have partial debuginfo chain like below:
struct t2 -> member p
\ -> ptr -> struct t1
/
foo -> argument arg
During traversing
struct t2 -> member p -> ptr -> struct t1
The corresponding BTF types are generated except 'struct t1' which
will be in FixUp stage. Later, when traversing
foo -> argument arg -> ptr -> struct t1
The 'ptr' BTF type has been generated and currently implementation
ignores 'pointer' type hence 'struct t1' is not generated.
This patch fixed the issue not just for the above case, but for
general case with multiple derived types, e.g.,
struct t2 -> member p
\ -> const -> ptr -> volatile -> struct t1
/
foo -> argument arg
Differential Revision: https://reviews.llvm.org/D119986
Main motivation: including `llvm/CodeGen/CommandFlags.h` in
`CommonLinkerContext.h` means that the declaration of `llvm::Reloc` is
visible in any file that includes `CommonLinkerContext.h`. Since our
cpp files have both `using namespace llvm` and `using namespace
lld::macho`, this results in conflicts with `lld::macho::Reloc`.
I suppose we could put `llvm::Reloc` into a nested namespace, but in general,
I think we should avoid transitively including too many header files in
a very widely used header like `CommonLinkerContext.h`.
RegisterCodeGenFlags' ctor initializes a bunch of function-`static`
structures and does nothing else, so it should be fine to "initialize"
it as a temporary stack variable rather than as a file static.
Reviewed By: aganea
Differential Revision: https://reviews.llvm.org/D119913
This makes `__wasm_lpad_context`, a struct that is used as a
communication channel between compiler-generated code and personality
function in libunwind, thread local. The library code will be changed to
thread local in the emscripten side.
Reviewed By: sbc100, tlively
Differential Revision: https://reviews.llvm.org/D119803
Just because there aren't AGPRs in the original program doesn't mean
the register allocator can't choose to use them (unless we were to
forcibly reserve all AGPRs if there weren't any uses). This happens in
high pressure situations and introduces copies to avoid spills.
In this test, the allocator ends up introducing a copy from SGPR to
AGPR which requires an intermediate VGPR. I don't believe it would
introduce a copy from AGPR to AGPR in this situation, since it would
be trying to use an intermediate with a different class.
Theoretically this is also broken on gfx90a, but I have been unable to
come up with a testcase.
gfx90a allows the number of ACC registers (AGPRs) to be set
independently to the VGPR registers. For both HSA and PAL metadata, we
now include an "agpr_count" key to report the number of AGPRs set for
supported devices (gfx90a, gfx908, as determined by hasMAIInsts()).
This is collected from SIProgramInfo.NumAccVGPR for both HSA and PAL.
The AsmParser also now recognizes ".kernel.agpr_count" for supported
devices.
Differential Revision: https://reviews.llvm.org/D116140
When we move an allocation from the heap to the stack we need to
allocate it in the alloca AS and then cast the result. This also
prevents us from inserting the alloca after the allocation call but
rather right before.
Fixes https://github.com/llvm/llvm-project/issues/53858
When we use liveness for edges during the `genericValueTraversal` we
need to make sure to use the AAIsDead of the correct function. This
patch adds the proper logic and some simple caching scheme. We also
add an assertion to the `isEdgeDead` call to make sure future misuse
is detected earlier.
Fixes https://github.com/llvm/llvm-project/issues/53872