Switch to C++14 standard method as llvm::make_unique has been removed (
https://reviews.llvm.org/D66259). Also mark some targets as c++14 to ease next
integrates.
PiperOrigin-RevId: 263953918
*) Factors slice union computation out of LoopFusion into Analysis/Utils (where other iteration slice utilities exist).
*) Generalizes slice union computation to take the union of slices computed on all loads/stores pairs between source and destination loop nests.
*) Fixes a bug in FlatAffineConstraints::addSliceBounds where redundant constraints were added.
*) Takes care of a TODO to expose FlatAffineConstraints::mergeAndAlignIds as a public method.
--
PiperOrigin-RevId: 250561529
Similarly to other value-type wrappers, the default constructor of AffineExpr
constructs a null object and removes the need for an explicit ::Null
constructor. Drop it and remove the only user which can trivially rely on the
default constructor.
--
PiperOrigin-RevId: 249207502
* dyn_cast_or_null
- This will first check if the operation is null before trying to 'dyn_cast':
Value *v = ...;
if (auto forOp = dyn_cast_or_null<AffineForOp>(v->getDefiningOp()))
...
* isa_nonnull
- This will first check if the pointer is null before trying to 'isa':
Value *v = ...;
if (isa_nonnull<AffineForOp>(v->getDefiningOp());
...
--
PiperOrigin-RevId: 242171343
inherited constructors, which is cleaner and means you can now use DimOp()
to get a null op, instead of having to use Instruction::getNull<DimOp>().
This removes another 200 lines of code.
PiperOrigin-RevId: 240068113
This eliminate ConstOpPointer (but keeps OpPointer for now) by making OpPointer
implicitly launder const in a const incorrect way. It will eventually go away
entirely, this is a progressive step towards the new const model.
PiperOrigin-RevId: 239512640
- fix for getConstantBoundOnDimSize: floordiv -> ceildiv for extent
- make getConstantBoundOnDimSize also return the identifier upper bound
- fix unionBoundingBox to correctly use the divisor and upper bound identified by
getConstantBoundOnDimSize
- deal with loop step correctly in addAffineForOpDomain (covers most cases now)
- fully compose bound map / operands and simplify/canonicalize before adding
dim/symbol to FlatAffineConstraints; fixes false positives in -memref-bound-check; add
test case there
- expose mlir::isTopLevelSymbol from AffineOps
PiperOrigin-RevId: 238050395
- compute tile sizes based on a simple model that looks at memory footprints
(instead of using the hardcoded default value)
- adjust tile sizes to make them factors of trip counts based on an option
- update loop fusion CL options to allow setting maximal fusion at pass creation
- change an emitError to emitWarning (since it's not a hard error unless the client
treats it that way, in which case, it can emit one)
$ mlir-opt -debug-only=loop-tile -loop-tile test/Transforms/loop-tiling.mlir
test/Transforms/loop-tiling.mlir:81:3: note: using tile sizes [4 4 5 ]
for %i = 0 to 256 {
for %i0 = 0 to 256 step 4 {
for %i1 = 0 to 256 step 4 {
for %i2 = 0 to 250 step 5 {
for %i3 = #map4(%i0) to #map11(%i0) {
for %i4 = #map4(%i1) to #map11(%i1) {
for %i5 = #map4(%i2) to #map12(%i2) {
%0 = load %arg0[%i3, %i5] : memref<8x8xvector<64xf32>>
%1 = load %arg1[%i5, %i4] : memref<8x8xvector<64xf32>>
%2 = load %arg2[%i3, %i4] : memref<8x8xvector<64xf32>>
%3 = mulf %0, %1 : vector<64xf32>
%4 = addf %2, %3 : vector<64xf32>
store %4, %arg2[%i3, %i4] : memref<8x8xvector<64xf32>>
}
}
}
}
}
}
PiperOrigin-RevId: 237461836
Supports use case where FlatAffineConstraints::composeMap adds dim identifiers with no SSA values (because the identifiers are the result of an AffineValueMap which is not materialized in the IR and thus has no SSA Value results).
PiperOrigin-RevId: 237145506
Adds utility to convert slice bounds to a FlatAffineConstraints representation.
Adds utility to FlatAffineConstraints to promote loop IV symbol identifiers to dim identifiers.
PiperOrigin-RevId: 236973261
- fix for the mod detection
- simplify/avoid the mod at construction (if the dividend is already known to be less
than the divisor), since the information is available at hand there
PiperOrigin-RevId: 236882988
- this was detected when memref-bound-check was run on the output of the
loop-fusion pass
- the addition (to represent ceildiv as a floordiv) had to be performed only
for the constant term of the constraint
- update test cases
- memref-bound-check no longer returns an error on the output of this test case
PiperOrigin-RevId: 236731137
- This change only impacts the cost model for fusion, given the way
addSliceBounds was being used. It so happens that the output in spite of this
CL's fix is the same; however, the assertions added no longer fail. (an
invalid/inconsistent memref region was being used earlier).
PiperOrigin-RevId: 236405030
*) Breaks fusion pass into multiple sub passes over nodes in data dependence graph:
- first pass fuses single-use producers into their unique consumer.
- second pass enables fusing for input-reuse by fusing sibling nodes which read from the same memref, but which do not share dependence edges.
- third pass fuses remaining producers into their consumers (Note that the sibling fusion pass may have transformed a producer with multiple uses into a single-use producer).
*) Fusion for input reuse is enabled by computing a sibling node slice using the load/load accesses to the same memref, and fusion safety is guaranteed by checking that the sibling node memref write region (to a different memref) is preserved.
*) Enables output vector and output matrix computations from KFAC patches-second-moment operation to fuse into a single loop nest and reuse input from the image patches operation.
*) Adds a generic loop utilitiy for finding all sequential loops in a loop nest.
*) Adds and updates unit tests.
PiperOrigin-RevId: 236350987
- handle floordiv/mod's in loop bounds for all analysis purposes
- allows fusion slicing to be more powerful
- add simple test cases based on -memref-bound-check
- fusion based test cases in follow up CLs
PiperOrigin-RevId: 236328551
- add a method to merge and align the spaces (identifiers) of two
FlatAffineConstraints (both get dimension-wise and symbol-wise unique
columns)
- this completes several TODOs, gets rid of previous assumptions/restrictions
in composeMap, unionBoundingBox, and reuses common code
- remove previous workarounds / duplicated funcitonality in
FlatAffineConstraints::composeMap and unionBoundingBox, use mergeAlignIds
from both
PiperOrigin-RevId: 236320581
- detect more trivially redundant constraints in
FlatAffineConstraints::removeTrivialRedundantConstraints. Redundancy due to
constraints that only differ in the constant part (eg., 32i + 64j - 3 >= 0, 32 +
64j - 8 >= 0) is now detected. The method is still linear-time and does
a single scan over the FlatAffineConstraints buffer. This detection is useful
and needed to eliminate redundant constraints generated after FM elimination.
- update GCDTightenInequalities so that we also normalize by the GCD while at
it. This way more constraints will show up as redundant (232i - 203 >= 0
becomes i - 1 >= 0 instead of 232i - 232 >= 0) without having to call
normalizeConstraintsByGCD.
- In FourierMotzkinEliminate, call GCDTightenInequalities and
normalizeConstraintsByGCD before calling removeTrivialRedundantConstraints()
- so that more redundant constraints are detected. As a result, redundancy
due to constraints like i - 5 >= 0, i - 7 >= 0, 2i - 5 >= 0, 232i - 203 >=
0 is now detected (here only i >= 7 is non-redundant).
As a result of these, a -memref-bound-check on the added test case runs in 16ms
instead of 1.35s (opt build) and no longer returns a conservative result.
PiperOrigin-RevId: 235983550
LoopFusion
- getConstDifference in LoopFusion is pending a refactoring to handle bounds
with min's and max's; it currently asserts on some useful test cases that we
want to experiment with. This CL changes getSliceBounds to be more
conservative so as to not trigger the assertion. Filed b/126426796 to track this.
PiperOrigin-RevId: 235826538
Analysis - NFC
- refactor AffineExprFlattener (-> SimpleAffineExprFlattener) so that it
doesn't depend on FlatAffineConstraints, and so that FlatAffineConstraints
could be moved out of IR/; the simplification that the IR needs for
AffineExpr's doesn't depend on FlatAffineConstraints
- have AffineExprFlattener derive from SimpleAffineExprFlattener to use for
all Analysis/Transforms purposes; override addLocalFloorDivId in the derived
class
- turn addAffineForOpDomain into a method on FlatAffineConstraints
- turn AffineForOp::getAsValueMap into an AffineValueMap ctor
PiperOrigin-RevId: 235283610
* AffineStructures has moved to IR.
* simplifyAffineExpr/simplifyAffineMap/getFlattenedAffineExpr have moved to IR.
* makeComposedAffineApply/fullyComposeAffineMapAndOperands have moved to AffineOps.
* ComposeAffineMaps is replaced by AffineApplyOp::canonicalize and deleted.
PiperOrigin-RevId: 232586468
loops), (2) take into account fast memory space capacity and lower 'dmaDepth'
to fit, (3) add location information for debug info / errors
- change dma-generate pass to work on blocks of instructions (start/end
iterators) instead of 'for' loops; complete TODOs - allows DMA generation for
straightline blocks of operation instructions interspersed b/w loops
- take into account fast memory capacity: check whether memory footprint fits
in fastMemoryCapacity parameter, and recurse/lower the depth at which DMA
generation is performed until it does fit in the provided memory
- add location information to MemRefRegion; any insufficient fast memory
capacity errors or debug info w.r.t dma generation shows location information
- allow DMA generation pass to be instantiated with a fast memory capacity
option (besides command line flag)
- change getMemRefRegion to return unique_ptr's
- change getMemRefFootprintBytes to work on a 'Block' instead of 'ForInst'
- other helper methods; add postDomInstFilter option for
replaceAllMemRefUsesWith; drop forInst->walkOps, add Block::walkOps methods
Eg. output
$ mlir-opt -dma-generate -dma-fast-mem-capacity=1 /tmp/single.mlir
/tmp/single.mlir:9:13: error: Total size of all DMA buffers' for this block exceeds fast memory capacity
for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {
^
$ mlir-opt -debug-only=dma-generate -dma-generate -dma-fast-mem-capacity=400 /tmp/single.mlir
/tmp/single.mlir:9:13: note: 8 KiB of DMA buffers in fast memory space for this block
for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {
PiperOrigin-RevId: 232297044