intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-01-16 05:32:28 +08:00

Author	SHA1	Message	Date
Bill Nell	510f227cbd	BOLT: Add feature to sort functions by dyno stats. Summary: Add -print-sorted-by and -print-sorted-by-order command line options. The first option takes a list of dyno stats keys used to sort functions that are printed at the end of all optimization passes. Only the top 100 functions are printed. The -print-sorted-by-order option can be either ascending or descending (descending is the default). (cherry picked from FBD3898818)	2016-09-20 20:55:49 -07:00
Maksim Panchenko	62bff426c3	Do not collect dyno stats on functions with stale profile. Summary: Dyno stats collected on functions with invalid profile may appear completely bogus. Skip them. (cherry picked from FBD3879371)	2016-09-16 13:13:16 -07:00
Maksim Panchenko	2c9bf9afd6	Add PLT dyno stats. Summary: Get PLT call stats. (cherry picked from FBD3874799)	2016-09-15 15:47:10 -07:00
Maksim Panchenko	c4e36c1dd6	Fix issue with zero-size duplicate function symbols. Summary: While working on PLT dyno stats I've noticed that we were missing BinaryFunctions for some symbols that were not PLT. Upon closer inspection it turned out that those symbols were marked as zero-sized functions in the symbol table, but they had duplicates with non-zero size. Since the zero-size symbols were preceding other duplicates, we were not creating BinaryFunction for them and they were not added as duplicates. The 2 most prominent functions that were missing for a test were free() and malloc(). There's not much to optimize in these functions, but they were contributing quite significantly to dyno stats. As a result dyno stats for this test needed an adjustment. Also several assembly functions (e.g. _init()) had zero size, and now we set the size to the max size and start processing those. It's good for coverage but will not affect the performance. (cherry picked from FBD3874622)	2016-09-15 15:47:10 -07:00
Maksim Panchenko	8dbf0e2b3d	Add dyno stats for jump tables. Summary: Add dyno stats for jump tables. (cherry picked from FBD3871035)	2016-09-15 10:24:22 -07:00
Maksim Panchenko	2f3a859772	Add experimental jump table support. Summary: Option "-jump-tables=1" enables experimental support for jump tables. The option hasn't been tested with optimizations other than block re-ordering. Only non-PIC jump tables are supported at the moment. (cherry picked from FBD3867849)	2016-09-14 16:45:40 -07:00
Bill Nell	7483cd0fa6	BOLT: Clean up interface between BinaryFunction and BinaryBasicBlock. Summary: This is just a bit of refactoring to make sure that BinaryFunction goes through methods to get at the state in BinaryBasicBlock. I did this so that changing the way Index/LayoutIndex/Valid works will be easier. (cherry picked from FBD3860899)	2016-09-13 17:12:00 -07:00
Maksim Panchenko	b0f4031db3	Add cluster randomization layout algorithm. Summary: Add "-reorder-blocks=cluster-shuffle" for performance experiments. Use "-bolt-seed=<N>" to set a randomization seed. (cherry picked from FBD3851035)	2016-09-11 14:33:58 -07:00
Maksim Panchenko	52bfc3f92f	Fix switch table detection. Disassemble all instructions in non-simple functions. Summary: Switch table can contain __builtin_unreachable(). As a result, a compiler may place an entry into a jump table that contains an address immediately past the last instruction in the function. Sometimes it may coincide with a start of the next function in the binary. Thus when we check for switch tables in such cases we have to check more than a single entry until we see either an address inside the containing function or some address outside different from the address past the last instruction. Additionally, don't stop disassembly after discovering that the function was not simple. We need to detect all outside references whenever possible. (cherry picked from FBD3850825)	2016-09-12 10:12:31 -07:00
Maksim Panchenko	617c6a13b7	Use BB.getNumNonPseudos() in more places. Summary: Use BB.getNumNonPseudos() in more places. Fix analyze_potential script to pass the new parameter. (cherry picked from FBD3844416)	2016-09-09 14:42:35 -07:00
Maksim Panchenko	c4c518ee9d	Rewrite SCTC pass to do UCE and make it the last optimization pass. Summary: For now we make SCTC a special pass that runs at the end of all optimizations and transformations right after fixupBranches(). Since it's the last pass, it has to do its own UCE. (cherry picked from FBD3838051)	2016-09-08 14:52:26 -07:00
Maksim Panchenko	6bef336cc2	Add dyno stats to BOLT. Summary: Add "-dyno-stats" option that prints instruction stats based on the execution profile similar to below: BOLT-INFO: program-wide dynostats after optimizations: executed forward branches : 109706407 (+8.1%) taken forward branches : 13769074 (-55.5%) executed backward branches : 24517582 (-25.0%) taken backward branches : 15330256 (-27.2%) executed unconditional branches : 6009826 (-35.5%) function calls : 17192114 (+0.0%) executed instructions : 837733057 (-0.4%) total branches : 140233815 (-2.3%) taken branches : 35109156 (-42.8%) Also fixed pseudo instruction discrepancies and added assertions for BinaryBasicBlock::getNumPseudos() to make sure the number is synchronized with real number of pseudo instructions. (cherry picked from FBD3826995)	2016-08-29 21:11:22 -07:00
Maksim Panchenko	17e691915b	Make BinaryFunction::fixBranches() more flexible and support CFG updates. Summary: The CFG represents "the ultimate source of truth". Transformations on functions and blocks have to update the CFG and fixBranches() would make sure the correct branch instructions are inserted at the end of basic blocks (or removed when necessary). We do require a conditional branch at the end of the basic block if the block has 2 successors as CFG currently lacks the conditional code support (it will probably stay that way). We only use this branch instruction for its conditional code, the destination is determined by CFG - first successor representing true/taken branch, while the second successor - false/fall-through branch. When we reverse the branch condition, the CFG is updated accordingly. The previous version used to insert jumps after some terminating instructions sometimes resulting in a larger code than needed. As a result with the new version 1 extra function becomes overwritten for HHVM binary. With this diff we also convert conditional branches with one successor (result of code from __builtin_unreachable()) into unconditional jumps. (cherry picked from FBD3802062)	2016-08-29 21:11:22 -07:00
Bill Nell	48b55300e0	BOLT: Make most command line options ZeroOrMore. Summary: This will make it easier to run experiments with the same baseline BOLT binary but different command line options. (cherry picked from FBD3831978)	2016-09-07 14:41:56 -07:00
Maksim Panchenko	1cf200107e	Fix tail call conversion and test cases. Summary: A previous diff accidentally disabled tail call conversion. Additionally some test cases relied on output of "-v=2". Fix those. (cherry picked from FBD3823760)	2016-09-06 13:19:26 -07:00
Bill Nell	c27a6a5c63	Add verbosity level and clean up stream usage. Summary: I've added a verbosity level to help keep the BOLT spewage to a minimum. The default level is pretty terse now, level 1 is closer to the original, I've saved level 2 for the noisiest of messages. Error messages should never be suppressed by the verbosity level, only warnings and info messages. The rationale behind stream usage is as follows: outs() for info and debugging controlled by command line flags. errs() for errors and warnings. dbgs() for output within DEBUG(). With the exception of a few of the level 2 messages I don't have any strong feelings about the others. (cherry picked from FBD3814259)	2016-09-02 14:15:29 -07:00
Maksim Panchenko	43acb6a28a	Emit remember_state CFI in the same code region as restore_state. Summary: While creating remember_state/restore_state CFI sequences, we were always placing remember_state instruction into the first basic block. However, when we have hot-cold splitting, the cold part has an independent FDE entry in .eh_frame, and thus the restore_state instruction was missing its counterpart. The fix is to adjust the basic block that is used for placing remember_state instruction whenever we see the hot-cold split boundary. (cherry picked from FBD3767102)	2016-08-24 14:25:33 -07:00
Maksim Panchenko	97f598fd17	Handling for indirect tail calls. Summary: Analyze indirect branches and convert them into indirect tail calls when possible. We analyze the memory contents when the address could be calculated statically and also detect epilogue code. (cherry picked from FBD3754395)	2016-08-22 14:24:09 -07:00
Maksim Panchenko	a10fb73ab3	Compute ClusterEdges only when necessary. Summary: We only need ClusterEdges in reordering algorithm optimized for branches and the computation is quite resource-hungry, thus it makes sense to only do it when needed. Some refactoring too. (cherry picked from FBD3721107)	2016-08-15 15:37:00 -07:00
Bill Nell	406aa62083	Add additional info to BOLT graphviz CFG dumps. Summary: Add the following info to the graphviz CFG dump: - Edges are labeled with the jmp instruction that leads to that edge. - Edges include the count and misprediction count. - Nodes have (offset, BB index, BB layout index) - Nodes optionally have tooltips which contain the code of the basic block. (enabled with -dot-tooltip-code) - Added dashed edges to landing pads. (cherry picked from FBD3646568)	2016-07-29 19:18:37 -07:00
Maksim Panchenko	36df6057b0	Refactoring. Mainly NFC. Summary: Eliminated BinaryFunction::getName(). The function was confusing since the name is ambiguous. Instead we have BinaryFunction::getPrintName() used for printing and whenever unique string identifier is needed one can use getSymbol()->getName(). In the next diff I'll have a map from MCSymbol to BinaryFunction in BinaryContext to facilitate function lookup from instruction operand expressions. There's one bug fixed where the function was called only under assert() in ICF::foldFunction(). For output we update all symbols associated with the function. At the moment it has no effect on the generated binary but in the future we would like to have all symbols in the symbol table updated. (cherry picked from FBD3704790)	2016-08-07 12:35:23 -07:00
Theodoros Kasampalis	32739247eb	More aggressive inlining pass Summary: This adds functionality for a more aggressive inlining pass, that can inline tail calls and functions with more than one basic block. (cherry picked from FBD3677856)	2016-07-29 14:17:06 -07:00
Bill Nell	82d76ae18b	Add MCInst annotation mechanism to MCInstrAnalysis class. Summary: Add three new MCOperand types: Annotation, LandingPad and GnuArgsSize. Annotation is used for associating random data with MCInsts. Clients can construct their own annotation types (subclassed from MCAnnotation) and associate them with instructions. Annotations are looked up by string keys. Annotations can be added, removed and queried using an instance of the MCInstrAnalysis class. The LandingPad operand is a MCSymbol, uint64_t pair used to encode exception handling information for call instructions. GnuArgsSize is used to annotate calls with the DW_CFA_GNU_args_size attribute. (cherry picked from FBD3597877)	2016-07-28 10:34:50 -07:00
Theodoros Kasampalis	713e361f36	Fix for correct disassembling of conditional tail calls. Summary: BOLT attempts to convert jumps that serve as tail calls to dedicated tail call instructions, but this is impossible when the jump is conditional because there is no corresponding tail call instruction. This was causing the creation of a duplicate fall-through edge for basic blocks terminated with a conditional jump serving as a tail call when there is profile data available for the non-taken branch. In this case, the first fall-through edge had a count taken from the profile data, while the second has a count computed (incorrectly) by BinaryFunction::inferFallThroughCounts. (cherry picked from FBD3560504)	2016-07-13 18:57:40 -07:00
Maksim Panchenko	486ab273c7	Add printing support for indirect tail calls. Summary: LLVM was missing assembler print string for indirect tail calls which are synthetic instructions created by us. (cherry picked from FBD3640197)	2016-07-28 18:49:48 -07:00
Bill Nell	50e011f4e5	CFG editing functions Summary: This diff adds a number of methods to BinaryFunction that can be used to edit the CFG after it is created. The basic public functions are: - createBasicBlock - create a new block that is not inserted into the CFG. - insertBasicBlocks - insert a range of blocks (made with createBasicBlock) into the CFG. - updateLayout - update the CFG layout (either by inserting new blocks at a certain point or recomputing the entire layout). - fixFallthroughBranch - add a direct jump to the fallthrough successor for a given block. There are a number of private helper functions used to implement the above. This was split off the ICP diff to simplify it a bit. (cherry picked from FBD3611313)	2016-07-23 12:50:34 -07:00
Theodoros Kasampalis	ab599fe71a	Basic block clustering algorithm for minimizing branches. Summary: This algorithm is similar to our main clustering algorithm but uses a different heuristic for selecting edges to become fall-throughs. The weight of an edge is calculated as the win in branches if we choose to layout this edge as a fall-through. For example, the edges A -> B with execution count 100 and A -> C with execution count 500 (where B and C are the only successors of A) have weights -400 and +400 respectively. (cherry picked from FBD3606591)	2016-07-15 16:11:30 -07:00
Theodoros Kasampalis	a9bb3320ad	Identical Code Folding (ICF) pass Summary: Added an ICF pass to BOLT, that can recognize identical functions and replace references to these functions with references to just one representative. (cherry picked from FBD3460297)	2016-06-09 11:36:55 -07:00
Bill Nell	82401630a2	Factor out instruction printing and size computation. Summary: I've factored out the instruction printing and size computation routines to methods on BinaryContext. I've also added some more debug print functions. This was split off the ICP diff to simplify it a bit. (cherry picked from FBD3610690)	2016-07-23 08:01:53 -07:00
Theodoros Kasampalis	156a55209c	Simplification of loads from read-only data sections. Summary: Instructions that load data from a read-only data section and whose target address can be computed statically (e.g. RIP-relative addressing) are modified to corresponding instructions that use immediate operands. We apply the transformation only when the resulting instruction will have smaller or equal size. (cherry picked from FBD3397112)	2016-06-03 00:58:11 -07:00
Theodoros Kasampalis	17b846586c	Loop detection for BOLT's CFG. Summary: Loop detection for the CFG data structure. Added a GraphTraits specialization for BOLT's CFG that allows us to use LLVM's loop detection interface. (cherry picked from FBD3604837)	2016-05-26 10:58:01 -07:00
Maksim Panchenko	bf46263eed	Shorten instructions if possible. Summary: Generate short versions of branch instructions by default and rely on relaxation to produce longer versions when needed. Also produce short versions of arithmetic instructions if immediate fits into one byte. This was only triggered once on HHVM binary. (cherry picked from FBD3591466)	2016-07-19 11:19:18 -07:00
Theodoros Kasampalis	c20506c570	Fix in inferFallthroughCounts Summary: This fixes the initialization of basic block execution counts, where we should skip edges to the first basic block but we were not skipping the corresponding profile info. Also, I removed a check that was done twice. (cherry picked from FBD3519265)	2016-07-03 21:30:35 -07:00
Bill Nell	260f6fbdb6	Add option to dump CFGs in (simple) graphviz format during all passes. Summary: I noticed the BinaryFunction::viewGraph() method that hadn't been implemented and decided I could use a simple DOT dumper for CFGs while working on the indirect call optimization. I've implemented the bare minimum for the dumper. It's just nodes+BB labels with edges. We can add more detailed information as needed/desired. (cherry picked from FBD3509326)	2016-07-01 08:40:56 -07:00
Theodoros Kasampalis	287fa51324	Fix for ignoring fall-through profile data when jump is followed by no-op Summary: When a conditional jump is followed by one or more no-ops, the destination of fall-through branch was recorded as the first no-op in FuncBranchInfo. However the fall-through basic block after the jump starts after the no-ops, so the profile data could not match the CFG and was ignored. (cherry picked from FBD3496084)	2016-06-27 14:51:38 -07:00
Theodoros Kasampalis	d09b00ebff	Refactoring of the reordering algorithms Summary: The various reorder and clustering algorithms have been refactored into separate classes, so that it is easier to add new algorithms and/or change the logic of algorithm selection. (cherry picked from FBD3473656)	2016-06-16 18:47:57 -07:00
Maksim Panchenko	f1192a7118	Support for multiple function names. Summary: With ICF optimization in the linker we were getting mismatches of function names in .fdata and BinaryFunction name. This diff adds support for multiple function names for BinaryFunction and does a match against all possible names for the profile. (cherry picked from FBD3466215)	2016-06-10 17:13:05 -07:00
Maksim Panchenko	70f82d9371	Reject profile data for functions that do not match. Summary: Verify profile data for a function and reject if there are branches that don't correspond to any branches in the function CFG. Note that we have to ignore branches resulting from recursive calls. Fix printing instruction offsets in disassembled state. Allow function to have non-zero execution count even if we don't have branch information. (cherry picked from FBD3451596)	2016-06-15 18:36:16 -07:00
Bill Nell	980a06265a	Revert "Indirect call optimization." This reverts commit 33966090e18545b64013614e7929ff1bdcdf10d5. (cherry picked from FBD28110782)	2016-06-08 17:38:13 -07:00
Bill Nell	8bcfd9a392	Indirect call optimization. (cherry picked from FBD28110629)	2016-06-07 16:27:52 -07:00
Bill Nell	45e2219ae4	Allocate BinaryBasicBlocks with new rather than storing them in the BasicBlocks vector. Summary: This will help optimization passes that need to modify the CFG after it is constructed. Otherwise, the BinaryBasicBlock pointers stored in the layout, successors and predecessors would need to be modified every time a new basic block is created. (cherry picked from FBD3403372)	2016-06-07 16:27:52 -07:00
Maksim Panchenko	4460da0d81	Improvements for debug info. Summary: Assembly functions could have no corresponding DW_AT_subprogram entries, yet they are represented in module ranges (and .debug_aranges) and will have line number information. Make sure we update those. Eliminated unnecessary data structures and optimized some passes. For .debug_loc unused location entries are no longer processed resulting in smaller output files. Overall it's a small processing time improvement and memory improvement. (cherry picked from FBD3362540)	2016-05-27 20:19:19 -07:00
Theodoros Kasampalis	65ac8bbdf2	Better edge counts for fall through blocks in presence of C++ exceptions. Summary: The inference algorithm for counts of fall through edges takes possible jumps to landing pad blocks into account. Also, the landing pad block execution counts are updated using profile data. (cherry picked from FBD3350727)	2016-05-26 15:10:09 -07:00
Theodoros Kasampalis	485f9220b7	Taking LP counts into account for FT count inference (cherry picked from FBD28110493)	2016-05-24 09:26:25 -07:00
Theodoros Kasampalis	fb5f18b2dc	Correctly updating landing pad exec counts. (cherry picked from FBD28110316)	2016-05-23 16:16:25 -07:00
Maksim Panchenko	43bc4a09ad	Changed splitting options and fixed sorting. Summary: Splitting option now has different meanings/values. Since landing pads are mostly always cold/frozen, we should split them before anything else (we still check the execution count is 0). That's value '1'. Everything else goes on top of that and has increased value (2 - large functions, 3 - everything). Sorting was non-deterministic and somewhat broken for functions with EH ranges. Fixed that and added '-split-all-cold' option to outline all 0-count blocks. Fixed compilation of test cases. After my last commit the binaries were linked to wrong source files (i.e. debug info). Had to rebuild the binaries from updated sources. (cherry picked from FBD3209369)	2016-04-20 15:31:11 -07:00
Maksim Panchenko	4f44d60947	Special handling for GNU_args_size call frame instruction. Summary: GNU_args_size is a special kind of CFI that tells runtime to adjust %rsp when control is passed to a landing pad. It is used for annotating call instructions that pass (extra) parameters on the stack and there's a corresponding landing pad. It is also special in a way that its value is not handled by DW_CFA_remember_state/DW_CFA_restore_state instruction sequence that we utilize to restore the state after block re-ordering. This diff adds association of call instructions with GNU_args_size value when it's used. If the function does not use GNU_args_size, there is no overhead. Otherwise, we regenerate GNU_args_size instruction during code emission, i.e. after all optimizations and block-reordering. (cherry picked from FBD3201322)	2016-04-19 22:00:29 -07:00
Gabriel Poesia	ad344c4387	Group debugging info representation and serialization code. Summary: Moved the classes related to representing and serializing DWARF entities into a single header, DebugData.h. (cherry picked from FBD3153279)	2016-04-07 15:06:43 -07:00
Gabriel Poesia	ffa9641e16	Update DWARF lexical blocks address ranges. Summary: Updates DWARF lexical blocks address ranges in the output binary after optimizations. This is similar to updating function address ranges except that the ranges representation needs to be more general, since address ranges can begin or end in the middle of a basic block. The following changes were made: - Added a data structure for iterating over the basic blocks that intersect an address range: BasicBlockTable.h - Added some more bookkeeping in BinaryBasicBlock. Basically, I needed to keep track of the block's size in the input binary as well as its address in the output binary. This information is mostly set by BinaryFunction after disassembly. - Added a representation for address ranges relative to basic blocks (BasicBlockOffsetRanges.h). Will also serve for location lists. - Added a representation for Lexical Blocks (LexicalBlock.h) - Small refactorings in DebugArangesWriter: -- Renamed to DebugRangesSectionsWriter since it also writes .debug_ranges -- Refactored it not to depend on BinaryFunction but instead on anything that can be assigned an offset in .debug_ranges (added an interface for that) - Iterate over the DIE tree during initialization to find lexical blocks in .debug_info (BinaryContext.cpp) - Added patches to .debug_abbrev and .debug_info in RewriteInstance to update lexical blocks attributes (in fact, this part is very similar to what was done to function address ranges and I just refactored/reused that code) - Added small test case (lexical_blocks_address_ranges_debug.test) (cherry picked from FBD3113181)	2016-03-28 17:45:22 -07:00
Maksim Panchenko	595d0885d9	Populate function execution count while parsing fdata. Summary: Populate function execution count while parsing fdata. Before we used a quadratic algorithm to populate the execution count (had to iterate over all branches for every single function). Ignore non-symbol to non-symbol branches while parsing fdata. These changes combined drop HHVM processing time from 4 minutes 53 seconds down to 2 minutes 9 seconds on my devserver. Test case had to be modified since it contained irrelevant branches from PLT to libc. (cherry picked from FBD3106263)	2016-03-28 11:06:28 -07:00
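
Note on commit ab599fe71a (basic block clustering for minimizing branches): the message gives one worked example of the edge weight, and the sketch below reproduces that arithmetic. It is not BOLT's code. The function name `fallthrough_weights` and the dictionary input are hypothetical, and the formula (an edge's count minus the combined count of the block's other outgoing edges) is an assumption that matches the two-successor example in the message; how BOLT handles blocks with more than two successors is not stated there.

```python
def fallthrough_weights(successor_counts):
    """Weight each outgoing edge of one block as a fall-through candidate.

    successor_counts maps a successor name to the execution count of the
    edge leading to it. Assumed formula: the count of this edge minus the
    combined count of the block's other outgoing edges.
    """
    total = sum(successor_counts.values())
    return {
        succ: count - (total - count)
        for succ, count in successor_counts.items()
    }


# Example from the commit message: A -> B executed 100 times,
# A -> C executed 500 times, B and C are the only successors of A.
weights = fallthrough_weights({"B": 100, "C": 500})
print(weights)  # {'B': -400, 'C': 400}
```

Under this reading, a layout pass would prefer A -> C as the fall-through, since it is the only edge with a positive weight.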

94 Commits