intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-01-16 21:55:39 +08:00

Author	SHA1	Message	Date
Theodoros Kasampalis	c20506c570	Fix in inferFallthroughCounts Summary: This fixes the initialization of basic block execution counts, where we should skip edges to the first basic block but we were not skipping the corresponding profile info. Also, I removed a check that was done twice. (cherry picked from FBD3519265)	2016-07-03 21:30:35 -07:00
Bill Nell	260f6fbdb6	Add option to dump CFGs in (simple) graphviz format during all passes. Summary: I noticed the BinaryFunction::viewGraph() method that hadn't been implemented and decided I could use a simple DOT dumper for CFGs while working on the indirect call optimization. I've implemented the bare minimum for the dumper. It's just nodes+BB labels with edges. We can add more detailed information as needed/desired. (cherry picked from FBD3509326)	2016-07-01 08:40:56 -07:00
Theodoros Kasampalis	287fa51324	Fix for ignoring fall-through profile data when jump is followed by no-op Summary: When a conditional jump is followed by one or more no-ops, the destination of fall-through branch was recorded as the first no-op in FuncBranchInfo. However the fall-through basic block after the jump starts after the no-ops, so the profile data could not match the CFG and was ignored. (cherry picked from FBD3496084)	2016-06-27 14:51:38 -07:00
Theodoros Kasampalis	d09b00ebff	Refactoring of the reordering algorithms Summary: The various reorder and clustering algorithms have been refactored into separate classes, so that it is easier to add new algorithms and/or change the logic of algorithm selection. (cherry picked from FBD3473656)	2016-06-16 18:47:57 -07:00
Maksim Panchenko	f1192a7118	Support for multiple function names. Summary: With ICF optimization in the linker we were getting mismatches of function names in .fdata and BinaryFunction name. This diff adds support for multiple function names for BinaryFunction and does a match against all possible names for the profile. (cherry picked from FBD3466215)	2016-06-10 17:13:05 -07:00
Maksim Panchenko	70f82d9371	Reject profile data for functions that do not match. Summary: Verify profile data for a function and reject if there are branches that don't correspond to any branches in the function CFG. Note that we have to ignore branches resulting from recursive calls. Fix printing instruction offsets in disassembled state. Allow function to have non-zero execution count even if we don't have branch information. (cherry picked from FBD3451596)	2016-06-15 18:36:16 -07:00
Bill Nell	980a06265a	Revert "Indirect call optimization." This reverts commit 33966090e18545b64013614e7929ff1bdcdf10d5. (cherry picked from FBD28110782)	2016-06-08 17:38:13 -07:00
Bill Nell	8bcfd9a392	Indirect call optimization. (cherry picked from FBD28110629)	2016-06-07 16:27:52 -07:00
Bill Nell	45e2219ae4	Allocate BinaryBasicBlocks with new rather than storing them in the BasicBlocks vector. Summary: This will help optimization passes that need to modify the CFG after it is constructed. Otherwise, the BinaryBasicBlock pointers stored in the layout, successors and predecessors would need to be modified every time a new basic block is created. (cherry picked from FBD3403372)	2016-06-07 16:27:52 -07:00
Maksim Panchenko	4460da0d81	Improvements for debug info. Summary: Assembly functions could have no corresponding DW_AT_subprogram entries, yet they are represented in module ranges (and .debug_aranges) and will have line number information. Make sure we update those. Eliminated unnecessary data structures and optimized some passes. For .debug_loc unused location entries are no longer processed resulting in smaller output files. Overall it's a small processing time improvement and memory improvement. (cherry picked from FBD3362540)	2016-05-27 20:19:19 -07:00
Theodoros Kasampalis	65ac8bbdf2	Better edge counts for fall through blocks in presence of C++ exceptions. Summary: The inference algorithm for counts of fall through edges takes possible jumps to landing pad blocks into account. Also, the landing pad block execution counts are updated using profile data. (cherry picked from FBD3350727)	2016-05-26 15:10:09 -07:00
Theodoros Kasampalis	485f9220b7	Taking LP counts into account for FT count inference (cherry picked from FBD28110493)	2016-05-24 09:26:25 -07:00
Theodoros Kasampalis	fb5f18b2dc	Correctly updating landing pad exec counts. (cherry picked from FBD28110316)	2016-05-23 16:16:25 -07:00
Maksim Panchenko	43bc4a09ad	Changed splitting options and fixed sorting. Summary: Splitting option now has different meanings/values. Since landing pads are mostly always cold/frozen, we should split them before anything else (we still check the execution count is 0). That's value '1'. Everything else goes on top of that and has increased value (2 - large functions, 3 - everything). Sorting was non-deterministic and somewhat broken for functions with EH ranges. Fixed that and added '-split-all-cold' option to outline all 0-count blocks. Fixed compilation of test cases. After my last commit the binaries were linked to wrong source files (i.e. debug info). Had to rebuild the binaries from updated sources. (cherry picked from FBD3209369)	2016-04-20 15:31:11 -07:00
Maksim Panchenko	4f44d60947	Special handling for GNU_args_size call frame instruction. Summary: GNU_args_size is a special kind of CFI that tells runtime to adjust %rsp when control is passed to a landing pad. It is used for annotating call instructions that pass (extra) parameters on the stack and there's a corresponding landing pad. It is also special in a way that its value is not handled by DW_CFA_remember_state/DW_CFA_restore_state instruction sequence that we utilize to restore the state after block re-ordering. This diff adds association of call instructions with GNU_args_size value when it's used. If the function does not use GNU_args_size, there is no overhead. Otherwise, we regenerate GNU_args_size instruction during code emission, i.e. after all optimizations and block-reordering. (cherry picked from FBD3201322)	2016-04-19 22:00:29 -07:00
Gabriel Poesia	ad344c4387	Group debugging info representation and serialization code. Summary: Moved the classes related to representing and serializing DWARF entities into a single header, DebugData.h. (cherry picked from FBD3153279)	2016-04-07 15:06:43 -07:00
Gabriel Poesia	ffa9641e16	Update DWARF lexical blocks address ranges. Summary: Updates DWARF lexical blocks address ranges in the output binary after optimizations. This is similar to updating function address ranges except that the ranges representation needs to be more general, since address ranges can begin or end in the middle of a basic block. The following changes were made: - Added a data structure for iterating over the basic blocks that intersect an address range: BasicBlockTable.h - Added some more bookkeeping in BinaryBasicBlock. Basically, I needed to keep track of the block's size in the input binary as well as its address in the output binary. This information is mostly set by BinaryFunction after disassembly. - Added a representation for address ranges relative to basic blocks (BasicBlockOffsetRanges.h). Will also serve for location lists. - Added a representation for Lexical Blocks (LexicalBlock.h) - Small refactorings in DebugArangesWriter: -- Renamed to DebugRangesSectionsWriter since it also writes .debug_ranges -- Refactored it not to depend on BinaryFunction but instead on anything that can be assigned an offset in .debug_ranges (added an interface for that) - Iterate over the DIE tree during initialization to find lexical blocks in .debug_info (BinaryContext.cpp) - Added patches to .debug_abbrev and .debug_info in RewriteInstance to update lexical blocks attributes (in fact, this part is very similar to what was done to function address ranges and I just refactored/reused that code) - Added small test case (lexical_blocks_address_ranges_debug.test) (cherry picked from FBD3113181)	2016-03-28 17:45:22 -07:00
Maksim Panchenko	595d0885d9	Populate function execution count while parsing fdata. Summary: Populate function execution count while parsing fdata. Before we used a quadratic algorithm to populate the execution count (had to iterate over all branches for every single function). Ignore non-symbol to non-symbol branches while parsing fdata. These changes combined drop HHVM processing time from 4 minutes 53 seconds down to 2 minutes 9 seconds on my devserver. Test case had to be modified since it contained irrelevant branches from PLT to libc. (cherry picked from FBD3106263)	2016-03-28 11:06:28 -07:00
Gabriel Poesia	dc7cc1fb18	Fix default line number information for instructions. Summary: The line number information generated from a null pointer was actually valid, which caused new instructions without the line number information set to have a valid and wrong line number reference. This diff fixes this by making the null pointer be assigned to an invalid line number row. (cherry picked from FBD3048453)	2016-03-14 11:40:52 -07:00
Gabriel Poesia	77a6b72842	BOLT: Read and tie .debug_line info to IR. Summary: Reads information in the DWARF .debug_line section using LLVM and tie every MCInst to one line of a line table from the input binary. Subsequent diffs will update this information to match the final binary layout and output updated line tables. (cherry picked from FBD2989813)	2016-02-25 16:57:07 -08:00
Maksim Panchenko	7f7d4af7e0	Add an option to use PT_GNU_STACK for new segment. Summary: Added an option to reuse existing program header entry. This option allows for bfd tools like strip and objcopy to operate on the optimized binary without destroying it. Also, all new sections are now properly marked in ELF. (cherry picked from FBD2943339)	2016-02-12 19:01:53 -08:00
Maksim Panchenko	d1526083fc	Rename binary optimizer to BOLT. Summary: BOLT - Binary Optimization and Layout Tool replaces FLO. I'm keeping .fdata extension for "feedback data". (cherry picked from FBD2908028)	2016-02-05 14:42:04 -08:00
Maksim Panchenko	628d06b1e5	Preserve layout of basic blocks with 0 profile counts. Summary: Preserve original layout for basic blocks that have 0 execution count. Since we don't optimize for size, it's better to rely on the original input order. (cherry picked from FBD2875335)	2016-01-21 14:18:30 -08:00
Maksim Panchenko	218c5f0916	Fix a bug with outlining first basic block. Summary: We should never outline the first basic block. Also add an option to accept a file with the list of functions to optimize. (cherry picked from FBD2868184)	2016-01-26 16:03:58 -08:00
Maksim Panchenko	89578e2314	Allow to partially split functions with exceptions. Summary: We could split functions with exceptions even without creating a new exception handling table. This limits us to only move basic blocks that never throw, and are not a start of a landing pad. (cherry picked from FBD2862937)	2016-01-22 16:45:39 -08:00
Maksim Panchenko	bbb745efa9	Don't create empty basic blocks. Fix CFI bug. Summary: Some basic blocks were created empty because they only contained alignment nop's. Ignore such nop's before basic block gets created. Fixed intermittent aborts related to CFI update. (cherry picked from FBD2844465)	2016-01-19 00:20:06 -08:00
Maksim Panchenko	4a44d187c6	Handle more CFI cases and some. Summary: * Update CFI state for larger range of functions to increase coverage. * Issue more warnings indicating reasons for skipping functions. * Print top called functions in the binary. (cherry picked from FBD2839734)	2016-01-16 14:58:22 -08:00
Maksim Panchenko	d9536e6092	Added an option to reverse original basic blocks order. Summary: Modified processing of "-reorder-blocks=" option and added an option to reverse original basic blocks order for testing purposes. (cherry picked from FBD2829862)	2016-01-13 17:19:40 -08:00
Maksim Panchenko	c9b7e3e09e	Write updated LSDA's. Summary: Write new exception ranges tables (LSDA's) into the output file. (cherry picked from FBD2828312)	2015-12-18 17:00:46 -08:00
Maksim Panchenko	e2fcb371a8	Ignore functions referencing symbol at 0x0. Summary: Binary code could be weird. It could include calls to address 0 and reference data at 0 (e.g. with lea on x86). LLVM JIT fatals while resolving relocations against symbols at address 0x0. For now we will stop emitting such code, i.e. we'll skip functions. (cherry picked from FBD28109837)	2015-12-16 17:56:49 -08:00
Maksim Panchenko	f7d7a85a24	Turn EH ranges support back on. Summary: Changed the way EH info is stored/extracted from call instruction. Make sure indirect calls work. (cherry picked from FBD28109629)	2015-12-15 17:06:27 -08:00
Rafael Auler	fb6e8c5d0b	Don't touch functions whose internal BBs are targets of interprocedural branches Summary: In a test binary, we found 8 cases where code in a function A would jump to the middle of another function B. In this case, we cannot reorder function B because this would change instruction offsets and break the program. This is pretty rare but can happen in code written in assembly. (cherry picked from FBD2719850)	2015-12-03 13:29:52 -08:00
Rafael Auler	ccbbb8f8b9	Teach llvm-flo how to split functions into hot and cold regions Summary: After basic block reordering, it may be possible that the reordered function is now larger than the original because of the following reasons: - jump offsets may change, forcing some jump instructions to use 4-byte immediate operand instead of the 1-byte, shorter version. - fall-throughs change, forcing us to emit an extra jump instruction to jump to the original fall-through at the end of a basic block. Since we currently do not change function addresses, we need to rewrite the function back in the binary in the original location. If it doesn't fit, we were dropping the function. This patch adds a flag -split-functions that tells llvm-flo to split hot functions into hot and cold separate regions. The hot region is written back in the original function location, while the cold region is written in a separate, far-away region reserved to flo via a linker script. This patch also adds the logic to create an extra FDE to supply unwinding information to the cold part of the function. Owing to this, we now need to rewrite .eh_frame_hdr to another location and patch the EH_FRAME ELF segment to point to this new .eh_frame_hdr. (cherry picked from FBD2677996)	2015-11-19 17:59:41 -08:00
Rafael Auler	38dac03e6b	Make llvm-flo print dynamic coverage of rewritten functions Summary: This is an attempt at determining the hotness of functions we are rewriting and help detect if we are discarding hot functions. This patch introduces logic to estimate the number of instructions executed in each function by using the profile data for branches. It sums the products of BB frequency and size. Since we can only do this for functions we have successfully disassembled, created the CFG and annotated with profiling data, all complex functions that were not disassembled are left out from this analysis. (cherry picked from FBD2654985)	2015-11-13 15:27:59 -08:00
Rafael Auler	75798a891b	Do not bail on functions with indirect calls Summary: Previously, we were marking functions with indirect calls as too complex to be disassembled, but this was unnecessarily conservative. This patch removes this restriction. (cherry picked from FBD2669627)	2015-11-02 09:46:50 -08:00
Rafael Auler	6c851dc2e3	Attempts to fix CFI state after reordering Summary: This patch introduces logic to check how the CFI instructions define a table to help during stack unwinding at exception run time and attempts to fix any problem in this table that may have been introduced by reordering the basic blocks. If it fails to fix this problem, the function is marked as not simple and not eligible for rewriting. (cherry picked from FBD2633696)	2015-11-08 12:23:54 -08:00
Maksim Panchenko	bc9d6e3b6c	Regenerate exception handling information after optimizations. Summary: Regenerate exception handling information after optimizations. Use '-print-eh-ranges' to see CFG with updated ranges. (cherry picked from FBD2660982)	2015-11-13 14:18:45 -08:00
Maksim Panchenko	be2a19523c	Add exception handling information to CFG. Summary: Read .gcc_except_table and add information to CFG. Calls have extra operands indicating there's a possible handler for exceptions and an action. Landing pad information is recorded in BinaryFunction. Also convert JMP instructions that are calls into tail calls pseudo instructions so that they don't miss call instruction analysis. (cherry picked from FBD2652775)	2015-11-12 18:56:58 -08:00
Rafael Auler	a30d04c3e2	Annotate BinaryFunctions with MCCFIInstructions encoding CFI Summary: In order to represent CFI information in our BinaryFunction class, this patch adds a map of Offsets to CFI instructions. In this way, we make it easy to check exactly where DWARF CFI information is annotated in the disassembled function. (cherry picked from FBD2619216)	2015-11-04 16:48:47 -08:00
Rafael Auler	0e8998713c	Extract non-taken branch frequencies from LBR Summary: Previously, we inferred all non-taken branch frequencies with the information we had for taken branches. This patch teaches perf2flo and llvm-flo how to read and incorporate non-taken branch frequencies directly from the traces available in LBR data and by disassembling the binary. It still leaves the inference engine untouched in case we need it to fill out other fall-throughs. (cherry picked from FBD2589212)	2015-10-26 15:00:56 -07:00
Rafael Auler	13a520ab30	Implement two cluster layout heuristics Summary: Pettis' paper on block layout (PLDI'90) suggests we should order clusters (or chains, using the paper terminology) using a specific criterion. This patch implements two distinct ideas for cluster layout that can be activated using different command-line flags. The first one reflects Pettis' ideas on minimizing branch mispredictions and the second one is targeted at reducing I-cache misses, described in the Ispike paper (CGO'04). (cherry picked from FBD2588693)	2015-10-23 09:38:26 -07:00
Rafael Auler	2539539bde	Fixes priority queue ordering in llvm-flo block reordering Summary: Fixes a bug which caused the block reordering heuristic to put in the same cluster hot basic blocks and cold basic blocks, increasing I-cache misses. (cherry picked from FBD2588203)	2015-10-27 03:04:58 -07:00
Maksim Panchenko	d4d773458c	More control over function printing. Summary: Can use '-print-*' option to print function at specific stage. Use '-print-all' to print at every stage. (cherry picked from FBD2578196)	2015-10-23 15:52:59 -07:00
Maksim Panchenko	7f44331773	Issue warning when relaxed tail call is seen on input. Summary: Issue warning when we see a 2-byte tail call. Currently we will increase the size of these instructions. (cherry picked from FBD2575520)	2015-10-20 10:51:17 -07:00
Rafael Auler	546c4e6e84	Fix bug in BinaryFunction::fixBranches() in llvm-flo Summary: When the ignore-nops patch landed, it exposed a bug in fixBranches() where it ignored empty BBs. However, we cannot ignore empty BBs when it is reordered and its fall-through changes. We must update it with a jump to the original fall-through. This patch fixes this. (cherry picked from FBD2568244)	2015-10-21 16:25:16 -07:00
Rafael Auler	dc848b5376	Fix entry BB execution count in llvm-flo Summary: When we have tailcalls, the execution count for the entry point is wrongly computed. Fix this. (cherry picked from FBD2563112)	2015-10-20 16:48:54 -07:00
Rafael Auler	ab63ca9afb	Implement unreachable BB elimination in llvm-flo Summary: It is important to remove dead blocks to free up space in functions and allow us to reorder blocks or align branch targets with more freedom. This patch implements a simple algorithm to delete all basic blocks that are not reachable from the entry point. Note that C++ exceptions may create "unreachable" blocks, so this option must be used with care. (cherry picked from FBD2562637)	2015-10-20 12:47:37 -07:00
Rafael Auler	9f41a0d263	Do not schedule BBs before the entry point Summary: SPEC CPU2006 perlbench triggered a bug in our heuristic block reordering algorithm where a hot edge that targets the entry point (as in a recursive tail call) would make us try to allocate the call site before the function entry point. Since we don't update function addresses yet, moving the entry point will corrupt the program. This patch fixes this. (cherry picked from FBD2562528)	2015-10-20 12:30:22 -07:00
Rafael Auler	b0115a4536	Teach llvm-flo how to handle two back-to-back JMPs Summary: If we have two consecutive JMP instructions and no branches to the second one, the second one is dead code, but llvm-flo does not handle these cases properly and puts two JMPs in the same BB. This patch fixes this, putting the extraneous JMP in a separate block, making it easy for us to detect it is dead code and remove it later in a separate step. (cherry picked from FBD2562465)	2015-10-20 10:17:38 -07:00
Maksim Panchenko	85b99eb7b7	Eliminate nop instruction in input and derive alignment. Summary: Nop instructions are primarily used for alignment purposes on the input. We remove all nops when we build CFG and derive alignment of basic blocks based on existing alignment and a presence of nops before it. This will not always work as some basic blocks will be naturally aligned without necessity for nops. However, it's better than random alignment. We would also add heuristics for BB alignment based on execution profile. (cherry picked from FBD2561740)	2015-10-20 10:51:17 -07:00


62 Commits