Summary:
Add an option to optimize PLT calls:
-plt - optimize PLT calls (requires linking with -znow)
=none - do not optimize PLT calls
=hot - optimize executed (hot) PLT calls
=all - optimize all PLT calls
When optimized, the calls are converted to indirect calls through
the GOT. GOT entries are guaranteed to contain a valid
function pointer when lazy binding is disabled - hence the
requirement for the linker's -znow option.
Note: we could add an entry to .dynamic and drop the -znow
requirement if we moved .dynamic to a new segment.
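The reason eager binding makes the GOT safe to use directly can be pictured with a small model. This is an illustrative Python sketch, not BOLT code; the names and the dict-based "GOT" are stand-ins for the real dynamic-linker machinery:

```python
# Toy model of GOT-based dispatch. With lazy binding, a GOT entry may
# initially be unresolved; with -znow (eager binding) every entry is a
# valid function pointer before the first call, so a direct GOT load is safe.

def make_got(symbols, eager):
    """Build a toy GOT: eager entries are resolved up front."""
    if eager:
        return {name: fn for name, fn in symbols.items()}
    return {name: None for name in symbols}  # unresolved until first call

def call_via_got(got, name):
    """Call through the GOT without going through the PLT stub."""
    fn = got[name]
    assert fn is not None, "lazy GOT entry not yet resolved"
    return fn()

symbols = {"foo": lambda: 42}
eager_got = make_got(symbols, eager=True)
print(call_via_got(eager_got, "foo"))  # safe: eager binding resolved it
```

With `eager=False` the same call would trip the assertion, which mirrors why the optimization requires linking with -znow.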
(cherry picked from FBD5579789)
Summary:
We used to print dyno-stats after instruction lowering,
which skewed our metrics since, for one thing, tail calls were
no longer recognized as calls. The fix is to control the point
at which the dyno-stats printing pass is run and to run it
immediately before instruction lowering. In the future we
may decide to run the pass before some other intervening pass.
(cherry picked from FBD5605639)
Summary:
Fix an issue in memcpy where one of its entry points received
no profiling data and was wrongly considered cold, causing it to be
placed in the cold region.
(cherry picked from FBD5569156)
Summary:
While converting code that used __builtin_unreachable() we were
asserting that a basic block with a conditional jump and a single CFG
successor was the last one before converting the jump to an
unconditional one. However, if that code runs after a conditional
tail call conversion in the same function, the original last basic
block may no longer be the last one in the post-conversion layout.
I'm disabling the assertion since it doesn't seem worth it to add
extra checks for the basic block that used to be the last one.
(cherry picked from FBD5570298)
Summary:
* Improve profile matching for LTO binaries that don't match 100%.
* Fix profile matching for '.LTHUNK*' functions.
* Add external outgoing branches (calls) for profile validation.
This improves results for 100%-matched profiles and for stale
LTO profiles. However, we are still not fully closing the gap with
a stale profile when LTO is enabled.
(NOTE: I haven't updated all test cases yet)
(cherry picked from FBD5529293)
Summary:
Add a new command line option to BOLT: "-print-function-statistics=<uint64>",
which prints information about block ordering for the requested number of functions.
(cherry picked from FBD5105323)
Summary:
Don't treat conditional tail calls as branches for dynostats. Count
taken conditional tail calls as calls. Change SCTC to report dynamic
numbers after it is done.
(cherry picked from FBD5203708)
Summary:
Add an implementation for shrink wrapping, a frame optimization
that moves callee-saved register spills from hot prologues to cold
successors.
(cherry picked from FBD4983706)
Summary:
Some DWARF tags (such as GNU_call_site and label) reference instruction
addresses in the input binary. When we update debug info we need to
update these tags with new addresses as well.
Also fix base address used for calculation of output addresses in
relocation mode.
(cherry picked from FBD5155814)
Summary:
When producing address ranges and location lists for debug info
add a post-processing step that sorts them and merges adjacent
entries.
Fix a memory allocation/free issue for .debug_ranges section.
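The sort-and-merge post-processing step is a standard interval merge. A minimal sketch in Python (illustrative only; `merge_ranges` is a hypothetical helper, not BOLT's actual code):

```python
def merge_ranges(ranges):
    """Sort [start, end) address ranges and merge adjacent or
    overlapping entries into maximal contiguous ranges."""
    result = []
    for start, end in sorted(ranges):
        if result and start <= result[-1][1]:  # adjacent or overlapping
            result[-1] = (result[-1][0], max(result[-1][1], end))
        else:
            result.append((start, end))
    return result

# Two adjacent ranges collapse into one; the disjoint range is kept.
print(merge_ranges([(0x400, 0x410), (0x410, 0x420), (0x500, 0x510)]))
```

Merging adjacent entries this way shrinks .debug_ranges and location lists without changing the set of addresses they describe.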
(cherry picked from FBD5130583)
Summary:
This diff is similar to Bill's diff for optimizing jump tables
(and is built on top of it), but it differs in the strategy used to
optimize the jump table. The previous approach loads the target address
from the jump table and compares it to check if it is a hot target. This
reduces branch mispredictions by promoting the indirect jmp
to a (more predictable) direct jmp.
load %r10, JMPTABLE
cmp %r10, HOTTARGET
je HOTTARGET
ijmp [JMPTABLE + %index * scale]
The idea in this diff is instead to improve data-cache behavior by
avoiding the load from the jump table, leaving branch mispredictions
as a secondary target. To do this we compare the index used in the
indirect jmp and, if it matches a known hot entry, perform a direct
jump to the corresponding target.
cmp %index, HOTINDEX
je CORRESPONDING_TARGET
ijmp [JMPTABLE + %index * scale]
The downside of this approach is that we may have multiple indices
associated with a single target, but we only have profiling to show
which targets are hot and we have no clue about which indices are hot.
INDEX TARGET
0 4004f8
8 4004f8
10 4003d0
18 4004f8
Profiling data:
TARGET COUNT
4004f8 10020
4003d0 17
In this example, we know 4004f8 is hot, but to make a direct call to it
we need to check for indices 0, 8 and 18 -- 3 comparisons instead of 1.
Therefore, once we know a target is hot, we must generate code to
compare against all possible indices associated with this target,
because we don't know which of those indices is the hot one
(if any single index dominates).
cmp %index, 0
je 4004f8
cmp %index, 8
je 4004f8
cmp %index, 18
je 4004f8
(... up to N comparisons as in --indirect-call-promotion-topn=N )
ijmp [JMPTABLE + %index * scale]
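The reverse mapping from hot targets back to the indices that reach them can be sketched as follows (illustrative Python, not BOLT code; the function name and data shapes are assumptions for the example):

```python
def indices_for_hot_target(jump_table, counts, topn=1):
    """Given a jump table {index: target} and per-target execution
    counts, return the (index, target) comparisons needed to promote
    the hottest target(s)."""
    hot = sorted(counts, key=counts.get, reverse=True)[:topn]
    return [(idx, tgt) for idx, tgt in sorted(jump_table.items())
            if tgt in hot]

# The example from the text: three indices map to the hot target.
table = {0x0: 0x4004F8, 0x8: 0x4004F8, 0x10: 0x4003D0, 0x18: 0x4004F8}
counts = {0x4004F8: 10020, 0x4003D0: 17}
print(indices_for_hot_target(table, counts))
```

For the table above this yields three compares (indices 0x0, 0x8, 0x18) all guarding a direct jump to 0x4004f8, matching the generated code shown in the text.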
(cherry picked from FBD5005620)
Summary:
SCTC was sometimes adding unconditional branches to fallthrough blocks.
This diff checks whether the unconditional branch is really necessary,
i.e. that it does not target the fallthrough block.
(cherry picked from FBD5098493)
Summary:
Multiple improvements to debug info handling:
* Add support for relocation mode.
* Speed-up processing.
* Reduce memory consumption.
* Bug fixes.
The high-level idea behind the new debug handling is that we don't save
intermediate state for ranges and location lists. Instead we depend
on function and basic block address transformations to update the info
as a final post-processing step.
For HHVM in non-relocation mode the peak memory went down from 55GB to 35GB. Processing time went from over 6 minutes to under 5 minutes.
(cherry picked from FBD5113431)
Summary:
Add jump table support to ICP. The optimization is basically the same
as ICP for tail calls. The big difference is that the profiling data
comes from the jump table and the targets are local symbols rather than
global.
I've removed an instruction from ICP for tail calls. The code used to
have a conditional jump to a block with a direct jump to the target, i.e.
B1: cmp foo,(%rax)
jne B3
B2: jmp foo
B3: ...
this code is now:
B1: cmp foo,(%rax)
je foo
B2: ...
The other changes in this diff:
- Move ICP + new jump table support to separate file in Passes.
- Improve the CFG validation to handle jump tables.
- Fix the double jump peephole so that the successor of the modified
block is updated properly. Also make sure that any existing branches
in the block are modified to properly reflect the new CFG.
- Add an invocation of the double jump peephole to SCTC. This allows
us to remove a call to peepholes/UCE occurring after fixBranches() in
the pass manager.
- Miscellaneous cleanups to BOLT output.
(cherry picked from FBD4727757)
Summary:
When we have a conditional branch past the end of a function (a result
of a call to __builtin_unreachable()), we replace the branch with a nop,
but keep the branch information for validation purposes. If that branch
has a recorded profile we mistakenly create an additional successor
for the containing basic block (a 3rd successor).
Instead of adding the branch to the FTBranches list we should add it
to IgnoredBranches.
(cherry picked from FBD4912840)
Summary:
I split some of this out from the jumptable diff since it fixes the
double jump peephole.
I've changed the pass manager so that UCE and peepholes are not called
after SCTC. I've incorporated a call to the double jump fixer into SCTC
since it is needed to fix things up afterwards.
While working on fixing the double jump peephole I discovered a few
useless conditional branches that could be removed as well. I highly
doubt that removing them will improve perf at all but it does seem
odd to leave in useless conditional branches.
There are also some minor logging improvements.
(cherry picked from FBD4751875)
Summary:
When inlining, if a callee has debug info and a caller does not
(i.e. a containing compilation unit was compiled without "-g"), we try
to update a nonexistent compilation unit. Instead we should skip
updating debug info in such cases.
Minor refactoring of line number emitting code.
(cherry picked from FBD4823982)
Summary:
Each BOLT-specific option now belongs to BoltCategory or BoltOptCategory.
Use alphabetical order for options in source code (does not affect
output).
The result is a cleaner output of "llvm-bolt -help" which does not
include any unrelated llvm options and is close to the following:
.....
BOLT generic options:
-data=<string> - <data file>
-dyno-stats - print execution info based on profile
-hot-text - hot text symbols support (relocation mode)
-o=<string> - <output file>
-relocs - relocation mode - use relocations to move functions in the binary
-update-debug-sections - update DWARF debug sections of the executable
-use-gnu-stack - use GNU_STACK program header for new segment (workaround for issues with strip/objcopy)
-use-old-text - re-use space in old .text if possible (relocation mode)
-v=<uint> - set verbosity level for diagnostic output
BOLT optimization options:
-align-blocks - try to align BBs inserting nops
-align-functions=<uint> - align functions at a given value (relocation mode)
-align-functions-max-bytes=<uint> - maximum number of bytes to use to align functions
-boost-macroops - try to boost macro-op fusions by avoiding the cache-line boundary
-eliminate-unreachable - eliminate unreachable code
-frame-opt - optimize stack frame accesses
......
(cherry picked from FBD4793684)
Summary:
Add option '-print-only=func1,func2,...' to print only functions
of interest. The rest of the functions are still processed and
optimized (e.g. inlined), but only the ones on the list are printed.
(cherry picked from FBD4734610)
Summary:
Reduce verbosity of dynostats to make them more readable.
* Don't print "before" dynostats twice.
* Detect if dynostats have changed after optimization and print
  before/after only if at least one metric has changed. Otherwise
  just print dynostats once and indicate "no change".
* If a given metric hasn't changed, print the difference as
  "(=)" as opposed to "(+0.0%)".
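The delta formatting rule above amounts to a small special case. A sketch in Python (illustrative only; `format_delta` is a hypothetical helper name, not BOLT's API):

```python
def format_delta(before, after):
    """Print '(=)' for an unchanged metric instead of '(+0.0%)'."""
    if after == before:
        return "(=)"
    pct = 100.0 * (after - before) / before
    return "({:+.1f}%)".format(pct)

print(format_delta(1000, 1000))  # (=)
print(format_delta(1000, 900))   # (-10.0%)
```

Comparing the raw counts rather than the rounded percentage avoids printing "(+0.0%)" for tiny but nonzero changes only when they are exactly zero.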
(cherry picked from FBD4705920)
Summary:
While running on a recent test binary BOLT failed with an error. We
were trying to process '__hot_end' (which is not really a function),
and hit an assertion because it had no basic blocks.
This diff marks functions with an empty basic block list as non-simple
since there's no need to process them.
(cherry picked from FBD4696517)
Summary:
Fix validateCFG to handle BBs that were generated from code that used
__builtin_unreachable().
Add -verify-cfg option to run CFG validation after every optimization
pass.
(cherry picked from FBD4641174)
Summary:
Sometimes code written in assembly will have unmarked data (such as
constants) embedded in text.
Typically such data falls into the "padding" address space of a
function. This diff detects such references and adjusts the padding
space to prevent the data from being overwritten by code.
Note that in relocation mode we prefer to overwrite the original code
(-use-old-text) and thus cannot simply ignore data in text.
(cherry picked from FBD4662780)
Summary:
Calls to __builtin_unreachable() can result in an inconsistent CFG.
It was possible for a basic block to end with a conditional branch
and have a single successor, or for a non-terminated basic block
without successors to exist.
We also often treated conditional jumps with a destination past the
end of a function as conditional tail calls. This can be prevented
reliably at least when the byte past the end of the function does
not belong to the next function.
This diff includes several changes:
* At the disassembly stage, jumps past the end of a function are
  converted into 'nops'. This is done only for cases when we can
  guarantee that the jump is not a tail call. Conversion to nop is
  required since the instruction could be referenced by exception
  handling tables and/or debug info. Nops are later removed.
* In the CFG, insert a 'ret' into non-terminated basic blocks without
  successors (this almost never happens).
* Conditional jumps at the end of the function are removed from the
  CFG. The block will still have a single successor.
* Cases where the destination of a jump instruction is the start
  of the next function are still conservatively handled as
  (conditional) tail calls.
(cherry picked from FBD4655046)
Summary:
The new interface for handling Call Frame Information:
* CFI state at any point in a function (in CFG state) is defined by
CFI state at basic block entry and CFI instructions inside the
block. The state is independent of basic blocks layout order
(this is implied by CFG state but wasn't always true in the past).
* Use BinaryBasicBlock::getCFIStateAtInstr(const MCInst *Inst) to
get CFI state at any given instruction in the program.
* No need to call fixCFIState() after any given pass. fixCFIState()
is called only once during function finalization, and any function
transformations after that point are prohibited.
* When introducing new basic blocks, make sure CFI state at entry
is set correctly and matches CFI instructions in the basic block
(if any).
* When splitting basic blocks, use getCFIStateAtInstr() to get
a state at the split point, and set the new basic block's CFI
state to this value.
Introduce CFG_Finalized state to indicate that no further optimizations
are allowed on the function. This state is reached after we have synced
CFI instructions and updated EH info.
Rename "-print-after-fixup" option to "-print-finalized".
This diff fixes CFI for cases when we split conditional tail calls,
and for the indirect call promotion optimization.
(cherry picked from FBD4629307)
Summary:
Add a pass to strip the 'repz' prefix from 'repz retq' sequences. The
prefix is not used on Intel CPUs as far as I know. The pass is on by
default.
(cherry picked from FBD4610329)
Summary:
In a previous diff I added an option to update jump tables in-place (on by default)
and accidentally broke the default handling of jump tables in relocation
mode. The update should be happening semi-automatically, but because
we ignore relocations for jump tables it wasn't happening (derp).
Since we mostly use '-jump-tables=move' this hasn't been noticed for
some time.
This diff gets rid of IgnoredRelocations and removes relocations
from a relocation set when they are no longer needed. If relocations
are created later for jump tables they are no longer ignored.
(cherry picked from FBD4595159)
Summary:
gcc5 can generate new types of relocations that give the linker the
freedom to substitute instructions. These relocations are PC-relative,
and since we manually process such relocations they don't present
much of a problem.
Additionally, detect non-pc-relative access from code into a middle of
a function. Occasionally I've seen such code, but don't know exactly
how to trigger its generation. Just issue a warning for now.
(cherry picked from FBD4566473)
Summary:
Some functions coming from assembly may not have been marked
with a size. We assume the size to include all bytes up to
the next function/object in the file. As a result, the
function body will include any padding inserted by the linker.
If the linker inserts 0-value bytes, these can be misinterpreted
as an invalid instruction, and BOLT will bail out on such functions
in non-relocation mode, or give up on the binary in relocation
mode.
This diff detects zero-padding, ignores it, and continues processing
as normal.
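Detecting the padding is a simple scan back from the presumed end of the function body. An illustrative Python sketch (hypothetical helper; BOLT's actual check is done on the disassembled byte stream):

```python
def strip_zero_padding(body):
    """Drop trailing zero bytes that are linker padding, not code.
    Returns the function body without the padding suffix."""
    end = len(body)
    while end > 0 and body[end - 1] == 0:
        end -= 1
    return body[:end]

# A one-byte 'ret' (0xC3) followed by three bytes of linker padding.
print(strip_zero_padding(bytes([0xC3, 0x00, 0x00, 0x00])).hex())  # c3
```

Note this is only safe because a run of zero bytes at the very end of an unsized assembly function cannot be the start of a valid instruction stream the function falls through to.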
(cherry picked from FBD4528893)
Summary:
Perform indirect call promotion optimization in BOLT.
The code scans the instructions during CFG creation for all
indirect calls. Right now indirect tail calls are not handled
since the functions are marked not simple. The offsets of the
indirect calls are stored for later use by the ICP pass.
The indirect call promotion pass visits each indirect call and
examines the BranchData for each. If the most frequent targets
from that callsite exceed the specified threshold (default 90%),
the call is promoted. Otherwise, it is ignored. By default,
only one target is considered at each callsite.
When a candidate callsite is processed, we modify the callsite
to test for the most common call targets before calling through
the original generic call mechanism.
The CFG and layout are modified by ICP.
A few new command line options have been added:
-indirect-call-promotion
-indirect-call-promotion-threshold=<percentage>
-indirect-call-promotion-topn=<int>
The threshold is the minimum frequency of a call target needed
before ICP is triggered.
The topn option controls the number of targets to consider for
each callsite, e.g. ICP is triggered if topn=2 and the total
frequency of the top two call targets exceeds the threshold.
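The trigger condition can be sketched as follows (illustrative Python, not BOLT code; the function name and the 90% default mirror the options described above):

```python
def should_promote(target_counts, threshold=90.0, topn=1):
    """Decide whether to promote an indirect call: promote if the
    top-N targets cover at least `threshold` percent of all calls
    recorded for this callsite."""
    total = sum(target_counts.values())
    if total == 0:
        return False  # no profile data for this callsite
    top = sorted(target_counts.values(), reverse=True)[:topn]
    return 100.0 * sum(top) / total >= threshold

print(should_promote({"B::foo": 95, "C::foo": 5}))           # True
print(should_promote({"B::foo": 60, "C::foo": 40}))          # False
print(should_promote({"B::foo": 60, "C::foo": 40}, topn=2))  # True
```

The third call shows the topn=2 case from the text: neither target alone clears 90%, but together they do.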
Example of ICP:
C++ code:
int B_count = 0;
int C_count = 0;
struct A { virtual void foo() = 0; }
struct B : public A { virtual void foo() { ++B_count; }; };
struct C : public A { virtual void foo() { ++C_count; }; };
A* a = ...
a->foo();
...
original:
400863: 49 8b 07 mov (%r15),%rax
400866: 4c 89 ff mov %r15,%rdi
400869: ff 10 callq *(%rax)
40086b: 41 83 e6 01 and $0x1,%r14d
40086f: 4d 89 e6 mov %r12,%r14
400872: 4c 0f 44 f5 cmove %rbp,%r14
400876: 4c 89 f7 mov %r14,%rdi
...
after ICP:
40085e: 49 8b 07 mov (%r15),%rax
400861: 4c 89 ff mov %r15,%rdi
400864: 49 ba e0 0b 40 00 00 movabs $0x400be0,%r10
40086b: 00 00 00
40086e: 4c 3b 10 cmp (%rax),%r10
400871: 75 29 jne 40089c <main+0x9c>
400873: 41 ff d2 callq *%r10
400876: 41 83 e6 01 and $0x1,%r14d
40087a: 4d 89 e6 mov %r12,%r14
40087d: 4c 0f 44 f5 cmove %rbp,%r14
400881: 4c 89 f7 mov %r14,%rdi
...
40089c: ff 10 callq *(%rax)
40089e: eb d6 jmp 400876 <main+0x76>
(cherry picked from FBD3612218)
Summary:
Add an option to overwrite jump tables without moving and make it a
default:
-jump-tables - jump tables support (default=basic)
=none - do not optimize functions with jump tables
=basic - optimize functions with jump tables
=move - move jump tables to a separate section
=split - split jump tables section into hot and cold based on
function execution frequency
=aggressive - aggressively split jump tables section based on usage of
the tables
(cherry picked from FBD4448499)
Summary:
In non-relocation mode, when we run ICF the second time,
we fold the same functions again since they were not
removed from the function set. This diff marks them as
folded and ignores them during the ICF optimization. Note
that we still want to optimize such functions since they
are potentially called from code not covered by BOLT
in non-relocation mode.
Folded functions are also excluded from dyno stats with
this diff.
Also print the number of times folded functions were called.
When two functions, f1() and f2(), are folded, that number
would be min(call_frequency(f1), call_frequency(f2)).
(cherry picked from FBD4399993)
Summary:
Re-worked the way ICF operates. The pass now checks for more than just
call instructions, but also for all references including function
pointers. Jump tables are handled too.
(cherry picked from FBD4372491)
Summary:
An optimization to simplify conditional tail calls by removing unnecessary branches. It adds the following two command line options:
-simplify-conditional-tail-calls - simplify conditional tail calls by removing unnecessary jumps
-sctc-mode - mode for simplify conditional tail calls
=always - always perform sctc
=preserve - only perform sctc when branch direction is preserved
=heuristic - use branch prediction data to control sctc
This optimization considers both of the following cases:
foo: ...
jcc L1 original
...
L1: jmp bar # TAILJMP
->
foo: ...
jcc bar iff jcc L1 is expected
...
L1 is unreachable
OR
foo: ...
jcc L2
L1: jmp dest # TAILJMP
L2: ...
->
foo: jncc dest # TAILJMP
L2: ...
L1 is unreachable
For this particular case, the first basic block ends with a conditional branch and has two successors, one fall-through and one for when the condition is true. The target of the conditional is a basic block with a single unconditional branch (i.e. tail call) to another function. We don't care about the contents of the fall-through block.
(cherry picked from FBD3719617)
Summary:
This is part of a series of clean-up patches to make BOLT
compile cleanly with clang 4.0. This patch fixes an error where clang
will fail to compile because it does not support passing a
const_iterator to std::vector<T>::emplace(Iter, ...).
(cherry picked from FBD4242546)
Summary:
In order to improve the gdb experience with BOLT we have to make
sure the output file has a single .eh_frame section. Otherwise
gdb will use either the old or the new section for unwinding purposes.
This diff relocates the original .eh_frame section next to
the new one generated by LLVM. Later we merge the two sections
into one and make sure only the newly created section has
the .eh_frame name.
(cherry picked from FBD4203943)
Summary:
AVX-512 disassembler support in LLVM is not quite ready yet.
Until we feel more comfortable with it, we disable processing
of all functions that use any EVEX-encoded instructions.
(cherry picked from FBD4028706)
Summary:
Modified function discovery process to tolerate more functions and
symbols coming from assembly. The processing order now matches
the memory order of the functions (input symbol table is unsorted).
Added basic support for functions with multiple entry points. When
a function references one of its internal addresses with something
other than a branch instruction, that address could potentially
escape.
We mark such addresses as entry points and make sure they
are treated as roots by unreachable code elimination.
Without relocations we have to mark multiple-entry functions
as non-simple.
(cherry picked from FBD3950243)
Summary:
Added support for jump tables in code compiled with "-fpic".
Code pattern generated for position-independent jump tables
is quite different, as is the format of the tables.
More details in comments.
Coverage increased slightly for a test, mostly due to code
coming from an external lib that was compiled with "-fpic".
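The key format difference mentioned above is that position-independent jump tables typically store offsets relative to the table itself rather than absolute target addresses. A hedged sketch of the address computation (illustrative Python; entry width and signedness vary by compiler):

```python
def read_pic_jump_table(table_addr, entries, index):
    """PIC jump tables commonly store signed offsets relative to the
    table base; the branch target is table address + entry."""
    return table_addr + entries[index]

table_addr = 0x400A00
pic_entries = [-0x200, -0x1F0, 0x30]  # offsets from the table base
print(hex(read_pic_jump_table(table_addr, pic_entries, 0)))  # 0x400800
```

In a non-PIC table each entry would simply be the absolute target address, which is why the code patterns BOLT must recognize differ between the two.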
(cherry picked from FBD3940771)
Summary:
Allow UCE when blocks have EH info. Since UCE may remove blocks
that are referenced from debugging info data structures, we don't
actually delete them. We just mark them with an "invalid" index
and store them in a different vector to be cleaned up later once
the BinaryFunction is destroyed. The debugging code just skips
any BBs that have an invalid index.
Eliminating blocks may also expose useless jmp instructions, i.e.
a jmp around a dead block could just be a fallthrough. I've added
a new routine to clean up these jmps, though @maks is working on
changing fixBranches() so that it can be used instead.
(cherry picked from FBD3793259)
Summary:
Add level for "-jump-tables=<n>" option:
1 - all jump tables are output in the same section (default).
2 - basic splitting, if the table is used it is output to hot section
otherwise to cold one.
3 - aggressively split compound jump tables and collect profile for
all entries.
Option "-print-jump-tables" outputs all jump tables for debugging
and/or analysis purposes. Use with "-jump-tables=3" to get profile
values for every entry in a jump table.
(cherry picked from FBD3912119)
Summary:
Get rid of all uses of getIndex/getLayoutIndex/getOffset outside of BinaryFunction.
Also made some other offset related methods private.
(cherry picked from FBD3861968)
Summary:
Add -print-sorted-by and -print-sorted-by-order command line options.
The first option takes a list of dyno stats keys used to sort functions
that are printed at the end of all optimization passes. Only the top
100 functions are printed. The -print-sorted-by-order option can be
either ascending or descending (descending is the default).
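The selection logic amounts to a multi-key sort with a cutoff. An illustrative Python sketch (hypothetical helper and stat names, not BOLT's implementation):

```python
def top_functions(stats, keys, order="descending", limit=100):
    """Sort functions by the given dyno stats keys (in order of
    priority) and keep only the top `limit` entries."""
    ranked = sorted(stats,
                    key=lambda fn: tuple(stats[fn][k] for k in keys),
                    reverse=(order == "descending"))
    return ranked[:limit]

stats = {"foo": {"taken-branches": 500, "calls": 10},
         "bar": {"taken-branches": 500, "calls": 90},
         "baz": {"taken-branches": 100, "calls": 5}}
print(top_functions(stats, ["taken-branches", "calls"]))  # ['bar', 'foo', 'baz']
```

Passing several keys makes later keys act as tie-breakers, which is what lets a list like "-print-sorted-by=key1,key2" produce a stable, meaningful ranking.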
(cherry picked from FBD3898818)