
intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-01-16 21:55:39 +08:00

Author	SHA1	Message	Date
Bill Nell	d74997c3cc	Indirect call promotion optimization. Summary: Perform indirect call promotion optimization in BOLT. The code scans the instructions during CFG creation for all indirect calls. Right now indirect tail calls are not handled since the functions are marked not simple. The offsets of the indirect calls are stored for later use by the ICP pass. The indirect call promotion pass visits each indirect call and examines the BranchData for each. If the most frequent targets from that callsite exceed the specified threshold (default 90%), the call is promoted. Otherwise, it is ignored. By default, only one target is considered at each callsite. When a candidate callsite is processed, we modify the callsite to test for the most common call targets before calling through the original generic call mechanism. The CFG and layout are modified by ICP. A few new command line options have been added: -indirect-call-promotion -indirect-call-promotion-threshold=<percentage> -indirect-call-promotion-topn=<int> The threshold is the minimum frequency of a call target needed before ICP is triggered. The topn option controls the number of targets to consider for each callsite, e.g. ICP is triggered if topn=2 and the total frequency of the top two call targets exceeds the threshold. Example of ICP: C++ code: int B_count = 0; int C_count = 0; struct A { virtual void foo() = 0; }; struct B : public A { virtual void foo() { ++B_count; }; }; struct C : public A { virtual void foo() { ++C_count; }; }; A* a = ... a->foo(); ... original: 400863: 49 8b 07 mov (%r15),%rax 400866: 4c 89 ff mov %r15,%rdi 400869: ff 10 callq *(%rax) 40086b: 41 83 e6 01 and $0x1,%r14d 40086f: 4d 89 e6 mov %r12,%r14 400872: 4c 0f 44 f5 cmove %rbp,%r14 400876: 4c 89 f7 mov %r14,%rdi ...
after ICP: 40085e: 49 8b 07 mov (%r15),%rax 400861: 4c 89 ff mov %r15,%rdi 400864: 49 ba e0 0b 40 00 00 movabs $0x400be0,%r10 40086b: 00 00 00 40086e: 4c 3b 10 cmp (%rax),%r10 400871: 75 29 jne 40089c <main+0x9c> 400873: 41 ff d2 callq *%r10 400876: 41 83 e6 01 and $0x1,%r14d 40087a: 4d 89 e6 mov %r12,%r14 40087d: 4c 0f 44 f5 cmove %rbp,%r14 400881: 4c 89 f7 mov %r14,%rdi ... 40089c: ff 10 callq *(%rax) 40089e: eb d6 jmp 400876 <main+0x76> (cherry picked from FBD3612218)	2016-09-07 18:59:23 -07:00
Maksim Panchenko	6ff1795d96	[BOLT] Support overwriting jump tables in-place. Summary: Add an option to overwrite jump tables without moving and make it a default: -jump-tables - jump tables support (default=basic) =none - do not optimize functions with jump tables =basic - optimize functions with jump tables =move - move jump tables to a separate section =split - split jump tables section into hot and cold based on function execution frequency =aggressive - aggressively split jump tables section based on usage of the tables (cherry picked from FBD4448499)	2017-01-17 15:49:59 -08:00
Maksim Panchenko	503c741d43	[BOLT] Report stale functions' percentage wrt all profiled functions. Summary: Report stale functions percentage with respect to all profiled functions instead of all simple functions in the binary. The new reporting format should make it more apparent if the profile is out-of-date. Compare: BOLT-INFO: 341 (16.7% of all profiled) functions have invalid (possibly stale) profile. vs old: BOLT-INFO: 341 (0.3%) functions have invalid (possibly stale) profile. (cherry picked from FBD4451746)	2017-01-23 13:08:40 -08:00
Maksim Panchenko	bc8a456309	ICF improvements. Summary: Re-worked the way ICF operates. The pass now checks not just call instructions but all references, including function pointers. Jump tables are handled too. (cherry picked from FBD4372491)	2016-12-21 17:13:56 -08:00
Maksim Panchenko	55fc5417f8	Relocations support for BOLT. Summary: Read relocation from linker and relocate all functions. (cherry picked from FBD4223901)	2016-09-27 19:09:38 -07:00
Rafael Auler	8609ad51e5	Detect default CFI frame instructions for the target Summary: Make BOLT resilient to changes in the LLVM's X86 target library by not hardwiring the list of default CIE instructions, but detecting it at run time. (cherry picked from FBD4200982)	2016-11-17 14:56:42 -08:00
Maksim Panchenko	a7fb610eba	Relocate old .eh_frame section next to the new one. Summary: In order to improve gdb experience with BOLT we have to make sure the output file has a single .eh_frame section. Otherwise gdb will use either old or new section for unwinding purposes. This diff relocates the original .eh_frame section next to the new one generated by LLVM. Later we merge two sections into one and make sure only the newly created section has .eh_frame name. (cherry picked from FBD4203943)	2016-11-11 14:33:34 -08:00
Maksim Panchenko	809c28f585	Generate .eh_frame_hdr based on contents of .eh_frame's. Summary: We used to patch an existing .eh_frame_hdr and append contents for split functions at the end. However, this approach does not work in relocation mode since function addresses change and split functions will not necessarily be at the end. Instead of patching and appending we generate the new .eh_frame_hdr based on contents of old and new .eh_frame sections. (cherry picked from FBD4180756)	2016-11-14 16:39:55 -08:00
Maksim Panchenko	055dfe48e7	Another EH fix for cold fragments of functions that we fail to write. Summary: In a prev diff I disabled inclusion of FDEs for cold fragments that we fail to write. The side effect of it was that we failed to write FDE for the next function with a cold fragment since it had the same assigned address that we had put in FailedAddresses. The correct fix is to assign zero address to failed cold fragments and ignore them when we write .eh_frame_hdr. (cherry picked from FBD4156740)	2016-11-09 11:19:02 -08:00
Rafael Auler	355dbd769e	Fix DW_CFA_def_cfa CFI duping in output binary Summary: CFI instructions may live in CIEs or FDEs. CIEs hold common instructions used across many FDEs. When replaying CFIs to the output binary, llvm-bolt needs to replay both instructions from CIE and the corresponding FDE for the function. However, some instructions need not be replayed because MCStreamer/MCDwarf and friends will write them by default in the output CIE. This patch fixes the code that tried to recognize one of these default instructions but was failing, resulting in an extra CFI instruction in each FDE we outputted. With this patch, the output binary should be a bit smaller. (cherry picked from FBD4194753)	2016-11-16 17:47:31 -08:00
Rafael Auler	bc8cb088c0	Support DWARF expressions in CFI instructions Summary: Modify the MC layer (MCDwarf.h|cpp) to understand CFI instructions dealing with DWARF expressions. Add code to emit DWARF expressions in MCDwarf. Change llvm-bolt to pass these CFI instructions to streamer instead of bailing on them. Change -dump-eh-frame option in llvm-bolt to dump the EH frame of the rewritten binary in addition to the one in the original binary, allowing us to properly test this patch. (cherry picked from FBD4194452)	2016-11-15 10:40:00 -08:00
Maksim Panchenko	0eb2559fee	Fix EH for cold fragments that we fail to write. Summary: When we fail to write functions that are too big, we have to effectively cancel their effect on exception handling by ignoring their FDE entries in .eh_frame while writing .eh_frame_hdr. This can happen to functions that we split too. In such cases the cold part has its own FDE and we have to ignore that one too. This doesn't happen very often - I've only seen one case on hhvm binary, however it is a potential issue. The fix is to add the cold part address to the list of failed-to-write addresses. (cherry picked from FBD3987984)	2016-10-07 09:34:16 -07:00
Maksim Panchenko	e241e9c156	New function discovery and support for multiple entries. Summary: Modified function discovery process to tolerate more functions and symbols coming from assembly. The processing order now matches the memory order of the functions (input symbol table is unsorted). Added basic support for functions with multiple entries. When a function references its internal address other than with a branch instruction, that address could potentially escape. We mark such addresses as entry points and make sure they are treated as roots by unreachable code elimination. Without relocations we have to mark multiple-entry functions as non-simple. (cherry picked from FBD3950243)	2016-09-29 11:19:06 -07:00
Maksim Panchenko	4464861a02	Support for splitting jump tables. Summary: Add level for "-jump-tables=<n>" option: 1 - all jump tables are output in the same section (default). 2 - basic splitting, if the table is used it is output to hot section otherwise to cold one. 3 - aggressively split compound jump tables and collect profile for all entries. Option "-print-jump-tables" outputs all jump tables for debugging and/or analyzing purposes. Use with "-jump-tables=3" to get profile values for every entry in a jump table. (cherry picked from FBD3912119)	2016-09-16 15:54:32 -07:00
Maksim Panchenko	c4e36c1dd6	Fix issue with zero-size duplicate function symbols. Summary: While working on PLT dyno stats I've noticed that we were missing BinaryFunctions for some symbols that were not PLT. Upon closer inspection turned out that those symbols were marked as zero-sized functions in symbol table, but they had duplicates with non-zero size. Since the zero-size symbols were preceding other duplicates, we were not creating BinaryFunction for them and they were not added as duplicates. The 2 most prominent functions that were missing for a test were free() and malloc(). There's not much to optimize in these functions, but they were contributing quite significantly to dyno stats. As a result dyno stats for this test needed an adjustment. Also several assembly functions (e.g. _init()) had zero size, and now we set the size to the max size and start processing those. It's good for coverage but will not affect the performance. (cherry picked from FBD3874622)	2016-09-15 15:47:10 -07:00
Maksim Panchenko	2f3a859772	Add experimental jump table support. Summary: Option "-jump-tables=1" enables experimental support for jump tables. The option hasn't been tested with optimizations other than block re-ordering. Only non-PIC jump tables are supported at the moment. (cherry picked from FBD3867849)	2016-09-14 16:45:40 -07:00
Bill Nell	71be567969	BOLT: Add per pass dyno stats + factor out post pass printing. Summary: I've added dyno stats printing per pass so we can see the results of each optimization pass on the stats. I've also factored out the post pass function printing code since it was pretty much the same after each pass. (cherry picked from FBD3843587)	2016-09-09 12:37:37 -07:00
Maksim Panchenko	c4c518ee9d	Rewrite SCTC pass to do UCE and make it the last optimization pass. Summary: For now we make SCTC a special pass that runs at the end of all optimizations and transformations right after fixupBranches(). Since it's the last pass, it has to do its own UCE. (cherry picked from FBD3838051)	2016-09-08 14:52:26 -07:00
Maksim Panchenko	6bef336cc2	Add dyno stats to BOLT. Summary: Add "-dyno-stats" option that prints instruction stats based on the execution profile similar to below: BOLT-INFO: program-wide dynostats after optimizations: executed forward branches : 109706407 (+8.1%) taken forward branches : 13769074 (-55.5%) executed backward branches : 24517582 (-25.0%) taken backward branches : 15330256 (-27.2%) executed unconditional branches : 6009826 (-35.5%) function calls : 17192114 (+0.0%) executed instructions : 837733057 (-0.4%) total branches : 140233815 (-2.3%) taken branches : 35109156 (-42.8%) Also fixed pseudo instruction discrepancies and added assertions for BinaryBasicBlock::getNumPseudos() to make sure the number is synchronized with real number of pseudo instructions. (cherry picked from FBD3826995)	2016-08-29 21:11:22 -07:00
Bill Nell	48b55300e0	BOLT: Make most command line options ZeroOrMore. Summary: This will make it easier to run experiments with the same baseline BOLT binary but different command line options. (cherry picked from FBD3831978)	2016-09-07 14:41:56 -07:00
Bill Nell	dcaffe64d3	Inlining fixes/enhancements Summary: A number of fixes/enhancements to inline-small-functions - Fixed size estimateHotSize to use computeCodeSize instead of the original layout offsets. - Added -print-inline option to dump CFGs for functions that have been modified by inlining. - Added flag to force consideration of functions without any profiling info (mostly for testing) - Updated debug line info for inlined functions. - Ignore the number of pseudo instructions when checking for candidates of suitable size. Misc changes - Moved most print flags to BinaryPasses.cpp (cherry picked from FBD3812658)	2016-09-02 11:58:53 -07:00
Bill Nell	c27a6a5c63	Add verbosity level and clean up stream usage. Summary: I've added a verbosity level to help keep the BOLT spewage to a minimum. The default level is pretty terse now, level 1 is closer to the original, I've saved level 2 for the noisiest of messages. Error messages should never be suppressed by the verbosity level, only warnings and info messages. The rationale behind stream usage is as follows: outs() for info and debugging controlled by command line flags. errs() for errors and warnings. dbgs() for output within DEBUG(). With the exception of a few of the level 2 messages I don't have any strong feelings about the others. (cherry picked from FBD3814259)	2016-09-02 14:15:29 -07:00
Maksim Panchenko	97f598fd17	Handling for indirect tail calls. Summary: Analyze indirect branches and convert them into indirect tail calls when possible. We analyze the memory contents when the address could be calculated statically and also detect epilogue code. (cherry picked from FBD3754395)	2016-08-22 14:24:09 -07:00
Maksim Panchenko	42c5894fe2	Write padding for .eh_frame_hdr to a file. Summary: We were applying padding to the calculated address but were never writing it to a file triggering an assertion for cases when .gcc_except_table size wasn't multiple of 4. (cherry picked from FBD3744638)	2016-08-19 13:54:35 -07:00
Bill Nell	c1d1c2e7cd	Check if operands are immediates before trying shortening. Summary: Operands in the initial instruction stream should all have immediate operands for instructions that can be shortened. But if a BOLT optimization pass adds one of these instructions with a symbolic operand, the shortening operation will assert. This diff adds checks to make sure that the operands are immediate. I've also disabled shortening pass by default since it won't really be needed until ICP is submitted. It will still run at CFG creation time. (cherry picked from FBD3610646)	2016-07-22 20:52:57 -07:00
Bill Nell	406aa62083	Add additional info to BOLT graphviz CFG dumps. Summary: Add the following info to the graphviz CFG dump: - Edges are labeled with the jmp instruction that leads to that edge. - Edges include the count and misprediction count. - Nodes have (offset, BB index, BB layout index) - Nodes optionally have tooltips which contain the code of the basic block. (enabled with -dot-tooltip-code) - Added dashed edges to landing pads. (cherry picked from FBD3646568)	2016-07-29 19:18:37 -07:00
Maksim Panchenko	003d106c0b	More refactoring work. Summary: Avoid referring to BinaryFunction's by name. Functions could be found by MCSymbol using BinaryContext::getFunctionForSymbol(). (cherry picked from FBD3707685)	2016-08-11 14:23:54 -07:00
Maksim Panchenko	36df6057b0	Refactoring. Mainly NFC. Summary: Eliminated BinaryFunction::getName(). The function was confusing since the name is ambiguous. Instead we have BinaryFunction::getPrintName() used for printing and whenever unique string identifier is needed one can use getSymbol()->getName(). In the next diff I'll have a map from MCSymbol to BinaryFunction in BinaryContext to facilitate function lookup from instruction operand expressions. There's one bug fixed where the function was called only under assert() in ICF::foldFunction(). For output we update all symbols associated with the function. At the moment it has no effect on the generated binary but in the future we would like to have all symbols in the symbol table updated. (cherry picked from FBD3704790)	2016-08-07 12:35:23 -07:00
Theodoros Kasampalis	a9bb3320ad	Identical Code Folding (ICF) pass Summary: Added an ICF pass to BOLT, that can recognize identical functions and replace references to these functions with references to just one representative. (cherry picked from FBD3460297)	2016-06-09 11:36:55 -07:00
Bill Nell	82401630a2	Factor out instruction printing and size computation. Summary: I've factored out the instruction printing and size computation routines to methods on BinaryContext. I've also added some more debug print functions. This was split off the ICP diff to simplify it a bit. (cherry picked from FBD3610690)	2016-07-23 08:01:53 -07:00
Theodoros Kasampalis	156a55209c	Simplification of loads from read-only data sections. Summary: Instructions that load data from a read-only data section and whose target address can be computed statically (e.g. RIP-relative addressing) are modified to corresponding instructions that use immediate operands. We apply the transformation only when the resulting instruction will have smaller or equal size. (cherry picked from FBD3397112)	2016-06-03 00:58:11 -07:00
Theodoros Kasampalis	17b846586c	Loop detection for BOLT's CFG. Summary: Loop detection for the CFG data structure. Added a GraphTraits specialization for BOLT's CFG that allows us to use LLVM's loop detection interface. (cherry picked from FBD3604837)	2016-05-26 10:58:01 -07:00
Bill Nell	ea53cffb2d	Add movabs -> mov shortening optimization. Add peephole optimization pass that does instruction shortening. Summary: Shorten when a mov instruction has a 64-bit immediate that can be represented as a sign extended 32-bit number, use the smaller mov instruction (MOV64ri -> MOV64ri32). Add peephole optimization pass that does instruction shortening. (cherry picked from FBD3603099)	2016-07-21 16:40:06 -07:00
Maksim Panchenko	c6d0c568d4	Add BinaryContext::getSectionForAddress() Summary: Interface for accessing section from BinaryContext. (cherry picked from FBD3600854)	2016-07-21 12:45:35 -07:00
Maksim Panchenko	f2d82919d0	Move debug-handling code into DWARFRewriter (NFC). Summary: RewriteInstance.cpp is getting too big. Split the code. (cherry picked from FBD3596103)	2016-05-31 19:12:26 -07:00
Bill Nell	674dbcc0de	Fix crash in patchELFPHDRTable when no functions are modified. Summary: patchELFPHDRTable was asserting that it could not find an entry for .eh_frame_hdr in SectionMapInfo when no functions were modified by BOLT. This just changes code to skip modifying GNU_EH_FRAME program headers when SectionMapInfo is empty. The existing header is copied and written instead. (cherry picked from FBD3557481)	2016-07-12 16:43:53 -07:00
Maksim Panchenko	84b5b9e462	Create alternative name for local symbols. Summary: If a profile data was collected on a stripped binary but an input to BOLT is unstripped, we would use a different mangling scheme for local functions and ignore their profiles. To solve the issue this diff adds alternative name for all local functions such that one of the names would match the name in the profile. If the input binary was stripped, we reject it, unless "-allow-stripped" option was passed. It's more complicated to do a matching in this case since we have less information than at the time of profile collection. It's also not that simple to tell if the profile was gathered on a stripped binary (in which case we would have no issue matching data). (cherry picked from FBD3548012)	2016-07-11 18:51:13 -07:00
Bill Nell	260f6fbdb6	Add option to dump CFGs in (simple) graphviz format during all passes. Summary: I noticed the BinaryFunction::viewGraph() method that hadn't been implemented and decided I could use a simple DOT dumper for CFGs while working on the indirect call optimization. I've implemented the bare minimum for the dumper. It's just nodes+BB labels with edges. We can add more detailed information as needed/desired. (cherry picked from FBD3509326)	2016-07-01 08:40:56 -07:00
Maksim Panchenko	f1192a7118	Support for multiple function names. Summary: With ICF optimization in the linker we were getting mismatches of function names in .fdata and BinaryFunction name. This diff adds support for multiple function names for BinaryFunction and does a match against all possible names for the profile. (cherry picked from FBD3466215)	2016-06-10 17:13:05 -07:00
Maksim Panchenko	70f82d9371	Reject profile data for functions that do not match. Summary: Verify profile data for a function and reject if there are branches that don't correspond to any branches in the function CFG. Note that we have to ignore branches resulting from recursive calls. Fix printing instruction offsets in disassembled state. Allow function to have non-zero execution count even if we don't have branch information. (cherry picked from FBD3451596)	2016-06-15 18:36:16 -07:00
Bill Nell	45e2219ae4	Allocate BinaryBasicBlocks with new rather than storing them in the BasicBlocks vector. Summary: This will help optimization passes that need to modify the CFG after it is constructed. Otherwise, the BinaryBasicBlock pointers stored in the layout, successors and predecessors would need to be modified every time a new basic block is created. (cherry picked from FBD3403372)	2016-06-07 16:27:52 -07:00
Maksim Panchenko	6da0d95326	Fix large functions debug info by default. Summary: Turn on -fix-debuginfo-large-functions by default. In the process of testing I've discovered that we output cold code for functions that were too large to be emitted. Fixed that. (cherry picked from FBD3372697)	2016-05-31 19:29:34 -07:00
Maksim Panchenko	4460da0d81	Improvements for debug info. Summary: Assembly functions could have no corresponding DW_AT_subprogram entries, yet they are represented in module ranges (and .debug_aranges) and will have line number information. Make sure we update those. Eliminated unnecessary data structures and optimized some passes. For .debug_loc unused location entries are no longer processed resulting in smaller output files. Overall it's a small processing time improvement and memory improvement. (cherry picked from FBD3362540)	2016-05-27 20:19:19 -07:00
Maksim Panchenko	06b9c5b342	Better .debug_line for non-simple functions. Summary: Generate .debug_line info for non-simple functions in a way that is preferable for 'objdump -S'. (cherry picked from FBD3345485)	2016-05-24 20:50:36 -07:00
Maksim Panchenko	7b97793b94	Fix for clang .debug_info. Summary: Clang uses different attribute for high_pc which was incompatible with the way we were updating ranges. This diff fixes it. (cherry picked from FBD3345537)	2016-05-24 14:54:23 -07:00
Maksim Panchenko	cfa5d753eb	Miscellaneous fixes for debug info. Summary: * Fix several cases for handling debug info: - properly update CU DW_AT_ranges for function with folded body due to ICF optimization - convert ranges to DW_AT_ranges from hi/low PC for all DIEs - add support for [a, a) range - update CU ranges even when there are no functions registered * Overwrite .debug_ranges section instead of appending. * Convert assertions in debug info handling part into warnings. (cherry picked from FBD3339383)	2016-05-23 19:36:38 -07:00
Maksim Panchenko	7ab3db129b	Create DW_AT_ranges for compile units. Summary: Some compile unit DIEs might be missing DW_AT_ranges because they were compiled without "-ffunction-sections" option. This diff adds the attribute to all compile units. If the section is not present, we need to create it. Will do it in a separate diff. (cherry picked from FBD3314984)	2016-05-17 18:10:14 -07:00
Maksim Panchenko	f047b9d43a	Overwrite contents of .debug_line section. Summary: Overwrite contents of .debug_line section since we don't reference the original contents anymore. This saves ~100MB of HHVM binary. (cherry picked from FBD3314917)	2016-05-16 17:02:17 -07:00
Maksim Panchenko	b445f5eb7b	Fix issue with garbage address in .debug_line. Summary: While emitting debug lines for a function we don't overwrite, we don't have a code section context that is needed by default writing routine. Hence we have to emit end_sequence after the last address, not at the end of section. (cherry picked from FBD3291533)	2016-05-11 19:13:38 -07:00
Bill Nell	f7e7e25b88	Put all optimization passes under the pass manager. Summary: Move eliminate unreachable code, block reordering, and CFI/exception fixup into official optimization passes. (cherry picked from FBD3248991)	2016-05-02 12:47:18 -07:00
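
The threshold/topn rule described in the indirect call promotion commit (d74997c3cc) can be sketched roughly as follows. This is a minimal illustration and not BOLT's actual code: `TargetCount` and `selectPromotionTargets` are hypothetical names, and treating the threshold as an inclusive minimum (`>=`) is an assumption, since the commit message says both "exceed" and "minimum frequency".

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record: one profiled target of an indirect callsite.
struct TargetCount {
  std::string Name;
  uint64_t Count;
};

// Returns the targets that would be tested before the generic indirect call,
// or an empty vector when the callsite does not qualify for promotion.
// ThresholdPercent and TopN mirror the meaning of the
// -indirect-call-promotion-threshold and -indirect-call-promotion-topn options.
std::vector<TargetCount>
selectPromotionTargets(std::vector<TargetCount> Targets,
                       unsigned ThresholdPercent, std::size_t TopN) {
  uint64_t Total = 0;
  for (const TargetCount &T : Targets)
    Total += T.Count;
  if (Total == 0 || TopN == 0)
    return {};

  // Most frequent targets first.
  std::stable_sort(Targets.begin(), Targets.end(),
                   [](const TargetCount &A, const TargetCount &B) {
                     return A.Count > B.Count;
                   });
  if (Targets.size() > TopN)
    Targets.erase(Targets.begin() + static_cast<std::ptrdiff_t>(TopN),
                  Targets.end());

  uint64_t TopSum = 0;
  for (const TargetCount &T : Targets)
    TopSum += T.Count;

  // Assumption: the threshold is an inclusive minimum share of all calls.
  const double Share = 100.0 * static_cast<double>(TopSum) /
                       static_cast<double>(Total);
  if (Share < static_cast<double>(ThresholdPercent))
    return {};
  return Targets;
}
```

With the defaults from the commit message (threshold 90, topn 1), a callsite that goes to `B::foo` 95% of the time would qualify, while a 60/40 split would be left alone.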


102 Commits