
intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-01-25 10:55:58 +08:00

Author	SHA1	Message	Date
Bill Nell	0e4d86bf19	[BOLT] Refactor global symbol handling code. Summary: This is preparation work for static data reordering. I've created a new class called BinaryData which represents a symbol contained in a section. It records almost all the information relevant for dealing with data, e.g. names, address, size, alignment, profiling data, etc. BinaryContext still stores and manages BinaryData objects similar to how it managed symbols and global addresses before. The interfaces are not changed too drastically from before either. There is a bit of overlap between BinaryData and BinaryFunction. I would have liked to do some more refactoring to make a BinaryFunctionFragment that subclassed from BinaryData and then have BinaryFunction be composed or associated with BinaryFunctionFragments. I've also attempted to use (symbol + offset) for when addresses are pointing into the middle of symbols with known sizes. This changes the simplify rodata loads optimization slightly since the expression on an instruction can now also be a (symbol + offset) rather than just a symbol. One of the overall goals for this refactoring is to make sure every relocation is associated with a BinaryData object. This requires adding "hole" BinaryData's wherever there are gaps in a section's address space. Most of the holes seem to be data that has no associated symbol info. In this case we can't do any better than lumping all the adjacent hole symbols into one big symbol (there may be more than one actual data object that contributes to a hole). At least the combined holes should be moveable. Jump tables have similar issues. They appear to mostly be sub-objects for top level local symbols. The main problem is that we can't recognize jump tables at the time we scan the symbol table, we have to wait til disassembly. When a jump table is discovered we add it as a sub-object to the existing local symbol. If there are one or more existing BinaryData's that appear in the address range of a newly created jump table, those are added as sub-objects as well. (cherry picked from FBD6362544)	2017-11-14 20:05:11 -08:00
Rafael Auler	6d0401ccfb	[BOLT/LSDA] Fix alignment Summary: Fix a bug introduced by rebasing with respect to aligned ULEBs. This wasn't breaking anything but it is good to keep LSDA aligned. (cherry picked from FBD7094742)	2018-02-26 20:09:14 -08:00
Rafael Auler	8a5a30156e	[BOLT rebase] Rebase fixes on top of LLVM Feb2018 Summary: This commit includes all code necessary to make BOLT work again after the rebase. This includes a redesign of the EHFrame work, cherry-pick of the 3dnow disassembly work, compilation error fixes, and port of the debug_info work. The macroop fusion feature is not ported yet. The rebased version has minor changes to the "executed instructions" dynostats counter because REP prefixes are considered a part of the instruction they apply to. Also, some X86 instructions had the "mayLoad" tablegen property removed, which BOLT uses to identify and account for loads, thus reducing the total number of loads reported by dynostats. This was observed in X86::MOVDQUmr. TRAP instructions are not terminators anymore, changing our CFG. This commit adds compensation to preserve this old behavior and minimize test changes. debug_info sections are now slightly larger. The discriminator field in the line table is slightly different due to a change upstream. New profiles generated with the other bolt are incompatible with this version because of different hash values calculated for functions, so they will be considered 100% stale. This commit changes the corresponding test to XFAIL so it can be updated. The hash function changes because it relies on raw opcode values, which change according to the opcodes described in the X86 tablegen files. When processing HHVM, bolt was observed to be using about 800MB more memory in the rebased version and being about 5% slower. (cherry picked from FBD7078072)	2018-02-06 15:00:23 -08:00
Rafael Auler	1fa80594cf	[BOLT] Do not assign a LP to tail calls Summary: Do not assign a LP to tail calls. They are not calls in the view of an unwinder, they are just regular branches. We were hitting an assertion in BinaryFunction::removeConditionalTailCalls() complaining about landing pads in a CTC, however it was in fact a builtin_unreachable being conservatively treated as a CTC. (cherry picked from FBD6564957)	2017-12-13 19:08:43 -08:00
Bill Nell	b2f132c7c2	[RFC] [BOLT] Use iterators for MC branch/call analysis code. Summary: Here's an implementation of an abstract instruction iterator for the branch/call analysis code in MCInstrAnalysis. I'm posting it up to see what you guys think. It's a bit sloppy with constness and probably needs more tidying up. (cherry picked from FBD6244012)	2017-11-04 19:22:05 -07:00
Bill Nell	46866f5fa0	[BOLT] Refactor branch analysis code. Summary: Move the indirect branch analysis code from BinaryFunction to MCInstrAnalysis/X86MCTargetDesc.cpp. In the process of doing this, I've added an MCRegInfo to MCInstrAnalysis which allowed me to remove a bunch of extra method parameters. I've also had to refactor how BinaryFunction held on to instructions/offsets so that it would be easy to pass a sequence of instructions to the analysis code (rather than a map keyed by offset). Note: I think there are a bunch of MCInstrAnalysis methods that have a BitVector output parameter that could be changed to a return value since the size of the vector is based on the number of registers, i.e. from MCRegisterInfo. I haven't done this in order to keep the diff a more manageable size. (cherry picked from FBD6213556)	2017-11-01 10:26:07 -07:00
Maksim Panchenko	1288c81c9b	[BOLT][Refactoring] Change landing pads handling Summary: Change the way we store and handle landing pads and throwers. (cherry picked from FBD6169992)	2017-10-26 18:36:30 -07:00
Maksim Panchenko	b006d2a860	[BOLT] Fix issue with exception handlers splitting Summary: A cold part of a function can start with a landing pad. As a result, this landing pad will have offset 0 from the start of the corresponding FDE, and it wouldn't get registered by exception-handling runtime. The solution is to use a different landing pad base address (LPStart), such as (FDE_start - 1). (cherry picked from FBD5876561)	2017-09-20 13:32:46 -07:00
Maksim Panchenko	bd8e4b9e87	[BOLT] Support PIC-style exception tables Summary: Exceptions tables for PIC may contain indirect type references that are also encoded using relative addresses. This diff adds support for such encodings. We read PIC-style type info table, and write it using new encoding. (cherry picked from FBD5716060)	2017-08-27 17:04:06 -07:00
Maksim Panchenko	49d1f5698d	[BOLT] PLT optimization Summary: Add an option to optimize PLT calls: -plt - optimize PLT calls (requires linking with -znow) =none - do not optimize PLT calls =hot - optimize executed (hot) PLT calls =all - optimize all PLT calls When optimized, the calls are converted to use GOT reference indirectly. GOT entries are guaranteed to contain a valid function pointer if lazy binding is disabled - hence the requirement for linker's -znow option. Note: we can add an entry to .dynamic and drop a requirement for -znow if we were moving .dynamic to a new segment. (cherry picked from FBD5579789)	2017-08-04 11:21:05 -07:00
Maksim Panchenko	d27b31ee07	[BOLT] Fix reading LSDA address for PIC code Summary: Fix a bug while reading LSDA address in PIC format. The base address was wrong for PC-relative value. There's more work involved in making PIC code with C++ exceptions work. (cherry picked from FBD5538755)	2017-08-01 11:19:01 -07:00
Maksim Panchenko	0bde796e50	[BOLT] Organize options in categories for pretty printing (near NFC). Summary: Each BOLT-specific option now belongs to BoltCategory or BoltOptCategory. Use alphabetical order for options in source code (does not affect output). The result is a cleaner output of "llvm-bolt -help" which does not include any unrelated llvm options and is close to the following: ..... BOLT generic options: -data=<string> - <data file> -dyno-stats - print execution info based on profile -hot-text - hot text symbols support (relocation mode) -o=<string> - <output file> -relocs - relocation mode - use relocations to move functions in the binary -update-debug-sections - update DWARF debug sections of the executable -use-gnu-stack - use GNU_STACK program header for new segment (workaround for issues with strip/objcopy) -use-old-text - re-use space in old .text if possible (relocation mode) -v=<uint> - set verbosity level for diagnostic output BOLT optimization options: -align-blocks - try to align BBs inserting nops -align-functions=<uint> - align functions at a given value (relocation mode) -align-functions-max-bytes=<uint> - maximum number of bytes to use to align functions -boost-macroops - try to boost macro-op fusions by avoiding the cache-line boundary -eliminate-unreachable - eliminate unreachable code -frame-opt - optimize stack frame accesses ...... (cherry picked from FBD4793684)	2017-03-28 14:40:20 -07:00
Maksim Panchenko	6dc2351505	[BOLT] New CFI handling policy. Summary: The new interface for handling Call Frame Information: * CFI state at any point in a function (in CFG state) is defined by CFI state at basic block entry and CFI instructions inside the block. The state is independent of basic blocks layout order (this is implied by CFG state but wasn't always true in the past). * Use BinaryBasicBlock::getCFIStateAtInstr(const MCInst Inst) to get CFI state at any given instruction in the program. No need to call fixCFIState() after any given pass. fixCFIState() is called only once during function finalization, and any function transformations after that point are prohibited. * When introducing new basic blocks, make sure CFI state at entry is set correctly and matches CFI instructions in the basic block (if any). * When splitting basic blocks, use getCFIStateAtInstr() to get a state at the split point, and set the new basic block's CFI state to this value. Introduce CFG_Finalized state to indicate that no further optimizations are allowed on the function. This state is reached after we have synced CFI instructions and updated EH info. Rename "-print-after-fixup" option to "-print-finalized". This diff fixes CFI for cases when we split conditional tail calls, and for indirect call promotion optimization. (cherry picked from FBD4629307)	2017-02-24 21:59:33 -08:00
Maksim Panchenko	55fc5417f8	Relocations support for BOLT. Summary: Read relocation from linker and relocate all functions. (cherry picked from FBD4223901)	2016-09-27 19:09:38 -07:00
Rafael Auler	5c0e4b6a57	Fix undefined behavior in DebugInfo Summary: The CFI instructions parser in libDebugInfo was relying on undefined behavior to parse operands by assuming that the order in which function parameters are evaluated at a function call site is defined (it is not). This patch fixes this and makes our clang and gcc tests agree. It also fixes wrong LIT tests in our codebase with respect to the order of DW_CFA_def_cfa operands. (cherry picked from FBD4255227)	2016-11-30 15:52:24 -08:00
Maksim Panchenko	a7fb610eba	Relocate old .eh_frame section next to the new one. Summary: In order to improve gdb experience with BOLT we have to make sure the output file has a single .eh_frame section. Otherwise gdb will use either old or new section for unwinding purposes. This diff relocates the original .eh_frame section next to the new one generated by LLVM. Later we merge two sections into one and make sure only the newly created section has .eh_frame name. (cherry picked from FBD4203943)	2016-11-11 14:33:34 -08:00
Maksim Panchenko	809c28f585	Generate .eh_frame_hdr based on contents of .eh_frame's. Summary: We used to patch an existing .eh_frame_hdr and append contents for split functions at the end. However, this approach does not work in relocation mode since function addresses change and split functions will not necessarily be at the end. Instead of patching and appending we generate the new .eh_frame_hdr based on contents of old and new .eh_frame sections. (cherry picked from FBD4180756)	2016-11-14 16:39:55 -08:00
Maksim Panchenko	055dfe48e7	Another EH fix for cold fragments of functions that we fail to write. Summary: In a prev diff I disabled inclusion of FDEs for cold fragments that we fail to write. The side effect of it was that we failed to write FDE for the next function with a cold fragment since it had the same assigned address that we had put in FailedAddresses. The correct fix is to assign zero address to failed cold fragments and ignore them when we write .eh_frame_hdr. (cherry picked from FBD4156740)	2016-11-09 11:19:02 -08:00
Rafael Auler	bc8cb088c0	Support DWARF expressions in CFI instructions Summary: Modify the MC layer (MCDwarf.h\|cpp) to understand CFI instructions dealing with DWARF expressions. Add code to emit DWARF expressions in MCDwarf. Change llvm-bolt to pass these CFI instructions to streamer instead of bailing on them. Change -dump-eh-frame option in llvm-bolt to dump the EH frame of the rewritten binary in addition to the one in the original binary, allowing us to properly test this patch. (cherry picked from FBD4194452)	2016-11-15 10:40:00 -08:00
Maksim Panchenko	e241e9c156	New function discovery and support for multiple entries. Summary: Modified function discovery process to tolerate more functions and symbols coming from assembly. The processing order now matches the memory order of the functions (input symbol table is unsorted). Added basic support for functions with multiple entries. When a function references its internal address other than with a branch instruction, that address could potentially escape. We mark such addresses as entry points and make sure they are treated as roots by unreachable code elimination. Without relocations we have to mark multiple-entry functions as non-simple. (cherry picked from FBD3950243)	2016-09-29 11:19:06 -07:00
Maksim Panchenko	6bef336cc2	Add dyno stats to BOLT. Summary: Add "-dyno-stats" option that prints instruction stats based on the execution profile similar to below: BOLT-INFO: program-wide dynostats after optimizations: executed forward branches : 109706407 (+8.1%) taken forward branches : 13769074 (-55.5%) executed backward branches : 24517582 (-25.0%) taken backward branches : 15330256 (-27.2%) executed unconditional branches : 6009826 (-35.5%) function calls : 17192114 (+0.0%) executed instructions : 837733057 (-0.4%) total branches : 140233815 (-2.3%) taken branches : 35109156 (-42.8%) Also fixed pseudo instruction discrepancies and added assertions for BinaryBasicBlock::getNumPseudos() to make sure the number is synchronized with real number of pseudo instructions. (cherry picked from FBD3826995)	2016-08-29 21:11:22 -07:00
Bill Nell	48b55300e0	BOLT: Make most command line options ZeroOrMore. Summary: This will make it easier to run experiments with the same baseline BOLT binary but different command line options. (cherry picked from FBD3831978)	2016-09-07 14:41:56 -07:00
Bill Nell	c27a6a5c63	Add verbosity level and clean up stream usage. Summary: I've added a verbosity level to help keep the BOLT spewage to a minimum. The default level is pretty terse now, level 1 is closer to the original, I've saved level 2 for the noisiest of messages. Error messages should never be suppressed by the verbosity level, only warnings and info messages. The rationale behind stream usage is as follows: outs() for info and debugging controlled by command line flags. errs() for errors and warnings. dbgs() for output within DEBUG(). With the exception of a few of the level 2 messages I don't have any strong feelings about the others. (cherry picked from FBD3814259)	2016-09-02 14:15:29 -07:00
Maksim Panchenko	36df6057b0	Refactoring. Mainly NFC. Summary: Eliminated BinaryFunction::getName(). The function was confusing since the name is ambiguous. Instead we have BinaryFunction::getPrintName() used for printing and whenever unique string identifier is needed one can use getSymbol()->getName(). In the next diff I'll have a map from MCSymbol to BinaryFunction in BinaryContext to facilitate function lookup from instruction operand expressions. There's one bug fixed where the function was called only under assert() in ICF::foldFunction(). For output we update all symbols associated with the function. At the moment it has no effect on the generated binary but in the future we would like to have all symbols in the symbol table updated. (cherry picked from FBD3704790)	2016-08-07 12:35:23 -07:00
Bill Nell	82d76ae18b	Add MCInst annotation mechanism to MCInstrAnalysis class. Summary: Add three new MCOperand types: Annotation, LandingPad and GnuArgsSize. Annotation is used for associating random data with MCInsts. Clients can construct their own annotation types (subclassed from MCAnnotation) and associate them with instructions. Annotations are looked up by string keys. Annotations can be added, removed and queried using an instance of the MCInstrAnalysis class. The LandingPad operand is a MCSymbol, uint64_t pair used to encode exception handling information for call instructions. GnuArgsSize is used to annotate calls with the DW_CFA_GNU_args_size attribute. (cherry picked from FBD3597877)	2016-07-28 10:34:50 -07:00
Maksim Panchenko	4f44d60947	Special handling for GNU_args_size call frame instruction. Summary: GNU_args_size is a special kind of CFI that tells runtime to adjust %rsp when control is passed to a landing pad. It is used for annotating call instructions that pass (extra) parameters on the stack and there's a corresponding landing pad. It is also special in a way that its value is not handled by DW_CFA_remember_state/DW_CFA_restore_state instruction sequence that we utilize to restore the state after block re-ordering. This diff adds association of call instructions with GNU_args_size value when it's used. If the function does not use GNU_args_size, there is no overhead. Otherwise, we regenerate GNU_args_size instruction during code emission, i.e. after all optimizations and block-reordering. (cherry picked from FBD3201322)	2016-04-19 22:00:29 -07:00
Maksim Panchenko	9212a9ad69	Proper skipping of unsupported CFI instructions. Summary: Skip DW_CFA_expression and DW_CFA_val_expression instructions properly, according to DWARF spec. If CFI range does not match function range skip that function. (cherry picked from FBD3040502)	2016-03-10 23:03:17 -08:00
Maksim Panchenko	73e9afe99c	Don't abort on unknown CFI instructions. Summary: If we see an unknown CFI instruction, skip processing the function containing it instead of aborting execution. (cherry picked from FBD2964557)	2016-02-22 18:25:43 -08:00
Maksim Panchenko	7f7d4af7e0	Add an option to use PT_GNU_STACK for new segment. Summary: Added an option to reuse existing program header entry. This option allows for bfd tools like strip and objcopy to operate on the optimized binary without destroying it. Also, all new sections are now properly marked in ELF. (cherry picked from FBD2943339)	2016-02-12 19:01:53 -08:00
Maksim Panchenko	d1526083fc	Rename binary optimizer to BOLT. Summary: BOLT - Binary Optimization and Layout Tool replaces FLO. I'm keeping .fdata extension for "feedback data". (cherry picked from FBD2908028)	2016-02-05 14:42:04 -08:00
Maksim Panchenko	89578e2314	Allow to partially split functions with exceptions. Summary: We could split functions with exceptions even without creating a new exception handling table. This limits us to only move basic blocks that never throw, and are not a start of a landing pad. (cherry picked from FBD2862937)	2016-01-22 16:45:39 -08:00
Maksim Panchenko	c9b7e3e09e	Write updated LSDA's. Summary: Write new exception ranges tables (LSDA's) into the output file. (cherry picked from FBD2828312)	2015-12-18 17:00:46 -08:00
Maksim Panchenko	b42c72cbf6	Fix issues with some CFI instructions with gcc 4.9. Summary: Fixes some issues discovered after hhvm switched to gcc 4.9. Add support for DW_CFA_GNU_args_size instruction. Allow CFI instruction after the last instruction in a function. Reverse conditions of assert for DW_CFA_set_loc. (cherry picked from FBD28110096)	2015-12-18 20:26:44 -08:00
Maksim Panchenko	a6efd11c05	Code/comments cleanup. Summary: Consolidate cold function info under cold FragmentInfo. Minor code and comment mods to LSDA handling. (cherry picked from FBD28109981)	2015-12-17 12:59:15 -08:00
Maksim Panchenko	f7d7a85a24	Turn EH ranges support back on. Summary: Changed the way EH info is stored/extracted from call instruction. Make sure indirect calls work. (cherry picked from FBD28109629)	2015-12-15 17:06:27 -08:00
Rafael Auler	04c80af012	Don't choke on DW_CFA_def_cfa_expression and friends Summary: Our CFI parser in the LLVM library was giving up on parsing all CFI instructions when finding a single instruction with expression operands. Yet, all gcc-4.9 binaries seem to have at least one CFI instruction with expression operands (DW_CFA_def_cfa_expression). This patch fixes this and makes DebugInfo continue to parse other instructions, even though it does not completely parse DWARF expressions yet. However, this seems to be enough to allow llvm-flo to process gcc-4.9 binaries because the FDEs with DWARF expressions are linked to the PLT region, and not to functions that we process. If we ever try to read a function whose CFI depends on DWARF expression, which is unlikely, llvm-flo will assert. (cherry picked from FBD2693088)	2015-11-24 13:55:44 -08:00
Rafael Auler	ccbbb8f8b9	Teach llvm-flo how to split functions into hot and cold regions Summary: After basic block reordering, it may be possible that the reordered function is now larger than the original because of the following reasons: - jump offsets may change, forcing some jump instructions to use 4-byte immediate operand instead of the 1-byte, shorter version. - fall-throughs change, forcing us to emit an extra jump instruction to jump to the original fall-through at the end of a basic block. Since we currently do not change function addresses, we need to rewrite the function back in the binary in the original location. If it doesn't fit, we were dropping the function. This patch adds a flag -split-functions that tells llvm-flo to split hot functions into hot and cold separate regions. The hot region is written back in the original function location, while the cold region is written in a separate, far-away region reserved to flo via a linker script. This patch also adds the logic to create an extra FDE to supply unwinding information to the cold part of the function. Owing to this, we now need to rewrite .eh_frame_hdr to another location and patch the EH_FRAME ELF segment to point to this new .eh_frame_hdr. (cherry picked from FBD2677996)	2015-11-19 17:59:41 -08:00
Rafael Auler	1d248ec51b	Write .eh_frame and .eh_frame_hdr after reordering BBs Summary: This patch adds logic to detect when the binary has extra space reserved for us via the __flo_storage symbol. If this symbol is present, it means we have extra space in the binary to write extraneous information. When we write a new .eh_frame, we cannot discard the old .eh_frame because it may still contain relevant information for functions we do not reorder. Thus, we write the new .eh_frame into __flo_storage and patch the current .eh_frame_hdr to point to the new .eh_frame only for the functions we touched, generating a binary that works with a bi-.eh_frame model. (cherry picked from FBD2639326)	2015-11-10 15:20:50 -08:00
Rafael Auler	6c851dc2e3	Attempts to fix CFI state after reordering Summary: This patch introduces logic to check how the CFI instructions define a table to help during stack unwinding at exception run time and attempts to fix any problem in this table that may have been introduced by reordering the basic blocks. If it fails to fix this problem, the function is marked as not simple and not eligible for rewriting. (cherry picked from FBD2633696)	2015-11-08 12:23:54 -08:00
Maksim Panchenko	bc9d6e3b6c	Regenerate exception handling information after optimizations. Summary: Regenerate exception handling information after optimizations. Use '-print-eh-ranges' to see CFG with updated ranges. (cherry picked from FBD2660982)	2015-11-13 14:18:45 -08:00
Maksim Panchenko	56cca2fb5b	Fix LSDA reading issues. Summary: There were two issues: we were trying to process non-simple functions, i.e. functions that we don't fully understand, and then we failed to stop iterating if EH closing label was after the last instruction in a function. (cherry picked from FBD2664460)	2015-11-17 11:02:04 -08:00
Maksim Panchenko	be2a19523c	Add exception handling information to CFG. Summary: Read .gcc_except_table and add information to CFG. Calls have extra operands indicating there's a possible handler for exceptions and an action. Landing pad information is recorded in BinaryFunction. Also convert JMP instructions that are calls into tail calls pseudo instructions so that they don't miss call instruction analysis. (cherry picked from FBD2652775)	2015-11-12 18:56:58 -08:00
Rafael Auler	a30d04c3e2	Annotate BinaryFunctions with MCCFIInstructions encoding CFI Summary: In order to represent CFI information in our BinaryFunction class, this patch adds a map of Offsets to CFI instructions. In this way, we make it easy to check exactly where DWARF CFI information is annotated in the disassembled function. (cherry picked from FBD2619216)	2015-11-04 16:48:47 -08:00
Maksim Panchenko	de46e6fc07	Parse whole contents of .gcc_except_table even if we are not printing. Summary: We need to parse the whole contents of .gcc_except_table even if we are not printing exceptions. Otherwise we are missing type index table and miscalculate the size of the current table. (cherry picked from FBD2632965)	2015-11-09 12:27:13 -08:00
Rafael Auler	2088875656	Teach llvm-flo how to read .eh_frame information from binaries Summary: In order to reorder binaries with C++ exceptions, we first need to read DWARF CFI (call frame info) from binaries in a table in the .eh_frame ELF section. This table contains unwinding information we need to be aware of when reordering basic blocks, so as to avoid corrupting it. This patch also cleans up some code from Exceptions.cpp due to a refactoring where we moved some functions to the LLVM's libSupport. (cherry picked from FBD2614464)	2015-11-05 13:37:30 -08:00
Maksim Panchenko	7d592d0975	Verbose printing of actions from .gcc_except_table Summary: Print actions for exception ranges from .gcc_except_table. Types are printed as names if the name is available from symbol table. (cherry picked from FBD2612631)	2015-11-03 14:26:33 -08:00
Maksim Panchenko	21cc191ea8	Added function to parse and dump .gcc_except_table Summary: Use '-print-exceptions' option to dump contents of .gcc_except_table. (cherry picked from FBD2609925)	2015-11-02 11:50:53 -07:00
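
The argument-order problem described in commit 5c0e4b6a57 ("Fix undefined behavior in DebugInfo") can be sketched as follows. This is only an illustration of the general pitfall, not the code from that commit: the `Cursor` type and the `readDefCfaOperands` function are hypothetical names and are not part of libDebugInfo. C++ does not specify the order in which the arguments of a single call are evaluated, so two stateful reads written as arguments of one call may swap the operands of DW_CFA_def_cfa (a register followed by an offset, both ULEB128-encoded) depending on the compiler.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical byte cursor used only for this sketch.
struct Cursor {
  const std::vector<uint8_t> &Data;
  std::size_t Pos;

  explicit Cursor(const std::vector<uint8_t> &D) : Data(D), Pos(0) {}

  // Decode one unsigned LEB128 value and advance the cursor.
  uint64_t readULEB128() {
    uint64_t Value = 0;
    unsigned Shift = 0;
    while (Pos < Data.size()) {
      uint8_t Byte = Data[Pos++];
      Value |= static_cast<uint64_t>(Byte & 0x7f) << Shift;
      if ((Byte & 0x80) == 0)
        break;
      Shift += 7;
    }
    return Value;
  }
};

// Fragile pattern (shown as a comment on purpose): both reads are arguments
// of the same call, so a compiler may perform them in either order.
//   return std::make_pair(C.readULEB128(), C.readULEB128());

// Portable pattern: each operand is read in its own statement, so the reads
// happen in source order on every compiler.
std::pair<uint64_t, uint64_t> readDefCfaOperands(Cursor &C) {
  uint64_t Register = C.readULEB128();
  uint64_t Offset = C.readULEB128();
  return std::make_pair(Register, Offset);
}
```

With the bytes `0x07 0x88 0x01`, the portable version always yields register 7 and offset 136, whereas the fragile version could yield them swapped on some compilers.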

47 Commits