intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-01-18 16:50:51 +08:00

Author	SHA1	Message	Date
Bill Nell	f7e7e25b88	Put all optimization passes under the pass manager. Summary: Move eliminate unreachable code, block reordering, and CFI/exception fixup into official optimization passes. (cherry picked from FBD3248991)	2016-05-02 12:47:18 -07:00
Gabriel Poesia	5fa128e748	Inlining of small functions. Summary: Added an optimization pass of inlining calls to small functions (with only one basic block). Inlining is done in a very simple way, inserting instructions to simulate the changes to the stack pointer that call/ret would make before/after the inlined function executes. Also, the heuristic prefers to inline calls that happen in the hottest blocks (by looking at their execution count). Calls in cold blocks are ignored. (cherry picked from FBD3233516)	2016-04-25 14:25:58 -07:00
Gabriel Poesia	d1f525499e	Optimize calls to functions that are a single unconditional jump Summary: Many functions (around 600) in the HHVM binary are simply a single unconditional jump instruction to another function. These can be trivially optimized by modifying the call sites to directly call the branch target instead (because it also happens with more than one jump in sequence, we do it iteratively). This diff also adds a very simple analysis/optimization pass system in which this pass is the first one to be implemented. A follow-up to this could be to move the current optimizations to other passes. (cherry picked from FBD3211138)	2016-04-15 15:59:52 -07:00
Gabriel Poesia	e6acc7bb53	Optimize calls to functions that are a single unconditional jump Summary: Many functions (around 600) in the HHVM binary are simply a single unconditional jump instruction to another function. These can be trivially optimized by modifying the call sites to directly call the branch target instead (because it also happens with more than one jump in sequence, we do it iteratively). This diff also adds a very simple analysis/optimization pass system in which this pass is the first one to be implemented. A follow-up to this could be to move the current optimizations to other passes. (cherry picked from FBD3211138)	2016-04-15 15:59:52 -07:00
Gabriel Poesia	459eb8c230	Fix "Cannot update ranges for DIE at offset" error messages. Summary: Fix the error message by not printing it :) Explanation: a previous diff accidentally removed this error message from within the DEBUG macro, and it's expected that we'll have a bunch of them since a lot of the DIEs we try to update are empty or meaningless. For instance (and mainly), there is a huge number of lexical block DIEs with no attributes in .debug_info. In the first phase of collecting debugging info, we store the offsets of all these DIEs, only later to realize that we cannot update their address ranges because they have none. A better fix would be to check this earlier and not store offsets of DIEs we cannot update to begin with. (cherry picked from FBD3236923)	2016-04-28 12:55:35 -07:00
Maksim Panchenko	1258903b54	Fix for functions in different segments. Summary: In a test binary some functions are placed in a segment preceding the segment containing .text section. As a result, we were miscalculating maximum function size as the calculation was based on addresses only. This diff fixes the calculation by checking if symbol after function belongs to the same section. If it does not, then we set the maximum function size based on the size of the containing section and not on the address distance to the next symbol. (cherry picked from FBD3229205)	2016-04-26 23:42:39 -07:00
Maksim Panchenko	3811673a0c	Option to break in given functions. Summary: Added option "-break-funcs=func1,func2,...." to coredump in any given function by introducing ud2 sequence at the beginning of the function. Useful for debugging and validating stack traces. Also renamed options containing "_" to use "-" instead. Also run hhvm test with "-update-debug-sections". (cherry picked from FBD3210248)	2016-04-21 09:54:33 -07:00
Maksim Panchenko	43bc4a09ad	Changed splitting options and fixed sorting. Summary: Splitting option now has different meanings/values. Since landing pads are mostly always cold/frozen, we should split them before anything else (we still check the execution count is 0). That's value '1'. Everything else goes on top of that and has increased value (2 - large functions, 3 - everything). Sorting was non-deterministic and somewhat broken for functions with EH ranges. Fixed that and added '-split-all-cold' option to outline all 0-count blocks. Fixed compilation of test cases. After my last commit the binaries were linked to wrong source files (i.e. debug info). Had to rebuild the binaries from updated sources. (cherry picked from FBD3209369)	2016-04-20 15:31:11 -07:00
Maksim Panchenko	4f44d60947	Special handling for GNU_args_size call frame instruction. Summary: GNU_args_size is a special kind of CFI that tells runtime to adjust %rsp when control is passed to a landing pad. It is used for annotating call instructions that pass (extra) parameters on the stack and there's a corresponding landing pad. It is also special in a way that its value is not handled by DW_CFA_remember_state/DW_CFA_restore_state instruction sequence that we utilize to restore the state after block re-ordering. This diff adds association of call instructions with GNU_args_size value when it's used. If the function does not use GNU_args_size, there is no overhead. Otherwise, we regenerate GNU_args_size instruction during code emission, i.e. after all optimizations and block-reordering. (cherry picked from FBD3201322)	2016-04-19 22:00:29 -07:00
Gabriel Poesia	ad344c4387	Group debugging info representation and serialization code. Summary: Moved the classes related to representing and serializing DWARF entities into a single header, DebugData.h. (cherry picked from FBD3153279)	2016-04-07 15:06:43 -07:00
Gabriel Poesia	f6c8929799	Fix debugging info for simple functions that we fail to rewrite. Summary: Simple functions which we fail to rewrite after optimizations were having wrong debugging information because the latter would reflect the optimized version of the function. There are only 48 functions (at this time) in this situation in the HHVM binary. The simple fix is to add another full pass. Another more complicated path, which will be more efficient, is to reset only the BinaryContext and emit again, but then we need to recreate all symbols in the new MCContext and update the pointers. I started taking this path but it started getting too complicated for only those 48 functions (needed to create a new map of global symbols, recreate landing pads - which needed to have the internal intermediate labels in the functions kept to be updated too, etc). Because the overhead is quite large (another full emission pass - around 4m30s here) and the impact is small I put this behind a new command-line flag which is off by default: -fix-debuginfo-large-functions. (cherry picked from FBD3166576)	2016-04-11 17:46:18 -07:00
Gabriel Poesia	0e77c53b89	Update address ranges of inlined functions and try/catch blocks. Summary: Update address ranges of inlined functions and try/catch blocks. This was missing and led gdb to show weird information in a core dump we inspected because of the several nestings of inline in the call stack. This is very similar to Lexical Blocks, so the change is to basically generalize that code to do the same for DW_AT_try_block, DW_AT_catch_block and DW_AT_inlined_subroutine. (cherry picked from FBD3169417)	2016-04-12 11:41:03 -07:00
Maksim Panchenko	e16b5d8b78	Option to pass a file with list of functions to skip. Summary: Take "-skip_funcs_file=<file>" option and don't process any function listed in the <file>. (cherry picked from FBD3160226)	2016-04-08 19:30:27 -07:00
Gabriel Poesia	2694e58fa2	Update unmatched and nested subprogram DIEs. Summary: readelf was showing some errors because we weren't updating DIEs that were not shallow in the DIE tree, or DIEs of functions with addresses we don't recognize (mostly functions with address 0, which could have been removed by the Linker Script but still have debugging information there). These DIEs need to be updated because their abbreviations are patched. (cherry picked from FBD3159335)	2016-04-08 16:24:38 -07:00
Gabriel Poesia	665b03a464	Fix behavior with multiple functions with same address. Summary: We were updating only one DIE per function, but because the Linker Script may map multiple functions to the same address this would cause us to generate invalid debug info (as some DIEs weren't updated but their abbreviations were changed). (cherry picked from FBD3157263)	2016-04-08 11:55:42 -07:00
Gabriel Poesia	784f6a8773	Emit debug line information for non-simple functions. Summary: Non-simple functions aren't emitted, and thus didn't have line number information emitted. This diff emits it for those functions by extending LLVM's generation of the line number program to allow for absolute addresses (it is wholly symbolic), then iterating over the relevant line tables from the input and appending entries with absolute addresses to the line tables to be emitted. This still leaves the simple but not overwritten functions unhandled (there were 48 in HHVM in my last run). However, I think that to fix them we'd need another pass, since by the time we realize a simple function won't fit, debug line info was already written to the output. (cherry picked from FBD3148468)	2016-04-05 19:35:45 -07:00
Maksim Panchenko	e513bfd86d	Only set output ranges when updating dbg info. Summary: Save processing time by setting output ranges when needed. (cherry picked from FBD3148791)	2016-04-06 18:03:44 -07:00
Gabriel Poesia	4b4db40174	Update DWARF location lists after optimization. Summary: Update DWARF location lists in .debug_loc and pointers to them in .debug_info so that gdb can print variables which change location during their lifetime. The following changes were made: - Refactored BasicBlockOffsetRanges to allow ranges to be tied to binary information (so that we can reuse it for location lists) - Implemented range compression optimization in BasicBlockOffsetRanges (needed otherwise too much data was being generated). - Added representation for location lists (LocationList.h, BinaryContext.h) - Implemented .debug_loc serializer that keeps the updated offsets (DebugLocWriter.{h,cpp}) - After disassembly, traverse entries in .debug_loc and save them in context (BinaryContext.cpp) - After optimizations, serialize .debug_loc and update pointers in .debug_info (RewriteInstance.cpp) (cherry picked from FBD3130682)	2016-04-01 11:37:28 -07:00
Maksim Panchenko	4349b63144	Re-enable conditional function splitting under an option. Summary: Add a parameter value to "-split-functions=" option to allow splitting only when the function is too large to fit: 0 - never split; 1 - split if too large to fit; 2 - always split. We may use this option when the profile data is not very precise. In that case excessive splitting may increase iTLB misses. (cherry picked from FBD3137700)	2016-03-31 16:38:49 -07:00
Gabriel Poesia	0a07d9bf88	Don't skip non-simple functions on function address ranges update. Summary: This fixes a problem in which bolt was generating a malformed .debug_info section on the bzip2 binary. The bug was the following: - A simple and a non-simple function shared an abbreviation - The abbreviation was patched to contain DW_AT_ranges because of the simple function - The non-simple function's data was not updated, but then it didn't match the layout expected by the abbreviation anymore And because we were already creating an address ranges list in .debug_ranges even for non-simple functions, it doesn't make sense not to use it anyway. (cherry picked from FBD3129219)	2016-04-01 15:09:34 -07:00
Gabriel Poesia	ffa9641e16	Update DWARF lexical blocks address ranges. Summary: Updates DWARF lexical blocks address ranges in the output binary after optimizations. This is similar to updating function address ranges except that the ranges representation needs to be more general, since address ranges can begin or end in the middle of a basic block. The following changes were made: - Added a data structure for iterating over the basic blocks that intersect an address range: BasicBlockTable.h - Added some more bookkeeping in BinaryBasicBlock. Basically, I needed to keep track of the block's size in the input binary as well as its address in the output binary. This information is mostly set by BinaryFunction after disassembly. - Added a representation for address ranges relative to basic blocks (BasicBlockOffsetRanges.h). Will also serve for location lists. - Added a representation for Lexical Blocks (LexicalBlock.h) - Small refactorings in DebugArangesWriter: -- Renamed to DebugRangesSectionsWriter since it also writes .debug_ranges -- Refactored it not to depend on BinaryFunction but instead on anything that can be assigned an offset in .debug_ranges (added an interface for that) - Iterate over the DIE tree during initialization to find lexical blocks in .debug_info (BinaryContext.cpp) - Added patches to .debug_abbrev and .debug_info in RewriteInstance to update lexical blocks attributes (in fact, this part is very similar to what was done to function address ranges and I just refactored/reused that code) - Added small test case (lexical_blocks_address_ranges_debug.test) (cherry picked from FBD3113181)	2016-03-28 17:45:22 -07:00
Maksim Panchenko	e8ef8a5619	Speedup section remapping. Summary: Before this diff LLVM used to iterate over all sections to find the one with an address we want to remap. Since we have an extremely large number of sections this process is highly inefficient. Instead we add a new interface to remap a section with a given ID (which effectively is an index into an array of sections), and pass the ID instead of the address. This cuts down the processing time of hhvm binary by 10 seconds, and brings the total processing time to a little under 2 minutes. (cherry picked from FBD3110015)	2016-03-28 22:39:48 -07:00
Gabriel Poesia	466cbae866	Update subroutine address ranges in binary. Summary: [WIP] Update DWARF info for function address ranges. This diff currently does not work for unknown reasons, but I'm describing here what's the current state. According to both llvm-dwarf and readelf our output seems correct, but GDB does not interpret it as expected. All details go below in hope I missed something. I couldn't actually track the whole change that introduced support for what we need in gdb yet, but I think I can get to it (2007-12-04: Support lexical blocks and function bodies that occupy non-contiguous address ranges). I have reasons to believe gdb at least at some nges). The set of introduced changes was basically this: - After disassembly, iterate over the DIEs in .debug_info and find the ones that correspond to each BinaryFunction. - Refactor DebugArangesWriter to also write addresses of functions to .debug_ranges and track the offsets of function address ranges there - Add some infrastructure to facilitate patching the binary in simple ways (BinaryPatcher.h) - In RewriteInstance, after writing .debug_ranges already with function address ranges, for each function do: -- Find the abbreviation corresponding to the function -- Patch .debug_abbrev to replace DW_AT_low_pc with DW_AT_ranges and DW_AT_high_pc with DW_AT_producer (I'll explain this hack below). Also patch the corresponding forms to DW_FORM_sec_offset and DW_FORM_string (null-terminated in-place string). -- Patch debug_info with the .debug_ranges offset in place of the first 4 bytes of DW_AT_low_pc (DW_AT_ranges only occupies 4 bytes whereas low_pc occupies 8), and write an arbitrary string in-place in the other 12 bytes that were the 4 MSB of low_pc and the 8 bytes of high_pc before the patch. This depends on low_pc and high_pc being put consecutively by the compiler, but it serves to validate the idea. 
I tried another way of doing it that does not rely on this but it didn't work either and I believe the reason for either not working is the same (and still unknown, but unrelated to them. I might be wrong though, and if I find yet another way of doing it I may try it). The other way was to use a form of DW_FORM_data8 for the section offset. This is disallowed by the specification, but I doubt gdb validates this, as it's just easier to store it as 64-bit anyway as this is even necessary to support 64-bit DWARF (which is not what gcc generates by default apparently). I still need to make changes to the diff to make it production-ready, but first I want to figure out why it doesn't work as expected. By looking at the output of llvm-dwarfdump or readelf, all of .debug_ranges, .debug_abbrev and .debug_info seem to have been correctly updated. However, gdb seems to have serious problems with what we write. (In fact, readelf --debug-dump=Ranges shows some funny warning messages of the form ("Warning: There is a hole [0x100 - 0x120] in .debug_ranges"), but I played around with this and it seems it's just because no compile unit was using these ranges. Changing .debug_info apparently changes these warnings, so they seem to be unrelated to the section itself. Also looking at the hex dump of the section doesn't help, as everything seems fine. llvm-dwarfdump doesn't say anything. So I think .debug_ranges is fine.) The result is that gdb not only doesn't show the function name as we wanted, but it also stops showing line number information. Apparently it's not reading/interpreting the address ranges at all, and so the functions now have no associated address ranges, only the symbol value which allows one to put a breakpoint in the function, but not to show source code. As this left me without more ideas of what to try to feed gdb with, I believe the most promising next trial is to try to debug gdb itself, unless someone spots anything I missed. 
I found where the interesting part of the code lies for this case (gdb/dwarf2read.c and some other related files, but mainly that one). It seems in some parts gdb uses DW_AT_ranges for only getting its lowest and highest addresses and setting that as low_pc and high_pc (see dwarf2_get_pc_bounds in gdb's code and where it's called). I really hope this is not actually the case for function address ranges. I'll investigate this further. Otherwise I don't think any changes we make will make it work as initially intended, as we'll simply need gdb to support it and in that case it doesn't. (cherry picked from FBD3073641)	2016-03-16 18:08:29 -07:00
Gabriel Poesia	9cdb7bdb55	Write only minimal .debug_line information. Summary: We used to output .debug_line information for every instruction, but because of the way gdb (and probably lldb as of llvm::DWARFDebugLine::LineTable::findAddress) queries the line table it's not necessary to output information for two instructions if they follow each other and map to the same source line. By not repeating this information we generate a bit less .debug_line data. (cherry picked from FBD3056402)	2016-03-15 16:22:04 -07:00
Maksim Panchenko	a60914427c	Update DW_AT_ranges for CU when it exists. Summary: If CU has DW_AT_ranges update the value. Note that it does not create DW_AT_ranges attribute. (cherry picked from FBD3051904)	2016-03-14 19:04:23 -07:00
Maksim Panchenko	d01172ffa8	Refactor existing debugging code. Summary: Almost NFC. Isolate code for updating debug info. (cherry picked from FBD3051536)	2016-03-14 18:48:05 -07:00
Gabriel Poesia	dc7cc1fb18	Fix default line number information for instructions. Summary: The line number information generated from a null pointer was actually valid, which caused new instructions without the line number information set to have a valid and wrong line number reference. This diff fixes this by making the null pointer be assigned to an invalid line number row. (cherry picked from FBD3048453)	2016-03-14 11:40:52 -07:00
Gabriel Poesia	80ea31b24e	Write updated .debug_aranges section after optimizations. Summary: Write the .debug_aranges section after optimizations to the output binary. Each function generates at least one range and at most two (one extra for its cold part). The writing is done manually because LLVM's implementation is tied to the output of .debug_info (see EmitGenDwarfInfo and EmitGenDwarfARanges in lib/MC/MCDwarf.cpp), which we don't want to trigger right now. (cherry picked from FBD3043108)	2016-03-11 11:30:30 -08:00
Maksim Panchenko	e7e9e15b90	Check function data in symbol table against data in .eh_frame. Summary: At the moment we rely solely on the symbol table information to discover function boundaries. However, similar information is contained in .eh_frame. Verify that the information from these two sources is consistent, and if it's not, then skip processing the functions with conflicting information. (cherry picked from FBD3043800)	2016-03-11 11:09:34 -08:00
Maksim Panchenko	f2df1a8d97	Update stmt_list value to point to new .debug_line offset. Summary: After we add new line number information we have to update stmt_list offsets in .debug_info. For this I had to add a primitive relocations support for non-allocatable sections we are copying from input file. Also enabled functionality to process relocations in non-allocatable sections that LLVM is generating, such as .debug_line. I thought we already had it, but apparently it didn't work, at least not for ELF binaries. (cherry picked from FBD3037903)	2016-03-09 16:06:41 -08:00
Gabriel Poesia	73c9f0abe3	Write updated .debug_line information to temp file Summary: Writes .debug_line section by setting the state in MCContext that LLVM needs to produce and output the line tables. This basically consists of setting the current location and compile unit offset. This makes LLVM output .debug_line in the temporary file, but not yet in the generated ELF file. Also computes the line table offsets for each compile unit and saves them into BinaryContext. Added an option to print these offsets. (cherry picked from FBD3004554)	2016-03-02 18:40:10 -08:00
Maksim Panchenko	d68b1c7b16	Extending support for non-allocatable sections. Summary: This is a set of changes that allow modification of non-allocatable sections in ELF binary. Primarily for the purpose of updating debug info. Extend LLVM interface to allow processing relocations in non-allocatable sections. This allows producing .debug* sections with resolved relocations against generated code. Extend BOLT rewriting framework to allow appending contents to non-allocatable sections in the binary. Re-worked ELF binary rewriting to support the above and to allow future extensions (e.g. new section names). (cherry picked from FBD3023403)	2016-03-03 10:13:11 -08:00
Gabriel Poesia	77a6b72842	BOLT: Read and tie .debug_line info to IR. Summary: Reads information in the DWARF .debug_line section using LLVM and tie every MCInst to one line of a line table from the input binary. Subsequent diffs will update this information to match the final binary layout and output updated line tables. (cherry picked from FBD2989813)	2016-02-25 16:57:07 -08:00
Maksim Panchenko	62da18d32a	Always split functions under '-split-functions=1' option. Summary: Force the splitting of the function into hot/cold even when the function fits into original slot. This reduces BOLT optimization time by 50% without affecting hhvm performance. (cherry picked from FBD2973773)	2016-02-22 16:49:26 -08:00
Maksim Panchenko	73e9afe99c	Don't abort on unknown CFI instructions. Summary: If we see an unknown CFI instruction, skip processing the function containing it instead of aborting execution. (cherry picked from FBD2964557)	2016-02-22 18:25:43 -08:00
Maksim Panchenko	7f7d4af7e0	Add an option to use PT_GNU_STACK for new segment. Summary: Added an option to reuse existing program header entry. This option allows for bfd tools like strip and objcopy to operate on the optimized binary without destroying it. Also, all new sections are now properly marked in ELF. (cherry picked from FBD2943339)	2016-02-12 19:01:53 -08:00
Maksim Panchenko	50c895ad0c	Drop requirement for __flo_storage in the input binary. Summary: We used to require pre-allocated space in the input binary so that we can write extra sections in there (.eh_frame, .eh_frame_hdr, .gcc_except_table, etc.). With this diff there's no further need for pre-allocated storage as we create a new segment and can use as much space as needed. There are certain limitations on where the new segment could be allocated, and as a result the size of the file may increase. There's currently a limitation if the binary size is close to 4GB we cannot allocate new segment prior to that and as a result we require debug info to be stripped to reduce the file size. The fix is in progress. (cherry picked from FBD2916029)	2016-02-08 10:02:48 -08:00
Maksim Panchenko	e1a61e1eed	Keep intermediate .o file only under -keep-tmp option. Summary: We use intermediate .o file for debugging purposes, but there's no reason to generate it by default. Only do it if "-keep-tmp" is specified. (cherry picked from FBD2912098)	2016-02-08 10:08:28 -08:00
Maksim Panchenko	d1526083fc	Rename binary optimizer to BOLT. Summary: BOLT - Binary Optimization and Layout Tool replaces FLO. I'm keeping .fdata extension for "feedback data". (cherry picked from FBD2908028)	2016-02-05 14:42:04 -08:00
Maksim Panchenko	628d06b1e5	Preserve layout of basic blocks with 0 profile counts. Summary: Preserve original layout for basic blocks that have 0 execution count. Since we don't optimize for size, it's better to rely on the original input order. (cherry picked from FBD2875335)	2016-01-21 14:18:30 -08:00
Maksim Panchenko	218c5f0916	Fix a bug with outlining first basic block. Summary: We should never outline the first basic block. Also add an option to accept a file with the list of functions to optimize. (cherry picked from FBD2868184)	2016-01-26 16:03:58 -08:00
Maksim Panchenko	89578e2314	Allow to partially split functions with exceptions. Summary: We could split functions with exceptions even without creating a new exception handling table. This limits us to only move basic blocks that never throw, and are not a start of a landing pad. (cherry picked from FBD2862937)	2016-01-22 16:45:39 -08:00
Maksim Panchenko	bbb745efa9	Don't create empty basic blocks. Fix CFI bug. Summary: Some basic blocks were created empty because they only contained alignment nop's. Ignore such nop's before basic block gets created. Fixed intermittent aborts related to CFI update. (cherry picked from FBD2844465)	2016-01-19 00:20:06 -08:00
Maksim Panchenko	4a44d187c6	Handle more CFI cases and some. Summary: * Update CFI state for larger range of functions to increase coverage. * Issue more warnings indicating reasons for skipping functions. * Print top called functions in the binary. (cherry picked from FBD2839734)	2016-01-16 14:58:22 -08:00
Maksim Panchenko	d9536e6092	Added an option to reverse original basic blocks order. Summary: Modified processing of "-reorder-blocks=" option and added an option to reverse original basic blocks order for testing purposes. (cherry picked from FBD2829862)	2016-01-13 17:19:40 -08:00
Maksim Panchenko	c9b7e3e09e	Write updated LSDA's. Summary: Write new exception ranges tables (LSDA's) into the output file. (cherry picked from FBD2828312)	2015-12-18 17:00:46 -08:00
Maksim Panchenko	a6efd11c05	Code/comments cleanup. Summary: Consolidate cold function info under cold FragmentInfo. Minor code and comment mods to LSDA handling. (cherry picked from FBD28109981)	2015-12-17 12:59:15 -08:00
Maksim Panchenko	f7d7a85a24	Turn EH ranges support back on. Summary: Changed the way EH info is stored/extracted from call instruction. Make sure indirect calls work. (cherry picked from FBD28109629)	2015-12-15 17:06:27 -08:00
Rafael Auler	fb6e8c5d0b	Don't touch functions whose internal BBs are targets of interprocedural branches Summary: In a test binary, we found 8 cases where code in a function A would jump to the middle of another function B. In this case, we cannot reorder function B because this would change instruction offsets and break the program. This is pretty rare but can happen in code written in assembly. (cherry picked from FBD2719850)	2015-12-03 13:29:52 -08:00
Rafael Auler	9a73a8c446	Turns off basic block alignment by default Summary: We found out that the insertion of extra nops to preserve alignment of some loop bodies do not pay off the increased function size, since this extra size may inhibit us from rewriting a reordered version of this function. (cherry picked from FBD2718466)	2015-12-03 09:45:18 -08:00
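
Illustration of the idea in d1f525499e / e6acc7bb53 ("Optimize calls to functions that are a single unconditional jump"): the message says call sites are retargeted to the branch target, iteratively, because chains of such jumps occur. The sketch below is not BOLT code. It assumes a hypothetical dictionary mapping each single-jump function to its jump target and a hypothetical list of (caller, callee) call sites, and shows one way the chain resolution could work. The cycle guard is an added assumption; the commit message does not mention cycles.

```python
def resolve_jump_chains(single_jump_targets):
    """Map each single-jump function to the end of its jump chain.

    single_jump_targets: hypothetical dict {function name: jump target name}
    for functions whose whole body is one unconditional jump.
    """
    resolved = {}
    for func, target in single_jump_targets.items():
        seen = {func}
        # Follow the chain while the target is itself a single-jump function.
        # The 'seen' set stops the walk if the chain loops back (assumption).
        while target in single_jump_targets and target not in seen:
            seen.add(target)
            target = single_jump_targets[target]
        resolved[func] = target
    return resolved


def retarget_calls(call_sites, single_jump_targets):
    """Rewrite (caller, callee) pairs so calls skip single-jump functions."""
    resolved = resolve_jump_chains(single_jump_targets)
    return [(caller, resolved.get(callee, callee)) for caller, callee in call_sites]
```

For example, with `{"stub_a": "stub_b", "stub_b": "real_impl"}`, a call to `stub_a` would be rewritten to call `real_impl` directly, while calls to functions that are not in the dictionary are left unchanged.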

53 Commits