intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-01-14 03:50:17 +08:00

Author	SHA1	Message	Date
Maksim Panchenko	088e3c032a	[BOLT] Improve handling of secondary function entry points Summary: "Fix symbol table entries for secondary entries" diff broke the inliner. Fix the breakage and make the discovery of secondary entry points more accurate. Add ability to BinaryContext::getFunctionForSymbol() to return an entry point discriminator and use it instead of calling getEntryForSymbol() and isSecondaryEntry(). This is the preferred way since getFunctionForSymbol() is thread-safe. (cherry picked from FBD19295983)	2020-01-06 14:57:15 -08:00
Rafael Auler	de284bc510	[BOLT] Fix symbol table entries for secondary entries Summary: Commit "Support full instrumentation" changed the map SymbolToFunction in BinaryContext to map secondary entries of functions too. This introduced unexpected behavior in our symbol table rewriting logic, which caused it to mistakenly write them with the address of the original function. Fix the behavior of getBinaryFunctionAtAddress to correct this. Also fix other users of SymbolToFunction to ensure they are not accidentally using secondary entries when they shouldn't. (cherry picked from FBD19168319)	2019-12-18 12:14:42 -08:00
Rafael Auler	16a497c627	[BOLT] Support full instrumentation Summary: Add full instrumentation support (branches, direct and indirect calls). Add output statistics to show how many hot bytes were split from cold ones in functions. Add -cold-threshold option to allow splitting warm code (non-zero count). Add option in bolt-diff to report missing functions in profile 2. In instrumentation, fini hooks are fixed to run proper finalization code after the program finishes. Hooks for startup are added to set up the runtime structures that need initialization, such as indirect call hash tables. Add support for automatically dumping profile data every N seconds by forking a watcher process during runtime. (cherry picked from FBD17644396)	2019-12-13 17:27:03 -08:00
Maksim Panchenko	3cc4fc267b	[BOLT] Proper support for -trap-avx512 option Summary: If -trap-avx512 option is not set, verify that we correctly encode AVX-512 instructions and treat them as ordinary instructions. (cherry picked from FBD18666427)	2019-11-22 14:53:20 -08:00
Maksim Panchenko	7350d40404	[BOLT][NFC] Refactor BinaryFunction::addEntryPoint() Summary: There is no need to support existing functionality of adding entry points after the CFG is built as the function is only called in empty or disassembled state. Previously we used to run disassemble+buildCFG per function, but now these phases are decoupled. Also, remove a couple of redundant checks. (cherry picked from FBD18622822)	2019-11-11 17:02:37 -08:00
Maksim Panchenko	72b52edcbb	[BOLT] Free more memory in BinaryFunction::releaseCFG() Summary: Free more lists in BinaryFunction::releaseCFG(). Release BinaryFunction::Relocations after disassembly. Do not populate BinaryFunction::MoveRelocations as we are not using them currently. Also remove PCRelativeRelocationOffsets that weren't used. (cherry picked from FBD18413256)	2019-11-08 14:41:31 -08:00
Maksim Panchenko	d5ddb320ef	[BOLT] Free memory for CFG after emission Summary: Once we emit function code, we no longer need CFG for next phases that use basic blocks for address-translation and symbol update purposes. We free memory used by CFG and instructions. The freed memory gets reused by later phases resulting in overall memory usage reduction. We can probably improve memory consumption even further by replacing BinaryBasicBlocks with more compact data structures. (cherry picked from FBD18408954)	2019-10-31 16:54:48 -07:00
Maksim Panchenko	f2b257bec8	[BOLT] Update SDTs based on translation tables Summary: We used to emit special annotations to update SDT markers. However, we can just use "Offset" annotations for the same purpose. Unlike BAT, we have to generate "reverse" address translation tables. This approach eliminates reliance on instructions after code emission. (cherry picked from FBD18318660)	2019-11-03 21:57:15 -08:00
Maksim Panchenko	98e63610b1	[BOLT] Create OffsetTranslationTable for basic blocks Summary: Use BinaryBasicBlock::OffsetTranslationTable for BAT. This removes dependency on instructions after the code emission. (cherry picked from FBD18283965)	2019-11-01 16:19:45 -07:00
Maksim Panchenko	8fb6512a23	[BOLT][Docs] Instructions for linking with jemalloc/tcmalloc (cherry picked from FBD18050722)	2019-10-21 15:57:36 -07:00
Maksim Panchenko	12aca4005c	[BOLT] Ignore __builtin_unreachable destination Summary: For functions with unknown control flow, do not populate TakenBranches with an entry pointing to the end of the function. (cherry picked from FBD18034019)	2019-10-20 20:46:32 -07:00
Rafael Auler	ba31344fa9	[BOLT] Fix build for Mac Summary: Change our CMake config for the standalone runtime instrumentation library to check for the elf.h header before using it, so the build doesn't break on systems lacking it. Also fix a SmallPtrSet usage where its elements are not really pointers, but uint64_t, breaking the build in Apple's Clang. (cherry picked from FBD17505759)	2019-09-20 11:29:35 -07:00
Rafael Auler	cc4b2fb614	[BOLT] Efficient edge profiling in instrumented mode Summary: Change our edge profiling technique when using instrumentation so that not every edge is instrumented. Instead, build the spanning tree for the CFG and omit instrumentation for edges in the spanning tree. Infer the edge count for these edges when writing the profile during run time. The inference works with a bottom-up traversal of the spanning tree and establishes the value of the edge connecting to the parent based on a simple flow equation involving output and input edges, where the only unknown variable is the parent edge. This requires some engineering in the runtime lib to support dynamic allocation for building these graphs at runtime. (cherry picked from FBD17062773)	2019-08-07 16:09:50 -07:00
Maksim Panchenko	f588d7a6ea	[BOLT] Tighter control of jump table detection Summary: We were too permissive by allowing more jump tables during the preliminary scan of memory. This allowed for jump tables to be falsely detected. And since we didn't have a way to backtrack the jump table creation, we had to assert. This diff refactors the code that analyzes jump table contents. Preliminary and final passes share the same code. The only difference should be the detection of instruction boundaries that are available during the final pass. This should affect strict relocation mode only. (cherry picked from FBD16923335)	2019-08-19 14:06:36 -07:00
Maksim Panchenko	8d5854ef09	[BOLT] Add option to verify instruction encoder/decoder Summary: Add option `-check-encoding` to verify if the input to LLVM disassembler matches the output of the assembler. When set, the verification runs on every instruction in processed functions. I'm not enabling the option by default as it could be quite noisy on x86 where instruction encoding is ambiguous and can include redundant prefixes. (cherry picked from FBD16595415)	2019-07-31 16:03:49 -07:00
Maksim Panchenko	a9b9aa1e02	[BOLT] Add code padding verification Summary: In non-relocation mode, we allow data objects to be embedded in the code. Such objects could be unmarked, and could occupy an area between functions, the area which is considered to be code padding. When we disassemble code, we detect references into the padding area and adjust it, so that it is not overwritten during the code emission. We assume the reference to be pointing to the beginning of the object. However, assembly-written functions may reference the middle of an object and use negative offsets to reference data fields. Thus, conservatively, we reduce the possibly-overwritten padding area to a minimum if the object reference was detected. Since we also allow functions with unknown code in non-relocation mode, it is possible that we miss references to some objects in code. To cover such cases, we need to verify the padding area before we allow to overwrite it. (cherry picked from FBD16477787)	2019-07-23 20:48:41 -07:00
laith sakka	fde5a2b470	Run shrink wrapping in parallel Summary: Shrink wrapping is an expensive part of frame optimizations if performed on all functions. This diff makes it run in parallel, reducing wall time. (cherry picked from FBD16092651)	2019-07-02 10:48:43 -07:00
laith sakka	7d42835418	Run buildCFG in disassembly in parallel Summary: This diff parallelizes the construction of the call graph during disassembly. It also adds another interface to parallel-utilities that supports running tasks on binaryFunctions that involve adding instruction annotations. This pattern is common in several places, e.g. frame optimizations, which justifies creating an interface that abstracts away the messy details. (cherry picked from FBD16232809)	2019-07-12 07:25:50 -07:00
laith sakka	f4ab6e6924	run finalize functions in parallel Summary: (cherry picked from FBD16188733)	2019-07-10 10:59:56 -07:00
laith sakka	9977b03fea	Run reorder blocks in parallel Summary: This diff changes the reorderBasicBlocks pass to run in parallel. It does so by adding locks to the fix branches function and by creating temporary MCCodeEmitters when estimating basic block code size. (cherry picked from FBD16161149)	2019-07-08 12:32:58 -07:00
Rafael Auler	1169f1fdd8	[BOLT] Support duplicating jump tables Summary: If two indirect branches use the same jump table, we need to detect this and duplicate jump tables so we can modify this CFG correctly. This is necessary for instrumentation and shrink wrapping. For the latter, we only detect this and bail, fixing this old known issue with shrink wrapping. Other minor changes to support better instrumentation: add an option to instrument only hot functions, add LOCK prefix to instrumentation increment instruction, speed up splitting critical edges by avoiding calling recomputeLandingPads() unnecessarily. (cherry picked from FBD16101312)	2019-07-02 16:56:41 -07:00
Maksim Panchenko	e89ad0db4b	[BOLT] Introduce strict relocation mode Summary: In strict relocation mode we rely on relocations to represent all possible entry points into a function. Most of the code generated by tested compilers (gcc and clang) will result in relocations against any internal labels for jump tables and for computed goto tables. In situations where we cannot properly reconstruct a jump table, or when we cannot determine a table that guides an indirect jump, e.g. when multiple computed goto tables are used, we conservatively assume that the indirect jump can end up at any possible basic block referenced by relocations. In strict mode, simple functions may include the aforementioned instructions with unknown control flow with a conservative list of destinations added to the containing basic block. This allows us to expand coverage of simple functions and to enable code reordering optimizations for more functions. The strict mode is recommended when BOLT is used with a well-formed code generated by a compiler. To use the strict mode, add "-strict" on the command line. Another effect of this diff, is that with relocations, we will always replace the immediate operand of an instruction with a symbol if the relocation exists against this operand. Also this diff fixes issues with Clang compiled with -fpic. (cherry picked from FBD15872849)	2019-06-28 09:21:27 -07:00
Rafael Auler	0d23cbaa52	[BOLT] Initial experimental instrumentation pass Summary: An instrumentation pass that modifies the input binary to generate a profile after execution finishes. It modifies branches to increment counters stored in the process memory and injects a new function that dumps this data to an fdata file, readable by BOLT. This instrumentation is experimental and currently uses a naive approach where every branch is instrumented. This is not ideal for runtime performance, but should be good enough for us to evaluate/debug LBR profile quality against instrumentation. Does not support instrumenting indirect calls yet, only direct calls, direct branches and indirect local branches. (cherry picked from FBD15998096)	2019-06-19 20:10:49 -07:00
Rafael Auler	db02a1a142	[BOLT] Ignore empty funcs in relocation mode Summary: Make BOLT ignore empty functions (those containing no instructions, despite having some space allocated to it filled with zeroes). (cherry picked from FBD15981683)	2019-06-24 20:23:22 -07:00
Maksim Panchenko	9894de0094	[BOLT] Check instruction boundaries while populating jump tables Summary: Now that we populate jump tables after all functions are disassembled, we can check for instruction boundaries corresponding to jump table entries. No need to delegate this task to postProcessJumpTables(). (cherry picked from FBD15814762)	2019-06-13 15:31:30 -07:00
Maksim Panchenko	9e2ad3f593	[BOLT] Delay populating jump tables Summary: During the initial disassembly pass, only identify jump tables without populating the contents. Later, after all functions have been disassembled, we have a better idea of jump table boundaries and can do a better job of populating their entries. As a result, we no longer have embedded jump tables (i.e. a jump table that is part of another jump table). If we ever need to keep sequential jump tables inseparable during the output, we can always add such functionality later. Fixes facebookincubator/BOLT#56. (cherry picked from FBD15800427)	2019-06-12 18:21:02 -07:00
Maksim Panchenko	fac6a89c23	[BOLT] Better handling of address references Summary: We used to handle PC-relative address references differently from direct address references. As a result, some cases, such as escaped function label address, were not handled when dealing with absolute (non-PIC) code. This diff moves processing of an address reference into BinaryContext::handleAddressRef() which is called for both PIC and non-PIC code. (cherry picked from FBD15643535)	2019-06-04 15:30:22 -07:00
Rafael Auler	21f4303bfd	Support data collection in bolted binaries Summary: Similarly to how the compiler relies on DWARF to map samples, so it is possible to collect profile data in binaries optimized by PGO techniques and retrofit data to be used in a representation of the program that was not optimized by PGO, this diff implements an option in BOLT to encode a table in the output binary that allows us to map data collected in optimized binaries back to the address space used in the input binary (where the profile is useful, since we do not support running BOLT on a binary already optimized by BOLT). The goal is to offer an option to support BOLT in scenarios where it is not easy to run a special deployment of the binary with a version that was not optimized by BOLT just for data collection. This feature is enabled with the -enable-bat flag. BAT stands for BOLT Address Translation, which refers to the process of mapping output to input addresses. (cherry picked from FBD15531860)	2019-04-12 17:33:46 -07:00
Laith Sakka	3df2c9ea1f	Update SDT locations after bolt reordering Summary: Update SDT locations in the .note section to match the new locations after BOLT reorders the code. (cherry picked from FBD15427779)	2019-05-17 07:58:27 -07:00
Maksim Panchenko	9ef9a7b1be	[BOLT] Use regex matching for function names passed on command line Summary: Options such as `-print-only`, `-skip-funcs`, etc. now take regular expressions. Internally, the option is converted to '^funcname$' form prior to regex matching. This ensures that names without special symbols will match exactly, i.e. "foo" will not match "foo123". (cherry picked from FBD15551930)	2019-05-29 18:33:09 -07:00
Maksim Panchenko	e5b1d9cd8c	[BOLT][NFC] Fix white space (cherry picked from FBD15485688)	2019-05-23 15:49:36 -07:00
Maksim Panchenko	f57d3c00fc	[BOLT] Better verification of jump tables Summary: Run analyzeIndirectBranch() using basic block boundaries instead of running ad-hoc validation of the jump table assumptions. (cherry picked from FBD15465034)	2019-05-22 18:14:34 -07:00
Maksim Panchenko	be344c8de7	[BOLT] Refactor handling of interproc refs Summary: Move handling of interprocedural references to BinaryContext. Post-process indirect branches immediately after the CFG is built. This is almost NFC. Since indirect branches are now post-processed before the profile data is processed it interferes with the way the profile data in YAML format is handled. (cherry picked from FBD15456003)	2019-05-22 11:26:58 -07:00
Laith Saed Sakka	ca659e4336	Preserve nops that are SDT markers in binaries and disable SDT conflicting optimizations Summary: SDT markers that appear as nops in the assembly are preserved and not eliminated. Functions with SDT markers are also flagged. Inlining and folding are disabled for functions that have SDT markers. (cherry picked from FBD15379799)	2019-05-16 12:46:32 -07:00
Maksim Panchenko	fee61231ef	[BOLT] Move JumpTable management to BinaryContext Summary: Make BinaryContext responsible for creation and management of JumpTables. This will be used for detection and resolution of jump table conflicts across functions. (cherry picked from FBD15196017)	2019-05-02 17:42:06 -07:00
Maksim Panchenko	310b32fbe5	[BOLT] Limit jump table size by containing object Summary: While checking the size of a jump table, we used the containing section as a boundary. This worked for most cases as typically jump tables are not marked with symbol table entries. However, the compiler may generate objects for indirect goto's. (cherry picked from FBD15158905)	2019-04-30 15:47:10 -07:00
Maksim Panchenko	f1dfd38dec	[BOLT][NFC] Move DynoStats out of BinaryFunction Summary: Move DynoStats into separate source files. (cherry picked from FBD15138883)	2019-04-29 12:51:10 -07:00
Maksim Panchenko	99ef4c90c1	[BOLT] Basic support for split functions Summary: This adds very basic and limited support for split functions. In non-relocation mode, split functions are ignored, while their debug info is properly updated. No support in the relocation mode yet. Split functions consist of a main body and one or more fragments. For fragments, the main part is called their parent. Any fragment could only be entered via its parent or another fragment. The short-term goal is to correctly update debug information for split functions, while the long-term goal is to have a complete support including full optimization. Note that if we don't detect split bodies, we would have to add multiple entry points via tail calls, which we would rather avoid. Parent functions and fragments are represented by a `BinaryFunction` and are marked accordingly. For now they are marked as non-simple, and thus only supported in non-relocation mode. Once we start building a CFG, it should be a common graph (i.e. the one that includes all fragments) in the parent function. The function discovery is unchanged, except for the detection of `\.cold\.` pattern in the function name, which automatically marks the function as a fragment of another function. Because of the local function name ambiguity, we cannot rely on the function name to establish child fragment and parent relationship. Instead we rely on disassembly processing. `BinaryContext::getBinaryFunctionContainingAddress()` now returns a parent function if an address from its fragment is passed. There's no jump table support at the moment. Jump tables can have source and destinations in both fragment and parent. Parent functions that enter their fragments via C++ exception handling mechanism are not yet supported. (cherry picked from FBD14970569)	2019-04-16 10:24:34 -07:00
Rafael Auler	31fc56b313	[BOLT] Fix adjustFunctionBoundaries w.r.t. entry points Summary: Don't consider symbols in another section when processing additional entry points for a function. (cherry picked from FBD14962853)	2019-04-16 14:35:29 -07:00
Maksim Panchenko	8f98268518	[BOLT] Reduce warnings for non-simple functions Summary: If a function was already marked as non-simple, there's no reason to issue a warning that it has a reference in the middle of an instruction. Besides, sometimes there wouldn't be instructions disassembled at a given entry, and the warning would be incorrect. (cherry picked from FBD14938227)	2019-04-15 11:56:55 -07:00
Maksim Panchenko	315ae74de3	[BOLT] Include <numeric> for std::iota Summary: Some compilers require <numeric> header. (cherry picked from FBD14868132)	2019-04-09 21:22:41 -07:00
Maksim Panchenko	88375d311e	[BOLT] Sort basic block successors for printing Summary: For easier analysis of the hottest targets of jump tables it helps to have basic block successors sorted based on the taken frequency. (cherry picked from FBD14856640)	2019-04-09 11:27:23 -07:00
Maksim Panchenko	c8a927696c	[BOLT] Detect internal references into a middle of instruction Summary: Some instructions in assembly-written functions could reference 8-byte constants from other instructions using 4-byte offsets, presumably to save a couple of bytes. Detect such cases, and skip processing such functions until we teach BOLT how to handle references into the middle of an instruction. (cherry picked from FBD14768212)	2019-04-03 22:31:12 -07:00
Maksim Panchenko	8894853f42	[BOLT][DWARF] Dedup .debug_abbrev section patches Summary: When we patch .debug_abbrev we issue many duplicate patches. Instead of storing these patches as a vector, use a hash map. This saves some processing time and memory. (cherry picked from FBD14691292)	2019-03-29 14:22:54 -07:00
Maksim Panchenko	297d1a4e1a	[BOLT] Do not write jump table section headers Summary: In non-relocation mode we were accidentally emitting section headers for every single jump table. This happened with default `-jump-tables=basic`. (cherry picked from FBD14653282)	2019-03-27 13:58:31 -07:00
Maksim Panchenko	17cd2034f3	[BOLT] Fix debug line info emission Summary: GDB does not like if the first entry in the line info table after end_sequence entry is not marked with is_stmt. If this happens, it will not print the correct line number information for such address. Note that everything works fine starting with the first address marked with is_stmt. This could happen if the first instruction in the cold section wasn't marked with is_stmt. The fix is to always emit debug line info for the first instruction in any function fragment with is_stmt flag. (cherry picked from FBD14516629)	2019-03-18 19:22:26 -07:00
Rafael Auler	c593563d1f	Do not assert on addresses read from processIndirectBranch Summary: As part of our heuristics to decode an indirect branch, if we suspect the branch is an indirect tail call, we add its probable target to the BC::InterproceduralReferences vector to detect functions with more than one entry point. However, if this probable target is not in an allocatable section, we were asserting. Remove this assertion and change the code to conditionally store to InterproceduralReferences instead. The probable target could be garbage at this point because of analyzeIndirectBranch failing to identify the load instruction that has the memory address of the target, so we should tolerate this. (cherry picked from FBD14432821)	2019-03-12 16:36:35 -07:00
Maksim Panchenko	ff6e21290f	[BOLT] New inliner implementation Summary: Addresses correctness issues related to inlining. Inlining heuristics are not part of this diff. (cherry picked from FBD13796888)	2019-01-31 11:23:02 -08:00
Maksim Panchenko	365bd1f1c8	[BOLT] For non-simple functions always update jump tables in-place Summary: For non-simple function we can miss a reference to a jump table or to an indirect goto table. If we move the jump table, the missed reference will not get updated, and the corresponding indirect jump will end up in the old (wrong) location. Updating the original jump table in-place should take care of the issue. (cherry picked from FBD13849776)	2019-01-28 13:46:18 -08:00
Maksim Panchenko	b0f7fddd35	[BOLT] Add method for better function size estimation Summary: Add BinaryContext::calculateEmittedSize() that ephemerally emits code to allow precise estimation of the function size. Relaxation and macro-op alignment adjustments are taken into account. (cherry picked from FBD13092139)	2018-11-15 16:02:16 -08:00
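
Commit 9ef9a7b1be above says that function names passed on the command line are wrapped as '^funcname$' before regex matching, so a plain name matches exactly. The following is a minimal Python sketch of that anchoring idea only. It is not BOLT's C++ code, and the helper name `matches_function` is hypothetical.

```python
import re


def matches_function(pattern: str, name: str) -> bool:
    """Return True if `name` matches `pattern` once it is anchored as ^pattern$.

    Hypothetical helper illustrating the anchoring described in the commit
    message; BOLT's real option handling may differ.
    """
    anchored = "^" + pattern + "$"
    return re.search(anchored, name) is not None


if __name__ == "__main__":
    # A name without special symbols only matches itself.
    print(matches_function("foo", "foo"))
    print(matches_function("foo", "foo123"))
    # An explicit wildcard widens the match.
    print(matches_function("foo.*", "foo123"))
```

With plain concatenation, a pattern containing top-level alternation (for example `a|b`) would have each anchor bound to only one alternative; the commit message does not say how BOLT treats that case.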

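Commit cc4b2fb614 above describes leaving spanning-tree edges uninstrumented and inferring their counts bottom-up from a flow equation in which the parent edge is the only unknown. The sketch below illustrates that inference in Python under stated assumptions: flow is conserved at every node (here via a virtual exit-to-entry edge), there are no parallel edges, and the spanning tree is supplied by the caller. The function name and data layout are hypothetical and do not reflect BOLT's runtime library.

```python
from collections import defaultdict


def infer_tree_edge_counts(edges, tree_edges, measured, root):
    """Fill in counts for uninstrumented spanning-tree edges.

    edges:      list of directed (src, dst) CFG edges, no duplicates (assumption)
    tree_edges: subset of `edges` forming a spanning tree when viewed undirected
    measured:   dict mapping every non-tree edge to its instrumented count
    root:       node at which the tree traversal starts
    """
    counts = dict(measured)

    # Undirected adjacency restricted to spanning-tree edges.
    tree_adj = defaultdict(list)
    for src, dst in tree_edges:
        tree_adj[src].append((dst, (src, dst)))
        tree_adj[dst].append((src, (src, dst)))

    def visit(node, parent_edge):
        # Resolve the subtree first so every other edge at `node` is known.
        for neighbor, edge in tree_adj[node]:
            if edge != parent_edge:
                visit(neighbor, edge)
        if parent_edge is None:
            return
        # Flow equation: sum(incoming) == sum(outgoing) at `node`,
        # with the parent edge as the single unknown.
        inflow = sum(counts[e] for e in edges
                     if e[1] == node and e != parent_edge)
        outflow = sum(counts[e] for e in edges
                      if e[0] == node and e != parent_edge)
        if parent_edge[1] == node:   # parent edge enters this node
            counts[parent_edge] = outflow - inflow
        else:                        # parent edge leaves this node
            counts[parent_edge] = inflow - outflow

    visit(root, None)
    return counts


# Diamond CFG with a virtual D->A edge closing the flow (assumption).
EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "A")]
TREE = {("A", "B"), ("A", "C"), ("B", "D")}
MEASURED = {("C", "D"): 4, ("D", "A"): 10}

if __name__ == "__main__":
    print(infer_tree_edge_counts(EDGES, TREE, MEASURED, "A"))
```

In this example only two of the five edges carry counters; the remaining three counts follow from the flow equations.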
78 Commits