intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-01-14 03:50:17 +08:00

Author	SHA1	Message	Date
Amir Ayupov	c36b71686c	Improve cold fragment name matching Summary: Fix cold fragment name matching regex by replacing existing regexes `.\.cold\..` and `.\.cold` and combining them into `.\.cold(\.\d)?`, applied to restored name (with BOLT-added suffixes stripped) This allows matching names like "execute_stack_op.cold/1", which previously weren't recognized. (cherry picked from FBD24804880)	2020-11-09 12:38:51 -08:00
Amir Ayupov	f86a78a4e7	Lost in rebase: call registerFragment with a reference to TargetBF Summary: Fixes broken build due to a lost dereferencing (cherry picked from FBD24799948)	2020-11-06 12:22:22 -08:00
Amir Ayupov	2b09d672ce	Conservatively handle jump tables in split functions Summary: - Allow jump table entries to point to locations inside the function and its fragments. Reasoning behind this is that jump table identification has the logic of stopping at entry which belongs to a function different from the one originally referencing jump table. This assumption is invalid for jump tables with entries pointing to both parent function and cold fragments, leading to "unclaimed PC-relative relocations" assertion. - Add fragment identification heuristic based on function name regex and contiguous jump table entries. Currently, parent-to-fragment relationship is set up based on interprocedural references – direct references from the parent function. These references don't include references through jump table. Additionally, some fragments are only reachable through jump table. In that case, in order to fully consume jump table, add parent-to-fragment relationship during `analyzeJumpTable` using the following heuristics: 1. Fragment is identified as such based on name (contains `.cold.` part), but 2. Parent function is not set – no direct interprocedural references to that fragment, and 3. Fragment has the name of the form <parent>.cold(.\d+) * For split functions with jump table entries spanning parent and fragments, mark parent and all fragments as ignored. (cherry picked from FBD24456904)	2020-11-06 11:19:03 -08:00
Amir Ayupov	dc48354f71	processInterproceduralReferences: record references to cold fragments as entry points Summary: For interprocedural references to fragments, record them as fragment entry points. Not registering these entry points leads to UCE removing the blocks and "Undefined temporary symbol" assertion. (cherry picked from FBD24511281)	2020-11-06 10:57:47 -08:00
Amir Ayupov	5452287710	Extract BinaryContext::registerFragment Summary: registerFragment to be reused in adding fragments reachable only through jump tables. (cherry picked from FBD24656651)	2020-11-06 10:27:33 -08:00
Maksim Panchenko	4f4239ceba	[BOLT] Fix C++ exceptions for shared objects Summary: Fix several issues to make C++ exceptions work in shared objects: * Set MCObjectFileInfo PIC type based on the input binary type. * Support indirect (DW_EH_PE_indirect) encoding while writing exception Type Table. * Use different LPStart value and landing pad encoding for .so's. * Disable splitting of exception-handling code for .so's because of the new encoding. (cherry picked from FBD24698765)	2020-11-04 11:44:02 -08:00
Rafael Auler	37921b489a	[BOLT] Please sanitizers Summary: In BinaryContext, we had StringRef holding a reference to an r-value std::string. This triggers clang's address sanitizer warnings. In MCPlusBuilder we had a left shift overflowing a type, which is undefined behavior. Similarly, in CallGraph, we had a hash function shifting a negative value, which is also UB. The last two trigger the UB sanitizer. (cherry picked from FBD24661045)	2020-10-30 15:11:52 -07:00
Maksim Panchenko	53bd88c7fe	[BOLT] Refactor reading of debug line info Summary: Match BinaryFunction to a DWARFUnit based on the unit's address ranges skipping the parsing of DIEs. (cherry picked from FBD24269325)	2020-10-12 21:04:42 -07:00
Maksim Panchenko	0465d952cc	[BOLT] Refactor PatchEntries pass Summary: Use injected functions with fixed addresses to patch original function entries. (cherry picked from FBD24133890)	2020-10-09 16:06:27 -07:00
Maksim Panchenko	a82cff0f52	[BOLT] Eliminate "shallow" function lookup Summary: Whenever we search for a function based on its address in the input binary, we now always return a corresponding fragment for split functions. If the user needs an access to the main fragment, they can call getTopmostFragment(). (cherry picked from FBD23670311)	2020-09-14 15:48:32 -07:00
Maksim Panchenko	62469b5036	[BOLT] Do not map sections with zero address Summary: Sections that do not originate from the input binary will have an input address set to zero and thus do not have to be mapped. Mapping such sections caused a build time regression in non-relocation mode. (cherry picked from FBD23670334)	2020-09-14 14:31:50 -07:00
Amir Ayupov	8f7cb54ae5	Added execution count threshold option Summary: Added execution count threshold option (execution-count-threshold) controlling the optimizations that are sensitive to the accuracy of the profiling data: - BB reordering - function splitting - frame opts - shrink wrapping - indirect call promotion (cherry picked from FBD22682171)	2020-07-27 18:07:18 -07:00
Maksim Panchenko	3e795c8a5f	[BOLT] Ignore addresses from non-allocatable sections Summary: We've accidentally registered TBSS section address with a BinaryContext resulting in addresses being attributed to it when getSectionForAddress() was called. (cherry picked from FBD22369323)	2020-07-06 14:39:44 -07:00
Maksim Panchenko	4aaa8892dd	[BOLT] Ignore duplicate relocations Summary: lld linker may emit static relocations against addresses that also have dynamic relocations associated with them. When this happens, BOLT fails to validate the extracted value at the address. Read dynamic relocations in the binary and ignore static relocations at addresses that have a duplicate dynamic relocation. (cherry picked from FBD22192345)	2020-06-23 12:22:58 -07:00
Maksim Panchenko	db4642d0a6	[BOLT] Support -hot-text in lite mode Summary: Update special symbol references in functions that are not emitted. (cherry picked from FBD22120995)	2020-06-18 11:10:41 -07:00
Maksim Panchenko	0ce0bce9e7	[BOLT] Support for lite mode with relocations Summary: Add '-lite' support for relocations for improved processing time, memory consumption, and more resilient processing of binaries with embedded assembly code. In lite relocation mode, BOLT will skip full processing of functions without a profile. It will run scanExternalRefs() on such functions to discover external references and to create internal relocations to update references to optimized functions. Note that we could have relied on the compiler/linker to provide relocations for function references. However, there's no assurance that all such references are reported. E.g., the compiler can resolve inter-procedural references internally, leaving no relocations for the linker. The scan process takes less than 10 seconds per 100MB of code on modern hardware. It's a reasonable overhead to live with considering the flexibility it provides. If BOLT fails to scan or disassemble a function, e.g., due to a data object embedded in code, or an unsupported instruction, it enables a patching mode to guarantee that the failed function will call optimized/moved versions of functions. The patching happens at original function entry points. '-skip=<func1,func2,...>' option now can be used to skip processing of arbitrary functions in the relocation mode. With '-use-old-text' or '-strict' we require all functions to be processed. As such, it is incompatible with '-lite' option, and '-skip' option will only disable optimizations of listed functions, not their disassembly and emission. (cherry picked from FBD22040717)	2020-06-15 00:15:47 -07:00
Alexander Shaposhnikov	0823882d47	Link functions on MachO Summary: Add first bits for linking functions on MachO. (cherry picked from FBD21991721)	2020-06-12 20:16:27 -07:00
Maksim Panchenko	8729171182	[BOLT] Refactor profile-handling code Summary: This diff handles several issues related to profile reading and handling: * Unifies interface used by 3 profile readers in ProfileReaderBase. * Adds automatic detection of the profile file contents. * Removes reader-specific fields from BinaryFunction and BinaryData. All the information is stored in instruction annotations. * Removes implicit memory dependencies in annotations on profile reader instance. * Adds lite mode support to YAML reader. * Moves profile reading code out of BinaryFunction. (cherry picked from FBD21601411)	2020-05-07 23:00:29 -07:00
Maksim Panchenko	b62a1774af	[BOLT] Cover PIC jump table reference in non-strict mode Summary: In non-strict relocation mode it was possible to miss a jump table reference leading to incorrect code. (cherry picked from FBD21251467)	2020-04-26 17:51:07 -07:00
Maksim Panchenko	ac36e17a73	[BOLT][BFC] Refactor code for adding secondary function entries Summary: In non-relocation mode, the code for marking a function non-simple was decoupled from the code that added new entry points. Fix that. (cherry picked from FBD21264247)	2020-04-27 13:40:53 -07:00
Maksim Panchenko	5296b6d12a	[BOLT] Change symbol handling for secondary function entries Summary: Some functions could be called at an address inside their function body. Typically, these functions are written in assembly as C/C++ does not have a multi-entry function concept. The addresses inside a function body that could be referenced from outside are called secondary entry points. In BOLT we support processing functions with secondary/multiple entry points. We used to mark basic blocks representing those entry points with a special flag. There was only one problem - each basic block has exactly one MCSymbol associated with it, and for the most efficient processing we prefer that symbol to be local/temporary. However, in certain scenarios, e.g. when running in non-relocation mode, we need the entry symbol to be global/non-temporary. We could create global symbols for secondary points ahead of time when the entry point is marked in the symbol table. But not all such entries are properly marked. This means that potentially we could discover an entry point only after disassembling the code that references it, and it could happen after a local label was already created at the same location together with all its references. Replacing the local symbol and updating the references turned out to be an error-prone process. This diff takes a different approach. All basic blocks are created with permanently local symbols. Whenever there's a need to add a secondary entry point, we create an extra global symbol or use an existing one at that location. Containing BinaryFunction maps a local symbol of a basic block to the global symbol representing a secondary entry point. This way we can tell if the basic block is a secondary entry point, and we emit both symbols for all secondary entry points. Since secondary entry points are quite rare, the overhead of this approach is minimal. 
Note that the same location could be referenced via local symbol from inside a function and via global entry point symbol from outside. This is true for both primary and secondary entry points. (cherry picked from FBD21150193)	2020-04-19 22:29:54 -07:00
Maksim Panchenko	abda7dc6a7	[BOLT] Fix ICF non-determinism in non-relocation mode Summary: ICF may fold functions in arbitrary order when running multi-threaded. This is fine in relocation mode as we end up with just one function holding all function symbols. However, in non-relocation mode we keep all function bodies, and if we keep merging profiles in non-deterministic order, we end up with functions with non deterministic profiles. The fix for non-relocation mode is to not merge profiles as the factual new profile could be different from the merged one since both function instances are potentially callable. Additionally, emit extra symbols for ICF functions in non-relocation mode to make it possible to track the folding. (cherry picked from FBD20889866)	2020-04-04 20:12:38 -07:00
Maksim Panchenko	1f3e351a9c	[BOLT] Refactor code and data emission code Summary: Consolidate code and data emission code in ELF-independent BinaryEmitter. The high-level interface includes only two functions emitBinaryContext() and emitFunctionBody() used by RewriteInstance and BinaryContext respectively. (cherry picked from FBD20332901)	2020-03-06 15:06:37 -08:00
Maksim Panchenko	cb9c991dcb	[BOLT] Remove allow-section-relocations option Summary: The option is not used. Remove all related code. (cherry picked from FBD20237859)	2020-03-03 15:51:24 -08:00
Maksim Panchenko	c7e012e145	[BOLT][NFC] Get rid of BestFit parameter Summary: The parameter is no longer used. (cherry picked from FBD20236516)	2020-03-03 14:28:42 -08:00
Alexander Shaposhnikov	b0cbb60165	[BOLT] Fix begin decrementing Summary: Fix begin decrementing. (cherry picked from FBD20232474)	2020-03-03 13:36:32 -08:00
Rafael Auler	a9d85413ac	[BOLT] Emit long nops by default Summary: Change our X86 target to use long nops by default. In general, BOLT does not put nops into the instruction stream that is going to be executed, since it doesn't align basic blocks, only functions. Since we rebased BOLT, our relationship with MCAssembler changed because it stopped using multibyte nops and we never needed to revisit that. But it makes a difference if we want to mitigate perf issues with the Intel JCC erratum, since the nops inserted are going to be decoded and executed. To make MCAssembler emit long nops again, we need to explicitly set mattr (Features) of the X86 target. (cherry picked from FBD19987277)	2020-02-19 16:13:58 -08:00
Maksim Panchenko	9711286858	[BOLT] Get rid of BinarySection::IsLocal Summary: The flag is no longer used/needed. (cherry picked from FBD19951571)	2020-02-18 09:20:17 -08:00
Alexander Shaposhnikov	c3c4b15a2e	[BOLT] Remove BinaryContext::getFunctionData Summary: In this diff we refactor the code around getting the original binary encoding of function's body. The main changes are: remove BinaryContext::getFunctionData, remove the parameter of the method BinaryFunction::disassemble, introduce BinaryFunction::getData. (cherry picked from FBD19824368)	2020-02-10 15:35:11 -08:00
Maksim Panchenko	d57513e4ab	[BOLT] Fix symbol table issue with ICF Summary: Not all symbol table entries were updated after ICF. (cherry picked from FBD19319685)	2020-01-08 13:32:59 -08:00
Maksim Panchenko	ac697b7d3a	[BOLT] Replace list of Names with Symbols for BinaryFunction Summary: BinaryFunction used to have a list of Names associated with its main entry point. However, the function is primarily identified by its corresponding symbol or symbols, and these symbols are available as we are creating them for a corresponding BinaryData object. There's also no reason to emit symbols for alternative function names (aliases), so change the code to only emit needed symbols. When we emit a cold fragment for a function, only emit one cold symbol for the fragment instead of one per every main entry symbol/name. When we match a symbol to an entry point in the function, with this change we can first go through the list of main entry symbols (now that they are available). (cherry picked from FBD19426709)	2020-01-13 11:56:59 -08:00
Alexander Shaposhnikov	7a59783d7a	[BOLT] Move createBinaryContext to BinaryContext Summary: 1. Move createBinaryContext to BinaryContext. 2. Add support for nonlinux triples in createBinaryContext. 3. Remove unnecessary std::move in DWARFRewriter.cpp. (cherry picked from FBD19421314)	2020-01-15 15:23:45 -08:00
Maksim Panchenko	0283271f29	[BOLT] Do not report error on mismatched instruction encoding Summary: When the validation of instruction encoding fails but we are able to continue processing the binary, do not report an error. Report encoding format only under `-v=1`. (cherry picked from FBD19376531)	2020-01-13 11:24:10 -08:00
Maksim Panchenko	45b27d7b44	[BOLT] Get rid of Names in BinaryData Summary: For BinaryData, we used to maintain a vector of StringRef names and also a vector of pointers to MCSymbol's associated with the data. There was an unnecessary duplication of information and an associated overhead of keeping it in sync. Fix it by removing Names and using Symbols wherever Names were used. Also merge two variants of registerNameAtAddress() and remove unreachable/dead code in the process. (cherry picked from FBD19359123)	2020-01-10 16:17:47 -08:00
Maksim Panchenko	088e3c032a	[BOLT] Improve handling of secondary function entry points Summary: "Fix symbol table entries for secondary entries" diff broke the inliner. Fix the breakage and make the discovery of secondary entry points more accurate. Add ability to BinaryContext::getFunctionForSymbol() to return an entry point discriminator and use it instead of calling getEntryForSymbol() and isSecondaryEntry(). This is the preferred way since getFunctionForSymbol() is thread-safe. (cherry picked from FBD19295983)	2020-01-06 14:57:15 -08:00
Rafael Auler	de284bc510	[BOLT] Fix symbol table entries for secondary entries Summary: Commit "Support full instrumentation" changed the map SymbolToFunction in BinaryContext to map secondary entries of functions too. This introduced unexpected behavior in our symbol table rewriting logic, which caused it to mistakenly write them with the address of the original function. Fix the behavior of getBinaryFunctionAtAddress to correct this. Also fix other users of SymbolToFunction to ensure they are not accidentally using secondary entries when they shouldn't. (cherry picked from FBD19168319)	2019-12-18 12:14:42 -08:00
Maksim Panchenko	3cc4fc267b	[BOLT] Proper support for -trap-avx512 option Summary: If -trap-avx512 option is not set, verify that we correctly encode AVX-512 instructions and treat them as ordinary instructions. (cherry picked from FBD18666427)	2019-11-22 14:53:20 -08:00
Maksim Panchenko	a09659fd54	[BOLT] Refactor markAmbiguousRelocations() Summary: Refactor markAmbiguousRelocations() code and move it to BinaryContext. Also remove a redundant check. (cherry picked from FBD18623815)	2019-11-18 14:08:17 -08:00
Maksim Panchenko	658f270417	[BOLT] Refactor data PC relocations in BinaryContext Summary: We only use locations of PC relocations and ignore the rest of the data. There's no need to store type and value. (cherry picked from FBD18623280)	2019-11-19 18:52:08 -08:00
Maksim Panchenko	6796b7216b	[BOLT] Fix jump table analysis for non-simple functions Summary: When we disassemble functions, we add discovered jump tables to a global container in BinaryContext. Later, we analyze and verify all jump tables. However, analysis for non-simple functions might fail for numerous reasons, e.g. there would be no instruction at a destination. Since we are not overwriting non-simple functions, it is not a critical error. Thus, we can safely skip jump table analysis for non-simple functions. (cherry picked from FBD18422997)	2019-11-10 21:09:01 -08:00
Maksim Panchenko	d5ddb320ef	[BOLT] Free memory for CFG after emission Summary: Once we emit function code, we no longer need CFG for next phases that use basic blocks for address-translation and symbol update purposes. We free memory used by CFG and instructions. The freed memory gets reused by later phases resulting in overall memory usage reduction. We can probably improve memory consumption even further by replacing BinaryBasicBlocks with more compact data structures. (cherry picked from FBD18408954)	2019-10-31 16:54:48 -07:00
Maksim Panchenko	103b0a77cc	[BOLT] Fix non-determinism while reading debug info Summary: When reading debug info in parallel, CUs for functions were populated in parallel and the order was non-deterministic. We used the first CU from the non-deterministically-ordered list to set the line number resulting in different outputs. The fix is to sort the list after it's been created and before assigning the line table unit. (cherry picked from FBD17946889)	2019-10-14 17:57:36 -07:00
Maksim Panchenko	e9c6c73bb8	[BOLT][non-reloc] Change function splitting in non-relocation mode Summary: This diff applies to non-relocation mode mostly. In this mode, we are limited by original function boundaries, i.e. if a function becomes larger after optimizations (e.g. because of the newly introduced branches) then we might not be able to write the optimized version, unless we split the function. At the same time, we do not benefit from function splitting as we do in the relocation mode since we are not moving functions/fragments, and the hot code does not become more compact. For the reasons described above, we used to execute multiple re-write attempts to optimize the binary and we would only split functions that were too large to fit into their original space. After the first attempt, we would know functions that did not fit into their original space. Then we would re-run all our passes again feeding back the function information and forcefully splitting such functions. Some functions still wouldn't fit even after the splitting (mostly because of the branch relaxation for conditional tail calls that does not happen in non-relocation mode). Yet we have emitted debug info as if they were successfully overwritten. That's why we had one more stage to write the functions again, marking failed-to-emit functions non-simple. Sadly, there was a bug in the way 2nd and 3rd attempts interacted, and we were not splitting the functions correctly and as a result we were emitting less optimized code. One of the reasons we had the multi-pass rewrite scheme in place, was that we did not have an ability to precisely estimate the code size before the actual code emission. Recently, BinaryContext obtained such functionality, and now we can use it instead of relying on the multi-pass rewrite. This eliminates redundant work of re-running the same function passes multiple times. 
Because function splitting runs before a number of optimization passes that run on post-CFG state (those rely on the splitting pass), we cannot estimate the non-split code size with 100% accuracy. However, it is good enough for over 99% of the cases to extract most of the performance gains for the binary. As a result of eliminating the multi-pass rewrite, the processing time in non-relocation mode with `-split-functions=2` is greatly reduced. With debug info update, it is less than half of what it used to be. New semantics for `-split-functions=<n>`: -split-functions - split functions into hot and cold regions =0 - do not split any function =1 - in non-relocation mode only split functions too large to fit into original code space =2 - same as 1 (backwards compatibility) =3 - split all functions (cherry picked from FBD17362607)	2019-09-11 15:42:22 -07:00
Rafael Auler	cc4b2fb614	[BOLT] Efficient edge profiling in instrumented mode Summary: Change our edge profiling technique when using instrumentation to avoid instrumenting every edge. Instead, build the spanning tree for the CFG and omit instrumentation for edges in the spanning tree. Infer the edge count for these edges when writing the profile during run time. The inference works with a bottom-up traversal of the spanning tree and establishes the value of the edge connecting to the parent based on a simple flow equation involving output and input edges, where the only unknown variable is the parent edge. This requires some engineering in the runtime lib to support dynamic allocation for building these graphs at runtime. (cherry picked from FBD17062773)	2019-08-07 16:09:50 -07:00
Maksim Panchenko	f588d7a6ea	[BOLT] Tighter control of jump table detection Summary: We were too permissive by allowing more jump tables during the preliminary scan of memory. This allowed for jump tables to be falsely detected. And since we didn't have a way to backtrack the jump table creation, we had to assert. This diff refactors the code that analyzes jump table contents. Preliminary and final passes share the same code. The only difference should be the detection of instruction boundaries that are available during the final pass. This should affect strict relocation mode only. (cherry picked from FBD16923335)	2019-08-19 14:06:36 -07:00
Maksim Panchenko	8d5854ef09	[BOLT] Add option to verify instruction encoder/decoder Summary: Add option `-check-encoding` to verify if the input to LLVM disassembler matches the output of the assembler. When set, the verification runs on every instruction in processed functions. I'm not enabling the option by default as it could be quite noisy on x86 where instruction encoding is ambiguous and can include redundant prefixes. (cherry picked from FBD16595415)	2019-07-31 16:03:49 -07:00
Maksim Panchenko	a9b9aa1e02	[BOLT] Add code padding verification Summary: In non-relocation mode, we allow data objects to be embedded in the code. Such objects could be unmarked, and could occupy an area between functions, the area which is considered to be code padding. When we disassemble code, we detect references into the padding area and adjust it, so that it is not overwritten during the code emission. We assume the reference to be pointing to the beginning of the object. However, assembly-written functions may reference the middle of an object and use negative offsets to reference data fields. Thus, conservatively, we reduce the possibly-overwritten padding area to a minimum if the object reference was detected. Since we also allow functions with unknown code in non-relocation mode, it is possible that we miss references to some objects in code. To cover such cases, we need to verify the padding area before we allow it to be overwritten. (cherry picked from FBD16477787)	2019-07-23 20:48:41 -07:00
laith sakka	744a2417dd	Run findSubprograms in preprocessDebugInfo in parallel Summary: While reading debug info, the function findSubprograms runs on each compilation unit. This diff parallelizes that loop, reducing its runtime duration by 70%. (cherry picked from FBD16362867)	2019-07-17 20:54:53 -07:00
laith sakka	9977b03fea	Run reorder blocks in parallel Summary: This diff changes the reorderBasicBlocks pass to run in parallel. It does so by adding locks to the fix branches function, and creating temporary MCCodeEmitters when estimating basic block code size. (cherry picked from FBD16161149)	2019-07-08 12:32:58 -07:00
Rafael Auler	1169f1fdd8	[BOLT] Support duplicating jump tables Summary: If two indirect branches use the same jump table, we need to detect this and duplicate jump tables so we can modify this CFG correctly. This is necessary for instrumentation and shrink wrapping. For the latter, we only detect this and bail, fixing this old known issue with shrink wrapping. Other minor changes to support better instrumentation: add an option to instrument only hot functions, add LOCK prefix to instrumentation increment instruction, speed up splitting critical edges by avoiding calling recomputeLandingPads() unnecessarily. (cherry picked from FBD16101312)	2019-07-02 16:56:41 -07:00
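
The edge-profiling scheme described in commit cc4b2fb614 (instrument only the edges outside a spanning tree, then recover the tree edges from flow conservation) might look roughly like the Python sketch below. This is an illustration only, not BOLT's runtime code: the function names, the union-find construction, the leaf-peeling loop, and the virtual exit-to-entry edge that makes flow balance at every node are all assumptions made for the example. It also assumes no parallel edges.

```python
def spanning_tree(nodes, edges):
    """Pick spanning-tree edges (direction ignored) with a small union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = set()
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components: keep it uninstrumented
            parent[ru] = rv
            tree.add((u, v))
    return tree


def infer_counts(nodes, edges, measured):
    """Fill in counts for uninstrumented edges using inflow == outflow.

    Repeatedly finds a node with exactly one unknown incident edge and
    solves the flow equation for it (a leaf-first walk over the tree).
    """
    counts = dict(measured)
    unknown = [e for e in edges if e not in counts]
    while unknown:
        progress = False
        for n in nodes:
            missing = [e for e in unknown if n in e]
            if len(missing) != 1:
                continue
            edge = missing[0]
            inflow = sum(c for (u, v), c in counts.items() if v == n)
            outflow = sum(c for (u, v), c in counts.items() if u == n)
            # The unknown edge supplies whatever balances the node.
            counts[edge] = outflow - inflow if edge[1] == n else inflow - outflow
            unknown.remove(edge)
            progress = True
        if not progress:
            raise ValueError("remaining edges cannot be inferred")
    return counts


# Diamond CFG: E -> {A, B} -> X, plus a virtual X -> E edge (hypothetical).
nodes = ["E", "A", "B", "X"]
edges = [("E", "A"), ("E", "B"), ("A", "X"), ("B", "X"), ("X", "E")]
tree = spanning_tree(nodes, edges)
# Only the two non-tree edges carry counters; these values are made up.
measured = {("B", "X"): 30, ("X", "E"): 100}
counts = infer_counts(nodes, edges, measured)
```

In this example only two of the five edges need counters; the other three counts follow from the flow equations.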

81 Commits
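
The cold fragment name matching described in commit c36b71686c can be illustrated with a short Python sketch. The pattern used here is an assumption: it reads the quoted regex as "any prefix, then `.cold`, then an optional numeric suffix", and it assumes the BOLT-added suffix has the form `/<n>`, as in the example name "execute_stack_op.cold/1". The helper names are hypothetical and this is not BOLT's own code.

```python
import re

# Assumed pattern: any prefix, ".cold", then an optional ".<digits>" suffix.
COLD_FRAGMENT = re.compile(r".*\.cold(\.\d+)?")


def restore_name(name):
    """Strip a trailing "/<n>" suffix (assumed form of the BOLT-added suffix)."""
    return re.sub(r"/\d+$", "", name)


def is_cold_fragment(name):
    """Match the whole restored name against the cold-fragment pattern."""
    return COLD_FRAGMENT.fullmatch(restore_name(name)) is not None
```

Matching against the restored name is what lets a name such as "execute_stack_op.cold/1" be recognized: the `/1` is removed first, and the remainder ends in `.cold`.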