intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-01-14 03:50:17 +08:00

Author	SHA1	Message	Date
Rafael Auler	f91d121eee	[BOLT] Add option to tag version Summary: Add a dummy option in BOLT to allow us to write any string in the bolt info section. This is accomplished since we record the complete argv vector. This string is used to tag this binary with any ID that can later be associated with a specific BOLT invocation. (cherry picked from FBD21441902)	2020-05-06 17:31:25 -07:00
Maksim Panchenko	689447bf10	[BOLT] Change .debug_line emission for non-simple functions Summary: We use a special routine to emit line info for functions that we do not overwrite. The resulting DWARF was not quite efficient as we were advancing addresses using larger-than-needed opcodes. Since there were only a few functions that we didn't emit/overwrite, it was not a big issue. However, in lite mode the majority of functions are not overwritten and as a result, the inefficiency in debug line encoding got exposed and binaries were getting larger-than-expected .debug_line sections. Fix it by using more conventional line table opcodes for address advancing. (cherry picked from FBD21423074)	2020-05-05 23:56:50 -07:00
Maksim Panchenko	96c4168ddc	[BOLT] Ignore kernel interrupts by default (cherry picked from FBD21431563)	2020-05-06 11:52:16 -07:00
Xun Li	7b61bdf8ea	Check runtime lib format within archiver Summary: We only support linking ELF runtime library right now. If the library is an archiver, we check that each individual library inside the archiver is an ELF library. (cherry picked from FBD21388672)	2020-05-04 13:57:21 -07:00
Maksim Panchenko	924d0bdb08	[BOLT] Introduce lite processing mode without relocations Summary: When optimizing a binary without relocations, we can skip processing functions without profile (cold functions). By skipping processing of cold functions, we reduce the processing time and memory. We call such mode a lite mode, and it is enabled by default. Some processing is still done for functions without profile even in lite mode. scanExternalRefs() function is used to detect secondary entry points to functions that are not marked in the symbol table. Note that the no-relocation requirement is a temporary limitation of the lite mode. (cherry picked from FBD21366567)	2020-05-03 15:49:58 -07:00
Maksim Panchenko	04c5d4fcab	[BOLT] Introduce isIgnored() function attribute Summary: Whenever a function is not meant for processing, e.g. when the user requests to optimize only a subset of functions, mark the function as ignored. Use this attribute instead of opts::shouldProcess(). (cherry picked from FBD21374806)	2020-05-03 13:54:45 -07:00
Maksim Panchenko	4e69764c65	[BOLT] Fix dyno stats after ICF in non-reloc mode Summary: The commit that fixed ICF determinism in non-relocation mode disabled profile merging for functions. Dyno stats output needs to be updated to reflect the lack of merge. (cherry picked from FBD21366046)	2020-05-01 17:51:43 -07:00
Maksim Panchenko	b62a1774af	[BOLT] Cover PIC jump table reference in non-strict mode Summary: In non-strict relocation mode it was possible to miss a jump table reference leading to incorrect code. (cherry picked from FBD21251467)	2020-04-26 17:51:07 -07:00
Maksim Panchenko	ac36e17a73	[BOLT][BFC] Refactor code for adding secondary function entries Summary: In non-relocation mode, the code for marking a function non-simple was decoupled from the code that added new entry points. Fix that. (cherry picked from FBD21264247)	2020-04-27 13:40:53 -07:00
Maksim Panchenko	5296b6d12a	[BOLT] Change symbol handling for secondary function entries Summary: Some functions could be called at an address inside their function body. Typically, these functions are written in assembly as C/C++ does not have a multi-entry function concept. The addresses inside a function body that could be referenced from outside are called secondary entry points. In BOLT we support processing functions with secondary/multiple entry points. We used to mark basic blocks representing those entry points with a special flag. There was only one problem - each basic block has exactly one MCSymbol associated with it, and for the most efficient processing we prefer that symbol to be local/temporary. However, in certain scenarios, e.g. when running in non-relocation mode, we need the entry symbol to be global/non-temporary. We could create global symbols for secondary points ahead of time when the entry point is marked in the symbol table. But not all such entries are properly marked. This means that potentially we could discover an entry point only after disassembling the code that references it, and it could happen after a local label was already created at the same location together with all its references. Replacing the local symbol and updating the references turned out to be an error-prone process. This diff takes a different approach. All basic blocks are created with permanently local symbols. Whenever there's a need to add a secondary entry point, we create an extra global symbol or use an existing one at that location. Containing BinaryFunction maps a local symbol of a basic block to the global symbol representing a secondary entry point. This way we can tell if the basic block is a secondary entry point, and we emit both symbols for all secondary entry points. Since secondary entry points are quite rare, the overhead of this approach is minimal. Note that the same location could be referenced via local symbol from inside a function and via global entry point symbol from outside. This is true for both primary and secondary entry points. (cherry picked from FBD21150193)	2020-04-19 22:29:54 -07:00
Maksim Panchenko	ac1af09e82	[BOLT][NFC] Change wording while reporting functions stats Summary: (cherry picked from FBD21242167)	2020-04-24 16:36:22 -07:00
Maksim Panchenko	fbca177a83	[BOLT] Speedup PLT processing Summary: With larger PLT sizes, linear PLT symbol name lookup becomes a bottleneck. (cherry picked from FBD21223695)	2020-04-23 21:29:10 -07:00
Maksim Panchenko	0ea98d1f0b	[BOLT] Option to fail if invalid profile detected Summary: Add an option to fail processing of the input binary if the profile is not accurate: -stale-threshold=<uint> - maximum percentage of stale functions to tolerate (default: 100) Default (100) means never to fail. A function profile is considered stale if any branch in its profile has invalid source or destination. Use `-stale-threshold=0` to fail if any staleness is detected in the profile. (cherry picked from FBD21189036)	2020-04-22 15:09:49 -07:00
Maksim Panchenko	33e0b2aa58	[BOLT] Do not emit old .eh_frame in relocation mode Summary: In relocation mode, there is no use for old .eh_frame section. Moreover, it can interfere with new EH frames via .eh_frame_hdr when the original .text is reused. (cherry picked from FBD21120070)	2020-04-19 12:55:43 -07:00
Maksim Panchenko	23edb3ed9c	[BOLT] Option to control .text alignment Summary: Add option `-align-text=<n>` to control .text alignment within a segment. Set to page size by default. (cherry picked from FBD21120063)	2020-04-19 15:02:50 -07:00
Maksim Panchenko	10245b5c5b	[BOLT] Emit ICF symbols for large functions Summary: In non-relocation mode, make sure we emit extra symbols for a folded function even if the function was not overwritten due to its large size. (cherry picked from FBD21080467)	2020-04-16 00:05:01 -07:00
Maksim Panchenko	606532bdf1	[BOLT] Fix .eh_frame update with ICF in non-relocation mode Summary: In a rare case, we may fold a function and fail to emit it in non-relocation mode due to a function size increase. At the same time, the function that the original function was folded into could have been successfully emitted, e.g. because it was split in the presence of profile information. Later, because the function was not emitted, we have to use its original .eh_frame entry in the preserved .eh_frame section. However, that entry is no longer referencing the original function, but the function that the original was folded into. This happens since the original symbol gets emitted at the other function location. As a result, .eh_frame entry for the folded function is missing. To prevent incorrect update of the original .eh_frame, create relocations against absolute values. This guarantees preservation of the section contents while updating pc-relative references. (cherry picked from FBD21061130)	2020-04-16 00:02:35 -07:00
Maksim Panchenko	1be7a82540	[BOLT] Speedup RTDyld external symbol resolution Summary: RuntimeDyldImpl::resolveExternalSymbols() some time ago used to call getSymbolAddress() while in the second loop. That call could have modified the contents of ExternalSymbolRelocations that the loop was iterating over. Thus the code was written in a way that erased the processed entry on every loop iteration and reset the map iterator. With a large number of entries in ExternalSymbolRelocations the loop code becomes a performance bottleneck. Since getSymbolAddress() is no longer used, the ExternalSymbolRelocations could be iterated in a straightforward way and the map cleared before the function exit. (cherry picked from FBD21057058)	2019-11-11 13:29:46 -08:00
Rafael Auler	6dbd15bc01	[BOLT-X86] Fix instrumentation issue with indirect calls Summary: Indirect calls that use RSP to compute the target address would break in instrumentation mode because we were adding instructions that changed the stack pointer. Fix this. (cherry picked from FBD20883791)	2020-04-06 17:38:11 -07:00
Maksim Panchenko	401fa5b493	[BOLT] Further speedup ICF Summary: Further speedup ICF by applying stricter rules for congruent functions. While checking symbolic operands in congruent functions, consider operands congruent only if they are equal or reference functions with identical hashes, i.e. potentially foldable functions. Note that jump table operands are handled as a special case. (cherry picked from FBD20912054)	2020-04-07 22:10:12 -07:00
Maksim Panchenko	ee0371ad97	[BOLT] Speedup ICF by better function hashing Summary: Too many hash collisions may cause ICF to run slowly. We used to hash BinaryFunction only looking at instruction opcodes, ignoring instruction operands. With many almost identical functions, such approach may lead to long ICF processing time. By including operands into the hash, we reduce the number of collisions and improve the runtime often by a factor of 2 or more. (cherry picked from FBD20888957)	2020-04-07 00:21:37 -07:00
Maksim Panchenko	abda7dc6a7	[BOLT] Fix ICF non-determinism in non-relocation mode Summary: ICF may fold functions in arbitrary order when running multi-threaded. This is fine in relocation mode as we end up with just one function holding all function symbols. However, in non-relocation mode we keep all function bodies, and if we keep merging profiles in non-deterministic order, we end up with functions with non-deterministic profiles. The fix for non-relocation mode is to not merge profiles as the factual new profile could be different from the merged one since both function instances are potentially callable. Additionally, emit extra symbols for ICF functions in non-relocation mode to make it possible to track the folding. (cherry picked from FBD20889866)	2020-04-04 20:12:38 -07:00
Maksim Panchenko	b08d82d91b	[BOLT] Verify exceptions action table equivalence in ICF Summary: Some functions may have exactly the same code and exception handlers. However, their action tables could be different leading to mismatching semantics. We should verify their equivalence while running ICF. (cherry picked from FBD20889035)	2020-03-30 19:08:24 -07:00
Maksim Panchenko	58b0d9e7b0	[BOLT][DWARF] Add support for base address in DWARF location lists Summary: The version of LLVM that we are based on lacks the support for base address in DWARF location lists. Add the missing pieces. (cherry picked from FBD20640784)	2020-03-24 22:05:37 -07:00
Maksim Panchenko	bbbf679b42	[BOLT] Refactor ELF symbol table rewriting code Summary: Make ELF symbol table rewriting code more structured. While at it, remove symbols from non-allocatable sections. (cherry picked from FBD20243386)	2020-02-26 20:43:18 -08:00
Maksim Panchenko	a07f1a26e7	[BOLT] Refactor section prefixes (cherry picked from FBD20400886)	2020-03-11 15:51:32 -07:00
Maksim Panchenko	1f3e351a9c	[BOLT] Refactor code and data emission code Summary: Consolidate code and data emission code in ELF-independent BinaryEmitter. The high-level interface includes only two functions emitBinaryContext() and emitFunctionBody() used by RewriteInstance and BinaryContext respectively. (cherry picked from FBD20332901)	2020-03-06 15:06:37 -08:00
Maksim Panchenko	74a2777c54	[BOLT] Refactor ELF parts of instrumentation code Summary: This is a prerequisite for larger emitter refactoring. Since .dynamic is read unconditionally, add an error message if the section is missing, or the size of the section is zero. (cherry picked from FBD20331735)	2020-03-08 19:04:39 -07:00
Maksim Panchenko	af553124d3	[BOLT] Refactor emission of original .eh_frame Summary: There is no need to treat the emission of the original `.eh_frame` section as a special case. (cherry picked from FBD20323360)	2020-03-07 11:19:09 -08:00
Alexander Shaposhnikov	e3654fc274	[BOLT] Uniquify names of local symbols Summary: 1. Uniquify names of local symbols. 2. Handle aliases. (cherry picked from FBD20270196)	2020-03-04 18:36:44 -08:00
Alexander Shaposhnikov	842a25f785	[BOLT] Mark functions containing data as non-simple Summary: Temporarily mark functions containing data as non-simple. (cherry picked from FBD20213279)	2020-03-02 22:41:12 -08:00
Maksim Panchenko	cb9c991dcb	[BOLT] Remove allow-section-relocations option Summary: The option is not used. Remove all related code. (cherry picked from FBD20237859)	2020-03-03 15:51:24 -08:00
Maksim Panchenko	c7e012e145	[BOLT][NFC] Get rid of BestFit parameter Summary: The parameter is no longer used. (cherry picked from FBD20236516)	2020-03-03 14:28:42 -08:00
Alexander Shaposhnikov	b0cbb60165	[BOLT] Fix begin decrementing Summary: Fix begin decrementing. (cherry picked from FBD20232474)	2020-03-03 13:36:32 -08:00
Maksim Panchenko	d89bb53afa	[BOLT][NFC] Factor out relocation processing (cherry picked from FBD20087297)	2020-02-24 17:10:02 -08:00
Rafael Auler	340da8f294	[BOLT] Fix shrink wrapping to check pops Summary: Shrink wrapping has a mode where it will directly move push pop pairs, instead of replacing them with stores/loads. This is an ambitious mode that is triggered sometimes, but whenever matching with a push, it would operate with the assumption that the restoring instruction was a pop, not a load, otherwise it would assert. Fix this assertion to bail nicely back to non-pushpop mode (use regular store and load instructions). (cherry picked from FBD20085905)	2020-02-18 16:00:40 -08:00
Maksim Panchenko	2df4e7b99e	[BOLT][NFC] Minor refactoring of RewriteInstance (cherry picked from FBD20087424)	2020-02-24 17:12:41 -08:00
Maksim Panchenko	495761dc70	[BOLT][NFC] Remove unused BinarySection member functions (cherry picked from FBD20087243)	2020-02-24 16:56:45 -08:00
Maksim Panchenko	3b45212e84	[BOLT] Delete ExecutableFileMemoryManager::registerNoteSection() Summary: The interface is no longer in use. (cherry picked from FBD20070558)	2020-02-24 09:40:32 -08:00
Alexander Shaposhnikov	01b7c90242	[BOLT] Add missing override Summary: Add missing override in X86MCPlusBuilder.cpp. (cherry picked from FBD20064222)	2020-02-23 22:27:28 -08:00
Maksim Panchenko	be43f89c4f	[BOLT][llvm] Update llvm.patch Summary: (cherry picked from FBD20063562)	2020-02-23 19:51:33 -08:00
Alexander Shaposhnikov	76aa1c26aa	[BOLT] Enable reversing the order of basic blocks Summary: Enable reversing the order of basic blocks. (cherry picked from FBD19943692)	2020-02-17 13:35:09 -08:00
Alexander Shaposhnikov	4ad5048393	[BOLT] Add first bits to build CFG Summary: Add first bits to build CFG. (cherry picked from FBD19943472)	2020-02-17 12:18:42 -08:00
Alexander Shaposhnikov	5b64bf2128	[BOLT] Disassemble functions from a MachO binary Summary: Add first bits to disassemble functions from a MachO binary. (cherry picked from FBD19900493)	2020-02-11 14:30:33 -08:00
Rafael Auler	a9d85413ac	[BOLT] Emit long nops by default Summary: Change our X86 target to use long nops by default. In general, BOLT does not put nops into the instruction stream that is going to be executed, since it doesn't align basic blocks, only functions. Since we rebased BOLT, our relationship with MCAssembler changed because it stopped using multibyte nops and we never needed to revisit that. But it makes a difference if we want to mitigate perf issues with the Intel JCC erratum, since the nops inserted are going to be decoded and executed. To make MCAssembler emit long nops again, we need to explicitly set mattr (Features) of the X86 target. (cherry picked from FBD19987277)	2020-02-19 16:13:58 -08:00
Maksim Panchenko	9711286858	[BOLT] Get rid of BinarySection::IsLocal Summary: The flag is no longer used/needed. (cherry picked from FBD19951571)	2020-02-18 09:20:17 -08:00
Alexander Shaposhnikov	16630f5c58	[BOLT] Factor out NameResolver from RewriteInstance Summary: Factor out the helper class NameResolver from the class RewriteInstance. (cherry picked from FBD19943916)	2020-02-17 14:37:46 -08:00
Alexander Shaposhnikov	754b6569f6	[BOLT] Add missing std::move Summary: Add missing std::move in the method BinaryFunction::addAlternativeName (cherry picked from FBD19944661)	2020-02-17 17:53:12 -08:00
Alexander Shaposhnikov	36cf37c4c1	[BOLT] Add initial bits for parsing MachO files Summary: Start adding initial bits for MachO, this diff contains some small preparations for finding functions inside a MachO binary, this will be done in the next diff. The concept of a section in the MachO world is quite different from ELF, nevertheless, for functions for now it more or less fits into the current picture (in BOLT), but things will diverge more significantly a bit later. (cherry picked from FBD19648161)	2020-01-30 13:10:48 -08:00
Rafael Auler	58a129a602	[BOLT] Move peepholes pass after sctc Summary: There are two peephole subpasses, remove-double-jumps and remove-useless-conditional-branches, that operate by reading branches directly, which makes them tricky to run before fix-branches. In the case of remove-double-jumps, it will even lead to suboptimal code if the patched branch was going to be removed by fix-branches when the target is the fall-through. If the final target is a tail call, it will lead to a broken CFG in the worst case. Fix this by moving these passes after SCTC, which already produces CFGs with conditional tail calls. (cherry picked from FBD18795592)	2019-12-03 12:28:22 -08:00

711 Commits