intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-01-15 04:17:17 +08:00

Author	SHA1	Message	Date
Amir Ayupov	12e9fec697	Rebase: [BOLT] DebugFission Support Summary: Implemented support for Debug Fission. For the most part it doesn't impact the monolithic execution path. One area that changed is the DW_AT_low_pc/DW_AT_high_pc conversion: previously the pair was converted to DW_AT_ranges/DW_AT_low_pc; now DW_AT_low_pc is kept in the same place. A more visible impact is that in the Skeleton CU, DW_AT_low_pc is replaced with DW_AT_ranges_base if it's not originally present and BOLT converted ranges inside the .dwo units. The output is multiple .dwo files with updated debug information. (cherry picked from FBD29569788)	2021-04-01 11:43:00 -07:00
Maksim Panchenko	ba6fdb8113	[BOLT] Preserve original jump table relocations Summary: Remove relocations against internal function labels, e.g. jump table relocations, only when overwriting them. While reading an input file with relocations, we create internal relocations against code references (we skip PIC relocations). Later, when we discover jump tables, we remove corresponding relocations with the assumption that original relocations will either be ignored or replaced by new relocations. However, it is possible to miss some references to the jump table, in which case the original entries will not be ignored. While such situation is abnormal, it is still a better/safer approach to preserve relocations if we are not replacing them with new ones. (cherry picked from FBD28406628)	2021-05-12 23:35:10 -07:00
Maksim Panchenko	fe37f1870e	[BOLT][NFC] Follow LLVM variable initialization style (cherry picked from FBD28417604)	2021-05-13 10:50:47 -07:00
Alexey Moksyakov	ce84e9607a	[PR] Fix bb reordering optimization Summary: The reorder-blocks optimization pass doesn't take into account that the available offset for legacy Jcc instructions (for example, JRCXZ, which has an 8-bit operand) has to be less than 255 bytes. This is a rare case; an extra check was added to exclude functions with such unsupported instructions from optimization passes. Alexey Moksyakov, Advanced Software Technology Lab, Huawei (cherry picked from FBD28264117)	2021-04-23 11:34:40 +03:00
Amir Ayupov	eb99a6665c	Rebase: [BOLT][NFC] Remove unneeded includes with include-what-you-use Summary: Ran iwyu multiple times, manually picked header remove lines. Reached fixed point wrt removal: iwyu doesn't automatically remove any more headers or forward declarations. (cherry picked from FBD29569221)	2021-04-30 13:54:02 -07:00
Amir Ayupov	c7306cc219	Rebase: [BOLT][NFC] Expand auto types Summary: Expanded auto types across BOLT semi-automatically with the aid of clangd LSP (cherry picked from FBD33289309)	2021-04-08 00:19:26 -07:00
Maksim Panchenko	e7169be93f	[BOLT] Do not assert on jump table heuristic failure Summary: During the initial indirect jump analysis, we used to assert that the discovered jump table type matched the pattern of the corresponding instruction sequence. E.g., for PIC jump table memory we expected the PIC jump table instruction sequence. The assertions were too conservative, as in the case of a mismatch we can mark the indirect jump as having an unknown control flow. That should be sufficient to either skip the function processing or rely on relocation information for possible recovery of the control flow. (cherry picked from FBD27255816)	2021-03-23 13:41:41 -07:00
Rafael Auler	b3c34d568a	[BOLT] Fix instrumentation bug in duplicated JTs Summary: Fix a bug with instrumentation when trying to instrument functions that share a jump table with multiple indirect jumps. Usually, each indirect jump that uses a JT will have its own copy of it. When this does not happen, we need to duplicate the jump table safely, so we can split the edges correctly (each copy of the jump table may have different split edges). For this to happen, we need to correctly match the sequence of instructions that perform the indirect jump to identify the base address of the jump table and patch it to point to the new cloned JT. A case was reported to us in which the compiler generated suboptimal code for an indirect jump that our matcher failed to identify. Fixes facebookincubator/BOLT#126 (cherry picked from FBD27065579)	2021-03-15 16:34:25 -07:00
Rafael Auler	16521f1f79	[BOLT] Update license headers Summary: Update license and fix headers for some files. (cherry picked from FBD28112041)	2021-03-15 18:04:18 -07:00
Amir Ayupov	1c5d3a056c	Rebase: Merge BOLT codebase in monorepo Summary: This commit is the first step in rebasing all of BOLT history in the LLVM monorepo. It also solves trivial build issues by updating BOLT codebase to use current LLVM. There is still work left in rebasing some BOLT features and in making sure everything is working as intended. History has been rewritten to put BOLT in the /bolt folder, as opposed to /tools/llvm-bolt. (cherry picked from FBD33289252)	2020-12-01 16:29:39 -08:00
Rafael Auler	e3898d5969	[BOLT] Add threshold options for lite mode Summary: Add options for trading processing speed for binary performance. -lite-threshold-pct=<uint> Threshold (in percent) for selecting functions to process in lite mode. Higher threshold means fewer functions to process. E.g threshold of 90 means only top 10 percent of functions with profile will be processed. -lite-threshold-count=<uint> Similar to '-lite-threshold-pct' but specify threshold using absolute function call count. I.e. limit processing to functions executed at least the specified number of times. -no-scan Do not scan cold functions for external references (may result in slower binary). (cherry picked from FBD24739092)	2020-12-30 12:23:58 -08:00
Amir Ayupov	157129b751	[BOLT] Debug logging in analyzeJumpTable Summary: Added debug logging in/around `analyzeJumpTable`: - Dump jump table entries as they are being processed: ```BOLT-DEBUG: analyzeJumpTable in read_encoded_value_with_base/2(2) Checking 0x428ff40 -> OK: real entry * Checking 0x428ff44 -> OK: real entry * Checking 0x428ff48 -> OK: real entry * Checking 0x428ff4c -> OK: real entry * Checking 0x428ff50 -> OK: real entry * Checking 0x428ff54 -> OK: address in split fragment * Checking 0x428ff58 -> OK: address in split fragment * Checking 0x428ff5c -> OK: address in split fragment * Checking 0x428ff60 -> OK: address in split fragment * Checking 0x428ff64 -> OK: real entry * Checking 0x428ff68 -> OK: real entry * Checking 0x428ff6c -> OK: real entry * Checking 0x428ff70 -> OK: real entry BOLT-DEBUG: analyzeJumpTable in classify_object_over_fdes/1(2) Checking 0x428ff74 -> OK: real entry ... ``` - Dump skipped functions: ``` Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(2) family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(2) Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2.cold.3/1(2) Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode family Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.cold.4/1(2) ``` - Dump values of unclaimed PC-relative relocations in data. (cherry picked from FBD24898172)	2020-11-12 11:54:38 -08:00
Amir Ayupov	c36b71686c	Improve cold fragment name matching Summary: Fix cold fragment name matching regex by replacing existing regexes `.\.cold\..` and `.\.cold` and combining them into `.\.cold(\.\d)?`, applied to the restored name (with BOLT-added suffixes stripped). This allows matching names like "execute_stack_op.cold/1", which previously weren't recognized. (cherry picked from FBD24804880)	2020-11-09 12:38:51 -08:00
Maksim Panchenko	f15532c2aa	[BOLT][DWARF] Streamline processing of DWARF unit DIEs Summary: Do not store processed DWARF DIEs, but instead process them while reading one at a time. Reduces memory consumption when updating debug info by 10%-25%. (cherry picked from FBD24327029)	2020-10-16 00:11:24 -07:00
Maksim Panchenko	53bd88c7fe	[BOLT] Refactor reading of debug line info Summary: Match BinaryFunction to a DWARFUnit based on the unit's address ranges skipping the parsing of DIEs. (cherry picked from FBD24269325)	2020-10-12 21:04:42 -07:00
Maksim Panchenko	0465d952cc	[BOLT] Refactor PatchEntries pass Summary: Use injected functions with fixed addresses to patch original function entries. (cherry picked from FBD24133890)	2020-10-09 16:06:27 -07:00
Amir Ayupov	d1ec11b28f	postProcessEntryPoints: return after setIgnored and setSimple are set Summary: This patch fixes an assertion failure during instrumentation. The assertion is raised by `getInstructionAtOffset`, which expects `CurrentState` to be either `Disassembled` or `CFG`. The function is called from `postProcessEntryPoints`, which goes over Labels and performs a series of checks. The checks call the BinaryFunction methods `setSimple(false)` or `setIgnored()`. However, if `setIgnored` is invoked, it resets the state to `Empty`. Thus a subsequent call to `getInstructionAtOffset` will fail. (cherry picked from FBD24005197)	2020-09-29 19:37:47 -07:00
Maksim Panchenko	a82cff0f52	[BOLT] Eliminate "shallow" function lookup Summary: Whenever we search for a function based on its address in the input binary, we now always return a corresponding fragment for split functions. If the user needs access to the main fragment, they can call getTopmostFragment(). (cherry picked from FBD23670311)	2020-09-14 15:48:32 -07:00
takh	0033a7612d	Linux kernel marker to update special sections Summary: This diff adds SDT marker like LK marker to update special lk sections (cherry picked from FBD22932157)	2020-08-04 13:50:00 -07:00
Rafael Auler	6c8fc28892	Revert "[BOLT] Add the FeatureMiner pass to extract Calder's features." This reverts commit 2476f46af02ccce04e9ed456462dd098460e4e1f. Reviewed By: maks (cherry picked from FBD28111787)	2020-07-16 17:35:55 -07:00
Rafael Auler	170f73ac9e	[BOLT] Fix fix-branches in presence of JRCXZ and friends Summary: Do not fail/assert when trying to reorder blocks that terminate with JRCXZ/JECXZ/LOOP instructions. We cannot invert the condition of these instructions, so just treat them accordingly in fixBranches(). (cherry picked from FBD22487107)	2020-07-15 23:02:58 -07:00
Angélica Moreira	181327d763	[BOLT] Add the FeatureMiner pass to extract Calder's features. (cherry picked from FBD19844247)	2020-07-07 23:01:22 -07:00
Maksim Panchenko	13baf47a3c	[BOLT] Add '-force-patch' to forcefully patch old entries Summary: The option is useful for debugging. Also, print personality function when dumping a function. (cherry picked from FBD22169482)	2020-06-22 13:08:28 -07:00
Maksim Panchenko	0403adde32	[BOLT] Fixes for scanExternalRefs() Summary: In my previous commit, I've accidentally reverted the condition while evaluating a branch target. Also, do not emit instruction for relocation purposes in scanExternalRefs() if there was no TargetSymbol set and we have not produced new relocations. (cherry picked from FBD22169317)	2020-06-22 12:50:49 -07:00
Maksim Panchenko	8374e8e3fe	[BOLT] Properly register symbols at secondary entry points Summary: We may end up with a secondary entry symbol set to zero if there was no symbol in the input file at the entry point address, and if we skipped the function emission, e.g. if it was ignored. In that case, the symbol should be properly initialized with a proper address. (cherry picked from FBD22169167)	2020-06-22 12:37:48 -07:00
Maksim Panchenko	15fffe2824	[BOLT] Fix memory error Summary: Fix for double-free I've introduced earlier. (cherry picked from FBD22132595)	2020-06-18 20:59:01 -07:00
Maksim Panchenko	db4642d0a6	[BOLT] Support -hot-text in lite mode Summary: Update special symbol references in functions that are not emitted. (cherry picked from FBD22120995)	2020-06-18 11:10:41 -07:00
Maksim Panchenko	e7c3464226	[BOLT] Disable trapping on AVX-512 by default Summary: (cherry picked from FBD22118562)	2020-06-18 09:55:05 -07:00
Maksim Panchenko	0ce0bce9e7	[BOLT] Support for lite mode with relocations Summary: Add '-lite' support for relocations for improved processing time, memory consumption, and more resilient processing of binaries with embedded assembly code. In lite relocation mode, BOLT will skip full processing of functions without a profile. It will run scanExternalRefs() on such functions to discover external references and to create internal relocations to update references to optimized functions. Note that we could have relied on the compiler/linker to provide relocations for function references. However, there's no assurance that all such references are reported. E.g., the compiler can resolve inter-procedural references internally, leaving no relocations for the linker. The scan process takes under 10 seconds per 100MB of code on modern hardware. It's a reasonable overhead to live with considering the flexibility it provides. If BOLT fails to scan or disassemble a function, e.g., due to a data object embedded in code, or an unsupported instruction, it enables a patching mode to guarantee that the failed function will call optimized/moved versions of functions. The patching happens at original function entry points. The '-skip=<func1,func2,...>' option can now be used to skip processing of arbitrary functions in the relocation mode. With '-use-old-text' or '-strict' we require all functions to be processed. As such, it is incompatible with the '-lite' option, and the '-skip' option will only disable optimizations of listed functions, not their disassembly and emission. (cherry picked from FBD22040717)	2020-06-15 00:15:47 -07:00
Alexander Shaposhnikov	cd067ae1e8	Emit functions on MachO Summary: Start emitting functions (for MachO input binaries). (cherry picked from FBD21721586)	2020-05-26 04:21:04 -07:00
Maksim Panchenko	8729171182	[BOLT] Refactor profile-handling code Summary: This diff handles several issues related to profile reading and handling: * Unifies interface used by 3 profile readers in ProfileReaderBase. * Adds automatic detection of the profile file contents. * Removes reader-specific fields from BinaryFunction and BinaryData. All the information is stored in instruction annotations. * Removes implicit memory dependencies in annotations on profile reader instance. * Adds lite mode support to YAML reader. * Moves profile reading code out of BinaryFunction. (cherry picked from FBD21601411)	2020-05-07 23:00:29 -07:00
Maksim Panchenko	cce49b9522	[BOLT] Remove StringRef from IndirectCallProfile Summary: IndirectCallProfile was holding to a StringRef from a profile reader providing an implicit dependency on the reader. (cherry picked from FBD21587101)	2020-05-14 17:34:20 -07:00
Maksim Panchenko	924d0bdb08	[BOLT] Introduce lite processing mode without relocations Summary: When optimizing a binary without relocations, we can skip processing functions without profile (cold functions). By skipping processing of cold functions, we reduce the processing time and memory. We call such mode a lite mode, and it is enabled by default. Some processing is still done for functions without profile even in lite mode. scanExternalRefs() function is used to detect secondary entry points to functions that are not marked in the symbol table. Note that the no-relocation requirement is a temporary limitation of the lite mode. (cherry picked from FBD21366567)	2020-05-03 15:49:58 -07:00
Maksim Panchenko	04c5d4fcab	[BOLT] Introduce isIgnored() function attribute Summary: Whenever a function is not meant for processing, e.g. when the user requests to optimize only a subset of functions, mark the function as ignored. Use this attribute instead of opts::shouldProcess(). (cherry picked from FBD21374806)	2020-05-03 13:54:45 -07:00
Maksim Panchenko	ac36e17a73	[BOLT][BFC] Refactor code for adding secondary function entries Summary: In non-relocation mode, the code for marking a function non-simple was decoupled from the code that added new entry points. Fix that. (cherry picked from FBD21264247)	2020-04-27 13:40:53 -07:00
Maksim Panchenko	5296b6d12a	[BOLT] Change symbol handling for secondary function entries Summary: Some functions could be called at an address inside their function body. Typically, these functions are written in assembly as C/C++ does not have a multi-entry function concept. The addresses inside a function body that could be referenced from outside are called secondary entry points. In BOLT we support processing functions with secondary/multiple entry points. We used to mark basic blocks representing those entry points with a special flag. There was only one problem - each basic block has exactly one MCSymbol associated with it, and for the most efficient processing we prefer that symbol to be local/temporary. However, in certain scenarios, e.g. when running in non-relocation mode, we need the entry symbol to be global/non-temporary. We could create global symbols for secondary points ahead of time when the entry point is marked in the symbol table. But not all such entries are properly marked. This means that potentially we could discover an entry point only after disassembling the code that references it, and it could happen after a local label was already created at the same location together with all its references. Replacing the local symbol and updating the references turned out to be an error-prone process. This diff takes a different approach. All basic blocks are created with permanently local symbols. Whenever there's a need to add a secondary entry point, we create an extra global symbol or use an existing one at that location. Containing BinaryFunction maps a local symbol of a basic block to the global symbol representing a secondary entry point. This way we can tell if the basic block is a secondary entry point, and we emit both symbols for all secondary entry points. Since secondary entry points are quite rare, the overhead of this approach is minimal. Note that the same location could be referenced via local symbol from inside a function and via global entry point symbol from outside. This is true for both primary and secondary entry points. (cherry picked from FBD21150193)	2020-04-19 22:29:54 -07:00
Maksim Panchenko	606532bdf1	[BOLT] Fix .eh_frame update with ICF in non-relocation mode Summary: In a rare case, we may fold a function and fail to emit it in non-relocation mode due to a function size increase. At the same time, the function that the original function was folded into could have been successfully emitted, e.g. because it was split in the presence of profile information. Later, because the function was not emitted, we have to use its original .eh_frame entry in the preserved .eh_frame section. However, that entry is no longer referencing the original function, but the function that the original was folded into. This happens since the original symbol gets emitted at the other function location. As a result, the .eh_frame entry for the folded function is missing. To prevent incorrect update of the original .eh_frame, create relocations against absolute values. This guarantees preservation of the section contents while updating pc-relative references. (cherry picked from FBD21061130)	2020-04-16 00:02:35 -07:00
Maksim Panchenko	ee0371ad97	[BOLT] Speedup ICF by better function hashing Summary: Too many hash collisions may cause ICF to run slowly. We used to hash BinaryFunction only looking at instruction opcodes, ignoring instruction operands. With many almost identical functions, such approach may lead to long ICF processing time. By including operands into the hash, we reduce the number of collisions and improve the runtime often by a factor of 2 or more. (cherry picked from FBD20888957)	2020-04-07 00:21:37 -07:00
Maksim Panchenko	58b0d9e7b0	[BOLT][DWARF] Add support for base address in DWARF location lists Summary: The version of LLVM that we are based on lacks the support for base address in DWARF location lists. Add the missing pieces. (cherry picked from FBD20640784)	2020-03-24 22:05:37 -07:00
Maksim Panchenko	1f3e351a9c	[BOLT] Refactor code and data emission code Summary: Consolidate code and data emission code in ELF-independent BinaryEmitter. The high-level interface includes only two functions emitBinaryContext() and emitFunctionBody() used by RewriteInstance and BinaryContext respectively. (cherry picked from FBD20332901)	2020-03-06 15:06:37 -08:00
Alexander Shaposhnikov	c3c4b15a2e	[BOLT] Remove BinaryContext::getFunctionData Summary: In this diff we refactor the code around getting the original binary encoding of function's body. The main changes are: remove BinaryContext::getFunctionData, remove the parameter of the method BinaryFunction::disassemble, introduce BinaryFunction::getData. (cherry picked from FBD19824368)	2020-02-10 15:35:11 -08:00
Rafael Auler	0080d74506	[BOLT] Fix issue with strict and builtin_unreachable Summary: In strict mode, a jump table with targets generated by builtin_unreachable (located at the very end of the function) was asserting when being recreated by postProcessIndirectBranches. Fix this. (cherry picked from FBD19614981)	2020-01-28 18:38:10 -08:00
Maksim Panchenko	ac697b7d3a	[BOLT] Replace list of Names with Symbols for BinaryFunction Summary: BinaryFunction used to have a list of Names associated with its main entry point. However, the function is primarily identified by its corresponding symbol or symbols, and these symbols are available as we are creating them for a corresponding BinaryData object. There's also no reason to emit symbols for alternative function names (aliases), so change the code to only emit needed symbols. When we emit a cold fragment for a function, only emit one cold symbol for the fragment instead of one per every main entry symbol/name. When we match a symbol to an entry point in the function, with this change we can first go through the list of main entry symbols (now that they are available). (cherry picked from FBD19426709)	2020-01-13 11:56:59 -08:00
Rafael Auler	961d3d02d8	[BOLT] Move postProcessEntryPoints after disassembly Summary: Call postProcessEntryPoints only after all functions have been disassembled and all interprocedural references have been processed, when all possible entry points have been accounted for. This makes our detection of bad entries more robust as it does not depend on the order of the functions any more. (cherry picked from FBD19404767)	2020-01-14 17:12:03 -08:00
Maksim Panchenko	088e3c032a	[BOLT] Improve handling of secondary function entry points Summary: "Fix symbol table entries for secondary entries" diff broke the inliner. Fix the breakage and make the discovery of secondary entry points more accurate. Add ability to BinaryContext::getFunctionForSymbol() to return an entry point discriminator and use it instead of calling getEntryForSymbol() and isSecondaryEntry(). This is the preferred way since getFunctionForSymbol() is thread-safe. (cherry picked from FBD19295983)	2020-01-06 14:57:15 -08:00
Rafael Auler	de284bc510	[BOLT] Fix symbol table entries for secondary entries Summary: Commit "Support full instrumentation" changed the map SymbolToFunction in BinaryContext to map secondary entries of functions too. This introduced unexpected behavior in our symbol table rewriting logic, which caused it to mistakenly write them with the address of the original function. Fix the behavior of getBinaryFunctionAtAddress to correct this. Also fix other users of SymbolToFunction to ensure they are not accidentally using secondary entries when they shouldn't. (cherry picked from FBD19168319)	2019-12-18 12:14:42 -08:00
Rafael Auler	16a497c627	[BOLT] Support full instrumentation Summary: Add full instrumentation support (branches, direct and indirect calls). Add output statistics to show how many hot bytes were split from cold ones in functions. Add -cold-threshold option to allow splitting warm code (non-zero count). Add option in bolt-diff to report missing functions in profile 2. In instrumentation, fini hooks are fixed to run proper finalization code after the program finishes. Hooks for startup are added to set up the runtime structures that need initialization, such as indirect call hash tables. Add support for automatically dumping profile data every N seconds by forking a watcher process during runtime. (cherry picked from FBD17644396)	2019-12-13 17:27:03 -08:00
Maksim Panchenko	3cc4fc267b	[BOLT] Proper support for -trap-avx512 option Summary: If -trap-avx512 option is not set, verify that we correctly encode AVX-512 instructions and treat them as ordinary instructions. (cherry picked from FBD18666427)	2019-11-22 14:53:20 -08:00
Maksim Panchenko	7350d40404	[BOLT][NFC] Refactor BinaryFunction::addEntryPoint() Summary: There is no need to support existing functionality of adding entry points after the CFG is built as the function is only called in empty or disassembled state. Previously we used to run disassemble+buildCFG per function, but now these phases are decoupled. Also, remove a couple of redundant checks. (cherry picked from FBD18622822)	2019-11-11 17:02:37 -08:00
Maksim Panchenko	72b52edcbb	[BOLT] Free more memory in BinaryFunction::releaseCFG() Summary: Free more lists in BinaryFunction::releaseCFG(). Release BinaryFunction::Relocations after disassembly. Do not populate BinaryFunction::MoveRelocations as we are not using them currently. Also remove PCRelativeRelocationOffsets that weren't used. (cherry picked from FBD18413256)	2019-11-08 14:41:31 -08:00
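
As a rough illustration of the `-lite-threshold-pct` and `-lite-threshold-count` options described in the "Add threshold options for lite mode" entry above, the following Python sketch selects which profiled functions would be processed. It is not BOLT's implementation: the helper names `select_by_pct` and `select_by_count`, the name-to-count dictionary, and the exact rounding rule are all assumptions made for illustration.

```python
def select_by_pct(profile, threshold_pct):
    """Keep the hottest (100 - threshold_pct) percent of profiled functions.

    `profile` is a hypothetical mapping from function name to execution count.
    """
    # Only functions that actually have a profile take part in the ranking.
    ranked = sorted(
        (item for item in profile.items() if item[1] > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    # Integer ceiling of len(ranked) * (100 - threshold_pct) / 100 (assumed rounding).
    keep = (len(ranked) * (100 - threshold_pct) + 99) // 100
    return [name for name, _ in ranked[:keep]]


def select_by_count(profile, threshold_count):
    """Keep functions executed at least `threshold_count` times."""
    return sorted(name for name, count in profile.items() if count >= threshold_count)
```

With ten profiled functions, a percent threshold of 90 keeps only the single hottest one, which matches the "top 10 percent" wording in the commit message; a count threshold instead keeps every function at or above the given execution count, regardless of how many that is.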

122 Commits