
intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-01-21 12:19:23 +08:00

Author	SHA1	Message	Date
Gergely Bálint	29fef3a51e	[BOLT] Improve DWARF CFI generation for pac-ret binaries (#163381 ) During the InsertNegateRAState pass we check the annotations on instructions to decide where to generate the OpNegateRAState CFIs in the output binary. As only instructions in the input binary were annotated, we have to make a judgement on instructions generated by other BOLT passes. Incorrect placement may cause issues when an (async) unwind request is received during the new "unknown" instructions. This patch adds more logic to make a more informed decision by taking into account: - unknown instructions in a BasicBlock with other instructions have the same RAState. Previously, if the BasicBlock started with an unknown instruction, the RAState was copied from the preceding block. Now, the RAState is copied from the succeeding instructions in the same block. - Some BasicBlocks may only contain instructions with unknown RAState. As explained in issue #160989, these blocks already have incorrect unwind info. Because of this, the last known RAState based on the layout order is copied. Updated bolt/docs/PacRetDesign.md to reflect changes.	2025-12-01 12:00:31 +01:00
Gergely Bálint	8e6fb0ee84	Reapply "[BOLT][BTI] Skip inlining BasicBlocks containing indirect tailcalls" (#169881 ) (#169929 ) This reapplies commit `5d6d74359d`. Fix: added assertions to the requirements of the test -------- Original commit message: In the Inliner pass, tailcalls are converted to calls in the inlined BasicBlock. If the tailcall is indirect, the `BR` is converted to `BLR`. These instructions require different BTI landing pads at their targets. As the targets of indirect tailcalls are unknown, inlining such blocks is unsound for BTI: they should be skipped instead.	2025-12-01 10:20:23 +01:00
Vasily Leonenko	a751ed97ac	[BOLT] Support runtime library hook via DT_INIT_ARRAY (#167467 ) The major part of this PR is a commit implementing support for DT_INIT_ARRAY for BOLT runtime library initialization. It also adds a related hook-init test and fixes a couple of X86 instrumentation tests. This commit follows the implementation of the instrumentation hook via DT_FINI_ARRAY (https://github.com/llvm/llvm-project/pull/67348) and extends it to hook the initialization of BOLT runtime libraries (including the instrumentation library). Initialization has some differences compared to finalization: - Executables always use the ELF entry point address. The update code checks this and updates the init_array entry if the ELF is a shared library (has no interp entry) and has no DT_INIT entry. This commit also introduces the "runtime-lib-init-hook" option to select the primary initialization hook (entry_point, init, init_array), with fallback to the next available hook in the input binary; e.g., for libc we can explicitly set it to init_array. - Shared library init_array entry relocations usually have the R_AARCH64_ABS64 type on AArch64 binaries. We check the relocation type and adjust the discovery and update methods that read init_array relocations. --------- Co-authored-by: Vasily Leonenko <vasily.leonenko@huawei.com>	2025-12-01 10:55:00 +03:00
Gergely Bálint	9bffb10e8b	Revert "[BOLT][BTI] Skip inlining BasicBlocks containing indirect tailcalls" (#169881 ) Reverts llvm/llvm-project#168403 The attached lit test is failing in some build configurations.	2025-11-28 10:10:53 +01:00
Alexey Moksyakov	ad605bdad7	[bolt][aarch64] Change indirect call instrumentation snippet The indirect call instrumentation snippet uses the x16 register in the exit handler to branch to the destination target: __bolt_instr_ind_call_handler_func: msr nzcv, x1 ldp x0, x1, [sp], #16 ldr x16, [sp], #16 ldp x0, x1, [sp], #16 br x16 <----- This patch implements the instrumentation snippet by calling the instrumentation runtime library through an indirect call instruction and adding a wrapper that stores/loads the target value and the register used by the original indirect call instruction. Example: mov x16, foo indirectCall: adrp x8, Label add x8, x8, #:lo12:Label blr x8 Before: Instrumented indirect call: stp x0, x1, [sp, #-16]! mov x0, x8 movk x1, #0x0, lsl #48 movk x1, #0x0, lsl #32 movk x1, #0x0, lsl #16 movk x1, #0x0 stp x0, x1, [sp, #-16]! adrp x0, __bolt_instr_ind_call_handler_func add x0, x0, #:lo12:__bolt_instr_ind_call_handler_func blr x0 __bolt_instr_ind_call_handler: (exit snippet) msr nzcv, x1 ldp x0, x1, [sp], #16 ldr x16, [sp], #16 ldp x0, x1, [sp], #16 br x16 <- overwrites the original value in X16 __bolt_instr_ind_call_handler_func: (entry snippet) stp x0, x1, [sp, #-16]! mrs x1, nzcv adrp x0, __bolt_instr_ind_call_handler add x0, x0, #:lo12:__bolt_instr_ind_call_handler ldr x0, [x0] cmp x0, #0x0 b.eq __bolt_instr_ind_call_handler str x30, [sp, #-16]! blr x0 <--- runtime lib store/load all regs ldr x30, [sp], #16 b __bolt_instr_ind_call_handler _________________________________________________________________________ After: mov x16, foo indirectCall: adrp x8, Label add x8, x8, #:lo12:Label blr x8 Instrumented indirect call: stp x0, x1, [sp, #-16]! mov x0, x8 movk x1, #0x0, lsl #48 movk x1, #0x0, lsl #32 movk x1, #0x0, lsl #16 movk x1, #0x0 stp x0, x30, [sp, #-16]! adrp x8, __bolt_instr_ind_call_handler_func add x8, x8, #:lo12:__bolt_instr_ind_call_handler_func blr x8 <--- call trampoline instr lib ldp x0, x30, [sp], #16 mov x8, x0 <---- restore original target ldp x0, x1, [sp], #16 blr x8 <--- original indirect call instruction // don't touch regs besides x0, x1 __bolt_instr_ind_call_handler: (exit snippet) ret <---- return to original function with indirect call __bolt_instr_ind_call_handler_func: (entry snippet) adrp x0, __bolt_instr_ind_call_handler add x0, x0, #:lo12:__bolt_instr_ind_call_handler ldr x0, [x0] cmp x0, #0x0 b.eq __bolt_instr_ind_call_handler str x30, [sp, #-16]! blr x0 <--- runtime lib store/load all regs ldr x30, [sp], #16 b __bolt_instr_ind_call_handler	2025-11-27 23:48:10 +03:00
Gergely Bálint	5d6d74359d	[BOLT][BTI] Skip inlining BasicBlocks containing indirect tailcalls (#168403 ) In the Inliner pass, tailcalls are converted to calls in the inlined BasicBlock. If the tailcall is indirect, the `BR` is converted to `BLR`. These instructions require different BTI landing pads at their targets. As the targets of indirect tailcalls are unknown, inlining such blocks is unsound for BTI: they should be skipped instead.	2025-11-27 16:50:38 +01:00
Gergely Bálint	cca66a21c2	[BOLT][BTI] Add MCPlusBuilder::updateBTIVariant (#167308 ) Checks if an instruction is BTI, and updates the immediate value to the newly requested variant. This can be used in situations when the compiler already inserted a BTI landing pad to a location, but BOLT needs to update it to a different variant. Example: br x0 to a location with a BTI c.	2025-11-26 17:48:34 +01:00
Gergely Bálint	de4e12849b	[BOLT] Fix assertion test (#169635 ) The AArch64_BTI MCPlusBuilder unittest was failing in no assertion builds. Add `#ifndef NDEBUG` to exclude the assertion test from no assertion builds.	2025-11-26 15:26:45 +01:00
Maksim Panchenko	6c48fbc1dc	[BOLT][Tests] Use AT&T assembler syntax only for X86 tests (#169541 ) Enabling AT&T syntax for all tests is broken when X86 target is not enabled as reported in #167225.	2025-11-25 11:15:24 -08:00
Gergely Bálint	4533699245	[BOLT][BTI] Add MCPlusBuilder::isBTILandingPad (#167306 ) - takes both implicit and explicit BTIs into account - fix related comment in llvm/lib/Target/AArch64/AArch64BranchTargets.cpp	2025-11-25 18:37:30 +01:00
Gergely Bálint	ed95c4d6ec	[BOLT][BTI] Add MCPlusBuilder::createBTI (#167305 ) - creates a BTI j|c landing pad MCInst. - creates a getBTIHintNum utility in AArch64/Utils to make sure BOLT generates BTI immediates the same way as LLVM. - adds MCPlusBuilder unittests to cover the new function.	2025-11-25 09:51:40 +01:00
Maksim Panchenko	5490bcf4aa	[BOLT] Add missing new line. NFC	2025-11-25 00:05:13 -08:00
Gergely Bálint	bab1c2971a	[BOLT] Extend Inliner to work on functions with Pointer Authentication (#162458 ) The inliner uses DirectSP to check if a function has instructions that modify the SP. Exceptions are stack Push and Pop instructions. We can also allow pointer signing and authenticating instructions. The inliner removes the Return instructions from the inlined functions. If it is a fused pointer-authentication-and-return (e.g. RETAA), we have to generate a new authentication instruction.	2025-11-24 18:00:58 +01:00
Raul Tambre	58d9e47672	[NFCI][bolt][test] Use AT&T syntax explicitly (#167225 ) This enables building LLVM with `-mllvm -x86-asm-syntax=intel` in one's Clang config files (i.e. a global preference for Intel syntax). `-masm=att` is insufficient as it doesn't override a specification of `-mllvm -x86-asm-syntax`.	2025-11-19 09:41:13 +02:00
YongKang Zhu	ac6daa8181	[BOLT][print] Add option '--print-only-file' (NFC) (#168023 ) With this option we can pass to BOLT names of functions to be printed through a file instead of specifying them all on command line.	2025-11-14 10:26:21 -08:00
Amir Ayupov	4c3e0320a1	[BOLT] Move call probe information to CallSiteInfo Pseudo probe matching (#100446) needs callee information for call probes. Embed call probe information (probe id, inline tree node, indirect flag) into CallSiteInfo. As a consequence: - Remove call probes from PseudoProbeInfo to avoid duplication, making it only contain block probes. - Probe grouping across inline tree nodes becomes more effective and allows block id 1 (the common case) to be elided unambiguously. Block mask (blx) encoding becomes a low-ROI optimization and will be replaced by a more compact encoding leveraging simplified PseudoProbeInfo in #166680. The size increase is ~3% for an XL profile (461->475MB). Compact block probe encoding shrinks it by ~6%. Test Plan: updated pseudoprobe-decoding-{inline,noinline}.test Reviewers: paschalis-mpeis, ayermolo, yota9, yozhu, rafaelauler, maksfb Reviewed By: rafaelauler Pull Request: https://github.com/llvm/llvm-project/pull/165490	2025-11-11 11:55:36 -08:00
Liu Ke	dee0afa048	[BOLT][DWARF] Slice .debug_str from the DWP for each CU (#159540 ) Slice .debug_str from the DWP for each CU using .debug_str_offsets and emit it, instead of directly copying the global .debug_str, in order to address the bloat issue of DWO after updates. (more details here - #155766 )	2025-11-11 11:46:34 +08:00
YongKang Zhu	4cd16f2a0c	[BOLT][AArch64] Add more heuristics on epilogue determination (#167077 ) Add more heuristics to check if a basic block is an AArch64 epilogue. We assume instructions that load from stack or adjust stack pointer as valid epilogue code sequence if and only if they immediately precede the branch instruction that ends the basic block.	2025-11-10 09:50:44 -08:00
Gergely Bálint	cd68056d13	[BOLT] Simplify RAState helpers (NFCI) (#162820 ) - unify isRAStateSigned and isRAStateUnsigned to a common getRAState, - unify setRASigned and setRAUnsigned into setRAState(MCInst, bool), - update users of these to match the new implementations.	2025-11-10 16:45:39 +01:00
Maksim Panchenko	f2c50f9305	[BOLT] Support restartable sequences in tcmalloc (#167195 ) Add `RSeqRewriter` to detect code references from the `__rseq_cs` section and ignore functions referenced from that section. Code references are detected via relocations (static or dynamic). Note that the abort handler is preceded by a 4-byte signature byte sequence and we cannot relocate the handler without the signature, otherwise the application may crash. Thus we are ignoring the function, i.e. making sure it's not separated from its signature.	2025-11-09 12:43:50 -08:00
Kazu Hirata	7b1a74cd79	[BOLT] Use DenseMap::contains (NFC) (#167169 ) Identified with readability-container-contains.	2025-11-08 14:44:40 -08:00
Maksim Panchenko	af456dfa11	[BOLT] Refactor tracking internals of BinaryFunction. NFCI (#167074 ) In addition to tracking offsets inside a `BinaryFunction` that are referenced by data relocations, we need to track those relocations too. Plus, we will need to map symbols referenced by such relocations back to the containing function. This change introduces `BinaryFunction::InternalRefDataRelocations` to track the aforementioned relocations and expands `BinaryContext::SymbolToFunctionMap` to include local/temp symbols involved in relocation processing. There is no functional change introduced that should affect the output. Future PRs will use the new tracking capabilities.	2025-11-08 00:31:03 -08:00
Maksim Panchenko	7af2b56dd5	[BOLT] Refactor undefined symbols handling. NFCI (#167075 ) Remove internal undefined symbol tracking and instead rely on the emission state of `MCSymbol` while processing data-to-code relocations. Note that `CleanMCState` pass resets the state of all `MCSymbol`s prior to code emission.	2025-11-07 19:42:05 -08:00
Kazu Hirata	bddab8359e	[BOLT] Remove redundant declarations (NFC) (#166893 ) In C++17, static constexpr members are implicitly inline, so they no longer require an out-of-line definition. Identified with readability-redundant-declaration.	2025-11-07 07:58:24 -08:00
YongKang Zhu	6fce53af84	[BOLT][AArch64] Skip as many zeros as possible in padding validation (#166467 ) We were skipping four zero bytes at a time when validating code padding, in case the next zero would be part of an instruction or constant island, and for functions with a large amount of padding (e.g., due to hugify) this could be very slow. We now change the validation to skip as many zero bytes as possible, as long as the count is an exact multiple of 4. No valid instruction is encoded as 0x00000000, and even if we stumble into a constant island, the API `BinaryFunction::isInConstantIsland()` has been made to find the size between the queried address and the end of the island (#164037), so this should be safe.	2025-11-06 09:38:25 -08:00
Ádám Kallai	a24eac88eb	[BOLT] Adding a unittest that covers Arm SPE PBT aggregation (#160095 ) When the SPE Previous Branch Target address (FEAT_SPE_PBT) feature is available, an SPE sample combined with PBT has two entries. Arm SPE records the SRC/DEST addresses of the latest sampled branch operation and stores them in the first entry. PBT records the target address of the most recently taken branch in program order before the sampled operation and places it in the second entry. Together they form a chain of two consecutive branches, where: - The previous branch operation (PBT) is always taken. - In the SPE entry, the current source branch (SRC) may be either fall-through or taken, and the target address (DEST) of the recorded branch operation is always what was architecturally executed. However, PBT doesn't provide as much information as SPE does. It lacks information such as the source branch address, branch type, and prediction bit, and these fields are always zero-filled in the PBT entry. Therefore BOLT cannot evaluate the prediction and source branch fields, and leaves them zero during the aggregation process. The test includes a fully expanded example.	2025-11-06 09:54:44 +00:00
Maksim Panchenko	5f1b9023a8	[BOLT][AArch64] Fix printing of relocation types (#166621 ) Enumeration of relocation types is not always sequential, e.g. on AArch64 the first real relocation type is 0x101. As such, the existing code in `Relocation::print()` was crashing while printing AArch64 relocations. Fix it by using `llvm::object::getELFRelocationTypeName()`.	2025-11-05 12:36:57 -08:00
YongKang Zhu	b0ae054a56	[BOLT][AArch64] Fix LDR relocation type in ADRP+LDR sequence (#166391 ) `R_AARCH64_ADD_ABS_LO12_NC` is for the `ADD` instruction in the `ADRP+ADD` sequence. For `ADRP+LDR` sequence generated in LDR relaxation, relocation type for `LDR` should be `R_AARCH64_LDST64_ABS_LO12_NC` if it is 64-bit integer load or `R_AARCH64_LDST32_ABS_LO12_NC` if 32-bit. Sorry should have included this in #165787.	2025-11-05 12:01:58 -08:00
Elvina Yakubova	338fb02c98	[BOLT][NFC] Rename functions with _negative suffix to _unknown when the size is unknown (#166536 ) Keep the _negative suffix only for test cases where the size is negative	2025-11-05 15:28:31 +00:00
Elvina Yakubova	a65867ac31	[BOLT][AArch64] Fix search to proceed upwards from memcpy call (#166182 ) The search should proceed from CallInst to the beginning of BB since X2 can be rewritten and we need to catch the most recent write before the call. Patch by Yafet Beyene alulayafet@gmail.com	2025-11-05 10:51:31 +00:00
Amir Ayupov	1d0aa6c2ad	[BOLT] Fix impute-fall-throughs (#166305 ) BOLT expects pre-aggregated profile entries to be unique, which holds for externally aggregated traces (or branches+fall-through ranges). Therefore, BOLT doesn't merge duplicate entries for faster processing. However, such traces are not expressly prohibited and could come from concatenated pre-aggregated profiles or otherwise. Relax the assumption about no duplicate (branch-only) traces in fall-through imputing. Test Plan: updated callcont-fallthru.s	2025-11-04 17:01:25 -08:00
YongKang Zhu	718a3b268f	[BOLT][AArch64] Run LDR relaxation (#165787 ) Replace the current `ADRRelaxationPass` with `AArch64RelaxationPass`, which, besides the existing ADR relaxation, will also run LDR relaxation that for now only handles these two forms of LDR instructions: `ldr Xt, [label]` and `ldr Wt, [label]`.	2025-11-04 06:49:04 -08:00
Jinjie Huang	f7be258c28	[BOLT][NFC] Clean up the outdated option --write-dwp in doc (#166150 ) Since the "--write-dwp" option has been removed in [PR](https://github.com/llvm/llvm-project/pull/100771), this patch also cleans up the corresponding documentation and test to avoid misleading users.	2025-11-04 18:27:53 +08:00
Rafael Auler	285b57b1a6	Update BOLT's README.md example optimization flag (#166251 ) Drop hfsort in favor of a more modern function reordering algorithm.	2025-11-03 15:11:29 -08:00
YongKang Zhu	562e3bfcd4	[BOLT] Add an option for constant island cloning (#165778 ) Avoiding constant island cloning helps reduce app size, especially for BOLT optimization, in which cloning happens when a function is split into multiple fragments. Add an option to make the cloning optional; we will introduce a new pass to handle the "reference too far" error that may result from disabling constant island cloning (#165787).	2025-11-03 14:44:05 -08:00
Maksim Panchenko	97660c1094	[BOLT] Issue error on unclaimed PC-relative relocation (#166098 ) Replace assert with an error and improve the report when unclaimed PC-relative relocation is left in strict mode.	2025-11-03 09:19:33 -08:00
Jakub Kuderski	4c21d0cb14	[ADT] Prepare to deprecate variadic `StringSwitch::Cases`. NFC. (#166020 ) Update all uses of variadic `.Cases` to use the initializer list overload instead. I plan to mark variadic `.Cases` as deprecated in a followup PR. For more context, see https://github.com/llvm/llvm-project/pull/163117.	2025-11-02 00:12:33 +00:00
Kazu Hirata	03d044971e	[ADT] Use a dedicated empty type for StringSet (NFC) (#165967 ) This patch introduces StringSetTag, a dedicated empty struct to serve as the "value type" for llvm::StringSet. This change is part of an effort to reduce the use of std::nullopt_t outside the context of std::optional.	2025-11-01 10:41:47 -07:00
Maksim Panchenko	7c01a90545	[BOLT] Refactor handling of branch targets. NFCI (#165828 ) Refactor code that verifies external branch destinations and creates secondary entry points.	2025-10-31 08:56:30 -07:00
Jinjie Huang	6ba2127a5c	[BOLT] Add constant island check in scanExternalRefs() (#165577 ) The [previous patch](https://github.com/llvm/llvm-project/pull/163418) has added a check to prevent adding an entry point into a constant island, but only for successfully disassembled functions. Because scanExternalRefs() is also called when a function fails to be disassembled or is skipped, it can still attempt to add an entry point at a constant island, and the same issue occurs there without a check. So, this patch adds the 'constant island' check to scanExternalRefs() as well.	2025-10-31 10:29:00 +08:00
Amir Ayupov	04e78b4ddc	[BOLT][NFC] Drop unused profile staleness stats (#165489 ) Equal number of blocks in a function/instructions in a block between stale profile and the binary isn't used in the matching. Remove these stats to declutter the output. Test Plan: NFC	2025-10-29 00:31:56 -07:00
Gergely Bálint	e12e0d39a7	[BOLT] Fix thread-safety of MarkRAStates (#165368 ) The pass calls setIgnored() on functions in parallel, but setIgnored is not thread safe. This patch adds a std::mutex to guard setIgnored calls. Fixes: #165362	2025-10-28 12:43:52 +01:00
Liu Ke	8ee5c40fcf	[DebugInfo] Support to get TU for hash from .debug_types.dwo section in DWARF4. (#161067 ) Using the DWP's cu_index/tu_index only loads the DWO units from the .debug_info.dwo section for hash, which works fine in DWARF5. However, in DWARF4 tu_index points to the .debug_types.dwo section, so the type unit can be lost because the wrong section is loaded. (Related discussion in 811b60f0b9.) This patch adds support for getting the type unit for hash from the .debug_types.dwo section in DWARF4.	2025-10-28 11:53:08 +08:00
Maksim Panchenko	cd27741c11	[BOLT] Remove CreatePastEnd parameter in getOrCreateLocalLabel(). NFC (#165065 ) CreatePastEnd parameter had no effect on the label creation. Remove it.	2025-10-25 22:16:15 -07:00
YongKang Zhu	b35c93ffe3	[BOLT] Avoid extra function dump on invalid BBs found by UCE (NFC) (#165111 )	2025-10-25 11:24:21 -07:00
Paschalis Mpeis	ae6cb98b29	[BOLT] Add --ba flag to deprecate --nl (#164257 ) The `--nl` flag, originally for Non-LBR mode, is deprecated and will be replaced by `--basic-events` (alias `--ba`). `--nl` remains as a deprecated alias for backward compatibility.	2025-10-23 10:13:28 +01:00
YongKang Zhu	e1ae126401	[BOLT][AArch64] Validate code padding (#164037 ) Check whether AArch64 function code padding is valid, and add an option to treat invalid code padding as error.	2025-10-22 20:25:06 -07:00
Asher Dobrescu	2bbc4ae850	[BOLT] Check entry point address is not in constant island (#163418 ) There are cases where `addEntryPointAtOffset` is called with a given `Offset` that points to an address within a constant island. This triggers `assert(!isInConstantIsland(EntryPointAddress))` and causes BOLT to crash. This patch adds a check which ignores functions that would add such entry points and warns the user.	2025-10-21 11:08:10 +01:00
Jakub Kuderski	d86da4efee	[ADT] Prepare for deprecation of StringSwitch cases with 4+ args. NFC. (#164173 ) Update `.Cases` and `.CasesLower` with 4+ args to use the `initializer_list` overload. The deprecation of these functions will come in a separate PR. For more context, see: https://github.com/llvm/llvm-project/pull/163405.	2025-10-20 12:03:46 -04:00
Paschalis Mpeis	96688d4b3c	[BOLT][NFC] Use brstack in guides and user outputs (#163950 ) Update guides to use brstack, with a mention of BRBE for AArch64. Use brstack in user-facing outputs. --------- Co-authored-by: Amir Ayupov <aaupov@fb.com>	2025-10-20 09:30:06 +00:00
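
The RAState heuristic described in commit 29fef3a51e can be illustrated with a small sketch. This is not BOLT's implementation. It models each BasicBlock as a list of per-instruction states in layout order (True = signed, False = unsigned, None = unknown RAState). The function name `fill_unknown_ra_states`, the default state used before any known instruction, and the rule that an unknown instruction after a known one inherits the preceding state are all assumptions.

```python
from typing import List, Optional

# Hypothetical model: True = return address signed, False = unsigned,
# None = instruction with unknown RAState (e.g. created by another BOLT pass).
Block = List[Optional[bool]]


def fill_unknown_ra_states(blocks: List[Block], default: bool = False) -> List[List[bool]]:
    """Resolve unknown RAStates block by block, in layout order (sketch)."""
    result: List[List[bool]] = []
    last_known = default  # Assumption: state before the first known instruction.
    for block in blocks:
        known = [s for s in block if s is not None]
        if not known:
            # Block holds only unknown instructions: copy the last known
            # RAState based on layout order.
            result.append([last_known] * len(block))
            continue
        # Leading unknowns take the first known state that follows them in the
        # same block; later unknowns keep the preceding known state (assumption).
        current = known[0]
        filled: List[bool] = []
        for state in block:
            if state is not None:
                current = state
            filled.append(current)
        result.append(filled)
        last_known = filled[-1]
    return result
```

For example, `fill_unknown_ra_states([[None, True], [None, None], [False, None]])` yields `[[True, True], [True, True], [False, False]]`: the leading unknown in the first block copies the state that follows it, and the all-unknown middle block copies the last known state.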
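
Commits 5f1b9023a8 and b0ae054a56 both concern AArch64 relocation types. The sketch below shows why a sparse numbering needs a keyed lookup rather than a dense array, and which LO12 relocation goes with the second instruction of an ADRP pair. The numeric values come from the AArch64 ELF ABI. The helper names `reloc_name` and `lo12_reloc` are hypothetical, and BOLT itself uses `llvm::object::getELFRelocationTypeName()` for the name lookup.

```python
# AArch64 ELF relocation type numbers (AArch64 ELF ABI). Real relocation types
# start at 0x101 and the numbering has gaps, so names are kept in a dict keyed
# by type instead of a dense array indexed by type.
R_AARCH64_NONE = 0x000
R_AARCH64_ADR_PREL_PG_HI21 = 0x113
R_AARCH64_ADD_ABS_LO12_NC = 0x115
R_AARCH64_LDST32_ABS_LO12_NC = 0x11D
R_AARCH64_LDST64_ABS_LO12_NC = 0x11E

RELOC_NAMES = {
    R_AARCH64_NONE: "R_AARCH64_NONE",
    R_AARCH64_ADR_PREL_PG_HI21: "R_AARCH64_ADR_PREL_PG_HI21",
    R_AARCH64_ADD_ABS_LO12_NC: "R_AARCH64_ADD_ABS_LO12_NC",
    R_AARCH64_LDST32_ABS_LO12_NC: "R_AARCH64_LDST32_ABS_LO12_NC",
    R_AARCH64_LDST64_ABS_LO12_NC: "R_AARCH64_LDST64_ABS_LO12_NC",
}


def reloc_name(rtype: int) -> str:
    """Hypothetical helper: look up a name; unknown types do not crash."""
    return RELOC_NAMES.get(rtype, f"<unknown {rtype:#x}>")


def lo12_reloc(second_insn: str, access_bytes: int = 8) -> int:
    """Hypothetical helper: relocation for the instruction after ADRP in an
    ADRP+ADD or ADRP+LDR pair (only the cases named in the commit message)."""
    if second_insn == "add":
        return R_AARCH64_ADD_ABS_LO12_NC
    if second_insn == "ldr":
        if access_bytes == 8:
            return R_AARCH64_LDST64_ABS_LO12_NC
        if access_bytes == 4:
            return R_AARCH64_LDST32_ABS_LO12_NC
    raise ValueError(f"unsupported pair: adrp+{second_insn} ({access_bytes} bytes)")
```

With this model, `reloc_name(lo12_reloc("ldr", 4))` gives `"R_AARCH64_LDST32_ABS_LO12_NC"`, and an unlisted type such as `0x200` prints as `<unknown 0x200>` instead of indexing past the end of a table.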
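
The padding rule from commit 6fce53af84 (skip as many zero bytes as possible, but only a whole number of 4-byte words) reduces to a rounding step. The following is a minimal sketch over a raw byte buffer. The function name `skip_zero_padding` is hypothetical, and the real validation also consults constant-island information that is omitted here.

```python
INSN_SIZE = 4  # AArch64 instructions are 4 bytes wide.


def skip_zero_padding(code: bytes, offset: int) -> int:
    """Return the offset just past the longest run of zero bytes starting at
    `offset`, rounded down to a whole number of 4-byte words (sketch)."""
    end = offset
    while end < len(code) and code[end] == 0:
        end += 1
    run = end - offset
    # Leftover zeros (run % 4) may be the low bytes of a real instruction,
    # so they are not skipped.
    return offset + run - run % INSN_SIZE
```

For instance, six zero bytes followed by a NOP (`1f 20 03 d5`) skip to offset 4, not 6, because the two trailing zeros could belong to the next word.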
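
Commit a751ed97ac describes the "runtime-lib-init-hook" option as selecting a primary hook (entry_point, init, init_array) with fallback to the next available hook in the input binary. The sketch below models that selection. The function `select_init_hook` is hypothetical, and the assumption that fallback only moves forward in the listed order (no wrap-around) is not confirmed by the commit message.

```python
from typing import Collection

# Hook kinds as listed in the commit message; the fallback order is assumed.
HOOK_ORDER = ("entry_point", "init", "init_array")


def select_init_hook(primary: str, available: Collection[str]) -> str:
    """Pick `primary` if the input binary offers it, otherwise fall back to the
    next available hook in HOOK_ORDER (assumed order, no wrap-around)."""
    if primary not in HOOK_ORDER:
        raise ValueError(f"unknown hook kind: {primary}")
    for hook in HOOK_ORDER[HOOK_ORDER.index(primary):]:
        if hook in available:
            return hook
    raise LookupError(f"no usable init hook at or after {primary!r}")
```

Under this model, a shared library with no interp entry and no DT_INIT entry offers only `{"init_array"}`, so `select_init_hook("entry_point", {"init_array"})` falls through to `"init_array"`, matching the case the commit message describes.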


2725 Commits