AArch64: Implement support for the shadowcallstack attribute.

The implementation of shadow call stack on aarch64 is quite different to
the implementation on x86_64. Instead of reserving a segment register for
the shadow call stack, we reserve the platform register, x18. Any function
that spills lr to sp also spills it to the shadow call stack, a pointer to
which is stored in x18.
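
As a rough illustration of the paragraph above, the following sketch models
the prolog and epilog in plain C++. It is not LLVM code: the Machine struct,
its fields and returnAddressAfterOverwrite are hypothetical names invented
for this example, and the stacks are modelled as small arrays indexed by
slot rather than by byte address.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the state involved; none of these names exist in LLVM.
struct Machine {
  std::uint64_t lr = 0;          // x30, the link register
  std::uint64_t stack[16] = {};  // regular stack, addressed via sp
  std::uint64_t shadow[16] = {}; // shadow call stack, addressed via x18
  int sp = 16;                   // one past the top slot; the stack grows down
  int x18 = 0;                   // next free shadow slot; the shadow stack grows up

  // Prolog: "str x30, [x18], #8" (post-increment), then spill lr to sp.
  void prolog() {
    shadow[x18++] = lr;
    stack[--sp] = lr;
  }

  // Epilog: reload lr from sp, then "ldr x30, [x18, #-8]!" (pre-decrement),
  // so the value from the shadow call stack is the one that is returned to.
  void epilog() {
    lr = stack[sp++];
    assert(x18 > 0 && "shadow call stack underflow");
    lr = shadow[--x18];
  }
};

// Returns the value of lr at function return after the copy of lr saved on
// the regular stack has been overwritten (a simulated buffer overflow).
std::uint64_t returnAddressAfterOverwrite(std::uint64_t original,
                                          std::uint64_t attacker) {
  Machine m;
  m.lr = original;
  m.prolog();
  m.stack[m.sp] = attacker; // clobber the saved lr on the regular stack
  m.epilog();
  return m.lr;
}
```

In this model, returnAddressAfterOverwrite(0x1000, 0xdead) yields 0x1000,
because the epilog's final load takes lr from the shadow call stack rather
than from the overwritten slot on the regular stack.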

Differential Revision: https://reviews.llvm.org/D45239

llvm-svn: 329236
Peter Collingbourne
2018-04-04 21:55:44 +00:00
parent 4296ea72ff
commit f11eb3ebe7
11 changed files with 241 additions and 35 deletions


@@ -9,11 +9,11 @@ Introduction
============
ShadowCallStack is an **experimental** instrumentation pass, currently only
implemented for x86_64, that protects programs against return address
overwrites (e.g. stack buffer overflows.) It works by saving a function's return
address to a separately allocated 'shadow call stack' in the function prolog and
checking the return address on the stack against the shadow call stack in the
function epilog.
implemented for x86_64 and aarch64, that protects programs against return
address overwrites (e.g. stack buffer overflows.) It works by saving a
function's return address to a separately allocated 'shadow call stack'
in the function prolog and checking the return address on the stack against
the shadow call stack in the function epilog.
Comparison
----------
@@ -37,8 +37,16 @@ support.
Compatibility
-------------
ShadowCallStack currently only supports x86_64. A runtime is not currently
provided in compiler-rt so one must be provided by the compiled application.
ShadowCallStack currently only supports x86_64 and aarch64. A runtime is not
currently provided in compiler-rt so one must be provided by the compiled
application.
On aarch64, the instrumentation makes use of the platform register ``x18``.
On some platforms, ``x18`` is reserved, and on others, it is designated as
a scratch register. This generally means that any code that may run on the
same thread as code compiled with ShadowCallStack must either target one
of the platforms whose ABI reserves ``x18`` (currently Darwin, Fuchsia and
Windows) or be compiled with the flag ``-ffixed-x18``.
Security
========
@@ -56,28 +64,37 @@ has been checked and before it has been returned to. Modifying the call-return
semantics to fix this on x86_64 would incur an unacceptable performance overhead
due to return branch prediction.
The instrumentation makes use of the ``gs`` segment register to reference the
shadow call stack meaning that references to the shadow call stack do not have
to be stored in memory. This makes it possible to implement a runtime that
avoids exposing the address of the shadow call stack to attackers that can read
arbitrary memory. However, attackers could still try to exploit side channels
exposed by the operating system `[1]`_ `[2]`_ or processor `[3]`_ to discover
the address of the shadow call stack.
The instrumentation makes use of the ``gs`` segment register on x86_64,
or the ``x18`` register on aarch64, to reference the shadow call stack
meaning that references to the shadow call stack do not have to be stored in
memory. This makes it possible to implement a runtime that avoids exposing
the address of the shadow call stack to attackers that can read arbitrary
memory. However, attackers could still try to exploit side channels exposed
by the operating system `[1]`_ `[2]`_ or processor `[3]`_ to discover the
address of the shadow call stack.
.. _`[1]`: https://eyalitkin.wordpress.com/2017/09/01/cartography-lighting-up-the-shadows/
.. _`[2]`: https://www.blackhat.com/docs/eu-16/materials/eu-16-Goktas-Bypassing-Clangs-SafeStack.pdf
.. _`[3]`: https://www.vusec.net/projects/anc/
Leaf functions are optimized to store the return address in a free register
and avoid writing to the shadow call stack if a register is available. Very
short leaf functions are uninstrumented if their execution is judged to be
shorter than the race condition window intrinsic to the instrumentation.
On x86_64, leaf functions are optimized to store the return address in a
free register and avoid writing to the shadow call stack if a register is
available. Very short leaf functions are uninstrumented if their execution
is judged to be shorter than the race condition window intrinsic to the
instrumentation.
On aarch64, the architecture's call and return instructions (``bl`` and
``ret``) operate on a register rather than the stack, which means that
leaf functions are generally protected from return address overwrites even
without ShadowCallStack. It also means that ShadowCallStack on aarch64 is not
vulnerable to the same types of time-of-check-to-time-of-use races as x86_64.
Usage
=====
To enable ShadowCallStack, just pass the ``-fsanitize=shadow-call-stack`` flag
to both compile and link command lines.
To enable ShadowCallStack, just pass the ``-fsanitize=shadow-call-stack``
flag to both compile and link command lines. On aarch64, you also need to pass
``-ffixed-x18`` unless your target already reserves ``x18``.
Low-level API
-------------
@@ -125,7 +142,20 @@ Generates the following x86_64 assembly when compiled with ``-O2``:
pop %rcx
retq
Adding ``-fsanitize=shadow-call-stack`` would output the following:
or the following aarch64 assembly:
.. code-block:: none
stp x29, x30, [sp, #-16]!
mov x29, sp
bl bar
add w0, w0, #1
ldp x29, x30, [sp], #16
ret
Adding ``-fsanitize=shadow-call-stack`` would output the following x86_64
assembly:
.. code-block:: gas
@@ -148,3 +178,16 @@ Adding ``-fsanitize=shadow-call-stack`` would output the following:
trap:
ud2
or the following aarch64 assembly:
.. code-block:: none
str x30, [x18], #8
stp x29, x30, [sp, #-16]!
mov x29, sp
bl bar
add w0, w0, #1
ldp x29, x30, [sp], #16
ldr x30, [x18, #-8]!
ret


@@ -18,6 +18,7 @@
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/Path.h"
#include "llvm/Support/SpecialCaseList.h"
#include "llvm/Support/TargetParser.h"
#include <memory>
using namespace clang;
@@ -375,6 +376,15 @@ SanitizerArgs::SanitizerArgs(const ToolChain &TC,
<< lastArgumentForMask(D, Args, Kinds & NeedsLTO) << "-flto";
}
if ((Kinds & ShadowCallStack) &&
TC.getTriple().getArch() == llvm::Triple::aarch64 &&
!llvm::AArch64::isX18ReservedByDefault(TC.getTriple()) &&
!Args.hasArg(options::OPT_ffixed_x18)) {
D.Diag(diag::err_drv_argument_only_allowed_with)
<< lastArgumentForMask(D, Args, Kinds & ShadowCallStack)
<< "-ffixed-x18";
}
// Report error if there are non-trapping sanitizers that require
// c++abi-specific parts of UBSan runtime, and they are not provided by the
// toolchain. We don't have a good way to check the latter, so we just


@@ -814,7 +814,8 @@ SanitizerMask ToolChain::getSupportedSanitizers() const {
getTriple().getArch() == llvm::Triple::wasm32 ||
getTriple().getArch() == llvm::Triple::wasm64)
Res |= CFIICall;
if (getTriple().getArch() == llvm::Triple::x86_64)
if (getTriple().getArch() == llvm::Triple::x86_64 ||
getTriple().getArch() == llvm::Triple::aarch64)
Res |= ShadowCallStack;
return Res;
}


@@ -562,6 +562,19 @@
// RUN: | FileCheck --check-prefix=CHECK-SHADOWCALLSTACK-LINUX-X86-64 %s
// CHECK-SHADOWCALLSTACK-LINUX-X86-64-NOT: error:
// RUN: %clang -fsanitize=shadow-call-stack %s -### -o %t.o 2>&1 \
// RUN: -target aarch64-unknown-linux -fuse-ld=ld \
// RUN: | FileCheck --check-prefix=CHECK-SHADOWCALLSTACK-LINUX-AARCH64 %s
// CHECK-SHADOWCALLSTACK-LINUX-AARCH64: '-fsanitize=shadow-call-stack' only allowed with '-ffixed-x18'
// RUN: %clang -fsanitize=shadow-call-stack %s -### -o %t.o 2>&1 \
// RUN: -target aarch64-unknown-linux -fuse-ld=ld -ffixed-x18 \
// RUN: | FileCheck --check-prefix=CHECK-SHADOWCALLSTACK-LINUX-AARCH64-X18 %s
// RUN: %clang -fsanitize=shadow-call-stack %s -### -o %t.o 2>&1 \
// RUN: -target arm64-unknown-ios -fuse-ld=ld \
// RUN: | FileCheck --check-prefix=CHECK-SHADOWCALLSTACK-LINUX-AARCH64-X18 %s
// CHECK-SHADOWCALLSTACK-LINUX-AARCH64-X18-NOT: error:
// RUN: %clang -fsanitize=shadow-call-stack %s -### -o %t.o 2>&1 \
// RUN: -target x86-unknown-linux -fuse-ld=ld \
// RUN: | FileCheck --check-prefix=CHECK-SHADOWCALLSTACK-LINUX-X86 %s


@@ -212,6 +212,8 @@ ARM::EndianKind parseArchEndian(StringRef Arch);
ARM::ProfileKind parseArchProfile(StringRef Arch);
unsigned parseArchVersion(StringRef Arch);
bool isX18ReservedByDefault(const Triple &TT);
} // namespace AArch64
namespace X86 {


@@ -917,3 +917,7 @@ ARM::ProfileKind AArch64::parseArchProfile(StringRef Arch) {
unsigned llvm::AArch64::parseArchVersion(StringRef Arch) {
return ARM::parseArchVersion(Arch);
}
bool llvm::AArch64::isX18ReservedByDefault(const Triple &TT) {
return TT.isOSDarwin() || TT.isOSFuchsia() || TT.isOSWindows();
}


@@ -349,3 +349,18 @@ def CSR_AArch64_StackProbe_Windows
: CalleeSavedRegs<(add (sequence "X%u", 0, 15),
(sequence "X%u", 18, 28), FP, SP,
(sequence "Q%u", 0, 31))>;
// Variants of the standard calling conventions for shadow call stack.
// These all preserve x18 in addition to any other registers.
def CSR_AArch64_NoRegs_SCS
: CalleeSavedRegs<(add CSR_AArch64_NoRegs, X18)>;
def CSR_AArch64_AllRegs_SCS
: CalleeSavedRegs<(add CSR_AArch64_AllRegs, X18)>;
def CSR_AArch64_CXX_TLS_Darwin_SCS
: CalleeSavedRegs<(add CSR_AArch64_CXX_TLS_Darwin, X18)>;
def CSR_AArch64_AAPCS_SwiftError_SCS
: CalleeSavedRegs<(add CSR_AArch64_AAPCS_SwiftError, X18)>;
def CSR_AArch64_RT_MostRegs_SCS
: CalleeSavedRegs<(add CSR_AArch64_RT_MostRegs, X18)>;
def CSR_AArch64_AAPCS_SCS
: CalleeSavedRegs<(add CSR_AArch64_AAPCS, X18)>;


@@ -414,6 +414,14 @@ bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(
static MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec(
MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc) {
// Ignore instructions that do not operate on SP, i.e. shadow call stack
// instructions.
while (MBBI->getOpcode() == AArch64::STRXpost ||
MBBI->getOpcode() == AArch64::LDRXpre) {
assert(MBBI->getOperand(0).getReg() != AArch64::SP);
++MBBI;
}
unsigned NewOpc;
bool NewIsUnscaled = false;
switch (MBBI->getOpcode()) {
@@ -481,6 +489,14 @@ static MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec(
static void fixupCalleeSaveRestoreStackOffset(MachineInstr &MI,
unsigned LocalStackSize) {
unsigned Opc = MI.getOpcode();
// Ignore instructions that do not operate on SP, i.e. shadow call stack
// instructions.
if (Opc == AArch64::STRXpost || Opc == AArch64::LDRXpre) {
assert(MI.getOperand(0).getReg() != AArch64::SP);
return;
}
(void)Opc;
assert((Opc == AArch64::STPXi || Opc == AArch64::STPDi ||
Opc == AArch64::STRXui || Opc == AArch64::STRDui ||
@@ -935,6 +951,18 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
// assumes the SP is at the same location as it was after the callee-save save
// code in the prologue.
if (AfterCSRPopSize) {
// Find an insertion point for the first ldp so that it goes before the
// shadow call stack epilog instruction. This ensures that the restore of
// lr from x18 is placed after the restore from sp.
auto FirstSPPopI = MBB.getFirstTerminator();
while (FirstSPPopI != Begin) {
auto Prev = std::prev(FirstSPPopI);
if (Prev->getOpcode() != AArch64::LDRXpre ||
Prev->getOperand(0).getReg() == AArch64::SP)
break;
FirstSPPopI = Prev;
}
// Sometimes (when we restore in the same order as we save), we can end up
// with code like this:
//
@@ -949,7 +977,7 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
// a post-index ldp.
// If we managed to grab the first pop instruction, move it to the end.
if (LastPopI != Begin)
MBB.splice(MBB.getFirstTerminator(), &MBB, LastPopI);
MBB.splice(FirstSPPopI, &MBB, LastPopI);
// We should end up with something like this now:
//
// ldp x24, x23, [sp, #16]
@@ -962,7 +990,7 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
//
// ldp x26, x25, [sp], #64
//
emitFrameOffset(MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
emitFrameOffset(MBB, FirstSPPopI, DL, AArch64::SP, AArch64::SP,
AfterCSRPopSize, TII, MachineInstr::FrameDestroy);
}
}
@@ -1081,7 +1109,8 @@ struct RegPairInfo {
static void computeCalleeSaveRegisterPairs(
MachineFunction &MF, const std::vector<CalleeSavedInfo> &CSI,
const TargetRegisterInfo *TRI, SmallVectorImpl<RegPairInfo> &RegPairs) {
const TargetRegisterInfo *TRI, SmallVectorImpl<RegPairInfo> &RegPairs,
bool &NeedShadowCallStackProlog) {
if (CSI.empty())
return;
@@ -1115,6 +1144,15 @@ static void computeCalleeSaveRegisterPairs(
RPI.Reg2 = NextReg;
}
// If either of the registers to be saved is the lr register, it means that
// we also need to save lr in the shadow call stack.
if ((RPI.Reg1 == AArch64::LR || RPI.Reg2 == AArch64::LR) &&
MF.getFunction().hasFnAttribute(Attribute::ShadowCallStack)) {
if (!MF.getSubtarget<AArch64Subtarget>().isX18Reserved())
report_fatal_error("Must reserve x18 to use shadow call stack");
NeedShadowCallStackProlog = true;
}
// GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
// list to come in sorted by frame index so that we can issue the store
// pair instructions directly. Assert if we see anything otherwise.
@@ -1165,9 +1203,24 @@ bool AArch64FrameLowering::spillCalleeSavedRegisters(
DebugLoc DL;
SmallVector<RegPairInfo, 8> RegPairs;
computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs);
bool NeedShadowCallStackProlog = false;
computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs,
NeedShadowCallStackProlog);
const MachineRegisterInfo &MRI = MF.getRegInfo();
if (NeedShadowCallStackProlog) {
// Shadow call stack prolog: str x30, [x18], #8
BuildMI(MBB, MI, DL, TII.get(AArch64::STRXpost))
.addReg(AArch64::X18, RegState::Define)
.addReg(AArch64::LR)
.addReg(AArch64::X18)
.addImm(8)
.setMIFlag(MachineInstr::FrameSetup);
// This instruction also makes x18 live-in to the entry block.
MBB.addLiveIn(AArch64::X18);
}
for (auto RPII = RegPairs.rbegin(), RPIE = RegPairs.rend(); RPII != RPIE;
++RPII) {
RegPairInfo RPI = *RPII;
@@ -1231,7 +1284,9 @@ bool AArch64FrameLowering::restoreCalleeSavedRegisters(
if (MI != MBB.end())
DL = MI->getDebugLoc();
computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs);
bool NeedShadowCallStackProlog = false;
computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs,
NeedShadowCallStackProlog);
auto EmitMI = [&](const RegPairInfo &RPI) {
unsigned Reg1 = RPI.Reg1;
@@ -1280,6 +1335,17 @@ bool AArch64FrameLowering::restoreCalleeSavedRegisters(
else
for (const RegPairInfo &RPI : RegPairs)
EmitMI(RPI);
if (NeedShadowCallStackProlog) {
// Shadow call stack epilog: ldr x30, [x18, #-8]!
BuildMI(MBB, MI, DL, TII.get(AArch64::LDRXpre))
.addReg(AArch64::X18, RegState::Define)
.addReg(AArch64::LR, RegState::Define)
.addReg(AArch64::X18)
.addImm(-8)
.setMIFlag(MachineInstr::FrameDestroy);
}
return true;
}


@@ -75,21 +75,25 @@ const MCPhysReg *AArch64RegisterInfo::getCalleeSavedRegsViaCopy(
const uint32_t *
AArch64RegisterInfo::getCallPreservedMask(const MachineFunction &MF,
CallingConv::ID CC) const {
bool SCS = MF.getFunction().hasFnAttribute(Attribute::ShadowCallStack);
if (CC == CallingConv::GHC)
// This is academic because all GHC calls are (supposed to be) tail calls
return CSR_AArch64_NoRegs_RegMask;
return SCS ? CSR_AArch64_NoRegs_SCS_RegMask : CSR_AArch64_NoRegs_RegMask;
if (CC == CallingConv::AnyReg)
return CSR_AArch64_AllRegs_RegMask;
return SCS ? CSR_AArch64_AllRegs_SCS_RegMask : CSR_AArch64_AllRegs_RegMask;
if (CC == CallingConv::CXX_FAST_TLS)
return CSR_AArch64_CXX_TLS_Darwin_RegMask;
return SCS ? CSR_AArch64_CXX_TLS_Darwin_SCS_RegMask
: CSR_AArch64_CXX_TLS_Darwin_RegMask;
if (MF.getSubtarget<AArch64Subtarget>().getTargetLowering()
->supportSwiftError() &&
MF.getFunction().getAttributes().hasAttrSomewhere(Attribute::SwiftError))
return CSR_AArch64_AAPCS_SwiftError_RegMask;
return SCS ? CSR_AArch64_AAPCS_SwiftError_SCS_RegMask
: CSR_AArch64_AAPCS_SwiftError_RegMask;
if (CC == CallingConv::PreserveMost)
return CSR_AArch64_RT_MostRegs_RegMask;
return SCS ? CSR_AArch64_RT_MostRegs_SCS_RegMask
: CSR_AArch64_RT_MostRegs_RegMask;
else
return CSR_AArch64_AAPCS_RegMask;
return SCS ? CSR_AArch64_AAPCS_SCS_RegMask : CSR_AArch64_AAPCS_RegMask;
}
const uint32_t *AArch64RegisterInfo::getTLSCallPreservedMask() const {


@@ -24,6 +24,7 @@
#include "llvm/CodeGen/GlobalISel/InstructionSelect.h"
#include "llvm/CodeGen/MachineScheduler.h"
#include "llvm/IR/GlobalValue.h"
#include "llvm/Support/TargetParser.h"
using namespace llvm;
@@ -151,8 +152,8 @@ AArch64Subtarget::AArch64Subtarget(const Triple &TT, const std::string &CPU,
const std::string &FS,
const TargetMachine &TM, bool LittleEndian)
: AArch64GenSubtargetInfo(TT, CPU, FS),
ReserveX18(TT.isOSDarwin() || TT.isOSFuchsia() || TT.isOSWindows()),
IsLittle(LittleEndian), TargetTriple(TT), FrameLowering(),
ReserveX18(AArch64::isX18ReservedByDefault(TT)), IsLittle(LittleEndian),
TargetTriple(TT), FrameLowering(),
InstrInfo(initializeSubtargetDependencies(FS, CPU)), TSInfo(),
TLInfo(TM, *this) {
CallLoweringInfo.reset(new AArch64CallLowering(*getTargetLowering()));


@@ -0,0 +1,47 @@
; RUN: llc -verify-machineinstrs -o - %s -mtriple=aarch64-linux-gnu -mattr=+reserve-x18 | FileCheck %s
define void @f1() shadowcallstack {
; CHECK: f1:
; CHECK-NOT: x18
; CHECK: ret
ret void
}
declare void @foo()
define void @f2() shadowcallstack {
; CHECK: f2:
; CHECK-NOT: x18
; CHECK: b foo
tail call void @foo()
ret void
}
declare i32 @bar()
define i32 @f3() shadowcallstack {
; CHECK: f3:
; CHECK: str x30, [x18], #8
; CHECK: str x30, [sp, #-16]!
%res = call i32 @bar()
%res1 = add i32 %res, 1
; CHECK: ldr x30, [sp], #16
; CHECK: ldr x30, [x18, #-8]!
; CHECK: ret
ret i32 %res
}
define i32 @f4() shadowcallstack {
; CHECK: f4:
%res1 = call i32 @bar()
%res2 = call i32 @bar()
%res3 = call i32 @bar()
%res4 = call i32 @bar()
%res12 = add i32 %res1, %res2
%res34 = add i32 %res3, %res4
%res1234 = add i32 %res12, %res34
; CHECK: ldp {{.*}}x30, [sp
; CHECK: ldr x30, [x18, #-8]!
; CHECK: ret
ret i32 %res1234
}