Taming ROP on Sandy Bridge - InfoCon 2013 Singapore/SyScan 2013... · Taming ROP on Sandy Bridge...

© 2013 CrowdStrike, Inc. All rights reserved.

Taming ROP on Sandy Bridge

Using Performance Counters to Detect Kernel Return-Oriented-Programming

Georg Wicherski

• Senior Security Researcher at CrowdStrike, Inc.

– x86 & ARM low-level stuff (bad at MIPS)

– Reverse Engineering, Malware analysis

– Exploitation and Mitigation research

• @ochsff on Twitter

• http://blog.oxff.net/

Introduction & Prerequisites

Return Oriented Programming

• Academic generalization / reinvention of ret2libc

– Chain small Gadgets ending in ret to implement payload

– Circumvention of NX / XN mitigations and Harvard architecture

pop esi; pop edi; ret

mov ecx, 0x100; ret

rep movsd; ret
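The gadget chain on this slide can be modelled as a toy interpreter to show how the attacker-controlled stack drives execution. This is only an illustrative sketch: the gadget addresses, the `HALT` sentinel and the `run_chain` helper are hypothetical names, and memory is a plain `bytearray` rather than real address space.

```python
# Toy model of the three-gadget chain. Gadgets are looked up by made-up
# addresses, and "ret" is modelled as popping the next address off the stack.
GADGET_POP_ESI_EDI = 0x1000   # hypothetical address of: pop esi; pop edi; ret
GADGET_MOV_ECX = 0x2000       # hypothetical address of: mov ecx, 0x100; ret
GADGET_REP_MOVSD = 0x3000     # hypothetical address of: rep movsd; ret
HALT = 0xDEAD                 # hypothetical sentinel that ends the simulation


def run_chain(stack, memory):
    """Run the chain; `stack` is a list whose index 0 is the top of stack."""
    regs = {"esi": 0, "edi": 0, "ecx": 0}
    stack = list(stack)
    executed = []
    pc = stack.pop(0)                      # the hijacked return
    while pc != HALT:
        executed.append(pc)
        if pc == GADGET_POP_ESI_EDI:
            regs["esi"] = stack.pop(0)     # pop esi
            regs["edi"] = stack.pop(0)     # pop edi
        elif pc == GADGET_MOV_ECX:
            regs["ecx"] = 0x100            # mov ecx, 0x100
        elif pc == GADGET_REP_MOVSD:
            n = regs["ecx"] * 4            # rep movsd copies ecx dwords
            src, dst = regs["esi"], regs["edi"]
            memory[dst:dst + n] = memory[src:src + n]
            regs["esi"] += n
            regs["edi"] += n
            regs["ecx"] = 0
        else:
            raise ValueError(f"no gadget at {pc:#x}")
        pc = stack.pop(0)                  # ret: next address comes off the stack
    return regs, executed


memory = bytearray(0x1000)
memory[0:0x400] = bytes(range(256)) * 4
chain = [GADGET_POP_ESI_EDI, 0x000, 0x800,   # source and destination operands
         GADGET_MOV_ECX, GADGET_REP_MOVSD, HALT]
regs, executed = run_chain(chain, memory)
```

Every step is entered through a `ret` and none through a `call`, which is the property the detection approach relies on.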

ROP in Kernel Context

• Today: jump to executable user-space page

– Hardware mitigation for x86: SMEP, introduced with Ivy Bridge

– Existing software mitigation in Linux by PaX's UDEREF

• Kernel ROP already a necessity on iOS

• Expecting kernel ROP for x86 on Linux

– SMEP patches already in mainline

[Figure: memory layout diagrams with .text, .data and “shellcode” regions]

Intel Performance Counters

“When the CPU utilization does not tell you the utilization of the CPU”

• Count various Performance Events in Hardware

– Cache hits and misses

– Special instructions executed

– Branch prediction related events

• Present on all newer Intel CPUs

• Signal counter overflow with interrupt

Sandy Bridge Return Prediction

• Predicting indirect branches is important for performance

• Sandy Bridge maintains a 16-entry shadow stack of call sites / return addresses

– Entirely hidden from program, part of branch prediction unit

– Only recognizes call / ret patterns (no push awareness, etc.)

• ROP naturally results in return mispredicts!
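A small model makes the last point concrete. The sketch below assumes a simplified return predictor: `call` pushes a return address, `ret` pops one, the oldest entry is dropped once 16 are stored, and an empty buffer always counts as a mispredict. Real hardware behaviour on underflow is not described in the slides, so that part is an assumption, as is the `ReturnStackBuffer` name.

```python
from collections import deque

RSB_DEPTH = 16  # entry count stated on the slide


class ReturnStackBuffer:
    """Simplified model of a call/ret-only return predictor."""

    def __init__(self, depth=RSB_DEPTH):
        self.entries = deque(maxlen=depth)  # oldest entries fall off silently
        self.mispredicts = 0

    def call(self, return_address):
        self.entries.append(return_address)

    def ret(self, actual_target):
        # Assumption: with no stored entry the prediction is treated as wrong.
        predicted = self.entries.pop() if self.entries else None
        if predicted != actual_target:
            self.mispredicts += 1


# Balanced calls and returns within 16 levels are all predicted.
balanced = ReturnStackBuffer()
for addr in range(10):
    balanced.call(addr)
for addr in reversed(range(10)):
    balanced.ret(addr)

# 20 nested calls: only the innermost 16 returns are predicted.
deep = ReturnStackBuffer()
for addr in range(20):
    deep.call(addr)
for addr in reversed(range(20)):
    deep.ret(addr)

# A ROP chain executes returns with no matching calls at all.
rop = ReturnStackBuffer()
for target in (0x1000, 0x2000, 0x3000, 0x4000, 0x5000):
    rop.ret(target)
```

In this model legitimate code only mispredicts when it unwinds more than 16 levels, while every return in the chain mispredicts.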

Best Friend 0x8889

• 0x89 – BR_MISP_EXEC.*: mispredicted executed branches

• 0x800 – .RETURN_NEAR: normal, near ret

• 0x8000 – .TAKEN: unconditional branch

Counts mispredicted returns executed!
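The value 0x8889 is the event number 0x89 in the low byte with the two unit-mask bits 0x08 and 0x80 in the next byte. The sketch below shows that composition and how such a value could be combined with the architectural IA32_PERFEVTSELx control bits. Which control bits the prototype actually sets is not stated in the slides; the kernel-only, interrupt-on-overflow combination here is an assumption, and the helper names are hypothetical.

```python
EVENT_BR_MISP_EXEC = 0x89   # event select, bits 0-7
UMASK_RETURN_NEAR = 0x08    # 0x800 on the slide, shifted down by 8
UMASK_TAKEN = 0x80          # 0x8000 on the slide, shifted down by 8

# Architectural IA32_PERFEVTSELx control bits.
PERFEVTSEL_USR = 1 << 16    # count in user mode
PERFEVTSEL_OS = 1 << 17     # count in kernel mode
PERFEVTSEL_INT = 1 << 20    # raise an interrupt on counter overflow
PERFEVTSEL_EN = 1 << 22     # enable the counter


def event_code(event, umask):
    """Combine event select and unit mask into the 16-bit event code."""
    return ((umask & 0xFF) << 8) | (event & 0xFF)


def perfevtsel_value(event, umask, count_user=False):
    """Assumed configuration: kernel-mode counting with overflow interrupt."""
    value = event_code(event, umask) | PERFEVTSEL_OS | PERFEVTSEL_INT | PERFEVTSEL_EN
    if count_user:
        value |= PERFEVTSEL_USR
    return value


code = event_code(EVENT_BR_MISP_EXEC, UMASK_RETURN_NEAR | UMASK_TAKEN)
msr_value = perfevtsel_value(EVENT_BR_MISP_EXEC, UMASK_RETURN_NEAR | UMASK_TAKEN)
```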

Related Work

Security breaches as PMU deviation: Detecting and identifying security attacks using performance counters

• Machine-learning-like approach on Performance Counters

– Uses a multitude of different generic counters

– No evaluation of false positives etc. in this part

• Uses Debug-Store backed Last-Branch-Records

– Every branch results in a multi-word memory write

• Performance evaluation only on analysis component

– Ignores performance overhead of DS backed LBR

– No code released, cannot reproduce

Mitigating ROP via Last Branch Recording

• BlueHat 1st Prize, uses Last Branch Records

– Seen in multiple places before, e.g. “Down to the Bare Metal: Using Processor Features for Binary Analysis”

• Uses MSR backed LBR storage for speed

– Only supports storing the 8/16 last branches on the most recent dual-core CPUs

– Checks injected into API call hooks (w/ kernel call)

• Good performance, easily circumvented

Practical Timing Side Channel Attacks Against Kernel Space ASLR

• Uses CPU cache timing attacks to break ASLR

– Deliberately cause traps and then check performance accessing aliased cache lines

– Suggests some solutions but some are hard to implement

• Most operating systems did not even fix all the info leaks

– Windows and Linux make it hard not to obtain kernel pointers

• KASLR is broken, we need a better mitigation

Detecting ROP with PMCs

User-Space Call Stack

[Figure: call-stack depth over time; labels: System Call, Predictable Returns, 16]

Kernel-Space Call Stack

[Figure: call-stack depth over time; labels: System Call, Predictable Returns, 16]

Kernel-Space ROP

[Figure: call-stack depth over time; label: Interrupt]

• Initialize counter to a value close to overflow

– e.g. -8, because -1 interrupts on every legitimate mispredict

• “Detect!” ROP in interrupt handler if not a legitimate mispredict
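Loading the counter with a small negative number means it overflows, and raises its interrupt, after that many counted events. The arithmetic can be sketched as below. The 48-bit counter width is an assumption (the slides do not state it), and both helper names are hypothetical.

```python
COUNTER_WIDTH = 48  # assumption: width of a general-purpose counter in bits


def initial_count(n, width=COUNTER_WIDTH):
    """Raw counter value corresponding to -n, i.e. overflow after n events."""
    if not 0 < n < (1 << width):
        raise ValueError("n must fit in the counter")
    return (1 << width) - n


def events_until_overflow(raw, width=COUNTER_WIDTH):
    """Number of counted events before the counter wraps to zero."""
    return (1 << width) - (raw & ((1 << width) - 1))


reload_value = initial_count(8)
```

With -1 the interrupt fires on the very next mispredicted return, which is why the slide suggests a value such as -8 to tolerate a few legitimate mispredicts between interrupts.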

Kernel-space Call Stack w/ PMC

[Figure: call-stack depth over time; labels: 16, N counted mispredicts, Interrupt]

Differentiating Mispredicts

• Use MSR Last Branch Records to get precise but performant information about the last 16 returns

– This includes a bit indicating mispredicts

– Address returned from (a ret instruction)

– Address returned to

• Check address returned to for a preceding call

– call rel32 exactly 5 bytes before address

– call r/m32 1 to 7 bytes before

– Verify instruction ends exactly on address returned to
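The call-preceded check can be sketched over a byte buffer as follows. This is a minimal illustration and not the prototype's code: it assumes 32-bit addressing without prefixes, treats `call rel32` as opcode `E8` and `call r/m32` as `FF /2`, and uses hypothetical function names. The `FF /2` form needs at least an opcode and a ModRM byte, so the sketch tries lengths 2 to 7.

```python
def indirect_call_length(window):
    """Length of a `call r/m32` (FF /2) at the start of window, else None."""
    if len(window) < 2 or window[0] != 0xFF:
        return None
    modrm = window[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if reg != 2:                       # /2 selects the near indirect call
        return None
    length = 2                         # opcode + ModRM
    if mod != 3:                       # memory operand
        if rm == 4:                    # SIB byte follows
            if len(window) < 3:
                return None
            length += 1
            if mod == 0 and (window[2] & 7) == 5:
                length += 4            # SIB with disp32 and no base
        elif mod == 0 and rm == 5:
            length += 4                # bare disp32
        if mod == 1:
            length += 1                # disp8
        elif mod == 2:
            length += 4                # disp32
    return length


def is_call_preceded(code, ret_offset):
    """True if a call instruction ends exactly at code[ret_offset]."""
    if ret_offset >= 5 and code[ret_offset - 5] == 0xE8:
        return True                    # call rel32
    for length in range(2, 8):
        start = ret_offset - length
        if start < 0:
            break
        if indirect_call_length(code[start:ret_offset]) == length:
            return True                # call r/m32 ending on the target
    return False


direct = bytes.fromhex("90e80000000090")      # nop; call rel32; nop
register = bytes.fromhex("ffd0c3")            # call eax; ret
scaled = bytes.fromhex("ff14850010000090")    # call [eax*4 + 0x1000]; nop
gadget = bytes.fromhex("5e5fc390")            # pop esi; pop edi; ret; nop
```

A byte-pattern test like this can be satisfied by coincidence, for example when an `E8` byte happens to sit five bytes before a gadget, so it narrows the set of usable gadgets rather than eliminating them.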

Kernel-Space Call Stack w/ PMC

[Figure: call-stack depth over time; labels: 16, N/2 counted mispredicts, Interrupt, N/2 counted mispredicts w/ LBR, Interrupt]

Implementation & Evaluation

-grsec-ropmu

• Extension to grsecurity patch set

– No UDEREF means no KROP required

– Applies to 3.8.5 kernel

• Uses return mispredict counter with alternating LBR to find ROP
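One way to read "alternating LBR" is as a two-phase overflow handler: the first N/2 mispredicts are only counted, then branch recording is switched on for the next N/2, and the records are inspected at the second interrupt. The model below sketches that reading. It is an interpretation of the slides and not the patch itself; `RopmuModel`, `BranchRecord` and the injected `is_call_preceded` callback are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BranchRecord:
    from_addr: int       # address of the ret instruction
    to_addr: int         # address returned to
    mispredicted: bool   # mispredict bit from the record


@dataclass
class RopmuModel:
    n: int                                   # mispredict budget per full cycle
    is_call_preceded: Callable[[int], bool]  # check on the address returned to
    lbr_enabled: bool = False
    detections: List[BranchRecord] = field(default_factory=list)

    def on_overflow(self, lbr_records):
        """Handle one counter overflow; returns the next counter reload value."""
        if not self.lbr_enabled:
            self.lbr_enabled = True          # phase 1 done: start recording
        else:
            for record in lbr_records:       # phase 2 done: inspect records
                if record.mispredicted and not self.is_call_preceded(record.to_addr):
                    self.detections.append(record)
            self.lbr_enabled = False
        return -(self.n // 2)


model = RopmuModel(n=8, is_call_preceded=lambda addr: addr == 0x1005)
first_reload = model.on_overflow([])
second_reload = model.on_overflow([
    BranchRecord(0x2000, 0x1005, True),      # deep but legitimate return
    BranchRecord(0x3000, 0x4000, True),      # return into a non-call-preceded address
    BranchRecord(0x5000, 0x6000, False),     # correctly predicted, ignored
])
```

Keeping branch recording off for half of each cycle would trade some detection coverage for lower overhead, which may be the motivation for alternating.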

Performance

• Compiled ROPMU 3.8.5 kernel w/ 8 threads

– Source on 7200 RPM volume w/ AES-NI LUKS

– Intel(R) Core(TM) i7-2720QM CPU @ 2.20GHz

N     System     Elapsed Wall-Clock

2     135.42%    104.44%

8     128.02%    103.68%

32    130.2%     103.65%

Performance – System Time

[Figure: bar chart of system time, t in s (axis from 0 to 450), for Disabled, N = 2, N = 8, N = 32]

Future Work: Circumvention

cli (ideally in Pivot)

[Figure: call-stack depth; labels: 16, N/2 counted mispredicts, Interrupt, N/2 counted mispredicts w/ LBR, Interrupt]

cli Countermeasures

• Detection: Check counter value for positive signum at system call exit

– Damage has been done already

• Use gcc plugin to instrument cli emission in kernel

– Inadvertent cli in “misaligned” instruction; just fa

– Luckily, ff ff is invalid (ff is inc/dec Group)

– Are there any good cli Pivots today?

• Does not work against ROPMU prototype!

– (RO)PMU uses Non-Maskable Interrupt Vector (?)
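The detection idea in the first bullet can be sketched as a sign check: if the counter was loaded with a negative value and the overflow interrupt never ran, the counter has wrapped and reads as non-negative. The sketch assumes a 48-bit counter and treats zero as already overflowed; both are assumptions, and the helper names are hypothetical.

```python
COUNTER_WIDTH = 48  # assumption: width of a general-purpose counter in bits


def as_signed(raw, width=COUNTER_WIDTH):
    """Interpret a raw counter value as a two's-complement number."""
    raw &= (1 << width) - 1
    return raw - (1 << width) if raw >> (width - 1) else raw


def overflow_went_unhandled(raw, width=COUNTER_WIDTH):
    """True if the counter wrapped past zero without being reloaded."""
    return as_signed(raw, width) >= 0


still_armed = overflow_went_unhandled(0xFFFFFFFFFFF8)   # counter reads -8
wrapped = overflow_went_unhandled(5)                    # 5 events past overflow
```

As the slide notes, a check at system call exit only reports the attack after the chain has already run.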
