More Sensitive Applications in Public Clouds Security in Public...



Page 1: More Sensitive Applications in Public Clouds Security in Public …pages.cs.wisc.edu/~venkatv/venkat-defense-slides.pdf · 2016-03-24 · Internet Security in Public Clouds — An

Isolation in Public Clouds: Challenges, Threats and Defenses

Venkatanathan Varadarajan

Ph.D. Defense, November 30th, 2015

Committee: Prof. Michael Swift (advisor), Prof. Thomas Ristenpart (advisor), Prof. Aditya Akella, Prof. David Wood, and Prof. Nigel Boston

More Sensitive Applications in Public Clouds

More sensitive applications are being migrated to the public cloud!

Security in Public Clouds: An Analogy

[Figure: VMs hosted behind a cloud API and a web interface, both reachable over the Internet.]

Multi-tenancy + Public

*MIT CSAIL, RF Capture, “Wifi X-ray Vision”

A Threat in Public Clouds

[Figure: VMs placed across hosts through the cloud API, which is reachable over the Internet; one VM ends up co-located with the target VMs.]

An Example Cross-VM Attack: Side-channels

[Figure: prime-probe timeline on a core shared by attacker (A) and victim (V). The attacker primes the I-cache (cache sets × cache ways), the victim runs, and the attacker probes to obtain a timing profile and extract secret information.]

Ref: Zhang, Juels, Reiter, Ristenpart, “Cross-VM Side-channels …”, CCS’12

Steal secret key and read secure email communications within ~6 hours!


Problem: Cross-VM Attacks

Two steps:

1. Place VM on the same host as victim, and

2. Exploit sharing and lack of isolation to:

• Steal secrets, e.g., side-channel attacks on L1 D/I cache, TLB, Branch Predictors, LLC, memory …

• Affect performance, e.g., performance degradation, DoS attacks.

Isolation: “two user tasks are isolated if one cannot know about the execution of the other due to resource sharing.”

Placement & Isolation in Public Clouds

[Figure: behind the cloud API, the cluster scheduler's placement policy (1) controls co-location, and the hypervisor on each host (2) provides isolation between VMs.]

• Cross-VM side- & covert-channels [Ristenpart’09, Xu’11, Wu’12, Zhang’12, Yarom’14, Zhang’14, Liu’15, Inci’15]

• Resource-stealing, DoS [Zhou’11, Varadarajan’12]

• Co-location Attack [Ristenpart’09, Zhang’14, Varadarajan’15, Xu’15]

In my dissertation

1. Is co-location practical in modern clouds?

2. Are there unique opportunities for malicious users to exploit the lack of isolation for performance gains?

3. Can isolation be improved without compromising the efficiency?

Practice of multi-tenancy demands stronger isolation between VMs in public clouds.

1. Is Co-location Practical in Modern Clouds?

Let’s assume 50,000 servers, a randomized placement policy, and 30 target VMs.

How many VMs do you need to launch to achieve a successful co-location 50% of the time? ~ 1000 to 3000 VMs.

Think about the cost ($$ and time).

My work: can achieve co-location with > 90% chance for as little as 14 cents with 10-30 VMs!

2. Malicious use of Performance Interference

[Figure: bar chart of performance degradation (%), axis 0 to 600, for CPU, Net, Disk, and Cache.]

Local Xen Testbed
Machine: Intel Xeon E5430, 2.66 GHz
CPU: 2 packages each with 2 cores
Cache Size: 6MB per package

3x-6x performance loss

My work: A greedy customer could improve his performance by interfering with neighbors

3. Improving Isolation w/o Compromising Efficiency

[Figure: prime-probe attack by an attacker (A) on a victim (V) sharing a core's I-cache (cache sets × cache ways): prime, probe, attacker timing profile, extract secret information.*]

*Zhang, Juels, Reiter, Ristenpart, “Cross-VM Side-channels …”, CCS’12

Shared resource

Hard-isolation: No sharing: ⬆ isolation ⬇ efficiency

My work: Design principle soft-isolation to defend against cross-VM side-channels


Dissertation Summary

Practice of multi-tenancy demands stronger isolation between VMs in public clouds.

1. Are co-location attacks possible and cheap? [Security’15]

2. Stealing performance from neighbors [CCS’12]

3. Better isolation in CPU schedulers to prevent side-channels [Security’14]

Prior Work on Co-location: 2009 Study

• 6 years old

• New countermeasures (e.g., virtual private clouds)

• Increased scale of clouds

• Only on Amazon EC2

*“Hey, You, Get Off of My Cloud”, Ristenpart et al., CCS 2009

Is co-location still viable in security-hardened modern clouds?

Our Work: Exploring Co-location Attacks in Modern Clouds

Steps in achieving co-location:

1. Finding Launch Strategy

• launch parameters to increase chances of co-location

2. Detecting Co-location

• with any target victim

[Figure: VMs launched through the cloud API are placed by the cluster scheduler's placement policy; the attacker detects co-location with the target VMs.]

Co-residency Detection

1. Read shared state on two VMs, e.g., private IP addresses, shared TSC counters.

2. Correlate performance of shared resources, e.g., network round-trip times, cache-based covert-channels.

A memory-based covert-channel* can cause 3x-4x degradation

* Wu et al., “Whispers in the Hyper-space: High-speed Covert Channel Attacks in the Cloud”, Usenix Security’12

[Figure: two pairs of VMs, one pair with private IPs 102.2.1.3 and 102.2.1.1, the other exchanging n/w pings or covert-channel signals.]

Background: Memory Covert-Channel

[Figure: two VMs on cores with private caches, each behind an LLC, connected to memory over a shared memory bus.]

Sender:
  Signal() {
    repeat
      atomic_op(ua_addr)
    done
  }

Receiver:
  Observe() {
    s = start_time
    repeat N
      mem_access()
    done
    e = end_time
    bw = N/(e-s)
  }

* Wu et al., “Whispers in the Hyper-space: High-speed Covert Channel Attacks in the Cloud”, Usenix Security’12

“Victim” VM must cooperate with attack VM. O.K. for measurement studies, but not useful for real attacks in the wild.
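The receiver side of the pseudocode above can be sketched in Python. This is only an illustration under stated assumptions, not the code from Wu et al. or from the dissertation: `measure_bandwidth`, `degradation`, and `looks_coresident` are hypothetical names, the strided buffer walk stands in for `mem_access()`, and the 1.5x threshold is an arbitrary example value. Python cannot issue the atomic operation on `ua_addr` that the sender relies on, so the sender is left out.

```python
import time


def measure_bandwidth(buf: bytearray, n_accesses: int, stride: int = 4096) -> float:
    """Time n_accesses strided reads of buf and return accesses per second."""
    size = len(buf)
    idx = 0
    sink = 0
    start = time.perf_counter()              # s = start_time
    for _ in range(n_accesses):
        sink ^= buf[idx]                     # stand-in for mem_access()
        idx = (idx + stride) % size
    elapsed = time.perf_counter() - start    # e - s
    return n_accesses / max(elapsed, 1e-9)   # bw = N / (e - s)


def degradation(baseline_bw: float, observed_bw: float) -> float:
    """How many times slower the receiver ran compared with its baseline."""
    return baseline_bw / observed_bw


def looks_coresident(baseline_bw: float, observed_bw: float,
                     threshold: float = 1.5) -> bool:
    """Flag co-residency when bandwidth drops by more than `threshold` times."""
    return degradation(baseline_bw, observed_bw) > threshold
```

In use, the receiver would take one measurement while the sender is idle and one while it is signalling, then compare the two with `looks_coresident`.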

Co-residency Detection on Uncooperative Victim

• Realistic victim setting: modern multi-tier cloud app.
• A way to detect co-location when VMs are behind a load-balancer in presence of background traffic

[Figure: target victim VMs are part of a cloud app (e.g., web server, key-value stores) behind a load balancer, with a data store and background load. Labels: Sender(), Receiver(), “send large # requests & measure avg. latency”.]


Co-residency Detection on Uncooperative Victim

[Figure: average request latency (ms, log scale, 1 to 1000) vs. background load (# concurrent users: idle, 100, 250, 500, 750, 1000) for non-coresident and co-resident VMs; a 45x difference is annotated.]

Social networking application (Olio): HAProxy LB + 3 Web servers + 1 mysql server
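The latency comparison behind this detection can be sketched as follows. The sketch assumes the request latencies have already been collected in two batches, one with the attacker's sender running and one without; the function names and the 2x cut-off are hypothetical and are not taken from the study.

```python
from statistics import mean


def latency_ratio(with_sender_ms, without_sender_ms):
    """Ratio of average request latency with the sender running to without it."""
    return mean(with_sender_ms) / mean(without_sender_ms)


def detect_coresidency(with_sender_ms, without_sender_ms, min_ratio=2.0):
    """Report co-residency when the sender inflates average latency enough.

    min_ratio is an illustrative threshold, not a value from the study.
    """
    return latency_ratio(with_sender_ms, without_sender_ms) >= min_ratio
```

Averaging over a large number of requests is what lets the signal survive load balancing and background traffic, as the slide notes.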


Big Picture: Placement Vulnerability Study

Study spanning 3 months, exploring 6 placement variables, spending more than $200 per cloud

[Figure: fix placement variables (e.g., # VMs, when you launch, datacenter, VM type, etc.) → placement policy → observe placement behavior (co-location?).]

Study Setup

• Two distinct accounts: proxy for victim and attacker
• 6 placement variables
  • # victim & attacker VMs, delay b/w launches, time of day, day of week, datacenter, cloud providers
  • Small instance type (EC2: t2.small, GCE: g1.small, Azure: Standard-A1)
• Values for these variables form a launch strategy
• Execute a launch strategy from a workstation
  • detect and log co-location
• 9 samples per strategy with 3 runs per time of day & 2 days of week (weekday/weekend)
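A launch strategy, as described above, is one assignment of values to the six placement variables. A minimal sketch of how such strategies could be represented and enumerated is shown below; the `LaunchStrategy` type, its field names, and `enumerate_strategies` are hypothetical and only illustrate the idea, not the tooling used in the study.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class LaunchStrategy:
    """One assignment of values to the six placement variables (illustrative)."""
    provider: str
    datacenter: str
    n_victim_vms: int
    n_attacker_vms: int
    launch_delay_hours: float
    time_of_day: str
    day_of_week: str


def enumerate_strategies(datacenters_by_provider, victim_counts, attacker_counts,
                         delays_hours, times_of_day, days):
    """Build every combination of the given values, one provider at a time."""
    strategies = []
    for provider, datacenters in datacenters_by_provider.items():
        for dc, v, a, delay, tod, day in product(
                datacenters, victim_counts, attacker_counts,
                delays_hours, times_of_day, days):
            strategies.append(LaunchStrategy(provider, dc, v, a, delay, tod, day))
    return strategies
```

For example, two providers with one datacenter each, three victim counts, three attacker counts, two delays, one time of day, and two days of week yield 72 strategies.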

How hard should it be to achieve co-location?

• Random placement policy
• N = 50,000 machines [re:Invent’14]
• v victim VMs and a attacker VMs
• Probability of collision: $P_c = 1 - (1 - v/N)^a$

Attacker VMs needed for $P_c = 0.5$: $a = \ln(1 - P_c)/\ln(1 - v/N)$

v | a
10 | 3466
20 | 1733
30 | 1155

For a modest 50% success rate with 10-30 victims we need to launch roughly 1,000 to 3,500 VMs
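The random-placement baseline above can be checked with a few lines of Python. The function names are hypothetical; the formulas are the ones on the slide, with the result rounded up to a whole number of VMs.

```python
import math


def collision_probability(v: int, a: int, n: int) -> float:
    """P_c = 1 - (1 - v/N)^a for v victim VMs, a attacker VMs, N machines."""
    return 1.0 - (1.0 - v / n) ** a


def attackers_needed(v: int, n: int, target: float = 0.5) -> int:
    """Smallest a with P_c >= target: a = ln(1 - P_c) / ln(1 - v/N), rounded up."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - v / n))


for v in (10, 20, 30):
    print(v, attackers_needed(v, 50_000))
```

Under the same model, 30 attacker VMs against 30 victim VMs collide less than 2% of the time, which gives a sense of how far the measured success rates are from random placement.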

Results: Varying Number of VMs

[Figure: success rate (0 to 1) on EC2, GCE, and Azure for launch strategies 10v 10a, 10v 20a, 10v 30a, 10v 30a, 20v 30a, and 30v 30a.]

Co-location is possible with as few as 10 VMs and is always achieved with 30 VMs


Results: Varying Delay between Launches

[Figure: success rate (0 to 1) on EC2, GCE, and Azure for 10v 10a, 20v 20a, and 30v 30a, comparing zero delay with a one-hour delay between launches.]

Placement policy varies significantly across clouds

Cost of a Launch Strategy

[Figure: cost ($0.00 to $6.00) and time to achieve & detect co-location (0 to 180 minutes) for 10v 10a, 20v 20a, and 30v 30a on EC2, GCE, and Azure.]

The cheapest launch strategy costs as little as 14 cents

Other Interesting Results

• We always achieved co-location in smaller datacenter regions
  • GCE: europe-west1-b and EC2: us-west-1 (CA)
• In EC2, launching attacker VMs in the early morning (2 to 10am PST) has a higher success rate.
• In Azure we could co-locate 16 VMs on a single host
• Brief experiments with platform-as-a-service
• … and many more in the paper

Outline

Practice of multi-tenancy in public clouds demands stronger isolation between VMs in the presence of malicious users.

1. Is co-location practical in modern clouds? Placement Vulnerability [Security’15]

2. Are there unique opportunities for malicious users to exploit the lack of isolation for performance gains? Resource-Freeing Attacks [CCS’12]

3. Can isolation be improved without compromising the efficiency? CPU Scheduler-based defenses [Security’14]

Resource Contention in Public Clouds

[Figure: bar chart of performance degradation (%), axis 0 to 600, for CPU, Net, Disk, and Cache. Labels: non-work-conserving CPU scheduling; work-conserving scheduling.]

Local Xen Testbed
Machine: Intel Xeon E5430, 2.66 GHz
CPU: 2 packages each with 2 cores
Cache Size: 6MB per package

3x-6x performance loss → higher cost

This work: Greedy customer can recover performance by interfering with other tenants

What can a tenant do?

• Pack up VM and move (see our SOCC 2012 paper) … but not all workloads are cheap to move

• Ask provider for better isolation … requires overhaul of the cloud

• Resource-Freeing Attack


The Setting

Victim:
– One or more VMs
– Public interface (e.g., http)

Beneficiary:
– VM whose performance we want to improve

Helper:
– Mounts the attack

Beneficiary and victim fighting over a target resource

Example: Network Contention

Beneficiary & Victim
• Apache webservers hosting static and dynamic (CGI) web pages
• Target Resources: Network Bandwidth
• Work-conserving scheduler: half the bandwidth

[Figure: clients reach the victim and the beneficiary over the network on the local Xen testbed.]

What can you do?

Recipe for a Successful RFA

Shift resource usage away from the target resource towards the bottleneck resource

Shift resource usage via public interface

[Figure: proportion of CPU usage vs. proportion of network usage, with static pages and CPU-intensive dynamic pages marked. Labels: push towards CPU bottleneck; reduce target resource usage; limits.]

An RFA in Our Example

In our testbed:

Increases beneficiary’s share of bandwidth: 50% → 85% share of bandwidth

No RFA: 1800 page requests/sec
W/ RFA: 3026 page requests/sec

[Figure: the helper sends CGI requests to the victim, raising its CPU utilization, while clients reach the beneficiary over the network.]
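Why reducing the victim's network demand hands bandwidth to the beneficiary can be illustrated with a toy model of a work-conserving resource. This is a sketch under assumptions, not Xen's scheduler: `work_conserving_shares` is a hypothetical name, the allocation rule is plain max-min fairness, and the victim demand of 0.15 is chosen only to mirror the 50% → 85% shift reported on the slide.

```python
def work_conserving_shares(demands, capacity=1.0):
    """Max-min fair split: capacity a tenant leaves unused goes to the others."""
    alloc = {}
    remaining = capacity
    pending = dict(demands)
    while pending:
        fair = remaining / len(pending)
        # Tenants whose demand fits within the current fair share are satisfied.
        fits = [name for name, demand in pending.items() if demand <= fair]
        if not fits:
            # Everyone left wants more than the fair share: split evenly.
            for name in pending:
                alloc[name] = fair
            break
        for name in fits:
            alloc[name] = pending.pop(name)
            remaining -= alloc[name]
    return alloc


# Both webservers saturate the link: each gets half.
print(work_conserving_shares({"victim": 1.0, "beneficiary": 1.0}))
# Victim is CPU-bound on CGI requests and needs less network.
print(work_conserving_shares({"victim": 0.15, "beneficiary": 1.0}))
```

A non-work-conserving scheduler would instead cap each tenant at its fixed share, so the freed bandwidth would go unused and the attack would gain nothing.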

Summary: Resource Freeing Attacks

1) Send targeted requests to victim
2) Shift resource use from target to a bottleneck

Similar RFAs exist on other resources, e.g., CPU Cache Bandwidth

On EC2, we reduced contention by 66.5%, a performance improvement of 3-13%

“Resource-Freeing Attacks: Improve Your Cloud Performance (at Your Neighbor’s Expense)”, ACM CCS 2012

What did we learn from RFAs?

1. Work-conserving vs. Non-work-conserving

2. Effects of simple pay-per-hour pricing model

• $ per unit time != $ per useful work done

[Figure: isolation vs. efficiency trade-off, placing work-conserving, non-work-conserving, and hybrid scheduling.]


Outline

Practice of multi-tenancy in public clouds demands stronger isolation between VMs in the presence of malicious users.

1. Is co-location practical in modern clouds? Placement Vulnerability [Security’15]

2. Are there unique opportunities for malicious users to exploit the lack of isolation for monetary gains? Resource-Freeing Attacks [CCS’12]

3. Can isolation be improved without compromising the efficiency? CPU Scheduler-based defenses [Security’14]

Problem: Cache-based Side-channels*

[Figure: prime-probe timeline on a core shared by attacker (A) and victim (V). The attacker primes the I-cache (cache sets × cache ways), the victim runs code that depends on the secret, and the attacker probes to obtain a timing profile and extract secret information.]

CRYPTO_FUNCTION(s):
  s ← secret bit
  if (s = 0) { OPERATION_A }
  if (s = 1) { OPERATION_B }

*Zhang, Juels, Reiter, Ristenpart, “Cross-VM Side-channels …”, CCS’12
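The prime-probe idea can be illustrated with a toy simulation. This is a deliberately simplified sketch, not the attack from Zhang et al.: the cache is modelled as one line per set, the attacker reads eviction state directly instead of inferring it from probe timings, and the mapping of `OPERATION_A` and `OPERATION_B` to sets 0 and 1 is an assumption made for the example.

```python
def prime(cache, n_sets):
    """Attacker fills every cache set with its own line."""
    for s in range(n_sets):
        cache[s] = "attacker"


def victim_step(cache, secret_bit, set_for_op_a=0, set_for_op_b=1):
    """Victim runs OPERATION_A or OPERATION_B, evicting the attacker from one set."""
    touched = set_for_op_a if secret_bit == 0 else set_for_op_b
    cache[touched] = "victim"


def probe(cache, n_sets):
    """Return the sets where the attacker's line was evicted (a slow probe)."""
    return [s for s in range(n_sets) if cache[s] != "attacker"]


def recover_bits(secret_bits, n_sets=4):
    """Run one prime / victim / probe round per secret bit."""
    cache = {}
    recovered = []
    for bit in secret_bits:
        prime(cache, n_sets)
        victim_step(cache, bit)
        evicted = probe(cache, n_sets)
        recovered.append(0 if evicted == [0] else 1)
    return recovered


print(recover_bits([1, 0, 1, 1, 0]))
```

The simulation gives the attacker one clean round per secret bit. A real attacker only gets that by preempting the victim quickly enough and timing its own probes precisely.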

Requirements for Successful Side-channel

The same prime-probe attack on CRYPTO_FUNCTION(s)* needs three things:

• Shared resource (the I-cache)
• High-precision timers
• Quick preemptions

*Zhang, Juels, Reiter, Ristenpart, “Cross-VM Side-channels …”, CCS’12

Defenses against Side-channels

1. Sharing
• Resource Partitioning [NoHype’10]
• Specialized Hardware [RPcache’07]
• Software-based partitioning [StealthMem’12]

2. Access to high-resolution timers
• Reduce resolution [TimeWarp’12]
• Removing timing channel [StopWatch’13]

3. Quick cross-VM preemptions
• No prior work!

No countermeasures deployed by providers!

Our Solution: Soft Isolation

Allow sharing but limit frequency of dangerous cross-VM interactions

Goals:
1. Secure: Controlled information leakage
2. Commodity: Easy to adopt
3. Efficient: Allow sharing, low overhead

… with simple changes to Hypervisor’s CPU scheduler

[Figure: two VMs share a core (per-core state, private caches) under the hypervisor.]

Requirement for Quick Preemptions

[Figure: timeline on a shared core where the attacker (A) primes and probes the cache (cache sets × cache ways) between victim (V) runs; the gap between attacker runs is the preemption interval.]

Rate of preemption > Rate of event to measure

CRYPTO_FUNCTION(s):
  s ← secret bit
  if (s = 0) { OPERATION_A }
  if (s = 1) { OPERATION_B }
  Next subsequent code/task execution … (or noise)


Why do schedulers allow quick preemptions?

State-of-the-art CPU schedulers serve both:
• Batch VMs (throughput-oriented): benefit from longer scheduler timeslices
• Interactive VMs (latency-oriented): benefit from quick wakeups, BOOST priority

Malicious VM (prime-probe attacker): abuses BOOST priority, using interrupts.

[Figure: timeline on a core alternating attacker (A) and victim (V) runs, with an interval of < 10µs marked.]

Soft-Isolation: Ratelimit Preemptions

[Figure: timeline on a core shared by two VMs. A boosted interrupt for the attacker (A) arrives while the victim (V) is running; the victim keeps running until the min. runtime (a scheduler parameter) has elapsed.]

Available in Xen (and KVM)
• ratelimit_us (and sched_min_granularity_ns)
• Reduces VM-switches → boosts batch-workload’s performance

Minimum RunTime (MRT) guarantee → soft-isolation
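The effect of a minimum runtime on preemption frequency can be sketched as below. This is an illustration only, not Xen's or KVM's scheduler code: `may_preempt` and `max_preemptions_per_second` are hypothetical names, and the bound assumes a single core where every preemption is separated by at least one full minimum runtime.

```python
def may_preempt(now_us: int, run_start_us: int, min_runtime_us: int) -> bool:
    """A waking VM may preempt the running VM only after its minimum runtime."""
    return now_us - run_start_us >= min_runtime_us


def max_preemptions_per_second(min_runtime_us: int) -> int:
    """Upper bound on how often one core can be taken away from the victim."""
    if min_runtime_us <= 0:
        raise ValueError("min_runtime_us must be positive")
    return 1_000_000 // min_runtime_us


print(max_preemptions_per_second(10))     # preemption every 10 microseconds
print(max_preemptions_per_second(1000))   # 1 ms minimum runtime
```

Going from 10 µs intervals to a 1 ms minimum runtime cuts the attacker's sampling opportunities on a core by a factor of 100 in this model, which is the sense in which preemptions are rate-limited rather than forbidden.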

MRT Guarantee and Open Questions

[Figure: timeline on a core where the attacker (A) is delayed between victim (V) runs; the delay is set by the MRT value.]

1. Can MRT defend against Cross-VM Side-channels? (security evaluation)

2. Trade-off between security and performance? (performance overhead)

Security Evaluation : Prime-Probe Timing Profile

[Figure: i-cache access timing profiles for an idle victim VM and for a simple victim VM under zero-MRT; the simple victim shows an alternating usage pattern.]

[Figure: the simple victim VM under zero-MRT (alternating usage pattern) and under 1ms MRT (side-channel not discernible).]

More details in the paper …

• Detailed Performance and Security Analysis
  • 20+ graphs in the paper
• Per-core State-Cleansing
  • Interactive VMs may still leak information
  • MRT + State-cleansing incur low overhead

It is cheap and easy to deploy!

“Scheduler-based Defenses against Cross-VM Side-channels”, Usenix Security’14
https://bitbucket.org/vvaradarajan/robsched


Conclusion

Practice of multi-tenancy in public clouds demands stronger isolation between VMs in the presence of malicious users.

1. Is co-location practical in modern clouds? Yes, and surprisingly cheap! Placement Vulnerability [Security’15]

2. Exploiting the lack of isolation for performance gains? Yes, reduce contention by gaming schedulers. Resource-Freeing Attacks [CCS’12]

3. Improving isolation w/o compromising efficiency? Soft-Isolation. CPU Scheduler-based defenses [Security’14]

Thanks to …


References

(1) Varadarajan, V., Kooburat, T., Farley, B., Ristenpart, T., & Swift, M. M. Resource-freeing attacks: improve your cloud performance (at your neighbor's expense). ACM CCS 2012

(2) Varadarajan, V., Ristenpart, T., & Swift, M. Scheduler-based defenses against cross-vm side-channels. In Usenix Security 2014.

(3) Varadarajan, V., Zhang, Y., Ristenpart, T., & Swift, M. A placement vulnerability study in multi-tenant public clouds. In Usenix Security 2015.

(4) Farley, B., Juels, A., Varadarajan, V., Ristenpart, T., Bowers, K. D., & Swift, M. M. More for your money: exploiting performance heterogeneity in public clouds. ACM SoCC 2012.

(5) Volos, H., Nalli, S., Panneerselvam, S., Varadarajan, V., Saxena, P., & Swift, M. M. Aerie: Flexible file-system interfaces to storage-class memory. ACM Eurosys 2014.

Backup Slides


Results: Co-residency Detection in Public Clouds

[Figure: histograms of the number of VM pairs (0 to 20) vs. performance degradation (0.8 to 5.6) for non-coresident and co-resident pairs, one panel for Amazon EC2 (EC2) and one for Google Compute Engine (GCE); one bar is annotated 167.]

Co-residency Detection in Public Clouds (contd.)

• Heterogeneity at play on Azure AMD machines

• Requiring two receivers to detect co-residency

• Memory-probing or memory-locking: > 1.5x degradation

[Figure: histogram of the number of VM pairs (0 to 18) vs. performance degradation (0 to 2) for mprobe-NC, mlock-NC, mprobe-C, and mlock-C.]


Limitations

1. Although an exhaustive study, results are limited to the small instance type, three clouds, 9 runs per configuration, two user accounts, etc.

2. Positive co-residency signal != exploitable: VMs may share only a small set of resources (e.g., memory, network)

3. A result of unlucky placement policy settings?

“A Placement Vulnerability Study in Multi-tenant Public Clouds”, Usenix Security 2015

Challenge 1: Placement Policy

• An unknown placement policy (first fit? best fit? …) places VM on a host

• Variables may influence VM placement
  • Parameters: type of VM, time of launch, # VMs launched, datacenter region, etc.
  • Environment variables: time of day, day of week, # machines, etc.

Challenge 2: Co-residency Detection

• No explicit co-residency information from cloud provider

• Multiple layers of strong isolation
  • Virtual machine, e.g., hardware virtualization
  • Network, e.g., Amazon VPC
  • Modern hardware, etc.

[Figure: VMs placed behind the cloud API by the placement policy; detecting co-residency among them.]

Some Strategies Work Better than Others

Example strategies on EC2

Launch Strategy | v x a | Cost in Cloud | Cost under Random Placement* | Success rate norm. w/ random*
Launch 10 VMs in less popular datacenter | 10x10 | $0.26 | $113.87 | 1/0.1 (=10)
Launch 30 VMs 1 hour after victim VM launches | 30x30 | $1.56 | $32.75 | 1/0.6 (=1.67)
Launch more than 20 VMs 4 hours after victim VM launches | 20x20 | $0.52 | $53.76 | 1/0.33 (=3.03)

*Random placement of VMs on N hosts: a v x a launch strategy has a probability of collision $1 - (1 - v/N)^a$

Resource Freeing Attacks

1) Send targeted requests to victim
2) Shift resource use from target to a bottleneck

Can we mount RFAs when target resource is CPU cache?

Shared CPU Cache:
• Ubiquitous: Almost all workloads need cache
• Hardware controlled: Not easily isolated via software
• Performance sensitive: High performance cost!

Case Study: Cache vs. Network

• Victim: Apache webserver hosting static and dynamic (CGI) web pages

• Beneficiary: Synthetic cache bound workload (LLCProbe)

• Target Resource: Cache

• No cache isolation: ~3x slower when sharing cache with webserver

[Figure: on the local Xen testbed, victim and beneficiary run on cores that share a cache; clients reach the victim over the network.]


Cache vs. Network

Victim webserver frequently interrupts, pollutes the cache
– Reason: Xen gives higher priority to VM consuming less CPU time

[Figure: cache state timeline with a heavily loaded web server. Labels: webserver receives a request; beneficiary starts to run; decreased cache efficiency.]

Cache vs. Network w/ RFA

RFA helps in two ways:
1. Webserver loses its priority.
2. Reducing the capacity of webserver.

[Figure: cache state timeline under RFA. The helper sends CGI requests to the heavily loaded webserver; labels mark when the webserver receives a request and when the beneficiary starts to run.]

RFA: Performance Improvement

[Figure: beneficiary performance vs. RFA intensities (time in ms per second). Annotations: 196% slowdown; 86% slowdown.]

60% performance improvement
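One reading that is consistent with the three annotations is sketched below. It assumes that a slowdown of x% means the beneficiary's runtime is (1 + x/100) times its runtime when running alone, that 196% is the slowdown without the RFA and 86% the slowdown with it, and that the improvement is the ratio of the two runtimes. These assumptions and the function name are not stated on the slide.

```python
def improvement_from_slowdowns(slowdown_before_pct: float,
                               slowdown_after_pct: float) -> float:
    """Relative speedup when the slowdown drops from `before` to `after`."""
    runtime_before = 1.0 + slowdown_before_pct / 100.0   # e.g. 2.96x baseline
    runtime_after = 1.0 + slowdown_after_pct / 100.0     # e.g. 1.86x baseline
    return runtime_before / runtime_after - 1.0


print(round(improvement_from_slowdowns(196, 86), 2))
```

Under these assumptions the result is about 0.59, in line with the roughly 60% improvement reported.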

Experiments on Amazon EC2

Instance type: m1.small
# of co-resident pairs: 9 (23 total instances)
Machine type: Intel Xeon E5507 with 4MB LLC

• Multiple accounts
• Co-resident VMs from our accounts: stand-ins for victim and beneficiary
• Separate instances for helper and web clients
• No direct interaction with any other customers; indirect interaction akin to normal usage cases

mcf from SPEC-CPU

[Figure annotations: 10% slowdown; 6% slowdown.]

3% performance improvement = 35% reduction in performance loss

On average RFA improved performance across all SPEC workloads!

Experimental Methodology

Two VMs: 1. Attacker 2. Victim

Machine Configuration
Machine: Intel Xeon E5645, 2.4GHz, 6 cores, single package
Memory Hierarchy: Private 32KB L1 (I- and D-Cache), 256KB unified L2, 12MB shared L3 & 16GB DDR3 RAM.

Xen Configuration
Xen Version: 4.2.1
Scheduler: Credit Scheduler 1
Configuration: (Non-work conserving) 40% cap on DomU VCPUs with equal weight
# VMs: 6
# VCPUs per VM: 2

Setting similar to public clouds (e.g. EC2)



Measuring Performance Overhead

[Figure: VCPUs of a workload mix scheduled by the hypervisor onto four cores.]

Measured workload:
1. Interactive → memcached, cassandra, etc., and
2. Batch → graph500, specJBB, etc.

Competing workloads: microbenchmarks → highly cache-thrashing + (interactive or batch)

Measuring Performance Overhead

[Figure: overhead normalized to zero-MRT (axis 0.8 to 2) for ping, memcached, cassandra, mcf, specjbb, and graph500. Series: avg. 95th percentile latency (interactive workloads) and avg. runtime (batch workloads).]

At 5ms MRT: < 7% overhead