Florin Dinu T. S. Eugene Ng Rice University

26
Florin Dinu T. S. Eugene Ng Rice University Synergy2Cloud: Introducing Cross-Sharing of Application Experiences Into the Cloud Management Cycle

description

Synergy2Cloud: Introducing Cross- Sharing of Application Experiences Into the Cloud Management Cycle. Florin Dinu T. S. Eugene Ng Rice University. Multi-Tenant Clouds . Resource contention Performance variation Failures. App1. App2. - PowerPoint PPT Presentation

Transcript of Florin Dinu T. S. Eugene Ng Rice University

Page 1: Florin  Dinu            T. S. Eugene Ng Rice University

Florin Dinu T. S. Eugene NgRice University

Synergy2Cloud: Introducing Cross-Sharing of Application Experiences

Into the Cloud Management Cycle

Page 2: Florin  Dinu            T. S. Eugene Ng Rice University

2

Multi-Tenant Clouds

• Resource contention

• Performance variation

• Failures

Challenging to ensure good application performance

App1 App2

Page 3: Florin  Dinu            T. S. Eugene Ng Rice University

3

Our Position: Cross-Sharing Experiences

Red would benefit if Green shared info about the failure

?

!

Page 4: Florin  Dinu            T. S. Eugene Ng Rice University

4

Ensuring Performance with Good DecisionsGood provisioning decisions

Need information to guide provisioning decisions

App1: How to best scale next?

Page 5: Florin  Dinu            T. S. Eugene Ng Rice University

5

Ensuring Performance with Good Decisions

Detect node failure & restart tasks

Need information to guide runtime decisions

Good runtime decisions

Page 6: Florin  Dinu            T. S. Eugene Ng Rice University

6

• Generating an efficient execution plan [NSDI 12]

• Scheduling a known execution plan [SOSP 09]

• Runtime decisions

• Stragglers [OSDI 08, OSDI 10]

• Managing traffic [SIGCOMM 2011]

• Failures [HPDC 12]

All these decisions can benefit from extra information

Ensuring Performance with Good Decisions

Page 7: Florin  Dinu            T. S. Eugene Ng Rice University

7

Obtaining Information Today: Operator

App 1App 2

Operator monitoring can be useful but has important shortcomings

Careful on resources

Application agnostic data

Page 8: Florin  Dinu            T. S. Eugene Ng Rice University

8

Obtaining Information Today: Apps

Application monitoring also has several limitations

Limited view of environment

Limited to owned resources

Limited by scale of app

Page 9: Florin  Dinu            T. S. Eugene Ng Rice University

9

Commonality among cloud apps

HARDWARE

OS

LARGE SCALE FWK

Virtualized. Few instance types available

Few OS image typesavailable

e.g. MapReduce. Many apps built on top

Unique opportunity in the cloud

Page 10: Florin  Dinu            T. S. Eugene Ng Rice University

10

Cross-Sharing Application ExperiencesSharing

Benefiting from information shared by others

I need to scale out

fast

Page 11: Florin  Dinu            T. S. Eugene Ng Rice University

11

Today: Rudimentary Sharing

Too slow. Requires human involvement. Needs to be automatic.

Page 12: Florin  Dinu            T. S. Eugene Ng Rice University

12

Hurdle to Overcome: Isolation

Isolation impedes sharing and measurement

Page 13: Florin  Dinu            T. S. Eugene Ng Rice University

13

But Isolation is Important

• More predictable performance• Improved security

• Today, we are striving to isolate at all levels– Network [NSDI 11, CONEXT 10, Xen]– Storage [VMWare]– CPU [Xen]– Caches [CCSW 09]

Is complete isolation the way to go?

Page 14: Florin  Dinu            T. S. Eugene Ng Rice University

14

Cross-Sharing Application Experiences

Examples Incentives Challenges

Page 15: Florin  Dinu            T. S. Eugene Ng Rice University

15

Cross-Sharing Example: Performance

Shared performance information may help scheduling

Dedicated Storage

Different cloud

Latency

Thro

ughp

ut

Page 16: Florin  Dinu            T. S. Eugene Ng Rice University

16

Cross-Sharing Example: Scalability

Sharing can inform scaling decisions

• Red needs to scale fast

• No time for testing

• Which VM type to use?

• Information from green may be useful?

Page 17: Florin  Dinu            T. S. Eugene Ng Rice University

17

Cross-Sharing Example: Failures

• Detect compute node failures

• Green deployed at larger scale

• Green detects failure faster

• Red has few vantage points

• Red can benefit from green’s experience

Some applications are better suited to discover failures

?

!

Page 18: Florin  Dinu            T. S. Eugene Ng Rice University

18

Incentives for the Operator

• 1. App performance = cloud performance• 2. Apps = huge monitoring platform• 3. $$$• 4. If you can’t beat them, join them

Use

3rd partyShare

Page 19: Florin  Dinu            T. S. Eugene Ng Rice University

Incentives for the Applications

• 1. More info => better decisions • 2. Obtain info quickly – no trial and error needed• 3. Info about resources that they don’t use• 4. Some apps better suited than others

!?

Page 20: Florin  Dinu            T. S. Eugene Ng Rice University

20

Addressing Challenges

• Increasing information expressiveness

• Verifying information authenticity

• Establishing similarity between applications

Synergy between applications and operator

Page 21: Florin  Dinu            T. S. Eugene Ng Rice University

21

Challenge – Increasing Expressiveness

Another

Data Center

Limited infrastructure knowledge

Path is congested

Page 22: Florin  Dinu            T. S. Eugene Ng Rice University

22

Challenge – Increasing Expressiveness

Operator should provide abstract location information

Effects of isolation – Info meaningul in context of red only

100% CPU usage

Empty slots

What does 100% CPU

mean for me?

Page 23: Florin  Dinu            T. S. Eugene Ng Rice University

23

Challenges - AuthenticityStorage

I have 20 instances connecting

simultaneously to storage and my throughput is….

• Verify where possible (ownership)

• Cull outliers with statistics

• Filter incoming sources of shared data

Fake Performance Info vs Bad Performance Info

Page 24: Florin  Dinu            T. S. Eugene Ng Rice University

24

Challenges – Establishing Similarity

HARDWARE

OS

LARGE SCALE FWK

ALLOCATION

50 vs 3 VMs

Impact runtime

Use delayed ACK?

Configuration, code tweaks

• Difficult in the general case

• Not all layers impact every shareable information (compute failures)

• Detect paths in the layers that impact the shared information

• Encode similarity between paths (e.g. hash codes)

Small changes in the code

Page 25: Florin  Dinu            T. S. Eugene Ng Rice University

25

Synergy2Cloud.com

Our first step for making cross-sharing a reality

Page 26: Florin  Dinu            T. S. Eugene Ng Rice University

26

Conclusion

• Our position: cross-sharing experiences a promising direction

• Win-win for both applications and operator

• Not trivial - several interesting challenges ahead