CS5950 / CS6030 Cloud Computing, Big Data

Cloud Computing, 9/10/19
Ajay Gupta
B239, CEAS
Computer Science Department
Western Michigan University
[email protected]
276-3104
WiSe Lab @ WMU, www.cs.wmich.edu/wise

Acknowledgements
• I have liberally borrowed these slides and material from a number of sources including
– Web, AWS Educate
– MIT, Harvard, UMD, UPenn, UCSD, UW, Clarkson, . . .
– Amazon, Google, IBM, Apache, ManjraSoft, CloudBook, . . .
• Thanks to original authors including Ives, Dyer, Lin, Dean, Buyya, Ghemawat, Fanelli, Bisciglia, Kimball, Michels-Slettvet, …
• If I have missed any, it's purely unintentional. My sincere appreciation to those authors and their creative minds.

What is Cloud Computing…


Plan for today
• AWS starter
– EC2, VPC, SecurityGroups, Storage
• Computing at scale
– The need for scalability; scale of current services
– Scaling up: From PCs to data centers
– Problems with 'classical' scaling techniques
• Utility computing and cloud computing
– What are utility computing and cloud computing?
– What kinds of clouds exist today?
– What kinds of applications run on the cloud?
– Virtualization: How clouds work 'under the hood'
– Some cloud computing challenges

An Example AWS Compute & Network Architecture


How many users and objects?
• Flickr has >6 billion photos
• Facebook has 1.15 billion active users
• Google is serving >1.2 billion queries/day on more than 27 billion items
• >2 billion videos/day watched on YouTube

How much data?
• Modern applications use massive data:
– Rendering the movie 'Avatar' required >1 petabyte of storage
– eBay has >6.5 petabytes of user data
– CERN's LHC will produce about 15 petabytes of data per year
– Since 2008, Google has processed 20 petabytes per day
– German Climate Computing Center dimensioned for 60 petabytes of climate data
– Google now designing for 1 exabyte of storage
– NSA Utah Data Center is said to have 5 zettabytes (!)
• How much is a zettabyte?
– 1,000,000,000,000,000,000,000 bytes
– A stack of 1TB hard disks that is 25,400 km high
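The zettabyte figure can be sanity-checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes decimal units (1 TB = 10^12 bytes), and the function names are hypothetical. It computes how many 1 TB disks a zettabyte needs and what disk thickness the quoted 25,400 km stack would imply.

```python
# Sanity check of the "stack of 1 TB disks" comparison.
ZETTABYTE = 10**21        # bytes, as written out on the slide
TERABYTE = 10**12         # bytes per disk (assumption: decimal terabyte)
STACK_HEIGHT_KM = 25_400  # stack height quoted on the slide


def disks_needed(total_bytes: int, disk_bytes: int = TERABYTE) -> int:
    """Number of disks needed to hold total_bytes, rounded up."""
    return -(-total_bytes // disk_bytes)


def implied_disk_thickness_mm(total_bytes: int, stack_km: float) -> float:
    """Per-disk thickness (mm) that makes the stack stack_km high."""
    millimetres = stack_km * 1_000_000  # 1 km = 1,000,000 mm
    return millimetres / disks_needed(total_bytes)


if __name__ == "__main__":
    print(disks_needed(ZETTABYTE))
    print(implied_disk_thickness_mm(ZETTABYTE, STACK_HEIGHT_KM))
```

Under these assumptions a zettabyte corresponds to a billion disks, and the quoted height works out to roughly one inch per disk, which is plausible for a 3.5" drive.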

How much computation?
• No single computer can process that much data
– Need many computers!
• How many computers do modern services need?
– Facebook is thought to have more than 60,000 servers
– 1&1 Internet has over 70,000 servers
– Akamai has 95,000 servers in 71 countries
– Intel has ~100,000 servers in 97 data centers
– Microsoft reportedly had at least 200,000 servers in 2008
– Google is thought to have more than 1 million servers, is planning for 10 million (according to Jeff Dean)

Why should I care?
• Suppose you want to build the next Google
• How do you...
– ... download and store billions of web pages and images?
– ... quickly find the pages that contain a given set of terms?
– ... find the pages that are most relevant to a given search?
– ... answer 1.2 billion queries of this type every day?
• Suppose you want to build the next Facebook
• How do you...
– ... store the profiles of over 500 million users?
– ... avoid losing any of them?
– ... find out which users might want to be friends?
• Stay tuned!


Scaling up
• What if one computer is not enough?
– Buy a bigger (server-class) computer
• What if the biggest computer is not enough?
– Buy many computers
[Figure: PC → Server → Cluster]

Clusters
• Characteristics of a cluster:
– Many similar machines, close interconnection (same room?)
– Often special, standardized hardware (racks, blades)
– Usually owned and used by a single organization
[Figure: a rack, with labels: many nodes/blades (often identical); network switch (connects nodes with each other and with other racks); storage device(s)]

Power and cooling
• Clusters need lots of power
– Example: 140 Watts per server
– Rack with 32 servers: 4.5kW (needs special power supply!)
– Most of this power is converted into heat
• Large clusters need massive cooling
– 4.5kW is about 3 space heaters
– And that's just one rack!
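The rack figures can be reproduced directly. This is a minimal sketch: the 140 W and 32-server values come from the slide, while the 1,500 W space-heater rating and the function names are assumptions made for illustration.

```python
WATTS_PER_SERVER = 140     # from the slide
SERVERS_PER_RACK = 32      # from the slide
SPACE_HEATER_WATTS = 1500  # assumption: typical household space heater


def rack_power_watts(servers: int = SERVERS_PER_RACK,
                     watts_per_server: float = WATTS_PER_SERVER) -> float:
    """Total electrical power drawn by one rack, in watts."""
    return servers * watts_per_server


def heater_equivalents(power_watts: float,
                       heater_watts: float = SPACE_HEATER_WATTS) -> float:
    """How many space heaters dissipate the same amount of heat."""
    return power_watts / heater_watts


if __name__ == "__main__":
    power = rack_power_watts()
    print(f"{power / 1000:.2f} kW, about {heater_equivalents(power):.1f} heaters")
```

With these inputs one rack draws 4.48 kW, which rounds to the 4.5kW and "about 3 space heaters" quoted above.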

Scaling up
• What if your cluster is too big (hot, power hungry) to fit into your office building?
– Build a separate building for the cluster
– Building can have lots of cooling and power
– Result: Data center
[Figure: PC → Server → Cluster → Data center]

What does a data center look like?
• A warehouse-sized computer
– A single data center can easily contain 10,000 racks with 100 cores in each rack (1,000,000 cores total)
[Figure: Google data center in The Dalles, Oregon; labels: data centers (size of a football field), cooling plant]

What's in a data center?
• Hundreds or thousands of racks
• Massive networking
• Emergency power supplies
• Massive cooling
[Photos; source: 1&1]

Energy matters!
• Data centers consume a lot of energy
– Makes sense to build them near sources of cheap electricity
– Example: Price per kWh is 3.6ct in Idaho (near hydroelectric power), 10ct in California (long distance transmission), 18ct in Hawaii (must ship fuel)
– Most of this is converted into heat → Cooling is a big issue!

Company      Servers   Electricity       Cost
eBay         16K       ~0.6×10^5 MWh     ~$3.7M/yr
Akamai       40K       ~1.7×10^5 MWh     ~$10M/yr
Rackspace    50K       ~2×10^5 MWh       ~$12M/yr
Microsoft    >200K     >6×10^5 MWh       >$36M/yr
Google       >500K     >6.3×10^5 MWh     >$38M/yr
USA (2006)   10.9M     610×10^5 MWh      $4.5B/yr
Source: Qureshi et al., SIGCOMM 2009
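One way to read the electricity table is to divide each cost by the corresponding energy figure, which gives the electricity price the estimate implies. The sketch below does that with the table's rounded values; the dictionary layout and function name are hypothetical, and the "~" and ">" qualifiers are ignored.

```python
# (energy in MWh per year, cost in USD per year), rounded values from the table
ELECTRICITY = {
    "eBay": (0.6e5, 3.7e6),
    "Akamai": (1.7e5, 10e6),
    "Rackspace": (2e5, 12e6),
    "Microsoft": (6e5, 36e6),
    "Google": (6.3e5, 38e6),
    "USA (2006)": (610e5, 4.5e9),
}


def implied_price_per_mwh(energy_mwh: float, cost_usd: float) -> float:
    """Average price in USD per MWh implied by a cost/energy pair."""
    return cost_usd / energy_mwh


if __name__ == "__main__":
    for company, (energy, cost) in ELECTRICITY.items():
        price = implied_price_per_mwh(energy, cost)
        # 1 USD/MWh equals 0.1 ct/kWh
        print(f"{company:12s} {price:6.1f} USD/MWh = {price / 10:.1f} ct/kWh")
```

The company rows all come out near 60 USD/MWh (about 6 ct/kWh), which sits between the Idaho and California prices quoted above; the national row implies a somewhat higher average.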

Scaling up
• What if even a data center is not big enough?
– Build additional data centers
– Where? How many?
[Figure: PC → Server → Cluster → Data center → Network of data centers]

Global distribution
• Data centers are often globally distributed
– Example above: Google data center locations (inferred)
• Why?
– Need to be close to users (physics!)
– Cheaper resources
– Protection against failures

Trend: Modular data center
• Need more capacity? Just deploy another container!


Problem #1: Difficult to dimension
• Problem: Load can vary considerably
– Peak load can exceed average load by factor 2x-10x [Why?]
– But: Few users deliberately provision for less than the peak
– Result: Server utilization in existing data centers ~5%-20%!!
– Dilemma: Waste resources or lose customers!
[Figure: two load curves. Provisioning for the peak load: capacity is 2x-10x the average. Provisioning below the peak: jobs cannot be completed, dissatisfied customers leave.]
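The link between the peak-to-average ratio and utilization can be made explicit. A minimal sketch, assuming capacity is provisioned exactly at the peak so that average utilization is simply average load divided by capacity (the function name is hypothetical):

```python
def utilization_when_provisioned_for_peak(peak_to_average: float) -> float:
    """Average utilization if capacity equals peak load.

    utilization = average / capacity = average / peak = 1 / ratio
    """
    if peak_to_average < 1:
        raise ValueError("peak load cannot be below average load")
    return 1.0 / peak_to_average


if __name__ == "__main__":
    for ratio in (2, 5, 10):
        share = utilization_when_provisioned_for_peak(ratio)
        print(f"peak = {ratio}x average: {share:.0%} utilized")
```

A 2x-10x ratio alone gives 50% down to 10%. The 5%-20% range reported above is lower at the top end, which would be consistent with operators adding headroom beyond the expected peak, though that reading is an interpretation rather than something the slide states.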

Problem #2: Expensive
• Need to invest many $$$ in hardware
– Even a small cluster can easily cost $100,000
– Microsoft recently invested $499 million in a single data center
• Need expertise
– Planning and setting up a large cluster is highly nontrivial
– Cluster may require special software, etc.
• Need maintenance
– Someone needs to replace faulty hardware, install software upgrades, maintain user accounts, ...

Problem #3: Difficult to scale
• Scaling up is difficult
– Need to order new machines, install them, integrate with existing cluster - can take weeks
– Large scaling factors may require major redesign, e.g., new storage system, new interconnect, new building (!)
• Scaling down is difficult
– What to do with superfluous hardware?
– Server idle power is about 60% of peak → Energy is consumed even when no work is being done
– Many fixed costs, such as construction
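To see why the 60% idle figure matters, it helps to estimate power draw at low utilization. The sketch below assumes a linear power model between idle and peak; that model and the function name are assumptions for illustration, not something the slide specifies.

```python
IDLE_FRACTION = 0.6  # idle power as a fraction of peak power (from the slide)


def power_fraction(utilization: float,
                   idle_fraction: float = IDLE_FRACTION) -> float:
    """Power draw as a fraction of peak, assuming a linear model:
    P(u) = P_idle + (P_peak - P_idle) * u, with u in [0, 1]."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return idle_fraction + (1.0 - idle_fraction) * utilization


if __name__ == "__main__":
    for u in (0.0, 0.1, 0.5, 1.0):
        print(f"utilization {u:.0%}: {power_fraction(u):.0%} of peak power")
```

Under this model a server running at 10% utilization still draws about 64% of its peak power, so scaling the load down saves far less energy than it saves work.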

Recap: Computing at scale
• Modern applications require huge amounts of processing and data
– Measured in petabytes, millions of users, billions of objects
– Need special hardware, algorithms, tools to work at this scale
• Clusters and data centers can provide the resources we need
– Main difference: Scale (room-sized vs. building-sized)
– Special hardware; power and cooling are big concerns
• Clusters and data centers are not perfect
– Difficult to dimension; expensive; difficult to scale

The power plant analogy
• It used to be that everyone had their own power source
– Challenges are similar to the cluster: Needs large up-front investment, expertise to operate, difficult to scale up/down...
[Figures: Steam engine at Stott Park Bobbin Mill; Waterwheel at the Neuhausen ob Eck Open-Air Museum]

Scaling the power plant
• Then people started to build large, centralized power plants with very large capacity...

Metered usage model
• Power plants are connected to customers by a network
• Usage is metered, and everyone (basically) pays only for what they actually use
[Figure: Power source → Network → Metering device → Customer]

Why is this a good thing?
• Economies of scale
– Electricity: Cheaper to run one big power plant than many small ones
– Computing: Cheaper to run one big data center than many small ones
• Statistical multiplexing
– Electricity: High utilization!
– Computing: High utilization!
• No up-front commitment
– Electricity: No investment in generator; pay-as-you-go model
– Computing: No investment in data center; pay-as-you-go model
• Scalability
– Electricity: Thousands of kilowatts available on demand; add more within seconds
– Computing: Thousands of computers available on demand; add more within seconds
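Statistical multiplexing is the least self-explanatory item in this comparison, so here is a small illustration. The load traces are invented for the example (three hypothetical customers whose peaks fall at different times); the point is only that a shared provider must cover the peak of the combined load, not the sum of the individual peaks.

```python
# Hypothetical load per time step (in servers) for three customers.
LOADS = {
    "A": [10, 2, 2, 2],
    "B": [2, 10, 2, 2],
    "C": [2, 2, 10, 2],
}


def sum_of_peaks(loads: dict[str, list[int]]) -> int:
    """Capacity needed if every customer provisions for their own peak."""
    return sum(max(trace) for trace in loads.values())


def peak_of_sum(loads: dict[str, list[int]]) -> int:
    """Capacity needed if one provider serves the combined load."""
    return max(sum(step) for step in zip(*loads.values()))


if __name__ == "__main__":
    print("separate clusters:", sum_of_peaks(LOADS), "servers")
    print("shared cloud:     ", peak_of_sum(LOADS), "servers")
```

With these made-up traces the shared provider needs 14 servers instead of 30, which is where the higher utilization comes from. The benefit shrinks if all customers peak at the same time.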

What is cloud computing?
[Cartoon: http://www.dilbert.com/fast/2013-06-29/]

What is cloud computing?

The interesting thing about Cloud Computing is that we've redefined Cloud Computing to include everything that we already do.... I don't understand what we would do differently in the light of Cloud Computing other than change the wording of some of our ads.
Larry Ellison, quoted in the Wall Street Journal, September 26, 2008

A lot of people are jumping on the [cloud] bandwagon, but I have not heard two people say the same thing about it. There are multiple definitions out there of "the cloud".
Andy Isherwood, quoted in ZDnet News, December 11, 2008

So what is it, really?
• According to NIST:
Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
• Essential characteristics:
– On-demand self service
– Broad network access
– Resource pooling
– Rapid elasticity
– Measured service

Other terms you may have heard
• Utility computing
– The service being sold by a cloud
– Focuses on the business model (pay-as-you-go), similar to classical utility companies
• The Web
– The Internet's information sharing model
– Some web services run on clouds, but not all
• The Internet
– A network of networks.
– Used by the web; connects (most) clouds to their customers


Everything as a Service
• What kind of service does the cloud provide?
– Does it offer an entire application, or just resources?
– If resources, what kind / level of abstraction?
• Three types commonly distinguished:
– Software as a service (SaaS)
  • Analogy: Restaurant. Prepares and serves entire meal, does the dishes, ...
– Platform as a service (PaaS)
  • Analogy: Take-out food. Prepares meal, but does not serve it.
– Infrastructure as a service (IaaS)
  • Analogy: Grocery store. Provides raw ingredients.
– Other XaaS types have been defined, but are less common
  • Desktop, Backend, Communication, Network, Monitoring, ...

Software as a Service (SaaS)
• Cloud provides an entire application
– Word processor, spreadsheet, CRM software, calendar...
– Customer pays cloud provider
– Example: Google Apps, Salesforce.com
[Figure: Cloud provider, User; layers: Hardware, Middleware, Application]

Platform as a Service (PaaS)
• Cloud provides middleware/infrastructure
– For example, Microsoft Common Language Runtime (CLR)
– Customer pays SaaS provider for the service; SaaS provider pays the cloud for the infrastructure
– Example: Windows Azure, Google App Engine
[Figure: Cloud provider, SaaS provider, User; layers: Hardware, Middleware, Application]

Infrastructure as a Service (IaaS)
• Cloud provides raw computing resources
– Virtual machine, blade server, hard disk, ...
– Customer pays SaaS provider for the service; SaaS provider pays the cloud for the resources
– Examples: Amazon Web Services, Rackspace Cloud, GoGrid
[Figure: Cloud provider, SaaS provider, User; layers: Hardware, Middleware, Application]

Private/hybrid/community clouds
• Who can become a customer of the cloud?
– Public cloud: Commercial service; open to (almost) anyone. Example: Amazon AWS, Microsoft Azure, Google App Engine
– Community cloud: Shared by several similar organizations. Example: Google's "Gov Cloud"
– Private cloud: Shared within a single organization. Example: Internal datacenter of a large company.
[Figure: Public, Community (Company A, Company B), Private; annotations: "Focus of this class", "Is this a 'real' cloud?"]


Examples of cloud applications
• Application hosting
• Backup and Storage
• Content delivery
• E-commerce
• High-performance computing
• Media hosting
• On-demand workforce
• Search engines
• Web hosting

Case study: Animoto
• Animoto: Lets users create videos from their own photos/music
– Auto-edits photos and aligns them with the music, so it "looks good"
• Built using Amazon EC2+S3+SQS
• Released a Facebook app in mid-April 2008
– More than 750,000 people signed up within 3 days
– EC2 usage went from 50 machines to 3,500 (x70 scalability!)
Source: Jeff Bezos' talk at Stanford on 4/19/08
http://animoto.com/blog/company/amazon-com-ceo-jeff-bezos-on-animoto/

Case study: The Washington Post
• March 19, 2008: Hillary Clinton's official White House schedule released to the public
– 17,481 pages of non-searchable, low-quality PDF
– Very interesting to journalists, but would have required hundreds of man-hours to evaluate
– Peter Harkins, Senior Engineer at The Washington Post: Can we make that data available more quickly, ideally within the same news cycle?
– Tested various Optical Character Recognition (OCR) programs; estimated required speed
– Launched 200 EC2 instances; project was completed within nine hours (!) using 1,407 hours of VM time ($144.62)
– Results available on the web only 26 hours after the release
http://aws.amazon.com/solutions/case-studies/washington-post/
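The numbers in this case study can be cross-checked against each other. The sketch below uses only the three figures quoted above (200 instances, 1,407 VM hours, $144.62); the function names are hypothetical, and the per-hour rate it derives is an implied average, not a published price.

```python
INSTANCES = 200          # EC2 instances launched
VM_HOURS = 1407          # total VM time used, in hours
TOTAL_COST_USD = 144.62  # total bill


def cost_per_vm_hour(total_cost: float, vm_hours: float) -> float:
    """Average price paid per VM hour."""
    return total_cost / vm_hours


def mean_hours_per_instance(vm_hours: float, instances: int) -> float:
    """Average running time of each instance."""
    return vm_hours / instances


if __name__ == "__main__":
    print(f"{cost_per_vm_hour(TOTAL_COST_USD, VM_HOURS):.4f} USD per VM hour")
    print(f"{mean_hours_per_instance(VM_HOURS, INSTANCES):.2f} hours per instance")
```

The result is roughly ten cents per VM hour and about seven hours per instance, which fits the statement that the whole job finished within nine hours.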

Other examples
• DreamWorks is using the Cerelink cloud to render animation movies
– Cloud was already used to render parts of Shrek Forever After and How to Train Your Dragon
• CERN is working on a "science cloud" to process experimental data
• Virgin Atlantic is hosting their new travel portal on Amazon AWS

Recap: Utility/cloud computing
• Why is cloud computing attractive?
– Analogy to 'classical' utilities (electricity, water, ...)
– No up-front investment (pay-as-you-go model)
– Low price due to economies of scale
– Elasticity - can quickly scale up/down as demand varies
• Different types of clouds
– SaaS, PaaS, IaaS; public/private/community clouds
• What runs on the cloud?
– Many potential applications: Application hosting, backup/storage, scientific computing, content delivery, ...
– Not yet suitable for certain applications (sensitive data, compliance requirements)

Is the cloud good for everything?
• No.
• Sometimes it is problematic, e.g., because of auditability requirements
• Example: Processing medical records
– HIPAA (Health Insurance Portability and Accountability Act) privacy and security rule
• Example: Processing financial information
– Sarbanes-Oxley Act
• Would you put your medical data on the cloud?
– Why / why not?

Recap: Cloud applications
• Clouds are good for many things...
– Applications that involve large amounts of computation, storage, bandwidth
– Especially when lots of resources are needed quickly (Washington Post example) or load varies rapidly (TicketLeap example)
• ... but not for all things


What is virtualization?
• Suppose Alice has a machine with 4 CPUs and 8 GB of memory, and three customers:
– Bob wants a machine with 1 CPU and 3GB of memory
– Charlie wants 2 CPUs and 1GB of memory
– Daniel wants 1 CPU and 4GB of memory
• What should Alice do?
[Figure: Alice's physical machine; customers Bob, Charlie, Daniel]
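Before answering, it is worth checking that the three requests fit on Alice's machine at all. A minimal sketch, using the numbers from the slide; the `Request` type, the `fits` helper, and the extra fourth request are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    name: str
    cpus: int
    memory_gb: int


HOST_CPUS = 4       # Alice's machine
HOST_MEMORY_GB = 8

REQUESTS = [
    Request("Bob", 1, 3),
    Request("Charlie", 2, 1),
    Request("Daniel", 1, 4),
]


def fits(requests: list[Request], host_cpus: int, host_memory_gb: int) -> bool:
    """True if all requests can be served without sharing any resource."""
    total_cpus = sum(r.cpus for r in requests)
    total_memory = sum(r.memory_gb for r in requests)
    return total_cpus <= host_cpus and total_memory <= host_memory_gb


if __name__ == "__main__":
    print(fits(REQUESTS, HOST_CPUS, HOST_MEMORY_GB))
    # A hypothetical fourth customer no longer fits without sharing.
    print(fits(REQUESTS + [Request("Extra", 1, 1)], HOST_CPUS, HOST_MEMORY_GB))
```

The three requests add up to exactly 4 CPUs and 8 GB, so one physical machine can serve all of them, provided there is a way to carve it into isolated pieces.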

What is virtualization?
• Alice can sell each customer a virtual machine (VM) with the requested resources
– From each customer's perspective, it appears as if they had a physical machine all by themselves (isolation)
[Figure: Alice's physical machine runs a virtual machine monitor; virtual machines for Bob, Charlie, Daniel]

How does it work?
• Resources (CPU, memory, ...) are virtualized
– VMM ("Hypervisor") has translation tables that map requests for virtual resources to physical resources
– Example: VM 1 accesses memory cell #323; VMM maps this to memory cell 123.
– For which resources does this (not) work?
– How do VMMs differ from OS kernels?
[Figure: Physical machine running a VMM; VM 1 (OS 1, App) and VM 2 (OS 2, App, App)]

Translation table
VM   Virtual    Physical
1    0-99       0-99
1    300-399    100-199
2    0-99       300-399
2    200-299    500-599
2    600-699    400-499
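The translation table can be turned into a small lookup routine. This is a toy sketch rather than how a real hypervisor is implemented (real VMMs rely on hardware page tables); the `Mapping` type and `translate` function are hypothetical. The second row is taken as virtual range 300-399, which is the reading under which cell 323 of VM 1 maps to physical cell 123 as in the example.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    vm: int
    virt_start: int
    virt_end: int    # inclusive
    phys_start: int


# Rows follow the slide's translation table.
# Assumption: row 2 covers virtual cells 300-399 (so that 323 -> 123).
TABLE = [
    Mapping(1, 0, 99, 0),
    Mapping(1, 300, 399, 100),
    Mapping(2, 0, 99, 300),
    Mapping(2, 200, 299, 500),
    Mapping(2, 600, 699, 400),
]


def translate(vm: int, virtual_cell: int) -> int:
    """Map a VM's virtual memory cell to a physical cell."""
    for m in TABLE:
        if m.vm == vm and m.virt_start <= virtual_cell <= m.virt_end:
            return m.phys_start + (virtual_cell - m.virt_start)
    # No mapping: the VM touched memory it does not own.
    raise LookupError(f"VM {vm} has no mapping for cell {virtual_cell}")


if __name__ == "__main__":
    print(translate(1, 323))
    print(translate(2, 650))
```

Note that both VMs use virtual cells 0-99 but land in different physical ranges, which is what gives each VM the illusion of owning the machine.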

Benefit: Migration
• What if the machine needs to be shut down?
– e.g., for maintenance, consolidation, ...
– Alice can migrate the VMs to different physical machines without any customers noticing
[Figure: Alice's physical machines, each with a virtual machine monitor; virtual machines for Bob, Charlie, Daniel]

Benefit: Time sharing
• What if Alice gets another customer?
– Multiple VMs can time-share the existing resources
– Result: Alice has more virtual CPUs and virtual memory than physical resources (but not all can be active at the same time)
[Figure: Alice's physical machine with virtual machine monitor; virtual machines for Bob, Charlie, Daniel, Emil]

Benefit and challenge: Isolation
• Good: Emil can't access Charlie's data
• Bad: What if the load suddenly increases?
– Example: Emil's VM shares CPUs with Charlie's VM, and Charlie suddenly starts a large compute job
– Emil's performance may decrease as a result
– VMM can move Emil's software to a different CPU, or migrate it to a different machine
[Figure: Alice's physical machine with VMM; virtual machines for Bob, Charlie, Daniel, Emil]

Recap: Virtualization in the cloud
• Gives cloud provider a lot of flexibility
– Can produce VMs with different capabilities
– Can migrate VMs if necessary (e.g., for maintenance)
– Can increase load by overcommitting resources
• Provides security and isolation
– Programs in one VM cannot influence programs in another
• Convenient for users
– Complete control over the virtual 'hardware' (can install own operating system, own applications, ...)
• But: Performance may be hard to predict
– Load changes in other VMs on the same physical machine may affect the performance seen by the customer


10 obstacles and opportunities
1. Availability
– What happens to my business if there is an outage in the cloud?
2. Data lock-in
– How do I move my data from one cloud to another?
3. Data confidentiality and auditability
– How do I make sure that the cloud doesn't leak my confidential data?
– Can I comply with regulations like HIPAA and Sarbanes/Oxley?

Some cloud outages
Service     Duration   Date
S3          6-8 hrs    7/20/08
AppEngine   5 hrs      6/17/08
Gmail       1.5 hrs    8/11/08
Azure       22 hrs     3/13/09
Intuit      36 hrs     6/16/10
EBS         >3 days    4/21/11
ECC         ~2 hrs     6/30/12
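Outage durations are easier to compare when expressed as availability. The sketch below converts a few of the durations from the outage table into a yearly availability figure. It assumes, purely for illustration, that the listed outage was the service's only downtime in a 365-day year; the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours

# Durations in hours, taken from the outage table.
OUTAGE_HOURS = {
    "AppEngine": 5,
    "Gmail": 1.5,
    "Azure": 22,
    "Intuit": 36,
}


def availability(outage_hours: float,
                 period_hours: float = HOURS_PER_YEAR) -> float:
    """Fraction of the period during which the service was up."""
    return 1.0 - outage_hours / period_hours


if __name__ == "__main__":
    for service, hours in OUTAGE_HOURS.items():
        print(f"{service:10s} {availability(hours):.3%}")
```

Even the 36-hour outage leaves availability above 99.5% for the year under this assumption, which shows how demanding targets such as "four nines" (about 53 minutes of downtime per year) are.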

10 obstacles and opportunities
4. Data transfer bottlenecks
– How do I copy large amounts of data from/to the cloud?
– Example: 10 TB from UC Berkeley to Amazon in Seattle, WA
– Motivated Import/Export feature on AWS
5. Performance unpredictability
– Example: VMs sharing the same disk → I/O interference
– Example: HPC tasks that require coordinated scheduling

Time to transfer 10TB [AF10]
Method              Time
Internet (20Mbps)   45 days
FedEx               1 day

Performance of 75 EC2 instances in benchmarks
Primitive          Mean perf.   Std dev
Memory bandwidth   1.3GB/s      0.05GB/s (4%)
Disk bandwidth     55MB/s       9MB/s (16%)
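The numbers in both tables can be checked with a few lines of arithmetic. This is a rough sketch that assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/s) and a link that is fully used for the whole transfer; the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes: float, link_bits_per_s: float) -> float:
    """Days needed to push size_bytes over a link of the given sustained speed."""
    seconds = size_bytes * 8 / link_bits_per_s
    return seconds / SECONDS_PER_DAY


def relative_std(mean: float, std_dev: float) -> float:
    """Standard deviation as a fraction of the mean (coefficient of variation)."""
    return std_dev / mean


print(f"10 TB at 20 Mbps: {transfer_days(10e12, 20e6):.1f} days")
print(f"Memory bandwidth spread: {relative_std(1.3, 0.05):.0%}")
print(f"Disk bandwidth spread: {relative_std(55, 9):.0%}")
```

The result of about 46 days is in line with the table's 45 days, and the two ratios reproduce the 4% and 16% figures: disk performance varies about four times as much as memory performance across instances.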

10 obstacles and opportunities
6. Scalable storage
– Cloud model (short-term usage, no up-front cost, infinite capacity on demand) does not fit persistent storage well
7. Bugs in large distributed systems
– Many errors cannot be reproduced in smaller configs
8. Scaling quickly
– Problem: Boot time; idle power
– Fine-grain accounting?


10 obstacles and opportunities
9. Reputation fate sharing
– One customer's bad behavior can affect the reputation of others using the same cloud
– Example: Spam blacklisting, FBI raid after criminal activity
10. Software licensing
– What if licenses are for specific computers?
• Example: Microsoft Windows
– How to scale number of licenses up/down?
• Need pay-as-you-go model as well

Stay tuned

Next you will learn about: Programming at scale; Parallelism, Concurrency and Consistency


Some extra slides with various definitions of cloud, cloud architecture, etc.


What is Cloud Computing
• http://cloudcomputing.sys-con.com/read/612375_p.htm Twenty-One Experts Define Cloud Computing
• It is the infrastructural paradigm shift that is sweeping across the Enterprise IT world, but how is it best defined? I refer of course to 'Cloud Computing' - the phenomenon that currently has as many definitions as there are squares on a chess-board. To try and narrow it down we bring here a round-up of some recent attempts to bring welcome precision where there risks being unnecessary vagueness. Enjoy!
• "What is cloud computing all about? Amazon has coined the word “elasticity” which gives a good idea about the key features: you can scale your infrastructure on demand within minutes or even seconds, instead of days or weeks, thereby avoiding under-utilization (idle servers) and over-utilization (blue screen) of in-house resources. With monitoring and increasing automation of resource provisioning we might one day wake up in a world where we don’t have to care about scaling our Web applications because they can do it alone." - Markus Klems
• "For me the simplest explanation for cloud computing is describing it as 'internet centric software.' This new cloud computing software model is a shift from the traditional single tenant approach to software development to that of a scalable, multi-tenant, multi-platform, multi-network, and global. This could be as simple as your web based email service or as complex as a globally distributed load balanced content delivery environment.
• I think drawing a distinction on whether it's PaaS, SaaS, HaaS is completely secondary, ultimately all these approaches are attempting to solve the same problems (scale). As software transitions from a traditional desktop deployment model to that of a network & data centric one, "the cloud" will be the key way in which you develop, deploy and manage applications in this new computing paradigm." - Reuven Cohen

What is Cloud Computing – contd.
• "I view cloud computing as a broad array of web-based services aimed at allowing users to obtain a wide range of functional capabilities on a 'pay-as-you-go' basis that previously required tremendous hardware/software investments and professional skills to acquire. Cloud computing is the realization of the earlier ideals of utility computing without the technical complexities or complicated deployment worries." - Jeff Kaplan
• "People are coming to grips with Virtualization and how it reshapes IT, creates service and software based models, and in many ways changes a lot of the physical layer we are used to. Clouds will be the next transformation over the next several years, building off of the software models that virtualization enabled." - Douglas Gourlay
• "The way I understand it, “cloud computing” refers to the bigger picture…basically the broad concept of using the internet to allow people to access technology-enabled services. According to Gartner, those services must be 'massively scalable' to qualify as true 'cloud computing'. So according to that definition, every time I log into Facebook, or search for flights online, I am taking advantage of cloud computing." - Praising Gaw
© 2008 SYS-CON Media Inc.

What is Cloud Computing – contd.
• "The “Cloud” concept is finally wrapping peoples’ minds around what is possible when you leverage web-scale infrastructure (application and physical) in an on-demand way. “Managed Services”, “ASP”, “Grid Computing”, “Software as a Service”, “Platform as a Service”, “Anything as a Service”… all terms that couldn’t get it done. Call it a “Cloud” and everyone goes bonkers. Go figure." - Damon Edwards
• "There sure is a lot of confusion when it comes to talking about cloud computing. Yet, it does not need to be so complicated. There really are only three types of services that are cloud based: SaaS, PaaS, and Cloud Computing Platforms. I am not sure being massively scalable is a requirement to fit into any one category." - Brian de Haaff
• "SaaS is one consumer facing usage of cloud computing. While it's something of a semantic discussion it is important for people inside to have an understanding of what it all means. Put simply cloud computing is the infrastructural paradigm shift that enables the ascension of SaaS." - Ben Kepes
© 2008 SYS-CON Media Inc.


What is Cloud Computing – contd.
• "The 'cloud' model initially has focused on making the hardware layer consumable as on-demand compute and storage capacity. This is an important first step, but for companies to harness the power of the cloud, complete application infrastructure needs to be easily configured, deployed, dynamically-scaled and managed in these virtualized hardware environments." - Kirill Sheynkman
• "Cloud computing overlaps some of the concepts of distributed, grid and utility computing, however it does have its own meaning if contextually used correctly. Cloud computing really is accessing resources and services needed to perform functions with dynamically changing needs. An application or service developer requests access from the cloud rather than a specific endpoint or named resource. What goes on in the cloud manages multiple infrastructures across multiple organizations and consists of one or more frameworks overlaid on top of the infrastructures tying them together. The cloud is a virtualization of resources that maintains and manages itself." - Kevin Hartig
© 2008 SYS-CON Media Inc.

What is Cloud Computing – contd.
• "I was chatting with a customer the other day who was struggling with some of the implications of cloud computing. The analogy that finally made sense to them is what I will call 'cloud dining.' I am the cook in the house and I am tasked with feeding the family. If my 10-year old is lobbying for Italian, I can cook at home or order out. The decision may also vary from day to day. For instance, I might not have all the ingredients and have to order out, or, like this weekend, it may be 103 outside and cooking at home is not all that appealing. Now, the same can be said for supporting a given application in a cloud computing environment.
In a fully implemented Data Center 3.0 environment, you can decide if an app is run locally (cook at home), in someone else’s data center (take-out) and you can change your mind on the fly in case you are short on data center resources (pantry is empty) or you are having environmental/facilities issues (too hot to cook). In fact, with automation, a lot of this can be done with policy and real-time triggers. For example, during month end processing, you might always shift non-critical apps offsite, or if you pass a certain cooling threshold, you might ship certain processing offsite." - Omar Sultan
© 2008 SYS-CON Media Inc.

What is Cloud Computing – contd.
• "Clouds are vast resource pools with on-demand resource allocation. The degree of on-demandness can vary from phone calls to web forms to actual APIs that directly requisition servers. I tend to consider slow forms of requisitioning to be more like traditional datacenters, and the quicker ones to be more cloudy. A public facing API is a must for true clouds.
Clouds are virtualized. On-demand requisitioning implies the ability to dynamically resize resource allocation or moving customers from one physical server to another transparently. This is all difficult or impossible without virtualization.
© 2008 SYS-CON Media Inc.


What is Cloud Computing – contd.
• "Cloud computing is ... the user-friendly version of grid computing." - Trevor Doerksen
• "Most computer savvy folks actually have a pretty good idea of what the term "cloud computing" means: outsourced, pay-as-you-go, on-demand, somewhere in the Internet, etc." - Thorsten von Eicken
• Clouds tend to be priced like utilities (hourly, rather than per-resource), and I think we’ll see this model catching on more and more as computing resources become as cheap and ubiquitous as water, electricity, and gas (well, maybe not gas). However, I think this is a trend, not a requirement. You can certainly have clouds that are priced like pizza, per slice." - Jan Pritzker
© 2008 SYS-CON Media Inc.

Cloud Computing????


What is Cloud Computing – contd.
• "In order to discuss some of the issues surrounding The Cloud concept, I think it is important to place it in historical context. Looking at the Cloud's forerunners, and the problems they encountered, gives us the reference points to guide us through the challenges it needs to overcome before it is adopted." - Paul Wallis
• "The web fanatics and blogosphere would have you believe that all applications will move to the web. Some will, most will not. Reliability, scalability, security, and a host of other issues will prevent most businesses from moving their mission critical applications to hosted services or cloud based services. The risk of failure is too great.
• Amazon is the leader in cloud based services, but even Amazon has experienced down times for its own business. Cloud services will continue to improve. But my guess is the uptake will take longer than most people predict." - Don Dodge
© 2008 SYS-CON Media Inc.


What is Cloud Computing – contd.
• "I would like to propose a 'Cloud Pyramid' to help differentiate the various Cloud offerings out there. [At the top of the pyramid] users are truly restricted to only what the application is and can do. Some of the notable companies here are the public email providers (Gmail, Hotmail, Quicken Online, etc.). Almost any Software as a Service (SaaS) provider can be lumped into this group.
As you move further down the pyramid, you gain increased flexibility and control but you are still fairly restricted to what you can and cannot do. Within this Category things get more complicated to achieve. Products and companies like Google App Engine, Heroku, Mosso, Engine Yard, Joyent or force.com (SalesForce platform) fall into this segment.
At the bottom of the pyramid are the infrastructure providers like Amazon’s EC2, GoGrid, RightScale and Linode. Companies providing infrastructure enable Cloud Platforms and Cloud Applications. Most companies within this segment operate their own infrastructure, allowing them to provide more features, services and control than others within the pyramid." - Michael Sheehan
© 2008 SYS-CON Media Inc.

What is Cloud Computing – contd.
• "Today's combination of high-speed networks, sophisticated PC graphics processors, and fast, inexpensive servers and disk storage has tilted engineers toward housing more computing in data centers. In the earlier part of this decade, researchers espoused a similar, centralized approach called "grid computing." But cloud computing projects are more powerful and crash-proof than grid systems developed even in recent years." - Aaron Ricadela
• "When virtualizing applications to be used by people who care nothing about computers or technology - as is mostly the case with Clouds - the key thing we want to virtualize or hide from the user is complexity. Most people want to deal with an application or a service, not software. ... The more intelligent we want [computers and computer applications] to be - that is, intuitive, exhibiting common sense and not making us have to constantly take care of them - the more smart software it will take. But with cloud computing, our expectation is that all that software will be virtualized or hidden from us and taken care of by systems and/or professionals that are somewhere else - out there in The Cloud." - Irving Wladawsky Berger
© 2008 SYS-CON Media Inc.

What is Cloud Computing – contd.
• "I view cloud computing as a broad array of web-based services aimed at allowing users to obtain a wide range of functional capabilities on a ‘pay-as-you-go’ basis that previously required tremendous hardware/software investments and professional skills to acquire. Cloud computing is the realization of the earlier ideals of utility computing without the technical complexities or complicated deployment worries." - Jeff Kaplan
• "Cloud computing really comes into focus only when you think about what IT always needs: a way to increase capacity or add capabilities on the fly without investing in new infrastructure, training new personnel, or licensing new software. Cloud computing encompasses any subscription-based or pay-per-use service that, in real time over the Internet, extends IT's existing capabilities" - Bill Martin
© 2008 SYS-CON Media Inc.


Defining Clouds: There are many views of what cloud computing is
• Over 20 definitions:
– http://cloudcomputing.sys-con.com/read/612375_p.htm
• A compromise definition :-)
– "A Cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources based on service-level agreements established through negotiation between the service provider and consumers."
• Keywords: Virtualization (VMs), Dynamic Provisioning (negotiation and SLAs), and Web 2.0/3.0 access interface

What is Cloud Computing?
1. Web-scale problems
2. Large data centers
3. Different models of computing
4. Highly-interactive Web applications

1. Web-Scale Problems
• Characteristics:
– Definitely data-intensive
– May also be processing intensive
• Examples:
– Crawling, indexing, searching, mining the Web
– “Post-genomics” life sciences research
– Other scientific data (physics, astronomy, etc.)
– Sensor networks
– Web 2.0 applications
– …


2. Large Data Centers
• Web-scale problems? Throw more machines at it!
• Clear trend: centralization of computing resources in large data centers
– Necessary ingredients: fiber, juice, and space
– What do Oregon, Iceland, and abandoned mines have in common?
• Important Issues:
– Redundancy
– Efficiency
– Utilization
– Management

[Figure: cloud service layers, accessed through Web 2.0 Interfaces]
• Software as a Service
– End user applications: Scientific applications, Office automation, Photo editing, CRM, and Social Networking
– Examples: Google Documents, Facebook, Flickr, Salesforce
• Platform as a Service
– Runtime Environment for Applications; Development and Data Processing Platforms
– Examples: Windows Azure, Hadoop, Google AppEngine, Aneka
• Infrastructure as a Service
– Virtualized Servers, Storage and Networking
– Examples: Amazon EC2, S3, Rightscale, vCloud

Key Technology: Virtualization
[Figure: two stacks side by side]
– Traditional Stack: Hardware → Operating System → App, App, App
– Virtualized Stack: Hardware → Hypervisor → OS, OS, OS → App, App, App (one App per OS)


Market-oriented Cloud Architecture: QoS negotiation and SLA-based Resource Allocation
[Figure: components, top to bottom]
– Users/Brokers
– SLA Resource Allocator
• Service Request Examiner and Admission Control (Customer-driven Service Management, Computational Risk Management, Autonomic Resource Management)
• Pricing, Accounting
• VM Monitor, Dispatcher, Service Request Monitor
– Virtual Machines (VMs)
– Physical Machines

A (Layered) Cloud Architecture
[Figure: layers, top to bottom; side labels: Adaptive Management, Autonomic / Cloud Economy]
– User level: Cloud applications (Social computing, Enterprise, ISV, Scientific, CDNs, ...)
– User-Level Middleware: Cloud programming environments and tools (Web 2.0 Interfaces, Mashups, Concurrent and Distributed Programming, Workflows, Libraries, Scripting)
– Core Middleware: Apps Hosting Platforms
• QoS Negotiation, Admission Control, Pricing, SLA Management, Monitoring, Execution Management, Metering, Accounting, Billing
• Virtual Machine (VM), VM Management and Deployment
– System level: Cloud resources

3. Different Computing Models
• Utility computing
– Why buy machines when you can rent cycles?
– Examples: Amazon’s EC2, GoGrid, AppNexus
• Platform as a Service (PaaS)
– Give me nice API and take care of the implementation
– Example: Google App Engine
• Software as a Service (SaaS)
– Just run it for me!
– Example: Gmail

“Why do it yourself if you can pay someone to do it for you?”
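The "rent cycles instead of buying machines" question can be framed as a break-even calculation. The sketch below is illustrative only: the prices are made-up placeholders, not real quotes from any provider, and the function name `break_even_hours` is hypothetical.

```python
def break_even_hours(purchase_cost: float,
                     owned_hourly_cost: float,
                     rental_hourly_price: float) -> float:
    """Hours of use after which owning a machine becomes cheaper than renting one."""
    if rental_hourly_price <= owned_hourly_cost:
        raise ValueError("renting is never more expensive per hour; no break-even point")
    return purchase_cost / (rental_hourly_price - owned_hourly_cost)


# Hypothetical numbers: a 3000-unit server that costs 0.05/hour to power and
# administer, compared with a rented instance at 0.25/hour.
hours = break_even_hours(purchase_cost=3000.0,
                         owned_hourly_cost=0.05,
                         rental_hourly_price=0.25)
print(f"Owning pays off after about {hours:.0f} hours of use")
```

With these placeholder figures the break-even point is about 15,000 hours, or roughly 1.7 years of continuous use; a machine that sits idle most of the time may never reach it, which is the case for renting.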


4. Web Applications
• A mistake on top of a hack built on sand held together by duct tape?
• What is the nature of software applications?
– From the desktop to the browser
– SaaS == Web-based applications
– Examples: Google Maps, Facebook
• How do we deliver highly-interactive Web-based applications?
– AJAX (asynchronous JavaScript and XML)
– For better, or for worse…

Web-Scale Problems?
• Don’t hold your breath:
– Biocomputing
– Nanocomputing
– Quantum computing
– …
• It all boils down to…
– Divide-and-conquer
– Throwing more hardware at the problem

Simple to understand… a lifetime to master…


Divide and Conquer
[Figure: “Work” is partitioned into units w1, w2, w3; each unit goes to a “worker”, which produces r1, r2, r3; the partial results are combined into the “Result”]
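The partition / worker / combine structure can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the task (sum of squares) and the names `partition`, `worker`, `combine` are chosen here for the example, and threads are used only to show the structure (in CPython they do not speed up CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor


def partition(work, n_parts):
    """Split a list into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(work), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(work[start:end])
        start = end
    return chunks


def worker(chunk):
    """Process one work unit (w_i) and return its partial result (r_i)."""
    return sum(x * x for x in chunk)


def combine(partials):
    """Merge the partial results into the final result."""
    return sum(partials)


def divide_and_conquer(work, n_workers=3):
    chunks = partition(work, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker, chunks))  # one chunk per worker
    return combine(partials)


print(divide_and_conquer(list(range(10))))  # sum of squares of 0..9
```

The combine step works here because addition is associative, so the partial sums can be merged in any grouping; not every problem splits this cleanly.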

Different Workers
• Different threads in the same core
• Different cores in the same CPU
• Different CPUs in a multi-processor system
• Different machines in a distributed system

Choices, Choices, Choices
• Commodity vs. “exotic” hardware
• Number of machines vs. processor vs. cores
• Bandwidth of memory vs. disk vs. network
• Different programming models


Flynn’s Taxonomy
                       Instructions
                       Single (SI)                      Multiple (MI)
Data   Single (SD)     SISD: Single-threaded process    MISD: Pipeline architecture
       Multiple (MD)   SIMD: Vector Processing          MIMD: Multi-threaded Programming

SISD
[Figure: one processor applies a single instruction stream to a single data stream (D D D D D D D)]

SIMD
[Figure: one processor applies a single instruction stream to several data streams at once; each stream holds elements D0, D1, D2, D3, D4, …, Dn]


MIMD
[Figure: two processors, each applying its own instruction stream to its own data stream (D D D D D D D)]

Memory Typology: Shared
[Figure: four processors all connected to one shared memory]

Memory Typology: Distributed
[Figure: four nodes, each a processor with its own memory, connected by a network]


Memory Typology: Hybrid
[Figure: four nodes connected by a network; each node has two processors sharing one memory]

Parallelization Problems
• How do we assign work units to workers?
• What if we have more work units than workers?
• What if workers need to share partial results?
• How do we aggregate partial results?
• How do we know all the workers have finished?
• What if workers die?

What is the common theme of all of these problems?
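Several of these questions show up even in a tiny program. The sketch below, written with Python's standard `concurrent.futures` module, queues five work units for two workers, aggregates results as they arrive, and records a unit whose worker fails. The failing unit and the name `flaky_worker` are hypothetical, chosen only to simulate a worker that dies.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def flaky_worker(unit):
    """Process one work unit; unit 3 simulates a worker that dies."""
    if unit == 3:
        raise RuntimeError(f"worker died on unit {unit}")
    return unit * 10


def run_all(units, n_workers=2):
    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # More units than workers: the pool queues the surplus units.
        futures = {pool.submit(flaky_worker, u): u for u in units}
        # The loop ends only when every future has finished or failed.
        for future in as_completed(futures):
            unit = futures[future]
            try:
                results[unit] = future.result()  # aggregate partial results
            except RuntimeError:
                failed.append(unit)  # a real system might retry this unit
    return results, failed


results, failed = run_all(range(5))
print(results, failed)
```

On a single machine the library hides most of the bookkeeping; across machines, each of these steps (assignment, completion detection, failure handling) has to be built or provided by a framework.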

General Theme?
• Parallelization problems arise from:
– Communication between workers
– Access to shared resources (e.g., data)
• Thus, we need a synchronization system!
• This is tricky:
– Finding bugs is hard
– Solving bugs is even harder
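A shared counter is the classic small case of "access to shared resources". In the sketch below (the `Counter` class is a hypothetical example, not from the slides) a lock makes the read-modify-write step atomic; without it, increments from different threads could interleave and some updates could be lost.

```python
import threading


class Counter:
    """A counter that several threads can increment safely."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The lock ensures only one thread at a time reads and writes value.
        with self._lock:
            self.value += 1


def run(n_threads=4, n_increments=10_000):
    counter = Counter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every worker has finished
    return counter.value


print(run())  # n_threads * n_increments when the lock is used
```

The bug this prevents is timing-dependent, which is exactly why such bugs are hard to find: an unlocked version may pass many runs before it loses an update.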


Managing Multiple Workers
• Difficult because
– (Often) don’t know the order in which workers run
– (Often) don’t know where the workers are running
– (Often) don’t know when workers interrupt each other
• Thus, we need:
– Semaphores (lock, unlock)
– Condition variables (wait, notify, broadcast)
– Barriers
• Still, lots of problems:
– Deadlock, livelock, race conditions, ...
• Moral of the story: be careful!
– Even trickier if the workers are on different machines
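Two of these primitives can be shown together in a small sketch using Python's `threading` module. A semaphore limits how many workers are inside a guarded region at once, and a barrier keeps any worker from starting phase 2 until all have finished phase 1. The two-phase task and the function name `run_phases` are assumptions made for the example; condition variables (`threading.Condition`, with `wait` and `notify`) are not shown.

```python
import threading


def run_phases(n_workers=3):
    barrier = threading.Barrier(n_workers)  # all workers must arrive to pass
    slots = threading.Semaphore(2)          # at most two workers in the region
    log_lock = threading.Lock()
    log = []

    def worker(worker_id):
        with slots:                         # acquire a slot, release on exit
            with log_lock:
                log.append(("phase1", worker_id))
        barrier.wait()                      # wait for every worker's phase 1
        with log_lock:
            log.append(("phase2", worker_id))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


for entry in run_phases():
    print(entry)
```

The order of workers within a phase can differ from run to run, but every phase 1 entry always precedes every phase 2 entry. Note that the semaphore is released before the barrier is reached; holding it across the barrier with three workers and two slots would deadlock.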

Patterns for Parallelism

• Parallel computing has been around for decades

• Here are some “design patterns” …


Master/Slaves
[Figure: one master node handing work to several slave nodes]


Producer/Consumer Flow
[Figure: producers (P) passing items to consumers (C), shown in two arrangements]
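A minimal producer/consumer flow can be sketched with Python's `queue.Queue`. The example simplifies the figure to one producer and one consumer, uses a bounded queue so a fast producer has to wait for the consumer, and uses a sentinel value to signal the end of the stream; the doubling step and the function names are assumptions made for illustration.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream; real items must not be None


def producer(out_q, items):
    for item in items:
        out_q.put(item)   # blocks while the bounded queue is full
    out_q.put(SENTINEL)


def consumer(in_q, results):
    while True:
        item = in_q.get()  # blocks while the queue is empty
        if item is SENTINEL:
            break
        results.append(item * 2)


def run_pipeline(items):
    q = queue.Queue(maxsize=2)  # small buffer between the two stages
    results = []
    p = threading.Thread(target=producer, args=(q, items))
    c = threading.Thread(target=consumer, args=(q, results))
    p.start()
    c.start()
    p.join()
    c.join()
    return results


print(run_pipeline([1, 2, 3]))
```

With a single producer and a single consumer the FIFO queue preserves item order; with several consumers, ordering is no longer guaranteed and each consumer needs its own sentinel.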

Work Queues
[Figure: producers (P) place work units (W W W W W) on a shared queue; consumers (C) take them off]
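In the work-queue pattern, several workers pull units from one shared queue, so faster workers naturally take on more units. The sketch below is one possible arrangement using Python's `queue` and `threading` modules; the squaring task and the function name `run_work_queue` are assumptions made for the example.

```python
import queue
import threading


def run_work_queue(tasks, n_workers=3):
    q = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            task = q.get()
            if task is None:          # sentinel: no more work for this worker
                q.task_done()
                return
            with results_lock:        # results is shared, so guard the append
                results.append(task ** 2)
            q.task_done()

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for task in tasks:
        q.put(task)
    for _ in workers:
        q.put(None)                   # one sentinel per worker
    q.join()                          # returns once every item is marked done
    for w in workers:
        w.join()
    return sorted(results)            # completion order is not deterministic


print(run_work_queue(range(6)))
```

Results are sorted before being returned because the order in which workers finish varies between runs.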

Rubber Meets Road
• From patterns to implementation:
– pthreads, OpenMP for multi-threaded programming
– MPI for cluster computing
– …
• The reality:
– Lots of one-off solutions, custom code
– Write your own dedicated library, then program with it
– Burden on the programmer to explicitly manage everything
• MapReduce to the rescue!
– (for next time)