Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar...

26
Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 2014 1

Transcript of Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar...

Page 1: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

1

Programming Models

Cloud Computing

Garth Gibson

Greg GangerMajd SakrRaja Sambasivan

G. Gibson, Mar 3, 2014

Page 2: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

2

Recall the SaaS, PaaS, IaaS Taxonomy

• Service, Platform or Infrastructure as a Serviceo SaaS: service is a complete application (client-server

computing)• Not usually a programming abstraction

o PaaS: high level (language) programming model for cloud

computer• Eg. Rapid prototyping languages• Turing complete but resource management hidden

o IaaS: low level (language) computing model for cloud computer• Eg. Assembler as a language• Basic hardware model with all (virtual) resources exposed

• For PaaS and IaaS, cloud programming is neededo How is this different from CS 101? Scale, fault tolerance,

elasticity, ….

G. Gibson, Mar 3, 2014

Page 3: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

3

Embarrassingly parallel “Killer app:” Web servers• Online retail stores (like the ice.com example)

o Most of the computational demand is for browsing product marketing,

forming and rendering web pages, managing customer session state• Actual order taking and billing not as demanding, have separate

specialized services (Amazon bookseller backend)

o One customer session needs a small fraction of one server

o No interaction between customers (unless inventory near exhaustion)

• Parallelism is more cores running identical copy of web server

• Load balancing, maybe in name service, is parallel programmingo Elasticity needs template service, load monitoring, cluster allocation

o These need not require user programming, just configuration

G. Gibson, Mar 3, 2014

Page 4: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

4

Eg., Obama for America Elastic Load Balancer

G. Gibson, Mar 3, 2014

Page 5: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

5

What about larger apps?

• Parallel programming is hard – how can cloud frameworks

help?

• Collection-oriented languages (Sipelstein&Blelloch, Proc IEEE

v79, n4, 1991)

o Also known as Data-parallel

o Specify a computation on an element; apply to each in

collection• Analogy to SIMD: single instruction on multiple data

o Specify an operation on the collection as a whole• Union/intersection, permute/sort, filter/select/map• Reduce-reorderable (A) /reduce-ordered (B)

– (A) Eg., ADD(1,7,2) = (1+7)+2 = (2+1)+7 = 10– (B) Eg., CONCAT(“the “, “lazy “, “fox “) = “the lazy fox “

• Note the link to MapReduce …. its no accident

G. Gibson, Mar 3, 2014

Page 6: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

6

High Performance Computing Approach

• HPC was almost the only home for parallel computing in the 90s

• Physical simulation was the killer app – weather, vehicle design,

explosions/collisions, etc – replace “wet labs” with “dry labs”o Physics is the same everywhere, so define a mesh on a set of particles, code

the physics you want to simulate at one mesh point as a property of the

influence of nearby mesh points, and iterate

o Bulk Synchronous Processing (BSP): run all updates of mesh points in

parallel based on value at last time point, form new set of values & repeat

• Defined “Weak Scaling” for bigger machines – rather than make a fixed

problem go faster (strong scaling), make bigger problem go same

speedo Most demanding users set problem size to match total available memory

G. Gibson, Mar 3, 2014

Page 7: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

7

High Performance Computing Frameworks

• Machines cost O($10-100) million, so o emphasis was on maximizing utilization of machines (congress checks)

o low-level speed and hardware specific optimizations (esp. network)

o preference for expert programmers following established best practices

• Developed MPI (Message Passing Interface) framework (eg. MPICH)o Launch N threads with library routines for everything you need:

• Naming, addressing, messaging, synchronization (barriers)• Transforms, physics modules, math libraries, etc

o Resource allocators and schedulers space share jobs on physical cluster

o Fault tolerance by checkpoint/restart requiring programmer save/restore

o Proto-elasticity: kill N-node job & reschedule a past checkpoint on M nodes

• Very manual, deep learning curve, few commercial runaway successes

G. Gibson, Mar 3, 2014

Page 8: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

8

Broadening HPC: Grid Computing

• Grid Computing started with commodity servers (predates

Cloud)

• Frameworks were less specialized, easier to use (& less

efficient)o Beowulf, PVM (parallel virtual machine), Condor, Rocks, Sun

Grid Engine

• For funding reasons grid emphasized multi-institution

sharingo So authentication, authorization, single-signon, parallel-ftp

o Heterogeneous workflow (run job A on mach. B, then job C on

mach. D)

• Basic model: jobs selected from batch queue, take over

cluster

• Simplified “pile of work”: when a core comes free, take a

task from the run queue and run to completion

G. Gibson, Mar 3, 2014

Page 9: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

9

Cloud Programming, back to the future

• HPC demanded too much expertise, too many details and

tuning

• Cloud frameworks all about making parallel programming

easiero Willing to sacrifice efficiency (too willing perhaps)

o Willing to specialize to application (rather than machine)

• Canonical BigData user has data & processing needs that

require lots of computer, but doesn’t have CS or HPC

training & experienceo Wants to learn least amount of computer science to get results

this week

o Might later want to learn more if same jobs become a personal

bottleneck

G. Gibson, Mar 3, 2014

Page 10: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

10

Cloud Programming Case Studies

• MapReduceo Package two Sipelstein91 operators filter/map and reduce as

the base of a data parallel programming model built around

Java libraries

• DryadLinqo Compile workflows of different data processing programs into

schedulable processes

• GraphLabo Specialize to iterative machine learning with local update

operations and dynamic rescheduling of future updates

G. Gibson, Mar 3, 2014

Page 11: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

11

MapReduce (Majd)

G. Gibson, Mar 3, 2014

Page 12: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

12

DryadLinq

• Simplify efficient data parallel codeo Compiler support for imperative and

declarative (eg., database) operations

o Extends MapReduce to workflows

that can be collectively optimized

• Data flows on edges between processes at vertices (workflows)

• Coding is processes at vertices and expressions representing

workflow

• Interesting part of the compiler operates on the expressionso Inspired by traditional database query optimizations – rewrite the

execution plan with equivalent plan that is expected to execute faster

G. Gibson, Mar 3, 2014

Page 13: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

13

DryadLinq

• Data flowing through a graph abstractiono Vertices are programs (possibly different with each vertex)

o Edges are data channels (pipe-like)

o Requires programs to have no side-effects (no changes to shared state)

o Apply function similar to MapReduce reduce – open ended user code

• Compiler operates on expressions, rewriting execution sequenceso Exploits prior work on compiler for workflows on sets (LINQ)

o Extends traditional database query planning with less type restrictive code• Unlike traditional plans, virtualizes resources (so might spill to storage)

o Knows how to partition sets (hash, range and round robin) over nodes

o Doesn’t always know what processes do, so less powerful optimizer than

database – where it can’t infer what is happening, it takes hints from users

o Can auto-pipeline, remove redundant partitioning, reorder partitionings, etc

G. Gibson, Mar 3, 2014

Page 14: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

14

Example: MapReduce (reduce-reorderable)

• DryadLinq

compiler can

pre-reduce,

partition,

sort-merge,

partially

aggregate

• In MapReduce

you “configure”

this youself

G. Gibson, Mar 3, 2014

Page 15: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

G. Gibson, Mar 3, 2014 15

Background on Regression

• Regression problem: For given input A, and observation Y, find

unknown x parameter

• Sparse regression is one variation of regression that favors a small

number of non-zero parameters corresponding to the most

relevant features.

…..

…..…..

….

……

..

=

Y: n by 1x: m by 1

Eg. Alzheimer Disease data463 sample by 509K features. In case of pair-wise study, # of features would be inflated to 1011.

X

A: n by m matrix

Page 16: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

16

Eg. Medical Research

• Find genetic patterns predicting disease

• Millions to 1011 (pair-wise gene study) dimensions

AT…….CG G AAAAT…….CG T AAA

AT…….CG T AAA

AT…….CG T AAA

Genetic variation associated with disease

G. Gibson, Mar 3, 2014

Samples(patients)

Page 17: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

HPC Style: Bulk Synchronous Parallel

17

1

1

1

1

Thread 1

Thread 2

Thread 3

Thread 4

Iterative Improvement of Estimated “Solution”

2

2

2

2

3

3

3

3

4

4

4

4

5

5

5

5

Time

Threads synchronize (wait for each other) every iterationParameters read/updated at synchronization barriers

Repetitive file processing: Mahout, DryadLinq, SparkDistributed shared memory: Pregel, Hama, Giraph, GraphLab

G. Gibson, Mar 3, 2014

Page 18: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

Synchronization can be costly

18

1

1

1

1

Thread 1

Thread 2

Thread 3

Thread 4

2

2

2

2

3

3

3

3

Machines performance or work assigned unequalSo threads must wait for each other

And larger clusters suffer longer communication in barrier sync

If you can, do more work between syncs, but not always possible

Time

G. Gibson, Mar 3, 2014

Page 19: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

Can we just run asynchronous?

19

1

1

1

1

Thread 1

Thread 2

Thread 3

Thread 4

2

2

2

2

3

3

3

3

4

4

4

4

5

5

5

5

Threads proceed to next iteration without waitingThreads not on same iteration #

In most computations this leads to wrong answerIn search/solve, however, it might still converge, but it might also diverge

6

6

6

6

Parameters read/updated at any time

Time

G. Gibson, Mar 3, 2014

Page 20: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

20

GraphLab: managing asynch ML• GraphLab provides a higher level programming

modelo Data is associated with vertices and edges between

vertices, inherently sparse (or we’d use a matrix representation instead)

o Program a vertex update based on its edges & neighbor vertices

o Background code used to test if its time to terminate updating

• BSP can be cumbersome & inefficiento Iterative algorithms may do little work per iteration and

may not need to move all the data each iterationo Use traditional database transaction isolation for

consistencyG. Gibson, Mar 3, 2014

Page 21: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

21

Scheduling is green field in ML

• Graphlab proposes updates do own schedulingo Baseline: sequential update of each vertex once per

iterationo Sparseness allows non-overlapping updates to execute in

parallelo Opportunity for smart schedulers to exploit more app

properties• Prioritize specific updates over other updates because these

communicate more information more quickly

G. Gibson, Mar 3, 2014

Page 22: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

One way to think about scheduling

22

1

1

1

1

Thread 1

Thread 2

Thread 3

Thread 4

2

2

2

3

3

3

4

4

4

5

5

5 6

6

6

Force threads to sync up

2 3 4 5 6

Make thread 1 catch up

Time

• “Partial” synchronicityo Spread network communication evenly (don’t sync unless needed)

o Threads usually shouldn’t wait – but mustn’t drift too far apart!

• Straggler toleranceo Slow threads must somehow catch up

G. Gibson, Mar 3, 2014

Page 23: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

Stale Synchronous Parallel (SSP)

23

Note: x-axis is now iteration count, not time!Allow threads to usually run at own pace: mostly asynch

Fastest/slowest not allowed to drift >S iterations apart: bound errorThreads cache (stale) versions of parameters, to reduce network traffic

Iteration0 1 2 3 4 5 6 7 8 9

Thread 1

Thread 2

Thread 3

Thread 4

Staleness Threshold 3Thread 1 waits untilThread 2 has reached iter 4

G. Gibson, Mar 3, 2014

Page 24: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

24

Why does SSP converge?

Instead of xtrue, SSP sees xstale = xtrue + error

The error caused by staleness is boundedOver many iterations, average error goes to zero

G. Gibson, Mar 3, 2014

Page 25: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

25

SSP uses networks efficiently

0 8 16 24 32 40 480

1000

2000

3000

4000

5000

6000

7000

8000

Time Breakdown: Compute vs NetworkLDA 32 machines (256 cores), 10% data per

iter

Network waiting time

Compute time

Staleness

Sec

on

ds

SSP balances network and compute time

BSP

G. Gibson, Mar 3, 2014

Page 26: Programming Models Cloud Computing Garth Gibson Greg Ganger Majd Sakr Raja Sambasivan G. Gibson, Mar 3, 20141.

26

Advanced Cloud Computing Programming Models• Ref 1: MapReduce: simplified data processing on large clusters.  Jeffrey Dean

and Sanjay Ghemawat.  OSDI’04.  2004.

http://static.usenix.org/event/osdi04/tech/full_papers/dean/dean.pdf

• Ref 2: DyradLinQ: A system for general-purpose distributed data-parallel

computing using a high-level language.  Yuan Yu, Michael Isard, Dennis

Fetterly, Mihai Budiu, Ulfar Erlingsson, Pradeep Kumar Gunda, Jon Currey.

 OSDI’08.  http://research.microsoft.com/en-us/projects/dryadlinq/dryadlinq.pdf

• Ref 3: Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos

Guestrin, and Joseph M. Hellerstein (2010). "GraphLab: A New Parallel

Framework for Machine Learning." Conference on Uncertainty in Artificial

Intelligence (UAI). http://www.select.cs.cmu.edu/publications/scripts/papers.cgi

G. Gibson, Mar 3, 2014