Starfish: A Self-tuning System for Big Data Analytics Presented by Carl Erhard & Zahid Mian Authors: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University

Page 1:

Starfish: A Self-tuning System for Big Data Analytics

Presented by Carl Erhard & Zahid Mian

Authors: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu

Duke University

Page 2:

Analysis in the Big Data Era

9/26/2011

Massive Data → Data Analysis → Insight

Key to Success = Timely and Cost-Effective Analysis

Starfish

Page 3:

We want a MAD System


Magnetism: “attracts” or welcomes all sources of data, regardless of structure, values, etc.

Agility: adaptive; remains in sync with rapid data evolution and modification

Depth: more than just typical analytics; we need to support complex operations such as statistical analysis and machine learning

Page 4:

No wait… I mean MADDER

Data-lifecycle Awareness: do more than just queries; optimize the movement, storage, and processing of big data

Elasticity: dynamically adjust resource usage and operational costs based on workload and user requirements

Robustness: provide storage and querying services even in the event of some failures

Page 5:

Practitioners of Big Data Analytics

Who are the users?
• Data analysts, statisticians, computational scientists…
• Researchers, developers, testers…
• Business analysts…
• You!

Who performs setup and tuning?
• The users!
• They usually lack the expertise to tune the system

Page 6:

Motivation


Page 7:

Tuning Challenges

• Heavy use of programming languages for MapReduce programs (e.g., Java/Python)
• Data loaded/accessed as opaque files
• Large space of tuning choices (over 190 parameters!)
• Elasticity is wonderful, but hard to achieve (Hadoop has many useful mechanisms, but policies are lacking)
• Terabyte-scale data cycles

Page 8:

Our goal: Provide good performance automatically

Starfish: Self-tuning System

[Figure: Starfish as the self-tuning layer of a Hadoop-based analytics system: Distributed File System and MapReduce Execution Engine, HBase, programs in Java / C++ / R / Python, and higher-level tools (Oozie, Hive, Pig, Elastic MapReduce, Jaql)]

Page 9:

What are the Tuning Problems?

• Job-level MapReduce configuration
• Workload management
• Data layout tuning
• Cluster sizing
• Workflow optimization

[Figure: example workflow of MapReduce jobs J1, J2, J3, J4]

Page 10:

Starfish’s Core Approach to Tuning


1) if Δ(conf. parameters) then what …?

2) if Δ(data properties) then what …?

3) if Δ(cluster properties) then what …?

Profiler: collects concise summaries of execution

What-if Engine: estimates the impact of hypothetical changes on execution

Optimizers: search through the space of tuning choices
• Job
• Workflow
• Workload
• Data layout
• Cluster

Page 11:

Starfish Architecture


Page 12:

Job-Level Tuning

• Just-in-Time Optimizer: automatically selects efficient execution techniques for MapReduce jobs
• Profiler: a Starfish component that collects detailed summaries of jobs on a task-by-task basis
• Sampler: collects statistics about the input, intermediate, and output data of a MapReduce job

Page 13:

MapReduce Job Execution

[Figure: input splits 0, 1, 2, 3 are each processed by a map task; map outputs are shuffled to two reduce tasks, which write out 0 and out 1]

job j = < program p, data d, resources r, configuration c >


Page 14:

What Controls MR Job Execution?

Space of configuration choices:
• Number of map tasks
• Number of reduce tasks
• Partitioning of map outputs to reduce tasks
• Memory allocation to task-level buffers
• Whether output data from tasks should be compressed
• Whether the combine function should be used

job j = < program p, data d, resources r, configuration c >
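To make the size of this space concrete, here is a small Python sketch that enumerates a full grid over a handful of knobs. The property names are Hadoop 0.20-era `mapred-default.xml` names as we recall them (verify them against your Hadoop version); `use_combiner` is a hypothetical flag, since in Hadoop the combiner is set as a class rather than a boolean, and the number of map tasks is omitted because Hadoop derives it from the input splits. The candidate values are arbitrary.

```python
import itertools
from math import prod

# Candidate values are arbitrary; property names are Hadoop 0.20-era names
# (check mapred-default.xml for your version). "use_combiner" is hypothetical.
CONFIG_SPACE = {
    "mapred.reduce.tasks": [8, 16, 32, 64],          # number of reduce tasks
    "io.sort.mb": [50, 100, 200],                    # map-side sort buffer (MB)
    "io.sort.record.percent": [0.05, 0.15, 0.30],    # buffer share for record metadata
    "mapred.compress.map.output": [False, True],     # compress map (intermediate) output
    "use_combiner": [False, True],                   # apply the combine function?
}


def space_size(space):
    """Number of configurations in a full grid over the space."""
    return prod(len(values) for values in space.values())


def enumerate_configs(space):
    """Yield every configuration in the grid as a dict."""
    names = list(space)
    for combo in itertools.product(*(space[n] for n in names)):
        yield dict(zip(names, combo))


print(space_size(CONFIG_SPACE))  # 4 * 3 * 3 * 2 * 2 = 144
```

Even at only two values per parameter, the 190+ parameters would give at least 2^190 grid points, which is why exhaustive grids do not scale.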


Page 15:

Effect of Configuration Settings

• Use defaults or set manually (rules-of-thumb)
• Rules-of-thumb may not suffice

[Figure: two-dimensional projection of a multi-dimensional surface (Word Co-occurrence MapReduce program), with the rules-of-thumb settings marked]

Page 16:

MapReduce Job Tuning in a Nutshell

Goal:

$$\mathit{perf} = F(p, d, r, c)$$

$$c_{opt} = \arg\min_{c \in S} F(p, d, r, c)$$

Challenges: p is an arbitrary MapReduce program; c is high-dimensional; …

Profiler: runs p to collect a job profile (concise execution summary) of <p, d1, r1, c1>

What-if Engine: given the profile of <p, d1, r1, c1>, estimates a virtual profile for <p, d2, r2, c2>

Optimizer: enumerates and searches through the optimization space S efficiently
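The goal can be illustrated with a brute-force Python sketch of c_opt = argmin over S of F(p, d, r, c). Here `toy_perf` is a made-up stand-in for F (in Starfish, F would be answered by the What-if Engine from a job profile), and the search space S is small enough to enumerate exhaustively; none of the names come from Starfish.

```python
import itertools


def toy_perf(p, d, r, c):
    """Hypothetical stand-in for F(p, d, r, c): estimated runtime in seconds.

    Reduce work is split evenly across reduce tasks, tasks run in waves of
    r["reduce_slots"], each task adds 2 s of startup overhead, and
    compressing map output shrinks the work by an assumed 40%.
    """
    n = c["reduce_tasks"]
    waves = -(-n // r["reduce_slots"])          # ceil(n / slots)
    per_task = d["shuffle_gb"] * 60 / n          # seconds of work per task
    shrink = 0.6 if c["compress_map_output"] else 1.0
    return waves * per_task * shrink + 2 * n


def optimize(F, p, d, r, space):
    """Return argmin_{c in S} F(p, d, r, c) by enumerating every c in S."""
    names = list(space)
    candidates = (dict(zip(names, vals)) for vals in itertools.product(*space.values()))
    return min(candidates, key=lambda c: F(p, d, r, c))


S = {"reduce_tasks": [4, 8, 16, 32], "compress_map_output": [False, True]}
c_opt = optimize(toy_perf, "word-cooccurrence", {"shuffle_gb": 10}, {"reduce_slots": 8}, S)
print(c_opt)  # {'reduce_tasks': 8, 'compress_map_output': True}
```

With 10 GB to shuffle and 8 reduce slots, the toy model prefers one full wave of 8 reducers with compressed map output. Enumeration only works here because S has 8 points; a realistic S calls for a smarter search.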

Page 17:

Job Profile

• Concise representation of program execution as a job
• Records information at the level of “task phases”
• Generated by the Profiler through measurement or by the What-if Engine through estimation

[Figure: phases of a map task: Read (input split from DFS), Map, Collect (serialize, partition into the memory buffer), Spill (sort, [combine], [compress]), and Merge]

Page 18:

Job Profile Fields

• Dataflow: amount of data flowing through task phases (e.g., map output bytes, number of spills, number of records in buffer per spill)
• Costs: execution times at the level of task phases (e.g., read, map, and spill phase times in the map task)
• Dataflow Statistics: statistical information about dataflow (e.g., width of input key-value pairs, map selectivity in terms of records, map output compression ratio)
• Cost Statistics: statistical information about resource costs (e.g., I/O cost for reading from local disk per byte, CPU cost for executing the Mapper per record, CPU cost for uncompressing the input per byte)
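A minimal Python sketch of a profile with the four field groups, showing how a few of the listed statistics (input pair width, map selectivity, map output compression ratio, CPU cost per Mapper record) can be derived from measured dataflow and cost fields. All key names and numbers are hypothetical; a real Starfish profile records many more fields per task phase.

```python
from dataclasses import dataclass, field


@dataclass
class JobProfile:
    """Hypothetical container for the four field groups on the slide."""
    dataflow: dict = field(default_factory=dict)        # raw amounts, e.g. map output bytes
    costs: dict = field(default_factory=dict)           # phase times in seconds
    dataflow_stats: dict = field(default_factory=dict)  # ratios derived from dataflow
    cost_stats: dict = field(default_factory=dict)      # per-byte / per-record costs


def derive_stats(p: JobProfile) -> None:
    """Fill a few statistics from measured fields (all key names are illustrative)."""
    df, cost = p.dataflow, p.costs
    p.dataflow_stats["input_pair_width"] = df["map_input_bytes"] / df["map_input_records"]
    p.dataflow_stats["map_record_selectivity"] = (
        df["map_output_records"] / df["map_input_records"])
    p.dataflow_stats["map_output_compress_ratio"] = (
        df["map_output_compressed_bytes"] / df["map_output_bytes"])
    # CPU cost for executing the Mapper per record = map phase time / input records
    p.cost_stats["map_cpu_cost_per_record"] = cost["map_phase_time"] / df["map_input_records"]


prof = JobProfile(
    dataflow={"map_input_bytes": 1_000_000, "map_input_records": 10_000,
              "map_output_records": 25_000, "map_output_bytes": 2_000_000,
              "map_output_compressed_bytes": 500_000},
    costs={"map_phase_time": 5.0},  # seconds
)
derive_stats(prof)
print(prof.dataflow_stats, prof.cost_stats)
```

Because these statistics are ratios rather than absolute amounts, they are the kind of field one would expect to carry over when only the input size changes.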

Page 19:

Generating Profiles by Measurement

Goals
• Have zero overhead when profiling is turned off
• Require no modifications to Hadoop
• Support unmodified MapReduce programs written in Java or Hadoop Streaming/Pipes (Python/Ruby/C++)

Approach: dynamic (on-demand) instrumentation
• Event-condition-action rules are specified (in Java)
• Leads to run-time instrumentation of Hadoop internals
• Monitors task phases of MapReduce job execution
• We currently use BTrace (Hadoop internals are in Java)
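The event-condition-action idea can be simulated in a few lines of Python. This is only an illustration: Starfish's rules are BTrace scripts in Java that instrument Hadoop's own classes at run time, whereas here the event stream, the `enter`/`exit` event kinds, and the phase names are hypothetical. Turning profiling off means installing no rules, so dispatching an event only loops over an empty list.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Event:
    task: str    # task id, e.g. "m1"
    kind: str    # hypothetical event kinds: "enter" / "exit" a phase
    phase: str   # e.g. "READ", "MAP", "SPILL"
    ts: float    # timestamp in seconds


@dataclass
class Rule:
    on: str                              # event kind that triggers the rule
    condition: Callable[[Event], bool]   # must hold for the action to fire
    action: Callable[[Event], None]


class PhaseProfiler:
    """Accumulates per-phase times when profiling is on; installs no rules when off."""

    def __init__(self, phases=None):
        self.started = {}
        self.totals = defaultdict(float)
        self.rules: List[Rule] = []
        if phases:  # profiling on: rules only for the requested phases
            def wanted(e):
                return e.phase in phases
            self.rules = [Rule("enter", wanted, self._on_enter),
                          Rule("exit", wanted, self._on_exit)]

    def _on_enter(self, e):
        self.started[(e.task, e.phase)] = e.ts

    def _on_exit(self, e):
        self.totals[e.phase] += e.ts - self.started.pop((e.task, e.phase))

    def dispatch(self, e):
        for rule in self.rules:
            if rule.on == e.kind and rule.condition(e):
                rule.action(e)


events = [
    Event("m1", "enter", "READ", 0.0), Event("m1", "exit", "READ", 1.0),
    Event("m1", "enter", "MAP", 1.0), Event("m1", "exit", "MAP", 4.0),
    Event("m1", "enter", "SPILL", 4.0), Event("m1", "exit", "SPILL", 4.5),
    Event("m2", "enter", "MAP", 0.0), Event("m2", "exit", "MAP", 2.0),
]
on, off = PhaseProfiler({"MAP", "SPILL"}), PhaseProfiler()
for ev in events:
    on.dispatch(ev)
    off.dispatch(ev)
print(dict(on.totals), dict(off.totals))
```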

Page 20:

Generating Profiles by Measurement

[Figure: with profiling enabled, ECA rules instrument the JVM of each map and reduce task; the raw data they collect yields map and reduce profiles, which are combined into the job profile]

Use of Sampling
• Profile fewer tasks
• Execute fewer tasks

JVM = Java Virtual Machine, ECA = Event-Condition-Action
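Sampling can be sketched in Python as profiling only a random fraction of the map tasks and aggregating their phase timings into a map profile. Averaging is our assumption about the aggregation step; the task ids, phase names, and timings are made up.

```python
import random
from statistics import mean


def sample_tasks(task_ids, fraction, seed=None):
    """Choose a random subset of tasks to profile (at least one task)."""
    rng = random.Random(seed)
    k = max(1, round(len(task_ids) * fraction))
    return rng.sample(task_ids, k)


def aggregate_profile(task_profiles):
    """Combine per-task phase times into one profile by averaging (an assumption)."""
    phases = task_profiles[0].keys()
    return {ph: mean(tp[ph] for tp in task_profiles) for ph in phases}


# Made-up numbers: 100 map tasks, profile 10% of them.
all_tasks = [f"m_{i:06d}" for i in range(100)]
measured = {t: {"READ": 2.0, "MAP": 8.0 + (i % 2) * 4.0} for i, t in enumerate(all_tasks)}
profiled = sample_tasks(all_tasks, 0.10, seed=42)
map_profile = aggregate_profile([measured[t] for t in profiled])
print(len(profiled), map_profile)
```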

Page 21:

Results of Job Profiling


Page 22:

Results using Job Profiling


Page 23:

Workflow-Aware Scheduling

Unbalanced Data Layout
• Skewed Data
• Data Layout Not Considered when Scheduling Tasks
• Adding/Dropping Partitions: No Rebalance
• Can Lead to Failures Due to Space Issues
• Locality-Aware Schedulers Can Make the Problem Worse

Possible Solutions:
• Change # of Replicas
• Collocating Data (Block Placement Policy)

Page 24:

Impact of Unbalanced Data Layout


Page 25:

Impact of Unbalanced Data Layout


Page 26:

Impact of Unbalanced Data Layout


Page 27:

Workflow-Aware Scheduling

• Makes Decisions by Considering Producer-Consumer Relationships

Page 28:

Starfish’s Workflow-Aware Scheduler

Space of Choices:
• Block Placement Policy: Round Robin (Local Write is default)
• Replication Factor
• Size of blocks: generally large for large files
• Compression: impacts I/O; not always beneficial
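The layout difference between the two block placement policies can be seen in a small Python simulation. It counts only the first replica of each block and ignores replication factor, block size, and rack awareness; `writer_nodes` (the nodes where the producer job's tasks happened to run) and the policy labels `"local"` and `"rr"` are hypothetical inputs, not Starfish or HDFS APIs.

```python
from collections import Counter
from itertools import cycle


def place_first_replicas(writer_nodes, blocks_per_writer, nodes, policy):
    """Return {node: number of first replicas} under a placement policy.

    "local": each block goes to the node whose task wrote it (Local Write).
    "rr":    blocks are assigned to all nodes in turn (Round Robin).
    """
    if policy not in ("local", "rr"):
        raise ValueError(f"unknown policy: {policy}")
    counts = Counter({n: 0 for n in nodes})
    rr = cycle(nodes)
    for w in writer_nodes:
        for _ in range(blocks_per_writer):
            counts[w if policy == "local" else next(rr)] += 1
    return dict(counts)


def imbalance(counts):
    """Max blocks on any node divided by the mean (1.0 = perfectly balanced)."""
    avg = sum(counts.values()) / len(counts)
    return max(counts.values()) / avg


nodes = ["n1", "n2", "n3", "n4"]
writers = ["n1", "n2"]  # hypothetical: the producer's tasks ran on two nodes only
local = place_first_replicas(writers, 6, nodes, "local")
rr = place_first_replicas(writers, 6, nodes, "rr")
print(local, imbalance(local))
print(rr, imbalance(rr))
```

When a producer's tasks run on only some nodes, Local Write concentrates its output there (imbalance 2.0 in this example), while Round Robin spreads it evenly (1.0); that layout is what the consumer jobs then read.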

Page 29:

Starfish’s Workflow-Aware Scheduler

What-If Questions

A) Expected runtime of Job P if the RR block placement policy is used for P’s output files?

B) New data layout in the cluster if the RR block placement policy is used for P’s output files?

C) Expected runtime of Job C1 (C2) if its input data layout is the one in the answer to Question B?

D) Expected runtimes of Jobs C1 and C2 if scheduled concurrently when Job P completes?

E) Given the Local Write block placement policy and RF = 1 for Job P’s output, what is the expected increase in the runtime of Job C1 if one node in the cluster fails during C1’s execution?

Page 30:

Estimates from the What-if Engine

Hadoop cluster: 16 nodes, c1.medium
MapReduce Program: Word Co-occurrence
Data set: 10 GB Wikipedia

[Figure: true surface vs. estimated surface]

Page 31:

Workflow Scheduler Picks Layout


Page 32:

Optimizations: Workload Optimizer

Page 33:

Provisioning: Elastisizer

Motivation: Amazon Elastic MapReduce
• Data in S3, processed in-cluster, stored to S3
• User pays for resources used

Elastisizer determines …
• Best cluster
• Hadoop configurations

… based on user-specified goals (execution time and costs)
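One way to phrase the Elastisizer's choice is as a constrained search: among candidate clusters, take the cheapest one whose predicted running time meets a deadline. In this Python sketch the instance type labels, prices, and predicted runtimes are made-up inputs (in Starfish the predictions would come from the What-if Engine), and the billing rule (per node-hour, partial hours rounded up) is an assumption.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    instance_type: str          # hypothetical label
    nodes: int
    price_per_node_hour: float  # hypothetical price in $
    predicted_minutes: float    # in Starfish this would come from what-if calls


def cost(c: Candidate) -> float:
    """Dollar cost, assuming per-node hourly billing with partial hours rounded up."""
    return c.nodes * c.price_per_node_hour * math.ceil(c.predicted_minutes / 60)


def pick_cluster(cands: List[Candidate], deadline_minutes: float) -> Optional[Candidate]:
    """Cheapest candidate meeting the deadline (ties broken by runtime), else None."""
    feasible = [c for c in cands if c.predicted_minutes <= deadline_minutes]
    return min(feasible, key=lambda c: (cost(c), c.predicted_minutes), default=None)


cands = [
    Candidate("type-A", 10, 0.10, 150),
    Candidate("type-B", 10, 0.40, 50),
    Candidate("type-C", 5, 0.90, 55),
]
print(pick_cluster(cands, deadline_minutes=60))
```

A tight deadline favors a faster but pricier cluster, while a loose one lets the slow, cheap cluster win; swapping the objective (fastest within a budget) only changes the filter and the sort key.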

Page 34:

Elastisizer Configuration Evaluation


Page 35:

Elastisizer Configuration Evaluation


Page 36:

Elastisizer: Cluster Configurations

Page 37:

Multi-objective Cluster Provisioning

[Figure: actual vs. predicted running time (min) and cost ($) for each EC2 instance type of the target cluster (m1.small, m1.large, m1.xlarge, c1.medium, c1.xlarge); instance type for source cluster: m1.large]

Page 38:

Critique of Paper

Good
• Source available for implementation
• Able to see the impact of various settings
• Good visualization tools
• Tutorials/source available at duke.edu/starfish

Bad
• The paper (and subsequent materials) talk a lot about what Starfish does, but not necessarily how it does it
• There is no documentation on LastWord, and this seems important
• Only works after the job/workflow has been executed at least once

Page 39:

Starfish’s Visualizer

Timeline Views
• Show progress of a job execution at the task level
• See execution of the same job with different settings

Data-flow Views
• View of the flow of data among nodes, along with MR jobs
• “Video Mode” allows playback of past executions

Profile Views
• Timings, data-flow, resource-level

Page 40:

Timeline Views


Page 41:

Timeline View


Page 42:

Data Skew View


Page 43:

Data Skew View


Page 44:

Data Skew View


Page 45:

Data-flow Views


Page 46:

References

• Herodotou, Herodotos, et al. "Starfish: A Self-tuning System for Big Data Analytics." Proc. of the Fifth CIDR Conf., 2011.
• Dong, Fei. Extending Starfish to Support the Growing Hadoop Ecosystem. Diss. Duke University, 2012.
• Herodotou, Herodotos, Fei Dong, and Shivnath Babu. "MapReduce Programming and Cost-based Optimization? Crossing this Chasm with Starfish." Proceedings of the VLDB Endowment 4.12 (2011).
• http://www.cs.duke.edu/starfish/
• http://www.youtube.com/watch?v=Upxe2dzE1uk

Page 47:

Backup


Page 48:

Hadoop MapReduce Ecosystem

• Popular solution to Big Data Analytics

[Figure: the Hadoop MapReduce ecosystem: Distributed File System and MapReduce Execution Engine, HBase, programs in Java / C++ / R / Python, and higher-level tools (Oozie, Hive, Pig, Elastic MapReduce, Jaql)]

Page 49:

Workflow-level Tuning

Starfish has a Workflow-aware Scheduler that addresses several concerns:
• How do we distribute computation equally across nodes?
• How do we adapt to load imbalance or energy cost?

The Workflow-aware Scheduler works with the What-if Engine and the Data Manager to answer these questions.

Page 50:

Workload-level Tuning

Starfish’s Workload Optimizer is aware of each workflow that will be executed. It reorders the workflows to make them more efficient. This includes reusing data for different workflows that use the same MapReduce jobs.
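Reusing data across workflows can be sketched as deduplicating jobs that appear in more than one workflow. The Python below assumes a job is identified by its (program, input) pair and that identical jobs produce identical output; the workflow, program, and path names are hypothetical, and the reordering itself is not modeled.

```python
def plan_shared_jobs(workflows):
    """Return (jobs_to_run, reuses) for {workflow: [(program, input), ...]}.

    Each distinct (program, input) job runs once; later occurrences are
    recorded as (workflow, job, workflow_that_produces_it) reuses.
    """
    first_runner = {}  # job -> workflow that runs it first
    to_run, reuses = [], []
    for wf, jobs in workflows.items():
        for job in jobs:
            if job in first_runner:
                reuses.append((wf, job, first_runner[job]))
            else:
                first_runner[job] = wf
                to_run.append(job)
    return to_run, reuses


workflows = {
    "daily_report": [("clean_logs", "/logs/day1"), ("aggregate", "/clean/day1")],
    "ad_hoc_query": [("clean_logs", "/logs/day1"), ("join_users", "/clean/day1")],
}
to_run, reuses = plan_shared_jobs(workflows)
print(len(to_run), "jobs instead of", sum(len(j) for j in workflows.values()))
```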

Page 51:

What-if Engine

[Figure: the What-if Engine takes the job profile for <p, d1, r1, c1> and the properties of a (possibly hypothetical) job, namely input data properties <d2>, cluster resources <r2>, and configuration settings <c2>; its components are the Job Oracle, which produces a virtual job profile for <p, d2, r2, c2>, and a Task Scheduler Simulator]

Page 52:

Virtual Profile Estimation

Given the profile for job j = <p, d1, r1, c1>, estimate the profile for job j' = <p, d2, r2, c2>.

[Figure: each part of the (virtual) profile for j' is estimated from the profile for j together with input data d2, configuration c2, and resources r2: Dataflow Statistics via cardinality models, Cost Statistics via relative black-box models, and Dataflow and Costs via white-box models]
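As an example of a white-box model, the following Python sketch estimates map-side dataflow fields (output records and bytes, records per spill, number of spills) for a given split size and configuration from dataflow statistics. It is loosely modeled on the Hadoop 0.20 map output buffer (`io.sort.mb`, `io.sort.record.percent`, `io.sort.spill.percent`); the 16-byte per-record metadata size and the formulas are simplifying assumptions, not Starfish's published equations.

```python
import math
from dataclasses import dataclass

ACCT_BYTES_PER_RECORD = 16  # assumed per-record metadata size in the sort buffer


@dataclass
class DataflowStats:
    input_pair_width: float        # bytes per input key-value pair
    map_record_selectivity: float  # map output records / map input records
    output_pair_width: float       # bytes per map output pair


@dataclass
class MapConf:
    io_sort_mb: int = 100
    io_sort_record_percent: float = 0.05
    io_sort_spill_percent: float = 0.80


def estimate_map_dataflow(split_bytes, stats, conf):
    """Estimate dataflow fields of one map task (simplified white-box model)."""
    in_records = split_bytes / stats.input_pair_width
    out_records = in_records * stats.map_record_selectivity
    buf = conf.io_sort_mb * 2**20
    # A spill is triggered when either the data area or the metadata area
    # of the buffer reaches the spill threshold; the tighter one wins.
    by_data = (buf * (1 - conf.io_sort_record_percent) * conf.io_sort_spill_percent
               / stats.output_pair_width)
    by_meta = buf * conf.io_sort_record_percent * conf.io_sort_spill_percent / ACCT_BYTES_PER_RECORD
    recs_per_spill = max(1, math.floor(min(by_data, by_meta)))
    return {
        "map_output_records": out_records,
        "map_output_bytes": out_records * stats.output_pair_width,
        "records_per_spill": recs_per_spill,
        "num_spills": max(1, math.ceil(out_records / recs_per_spill)),
    }


stats = DataflowStats(input_pair_width=100, map_record_selectivity=2.0, output_pair_width=50)
default = estimate_map_dataflow(50_000_000, stats, MapConf())
tuned = estimate_map_dataflow(50_000_000, stats, MapConf(io_sort_record_percent=0.15))
print(default["num_spills"], tuned["num_spills"])  # 4 2
```

With the defaults, the metadata area fills first, so raising `io.sort.record.percent` from 0.05 to 0.15 cuts the estimated spills from 4 to 2 in this example.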

Page 53:

Job Optimizer

[Figure: the Just-in-Time Optimizer takes the job profile for <p, d1, r1, c1>, input data properties <d2>, and cluster resources <r2>; using subspace enumeration and Recursive Random Search, and issuing what-if calls, it returns the best configuration settings <copt> for <p, d2, r2>]
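Recursive Random Search is named on this slide; the Python below is a generic, simplified RRS over a continuous box, not Starfish's implementation. Each restart explores by uniform sampling of the whole space, then exploits by resampling a neighborhood around the best point, shrinking the neighborhood whenever a round brings no improvement. In Starfish the cost function would be a what-if call; here a toy quadratic stands in, and all parameter defaults are our assumptions.

```python
import random


def recursive_random_search(cost, bounds, n_explore=20, n_exploit=10,
                            shrink=0.5, min_frac=0.01, restarts=3,
                            max_rounds=100, seed=0):
    """Minimize cost(x) over the box given by bounds = [(lo, hi), ...]."""
    rng = random.Random(seed)

    def sample(box):
        return [rng.uniform(lo, hi) for lo, hi in box]

    best_x, best_f = None, float("inf")
    for _ in range(restarts):
        # Exploration: uniform samples over the whole space; keep the best.
        x = min((sample(bounds) for _ in range(n_explore)), key=cost)
        fx = cost(x)
        frac = 1.0  # neighborhood size as a fraction of each dimension's range
        # Exploitation: sample around the incumbent; shrink on failure.
        for _ in range(max_rounds):
            if frac < min_frac:
                break
            box = [(max(lo, xi - (hi - lo) * frac / 2), min(hi, xi + (hi - lo) * frac / 2))
                   for (lo, hi), xi in zip(bounds, x)]
            improved = False
            for _ in range(n_exploit):
                y = sample(box)
                fy = cost(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
            if not improved:
                frac *= shrink
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


def toy_cost(v):
    """Toy stand-in for a what-if estimate; minimum 0 at (3, -1)."""
    return (v[0] - 3) ** 2 + (v[1] + 1) ** 2


x_best, f_best = recursive_random_search(toy_cost, [(-10, 10), (-10, 10)])
print([round(v, 2) for v in x_best], round(f_best, 4))
```

For discrete knobs such as the number of reduce tasks or a compression flag, the sampled values would be rounded or chosen from the allowed set before each cost call.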