Adaptive MapReduce using Situation-Aware Mappers

Rares Vernica¹ (HP Labs), Andrey Balmin, Kevin S. Beyer, Vuk Ercegovac (IBM Research)

¹ Work done at IBM Research.

15th International Conference on Extending Database Technology, March 26-30, 2012

Rares Vernica (HP Labs) Adaptive MapReduce EDBT 2012 1 / 25

Outline

1 Motivation

2 Problem Statement

3 Situation-Aware Mappers
    Adaptive Mappers
    Adaptive Combiners
    Adaptive Sampling and Partitioning

4 Summary


MapReduce Review

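map (k, v) → list(k, v)
combine (k, list(v)) → list(k, v)
reduce (k, list(v)) → list(k, v)

[Figure: MapReduce data flow. Three input splits (INPUT 1/3 to 3/3) are read from the DFS by MAP tasks (input: (k, v); output: list(k, v)). Map outputs go through SHUFFLE and MERGE to the REDUCE tasks (input: (k, list(v)); output: list(k, v)), which write OUTPUT 1/2 and 2/2 back to the DFS.]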
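The three signatures above can be illustrated with a tiny single-process word count. This is only a sketch: the function names (`map_fn`, `combine_fn`, `reduce_fn`, `run_job`) and the in-memory "shuffle" are assumptions made for illustration and are not Hadoop APIs.

```python
from collections import defaultdict
from itertools import chain


def map_fn(key, value):
    # map (k, v) -> list(k, v): emit (word, 1) for every word in the line
    return [(word, 1) for word in value.split()]


def combine_fn(key, values):
    # combine (k, list(v)) -> list(k, v): pre-aggregate on the map side
    return [(key, sum(values))]


def reduce_fn(key, values):
    # reduce (k, list(v)) -> list(k, v): final aggregation per key
    return [(key, sum(values))]


def group_by_key(pairs):
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups


def run_job(splits):
    map_outputs = []
    for split in splits:  # one "mapper" per split
        pairs = chain.from_iterable(map_fn(k, v) for k, v in split)
        combined = []
        for k, vs in group_by_key(pairs).items():
            combined.extend(combine_fn(k, vs))
        map_outputs.append(combined)
    # "shuffle" and "merge": group all map outputs by key
    shuffled = group_by_key(chain.from_iterable(map_outputs))
    result = {}
    for k in sorted(shuffled):
        for out_k, out_v in reduce_fn(k, shuffled[k]):
            result[out_k] = out_v
    return result


# Three input splits, each a list of (offset, line) records
splits = [[(0, "a b a")], [(0, "b c")], [(0, "a c c")]]
counts = run_job(splits)
```

Because the combiner has the same signature as the reducer, it can be applied zero or more times on the map side without changing the final counts.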


Motivation: MapReduce Issues

MapReduce
    Parallel data-processing framework
    Open-source implementation (Hadoop)
    Simple programming environment

MapReduce: “simplicity over performance”
Limited choice of execution strategies:
    Mappers checkpoint after every split
    Map outputs are sorted and written to file
    Reducers read statically predetermined partitions


Solutions to MapReduce Issues

MapReduce-inspired alternatives
    Dryad (Microsoft)
    Spark (UC Berkeley)
    Hyracks (UC Irvine)
    Nephele (TU Berlin)

Have more choices in runtime execution


Our Solution: Adaptive MapReduce

Make MapReduce (Hadoop) more flexible
Leverage existing investment in:
    Framework (Hadoop)
    Query processing systems (Jaql, Pig, Hive)

Techniques for:
    Dynamic checkpoint intervals (Map)
    Best-effort hash-based aggregation (Combine)
    Dynamic, sample-based partitioning (Reduce)

Performance tuning:
    Cardinality and cost estimation (due to UDFs)
    Adaptive to runtime environment


Problem Statement: Adaptive MapReduce

Goals
Improve MapReduce (Hadoop) performance by:
    New runtime options
    Adaptive to runtime environment

Preserve Hadoop’s
    Fault-tolerance
    Scalability
    Programmability



Situation-Aware Mappers

Main idea
    Make MapReduce more dynamic

Mappers:
    Aware of the global state of the job
    Communicate through a distributed meta-data store
    Break assumption: isolation





Adaptive MapReduce

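[Figure: job data flow with DFS, three MAP tasks, two REDUCE tasks, and DFS, shown in successive builds. The final build adds the Distributed Meta-Data Store (DMDS) and marks the adaptive techniques AM, AS, AP, and AC on the flow.]

Adaptive Techniques
    AM: Adaptive Mappers
    AC: Adaptive Combiners
    AS: Adaptive Sampling
    AP: Adaptive Partitioning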


Distributed Meta-Data Store (DMDS)
    Distributed read/write
    Transactional
    e.g., ZooKeeper


Adaptive Mappers Motivation

Input data is divided into splits
One-to-one correspondence of mappers and splits
AM decouple # splits from # mappers

Costs:
    Startup cost, e.g., scheduling, loading reference data
    Split processing cost

                    Startup cost    Workload
Small splits        Large           Balanced
Large splits        Small           Imbalanced
Adaptive Mappers    Small           Balanced
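The trade-off in the table above can be reproduced with a toy scheduling model. The sketch below is an assumption-laden illustration, not the authors' cost model: `makespan` does greedy list scheduling onto map slots, the record costs are made up to create skew, and the per-split ZooKeeper overhead of Adaptive Mappers is ignored.

```python
import heapq


def makespan(task_costs, slots, startup_per_task, startup_per_slot=0.0):
    """Greedy list scheduling: each task runs on the slot that frees up first."""
    free_at = [startup_per_slot] * slots
    heapq.heapify(free_at)
    for cost in task_costs:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + startup_per_task + cost)
    return max(free_at)


def split_costs(record_costs, records_per_split):
    """Processing cost of each split when records are cut into fixed-size splits."""
    return [
        sum(record_costs[i:i + records_per_split])
        for i in range(0, len(record_costs), records_per_split)
    ]


# Hypothetical workload: 56 cheap records followed by 8 expensive ones (skew).
record_costs = [1.0] * 56 + [8.0] * 8
SLOTS = 4
STARTUP = 10.0

# Regular mappers, large splits: one mapper per split, few startups, but skewed.
regular_large = makespan(split_costs(record_costs, 16), SLOTS, STARTUP)

# Regular mappers, small splits: balanced, but every split pays the startup cost.
regular_small = makespan(split_costs(record_costs, 1), SLOTS, STARTUP)

# Adaptive mappers: one long-lived mapper per slot pays the startup cost once,
# then keeps pulling small splits.
adaptive = makespan(split_costs(record_costs, 1), SLOTS, 0.0, startup_per_slot=STARTUP)
```

In this model the adaptive configuration reaches the lower bound of one startup plus an even share of the work (10 + 120 / 4 = 40), while large splits suffer from the skewed last split and small splits from repeated startups.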


Adaptive Mappers Algorithm

[Figure: five numbered steps among the MapReduce client, ZooKeeper, and the map tasks (Map1, Map2, ...) on Host1, Host2, .... Under the root, the job's node (JobID) holds a "locations" subtree listing the splits on each host (Host1: [Split1, Split2, ...]; Host2: ...) and an "assigned" subtree recording which mapper took each split (e.g., Split1: {Map2}). The map tasks carry an Init label, step 4 is annotated "assigned Split1 {Map2}", and step 5 on Split1 is annotated OK/Fail.]

Store meta-data in ZooKeeper
Implemented as a new InputFormat


Adaptive Mappers Algorithm

Additional Features
    Process local splits first, then remote splits
    Fault tolerance
        Restarted task unlocks splits
        Split reprocessing is shared
    Scheduler aware (FIFO, FAIR, and FLEX)
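A minimal sketch of the split-claiming idea follows. It assumes an in-memory `MetaStore` class as a stand-in for ZooKeeper; the class, its methods (`try_claim`, `release_all`), and the helper functions are hypothetical names for illustration, not ZooKeeper or Hadoop APIs, and shared reprocessing of splits is not modeled.

```python
import threading


class MetaStore:
    """In-memory stand-in for a transactional meta-data store (hypothetical API)."""

    def __init__(self, locations):
        self._lock = threading.Lock()
        self.locations = locations  # host -> split ids stored on that host
        self.assigned = {}          # split id -> mapper id that claimed it

    def try_claim(self, split, mapper):
        # Atomic test-and-set: only one mapper can win a given split.
        with self._lock:
            if split in self.assigned:
                return False
            self.assigned[split] = mapper
            return True

    def release_all(self, mapper):
        # Models a restarted task: the splits it held become claimable again.
        with self._lock:
            for split in [s for s, m in self.assigned.items() if m == mapper]:
                del self.assigned[split]


def next_split(store, mapper, host):
    """Claim a free local split if there is one, otherwise a remote split."""
    local = list(store.locations.get(host, []))
    remote = [s for h, splits in store.locations.items() if h != host for s in splits]
    for split in local + remote:
        if store.try_claim(split, mapper):
            return split
    return None


def run_mapper(store, mapper, host, processed):
    # One long-lived mapper keeps pulling splits until none are left.
    while True:
        split = next_split(store, mapper, host)
        if split is None:
            return
        processed.append((mapper, split))


locations = {"host1": ["s1", "s2", "s3"], "host2": ["s4"]}

# Sequential demo: Map2 on host2 takes its local split first, then a remote one.
demo = MetaStore(locations)
first = next_split(demo, "map2", "host2")
second = next_split(demo, "map2", "host2")
demo.release_all("map2")

# Concurrent demo: two mappers drain the job; every split is taken exactly once.
store = MetaStore(locations)
processed = []
threads = [
    threading.Thread(target=run_mapper, args=(store, "map1", "host1", processed)),
    threading.Thread(target=run_mapper, args=(store, "map2", "host2", processed)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The lock plays the role of the store's transactional guarantee: which mapper gets which split depends on timing, but no split is ever handed to two mappers.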


Experimental Setting

Hardware
    40-node IBM System x iDataPlex dx340
    Two quad-core Intel Xeon E5540 64-bit 2.83GHz
    32GB RAM
    Four SATA disks
    160 map and 160 reduce slots

Software
    Ubuntu Linux, kernel 2.6.32-24 64-bit server edition
    Java 1.6 64-bit server edition
    Hadoop 0.20.2
    ZooKeeper 3.3.1


Start-up Cost vs. ZooKeeper Overhead

[Figure: Time (seconds) vs. Number of Splits (20, 200, 2000) for Regular Mappers and Adaptive Mappers.]

2000 1-byte records
Sleep 1s/record
5 nodes, 20 map slots
20-2000 Reg. Mappers
20 Adaptive Mappers

Small ZooKeeper overhead
Large Map startup cost ∼2s/map
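A back-of-envelope estimate shows why the startup cost dominates as the number of regular mappers grows. The sketch below uses only the parameters listed above and assumes that mappers run in full waves over the 20 slots and that each mapper pays the roughly 2 s startup once; the resulting numbers are model estimates, not the measured values from the plot.

```python
RECORDS = 2000
SECONDS_PER_RECORD = 1.0
MAP_SLOTS = 20
STARTUP_PER_MAP = 2.0  # approximate startup cost per map task, from the slide


def estimated_time(num_mappers):
    """Estimated job time: per-wave startup plus evenly divided sleep time."""
    waves = -(-num_mappers // MAP_SLOTS)  # ceiling division
    work = RECORDS * SECONDS_PER_RECORD / MAP_SLOTS
    return waves * STARTUP_PER_MAP + work


# Regular mappers: one mapper per split, so 20, 200, or 2000 mappers.
regular = {n: estimated_time(n) for n in (20, 200, 2000)}

# Adaptive mappers: always 20 mappers, whatever the number of splits
# (the ZooKeeper overhead per split is not modeled here).
adaptive_estimate = estimated_time(20)
```

Under these assumptions the startup share grows from 2 s with 20 mappers to 200 s with 2000 mappers, while the 100 s of actual work stays constant.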


Adaptive Mappers Workloads

1 Set-Similarity Join [Vernica et al., 2010]
    Publication datasets
    DBLP: 1.2M records, 310MB
    CITESEERX: 1.3M records, 1,750MB
    Increased to ×10 and ×100

2 JOIN
    Single dataset (“fact” table), Sort Benchmark data generator
    Fan-out coefficient (“dimension” table)
    Average join fan-out 1 : 30
    TERASORT: 1B records, 93GB


Adaptive Mappers Experiments - Set-Similarity Join

[Figure: Time (seconds) vs. Split Size (MB) for Regular Mappers with 2048, 1024, 512, 256, 128, 64, and 32 MB splits, and for Adaptive Mappers (AM).]

Stage 3: One-Phase Record Join
    Broadcast join equivalent
    DBLP and CITESEERX ×10
    Single wave of AM

×3 speedup over default Hadoop split size (64MB)
Optimal with no tuning


Adaptive Mappers Experiments - JOIN

[Figure: Time (seconds) vs. Split Size (MB) for Regular Mappers with 1024, 512, 256, 128, 64, 32, 16, and 8 MB splits, and for Adaptive Mappers (AM).]

Map-only job
    1B TERASORT records
    Models a skewed join
    Single wave of AM

Regular Mappers:
    Large split: data skew
    Small split: scheduling and start-up overhead

Optimal with no tuning


Adaptive Combiners

Main idea
    Replace sort with hashing
    Reduce serialization, sort, and IO

[Figure: map-side data flow, built up over several slides. Regular Combiners: Map, Sort Buffer, Sort, Combine, Merge. Adaptive Combiners: a Hash-group and Combine step is added to the flow. The legend distinguishes user code from data.]

Adaptive Combiners Details

“Best-effort” aggregation
Never spill to disk
Hash-table replacement policies:
    No-Replacement (NR)
    Least-Recently-Used (LRU)

Implemented as:
    Library for Hadoop
    Optimization choice for Jaql
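The best-effort idea can be sketched as a bounded in-memory hash table that never spills. The class below is an illustration under stated assumptions, not the authors' library: the name `AdaptiveCombiner` and its methods are hypothetical, NR is interpreted as "pass a new key through uncombined when the table is full", and LRU as "evict and emit the least recently used entry".

```python
from collections import OrderedDict


class AdaptiveCombiner:
    """Bounded hash-based combiner; partial aggregates are emitted, never spilled."""

    def __init__(self, capacity, combine, policy="LRU"):
        self.capacity = capacity
        self.combine = combine      # function (old_value, new_value) -> value
        self.policy = policy        # "NR" or "LRU"
        self.table = OrderedDict()  # key -> partial aggregate
        self.emitted = []           # stand-in for the map output collector

    def add(self, key, value):
        if key in self.table:
            # Hit: fold the value into the cached partial aggregate.
            self.table[key] = self.combine(self.table[key], value)
            if self.policy == "LRU":
                self.table.move_to_end(key)
            return
        if len(self.table) < self.capacity:
            self.table[key] = value
            return
        # Miss on a full table.
        if self.policy == "NR":
            # No replacement: emit the new record as it is.
            self.emitted.append((key, value))
        else:
            # LRU: emit the least recently used partial aggregate, cache the new key.
            self.emitted.append(self.table.popitem(last=False))
            self.table[key] = value

    def flush(self):
        # At the end of the map task, emit whatever is still cached.
        self.emitted.extend(self.table.items())
        self.table.clear()
        return self.emitted


def totals(pairs):
    """What a downstream reducer would compute from the emitted pairs."""
    out = {}
    for k, v in pairs:
        out[k] = out.get(k, 0) + v
    return out


keys = ["a", "b", "a", "c", "a", "b"]
results = {}
for policy in ("NR", "LRU"):
    combiner = AdaptiveCombiner(capacity=2, combine=lambda x, y: x + y, policy=policy)
    for k in keys:
        combiner.add(k, 1)
    results[policy] = combiner.flush()
```

Either policy may emit several partial aggregates for the same key, which is why the aggregation is only "best-effort": the reducer still has to finish the job, but correctness does not depend on the cache size.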


Adaptive Combiners Experiments

GROUP-BY
    Synthetic dataset with 3 dimensions (A1, A2, and A3) and 1 fact
    Group records and apply aggregation function
    TWL: 10B records, 120GB

[Figure: GROUP-BY on A1. Time (seconds) for the configurations Reg., AM, AC, and AM, AC; series: Regular Combiners, Adaptive Combiners NR, Adaptive Combiners LRU.]
×2.5 speedup

[Figure: GROUP-BY on A1 and A2. Time (seconds) and Miss Ratio (%) vs. Cache Size (K) for Reg., AM, and cache sizes 1, 25, and 100; series: Regular Combiners, Adaptive Combiners NR, Adaptive Combiners LRU, Miss Ratio NR, Miss Ratio LRU.]
×3 speedup


Adaptive Sampling and Partitioning

Step 1: Compute and publish local histogram
Step 2: Collect local histograms and compute partitioning function
Step 3: Broadcast partitioning function

[Figure: one build per step, showing the MAP and REDUCE tasks and the DMDS through which the histograms and the partitioning function are exchanged.]
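The three steps can be sketched in a single process as follows. The slides do not say how the partitioning function is derived, so this sketch assumes an equi-depth range partitioning over the merged histogram; all function names are hypothetical, and the publish and broadcast steps through the DMDS are reduced to plain function calls.

```python
from bisect import bisect_left
from collections import Counter


def local_histogram(sampled_keys):
    # Step 1: each mapper counts the keys it has sampled.
    return Counter(sampled_keys)


def merge_histograms(histograms):
    # Step 2a: collect the local histograms into a global one.
    total = Counter()
    for h in histograms:
        total.update(h)
    return total


def compute_split_points(histogram, num_partitions):
    # Step 2b: choose keys so that each range holds roughly the same count.
    target = sum(histogram.values()) / num_partitions
    points, running = [], 0
    for key in sorted(histogram):
        running += histogram[key]
        if len(points) < num_partitions - 1 and running >= target * (len(points) + 1):
            points.append(key)
    return points


def partition(key, split_points):
    # Step 3: every mapper applies the same function; keys <= split_points[i]
    # (and greater than the previous point) go to partition i.
    return bisect_left(split_points, key)


samples = [[1, 1, 1, 2, 3], [1, 1, 4, 5, 6], [1, 7, 8, 9]]
global_hist = merge_histograms(local_histogram(s) for s in samples)
split_points = compute_split_points(global_hist, num_partitions=3)

# Load per reducer if the sampled keys were partitioned with this function.
load = Counter()
for key, count in global_hist.items():
    load[partition(key, split_points)] += count
```

With the frequent key 1 isolated in its own range, the three partitions receive 6, 4, and 4 sampled records, whereas a static partitioning chosen before the job starts cannot take the skew into account.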


Summary

Adaptive runtime techniques for MapReduce
    Situation-Aware Mappers
    Make MapReduce more dynamic

Up to ×3 speedup for well-tuned jobs
Orders of magnitude speedup for badly tuned jobs
Never hurt performance
Configure themselves
Part of IBM InfoSphere BigInsights


Vernica, R., Carey, M., and Li, C. (2010). Efficient parallel set-similarity joins using MapReduce. In SIGMOD Conference.

