Adaptive MapReduce using Situation-Aware Mappers
Rares Vernica¹ (HP Labs), Andrey Balmin, Kevin S. Beyer, Vuk Ercegovac (IBM Research)
¹ Work done at IBM Research.
15th International Conference on Extending Database Technology, March 26-30, 2012
Rares Vernica (HP Labs) Adaptive MapReduce EDBT 2012 1 / 25
Outline
1 Motivation
2 Problem Statement
3 Situation-Aware Mappers
    Adaptive Mappers
    Adaptive Combiners
    Adaptive Sampling and Partitioning
4 Summary
MapReduce Review
map (k,v) → list(k,v)
combine (k,list(v)) → list(k,v)
reduce (k,list(v)) → list(k,v)
[Figure: MapReduce data flow. Input splits (1/3, 2/3, 3/3) are read from the DFS by MAP tasks (input: (k,v); output: list(k,v)). Map output goes through SHUFFLE and MERGE to REDUCE tasks (input: (k, list(v)); output: list(k,v)), which write output partitions (1/2, 2/2) to the DFS.]
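As a rough illustration of the three signatures above, the following sketch runs a word count over in-memory "splits". The function names (map_fn, combine_fn, reduce_fn, run_job) and the single-process driver are assumptions made for this example; they are not Hadoop APIs.

```python
from collections import defaultdict
from itertools import chain


def map_fn(key, value):
    """map (k, v) -> list(k, v): emit (word, 1) for every word in a line."""
    return [(word, 1) for word in value.split()]


def combine_fn(key, values):
    """combine (k, list(v)) -> list(k, v): pre-aggregate one mapper's output."""
    return [(key, sum(values))]


def reduce_fn(key, values):
    """reduce (k, list(v)) -> list(k, v): final aggregation per key."""
    return [(key, sum(values))]


def group_by_key(pairs):
    """Group (k, v) pairs into {k: [v, ...]}."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups


def run_job(splits):
    """Map and combine each split separately, then shuffle/merge and reduce."""
    combined = []
    for split in splits:
        mapped = chain.from_iterable(
            map_fn(i, line) for i, line in enumerate(split))
        for k, vs in group_by_key(mapped).items():
            combined.extend(combine_fn(k, vs))
    result = []
    for k, vs in sorted(group_by_key(combined).items()):
        result.extend(reduce_fn(k, vs))
    return result
```

For example, run_job([["a b a"], ["b c"]]) combines the first split into (a, 2) and (b, 1) before the reduce step sees it, so less data crosses the shuffle.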
Motivation: MapReduce Issues
MapReduce
  Parallel data-processing framework
  Open-source implementation (Hadoop)
  Simple programming environment
MapReduce: “simplicity over performance”
Limited choice of execution strategies:
  Mappers checkpoint after every split
  Map outputs are sorted and written to file
  Reducers read statically predetermined partitions
Solutions to MapReduce Issues
MapReduce-inspired alternatives
  Dryad (Microsoft)
  Spark (UC Berkeley)
  Hyracks (UC Irvine)
  Nephele (TU Berlin)
These alternatives offer more choices in runtime execution
Our Solution: Adaptive MapReduce
Make MapReduce (Hadoop) more flexible
Leverage existing investment in:
  Framework (Hadoop)
  Query processing systems (Jaql, Pig, Hive)
Techniques for:
  Dynamic checkpoint intervals (Map)
  Best-effort hash-based aggregation (Combine)
  Dynamic, sample-based partitioning (Reduce)
Performance tuning:
  Cardinality and cost estimation (due to UDFs)
  Adaptive to runtime environment
Problem Statement: Adaptive MapReduce
Goals
Improve MapReduce (Hadoop) performance by:
  Adding new runtime options
  Adapting to the runtime environment
Preserve Hadoop’s
  Fault-tolerance
  Scalability
  Programmability
Situation-Aware Mappers
Main idea
  Make MapReduce more dynamic
Mappers:
  Aware of the global state of the job
  Communicate through a distributed meta-data store
  Break assumption: isolation
Adaptive MapReduce
[Figure: MapReduce data flow (DFS, MAP tasks, REDUCE tasks) shown in three builds: plain, with a distributed meta-data store (DMDS) added, and with the adaptive techniques AM, AS, AP, and AC added.]
Adaptive Techniques
  AM: Adaptive Mappers
  AC: Adaptive Combiners
  AS: Adaptive Sampling
  AP: Adaptive Partitioning
Distributed Meta-Data Store (DMDS)
  Distributed read/write
  Transactional
  e.g., ZooKeeper
Adaptive Mappers Motivation
Input data is divided into splits
One-to-one correspondence of mappers and splits
AM decouple # splits from # mappers
Cost components: startup cost (e.g., scheduling, loading reference data) and split processing cost
Small splits: large startup cost, balanced workload
Large splits: small startup cost, imbalanced workload
Adaptive Mappers: small startup cost, balanced workload
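The trade-off above can be made concrete with a toy cost model. This is only a sketch under stated assumptions: splits are handed to the next free slot in order, a regular mapper pays the startup cost once per split, and an adaptive mapper pays it once per slot and then keeps pulling splits. The function names and all numbers are invented for illustration and are not measurements from the talk.

```python
import heapq


def makespan_regular(split_costs, slots, startup):
    """One mapper per split: every split pays the startup cost."""
    free_at = [0.0] * slots          # time at which each slot becomes free
    heapq.heapify(free_at)
    for cost in split_costs:
        t = heapq.heappop(free_at)   # next free slot takes the next split
        heapq.heappush(free_at, t + startup + cost)
    return max(free_at)


def makespan_adaptive(split_costs, slots, startup):
    """One long-lived mapper per slot: startup is paid once per slot,
    then each mapper keeps pulling the next unprocessed split."""
    free_at = [float(startup)] * slots
    heapq.heapify(free_at)
    for cost in split_costs:
        t = heapq.heappop(free_at)
        heapq.heappush(free_at, t + cost)
    return max(free_at)


# Same 8 units of skewed work, cut two ways (hypothetical numbers).
large_splits = [6, 2]
small_splits = [1.5] * 4 + [0.5] * 4
```

With 2 slots and a startup cost of 2, regular mappers finish in 8 time units on the large splits (one slot sits idle) and in 12 on the small splits (startup dominates), while adaptive mappers on the small splits finish in 6.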
Adaptive Mappers Algorithm
[Figure: Adaptive Mappers algorithm, built up in numbered steps 1 to 5. Recoverable labels: MapReduce Client; ZooKeeper tree Root / JobID / locations / Host1 [Split1, Split2, ...], Host2 ...; mappers Map1, Map2, ... (Init) on Host1 and Host2; node assigned / Split1 {Map2}; reply OK/Fail.]
Store meta-data in ZooKeeper
Implemented as a new InputFormat
Adaptive Mappers Algorithm
Additional Features
  Process local splits first, then remote splits
  Fault tolerance
    Restarted task unlocks splits
    Split reprocessing is shared
  Scheduler aware (FIFO, FAIR, and FLEX)
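One way to picture the split-claiming protocol is the sketch below. It replaces ZooKeeper with a hypothetical in-memory MetaStore whose create() is atomic and fails when the node already exists, and it assumes a node layout of /JobID/locations and /JobID/assigned/Split. All class, function, and path names are assumptions for illustration; the slides do not give the actual InputFormat code.

```python
import threading


class MetaStore:
    """In-memory stand-in for a transactional meta-data store (hypothetical).

    create() is atomic and fails if the node already exists; that is the
    only property this sketch relies on.
    """

    def __init__(self):
        self._nodes = {}
        self._lock = threading.Lock()

    def create(self, path, data):
        with self._lock:
            if path in self._nodes:
                return False  # Fail: another mapper already holds the node
            self._nodes[path] = data
            return True       # OK

    def get(self, path):
        with self._lock:
            return self._nodes.get(path)

    def delete_where(self, prefix, data):
        """Remove every node under prefix whose data equals `data`."""
        with self._lock:
            doomed = [p for p, d in self._nodes.items()
                      if p.startswith(prefix) and d == data]
            for p in doomed:
                del self._nodes[p]


def publish_locations(store, job_id, locations):
    """Client side: publish {host: [split, ...]} for the job."""
    store.create(f"/{job_id}/locations", locations)


def next_split(store, job_id, mapper_id, host):
    """Mapper side: claim one unassigned split, trying local splits first."""
    locations = store.get(f"/{job_id}/locations") or {}
    local = list(locations.get(host, []))
    remote = [s for h, splits in locations.items() if h != host
              for s in splits]
    for split in local + remote:
        if store.create(f"/{job_id}/assigned/{split}", mapper_id):
            return split
    return None  # every split is already assigned


def unlock_splits(store, job_id, mapper_id):
    """Fault tolerance: release the splits held by a failed mapper."""
    store.delete_where(f"/{job_id}/assigned/", mapper_id)
```

A mapper would call next_split in a loop until it returns None. In this sketch, a restarted task calls unlock_splits for its previous attempt, after which any running mapper can pick up the released splits.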
Experimental Setting
Hardware
  40-node IBM System x iDataPlex dx340
  Two quad-core Intel Xeon E5540 64-bit 2.83GHz
  32GB RAM
  Four SATA disks
  160 map and 160 reduce slots
Software
  Ubuntu Linux, kernel 2.6.32-24 64-bit server edition
  Java 1.6 64-bit server edition
  Hadoop 0.20.2
  ZooKeeper 3.3.1
Start-up Cost vs. ZooKeeper Overhead
[Figure: time (seconds) vs. number of splits (20, 200, 2000) for Regular Mappers and Adaptive Mappers.]
2000 1-byte records
Sleep 1s/record
5 nodes, 20 map slots
20-2000 Regular Mappers
20 Adaptive Mappers
Small ZooKeeper overhead
Large Map startup cost, ∼2s/map
Adaptive Mappers Workloads
1 Set-Similarity Join [Vernica et al., 2010]
    Publication datasets
    DBLP: 1.2M records, 310MB
    CITESEERX: 1.3M records, 1,750MB
    Increased to ×10 and ×100
2 JOIN
    Single dataset (“fact” table), Sort Benchmark data generator
    Fan-out coefficient (“dimension” table)
    Average join fan-out 1 : 30
    TERASORT: 1B records, 93GB
Adaptive Mappers Experiments - Set-Similarity Join
[Figure: time (seconds) vs. split size (2048, 1024, 512, 256, 128, 64, and 32 MB for Regular Mappers, plus AM for Adaptive Mappers).]
Stage 3: One-Phase Record Join
  Broadcast join equivalent
  DBLP and CITESEERX ×10
  Single wave of AM
×3 speedup over default Hadoop split size (64MB)
Optimal with no tuning
Adaptive Mappers Experiments - JOIN
[Figure: time (seconds) vs. split size (1024, 512, 256, 128, 64, 32, 16, and 8 MB for Regular Mappers, plus AM for Adaptive Mappers).]
Map-only job
  1B TERASORT records
  Models a skewed join
  Single wave of AM
Regular Mappers:
  Large split: data skew
  Small split: scheduling and start-up overhead
Optimal with no tuning
Adaptive Combiners
Main idea
  Replace sort with hashing
  Reduce serialization, sort, and IO
[Figure: Regular Combiners (Map, Sort Buffer, Sort, Combine, Merge) compared with Adaptive Combiners (Hash-group and Combine). Legend: user code, data.]
Adaptive Combiners Details
“Best-effort” aggregation
Never spill to disk
Hash-table replacement policies:
  No-Replacement (NR)
  Least-Recently-Used (LRU)
Implemented as:
  Library for Hadoop
  Optimization choice for Jaql
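A minimal sketch of best-effort hash aggregation with the two replacement policies could look like this. The class name, the fixed entry-count capacity, and the behavior when the table is full (NR passes the new pair through unaggregated, LRU evicts the least recently used entry downstream) are assumptions for illustration, not the Hadoop library or Jaql implementation.

```python
from collections import OrderedDict


class BestEffortCombiner:
    """Fixed-capacity in-memory hash aggregation that never spills to disk.

    Pairs that cannot stay in the table are emitted downstream, where the
    regular reduce-side aggregation is assumed to finish the job.
    """

    def __init__(self, capacity, combine, policy="NR"):
        if policy not in ("NR", "LRU"):
            raise ValueError("policy must be 'NR' or 'LRU'")
        self.capacity = capacity
        self.combine = combine       # function (old_value, new_value) -> value
        self.policy = policy
        self.table = OrderedDict()   # key -> partial aggregate
        self.emitted = []            # pairs passed downstream so far

    def add(self, key, value):
        if key in self.table:
            # Hit: aggregate in place.
            self.table[key] = self.combine(self.table[key], value)
            if self.policy == "LRU":
                self.table.move_to_end(key)
        elif len(self.table) < self.capacity:
            self.table[key] = value
        elif self.policy == "NR":
            # Miss on a full table: emit the new pair unaggregated.
            self.emitted.append((key, value))
        else:
            # LRU: evict the least recently used entry, keep the new key.
            self.emitted.append(self.table.popitem(last=False))
            self.table[key] = value

    def flush(self):
        """Emit everything still in the table and return all output pairs."""
        self.emitted.extend(self.table.items())
        self.table.clear()
        out, self.emitted = self.emitted, []
        return out
```

Either policy yields correct results after the reduce phase, because every emitted pair is a valid partial aggregate; the policies differ only in how many pairs reach the sort buffer.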
Adaptive Combiners Experiments
GROUP-BY
  Synthetic dataset with 3 dimensions (A1, A2, and A3) and 1 fact
  Group records and apply aggregation function
  TWL: 10B records, 120GB
[Figure: GROUP-BY on A1. Time (seconds) for Reg., AM, AC, and AM, AC; series: Regular Combiners, Adaptive Combiners NR, Adaptive Combiners LRU. ×2.5 speedup.]
[Figure: GROUP-BY on A1 and A2. Time (seconds) and miss ratio (%) for Reg., AM, and cache sizes 1, 25, and 100 (K); series: Regular Combiners, Adaptive Combiners NR, Adaptive Combiners LRU, Miss Ratio NR, Miss Ratio LRU. ×3 speedup.]
Adaptive Sampling and Partitioning
Step 1 Compute and publish local histogram
Step 2 Collect local histograms and compute partitioning function
Step 3 Broadcast partitioning function
[Figure: MAP and REDUCE tasks with the DMDS, built up over the three steps.]
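The three steps can be sketched as follows, assuming sortable keys, exact per-key counts as the "histogram", and a range partitioner whose boundaries aim at roughly equal record counts. The slides do not specify the histogram form or how the partitioning function is computed, so these choices and all function names are illustrative assumptions; the DMDS exchange is reduced to passing Python objects.

```python
from bisect import bisect_left
from collections import Counter


def local_histogram(keys):
    """Step 1: each mapper counts the keys it has seen."""
    return Counter(keys)


def compute_boundaries(histograms, num_partitions):
    """Step 2: merge local histograms and pick split keys so that the
    partitions receive roughly equal numbers of records."""
    total = Counter()
    for h in histograms:
        total.update(h)
    target = sum(total.values()) / num_partitions
    boundaries, running = [], 0
    for key in sorted(total):
        running += total[key]
        # Close the current partition once its cumulative quota is reached.
        if (len(boundaries) < num_partitions - 1
                and running >= target * (len(boundaries) + 1)):
            boundaries.append(key)
    return boundaries


def partition(key, boundaries):
    """Step 3: every mapper applies the same broadcast function; keys up to
    and including boundaries[i] go to partition i."""
    return bisect_left(boundaries, key)
```

Because a single key is never split across partitions here, a very frequent key can still overload one reducer; the sketch only balances at key granularity.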
Summary
Adaptive runtime techniques for MapReduce
  Situation-Aware Mappers
  Make MapReduce more dynamic
Up to ×3 speedup for well-tuned jobs
Orders of magnitude speedup for badly tuned jobs
Never hurt performance
Configure themselves
Part of IBM InfoSphere BigInsights
Vernica, R., Carey, M., and Li, C. (2010). Efficient parallel set-similarity joins using MapReduce. In SIGMOD Conference.