HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan...

29
HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1

Transcript of HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan...

Page 1: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

1

HJ-HadoopAn Optimized MapReduce

Runtime for Multi-core Systems

Yunming ZhangAdvised by: Prof. Alan Cox and Vivek SarkarRice University

Page 2: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

Social Informatics

Slide borrowed from Prof. Geoffrey Fox’s presentation

Page 3: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

Big Data Era

20 PB/day

100 PB media

120 PB cluster

Bing ~ 300 PB

2008

2012

2011

2012

Slide borrowed from Florin Dinu’s presentation

Page 4: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

4

MapReduce Runtime

Job Starts

Map

…….

Reduce

JobEnds

Figure 1. Map Reduce Programming Model

Map

MapReduce

Page 5: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

5

Hadoop Map Reduce

• Open source implementation of Map Reduce Runtime system– Scalable– Reliable– Available

• Popular platform for big data analytics

Page 6: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

6

Habanero Java(HJ)

• Programming Language and Runtime Developed at Rice University

• Optimized for multi-core systems– Lightweight async task– Work sharing runtime– Dynamic task parallelism– http://habanero.rice.edu

Page 7: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

7

Kmeans

Page 8: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

8

Kmeans

Topics

To be Classified Documents

Kmeans is an application that takes as input a large number of documents and try to classify them into different topics

Page 9: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

Kmeans using Hadoop

9

To be classified documents Computation Memory

Machine 1Map task in a JVM

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics

Machines …

Page 10: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

Kmeans using Hadoop

10

To be classified documents Computation Memory

Machine 1Map task in a JVM

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics

Topics

Machines …

Page 11: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

Kmeans using Hadoop

11

To be classified documents Computation Memory

Machine 1Map task in a JVM

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics

Topics

Topics

Machines …

Page 12: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

Kmeans using Hadoop

12

To be classified documents Computation Memory

Machine 1Map task in a JVM

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics

Topics

Topics

Topics

Machines …

Page 13: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

Kmeans using Hadoop

13

To be classified documents Computation Memory

Machine 1Map task in a JVM

Duplicated In-memory Cluster Centroids

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics 1x

Topics 1x

Topics 1x

Topics 1x

Machines …

Page 14: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

14

Memory Wall

0 50 100 150 200 250 300 350 4000

20

40

60

80

100

120

140

160

180

200

KMeans Throughput Benchmark

Hadoop

Topics data size (MB) with 4KB/topic

Num

ber o

f top

ics/

Tim

e in

min

We used 8 mappers from 30 -80 MB, 4 mappers for 100 – 150 MB, 2 mappers for 180 – 380 for sequential Hadoop.

cluster size(MB)

HadoopFull GC calls

30 3,54250 4,39070 5,18680 1,108,888

Page 15: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

15

Memory Wall

• Hadoop’s approach to the problem– Increase the memory available to each Map Task

JVM by reducing the number of map tasks assigned to each machine.

Page 16: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

Kmeans using Hadoop

16

To be classified documents Computation Memory

Machine 1Map task in a JVM

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

…. Machines …

Topics 2x

Topics 2x

Page 17: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

17

Memory Wall

0 50 100 150 200 250 300 350 4000

20

40

60

80

100

120

140

160

180

200

KMeans Throughput Benchmark

Hadoop

Topics data size (MB) with 4KB/topic

Num

ber o

f top

ics/

Tim

e in

min

Decreased throughput due to reduced number of map tasks per machine

We used 8 mappers from 30 -80 MB, 4 mappers for 100 – 150 MB, 2 mappers for 180 – 380 for sequential Hadoop.

Page 18: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

18

HJ-Hadoop Approach 1

To be classified documents

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Computation Memory

Machine1Map task in a JVM

Topics 4x

Machines …

Dynamic chunking

Page 19: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

19

HJ-Hadoop Approach 1

To be classified documents

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Computation Memory

Machine1Map task in a JVM

Topics 4x

Machines …

Dynamic chunking

NoDuplicated In-memory Cluster Centroids

Page 20: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

20

Results

0 50 100 150 200 250 300 350 4000

20

40

60

80

100

120

140

160

180

200

KMeans Throughput Benchmark

HJ-HadoopHadoop

Topics data size (MB) with 4KB/topic

Num

ber o

f top

ics/

Tim

e in

min

We used 2 mappers for HJ-Hadoop

Page 21: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

21

Results

0 50 100 150 200 250 300 350 4000

20

40

60

80

100

120

140

160

180

200

KMeans Throughput Benchmark

HJ-HadoopHadoop

Topics data size (MB) with 4KB/topic

Num

ber o

f top

ics/

Tim

e in

min

We used 2 mappers for HJ-Hadoop

Process 5x Topics efficiently

Page 22: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

22

Results

0 50 100 150 200 250 300 350 4000

20

40

60

80

100

120

140

160

180

200

KMeans Throughput Benchmark

HJ-HadoopHadoop

Topics data size (MB) with 4KB/topic

Num

ber o

f top

ics/

Tim

e in

min

We used 2 mappers for HJ-Hadoop

4x throughput improvement

Page 23: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

23

HJ-Hadoop Approach 1

To be classified documents

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Computation Memory

Machine1Map task in a JVM

Topics 4x

Machines …

Dynamic chunking

Page 24: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

24

HJ-Hadoop Approach 1Only a single thread reading input

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Computation Memory

Machine1Map task in a JVM

Topics 4x

Machines …

Dynamic chunking

Page 25: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

Kmeans using Hadoop

25

Computation Memory

Machine 1Map task in a JVM

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics

Topics

Topics

Topics

Machines …

Four threads reading input

Page 26: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

HJ-Hadoop Approach 2

26

Computation Memory

Machine 1Map task in a JVM

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics

Machines …

Four threads reading input

Page 27: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

HJ-Hadoop Approach 2

27

Computation Memory

Machine 1Map task in a JVM

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics

Machines …

Four threads reading input

NoDuplicated In-memory Cluster Centroids

Page 28: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

28

Trade Offs between the two approaches

• Approach 1– Minimum memory overhead– Improved CPU utilization with small task granularity

• Approach 2– Improved IO performance– Overlap between IO and Computation

• Hybrid Approach– Improved IO with small memory overhead– Improved CPU utilization

Page 29: HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1.

29

Conclusions

• Our goal is to tackle the memory inefficiency in the execution of MapReduce applications on multi-core systems by integrating a shared memory parallel model into Hadoop MapReduce runtime– HJ-Hadoop can be used to solve larger problems

efficiently than Hadoop, processing process 5x more data at full throughput of the system

– The HJ-Hadoop can deliver a 4x throughput relative to Hadoop processing very large in-memory data sets