SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers...

41
Thanh-Chung Dao and Shigeru Chiba The University of Tokyo, Japan 1 Thanh-Chung Dao SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers

Transcript of SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers...

Page 1: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Thanh-Chung Dao and Shigeru ChibaThe University of Tokyo, Japan

1Thanh-Chung Dao

SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers

SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers

Page 2: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Running Hadoop on modern supercomputers

• Hadoop assumes every compute node has a local disk drive

• Modern supercomputers do not have local disk drives• It only has a central file server using e.g. Lustre• For example, K computer, Cray Titan, and IBM Sequoia

Thanh-Chung Dao 2SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers

From Fujitsu

Page 3: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Why supercomputers do not have local disk drives

• Local disk• Not scalable• Hard to maintain• Physical space is limited

• It cannot be shared among all users

• SSD is available on some supercomputers• But data should be erased after a job finishes

3SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers Thanh-Chung Dao

Page 4: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Using in-memory storage to provide efficient virtual local disks

• Research question: How to deploy in-memory storage on supercomputers• Choose the best deployment strategy in context of MapReduce

• Using in-memory storage is natural approach to avoid expensive disk I/O to central file server• Data is kept in memory

• Memcached-like separate in-memory server is also an option• Typical deployment of Memcached software is installing its daemon on

dedicated nodes

SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers 4Thanh-Chung Dao

Page 5: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Our approach: SEMemin-memory file system

• Users can choose three deployment strategies• RamDisk: data is stored only in local memory• Every-node: data can be stored in remote memory• Dedicated-node: data is stored on dedicated nodes

• Our in-memory storage, SEMem:• Easily configurable to select appropriate deployment strategy• Tightly integrated with Hadoop• Using MPI communication [Dao & Chiba, CCGRID’16]

5SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers Thanh-Chung Dao

Page 6: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

RamDisk: data is stored only in local memory

SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers 6

Not available

Node

Available

Thanh-Chung Dao

In-memory storage

Node 1

Tasks

data

Page 7: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

RamDisk: data is stored only in local memory

SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers 7

In-memory storage

Node 2

In-memory storage

Node 3

Not available

Node

Available

• Out of memory can happenv Since each node has limited amount of memory

In-memory storage

Node 4

Thanh-Chung Dao

In-memory storage

Node 1

Tasks

data

Node 1 cannot use memory of Node 2, 3, & 4

Page 8: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

The original Hadoop workflow

8SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers

Data sending on the same node

Inter-node communication

Thanh-Chung Dao

ShuffleServerMappers Reducers

Map Output

Local Disk

Page 9: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

RamDisk deployment on Hadoop workflow

9SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers

• Mappers are modified to send their output directly to shuffle server

• In-memory storage is set up at shuffle server

Thanh-Chung Dao

Mappers ReducersMap Output

Local Disk

Shuffle server

In-memory storage

Inter-node communication

Fetch

Page 10: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Every-node: deployed on every node and data can be stored in remote memory

SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers 10

Full

Node

Available

Thanh-Chung Dao

In-memory storage

Node 1

Tasks

data

Page 11: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Every-node: deployed on every node and data can be stored in remote memory

SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers 11

Full

Node

Available

Thanh-Chung Dao

In-memory storage

Node 1

Tasks

data

Page 12: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Every-node: deployed on every node and data can be stored in remote memory

SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers 12

In-memory storage

Node 2

In-memory storage

Node 3

• More complex in-memory management is required

In-memory storage

Node 4

Thanh-Chung Dao

When storage on Node 1 is full, it can use memory of Node 2, 3, & 4

Full

Node

Available

In-memory storage

Node 1

Tasks

data

Inter-node communication

Page 13: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Every-node deployment on Hadoop workflow13SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers

MappersReducers

Shuffle server

In-memory storage

Inter-node communication

Shuffle server

Daemon

Node

Thanh-Chung Dao

Page 14: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Every-node deployment on Hadoop workflow14SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers

MappersReducers

Shuffle server

SEMem daemon

In-memory storage

Inter-node communication

Shuffle server

SEMem daemonDaemon

Node

Thanh-Chung Dao

Page 15: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Every-node deployment on Hadoop workflow15SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers

MappersReducers

Map Output

Shuffle server

SEMem daemon

In-memory storage

Inter-node communication

Shuffle server

SEMem daemonDaemon

Node

Thanh-Chung Dao

Page 16: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Every-node deployment on Hadoop workflow16SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers

MappersReducers

Map Output

Shuffle server

SEMem daemon

In-memory storage

Inter-node communication

Shuffle server

SEMem daemonDaemon

Node

Thanh-Chung Dao

Get metadata

Page 17: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Every-node deployment on Hadoop workflow17SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers

MappersReducers

Map Output

Shuffle server

SEMem daemon

In-memory storage

Inter-node communication

Shuffle server

SEMem daemonDaemon

Node

Thanh-Chung Dao

Get metadata

Fetch

Page 18: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Dedicated-node: deployed only on dedicated nodes that are used only for storage

SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers 18Thanh-Chung Dao

In-memory storage

Node 1

Tasks

data

Full

Node

Available

Page 19: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Dedicated-node: deployed only on dedicated nodes that are used only for storage

SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers 19Thanh-Chung Dao

In-memory storage

Node 1

Tasks

data

Full

Node

Available

Page 20: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Dedicated-node: deployed only on dedicated nodes that are used only for storage

SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers 20

Not available

Node

Available

• Dedicated nodes• There is no computation task

In-memory storage

Dedicated node

In-memory storage

Dedicated node

In-memory storage

Node 2

Thanh-Chung Dao

In-memory storage

Node 1

Tasks

data

Page 21: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Dedicated-node: deployed only on dedicated nodes that are used only for storage

SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers 21

Not available

Node

Available

In-memory storage

Dedicated node

In-memory storage

Dedicated node

In-memory storage

Node 2

Thanh-Chung Dao

In-memory storage

Node 1

Tasks

data

• Dedicated nodes• There is no computation task

• It might slow since only 2 of 4 nodes compute the task

When storage on Node 1 is full, it can use memory of dedicate nodes

Page 22: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Dedicated-node deployment on Hadoop workflow22SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers

MappersReducers

Shuffle server

In-memory storage

Inter-node communicationDaemon

Node

Thanh-Chung Dao

Page 23: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Dedicated-node deployment on Hadoop workflow23SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers

MappersReducers

Shuffle server

SEMem daemon

SEMem daemon

Memory nodeManager

…In-memory storage

Inter-node communicationDaemon

Node

Thanh-Chung Dao

Page 24: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Dedicated-node deployment on Hadoop workflow24SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers

MappersReducers

Shuffle server

SEMem daemon

SEMem daemon

Memory nodeManager

…In-memory storage

Inter-node communicationDaemon

Node

Thanh-Chung Dao

Page 25: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Dedicated-node deployment on Hadoop workflow25SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers

MappersReducers

Shuffle server

SEMem daemon

SEMem daemon

Memory nodeManager

…In-memory storage

Inter-node communicationDaemon

Node

Thanh-Chung Dao

Map Output

Page 26: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Dedicated-node deployment on Hadoop workflow26SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers

MappersReducers

Shuffle server

SEMem daemon

SEMem daemon

Memory nodeManager

…In-memory storage

Inter-node communicationDaemon

Node

Thanh-Chung Dao

Map Output

Get metadata

Page 27: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Dedicated-node deployment on Hadoop workflow27SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers

MappersReducers

Shuffle server

SEMem daemon

SEMem daemon

Memory nodeManager

…In-memory storage

Inter-node communicationDaemon

Node

Thanh-Chung Dao

Map Output

Get metadata

Fetch

Page 28: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Technical issue 1: communication protocol on SEMem

• MPI communication on SEMem• Fast communication protocol is required

• Since every-node and dedicated-node are network-intensive• MPI is the de facto communication on modern supercomputers

• HPC-Reuse is used• Enable MPI over Hadoop processes

• MPI-friendly dynamic process creation is required

• Multiplexing non-blocking MPI on memory nodes• Since we want to avoid MPI_THREAD_MULTIPLE

• Handling multiple requests from clients

• Direct memory is used• Since memory copying between JVM’s heap and native MPI is slow

• Current MPI implementation is written in C

28SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers Thanh-Chung Dao

Page 29: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

MPI over Hadoop processes [Dao, CCGrid 2016]

• Using our HPC-Reuse• MPI-friendly dynamic process creation

• Hadoop requires dynamic process creation• Minimizing the cost of changes in architecture

• Gang scheduling (of processes) more favorable in MPI• All-or-nothing scheduling strategy

• Statically creating all processes at the beginning• Minimizing communication delay• Since resizing running jobs might affect performance and fairness

• MPI-Spawn is slow due to collective operation

SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers Thanh-Chung Dao 29

Page 30: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Avoiding MPI_THREAD_MULTIPLE

SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers 30

Chapter 4. E�ciently Virtualizing Local Disks 82

while true doif req == null then

req = MPI.iRecvendif there is a new request then

Add req to sendingPool’s waitList;Reset req = null

endfor slot in sendingPool’s slots do

if data reading finishes thenMPI.iSend to the client

endif iSend finishes then

free the slotend

endAssign req in waitList to free slots;

endAlgorithm 1: Multiplexing non-blocking MPI on the memory nodes

To multiplex non-blocking communication, SEMem runs a dedicated thread on every

node to handle it. Note that most of the supercomputers we know, for example TSUB-

AME and Fujitsu FX10, do not support MPI THREAD MULTIPLE mode. This ap-

proach helps avoid calling MPI send/recv in di↵erent threads.

On each memory node, we run a waiting daemon to listen incoming requests. The

algorithm implemented in the daemon is shown in Algorithm 1. We use progress strategy

to check iRecv and iSend status and data is queued as well. A loop is called to check if

there is any coming message or not (req variable). When there is a new request, it is

added to a queue (waitList). This waiting list contains requests that have been processed.

A sending pool, namely sendingPool, is created to process queued requests. A slot of

the sending pool is responsible for reading data and calling iSend. When data is read to

the sending bu↵er, iSend is called to send data to the client. Data reading must be also

implemented to use an asynchronous mechanism, for example AsynchronousFileChannel

while any MapOutput doWait for (host, MapOutputs) ;for each MapOutput do

MPI.Send to the host ;if MPI.Recv from the host then

Data in heap ;end

end

endAlgorithm 2: Receiving data at reducers of Hadoop MapReduce

Chapter 4. E�ciently Virtualizing Local Disks 82

while true doif req == null then

req = MPI.iRecvendif there is a new request then

Add req to sendingPool’s waitList;Reset req = null

endfor slot in sendingPool’s slots do

if data reading finishes thenMPI.iSend to the client

endif iSend finishes then

free the slotend

endAssign req in waitList to free slots;

endAlgorithm 1: Multiplexing non-blocking MPI on the memory nodes

To multiplex non-blocking communication, SEMem runs a dedicated thread on every

node to handle it. Note that most of the supercomputers we know, for example TSUB-

AME and Fujitsu FX10, do not support MPI THREAD MULTIPLE mode. This ap-

proach helps avoid calling MPI send/recv in di↵erent threads.

On each memory node, we run a waiting daemon to listen incoming requests. The

algorithm implemented in the daemon is shown in Algorithm 1. We use progress strategy

to check iRecv and iSend status and data is queued as well. A loop is called to check if

there is any coming message or not (req variable). When there is a new request, it is

added to a queue (waitList). This waiting list contains requests that have been processed.

A sending pool, namely sendingPool, is created to process queued requests. A slot of

the sending pool is responsible for reading data and calling iSend. When data is read to

the sending bu↵er, iSend is called to send data to the client. Data reading must be also

implemented to use an asynchronous mechanism, for example AsynchronousFileChannel

while any MapOutput doWait for (host, MapOutputs) ;for each MapOutput do

MPI.Send to the host ;if MPI.Recv from the host then

Data in heap ;end

end

endAlgorithm 2: Receiving data at reducers of Hadoop MapReduce

At SEMem daemon

• Multiplexing non-blocking MPI

At clients

Thanh-Chung Dao

Page 31: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Technical issue 2: storage size in dedicated-node SEMem

• How to estimate number of memory nodes• Need to minimize number of memory nodes

• Since we trade computation resource for data storage in dedicated-node

• Our approach: number of memory nodes is estimated roughly based on the size of input data

31SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers Thanh-Chung Dao

Page 32: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Experiment configuration• Benchmarks

• Puma: WordCount, InvertedIndex, and SequenceCount• Tera-sort: up to 1 TB of input data

• Supercomputers• K computer-like FX10 at the University of Tokyo• TSUBAME at Tokyo Institute of Technology

• Hadoop v2.2.0

32SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers Thanh-Chung Dao

Page 33: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

33SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers

Dedicated-node is faster than every-node in some benchmarks

0200

400

600

8001000

Data size

Tim

e (s

econ

d)

10G 20G 40G 80G 120G

Dedicated-nodeRamDiskEvery-nodeCentral-disk

0200

400

600

8001000

Data size

Tim

e (s

econ

d)

10G 20G 40G 80G 120G

Dedicated-nodeRamDiskEvery-nodeCentral-disk

0200

400

600

8001000

Data size

Tim

e (s

econ

d)

10G 20G 40G 80G 120G

Dedicated-nodeRamDiskEvery-nodeCentral-disk

WordcountInvertedIndex SequenceCount

Dedicated-node 13% faster than every-node

Out of memory

Thanh-Chung Dao

Configuration #computation nodesRamDisk 36

Every-node 36Central-disk 36

Dedicated-node 32

• In every-node deployment, the more complex SEMem daemon disturbs computation tasks at several places

• 4 memory nodes in dedicated-node

Page 34: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

200

400

600

800

1000

1200

Tota

l Exe

cutio

n Ti

me

(sec

ond)

64G 128G 256G 512G 1TB

SEMemCentral diskSSD

#40

#40#48

#48

#64

Data size

34SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers

Dedicated-node SEMem is faster than central disk and SSD

SEMem reduces execution time by 20% and 5% on average

• Total number of nodes is the same

• SEMem has lesscomputation nodes

• Tera-sort application on TSUBAME

Thanh-Chung Dao

Page 35: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

35SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers

Dedicated-node SEMem and SSD-backed storage can be an alternative for each other

200

400

600

800

1000

1200

Tota

l Exe

cutio

n Ti

me

(sec

ond)

64G 128G 256G 512G 1TB

SEMem (32 nodes + memory nodes)SSD (32 nodes + SSD storage)

#40 #40#48

#48

#64

Data size

SEMem is faster 4~13%

• Both have the samenumber of computation nodes

• SEMem has memory nodes

• Tera-sort application on TSUBAME

• SSD size of each node is 120 GB

Thanh-Chung Dao

Page 36: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

36SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers

100

200

300

400

500

600

Data size

Tota

l Exe

cutio

n Ti

me

(sec

ond)

64G 128G 256G 512G

MPI-based in memory HadoopTCP-based in-memory Hadoop

Up to 10% fasterthan TCP communication

MPI-based Hadoop is faster than TCP• Tera-sort on TSUBAME

(64 nodes)

• Both MPI and TCP test cases using in-memory storage

Thanh-Chung Dao

Page 37: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

• Experiment purpose:• Measure performance impact of storage size (left figure)• When out of memory happens (right figure)

• Number of memory nodes is estimated based on size of input data• 64GB of input data• 8GB each memory node

37SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers

Number of memory nodes should not be large1000

1500

2000

2500

3000

3500

Tim

e (M

illis

econ

ds)

4 6 8 10 12 14 16

Data copying timeNumber of memory nodes 050

100

150

200

Number of computation/storage nodes

Exe

cutio

n tim

e (S

econ

ds)

32/16 34/14 36/12 38/10 40/8 42/6 44/4 46/2 48/0

Tera-sort 64GB input data

Out of memory happens

8 memory nodes show the best result

Tera-sort on TSUBAME

Thanh-Chung Dao

Page 38: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Related work• M3R (VLDB 2012)

• In-memory storage by providing a shared heap-state• Data is stored through places and activities operators • Did not mention storage deployment explicitly and also no evaluation

• HaLoop (VLDB 2010)• Caching preferences by providing efficient hash algorithms for reading and

writing• Deployment strategies are not relevant in this context

• Spark (NSDI 2012)• Use in-memory storage• Choose a location for Resilient Distributed Datasets (RDD) through

preferredLocations() operator• Does not provide deployment strategies in general• There was no evaluation of RDD deployment

SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers 38Thanh-Chung Dao

Page 39: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers 39Thanh-Chung Dao

Limitations• No fault-tolerance in MPI

• Multiple levels of storage

• Preferred locations

Page 40: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Future work• Estimating number of memory nodes

• Topology of memory nodes

SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers 40Thanh-Chung Dao

Page 41: SEMem: deployment of MPI-based in-memory storage for ... · Running Hadoop on modern supercomputers • Hadoop assumes every compute node has a local disk drive • Modern supercomputers

Summary• Goal: Using in-memory storage to provide efficient virtual local disks

• Challenge: choose the best deployment strategy of in-memory storage (or virtual local disks)

• Our approach: SEMem• Dedicated-node strategy showed a good performance in some

benchmarks • Easily configure the system to investigate an appropriate

strategy for applications• MPI communication

41SEMem: deployment of MPI-based in-memory storage for Hadoop on supercomputers Thanh-Chung Dao