Threading Opportunities in High-Performance Flash-Memory Storage Craig Ulmer Sandia National...

4
Threading Opportunities in High- Performance Flash-Memory Storage Craig Ulmer Sandia National Laboratories, California Maya Gokhale Lawrence Livermore National Laboratory Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000

Transcript of Threading Opportunities in High-Performance Flash-Memory Storage Craig Ulmer Sandia National...

Page 1: Threading Opportunities in High-Performance Flash-Memory Storage Craig Ulmer Sandia National Laboratories, California Maya GokhaleLawrence Livermore National.

Threading Opportunities in High-Performance Flash-Memory Storage

Craig Ulmer Sandia National Laboratories, California Maya Gokhale Lawrence Livermore National Laboratory

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000

Page 2: Threading Opportunities in High-Performance Flash-Memory Storage Craig Ulmer Sandia National Laboratories, California Maya GokhaleLawrence Livermore National.

Revolutionary Storage Technologies

• Storage-Intensive Supercomputing (SISC) at LLNL– System architectures for applications with massive datasets

– New technologies: processing elements, networks, and storage

• NAND-Flash storage in high-performance computing– Flash chips have great potential: 100x access times, 10x bandwidth

– However, few commercial products have delivered performance

• Exception: Fusion-io’s ioDrive– PCIe x4 card with 80-320 GB of flash

– Theoretical read speed of 700 MB/s

– Hardware allows many IOPs to be in-flight concurrently

Page 3: Threading Opportunities in High-Performance Flash-Memory Storage Craig Ulmer Sandia National Laboratories, California Maya GokhaleLawrence Livermore National.

Threaded I/O Microbenchmarks

• Observation: Increasing number of IOPs improves performance– Opposite of what we expect from hard drives

– Due to flash memory packaging: chip is actually a die stack

• Implemented a set of I/O microbenchmarks to investigate – Threaded with mixed I/O characteristics (mostly read-only)

– Currently: Block transfer, kNN, external sort, binary search

• Example: k-Nearest Neighbors (kNN)– Stream through all training vectors and find

k vectors that are most similar to each input vector

– Each thread works on portion of training data

Page 4: Threading Opportunities in High-Performance Flash-Memory Storage Craig Ulmer Sandia National Laboratories, California Maya GokhaleLawrence Livermore National.

kNN Results

• Single ioDrive vs. Three SATA hard drives in RAID0– ioDrive provided a 3x improvement to end application

– Small number of threads can have large impact with flash memory

0

20

40

60

80

100

120

140

160

1 2 4 8 16 32 64

Input Vectors per Pass

Sin

gle

Pa

ss T

ime

(s

)

SATA RAID - 1 ThreadSATA RAID - 2 ThreadsSATA RAID - 4 ThreadsioDrive - 1 ThreadioDrive - 2 ThreadsioDrive - 4 Threads

ioD

riveS

AT

A

0

50

100

150

200

250

1 2 4 8 16 32 64 128

Threads

Sin

gle

Pas

s T

ime

(s)

SATA RAID - 32 Vectors per Pass

ioDrive - 32 Vectors per Pass

Input Vectors Threads

Tim

e (s

)

Tim

e (s

)