Leveraging CAPI Flash for Apache Spark · • IBM Corporation 2016 • IBM, the IBM logo and...

© 2016 OpenPOWER Foundation Leveraging CAPI Flash for Apache Spark Jan S. Rellermeyer Research Staff Member IBM Research Austin Thomas S. Hubregtsen Research Staff Member IBM Research Austin


Page 1: Leveraging CAPI Flash for Apache Spark · • IBM Corporation 2016 • IBM, the IBM logo and ibm.com are registered trademarks, and other company, product, or service names may be

© 2016 OpenPOWER Foundation

Leveraging CAPI Flash for Apache Spark

Jan S. Rellermeyer Research Staff Member

IBM Research – Austin

Thomas S. Hubregtsen Research Staff Member

IBM Research – Austin

Page 2: Big Data Systems: Evolution

Big Data Systems are:

• Scalable

• Resilient

• Easy to use

Generation 1:

• Workload: Batch/Unstructured

• Resiliency (Hadoop): through data replication

• Key parameter: Disk bandwidth

Page 3: Big Data Systems: Evolution

Big Data Systems are:

• Scalable

• Resilient

• Easy to use

Generation 1:

• Workload: Batch/Unstructured

• Resiliency (Hadoop): through data replication

• Key parameter: Disk bandwidth

Generation 2:

• Workload: Interactive/Iterative

• Resiliency (Spark): through in-memory re-computation

• Key parameter: Memory capacity

Page 4: Big Data Systems: Spark

Source: http://www.businessinsider.com/ibm-spark-good-news-for-databricks-2015-6?IR=T, June 2015

Source: http://spark.apache.org/

Page 5: Experiment: Artificially reducing available memory

[Figure: "x Degrees of Separation on Spark". Runtime (ms), 0 to 400,000, versus total heap memory, 1GB to 128GB. Annotations: "spill", "2x".]

Page 6: The amount of data is growing exponentially

Source: Calista Redmond; OpenPOWER Summit 2016

Page 7: Flash: Fast storage or slow memory?

                    DRAM                 Flash                     HDD
Price ($/TB raw)    $$$$$                $$$                       $
Capacity            ≤ 1 TB per system    Up to 56 TB per system    ≥ 6 TB per drive (capacity optimized)
Access latency      ~100 ns              ~100 µs (~1M IOPS)        ~5 ms (~75-100 IOPS) (7200 RPM)
Bandwidth           ~12 GB/s             ~8 GB/s                   ~0.1 GB/s

Source: Brad Brech

Page 8: Characteristics of different storage media (in orders of magnitude)

             DRAM         Flash     HDD          Unit
Type         DDR 1600     SATA      SATA
Bandwidth    1            0.1       0.01         Gb/s/$
IOPS         1,000,000    1,000     1            IOPS/$
Capacity     100          1,000     10,000       MB/$
Latency      100          50,000    5,000,000    ns

Source: Peter Hofstee
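The trade-off in the table above is easier to see as ratios between media. The following is a minimal sketch that only re-expresses the order-of-magnitude figures from the table; the dictionary layout and the helper name `ratio` are illustrative choices, not part of the original slides.

```python
# Order-of-magnitude figures copied from the table above.
# Bandwidth, IOPS and capacity are per dollar; latency is in nanoseconds.
MEDIA = {
    "DRAM":  {"bandwidth": 1.0,  "iops": 1_000_000, "capacity_mb": 100,    "latency_ns": 100},
    "Flash": {"bandwidth": 0.1,  "iops": 1_000,     "capacity_mb": 1_000,  "latency_ns": 50_000},
    "HDD":   {"bandwidth": 0.01, "iops": 1,         "capacity_mb": 10_000, "latency_ns": 5_000_000},
}


def ratio(metric, a, b):
    """Return how many times larger `metric` is for medium `a` than for medium `b`."""
    return MEDIA[a][metric] / MEDIA[b][metric]


# Flash offers 10x the capacity per dollar of DRAM ...
print(ratio("capacity_mb", "Flash", "DRAM"))   # 10.0
# ... at 500x the access latency of DRAM ...
print(ratio("latency_ns", "Flash", "DRAM"))    # 500.0
# ... while still being 100x faster to access than HDD.
print(ratio("latency_ns", "HDD", "Flash"))     # 100.0
```

Read this way, Flash sits between the two other media on every row, which is the question the previous slide raises: it can be treated either as fast storage or as slow memory.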

Page 9: Trend: Price DRAM vs NAND Flash

Source: Objective Analysis, August 2007

Source: http://blogs-images.forbes.com/jimhandy/files/2014/04/2014-04-30-DRAM-NAND-Price-per-GB-Trends-2010-2014-paint.jpg, 2014

Page 10: Trend: Bandwidth DRAM vs SSD

Source: Sandisk IT blog

Page 11: Experiment: Spilling files to Flash

• Algorithm: Six degrees of separation from Kevin Bacon

• Input set: 10,000 movies, 1-101 actors per movie

• Hardware: IBM Power System S822L

  - two 12-core 3.02 GHz POWER8 processor cards

  - 512 GB DRAM

  - Single Hard Disk Drive

  - Flash storage: IBM FlashSystem 840

• Software:

  - Ubuntu 14.04 Little Endian

  - Apache Spark 1.3
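The slides name the benchmark algorithm but do not show its implementation. As a rough illustration of what "degrees of separation" computes, here is a minimal single-machine sketch in plain Python: a breadth-first search over a co-star graph built from a movie-to-cast mapping. It is not the authors' Spark job; the function name, the input layout and the sample data are assumptions made for this example.

```python
from collections import defaultdict, deque


def degrees_of_separation(movies, source):
    """Breadth-first search over the co-star graph.

    movies: dict mapping a movie title to an iterable of actor names.
    Returns a dict mapping each reachable actor to the number of
    co-star hops from `source` (the source itself has distance 0).
    """
    # Two actors are connected if they appear in the same movie.
    costars = defaultdict(set)
    for cast in movies.values():
        cast = set(cast)
        for actor in cast:
            costars[actor] |= cast - {actor}

    distance = {source: 0}
    queue = deque([source])
    while queue:
        actor = queue.popleft()
        for other in costars[actor]:
            if other not in distance:
                distance[other] = distance[actor] + 1
                queue.append(other)
    return distance


# Hypothetical toy input; the experiment used 10,000 movies.
sample = {
    "Movie A": ["Kevin Bacon", "Actor 1"],
    "Movie B": ["Actor 1", "Actor 2"],
    "Movie C": ["Actor 3"],  # not connected to Kevin Bacon
}
result = degrees_of_separation(sample, "Kevin Bacon")
print(result["Actor 2"])     # 2
print("Actor 3" in result)   # False
```

A distributed version typically expresses each search level as a join between the current frontier and the edge list, so the intermediate data grows with every level. That growth is presumably what makes this workload sensitive to available memory and to the cost of spilling.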

Page 12: Data management in Apache Spark

* Not using the unified memory model

Page 13: Proxy experiment: Spilling files to DRAM

[Figure: execution time in milliseconds (logarithmic scale, 1 to 10,000) versus available memory in gigabytes (1 to 50). Series: Spark on HDD, Spark on ramdisk. Annotations: "Baseline", "1.01x speedup".]
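The slides do not say how the spill location and the heap size were configured. One plausible setup, sketched below, points Spark's scratch directory (`spark.local.dir`, where spill and shuffle files are written) at either an HDD path or a ramdisk mount and varies `spark.executor.memory`. The application name, the mount points and the memory values are hypothetical, and the choice of `spark.executor.memory` as the heap knob is an assumption.

```python
import shlex


def spark_submit_command(app, heap_gb, spill_dir):
    """Build a spark-submit command line for one run of the sweep.

    spark.local.dir is Spark's scratch space for spill and shuffle files.
    Note: a cluster manager may override it through SPARK_LOCAL_DIRS
    (standalone, Mesos) or LOCAL_DIRS (YARN).
    """
    args = [
        "spark-submit",
        "--conf", f"spark.executor.memory={heap_gb}g",
        "--conf", f"spark.local.dir={spill_dir}",
        app,
    ]
    return " ".join(shlex.quote(a) for a in args)


# Hypothetical paths: an HDD-backed directory and a ramdisk (tmpfs) mount.
for spill_dir in ("/data/hdd/spark-tmp", "/mnt/ramdisk"):
    for heap_gb in (1, 2, 4, 8):
        print(spark_submit_command("six_degrees.jar", heap_gb, spill_dir))
```

Comparing the two spill targets at the same heap size isolates the cost of the storage medium from the cost of the spill code path itself.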

Page 14: CAPI: The advantages

• Accelerators can work with the same memory addresses that the processors use

• Removes OS and device driver overhead

Source: https://openpowerfoundation.org/blogs/capi-drives-business-performance/

Page 15: CAPI: Reduction in code path

• Read/write commands issued via APIs from applications to eliminate 97% of code path length

• Saves 20-30 cores per 1M IOPS

Source: Brad Brech/Damir Jamsek

Page 16: CAPI: IOPS and Latency difference

Identical hardware with 2 different paths to data

[Diagram labels: IBM POWER S822L, FlashSystem, Conventional I/O (FC), CAPI]

• >5x better IOPS per HW thread

• >2x lower latency

Source: Brad Brech/Damir Jamsek

Page 17: Experiment: Spilling key-value pairs to Flash using CAPI

• Algorithm: Six degrees of separation from Kevin Bacon

• Input set: 10,000 movies, 1-101 actors per movie

• Hardware: IBM Power System S822L

  - two 12-core 3.02 GHz POWER8 processor cards

  - 512 GB DRAM

  - Single Hard Disk Drive

  - Flash storage: IBM FlashSystem 840 connected using CAPI

• Software:

  - Ubuntu 14.04 Little Endian

  - Apache Spark 1.3
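This experiment differs from the earlier one in that it spills individual key-value pairs rather than files. The sketch below illustrates that idea only: a map that keeps a bounded number of entries in memory and moves them to a key-value backend when the bound is exceeded. The class `SpillableMap` and its methods are hypothetical, a plain `dict` stands in for the flash device, and none of this uses the actual CAPI Flash APIs or Spark internals.

```python
import pickle


class SpillableMap:
    """Keep at most `max_in_memory` entries in RAM; spill the rest.

    `backend` is any dict-like object mapping keys to bytes. In this
    sketch it is a stand-in for a key-value interface to flash.
    """

    def __init__(self, max_in_memory, backend):
        self.max_in_memory = max_in_memory
        self.memory = {}
        self.backend = backend

    def put(self, key, value):
        self.memory[key] = value
        if len(self.memory) > self.max_in_memory:
            self._spill()

    def _spill(self):
        # Serialize each pair individually; no intermediate spill file is written.
        for key, value in self.memory.items():
            self.backend[key] = pickle.dumps(value)
        self.memory.clear()

    def get(self, key):
        # The in-memory copy, if present, is always the most recent one.
        if key in self.memory:
            return self.memory[key]
        return pickle.loads(self.backend[key])


store = SpillableMap(max_in_memory=2, backend={})
for name, hops in [("Actor 1", 1), ("Actor 2", 2), ("Actor 3", 3), ("Actor 4", 4)]:
    store.put(name, hops)

print(len(store.backend))      # 3  (the third put exceeded the bound and spilled)
print(store.get("Actor 1"))    # 1  (read back from the backend)
print("Actor 4" in store.memory)  # True
```

Because each pair is addressed by key, a lookup after a spill reads one entry from the backend instead of scanning or merging a spill file, which is the kind of access pattern a low-latency, high-IOPS device rewards.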

Page 18: Result: Reduced the memory footprint

[Figure: "x Degrees of Separation on Spark". Runtime (ms), 0 to 400,000, versus total heap memory, 1GB to 128GB. Series: Disk, CAPI+Flash. Annotation: 4x memory reduction through CAPI + Flash.]

In collaboration with Jan Rellermeyer

Page 19: Result: Increased number of parallel instances

[Figure: number of Terasort instances in parallel]

In collaboration with Jan Rellermeyer

Page 20: CAPI Flash acceleration: Use cases and advantages

Direct Storage

• Application is constrained by single-system memory capacity. Typical growth is through additional compute nodes.

• CAPI Flash APIs offer highly efficient flash access and increased total capacity at better $ / throughput.

Data Cache

• Application uses in-memory caches for data storage and is typically constrained by the ratio of memory to underlying storage.

• CAPI Flash APIs offer access to much larger ephemeral or persistent data in Flash, freeing up RAM.

Fast Storage

• Application is constrained by IO overhead and throughput of existing storage infrastructure.

• CAPI Flash APIs offer extremely high IO per CPU thread with low latency.

Source: Brad Brech

Page 21: Do you get what you pay for?

• McKinsey (2008): data center CPU utilization is roughly 6%

• Gartner (2012): industry-wide utilization rate is 12%

Chances are good that you started buying new servers primarily because you needed more memory banks

Source: news.com/2014/07/01/ibm-set-to-open-new-softlayer-data-center-in-london

Page 22: Get more Spark out of your data center

Contact me at [email protected] for more information

Page 23:

• IBM Corporation 2016

• IBM, the IBM logo and ibm.com are registered trademarks, and other company, product, or service names may be trademarks or service marks of International Business Machines Corporation in the United States, other countries, or both. A current list of IBM trademarks is available on the web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml

• Other company, product, and service names may be trademarks or service marks of others.

• References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.

• IBM and IBM Credit LLC do not, nor intend to, offer or provide accounting, tax or legal advice to clients. Clients should consult with their own financial, tax and legal advisors. Any tax or accounting treatment decisions made by or on behalf of the client are the sole responsibility of the customer.

• IBM Global Financing offerings are provided through IBM Credit LLC in the United States, IBM Canada Ltd. in Canada, and other IBM subsidiaries and divisions worldwide to qualified commercial and government clients. Rates and availability are based on a client’s credit rating, financing terms, offering type, equipment type and options, and may vary by country. Some offerings are not available in certain countries. Other restrictions may apply. Rates and offerings are subject to change, extension or withdrawal without notice.

• Tux penguin image used with permission: Larry Ewing <[email protected]>