1 / 30

In-Storage Compute: an Ultimate Solution for Accelerating I/O-intensive Applications

13 August 2015 YANG SEOK KI, Director, Memory Solutions Lab, Samsung Electronics



2 / 30

Disclaimer: The contents provided in this material are based on concepts and early research results, and are for technical discussions only. This material does not reflect any product-level plan of records.


3 / 30

Contributors (Korea HST Team, US ADS Team)

Yangwook KANG

Heekwon PARK

Boncheol GU

Duckho BAE

Sungho YOON

Special thanks to: Yoonho Chung, Insoon Jo, Minwook Jung, Jungwook Kang, Moonsang Kwon, Truong Nguyen, Dongchul Park, Prem Paulson, Jonghyun Yoon


4 / 30

Outline

1. Background
2. In-Storage Compute Concept
3. ISC Prototype
4. Case Studies
5. Summary


5 / 30

Data Processing Market

[Figure: Worldwide BI and Analytics Tools Revenue by Segment, in $M (axis 0 to 25,000), 2009 to 2018. Series: advanced and predictive analytics; query, reporting, and analysis; spatial information analytics; content analytics; subtotal.]

Source: IDC, Worldwide Business Analytics Software 2014–2018 Forecast, Jul 2014 [1]

Constant growth of business intelligence & analytics market


6 / 30

I/O Performance Issue

BI and analytics tools are I/O hungry!!
• They usually access terabytes (sometimes even petabytes) of data on slow storage devices

Event                     Waits       Time (s)   Avg. wait (ms)   % DB time   Wait class
Direct path read          4,604,339   567,141    123              63.67       User I/O
Direct path read temp     1,955,162   147,298    75               16.54       User I/O
DB CPU                    -           38,874     -                4.36        -
DB file sequential read   117,944     16,399     139              1.84        User I/O
Direct path write temp    597,138     13,507     23               1.52        User I/O

Source: HUAWEI, Accelerate Oracle Performance, Sep 2012 [2]

OLAP Bottleneck!!
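To make the bottleneck concrete, the "% DB time" shares in the wait-event table above can be summed by wait class. The following is a minimal C++ sketch; the names `WaitEvent`, `topEvents`, and `userIoShare` are illustrative assumptions, and the figures are copied from the table.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One row of the wait-event table (type and field names are illustrative).
struct WaitEvent {
    std::string name;
    double pctDbTime;  // "% DB time" column
    bool isUserIo;     // true when the wait class is "User I/O"
};

// Rows copied from the table above.
inline std::vector<WaitEvent> topEvents() {
    return {
        {"Direct path read",        63.67, true},
        {"Direct path read temp",   16.54, true},
        {"DB CPU",                   4.36, false},
        {"DB file sequential read",  1.84, true},
        {"Direct path write temp",   1.52, true},
    };
}

// Sum the share of DB time spent in User I/O waits.
inline double userIoShare(const std::vector<WaitEvent>& events) {
    double total = 0.0;
    for (const auto& e : events) {
        assert(e.pctDbTime >= 0.0);
        if (e.isUserIo) total += e.pctDbTime;
    }
    return total;
}
```

Calling `userIoShare(topEvents())` gives roughly 83.6% of DB time spent in User I/O waits, against 4.36% for DB CPU.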


7 / 30

CPU-centric Computing Model (Von Neumann)

[Figure: a host with DRAM and CPU, and SSDs holding Data 1, Data 2, and Data 3; the host produces the final data.]

Long journey of data


8 / 30

Moving Data is Expensive!!

Moving Computation is Cheaper than Moving Data

Source: HDFS Architecture Guide [3]

Reducing data movement can help improve both energy and performance

Source: USENIX HotPower, 2012 [4]

The energy consumed by data movement is starting to exceed the energy consumed by computation

Source: High Performance parallel IO, 2014 [5]


9 / 30

Near Data Processing Technology

Intelligent SSD [NxGnData]

Netezza S-blade [IBM]

Exadata [Oracle]

SPU (Storage Processing Unit) [Seagate]

Closer to source


10 / 30

The ultimate form of close-to-data compute for high performance & low power is “In-Storage Compute (ISC)”


11 / 30

What is ISC (In-Storage Compute)?

[Figure: a host with DRAM and CPU, and ISC SSDs holding Data 1, Data 2, and Data 3; final data.]


12 / 30

Why? IO Traffic

ISC is an ultimate approach to IO reduction/avoidance

[Figure: ISC SSD vs. typical SSD. Labels: data blocks (A), target data, computing.]

Samsung SAS-based ISC with a Hadoop application


13 / 30

Why? Coprocessor

An SSD is a complete computer with a high-performance, low-power processor

[Figure labels: power management, capacitor.]

Power Consumption (Watts), Samsung SAS-based ISC with a Hadoop application (chart axis 0 to 60, annotated "5x")

                   Baseline   ISC SSD (SmartSSD)
Single instance    26.5       5.5
Two instances      46.7       10.6


14 / 30

Why? Bandwidth Gap

Superfluous internal bandwidth
• To hide processing overhead of host interface and FTL

[Figure labels: power management, capacitor.]

Page 15: In-Storage Compute: an Ultimate Solution for … Compute: an Ultimate Solution for Accelerating I/O-intensive Applications . ... 4 . 6 . 17 / 30 .

15 / 30

Why? Resource Utilization

Storage resource is underutilized


16 / 30

How? ISC Application Development Process

C/C++ - Support C++11/STL

[Figure: development flow in six numbered steps (1 to 6). Labels: ARM cross compile, x86 compile, ISC SSD emulation, ssdlet, host app, download /isc/myprogram/ssdlet, run /isc/myprogram/host.]


17 / 30

ISC Dataflow Programming Model

[Figure: a device ISC application made of SSDlets on the SSD (with its storage) and a host ISC application on the host. Legend: SSDlet, input port, pipe, output port. Calls shown: get() and put(); read() and write().]
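The port-and-pipe model can be pictured with a toy, host-only sketch. Everything here (`ToyPipe`, `close()`, `filterStage`, the single-threaded queue) is an assumption made for illustration; it is not the Samsung ISC library, whose internals the slides do not show. It only mirrors the put()/get() calls named on the slide, with get() returning false once nothing is left to read.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <utility>

// Toy stand-in for a typed pipe between an output port and an input port.
// Hypothetical: not part of any real ISC API.
template <typename T>
class ToyPipe {
public:
    // Producer side: enqueue one item. Returns false if the pipe is closed.
    bool put(T item) {
        if (closed_) return false;
        items_.push(std::move(item));
        return true;
    }

    // Consumer side: dequeue one item into `out`.
    // Returns false when nothing is left to read.
    bool get(T& out) {
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

    // Mark the end of the stream; queued items can still be drained.
    void close() { closed_ = true; }

private:
    std::queue<T> items_;
    bool closed_ = false;
};

// A filter stage: forwards only the strings that contain `needle`.
// Returns the number of items forwarded.
inline int filterStage(ToyPipe<std::string>& in, ToyPipe<std::string>& out,
                       const std::string& needle) {
    int forwarded = 0;
    std::string item;
    while (in.get(item)) {
        if (item.find(needle) != std::string::npos && out.put(item)) {
            ++forwarded;
        }
    }
    return forwarded;
}
```

In this picture a stage such as `filterStage` plays the role of an SSDlet: it reads from its input port, does its work, and writes to its output port, so stages can be chained by connecting pipes.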


18 / 30

ISC Multiple Device Model

[Figure: a host ISC application connected to multiple devices. Legend: input port, pipe, output port.]


19 / 30

Simple Key Value Store in ISC Programming

Host program:

    #include <iostream>
    #include <string>
    // The ISC host library header is not shown on the slide.
    using std::string;

    int main(int argc, char *argv[]) {
        SSD ssd("/dev/nvme0n1p1");
        // Load the device-side module and create the SSDlet from it.
        module_id_t mid = ssd.loadModule(File(ssd, "./libkvstore.so"));
        Application app(ssd);
        SSDLet kvstore(app, mid, "KVStore");
        // Connect host ports to the SSDlet's three inputs and one output.
        auto out_command = app.connectTo<String>(kvstore.in(0));
        auto out_key = app.connectTo<String>(kvstore.in(1));
        auto out_value = app.connectTo<String>(kvstore.in(2));
        auto in_result = app.connectTo<String>(kvstore.out(0));
        app.start();

        string command, key, value;
        while (std::cin >> command) {
            if (command == "get") {
                out_command.put(command);
                std::cin >> key;
                out_key.put(key);
                in_result.get(value);
                std::cout << value << std::endl;
            } else if (command == "put") {
                out_command.put(command);
                std::cin >> key >> value;
                out_key.put(key);
                out_value.put(value);
            } else {
                break;
            }
        }
        return 0;
    }

Device program (SSDlet):

    class KVStore : public SSDLet<IN_TYPE<SR(string), SR(string), SR(string)>,
                                  OUT_TYPE<SR(string)>> {
    public:
        map<string, string> table;

        void run() {
            auto in_command = getInputPort<0>();
            auto in_key = getInputPort<1>();
            auto in_value = getInputPort<2>();
            auto out_value = getOutputPort<0>();
            string command, key, value;
            // Serve commands until an input port stops delivering data.
            while (true) {
                if (!in_command.get(command)) break;
                if (command == "get") {
                    if (!in_key.get(key)) break;
                    out_value.put(table[key]);
                }
                if (command == "put") {
                    if (!in_key.get(key) || !in_value.get(value)) break;
                    table[key] = value;
                }
            }
        }
    };

[Figure: the KVStore SSDlet with cmd, key, and value inputs and a value output.]


20 / 30

ISC Runtime Framework

[Figure: blocks labeled host program, ISC host library, device program (SSDLet, SSDLet, built-in task), ISC runtime, fibers, scheduler, and SSD firmware. Legend: input port, pipe, output port.]


21 / 30

Samsung ISC SSD Prototype

Commodity SSD
• Samsung PM1725 NVMe with the ISC feature
• PCIe 3.0 x4
• 800 GB

Software
• C++11
• C++ STL
• g++
• Software emulator


22 / 30

Case Study 1: Data Analytics with MySQL

[Figure: two MySQL stacks over their I/O interfaces. One is labeled with aggregate and scan; the other with aggregate and "Data preprocessing on SSD".]

MySQL determines data pages to fetch according to relevance hints from SSD
• MySQL gets relevance hints for pages in a given range all at once
• Filter out access to pages with irrelevant data
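One way to picture the hint-based filtering is sketched below. The slides do not specify the hint format, so this sketch assumes one boolean per page in the requested range; `pagesToFetch` and its parameters are hypothetical names, not part of MySQL or of the ISC prototype.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Given the first page number of a range and one relevance hint per page
// (true = the page may hold matching rows), return the pages worth reading.
// Assumed hint format: the slides do not define one.
inline std::vector<std::uint64_t> pagesToFetch(std::uint64_t firstPage,
                                               const std::vector<bool>& hints) {
    std::vector<std::uint64_t> pages;
    for (std::size_t i = 0; i < hints.size(); ++i) {
        if (hints[i]) pages.push_back(firstPage + i);
    }
    return pages;
}
```

Because the hints for a whole range arrive at once, the host can skip every page marked irrelevant without issuing a read for it.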


23 / 30

Data Analytics Query

Elapsed time of TPC-H query 2
• An analytic query to find a minimum cost supplier
• ISC reduces the query time to less than 1/40

    SELECT s_acctbal, s_name, n_name, p_partkey, p_mfgr,
           s_address, s_phone, s_comment
    FROM part, supplier, partsupp, nation, region
    WHERE p_partkey = ps_partkey
      AND s_suppkey = ps_suppkey
      AND p_size = 15
      AND p_type LIKE '%BRASS'
      AND s_nationkey = n_nationkey
      AND n_regionkey = r_regionkey
      AND r_name = 'EUROPE'
      AND ps_supplycost = (
          SELECT MIN(ps_supplycost)
          FROM partsupp, supplier, nation, region
          WHERE p_partkey = ps_partkey
            AND s_suppkey = ps_suppkey
            AND s_nationkey = n_nationkey
            AND n_regionkey = r_regionkey
            AND r_name = 'EUROPE')
    ORDER BY s_acctbal desc, n_name, s_name, p_partkey
    LIMIT 100;

The most efficient plan is to put part table first in the join order and filter out its irrelevant pages!

Table name   # of pages read w/ MySQL (baseline)   # of pages read w/ ISC
Part         325,386                               7,525
Supplier     15,229                                4,582
Partsupp     985,354                               10,201
Nation       5                                     5
Region       4                                     4
Total        1,325,978                             22,317

Execution Time of TPC-H Query 2 (seconds, chart axis 0 to 120): MySQL (baseline) 116.7, ISC 2.6
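The page counts and timings above can be turned into reduction factors with a few lines of arithmetic. This is a minimal sketch; `PageReads`, `tpchQ2PageReads`, `reductionFactor`, and `totalPageReduction` are illustrative names, and the figures are copied from the slide.

```cpp
#include <cassert>
#include <vector>

// Page reads for one table: baseline MySQL vs. ISC (figures from the slide).
struct PageReads {
    const char* table;
    double baseline;
    double isc;
};

inline std::vector<PageReads> tpchQ2PageReads() {
    return {
        {"Part",     325386.0,  7525.0},
        {"Supplier",  15229.0,  4582.0},
        {"Partsupp", 985354.0, 10201.0},
        {"Nation",        5.0,     5.0},
        {"Region",        4.0,     4.0},
    };
}

// Ratio of a baseline measurement to the ISC measurement.
inline double reductionFactor(double baseline, double withIsc) {
    assert(withIsc > 0.0);
    return baseline / withIsc;
}

// Reduction in total pages read across all tables.
inline double totalPageReduction(const std::vector<PageReads>& rows) {
    double baseline = 0.0, isc = 0.0;
    for (const auto& r : rows) {
        baseline += r.baseline;
        isc += r.isc;
    }
    return reductionFactor(baseline, isc);
}
```

With these inputs, total page reads drop by a factor of about 59 and elapsed time by a factor of about 45 (116.7 s against 2.6 s), which is consistent with the "less than 1/40" claim.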


24 / 30

3.6X TPC-H Query Processing Speed Up

A representative TPC-H benchmark subset is expected to reveal over 3.6x performance gains

Speed-up by ISC (chart axis 0 to 90)

Query number   Speed-up by ISC
2              44.8
3              0.4
4              0.7
6              0.9
10             10.7
12             1.2
14             82.9
17             2.1
Geo mean       3.6
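The "Geo mean" bar is a geometric mean of the per-query speed-ups. A minimal sketch of that calculation follows; `geometricMean` is an illustrative name, and pairing the chart values with query numbers in the order they appear on the slide is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of positive values, computed in log space.
inline double geometricMean(const std::vector<double>& values) {
    assert(!values.empty());
    double logSum = 0.0;
    for (double v : values) {
        assert(v > 0.0);
        logSum += std::log(v);
    }
    return std::exp(logSum / static_cast<double>(values.size()));
}
```

Fed the eight rounded values from the chart (44.8, 0.4, 0.7, 0.9, 10.7, 1.2, 82.9, 2.1), this gives a result between 3.5 and 3.6, in line with the roughly 3.6x reported. The geometric mean is the usual choice for averaging ratios because a few very large speed-ups do not dominate it the way they would an arithmetic mean.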

Host server: Dell PowerEdge R720
- Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz x2
- 3G of DRAM
- OS device: Samsung MZ-6ER100T SAS 100GB SSD
- Data device: PM1725 480GB NVMe SSD (SR=3GB/s)

OS: Ubuntu 15.04 (3.19.0 kernel)

Software: Mariadb-5.5.42 & TPC-H 2.17.0

TPC-H dataset: 20G of dataset (with scale factor of 10)


25 / 30

Case Study 2: Storage Compaction

LevelDB
• One of the popular embedded databases
• Open-source, embedded key/value store by Google
• Base database system for other open source projects
  - RocksDB (LevelDB+HBase), HyperLevelDB
  - Riak, Ceph storage backend

Log: Max size of 4MB then flushed into a set of Level 0 SST files

Level 0: Max of 4 SST files then one file compacted into Level 1

Level 1: Max total size of 10MB then one file compacted into Level 2

Level 2: Max total size of 10 x Level 1 then one file compacted into Level 3

Level 3+: Max total size of 10 x previous level then one file compacted into next level
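The size limits above follow a simple rule: Level 1 holds up to 10 MB and every deeper level holds ten times the previous one. A minimal sketch of that rule, with `maxLevelSizeMB` as an illustrative name rather than a LevelDB function:

```cpp
#include <cassert>
#include <cstdint>

// Maximum total size of a level in MB, following the slide:
// Level 1 holds up to 10 MB and each deeper level is 10x the previous one.
// Level 0 is limited by file count (4 SST files), not by size, so it is
// rejected here.
inline std::uint64_t maxLevelSizeMB(int level) {
    assert(level >= 1);
    std::uint64_t size = 10;
    for (int i = 1; i < level; ++i) size *= 10;
    return size;
}
```

For example, `maxLevelSizeMB(2)` is 100 and `maxLevelSizeMB(3)` is 1000, so each compaction into a deeper level merges into a data set about ten times larger.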


26 / 30

New LevelDB with Compaction Powered by ISC

[Figure: LevelDB write path. Labels: insert/update into the memtable, append to the log, immutable table, flush, compact, and compaction (read/merge/write).]
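The merge step of a read/merge/write compaction can be sketched generically. This is a plain LSM-tree-style merge of two sorted runs, written for illustration under the assumption that keys are unique within each run; it is neither LevelDB's actual code nor the ISC implementation, and `Entry` and `mergeRuns` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, std::string>;  // (key, value)

// Merge two key-sorted runs into one key-sorted run.
// When both runs hold the same key, the entry from `newer` wins.
inline std::vector<Entry> mergeRuns(const std::vector<Entry>& newer,
                                    const std::vector<Entry>& older) {
    std::vector<Entry> merged;
    merged.reserve(newer.size() + older.size());
    std::size_t i = 0, j = 0;
    while (i < newer.size() && j < older.size()) {
        if (newer[i].first < older[j].first) {
            merged.push_back(newer[i++]);
        } else if (older[j].first < newer[i].first) {
            merged.push_back(older[j++]);
        } else {
            merged.push_back(newer[i++]);  // same key: keep the newer value
            ++j;                           // and drop the older one
        }
    }
    while (i < newer.size()) merged.push_back(newer[i++]);
    while (j < older.size()) merged.push_back(older[j++]);
    return merged;
}
```

On a conventional setup both input runs are read into host memory and the merged run is written back; the point of the ISC variant on this slide is that this work is done by the SSD instead.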


27 / 30

Up to 10X Throughput Improvement

[Figure: throughput chart annotated "10x", "No Read", and "More Flush".]


28 / 30

Take-Away Messages

Computing paradigm shift from CPU-centric to data-centric for I/O intensive applications

Samsung ISC realizes a heterogeneous computing framework across general-purpose CPU and SSD.

IO intensive applications can benefit from low power high performance of embedded processors and high internal bandwidth of SSDs.

Samsung ISC prototype
• ISC-aware MySQL achieves performance improvement up to 80x or 3.6x on average with TPC-H
• ISC-aware LevelDB achieves up to 10x throughput improvement with dbbench (default benchmark)


29 / 30

Reference
[1] IDC, “Worldwide Business Analytics Software 2014–2018 Forecast and 2013 Vendor Shares,” Jul 2014.
[2] HUAWEI, “Accelerate Oracle Performance by Using ASM Preferred Read Failure Group with Dorado,” Sep 2012.
[3] HDFS Architecture Guide, https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
[4] Devesh Tiwari et al., “Reducing Data Movement Costs using Energy-Efficient, Active Computation on SSD,” HotPower’12, 2012.
[5] Prabhat and Quincey Koziol, “High Performance Parallel I/O,” CRC Press book, Oct 2014.