IBM Storage Reference Architecture for AI applied to Autonomous Driving (AD)
Frank Kraemer, IBM Systems Architect, mailto:[email protected]
Nvidia GTC 2018, 10/2018
© 2018 IBM Corporation


Page 2

Autonomous Driving = See + Think + Act


https://autoware.ai/

The Automotive Industry has to solve this highly complex problem.

Page 3

Automotive Sensor Setup for AD

http://currencyobserver.com/2017/12/global-automotive-sensors-market-2017-2022/

Each data source: ~ 2 Gbit/s
Sensor sets: ~ 30 Gbit/s
Data collection volume: ~ 12-15 TB/h
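The relationship between the line rate and the hourly volume on this slide can be checked with a short calculation. The sketch below assumes decimal units (1 TB = 1000 GB) and continuous recording at the full sensor-set rate. The function name is illustrative and not part of the original material.

```python
def gbit_per_s_to_tb_per_h(gbit_per_s: float) -> float:
    """Convert a line rate in Gbit/s to a data volume in TB per hour (decimal units)."""
    gbyte_per_s = gbit_per_s / 8        # 8 bits per byte
    gbyte_per_h = gbyte_per_s * 3600    # 3600 seconds per hour
    return gbyte_per_h / 1000           # 1000 GB per TB


sensor_set_rate = 30.0  # Gbit/s, the slide's figure for a full sensor set
print(f"{gbit_per_s_to_tb_per_h(sensor_set_rate):.1f} TB/h")  # prints "13.5 TB/h"
```

The result of 13.5 TB/h falls inside the 12-15 TB/h range quoted on the slide.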

Page 4

Automotive Industry generates large amounts of data

Sources: Images from
https://www.youtube.com/watch?v=4jW0fJ80VG8
https://www.youtube.com/watch?v=dhEgD6ZFlQE
https://www.youtube.com/watch?t=21&v=39QMYkx89j0

▪ Storing sensor and video data is very costly.

▪ Handling this data is difficult, e.g. because of the high bandwidth required.

▪ For testing purposes, sensor and video data is much more complex than discrete bus signals, electronic values, etc.

Sensor / video data must be captured, stored, modified and executed synchronously with other testing data such as CAN, FlexRay, Radar, LiDAR, HiSonic, etc. The most common formats are: ADTF v2/3 (digitalwerk), RTMaps (Intempora), MDF4 and ROS/rosbag.
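To illustrate what handling sensor and bus data synchronously can mean in practice, here is a minimal sketch that merges two timestamped streams into one timeline. It assumes every sample carries a timestamp from a shared clock and that each stream is already sorted by time. The `Sample` type and `merge_streams` function are hypothetical and do not correspond to the API of ADTF, RTMaps, MDF4 or rosbag.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Sample(NamedTuple):
    timestamp_ns: int  # capture time on a shared clock (assumption)
    source: str        # e.g. "camera" or "can"
    payload: bytes


def merge_streams(*streams: Iterable[Sample]) -> Iterator[Sample]:
    """Merge per-source streams, each already sorted by time, into one timeline."""
    return heapq.merge(*streams, key=lambda s: s.timestamp_ns)


camera = [Sample(0, "camera", b"f0"), Sample(40_000_000, "camera", b"f1")]
can_bus = [Sample(10_000_000, "can", b"\x01"), Sample(30_000_000, "can", b"\x02")]

timeline = list(merge_streams(camera, can_bus))
print([s.source for s in timeline])  # ['camera', 'can', 'can', 'camera']
```

`heapq.merge` is lazy, so the same approach works on generators that read large recordings without loading them fully into memory.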

Page 5

Data Management for ADAS/AD development and test is challenging

Test Drives

50-70 TB / day / car

R&D Labs: tagging

R&D Labs: developing & testing & (re-)simulation & AI training

▪ > 5 PB of data for each car project
▪ 300-500 PB data in total

> 200 h / 1 h driving

o Europe
o USA
o China
o Japan
o Asia
o Africa

Training Data as a Service (TDaaS)
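The per-car and per-project figures above can be related with simple arithmetic. The sketch below assumes decimal units (1 PB = 1000 TB) and takes 5 PB as the project volume, which the slide gives as a lower bound. The function name is illustrative.

```python
def recording_days(target_pb: float, tb_per_day: float) -> float:
    """Days of single-car recording needed to accumulate target_pb (decimal units)."""
    return target_pb * 1000 / tb_per_day


for rate in (50, 70):  # TB per day per car, the range given on the slide
    print(f"{rate} TB/day -> {recording_days(5, rate):.0f} car-days for 5 PB")
```

At these rates a single car reaches 5 PB after roughly 71 to 100 recording days. A fleet of cars would get there proportionally faster.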

Page 6

The IBM AD Solution Approach

1. How to implement & operate an efficient storage, workflow and management system?

2. How to distribute data globally within an enterprise and partners?

3. How to preserve digital data for decades with optimized costs?

4. How to analyze sensor and video data with fast analytics and modern Big Data tools?

5. How to run Machine Learning (ML) and AI training with Nvidia GPU technology at scale?

6. How to do efficient IT workload and resource scheduling?

„The Data Foundation“

IBM Analytics HDFS: Hortonworks HDP, DSX, Spark, …

IBM AREMA

IBM High-Speed WAN File Transfer: IBM Aspera / Mass Data Migration / Cloud

IBM Spectrum Computing

IBM Object Storage (COS)

IBM ‘Cold’ Archiving: IBM Spectrum Protect / Cold / Low Cost / Tape

IBM Enterprise-Class AI: Power9 AC922, PowerAI, AI Vision

IBM Spectrum Discover (MetaOcean)

Page 7

Transparent Cloud Tiering

• Tiering from flash, to disk, to tape, to cloud.
• Cloud appears as external storage pool.
• Auto tiering & migration.
• High performance read/write operations.
• Public cloud-ready.
• Support of multi-cloud environments.

Cloud targets: ICP, AWS S3, Azure, IBM Cloud, Private Cloud

Data handling: Replicated, Compressed, Encrypted, Integrity Validated

Use cases: Backup, DR, Tiering, Archive, Data sharing

The IBM storage architecture based on Spectrum Scale, COS and Tape

IBM Spectrum Scale (HOT)
• File based storage with Object & HDFS support
• High-end I/O performance
• Information Lifecycle Management (ILM)
• Sub-microsecond access time

IBM Cloud Object Storage (S3) (WARM)
• Site fault tolerant
• Geo-dispersed and worldwide scale
• Easy to deploy
• Millisecond access time

IBM Spectrum Archive & Tape (COLD)
• Lowest TCO
• Tape ILM target, especially frozen archive
• Long-term retention and access time of minutes
• Access as files via LTFS
• Reduced floor space requirements and energy consumption
• Up to 260 PB native capacity in a single tape library
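The idea behind the HOT / WARM / COLD split can be sketched as an age-based placement rule. This is only an illustration of the concept. It is not the Spectrum Scale policy engine, and the thresholds of 30 and 365 days are assumptions made for this example, not values from the presentation.

```python
from enum import Enum


class Tier(Enum):
    HOT = "IBM Spectrum Scale"
    WARM = "IBM Cloud Object Storage"
    COLD = "IBM Spectrum Archive & Tape"


def choose_tier(days_since_access: int, warm_after: int = 30, cold_after: int = 365) -> Tier:
    """Pick a tier by data age; both thresholds are illustrative assumptions."""
    if days_since_access >= cold_after:
        return Tier.COLD
    if days_since_access >= warm_after:
        return Tier.WARM
    return Tier.HOT


for age in (1, 90, 800):
    print(age, choose_tier(age).name)
```

A real lifecycle policy would usually also consider file size, project status and retention rules rather than age alone.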

Page 8

Building-block "HOT" High Performance I/O File Storage

Powered by IBM Spectrum Scale

Clients: Client workstations; Users, Containers and applications; HPC & HTC Compute farm; Traditional applications; New Gen applications; DGX / AC922

Access: Block (iSCSI); File (SMB, NFS, POSIX); Analytics (Transparent HDFS); OpenStack (Cinder, Glance, Manila); Object (Swift, S3); Transparent Cloud

GLOBAL Namespace with automated data placement and data migration

Storage: Flash, NVMe, Disk, Tape, Shared Nothing Cluster (FPO), JBOD/JBOF, ESS, Spectrum Scale RAID

Cloud: Transparent Cloud Tier (TCT), S3 Data Cloud, Cloud Data Sharing

Sites: Worldwide File Data Distribution (AFM) across Site A, Site B and Site C; AFM-DR to DR Site

Data services: Encryption, File Audit Logging, Immutability, Compression

Management: Management API, Advanced GUI, RESTful API

Page 9

IBM Analytics & Hortonworks (HDP) / Hadoop

https://developer.ibm.com/dwblog/2017/ibm-hortonworks-expand-partnership-help-businesses-accelerate-data-driven-decision-making/

Automotive Customer Use Case:

➢ A major automotive OEM was experiencing significant difficulties and costs in storing and processing huge volumes of video, radar and lidar files on a legacy Network Attached Storage (NAS) system.

➢ The data is needed to develop machine learning algorithms for autonomous vehicles.

➢ Today the OEM stores multiple petabytes of video and binary data in an HDP data lake, aiming to grow to tens of petabytes.

➢ Dramatically reduced data management costs and improved user productivity.

➢ Provided foundation for Autonomous Driving research.

➢ IBM Reference customer for Spectrum Scale and HDP.

Page 10

2nd Generation IBM Elastic Storage Server (ESS) Family

Capacity models (NL-SAS, ESS 5U84 storage enclosures):

Model GL1S: 1 enclosure, 9U, 82 NL-SAS, 2 SSD, 6 GB/s
Model GL2S: 2 enclosures, 12U, 166 NL-SAS, 2 SSD, 12 GB/s
Model GL4S: 4 enclosures, 20U, 334 NL-SAS, 2 SSD, 24 GB/s
Model GL6S: 6 enclosures, 28U, 502 NL-SAS, 2 SSD, 36 GB/s

Speed models (SSD, EXP3524 enclosures):

Model GS1S: 24 SSD, 14 GB/s
Model GS2S: 48 SSD, 26 GB/s
Model GS4S: 96 SSD, 40 GB/s

Models with SSD and HDD enclosures:

Model GH14S: 1 2U24 SSD enclosure, 4 5U84 HDD enclosures, 334 NL-SAS, 24 SSD, 38 GB/s
Model GH24S: 2 2U24 SSD enclosures, 4 5U84 HDD enclosures, 334 NL-SAS, 48 SSD, 40 GB/s
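A first-order sizing estimate can be derived from the rated throughput figures on this slide. The sketch below assumes that throughput adds up linearly across identical building blocks, which real deployments may not achieve. The 100 GB/s requirement is a hypothetical input and the function name is illustrative.

```python
import math


def units_needed(required_gbyte_per_s: float, unit_gbyte_per_s: float) -> int:
    """Smallest number of identical building blocks whose summed rated throughput meets the requirement."""
    return math.ceil(required_gbyte_per_s / unit_gbyte_per_s)


# Hypothetical requirement: 100 GB/s aggregate; 36 GB/s is one of the rates on the slide.
print(units_needed(100, 36))  # prints 3
```

Network design, RAID rebuilds and mixed read/write patterns all reduce usable throughput, so an estimate like this is a lower bound on the number of units.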

Page 11

Presentation at ATZ Live 04/2018 in Wiesbaden, Germany

„Artificial Intelligence is key to understand Sensor Data“

„Relevant data is needed to finalize the Software Development.“

Dr. Michael Hafner, Head of Automated Driving and Active Safety at Mercedes-Benz, talks about sensors, safety, and the road map that developers are following.

https://www.daimler.com/innovation/autonomous-driving/expert-interview.html

Page 12

Workload and data flow for AI is complex

Data Source (years of data): Traditional Business Data; Sensor Data; Data from collaboration partners; Data from mobile app and social media; Legacy Data

Data Preparation (hours and weeks of preparation): Pre-Processing; Training Dataset; Testing Dataset

Model Training (weeks and months of training): AI Deep Learning Frameworks (Tensorflow, Caffe, …); Distributed & Elastic Deep Learning (Fabric); Parallel Hyper-Parameter Search & Optimization; Network Models; Hyper-Parameters; Monitor & Advise; Instrumentation; Iterate

Inference (sub-second to results): Trained Model; Deploy in Production using Trained Model; New Data

Heavy IO

https://public.dhe.ibm.com/common/ssi/ecm/75/en/75016775usen/systems-hardware-ibm-spectrum-computing-analyst-paper-or-report-75016775usen-20180618.pdf

IBM Reference Architecture for AI Infrastructure
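One small step in the data preparation stage of this flow is separating a training dataset from a testing dataset. The sketch below shows a reproducible random split in plain Python. The recording IDs, the 20 % test fraction and the function name are assumptions for illustration and are not taken from the reference architecture.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence[str], test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle reproducibly, then split into (training, testing) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


recordings = [f"drive_{i:03d}" for i in range(10)]  # hypothetical recording IDs
train, test = split_dataset(recordings)
print(len(train), len(test))  # prints "8 2"
```

For driving data it is often preferable to split by whole drive or by route rather than by individual frame, so that near-identical frames do not end up in both sets.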

Page 13

Reference IBM Spectrum Scale ESS CORAL

▪ 2.5 TB/sec single stream IOR as requested from ORNL

▪ 1 TB/sec 1MB sequential read/write as stated in CORAL RFP

▪ Single Node 16 GB/sec sequential read/write as requested from ORNL

▪ 50K creates/sec per shared directory as stated in CORAL RFP

▪ 2.6 Million 32K file creates/sec as requested from ORNL

▪ Summit’s 250-petabyte storage system is delivered by a cluster of 77 IBM ESS storage systems that will deliver 2.5 TB/s of data throughput.

▪ Summit will have a capacity of 30B files and 30B directories and will be able to create files at a rate of over 2.6 million I/O file operations per second.
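Dividing the aggregate Summit figures by the number of ESS systems gives a rough per-system average. The sketch assumes decimal units and an even distribution of capacity and throughput across all 77 systems, which is a simplification.

```python
ESS_COUNT = 77        # ESS systems in the cluster, from the slide
TOTAL_TB_PER_S = 2.5  # aggregate throughput in TB/s, from the slide
TOTAL_PB = 250        # total capacity in PB, from the slide

per_ess_gb_per_s = TOTAL_TB_PER_S * 1000 / ESS_COUNT
per_ess_pb = TOTAL_PB / ESS_COUNT

print(f"{per_ess_gb_per_s:.1f} GB/s and {per_ess_pb:.2f} PB per ESS on average")
```

This works out to about 32.5 GB/s and 3.25 PB per ESS system.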

https://www.ibm.com/blogs/systems/fastest-storage-fastest-system-summit/

Page 14

Global Data Distribution via IBM Aspera

Automotive company synchronizes petabytes of vehicle field test data & video from on-site locations to worldwide R&D teams at high-speed with IBM Aspera FASP.

IBM Aspera for Global Data Distribution
http://downloads.asperasoft.com/
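To see why WAN transfer speed matters at these volumes, the time needed to move one day of recordings can be estimated from the link rate. The sketch below is plain arithmetic and does not use or model the Aspera FASP protocol. The 10 Gbit/s link and the `efficiency` parameter (usable fraction of line rate) are assumptions for illustration.

```python
def transfer_hours(volume_tb: float, link_gbit_per_s: float, efficiency: float = 1.0) -> float:
    """Hours needed to move volume_tb over a link, given the usable fraction of line rate."""
    seconds = volume_tb * 8000 / (link_gbit_per_s * efficiency)  # 1 TB = 8000 Gbit
    return seconds / 3600


# 60 TB (within the 50-70 TB/day range quoted earlier) over a hypothetical 10 Gbit/s WAN link.
print(f"{transfer_hours(60, 10):.1f} h")  # prints "13.3 h"
```

Even at full line rate, one car-day of data occupies such a link for more than half a day, so any loss of efficiency on long-distance links quickly becomes the bottleneck.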

Page 15

IBM can help

1. to significantly increase development efficiency by reducing manual effort for video tagging, eliminating time wasted on data search and manual data copy/move processes, and automating workflows.

2. to significantly increase test throughput, allowing you to run more test cases in less time, thereby shortening time-to-market and improving the quality of your camera and ADAS products.

3. to increase the overall flexibility of your organization through the ability to move workload from one place to another.

4. to reduce IT costs for local storage hardware by globally centralizing data in a private cloud and object store, from which project- and demand-specific video data is downloaded to local test labs.

5. to guarantee long-term verifiability and recoverability of test data for potential warranty cases with a comparatively inexpensive tape storage solution.

Page 16

Question to win a prize


How much data does a single test/dev car generate in an 8 hour shift per day?

a) 1-5 TB per day
b) 50-70 TB per day
c) 1-5 PB per day