© 2018 PURE STORAGE INC. PURE PROPRIETARY

The Importance of Data in AI

Matt Oostveen

CTO Pure Storage APJ

MATTHEW OOSTVEEN

Chief Technology Officer, APJ, Pure Storage

Working on future systems

Career: IBM, Microsoft, IDC, Dell EMC

WHAT YOU’LL LEARN TODAY

The Role of Data in AI

Our Technical Motivation

AI-Tuned Infrastructure

ZB CORE/EDGE THINGS REAL-TIME INTELLIGENCE

What is the difference between AI & ML?

If it’s written in Python it’s probably ML

If it’s written in PowerPoint it’s AI

THE BIG BANG OF INTELLIGENCE: FUELED BY PARALLEL COMPUTE, NEW ALGORITHMS, AND BIG DATA

NEW ALGORITHMS: Massively Parallel, Delivering Superhuman Accuracy

MODERN COMPUTE: Massively Parallel Architecture Driving Performance

CPU: TENS OF CORES

GPU: THOUSANDS OF CORES

BIG DATA: Data is the New Oil

163 Zettabytes Created in 2025

DATA IS VITAL TO MACHINE LEARNING: OBSERVATION BY PROF. ANDREW NG, AI LUMINARY

DO IT YOURSELF IS OFTEN THE ONLY OPTION

Never-ending cycles of compiling and tuning open source software

Months of system building and tuning, constant maintenance

Yet legacy solutions are full of data bottlenecks, from storage to GPU to apps

DEEP LEARNING = COMPLEXITY

AI complexity: how many skilled researchers are required and available to build high-performance models?

Infrastructure complexity: how many people do I need to deploy, manage, and scale systems for deep learning?

Performance complexity: how much time is spent tuning and configuring pipelines to keep GPUs fed with data?
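
One common way to keep an accelerator fed is to load the next batches in the background while the current one is being processed. The sketch below illustrates that idea with the Python standard library only. It is not taken from the deck or from any AIRI tooling: `load_batch`, `train_step`, and the sleep-based timings are hypothetical stand-ins for real storage reads and GPU work.

```python
import queue
import threading
import time


def load_batch(index):
    """Hypothetical stand-in for reading and decoding one batch from storage."""
    time.sleep(0.01)  # simulated I/O latency (assumption)
    return [index] * 4


def prefetching_loader(num_batches, depth=2):
    """Yield batches while a background thread loads up to `depth` batches ahead."""
    buffer = queue.Queue(maxsize=depth)
    sentinel = object()  # marks the end of the stream

    def producer():
        for i in range(num_batches):
            buffer.put(load_batch(i))  # blocks while the buffer is full
        buffer.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = buffer.get()  # blocks only if loading has fallen behind
        if batch is sentinel:
            return
        yield batch


def train_step(batch):
    """Hypothetical stand-in for one training step on the accelerator."""
    time.sleep(0.01)  # simulated compute time (assumption)
    return sum(batch)


results = [train_step(batch) for batch in prefetching_loader(5)]
print(results)
```

With a bounded buffer, loading and compute overlap, so the consumer only waits when storage cannot keep up. That waiting is the data bottleneck this slide refers to.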

COMPLEXITY IS NOT SUSTAINABLE

NEVER-ENDING CYCLES OF TUNING MODELS & CODE

MONTHS OF SYSTEM BUILDING AND TUNING, CONSTANT MAINTENANCE

AIRI IN DEPTH

THE INDUSTRY’S FIRST COMPLETE AI-READY INFRASTRUCTURE

HARDWARE

NVIDIA® DGX-1™ | 4x DGX-1 Systems | 4 PFLOPS of DL Performance

PURE FLASHBLADE™ | 15x 17TB Blades | 1.5M IOPS

ARISTA | 2x 100Gb Ethernet Switches with RDMA

SOFTWARE

NVIDIA GPU CLOUD DEEP LEARNING STACK | NVIDIA Optimized Frameworks

AIRI SCALING TOOLKIT | Multi-node Training Made Simple
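
The hardware figures on this slide imply a few totals that are easy to check. The sketch below records the slide's numbers and derives them. The class name `AiriConfig` and its field names are hypothetical, and the value of 8 GPUs per DGX-1 comes from the DGX-1 product specification rather than from the slide. The capacity computed is raw blade capacity, not usable capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AiriConfig:
    """Figures from the slide, plus one assumption noted below."""

    dgx1_systems: int = 4            # slide: 4x DGX-1 Systems
    total_dl_pflops: float = 4.0     # slide: 4 PFLOPS of DL performance
    blades: int = 15                 # slide: 15x 17TB Blades
    blade_capacity_tb: int = 17
    storage_iops: int = 1_500_000    # slide: 1.5M IOPS
    gpus_per_dgx1: int = 8           # assumption: DGX-1 spec, not on the slide

    @property
    def total_gpus(self) -> int:
        return self.dgx1_systems * self.gpus_per_dgx1

    @property
    def raw_capacity_tb(self) -> int:
        return self.blades * self.blade_capacity_tb

    @property
    def pflops_per_system(self) -> float:
        return self.total_dl_pflops / self.dgx1_systems


config = AiriConfig()
print(config.total_gpus, config.raw_capacity_tb, config.pflops_per_system)
```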

LINKING DATA WITH OUTCOMES

UNIFIED ETHERNET FABRIC: Delivers the performance of RDMA, while simplifying integration into existing data centers

HIGH-PERFORMANCE DATA PLATFORM: Keeps GPUs fed with data for efficient scaling of storage and compute resources

CONFIGURATION GUIDE & SCALING TOOLKIT: Simplify the deployment and validation of high-performance infrastructure for deep learning
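
A rough way to size what "keeping GPUs fed" asks of storage is to multiply the training throughput in images per second by the average size of an image on disk. The sketch below does that arithmetic. The function name is hypothetical, and both inputs in the example (10,000 images per second and 110 KB per image) are illustrative assumptions, not measurements from this deck.

```python
def required_read_gb_per_s(images_per_second: float, avg_image_bytes: float) -> float:
    """Sustained read rate in decimal gigabytes per second needed to feed training."""
    return images_per_second * avg_image_bytes / 1e9


# Illustrative inputs only (assumptions, not figures from the slides).
example_rate = required_read_gb_per_s(images_per_second=10_000, avg_image_bytes=110_000)
print(f"{example_rate:.2f} GB/s")
```

The estimate ignores caching, decoding overhead, and repeated epochs over the same data, so it is best treated as a lower bound for planning rather than a benchmark.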

TECHNOLOGY STACK: AI-AT-SCALE MADE SIMPLE

MULTI-NODE TRAINING: AIRI Scaling Toolkit

NVIDIA OPTIMIZED DEEP LEARNING FRAMEWORKS

CONTAINERIZATION: GPU-optimized Docker

SCALE-OUT GPU COMPUTE: NVIDIA DGX-1 with Tesla V100 GPUs

SCALE-OUT FILE & OBJECT PROTOCOLS: Pure Storage Purity//FB

SCALE-OUT FLASH STORAGE: Pure Storage FlashBlade

AIRI TECHNOLOGY STACK: INCLUDES NVIDIA GPU CLOUD DL STACK & SCALING TOOLKIT

DATA BOTTLENECK ELIMINATED: AIRI™ SLASHES TRAINING TIME BY 4X, BOOSTS DATA SCIENTIST PRODUCTIVITY

Keeping GPUs Busy with TensorFlow & 100Gb Ethernet with RDMA

Training throughput in images per second (i/s):

Model        1 DGX-1 (synthetic data)   1 DGX-1   2 DGX-1   4 DGX-1
RESNET-50    2660                       2540      4870      10244
INCEPTION3   1700                       1600      3160      6440
VGG16        1660                       1640      3110      6300

Synthetic Data Mode is the theoretical maximum for DGX-1 performance. The remaining columns are Multi-Node Training results.
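
The "4X" claim can be checked directly from the throughput figures on this slide. The sketch below computes, for each model, how close the single-system result comes to the synthetic-data ceiling and what speedup and scaling efficiency the two- and four-system runs achieve. It assumes the first value in each series is the synthetic-data run and the second is the single DGX-1 run with real data. The names `THROUGHPUT` and `scaling_report` are hypothetical.

```python
# Images per second as reported on the slide. The "synthetic" key is the
# synthetic-data run on one DGX-1; integer keys are the number of DGX-1 systems.
THROUGHPUT = {
    "ResNet-50": {"synthetic": 2660, 1: 2540, 2: 4870, 4: 10244},
    "Inception3": {"synthetic": 1700, 1: 1600, 2: 3160, 4: 6440},
    "VGG16": {"synthetic": 1660, 1: 1640, 2: 3110, 4: 6300},
}


def scaling_report(results):
    """Return real-vs-synthetic ratio and (speedup, efficiency) per node count."""
    baseline = results[1]
    report = {"real_vs_synthetic": baseline / results["synthetic"]}
    for nodes in (2, 4):
        speedup = results[nodes] / baseline
        report[nodes] = (speedup, speedup / nodes)  # efficiency of 1.0 is linear scaling
    return report


for model, results in THROUGHPUT.items():
    report = scaling_report(results)
    speedup, efficiency = report[4]
    print(
        f"{model}: {report['real_vs_synthetic']:.1%} of synthetic ceiling, "
        f"4-node speedup {speedup:.2f}x, efficiency {efficiency:.1%}"
    )
```

On these figures, four systems deliver roughly 4.0x the single-system throughput for ResNet-50 and Inception3 and about 3.8x for VGG16, which is the basis for the headline number.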

AI-AT-SCALE FOR EVERY ORGANISATION: AIRI™ TO EXTEND THE POWER OF NVIDIA® DGX-1™ SYSTEMS

INDUSTRY’S FIRST TO SIMPLIFY AI-AT-SCALE: Data scientist teams can focus on algorithms, not infrastructure

50 RACKS UNDER 50 INCHES: Performance of an entire data center for each data team

SLASH TRAINING FROM MONTH TO WEEK: Only a few experts can run multi-node training; AIRI makes it simple

AIRI SCALING TOOLKIT

CONFIGURATION AND DEPLOYMENT GUIDE: Guide to enabling RDMA over Converged Ethernet and end-to-end best practices for configuration of storage, networking, and compute.

SCALING TOOLKIT: Tools to reproduce benchmark results and validate deployment of end-to-end AIRI environment.

KEY TAKEAWAYS

I. Architect for data acquisition, cleaning, exploration, training, and model validation

II. Design infrastructure to scale with the sophistication of data pipelines and models

III. Serve models at scale using best-of-breed tools that support operations
