End to End Deep Learning Solution on Arm Architecture
Jan. 14 2019, Jammy Zhou
HPC and AI Convergence: TOP500 Trend
According to the TOP500 report, more than 50 percent of the FLOPS added in the latest TOP500 ranking came from Nvidia Tesla GPUs
Half of the TOP10 systems use Nvidia GPUs, and 122 of the TOP500 systems use Nvidia GPUs (64 systems with P100, 46 with V100, 12 with Kepler GPUs)
More AI/ML/DL workloads are being added to HPC applications along with the wide adoption of Nvidia GPUs
Arm on the road
Astra at Sandia National Laboratories in the US is the first Arm-based supercomputer to enter the TOP500 list, ranked at 203 in the latest ranking.
Good momentum for Arm-based supercomputers around the world: Post-K from Japan, Tianhe-3 from China, Catalyst UK, GW4 Isambard, and the CEA system from Europe
Post-K enables Arm SVE together with the Tofu D interconnect and HBM2 memory, and will be used for some AI workloads
Besides Nvidia GPUs, there are other accelerator options on the market, for example AMD Radeon Instinct MI60/MI50 GPUs, Xilinx and Intel FPGAs, customized ASIC products, etc.
HPC and AI in the Cloud
Building blocks: CPUs & accelerators, network & storage, AI & ML services, HPC services
Network: 100 Gbps Ethernet, InfiniBand, Omni-Path, RDMA and RoCE
Storage: fast and scalable storage, such as NVMe-based local SSDs
Arm on the road
Science Cloud with Arm-based HPC from HPC Systems (supporting HiSilicon Hi1616 and Marvell ThunderX2)
Amazon EC2 A1 instances based on the AWS Graviton 64-bit Arm processor, for scale-out and Arm-based workloads
Arm Neoverse continuous improvement
Accelerators (GPUs, FPGAs, ASICs)
HPC & AI software stack (languages, frameworks, libraries, drivers, compilers, etc), multi-node distributed support and MPI
HW Diversity & SW Fragmentation
[Stack diagram]
DL Frameworks: TensorFlow, Caffe, MXNet, Theano, Caffe2, CNTK, PaddlePaddle, PyTorch, Keras
Model Formats (framework specific, ONNX, NNEF) and Deep Learning Compilers (TVM, Glow, XLA, ONNC, etc)
Libraries: BLAS, FFT, RNG, SPARSE, Eigen, CMSIS-NN, ACL
HAL and Drivers
Hardware: CPU, GPU, FPGA, ASIC, DSP
Framework support for multiple accelerators
1. It is difficult for application and algorithm developers to switch between frameworks
2. Framework developers have to maintain different backends for various accelerators
3. Chip and IP vendors have to support multiple frameworks with duplicated effort, often as out-of-tree forks of the upstream projects
4. OEMs/ODMs and cloud vendors have to support multiple configurations
Also shown in the diagram: additional frameworks (Chainer, ...), a Big Data Analytics layer (TensorFlowOnSpark, CaffeOnSpark, SparkFlow, ...), and further accelerator libraries (cuDNN, MIOpen, ...)
Open Neural Network eXchange (ONNX) Ecosystem
Framework interoperability & hardware optimizations
[Ecosystem diagram] ONNX Format, ONNX Models, ONNXIFI, ONNX Runtime, ONNX Tools; workflow: Create, Convert, Optimize, Deploy
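To make the "Convert" step concrete, here is a hedged sketch of exporting a PyTorch model to the ONNX format; the resnet18 model, input shape, opset and file name are illustrative assumptions, not from the slides.

```python
# Illustrative only: export a stand-in PyTorch model to ONNX via tracing.
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True).eval()  # any trained module works
dummy_input = torch.randn(1, 3, 224, 224)  # example input used for tracing

torch.onnx.export(
    model,
    dummy_input,
    "resnet18.onnx",          # placeholder output path
    input_names=["input"],
    output_names=["output"],
    opset_version=9,          # pin an opset version for reproducibility
)
```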
ONNX Specifications
Neural-network-only ONNX
Defines an extensible computation graph model, built-in operators and standard data types
Supports only tensors as input/output data types
ONNX-ML Extension
Classical machine learning extension
Also supports sequence and map data types, and extends the ONNX operator set with ML algorithms not based on neural networks
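To show what the computation graph model, built-in operators and tensor data types look like in practice, here is a minimal sketch using the onnx Python helper API; the tiny Add graph and file name are illustrative only.

```python
# Build a one-node ONNX graph: Z = Add(X, Y), with tensor-typed inputs/outputs.
import onnx
from onnx import helper, TensorProto

x = helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 4])
y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 4])
z = helper.make_tensor_value_info("Z", TensorProto.FLOAT, [1, 4])

node = helper.make_node("Add", inputs=["X", "Y"], outputs=["Z"])  # built-in operator
graph = helper.make_graph([node], "tiny_add_graph", [x, y], [z])
model = helper.make_model(graph, producer_name="example")

onnx.checker.check_model(model)  # validate against the ONNX specification
onnx.save(model, "tiny_add.onnx")
```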
ONNX v1.3 (released on Sep. 1st, 2018)
Control flow support
Functions (composable operators, experimental)
Enhanced shape inference
Additional optimization passes
ONNXIFI 1.0 (C backend interface for accelerators)
More to come: quantization, test/compliance, data pipelines, edge/mobile/IoT
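As one hedged illustration of the shape inference tooling mentioned above, offline shape inference can be run with the onnx Python package; the model path is a placeholder.

```python
# Run ONNX shape inference and inspect the shapes recorded for intermediate tensors.
import onnx
from onnx import shape_inference

model = onnx.load("model.onnx")           # placeholder path
inferred = shape_inference.infer_shapes(model)

# Shapes inferred for intermediate tensors land in graph.value_info
# (empty for trivial graphs with no intermediates).
for vi in inferred.graph.value_info:
    print(vi.name, vi.type.tensor_type.shape)
```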
ONNX Interface for Framework Integration (ONNXIFI)
Standardized interface for NN inference on different accelerators
Runtime discovery and selection of execution backends, as well as of the ONNX operators supported on each backend
Supports the ONNX format & online model conversion
ONNXIFI Backend
A combination of a software layer and a hardware device used to run an ONNX graph
The same software layer can expose multiple backends
A heterogeneous backend can distribute work across multiple device types internally
[Diagram] Applications and frameworks load ONNX models through the ONNXIFI dispatch library (libonnxifi.so), which discovers and dispatches to vendor backend libraries, e.g. Glow (libonnxifi-glow.so), Library A (libonnxifi-a.so), Library B (libonnxifi-b.so), Library C (libonnxifi-c.dll), Library D (libonnxifi-d.dylib), ...
ONNX Runtime
High-performance and cross-platform inference engine for ONNX models
Fully implements the ONNX specification, including the ONNX-ML extension
Arm platforms are supported on both Linux (experimental) and Windows
Diagram from https://github.com/Microsoft/onnxruntime/blob/master/docs/HighLevelDesign.md
TensorRT and nGraph support are work in progress
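A minimal usage sketch of the ONNX Runtime Python API; the model file name and input shape are assumptions carried over from the earlier export sketch.

```python
# Load an ONNX model and run a single inference with ONNX Runtime.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("resnet18.onnx")   # placeholder model file

input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name

x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # example input
outputs = session.run([output_name], {input_name: x})
print(outputs[0].shape)
```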
Machine Intelligence
A Linaro Strategic Initiative
Provide the best-in-class deep learning performance by leveraging neural network acceleration in IP and SoCs from the Arm ecosystem, through collaborative, seamless integration with the ecosystem of AI/ML software frameworks and libraries
Scope from HPC to microcontroller
HPC, Data Center & Cloud *
SVE-based optimization for DL frameworks & libraries
PCIe/CCIX based heterogeneous accelerator support on Arm servers (drivers, compilers and framework integration, etc)
Scale-out support for distributed training (a minimal sketch follows below)
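The slides do not name a specific technology for scale-out training; as one hedged illustration, an MPI-launched Horovod setup with tf.keras might look like the following (model and hyperparameters are placeholders).

```python
# Illustrative Horovod data-parallel training sketch; launch with mpirun,
# one process per node/accelerator.
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()  # initialize the MPI-backed communication layer

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Scale the learning rate by the worker count and wrap the optimizer so
# gradients are averaged across ranks with allreduce.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
model.compile(loss="sparse_categorical_crossentropy", optimizer=opt)

callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]  # sync initial weights from rank 0
# model.fit(x_train, y_train, callbacks=callbacks)  # data loading omitted
```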
Edge node & device
Initial focus on inference support for Cortex-A SoCs
Common model description format and APIs to the runtime
Common optimized runtime inference engine for Arm-based SoC
Plug-in framework to support multiple 3rd party IPs (NPU, GPU, DSP, FPGA)
Continuous integration testing and benchmarking
Microcontroller *
CMSIS-NN optimized frameworks/libraries on RTOS
Frameworks like uTensor and TensorFlow Lite (quantization, footprint reduction, etc; see the sketch after this list)
IP based accelerator support & optimization
* under discussion
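As a hedged illustration of the quantization and footprint reduction mentioned for TensorFlow Lite above, post-training quantization with the TFLite converter might look like this; the SavedModel path and output file are placeholders.

```python
# Convert a TensorFlow SavedModel to a quantized TensorFlow Lite model.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable post-training quantization

tflite_model = converter.convert()
with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)
```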
ArmNN based collaborations - ongoing
https://developer.arm.com/products/processors/machine-learning/arm-nn
https://community.arm.com/tools/b/blog/posts/arm-nn-the-easy-way-to-deploy-edge-ml
A good base for future collaborations:
100 man-years of effort, 340,000 lines of code
Estimated to be shipping in over 200 million Android devices
Impressive performance uplift from software-only improvements over a period of 6 months
Thanks!