Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics...

68
Doug Burger Technical Fellow, Advanced Intelligent Architectures, Microsoft Azure

Transcript of Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics...

Page 1: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Doug Burger

Technical Fellow, Advanced Intelligent Architectures, Microsoft Azure

Page 2: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

2

Page 3: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

United States

United States

Canada

Mexico

Venezuela

Colombia

Peru

Bolivia

Brazil

Argentina

Atlanta OceanAlgeria

MaliNiger

Nigeria

Chad

Libya Egypt

Sudan

Ethiopia

Dr Congo

AngolaZambia

Nambia

South

Africa

Greenland

Svalbard

Sweden

Norway

United

Kingdom

France

PolandUkraine

Turkey

Saudi

Arabia

Iran

Kazakistan

India

Russia

China

Myanmar

(Burma)

Indian Ocean

Indonesia

Australia

Pacific Ocean

Pacific Ocean

Data centerOwned capacity

Future capacity

Leased capacity

Edge site

3

Page 4: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine
Page 5: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

5

Page 6: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Silicon scaling will end because

of economics, not physics

Minimal

cost/transistor

(7, 5, 3 nm?)

Lower TCO

might justify

one more node

But the

economics are

tough and will

happen late

Page 7: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Spatial Computation

FPGA

Data

Instruction

Instruction

Instruction

Data

Instruction

Instruction

Instruction

Temporal Computation

Temporal vs. spatial computing

7

CPU

Instruction

Page 8: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Microsoft Confidential

Catapult V0

8

• 1U rack-mounted

• 2 x 10Ge ports

• 3 x16 PCIe slots

• 12 Intel Westmere

cores (2 sockets)

Page 9: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Bing Ranking Implementation Details

Complex

ALU

Ln, ÷, div

Basic Tile

Basic Tile

Basic Tile

Basic Tile

Registers

Constants

FFE 1 Inst.

FFE nInst.

Compression

Thresholds

Local

ALU

DSP

DSP

Schedulin

g

Logic

Distribution latches

Control/Data Tokens

Feature

Transmissi

on

Network

Stream

Preprocessin

g FSM

FE FFE DTS

FE0

89 Non-

BodyBlock

Features

34 State

Machines

55 % Utilization

FE1

55 BodyBlock

Features

20 State

Machines

45 % Utilization

DTT [3][7]

DTT [3][6]

DTT [3][5]

DTT [3][4]

DTT [3][3]

DTT [3][2]

DTT [3][1]

DTT [3][0]

DTT [3][11]

DTT [3][10]

DTT [3][9]

DTT [3][8]

DTT [2][7]

DTT [2][6]

DTT [2][5]

DTT [2][4]

DTT [2][3]

DTT [2][2]

DTT [2][1]

DTT [2][0]

DTT [2][11]

DTT [2][10]

DTT [2][9]

DTT [2][8]

DTT [1][7]

DTT [1][6]

DTT [1][5]

DTT [1][4]

DTT [1][3]

DTT [1][2]

DTT [1][1]

DTT [1][0]

DTT [1][11]

DTT [1][10]

DTT [1][9]

DTT [1][8]

DTT [0][7]

DTT [0][6]

DTT [0][5]

DTT [0][4]

DTT [0][3]

DTT [0][2]

DTT [0][1]

DTT [0][0]

DTT [0][11]

DTT [0][10]

DTT [0][9]

DTT [0][8]

FFE [1][3]

FFE [1][2]

FFE [1][1]

FFE [1][0]

FFE [0][3]

FFE [0][2]

FFE [0][1]

FFE [0][0]

FFE: 64 cores / chip

256-512 threads

DTT: 48 DTT tiles/chip

240 tree processors

2880 trees/chip

Page 10: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine
Page 11: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Catapult V1 Accelerator Card

11

• Altera Stratix V D5

• 172.6K ALMs, 2014 M20Ks

• 457KLEs

• 1 KLE == ~12K gates

• M20K is a 2.5KB SRAM

• PCIe Gen 2 x8, 8GB DDR3

• 20 Gb network among

FPGAs

Stratix V

8GB DDR3

PCIe Gen3 x8

Page 12: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Mapped Fabric into a Pod

12

• Low-latency access to a local FPGA

• Compose multiple FPGAs to accelerate large workloads

• Low-latency, high-bandwidth sharing of storage and memory across server boundaries

Server 1

FPGA

Server 48

FPGA

Top Of Rack Switch (TOR)

Server 2

FPGA

Server 47

FPGA

Server 3

FPGA

Server 46

FPGA

Server 4

FPGA

Server 45

FPGA

Server 23

FPGA

Server 26

FPGA

Server 24

FPGA

Server 25

FPGA

… …DTWS DTWS

DTWS DTWS

DTWS DTWS

DTWS DTWS

DTWS DTWS

DTWS DTWS

S0 S0

S0 S0

S0 S0

S1 S1

S2 S2

S2 S2

10G

b E

thern

et

Lin

ks

FPGA Torus

Page 13: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

1,632 server pilot deployed in production datacenter13

Page 14: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine
Page 15: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Catapult V2 Architecture

15

CPU CPU FPGA

NIC

DRAM DRAM DRAM

WCS 2.0 Server Blade (Mt. Hood) Catapult V2 (Pikes Peak)

Gen3 2x8

Gen3 x8

QPI Switch

QSF

P

QSF

P

QSF

P

40Gb/s

40Gb/s

WCS Gen4.1 Blade with Mellanox NIC and Catapult FPGA

Pikes Peak

WCS Tray

Backplane

Option Card

Mezzanine

Connectors

Catapult WCS Mezz card

(Pike’s Peak)

Page 16: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Slot DMA

Engine

Config

Flash

(RSU)

DDR3

Controller

JTAG

Temp.

Shell

I2C

4256 Mb QSPI

Config Flash

4 GB DDR3

72

8

SEU

40G

MAC

Server NIC Top-of-Rack Switch

Host

CPU

8

Clock

128

4 4

256

40G

MAC

512

FIFO

40G Bypass Ctrl

FIFO

256 256 256 256

ASLs

User Logic

Role

x8 PCIe Core

(HIP 0)

x8 PCIe Core

(HIP 1)

Shell

Abstractions

Can put NVMe

controllers in the

shell, how to use?

Page 17: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Bing Production Results

software

FPGA

99.9% Query Latency versus Queries/sec

HW

vs.

SW

Late

ncy a

nd

Lo

ad

average software load

99.9% software latency

99.9% FPGA latency

average FPGA query load

Page 18: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Gone to Scale: Azure SmartNIC Host

SmartNIC

FPGA

ToR

NIC ASIC

SmartNIC

CPU

Page 19: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Scalable Hardware Microservice fabric

Web search

ranking

Traditional software (CPU) server plane

QPICPU

ToR

FPGA

CPU

Hardware acceleration plane

Interconnected FPGAs form a separate plane of computation

Can be managed and used independently from the CPU

Web search

ranking

Deep neural

networks

SDN offload

SQL

11010100101110110011100111100111011 01100111001111001110

00011010

11010

11010

Page 20: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Accelerators as a First-Class Network Citizen

0

5

10

15

20

25

1 10 100 1000 10000 100000 1000000

Ro

un

d-T

rip

Late

ncy (u

s)

LTL L0 (same TOR)

LTL L1

Example L0 latency histogram

Example L1 latency histogram

Examples of L2 latency histograms for different pairs of

FPGAs

Number of Reachable Hosts/FPGAs

Catapult Gen1 Torus(can reach up to 48 FPGAs)

LTL Average

LatencyLTL 99.9th Percentile

6x8 Torus Latency

LTL L2

10K 100K 250K

Page 21: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Example: Bing Hardware Microservices

CPU FPGA CPU FPGA

Page 22: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

AlexNet

ResNet-50

GNMTBERT-L

GPT-2

Megatron

1

10

100

1000

10000

2010 2012 2014 2016 2018 2020

Mill

ions

of p

ara

me

ters

AlexNet

ResNet-50

BERT-L

GPT-2

Megatron

1

10

100

1000

10000

100000

2010 2012 2014 2016 2018 2020

Billio

ns

of

op

s

~325x ResNet50

(8.3B parameters) ~2200x ResNet50

(17.6T ops)

Microsoft Azure Highly ConfidentialDo not share, forward or copy without MS CELA & MS PR Approval

Page 23: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Project Brainwave Engine

Page 24: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Using Project Brainwave for land use mapping

Data Build Train Deploy Intelligent Apps

ActionIntelligenceData

Stored on Azure Premium

Storage

Azure Machine Learning

1001

Land Classification ModelResNet-50

NAIP Data

20TB, 200M images

Visual Studio Tools for AI

Geo AI Data Science

Virtual Machine

Using FPGAs for ultra-fast inferencing different types of land use for the entire United States

( ESRI NAIP Data, 20+ TB), ~10 minutes, $42

Azure Batch AI

Ultra-fast Inferencing using FPGAs

Satellite Images

for the Entire US

Location: Kent Island Year: 2015

Page 25: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

0

2000

4000

6000

AI

TFLO

PS

per

rack

DNN Inference Performance on Brainwave

NPUs

CPU Stratix 5 FPGA Arria 10 FPGA Stratix 10 FPGA

Brainwave NPUs are on steep growth trajectory for low-batch DNN inference

F

C

F

C

TOR

F

C

50G

T1

T2

TOR

T1

F

F

F

F

F

F

F

9x50G

FPGA

Dual Socket

CPUs

50GNIC

PCIeGen3 x16

50G

Bing Compute

Server

CP

U

50G

FPGA

FPGA

FPGA FPGA

FPGA

FPGA

Bing FPGA

Appliances

FPGA

FPGA

50G

NIC

F

F

F

F

F

F

F

F

F

F

F

F

F

F

F

F F

25

Page 26: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Microsoft Azure Highly ConfidentialDo not share, forward or copy without MS CELA & MS PR Approval

Page 27: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Software Developers Should Program Hardware

C

Assembly

PDP-11 System/370

???

HDL

Intel FPGA Xilinx FPGA ASIC

Page 28: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

ToR ToR ToR ToR

CS CS CS CS

ToRToRToRToR

Page 29: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Project Catapult + Brainwave History

2011: Project Catapult Launched

2013: Bing pilot runs decision trees 40X faster

2015: Bing ranking throughput increased 2X

2016: Azure Accelerated Networking delivers industry-leading cloud

performance

2017: Over 1M servers deployed with FPGAs at hyperscale

2017: Hardware Microservices harness FPGAs for distributed

computing

2017: FPGAs enable real-time AI, ultra-low latency inferencing without

batching; Bing launches first FPGA-accelerated Deep Neural Network

2018: Project Brainwave launched in Azure Machine Learning

Field Programmable Gate Arrays

29

Page 30: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine
Page 31: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine
Page 32: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

32

Backup slides

Page 33: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Demo | AI for Earth

http://www.microsoft.com/aiforearth

33

Page 34: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

References

Pixel-Level Land Cover Classification Using the Geo AI Data Science Virtual Machine and Batch AIhttps://blogs.technet.microsoft.com/machinelearning/2018/03/12/pixel-level-land-cover-classification-using-the-geo-ai-data-science-virtual-machine-and-batch-ai/

How to Use FPGAs for Deep Learning Inference to Perform Land Cover Mapping on Terabytes of Aerial Images https://blogs.technet.microsoft.com/machinelearning/2018/05/29/how-to-use-fpgas-for-deep-learning-inference-to-perform-land-cover-mapping-on-terabytes-of-aerial-images/

34

Page 35: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

35©Microsoft Corporation

Azure

Rough: Storyboard Notes/Flow of Deck

Straw-man flow:

• Azure all up slide

• Cloud inflection point

• Azure global deployment and scale

• Digital transformation and what it means to companies

• Cloud and edge story. Supercomputer, Sphere (security) Starbucks example

• Data centers. Dublin DC, underwater DC

• Palix, Re-thinking traditional storage system design, co-designing future hardware & software infrastructure for the cloud and building a completely new storage system

• AI: OpenAI deal; general trend toward super AI

• End on an aspirational note…i.e. quantum

Page 36: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

36©Microsoft Corporation

Azure

Links to other deck:

1. [Link to top level folder]

2. Mark Russinovich Presentations including

a) Mark LinkedIn deck

b) Inside Azure Architecture

c) Mark’s Build deck

Page 37: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

37

Page 38: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Azure Datacenters & Architecture

38

Page 39: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Azure global infrastructure54 total Azure regions: 44 generally available + 10 coming soon

39

Page 40: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

3x BeastAzure servers: General purpose

Gen 2

Processor 2 x 6 Core 2.1 GHz

Memory 32 GiB

Hard Drive 6 x 500 GB

SSD None

NIC 1 Gb/s

Godzilla

Processor 2 x 16 Core 2.0 GHz

Memory 512 GiB

Hard Drive None

SSD 9 x 800 GB

NIC 40 Gb/s

Beast

Processor 4 x 18 Core 2.5 GHz

Memory 4096 GiB

Hard Drive None

SSD4 x 2 TB NVMe, 1 x 960 GB SATA

NIC 40 Gb/s

Gen 6

Processor2 x Skylake 24 Core 2.7GHz

Memory 768GiB DDR4

Hard Drive None

SSD4 x 960 GB M.2 SSDsand 1 x 960 GB SATA

NIC 40 Gb/s + FPGA

Beast v2

Processor 8 x 28 Core 2.5 GHz

Memory 12 TiB

Hard Drive None

SSD4 x 2 TB NVMe, 1 x 960 GB SATA

NIC 50 Gb/s

Optimized Gen 7

Processor2 x 26 core Cascade Lake

Memory 192 GB DDR4

Hard Drive None

SSD 5 x 960 GB M.2 NVMe

NIC 50 Gb/s + FPGA

Optimized Gen 6

Processor2 x 24 core Skylake Lake

Memory 192 GB DDR4

Hard Drive None

SSD 4 x 960 GB M.2 NVMe

NIC 40 Gb/s + FPGA

Azure Sphere

Processor 2 x M4 Core @ 200 MHz

Memory 64KB RAM

WiFi 2.4/5.0 GHz 802.11 b/g/n

0.00000000533 Beast

40

Page 41: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Azure architecture

Azure Portal CLI3rd

party

Resource

ProviderCompute Networking Storage

PaaS

offering

Service

FabricAKS ACI Functions

Event

Grid

Azure Fabric Controller

Hardware Manager

Azure infrastructure

Au

then

ticati

on

Tele

metr

y a

nd

in

sig

hts

RB

AC

Azure Resource Manager

41

Page 42: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Web Application Firewall integration at edge

• IP restriction

• Geo filtering

• http parameters

• request methods

• size restriction

Microsoft Global Network

CDN

IP restriction rules in region

allows access from AFD only

WAF

Front Door

Region 1

Web Application

Region 2

Web Application

SQLi, XSS

Malicious Bots

42

Page 43: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Secure Application ContainersCompartmentalize code for agility, robustness & security

On-chip Cloud Services Provide update, authentication, and connectivity

Custom Linux kernelEmpowers agile silicon evolution and reuse of code

Security Monitor

Guards integrity and access to critical resources

App Containers for POSIX (on Cortex-A)

App Containers for I/O (on Cortex-Ms)

OS

Layer 4

On-chip Cloud ServicesOS

Layer 3

HLOS KernelOS

Layer 2

Security MonitorOS

Layer 1

Azure Sphere MCUsHardware

Azure Sphere OS Architecture

43

Page 44: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Simplify development

Focus your device development effort

on the value you want to create

Streamline debugging

Experience interactive, context-aware

debugging across device and cloud

Collaborate across your team

Apply tool-assisted collaboration across

your entire development organization

44

Page 45: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Azure SmartNIC – Accelerating SDN

Page 46: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Inside Azure Servers

46

Page 47: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Gen 2

Processor 2 x 6 Core 2.1 GHz

Memory 32 GiB

Hard Drive 6 x 500 GB

SSD None

NIC 1 Gb/s

Gen 3

Processor 2 x 8 Core 2.1 GHz

Memory 128 GiB

Hard Drive 1 x 4 TB

SSD 5 x 480 GB

NIC 10 Gb/s

HPC

Processor 2 x 12 Core 2.4 GHz

Memory 128 GiB

Hard Drive 5 x 1 TB

SSD None

NIC 10 Gb/s IP, 40 Gb/s IB

Gen 4

Processor 2 x 12 Core 2.4 GHz

Memory 192 GiB

Hard Drive 4 x 2 TB

SSD 4 x 480 GB

NIC 40 Gb/s

Godzilla

Processor 2 x 16 Core 2.0 GHz

Memory 512 GiB

Hard Drive None

SSD 9 x 800 GB

NIC 40 Gb/s

Gen 5.1

Processor 2 x 20 Core 2.3 GHz

Memory 256 GiB

Hard Drive None

SSD6 x 960 GB PCIe Flash and 1 x 960 GB SATA

NIC 40 Gb/s + FPGA

GPU Gen 5

Processor 2 x 8 Core 2.6 GHz

Memory 256 GiB

Hard Drive 1 x 2 TB

SSD 1 x 960 GB SATA

NIC 40 Gb/s

GPU 2 x 2 Compute GPU

Beast

Processor 4 x 18 Core 2.5 GHz

Memory 4096 GiB

Hard Drive None

SSD4 x 1920 GB NVMeand 1 x 960 GB SATA

NIC 40 Gb/s

Gen 6

Processor2 x Skylake 24 Core 2.7GHz

Memory 768GiB DDR4

Hard Drive None

SSD4 x 960 GB M.2 SSDsand 1 x 960 GB SATA

NIC 40 Gb/s

FPGA Yes

RAM

Cores

Mega Beast

Processor 4 x 18 Core 2.5 GHz

Memory 12,288 GiB

Hard Drive None

SSD4 x 1920 GB NVMeand 1 x 960 GB SATA

NIC 40 Gb/s

47

Page 48: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Cold Aisle Servicing

Intel, AMD, ARM64 CPUs

High density GPU expansion for HPC/AI

NVM (DRAM+battery) and 3DXP for low-latency

High density HDD and Flash expansion

Microsoft custom designed SSDs

40 going up to 50 Gbps networking

Accelerated VMs using FPGAsMicrosoft SSD

48

Page 49: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

0.0E+00

5.0E+22

1.0E+23

1.5E+23

2.0E+23

Digital Universe

Installed Capacity

Source: IDC

0%

20%

40%

60%

80%

100%

2020 2030 2040

Perc

en

tag

e o

f

Dig

ital U

niv

ers

e

Data that can be stored

Data that has to be discarded

Byte

s

49

Page 50: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

50

Page 51: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

0.0E+00

5.0E+22

1.0E+23

1.5E+23

2.0E+23

Digital Universe

Installed Capacity

Source: IDC

0%

20%

40%

60%

80%

100%

2020 2030 2040

Perc

en

tag

e o

f

Dig

ital U

niv

ers

e

Data that can be stored

Data that has to be discarded

Byte

s

51

Page 52: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Norbert Wiener, U.S. News and World Rep, 1964; Mikhail Neiman, Radiotechnika, 1965; Church et al, Science, 2012; Goldman et al, Nature, 2013

Let’s store data in DNA!

The number of

transistors doubles

every 18-24 months

The industry

roadmaps are based

on that continued

rate of improvement

Arrange the atoms

the way we want

DNA molecules use approximately 50 atoms for one bit

52

Page 53: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Bases:

Data: Bits Base

00 A

01 C

10 G

11 T

Simple mapping:

Store data in synthetic DNA strands

150 to 300 bases

G T

TT G GG T G GT

10000111001001

53

Page 54: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

VS.

Cold Storage: 1EB

Size: Two Walmart Supercenters

It’s here!

54

Page 55: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

1.0E-02

1.0E-01

1.0E+00

1.0E+01

1.0E+02

1.0E+03

1.0E+04

1.0E+05

1.0E+06

1.0E+07

1.0E+08

1.0E+09

1.0E+10

Disk (Rot) Disk (SSD) Flash (Chip) Tape Optical DNA

Gb

/mm

3

Today Projection Limit

redundancy

spacing

metadata

Potential of DNA:

>107 improvement

3-5 5 10-305 1000+Lifetime (years) 10-unlimited

55

Page 56: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

(Credit: Philipp Stössel/ETH Zurich)

DNA “synthetic fossils” survive 2,000 – 2,000,000 years

Extreme density makes these conditions cheap and easy to keep56

Page 57: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

800 Kbases/day 80 Gbases/day 10 Gbases/day

Size of a mainframe Size of a workstation Size of a portable SSD

Same medium as read technology improves:G T TG

57

Page 58: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Same medium as read technology improves:G T TG

Medium changes as read and write technology improves:

58

Page 59: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

ProblemHow to improve performance for batch-of-1 inferences at scale with low cost?

Solution

Hardware accelerated models using FPGAs

Goals

Provide large-scale, performant, and affordable real-time AI inference services

59

Page 60: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Real-time AI at cloud scale with phenomenal performance and affordable cost

Ultra-fast DNN performance with accelerated ResNet50Extreme speed: Object classification on FPGA in <1.8 ms per image

Affordable: Only 20 cents per million images during preview

http://aka.ms/aml-real-time-ai

Models are easy to create and deploy into Azure cloud

Write once, deploy anywhere – to intelligent cloud or edge

Manage and update your models using Azure ML & IoT Edge

60

Page 61: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Data Build Train Deploy Intelligent Apps

ActionIntelligenceData

Stored on Azure Premium

Storage

Azure Machine Learning

1001

Land Classification ModelResNet-50

NAIP Data

20TB, 200M images

Visual Studio Tools for AI

Geo AI Data Science

Virtual Machine

Using FPGAs for ultra-fast inferencing different types of land use for the entire United States

( ESRI NAIP Data, 20+ TB)

Azure Batch AI

Ultra-fast Inferencing using FPGAs

Satellite Images

for the Entire US

Location: Kent Island Year: 2015

61

Page 62: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Image Answer Answers for Multimedia Images

QP MLBERT Quantus query passage ranking

Bling SD Web Intelligence model to produce semantic document body

Vortex Intelligent Search (QnA, Captions) & Ranking

WaveNet & QRNN Text to speech for Bing Mobile & Microsoft Assistant Commute

ANN Approximate nearest neighbor search

TP1, Deep Think v2, Deep Think v2.1, Deep Scan, Deep

Scan v2, Deep Scan v2.1, Deep Read, Poly Math v4.1,

Poly Math v5, Poly Math 5.1, Universal PM1, AGI

encoder/decoder

Intelligent Search (QnA, Captions) & Ranking

LiRIC List Reading and Inference using Comprehension

ResNet-50, ResNet-152, DenseNet-121, VGG-16, SSD-

VGG

Computer vision scenarios for Azure ML/3P

Universal Deep Think v1 Intelligent Search (QnA, Captions) & Ranking

Smart Reply, Meeting Insights, Key Points Intelligence for M365

Scenario &

Model Analysis

HW Development

(if needed)

Implementation

& Optimization

Quantization &

Fine-Tuning

Flighting & PROD

Deployment

62

Page 63: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

EFFICIENCYFLEXIBILITY

FPGAs

Contro

l Unit

(CU)

Registers

Arithmeti

c Logic

Unit

(ALU)

CPUs GPUsASICsNPUsSoft

NPUs

63

Page 64: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

http://aka.ms/rtai

[email protected]

Models are easy to create and deploy into Azure cloud

Write once, deploy anywhere – to intelligent cloud or edge

Manage and update your models using Azure IoT Edge

64

Page 65: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Project Natick

65

Page 66: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Retardance Polarization Angle

Data stored in quartz glass Written using femtosecond lasers

25µm

66

Page 67: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

Storage Stamp

REST FILE

DFS Layer

Intra-stamp replication

LB

Partition Layer

Front-Ends

Storage Stamp

LB

Partition Layer

Front-Ends

DFS Layer

Intra-stamp replication

Geo replication

67

Page 68: Doug Burger - HPTS · Edge site 3. 5. Silicon scaling will end because of economics, not physics Minimal ... forward or copy without MS CELA & MS PR Approval. Project Brainwave Engine

68