Post on 30-Aug-2020

ASIC Clouds: Specializing the Datacenter

Ikuo Magaki, Moein Khazraee, Luis Vega Gutierrez, and Michael Bedford Taylor

UC San Diego and Toshiba

Presented By: Vandit Agarwal

Motivation

• GPU- and FPGA-based clouds are already successful

• ASIC Clouds have also been used successfully (Bitcoin mining)

• Extend this idea to build ASIC-based clouds for other applications

• Purpose-built datacenter

• Large arrays of ASIC accelerators

• Optimize Total Cost of Ownership (TCO)

• For increasingly common high-volume chronic computations

• Downside:

• High Non-Recurring Engineering (NRE) cost

• Inflexibility

Introduction

• Two visible trends:

• Heavy computation has moved to the cloud; interactive work has moved to the client

• Rise of dark silicon - drives specialization and near-threshold computation

• The conjunction of these two trends makes ASIC Clouds viable

• At the single-machine level, ASICs can offer at least an order-of-magnitude improvement - explore and propose the ASIC Cloud

• Identify key issues by studying Bitcoin ASIC Cloud

Objective In a Nutshell

• Two key metrics drive the development:

• H/w cost per performance = $ per op/s

• Energy per operation = W per op/s (equivalently, joules per op)

• Working with a joint knowledge/control over datacenter and h/w design

• Select single TCO-optimal point amongst many Pareto-optimal points

• ASIC Design: achieves reduction in silicon area and energy consumption

• ASIC Server: organization of ASIC, heat sinks, selective components, custom voltages

• ASIC Datacenter: optimize rack and datacenter level thermal distribution, costs such as provisioning cost, availability, taxes etc.

**To meet the requirements at datacenter level, modifications trickle down in the hierarchy
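A minimal sketch of how a single TCO-optimal point might be picked from many Pareto-optimal designs, using the two metrics above. The TCO model here (hardware cost plus a lifetime energy cost per watt) is a simplifying assumption, and the design names and numbers are purely hypothetical; the slides also list provisioning cost, availability, and taxes, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    name: str
    dollars_per_ops: float  # hardware cost per performance, $ per op/s
    watts_per_ops: float    # energy per operation, W per op/s


def pareto_front(designs):
    """Keep designs that no other design matches or beats on both metrics."""
    front = []
    for d in designs:
        dominated = any(
            o.dollars_per_ops <= d.dollars_per_ops
            and o.watts_per_ops <= d.watts_per_ops
            and (o.dollars_per_ops < d.dollars_per_ops
                 or o.watts_per_ops < d.watts_per_ops)
            for o in designs
        )
        if not dominated:
            front.append(d)
    return front


def tco_per_ops(design, dollars_per_watt_lifetime):
    """Assumed TCO model: hardware cost plus lifetime cost of the power drawn."""
    return design.dollars_per_ops + design.watts_per_ops * dollars_per_watt_lifetime


def tco_optimal(designs, dollars_per_watt_lifetime):
    """Pick the Pareto-optimal design with the lowest assumed TCO."""
    return min(pareto_front(designs),
               key=lambda d: tco_per_ops(d, dollars_per_watt_lifetime))


# Hypothetical design points (not from the paper).
candidates = [
    Design("cheap-but-hot", 1.0, 4.0),
    Design("balanced", 2.0, 2.0),
    Design("efficient-but-costly", 4.0, 1.0),
    Design("dominated", 3.0, 3.0),
]

best = tco_optimal(candidates, dollars_per_watt_lifetime=1.0)
print(best.name)
```

Changing `dollars_per_watt_lifetime` moves the optimum along the Pareto front: cheap energy favours the low-hardware-cost end, expensive energy favours the energy-efficient end.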

Specialization Hierarchy

[Figure: specialization hierarchy; labels: Off-PCB Interface, On-PCB Network, On-ASIC Interconnection Network]

ASIC Cloud Architecture

• Trying to create a generic skeleton for ASIC Cloud

• Heart of ASIC cloud - Replicated Compute Accelerator (RCA) - multiplied recursively

• Customization: e.g., if the RCA requires DRAM, then the ASIC contains shared DRAM controllers connected to ASIC-local DRAMs

[Figure: ASIC Cloud architecture; labels: Off-PCB Interface, On-PCB Network, On-ASIC Interconnection Network]

ASIC Server Overview

• Focussed on 1U 19-inch Rackmount servers

• Forced air-cooling system

• Air intake from front, removal from back

• Air at 30 °C

ASIC Server Evaluation Flow

• Given an implementation and architecture for the target RCA:

• VLSI tools used to map it to target process

• Analysis tools provide info on:

• Area

• Performance

• Power density

• Tune the following to find lowest TCO:

• No. of RCAs/Chip

• No. of chips/PCB

• Organization of chips on PCB

• Power delivery mechanism

• Cooling mechanism

• Choice of voltage

Thermally-Aware ASIC Server Design

• ASICs and DC/DC converters - major sources of heat

• Heat Sinks:

• Heat spreader glued to the heat source (die) using Thermal Interface Material (TIM)

• Spreader has fins - air is blown through them

• Increasing spreader size improves cooling

• Increasing the die size improves cooling - overcomes TIM resistance

• Developed a model:

• Input: fan curve, ASIC count/row

• Output: Optimal heat sink parameters

Arranging ASICs on PCB

More Chips vs Fewer Chips

• How large (in mm²) should each chip be?

• Determines how many RCAs will be on each chip

• Many small ASICs easier to cool than few large ASICs

• Increasing silicon area -> heat dissipation capacity increases (TIM)

• Large total die area in a row is effective

• Increasing no. of chips increases the packaging cost but not by much

Power Density and Server Cost

• For a given RCA, increasing power (W) increases performance

• Moving right (higher power density): temperature constraints allow very little total silicon per lane, and it must be divided into many smaller chips

• Cost is dominated by cooling and packaging

• Moving left (lower power density): more silicon per lane and fewer chips

• Cost is dominated by silicon area

Bitcoin

• Semi-anonymously and securely transfer money

• Blockchain - globally replicated public ledger of transactions

• A Byzantine-fault-tolerant distributed consensus mechanism (proof-of-work mining) determines whose transactions are added to the blockchain

• Mining:

• Machines request work from a pool server

• Hash - brute-force attempt at partial inversion of a cryptographically hard hash function

• Hashrate - rate of hashing - typically in gigahashes per second (GH/s)

• On success, other machines verify the result, then accept and append the block

What Led to Bitcoin ASIC Cloud?

• People are incentivized to mine:

• More machines = a more secure system

• Block reward (25 BTC = ~USD 11k in 2016)

• 144 blocks daily x 25 BTC per block = ~USD 1.5M daily

• Rising TCO justifies the increased investment in NRE and other development cost

• Leads to more specialization
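The reward arithmetic above can be checked with a few lines. This is only a sketch built on the slide's approximate 2016 figures; the constant names are made up for illustration.

```python
BLOCKS_PER_DAY = 144      # from the slide
BTC_PER_BLOCK = 25        # block reward quoted on the slide
USD_PER_BLOCK = 11_000    # approximate 2016 value quoted on the slide

btc_per_day = BLOCKS_PER_DAY * BTC_PER_BLOCK        # total BTC rewarded daily
usd_per_day = BLOCKS_PER_DAY * USD_PER_BLOCK        # total USD rewarded daily
usd_per_btc = USD_PER_BLOCK / BTC_PER_BLOCK         # implied exchange rate
minutes_per_block = 24 * 60 / BLOCKS_PER_DAY        # implied block interval

print(f"{btc_per_day} BTC/day, about USD {usd_per_day / 1e6:.2f}M/day")
print(f"implied price: USD {usd_per_btc:.0f}/BTC, one block every {minutes_per_block:.0f} min")
```

The product comes to roughly USD 1.58M per day, in line with the slide's rounded ~USD 1.5M.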

Bitcoin ASIC Trend

[Figure: Bitcoin ASIC trend; axis label: Difficulty]

Implementation

• 0.66 mm² silicon in UMC 28-nm process.

• Power density: 2 W/mm²

• Extremely high power density
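To put these two numbers together, a rough sketch of the implied power per RCA and per die follows. It assumes the whole die is tiled with RCAs and ignores I/O, control logic, and interconnect overhead; the 100 mm² die size is a hypothetical example, not a figure from the slides.

```python
RCA_AREA_MM2 = 0.66             # RCA silicon area from the slide
POWER_DENSITY_W_PER_MM2 = 2.0   # power density from the slide

# Power of a single RCA at the quoted power density.
rca_power_w = RCA_AREA_MM2 * POWER_DENSITY_W_PER_MM2


def rcas_per_die(die_area_mm2):
    """Whole RCAs that fit on a die, assuming no non-RCA overhead."""
    return int(die_area_mm2 // RCA_AREA_MM2)


def die_power_w(die_area_mm2):
    """Total RCA power on a die under the same no-overhead assumption."""
    return rcas_per_die(die_area_mm2) * rca_power_w


hypothetical_die_mm2 = 100.0
print(f"{rca_power_w:.2f} W per RCA")
print(f"{rcas_per_die(hypothetical_die_mm2)} RCAs, "
      f"{die_power_w(hypothetical_die_mm2):.1f} W on a {hypothetical_die_mm2:.0f} mm² die")
```

At about 1.32 W per RCA, even a modest die dissipates on the order of hundreds of watts, which is why the thermal and voltage choices matter so much for this design.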

Results

• More silicon -> optimal voltage decreases -> server efficiency increases

• Initially, costs reduce (right to left) but then silicon costs start building up

Voltage Stacking

• DC/DC power is significant

• Chips serially chained so that their supplies sum to 12V

• Leads to significant savings in the TCO-optimal case
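A small sketch of the voltage-stacking arithmetic: how many chips fit in one series chain on a 12 V rail, and what current the rail must carry. The per-chip voltages and power below are hypothetical, voltages are given in millivolts to keep the division exact, and the sketch assumes identical, perfectly balanced chips.

```python
def chips_per_stack(rail_mv, chip_mv):
    """Chips that can be chained in series on the rail, plus unused millivolts."""
    chips, leftover_mv = divmod(rail_mv, chip_mv)
    return chips, leftover_mv


def rail_current_a(chip_power_w, chip_mv):
    """Current through a series chain: the same current flows through every chip."""
    return chip_power_w / (chip_mv / 1000)


RAIL_MV = 12_000  # 12 V supply, as on the slide

for chip_mv in (400, 450, 500):  # hypothetical core voltages
    chips, leftover = chips_per_stack(RAIL_MV, chip_mv)
    print(f"{chip_mv} mV chips: {chips} per stack, {leftover} mV left over")

# A hypothetical 10 W chip at 400 mV draws 25 A; stacking 30 of them
# still draws 25 A from the 12 V rail instead of 30 x 25 A at 400 mV.
print(rail_current_a(10.0, 400))
```

Because the chain takes the rail voltage directly, the per-chip DC/DC conversion stage and its losses can in principle be avoided, which is where the savings come from.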

Litecoin ASIC Cloud

Video Transcoding ASIC Cloud

**Pareto points are glitchy because of variations in constants and polynomial order for server components as they vary with voltages

CNN ASIC Cloud

When is ASIC Cloud Feasible

Discussion

• This is one of the earlier attempts to create a general framework/skeleton for an ASIC cloud. How feasible do you think this technology is and how widely and how soon can we potentially adopt it for a large variety of applications?

• The authors recommend that open sourcing various tools by the cloud providers and silicon foundries would potentially lead to lower TCO. Is this a good solution? Why or why not?

• Which do you think is the better choice: investing heavily (high NRE) in more advanced nodes (e.g., 16 nm), or using/modifying older nodes (e.g., 65 nm) for an ASIC?

Bitcoin ASIC Cloud Design

• Repeatedly execute a Bitcoin hash operation

• Input: 512-bit block

• Mutate the block and perform SHA256 on it

• Fed into another round of SHA256

• Leading zero count performed and matched with the target

• 64 rounds in each SHA
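The steps above can be sketched in software with Python's standard `hashlib`. This is a simplified illustration, not the RCA's datapath or the real Bitcoin block-header format: the choice to mutate the block by overwriting its last four bytes with a nonce, the function names, and the small difficulty target are all assumptions made for the example.

```python
import hashlib
import struct


def leading_zero_bits(digest: bytes) -> int:
    """Count leading zero bits of a digest read as a big-endian integer."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def double_sha256(block: bytes, nonce: int) -> bytes:
    """Mutate the block with a nonce, then apply SHA-256 twice."""
    # Assumed mutation scheme: overwrite the last 4 bytes with the nonce.
    mutated = block[:-4] + struct.pack("<I", nonce)
    first = hashlib.sha256(mutated).digest()
    return hashlib.sha256(first).digest()


def mine(block: bytes, target_zero_bits: int, max_nonce: int = 1 << 20):
    """Brute-force nonces until the digest has enough leading zero bits."""
    for nonce in range(max_nonce):
        digest = double_sha256(block, nonce)
        if leading_zero_bits(digest) >= target_zero_bits:
            return nonce, digest
    return None  # no nonce in range met the target


block = bytes(64)  # a 512-bit (64-byte) all-zero input block
result = mine(block, target_zero_bits=8)
if result is not None:
    nonce, digest = result
    print(nonce, digest.hex())
```

Each extra required zero bit doubles the expected number of attempts, which is why the workload is a pure brute-force search that rewards replicating many identical hash units.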