System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O

Page 1: System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O

Kshitij Sudan*, Saisanthosh Balakrishnan§, Sean Lie§, Min Xu§, Dhiraj Mallick§, Gary Lauterbach§, Rajeev Balasubramonian*

Page 2: System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O

HPCA-2013

Exec Summary

• Focus on web-scale applications
• Contribution 1: use of simple cores
• This amplifies the power/cost contribution of the I/O subsystem
• Contribution 2: virtualize I/O, e.g., single disk shared by many cores
• Contribution 3: software stack optimizations
• Contribution 4: evaluations on a production-quality real design

Page 5: System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O

Web Scale Applications

• Targeting datacenter platforms
• Focus on power and cost (OpEx and CapEx)
• Web-scale applications have large datasets, high concurrency, high communication, high I/O
  – e.g., MapReduce
• Typically, performance increases as cluster size grows, but so do power and cost

Page 6: System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O

Energy Efficient CPUs

• For embarrassingly parallel workloads, energy per instruction (EPI) is important

• For a given power/energy budget, many low-EPI cores can yield a higher throughput than a few high-EPI cores

• Hence, use many light-weight energy-efficient CPUs (Atom CPU at 8.5 W)
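
The EPI argument above can be checked with a line of arithmetic: power (joules per second) divided by energy per instruction (joules) gives instructions per second. The sketch below is illustrative only. The two EPI values and the function name `throughput_ips` are hypothetical, not figures from the talk, and the model ignores non-CPU power and assumes the workload parallelizes perfectly.

```python
def throughput_ips(power_budget_w: float, epi_joules: float) -> float:
    """Instructions per second sustainable within a power budget.

    Watts are joules per second, so W / (J per instruction) = instructions/s.
    """
    if epi_joules <= 0:
        raise ValueError("EPI must be positive")
    return power_budget_w / epi_joules


# Hypothetical EPI values, chosen only to illustrate the ratio.
LOW_EPI_J = 1e-9    # a light-weight core
HIGH_EPI_J = 4e-9   # a high-performance core
BUDGET_W = 3500.0   # same budget for both designs

low = throughput_ips(BUDGET_W, LOW_EPI_J)
high = throughput_ips(BUDGET_W, HIGH_EPI_J)
print(f"low-EPI design sustains {low / high:.1f}x the instruction throughput")
```

Under a fixed budget the throughput ratio is simply the inverse of the EPI ratio, which is why the argument only holds for workloads that can keep many slow cores busy.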

Page 7: System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O

Contribution of the I/O Sub-System

• With light-weight cores, the energy and cost contributions of “other” components grow
  – Intel Atom CPU + chipset = 11 W
  – Typical disk or Ethernet card = 5-25 W
  – Fans, power supplies, etc.
• The application uses only 20-60 MB/s of disk bandwidth, while the disk has a peak read bandwidth of 120 MB/s
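
A rough sketch of what these numbers imply, using only the figures quoted on this slide. It assumes one I/O device per CPU and ignores fans and power supplies, so the shares are illustrative rather than measured; the helper names are hypothetical.

```python
CPU_CHIPSET_W = 11.0           # Atom CPU + chipset (from the slide)
DEVICE_W_RANGE = (5.0, 25.0)   # typical disk or Ethernet card (from the slide)
DISK_PEAK_MBPS = 120.0         # peak read bandwidth (from the slide)
APP_MBPS_RANGE = (20.0, 60.0)  # bandwidth the application uses (from the slide)


def device_power_share(cpu_w: float, device_w: float) -> float:
    """Fraction of (CPU + one device) power drawn by the device."""
    return device_w / (cpu_w + device_w)


def utilization(used_mbps: float, peak_mbps: float) -> float:
    """Fraction of the disk's peak bandwidth actually used."""
    return used_mbps / peak_mbps


shares = [device_power_share(CPU_CHIPSET_W, w) for w in DEVICE_W_RANGE]
utils = [utilization(u, DISK_PEAK_MBPS) for u in APP_MBPS_RANGE]

print(f"device power share: {shares[0]:.0%} to {shares[1]:.0%}")
print(f"disk utilization:   {utils[0]:.0%} to {utils[1]:.0%}")
```

Even a single device can draw a sizeable share of the power of an 11 W CPU and chipset, while at least half of the disk's peak bandwidth goes unused.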

Page 8: System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O

[Figure: Atom TeraSort, aggregate disk bandwidth. Y-axis: disk BW (MB/sec), 0 to 160. Series: read, write, and a moving average of each.]

Wasting energy on over-provisioned resources

Page 9: System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O

Cluster-in-a-Box with Virtualized I/O

• Use energy-efficient CPUs
  – ~10x more CPUs in the same power budget than with typical server-class CPUs
• Virtualize I/O devices (disk and Ethernet)
  – Balanced resource provisioning and lower cost/power
• Amortize fixed server overheads by sharing components
  – Fans, power supplies, etc.

Page 10: System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O

Compute Cards

Compute card: 6 CPUs share 4 ASICs (PCIe connection); the ASICs implement the fabric; 4 GB of DDR2 memory per CPU on the back of the card

Page 12: System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O

Logical Organization

[Figure: logical organization. Labels:]

• Compute card: CPU + chipset, ASIC
• 3D-torus interconnect formed by the ASICs
• E-Cards (Ethernet FPGA): up to 8 per system, each with 8x1 GbE or 2x10 GbE
• S-Cards (Storage FPGA): up to 8 per system, each with 8x SATA HDD/SSD
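
For readers unfamiliar with the topology, here is a minimal sketch of how nodes in a 3D torus are connected. Each node links to its two neighbors along each of three axes, with wrap-around at the edges. The slides do not give the torus dimensions, so `DIMS` is a hypothetical value, and the function names are illustrative, not part of the system's software.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]

DIMS: Coord = (4, 4, 4)  # hypothetical; the talk does not state the dimensions


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the six wrap-around neighbors of a node in a 3D torus."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            nxt = list(node)
            nxt[axis] = (nxt[axis] + step) % dims[axis]
            neighbors.append((nxt[0], nxt[1], nxt[2]))
    return neighbors


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimum hop count between two nodes, taking the shorter way round each ring."""
    total = 0
    for x, y, size in zip(a, b, dims):
        direct = abs(x - y)
        total += min(direct, size - direct)
    return total


print(torus_neighbors((0, 0, 0), DIMS))
print(torus_hops((0, 0, 0), (3, 3, 3), DIMS))
```

The wrap-around links are what keep the worst-case hop count at half the ring length per axis, rather than the full length as in a mesh.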

Page 13: System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O

Physical Organization

[Figure: physical organization, showing the S-Card, E-Card, compute card, midplane interconnect, and HDD/SSD]

Page 14: System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O

Cluster-in-a-Box Summary

• 768 CPU cores interconnected using a high-bandwidth fabric in a 3D torus topology
  – Low-latency distributed fabric architecture based on low-power ASICs
• FPGAs implement the disk and Ethernet controllers
• Fabric and FPGAs implement I/O virtualization
  – Up to 64 disks shared by 384 server nodes
• Server nodes don’t require a rack-top switch to communicate
  – All internal cluster communication via fabric
• Entire cluster consumes < 3.5 kW under full load
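
The sharing ratios implied by this summary follow from simple division. The sketch below uses only numbers quoted in the talk (768 cores, 384 nodes, 64 disks, and the 120 MB/s peak read bandwidth mentioned earlier). The "fair share" figure assumes every node sharing a disk streams at the same time, which is a simplification rather than something the slides state.

```python
CPU_CORES = 768
SERVER_NODES = 384
MAX_DISKS = 64
DISK_PEAK_MBPS = 120.0  # peak read bandwidth quoted earlier in the talk

cores_per_node = CPU_CORES // SERVER_NODES   # cores in each server node
nodes_per_disk = SERVER_NODES // MAX_DISKS   # nodes sharing one virtualized disk

# Assumption: all sharers read at once and split the peak bandwidth evenly.
fair_share_mbps = DISK_PEAK_MBPS / nodes_per_disk

print(f"{cores_per_node} cores per node, {nodes_per_disk} nodes per disk")
print(f"even split of one disk: {fair_share_mbps:.0f} MB/s per node")
```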

Page 15: System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O

System Software Improvements

• Implement large SATA packet sizes to reduce disk seek overheads

• Other OS/Ethernet configuration knobs: avoid journaling in the filesystem, jumbo TCP/IP frames, interrupt coalescing

• MapReduce configuration: designate the few nodes near the S-cards as DataNodes
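
Why larger requests reduce seek overhead can be seen with a generic first-order disk model, sketched below. It is not taken from the talk: it assumes every request pays one fixed positioning delay, and the 8 ms value is a hypothetical placeholder. Only the 120 MB/s peak bandwidth comes from the slides.

```python
def effective_mbps(request_mb: float,
                   seek_s: float = 0.008,      # hypothetical positioning delay
                   peak_mbps: float = 120.0) -> float:
    """Throughput when each request costs one seek plus its transfer time."""
    transfer_s = request_mb / peak_mbps
    return request_mb / (seek_s + transfer_s)


for size_mb in (0.064, 0.5, 4.0):
    print(f"{size_mb:>6.3f} MB requests -> {effective_mbps(size_mb):6.1f} MB/s")
```

In this model throughput approaches the peak as request size grows, because the fixed seek cost is spread over more data. With many nodes interleaving requests on one shared disk, seeks between requests become more likely, which is where larger requests help most.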

Page 16: System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O

Methodology

• Compare two cluster designs with the same power envelope to evaluate TCO and power for cluster architectures
  – 17-node Core i7 CPU-based cluster (baseline) and a 384-node Atom cluster-in-a-box
  – 4 kW Core i7 cluster; 3.5 kW Atom cluster-in-a-box
  – Four Apache Hadoop benchmarks
  – TCO calculations based on Hamilton’s model
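
To put the two configurations side by side, the sketch below divides each cluster's power envelope by its node count. The inputs are the figures from this slide. The result is a whole-system average per node (including I/O, fans, and power supplies), not CPU power, and the helper name is illustrative.

```python
# (node count, cluster power envelope in watts), from the slide
CLUSTERS = {
    "Core i7 (baseline)": (17, 4000.0),
    "Atom cluster-in-a-box": (384, 3500.0),
}


def watts_per_node(nodes: int, total_w: float) -> float:
    """Average system power per node, overheads included."""
    return total_w / nodes


for name, (nodes, total_w) in CLUSTERS.items():
    print(f"{name}: {watts_per_node(nodes, total_w):.1f} W per node")

node_ratio = CLUSTERS["Atom cluster-in-a-box"][0] / CLUSTERS["Core i7 (baseline)"][0]
print(f"node count ratio: {node_ratio:.1f}x")
```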

Page 17: System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O

Execution Time Results

Execution time (mins)

Benchmark    Atom     Core i7
TeraGen      9.5      23.68
TeraSort     34.26    98
WordCount    6.11     5.66
GridMix      34.48    65.63
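
The execution times on this slide can be turned into speedups with a short calculation. The sketch assumes the first data series in the chart is the Atom cluster and the second is the Core i7 cluster, following the legend order; that mapping is an inference from the chart labels.

```python
# Execution time in minutes: (Atom cluster-in-a-box, Core i7 baseline).
# Series order is assumed from the chart legend.
TIMES_MIN = {
    "TeraGen": (9.5, 23.68),
    "TeraSort": (34.26, 98.0),
    "WordCount": (6.11, 5.66),
    "GridMix": (34.48, 65.63),
}


def speedup(atom_min: float, i7_min: float) -> float:
    """Baseline time divided by Atom time; above 1 means the Atom cluster is faster."""
    return i7_min / atom_min


speedups = {name: speedup(*times) for name, times in TIMES_MIN.items()}
for name, value in speedups.items():
    print(f"{name:<10} {value:.2f}x")
```

A value below 1 marks a benchmark where the baseline cluster finished sooner.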

Page 18: System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O

Improvement in EDP

[Figure: bar chart. Y-axis: % change in Perf./W-h, -100% to 700%. Bar values, in order: 329%, 606%, -34%, 273%.]

Page 19: System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O

Improvement in Energy

Change in Perf./Watt

Benchmark    Change
TeraGen      75.50%
TeraSort     147.75%
WordCount    -15.38%
GridMix      46.96%

Page 20: System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O

Performance/TCO vs. Number of Disks and Number of Cores

Page 21: System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O

Conclusions

• Datacenter power and cost are limiting factors when scaling web-scale apps
  – Build clusters using light-weight, low-power CPUs
• Balanced resource provisioning can improve utilization, cost, power
  – Virtualize I/O (disk and Ethernet)
  – Amortize the overheads of fans, power supplies, etc.
• The cluster-in-a-box system yields up to 6x improvement in EDP, relative to a traditional cluster

Page 22: System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O

Questions?

Thank You

Page 23: System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O

CPU and Disk Utilization

[Figure panels: 768 CPUs, 64 disks; 64 CPUs, 32 disks]