EE382C Final Project Crouching Tiger, Hidden Dragonfly Alexander Neckar Camilo Moreno Matthew Murray Ziyad Abdel Khaleq

Page 1: EE382C Final Project Crouching Tiger, Hidden Dragonfly · cva.stanford.edu/classes/ee382c/projects/alex_camilo_matthew_ziyad.pdf

EE382C Final Project

Crouching Tiger, Hidden Dragonfly

Alexander Neckar

Camilo Moreno

Matthew Murray

Ziyad Abdel Khaleq

Outline

• Topology, consideration and layout

• Routing solution

• Mirroring and simulation

• Results and conclusion

Dragonfly Topology

Fully-connected local groups

Low hop count

Fast access to global links

Dragonfly Topology

Load balance:

Endpoints/router >= global links per router

~All traffic is bound for other groups. BW should fit.

Local links per router >= endpoints+global links

~All traffic needs to traverse a local link before and after the global link.

Adaptive Routing helps deal with adversarial traffic.

As long as overall BW is sufficient

And we have good backpressure

Considerations

Costs

Optical links drive cost

Minimize number, good utilization

Local links much cheaper

Overprovisioning helps feed global links

Physical layout

Fully-connected group size is limited by cable length (5 m cables)

Considerations

Power

Links dominate power

Traffic

Mostly limited in throughput by send window (RPC).

Some very large packets (RDMA).

Hotspots.

So... what?

Layout Considerations

Maybe as many as 60 racks per group!

Layout Considerations

Realistically, 34ish

Layout Considerations

Maximize racks per group?

routers on bottom slots, wire diagonally

Actually not a constraint

Balance / cost issues with very large groups.

100m optical cables

~70m square: 147 x 50 racks: >200K rack slots

Chips

Channels:

5 GB/s = 4 diff. pairs @ 10 Gb/s

1 optical cable

4 elec. cable pairs each direction
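The channel figure above can be checked with a little arithmetic. This is a minimal sketch that assumes raw signalling rates with no line-coding overhead; the constant names are illustrative and not from the slides.

```python
# Assumption: raw bit rates, ignoring any line-coding or protocol overhead.
PAIRS_PER_CHANNEL = 4   # differential pairs per channel, per direction
GBITS_PER_PAIR = 10     # Gb/s carried by each pair

channel_gbits = PAIRS_PER_CHANNEL * GBITS_PER_PAIR   # aggregate Gb/s
channel_gbytes = channel_gbits / 8                   # convert bits to bytes

print(f"{channel_gbits} Gb/s = {channel_gbytes} GB/s")
```

With an encoding such as 8b/10b the usable payload rate would be lower than this raw figure.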

Chip size is perimeter-driven

Buffers + crossbar are only a few mm².

High-radix requires large perimeter for I/O

Exploring options

Lots of guesstimation!

Basic

Topology: 13x26x13
Cost: 6.16M
Power: 68 kW
Router radix: 51
Optical links: 57291
Electrical links: 110175
Groups: 339
Endpoints/group: 338

>114k nodes

Balanced for uniform random
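The figures on this slide can be reproduced from the topology label alone. The sketch below reads the label as p x a x h (endpoints per router, routers per group, global links per router) and assumes a maximum-size dragonfly with exactly one global link between every pair of groups. The class and property names are illustrative, not taken from the project code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dragonfly:
    p: int  # endpoints per router
    a: int  # routers per group
    h: int  # global links per router

    @property
    def groups(self) -> int:
        # Each group has a*h global ports, one to every other group.
        return self.a * self.h + 1

    @property
    def endpoints_per_group(self) -> int:
        return self.a * self.p

    @property
    def nodes(self) -> int:
        return self.groups * self.endpoints_per_group

    @property
    def radix(self) -> int:
        # Endpoint ports + ports to the other a-1 local routers + global ports.
        return self.p + (self.a - 1) + self.h

    @property
    def optical_links(self) -> int:
        # One global link per pair of groups.
        return self.groups * (self.groups - 1) // 2

    @property
    def electrical_links(self) -> int:
        # Router-to-router links inside fully connected groups
        # (endpoint cables are not counted here).
        return self.groups * self.a * (self.a - 1) // 2


basic = Dragonfly(p=13, a=26, h=13)
print(basic.groups, basic.endpoints_per_group, basic.nodes)      # 339 338 114582
print(basic.radix, basic.optical_links, basic.electrical_links)  # 51 57291 110175
```

The same formulas also match the group, radix, and link counts quoted for the 10x32x10, 10x34x9, and 10x45x5 variants.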

Cheaper, better?

Topology: 10x32x10
Cost: 5.64M
Power: 70.7 kW
Router radix: 51
Optical links: 51360
Electrical links: 159216
Groups: 321
Endpoints/group: 320

Fewer optical cables

Overprovisioned in-group links

8.5% cheaper

4% higher power
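The two percentages can be checked against the rounded cost and power figures for the 13x26x13 and 10x32x10 configurations. This is a small sketch; `percent_change` is a helper introduced here for illustration.

```python
def percent_change(old: float, new: float) -> float:
    """Signed change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0


cost_change = percent_change(6.16, 5.64)    # cost figures as quoted on the slides
power_change = percent_change(68.0, 70.7)   # power in kW

print(f"cost {cost_change:+.1f}%, power {power_change:+.1f}%")
```

From the rounded inputs the cost saving comes out at about 8.4%, slightly under the quoted 8.5%; the slide's figure was presumably computed from unrounded costs.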

A little more savings

Topology: 10x34x9
Cost: 5.22M
Power: 70.5 kW
Router radix: 52
Optical links: 46971
Electrical links: 172227
Groups: 307
Endpoints/group: 340

90% of normal global links

Overprovisioned in-group links

Even cheaper

Any good?

What if...?

Topology: 10x45x5
Cost: 3.11M
Power: 65.9 kW
Router radix: 59
Optical links: 25425
Electrical links: 223740
Groups: 226
Endpoints/group: 450

Half the “necessary” global links

Very overprovisioned in-group links

Otherwise not 100K

Almost half the price!
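One way to read the "90%" and "half" labels is as the ratio of global links to endpoints on each router, with the in-group overprovisioning expressed as local links relative to endpoint-plus-global ports. The sketch below computes both ratios for the four candidates, again reading each label as p x a x h. The function name and the choice of ratios are assumptions made for illustration.

```python
# (p, a, h) for the four candidate topologies on the slides.
candidates = [(13, 26, 13), (10, 32, 10), (10, 34, 9), (10, 45, 5)]


def provisioning(p: int, a: int, h: int) -> tuple[float, float]:
    """Return (global links per endpoint, local links per endpoint-or-global port)."""
    global_ratio = h / p
    local_ratio = (a - 1) / (p + h)
    return global_ratio, local_ratio


for p, a, h in candidates:
    g_ratio, l_ratio = provisioning(p, a, h)
    print(f"{p}x{a}x{h}: global/endpoint = {g_ratio:.2f}, "
          f"local/(endpoint+global) = {l_ratio:.2f}")
```

Under this reading 10x34x9 has a global ratio of 0.90 and 10x45x5 has 0.50, while the local ratio rises from about 0.96 for the basic design to about 2.93 for 10x45x5.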

Improving Global Adaptive Routing

I feel the need…the need for speed.

Challenges

Quick congestion detection

Quick and accurate return to minimal

Tricks with credits, etc., can provide stiff backpressure

How do we avoid incorrectly taking the non-minimal route?

Solution idea

Use the rate of change of the queue to provide quick congestion detection and quick return to minimal

Potential advantages:

More accurate representation of network performance

Rapid detection

Potential problems:

Sensitivity to burstiness

Our Work

ROC = 0.99*prev_ROC + 0.01*cur_ROC

Developed two new routing algorithms:

ROC: min_queue_rate < 2*nonmin_queue_rate || min_queue_rate < 0

Combo: old algorithm || min_queue_rate < 0
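The smoothing formula and the two decision rules can be sketched as follows. Several things here are assumptions rather than facts from the slides: that a true condition means "take the minimal route", that `cur_ROC` is the most recently sampled change in queue occupancy, and that the "old algorithm" is a UGAL-style queue-length comparison with an optional threshold. All function and parameter names are illustrative.

```python
def update_roc(prev_roc: float, cur_roc: float, weight: float = 0.01) -> float:
    """Exponentially weighted moving average of the queue's rate of change."""
    return (1.0 - weight) * prev_roc + weight * cur_roc


def roc_prefers_minimal(min_queue_rate: float, nonmin_queue_rate: float) -> bool:
    """First rule: compare smoothed rates, or accept a draining minimal queue."""
    return min_queue_rate < 2 * nonmin_queue_rate or min_queue_rate < 0


def combo_prefers_minimal(min_queue_len: int, nonmin_queue_len: int,
                          min_queue_rate: float, threshold: int = 0) -> bool:
    """Second rule: the old test OR a draining minimal queue.

    ASSUMPTION: the old test is modelled as a UGAL-style comparison of
    queue lengths; the slides do not spell it out.
    """
    old_rule = min_queue_len <= 2 * nonmin_queue_len + threshold
    return old_rule or min_queue_rate < 0


# Hypothetical samples: the minimal queue grows by 4 flits per sample.
roc = 0.0
for delta in [4, 4, 4]:
    roc = update_roc(roc, delta)
print(roc, roc_prefers_minimal(roc, 0.0))
```

The 0.99/0.01 weighting makes the estimate respond slowly to any single sample, which is one way to temper the sensitivity to burstiness noted earlier in the deck. The `min_queue_rate < 0` term lets traffic return to the minimal path as soon as its queue starts draining.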

Results

1024 nodes, 2*p = 2*h = a = 8, injection

Uniform:

2% increase in average latency, 5% increase in max latency for both ROC and Combo

Bad_dragon:

ROC = 69% ave. latency, 82% max

Combo = 72% ave., 90% max

Bad Dragon Results

[Bar chart: Ave Latency, Max Latency, and Hops for Original, ROC, and Combo; y-axis 0 to 100]

Simulation Challenge

Booksim's cycle-accurate nature is at odds with simulating our very large system

std::bad_alloc...

Solution: Slicing

Do a fraction of the work and get all of the results!

How do we not include components in our simulation and still effectively simulate the entire network?

Slicing idea 1: Scaledown

A = 8, H = 2

Idea: Relationships

Forget about hotspots for a minute...

Slicing Idea 2: Mirroring

Routing

Mirroring with Hotspots

Results for Different topologies

p/a/h

p: Endpoints per switch

a: Switches per group

h: Global links per switch

100,000 nodes with “Project Traffic”

Best from 10/32/10 @ 3.0277 Million Cycles

Simulation Results For 13 / 26 / 13

[Bar chart: Average Latency and Hops for Original, ROC, and Combo; y-axis 90 to 110]

Data labels, in extracted order:

Average Latency: 100, 97, 97.4

Hops: 100, 108.14, 106.43

Runtime: 3,217,516 cycles; 3,209,757 cycles; 3,247,934 cycles

Simulation Results For 10 / 32 / 10

[Bar chart: Average Latency and Hops for Original, ROC, and Combo; y-axis 85 to 115]

Data labels, in extracted order:

Average Latency: 100, 99.3, 97.44

Hops: 100, 112.89, 110.9

Runtime: 3,064,421 cycles; 3,027,714 cycles; 3,054,955 cycles

Simulation Results For 10 / 32 / 10 with 10 Hotspots

[Bar chart: Average Latency and Hops for Original, ROC, and Combo; y-axis 85 to 115]

Data labels, in extracted order:

Average Latency: 100, 97.37, 98.58

Hops: 100, 113.071, 111.1

Runtime: 3,057,401 cycles; 3,025,221 cycles; 3,063,628 cycles

Other Simulation Results

16 / 28 / 8:
Runtime: 4,130,224
Average latency: 519.74 (too big)

10 / 45 / 5 (half global links):
Runtime: 4,190,192
Average latency: 528.51

Conclusion

ROC always wins in average latency and runtime cycles.

At a small cost in additional power (4%) over the basic 13 / 26 / 13, we can get higher performance more cheaply with the 10 / 32 / 10 topology.

The simulated hotspot scenario is pessimistic, and the numbers are still fine.
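To put a number on the runtime difference between the two main topologies, the sketch below takes the runtime labels from the 13 / 26 / 13 and 10 / 32 / 10 results slides and compares the best value in each set. It deliberately does not assume which label belongs to which routing algorithm; it only uses the minimum, which for 10 / 32 / 10 matches the "Best from 10/32/10 @ 3.0277 Million Cycles" figure.

```python
# Runtime labels (cycles) from the two results slides, in the order extracted.
runtimes = {
    "13/26/13": [3_217_516, 3_209_757, 3_247_934],
    "10/32/10": [3_064_421, 3_027_714, 3_054_955],
}

# Best (lowest) runtime per topology, regardless of routing algorithm.
best = {topology: min(cycles) for topology, cycles in runtimes.items()}
saving = 1 - best["10/32/10"] / best["13/26/13"]

print(best)
print(f"10/32/10 needs {saving:.1%} fewer cycles at best")
```

On these labels the best 10 / 32 / 10 run finishes in roughly 5.7% fewer cycles than the best 13 / 26 / 13 run.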

Questions