Department of Computer Science Jinan University( 暨南大学 ) Liangshan Song, Yuhui Deng, Junjie Xie 1.
1 Department of Computer Science, Jinan University
![Page 1: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/1.jpg)
Junjie Xie¹, Yuhui Deng¹, Ke Zhou²
¹ Department of Computer Science, Jinan University
² School of Computer Science & Technology, Huazhong University of Science & Technology
NPC 2013: The 10th IFIP International Conference on Network and Parallel Computing. April 21, 2023. Guiyang, China.
![Page 2: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/2.jpg)
• Motivation
• Challenges
• Related work
• Our idea
• System architecture
• Evaluation
• Conclusion
![Page 3: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/3.jpg)
• The Explosive Growth of Data ⇒ Large Data Center
  Industrial manufacturing, E-commerce, Social network...
  IDC: 1,800 EB of data in 2011, 40-60% annual increase
  YouTube: 72 hours of video are uploaded per minute.
  Facebook: 1 billion active users upload 250 million photos per day.
Image from http://www.buzzfeed.com
![Page 4: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/4.jpg)
Feb. 2011, Science: On the Future of Genomic Data.
Feb. 2011, Science: Climate Data Challenges in the 21st Century.
Jim Gray: The global amount of information would double every 18 months (1998).
![Page 5: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/5.jpg)
• IDC report: Most of the data would be stored in data centers.
• Large Data Center ⇒ Scalability
  Google: 19 data centers, >1 million servers
  Facebook, Microsoft, Amazon…: >100k servers
• Large Data Center ⇒ Fault Tolerance
  Google MapReduce: 5 nodes fail during a job; 1 disk fails every 6 hours
Figure: Google Data Center
Therefore, the data center network has to be very scalable and fault tolerant.
![Page 6: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/6.jpg)
• Tree-based Structure: Bandwidth bottleneck, Single points of failure, Expensive
• Fat-tree: High capacity, Limited scalability
Figures: Tree-based Structure; Fat-tree
![Page 7: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/7.jpg)
• DCell: Scalable, Fault-tolerant, High capacity, Complex, Expensive
• DCell is a level-based, recursively defined interconnection structure.
• It requires multiport (e.g., 3, 4 or 5) servers.
• DCell scales doubly exponentially with the server node degree.
• It is also fault tolerant and supports high network capacity.
• Downside: It trades expensive core switches/routers for multiport NICs and a higher wiring cost.
C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang and S. Lu. DCell: A Scalable and Fault-Tolerant Network Structure for Data Centers. In: Proc. of the ACM SIGCOMM’08, Aug 2008
![Page 8: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/8.jpg)
• FiConn: Scalable, Fault-tolerant, Low capacity
D. Li, C. Guo, H. Wu, K. Tan, and S. Lu. FiConn: Using Backup Port for Server Interconnection in Data Centers. In: Proc. of the IEEE INFOCOM, 2009.
• FiConn utilizes servers with two built-in ports and low-end commodity switches to form the structure.
• FiConn has a lower wiring cost than DCell.
• Routing in FiConn also makes a balanced use of links at different levels and is traffic-aware to better utilize the link capacities.
• Downside: it has lower aggregate network capacity.
Other architectures: PortLand, VL2, CamCube…
![Page 9: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/9.jpg)
• What we achieve:
  Scalability: Millions of servers
  Fault-tolerance: Structure & Routing
  Low cost: Commodity devices
  High capacity: Multi-redundant links
Figure: Totoro Structure of One Level
![Page 10: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/10.jpg)
[Figure: a Totoro structure with N = 4, n = 4, K = 2. The 64 servers are labelled (a2, a1, a0) from 0, 0, 0 to 3, 3, 3; the level-1 switches are labelled 1-0, 0 through 1-3, 1 and the level-2 switches 2-0 through 2-3. The legend distinguishes Level-1 links from Level-2 links.]
![Page 11: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/11.jpg)
• Architecture: Two-port servers, Low-end switches, Recursively defined
• Building Algorithm
Figure labels: k-level Totoro; two-port NIC
![Page 12: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/12.jpg)
• Connect N servers to an N-port switch
• Here, N=4
• Basic partition: Totoro_0
• Intra-switch
Figure: A Totoro_0 structure
![Page 13: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/13.jpg)
• Available (still unused) server ports in a Totoro_0: c. Here, c = 4
• Connect n Totoro_0s to n-port switches by using c/2 ports of each Totoro_0
• Inter-switch
A Totoro_1 structure consists of n Totoro_0s.
![Page 14: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/14.jpg)
• Connect n Totoro_(i-1)s to n-port switches to build a Totoro_i
• Recursively defined
• Only half of the available ports are used at each level ⇒ Open & Scalable
• The number of paths among Totoro_i s is n/2 times the number of paths among Totoro_(i-1)s ⇒ Multi-redundant links ⇒ High network capacity
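The port bookkeeping behind this recursion can be sketched as follows. This is a minimal illustration, assuming that a Totoro_0 starts with c = N free server ports (as in the N = 4 example) and that every level consumes exactly half of the free ports of each sub-structure. The function names are hypothetical and not taken from the slides.

```python
def available_ports(N: int, n: int, level: int) -> int:
    """Free server ports left in one Totoro_level (assumes c_0 = N)."""
    c = N
    for _ in range(level):
        # n copies of the lower level, each keeping half of its free ports
        c = n * (c // 2)
    return c


def switches_at_level(N: int, n: int, level: int) -> int:
    """Number of level-`level` switches inside one Totoro_level (level >= 1).

    Each Totoro_(level-1) gives up c/2 ports, one per n-port switch.
    """
    return available_ports(N, n, level - 1) // 2
```

With N = n = 4 this yields 2 level-1 switches per Totoro_1 and 4 level-2 switches in the Totoro_2, and the switch count grows by a factor of n/2 from one level to the next.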
![Page 15: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/15.jpg)
    TotoroBuild(N, n, K) {
        Define t_K = N * n^K
        Define server = [a_K, a_(K-1), …, a_i, …, a_1, a_0]
        For tid = 0 to (t_K - 1)
            For i = 0 to (K - 1)
                a_(i+1) = (tid / (N * n^i)) mod n
            a_0 = tid mod N
            Define intra-switch = (0 - a_K, a_(K-1), …, a_1, a_0)
            Connect(server, intra-switch)
            For i = 1 to K
                If ((tid - 2^(i-1) + 1) mod 2^i == 0)
                    Define inter-switch (u - b_(K-u), …, b_i, …, b_0)
                    u = i
                    For j = i to (K - 1)
                        b_j = (tid / (N * n^(j-1))) mod n
                    b_0 = (tid / 2^u) mod (N / n * (n/2)^u)
                    Connect(server, inter-switch)
    }

The key: work out the level of the outgoing link of this server (the If test inside the second loop).
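The two core computations of the building algorithm, the server address and the level of the outgoing link, can be sketched in Python. This is a hedged illustration rather than the authors' implementation: `address`, `link_level` and `inter_switch` are hypothetical names, and the `group` field (the index of the enclosing Totoro_u) is an assumed stand-in for the b_(K-u) … b_1 part of the switch identifier.

```python
from typing import Optional, Tuple


def address(tid: int, N: int, n: int, K: int) -> Tuple[int, ...]:
    """Return (a_K, ..., a_1, a_0) for the server with identifier `tid`."""
    digits = [tid % N]                            # a_0
    for i in range(K):
        digits.append((tid // (N * n ** i)) % n)  # a_(i+1)
    return tuple(reversed(digits))


def link_level(tid: int, K: int) -> Optional[int]:
    """Level of the inter-switch link of `tid`, or None if its second port stays free."""
    for i in range(1, K + 1):
        if (tid - 2 ** (i - 1) + 1) % 2 ** i == 0:
            return i
    return None


def inter_switch(tid: int, N: int, n: int, K: int) -> Optional[Tuple[int, int, int]]:
    """(level u, index of the enclosing Totoro_u, b_0), or None."""
    u = link_level(tid, K)
    if u is None:
        return None
    group = tid // (N * n ** u)                       # assumption, see lead-in
    b0 = (tid // 2 ** u) % (N * n ** (u - 1) // 2 ** u)  # = N/n * (n/2)^u switches
    return (u, group, b0)
```

The test in `link_level` selects even `tid` values for level 1, values congruent to 1 modulo 4 for level 2, values congruent to 3 modulo 8 for level 3, and so on. The classes are disjoint, so each server gets at most one inter-switch link, which is what a two-port server allows.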
![Page 16: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/16.jpg)

| N | n | u | t_u |
|---|---|---|-----|
| 16 | 16 | 2 | 4096 |
| 24 | 24 | 2 | 13824 |
| 32 | 32 | 2 | 32768 |
| 16 | 16 | 3 | 65536 |
| 24 | 24 | 3 | 331776 |
| 32 | 32 | 3 | 1048576 |

⇒ Millions of servers
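The server counts in this table follow from t_u = N * n^u, the definition used in TotoroBuild. A short check, with `total_servers` as a hypothetical helper name:

```python
def total_servers(N: int, n: int, u: int) -> int:
    """t_u = N * n**u: servers in a Totoro_u built from N-port and n-port switches."""
    return N * n ** u


for ports, u in [(16, 2), (24, 2), (32, 2), (16, 3), (24, 3), (32, 3)]:
    print(f"N = n = {ports}, u = {u}: {total_servers(ports, ports, u)} servers")
```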
![Page 17: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/17.jpg)
• Totoro Routing Algorithm (TRA): Basic routing, not fault-tolerant
• Totoro Broadcast Domain (TBD): Detect & share link states
• Totoro Fault-tolerant Routing (TFR): TRA + Dijkstra algorithm (based on TBD)
![Page 18: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/18.jpg)
Totoro Routing Algorithm (TRA)
• Divide & Conquer algorithm
• Path from src to dst?
![Page 19: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/19.jpg)
Step 1: src and dst belong to two different partitions.
![Page 20: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/20.jpg)
Step 2: Take a link between these two partitions.
![Page 21: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/21.jpg)
m and n are the intermediate servers at the two ends of this link. The intermediate path is from m to n.
![Page 22: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/22.jpg)
Step 3: If src (dst) and m (n) are in the same basic partition, just return the direct path.
![Page 23: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/23.jpg)
Step 3 (otherwise): Return to Step 1 to work out the path from src (dst) to m (n).
![Page 24: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/24.jpg)
Step 4: Join P(src, m), P(m, n) and P(n, dst) into the full path.
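The four steps can be sketched as a recursive function over server identifiers. This is an illustration under stated assumptions, not the authors' code: the inter-partition link is always taken through switch b_0 = 0 (an arbitrary choice, the slides do not say how TRA picks the link), the gateway server is found by a simple scan, and `level_switch`, `gateway` and `tra` are hypothetical names. The switch index follows the b_0 formula of TotoroBuild.

```python
from typing import List


def level_switch(tid: int, u: int, N: int, n: int) -> int:
    """b_0 of the level-u switch that server `tid` would attach to."""
    return (tid // 2 ** u) % (N * n ** (u - 1) // 2 ** u)


def gateway(part: int, u: int, b0: int, N: int, n: int) -> int:
    """Server of Totoro_(u-1) number `part` that is wired to level-u switch b0."""
    size = N * n ** (u - 1)
    for tid in range(part * size, (part + 1) * size):
        has_level_u_link = (tid - 2 ** (u - 1) + 1) % 2 ** u == 0
        if has_level_u_link and level_switch(tid, u, N, n) == b0:
            return tid
    raise ValueError("no server of this partition is attached to that switch")


def tra(src: int, dst: int, N: int, n: int) -> List[int]:
    """Server-level path from src to dst; consecutive servers share one switch."""
    if src == dst:
        return [src]
    # Step 1: smallest u such that src and dst lie in the same Totoro_u.
    u = 0
    while src // (N * n ** u) != dst // (N * n ** u):
        u += 1
    if u == 0:
        return [src, dst]  # same basic partition: one hop via the intra-switch
    # Step 2: take a level-u link between the two Totoro_(u-1) partitions.
    size = N * n ** (u - 1)
    m = gateway(src // size, u, 0, N, n)
    n_srv = gateway(dst // size, u, 0, N, n)
    # Steps 3 and 4: recurse on both sides, then join the partial paths.
    return tra(src, m, N, n) + tra(n_srv, dst, N, n)
```

For the N = 4, n = 4, K = 2 example, `tra(0, 63, 4, 4)` returns `[0, 1, 49, 48, 60, 63]`: servers 1 and 49 share a level-2 switch, and servers 48 and 60 share a level-1 switch.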
![Page 25: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/25.jpg)
• Across networks of different sizes, the path lengths produced by TRA are close to those of the Shortest Path (SP) algorithm.
• Simple & Efficient

| N | n | u | t_u | M_u | TRA Mean | TRA StdDev | SP Mean | SP StdDev |
|---|---|---|-----|-----|----------|------------|---------|-----------|
| 24 | 24 | 1 | 576 | 6 | 4.36 | 1.03 | 4.36 | 1.03 |
| 32 | 32 | 1 | 1024 | 6 | 4.40 | 1.00 | 4.39 | 1.00 |
| 48 | 48 | 1 | 2304 | 6 | 4.43 | 0.96 | 4.43 | 0.96 |
| 24 | 24 | 2 | 13824 | 10 | 7.61 | 1.56 | 7.39 | 1.32 |
| 32 | 32 | 2 | 32768 | 10 | 7.68 | 1.50 | 7.45 | 1.26 |

The mean value and standard deviation of the path length for TRA and the SP algorithm in Totoro_u of different sizes. M_u is the maximum distance between any two servers in Totoro_u; t_u is the total number of servers.
![Page 26: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/26.jpg)
Totoro Broadcast Domain (TBD)
• Fault-tolerance ⇒ Detect and share link states
• Time cost & CPU load ⇒ Global strategy is impossible
• Divide Totoro into several TBDs
Figure legend: Green: inner-server; Yellow: outer-server
![Page 27: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/27.jpg)
Totoro Fault-tolerant Routing (TFR)
• Two strategies: Dijkstra algorithm within a TBD; TRA between TBDs
• Proxy: a temporary destination
• Next hop: the next server on P(src, proxy/dst)
![Page 28: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/28.jpg)
• If the proxy is unreachable
![Page 29: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/29.jpg)
• Reroute the packet to another proxy by using local redundant links
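The local part of TFR, Dijkstra over the link states known inside a TBD with a fallback to another proxy, can be sketched as below. Everything concrete here is an assumption for illustration: the graph representation, the unit link costs, the ordered list of candidate proxies and the names `dijkstra_path` and `next_hop` do not come from the slides, which do not specify how proxies are selected.

```python
import heapq
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Graph = Dict[str, Dict[str, int]]  # node -> {neighbour: link cost}


def dijkstra_path(graph: Graph, src: str, dst: str,
                  failed: Set[FrozenSet[str]]) -> Optional[List[str]]:
    """Shortest path from src to dst that avoids failed links, or None."""
    dist = {src: 0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[int, str]] = [(0, src)]
    while heap:
        d, v = heapq.heappop(heap)
        if v == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry
        for w, cost in graph.get(v, {}).items():
            if frozenset((v, w)) in failed:
                continue  # link reported down inside this TBD
            nd = d + cost
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                prev[w] = v
                heapq.heappush(heap, (nd, w))
    return None


def next_hop(graph: Graph, src: str, proxies: List[str],
             failed: Set[FrozenSet[str]]) -> Optional[Tuple[str, str]]:
    """Return (proxy, next hop) for the first proxy still reachable from src."""
    for proxy in proxies:
        path = dijkstra_path(graph, src, proxy, failed)
        if path and len(path) > 1:
            return proxy, path[1]
    return None


# Hypothetical TBD: s reaches proxy p1 via a, and proxy p2 via b.
tbd: Graph = {
    "s": {"a": 1, "b": 1},
    "a": {"s": 1, "p1": 1},
    "b": {"s": 1, "p2": 1},
    "p1": {"a": 1},
    "p2": {"b": 1},
}
print(next_hop(tbd, "s", ["p1", "p2"], set()))
print(next_hop(tbd, "s", ["p1", "p2"], {frozenset(("a", "p1"))}))
```

In the second call the link between a and p1 is marked as failed, so p1 becomes unreachable and the packet is redirected towards p2 through the redundant link via b.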
![Page 30: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/30.jpg)
• Evaluating Path Failure: Totoro vs. Shortest Path Algorithm (Floyd-Warshall)
• Evaluating Network Structure: Totoro vs. Tree-based structure, Fat-Tree, DCell & FiConn
![Page 31: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/31.jpg)
Evaluating Path Failure
• Types of failures: Link, Node, Switch & Rack failures
• Comparison: TFR vs. SP
• Platform: Totoro_1 (N=48, n=48, K=1, t_K=2,304 servers); Totoro_2 (N=16, n=16, K=2, t_K=4,096 servers)
• Failure ratios: 2% - 20%
• Communication mode: All-to-all
• Simulation runs: 20
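A small simulation in the spirit of this setup is sketched below. It only covers node failures and only the SP-style baseline, where a pair of servers counts as failed when no path at all survives; it does not model TFR. The definition of the path failure ratio (disconnected ordered pairs of alive servers divided by all such pairs), the graph encoding and the function names are assumptions, and the wiring reuses the link rule and b_0 formula of TotoroBuild.

```python
import random
from collections import defaultdict


def build_totoro(N, n, K):
    """Adjacency map: servers are ints, switches are tuples."""
    adj = defaultdict(set)
    for tid in range(N * n ** K):
        switches = [("intra", tid // N)]
        for u in range(1, K + 1):
            if (tid - 2 ** (u - 1) + 1) % 2 ** u == 0:
                b0 = (tid // 2 ** u) % (N * n ** (u - 1) // 2 ** u)
                switches.append(("inter", u, tid // (N * n ** u), b0))
        for sw in switches:
            adj[tid].add(sw)
            adj[sw].add(tid)
    return adj


def path_failure_ratio(adj, failed):
    """Fraction of ordered pairs of alive servers with no surviving path."""
    alive = [v for v in adj if isinstance(v, int) and v not in failed]
    comp = {}
    for root in alive:
        if root in comp:
            continue
        comp[root] = root
        stack = [root]
        while stack:  # depth-first search over servers and switches
            v = stack.pop()
            for w in adj[v]:
                if w not in failed and w not in comp:
                    comp[w] = root
                    stack.append(w)
    sizes = defaultdict(int)
    for v in alive:
        sizes[comp[v]] += 1
    total = len(alive) * (len(alive) - 1)
    connected = sum(s * (s - 1) for s in sizes.values())
    return 1 - connected / total if total else 0.0


rng = random.Random(0)  # fixed seed so that runs are repeatable
adj = build_totoro(4, 4, 2)
servers = [v for v in adj if isinstance(v, int)]
for ratio in (0.02, 0.10, 0.20):
    k = round(ratio * len(servers))
    runs = [path_failure_ratio(adj, set(rng.sample(servers, k))) for _ in range(20)]
    print(f"node failure ratio {ratio:.0%}: mean path failure ratio {sum(runs) / len(runs):.4f}")
```

The example uses the small N = 4, n = 4, K = 2 structure to keep the run short; the platforms listed above are considerably larger.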
![Page 32: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/32.jpg)
• Path failure ratio vs. node failure ratio.
  The performance of TFR is almost identical to that of SP.
  TFR maximizes the usage of redundant links when a node failure occurs.
![Page 33: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/33.jpg)
• Path failure ratio vs. link failure ratio.
  TFR performs well when the link failure ratio is small (i.e., <4%).
  The performance gap between TFR and SP widens as the link failure ratio grows.
  TFR is not globally optimal and is not guaranteed to find an existing path.
  There is considerable room for improvement.
![Page 34: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/34.jpg)
• Path failure ratio vs. switch failure ratio.
  TFR performs almost as well as SP in Totoro_1.
  The performance gap between TFR and SP becomes larger and larger in Totoro_2.
![Page 35: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/35.jpg)
• Path failure ratio vs. switch failure ratio.
  The path failure ratio of SP is lower in a higher-level Totoro.
  More redundant high-level switches help bypass the failure.
![Page 36: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/36.jpg)
• Path failure ratio vs. rack failure ratio.
  In a low-level Totoro, TFR achieves results very close to SP.
  The capacity of TFR in a relatively high-level Totoro can be improved.
![Page 37: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/37.jpg)
Evaluating Network Structure
• Low degree
  Approaches but never reaches 2.
  Lower degree ⇒ Lower deployment and maintenance overhead.

| Structure | Degree | Diameter | Bisection Width |
|-----------|--------|----------|-----------------|
| Tree | -- | 2 log_(d-1) T | 1 |
| Fat-Tree | -- | 2 log_2 T | T/2 |
| DCell | k + 1 | < 2 log_n T - 1 | T / (4 log_n T) |
| FiConn | 2 - 1/2^k | O(log T) | O(T / log T) |
| Totoro | 2 - 1/2^k | O(T) | T / 2^(k+1) |

N: the number of ports on an intra-switch
n: the number of ports on an inter-switch
T: the total number of servers. For Totoro, T = N * n^K.
![Page 38: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/38.jpg)
• Relatively large diameter
  Smaller diameter ⇒ More efficient routing mechanism
  In practice, the diameter of a Totoro_3 with 1M servers is only 18.
  This can be improved.
![Page 39: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/39.jpg)
• Large bisection width
  Large bisection width ⇒ Fault-tolerant & Resilient
  For a small k, the bisection width is large: BiW = T/4, T/8, T/16 when k = 1, 2, 3.
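The Totoro row of the comparison table can be checked numerically. The sketch below counts used server ports with the link rule from TotoroBuild and compares the average with the closed form 2 - 1/2^k; the bisection width simply evaluates T / 2^(k+1) from the table. Both function names are hypothetical.

```python
def average_degree(N: int, n: int, K: int) -> float:
    """Average number of used ports per server, by direct enumeration."""
    total = N * n ** K
    ports = 0
    for tid in range(total):
        ports += 1  # every server connects to its intra-switch
        if any((tid - 2 ** (i - 1) + 1) % 2 ** i == 0 for i in range(1, K + 1)):
            ports += 1  # second port used for an inter-switch link
    return ports / total


def bisection_width(T: int, k: int) -> int:
    """Closed form from the table: T / 2^(k+1)."""
    return T // 2 ** (k + 1)


for K in (1, 2, 3):
    print(K, average_degree(4, 4, K), 2 - 1 / 2 ** K, bisection_width(4 * 4 ** K, K))
```

The enumeration agrees with 2 - 1/2^k because half of the servers carry a level-1 link, a quarter a level-2 link, and so on, so the degree approaches 2 without reaching it.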
![Page 40: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/40.jpg)
• Scalability: Millions of servers & Open structure
• Fault-tolerance: Structure & Routing mechanism
• Low cost: Two-port servers & Commodity switches
• High capacity: Multi-redundant links
Totoro is a viable interconnection solution for data centers!
![Page 41: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/41.jpg)
• Fault-tolerance
  Structure: How to be more resilient?
  Routing under complex failures: More robust rerouting techniques?
• Network capacity
  Data locality: Mapping between servers and switches? Data storage allocation policies?
![Page 42: 1 Department of Computer Science, Jinan University](https://reader035.fdocuments.in/reader035/viewer/2022062304/568140c5550346895dac8cb2/html5/thumbnails/42.jpg)