DATA STRUCTURES AND ALGORITHMS FOR RESOURCE SCHEDULING IN HIGH SPEED NETWORKS

By

YAN LI

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2010

© 2010 Yan Li

I dedicate my dissertation to my beloved daughter, Sarah, and my wife, Lu Chen.

ACKNOWLEDGMENTS

I want to thank my PhD advisers, Dr. Sartaj Sahni and Dr. Sanjay Ranka. You gave me
the chance to fulfill my dream of earning a PhD at one of the best universities in the
world. You taught me how to perform real academic research. You inspired me with your
brilliant minds and trained me to attack difficult problems in my own way. You gave me
the confidence to solve problems by myself while patiently supporting me by pointing out
even the smallest flaws. You also provided me with a comfortable research environment so
that I could dedicate myself to research. Your demeanor, your kindness and your ways of
thinking will be the best examples for me throughout my life.

I want to thank my wife, Dr. Lu Chen. You are the one who shares all my troubles, my
happiness, my desperation and my hopes. Looking back on all these years, I cannot imagine
how I could have gained what I have today without you. You took care of the family, you
obtained your PhD together with me, and, most amazing of all, you brought us our most
valuable treasure: little Sarah. I cannot expect any more from you. I know that behind
all of this are your countless efforts and endless patience. Words are not enough to
express my sincere appreciation to you. Still, I want to say that you are the best gift
that I have ever received in my life.

I want to thank my parents in China. It is you who taught me right and wrong, integrity
and pride, and what truly matters to me. It is you who endured all the loneliness and
longing and encouraged me to pursue my own dream on the other side of the ocean. Now I
have achieved what you expected of me: I am a Doctor and a father. I will follow the
example you set for me. I will try my best to bring happiness to my family members. And
you two will always be the ones that I love the most.

Finally, I want to thank all the people I met here. It is you who brought me all the
happy times in Gainesville. Without you, I would not have completed my PhD so smoothly
and productively. These memories will forever be embedded deeply in my mind and will
never fade away.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
    1.1 Problem Overview
    1.2 Our Contributions
    1.3 Organization of This Dissertation

2 TEMPORAL NETWORK MODEL
    2.1 Slotted Time
    2.2 Continuous Time

3 SCHEDULING IN GENERAL NETWORKS
    3.1 Problem Definition
    3.2 Single-Path Scheduling
        3.2.1 Problem Definitions
        3.2.2 Path Computation Algorithms
            3.2.2.1 Fixed slot
            3.2.2.2 Maximum bandwidth in a slot
            3.2.2.3 Maximum duration
            3.2.2.4 First slot
            3.2.2.5 All slots
            3.2.2.6 All pairs all slots
        3.2.3 Performance Metrics
            3.2.3.1 Space complexity
            3.2.3.2 Time complexity
            3.2.3.3 Effectiveness
        3.2.4 Experiments
    3.3 Multiple-Path Scheduling
        3.3.1 Problem Definition
            3.3.1.1 Data structures
        3.3.2 Optimal Solution and N-Batch Heuristics
            3.3.2.1 N-Batch heuristics
        3.3.3 Online Scheduling Algorithms
            3.3.3.1 Greedy algorithm
            3.3.3.2 Greedy scheduling with finish time extension (GOS-E)
            3.3.3.3 K-Path algorithms
        3.3.4 Experimental Evaluation
            3.3.4.1 Experimental framework
            3.3.4.2 Single start time scheduling (SSTS)
            3.3.4.3 Multiple start time scheduling (MSTS)
            3.3.4.4 GOS vs. GOS-E
    3.4 General Network Scheduling Algorithms Summary

4 SCHEDULING IN OPTICAL NETWORKS
    4.1 Problem Definition
    4.2 Scheduling in Full Wavelength Conversion Network
        4.2.1 Problem Definition
        4.2.2 Routing Algorithms
        4.2.3 Wavelength Assignment Algorithms
        4.2.4 Performance Evaluation
        4.2.5 Experiments
            4.2.5.1 Simulation environment
            4.2.5.2 Evaluated algorithms
            4.2.5.3 Results and observations
    4.3 Scheduling in Sparse Wavelength Conversion Network
        4.3.1 Problem Description
        4.3.2 Extended Network Model
        4.3.3 Routing and Wavelength Assignment Algorithms
            4.3.3.1 Extended Bellman-Ford algorithm for sparse wavelength conversion
            4.3.3.2 k-Alternative path algorithm
            4.3.3.3 Breaking the ties in path selection
            4.3.3.4 Wavelength assignment
        4.3.4 Experimental Evaluation
            4.3.4.1 Experimental framework
            4.3.4.2 Slack tie-breaking scheme
            4.3.4.3 Blocking probability
            4.3.4.4 Requests’ average start time
            4.3.4.5 Scheduling overhead
            4.3.4.6 Algorithm switching strategy
    4.4 Optical Network Scheduling Summary

5 MULTIPLE RESOURCE SCHEDULING
    5.1 Problem Definition
    5.2 Resource Model and Data Structure
        5.2.1 Resource Model: MRRM
        5.2.2 Data Structures
    5.3 Problem Definition
    5.4 Multiple Resource Scheduling Algorithm
        5.4.1 WS-RC Scheduling Algorithm
        5.4.2 WN-RC Scheduling Algorithm
        5.4.3 WS-RN Scheduling Algorithm
        5.4.4 WN-RN Scheduling Algorithm
    5.5 Evaluation
        5.5.1 Evaluation Environment
        5.5.2 Results and Observations

6 SCHEDULING IN TIME-DOMAIN WAVELENGTH INTERLEAVED NETWORKS
    6.1 Problem Definition
    6.2 Related Work
    6.3 Network Model and Problem Definition
    6.4 Tree Construction
    6.5 Tree-Wavelength Assignment
        6.5.1 Generic Form of the Tree-Wavelength Assignment Problem
        6.5.2 Greedy Heuristics
    6.6 Evaluation
        6.6.1 Experimental Framework
        6.6.2 Evaluation Results

7 CONCLUSION

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF FIGURES

Figure

1-1 Topology of Internet2.
2-1 Time-bandwidth list
3-1 Modified Dijkstra’s algorithm
3-2 Pseudocode for extended Floyd algorithm
3-3 Algorithm time complexities
3-4 Classification of fixed-slot algorithms
3-5 Topologies of Abilene network, MCI network, Burchard’s network and Cluster network
3-6 Acceptance ratio vs requests number in Abilene MCI Burchard and Cluster network
3-7 Acceptance ratio vs requests number in various random topologies
3-8 Acceptance ratio vs mean requests duration in Abilene MCI Burchard and Cluster network
3-9 Acceptance ratio vs mean requests duration in various random topologies
3-10 Acceptance ratio vs mean requests bandwidth in Abilene MCI Burchard and Cluster network
3-11 Acceptance ratio vs mean requests bandwidth in various random topologies
3-12 Basic intervals
3-13 Greedy online scheduling algorithm GOS
3-14 Comparison of different algorithms’ MFT for different number of files in MCI using SSTS.
3-15 Comparison of different algorithms’ MFT for different number of files in 100 nodes random topology using SSTS.
3-16 Comparison of different algorithms’ MFT random Topologies of different size using SSTS.
3-17 Comparison of different algorithms’ execution time on different numbers of files on MCI network using SSTS.
3-18 Comparison of different algorithms’ execution time on different numbers of files on 100 nodes random topology using SSTS.
3-19 Comparison of different algorithms’ MFT execution time on random topologies with different size using SSTS.
3-20 Comparison of different algorithms’ MFT for different number of files in MCI using MSTS.
3-21 Comparison of different algorithms’ MFT for different number of files in 100 nodes random topology using MSTS.
3-22 Comparison of different algorithms’ MFT in random topologies of different size using MSTS.
3-23 Comparison of different algorithms’ execution time in MCI network using MSTS.
3-24 Comparison of different algorithms’ execution time in 100 nodes random topology using MSTS.
3-25 Comparison of different algorithms’ execution time on random topologies with different size using MSTS.
3-26 Comparison on number of Max-Flows is computed by GOS-E and GOS in 100 node random network.
3-27 Comparison on execution time used by GOS-E and GOS in 100-node random network.
3-28 Comparison on number of Max-Flows is computed by GOS-E and GOS in 100 node random network.
3-29 Comparison on execution time used by GOS-E and GOS in 100-node random network.
4-1 A request table with 5 requests
4-2 Comparison of wavelength assignment using different schemes for request table of Figure 4-1.
4-3 Time complexity of different algorithms
4-4 NSF and GEANT network
4-5 Network acceptance ratio vs number of requests
4-6 Acceptance ratio vs requests number in various random topologies
4-7 Extended Network Model
4-8 Extended Network Model
4-9 Network Topologies
4-10 Different h values for different topologies. Network Traffic Load: α = 0.05
4-11 Benefit of slack tie-breaking scheme in various topologies. Network Traffic Load: α = 0.05.
4-12 Blocking Probability vs. Wavelength Converter Ratio in various topology with low traffic load.
4-13 Blocking Probability vs. Wavelength Converter Ratio in various topology with high traffic load.
4-14 Total resource consumption in a 100-node random network under different workload.
4-15 Average Request Start Time vs. Wavelength Converter Ratio in various topology with low traffic load.
4-16 Average Request Start Time vs. Wavelength Converter Ratio in various topology with high traffic load.
4-17 Average computation time of EBF-S and KDP-S.
4-18 The performance of algorithm switching strategy in Slow Traffic Pattern Switching.
4-19 The performance algorithm switching strategy in Fast Traffic Pattern Switching.
5-1 General Model of MRRM
5-2 Detailed Model of MRRM
5-3 WS-RC Scheduling Algorithm
5-4 Extended Bellman-Ford algorithm
5-5 Our co-scheduling algorithms’ performance on acceptance ratio.
5-6 Our co-scheduling algorithms’ performance on converge speed.
6-1 An example of TWIN network.
6-2 An example of TWIN Tree Construction.
6-3 The greedy algorithm for Tree-Construction
6-4 Reduction from Graph-Coloring problem to tree-wavelength assignment problem.
6-5 Network Topologies
6-6 The performances of wavelength assignment heuristics under different number of requests in MCI network.
6-7 The performances of wavelength assignment heuristics under different number of requests in 100-node random topologies.
6-8 The performances of wavelength assignment heuristics in random networks with various sizes.
6-9 The algorithm running time of wavelength assignment heuristics under different number of requests in 100 node networks.
6-10 The algorithm running time of wavelength assignment heuristics in random networks with various sizes.

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

DATA STRUCTURES AND ALGORITHMS FOR RESOURCE SCHEDULING IN HIGH SPEED NETWORKS

By

Yan Li

December 2010

Chair: Sartaj Sahni
Cochair: Sanjay Ranka
Major: Computer Engineering

Large scale scientific applications require the collaboration of geographically

distributed computational resources that are connected by high speed, dedicated

networks. These distributed computational resources, together with the network

connecting them, define a heterogeneous computational environment. Finding

an optimal resource allocation schedule is key to providing effective and reliable

computation services in this environment. This dissertation focuses on resource

scheduling problems for dedicated high speed networks. We formulate a series

of scheduling problems according to various scheduling needs and performance

metrics. We propose a set of data structures that characterize the temporal behavior of

various resources. Based on these data structures, we propose algorithms for resource

allocation and path computation for each formulated scheduling problem.

We develop multi-path reservation algorithms for in-advance scheduling of large

file transfers from multiple sources to multiple destinations. When the requests are

processed one by one in online mode, a new max-flow based greedy algorithm and

four variants that adapt the k–shortest paths and k–disjoint paths algorithms are

proposed. Further, to find an earliest-finishing schedule for a batch of file transfers, a

linear programming based algorithm is developed.

We develop an extended network model that supports optical switches with or

without wavelength converters. We customize some existing routing and resource

allocation algorithms to the optical environment and study the impact of different

wavelength conversion schemes for resource scheduling. We also present a novel

wavelength assignment strategy that alleviates the need to keep track of the bandwidth

allocation status of each wavelength.

A Multiple Resource Reservation Model (MRRM) is presented for the case when

multiple types of resources are scheduled together. This model enables the monitoring

and scheduling of multiple heterogeneous and distributed resources. MRRM provides

a unified representation for multiple types of distributed resources, and represents

resource constraints such as compatibility and accessibility. Using MRRM, we solve the

Multiple Resource First Slot (MRFS) problem based on a collection of algorithms that

are customized for different request and resource types.

We also consider the wavelength assignment problem in Time-domain Wavelength

Interleaved Networks (TWIN). We propose a 2-step process to compute the wavelength

assignment for a given set of traffic demands. The goal of our scheduling algorithm

is to find the assignment that uses the minimum number of wavelengths. We show

that this wavelength assignment problem is NP-Complete by reducing the Graph

Coloring problem to it. Four greedy heuristics are presented to compute a near-optimal

solution within reasonable time.

For each proposed algorithm, we conduct extensive experiments under various

topologies and workloads to evaluate its performance and efficiency.

CHAPTER 1
INTRODUCTION

1.1 Problem Overview

Many large-scale scientific and commercial applications produce large amounts

of data, of the order of terabytes to petabytes, which must be transferred across

wide-area networks. For example, for an e-Science application, data sets produced

on a supercomputer in Los Angeles may need to be streamed to a remote storage

center in Houston for analysis. The results are then sent to Atlanta and visualized there

to guide the next round of experiments. When data providers and consumers are

geographically distributed, dedicated connections are needed to effectively support a

variety of remote tasks [46]. More specifically, dedicated bandwidth channels are critical

in these tasks to offer (i) large capacity for massive data transfer operations, and (ii)

dynamically stable bandwidth for monitoring and steering operations. It is important

that these channels be available when the data is or will be ready to be transferred.

Thus, the ability to reserve such dedicated bandwidth connections either on-demand or

in-advance is critical to both classes of operations.

Figure 1-1. Topology of Internet2.

To provide dedicated network connection service for the previous example, the

manager of the high speed network that connects these sites (like Internet2 [29],

Figure 1-1) should address the following issues:

(a) Resource monitoring and management. To provide a resource reservation service on a network, it is fundamental to know the status of the available resources and to reserve them for a certain period of time. More sophisticated resource reservation mechanisms can only be built with the support of these two basic functionalities.

(b) Dynamic and scalable resource model. As the link capacity of a network is shared by different requests over time, a dynamic resource model that represents the changing status of the network’s available resources is needed. In addition, to cope with the increasing demands of future applications, this resource model is also expected to scale with the network size and the number of requests.

(c) Efficient scheduling algorithms for various reservation requests. Based on the above resource model, algorithms are needed to find paths that fulfill users’ requests. User requests generally specify source and destination nodes as well as bandwidth and duration. When the job start time is the primary concern, the bandwidth scheduler needs to find the earliest start time for which a feasible network path is available. When the job requires a large amount of bandwidth, multi-path scheduling is needed to fulfill the resource requirement. When the required path is not available due to a link outage, an alternative path needs to be computed and allocated. Hence, for each type of request, we need a corresponding algorithm to effectively compute the best resource allocation schedule.

In this dissertation, we focus on the last two of these three issues and propose data

structures and algorithms for various resource scheduling problems in dedicated high

speed networks.

The importance of dedicated connection capabilities has been recognized,

and several network research projects are currently underway to develop resource

scheduling capabilities. These include User Controlled Light Paths (UCLP) [61],

UltraScience Net (USN) [47], Circuit-switched High-speed End-to-End Transport

ArcHitecture (CHEETAH) [74], Enlightened [18], Dynamic Resource Allocation via

GMPLS Optical Networks (DRAGON) [2], Japanese Gigabit Network II [31], Bandwidth

on Demand (BoD) on Geant2 network [22], On-demand Secure Circuits and Advance

Reservation System (OSCARS) [4] of ESnet, Hybrid Optical and Packet Infrastructure

(HOPI) [28], Bandwidth Brokers [72] and others. In addition, production networks at the

national and international scale with such capabilities are being deployed by Internet2

and LHCNet [37]. Such deployments are expected to increase and to proliferate

into both shared and private network infrastructures across the globe in the coming

years.

Bandwidth reservation systems operate in one of two modes:

(a) In on-demand scheduling, bandwidth is reserved for a time period that begins at the current time.

(b) In in-advance scheduling, bandwidth is reserved for a time period that begins at some future time.

In reality, on-demand scheduling is a special case of in-advance scheduling in which

the interval between the time a request is made and the time its reservation begins

is zero. Hence, in our research, all the problems are

formulated for the in-advance scheduling mode and the solutions are naturally adapted

to the on-demand case.
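The relationship between the two modes can be illustrated with a minimal sketch. The `Reservation` type, its field names and the `on_demand` helper below are hypothetical, introduced only for illustration; they are not taken from the algorithms developed in later chapters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    """Hypothetical bandwidth reservation request (all names are illustrative)."""
    source: str
    destination: str
    bandwidth: float      # requested bandwidth, in arbitrary units
    request_time: float   # time at which the request is made
    start_time: float     # time at which the reserved period begins
    duration: float       # length of the reserved period

    def __post_init__(self):
        # A reservation can begin now or later, never in the past.
        if self.start_time < self.request_time:
            raise ValueError("a reservation cannot start before it is requested")

    @property
    def lead_time(self) -> float:
        """Interval between making the request and the start of the reservation."""
        return self.start_time - self.request_time

    @property
    def is_on_demand(self) -> bool:
        """On-demand scheduling is the in-advance case with zero lead time."""
        return self.lead_time == 0


def on_demand(source, destination, bandwidth, now, duration):
    """Build an on-demand request as an in-advance request that starts at `now`."""
    return Reservation(source, destination, bandwidth, now, now, duration)
```

Under this view, a scheduler written for in-advance requests needs no separate code path for on-demand requests: it simply receives a request whose start time equals its request time.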

1.2 Our Contributions

The contributions of this dissertation are listed below:

(a) Dynamic Resource Model. To represent the changing resource availability in the network, we define a set of data structures that couple time information with resource availability. We also discuss in detail the pros and cons of two different time representations: the continuous time model and the discrete time model.

(b) Single Path Scheduling in General Networks. Single path scheduling is the simplest and most common reservation mode, as each request asks for only a single path in the network. For this type of scheduling, we categorize several different problems based on their objectives: fixed slot, maximum bandwidth in a slot, maximum duration, first slot, all slots and all-pairs all-slots. For each problem, we propose several algorithms. Some of these algorithms are adapted from existing on-demand scheduling algorithms, while others are first proposed by us. Our experiments indicate that, for the fixed-slot problem, the minimum-hop feasible path algorithm is the best of these on large networks, while DAFP is best on small networks. For the other problems, as no prior algorithms have been proposed, our evaluations show only the efficiency of our own proposed algorithms.

(c) Multiple Paths Scheduling in General Networks. It is well known that using multiple paths utilizes the available network resources more effectively [10]. To explore the benefits of multi-path routing, we propose the Earliest Finish Time File Transfer Problem (EFTFTP). In this problem, files have to be transferred from multiple sources to multiple destinations and the objective is to minimize the overall finish time of all transfers. To solve this problem, two different approaches are proposed and compared: online scheduling, in which the file transfers are scheduled one by one; and batch scheduling, in which all file transfers are scheduled together. Optimal solutions and heuristics are provided for these two approaches. Our experiments show that the online scheduling algorithms generate schedules with a maximum finish time slightly larger than that obtained by batch scheduling. However, online scheduling is more time efficient and provides better average finish times. Hence, online scheduling presents a good balance among maximum finish time, mean finish time, and computation time.

(d) Scheduling in Optical Networks with Full Wavelength Conversion. Many high speed networks are based on optical interconnects and optical switches. A lightpath in an optical network has to satisfy two constraints: the wavelength continuity constraint and the wavelength sharing constraint. Path computation and resource allocation algorithms for general networks may not be suitable for optical networks. Hence, we redesign our fixed-slot and first-slot algorithms for the optical environment and consider the wavelength assignment problem. Our results show that the Extended Bellman-Ford (EBF) algorithm has better performance than the other algorithms. For heterogeneous networks, the List Sliding Window (LSW) algorithm also provides comparable solutions, while for homogeneous networks the Modified Switch Path First (MSPF) and Modified Switch Wavelength First (MSWF) algorithms provide comparable solutions. We also show that a deferred wavelength assignment strategy can be used effectively in conjunction with our routing algorithms.

(e) Scheduling in Optical Networks with Sparse Wavelength Conversion. Although full wavelength conversion provides great convenience for optical network scheduling, the high cost and the added latency introduced by wavelength converters make sparse wavelength conversion an attractive option in optical network design. Hence, we examine the impact of sparse wavelength conversion on resource scheduling in optical networks. We propose a new network model to emulate the full-conversion algorithms in sparse conversion networks. Using this model, we conduct extensive experiments to assess the impact of wavelength converters on the performance of the First-Slot RWA algorithms. Our experiments indicate that increasing the number of wavelength converters has a positive impact on blocking performance, but very little impact on the availability of earlier start times. We also show that, for networks of no more than several hundred nodes, deploying wavelength converters at no more than 60% of the nodes is enough to provide satisfactory performance. Additionally, we propose an algorithm switching strategy that adapts the scheduling algorithm as the current workload changes. When the network’s traffic pattern does not change dramatically, this strategy provides considerable performance improvements.

(f) Multiple Resources Scheduling. To provide a complete service guarantee for a complex e-Science application, not only network connections but also computational resources, such as CPU time and disk space, should be reserved. We propose a Multiple Resource Reservation Model (MRRM) that represents heterogeneous resources in a unified way while preserving their diversity. We also define the Multiple Resource First Slot (MRFS) problem and divide it into four sub-problems. For each salient instance of MRFS, an algorithm based on MRRM is presented. Experiments on a heterogeneous computer network show that our algorithms scale linearly with network size and request ratio.

(g) Scheduling in TWIN. The Time-domain Wavelength Interleaved Network (TWIN) is a novel optical network architecture. In TWIN, traffic from different source nodes may share one wavelength through Time Division Multiplexing (TDM) in the network if they have the same destination. This new feature may improve link capacity utilization by providing much finer granularity in optical bandwidth reservation for multi-source-single-destination traffic. In this dissertation, we consider the wavelength assignment problems in TWIN networks. Several algorithms for this problem are proposed and evaluated.

1.3 Organization of This Dissertation

The rest of this dissertation is organized as follows. In Chapter 2, we present the temporal network model and related data structures, which are fundamental to all the following chapters. In Chapter 3, we present our research on the general network model, in which fractional reservation of a link's capacity is allowed. Chapter 4 discusses the bandwidth scheduling problem in the specific context of optical networks. Chapter 5 proposes a novel multiple resource scheduling model and corresponding scheduling algorithms. Chapter 6 solves the wavelength assignment problem in Time-domain Wavelength Interleaved Networks. We conclude this dissertation in Chapter 7.


CHAPTER 2
TEMPORAL NETWORK MODEL

We assume that the network is represented as a graph G = (V ,E). Each node

of this graph represents a device such as a switch for layers 1-2 and a router for layer

3; and each edge represents a link such as SONET or Ethernet. When developing

an in-advance reservation system one must decide on a representation of time. The

options are to either consider time as divided into equal size slots as is done in [12, 19,

23, 54, 58] or to consider time as being continuous as in [46–48, 52, 73].

2.1 Slotted Time

Explicit in the slotted model of [12, 23] is the use of an array to store link status

for each time slot. So, for example, we may use a two-dimensional array b such that

b[l , t] gives the bandwidth available on link l in slot t. The use of this model of time has

several merits and demerits. The merits include its simplicity and the fact that the status

of a link l in any slot t may be determined in O(1) time. Some of the demerits are:

1. We need to decide the granularity of a time slot. Does a slot represent a minute, an hour, a day, or a week of time? The granularity of a time slot determines the number of time slots we need to provision for in our array b. So, if the advance scheduler permits reservations to be made up to a year in advance of the job completion time, the number of time slots (i.e., the size of the second dimension of our array), T, will need to be 52 in case a time slot represents a week and 525,600 in case a time slot is 1 minute (we assume a 365-day year). Assuming that it takes 4 bytes to store the available bandwidth of a link and a network with 1000 links, the memory required for the link status array b is 208,000 bytes when we employ a 1-week granularity and is about 2GB for a 1-minute granularity. On the other hand, the potential to waste a lot of resources is high when we use a 1-week granularity. This is because the scheduler can allocate only an integral number of slots to a reservation request. So, if a task needs a fractional number of slots, the scheduler must round up to the next integer. A request for (say) 1.1 slots results in the allocation of 2 slots. With a 1-week granularity, a 1-minute task, which ties up a source-destination path for 1 week, results in a utilization of about 0.01%!

2. The run-time of reservation algorithms is often a function of T, the total number of slots, or τ, the duration of a reservation request, or both. So, for example, it takes O(τ) time to determine whether link l has bandwidth b available in each of slots t, · · · , t + τ − 1. This determination is to be made to verify that link l is available for a bandwidth b reservation of duration τ beginning at slot t. To find the first slot during which link l has a bandwidth of at least b available takes O(T) time.

3. The exact time of day/week/year represented by slot t cannot be fixed. In other words, the slot b[l, t] cannot be associated with (say) week t of a year when using a slot granularity of 1 week. This is because the reservation system needs to operate essentially forever using the same array. As time advances from 1 week to the next, we need to drop the slot that represented the elapsed week and add a slot to represent the week that is now a year away. To do this efficiently, we must use the slots associated with each link in a circular manner, as is done in the circular array representation of a queue [51].
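The slotted-array model and the three demerits above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the implementation used in [12, 23]: the class name SlottedLinkStatus, its method names, and the use of absolute slot numbers mapped to array positions by t mod T are all hypothetical choices made for this example.

```python
class SlottedLinkStatus:
    """Hypothetical slotted-array link status: b[link][t mod T] is the
    bandwidth available on a link in absolute slot t (circular use of slots)."""

    def __init__(self, num_links, horizon, capacity):
        self.T = horizon              # number of slots in the scheduling horizon
        self.capacity = capacity      # assumed identical for every link
        self.base = 0                 # oldest absolute slot still represented
        self.b = [[capacity] * horizon for _ in range(num_links)]

    def _pos(self, t):
        if not self.base <= t < self.base + self.T:
            raise IndexError("slot outside the scheduling horizon")
        return t % self.T

    def available(self, link, t):
        """Status of a link in one slot: O(1)."""
        return self.b[link][self._pos(t)]

    def is_feasible(self, link, t, tau, bw):
        """Is bandwidth bw available in slots t, ..., t + tau - 1?  O(tau)."""
        return all(self.available(link, s) >= bw for s in range(t, t + tau))

    def first_slot(self, link, bw):
        """First slot with at least bw available: O(T)."""
        for t in range(self.base, self.base + self.T):
            if self.available(link, t) >= bw:
                return t
        return None

    def reserve(self, link, t, tau, bw):
        if not self.is_feasible(link, t, tau, bw):
            raise ValueError("insufficient bandwidth in the requested slots")
        for s in range(t, t + tau):
            self.b[link][self._pos(s)] -= bw

    def advance(self):
        """Drop the elapsed slot and reuse its cell for the slot T ahead."""
        pos = self.base % self.T
        for row in self.b:
            row[pos] = self.capacity
        self.base += 1
```

Because slot t and slot t + T share one array cell, advancing time costs one write per link and no data is shifted, which is the point of the circular arrangement.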

2.2 Continuous Time

In the continuous time model adopted by [46–48, 52, 73], the status of each

link l is maintained using a time-bandwidth list (TB list) TB[l ] that is comprised of

tuples of the form (ti , bi), where ti is a time and bi is a bandwidth. The tuples on a

TB list are in increasing order of ti . If (ti , bi) is a tuple of TB[l ] (other than the last

one), then the bandwidth available on link l from ti to ti+1 is bi . When (ti , bi) is the

last tuple, a bandwidth of bi is available from ti to ∞. Consider the link shown in

Figure 2-1. The graph is a pictorial representation of the bandwidth available on this

link as a function of time. So, for example, a bandwidth of 5 is available from time 0

to time 1 and the available bandwidth from 2 to 3 is 4. The corresponding TB list is

[(0, 5), (1, 2), (2, 4), (3, 5), (4, 1), (5, 5)].

Figure 2-1. Time-bandwidth list


We note that TB lists may be used in the slotted model of time as well with ti

representing a slot rather than a time. In this case, TB[4] = [(2, 10), (9, 5), (20, 50)]

would mean that link 4 has an available bandwidth of 10 in slots 2 through 8, a

bandwidth of 5 in slots 9 through 19, and a bandwidth of 50 in slots 20 and beyond;

the link is not available in slot 1.

Each TB list may be represented as an array linear list using dynamic array resizing

as described in [51] or as a linked list.

The demerits of the continuous time model include its relative complexity (linear lists

are somewhat more difficult to handle than arrays) and the complexity of determining the

status of a link at any given time. The latter can be done in O(log |TB[l]|) time using a

binary search in case of an array linear list and in O(|TB[l ]|) time in case of a linked TB

list. Some of the merits of this time model are:

1. There is no need to pick a time granularity or to place a bound on the length of the book-ahead period (i.e., we don't need to limit ourselves to reservations that complete within, say, a year).

2. The memory required to represent a link state (i.e., the TB list) is a function of the time variation in link bandwidth availability rather than the scheduling horizon T. So, for example, a link with bandwidth capacity 100 and no reservations is represented by the TB list [(0, 100)] irrespective of how far ahead one can schedule. On the other hand, if the available bandwidth changes at times 0, 10, and 40, the TB list will have 3 tuples. We note that the time variation in link bandwidth is loosely related to the number of tasks scheduled on that link.

3. The run-time of reservation algorithms is not a function of the scheduling horizon. Instead, it is a function of the size of the TB lists, which, in turn, depends on the number of tasks that have been scheduled.
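As an illustration of the array linear list representation of a TB list, the following Python sketch looks up the status of a link by binary search. It is a sketch under assumptions rather than the implementation of [46–48, 52, 73]: the class name TBList and its methods are hypothetical, a link is taken to be unavailable before its first tuple, and a request interval is treated as half-open, [t_start, t_end).

```python
from bisect import bisect_right


class TBList:
    """Hypothetical array-based TB list: tuples (t_i, b_i) in increasing t_i."""

    def __init__(self, tuples):
        self.times = [t for t, _ in tuples]
        self.bws = [b for _, b in tuples]

    def bandwidth_at(self, t):
        """Bandwidth available at time t, found by binary search."""
        i = bisect_right(self.times, t) - 1
        if i < 0:
            return 0          # assumption: not available before the first tuple
        return self.bws[i]

    def min_bandwidth(self, t_start, t_end):
        """Minimum bandwidth available throughout [t_start, t_end)."""
        if not self.times or t_start < self.times[0]:
            return 0
        i = bisect_right(self.times, t_start) - 1
        lowest = self.bws[i]
        i += 1
        # Scan only the tuples whose time falls inside the request interval.
        while i < len(self.times) and self.times[i] < t_end:
            lowest = min(lowest, self.bws[i])
            i += 1
        return lowest
```

For the list of Figure 2-1, TBList([(0, 5), (1, 2), (2, 4), (3, 5), (4, 1), (5, 5)]).bandwidth_at(2.5) evaluates to 4, in agreement with the text. The lookup cost depends on the number of tuples, not on any scheduling horizon.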

Because of the correspondence between a slot and time, we often use the two

terms interchangeably.


CHAPTER 3
SCHEDULING IN GENERAL NETWORKS

3.1 Problem Definition

Much of the research on bandwidth scheduling has focused on reserving a

single path for a specified bandwidth request. On-demand scheduling is

typically supported by Multiprotocol Label Switching (MPLS) [8, 11] at layer 3

and by Generalized MPLS (GMPLS) [67] at layers 1 and 2. Algorithms for on-demand

scheduling are described in [24, 40, 41, 64], and implemented by CHEETAH, DRAGON,

HOPI, UCLP and JGN. GeantII, OSCARS, USN and Enlighten support in-advance

scheduling and algorithms for in-advance scheduling are described in [12, 23, 46–48,

52], for example.

On the other hand, it is well known that using multiple paths can utilize the

available network resources more effectively [10]. The multi-path reservation problem

is formulated in [10] as a network flow problem with the objective of minimizing link

congestion. Algorithms for delay-constrained file transfer using multiple paths are

proposed in [49]. Multi-path file transfer with both link utilization constraints and path

length constraints is considered in [36]. A maximum concurrent flow formulation is used

in [45] to solve the large file transfer problem with fixed start and end times. Its objective

is to maximize network throughput. [45] also develops linear programming models

to maximize network throughput and proposes two heuristics for multi-path routing.

The first heuristic, k-Shortest Paths (KSP), uses the k-shortest paths algorithm of [69]

to compute k not necessarily disjoint paths from the source to the destination. The

scheduling of the file transfer is restricted to these k paths. The second heuristic,

k-Disjoint Paths (KDP), computes k disjoint paths from source to destination by

eliminating the links contained in previously computed paths before computing the

next path; each path computation generates the shortest path in the remaining network.


In this chapter, we use a network model in which all the network links allow

fractional bandwidth reservation. In this case, we can reserve any amount of bandwidth

on any link provided there is enough capacity available. However, in the next chapter,

the integer constraint for bandwidth reservation is introduced to accommodate optical

network constraints.

In Section 3.2, a set of in-advance scheduling problems is defined, and corresponding algorithms are proposed and evaluated. In Section 3.3, a multiple-path scheduling problem, the Earliest Finish Time File Transfer Problem (EFTFTP), is proposed and solved using both optimal algorithms and heuristics.

3.2 Single-Path Scheduling

3.2.1 Problem Definitions

The following problems are of interest in the context of in-advance scheduling.

1. Fixed Slot: Reserve a path with bandwidth b from the source s to the destination d from time tstart to time tend.

2. Maximum Bandwidth in Slot: Find the largest bandwidth b such that there is a bandwidth b path from the source s to the destination d from time tstart to time tend. Reserve such a path.

3. Maximum Duration: Find the maximum duration τ such that there is a bandwidth b path from the source s to the destination d from time tstart to time tstart + τ. Reserve such a path.

4. First Slot: Find the least t for which there is a path with bandwidth b from the source s to the destination d from time t to time t + τ, where τ is the duration for which the path is desired. Reserve such a path.

5. All Slots: Find all ranges r such that for every t ∈ r, there is a bandwidth b path from the source s to the destination d from time t to time t + τ. Reserve such a path for a user-selected t in one of the found ranges.

6. All Pairs All Slots: For every source-destination pair (s, d), find all ranges r such that for every t ∈ r, there is a bandwidth b path from s to d from time t to time t + τ.


The fixed-slot problem is referred to as the connection feasibility problem in [23].

The soonest completion problem formulated in [23] is referred to as the first available

transmission period in [12]; both are identical to the first-slot problem stated above.

In addition to the path problems defined above, an in-advance reservation system

may implement aggregate path reservation algorithms in which the available bandwidth

on the reserved path may change during the course of the reservation interval. The

integral of the available bandwidth over the reservation interval is to meet a prespecified

requirement. Aggregate data reservation algorithms are useful in data transfer

applications where we are not concerned with transferring data at a uniform rate but

just with completing the transfer by (say) a given deadline. [23] shows that the aggregate

path reservation problem is NP-hard.

3.2.2 Path Computation Algorithms

We describe only the algorithms needed to compute the paths for the various

problems described in Section 3.2.1. The actual scheduling or reservation of the found

path requires us to update the TB lists or the b-array entries for the links on the path as

well as to signal the routers on the path at the reserved time. The former is a relatively

straightforward process and the latter requires the use of specific signaling protocols.

3.2.2.1 Fixed slot

Of the problems listed in Section 3.2.1, the fixed-slot problem is the most studied.

The algorithms that have been proposed are described below.

1. Feasible Path (FP): A link of the network is feasible for fixed-slot scheduling iff the available bandwidth on the link at all times in the interval [tstart, tend] is ≥ b. Let p be a path from the source s to the destination d. p is a feasible path iff it is comprised only of feasible links. In FP scheduling, a feasible path is reserved. Such a path may be found by performing a search (depth- or breadth-first, for example [51]) on the subgraph of G, called the feasible subgraph, obtained from G by eliminating all links that are not feasible. FP scheduling is done in [23].

2. Minimum Hop Feasible Path (MHFP): The number of links on a path is its hop count. In MHFP scheduling, a minimum-hop feasible path is reserved. Such a path may be found by performing a breadth-first search on the feasible subgraph of G. Notice that FP scheduling is the same as MHFP scheduling when the search method used by FP scheduling is breadth first. MHFP scheduling is used in USN and the path computation algorithm is formally stated in [52].

3. Widest/Shortest Feasible Path (WSFP): This is an adaptation of the widest-shortest method proposed in [24] for on-demand scheduling. Let p be a feasible path. Let bmin ≥ b be the minimum bandwidth available on any link of p at any instant in the scheduling interval [tstart, tend]. In WSFP scheduling, we use the minimum-hop feasible path that has the maximum bmin value. Ties are broken arbitrarily. Notice that WSFP scheduling is MHFP scheduling with a specified tie breaker. WSFP scheduling is suggested in [12]. A WSFP may be found either by running a modified Bellman-Ford algorithm on the feasible subgraph of G [24], where the weight of a link is the minimum bandwidth available on that link during the interval [tstart, tend], or by running a modified version of Dijkstra's shortest path algorithm on this feasible subgraph [40]. In the latter case, when selecting the next shortest path, priority is given to next-path candidates with the least hop count and ties are broken by using the bmin value of the path.

4. Shortest/Widest Feasible Path (SWFP): This is a variant of WSFP scheduling that was first proposed for on-demand scheduling [64]. We select a feasible path that has the maximum bmin value. Ties are broken by favoring paths with smaller hop counts. An SWFP path may be found [40] by first running Dijkstra's shortest path algorithm modified to find a path with maximum bmin and then doing a breadth-first search to find a minimum-hop path with this maximum bmin value; the breadth-first search ignores links that violate this bmin requirement.

5. Shortest Distance Feasible Path Algorithms (SDFP): These algorithms find a shortest path (the length of a path being the sum of the weights of the links on that path) in the feasible subgraph of G. An SDFP path may be found using Dijkstra's shortest path algorithm. SDFP algorithms differ in their selection of a cost metric for feasible links. SDFP-min (minimum SDFP) is an extension of the shortest distance path algorithm for on-demand scheduling [41] to the case of in-advance scheduling. The weight of a feasible link is the reciprocal of the minimum bandwidth available on that link during the scheduling interval [tstart, tend]. In SDFP-avg (average SDFP), the weight of a feasible link is the reciprocal of the average (rather than the minimum) bandwidth available on that link during the period [tstart, tend].

6. Dynamic Alternative Feasible Path (DAFP): Again, this is an adaptation of the dynamic alternative path algorithm originally proposed for on-demand scheduling [40]. Let h be the number of hops in the MHFP. In DAFP, we use a widest feasible path (i.e., one with maximum bmin value) that has no more than h + 1 hops. Such a path may be found [40] by restricting the Bellman-Ford algorithm proposed for WSFP to use no path with more than h + 1 hops. We note that while DAFP guarantees to find a feasible path whenever such a path exists, the dynamic alternative path algorithm of [40] provides no such guarantee.

7. OSPF-Like Algorithms: These are shortest path algorithms that work on G or some subgraph of G other than the feasible subgraph. They differ in how the link weights are defined and/or in how the subgraph is defined. Since these algorithms do not work on the feasible subgraph of G, they may generate an infeasible path and so fail to schedule a request in some cases where one of the aforementioned feasible path algorithms succeeds. The shortest path may be found using Dijkstra's algorithm. In most implementations of the OSPF algorithm, the weight of a link is defined to be the reciprocal of its bandwidth capacity. Note that this bandwidth is not the link's available bandwidth at any given time but the link's nominal unloaded bandwidth. So, the weight of a 10Gb link is 1/10 regardless of the already scheduled load on that link. The OSPF path is the shortest path in the graph G using these link weights. In case the OSPF path is not feasible for fixed-slot scheduling, the reservation request is denied. In the version of OSPF-TE implemented in OSCARS [4], links that do not have an available bandwidth of at least b at the time the scheduling request is processed (not at time tstart) are removed from the network graph G; link weights are as for OSPF. The shortest path in this reduced graph is found and an attempt is made to schedule the reservation request on this path. As with OSPF, the OSPF-TE path may not be feasible. Note that even though the OSPF-TE path has enough bandwidth at the time the path is computed, it may not have sufficient bandwidth during the reservation period [tstart, tend]. The feasibility of the OSPF-TE path is verified, in OSCARS, by using a database of previously made reservations. In case the OSPF-TE path is infeasible, the scheduling request is denied.

8. k Dynamic Paths (kDP): These algorithms are an extension of OSPF-like algorithms. Recognizing that an OSPF-like algorithm may fail to find a feasible path in a network that has a feasible path, kDP algorithms generate additional paths with the hope that one of the additional paths will be feasible. An OSPF-like algorithm generates a shortest path and succeeds when this shortest path is feasible and fails otherwise. In a kDP algorithm, when the generated path is infeasible, we reduce the current graph by removing from it links on the generated infeasible path whose available bandwidth during the reservation interval [tstart, tend] is less than b. We then find the shortest s to d path in this reduced graph. This path computation and graph reduction process is repeated at most k times. The process terminates when the first feasible path is found or when k infeasible paths have been generated. kDP-OSPF and kDP-OSPF-TE are natural extensions of OSPF and OSPF-TE. kDP-LOAD is an adaptation of the algorithms used in [58] for in-advance scheduling of optical networks. In kDP-LOAD, each link is assigned a weight equal to the total load already scheduled on that link. More precisely, a link's weight is the aggregate allocated bandwidth on the link beginning at the current time and going up to the latest scheduled time. Some other adaptations are kDP-min, in which the link weight is the reciprocal of the minimum bandwidth available in the interval [tstart, tend] (as in SDFP-min), and kDP-avg, in which the link weight is the reciprocal of the average bandwidth available in this interval (as in SDFP-avg).

9. k Static Paths (kSP): In this algorithm, we have up to k precomputed paths between every pair of source-destination vertices. To schedule a path between s and d, we examine, in some order, the up to k precomputed paths for the pair (s, d) and select the first that is feasible for the interval [tstart, tend]. If none is feasible, the scheduling request is denied.
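To make the notion of the feasible subgraph concrete, the following Python sketch computes a minimum-hop feasible path (MHFP) by breadth-first search, checking each link for feasibility when it is first examined. It is an illustrative sketch, not the algorithm as stated in [52]: the function names, the dictionary-based graph and TB-list layout, and the half-open request interval [t_start, t_end) are assumptions made for this example.

```python
from collections import deque


def min_bandwidth(tb, t_start, t_end):
    """Minimum bandwidth on a TB list [(t_i, b_i), ...] over [t_start, t_end)."""
    if not tb or t_start < tb[0][0]:
        return 0                      # assumption: unavailable before first tuple
    lowest = None
    for i, (t, bw) in enumerate(tb):
        nxt = tb[i + 1][0] if i + 1 < len(tb) else float("inf")
        if t < t_end and nxt > t_start:          # segment overlaps the request
            lowest = bw if lowest is None else min(lowest, bw)
    return 0 if lowest is None else lowest


def mhfp(adj, tb, s, d, bw, t_start, t_end):
    """Breadth-first search over feasible links only.

    adj[u] lists the vertices adjacent from u; tb[(u, v)] is the TB list of
    link (u, v). Returns a minimum-hop feasible path as a vertex list, or None.
    """
    prev = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == d:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj.get(u, ()):
            # Infeasible links are skipped, so the feasible subgraph is never
            # built explicitly.
            if v not in prev and min_bandwidth(tb[(u, v)], t_start, t_end) >= bw:
                prev[v] = u
                queue.append(v)
    return None
```

Replacing the breadth-first search by a depth-first search gives an FP scheduler; the other feasible-path algorithms differ only in the search that is run over the same feasible links.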

3.2.2.2 Maximum bandwidth in a slot

This computation is achieved by modifying Dijkstra’s shortest path algorithm [51]

as shown in Figure 3-1. Here, b[u][v ] is the minimum bandwidth available on the edge

(u, v) during the specified interval/slot and bw [u] is the maximum bandwidth along

paths from the source s to vertex u under the constraint that these paths go through

only those vertices to which a maximum bandwidth path has been found already. The

complexity of the algorithm is O(n^2) for a general n-vertex graph. However, practical

network graphs have O(n) edges and the complexity becomes O(n log n) when a max

heap (for example) is used to maintain the bw values.
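The heap-based variant mentioned above may be sketched in Python as follows. This is an illustrative sketch rather than a transcription of Figure 3-1: the function name and the dictionary-based graph layout are assumptions, and since Python's heapq module provides a min heap, bandwidths are negated to obtain max-heap behavior.

```python
import heapq


def max_bandwidth_path(adj, s, d):
    """Widest s-to-d path.

    adj[u] = {v: b_uv}, where b_uv is the minimum bandwidth available on
    link (u, v) during the requested slot. Returns (bandwidth, path), or
    (0, None) when d cannot be reached.
    """
    bw = {s: float("inf")}
    prev = {s: None}
    done = set()
    heap = [(-bw[s], s)]                 # negated keys emulate a max heap
    while heap:
        neg, w = heapq.heappop(heap)
        if w in done:
            continue                     # stale heap entry
        done.add(w)
        if w == d:
            break
        for u, cap in adj.get(w, {}).items():
            cand = min(-neg, cap)        # bottleneck of the path through w
            if cand > bw.get(u, 0):
                bw[u] = cand
                prev[u] = w
                heapq.heappush(heap, (-cand, u))
    if d not in done:
        return 0, None
    path, v = [], d
    while v is not None:
        path.append(v)
        v = prev[v]
    return bw[d], path[::-1]
```

The only changes relative to the shortest path version are that path "length" is the minimum link value rather than the sum, and that the vertex with the largest (rather than smallest) label is selected next.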

3.2.2.3 Maximum duration

As noted in [23], the maximum duration problem is very similar to the widest path

problem, which in turn is identical to the maximum bandwidth problem. The weight

of a link is set to the maximum duration, beginning at tstart, for which the link has a

bandwidth of b or more available. The widest s to d path in this weighted graph is the

maximum duration path. The path is found using a modified Dijkstra’s algorithm as for

the maximum bandwidth in slot problem.
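The link weight used for the maximum duration problem can be read off a TB list in a single scan. The Python sketch below is one way to do this under assumptions: the function name max_duration is hypothetical, the TB list is a list of (t_i, b_i) tuples in increasing t_i, and the link is taken to be unavailable before its first tuple.

```python
def max_duration(tb, t_start, bw):
    """Longest tau such that at least bw is available on the link throughout
    [t_start, t_start + tau); may be infinite for the last tuple."""
    end = t_start
    for i, (t, avail) in enumerate(tb):
        nxt = tb[i + 1][0] if i + 1 < len(tb) else float("inf")
        if nxt <= t_start:
            continue                  # segment ends before the request starts
        if t > end or avail < bw:
            break                     # gap before first tuple, or too little bandwidth
        end = nxt                     # the whole segment [t, nxt) is usable
    return end - t_start
```

Using these values as link weights, a widest-path computation returns the path whose smallest link duration is largest, which is the maximum duration path.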

3.2.2.4 First slot

Three different algorithms have been proposed for the first slot problem [23, 52, 58].

MaxBandwidth(s, d, prev)
{
   bw[i] = b[s][i], 1 ≤ i ≤ n.
   prev[i] = s, 1 ≤ i ≤ n.
   prev[s] = 0.
   Initialize L to be a list with all vertices other than s.
   for (i = 1; i < n − 1; i++)
   {
      Delete a vertex w from L with maximum bw.
      if (w == d) return.
      for (each u adjacent from w)
         if (bw[u] < min{bw[w], b[w][u]})
         {
            bw[u] = min{bw[w], b[w][u]}.
            prev[u] = w.
         }
   }
}

Figure 3-1. Modified Dijkstra’s algorithm

1. Slotted Sliding Window (SSW): The sliding window first algorithm proposed in [58] for optical networks is a variation of the soonest completion algorithm proposed in [23]. Both these algorithms try the slots tstart, tstart + 1, · · · , in this order, to find the least t for which the graph G has a feasible path (i.e., an s to d path with bandwidth b for the duration from t to t + τ). The existence of a feasible path for any t may be determined using a fixed-slot algorithm such as FP that guarantees to find a feasible path whenever such a path exists, or by using a kDP (as in the case of [58]) or kSP algorithm that does not provide such a guarantee.

2. List Sliding Window (LSW): This is similar to SSW except that it was developed for the continuous time model in which there is no concept of discrete time intervals (i.e., slots). For each link in the network, we define a start-time list, ST, that is comprised of pairs of the form (a, b) with the property that for every t ∈ [a, b], the link has bandwidth b available from t to t + τ (the b in a pair (a, b) is an interval endpoint, not the requested bandwidth b). Let a1 < a2 < · · · < aq be the distinct a values in the union of the ST lists of all links. It is easy to see that the earliest time t for which the network has a path from s to d with bandwidth b from time t to time t + τ is one of the ai values. The LSW algorithm of [52] examines the ai values in the order a1, a2, · · · , stopping at the first ai for which a feasible path is found. The search for a feasible path is done using a breadth-first search as is done by the MHFP algorithm. Some optimization is possible since the breadth-first search for ai follows that for ai−1 < ai. Since the breadth-first search must scan the ST list of each link that is traversed during the search, this scan may begin where the most recent scan of this list (from the breadth-first search for an earlier ai) left off rather than from the front of the ST list. Although LSW was developed for the continuous time model, it may also be used in the slotted time model regardless of whether TB lists or an array is used to represent link status.


3. Extended Bellman Ford (EBF): This algorithm for first slot was proposed in [52]. First, we extend the notion of an ST list for a link to that for a path in the natural way. Next, define st(k, u) to be the union of the ST lists for all paths from vertex s to vertex u that have at most k edges on them. Clearly, st(0, u) = ∅ for u ≠ s and st(0, s) = [0, ∞]. Also, st(1, u) = ST(s, u) for u ≠ s and st(1, s) = st(0, s). For k > 1 (actually also for k = 1), we obtain the following recurrence

st(k, u) = st(k − 1, u) ∪ { ⋃_{v : (v, u) is an edge} { st(k − 1, v) ∩ ST(v, u) } }     (3–1)

where ∪ and ∩ are list union and intersection operations. For an n-vertex graph, st(n − 1, d) gives the start times of all paths from s to d that have bandwidth b available for a duration τ. The a value of the first (a, b) pair in st(n − 1, d) gives the desired first slot. The Bellman-Ford algorithm [51] may be extended to compute st(n − 1, d). The extension merely works with st lists rather than with scalars and is described in [52].
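Recurrence (3–1) can be sketched in Python as follows. This is an illustrative sketch and not the algorithm of [52]: all function names are hypothetical, an ST/st list is stored as a sorted list of closed intervals (lo, hi) of start times, the reservation window is treated as half-open, [t, t + τ), and the st lists are updated in place across rounds.

```python
INF = float("inf")


def link_st(tb, bw, tau):
    """ST list of one link: intervals (lo, hi) such that the link has at least
    bw available from t to t + tau for every start time t in [lo, hi]."""
    st, run_start = [], None
    for i, (t, avail) in enumerate(tb):
        last = i + 1 == len(tb)
        if avail >= bw and run_start is None:
            run_start = t                      # a run of sufficient bandwidth begins
        if run_start is not None and (avail < bw or last):
            run_end = t if avail < bw else INF
            if run_end - run_start >= tau:
                st.append((run_start, run_end - tau))
            run_start = None
    return st


def union(x, y):
    out = []
    for lo, hi in sorted(x + y):
        if out and lo <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], hi))
        else:
            out.append((lo, hi))
    return out


def intersect(x, y):
    out, i, j = [], 0, 0
    while i < len(x) and j < len(y):
        lo, hi = max(x[i][0], y[j][0]), min(x[i][1], y[j][1])
        if lo <= hi:
            out.append((lo, hi))
        if x[i][1] < y[j][1]:
            i += 1
        else:
            j += 1
    return out


def ebf(vertices, links, s, d, bw, tau):
    """links[(v, u)] is the TB list of directed link (v, u).
    Returns (first slot or None, st list of d)."""
    ST = {e: link_st(tb, bw, tau) for e, tb in links.items()}
    st = {v: [] for v in vertices}
    st[s] = [(0, INF)]
    for _ in range(len(vertices) - 1):         # at most n - 1 rounds
        changed = False
        for (v, u), link in ST.items():
            new = union(st[u], intersect(st[v], link))
            if new != st[u]:
                st[u], changed = new, True
        if not changed:
            break
    return (st[d][0][0] if st[d] else None), st[d]
```

The second component of the result is the full st list of d, so the same routine also answers the all-slots question for the pair (s, d).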

3.2.2.5 All slots

There appears to be just one algorithm that has been proposed for the all slots

problem. This is the extended Bellman-Ford algorithm [52] to compute st(n − 1, d) (see

preceding discussion of first slot algorithms). As noted above, st(n − 1, d) gives the start

times of all paths from s to d that have bandwidth b available for a duration τ .

3.2.2.6 All pairs all slots

The extended Bellman-Ford (EBF ) algorithm of Section 3.2.2.4 computes st(u) =

st(n − 1, u) for a given source vertex s and all u in O(nel) time. st(u) gives the start

times of all available slots of duration τ and bandwidth b. So, in O(nel) time, using EBF, we

are able to determine all available slots from s to every other vertex u (including vertex

d). Furthermore, to determine all available slots between all pairs of vertices, we may

run the EBF algorithm n times, once with each vertex as the source vertex s. So, the

time needed to determine all slots between all pairs of vertices is O(n^2 el). An alternative

strategy to determine all available slots between all pairs of vertices is to extend Floyd’s

all-pairs shortest path algorithm [51] as is done in [47]. Figure 3-2 gives the resulting

extension. Here, st(u, v) is the ST list for paths from u to v . Initially, st(u, v) = ST (u, v).

On termination, st(u, v) gives all possible start times for paths from u to v .


algorithm ExtendedFloyd()
{
   for (int k = 1; k <= n; k++)
      for (int i = 1; i <= n; i++)
         for (int j = 1; j <= n; j++)
            st(i, j) = st(i, j) ∪ {st(i, k) ∩ st(k, j)};
}

Figure 3-2. Pseudocode for extended Floyd algorithm
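A runnable counterpart of the pseudocode in Figure 3-2 might look as follows in Python. It is a sketch under assumptions, not the implementation of [47]: vertices are numbered 0 through n − 1, each ST/st list is a sorted list of closed intervals (lo, hi) of start times, and the helper names union and intersect are hypothetical.

```python
INF = float("inf")


def union(x, y):
    """Union of two sorted interval lists, merging overlapping intervals."""
    out = []
    for lo, hi in sorted(x + y):
        if out and lo <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], hi))
        else:
            out.append((lo, hi))
    return out


def intersect(x, y):
    """Intersection of two sorted interval lists (two-pointer scan)."""
    out, i, j = [], 0, 0
    while i < len(x) and j < len(y):
        lo, hi = max(x[i][0], y[j][0]), min(x[i][1], y[j][1])
        if lo <= hi:
            out.append((lo, hi))
        if x[i][1] < y[j][1]:
            i += 1
        else:
            j += 1
    return out


def extended_floyd(n, link_st):
    """link_st[(u, v)] is the ST list of link (u, v); missing links have an
    empty list. Returns st, where st[u][v] holds all start times from u to v."""
    st = [[list(link_st.get((i, j), [])) for j in range(n)] for i in range(n)]
    for k in range(n):                 # allow vertex k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                st[i][j] = union(st[i][j], intersect(st[i][k], st[k][j]))
    return st
```

As in Floyd's shortest path algorithm, the outermost loop runs over the intermediate vertex k, so after iteration k the list st[i][j] covers all paths whose intermediate vertices are among the first k vertices.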

3.2.3 Performance Metrics

In addition to the traditional metrics of space and time complexity, the effectiveness

of an in-advance scheduling algorithm in accommodating reservation requests is critical.

The space complexity needs to be “reasonable”. That is, the space requirement should

not exceed the available memory on the computer on which the bandwidth management

system is to run. The time complexity is important as this influences the response time

of the bandwidth management system and, in turn, determines how many reservation

requests this system can process per unit time. Scheduling effectiveness is, of course,

critical as revenue is generated only from tasks that are actually scheduled.

Although many of the proposed algorithms may be run in distributed mode (for

example, those based on breadth-first search and Bellman-Ford), current implementations

of these algorithms in in-advance reservation systems such as GeantII, OSCARS, USN,

and Enlighten are centralized. Therefore, we limit our discussion of the complexity

metric to centralized implementations of the proposed algorithms.

3.2.3.1 Space complexity

When using the slotted-array model, the status of link l is stored in array position

b[l ][i ], 1 ≤ i ≤ T , where T is the number of slots in the scheduling horizon1 . The

1 This representation is an extension of the packed adjacency list representation of a graph described in [51]. The extension requires us to keep an array b[l][∗] of slots with each link l rather than a scalar weight.


memory required for link status information is, therefore, Θ(eT), where e is the number of links in the network. In the continuous time model, where TB lists are used to store link status, we need O(Σ_i |TB[i]|) memory, where |TB[i]| is the size of the TB list for link i. As noted in Chapter 2, the slotted model is a special case of the continuous model and TB lists may also be used for the slotted model. When this is done, the size of each TB list is bounded by T and the TB list representation takes O(eT) memory. For a lightly loaded system in which the size of each TB list is much less than its maximum possible size of T, the TB-list scheme uses much less memory than is used by the slotted-array representation. In fact, when no tasks are scheduled, the TB-list representation uses Θ(e) memory vs. Θ(eT) memory for the slotted-array representation. At the other extreme, where each TB list has T pairs, the TB-list representation takes about twice the memory taken by the array representation (note that each entry on a TB list is a pair while each array entry is a singleton).

In addition to the memory required to store network status, memory is needed to

run the path computation algorithms. Algorithms employing a graph search strategy

such as breadth-first search need O(n) space to keep track of which vertices have

or have not been visited so far. Note that we may determine whether or not a link is

feasible while the breadth-first search algorithm is running and so no additional space

is needed to maintain the feasible subgraph of G . Those that use some version of

Dijkstra’s shortest path algorithm need O(n) space for a priority queue and O(e) space

for the link weights that are in use.

For Bellman-Ford extensions, we need space for the link ST lists and the path

st lists. The number of link ST lists is e. Although there are O(n^2) path st lists, the

computation of st(k, ∗) can be done in-place (i.e., using the same space as used by

st(k − 1, ∗)) [52]. So, space for O(n) st lists is needed. For the slotted model, T is

an upper bound on the size of an ST /st list. So, Bellman-Ford extensions require an

additional O((n + e)T ) memory to run. In lightly loaded systems, the size of an st list


is much less than T and correspondingly less memory is needed. For the continuous

model, the size of an ST /st list could be as large as the number of tasks scheduled so

far. However, in realistic applications, we expect the size to be considerably less than the

T value for a corresponding slotted system.

The Floyd extension used in the all-pairs all-slots problem requires an n × n array of

st lists. The memory for this array corresponds to that for O(n^2) st lists. Bounds on the

size of an st list were discussed in the previous paragraph.

3.2.3.2 Time complexity

Figure 3-3 summarizes the time complexity of each of the algorithms described in

Section 3.2.2. We assume also that the number of links e is at least equal to the number

of nodes n. The algorithms of Section 3.2.2 that work on the feasible subgraph of G may

be implemented to either begin by identifying the feasible links of G or may check each

link for feasibility when the link is first examined. In either case, O(eτ) time is spent on

link feasibility checks when the slotted-array model is used. In the continuous model,

feasibility checks take O(L) time, where L is the sum of the lengths of the TB lists. In

either case, the remainder of the algorithm takes O(e) time.
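The two kinds of per-link feasibility check can be sketched as follows. This is only a minimal illustration: it assumes that a slotted array stores the available bandwidth of a link for each time slot, and that a TB list is a time-ordered list of (t_i, b_i) tuples in which bandwidth b_i is available from t_i until the time of the next tuple. The function names and the sample link are hypothetical.

```python
def feasible_slotted(avail, start_slot, tau, bandwidth):
    """O(tau) check on one link: every slot of the reservation needs enough bandwidth."""
    return all(avail[t] >= bandwidth for t in range(start_slot, start_slot + tau))


def feasible_tb(tb_list, start, end, bandwidth):
    """O(len(tb_list)) check on one link for the reservation window [start, end)."""
    for idx, (t, b) in enumerate(tb_list):
        # Assumed semantics: b is available on [t, next_t); the last tuple extends to infinity.
        next_t = tb_list[idx + 1][0] if idx + 1 < len(tb_list) else float("inf")
        overlaps = t < end and next_t > start
        if overlaps and b < bandwidth:
            return False
    return True


# Hypothetical link: 5 units are free, except during [1, 2) where only 2 units are free.
tb = [(0, 5), (1, 2), (2, 5)]
slots = [5, 2, 5, 5]
print(feasible_tb(tb, 0, 1, 5), feasible_slotted(slots, 0, 2, 3))
```

Summed over all links, the first check costs O(eτ) and the second costs O(L), which is where the two feasibility terms in the time bounds come from.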

When the number of previously scheduled jobs is small, the TB lists also are small

in size. However, in the worst case, the size of a TB list may be T . So, for lightly loaded

systems, we expect the continuous version of an algorithm to outperform (in terms of run

time and memory) the corresponding slotted-array version. This is the typical trade-off

between sparse and dense data-structure representations. Additionally, the slotted-array

versions for the fixed slot and max bandwidth problems are expected to outperform the

corresponding continuous versions when the requested reservation duration τ is small.

3.2.3.3 Effectiveness

There are two aspects to effectiveness: guarantees and utilization. Guarantees

has to do with whether or not the scheduling algorithm provides any guarantee on its

result. For example, does a fixed-slot algorithm guarantee to find a feasible path whenever

Problem              Algorithm   Slotted Array                        Continuous
-------------------  ----------  -----------------------------------  ---------------------
Fixed Slot           FP          O(eτ)                                O(e + L)
                     MHFP        O(eτ)                                O(e + L)
                     WSFP        O(eτ + n log n)                      O(e + L + n log n)
                     SWFP        O(eτ + n log n)                      O(e + L + n log n)
                     SDFP        O(eτ + n log n)                      O(e + L + n log n)
                     DAFP        O(eτ + n log n)                      O(e + L + n log n)
                     OSPF        O(e + n(τ + log n))                  O(e + L + n log n)
                     kDP         O(ke + min{kn, e}τ + kn log n)       O(ke + L + kn log n)
                     kSP         O(min{kn, e}τ + kn)                  O(L + kn)
Max Bandwidth        Dijkstra    O(eτ + n log n)                      O(e + L + n log n)
Max Duration         Dijkstra    O(eT + n log n)                      O(e + L + n log n)
First Slot           SSW-MHFP    O(eT)                                −
                     LSW         O((q + T)e)                          O(qe + L)
                     EBF         O(nel + eT)                          O(nel + L)
All Slots            EBF         O(nel + eT)                          O(nel + L)
All Pairs All Slots  Floyd       O(n^3 l + eT)                        O(n^3 l + L)

l = size of longest st list
q = number of different a_i values in the ST lists
L = sum of lengths of TB lists

Figure 3-3. Algorithm time complexities

such a path exists? Does a first-slot algorithm actually find the earliest feasible path? For

the fixed-slot problem, all algorithms other than the OSPF-like, k dynamic paths, and k

static paths algorithms provide a guarantee. For the remaining in-advance scheduling

problems, all algorithms described in Section 3.2.2, other than the first-slot algorithm of

[58], provide a guarantee. Figure 3-4 gives a possible hierarchical classification of the

fixed-slot algorithms of Section 3.2.2.1.

Since the scheduling algorithms work in an online mode (i.e., scheduling requests

are processed in the order they arrive at the bandwidth management system and a

decision on whether or not to make the requested reservation is made on the basis of

link states at the time the reservation request is processed without regard to future

requests), the link status at the time a decision is made on the current request being

processed depends on decisions made in the past. These past decisions are a function

of the path computation algorithm(s) in use. Suppose that fixed-slot reservation requests

Figure 3-4. Classification of fixed-slot algorithms

A, B, and C arrive at the bandwidth management system in this order. Request A may

be denied by OSPF-TE (as OSPF-TE provides no guarantee) and accepted by FP. As

a result, the network state following the processing of request A is different when the

bandwidth management system uses OSPF-TE for fixed-slot reservation than when

it uses FP. Consequently, it is entirely possible that OSPF-TE then accepts B and C

while FP rejects both B and C . Hence network utilization as measured by the number

of accepted requests or total network bandwidth that has been scheduled may be

higher when using OSPF-TE, which provides no guarantee, than when using FP, which provides a

guarantee!

Burchard [12] considers two utilization metrics–request blocking ratio (RBR) and

bandwidth blocking ratio (BBR), which, for any measurement interval, are defined as:

\mathrm{RBR} = \frac{\text{number of rejected requests}}{\text{total number of requests}}

\mathrm{BBR} = \frac{\text{sum of bandwidth-duration products of rejected requests}}{\text{sum of bandwidth-duration products of all requests}}

Equivalently, we may use request acceptance (RAR) and bandwidth acceptance

(BAR) ratios, which are defined as:

RAR = 1− RBR

BAR = 1− BBR

We note that BBR also has been used in the context of on-demand scheduling (see

[40], for example).
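As a small worked example of these definitions, the sketch below computes RBR and BBR (and hence RAR and BAR) for one measurement interval. The tuple layout (bandwidth, duration, accepted) and the sample requests are hypothetical and chosen only for illustration.

```python
def blocking_ratios(requests):
    """requests: iterable of (bandwidth, duration, accepted) tuples (hypothetical layout)."""
    requests = list(requests)
    rejected = [(b, d) for b, d, accepted in requests if not accepted]
    rbr = len(rejected) / len(requests)
    bbr = sum(b * d for b, d in rejected) / sum(b * d for b, d, _ in requests)
    return rbr, bbr


# Four made-up requests, two of which are rejected.
sample = [(10, 100, True), (30, 200, False), (50, 100, True), (10, 400, False)]
rbr, bbr = blocking_ratios(sample)
rar, bar = 1 - rbr, 1 - bbr
print(rbr, bbr, rar, bar)
```

Here half of the requests are rejected, but they account for 10000 of the 16000 bandwidth-duration units requested, so BBR is larger than RBR.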

3.2.4 Experiments

For each of the max bandwidth, max duration, all slots, and all pairs all slots

problems, only one algorithm has been proposed and so no relative effectiveness

comparison is possible. For the first slot problem, there are three algorithms: SSW-MHFP,

LSW, and EBF. All three are guaranteed to find the first slot correctly. Hence, barring differences

resulting from their possible implementation using different tie breakers, each is just as

effective. Of course, there will be some difference in the computer-time taken to execute

each algorithm (as indicated in Section 3.2.3.2). Therefore an experimental evaluation

of effectiveness is needed only for the various algorithms proposed for the fixed slot

problem.

We programmed the various fixed slot algorithms in C++ and measured the

effectiveness of each using the RAR and BAR metrics. Although we experimented

with both variants (SDFP-min and SDFP-avg) of SDFP, we report only the results

for SDFP-min as the results for both variants are comparable. For OSPF, we

programmed the OSPF-TE variant that is used in OSCARS [4]. The kDP variant

tested by us is kDP-LOAD with k = 4. We used this variant as it is the variant used in

Enlightened [18]. Finally, for kSP, we set k to 4 and used 4 shortest paths.

For test networks, we used the Abilene network [5], the 19-node MCI network and

the 16-node cluster network of [40], the 11-node network of [12], and several randomly

generated topologies. The backbone of the Abilene network used by us has 11 nodes

as shown in Figure 3-5A and each backbone node has a 5-node stub network attached

to it. The bandwidth of each link is 155Mbps. The bandwidth of each link in the network

of [12] (Figure 3-5C) is 100Mbps. Figures 3-5B and 3-5D give the MCI and cluster

topologies together with the link bandwidths. The random networks we tried had 200,

400, or 800 nodes and the out-degree of each node was randomly selected to be

between 3 and 7. To ensure network connectivity, the random network has bidirectional

links between nodes i and i +1 for every 1 ≤ i < n, where n is the number of nodes. The

bandwidth of each link in a randomly generated network was randomly selected from

the set 50Mbps (OC1), 155Mbps (OC3), 620Mbps (OC12), 1000Mbps (1G Ethernet),

1245Mbps (OC24), 2490Mbps (OC48), 4975Mbps (OC96), 9950Mbps (OC192), and

10000Mbps (10G Ethernet).
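The random topology construction described above can be sketched as follows. This is a minimal illustration, not the generator used in the experiments: it assumes that the chain links between nodes i and i + 1 count toward a node's out-degree, that additional link targets are drawn uniformly at random, and that the bandwidth of each directed link is drawn independently. The function name random_topology and the seed parameter are hypothetical.

```python
import random

# Link bandwidths (Mbps) listed in the text.
BANDWIDTHS_MBPS = [50, 155, 620, 1000, 1245, 2490, 4975, 9950, 10000]


def random_topology(n, seed=None):
    """Return {(u, v): bandwidth} for a random directed network on nodes 1..n (n >= 8)."""
    rng = random.Random(seed)
    links = {}
    out = {u: set() for u in range(1, n + 1)}

    def add(u, v):
        out[u].add(v)
        links[(u, v)] = rng.choice(BANDWIDTHS_MBPS)

    # Bidirectional chain i <-> i+1 guarantees connectivity.
    for i in range(1, n):
        add(i, i + 1)
        add(i + 1, i)

    # Top up each node to a random out-degree between 3 and 7 (assumed to include chain links).
    for u in range(1, n + 1):
        degree = rng.randint(3, 7)
        while len(out[u]) < degree:
            v = rng.randint(1, n)
            if v != u and v not in out[u]:
                add(u, v)
    return links


links = random_topology(200, seed=1)
print(len(links))
```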

Figure 3-5. Topologies of the Abilene network (A), MCI network (B), Burchard’s network (C), and Cluster network (D)

We generated a synthetic set of reservation requests. Each request is described by

the 6-tuple (source node, destination node, time at which the request is made, requested

start time, duration, bandwidth). The source and destination nodes for each request

were randomly selected using a uniform random number generator. The time at which

the request is made followed a Poisson distribution. The requested start time was set

to be the time at which the request is made plus a randomly generated lag. Since the

results are relatively insensitive to the lag time, we arbitrarily set the mean lag time to be

100 units.

For our experiments, we had three control parameters: the number of requests in the

study interval, mean duration of a scheduling request, and mean bandwidth of a request.

The study interval was arbitrarily set to 2000 time units; the number of requests in the

study interval was set to one of the values 200, 400, 600, 800, and 1000 for the random

networks and to one of 100, 200, 300, 400, and 500 for the remaining networks; the

allowable mean request durations were 200, 400, 600, 800, and 1000 time units; and the

allowable mean request bandwidths were 500, 1000, 1500, 2000, and 2500 Mbps for the

random networks and 10, 30, 50, 70 and 90 Mbps for the remaining networks. For each

setting of the 3 control parameters, we ran 10 trials. In the case of random networks, the

network topology was randomly regenerated for each of the 10 trials. For each trial, we

measured the request acceptance and bandwidth acceptance ratios (RAR and BAR). In

reporting our results, we computed the average RAR for all conducted experiments with

a given network and fixed value for one of the 3 control parameters. So, for example, we

computed the average RAR for the 250 (5 request durations * 5 request bandwidths * 10

trials) experiments done on the MCI network with the number of requests in the study

interval being 100.
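A request generator along these lines might look like the sketch below. The text fixes only the means of the lag, duration, and bandwidth, so the exponential distributions used here are an assumption, as is modelling Poisson arrivals through exponential inter-arrival gaps with rate (number of requests) / (study interval). The function name and parameters are hypothetical.

```python
import random


def generate_requests(num_nodes, num_requests, mean_duration, mean_bandwidth,
                      study_interval=2000.0, mean_lag=100.0, seed=None):
    """Return a list of 6-tuples (source, destination, request time, start time,
    duration, bandwidth). Distributions other than the arrival process are assumed."""
    rng = random.Random(seed)
    rate = num_requests / study_interval      # expected arrivals per time unit
    requests, now = [], 0.0
    for _ in range(num_requests):
        now += rng.expovariate(rate)          # Poisson arrivals: exponential gaps
        src = rng.randrange(num_nodes)
        dst = rng.randrange(num_nodes - 1)    # uniform over nodes other than src
        if dst >= src:
            dst += 1
        lag = rng.expovariate(1.0 / mean_lag)
        duration = rng.expovariate(1.0 / mean_duration)
        bandwidth = rng.expovariate(1.0 / mean_bandwidth)
        requests.append((src, dst, now, now + lag, duration, bandwidth))
    return requests


# One trial for a 19-node network: 100 requests, mean duration 200, mean bandwidth 30 Mbps.
trial = generate_requests(19, 100, 200, 30, seed=7)
print(trial[0])
```

Because the number of requests is fixed while the gaps are random, the last arrivals in a trial generated this way may fall slightly before or after the end of the study interval.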

Since the relative performance of the fixed-slot algorithms is rather insensitive to

whether we use the RAR or BAR metric, we report only on the RAR results. Figure 3-6

and Figure 3-7 give the average acceptance ratios for the fixed-slot algorithms of

Section 3.2.2.1 as a function of the number of requests in the study interval. On

networks such as the MCI, Cluster, and Burchard networks, which have a relatively

small number of nodes, the dynamic alternative feasible path algorithm (DAFP) gives the best

performance consistently across the range of number of requests tested by us. The

minimum hop feasible path algorithm (MHFP) is consistently second best. However,

on larger networks such as the Abilene network, which has 66 nodes, and the random

networks that have 200+ nodes, MHFP is consistently superior to DAFP. For the tested

larger networks, MHFP is best and DAFP is second best. Generally, the fixed-slot

algorithms OSPF, kDP, and kSP that do not guarantee to make a reservation when such

a reservation is possible fared worse than the algorithms that provide such a guarantee.

However, at times, the performance of the best “no guarantee algorithm” was quite close

to or slightly better than that of the worst “guarantee algorithm.” On our non-random

networks, OSPF consistently had the worst performance. However, on our random

networks, kDP was consistently worst and, often, by quite a margin. As expected, as

the network gets saturated (i.e., the number of requests in the study interval increases),

the RAR for all algorithms declines and the rate of decline is about the same for all

algorithms.

Figure 3-6. Acceptance ratio vs number of requests in the Abilene (A), MCI (B), Burchard (C), and Cluster (D) networks

Figure 3-7. Acceptance ratio vs number of requests in random topologies with 200 (A), 400 (B), and 800 (C) nodes. [Plots omitted: x-axis, number of requests (200 to 1000); one curve each for MHFP, WSFP, SWFP, SDFP, DAFP, OSPF, kDP, and kSP.]

Figures 3-8 and 3-9 give the average acceptance ratios for the fixed-slot algorithms of

Section 3.2.2.1 as a function of the mean request duration. The relative performance of

the algorithms is the same as for the case when we fixed the number of requests rather

than the mean request duration.

Figure 3-8. Acceptance ratio vs mean request duration in the Abilene (A), MCI (B), Burchard (C), and Cluster (D) networks. [Plots omitted: x-axis, mean request duration (200 to 1000 time units); y-axis, acceptance ratio (%); one curve each for MHFP, WSFP, SWFP, SDFP, DAFP, OSPF, kDP, and kSP.]

Figure 3-9. Acceptance ratio vs mean request duration in random topologies with 200 (A), 400 (B), and 800 (C) nodes. [Plots omitted: x-axis, mean request duration (200 to 1000 time units); y-axis, acceptance ratio (%); one curve each for MHFP, WSFP, SWFP, SDFP, DAFP, OSPF, kDP, and kSP.]

Figures 3-10 and 3-11 give the average acceptance ratios as a function of the mean

request bandwidth. The relative performance of the algorithms is the same as for the

case when we fixed either the number of requests or the mean request duration.

Figure 3-10. Acceptance ratio vs mean request bandwidth in the Abilene (A), MCI (B), Burchard (C), and Cluster (D) networks. [Plots omitted: x-axis, mean request bandwidth (10 to 90 Mbps); y-axis, acceptance ratio (%); one curve each for MHFP, WSFP, SWFP, SDFP, DAFP, OSPF, kDP, and kSP.]

Figure 3-11. Acceptance ratio vs mean request bandwidth in random topologies with 200 (A), 400 (B), and 800 (C) nodes. [Plots omitted: x-axis, mean request bandwidth (500 to 2500 Mbps); y-axis, acceptance ratio (%); one curve each for MHFP, WSFP, SWFP, SDFP, DAFP, OSPF, kDP, and kSP.]

3.3 Multiple-Path Scheduling

3.3.1 Problem Definition

Consider a scenario from fusion experiments [21], a collaboration consisting

of researchers in different European countries. After each round of experiments,

simulation data are generated at different sites and processed by supercomputers

across the continent. Moreover, the data transfer and processing time is limited since

the next round of experiments is guided by the results of the previous one.

Thus, the file transfer time may become a major bottleneck for the experiments’

efficiency. A delay in any file transfer in the batch could cause a large loss to the whole

project. In this paper, we model this problem as the Batched-File Path Scheduling

Problem (BFPSP), where the goal is to minimize the overall transfer time of multiple

one-to-one file transfers. Without loss of generality, we assume that all the file transfer

requests are specified before scheduling starts. Clearly, such an algorithm can also

be used by batching newly arrived requests at appropriate intervals.

In in-advance scheduling [23], each file transfer may have a different start time. The

earliest start time is specified by the user in the request, but the actual start time is

decided by the scheduling algorithm at runtime. In this paper, we provide both optimal

and approximate solutions for this problem and evaluate them in multiple scenarios.

Footnote 2: This section was submitted to the Journal of Supercomputing in 2009 and is still under review.

All the scheduling approaches can be considered variations of batch

scheduling. We assume that all requests are collected as a batch in the scheduler;

the requests in a batch are scheduled as a group with a certain periodicity. If

all the file transfer requests are scheduled as one group, the resulting schedule is optimal; we

denote this approach All-Batch. All-Batch is very time consuming for large batch sizes. Also,

it is not realistic, as the requests may not all arrive at the same time. We present

a number of heuristics that have much lower time complexity than All-Batch but

similar performance, with the added benefit that not all requests need to be known

beforehand. The proposed approach, N-Batch, groups requests into batches of constant

size N and schedules each batch separately. For the special case of batch size equal

to 1, the scheduling is equivalent to online scheduling. For this case, we develop two

sets of heuristics: GOS and k-Path.

We have compared all these algorithms for a variety of scenarios and performance

metrics. Our simulations show that both N-Batch and GOS provide schedules that

are comparable in quality to those of All-Batch but require significantly less scheduling

time. GOS is comparable to N-Batch but requires significantly less

computation time. We also investigate GOS-E, a variant of GOS that reduces path switching

overhead. GOS-E performs well when

path switching overhead cannot be ignored.

3.3.1.1 Data structures

Recall the data structure Time-Bandwidth List (TB List) introduced in Chapter 2.

Let T = [T0,T1, · · · ], T0 < T1 < · · · , be the union of the time components of the (ti , bi)

tuples in the TB lists of all links in the network. We refer to T as the global time list. It

is easy to see that the available bandwidth on each link of the network is unchanged

in the interval [Ti ,Ti+1). Figure 3-12 shows the TB lists for 2 links. For this simple

43

example, assume these are the only two links in the network. The TB list for the first link

is [(0, 5), (1, 2), (2, 5)] and that for the second link is [(0, 5), (1.5, 3), (2, 5)]. The global

time list for our example is [0, 1, 1.5, 2]. In the interval [0,1), the available bandwidth on

the two links is 5 whereas in the interval [1,1.5), the first link has an available bandwidth

of 2 while the second link’s available bandwidth is 5, and neither link’s bandwidth

changes within this basic interval.

Figure 3-12. Basic intervals

The intervals [T0,T1), [T1,T2), · · · in the global time list are referred to as basic

intervals. At any time within a certain basic interval, each edge has a constant amount

of available bandwidth. Basic intervals obtained from the global time list can be ordered

using the relationship [a, b) < [c , d) iff b ≤ c (note that the basic intervals of a global

time list are disjoint and that a < b for each basic interval [a, b)).
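The two-link example above can be reproduced with the short sketch below, which builds the global time list, derives the basic intervals, and looks up the available bandwidth of a link inside an interval. The helper names are hypothetical, and the last basic interval is assumed to extend to infinity.

```python
def global_time_list(tb_lists):
    """Sorted union of the time components of all (t_i, b_i) tuples."""
    return sorted({t for tb in tb_lists for t, _ in tb})


def basic_intervals(times):
    """Half-open intervals [T_i, T_{i+1}); the last one is assumed to extend to infinity."""
    return list(zip(times, times[1:] + [float("inf")]))


def available_bandwidth(tb_list, t):
    """Bandwidth of the last tuple whose time is <= t (the list is assumed to start by t)."""
    bw = None
    for ti, bi in tb_list:
        if ti > t:
            break
        bw = bi
    return bw


link1 = [(0, 5), (1, 2), (2, 5)]
link2 = [(0, 5), (1.5, 3), (2, 5)]
times = global_time_list([link1, link2])
print(times)
print(basic_intervals(times))
print(available_bandwidth(link1, 1.2), available_bandwidth(link2, 1.2))
```

Since every breakpoint of every TB list appears in the global time list, a single lookup at the left end of a basic interval gives the bandwidth of a link for the whole interval.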

File transfer requests are characterized by a 4-tuple (si , di , fi ,Si) where si is the

source location of the file that is to be transferred; di is the destination to which the file

is to be sent; fi is the size of the file; and Si , which is the time at which the file becomes

available for transfer, specifies the earliest time at which the file transfer may begin.

3.3.2 Optimal Solution and N-Batch Heuristics

In periodic batch scheduling, requests are collected/batched in a centralized

scheduler; the collected/batched requests are scheduled as a group. To find the

earliest finish time of a batch of files, we develop a two-step algorithm that optimally (i.e.,

so as to minimize the maximum finish time) schedules a set of file transfer requests. The two

steps are:

Step 1: Determine the minimum finish time, minFinishTime.

Step 2: Determine a file transfer schedule that achieves this minimum finish time.

To find out the minimum finish time, we construct a global time list from the TB

lists of all links as before and then construct the basic intervals from this global time

list. The basic interval [Ti ,Ti+1) is referred to simply as basic interval i . To determine

the minimum finish time, we use a linear programming (LP) model to determine, for

a specified basic interval i , the minimum time within this basic interval by which it is

possible to complete all file transfers in the given request set F . This LP model will have

no feasible solution for basic interval i if it isn’t possible to complete the file transfers

by time Ti+1. In this case, minFinishTime must lie in a basic interval q > i . Suppose

the value of LP’s objective function ft is a valid time within the basic interval i . Then

all the jobs in the batch F can be finished by ft. Now, Ti ≤ ft ≤ Ti+1. If ft > Ti ,

minFinishTime = ft. However, when ft = Ti , it is possible to complete the file transfers in

an interval q < i . So, using the LP model, we can conduct a binary search over the basic

intervals to determine the value of minFinishTime.
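The binary search can be sketched as follows. The LP itself is abstracted behind a callable solve_lp(i), a hypothetical interface that returns None when the LP for basic interval i is infeasible and the optimal ft otherwise. Because the LP for interval i is feasible exactly when all transfers can complete by Ti+1, feasibility is monotone in i, and the smallest feasible interval yields minFinishTime. The single-link oracle below is a toy stand-in for an LP solver (it assumes one file, one link, and positive rates) so that the sketch can be run.

```python
def min_finish_time(num_intervals, solve_lp):
    """Binary search for the first basic interval whose LP is feasible.

    Returns (interval index, minFinishTime), or None if no interval is feasible.
    Uses O(log B) calls to solve_lp for B basic intervals.
    """
    lo, hi = 0, num_intervals - 1
    if solve_lp(hi) is None:
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if solve_lp(mid) is None:
            lo = mid + 1        # cannot finish by T_{mid+1}: look later
        else:
            hi = mid            # can finish by T_{mid+1}: look earlier
    return lo, solve_lp(lo)


def single_link_oracle(times, rates, file_size):
    """Toy replacement for the LP: one file on one link with rate rates[i] > 0
    in basic interval [times[i], times[i+1])."""
    def solve_lp(i):
        sent = sum(rates[q] * (times[q + 1] - times[q]) for q in range(i))
        remaining = file_size - sent
        if remaining <= 0:
            return times[i]     # already done when interval i begins
        ft = times[i] + remaining / rates[i]
        return ft if ft <= times[i + 1] else None
    return solve_lp


times = [0, 1, 2, float("inf")]
rates = [5, 2, 5]
print(min_finish_time(3, single_link_oracle(times, rates, 8)))
```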

Equations 3–2 through 3–8 give our LP model to find minFinishTime within

[Ti ,Ti+1).

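\begin{align}
\min\ & \mathit{ft} \tag{3--2}\\
\text{subject to}\quad
& \sum_{k:(l,k)\in E} f^{j}_{lk}(q) - \sum_{k:(k,l)\in E} f^{j}_{kl}(q) = 0,
  \quad \forall j \in F,\ \forall l \in V,\ l \neq s_j,\ l \neq d_j,\ 0 \le q \le i \tag{3--3}\\
& \sum_{q=0}^{i} \Bigl( \sum_{k:(l,k)\in E} f^{j}_{lk}(q) - \sum_{k:(k,l)\in E} f^{j}_{kl}(q) \Bigr)
  = \begin{cases} f_j & \text{if } l = s_j \\ -f_j & \text{if } l = d_j \end{cases}
  \quad \forall j \in F \tag{3--4}\\
& \sum_{j\in F} f^{j}_{lk}(q) \le b_{lk}(T_q) \cdot (T_{q+1} - T_q),
  \quad \forall (l,k) \in E,\ q < i \tag{3--5}\\
& \sum_{j\in F} f^{j}_{lk}(i) \le b_{lk}(T_i) \cdot (\mathit{ft} - T_i),
  \quad \forall (l,k) \in E \tag{3--6}\\
& f^{j}_{lk}(q) \ge 0, \quad [T_q, T_{q+1}] \subseteq [S_j, T_{q+1}],
  \ \forall (l,k) \in E,\ \forall j \in F \tag{3--7}\\
& f^{j}_{lk}(q) = 0, \quad [T_q, T_{q+1}] \nsubseteq [S_j, T_{q+1}],
  \ \forall (l,k) \in E,\ \forall j \in F \tag{3--8}
\end{align}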

In this formulation, ft ∈ [Ti ,Ti+1) denotes the time by which all file transfers

complete. f^j_lk(q) is the amount of file transferred for request j ∈ F on link (l , k) ∈ E in

the basic interval q. b_lk(T_q) is the bandwidth available on link (l , k) in the basic interval

q. Equation 3–3 ensures that for each transfer request j ∈ F , for each node l that is

neither the source nor the destination node, and for each basic interval q, 0 ≤ q ≤ i , the

amount of file j that leaves node l equals the amount that enters this node; i.e., nodes

other than the source or destination may not create or store data and data cannot be

buffered at these nodes for transfer in later basic intervals. Equation 3–4 requires the

source node of request j to send a net fj units of file j out over all permissible basic

intervals and requires the destination node to receive a net fj units. Equations 3–5 and

3–6 ensure that the amount of traffic on each link in each basic interval does not exceed

the available capacity of any link in any basic interval. Equation 3–7 ensures that file

transfer amounts are non-negative in permissible basic intervals and Equation 3–8

ensures that the file transfer amounts are 0 in non-permissible basic intervals.

One can verify that each solution to Equations 3–3 through 3–8 defines a valid file

transfer schedule for all requests in F and that the finish time of this schedule is at most

ft. Further, the inclusion of Equation 3–2 determines the minimum finish time under the

constraint that no file transfer may take place in intervals q > i . Also, Equations 3–3

through 3–8 have no feasible solution iff the file transfers cannot be scheduled so as to

complete by time Ti+1.

Since fractional flow is allowed, minFinishTime can be computed in polynomial

time [6]. A binary search over the basic intervals is needed to determine the interval

in which minFinishTime is located as well as the exact value of minFinishTime. This requires us

to solve O(log B) LPs, where B is the number of basic intervals.

Although the f^j_lk(q) values that determine minFinishTime define a file transfer schedule

that achieves this finish time, these values may define a transfer schedule that includes

cycles. That is, we have portions of a file being moved from node a to node b and

back to node a, for example, in the same basic interval. While these cyclic flows do not

negatively impact the overall finish time, they affect available bandwidth capacity and so

negatively impact our ability to schedule file transfers in future periods.

In Step 2, we overcome the deficiencies of the file transfer schedule obtained

from Step 1 by using a slightly different LP formulation that is given by Equations 3–9

through 3–14. In this formulation, we minimize the sum of the f^j_lk(q) values across all

basic intervals. The value U = minFinishTime computed in Step 1 is used to limit the

file transfers’ start and end times. We also use i to denote the basic interval for which

Ti ≤ minFinishTime ≤ Ti+1. An optimal solution to Equations 3–9 through

3–14 cannot contain a cycle: removing the flow on a cycle keeps the solution feasible

and strictly reduces the objective value.

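\begin{align}
\min\ & \sum_{j\in F} \sum_{(l,k)\in E} \sum_{q=0}^{i} f^{j}_{lk}(q) \tag{3--9}\\
\text{subject to}\quad
& \sum_{k:(l,k)\in E} f^{j}_{lk}(q) - \sum_{k:(k,l)\in E} f^{j}_{kl}(q) = 0,
  \quad \forall j \in F,\ \forall l \in V,\ l \neq s_j,\ l \neq d_j,\ 0 \le q \le i \tag{3--10}\\
& \sum_{q=0}^{i} \Bigl( \sum_{k:(l,k)\in E} f^{j}_{lk}(q) - \sum_{k:(k,l)\in E} f^{j}_{kl}(q) \Bigr)
  = \begin{cases} f_j & \text{if } l = s_j \\ -f_j & \text{if } l = d_j \end{cases}
  \quad \forall j \in F \tag{3--11}\\
& \sum_{j\in F} f^{j}_{lk}(q) \le b_{lk}(T_q) \cdot (T_{q+1} - T_q),
  \quad \forall (l,k) \in E,\ q \le i \tag{3--12}\\
& f^{j}_{lk}(q) \ge 0, \quad [T_q, T_{q+1}] \subseteq [S_j, U],
  \ \forall (l,k) \in E,\ \forall j \in F \tag{3--13}\\
& f^{j}_{lk}(q) = 0, \quad [T_q, T_{q+1}] \nsubseteq [S_j, U],
  \ \forall (l,k) \in E,\ \forall j \in F \tag{3--14}
\end{align}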

We note that while the LP of Equations 3–2 through 3–8 is solved O(logB) (B is the

number of basic intervals) times, the LP of Equations 3–9 through 3–14 is solved only

once as a minimum-cost flow problem. Using the Successive Shortest Path algorithm [6],

this flow problem can be solved in O(E ∗ log(U)) time, where E is the number of links in

the network and U is the largest amount of flow. The just described two-step batch

scheduling algorithm is referred to as algorithm All-Batch.

3.3.2.1 N-Batch heuristics

As we will see in Section 3.3.4, the computing time required to compute the optimal

schedule using algorithm All-Batch is very high. One way to decrease the computation

time is to divide the set of file transfers into smaller batches of size N and process them

one by one. When N > 1, the corresponding heuristic is called N-Batch. The solution

for N-Batch is as follows. The batches are processed one at a time sequentially in an

increasing order of the batch’s collecting times. When computing the optimal schedule

for a given batch, the start time for that batch is given by the end time (the time of the

last scheduled job) of the previous batch. The overall finish time is the finish time of the

last batch scheduled.

As we use a greedy approach to process the requests, one of the key issues

is the greedy selection criterion. The authors of [9] suggest that Largest File First (LFF)

is a reasonable and quite effective heuristic for selecting the request(s) greedily. This

approach is based on the intuition that larger files take more time to transfer.

When the largest N files are scheduled first, the larger files are given higher priority in

resource contention, which results in a potentially earlier finish time for the large file

transfers. Since these long transfers often determine the overall finish time, this

heuristic is expected to improve the overall finish time. This expectation is borne out in

our experimental evaluation: the observed finish times are close to those of the

optimal solutions generated by All-Batch.
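The batching step of N-Batch with the LFF criterion can be sketched as follows. Requests are the 4-tuples (s, d, f, S) defined earlier. The per-batch scheduler is passed in as a callable, since the optimal two-step LP is outside the scope of this sketch; toy_scheduler is a hypothetical stand-in that ignores the network and the earliest start times and simply pushes every file through one link of fixed capacity.

```python
def lff_batches(requests, n):
    """Split (s, d, f, S) requests into batches of size n, largest file first."""
    ordered = sorted(requests, key=lambda r: r[2], reverse=True)
    return [ordered[k:k + n] for k in range(0, len(ordered), n)]


def n_batch(requests, n, schedule_batch, start_time=0.0):
    """Schedule batches one after another; each batch starts when the previous one ends."""
    finish = start_time
    for batch in lff_batches(requests, n):
        finish = schedule_batch(batch, finish)
    return finish   # overall finish time = finish time of the last batch


def toy_scheduler(batch, start, capacity=10.0):
    # Hypothetical stand-in for the optimal batch scheduler.
    return start + sum(f for _, _, f, _ in batch) / capacity


requests = [("a", "b", 40, 0), ("a", "c", 100, 0), ("b", "c", 10, 0), ("c", "a", 70, 0)]
print(lff_batches(requests, 2))
print(n_batch(requests, 2, toy_scheduler))
```

With N = 2 the two largest files (100 and 70) form the first batch and the two smallest files the second, so the long transfers compete for bandwidth before any capacity has been committed to short ones.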

3.3.3 Online Scheduling Algorithms

When the batch size equals 1, the BFPSP turns into an instance of Online

Scheduling, where all file transfers are scheduled one by one without using any

knowledge of the transfers scheduled later in the sequence.

We propose six online file transfer scheduling algorithms. In Sections 3.3.3.1 and

3.3.3.2 , we describe the GOS and GOS-E algorithms. The remaining four algorithms

are variations of the k-Path heuristics and are described in Section 3.3.3.3. The greedy

algorithm, GOS, employs network flows to minimize the finish time of each single file

transfer being scheduled. GOS-E considers the path switching overhead. The four k-Path

heuristics use the k-shortest paths or k-disjoint paths to compute the schedule on a

smaller network than the original one. These adaptations reduce the complexity of the

scheduling algorithm while sacrificing little in maximum finish time.

3.3.3.1 Greedy algorithm

Our greedy online algorithm, GOS, schedules a file transfer (si , di , fi ,Si) by

examining the basic intervals in the network’s current global time list in increasing

order. The examination begins with the basic interval that includes the time Si . In each

examined interval, we transfer as much of the file as is possible. This maximum amount

can be determined using a max-flow algorithm (see [6], for example). The examination

of basic intervals stops when all fi bytes of the file have been scheduled. Figure 3-13

gives the procedure of our greedy online algorithm, GOS, to schedule the i th request.

In the specification of this algorithm, we construct a reduced graph N from G that contains only

the links that have some available bandwidth in the current basic interval. We use the term max

flow links to denote those edges of N that have a non-zero flow in the max flow solution

for N. Also, note that maxFlow may be zero in some basic intervals and care needs to be

taken when programming algorithm GOS to avoid a divide by zero error when computing

rfs/maxFlow .
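A runnable sketch of this per-request loop is given below, under stated assumptions. It uses a plain Edmonds-Karp max flow in place of the max-flow algorithms cited from [6], represents the network as a hypothetical dictionary mapping each directed link to its TB list, and omits the update of the TB lists after scheduling. Basic intervals with zero max flow are skipped, which avoids the division by zero mentioned above.

```python
from collections import defaultdict, deque

INF = float("inf")


def max_flow(cap, s, t):
    """Edmonds-Karp; cap maps directed link (u, v) -> capacity. Returns (value, link flows)."""
    res = defaultdict(float)               # residual capacities
    adj = defaultdict(set)
    for (u, v), c in cap.items():
        res[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)
    total = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:   # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            flows = {e: c - res[e] for e, c in cap.items() if c - res[e] > 0}
            return total, flows
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[e] for e in path)
        for u, v in path:
            res[(u, v)] -= push
            res[(v, u)] += push
        total += push


def bw_at(tb_list, t):
    """Available bandwidth at time t; assumes the TB list starts at or before t."""
    bw = 0.0
    for ti, bi in tb_list:
        if ti > t:
            break
        bw = bi
    return bw


def gos(links, request):
    """Schedule one request (s, d, f, S); returns a list of (start, end, rate, link flows)."""
    s, d, size, start = request
    times = sorted({t for tb in links.values() for t, _ in tb if t > start} | {start}) + [INF]
    rfs, schedule = size, []               # rfs = remaining file size
    for j in range(len(times) - 1):
        if rfs <= 0:
            break
        cap = {e: bw_at(tb, times[j]) for e, tb in links.items()}
        cap = {e: c for e, c in cap.items() if c > 0}   # reduced graph N
        flow, link_flows = max_flow(cap, s, d)
        if flow == 0:
            continue                       # nothing can be sent; avoids rfs / 0
        max_time = min(times[j] + rfs / flow, times[j + 1])
        sent = rfs if max_time < times[j + 1] else (max_time - times[j]) * flow
        schedule.append((times[j], max_time, flow, link_flows))
        rfs -= sent
    if rfs > 0:
        raise ValueError("no path with available bandwidth from source to destination")
    return schedule


links = {("a", "b"): [(0, 5), (1, 2), (2, 5)],
         ("a", "c"): [(0, 5), (1.5, 3), (2, 5)],
         ("c", "b"): [(0, 5), (1.5, 3), (2, 5)]}
plan = gos(links, ("a", "b", 20, 0))
print(plan[-1][1])   # finish time of the transfer
```

In this hypothetical three-link network, the max flow from a to b is 10, 7, 5, and 10 in the four basic intervals, so 10, 3.5, and 2.5 units are sent in the first three intervals and the remaining 4 units are sent at the start of the last one.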

Theorem 3.1. If G has a path from si to di , then Algorithm GOS schedules the i th file

transfer request (si , di , fi ,Si) so as to complete at the earliest possible time.

Proof. From the following facts (a) G has a path from si to di , (b) the rated capacity of

each link of G is more than 0, (c) the last basic interval of the global time list always

extends to ∞, and (d) the available bandwidth of each link is its rated capacity during

this last basic interval, it follows that the max flow from si to di in the last basic interval is

non-zero and so the remaining file size rfs can always be scheduled for transfer in this

last basic interval. Hence, GOS is able to schedule every file transfer request.

Let the finish time of a file transfer schedule constructed by GOS be ft. Note that

ft is the value of maxTime when GOS terminates. We show, by contradiction, that ft

is the earliest possible time at which this file transfer can complete. Suppose there is

another transfer schedule, S , for the same request that completes the transfer by time

ft′ < ft. Let q be such that Tq ≤ ft < Tq+1 (all global time references in this proof

are to times as relabeled by GOS) and let q′ be such that Tq′ ≤ ft′ < Tq′+1. Note that

q′ ≤ q. If q′ < q, then there is a basic interval u < q such that the amount of fi scheduled

for transfer in interval u by schedule S is more than that scheduled for transfer in u by

GOS(i, G)
{
    Construct the current global time list T from the TB lists;
    Delete from T all Ti ≤ Si;
    Insert Si and ∞ into T and relabel the members of T in
        ascending order beginning with the label T0;
    rfs = fi;  // remaining file size
    j = 0;     // basic interval index
    while (rfs > 0)
    {
        Let N be the network derived from G by assigning
            to each link a capacity equal to its available
            bandwidth in the basic interval [Tj, Tj+1);
        Remove the links with 0 capacity from N;
        maxFlow = Max flow from si to di in N;
        maxTime = min{Tj + rfs/maxFlow, Tj+1};
        size = (maxTime − Tj) ∗ maxFlow;
        Schedule the transfer of size bytes from Tj to
            maxTime using the max flow links;
        Update the TB lists of the max flow links;
        rfs −= size;
        j++;
    }
}

Figure 3-13. Greedy online scheduling algorithm GOS

the GOS schedule. This isn’t possible since the GOS schedule transfers the maximum

possible amount in each basic interval prior to q. If q′ = q, then since ft′ < ft, the

amount scheduled for transfer by S from Tq to ft′ is less than that scheduled for transfer

by the GOS schedule from Tq to ft, or the flow used by GOS schedule after Tq could

not be the maximum flow. Hence, there must be a basic interval u < q = q′ in which

more of fi is scheduled for transfer by S than by the GOS schedule. As noted earlier, this

isn’t possible. Hence, there is no transfer schedule S with ft′ < ft.

The complexity of algorithm GOS is determined by the complexity of the max flow

algorithm that is used as well as by the number of basic intervals in the global time list.

The complexity of the push-relabel max flow algorithm described in [6] is O(n^3), where

n is the number of vertices in the network flow graph. For networks with few edges, the

sparse graph network flow algorithm of Sleator and Tarjan (see [6], for example) may be

used. The complexity of this algorithm is O(nm log n), where m is the number of links

in the network. When scheduling the ith file transfer, the size of the global time list is at

most 2i, since each previously scheduled request increases the size of the global time

list by at most 2: the job’s start and end times. So, the complexity of GOS is O(n^3 i) when the

push-relabel max flow algorithm is used and O(nmi log n) when the sparse graph max

flow algorithm is used. Since typical computer networks are generally sparse and have

only O(n) links, using the sparse graph max flow algorithm results in a complexity of

O(n^2 i log n) for GOS.

3.3.3.2 Greedy scheduling with finish time extension (GOS-E)

In the GOS algorithm, the network flow is computed for each basic interval. This

implies that, for a file transfer that lasts n consecutive basic intervals, up to n path setup

and teardown operations would take place in the network. Moreover,

because each flow generally requires multiple paths, the path switching overhead

could significantly affect the performance of GOS in practice.

To decrease the switching overhead, we propose GOS-E, which reduces the number of

path switchings by reducing the total number of basic intervals in the network. In GOS-E,

we try to extend the current job’s finish time to the end of the nearest later basic

interval ti, if ti is not too far away from ft, the earliest finish time computed by

GOS. The extension can be done either by directly over-reserving the bandwidth in the

last basic interval involved in the file transfer according to the original reservation plan

from GOS, or by reducing the amount of bandwidth reserved in the last interval to match

the longer transfer time, which does not waste bandwidth. As bandwidth resources are

limited in our scenario, we take the second approach. The extension scope should be

limited to a certain range so that the performance of a single file transfer is not greatly

affected.

With GOS-E, we are able to eliminate small basic intervals by merging them into

the preceding larger intervals at a reduced link bandwidth. As small intervals generally

carry a smaller amount of file transfer than large intervals, this merging process

costs little additional network throughput but provides the potential to reduce the

overall path switching overhead. In the evaluation section, we test this heuristic and

compare it with the original GOS algorithm. The relationship between the extension

scope and algorithm performance is also discussed.

3.3.3.3 K-Path algorithms

Another approach to accelerate the algorithm is to reduce the problem size. We

incorporate the idea behind KSP [69] and KDP [45] scheduling into algorithm GOS so as

to compute the max-flow in a reduced network. In the KSP and KDP adaptations, when

scheduling the request (si, di, fi, Si), we limit our resource allocations to a subgraph

defined by the k paths from si to di. In the case of the KDP adaptation, since the k paths

are disjoint, the max flow from si to di in any basic interval is easily seen to be the sum

of the minimum available capacity of a link on each of the k paths. So, we avoid running

a complex network flow algorithm to determine the max flow. In the case of the KSP

adaptation, since the paths are not disjoint, we still need to run the GOS algorithm on

the network formed by these k paths. However, since the size of the network being

considered is smaller, run time is reduced.
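The KDP shortcut described above amounts to summing per-path bottlenecks. A minimal sketch follows; the function name `kdp_max_flow` and the dictionary representation of available bandwidth in one basic interval are illustrative assumptions.

```python
def kdp_max_flow(paths, available):
    """Max flow over k link-disjoint paths in one basic interval.

    paths:     list of paths, each a list of links (u, v); assumed pairwise disjoint
    available: {link: available bandwidth in the basic interval}
    """
    # Each disjoint path carries at most the bandwidth of its tightest link.
    return sum(min(available[link] for link in path) for path in paths)
```

Because the paths share no links, the per-path bottlenecks do not interact, so no general max flow computation is needed. This does not hold for the KSP adaptation, where paths may share links.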

For both the KSP and KDP adaptations, we define a static and a dynamic variant.

In the static variant the cost of a link is defined to be its rated capacity (alternatively,

some other non-changing cost may be assigned) and the k paths between every pair

of nodes, whether disjoint or not, are computed once, the first time a request arrives

for this pair of nodes, and are reused directly for later scheduling requests between the same

source/destination pair. In the dynamic variant, links are assigned a cost each

time a scheduling request arrives and the k shortest paths to use are computed using

these newly assigned link costs. The cost assigned to a link in the dynamic variant is

proportional to the fraction of its rated capacity that has been committed from the current

time to the finish time of the last finishing file transfer so far scheduled in the network. In

both static and dynamic variants, the length of a path is the sum of the link costs. The

static and dynamic variants of the KSP and KDP adaptations of GOS are referred to as

KSP-S, KSP-D, KDP-S, and KDP-D, respectively.

3.3.4 Experimental Evaluation

3.3.4.1 Experimental framework

In this section, we measure the performance of the batch scheduling algorithm in

Section 3.3.2 and online scheduling algorithms in Section 3.3.3. For our experiments,

we used the MCI network and random topologies that were generated at run time.

File transfer requests were synthetically generated. Each request is described by

the 4-tuple (source node, destination node, file size, request start time). The source

and destination nodes for each request were selected using a uniform random number

generator. The file size is also uniformly distributed between 10GB and 100GB. The

earliest time at which a file transfer can start followed a Poisson distribution and

the request arrival rate (request density) varied from 0.05 requests/time unit to 10

requests/time unit. Our experiments started with a clean network (i.e., no existing

scheduled transfers) and simulated the job arrival process for 100 time units. So, for

example, with a request density of 5 requests/time unit, one run of our experiment would

process approximately 500 requests.
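One way such a workload could be generated is sketched below. It is not the generator used in the experiments. It assumes that the start times form a Poisson arrival process (exponentially distributed gaps with rate equal to the request density), that file sizes are in GB, and that source and destination are distinct; the function name `generate_requests` and the `seed` parameter are hypothetical.

```python
import random

def generate_requests(nodes, density, horizon=100.0, seed=0):
    """Return a list of (source, destination, file_size_gb, start_time) tuples."""
    rng = random.Random(seed)
    t, requests = 0.0, []
    while True:
        t += rng.expovariate(density)        # gap between consecutive arrivals
        if t >= horizon:
            return requests
        src, dst = rng.sample(nodes, 2)      # uniform choice of two distinct nodes
        size = rng.uniform(10.0, 100.0)      # file size between 10 GB and 100 GB
        requests.append((src, dst, size, t))
```

With a density of 5 requests/time unit and a horizon of 100 time units, the expected number of generated requests is 500, matching the example above.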

We used the max finish time (MFT), i.e., the time when all file transfers in the

sequence finish, as the performance metric. The execution time of an algorithm is

measured in seconds. For GOS-E, the extension scope is set to be either 2%, 5% or

10% of the current file transfer’s duration. Suppose that the file transfer Ji , with start

and end time Si and Ei , respectively, is being scheduled. Suppose that Ei ’s nearest

future basic interval ends at T . Then, the extension of Ei to T is performed when

(T − Ei) ≤ (Ei − Si) ∗ ExtensionScope.
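The extension test can be written as a small helper. This is a sketch under the assumption that T is the smallest entry of the global time list that is strictly later than Ei; the function name `extended_finish_time` is hypothetical.

```python
def extended_finish_time(s_i, e_i, time_list, scope):
    """Return T if the finish time E_i may be extended to T, else E_i.

    s_i, e_i:  start and end time of the transfer computed by GOS
    time_list: boundaries of the existing basic intervals
    scope:     extension scope as a fraction, e.g. 0.02, 0.05 or 0.10
    """
    later = [t for t in time_list if t > e_i]
    if not later:
        return e_i
    t_next = min(later)                      # end of the nearest future basic interval
    # Extend only when (T - E_i) <= (E_i - S_i) * ExtensionScope.
    return t_next if (t_next - e_i) <= (e_i - s_i) * scope else e_i
```

For a transfer with Si = 0 and Ei = 100 and a next boundary at T = 104, the extension is applied with a 5% scope (4 ≤ 5) but not with a 2% scope (4 > 2).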

For the KSP and KDP variants of GOS, we set k , the number of paths, to 16. This

setting is consistent with the results of [45], which show that not much improvement can

be obtained from using a higher k , and is also consistent with our own experiments on

scheduling file transfers that indicate 16 to be a good choice for k .

3.3.4.2 Single start time scheduling (SSTS)

A special case of BFPSP has all files ready for transfer at the same time, which

means all requests in the batch have the same Si. This special case arises, for

example, when all file transfer requests originate from a single group of users. For this

experiment, we set Si = 0 for all transfer requests. The number of files to be scheduled

varies from 200 to 1000. The size of the random network varies from 100 nodes to 500

nodes.

Figure 3-14. Comparison of different algorithms’ MFT for different number of files in MCI using SSTS.

Figure 3-15. Comparison of different algorithms’ MFT for different number of files in 100 nodes random topology using SSTS.

Figure 3-16. Comparison of different algorithms’ MFT in random topologies of different size using SSTS.

Figures 3-14 and 3-15 show the maximum finish time of the schedules for various

numbers of files on the MCI network and a 100-node random network, respectively. All

the algorithms proposed in Sections 3.3.2 and 3.3.3 are compared except for GOS-E.

The main objective of GOS-E is to reduce the path switching overhead. This is not

addressed by the other algorithms. Figure 3-16 shows how the scheduling results

change as network size increases and the number of requests is fixed at 400. We make

the following observations.

a. Batch scheduling performs better than online scheduling in all cases. The larger the batch size, the better the relative performance of batch scheduling. In large networks like random-100, a larger batch size yields significantly larger improvements in the overall performance; however, in small networks like MCI, the impact of batch size is relatively small.

b. When the batch size is 50, N-Batch performs as well as the optimal batch schedule in most cases. GOS has the best performance among online scheduling algorithms in all cases. In small networks, GOS performs almost as well as the optimal solution. In large networks, the improvement achieved by using All-Batch is usually no more than 5% over GOS.

c. Among the 4 k-Path heuristics, the two KDP heuristics perform better than the KSP heuristics, and dynamic link cost can improve the performance. This can be partly attributed to the congestion avoidance mechanism that dynamic link cost provides. KDP-D provides the best performance among the 4 k-Path heuristics. Although the k-Path heuristics are not comparable to the GOS algorithm for large networks, the performance gap is small enough to be acceptable for small topologies.

d. The MFT increases as the number of files increases. This increase is faster in small networks than in large ones, as small networks become congested and fully loaded sooner. On the other hand, MFT decreases as the network size increases, as the network provides more bandwidth resources in a global view.

In summary, the advantage of All-Batch over N-Batch and GOS is not obvious,

especially in small networks. When the batch size N reaches 50, the performance gap is

relatively small and can be ignored in many practical scenarios.

Figure 3-17. Comparison of different algorithms’ execution time on different numbers of files on MCI network using SSTS.

Figure 3-18. Comparison of different algorithms’ execution time on different numbers of files on 100 nodes random topology using SSTS.

We were motivated to explore online scheduling heuristics by the desire to reduce

the computing time required by the scheduler. Our heuristics solve an easier max-flow

problem rather than the complex and time consuming LP formulation solved by All-

Batch. Figures 3-17 and 3-18 show the execution time for the MCI network and for a

random network with 100 nodes, respectively. The horizontal axis is the number of files

to be scheduled. The execution time is measured in seconds.

In all cases, the execution time of our online algorithms is much less than that

of the batch algorithm. In small networks, the scheduling time is acceptable for all

algorithms. Less than 2 minutes are taken to obtain the optimal schedule in MCI for

1000 file requests; the online algorithms take several seconds. In large networks, our

online heuristics are dramatically faster than the batch scheduling algorithm.

Figure 3-19. Comparison of different algorithms’ execution time on random topologies with different size using SSTS.

Figure 3-19 shows the algorithms’ execution time for various network sizes.

During the test, we scheduled 400 jobs using SSTS. We observed that, although

every algorithm’s execution time increases with network size, the time required by

All-Batch and N-Batch increases much faster than that required by our online

algorithms. This is due to the lower complexity of the online algorithms. In the 500 node

topology, GOS only takes several minutes, but All-Batch takes about 2 hours to compute

the optimal schedule, which exceeds the actual file transfer time in our experiments.

3.3.4.3 Multiple start time scheduling (MSTS)

BFPSP actually does not require all the file transfers to start at the same time.

When jobs start at various times, the total traffic load is expected to be less intensive

than the SSTS case, since the file transfers are less overlapped due to the difference in

their start time.

Figure 3-20. Comparison of different algorithms’ MFT for different number of files in MCI using MSTS.

Figure 3-21. Comparison of different algorithms’ MFT for different number of files in 100 nodes random topology using MSTS.

Figures 3-20 to 3-25 give the experimental results using MSTS. Figures 3-20,

3-21 and 3-22 show how the maximum finish time changes with number of requests

and network size. Figures 3-23, 3-24 and 3-25 show the algorithms’ scalability with

increasing number of requests or network size. The observations from MSTS are similar

Figure 3-22. Comparison of different algorithms’ MFT in random topologies of different size using MSTS.

Figure 3-23. Comparison of different algorithms’ execution time in MCI network using MSTS.

Figure 3-24. Comparison of different algorithms’ execution time in 100 nodes random topology using MSTS.

to those for SSTS. The batch algorithms outperform online ones in terms of MFT value,

but require significantly longer computation time. The batch algorithms also show worse

scalability with both network size and number of requests. GOS performs better than all

other online heuristics and provides very good schedules with very small execution time.

Figure 3-25. Comparison of different algorithms’ execution time on random topologies with different size using MSTS.

3.3.4.4 GOS vs. GOS-E

When path switching overhead is considered, the potential of GOS to switch paths

in every basic interval can become its major drawback. This problem is more

severe when the file is transferred using multiple paths. In this section, GOS-E is

evaluated and compared with the original GOS algorithm.

When performing the test, one of the important issues is to simulate the path

switching overhead. From the previous section, we know that path switching happens

only between two adjacent basic intervals. So we add a lag of tl in our simulation

between each basic interval to represent the delay due to path switching. In our

experiments, we set the delay to be 1 second each for establishing and tearing down an

optical path. That is, if the schedule for a certain file transfer changes its routes m times

during the transfer, a delay of 2 ∗m seconds will be added to its finish time.
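The delay model can be stated as a one-line calculation. This is a sketch with a hypothetical function name, assuming 1 second each for establishing and for tearing down a path, as in the experiments above.

```python
def finish_time_with_switching(finish_time, route_changes, setup=1.0, teardown=1.0):
    """Finish time (in seconds) after adding the simulated path switching delay.

    Each of the m route changes costs one teardown plus one setup, i.e. 2 * m
    seconds in total with the default values.
    """
    return finish_time + route_changes * (setup + teardown)
```

For instance, a transfer that would finish at 600 seconds and changes its routes 5 times is charged a finish time of 610 seconds.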

Figure 3-26. Comparison on number of Max-Flows computed by GOS-E and GOS in 100 node random network.

The test is performed on GOS and three GOS-E variants with different extension

scopes: 2%, 5% and 10%, which means GOS-E can search for the next existing basic

interval and extend the current schedule in the range of 2%, 5% and 10%, respectively.

Figures 3-26 and 3-27 show the MFT performance for all four algorithms, with and

without path switching delay counted. We can see that when the path switching delay

is not accounted for, the schedules generated by GOS-E take more time to complete

the file transfers, but when the path switching delay is accounted for, GOS-E actually

Figure 3-27. Comparison on execution time used by GOS-E and GOS in 100-node random network.

Figure 3-28. Comparison on number of Max-Flows computed by GOS-E and GOS in 100 node random network.

outperforms GOS. We also notice that a larger extension scope does not imply better

MFT, as GOS-E with 5% extension scope generates better schedules than with an

extension scope of 10%. Although the switching overhead can be reduced using a larger

Figure 3-29. Comparison on execution time used by GOS-E and GOS in 100-node random network.

extension scope, the actual file transfer time increases, which compromises the benefit

of fewer path switchings.

Figure 3-28 compares the average rounds of max-flow computation for each

request in GOS-E with GOS, while Figure 3-29 compares their execution times. These

results show that because of the reduction in the number of max-flow computation

rounds, the execution time for GOS-E is lower as compared to the GOS algorithm. Also,

the execution time is further reduced with larger extension scopes albeit at a price of

longer MFT. However, in large networks, this degradation is quite small. This

makes GOS-E more attractive for large networks.

3.4 General Network Scheduling Algorithms Summary

In this chapter, we have categorized the different bandwidth algorithms that have

been proposed for in-advance scheduling by the problem being addressed (fixed

slot, maximum bandwidth in slot, maximum duration, first slot, all slots, all-pairs all-slots).

In addition, we have proposed several new algorithms (SWFP, SDFP, DAFP, kDP, and

kSP) for the fixed-slot problem that are adaptations of algorithms proposed earlier for

on-demand scheduling. Although the DAFP algorithm proposed by us is an adaptation

of the dynamic adaptive path algorithm proposed for on-demand scheduling in [40],

DAFP guarantees to find a feasible path whenever such a path exists whereas the

dynamic adaptive path algorithm of [40] does not provide such a guarantee. We have

conducted extensive experiments with the various fixed-slot algorithms for in-advance

scheduling. Our experiments indicate that the minimum-hop feasible path (MHFP) algorithm

proposed by us in [52] is the best (in the sense of maximizing network utilization) of

these on large networks. For networks with a small number of nodes (say 20 or less),

the DAFP algorithm proposed in this paper is best. From the standpoint of algorithmic

complexity, MHFP is considerably faster than DAFP.

We have also developed several multi-path reservation algorithms for in-advance

scheduling of single and multiple file transfers in connection-oriented optical networks.

A novel two-step solution, All-Batch, has been developed to compute schedules with

minimum finish time (i.e., optimal schedules). An N-Batch heuristic was developed

to enable batch scheduling in more realistic scenarios. We also proposed a new

max-flow based greedy algorithm (GOS) and four variants of k-path algorithms to

reduce computation time. These heuristics schedule an individual file transfer to

complete at the earliest possible time. Extensive simulations using both real world

networks and random topologies show that GOS presents a good balance among

maximum finish time, mean finish time, and computation time. Further reduction in

computation time by sacrificing maximum finish time may be obtained using our k-Path

variants. Of these, KDP-D works best. When path switching overhead is considered,

GOS-E provides good performance.

In the future, it would be useful to explore batch algorithms that also incorporate

the mean finish time in the optimization metric. Also, when batch scheduling is applied

in an on-demand scheduling scenario, the delay caused by the batching process must

also be taken into account. Therefore, studying the tradeoff between the batch size and

the delay incurred is of interest.

CHAPTER 4
SCHEDULING IN OPTICAL NETWORKS


Dedicated connections are needed to effectively support a variety of geographically

distributed application tasks. Many high speed networks that provide dedicated

connections are based on optical interconnects and optical switches. For these

networks, the bandwidth along a given link can be decomposed into multiple wavelengths.

The bandwidth scheduling and path computation problem in the context of optical

networks is usually called RWA (Routing and Wavelength Assignment) [71]. Algorithms

for RWA may have to adhere to one or more of the following constraints:

1. Wavelength continuity constraint: This constraint requires that a single lightpath occupy the same wavelength on all the links that it spans. This constraint is not required when an optical network is equipped with wavelength converters. When such converters are present, the network is called a wavelength convertible network. The algorithms presented in this paper assume that either wavelength conversion is available at all switches or not available at any switch.

2. Wavelength sharing constraint: For many deployments, it is most effective to consider the bandwidth on a link as consisting of integer multiples of a wavelength and a single wavelength as a unit for assignment, i.e., one wavelength is occupied by only one reservation at any point in time. The algorithms in this paper assume that this constraint needs to be satisfied. It is worth noting that techniques based on Time Division Multiplexing (TDM)/Wavelength Division Multiplexing (WDM) [73] allow for decomposing the bandwidth on a wavelength.

The Enlightened project [18] has developed several routing algorithms for optical

networks assuming wavelength convertibility and no wavelength sharing constraints.

They are developed using the Flexible Advance Reservation Model (FARM) [58] that tries

to reduce the blocking probability of requests by assigning a scheduling window for each

request [27]. The algorithms developed in the Enlightened project can be termed as k

Dynamic Paths (kDP) algorithms according to the classification of [32]. These algorithms

are not guaranteed to find a feasible path whenever such a path is present.

The wavelength assignment problem is a relatively orthogonal problem from the

routing problem and many heuristics have been developed for its solution [71]. For our

experimental comparisons, we use the wavelength assignment policy proposed for

each given routing algorithm.

In Section 4.2, we extend the Extended Bellman-Ford (EBF) algorithm to

incorporate the wavelength sharing and wavelength continuity constraints. We also

propose modified versions of the algorithms in the Enlightened system for the continuous

time model. These algorithms are called Modified Switch Path First (MSPF) and

Modified Switch Window First (MSWF). MSPF tries to find the earliest path within the

scheduling window, while MSWF tries to find the shortest path within the scheduling

window. Moreover, a deferred wavelength assignment strategy is presented. This

strategy only counts the number of wavelengths that are used on a link. The actual

assignment of the wavelength is done when the request is actually fulfilled.

Although the design and implementation of a full-conversion scheduler is relatively

straightforward, the high cost and the added latency introduced by wavelength

converters make sparse-conversion more attractive in practice. Existing research on

On-Demand scheduling has shown that wavelength converters have the potential

to improve blocking performance significantly and that it is necessary only for a

relatively small fraction of the nodes to have a wavelength converter to achieve blocking

performance comparable to that of full wavelength conversion [14, 35, 56]. Since

on-demand scheduling reserves a path for a fixed time slot, this result applies directly to

the Fixed-Slot scenario.

In Section 4.3, we present a new network model that can emulate the

existing full-conversion algorithms when only a subset of nodes have a wavelength

converter. We demonstrate the utility of this approach using the Extended Bellman-Ford

(EBF) and k-Alternative Path (k-Path) algorithms. We evaluate the algorithms on three

performance metrics: blocking probability, average start time and scheduling overhead.

Blocking probability, which measures the ratio of blocked requests to the number of

scheduled requests, is the primary metric used to evaluate a scheduling algorithm.

Average start time, which presents how early the requested lightpath is available, is of

special importance in the First-Slot scenario. Scheduling overhead, which compares

algorithms according to their computation costs, is an important metric for the algorithms’

practicality.

4.2 Scheduling in Full Wavelength Conversion Network

4.2.1 Problem Definition

An optical network topology is represented as a graph G = (V, E, W) where V

is the set of nodes, E is the set of links and W is the set of wavelengths supported by

each link. An in-advance reservation request for a lightpath can be made between any

two nodes on G . Algorithms for RWA may have to adhere to the wavelength continuity

and wavelength sharing constraints as described in the introduction. We use the Flexible

Advance Reservation Model (FARM). This model tries to reduce the blocking probability

of requests by assigning a scheduling window for each request [27].

In this section, we address the following queries:

1. Find the least time t within the scheduling window for which there is a path with an available wavelength from the source s to the destination d from time t to time t+dur and reserve such a path. This is a variation of the First-Slot Problem in [52].

2. Find the shortest path with an available bandwidth from a source s to a destination d within the scheduling window and reserve such a path.

Each request R for a single lightpath is defined as follows: R = [s, d, dur, start, end],

where s is the source node of the lightpath, d is the destination node of the lightpath,

dur is the reservation duration, and start and end are the start time and end time of

the scheduling window respectively. The scheduling window must be larger than the

reservation duration dur. The scheduler must check if a path is available during any

possible interval in the scheduling window. In the slotted time model, the intervals are

[start + t, start + t + dur], where t = 0, 1, 2, ..., end − start − dur. In the continuous time

model, a discrete sliding approach may miss intermediate start times.
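The request tuple and the candidate intervals of the slotted time model can be sketched as follows. The class name `Request` and the function `slotted_intervals` are illustrative; times are assumed to be integer slot indices.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    s: str        # source node
    d: str        # destination node
    dur: int      # reservation duration, in slots
    start: int    # start of the scheduling window
    end: int      # end of the scheduling window

def slotted_intervals(r):
    """All candidate reservation intervals [start + t, start + t + dur] in the window."""
    return [(r.start + t, r.start + t + r.dur)
            for t in range(r.end - r.start - r.dur + 1)]
```

A request with dur = 3 and window [10, 15] yields the three candidates (10, 13), (11, 14) and (12, 15). In the continuous time model the set of candidate start times is not enumerable in this way, which is why the algorithms below work with start time lists instead.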

The algorithms presented in [58] do not guarantee to find a feasible solution even if

a solution is present. The algorithms developed in this section provide such guarantees.

We compare the time requirements and the effectiveness of these algorithms in the

following sections. The routing and the wavelength assignment portions of these

algorithms are presented in separate subsections.

4.2.2 Routing Algorithms

The goal of the first query is to find the path with the least start time during a

scheduling window, while the goal of the second query is to find the path with shortest

distance/hop during a scheduling window. We develop two algorithms - Modified Switch

Path First (MSPF) and List Sliding Window (LSW) - for answering the first query. We

also develop two algorithms - Modified Switch Window First Algorithm (MSWF) and

Extended Bellman-Ford Algorithm (EBF) - for answering the second query. These are

described in detail in the rest of this section.

The algorithms developed in [58] can be termed as k Dynamic Paths (kDP)

algorithms according to the classification of [32]. These algorithms check k dynamic

paths based on the current network status and test them for their feasibility. If more

than one path is found to be feasible, ties are broken appropriately. These algorithms

do not guarantee to find a feasible solution even if a solution is present. Experimental

results [58] showed that when the link costs are dynamically updated to incorporate

current allocation and provide Load Balancing (LB), the blocking rate is significantly

reduced. The Load Balancing (LB) scheme [58] assigns the cost of a link with no

reservation to be equal to 1. This cost is incremented as additional reservations are

assigned to the link. The two versions based on load balancing using the different

approaches to choose between multiple paths are called Switch Path First (LB-SPF)

and Switch Window First (LB-SWF) algorithms. LB-SPF tries to find the path starting

earliest within the scheduling window, while LB-SWF tries to find the shortest path within

the scheduling window. We have extended these algorithms for the continuous time

model to provide comparisons with our algorithms; the extended versions are called MSPF and MSWF

respectively.

Modified Switch Path First Algorithm (MSPF). For the slotted time model, this

algorithm starts at the beginning of the scheduling window. It computes the shortest

path using Dijkstra’s algorithm. If this path is feasible, it assigns an available wavelength

along the path; else, it deletes the busy links on the path and recomputes the shortest

path. For a given fixed reservation interval [tstart , tstart + duration], this shortest path

computation has to be repeated at most k times. If no feasible path is found, the

reservation interval is incremented by one slot until the interval’s end time equals the

scheduling window’s end time. To adapt this approach to the continuous time model, we

advance to the next start time in the ST list instead of sliding the reservation interval by

one time slot. The modified algorithm is as follows:

Step 1: Sort all a values in the ST list of each link in the network. For each ai in the

sorted list, repeat Steps 2 and 3 at most k times.

Step 2: Compute the shortest path based on current link costs. Verify feasibility for

reservation interval starting at time ai .

Step 3: If the path is feasible, stop the algorithm and return this path; else delete all the

busy links on the path.

Modified Switch Window First Algorithm (MSWF). The original SWF algorithm

is similar to SPF except that it slides the reservation interval before switching the path.

This requires a check on all the possible reservation intervals within the scheduling

window for one path before checking the next path. For the continuous time model,

modified algorithms can be implemented in several ways. One possible variation is as

follows:

The intersection of all the ST lists on a path can be used to derive the feasibility of

a given path. If the path is feasible, an earliest start time can be easily derived by using

the smallest number in the intersected list. The algorithm attempts at most k different

paths and checks all the possible start times for each path as follows:

Step 1: Compute the shortest path and its associated ST lists. Verify the path by

intersecting all these ST lists.

Step 2: If the intersected ST list is not empty, return this path with the first possible start time;

else remove the busiest link during the window.

Step 3: Repeat Steps 1 and 2 up to k times. If none of the paths is feasible, reject the

request.
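The ST list intersection used in Steps 1 and 2 can be sketched as follows. The sketch assumes that each ST list is a sorted list of disjoint closed intervals (a, b) of feasible start times for the requested duration; this representation and the function names are assumptions made for illustration, and the path computation itself is omitted.

```python
def intersect_st(x, y):
    """Intersect two sorted lists of disjoint closed intervals (a, b)."""
    out, i, j = [], 0, 0
    while i < len(x) and j < len(y):
        lo = max(x[i][0], y[j][0])
        hi = min(x[i][1], y[j][1])
        if lo <= hi:
            out.append((lo, hi))
        # Advance the list whose current interval ends first.
        if x[i][1] < y[j][1]:
            i += 1
        else:
            j += 1
    return out

def earliest_start_on_path(st_lists, window_start, window_end, dur):
    """Earliest feasible start time on a path, or None if the path is infeasible.

    st_lists holds one ST list per link on the path.
    """
    # A reservation of length dur must start no later than window_end - dur.
    common = [(window_start, window_end - dur)]
    for st in st_lists:
        common = intersect_st(common, st)
    return common[0][0] if common else None
```

For two links with ST lists [(0, 3), (8, 12)] and [(2, 9)], a window of [0, 20] and dur = 4, the intersected list is [(2, 3), (8, 9)], so the earliest start time on the path is 2.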

If the network is equipped with wavelength converters, a single ST list can be

maintained per link. For networks with no wavelength converters, a single path has

to be treated as multiple sub-paths, one for each wavelength, and the process

is repeated over all sub-paths.

The MSPF and MSWF algorithms are not guaranteed to find a solution even if one

exists. They only try k dynamic paths, where k is arbitrarily specified in advance. In the

following, we describe two algorithms List Sliding Window (LSW) and Extended Bellman

Ford (EBF) that are guaranteed to find a feasible path if one exists.

We also modify two algorithms from the previous chapter to provide a guarantee on

finding a feasible path if one exists:

1. List Sliding Window (LSW) The LSW algorithm [32] finds the path with the least start time. This algorithm sorts all the a values in the ST lists of each link in the network. It determines the smallest ai for which a feasible path is available by scanning this list sequentially. The search for a feasible path uses a breadth-first search [32]. The execution time can be improved by saving and utilizing the breadth-first search computations from the prior start times. For example, the breadth-first search for ai can use the search computations required for the breadth-first search of ai−1 utilizing the following observation: The breadth-first search must scan the ST list of each wavelength on each link that is traversed during the search; this scan may begin where the most recent scan of this list (from the breadth-first search for an earlier ai) was completed.

2. Extended Bellman-Ford (EBF) This algorithm finds the minimum hop path within a scheduling window. It was originally proposed for the First Slot and All-Available Slots problems in [52]. We develop a modified version of the original

EBF algorithm so that it can stop as soon as the ST list at the destination node is not empty. It also extends the notion of an ST list for a link to incorporate lightpaths. The resulting start time corresponds to the solution with minimum hop path.

4.2.3 Wavelength Assignment Algorithms

Once a path is found, the wavelength assignment algorithm is applied. This

algorithm is relatively orthogonal to the routing algorithm although it shares the current

reservation information with it. When no wavelength conversion is allowed, we extend

each algorithm by applying the routine to each wavelength sequentially, i.e., a First-Fit

wavelength assignment scheme is used.

When wavelength conversion is allowed, flexible heuristics can be designed since we can choose any available wavelength on a link. Several metrics such as min-leading or min-trailing gap [58] can be used in addition to path-wide metrics. To pick the most appropriate wavelength among the possible wavelengths, the Min-Leading-Gap strategy bases its decision on the leading gap between the new and previous reservations. We use the Min-Leading-Gap strategy [58] for the MSPF and MSWF algorithms; we always choose the wavelength that produces the minimum leading gap with the assignment of the current task. This strategy was shown in [58] to have the best performance for link utilization and acceptance ratio.
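The Min-Leading-Gap choice can be illustrated with a minimal single-link sketch. It is not the implementation evaluated in this chapter: the function names, the use of half-open intervals [start, end), the rule that an unused wavelength has its leading gap measured from time 0, and the mapping of t0..t6 to the integers 0..6 are all assumptions made for illustration. The five requests are those listed in Figure 4-1, processed in arrival order.

```python
def min_leading_gap(reservations, start, end):
    """Return the wavelength index with the smallest leading gap, or None.

    reservations[w] holds the half-open (start, end) intervals already
    booked on wavelength w of one link (hypothetical data layout).
    """
    best, best_gap = None, None
    for w, booked in enumerate(reservations):
        # Skip wavelengths where the new interval overlaps a reservation.
        if any(s < end and start < e for s, e in booked):
            continue
        # Leading gap: idle time between the latest earlier reservation and
        # the new start. Assumption: an unused prefix is measured from time 0.
        prev_end = max((e for s, e in booked if e <= start), default=0)
        gap = start - prev_end
        if best_gap is None or gap < best_gap:
            best, best_gap = w, gap
    return best


def schedule_fcfs(requests, num_wavelengths):
    """Place requests in arrival order; None marks a rejected request."""
    reservations = [[] for _ in range(num_wavelengths)]
    placement = {}
    for name, start, end in requests:
        w = min_leading_gap(reservations, start, end)
        placement[name] = w
        if w is not None:
            reservations[w].append((start, end))
    return placement


# Requests of Figure 4-1 with t0..t6 mapped to 0..6 (assumption).
REQUESTS = [("R1", 0, 2), ("R2", 4, 6), ("R3", 0, 3), ("R4", 2, 5), ("R5", 2, 5)]
placement = schedule_fcfs(REQUESTS, 3)
```

Under these assumptions, indices 0, 1 and 2 stand for λ1, λ2 and λ3, and the last request finds no conflict-free wavelength.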

For the LSW and EBF algorithms, we propose a new mechanism called deferred wavelength assignment. This mechanism defers assigning a specific wavelength at reservation time. A deferred strategy only counts the number of wavelengths that are used on a link. The actual assignment of the wavelength is done at the time the request is actually fulfilled. A deferred wavelength strategy can be shown to always guarantee a feasible solution as long as the total amount of reserved bandwidth does not exceed the total bandwidth capacity of a link at any point in time. This alleviates the need to keep track of the bandwidth allocation status of each wavelength. Only a count needs to be maintained for each link.

Req ID   Source   Destination   Start Time   End Time
R1       n0       n1            t0           t2
R2       n0       n1            t4           t6
R3       n0       n1            t0           t3
R4       n0       n1            t2           t5
R5       n0       n1            t2           t5

Figure 4-1. A request table with 5 requests

An algorithm based on the left edge algorithm for channel assignment in the VLSI routing context [26] can be used for this purpose. Although this algorithm was designed for batch assignment, it may be adapted to our context as follows:

1. Assign jobs so long as no link has more than k jobs assigned at any time, where k is the number of wavelengths.

2. At the current time t, assign each job that begins at time t to any one of the available wavelengths on the path reserved for this job; the wavelength is the same for each link on the reserved path. Consider any link e. If the link has q jobs scheduled to start at t, it can have at most k − q jobs continuing from before t. Hence there are at least q wavelengths available on e from t to ∞.

Thus, it can be shown that if the reserved number of wavelengths does not exceed

the maximum number of wavelengths of a link, all requests can be accommodated with

deferred wavelength assignment.
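The two steps above can be sketched for a single link as follows. This is a minimal illustration rather than the implementation used in the experiments: the function names, the half-open interval convention, the choice of the lowest-indexed free wavelength, and the mapping of t0..t6 to 0..6 are assumptions, and the multi-link case (the same wavelength on every link of the reserved path) is not covered.

```python
import heapq


def admits(accepted, start, end, capacity):
    """True if adding [start, end) keeps the link usage within capacity."""
    # Usage can only rise at a start point, so it suffices to check the new
    # start and every accepted start that falls inside the new interval.
    checkpoints = [start] + [s for s, _ in accepted if start < s < end]
    for t in checkpoints:
        in_use = sum(1 for s, e in accepted if s <= t < e)
        if in_use + 1 > capacity:
            return False
    return True


def assign_at_start(jobs, capacity):
    """Give each accepted job a wavelength at the moment it starts."""
    free = list(range(capacity))   # min-heap of idle wavelength indices
    heapq.heapify(free)
    running = []                   # min-heap of (end time, wavelength)
    assignment = {}
    # sorted() is stable, so jobs with equal start keep their arrival order.
    for name, start, end in sorted(jobs, key=lambda j: j[1]):
        # Release the wavelengths of jobs that finished by this start time.
        while running and running[0][0] <= start:
            heapq.heappush(free, heapq.heappop(running)[1])
        w = heapq.heappop(free)    # non-empty whenever admission held
        assignment[name] = w
        heapq.heappush(running, (end, w))
    return assignment


# Requests of Figure 4-1 with t0..t6 mapped to 0..6 (assumption).
REQUESTS = [("R1", 0, 2), ("R2", 4, 6), ("R3", 0, 3), ("R4", 2, 5), ("R5", 2, 5)]
accepted = []
for name, start, end in REQUESTS:
    if admits([(s, e) for _, s, e in accepted], start, end, capacity=3):
        accepted.append((name, start, end))
assignment = assign_at_start(accepted, capacity=3)
```

At reservation time only the count is consulted; the wavelength index is chosen in `assign_at_start`, which mirrors the observation that a job starting at t always finds a free wavelength when the count never exceeds k.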

Figure 4-1 shows an example of arriving requests; the network has only two nodes, n0 and n1, that are connected to each other by a link with 3 wavelengths (λ1, λ2 and λ3). The requests are sorted in ascending order of arrival time, i.e., R1 arrived before R2. A pictorial comparison between min-leading gap and deferred wavelength assignment is shown in Figure 4-2. The min-leading gap wavelength assignment scheme schedules requests in a first-come-first-serve fashion, i.e., R1 → R2 → R3 → R4 → R5. This results in R1 and R2 being scheduled on λ1, R3 on λ2 and R4 on λ3, and finally R5 fails to be scheduled. Using deferred wavelength assignment, all requests are accepted, since the link is computed to be available for the period [t2, t5] when R5 arrives: during this period the number of allocated wavelengths is less than or equal to 2. The deferred wavelength assignment scheme (Figure 4-2 (b)) schedules requests according to the start times of the accepted requests. At t0, R1 and R3 are activated and scheduled on λ1 and λ2, respectively. R4 and R5 are scheduled on λ1 and λ3 at t2. Finally, at t4, there is still room on λ2 for R2, and R2 is scheduled.

Figure 4-2. Comparison of wavelength assignment using different schemes for the request table of Figure 4-1.

4.2.4 Performance Evaluation

In addition to the traditional metrics of space and time complexity, the effectiveness

of an in-advance scheduling algorithm in accommodating reservation requests is critical.

The space complexity needs to be “reasonable”. That is, the space requirement should

not exceed the available memory on the computer on which the bandwidth management

system is to run. The time complexity is important as this influences the response time

of the bandwidth management system and, in turn, determines how many reservation

requests this system can process per unit time. Scheduling effectiveness is, of course,

critical as revenue is generated only from tasks that are actually scheduled.

Figure 4-3 summarizes the time complexity of each of the algorithms described in Section 3.2.2. If deferred wavelength assignment is combined with a routing algorithm on a wavelength-convertible network, all of the above complexities are smaller by a factor of W, since the scheme only counts the number of wavelengths that are used on a link.

Problem      Algorithm   Slotted Array            Continuous
First Slot   SPF         O(τk(n log n + Wdn))     O(qk(n log n + L))
First Slot   LSW         O(τed)                   O(q(e + L))
Any Slot     SWF         O(τk(n log n + Wdn))     O(qk(n log n + L))
Any Slot     EBF         O(nel + ew)              O(nel + L)

d = duration of a request
k = number of paths to try
l = size of the longest ST list within a scheduling window
q = number of different a_i's in the ST lists within a period
L = sum of lengths of TB lists
τ = end of scheduling window − d
w = size of scheduling window

Figure 4-3. Time complexity of different algorithms

4.2.5 Experiments

In this section, we first briefly present our simulation environment including the

network topologies used and the request generation process. We then present the key

variations that were implemented and compared. This is followed by our experimental

results and observations.

4.2.5.1 Simulation environment

For test networks, we used the 24-node NSF network (Figure 4-4A) and 33-node

GEANT network (Figure 4-4B) of [58], the 19-node MCI network and the 16-node

cluster network of [40], the 11-node network of [12], the Abilene network [5], and several

randomly generated topologies. Although many of these networks (all except NSF and GEANT) do not use optical interconnects, we converted them into optical networks by setting the number of wavelengths based on the original bandwidth of the links. For example, the MCI network, whose link bandwidths range from 45 Mbps to 310 Mbps, was converted to a corresponding optical network by dividing the bandwidth of each link by 5.

The random networks we used for our simulations had 200, 400, or 800 nodes.

The out-degree of each node was randomly selected to be between 3 and 5. To ensure

network connectivity, the random network had bidirectional links between nodes i and

i + 1 for every 1 ≤ i < n, where n is the number of nodes. The number of wavelengths

for each link was randomly selected between 5 and 10.
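A generator for such random topologies might look like the following sketch. The text fixes only the out-degree range, the chain of bidirectional links and the wavelength range; the function name, the treatment of the extra links as directed, the uniform choice of extra neighbors, and drawing the wavelength count independently for each direction are assumptions made for illustration.

```python
import random


def random_topology(n, min_deg=3, max_deg=5, min_wl=5, max_wl=10, seed=None):
    """Return {(u, v): wavelengths} for a random directed topology on 1..n."""
    rng = random.Random(seed)
    links = {}

    def add_link(u, v):
        if u != v and (u, v) not in links:
            links[(u, v)] = rng.randint(min_wl, max_wl)

    # Chain of bidirectional links i <-> i + 1 guarantees connectivity.
    for i in range(1, n):
        add_link(i, i + 1)
        add_link(i + 1, i)

    # Top up every node to a randomly drawn out-degree in [min_deg, max_deg].
    for u in range(1, n + 1):
        target = rng.randint(min_deg, max_deg)
        out = sum(1 for (a, _) in links if a == u)
        candidates = [v for v in range(1, n + 1)
                      if v != u and (u, v) not in links]
        rng.shuffle(candidates)
        for v in candidates[:max(0, target - out)]:
            add_link(u, v)
    return links


links = random_topology(200, seed=7)
```

Because links added for one node are out-links of that node only, every node ends with exactly its drawn out-degree in this sketch.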

Figure 4-4. NSF and GEANT networks. A) NSF. B) GEANT.

We generated a synthetic set of reservation requests in the same way as in previous chapters. For each trial, we measured the request acceptance and bandwidth acceptance ratios 10 times and present the average of the results.

4.2.5.2 Evaluated algorithms

There are several variants of each basic algorithm described in the previous section, based on the properties of the network (presence or absence of converters) and on whether or not a deferred strategy is used for wavelength assignment. For example, the three variants of Modified Switch Path First (MSPF) are as follows:

1. MSPF w/ converter and w/ wavelength assignment,

2. MSPF w/ converter and deferred wavelength assignment, and

3. MSPF w/o converter.

Similar variations can also be derived for the Modified Switch Window First (MSWF) algorithm.

The List Sliding Window (LSW) and Extended Bellman-Ford (EBF) algorithms have two variants, depending on the presence or absence of wavelength converters in the optical network. For these algorithms, deferred wavelength assignment is assumed for the networks without converters. For example, the variations of LSW are labeled LSW w/Conv and LSW w/oConv, respectively. We programmed all the reasonable variants of each basic algorithm in C++ and measured their effectiveness. We also studied the impact of using converters on the effectiveness.

4.2.5.3 Results and observations

Figures 4-5 and 4-6 provide the average acceptance ratios for the algorithms as

a function of the number of requests in the study interval for two network topologies.

The average acceptance ratios for these algorithms as a function of the mean request

duration for the various network topologies were similar.

Figure 4-5. Network acceptance ratio vs number of requests. A) NSF. B) GEANT. C) Burchard. D) CLUSTER. E) Abilene. F) MCI.

Figure 4-6. Acceptance ratio vs number of requests in various random topologies. A) 200 nodes. B) 400 nodes. C) 800 nodes.

Our experimental results show the following:

1. EBF consistently outperformed all other algorithms. For the remaining algorithms, the relative performance varies based on network properties. For homogeneous network topologies (NSF, GEANT, Burchard and Abilene; results not presented due to space limitations) that have the same number of wavelengths on all the links, MSWF performs better than MSPF and LSW. However, for non-homogeneous network topologies (MCI, cluster and random networks), LSW has better performance than MSWF and MSPF.

2. For non-homogeneous topologies, the algorithms that do not guarantee to find feasible reservations (MSPF and MSWF) fared worse than the algorithms that provide such a guarantee. However, at times, the performance of the best “no guarantee” algorithm was quite close to or slightly better than that of the worst “guarantee” algorithm. As the network got saturated (i.e., the number of requests in the study interval increased), the RAR for all algorithms declined, and the rate of decline for the “no guarantee” algorithms was found to be higher than for the “guarantee” algorithms.

3. The relative performance of the algorithms does not change significantly with mean request duration. However, the performance gap grows larger as the mean request duration decreases. The performance difference between using and not using the deferred wavelength assignment method was not significant.

4. The use of wavelength converters generally led to better performance. Thus, the additional flexibility that wavelength converters provide in a network is worthwhile.

4.3 Scheduling in Sparse Wavelength Conversion Network

4.3.1 Problem Description

In optical network scheduling, the primary concern is routing and wavelength

assignment (RWA). Wavelength division multiplexing (WDM) allows multiple lightpaths

from different users to share one optical fiber simultaneously. Normally, a feasible

lightpath in the optical network has to fulfill the wavelength continuity constraint, which

forces a single lightpath to occupy the same wavelength throughout all the links that

it spans. However, this constraint is relaxed in an optical network that is equipped

with wavelength converters. The signals received by a wavelength converter may be

transmitted on a different wavelength in the next hop. When every node in the network is

equipped with a wavelength converter, the network supports full wavelength conversion.

When only some of the network nodes have a wavelength converter, the network

supports sparse wavelength conversion.

The impact of wavelength converters in all-optical routing has been widely studied. [56] showed that for certain topologies and fixed-path routing, sparse wavelength conversion is almost as effective as full wavelength conversion. [14, 16] investigated the performance of the k-Alternative Paths algorithms in the presence of wavelength converters. Their focus is the blocking probability of sparse wavelength conversion for on-demand scheduling. [33] considers in-advance scheduling using a continuous time model and evaluates the blocking performance of various algorithms for full wavelength conversion and low workload.

In this section, our primary goal is to study the impact of wavelength converters on

First-Slot scheduling and to analyze the temporal behavior of typical RWA algorithms

in the context of sparse wavelength conversion. Extensive experimental results are

presented in this paper. According to our test results, increasing the fraction of nodes

with wavelength converters is of greater value for blocking performance in relatively

low traffic cases than in the high workload cases. However, the average start times are

almost unaffected by the wavelength converter ratio, except for a marginal improvement

in small topologies.

Another key consideration here is to explore the advantages and disadvantages of the 2 scheduling strategies represented by EBF and k-Path, respectively. Intuitively, always accepting a request whenever there is a feasible lightpath in the network, as is done by EBF, should provide better performance than limiting the search for a feasible path to a small set of candidate paths, as is done in k-Path. However, our results show that this statement is only true when the overall workload is small compared to the network capacity. EBF often schedules a request on a longer path than the one used by k-Path. So, when the workload is high, the additional resources utilized by EBF to accept a request negatively impact the acceptance of future requests. Our results clearly demonstrate these tradeoffs: EBF performs better when the network capacity is ample for the requested workload, but k-Path performs better when the traffic congests the network. This observation leads us to propose a hybrid approach that automatically switches between the two algorithms based on current network traffic.

We also study different tie-breaking approaches when multiple paths are feasible

and the impact of different tie-breaking schemes on overall performance in the presence

of sparse wavelength conversion. A slack tie-breaking scheme is proposed and its

performance relative to other widely used strategies is analyzed.

4.3.2 Extended Network Model

The topology of an optical network with sparse conversion is represented as a graph G = (V, E, W), where V is the set of optical switches or routers, E is the set of optical links, and W is the number of wavelengths supported by each link. Each node n in V is associated with a boolean function F(n), which is true if and only if the node is equipped with a wavelength converter. To emulate the full-conversion algorithms, we first convert the above graph into a new graph G′ = (V′, E′). To map the node set V to V′, for a node n ∈ V, if n is equipped with a wavelength converter (i.e., F(n) is true), a single corresponding node n′ is inserted into V′. If F(n) is false, W pseudo-nodes (n′_1, n′_2, ..., n′_W) are inserted, where W is the number of wavelengths defined in G. For a link l ∈ E, if l connects 2 nodes with converters, it is mapped to a single link l′ in E′; otherwise, l is mapped to W pseudo-links (l′_1, l′_2, ..., l′_W), and each pseudo-link stands for a specific wavelength carried on l. In the extended model, l′_i is incident to n′_i iff l is incident to n in the original graph and l′_i and n′_i are the ith pseudo-copies of l and n, respectively.

Figure 4-7 shows an example of a 4-node ring topology. The network contains 2 wavelength converters, at nodes A and B. Each link carries 2 wavelengths. Its extended representation, shown on the right side, contains 6 nodes and 7 links. Each non-converter node is split into 2 pseudo-nodes (nodes {C1, C2} for node C and {D1, D2} for node D). Each optical link that is incident to a non-converter node is split into 2 pseudo-links, and each pseudo-link stands for one individual wavelength (for example, links (B, C1) and (B, C2) for link (B, C)). Two pseudo-nodes that are adjacent in the extended model must fulfill 2 conditions: 1) their corresponding nodes are adjacent in the original graph, and 2) they are either converter nodes or their wavelength indices match.
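The construction of the extended graph can be sketched as follows, using the 4-node ring just described as input. The function name and the encoding of a pseudo-node as a (node, wavelength index) pair, with index None for a converter node, are assumptions made for illustration.

```python
def extend_graph(nodes, links, has_converter, W):
    """Build the node and link lists of G' from G = (V, E, W)."""

    def copy_of(n, i):
        # A converter node has a single copy; otherwise one per wavelength.
        return (n, None) if has_converter[n] else (n, i)

    ext_nodes = []
    for n in nodes:
        if has_converter[n]:
            ext_nodes.append((n, None))
        else:
            ext_nodes.extend((n, i) for i in range(1, W + 1))

    ext_links = []
    for u, v in links:
        if has_converter[u] and has_converter[v]:
            # A link between two converters maps to a single link.
            ext_links.append(((u, None), (v, None)))
        else:
            # Otherwise one pseudo-link per wavelength index i.
            for i in range(1, W + 1):
                ext_links.append((copy_of(u, i), copy_of(v, i)))
    return ext_nodes, ext_links


ring_nodes = ["A", "B", "C", "D"]
ring_links = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
converters = {"A": True, "B": True, "C": False, "D": False}
ext_nodes, ext_links = extend_graph(ring_nodes, ring_links, converters, W=2)
```

For this input the sketch yields the 6 nodes and 7 links stated above: one link for (A, B) and two pseudo-links for each of the other three links.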

Figure 4-7. Extended Network Model

The extended graph G′ is equivalent to the original graph G. Every feasible lightpath in G has a corresponding path in G′ and vice versa. The wavelength continuity constraint is also preserved in the extended model. If node n has no wavelength converter in G, every corresponding pseudo-node n′_i in G′ is incident to, and only to, the pseudo-links of wavelength index i. Hence, we can directly apply the RWA algorithms that were originally designed for full-conversion/no-conversion networks to the sparse conversion scenario with little adaptation.

When scheduling a request, if the source or destination node is extended to multiple pseudo-nodes, the RWA algorithm needs to check every corresponding pair of pseudo-source and pseudo-destination. So at most W² rounds of the original algorithm are needed in the extended model, where W is the maximum number of wavelengths carried by a link in the network. Also, during the model extension, the graph size can increase by at most a factor of W in its number of links and nodes. Therefore, an algorithm's computation time on the extended model G′ is bounded by a constant ratio (a polynomial in W) of the computation time for that algorithm in the full-conversion scenario, which runs on the original network G. So, the adapted algorithm has the same asymptotic complexity as its original version.

In this paper, we employ the Extended Bellman-Ford algorithm and the k-Alternative path algorithm as our RWA algorithms. The details of the algorithms are presented in Section 4.3.3.

4.3.3 Routing and Wavelength Assignment Algorithms

An RWA algorithm for the First-Slot problem usually consists of 3 steps:

1. Identify the earliest start time.

2. Find the shortest path that provides such start time.

3. Assign the wavelength.

As wavelength assignment is independent of the first 2 steps, most RWA algorithms handle this step separately. In our paper, the EBF and k-Path algorithms perform the first 2 steps. A wavelength assignment strategy called Least Conversion Assignment is explained in Section 4.3.3.4.

4.3.3.1 Extended Bellman-Ford algorithm for sparse wavelength conversion

The Extended Bellman-Ford algorithm [52] applies the Bellman-Ford shortest path algorithm [17] to the ST lists on links that may connect the source and destination. The key steps of the algorithm are as follows:

1. Let st(k, u) represent the union of the ST lists for all lightpaths from vertex s to vertex u that have at most k edges. Clearly, st(0, u) = ∅ for u ≠ s and st(0, s) = [0, ∞]. Also, st(1, u) = ST(s, u) for u ≠ s and st(1, s) = st(0, s). For k ≥ 1, the following recurrence can be derived:

   \[ st(k, u) = st(k-1, u) \cup \Bigl\{ \bigcup_{v:\,(v,u) \in E} \bigl\{ st(k-1, v) \cap ST(v, u) \bigr\} \Bigr\} \]

2. Construct the list st(n − 1, d), which gives the start times of all paths from s to d that have bandwidth BW available for a duration Dur. If st(n − 1, d) is not empty, the a in its first (a, b) pair is the earliest start time for the current source/destination pair, denoted as est_i.

3. Over all possible source/destination pairs, find the minimum est_i as the earliest start time for the request and record the corresponding source/destination pair (s_i, d_i).

4. Remove the links that cannot provide the requested bandwidth at the earliest finish time.

5. Run Breadth-First Search [17] on the extended model to find a shortest route from s_i to d_i, and map the route back to the original graph.

The complexity of the intersection and union operations is linear in the length of the current ST list. For each iteration that constructs st(k, d) from st(k − 1, d), we need to compute the ST list for each link, and each computation takes O(L) time, where L is the length of the longest ST list. Since the construction iterates at most N − 1 times, the complexity of the extended Bellman-Ford algorithm is O(N ∗ E ∗ L), where N and E are the numbers of nodes and links in the graph.
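The recurrence for st(k, u) can be sketched as below. This is an illustrative sketch only: ST lists are assumed to be sorted lists of disjoint closed (a, b) pairs, the function names and the three-node example are hypothetical, and the union uses a sort for brevity, so it does not achieve the linear-time merge assumed in the complexity analysis.

```python
INF = float("inf")


def union(x, y):
    """Union of two interval lists, merged into sorted disjoint pairs."""
    merged = []
    for a, b in sorted(x + y):
        if merged and a <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged


def intersect(x, y):
    """Intersection of two sorted disjoint interval lists (two-pointer scan)."""
    out, i, j = [], 0, 0
    while i < len(x) and j < len(y):
        lo, hi = max(x[i][0], y[j][0]), min(x[i][1], y[j][1])
        if lo <= hi:
            out.append((lo, hi))
        if x[i][1] < y[j][1]:
            i += 1
        else:
            j += 1
    return out


def ebf_start_times(nodes, ST, s):
    """Return st(n-1, u) for every u; ST maps a link (v, u) to its ST list."""
    st = {u: [] for u in nodes}
    st[s] = [(0, INF)]                      # st(0, s) = [0, inf]
    for _ in range(len(nodes) - 1):         # k = 1 .. n-1
        nxt = {}
        for u in nodes:
            acc = st[u]                     # st(k-1, u)
            for (v, w), link_list in ST.items():
                if w == u:                  # links (v, u) entering u
                    acc = union(acc, intersect(st[v], link_list))
            nxt[u] = acc
        st = nxt
    return st


# Hypothetical 3-node example: the two-hop route s -> a -> d opens earlier
# start times than the direct link s -> d.
nodes = ["s", "a", "d"]
ST = {("s", "a"): [(0, 5)], ("a", "d"): [(3, 10)], ("s", "d"): [(8, 12)]}
st = ebf_start_times(nodes, ST, "s")
earliest = st["d"][0][0] if st["d"] else None
```

In this example the earliest start time at d comes from the first (a, b) pair of st(n − 1, d), as in step 2 of the algorithm.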

4.3.3.2 k-Alternative path algorithm

The k-Alternative path algorithm extends the shortest-path routing used by OSPF in Internet routing. Recognizing that an OSPF-like algorithm may fail to find a feasible path in a network that has one, the k-Path algorithm generates additional disjoint paths in the hope that one of the additional paths will be feasible. By constructing the ST list of each path, it is easy to decide which path provides the earliest start time. If all generated paths are infeasible, the request is rejected.

The k paths can be either fixed or dynamically generated. k-Fixed paths are computed for each node pair before the first scheduling begins, according to some given link costs. k-Dynamic paths are computed for every request according to the current network status. Usually, the most congested links are given a large cost so that they are avoided in the path computation. [45] has shown that k-Dynamic paths can provide a much larger network throughput than k-Fixed paths, especially when the traffic load is large relative to the network's capacity. So, in this paper, we adopt the k-Dynamic paths algorithm and call it KDP.

4.3.3.3 Breaking the ties in path selection

When multiple paths can satisfy a user's request, a tie-breaking scheme is needed to select one of them as the actual scheduled path. The most straightforward schemes break ties based on the First-Fit (FF) strategy or the Shortest Path (SP) strategy. The First-Fit strategy terminates the path selection once a successful path is found. The Shortest Path strategy chooses the path with the minimum hop count. Both strategies are widely used in non-conversion and full-conversion scenarios, but using them directly with sparse wavelength conversion needs careful consideration, as these paths are computed on extended models.

In the extended model, the shortest path does not necessarily correspond to the shortest path in the original graph for the same source/destination pair. Figure 4-8 gives an example of a 4-node ring with only 2 converters, at nodes A and C. Each link carries 2 wavelengths. At some point in time, the available wavelengths are as shown in the extended graph on the right side. Consider a request for 1 wavelength of capacity from node B to A. Two candidate paths are available: B1 → C → D1 → A and B2 → A. Assume that both paths provide the same start time. The First-Fit tie-breaking scheme will choose the pair (B1, A) as source and destination when the node pairs are checked in lexical order, and the Shortest Path tie-breaking scheme will then choose the 3-hop path, as it is the only one available from B1 to A.

Figure 4-8. Extended Network Model

A tie-breaking scheme that chooses the shortest path by examining all source-destination pairs, rather than the first successful pair, would solve this problem. However, considering only the path length is not enough in the context of sparse conversion. In networks without wavelength conversion, a shorter path is more likely to have common wavelengths on all its links, and is therefore less likely to block requests. However, the presence of wavelength converters reduces the correlation between path length and path capacity, because the wavelength continuity constraint is eliminated. Hence, in terms of load balancing, choosing a longer path with higher capacity may also reduce potential congestion by alleviating the traffic on the short paths. In this paper, instead of using the First-Fit strategy of the original EBF and KDP algorithms, we employ a slack tie-breaking scheme that selects a path that is at most h hops longer than the shortest path and has the most free wavelengths among all the paths that have fewer or equal hop counts. We call these variants EBF-S and KDP-S.
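One reading of the slack rule is sketched below. The candidate representation (path label, hop count, free-wavelength count), the function name, and breaking remaining ties in favor of fewer hops are assumptions made for illustration; how the free-wavelength count of a path is computed is left outside the sketch.

```python
def slack_choice(candidates, h):
    """Pick a path among those at most h hops longer than the shortest.

    candidates: list of (path, hops, free_wavelengths) tuples (hypothetical
    layout). Returns the path label of the chosen candidate.
    """
    shortest = min(hops for _, hops, _ in candidates)
    eligible = [c for c in candidates if c[1] <= shortest + h]
    # Most free wavelengths wins; fewer hops breaks remaining ties.
    return max(eligible, key=lambda c: (c[2], -c[1]))[0]


# Hypothetical candidates with equal start times.
candidates = [("P1", 1, 1), ("P2", 2, 3), ("P3", 4, 5)]
choice_h0 = slack_choice(candidates, 0)
choice_h1 = slack_choice(candidates, 1)
```

With h = 0 the rule degenerates to shortest-path selection, while larger h admits longer paths that have more spare capacity.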

Intuitively, h should not be too large, or the benefit of load balancing would be canceled by the waste of link capacity on longer paths. In Section 4.3.4, we compare the First-Fit scheme with the slack scheme and perform a numerical analysis of the choice of h in different scenarios.

4.3.3.4 Wavelength assignment

With the presence of wavelength converters, wavelength assignment becomes less important in optical routing. However, as wavelength conversion contributes a considerable delay to the optical transmission [16], a proper wavelength assignment can potentially reduce such overhead.

For links that are incident to a non-converter node, wavelength assignment is straightforward, as the specific wavelength has already been identified with the lightpath during the path selection process. However, for links that connect two wavelength converters, we apply least conversion assignment rather than the first-fit algorithm used in most RWA algorithms. In least conversion assignment, a link that connects 2 converters uses the same wavelength as either its previous hop or its next hop, unless neither wavelength is available. For instance, let l ∈ E connect two wavelength converters, let w_i be the wavelength assigned to the previous link on the path, and let {w_n} be the set of wavelengths common to l and l's next link. If w_i is available on l, we assign w_i to the lightpath; otherwise, we assign any wavelength in {w_n} if {w_n} ≠ ∅. If neither is available, a random available wavelength is assigned. This strategy always guarantees a feasible solution while avoiding unnecessary conversions.
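The decision rule for a link between two converters can be sketched as follows. The function name and the set-based arguments are hypothetical, and the sketch picks the lowest-indexed wavelength where the text allows any (or a random) wavelength, purely to keep the example deterministic.

```python
def least_conversion(available, prev_wl, next_available):
    """Choose a wavelength for a link l that connects two converters.

    available:      set of wavelengths free on l (assumed non-empty, since
                    path selection already found l feasible)
    prev_wl:        wavelength used on the previous link, or None
    next_available: set of wavelengths free on the next link of the path
    """
    # 1) Reuse the previous hop's wavelength: no conversion before l.
    if prev_wl in available:
        return prev_wl
    # 2) Otherwise use a wavelength shared with the next link, so that no
    #    conversion is needed after l.
    common = available & next_available
    if common:
        return min(common)
    # 3) Otherwise any free wavelength works (the text picks one at random;
    #    min() is an assumption made for reproducibility).
    return min(available)


same_as_previous = least_conversion({1, 2, 3}, 2, {3})
shared_with_next = least_conversion({1, 3}, 2, {3, 4})
fallback = least_conversion({1}, 2, {3})
```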

4.3.4 Experimental Evaluation

4.3.4.1 Experimental framework

In this section, we measure the performance of the scheduling algorithms described in Section 4.3.3 and analyze how wavelength converters affect the algorithms' performance in various scenarios. We compare the tie-breaking schemes of Section 4.3.3.3 to show the effectiveness of our slack strategy. We analyze the performance of the RWA algorithms on 3 metrics: blocking probability, average start time and execution time. In our experiments, blocking probability is measured as the ratio of rejected requests to the total number of submitted requests. Average start time is measured as the average delay between the start of the reservation window for a job and its actual start time. Execution time measures the convergence speed of each algorithm and how it varies with network size and workload. We also propose a self-adaptive algorithm switching strategy that dynamically chooses the suitable algorithm according to the current workload. The performance of this switching strategy is tested in various scenarios.

To simulate an e-Science backbone, we use a 10-node ring topology, a 25-node mesh-torus topology, a real-world 11-node Abilene network (Figure 4-9) and several randomly generated topologies. Each link is assumed to carry 10 wavelengths. For randomly generated topologies, we set the out-degree of each node to a random integer between 3 and 7. To ensure network connectivity, the random network has bidirectional links between nodes i and i + 1 for every 1 ≤ i < n, where n is the number of nodes. During the analysis of the results, we found that the test data from the Ring and Abilene topologies are very similar to each other, while the Mesh-Torus and random topologies also lead to almost the same observations. Thus, in this paper, we only present the results from the Ring and Random topologies to avoid redundant diagrams. The full analysis of the data from all four topologies can be found in our technical report [39].

Figure 4-9. Network Topologies. A) Ring. B) Mesh.

Besides the ratio of nodes that have converters, their placement in the network is also an important consideration, and extensive studies have been performed [15, 57, 68, 70]. In this paper, we do not employ any specific wavelength converter placement policy, for the following reasons: 1) most converter placement strategies are closely related to the traffic distribution in the network, but the traffic pattern is not precisely known at network design time; 2) we assume that the topology and the converter placement are predefined, so the scheduler cannot make changes. Instead, a simple placement strategy is used in our evaluation. A node is capable of wavelength conversion with probability q, independent of the other nodes. The number of wavelength converters in a network of N nodes is binomially distributed with an expectation of Nq. In our simulation, we use WLC Ratio to indicate the percentage of nodes that are equipped with wavelength converters. We compare the algorithms' performance under different WLC Ratios to plot the impact of wavelength converters. Although a simple strategy is applied here, our expectation is that a performance comparison between different approaches should be applicable even when a more sophisticated placement is used.

File transfer requests are synthetically generated. Each request is described by

the 6-tuple (si , di ,BWi ,Duri ,STi ,ETi). The source and destination nodes for each

request were selected using a uniform random number generator so that the workload

is distributed uniformly among different node pairs. Without loss of generality, we

assume each request asks for a capacity of only 1 wavelength. The duration is uniformly

distributed within a range of 100 to 500 time units. The arrival of the request follows a

Poisson distribution with rate α. The reservation window starts at some time after the

requests’ arrival. As the results are relatively insensitive to this lag, we arbitrarily chose

it as 100 time units. The length of the window is randomly selected from 2 to 4 times the

request duration.

We assume that the requests arrive in a Poisson process for each source/destination pair with an arrival rate α. Following the experimental setting in [56], α is picked in the range from 0.01 requests/time unit to 0.1 requests/time unit for each node pair. So, for example, with an arrival rate of 0.05 requests/time unit, one run of our experiment on a 100-node random topology would process approximately 5 × 10^5 requests during a test that lasts 1000 time units. All our experiments assume that we start with no load, i.e., no existing scheduled transfers.

4.3.4.2 Slack tie-breaking scheme

Recall that in Section 4.3.3.3, to select the best candidate path, we consider all the paths that are at most h hops longer than the shortest path. Hence, to evaluate the performance of these heuristics, we must first decide the value of h. Figure 4-10 explores how the value of h influences the blocking performance of EBF-S in various topologies. In a small topology like the 8-node ring, h = 1 provides the best performance, while in the 100-node random network, the h = 2 case is marginally better than h = 1. Also, large h values like 3 or 4 actually deteriorate the performance, as link capacity is wasted by choosing long paths. In our tests, h = 1 is also the best choice for Mesh-Torus and Abilene. Similar results (not presented here) were also observed for KDP-S. In the following tests, we chose h = 1 for the Ring, Abilene and Mesh-Torus topologies and h = 2 for the random network.

Figure 4-10. Different h values for different topologies. A) Ring. B) Random. Network Traffic Load: α = 0.05

Figure 4-11. Benefit of slack tie-breaking scheme in various topologies. A) Ring. B) 100-node Random. Network Traffic Load: α = 0.05.

Figure 4-11 depicts the wavelength converters' impact on blocking performance by varying the wavelength converter ratio for various topologies. EBF and KDP use the simple First-Fit tie-breaking scheme, whereas EBF-S and KDP-S employ the slack scheme. We observe that the algorithms with the slack scheme work much better than the algorithms using the FF scheme in all cases. Thus, choosing a longer path in the presence of excess capacity benefits the blocking performance for sparse wavelength conversion. Another observation is that when the WLC Ratio equals 20%, the blocking probability of the First-Fit algorithms is worse than without any converters. This phenomenon can be explained as follows: when no wavelength converter is available, many requests are rejected due to the lack of a continuous wavelength, but the accepted requests are more evenly distributed among all wavelengths and long lightpaths are less likely to be established. When a small number of nodes are equipped with converters, the traffic scheduled by the First-Fit strategy is more likely to use long paths on wavelengths of lower index, as shown in Section 4.3.3.3. The additional benefit from having 20% converters is not enough to cover the degradation due to the wasted capacity of long paths. Although this degradation can be compensated by inserting more converters, as plotted in Figure 4-11, using the slack scheme is more effective and economical. We also notice that the improvement brought by the slack scheme in the Ring and Abilene topologies is not as large as in the mesh-torus and random topologies. This is consistent with the conclusion in [56], which states that wavelength conversion helps more in topologies with more divergence and connectivity, as more path variants are available.

In the tests for both Figure 4-10 and Figure 4-11, the network traffic load is set to a moderate level: α = 0.05. However, similar results are also observed under different workloads.

4.3.4.3 Blocking probability

In this section, the blocking performance of EBF-S and KDP-S is evaluated and compared with the Fixed-Shortest Path routing algorithm. For Fixed-Shortest Path routing, no tie-breaking is specified, as only one path is available. However, because First-Fit wavelength assignment is applied, Fixed-Shortest Path routing is denoted as SP-FF in our diagrams. Figure 4-12 and Figure 4-13 depict how the blocking probabilities change with the wavelength converter ratio in the Ring, Mesh-Torus, Abilene and Random-100 topologies. Figure 4-12 presents the results obtained when the network's traffic load is relatively low, α ∈ [0.01, 0.05], while Figure 4-13 presents the results when the workload is relatively high, α ∈ (0.05, 0.1].

Figure 4-12. Blocking Probability vs. Wavelength Converter Ratio in various topologies with low traffic load.

Figure 4-13. Blocking Probability vs. Wavelength Converter Ratio in various topologies with high traffic load.

From Figure 4-12 and Figure 4-13, we note that increasing the wavelength converter ratio decreases the blocking probabilities for all algorithms. However, the improvement also depends on the network's traffic load. When the network traffic load is relatively low, EBF needs a WLC ratio of only about 40% to provide satisfactory blocking performance, while the blocking probabilities of KDP and SP decrease more gradually as the WLC ratio increases. When the traffic load is high, adding wavelength converters yields only marginal improvement for all algorithms. This can be explained as follows. EBF explores the network more thoroughly for an available path than KDP and SP. With a small number of converters, EBF is able to satisfy the traffic demands but KDP and SP cannot. However, when the traffic load is high, the majority of blocking is due to a lack of link capacity rather than the unavailability of continuous wavelengths. Hence, adding wavelength converters has little impact under heavy loads.

We also note that in all topologies, the EBF-S and KDP-S algorithms achieve a much smaller blocking probability than the SP-FF algorithm. In simple topologies such as Ring (Figures 4-12A and 4-13A), EBF-S and KDP-S lead SP-FF by about 2-5% in blocking probability, while in more complex topologies such as the random network (Figures 4-12B and 4-13B), the advantage is doubled. This is consistent with the result in [14] that shortest-path routing is less likely to be improved upon when only a small number of alternative lightpaths exist.

Figure 4-14. Total resource consumption in a 100-node random network under different workloads (A: Low Workload; B: High Workload).

Another important observation is that EBF outperforms KDP under low traffic workloads (Figure 4-12), but KDP leads EBF under high traffic workloads (Figure 4-13). This can be explained as follows: EBF tries to find any possible path for the current request if one exists, whereas KDP tests no more than k paths. When the long-term traffic load is low, the network's capacity is ample to accommodate most requests, so EBF, which acts greedily, accepts more requests than KDP. However, when the traffic load is high, limiting the routes to short paths and rejecting some long paths benefits the scheduling of future requests. In that case, the more conservative KDP provides better performance in the long run.

This observation is also supported by the fact that EBF actually consumes network bandwidth faster than KDP does. Figure 4-14 depicts the total amount of link capacity that EBF and KDP consume under different workloads. The total resource amount is defined as $\sum_{p \in RLP} Dur(p) \cdot length(p)$, where RLP is the set of all established lightpaths, Dur(p) is the duration of lightpath p and length(p) is its hop count. We see that, for the same request set, EBF consumes more link capacity than KDP in the low workload cases, as KDP rejects long-path requests that EBF accepts. When the workload is high, KDP, which rejects some early long-path requests, accepts substantially more requests in the long run. Therefore, the total amounts of link capacity that the two algorithms consume are almost equal under high workload.
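The resource consumption metric defined above is a duration-weighted hop count, which can be computed as in the following sketch. The `Lightpath` record and its field names are illustrative assumptions, not structures taken from the simulator.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Lightpath:
    duration: float  # Dur(p): how long the lightpath is held
    hops: int        # length(p): hop count of the lightpath


def total_resource_consumption(lightpaths: Iterable[Lightpath]) -> float:
    """Sum of Dur(p) * length(p) over all established lightpaths."""
    return sum(p.duration * p.hops for p in lightpaths)


established = [Lightpath(duration=10, hops=3), Lightpath(duration=5, hops=2)]
print(total_resource_consumption(established))
```

A scheduler that accepts long paths raises this total quickly, which is the effect attributed to EBF under low workload.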

4.3.4.4 Requests’ average start time


Figure 4-15. Average Request Start Time vs. Wavelength Converter Ratio in various topologies with low traffic load.

In the all-optical routing area, blocking probability is always the primary concern, but for the special case of First-Slot scheduling, the availability of an earlier start time may also be an important metric for evaluating a scheduler's performance. Figure 4-15 and Figure 4-16 present the influence of the wavelength converter ratio on the requests' average start time. As in the previous section, Figure 4-15 shows the start time performance in the low traffic case and Figure 4-16 shows the results in the high workload case.

Figure 4-16. Average Request Start Time vs. Wavelength Converter Ratio in various topologies with high traffic load.

To exclude the impact of blocked requests on the average request start time, we set the reservation window for each request to a time interval large enough that every request is accepted at some time within the window. From the above figures, we observe that adding wavelength converters has a positive impact on the requests' start time, but the improvement is not as pronounced as the impact on blocking performance, especially for EBF-S and KDP-S. In the Ring topology, the improvement from 0% WLC ratio to 100% WLC ratio is only about 5% in the low traffic load case and 15% in the high traffic load case. In the random topology, the improvements are almost negligible.

We also note that EBF-S and KDP-S achieve earlier average start times than SP-FF in all cases. The advantages are larger in the random topology than in the Ring topology, and they increase as the workload increases. This shows that EBF-S and KDP-S are able to schedule the requests in a more parallel way than SP-FF. When the network capacity is ample for the requests, EBF-S and KDP-S provide much faster schedules. When the traffic congests the network, forthcoming requests have to wait for previous requests to finish due to the lack of network capacity, which reduces the advantage of EBF-S and KDP-S.

As in the previous section, EBF-S again outperforms KDP-S on average start time in the low workload cases, while KDP-S performs better under heavy workload thanks to its conservative reservation strategy. When the network's capacity is relatively ample compared to the requested workload, EBF-S is able to start jobs earlier than KDP-S, but their performances are very close in large topologies. When the network is congested by high request rates, KDP-S leads in average start time in all topologies, and its advantage is more noticeable in large networks.

4.3.4.5 Scheduling overhead

In this section, we discuss the scheduling overhead of our RWA algorithms. Figure 4-17 compares the scheduling overhead of EBF-S and KDP-S for various WLC ratios and network sizes. These results show that EBF-S is about 2-5 times slower than KDP-S and that its computation time grows faster than that of KDP-S as the network size increases. The execution time decreases as the WLC ratio increases, because the extended network is smaller. However, even for a 400-node topology, EBF-S can schedule a request within several seconds on average. This should be acceptable in most scenarios.

Figure 4-17. Average computation time (CT) of EBF-S and KDP-S (A: CT vs. WLC Ratio; B: CT vs. Network Size).

In summary, equipping the network with some wavelength converters does improve the blocking performance, but adding more converters beyond a certain threshold does not bring further benefit. Meanwhile, the traffic load and network topology also have a great influence on the blocking probability. For the average start time, we found that the effect of the wavelength converter ratio is minor for both EBF-S and KDP-S, while the traffic load and network topology have a more evident influence on this metric. Comparing the two algorithms, EBF-S generally performs better in the low traffic cases, but KDP-S is a better choice when the traffic load is heavy. KDP-S is faster than EBF-S in all cases, but even in the worst case, EBF-S can still schedule the requests at an acceptable speed.

4.3.4.6 Algorithm switching strategy

We found in Section 4.3.4.3 and Section 4.3.4.5 that the greedy approach, EBF-S, performs better when the traffic load is comparatively light, while the conservative approach, KDP-S, works more effectively in the high workload case. In this section, we propose a self-adaptive algorithm switching strategy that automatically chooses EBF-S or KDP-S according to the current traffic load in the network.

The main idea of the switching strategy is as follows. The time domain is divided into equal-length time slots. The scheduler runs both algorithms simultaneously on each request; the choice of whose result to apply in the current time slot is made at the beginning of that slot, assuming that the current traffic follows a pattern similar to that of the last slot. The performance of the candidate algorithms can be evaluated by their blocking performance, by the average start time of the requests scheduled in the last slot, or by a combination of the two. The statistical performance of both algorithms in the last slot is compared, and the algorithm that performed better in the last slot takes effect in the current slot. If it does not perform as well as the other one in the current slot, the scheduler switches to the alternative at the beginning of the next slot.
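The slot-by-slot decision rule can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: it assumes blocking ratio is the only metric, that ties keep the currently active algorithm, and that the algorithm for the first slot is given; all function names are hypothetical.

```python
from typing import Dict, List


def choose_algorithm(last_slot_blocking: Dict[str, float], current: str) -> str:
    """Pick the algorithm with the lower blocking ratio in the last slot.
    Assumption: on a tie, the currently active algorithm is kept."""
    best = min(last_slot_blocking.values())
    if last_slot_blocking[current] == best:
        return current
    return min(last_slot_blocking, key=last_slot_blocking.get)


def run_switching(per_slot_blocking: List[Dict[str, float]], initial: str) -> List[str]:
    """Return the algorithm whose results are applied in each slot.
    Both algorithms are run on every request, so statistics for both
    are available at the end of every slot."""
    active = initial
    applied = []
    for slot_stats in per_slot_blocking:
        applied.append(active)
        # Decision for the next slot, based on the slot that just ended.
        active = choose_algorithm(slot_stats, active)
    return applied


stats = [
    {"EBF-S": 0.10, "KDP-S": 0.20},  # light load: EBF-S blocks less
    {"EBF-S": 0.30, "KDP-S": 0.20},  # load rises: KDP-S blocks less
    {"EBF-S": 0.30, "KDP-S": 0.30},
]
print(run_switching(stats, initial="EBF-S"))
```

Because the decision always lags the traffic by one slot, the rule only helps when consecutive slots resemble each other.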

Figure 4-18. The performance of the algorithm switching strategy in Slow Traffic Pattern Switching (A: Blocking Performance; B: Start Time Performance).

Figure 4-19. The performance of the algorithm switching strategy in Fast Traffic Pattern Switching (A: Blocking Performance; B: Start Time Performance).

To evaluate the performance of our algorithm switching strategy, we designed two scenarios: slow traffic pattern switching (STPS) and fast traffic pattern switching (FTPS). In our tests, the request arrival rate α is randomly selected from the range [0.01, 0.1] and the length of each time slot is 10 time units. In the STPS scenario, the arrival rate α changes its value with 50% probability every 100 time units. In the FTPS scenario, the arrival rate changes its value with 50% probability every 10 time units.
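The two traffic scenarios differ only in how often the arrival rate may change. The sketch below generates such a rate trace; it assumes a uniform draw over [0.01, 0.1] and integer time units, neither of which is stated explicitly in the text, and `arrival_rate_trace` is a hypothetical helper.

```python
import random
from typing import List


def arrival_rate_trace(total_time: int, change_period: int, seed: int = 0,
                       low: float = 0.01, high: float = 0.1,
                       change_prob: float = 0.5) -> List[float]:
    """Arrival rate for each time unit. Every `change_period` time units the
    rate is redrawn with probability `change_prob`; otherwise it is kept."""
    rng = random.Random(seed)
    rate = rng.uniform(low, high)  # assumption: uniform draw over the range
    trace = []
    for t in range(total_time):
        if t > 0 and t % change_period == 0 and rng.random() < change_prob:
            rate = rng.uniform(low, high)
        trace.append(rate)
    return trace


stps = arrival_rate_trace(total_time=1000, change_period=100)  # slow switching
ftps = arrival_rate_trace(total_time=1000, change_period=10)   # fast switching
print(len(set(stps)), len(set(ftps)))
```

With a slot length of 10 time units, the FTPS rate can change at every slot boundary, which is what makes the previous slot a poor predictor.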

Figure 4-18 and Figure 4-19 depict the blocking and start time performance of our switching strategy in both scenarios on a 100-node random topology. DS-SS denotes the switching strategy operating in the STPS scenario, while DS-FS denotes the switching strategy operating in the FTPS scenario. The results show that the hybrid algorithm in the STPS scenario, DS-SS, has the best performance on both metrics, as the history information from the last slot predicts the current traffic pattern quite accurately. On the other hand, the performance of DS-FS, which operates in the FTPS scenario, degrades dramatically due to the higher probability of inaccurate predictions. In some cases, DS-FS even provides the worst performance. So, when the traffic pattern remains static or changes infrequently, our algorithm switching strategy provides good performance by combining the merits of both algorithms. However, when the traffic pattern changes frequently, the switching strategy does not guarantee any improvement over either EBF-S or KDP-S.


4.4 Optical Network Scheduling Summary

In this chapter, we conducted extensive simulations to evaluate the performance of algorithms for a variety of request patterns and network topologies. Our results show that the Extended Bellman Ford (EBF) algorithm has consistently better performance than the other algorithms. For non-homogeneous networks, LSW also provided comparable solutions, while for homogeneous networks MSPF and MSWF provide comparable solutions. Our simulations were performed both in the presence and in the absence of converters that can be used to convert from a given wavelength to another wavelength. Our experimental results showed that the use of wavelength converters generally led to better performance. Thus, the additional flexibility that wavelength converters provide in a network is worthwhile. We also showed that a deferred wavelength assignment strategy can be effectively used in conjunction with the routing algorithms. A deferred strategy only counts the number of wavelengths that are used on a link. Since the actual assignment of the wavelength is done at the time of request fulfillment, this alleviates the need to keep track of the bandwidth allocation status of each wavelength. A deferred wavelength strategy is guaranteed to find a feasible solution as long as the total amount of reserved bandwidth does not exceed the total bandwidth capacity of a link at any given point in time.
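The counting condition behind deferred assignment can be illustrated for a single link. The sketch below is an assumption-laden illustration rather than the scheduler's actual check: reservations are taken to be half-open intervals [start, end) with an integer wavelength count, and `fits_on_link` is a hypothetical name.

```python
from typing import List, Tuple

# (start, end, wavelengths) with the interval taken as half-open [start, end)
Reservation = Tuple[float, float, int]


def fits_on_link(reservations: List[Reservation], capacity: int) -> bool:
    """True if the number of wavelengths in use never exceeds `capacity`.
    Only counts are tracked; no specific wavelength is assigned here."""
    events = []
    for start, end, count in reservations:
        events.append((start, count))   # wavelengths taken at start
        events.append((end, -count))    # wavelengths released at end
    # Tuples sort by time first; at equal times the negative (release)
    # entries come first, so back-to-back reservations do not overlap.
    events.sort()
    in_use = 0
    for _, delta in events:
        in_use += delta
        if in_use > capacity:
            return False
    return True


print(fits_on_link([(0, 10, 2), (5, 15, 2)], capacity=4))
```

Tracking one counter per link is what removes the need to record the allocation status of each individual wavelength.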

In this chapter, we also examined the impact of sparse wavelength conversion on First-Slot scheduling. We proposed a new network model to emulate the full-conversion algorithms in sparse conversion networks. Using this model, we conducted extensive experiments to assess the impact of wavelength converters on the performance of First-Slot RWA algorithms. This assessment used three metrics: blocking probability, average start time and scheduling overhead. Our experiments indicate that adding wavelength converters has a positive impact on blocking performance, but very little impact on the availability of earlier start times. Meanwhile, as most of the improvement is achieved with no more than 60% of the nodes having converters, deploying too many wavelength converters may not be worth the additional cost. We also proposed a Slack tie-breaking scheme for when multiple feasible paths are available. This tie-breaking scheme is shown to have much better performance than the traditional First-Fit or Shortest-Path tie-breaking schemes. The comparison between EBF and KDP also leads to the conclusion that accepting requests greedily provides better performance in the low traffic load case, but rejecting some requests with long paths can be a superior strategy when the workload is high. We proposed an algorithm switching strategy that adapts the scheduling algorithm as the current workload changes. When the network traffic pattern changes slowly, this strategy has a considerable advantage over static algorithms. Overall, our results show that adding a small number of wavelength converters may have a limited positive impact on First-Slot scheduling. However, this impact should be carefully weighed against the additional cost.


CHAPTER 5
MULTIPLE RESOURCE SCHEDULING


Many complex e-Science applications require a large amount of schedulable resources that are subject to dynamic changes. The ability to reserve various types of computational resources such as bandwidth channels, CPU, memory and disk space has become a key requirement for overall effectiveness in the e-Science community. To meet this need, we propose a framework for conducting advance reservations, admission control, and scheduling of network service requests along with other resources such as CPUs, memory, disk space, and software licenses. In existing systems, the scheduling of such resources is segregated from the scheduling of network resources such as network bandwidth. For example, LSF-HPC [44] and Maui [42], which are popular high-performance computing schedulers for clusters and supercomputers, do not schedule network resources, although they are able to do advance reservation of resources such as CPUs, memory, disk space, and software licenses [3]. The same is true for Condor (University of Wisconsin's high-throughput scheduler), PBS (Portable Batch System, the parallel scheduler for the IBM SP2) and VMWare ESX. On the other hand, network bandwidth management systems such as those for UltraScienceNet (USN) and ESnet do not schedule computer resources. The Sharc system [63] modeled both network and CPU resources in clusters as unified resource blocks, but constraints among resources, such as network topology and resource compatibility, are not considered.

We envision an environment in which a computational network contains hundreds of nodes and computational resources on different platforms. When multiple resources are reserved, their topologies, dependencies, and compatibilities must be handled. The purpose of our paper is to develop a co-scheduler that simultaneously schedules multiple types of resources with a network focus, based on our Multiple Resources Reservation Model (MRRM). With MRRM, we can

1. give multiple heterogeneous resources a unified presentation while still keeping their diversity;

2. efficiently present the various types of constraints among different resources, such as compatibility, accessibility, and assignability; and

3. flexibly adjust the resource presentation for different types of user requests, for better scheduling efficiency.

We define a Multiple Resource First Slot problem (MRFS) with the objective of

determining the earliest time that can be used to reserve all resources required to

compute a given request. We divide the MRFS problem into four sub-problems, based

on (a) whether the request consists of multiple subtasks that can be assigned to different

resources, and (b) whether each resource can be assigned independently.

In the systems and architecture field, many papers have also focused on resource reservation. [20] proposed the GARA framework, which allows multiple resources to be monitored and reserved in a cluster system. In GARA, all resources are considered as unified blocks so that a single algorithm is enough to handle all resources. However, GARA is only a framework for resource management; no specific algorithm is mentioned. [62] modeled different resources as queues. A request is allowed to enter a queue if and only if the current resource fulfills the user's request. [62] also groups resources of the same type in a single layer, such that a request in layer i can only access layer i + 1 or layer i − 1 after its work in layer i has finished. This layered model is not only widely used in most hardware architectures, but is also applied to much software, such as the TCP/IP protocol stack. In this paper, we also use such a layered model to present our local computational resources. However, [62] does not handle the dependency and compatibility constraints at the same level of detail or fidelity that our MRRM formalism does. In [7], the resources are grouped according to their type rather than their physical residency, which is convenient for centralized scheduling. The resource accessibility problem is also raised in [7], but only the fully accessible model is used.

Figure 5-1. General Model of MRRM (diagram labels: Local Resources, Edge router, Network)

5.2 Resource Model and Data Structure

5.2.1 Resource Model: MRRM

When developing a resource reservation system, one must first design a representational

formalism. For example, a computational system can be viewed as a network with

computation or storage centers attached to some of its edge nodes. Data sets are

generated or processed using these attached resources, and are transmitted through

the network from one edge router to another.

For the vast majority of e-Science applications, resources can be classified into

network resources and local computational resources. Network resources transfer user

data from one site to another, and include but are not limited to optical links, routers and

switches. Local computational resources include CPU, memory, hard disk and other

resources used in processing user requests. Figure 5-1 provides a general view of

MRRM's basic structure.

As mentioned in Section 5.1, one of the major goals of our resource model is to provide a unified view of multiple types of resources. Our MRRM represents a network in terms of a graph G = (V, E). The organization of G mirrors the connectivity of the computational network being modeled. Each switch or router is represented as a node in V, while each network link is mapped to an edge in E. Behind the edge routers, each computation or storage center groups its own local computational resources in related clusters. Each cluster is represented as a sub-graph attached to one of the edge routers. A single resource unit is modeled as a resource link and is associated with an edge. We choose this representation for the following reasons:

1. It is natural to present networks as graphs and associate link bandwidth with edges in the graph, as in the majority of network models.

2. In the reservation aspect, mapping a CPU node, a memory block or a hard disk into an edge in the graph is feasible, since our main modeling concern pertains to representing the amount of available resources. We can thus present a CPU's capacity in the same manner as network link bandwidth.

3. The compatibility and dependency constraints among different types of resources can be represented as connectivity constraints among resource links in the graph.

4. A unified view of different resource types facilitates algorithm design for multiple resource scheduling. We can extend the known graph-based, single resource scheduling algorithms [25, 48, 52] to the current scenario, while inheriting their attributes.

A computational system is therefore modeled as a graph with the communication network in the middle and computational resources attached to network edge nodes, as shown in Figure 5-1. However, in contrast to traditional graph representations of a network, two new research challenges emerge from the MRRM representation:

1. How do we model heterogeneous resources with a unified representation that does not reduce or eliminate their diversity?

2. How do we model the compatibility and assignability constraints among different resources?

To solve the heterogeneity problem, we first assign each resource link a type ID (T-ID) to specify its type, for example, CPU, Memory, or more generally, Resource1, Resource2, etc. With the type ID, all local resources can be grouped into one of several multi-partitioned resource constraint graphs (MPRCGs), which enables resources of the same type to be managed as one group. This grouping strategy fulfills the requirements of a user request at the granularity of a resource type, as opposed to a single resource unit. Typically, a user only cares about the quantity of a given resource type that will be available, but not whether the job will be run on a given CPU node or stored in a particular memory block. This is reasonable in practice, because the organizations of supercomputers vary from one architecture to another, but share a similar layered structure. For example, a CPU can access memory directly, but not a hard disk. If a page fault occurs, then the disk will write to the memory, but not directly through the memory to the CPU.

Compatibility is another important issue in MRRM. Usually, we have multiple choices within a given resource type. However, not all resource units of a given type are appropriate to perform a user- or system-defined task. For example, if a user program is written in native Intel x86 code, then it will likely not run on a MIPS CPU. In this case, computational nodes that do not support Intel x86 code should be excluded from reservation by the co-scheduler. Similarly, if a program can be parallelized, then the CPUs involved in parallel computation must be compatible in terms of code type and method of data sharing across multiple resource units in MRRM. Thus, we supply a compatibility ID (C-ID) to each resource, which facilitates grouping of resources with the same T-ID into different compatibility classes. In our current approach, a user specifies a job in terms of its compatibility ID, and all resources with the same C-ID will be considered for reservation. If no C-ID is specified, then all resource classes are available for this request.
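The T-ID and C-ID rules above amount to a two-level filter over resource links, which can be sketched as follows. The `ResourceLink` record, its field names and the example IDs are illustrative assumptions, not the data layout of MRRM itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ResourceLink:
    name: str
    t_id: str        # type ID, e.g. "CPU" or "Memory"
    c_id: str        # compatibility ID, e.g. "x86" or "MIPS"
    capacity: float  # amount of the resource this link offers


def eligible_links(links: List[ResourceLink], t_id: str,
                   c_id: Optional[str] = None) -> List[ResourceLink]:
    """Links of the requested type; if a C-ID is given, only links in that
    compatibility class. With no C-ID, every class of that type qualifies."""
    return [l for l in links
            if l.t_id == t_id and (c_id is None or l.c_id == c_id)]


links = [
    ResourceLink("cpu0", "CPU", "x86", 8),
    ResourceLink("cpu1", "CPU", "MIPS", 4),
    ResourceLink("mem0", "Memory", "DDR", 64),
]
print([l.name for l in eligible_links(links, "CPU", "x86")])
```

Omitting `c_id` corresponds to the case where no compatibility ID is specified in the request.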

A further consideration involves accessibility. For example, some computers use

Distributed Shared Memory to provide all CPU nodes full access to the memory model.

However, other systems only allow certain CPUs to access specific memory partitions,

due to security concerns or physical connectivity. A similar situation also can exist

with memory and hard disk connections. Our MRRM is capable of simulating each of

these scenarios. If two resource links within different MPRCGs are accessible to each

other, then a specific auxiliary link with unlimited capacity connects the two links. If all

resources in one MPRCG are accessible to all resources in another MPRCG, then a

connection is made at the MPRCG level, so that every resource link in either MPRCG

is connected with every resource link in the other MPRCG. Because these auxiliary

paths have unlimited capacity, they will only identify resource accessibility, and will not

influence the reservation process.

Figure 5-2. Detailed Model of MRRM

Figure 5-2 provides a detailed example of MRRM on a local site, where all resources with the same type ID are grouped within an MPRCG. Resources in one MPRCG need only access the resources in that MPRCG's neighborhood. Within each reservation MPRCG, resources with the same compatibility ID are re-grouped together such that parallel programs can run using all resources in any one of the inner cycles. If two MPRCGs are completely accessible to each other, then we (a) add a dummy node between the MPRCGs, and (b) add an auxiliary link with unlimited capacity between each resource link's end node and the dummy node. In Figure 5-2, this process is shown between resource types A and B. Adding this dummy node, as opposed to constructing a fully connected bipartite graph directly between two MPRCGs, reduces the algorithm's complexity by reducing the number of edges in the graph. For example, suppose each of the two MPRCGs has N resource links. Then a fully connected bipartite graph requires $N^2$ auxiliary links, whereas only 2N auxiliary links are required when an extra dummy node is added. If the resources in one MPRCG can only access part of the resources in its neighborhood, then a direct auxiliary link is required to represent this accessibility between two different resources. In Figure 5-2, this is represented between resource types B and C.
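The saving from the dummy node can be checked by building both sets of auxiliary links and counting them. The sketch assumes two fully accessible MPRCGs with N resource links each; the node labels and helper names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Link = Tuple[str, str]


def bipartite_links(left: List[str], right: List[str]) -> List[Link]:
    """One auxiliary link for every (left, right) pair."""
    return list(product(left, right))


def dummy_node_links(left: List[str], right: List[str],
                     dummy: str = "dummy") -> List[Link]:
    """Every resource link end node connects only to the shared dummy node."""
    return [(l, dummy) for l in left] + [(dummy, r) for r in right]


n = 4
type_a = [f"A{i}" for i in range(n)]
type_b = [f"B{i}" for i in range(n)]
print(len(bipartite_links(type_a, type_b)), len(dummy_node_links(type_a, type_b)))
```

Both constructions make every resource in one group reachable from every resource in the other; only the number of auxiliary links differs.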


Also notice that all the resource links in the MPRCG at one end (e.g., the left-most or right-most MPRCG in Figure 5-2) are connected to the edge router with the same kind of auxiliary links, so that all computational resources are attached to the network. We also add a dummy node as a sink node in the local graph, such that resource links at the opposite end of the MPRCG connect to this dummy node. Thus, all resources, whether network or local computational resources, are connected to provide a single MRRM graph. To fulfill a user job scheduling request, the scheduling algorithm then attempts to find a single path or multiple paths from the source sink node to the destination sink node. This can provide as many resources from all resource types as the user specifies for job completion.

Finally, without loss of generality, we assume that in one site, its local computational

resources can either be all fully accessible or all partly accessible.

5.2.2 Data Structures

Another important aspect of our resource model is its temporal representation,

which describes time-varying resource quantities. The options are to either consider

time as divided into equal size slots as is done in [12, 19, 23, 58] or to consider time as

being continuous as in [32, 46–48, 52]. The slotted time model uses an array for each

edge in the graph to record its resource status for each time slot. For example, we may

use a two dimensional array R such that R[d , t] gives the disk space available on hard

disk d in slot t.

In the continuous time model, the status of each resource unit is maintained using a time-resource list (TR list) that is analogous to the TB list of the previous chapters. We chose the continuous time model with TR lists to present the time domain and the corresponding resource quantity. The advantages of the continuous time domain are: (i) The continuous time model is a more natural representation of time. A change in link bandwidth may occur at any point in the time domain, which cannot be described in the slotted time model. (ii) There is no need to pick a time granularity or to place a bound on the length of the look-ahead period. (iii) The amount of memory required to represent a link state (i.e., the TR list) is only a function of the time variation in resource availability rather than the length of the scheduling horizon T or the length of the unit time. Moreover, the run-time of reservation algorithms is a function of the size of the TR lists. This size depends on the number of tasks that have been scheduled. The limitations of the continuous time model include its relative complexity. If an array is used, the complexity of determining the status of a link l at any given time will be O(log |TR[l]|) using binary search, whereas the same operation in the slotted time model takes only O(1) time. Because of the correspondence between a slot and time, we often use the two terms interchangeably.
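The logarithmic lookup mentioned above can be sketched with a sorted array and binary search. The layout is an assumption made for illustration: each tuple (t, r) is taken to mean that amount r is available from time t until the next entry, and the class name `TRList` is not taken from the original implementation.

```python
from bisect import bisect_right
from typing import Iterable, Tuple


class TRList:
    """Time-resource list kept as two parallel arrays sorted by time."""

    def __init__(self, entries: Iterable[Tuple[float, float]]):
        ordered = sorted(entries)
        self.times = [t for t, _ in ordered]
        self.amounts = [a for _, a in ordered]

    def available_at(self, t: float) -> float:
        """Amount available at time t, found by binary search in O(log n)."""
        i = bisect_right(self.times, t) - 1  # last entry with time <= t
        if i < 0:
            raise ValueError("t precedes the first entry of the TR list")
        return self.amounts[i]


tr = TRList([(0, 10), (5, 4), (9, 10)])  # 6 units reserved during [5, 9)
print(tr.available_at(5), tr.available_at(4.9), tr.available_at(100))
```

The list grows only when availability changes, so its length tracks the number of scheduled tasks rather than the length of the scheduling horizon.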

We also employ two other data structures from the previous chapters: the Steady Stage and
the ST (Start Time) list. The changing status of the whole MRRM over
time can be viewed as a series of steady stages. Recall that the capacity of a link is
represented by a TR list, which is an array of time-resource tuples. If we consider only the time
part of the TR list, t0, t1, t2, ..., tq, any time interval [ti, ti+1] forms a steady stage of that
link. Hence, if we union the time parts of every resource link's TR list, we
obtain a global time list. Let T0 < T1 < T2 < ... < Tn be the distinct values in that time list.
We note that any time interval [Ti, Ti+1] from the global time list is a steady stage for the
whole model, because no resource link changes its status during this
interval: if any link changed its resource, there would be another value T′i in the
list such that Ti < T′i < Ti+1.
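As a small illustration of the construction just described, the sketch below merges the time parts of several TR lists into a global time list and pairs consecutive values into steady stages. The representation of a TR list as a list of `(time, resource)` tuples and the function names are assumptions made for this example only.

```python
def global_time_list(tr_lists):
    """Sorted distinct breakpoint times taken from every link's TR list."""
    return sorted({t for tr in tr_lists for t, _ in tr})


def steady_stages(tr_lists):
    """Consecutive pairs [T_i, T_i+1] of the global time list."""
    times = global_time_list(tr_lists)
    return list(zip(times, times[1:]))


link_a = [(0, 100), (5, 40), (12, 70)]   # hypothetical TR list of one link
link_b = [(0, 10), (8, 6)]               # hypothetical TR list of another link
print(steady_stages([link_a, link_b]))   # [(0, 5), (5, 8), (8, 12)]
```

No link changes its resource strictly inside any of the returned intervals, since every breakpoint of every link appears in the global time list.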


For the MRFS problem, the user requests that the job start as early as possible
within a specific time window. If the request is feasible at some time within the
reservation window, the earliest such time t is reported to the user and the corresponding
resources are reserved; otherwise, the request is rejected.


For an MRFS request, the user wants to have the resources ready as early as
possible within a Reservation Time Window. This window specifies the earliest start
time and latest finish time of the user's job. An MRFS request is a 6-tuple (s, d, dur,
ResWin<st, et>, RV<R0, R1, R2, ...>, shareable). s and d are the source and
destination nodes of the data transfer; the computational resources attached to s and
d are the computational resources that are going to be reserved. dur is the duration
for which those resources need to be reserved. ResWin<st, et> is the reservation
window: the user's job must start and end within this window, which means the job's start
time must lie in the time interval [st, et − dur]. RV is a vector that contains all the
resource requirements. Each element in RV specifies the amount of one type
of resource that needs to be reserved. By convention, the first element RV[0]
represents the required bandwidth from s to d. shareable is a Boolean tag that indicates
whether the job's workload can be split among different resource units. If shareable is
true, the user request can be fulfilled with aggregated computational resources and
multiple network paths. Otherwise, we can reserve only a single resource unit in each local
resource stage and a single path in the network.
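The request tuple can be written down as a small record type. The sketch below is only one possible encoding: the class name `MRFSRequest`, the field names and the helper `start_window` are hypothetical, and node identifiers are assumed to be strings.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MRFSRequest:
    s: str                         # source node
    d: str                         # destination node
    dur: float                     # reservation duration
    res_win: Tuple[float, float]   # (st, et)
    rv: List[float]                # rv[0] is the required bandwidth
    shareable: bool                # True if the workload may be split

    def start_window(self) -> Optional[Tuple[float, float]]:
        """Admissible start times [st, et - dur], or None if the window is too short."""
        st, et = self.res_win
        latest = et - self.dur
        return (st, latest) if latest >= st else None


req = MRFSRequest("a", "b", 10, (0, 50), [10.0, 5.0], True)
print(req.start_window())  # (0, 40)
```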

For an MRFS request to be accepted, the scheduler needs to find the minimum
t ∈ [st, et − dur] such that, within [t, t + dur], a single path (or a multipath) connecting s
and d can provide a bandwidth (or an aggregated bandwidth) of RV[0]. Also, within the same
time interval, in each resource stage attached to s and d, at least one resource
unit (or compatible set) must provide enough resource (or aggregated resource) to fulfill the
corresponding requirement.

5.4 Multiple Resource Scheduling Algorithm

Based on whether the job can be split and whether the local resources are fully
accessible, we divide the MRFS problem into 4 sub-problems:


WS-RC: The workload can be split, and local computational resources are
constrained on accessibility, which means resources in one layer can access only
part of the resources in the neighboring layer.

WN-RC: The workload cannot be split, and local computational resources are
constrained on accessibility.

WS-RN: The workload can be split, and local computational resources are not
constrained on accessibility.

WN-RN: The workload cannot be split, and local computational resources are not
constrained on accessibility.

For the MRFS scheduling problem, every time a request arrives at the centralized
controller, the controller selects the corresponding scheduling algorithm
according to the shareable flag in the request and the accessibility of the local resources. This
algorithm computes the earliest time that fulfills the job's resource and duration
requirements. In this section, we give the details of the 4 algorithms that correspond to
the 4 sub-problems above.

5.4.1 WS-RC Scheduling Algorithm

If the workload can be split, then given a request ri and the multi-resource graph
G, the algorithm computes the maximum flow from the source's sink to the destination's sink
for each basic interval within the reservation window. The scheduler then determines
(a) whether there are enough resources within the current basic interval BIi (if so,
BIi is marked as feasible); and (b) whether there exists a consecutive (i.e.,
temporally connected) sequence of feasible basic intervals [BIi, BIi+1, ..., BIj] whose total length
is at least the required duration dur. If such a sequence is found, the start time of the
earliest such sequence becomes the first possible start time of the user's
request.

When running the maximum flow algorithm on our multiple resource graph, the
original algorithm cannot be applied directly. Imagine the multipath from s's sink to d's


sink: it may go through resource links of various types, so the flow along such a multipath
says nothing meaningful about the resource capacity from s to d. To solve this problem,
we scale the capacity of the resource links of certain types. For each request, we
produce a copy of the current model, say G′. We then choose the requirement
of one resource as the basevalue, and the resource links of every other type
scale their capacity by the ratio of the basevalue to their corresponding
requirement. For example, suppose we choose network bandwidth as our base resource, and let
the bandwidth request be 10 MB/s and the CPU request be 5 GHz. In this case, we
scale the capacity of all CPU resource links by a factor of 2, and we also change
the CPU requirement from 5 GHz to 10 GHz. In this way, we achieve numeric
unification among the different types of resources, which enables the traditional maximum
flow algorithm to run directly on our scaled graph. After the flow computation, the flow size is
compared with the basevalue; if the flow size is at least the basevalue, the current steady stage is
considered feasible for the request. The details of the WS-RC scheduling algorithm are
shown in Figure 5-3.
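The scaling step can be sketched in a few lines. The example assumes that each resource link is given as a `(link_id, resource_type, capacity)` triple and that the request's requirements are given per resource type; these names and the function `scale_capacities` are illustrative only. The scaling factor follows the worked example in the text (a 10 MB/s base requirement and a 5 GHz CPU requirement give a factor of 2 for CPU links).

```python
def scale_capacities(links, requirements, base_type):
    """Return (scaled_links, base_value) for one request.

    links:        list of (link_id, resource_type, capacity)
    requirements: dict mapping resource_type -> requested amount
    base_type:    resource type whose requirement serves as the base value
    """
    base_value = requirements[base_type]
    scaled = []
    for link_id, rtype, capacity in links:
        # Links of the base type keep their capacity (factor 1).
        factor = base_value / requirements[rtype]
        scaled.append((link_id, rtype, capacity * factor))
    return scaled, base_value


links = [("e1", "bw", 40.0), ("c1", "cpu", 3.0)]
scaled, base = scale_capacities(links, {"bw": 10.0, "cpu": 5.0}, "bw")
print(scaled, base)  # [('e1', 'bw', 40.0), ('c1', 'cpu', 6.0)] 10.0
```

After scaling, every link's capacity is expressed in units of the base requirement, so a single flow value can be compared against `base`.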

Here we use the min-cut algorithm [6] to solve the max-flow problem. The
complexity of the above WS-RC scheduling algorithm is $O(|SS_{RW}| \cdot N^3)$, where
$|SS_{RW}|$ is the size of the steady stage list that lies within the reservation window and
$N = N_N + N_s + N_d$. $N_N$ is the number of nodes in the network, and $N_s$ and $N_d$ are the numbers
of local computational resources attached to s and d. We also note that any path in the
MRRM model can be divided into 3 separate parts: the path through the local resources
attached to s, the network path, and the path through the local resources attached to
d. These 3 parts are independent of each other, so the flow on each part can be
calculated separately, and the global maximum flow equals the minimum of the three.
With this decomposition, the complexity of the WS-RC scheduling algorithm can be reduced to
$O(|SS_{RW}| \cdot (N_s^3 + N_N^3 + N_d^3))$.


WS-RC Scheduling (reqi, G) {
    Randomly choose resource rk as base resource;
    G′ = Scale(G, rk);
    Build the global time list L from G′; remove all the Ti outside the scheduling window reqi.ResWin;
    Identify all the Steady Stages;
    For each Steady Stage SSi {
        MFi = MaxFlow(G′, SSi);
        if (MFi ≥ basevalue)
            Mark SSi as a feasible Steady Stage;
    }
    Traverse the Steady Stage list again; find the first consecutive list of feasible steady stages that is longer than reqi.dur;
    if (such a list exists) {
        accept the request;
        set the list's start time as the request's start time;
    }
    else
        reject the request;
}

Figure 5-3. WS-RC Scheduling Algorithm
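The final traversal of the steady stage list can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: it assumes the steady stages are sorted and contiguous, that the per-stage max-flow test has already produced a list of Boolean feasibility flags, and that a run whose length equals the duration is acceptable. The function name `earliest_start` is hypothetical.

```python
def earliest_start(stages, feasible, dur):
    """Start of the first run of consecutive feasible stages covering dur.

    stages:   sorted, contiguous (start, end) steady stages in the window
    feasible: feasible[i] is True if stage i passed the max-flow test
    dur:      required duration
    Returns the start time, or None if the request must be rejected.
    """
    run_start = None
    for (start, end), ok in zip(stages, feasible):
        if not ok:
            run_start = None        # the run is broken; start over
            continue
        if run_start is None:
            run_start = start
        if end - run_start >= dur:
            return run_start
    return None


stages = [(0, 5), (5, 8), (8, 12), (12, 20)]
print(earliest_start(stages, [True, False, True, True], 6))  # 8
```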

5.4.2 WN-RC Scheduling Algorithm

In the WN-RC case, the workload can be transferred only on a single path in the
network and can be processed using only a single unit of each type of local computational
resource. In our MRRM model, scheduling a WN-RC request amounts to finding a single path from
the source's sink to the destination's sink. Here, we use the Extended Bellman-Ford
algorithm proposed in [52] to solve this problem.

First, we extend the concept of an ST list from an edge to a path. Let st(k, u) be
the union of the ST lists for all paths from vertex s to vertex u that have at most k edges
on them. Clearly, st(0, u) = ∅ for u ≠ s, and we assume st(0, s) = [0, ∞]. Also,
st(1, u) = ST(s, u) for u ≠ s and st(1, s) = st(0, s). For k > 1 (actually also for k = 1),
we obtain the following recurrence

$$st(k, u) = st(k-1, u) \cup \Big\{ \bigcup_{v : (v, u) \text{ is an edge}} \{ st(k-1, v) \cap ST(v, u) \} \Big\} \qquad \text{(5–1)}$$

where ∪ and ∩ are the list union and intersection operations. For an n-vertex graph, st(n −
1, d) gives the start times of all feasible paths from s to d. The Bellman-Ford algorithm
[51] may be extended to compute st(n − 1, d).

It is easy to see that the computation of the st(∗, ∗)s may be done in place (i.e.,
st(k, u) overwriting st(k − 1, u)) and that the computation may be terminated when
st(k − 1, u) = st(k, u) for all u. With these observations, Figure 5-4 gives the details of the
Extended Bellman-Ford algorithm.

Each iteration of the inner for loop takes O(L) time, where L is the length of the longest
st list. Since this for loop is iterated a total of O(N ∗ E) times, the complexity of the
extended Bellman-Ford algorithm is O(N ∗ E ∗ L), where N and E are the numbers of nodes
and links in the whole multiple resource graph.

When using the extended Bellman-Ford algorithm to solve the first slot problem,
we first find the earliest start time t for a feasible path using ExtendedBellmanFord.
Then, the actual path may be computed using BFS, where the feasibility of each link is
determined by fixing the job's tstart = t and tend = t + dur. BFS is also guaranteed to find the
feasible path with the fewest links.
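The recurrence and its in-place evaluation can be sketched in Python. The sketch assumes that every ST list is a sorted list of disjoint closed intervals `(lo, hi)` and that the graph is given as a dictionary from directed edges to their ST lists; the helper names `union`, `intersect` and `extended_bellman_ford` are illustrative and do not reproduce the code of [52].

```python
INF = float("inf")


def union(a, b):
    """Union of two interval lists, merging overlapping or touching intervals."""
    merged = []
    for lo, hi in sorted(a + b):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def intersect(a, b):
    """Intersection of two sorted lists of disjoint intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def extended_bellman_ford(n, edge_st, s):
    """Return st[u] for every vertex u, updated in place round by round.

    n:       number of vertices, labelled 0..n-1
    edge_st: dict mapping a directed edge (v, u) to its ST list
    s:       source vertex
    """
    adj = {}
    for (v, u), st_vu in edge_st.items():
        adj.setdefault(v, []).append((u, st_vu))
    st = [[] for _ in range(n)]
    st[s] = [(0, INF)]
    current = [s]                      # vertices whose st value changed
    for _ in range(1, n):
        if not current:
            break                      # nothing changed in the last round
        changed = []
        for v in current:
            for u, st_vu in adj.get(v, []):
                new = union(st[u], intersect(st[v], st_vu))
                if new != st[u]:
                    st[u] = new
                    if u not in changed:
                        changed.append(u)
        current = changed
    return st


edges = {(0, 1): [(0, 10)], (1, 2): [(5, 20)], (0, 2): [(15, 18)]}
print(extended_bellman_ford(3, edges, 0)[2])  # [(5, 10), (15, 18)]
```

In this toy graph the earliest feasible start time at vertex 2 is 5, obtained through vertex 1; the direct edge only becomes usable at time 15.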

5.4.3 WS-RN Scheduling Algorithm

In the WS-RN scenario, the WS-RC scheduling algorithm can be applied directly.
However, doing so does not exploit the full accessibility of the local resources and the
MPRCG graph, which would greatly simplify our scheduling process for the local
computational resources. The WS-RN algorithm proceeds as follows.

First, the network path computation proceeds separately from the local resource path
computation. When computing the feasibility of a given basic interval, the max-flow
algorithm is applied to the network resources in the same manner as in the WS-RC
scenario. In contrast, for local resources, a feasible reservation is computed by
checking, for each resource partition, whether any set of compatible resources can
provide sufficient resources to satisfy the user's job request. For each resource link in
a resource partition, we first group the link with all other compatible resources, then
compute the aggregate TR list for this compatibility group. Next, the corresponding ST
list is computed for every compatible set in the partition, and these ST lists are unioned
to produce one large ST list for the current partition. Finally, we intersect the network's
start time list with the ST lists of all local resource partitions to determine the
availability of a start time.

Extended Bellman-Ford(s, d)
{
    initialize st(∗) = st(0, ∗);
    // compute st(∗) = st(n − 1, ∗)
    put the source vertex into list1;
    for (int k = 1; k < n; k++)
    {
        // see if there are vertices whose
        // st value has changed
        if (list1 is empty) break; // no such vertex
        while (list1 is not empty) {
            delete a vertex v from list1;
            for (each edge (v, u)) {
                st(u) = st(u) ∪ {st(v) ∩ ST(v, u)};
                if (st(u) has changed and u is not on list2)
                    add u to list2;
            }
        }
        list1 = list2;
        make list2 empty;
    }
}

Figure 5-4. Extended Bellman-Ford algorithm


In this case, we do not need to scale our multiple resource graph, and we do not
need to run the max-flow algorithm on the local computational resources. Instead, we
check each local resource stage against the corresponding requirement by visiting each
resource link O(1) times. Since list union and intersection can also be completed in
linear time, the algorithm run time is reduced to $O(|SS_{RW}| \cdot (N_s + N_N^3 + N_d))$.
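The local-resource check can be sketched as two small list operations. The sketch makes several assumptions that are not stated in this section: a TR list is a sorted list of `(time, resource)` tuples whose last entry holds indefinitely, the aggregate TR list of a compatibility group is the pointwise sum of its members, and an ST list contains the start times t for which the required amount is available throughout [t, t + dur]. The function names are hypothetical.

```python
from bisect import bisect_right

INF = float("inf")


def value_at(tr, t):
    """Resource given by TR list tr at time t (0 before its first tuple)."""
    i = bisect_right([x for x, _ in tr], t) - 1
    return tr[i][1] if i >= 0 else 0


def aggregate_tr(tr_lists):
    """Pointwise sum of the TR lists of one compatibility group."""
    times = sorted({t for tr in tr_lists for t, _ in tr})
    return [(t, sum(value_at(tr, t) for tr in tr_lists)) for t in times]


def st_list(tr, req, dur):
    """Intervals of start times for which req is available for dur."""
    out, run_start = [], None
    for t, r in tr:
        if r >= req:
            if run_start is None:
                run_start = t
        elif run_start is not None:
            if t - run_start >= dur:
                out.append((run_start, t - dur))
            run_start = None
    if run_start is not None:
        out.append((run_start, INF))   # last tuple holds indefinitely
    return out


group = aggregate_tr([[(0, 4), (10, 1)], [(0, 2), (5, 6)]])
print(group)                  # [(0, 6), (5, 10), (10, 7)]
print(st_list(group, 8, 3))   # [(5, 7)]
```

The ST lists produced for the compatible sets of a partition would then be unioned, and the result intersected with the network's start time list, as described above.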

5.4.4 WN-RN Scheduling Algorithm

The WN-RN problem is similar to the previous WS-RN problem. In particular,
a network path can be computed via the Extended Bellman-Ford algorithm to yield the
first start time. This computation is followed by a breadth-first search to identify the path.
For local computational resources, we can apply the same approach as in WS-RN.
However, WN-RN does not need to group resources according to compatibility: since
the requested job cannot be split, only one resource unit in each resource partition is
required. In that case, the algorithm's complexity is bounded by $O(N_N \cdot E_N \cdot L + N_s + N_d)$.

5.5 Evaluation

5.5.1 Evaluation Environment

We tested our work on a USNET simulator at Oak Ridge National Laboratory

(ORNL). Our algorithms were integrated with the middleware that is currently used

at ORNL and University of Memphis, in order to co-schedule network, processor and

storage resources simultaneously. However, the current testbed at ORNL has limited

storage and compute resources. Thus, the main test objective was to ensure that the

network bandwidth scheduling capability correctly reserved bandwidth on the software

testbed. Based on extensive testing, our algorithms were able to effectively reserve

bandwidth, as well as generate reservation instructions for the virtual processor and

storage systems that we assumed to be present on the USNET end nodes.

We measured the performance of the 4 multiple resource scheduling algorithms
on randomly generated networks. For our experiments, the random networks
had 100, 200, 300, 400 or 500 nodes, and the out-degree of each node was randomly


selected to be between 3 and 7. To ensure network connectivity, each random network
has bidirectional links between nodes i and i + 1 for every 1 ≤ i < n, where n is the
number of nodes. The bandwidth of each link in a randomly generated network was
randomly selected from the set {50 Mbps (OC1), 155 Mbps (OC3), 620 Mbps (OC12),
1000 Mbps (1G Ethernet)}.

After the network is generated, 10% of its nodes are randomly selected as edge
routers. Computational resources are generated and attached to these edge routers.
For each local site, we assigned 3 types of computational resources: CPU, memory
and disk. Each type of resource has its own resource stage, and all stages contain the
same number of resource units. The number of resource units within a resource stage is
randomly selected between 15 and 20. Each resource link has a randomly assigned initial capacity
and class-ID. The number of different compatible sets in one stage is randomly chosen
from 2 to 5. If resources are fully accessible, corresponding dummy nodes and links
are added to connect the resource stages. If resources are partly accessible, the
accessibility among resources in different stages is randomly determined. However,
basic connectivity is guaranteed: each CPU link can access at least one
memory link and each memory link can access at least one hard disk link,
and vice versa.

Each of our experiments started with a clean resource graph and continued for 100
units of simulated time. So, for example, with a request density of 5 requests/second,
a single experiment would schedule a sequence of approximately 500 requests. We
measured the acceptance ratio and the computation time taken to process all the requests in the
sequence.

5.5.2 Results and Observations

Figures 5-5A and 5-6A provide the average acceptance ratio and algorithm run
time as a function of request density. These results were obtained using a 100-node random
network with 15 randomly selected local resource sites. Figures 5-5B and 5-6B provide
the average acceptance ratio and algorithm run time as a function of network size. These
results were obtained under a request density of 3 requests per second.

Figure 5-5. Our co-scheduling algorithms' performance on acceptance ratio. A) Acceptance ratio vs. request density. B) Acceptance ratio vs. network size.

Figure 5-6. Our co-scheduling algorithms' performance on convergence speed. A) Compute time vs. request density. B) Compute time vs. network size. [Plots: algorithm run time (ms) against request density (req/sec, 1 to 10) and against network size (number of nodes, 100 to 500); series WN RF, WN RP, WS RF, WS RP.]

Our experimental results show the following:

1. Increasing the request density degrades every algorithm's performance. As more requests come into the system within the same time interval, the network becomes congested, so more requests are rejected because not enough resources are available. Meanwhile, as more requests run in the system simultaneously, the lengths of the global time list and the ST lists increase, which leads to longer algorithm run times.

2. As the network size increases, the system gains capacity to accommodate more jobs within a given time period, so the acceptance ratio increases with the network size. However, the larger network makes the max-flow algorithm and the EBF algorithm take more time to converge, so the algorithm run time also increases.

3. When multiple paths are allowed and resources are fully shared, the scheduler can better utilize system resources and thus accept more requests. However, the resulting multi-path algorithms require more computation time to obtain a feasible result.

4. Generally speaking, all 4 algorithms scale well with both system size and request density. Even when the workload is high, a request takes less than a minute on average to schedule.


CHAPTER 6
SCHEDULING IN TIME-DOMAIN WAVELENGTH INTERLEAVED NETWORKS


Time-domain Wavelength Interleaved Networking (TWIN) is an optics-based

transport network architecture that aims to provide cost effective optical grooming

[43, 50, 65]. Traditional optical networks work in one of the following two modes: optical

circuit switching (OCS) or optical packet switching (OPS). In OCS networks, the finest

bandwidth granularity offered by an optical switch is at the wavelength level, i.e., a single
wavelength on a fiber can be used by only one end-to-end traffic flow and cannot be shared
with other traffic. This is inefficient when the traffic demand is much lower than the

wavelength capacity. At the other extreme, OPS networks permit sharing of optical links

by traffic with different sources and destinations. These networks, which are enabled by

optical-electronic-optical (OEO) conversion at each node in the network, tend to incur

a relatively high system cost and transmission delay, as OEO converters are generally

expensive and the conversion process is time-consuming compared to direct circuit

switching. Some techniques have been introduced to improve the utilization of optical

links by simulating OPS over OCS, such as optical burst switching (OBS) [60]. However,

OBS still needs high-speed optical switches and a contention algorithm at each switch.

Widjaja et al. proposed TWIN to overcome link utilization problems in OCS while
avoiding the high cost and delays resulting from OEO converters being deployed at all the
optical switches [65]. TWIN performs optical grooming only at its edge switches and the

network core is purely based on passive wavelength-selective switches (WSS) that route

the wavelengths from their ingress ports to the appropriate egress ports [55]. In a TWIN

network, the edge nodes can be either sources or destinations. A transmitter with a

multi-frequency laser is located at each source node. With this transmitter, source nodes

can change the wavelength of their optical signal in sub-nanoseconds [34]. Source

nodes collect data units from various clients and assemble data units for the same


destination into one burst. When sending the burst, the source changes its fast-tunable

laser to the wavelength uniquely assigned to that destination. The intermediate nodes

route optical bursts based purely on the wavelength of the burst. When the burst is

received at its destination, it is disassembled and forwarded to the corresponding clients.

As current optical switches cannot separate the bursts that share the same wavelength,

only traffic with the same destination may share a wavelength in the time-domain.

This constraint leads to tree-like routes in the network for every destination, where the

destination is the root and the sources are the leaves.

In this chapter, we discuss the wavelength assignment problem for TWIN networks

(TWIN-WA). We solve this problem using a two-phase process: Tree Construction

and Tree-Wavelength Assignment. Tree-Construction groups the traffic demands with

the same destination together and constructs the corresponding destination trees.

The Tree-Wavelength Assignment process assigns wavelengths to the destination trees

constructed in the previous step. The goal of the Tree-Wavelength assignment phase

is to minimize the total number of wavelengths needed to accommodate the traffic

demands.

We show that the minimum number of destination trees can be constructed using
a greedy approach in the Tree Construction phase. For the Tree-Wavelength Assignment
problem, we prove its NP-Completeness by reducing the Graph-Coloring problem
to it. We propose a greedy strategy that matches destination trees and wavelengths
one by one. We also propose two tree sorting methods and two wavelength sorting
methods to regulate the order of tree-wavelength matching. Combining the tree sorting
and wavelength sorting methods in the tree-wavelength assignment
scheme yields four heuristics: MC-BF, MC-MF, MP-BF and MP-MF. Extensive
simulations are conducted to evaluate the performance of these heuristics. The
results show that sorting the destination trees and wavelengths improves
the assignment results, especially under low traffic loads. However, performing sorting


adds some overhead to the sorting heuristics' running time, although overall computation
costs remain acceptable. In large topologies with heavy workloads, the heuristic without
any sorting becomes competitive, as it provides similar scheduling performance at
much lower computational cost.

The rest of this chapter is organized as follows. In Section 6.2, we discuss

related work. In Section 6.3, we explain the TWIN architecture in detail and define the

TWIN-WA problem formally. In Section 6.4, a greedy algorithm for Tree-Construction is
presented. In Section 6.5, we prove the NP-Completeness of the Tree-Wavelength assignment
problem and discuss four heuristics. Section 6.6 presents an experimental

evaluation of the four heuristics for Tree-Wavelength assignment. Section ?? gives the

conclusions.

6.2 Related Work

Optical Circuit Switching (OCS) with wavelength-division multiplexing (WDM)
[30] provides the most economical solution for high speed optical networks. However,
its inflexible routing scheme and coarse multiplexing granularity make it suitable only
for long-lived bulk data transfers. On the other hand, Optical Packet Switching
(OPS) [59] and Optical Burst Switching (OBS) [13] have been proposed to provide
sub-wavelength scheduling granularity and the capability of dynamic routing. However,
the ultra-high speed optical-electronic-optical switches required in OPS/OBS
networks are normally expensive and difficult to maintain. The high cost of deployment
and maintenance inhibits the use of OPS and OBS in modern networks.

Time-domain Wavelength Interleaved Networking (TWIN) has been proposed to

fill the gaps between OCS and OPS/OBS. The architecture of TWIN is introduced in

[65]. The goal of TWIN is to provide sub-wavelength granularity for traffic scheduling

without using expensive high speed optical switches in the networks. TWIN achieves

this by only allowing light paths with the same destination to share a wavelength. As

the traffic on the same wavelength will not be split again, the economical switches


used in OCS networks are able to route the bursts in TWIN networks. TWIN brings
new challenges to traditional optical scheduling approaches. [53] presents some
basic ideas on routing and burst scheduling in TWIN networks and proposes a
performance measurement framework. [50] investigates the optical burst scheduling
problem in TWIN networks. The authors show that achieving the maximum throughput with
zero propagation delay is equivalent to the optimal matching problem in bipartite graphs.
They also demonstrate that even when the propagation delay is non-negligible, a factor-2
approximate scheduling algorithm exists to maximize the throughput. Meanwhile, [66]
focuses on providing better QoS in TWIN networks. The authors introduce an Integer
Linear Programming formulation that minimizes the queueing delay of the optical bursts.
They also propose the Destination Slot Set (DSS) algorithm to solve the
problem approximately within reasonable time.

In this chapter, we focus on the wavelength assignment problem for TWIN networks.

Traditional wavelength assignment strategies take the available wavelength number as

the main constraint. However, as fractional wavelength is allowed, and the general traffic

flow is assumed in sub-wavelength level, TWIN wavelength assignment (TWIN-WA) is

relaxed from the integer capacity constraint in traditional networks. The main concern in

TWIN-WA is the conflict on topologies among multiple destination trees when they share

one wavelength. Moreover, instead of assigning wavelength to each light path, TWIN

assigns wavelengths to a destination tree. Traditional wavelength assignment problems
are normally equivalent to the Bin-Packing problem [1]. However, TWIN networks’

wavelength assignment problem is a variation of the Graph-Coloring problem [3], as

shown in Section 6.5.1.

[23, 33, 40] together provide a summary on the existing wavelength assignment

strategies. The most popular wavelength assignment strategies are First-Fit and Best-Fit,
where the wavelengths are matched with the request according to a random order or
to their remaining capacities. [33] proposed a deferred wavelength assignment strategy


for optical networks with wavelength converters. This strategy improves the
request acceptance rate by deferring the wavelength assignment from scheduling time

to actual job start time. [38] proposed the least-conversion assignment scheme that

attempts to reduce the wavelength conversion overhead in sparse wavelength converter

networks.

In TWIN networks, the simplest wavelength assignment strategy is to assign each

destination tree an individual wavelength. However, this strategy requires that the

number of available wavelengths be equal to the number of destination trees. [43]
investigates a scenario where the number of available wavelengths is less than the
number of destination nodes. A wavelength reuse scheme is proposed as an extension
to TWIN. It allows multiple destination trees to share a wavelength using Time-Domain
Multiplexing. To avoid collisions among traffic flows, the source nodes that belong
to different destination trees must work in a time-sharing manner, and a comprehensive
burst scheduling algorithm is needed. To provide fairness among trees that share
a wavelength, the data buffer of each source node is monitored. A destination tree
contends for a certain wavelength when the length of its source nodes' input
buffers grows beyond a certain threshold. As the input data rate is always assumed
to be less than the link capacity, no destination tree needs to keep occupying a
wavelength and sending data. Our algorithm approaches the wavelength reuse problem from
a different direction: multiple destination trees share one wavelength if their topologies
are compatible. Once the wavelength is assigned, the source nodes can transmit bursts at
full load without worrying about flow collisions, which greatly
simplifies the burst scheduling.

6.3 Network Model and Problem Definition

On the spectrum of optical networks, TWIN networks reside between OCS
and OPS networks. Compared to these traditional optical networks, TWIN has the
following new features:


1. Similar to traditional OCS networks, TWIN's data bursts travel along the light path using a pre-assigned wavelength. However, a wavelength can be shared by multiple traffic flows from different sources, provided that their total flow size does not exceed the wavelength capacity.

2. As current optical switches cannot separate bursts that share the same wavelength, traffic on one optical link has to be routed to the same destination if it uses the same wavelength. TWIN light paths with the same destination are therefore grouped together as a tree structure in which the destination is the root and the sources are the leaves. Wavelengths are assigned to each of these trees, rather than to a single light path.

Figure 6-1 shows a simple TWIN network. Two source nodes, S1 and S2 are

sending traffic to two destination nodes, D1 and D2. A 5-node communication network

connects the sources and the destinations. In the network, there are 4 different light

paths: (S1,D1), (S2,D1), (S1,D2) and (S2,D2). Before the wavelengths are assigned,

these light paths are grouped into 2 tree structures according to their destination,

denoted as T1 =< (S1,S2),D1 > and T2 =< (S1,S2),D2 >, respectively. D1 is the

root of T1 while D2 is the root of T2. The network contains two wavelengths: W1 and

W2. Each destination tree has to be assigned to a wavelength before the transmission

can start. In the simplest case, T1 is assigned wavelength W1 and T2 is assigned

wavelength W2. During the transmission, S1 and S2 interleave their traffic to D1 and

D2 by tuning the color of their laser to the corresponding wavelength. For each node

in the communication network, a routing table is maintained to indicate the outgoing

port for different wavelengths. When the traffic arrives at the internal switches, routing is

performed using only the rules in the routing table and the color of the incoming bursts.

This guarantees that optical bursts of a given wavelength will be routed to the intended

destination. For example, in Figure 6-1, node a must combine the traffic from node S1

and d on wavelength W1 and forward it to the link that connects to node b, according to

the routing table. Node b, after receiving the bursts on wavelength W1, will forward them

to node D1, which is their destination.


Figure 6-1. An example of TWIN network.

For traffic whose required bandwidth is a fraction of the wavelength capacity,
TWIN networks greatly facilitate scheduling by providing a more flexible and
finer-grained routing and wavelength assignment scheme. Most optical networks
still use static routing in the high speed mode, as changing routes on the fly
incurs very high overhead. Therefore, in this chapter, we also assume that the path
for each source/destination pair is pre-computed, and focus our research on the
wavelength assignment problem for TWIN networks (TWIN-WA). Given a TWIN network
G = <V, E>, a traffic demand is defined as r = (s, d, bw), where s, d ∈ V are the source
and destination nodes of the traffic flow, and bw ∈ (0, 1] is the fraction of the wavelength
capacity required. TWIN-WA takes a set of traffic demands R as input. The goal is to
accommodate all demands r ∈ R using a minimum number of wavelengths.

As described in Section 6.1, TWIN-WA is solved using a 2-step process. In
the Tree-Construction phase, we construct the destination trees, and in the
Tree-Wavelength assignment phase, we perform the wavelength assignment. During
the Tree-Construction phase, the traffic demands in R are grouped by their
destination. In each group, the corresponding light paths are merged to form
a destination tree. As fractional job assignment is allowed, a simple greedy algorithm
generates the destination tree set with minimum size. Tree-Wavelength assignment
algorithms assign each destination tree a wavelength. We show that finding an optimal


assignment that uses the fewest wavelengths is NP-Hard. Several heuristics

are then proposed for wavelength assignment.

6.4 Tree Construction

In a TWIN network G = <V, E>, a destination tree for node Di is denoted as

T(Di) = (<S>, Di), where <S> is the set of all source nodes in T(Di). Given the

set of traffic demands R, we need to first construct destination trees from the light paths

before we can actually assign the wavelength. This process is called Tree-Construction.

The goal of this process is to minimize the total number of the destination trees in the

result set T (D).

Figure 6-2. An example of TWIN Tree Construction.

In this chapter, we allow the traffic request (s, d , bw) to be partitioned into multiple

sub-requests that can be assigned to different destination trees. This is reasonable

as most modern optical switches are capable of transmitting/receiving data bursts on

different wavelengths simultaneously. As long as the destination nodes are capable

of packet ordering and re-assembly, fulfilling one request with multiple data flows is

totally feasible. On the other hand, if we simply merge all the light paths with the same

destination, the resulting destination tree may not be admissible to the network, as the

total flow size for one destination may exceed the wavelength capacity. Figure 6-2 shows

an example of tree construction. Three source nodes, S1, S2 and S3, are to send data


TreeConstruction(G, R)
{
    results = ∅;
    Group the traffic demands according to their destinations.
    for (each destination group DG(Di)) {
        Initialize a new destination tree Tj(Di).
        Tj(Di).capacity = wavelengthCapacity.
        curTree = Tj(Di).
        for (each traffic demand r in DG(Di)) {
            Merge the light path from r.s to r.d into curTree.
            if (r.bw < curTree.capacity)
                curTree.capacity -= r.bw.
            else {
                if (r.bw > curTree.capacity)
                    Insert a new demand (r.s, r.d, r.bw − curTree.capacity) into DG(Di).
                Add curTree into results.
                Initialize a new destination tree Tj+1(Di) with full wavelength capacity.
                curTree = Tj+1(Di).
            }
        }
        if (curTree contains at least one light path)
            Add curTree into results.
    }
    return results;
}

Figure 6-3. The greedy algorithm for Tree-Construction

to node D simultaneously. The data rate at each source node is 0.6. If we merge all

3 light paths into one destination tree, the total traffic on link (a,D) would exceed the

wavelength capacity. So the demands have to be split into two separate destination

trees, i.e. T0(D) = (<S1, S2>, D) and T1(D) = (<S2, S3>, D). Moreover, when

composing the destination trees, we should try to use up all the wavelength capacities,

as the unutilized capacity cannot be shared by other destination trees. Based on the

above observations, we propose the following greedy algorithm to compute the minimum

destination tree set, as shown in Figure 6-3.

Our Tree-Construction algorithm first groups the traffic demands according to their

destination. This can be done by simply scanning the demand set once. For each group,


the corresponding destination trees are constructed greedily. If adding the current light

path to the current destination tree would exceed its wavelength capacity, we split the

current request into two sub-requests. The first part joins the current tree and uses

all its remaining capacity. The second part starts a new destination tree into which we

attempt to merge the remaining paths in the current group. The greedy algorithm is

optimal because it minimizes the number of resulting trees for each destination node:

every tree except possibly the last one in a group uses the full wavelength capacity, and

no admissible tree can carry more than that. The time complexity of this tree construction

algorithm is O(|V| × |R|), where |R| is the size of the traffic demand set and |V| is the

number of nodes in the network, which bounds the length of any light path.
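The greedy packing described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the experiments: the names `tree_construction`, `capacity` and `eps` are assumptions, the wavelength capacity is normalized to 1, and the merging of light paths is omitted, so each tree is represented just by the bandwidth share of each of its sources.

```python
from collections import defaultdict


def tree_construction(demands, capacity=1.0, eps=1e-9):
    """Greedily pack demands (s, d, bw) into destination trees.

    Returns a list of (destination, {source: bandwidth share}) pairs,
    one pair per destination tree. Names and layout are illustrative.
    """
    groups = defaultdict(list)
    for s, d, bw in demands:              # one scan groups demands by destination
        groups[d].append((s, bw))

    results = []
    for d, group in groups.items():
        tree, remaining = {}, capacity
        for s, bw in group:
            while bw > eps:
                share = min(bw, remaining)        # part of the demand that still fits
                tree[s] = tree.get(s, 0.0) + share
                bw -= share
                remaining -= share
                if remaining <= eps:              # tree is full: close it, open a new one
                    results.append((d, tree))
                    tree, remaining = {}, capacity
        if tree:                                  # keep the last, partially filled tree
            results.append((d, tree))
    return results


# The scenario of Figure 6-2: three sources each send 0.6 to node D.
trees = tree_construction([("S1", "D", 0.6), ("S2", "D", 0.6), ("S3", "D", 0.6)])
```

For the three 0.6 demands of Figure 6-2, the sketch yields two trees whose source sets are {S1, S2} and {S2, S3}, with the demand of S2 split between them.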

6.5 Tree-Wavelength Assignment

When a data burst is ready to be sent out, the source node needs to know which

wavelength it will use to transmit the burst for its intended destination. In a TWIN

network, this is decided by the tree-wavelength assignment process. In traditional

optical networks, wavelengths are assigned to specific light paths. However, in TWIN

networks each destination tree is assigned a wavelength. In this section, we discuss

different strategies of assigning wavelengths to destination trees. The destination trees

are constructed in the previous tree construction phase. Our goal is to minimize the

number of wavelengths that we use to accommodate all the trees. In Section 6.5.1, we

introduce the generic form of the tree-wavelength assignment problem and prove that

computing the optimal tree-wavelength assignment is NP-Hard. In Section 6.5.2, four

greedy heuristics are proposed to approximately solve the problem in reasonable time.

6.5.1 Generic Form of the Tree-Wavelength Assignment Problem

We note that in TWIN networks, two destination trees that share some links

cannot be assigned to the same wavelength, as the TWIN switches will not be able

to distinguish their traffic. So, trees that have common links are considered in conflict

for wavelength assignment. On the other hand, trees that do not share any link


can be assigned the same wavelength without interference. Such trees are said to be

compatible.
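As a small illustration of the conflict test, a destination tree can be stored as a map from each source to its node path, and two trees are in conflict exactly when their link sets intersect. The representation and the names `links_of` and `in_conflict` are assumptions made for this sketch, and the node names in the example are hypothetical.

```python
def links_of(tree_paths):
    """Collect the directed links used by a tree given as {source: [nodes on path]}."""
    links = set()
    for path in tree_paths.values():
        links.update(zip(path, path[1:]))   # consecutive node pairs are links
    return links


def in_conflict(tree_a, tree_b):
    """Two destination trees conflict when they share at least one link."""
    return not links_of(tree_a).isdisjoint(links_of(tree_b))


# Hypothetical trees: t1 and t2 both traverse link (a, b); t3 uses other links.
t1 = {"S1": ["S1", "a", "b", "D1"]}
t2 = {"S2": ["S2", "a", "b", "D2"]}
t3 = {"S3": ["S3", "c", "D3"]}
```

Here `t1` and `t2` are in conflict, while `t3` is compatible with both and could share a wavelength with either of them.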

Another observation for tree-wavelength assignment is that a destination tree may

be assigned more than one wavelength. That is, some source-destination paths may

use one wavelength while the other paths use a different wavelength. In particular, we

can divide a destination tree into a compatible part and a conflict part with respect to a

current wavelength that has already been assigned to some other trees, and assign

the compatible part to the current wavelength. Note that the split always starts from

the source nodes (leaf nodes), and ends at the destination (root). Since the destination

node is able to receive data flows on multiple wavelengths simultaneously, splitting

a destination tree as described does not affect the correctness of the data transmission.

However, it provides more flexibility when we resolve the conflicts among destination

trees.

Based on the above observations, the generic form of the tree-wavelength assignment

problem is as follows: Given a set of destination trees DT = (t0, · · ·, ti) on

a TWIN network G = <V, E>, minimize the total number of wavelengths that are

needed to accommodate all the trees in DT, without violating the following constraints:

1). Destination trees that share a wavelength must be compatible with each other.

2). Each destination tree obtained from the tree construction phase is either assigned a

single wavelength, or split into several parts with each part assigned to a different

wavelength.

Theorem 6.1. The above tree-wavelength assignment problem is NP-Hard.

Proof. We prove this by reducing the Graph-Coloring problem to the tree assignment

problem. Graph-Coloring is a well-known NP-Complete problem. Given a graph G = <V, E>,

we want to color all the vertices with a minimum number of colors such that no

two adjacent vertices have the same color.


We first construct a corresponding TWIN-WA instance based on a Graph-Coloring

instance G = <V, E>. For each node vi in G, we initialize a corresponding tree ti, which

only contains its root node ri . For each link (vi , vj ) in G , we insert a new edge (nij 1, nij 2)

to both trees ti and tj . We append this new edge to the last inserted node in the tree,

so the tree has a chain-like structure. Figure 6-4 gives a simple example. Node v1 and

v2 are adjacent in G . So we have edge (n12 1, n12 2) appended to nodes r1 and r2 for

trees t1 and t2 respectively. For the same reason, edge (n13 1, n13 2) is appended to node

n12 2 in t1 and node r3 in t3. After we finish the above steps for all links in E , we have a

destination tree set DT = (t1, t2, · · ·, tn). We construct a TWIN network Gt from DT by

merging the topology of all the trees in DT . In the example, we obtain a 7-node graph

Gt by merging trees t1, t2 and t3 in DT.

Figure 6-4. Reduction from Graph-Coloring problem to tree-wavelength assignment problem.

From the construction of DT and Gt , we can see that if two vertices vi and vj are

adjacent in G, trees ti and tj must have a common link (nij 1, nij 2), which means ti and

tj are in conflict in the tree-wavelength assignment process for network Gt . On the other


hand, if two trees ti and tj are in conflict for wavelength assignment, they must share

the edge from nij 1 to nij 2 and that edge is the only link that is common to both trees.

From the construction, there must be a link between vertices vi and vj in G . Meanwhile,

if the trees are all in the shape of a chain, as in our construction, splitting a tree brings

no benefit to the wavelength assignment process. Therefore, in the optimal assignment

for DT on Gt , every tree in DT is assigned to a single wavelength.

Now, let k be the minimum number of colors we need to color G and m be the

minimum number of wavelengths we need to accommodate all the trees in DT . Based

on the above observation, we claim that k = m. First, we show that k wavelengths are

sufficient, if we can color G using at most k colors. Our wavelength assignment scheme

is to assign tree ti the wavelength Wj , j ≤ k if the corresponding vertex vi in G is colored

using color Cj . Since ti will not be split, and all the vertices in G that are colored with

Cj cannot be adjacent to each other, we can guarantee that ti will be compatible with any

other tree that is assigned the wavelength Wj. Now, we show that G also can be

colored without conflict using at most m different colors, whenever m wavelengths are

sufficient for the constructed tree set DT . For each node vi in G , if its corresponding tree

ti is assigned to wavelength Wj , it will be colored with Cj . Since there is no conflict in Wj ,

nodes with color Cj will not be adjacent to each other in G . Therefore the color of vi is

valid.

From the above statements, Graph-Coloring can be reduced to the tree-wavelength

assignment problem in polynomial time. So tree-wavelength assignment is an NP-Hard

problem.
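The construction used in the proof can be checked mechanically on small graphs. The sketch below is illustrative only: `build_chain_trees` and `conflicting_pairs` are hypothetical names, nodes are encoded as tuples, and links are stored as unordered pairs. It builds one chain-shaped tree per vertex and confirms that two trees share a link exactly when their vertices are adjacent.

```python
def build_chain_trees(vertices, edges):
    """Map a Graph-Coloring instance to chain-shaped trees, one per vertex.

    Each tree is a set of undirected links (frozensets of node names).
    """
    trees = {v: set() for v in vertices}
    tail = {v: ("r", v) for v in vertices}       # last node inserted in each chain
    for u, v in edges:
        n1, n2 = ("n", u, v, 1), ("n", u, v, 2)  # the new edge shared by t_u and t_v
        for w in (u, v):
            trees[w].add(frozenset((tail[w], n1)))  # attach the edge to the chain end
            trees[w].add(frozenset((n1, n2)))
            tail[w] = n2
    return trees


def conflicting_pairs(trees):
    """Return the pairs of trees that have at least one link in common."""
    names = sorted(trees)
    return {(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if trees[a] & trees[b]}


# The two adjacencies mentioned in the text: v1-v2 and v1-v3.
trees = build_chain_trees([1, 2, 3], [(1, 2), (1, 3)])
nodes = set().union(*(link for t in trees.values() for link in t))
```

For this input the conflicting pairs are exactly (1, 2) and (1, 3), and the merged network has 7 nodes, matching the example in the proof.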

6.5.2 Greedy Heuristics

In this section, we propose a set of greedy heuristics to compute an approximately

optimal assignment in reasonable time. These heuristics have a similar main process

when computing the wavelength assignment. However, they differ from each other in the

order in which the input destination trees and the existing wavelengths are considered.


The main idea of our greedy heuristics is as follows. The destination trees in

DT are checked one by one according to the tree sorting order. A destination tree

is matched against already assigned wavelengths according to the wavelength sorting

order. For tree ti and wavelength Wj, if part of ti can fit into wavelength Wj, ti

is divided and a part of it is assigned to Wj. The rest of ti is then matched against

the wavelengths Wj+1 and so on. If all the in-use wavelengths together cannot

accommodate ti , a new wavelength is opened for the unassigned part of ti .
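The matching loop common to all the heuristics can be sketched as follows. This is a simplified illustration under stated assumptions rather than the evaluated implementation: trees are assumed to arrive already in tree-sorting order, wavelengths are scanned in the order they were opened (which corresponds to using no wavelength sorting), a tree is stored as a map from source to node path, and the names `path_links` and `assign_wavelengths` are hypothetical.

```python
def path_links(path):
    """Return the set of directed links along a node path."""
    return set(zip(path, path[1:]))


def assign_wavelengths(trees):
    """Match each tree against the in-use wavelengths, splitting it when needed.

    trees: list of {source: [nodes from source to destination]}.
    Returns (assignment, wavelengths), where assignment[i][source] is the
    index of the wavelength that carries that source's light path.
    """
    wavelengths = []                  # wavelengths[j]: links already using W_j
    assignment = []
    for tree in trees:
        remaining = dict(tree)
        placed = {}
        for j, used in enumerate(wavelengths):
            # Sources whose whole path avoids every link already on W_j.
            fit = [s for s, p in remaining.items()
                   if path_links(p).isdisjoint(used)]
            for s in fit:
                placed[s] = j
                used |= path_links(remaining.pop(s))
            if not remaining:
                break
        if remaining:                 # open a new wavelength for the unassigned part
            j = len(wavelengths)
            wavelengths.append(set())
            for s, p in remaining.items():
                placed[s] = j
                wavelengths[j] |= path_links(p)
        assignment.append(placed)
    return assignment, wavelengths


# Hypothetical input: the second tree shares link (a, b) with the first.
t0 = {"S1": ["S1", "a", "b", "D1"]}
t1 = {"S2": ["S2", "a", "b", "D2"], "S3": ["S3", "c", "D2"]}
assignment, wavelengths = assign_wavelengths([t0, t1])
```

In this example the path from S3 fits on the first wavelength, while the path from S2 conflicts on link (a, b) and is moved to a newly opened second wavelength.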

We propose 2 different approaches to sort the destination trees.

1. Most Conflicts Tree First (MC): The trees are sorted in decreasing order of the number of other trees in DT with which they have a conflict, denoted as CNi. This sorting criterion is based on the idea that if we assign trees with more conflicts first, we may reach the minimum number of required wavelengths very quickly. Then, for those trees with fewer conflicts, there is a higher chance that they will fit into the existing wavelengths.

2. Most Processed Tree First (MP): Let Pi be the number of trees in conflict with ti that have already been assigned wavelengths. Instead of choosing trees with larger CNi values, we pick trees that have higher Pi values. Each time after a tree is assigned, the Pi values of all the unassigned trees are updated and the one with the largest Pi value is chosen as the next tree to be assigned wavelengths. When multiple trees have the same Pi, ties are broken by their CNi values. The thought behind this ordering is similar to the MC ordering. Moreover, the MP order is intended to improve on the MC order by keeping the priorities synchronized with the result of the existing assignments.
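The two tree orders can be illustrated with a short Python sketch. Several details here are assumptions that the text does not fix: each tree is given as a set of link labels, ties on Pi are broken in favor of the larger CNi, and remaining ties go to the lower tree index. The names `mc_order` and `mp_order` and the link labels are hypothetical.

```python
def mc_order(tree_links):
    """Return (order, cn): tree indices by decreasing conflict count CN_i."""
    n = len(tree_links)
    cn = [sum(1 for j in range(n) if j != i and tree_links[i] & tree_links[j])
          for i in range(n)]
    return sorted(range(n), key=lambda i: -cn[i]), cn


def mp_order(tree_links):
    """Return tree indices in Most Processed Tree First order."""
    _, cn = mc_order(tree_links)
    p = [0] * len(tree_links)          # P_i: conflicting trees already assigned
    unassigned = set(range(len(tree_links)))
    order = []
    while unassigned:
        # Largest P_i first; assumed tie-breaks: larger CN_i, then lower index.
        nxt = max(unassigned, key=lambda i: (p[i], cn[i], -i))
        unassigned.remove(nxt)
        order.append(nxt)
        for i in unassigned:
            if tree_links[i] & tree_links[nxt]:
                p[i] += 1
    return order


# Hypothetical link sets: tree 0 conflicts with trees 1-3, tree 4 with trees 5-6.
tree_links = [{"x1", "x2", "x3"}, {"x1"}, {"x2"}, {"x3"},
              {"y1", "y2"}, {"y1"}, {"y2"}]
```

On this input the MC order places tree 4 second because of its conflict count, whereas the MP order first finishes the trees that conflict with the already assigned tree 0.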

We also propose two sorting orders for wavelengths.

1. Best-Fit Wavelength First (BF): The in-use wavelengths are sorted in decreasing order of the number of links in the network that do not use this wavelength. This order is updated every time a tree-wavelength assignment is completed.

2. Most-Fit Wavelength First (MF): Every time before a destination tree is assigned, the existing wavelengths are sorted by the size of the subtree they can accommodate for the current tree. We measure the subtree size by counting the number of source nodes that can be contained in the current wavelength. If one wavelength can hold a larger number of the source nodes and their corresponding light paths, it will have higher priority during the matching. The wavelength re-ordering is triggered at runtime whenever the current tree changes, either because the current tree is split or because a new tree is taken out of DT for assignment. We also note that there is no need to completely sort all the wavelengths during the updates. The only wavelength we are interested in is the one that can accommodate the largest subtree. Therefore we only need to find the Most-Fit wavelength, rather than sort all wavelengths.

Combining the different tree sorting and wavelength sorting methods together, we

obtain 4 different heuristics for tree-wavelength assignment: MC-BF, MC-MF, MP-BF

and MP-MF. The complexity of each of our heuristics is as follows:

1. MC-BF: Let |V| be the number of vertices in the TWIN network and |T| be the number of destination trees in DT. Determining whether a pair of trees is in conflict takes O(|V|) time, as each tree may contain at most |V| − 1 edges. Since every pair of trees in DT is checked, counting the conflicts for the whole DT set takes O(|V| × |T|^2) time. The sorting takes another O(|T| log |T|) time. Therefore computing the MC order takes O(|V| × |T|^2) time. During the assignment process, the maximum number of wavelengths needed is |T|, so the number of matches for each destination tree is O(|T|). For each tree-wavelength pair, it takes O(|V|) time to match them, so the processing time for a single destination tree is bounded by O(|V| × |T|). To maintain the BF order, we need to update the wavelength capacities and sort them, which takes another O(|T| log |T|) time. So the overall processing time for one destination tree is O(|V| × |T| + |T| log |T|). The total complexity of the MC-BF algorithm is O(|V| × |T|^2 + (|V| × |T| + |T| log |T|) × |T|) = O(|V| × |T|^2).

2. MC-MF: To find the Most-Fit wavelength for the current tree, we need to match the tree against all the wavelengths. This takes O(|V| × |T|) time. A destination tree will split at most |V| − 1 times during the assignment, so O(|V|^2 × |T|) time is taken to process one destination tree. The overall complexity of MC-MF is O(|V| × |T|^2 + |V|^2 × |T|^2) = O(|V|^2 × |T|^2), where O(|V| × |T|^2) is the MC sorting time and O(|V|^2 × |T|^2) is the tree-wavelength matching time.

3. MP-BF: If the tree order is updated dynamically, an extra O(|T|) operations are added to the processing of each destination tree. However, these extra operations do not change the asymptotic complexity of the tree-wavelength matching process. The overall MP-BF complexity is the same as for MC-BF: O(|V| × |T|^2).

4. MP-MF: Similar to MP-BF, the extra operations required to maintain the MP order are dominated by the other tree-wavelength assignment operations. Thus, these extra operations do not affect the asymptotic complexity of MP-MF, which is still O(|V|^2 × |T|^2).


6.6 Evaluation

6.6.1 Experimental Framework

In this section, we measure the performance of the wavelength assignment

heuristics described in Section 6.5 and evaluate how different sorting schemes affect

the performances in various scenarios. Besides comparing the quality of

their assignments, we also measure the execution time of each heuristic and study

how execution time varies with network size and workloads. We implemented a no-

sort version of the greedy heuristics that does not do the sorting steps for either the

destination trees or the wavelengths. By comparing the no-sort heuristic with the ones

we proposed in Section 6.5.2, we can investigate the impact of the sorting steps. For

every test case, we also provide a lower-bound for the optimal solution (LB). The

lower-bound is computed by counting the occurrences of each network link in all

destination trees. The maximum count among all the links is the lower bound for the

minimum number of wavelengths we need. With this bound, we can estimate how well

our heuristics can do in the experiments.
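The lower bound LB amounts to a single counting pass over the trees. The sketch below assumes each destination tree is given as a set of links; the name `wavelength_lower_bound` and the sample links are hypothetical.

```python
from collections import Counter


def wavelength_lower_bound(tree_links):
    """Largest number of destination trees that traverse any single link."""
    counts = Counter(link for links in tree_links for link in links)
    return max(counts.values(), default=0)


# Hypothetical trees: link (b, D) is used by two trees, every other link by one.
sample_trees = [
    {("a", "b"), ("b", "D")},
    {("c", "b"), ("b", "D")},
    {("e", "f")},
]
lb = wavelength_lower_bound(sample_trees)
```

Since trees that share a link cannot share a wavelength, the most heavily used link forces at least that many wavelengths, which is why the count is a valid lower bound.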

To simulate an optical network, we use a 25-node mesh-torus topology, a real world

19-node MCI network (Figure 6-5) and several randomly generated topologies. For

randomly generated topologies, we set the out-degree of each node to be a random

integer between 5 and 7. To ensure network connectivity, the random network has

bidirectional links between nodes i and i + 1 for every 1 ≤ i < n, where n is the number

of nodes. Since the test results from the MCI and Mesh topologies are very similar to each

other, in this chapter we only present the results from MCI and Random topologies.
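One possible way to generate such random topologies is sketched below. It is not the generator used for the experiments: the name `random_topology`, the seed handling, and the decision to count the chain links toward each node's out-degree are all assumptions made for illustration.

```python
import random


def random_topology(n, seed=0, min_deg=5, max_deg=7):
    """Return {node: set of out-neighbors} for nodes 1..n (assumes n > max_deg + 1)."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(1, n + 1)}
    for i in range(1, n):                 # bidirectional chain keeps the graph connected
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    for i in adj:
        target = rng.randint(min_deg, max_deg)   # desired out-degree of node i
        candidates = [j for j in adj if j != i and j not in adj[i]]
        rng.shuffle(candidates)
        for j in candidates[:target - len(adj[i])]:
            adj[i].add(j)                 # directed link i -> j
    return adj


adj = random_topology(100, seed=3)
```

Each node ends up with an out-degree between 5 and 7, and the chain links between nodes i and i + 1 are always present in both directions.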

The traffic demands are also synthetically generated. Each request is described by

a 3-tuple (s, d, BW). We first identify the sets of source nodes and destination nodes

from all the graph vertices V. In the experiments, we mark 40% of the vertices in V as

source nodes and another 20% of the nodes as destination nodes. The remaining 40% of the

nodes serve as communication nodes in the network. The process of marking nodes


Figure 6-5. Network Topologies. A) MCI. B) Mesh.

is totally random. The source s and destination d are then selected using a uniform

random number generator from the respective sets so that the workload is distributed

uniformly among different node pairs. The required flow size BW is generated using

a truncated Normal Distribution N(0.1, 2.5 × 10^-3). Using this distribution, about 96%

of the flow sizes are in the interval (0, 0.2). A generated flow size is discarded if its

value is outside the range (0, 1). As the expected size of a traffic demand is only 0.1, most

admissible destination trees generated will comprise multiple light paths.
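The workload generation can be sketched with Python's standard random module. The sketch reads the second parameter of the distribution as a variance, which gives a standard deviation of 0.05 and is consistent with roughly 96% of the samples lying in (0, 0.2); this reading, the function names, and the seed handling are assumptions.

```python
import random


def sample_flow_size(rng, mean=0.1, variance=2.5e-3):
    """Draw a flow size from the Normal distribution, discarding values outside (0, 1)."""
    sigma = variance ** 0.5
    while True:
        bw = rng.gauss(mean, sigma)
        if 0.0 < bw < 1.0:
            return bw


def generate_demands(sources, destinations, count, seed=0):
    """Generate `count` demands (s, d, BW) with uniformly chosen end points."""
    rng = random.Random(seed)
    return [(rng.choice(sources), rng.choice(destinations), sample_flow_size(rng))
            for _ in range(count)]


demands = generate_demands(["s1", "s2", "s3"], ["d1", "d2"], 5000, seed=1)
```

Rejected samples are simply redrawn, so every generated demand has a flow size strictly between 0 and 1.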

For each test case, the maximum number of traffic demands is bounded by the

number of source-destination pairs. This number is denoted as MaxLoad. For example,

in a 100-node random network, if we mark 40% of the nodes as source nodes and 20%

of the nodes as destination nodes, we will have at most 800 different source-destination pairs.

That would be the maximum number of light paths that we need to handle in the test

case. During the experiments, our workloads are varied from 20% of MaxLoad to 100%

of MaxLoad.
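The MaxLoad arithmetic can be written out as a short helper. The function name and the rounding of the node counts are assumptions; for 100 nodes no rounding is needed.

```python
def max_load(num_nodes, source_fraction=0.4, destination_fraction=0.2):
    """Number of distinct source-destination pairs, i.e. MaxLoad."""
    sources = round(num_nodes * source_fraction)
    destinations = round(num_nodes * destination_fraction)
    return sources * destinations


# Workload levels used in the experiments, as percentages of MaxLoad.
workloads = [round(max_load(100) * pct / 100) for pct in (20, 40, 60, 80, 100)]
```

For the 100-node example, 40 sources and 20 destinations give a MaxLoad of 800, so the five workload levels correspond to 160, 320, 480, 640 and 800 demands.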

6.6.2 Evaluation Results

Figures 6-6 and 6-7 present the evaluation results for our wavelength assignment

heuristics under various traffic loads in MCI and random networks. In the experiments,

we produce traffic loads that are 20%, 40%, 60%, 80% and 100% of the MaxLoad. From

the experimental results, we make the following observations:


Figure 6-6. The performances of wavelength assignment heuristics under different numbers of requests in MCI network.

Figure 6-7. The performances of wavelength assignment heuristics under different numbers of requests in 100-node random topologies.

1. All 4 heuristics that we propose in Section 6.5.2 generate better assignments than the no-sort heuristic in all test scenarios. These heuristics outperform the no-sort heuristic by wider margins under light traffic loads (less than 60%) than under heavy traffic loads. This shows that sorting the trees and wavelengths helps the wavelength assignment more when the network is less occupied. When the network links are saturated, rearranging the matching order cannot improve the scheduling much.


2. Among the four greedy heuristics, the MP-MF heuristic gives the best performance in all test cases. Regarding the sorting methods for the destination trees, the MP heuristics provide better assignments than the MC heuristics. This means that adjusting the tree order dynamically provides more reasonable matching orders during the tree-wavelength assignment. On the other hand, the MF heuristics outperform the BF heuristics when the workload is high (more than 80%). However, when the workload is less than 40%, the performances of the MF heuristics and the BF heuristics are comparable. This shows that when the traffic load is high, a more careful choice of wavelength, like MF, is necessary to provide a better assignment. When there are plenty of resources available, a relatively crude sorting, like BF, is sufficient.

3. When the traffic load is light, the sorted heuristics provide results close to the lower bound, i.e. a very good approximation of the optimal solution. When the traffic load is high, the assignments from the heuristics are relatively far away from the lower bound. However, this does not necessarily mean that the heuristics cannot approximate the optimal solutions under high workloads, as the lower bounds may not tightly bound the optimal solutions when the traffic load is high.

4. The number of wavelengths needed increases with the traffic load. In small networks like MCI and Mesh, the need for extra wavelengths increases faster than in large random networks. The reason is that in small networks it is less likely to find disjoint light paths for different source-destination pairs. When the traffic load increases, conflicts are more frequent in small networks than in large networks.

Figure 6-8. The performances of wavelength assignment heuristics in random networks with various sizes.

Figure 6-8 presents the performance of the heuristics on random topologies of

various sizes when the number of traffic demands is 800. We can see that with the


increase of the network capacity, fewer wavelengths are required to accommodate the

request set. However, when the network size is more than 400 nodes, the improvements

are almost negligible. Recall that during the tree construction phase, multiple destination

trees are built if the total traffic size exceeds the wavelength capacity. So when the

network topology is large enough to resolve most conflicts in the tree topologies, the

minimum number of wavelengths needed in such networks is heavily influenced by the

maximum number of admissible trees that share the same destination, i.e. the capacity

of the wavelength again becomes the main constraint.

Figure 6-9. The algorithm running time of wavelength assignment heuristics under different numbers of requests in 100-node networks.

Figure 6-9 presents the running time of our wavelength assignment heuristics

under different workloads and Figure 6-10 gives the running time as a function of the

network size. We see that the no-sort heuristic is always the fastest algorithm. The

difference in running time increases as the network size and the traffic

load grow. Heuristics using the same tree sorting algorithm generally have the

same running time, which means the overheads brought by the two wavelength sorting

methods are similar to each other. For heuristics using different tree sorting schemes, we

note that the MC sorting is faster than the MP sorting. However, their performance gap


Figure 6-10. The algorithm running time of wavelength assignment heuristics in random networks with various sizes.

is much smaller than the gap with the no-sort heuristic. Although the sorted heuristics

are relatively slow compared to the no-sort heuristics, their overall computational costs

are still acceptable. In the best case, the average scheduling time for one request is less

than 5 seconds. In the worst case, the average scheduling time is less than 30 seconds

for the slowest heuristic.

In summary, the sorting schemes provide considerable benefits to the TWIN

wavelength assignments. The improvement is larger under relatively low workloads.

However, the sort heuristics’ running time is affected by the extra overhead brought by

the sorting process. Nevertheless, the running times are still reasonable even in the

worst case. The no-sort heuristic is competitive when the networks are large and traffic

demands are heavy. It schedules much faster while giving up little in

assignment quality.


CHAPTER 7
CONCLUSION

This dissertation has focused on solving various resource scheduling problems in

high speed networks. Our contributions are summarized below.

We defined a set of data structures to represent the changing status of available

resources. We discussed in detail the pros and cons of the continuous time model and

discrete time model. We used Time-Bandwidth List to associate temporal information with the

resource availability. We used Start Time List to indicate the feasibility of a network

path for certain user requests. We also used Steady Stages to represent the periods

during which resource availability is static for the whole network. For optical networks,

an extended network model was presented to deal with the wavelength converters in

the network. For multiple resource scheduling, the MRRM model was proposed. In this

model, different types of resources are uniformly represented using a graph based

model.

Several scheduling problems for single path scheduling in general networks,

including fixed slot, maximum bandwidth in slot, maximum duration, first slot, all

slots and all-pairs all-slots were considered. For each problem, we proposed several

algorithms (DAFP, kDP, and kSP for fixed slot problem, LSW and EBF for first slot

problem and so on). We also conducted extensive evaluations of each algorithm in

various test environments to assess its performance.

We defined the Earliest Finish Time File Transfer Problem (EFTFTP) to explore

the benefit brought by multi-path routing for large file transfers from multiple sources to

multiple destinations. We developed several multi-path reservation algorithms to solve

this problem for the online and batch scheduling cases. A new max-flow based greedy

algorithm (GOS) and several novel variants of the k-shortest paths algorithms were

proposed for online scheduling. A novel LP formulation was used to develop an optimal

algorithm for batch scheduling. Extensive simulations using both real world and random


networks show that our GOS algorithm provides a good balance among maximum

finish time, average finish time, and computational complexity. This algorithm may be

extended to the case when switching overhead is not negligible.

We have extended the algorithms originally proposed for general networks to

incorporate the wavelength sharing and wavelength continuity constraints of optical

networks. We modified two existing optical network scheduling algorithms (MSPF and

MSWF) to achieve better performance. We also showed that a deferred wavelength

assignment strategy can be effectively used in conjunction with many routing algorithms.

This effectively alleviates the need to keep track of the bandwidth allocation status of

each wavelength. Our results show that the adapted EBF algorithm performs better

than other algorithms. For heterogeneous networks, LSW also provided comparable

solutions; while for homogeneous networks MSPF and MSWF provide comparable

solutions.

Besides full-wavelength conversion, we also explored the impact of sparse

wavelength conversion on first-slot scheduling. We proposed a new network model

to emulate the full-conversion algorithms in sparse conversion networks. Using this

model, we conducted extensive experiments to assess the impact of wavelength

converters on First-Slot RWA algorithms’ performance. Our experiments indicated that

increasing wavelength converters has positive impact on blocking performance, but very

little impact on the availability of earlier start times. We also showed that for networks no

larger than several hundred nodes, deploying wavelength converters on at most 60% of

nodes would be enough to provide a satisfying performance. Additionally, an algorithm

switching strategy that adapts the scheduling algorithm as the current workload changes

was proposed. When the network’s traffic pattern did not change dramatically, this

strategy resulted in considerable performance improvement.

We considered the multiple resource scheduling problem, and presented several

solutions in terms of a multi-resource model. We proposed a flexible and efficient


multi-resource reservation model (MRRM) and solved four instances of the multiple

reservation first slot (MRFS) problem. Based on our model, four algorithms were

developed for each individual instance of MRFS. Experiments on a heterogeneous

computer network showed that our algorithms scale linearly in terms of network size and

request ratio.

We proposed a 2-step process to solve the wavelength assignment problem for

TWIN networks. We showed that determining the wavelength assignment that uses the

minimum number of wavelengths is an NP-Hard problem. Four greedy heuristics are

presented to compute approximate solutions within reasonable time. The evaluation

results show that performing sorting on destination trees and wavelengths improves

the assignment results, especially under low traffic loads. However, performing sorting

brings some extra overheads to the sort heuristics’ running time, but overall computation

costs are still acceptable. Meanwhile, in large topologies with heavy workloads, the

no-sort heuristic becomes competitive as it can provide similar scheduling performance

with much less computational cost.


REFERENCES

[1] “BinPacking Problem.” Http://mathworld.wolfram.com/BinPacking.html.

[2] “Dynamic resource allocation via GMPLS optical networks.” Http://dragon.

maxgigapop.net.

[3] “Graph Coloring Problem.” Http://mathworld.wolfram.com/GraphColoring.html.

[4] “On-demand Secure Circuits and Advance Reservation System.” Http://www.es.

net/oscars.

[5] Abilene. “Abilene.” Http://abilene.internet2.edu.

[6] Ahuja, Ravindra, Magnanti, Thomas, and Orin, James. Network Flows: Theory,Algorithms, and Applications. Prentice Hall, 1993.

[7] Aron, Mohit, Druschel, Peter, and Zwaenepoel, Willy. “Cluster reserves: amechanism for resource management in cluster-based network servers.” InMeasurement and Modeling of Computer Systems. 2000, 90–101.

[8] Aukia, P., Kodialam, M., Koppol, P. V. N., Lakshman, T. V., Sarin, H., and Suter, B.“RATES: A server for MPLS traffic engineering.” IEEE Network (March/April 2000):34–41.

[9] Banerjee, Amitabha, chun Feng, Wu, Ghosal, Dipak, and Mukherjee, Biswanath.“Algorithms for Integrated Routing and Scheduling for Aggregating Data fromDistributed Resources on a Lambda Grid.” IEEE Trans. Parallel Distrib. Syst. 19.2008. 24–34.

[10] Banner, Ron and Orda, Ariel. “Multipath routing algorithms for congestionminimization.” IEEE/ACM Trans. Network 15 (2007): 413–424.

[11] Black, U. MPLS and Label Switching Networks. Prentice-Hall Pub., 2002.

[12] Burchard, L.O. “On the performance of networks with advance reservations:Applications, architecture, and performance.” Journal of Network and SystemsManagement. 2005.

[13] Chen, Yang, Qiao, Chunming, and Yu, Xiang. “Optical burst switching: a new areain optical networking research.” IEEE Network 18 (2004).3: 16–23.

[14] Chu, Xiaowen and Li, Bo. “A Dynamic RWA Algorithm in a Wavelength-RoutedAll-Optical Network with Wavelength Converters.” INFOCOM. 2003.

[15] Chu, Xiaowen, Li, Bo, and Chlamtac, Imrich. “Wavelength Converter PlacementUnder Different RWA Algorithms in Wavelength-Routed All-Optical Networks.” IEEETransaction on Communications 51 (2003).5: 607–617.

[16] Chu, Xiaowen, Liu, Jiangchuan, and Zhan, Zhensheng. “Analysis of Sparse-Partial Wavelength Conversion in Wavelength-Routed WDM Networks.” INFOCOM. 2004.

[17] Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. Introduction to Algorithms. New York: The MIT Press, 2001.

[18] enlightened. Enlightened Computing, http://www.enlightenedcomputing.org/.

[19] Foster, I., Kesselman, C., Lee, C., Lindell, R., Nahrstedt, K., and Roy, A. “A distributed resource management architecture that supports advance reservations and co-allocation.” 7th Intl. Workshop on Quality of Service (IWQoS) (1999): 27–36.

[20] Foster, Ian, Kesselman, Carl, Lee, Craig, Lindell, Bob, Nahrstedt, Klara, and Roy, Alain. “A distributed resource management architecture that supports advance reservations and co-allocation.” In Proceedings of the International Workshop on Quality of Service. 1999, 27–36.

[21] Fusion. “International Thermonuclear Experimental Reactor.” http://www.iter.org.

[22] geant. Geant2, http://www.geant2.net.

[23] Guerin, R. and Orda, A. “Networks with advance reservations: The routing perspective.” Proceedings of the 19th Annual Joint Conference of the IEEE Computer and Communications Societies INFOCOM. 2000. 118–127.

[24] Guerin, R., Orda, A., and Williams, D. “QoS routing mechanisms and OSPF extensions.” IETF Internet Draft. 1996.

[25] Guerin, R. A. and Orda, A. “QoS routing in networks with inaccurate information: Theory and algorithms.” IEEE/ACM Transactions on Networking 7 (1999).3: 350–364.

[26] Hashimoto, A. and Stevens, J. “Wire routing by optimizing channel assignment within large apertures.” Proc. 8th Design Automation Workshop. 1971. 155–163.

[27] He, Eric, Wang, Xi, and Leigh, Jason. “A flexible advance reservation model for multi-domain WDM optical networks.” IEEE GRIDNETS 2006. 2006.

[28] hopi. Hybrid Optical and Packet Infrastructure, http://networks.internet2.edu/hopi.

[29] internet2. “Internet2.” http://www.internet2.edu.

[30] Ishio, H., Minowa, J., and Nosu, K. “Review and status of wavelength-division-multiplexing technology and its application.” vol. 2. 1984. 448–463.

[31] jgn2. JGN II: Advanced Network Testbed for Research and Development, http://www.jgn.nict.go.jp.

[32] Jung, Eunsung, Li, Yan, Ranka, Sanjay, and Sahni, Sartaj. “An Evaluation of In-Advance Bandwidth Scheduling Algorithms for Connection-oriented Networks.” Proceedings of International Symposium on Parallel Architectures, Algorithms, and Networks. 2008.

[33] Jung, Eunsung, Li, Yan, Ranka, Sanjay, and Sahni, Sartaj. “Performance Evaluation of Routing and Wavelength Assignment Algorithms For Optical Networks.” 13th IEEE Symposium on Computers and Communications. 2008.

[34] Kauer, M. “Terabit burst switching.” 2002. 3.3.3.

[35] Kovacevic, Milan and Acampora, Anthony S. “Benefits of Wavelength Translation in All-Optical Clear-Channel Networks.” IEEE Journal on Selected Areas in Communications 14 (1996).5: 868–880.

[36] Lee, Y., Seok, Y., Choi, Y., and Kim, C. “A Constrained Multipath Traffic Engineering Scheme for MPLS Networks.” Communications, 2002. ICC 2002. IEEE International Conference on. vol. 4. 2002, 2431–2436.

[37] lhcnet. LHCNet: Transatlantic Networking for the LHC and the U.S. HEP Community, http://lhcnet.caltech.edu/.

[38] Li, Yan, Ranka, Sanjay, and Sahni, Sartaj. “In-Advance First-Slot Scheduling with Wavelength Conversion for e-Science Applications.” In Proceedings of The IEEE Symposium on Signal Processing and Information Technology. 2010.

[39] Li, Yan, Ranka, Sanjay, and Sahni, Sartaj. “Tech Report of CISE UF: In-Advanced First-Slot Scheduling with Spare Wavelength Conversion for e-Science Applications.” (2009).

[40] Ma, Q. and Steenkiste, P. “On path selection for traffic with bandwidth guarantees.” 5th Intl. Conf. on Network Protocols (ICNP). 1997. 191–204.

[41] Ma, Q., Steenkiste, P., and Zhang, H. “Routing high-bandwidth traffic in max-min fair share networks.” ACM SIGCOMM. 1996. 115–126.

[42] Maui. “Maui.” http://www.clusterresources.com/pages/products/maui-cluster-scheduler.php/.

[43] Nuzman, Carl and Widjaja, Indra. “Time-Domain Wavelength Interleaved Networking with Wavelength Reuse.” INFOCOM. 2006.

[44] Platform LSF. “Platform LSF.” http://www.platform.com/Products/platform-lsf.

[45] Rajah, Kannan, Ranka, Sanjay, and Xia, Ye. “Scheduling Bulk File Transfers with Start and End Times.” 6th IEEE International Symposium on Network Computing and Applications. 2007, 295–298.

[46] Rao, N. S., Carter, S. M., Wu, Q., Wing, W. R., Zhu, M., Mezzacappa, A., Veeraraghavan, M., and Blondin, J. M. “Networking for large-scale science: Infrastructure, provisioning, transport and application mapping.” Proceedings of SciDAC Meeting. 2005.

[47] Rao, N. S. V., Wing, W. R., Carter, S. M., and Wu, Q. “UltraScience Net: Network Testbed for Large-Scale Science Applications.” IEEE Communications Magazine (2005).

[48] Rao, N. S. V., Wu, Q., Carter, S. M., Wing, W. R., Banerjee, A., Ghosal, D., and Mukherjee, B. “Control Plane for Advance Bandwidth Scheduling in Ultra High-Speed Networks.” INFOCOM 2006 Workshop on Terabits Networks. 2006.

[49] Rao, Nageswara S. V. and Batsell, Stephen Gordon. “QoS Routing Via Multiple Paths Using Bandwidth Reservation.” INFOCOM. 1998, 11–18.

[50] Ross, Kevin, Bambos, Nicholas, Kumaran, Krishnan, Saniee, Iraj, and Widjaja, Indra. “Scheduling Bursts in Time-Domain Wavelength Interleaved Networks.” vol. 21. 2003. 1441–1451.

[51] Sahni, S. Data structures, algorithms, and applications in C++. Silicon Press, 2005. Second Edition.

[52] Sahni, Sartaj, Rao, Nageshwara, Ranka, Sanjay, Li, Yan, Jung, Eun-Sung, and Kamath, Nara. “Bandwidth Scheduling and Path Computation Algorithms for Connection-Oriented Networks.” Sixth International Conference on Networking (ICN’07). 2007. 47.

[53] Saniee, Iraj, Widjaja, Indra, and Morrison, John. “Performance of a distributed scheduling protocol for TWIN.” SIGMETRICS Performance Evaluation Review 32 (2004).2: 38–40.

[54] Schelen, G. and Pink, S. “An agent-based architecture for advance reservations.” 22nd Annual Conference on Computer Networks. 1997.

[55] Shimomura, Kazuhiko and Kawakita, Yasumasa. “Wavelength Selective Switch Using Arrayed Waveguides with Linearly Varying Refractive Index Distribution.” Photonics Based on Wavelength Integration and Manipulation. 2005. 341–354.

[56] Subramaniam, Suresh, Azizoglu, Murat, and Somani, Arun K. “All-optical networks with sparse wavelength conversion.” IEEE/ACM Trans. Netw. 4 (1996).4: 544–557.

[57] Subramaniam, Suresh, Azizoglu, Murat, and Somani, Arun K. “On the Optimal Placement of Wavelength Converters in Wavelength-Routed Networks.” INFOCOM. 1998, 902–909.

[58] Tanwir, S., Battestilli, L., Perros, H., and Karmous-Edwards, G. “Dynamic scheduling of network resources with advance reservation in optical grids.” International Journal of Network Management. vol. 18. 2008. 79–105.

[59] Tucker, Rodney S. “Optical packet switching: A reality check.” Optical Switching and Networking 5 (2008).1: 2–9.

[60] Turner, Jonathan S. “Terabit burst switching.” vol. 8. 1999. 3–16.

[61] UCLP. User Controlled LightPath Provisioning, http://phi.badlab.crc.ca/uclp.

[62] Urgaonkar, Bhuvan, Pacifici, Giovanni, Shenoy, Prashant, Spreitzer, Mike, and Tantawi, Asser. “An analytical model for multi-tier internet services and its applications.” In Proc. of ACM SIGMETRICS. 2005, 291–302.

[63] Urgaonkar, Bhuvan and Shenoy, Prashant. “Sharc: Managing CPU and Network Bandwidth in Shared Clusters.” Tech. rep., IEEE Transactions on Parallel and Distributed Systems, 2001.

[64] Wang, Z. and Crowcroft, J. “Quality-of-service routing for supporting multimedia applications.” IEEE JSAC. 1996. 1228–1234.

[65] Widjaja, Indra, Saniee, Iraj, Giles, Randy, and Mitra, Debasis. “Light core and intelligent edge for a flexible, thin-layered, and cost-effective optical transport network.” vol. 41. 2003. 530–536.

[66] Xue, Daojun, Qin, Yang, and Siew, Chee Kheong. “Performance analysis of a novel traffic scheduling algorithm in slotted optical networks.” Computer Communications 30 (2007).18: 3559–3571.

[67] Yamanaka, N., Shiomoto, K., and Oki, E. GMPLS Technologies. CRC Taylor & Francis Pub, 2006.

[68] Yates, Jennifer M., Rumsewicz, Michael P., and Lacey, Jonathan P. R. “Wavelength Converters in Dynamically-Reconfigurable WDM Networks.” IEEE Communications Surveys and Tutorials 2 (1999).2.

[69] Yen, Jin Y. “Finding the k shortest loopless paths in a network.” Management Science. 1971.

[70] Zang, Hui, Huang, Renxiang, and Pan, James. “Designing a Hybrid Shared-Mesh Protected WDM Networks with Sparse Wavelength Conversion and Regeneration.” (2002).

[71] Zang, Hui, Jue, Jason P., and Mukherjee, Biswanath. “A Review of Routing and Wavelength Assignment Approaches for Wavelength-Routed Optical WDM Networks.” Optical Networks Magazine. 2000.

[72] Zhang, Z. L., Duan, Z., and Hou, Y. T. “Decoupling QoS control from core routers: A novel bandwidth broker architecture for scalable support of guaranteed services.” Proc. ACM SIGCOMM. 2000.

[73] Zheng, Jun, Zhang, Baoxian, and Mouftah, H. T. “Toward automated provisioning of advance reservation service in next-generation optical internet.” IEEE Communications Magazine 44 (2006).12: 68–74.

[74] Zheng, X., Veeraraghavan, M., Rao, N. S. V., Wu, Q., and Zhu, M. “CHEETAH: Circuit-switched high-speed end-to-end transport architecture testbed.” IEEE Communications Magazine (2005).

BIOGRAPHICAL SKETCH

Yan Li received his Ph.D. in computer science from the University of Florida in December 2010. He worked under the supervision of Dr. Sartaj Sahni and Dr. Sanjay Ranka. His research interests are algorithms and data structures for resource scheduling in high-speed networks.

Yan Li received his B.S. from Huazhong University of Science and Technology in 2003 and his M.S. from the Institute of Software, Chinese Academy of Sciences, in 2006. He then pursued his Ph.D. at the University of Florida.
