
A HOLISTIC APPROACH TO MULTIHOP ROUTING IN SENSOR NETWORKS

by

ALEC LIK CHUEN WOO

B.S. (University of California, Berkeley) 1998

M.S. (University of California, Berkeley) 2001

A dissertation submitted in partial satisfaction of the

requirements for the degree of

Doctor of Philosophy

in

Computer Science

in the

GRADUATE DIVISION

of the

UNIVERSITY OF CALIFORNIA, BERKELEY

Committee in charge:

Professor David Culler, Chair
Professor Eric Brewer
Professor Steve Glaser

Fall 2004

The dissertation of ALEC LIK CHUEN WOO is approved:

Chair Date

Date

Date

University of California, Berkeley

Fall 2004

A HOLISTIC APPROACH TO MULTIHOP ROUTING IN SENSOR NETWORKS

Copyright 2004

by

ALEC LIK CHUEN WOO


Abstract

A HOLISTIC APPROACH TO MULTIHOP ROUTING IN SENSOR NETWORKS

by

ALEC LIK CHUEN WOO

Doctor of Philosophy in Computer Science

University of California, Berkeley

Professor David Culler, Chair

The dynamic and lossy nature of wireless communication poses major challenges

to reliable, self-organizing multihop networks. Non-ideal link characteristics are especially

problematic with the primitive, low-power radio transceivers found in sensor networks and

raise new issues that routing protocols must address. We redefine the basic notion of wireless

connectivity in terms of probabilistic links, and demonstrate that link statistics can be cap-

tured dynamically through an efficient yet adaptive link estimator. This probabilistic notion

of connectivity changes the usual concept of a neighbor and introduces new problems with

neighborhood management: the neighbor table on a sensor node is of fixed size and cannot

always be used to gather link statistics about all neighbors, yet the process of selecting the

most competitive neighbors requires a comparison with the link statistics of those neighbors

that are not in the table. Together, link estimation and neighborhood management build a

probabilistic connectivity graph that a routing algorithm can exploit to increase

reliability. These three processes together constitute our holistic approach to routing. We

study and evaluate link estimation, neighborhood table management, and reliable routing

protocol techniques, focusing on the many-to-one, periodic data collection workload com-

monly found in sensor network applications today. Our final system uses a variant of an

exponentially weighted moving average estimator, frequency-based table management, and

minimum transmission cost-based routing. Our analysis ranges from large-scale, high-level

simulations to in-depth empirical experiments and emphasizes the intricate interactions be-

tween the routing topology and the underlying connectivity graph, which underscores the

need for a whole-system approach to the problem of routing in wireless sensor networks.
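As a rough illustration of the estimator named above, the following Python sketch assumes a "window mean with EWMA" design (the name WMEWMA appears in the list of figures): the success ratio over each window of expected packets is folded into a smoothed estimate. The class name, the default window of 30 and history weight of 0.5, and the use of sequence-number gaps to count losses are assumptions made for this example, not details taken from the text.

```python
from typing import Optional


class WindowMeanEWMA:
    """Hypothetical window-mean EWMA link-quality estimator (illustrative sketch).

    Each time `window` packets are expected, the observed success ratio of that
    window is smoothed into the estimate: est = alpha * est + (1 - alpha) * mean.
    Assumes sequence numbers increase monotonically with no wraparound.
    """

    def __init__(self, window: int = 30, alpha: float = 0.5) -> None:
        if window <= 0 or not 0.0 <= alpha <= 1.0:
            raise ValueError("window must be positive and alpha in [0, 1]")
        self.window = window
        self.alpha = alpha            # weight given to the old estimate
        self.received = 0             # packets heard in the current window
        self.expected = 0             # packets expected in the current window
        self.estimate: Optional[float] = None  # unset until a window closes
        self.last_seq: Optional[int] = None

    def on_packet(self, seq: int) -> None:
        """Record one packet; a gap in sequence numbers counts as losses."""
        if self.last_seq is None:
            gap = 1
        else:
            # Duplicates or reordered packets are counted as a single expected packet.
            gap = max(1, seq - self.last_seq)
        self.last_seq = seq
        self.received += 1
        self.expected += gap
        if self.expected >= self.window:
            self._close_window()

    def _close_window(self) -> None:
        mean = self.received / self.expected
        if self.estimate is None:
            self.estimate = mean      # first window initializes the estimate
        else:
            self.estimate = self.alpha * self.estimate + (1 - self.alpha) * mean
        self.received = 0
        self.expected = 0
```

With window=10 and alpha=0.5, ten consecutive packets yield an estimate of 1.0; a following window in which every other packet is missing has a mean of 0.5 and pulls the estimate to 0.75. A larger alpha trades agility for stability.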

Professor David Culler
Dissertation Committee Chair


TO MY PARENTS: MARY and SIU SHAN


Contents

List of Figures
List of Tables
1 Introduction
2 Background
2.1 Sensor Networking Platform and Implications
2.1.1 Hardware Platform (Mica Motes)
2.1.2 Software Platform (TinyOS)
2.1.3 TinyOS Network Architecture
2.1.4 Implications
2.2 Design Space of Routing in Sensor Networks
2.2.1 Network-Wide Dissemination
2.2.2 Tree-based Routing
2.2.3 Any-to-Any Routing
2.2.4 Implications
2.3 Detailed Roadmap
2.4 Related Work: A High-Level Picture
2.4.1 Packet Radio
2.4.2 Mobile Ad Hoc Networks (MANET)
2.4.3 Sensor Networks
3 Understanding Link Characteristics
3.1 Connectivity, Range, and Link Dynamics
3.1.1 Physical Connectivity and Communication Range
3.1.2 Time Variations
3.1.3 Obstructions and Mobility
3.1.4 Irregular Connectivity Cell
3.1.5 Implications: Connectivity and Hop-Count
3.2 Modeling the Observed Link Characteristics
3.3 Binomial Approximation of Stationary Packet Loss Dynamics
3.4 Synthetic Trace Generation
3.5 Effective Channel Capacity: Single and Multihop
3.6 Received Signal Strength and Link Quality
3.7 Related Work
3.8 Connectivity: A Probabilistic Perspective
4 Characterizing Connectivity using Link Estimators
4.1 Link Estimation as Part of Network Self-Organization
4.2 Estimator Design Framework and Methodology
4.2.1 Metrics of Evaluation
4.2.2 Error, Stability, and Memory Relationship
4.2.3 Confidence Interval Approximation
4.3 Estimator Design and Evaluation
4.3.1 Terminology
4.3.2 Tuning Objectives
4.3.3 Candidate Estimator Design and Evaluation
4.4 Candidate Estimator Comparisons
4.4.1 Stable Estimators
4.4.2 Agile Estimators
4.4.3 Performance based on Empirical Traces
4.4.4 Confidence Interval Estimation with WMEWMA
4.5 Alternative Estimation Techniques
4.6 Related Work
4.7 Summary and Multihop Routing Implications
5 Neighborhood Management under Limited Memory
5.1 Dense and Fuzzy Neighborhoods
5.2 Challenges of Neighborhood Discovery under Limited Memory
5.3 An On-line Neighborhood Selection Process
5.3.1 Adaptive Down-sampling Insertion Policy
5.3.2 Cache-Based Eviction and Reinforcement
5.3.3 Frequency-Based Eviction and Reinforcement
5.4 Evaluation Methodology
5.5 Results
5.5.1 Effect of Adaptive Down-Sampling
5.5.2 Eviction and Reinforcement Policy
5.6 Other Goodness Metrics
5.7 Related Work
5.8 Multihop Routing Implications
6 Cost-Based Routing
6.1 Distributed Tree Building Process
6.2 Overview of the System Routing Architecture
6.3 Underlying System Issues
6.3.1 Rate of Parent Change
6.3.2 Packet Snooping
6.3.3 Counting-To-Infinity Problem
6.3.4 Cycles
6.3.5 Duplicate Packet Elimination
6.3.6 Queue Management
6.3.7 Relationship to Link Estimation
6.4 Cost Metrics for Connectivity-Based Routing
6.5 Related Work
6.5.1 Table-Driven Routing
6.5.2 Source-Initiated On-Demand Routing
6.5.3 Summary
6.6 Summary
7 Evaluation
7.1 Evaluation Methodology
7.1.1 Candidate Routing Protocols
7.1.2 Evaluation Metrics
7.2 Network Graph Analysis
7.3 Effect of Neighborhood Management using Routing Cost
7.4 Packet Level Simulations
7.4.1 A Packet-Level Simulator
7.4.2 Simulation Results on Routing
7.5 Empirical Experiments
7.5.1 Experiments over an Indoor 5x10 Grid Network (Mica)
7.5.2 Results over a 30-node Irregular Indoor Mica Network
7.5.3 Results over an Irregular Indoor Mica2 Network
7.6 Network Instability under Congested Traffic
7.7 Techniques to Mitigate Network Instability
7.7.1 Out-bound Estimation Decay Window
7.7.2 Spreading Route Update Messages
7.7.3 Estimator Tuning and Confidence Interval
7.7.4 Technique Evaluation
7.7.5 Link Estimation of the Root Node and Stability
7.7.6 Adaptivity and Stability
7.8 Summary
8 Concluding Remarks
Bibliography

List of Figures

1.1 Sensornet hardware platform evolution over time.
1.2 Map of Great Duck Island and the locations of all the motes deployed in the year of 2003.
2.1 A Mica mote.
2.2 Design layout of a SPEC mote.
2.3 Connectivity of a cell measured using 150 motes on an open tennis court with RFM power setting of 70.
2.4 A holistic view showing the cross-layer interactions of routing.
3.1 Reception probability of all links in a network, with a line topology on a tennis court. Note that each link pair appears twice to indicate link quality in both directions.
3.2 After 20 minutes, the sender is moved from 15 ft to 8 ft from the receiver and remained stationary for four hours.
3.3 Link quality variation over a 7 hour period in an indoor laboratory environment.
3.4 Obstruction effects on packet loss behavior. A person deliberately stands beside the receiver in the interval 15-20 minutes.
3.5 Movement effects on packet loss behavior. Transmitter is deliberately moved to different distances at various times.
3.6 Cell connectivity of a node in a grid with 8-foot spacing as generated by our link quality model.
3.7 Quantile of empirical data against quantile of binomial distribution.
3.8 Time series comparison of empirical traces with simulated traces.
3.9 Channel capacity of the Mica/RFM platform using TinyOS 1.0 radio stack.
3.10 Channel capacity of the Mica2/Chipcon platform using different versions of the TinyOS radio stack.
3.11 Relationship of RSSI signal strength and link quality on the Mica2/Chipcon platform.
3.12 Example showing strong RSSI values may not be a good indicator for link quality.
4.1 General framework of passive link estimators.
4.2 P(t) for different estimators at both stable and agile configuration.
4.3 P(t) for different estimators at both stable and agile configuration.
4.4 Output from the stable WMEWMA estimator using empirical data input.
4.5 Confidence interval estimation with respect to the WMEWMA(30,0.5) estimator for different link quality.
5.1 Illustration of the potential neighbors of a center node in a dense network. The darker shaded region shows the effective region while the lighter region shows the transitional region. The cross indicates the center node.
5.2 Downsampling process.
5.3 Insertion and reinforcement in Frequency algorithm.
5.4 Cumulative distributive function showing the link quality distribution of the 207 neighbors of a center node in a 80x80 grid network with 4 feet spacing using our empirical link model.
5.5 Contour plot on yield of the FREQUENCY algorithm for different cell densities and table size with no down sampling for insertion.
5.6 Contour plot on yield of the FREQUENCY algorithm for different cell densities and table sizes with down sampling rate of 50% for insertion.
5.7 Number of good neighbors maintainable at different densities with a table size of 40 entries.
5.8 Yield for different table sizes and cell densities.
6.1 Distributed tree building algorithm framework.
6.2 Distributed tree building algorithm framework with link estimation incorporated.
6.3 Message flow chart to illustrate the core components for implementing our routing subsystem.
6.4 Typical data structure of the neighbor table. ROUTE TABLE SIZE determines the size of the neighbor table.
7.1 Hop distribution from graph analysis of a 400 node network with 8 feet grid size.
7.2 Path reliability to tree root from graph analysis of a 400 node network with 8 feet grid size.
7.3 Insertion and reinforcement in Frequency algorithm using routing cost difference.
7.4 Percentage of time spent in the neighbor table of the different neighbors vs. their difference in routing cost relative to the receiving node running the FREQUENCY algorithm. The cross indicates that node is chosen as the parent.
7.5 Percentage of time spent in the neighbor table of the different neighbors and their difference in routing cost relative to the receiving node running the FREQUENCY algorithm with routing cost filtering. The cross indicates that node is chosen as the parent.
7.6 Screen shot of the packet-level simulator.
7.7 Hop distribution from simulations.
7.8 Cumulative distributive function of the distances of all the links in the network using MT over graph analysis and packet level simulation.
7.9 Path reliability over distance from simulations.
7.10 Stability from simulations.
7.11 End-to-End success rate over distance from simulations.
7.12 Deployment on the foyer in the Hearst Mining building.
7.13 Indoor reception probability of all links of a network in a line topology at low transmit power setting (70) in the foyer.
7.14 Hop distribution for the indoor 50-node deployment.
7.15 Average Hop over Distance Contour Plot for MT at power 70 for the indoor 50-node deployment.
7.16 Non-sink node next hop link quality for MT in the foyer.
7.17 End-to-end success rate over distance in the foyer.
7.18 Actual and expected routing cost as computed using the MT cost function.
7.19 Stability of the entire network in the foyer.
7.20 End-to-end success rate versus hop in an office environment.
7.21 Stability for MT in an office environment.
7.22 End-to-end success rate of MT on Mica2 deployed in an office environment.
7.23 Link estimation of a node to its neighbor over time in an office environment.
7.24 21-node network stability under congested load. (Original)
7.25 Network-wide link estimation changes on the logical connectivity graph over time. (Original)
7.26 21-node network end-to-end success rate under congested load.
7.27 Route instability of a node: distribution of time spent on different parents (a) and the parent distribution of all the route switches of the node (b).
7.28 Variations of link quality estimations of the different parents selected by a node over an experiment with congested traffic.
7.29 Variations of link quality estimations of the different parents selected by a node over an experiment, with congested traffic and the overflow error fixed.
7.30 21-node network stability under congested load with overflow error fixed.
7.31 Empirical cumulative distributive functions of the parent switching cost difference of a 21-node network under congested load, with and without the overflow error.
7.32 Network-wide link estimation changes on the logical connectivity graph over time.
7.33 21-node network end-to-end success rate under congested load.
7.34 21-node network stability under congested load with stabilizing techniques.
7.35 Empirical cumulative distributive function of the parent switching cost difference of a 21-node network under congested traffic, with stabilizing techniques including confidence interval filtering, larger parent switching threshold, phase-shifted route update messages, and OutBoundDecayWindow tolerating up to 6 consecutive losses.
7.37 21-node network end-to-end success rate under congested load.
7.38 Link quality of the tree root as estimated by a near-by node using the minimum data rate relaxation under congested load.
7.39 Link quality of the tree root as estimated by a near-by node under congested traffic load, with the relaxation in link estimation removed.
7.40 21-node network stability under congested load, with relaxation in link estimation of the tree root removed.
7.41 Empirical cumulative distributive function of the parent switching cost difference of a 21-node network under congested load, with relaxation in link estimation of the tree root removed.
7.43 21-node network end-to-end success rate under congested load.
7.44 21-node network stability under congested load, with the parent switching threshold relaxed to its original setting (0.75 transmission).
7.45 Network-wide link estimation changes on the logical connectivity graph over time.
7.46 21-node network end-to-end success rate under congested load.
7.47 Network-wide link estimation changes on the logical connectivity graph over time.
7.48 21-node network end-to-end success rate under congested load, with a periodic interfering traffic.
7.49 21-node network stability under congested load, with a periodic interfering traffic.
7.50 21-node network stability under congested load, with one of the node disabled in the middle of the experiment.

List of Tables

2.1 TinyOS Media Access Control Parameters on Mica and Mica2.
2.2 Summary of the differences among the different wireless networks.
3.1 Definition of p(t) to model Figure 3.5
4.1 Terminology used for describing link estimator design.
4.2 Simulation results of all estimators in stability settings.
4.3 Simulation results of all estimators in agility settings.

Acknowledgments

David Culler is the best advisor I could ever ask for in my life. He has transformed

me from an undergraduate student, a novice in computer science, to a researcher working

closely at the forefront of the field. The completion of this thesis would not have been

possible without his guidance over these years.

David always pushed me hard to pursue a wider and deeper understanding of

problems and often challenged my designs and assumptions. I have also learned how to

articulate my ideas, argue and compromise with others. He spent endless hours with me,

improving my communication skills in both speaking and writing. He also recommended

that I take my first acting class in my life! I regard him as my father figure for life in

general.

Terence Tong has demonstrated the best qualities one can expect from a Berkeley

undergraduate student. He is intelligent, responsible, and very dedicated to conducting

research. He devoted many days and nights to simulating, building, and running

the systems with me. Without his help, the completion of this work would have taken

longer.

I would also like to thank the TinyOS/NEST team: Jason Hill, Philip Levis, Sam

Madden, Joe Polastre, Cory Sharp, Robert Szewczyk, and Kamin Whitehouse. We have

been working hard together on many demos, papers, TinyOS releases, and tutorials. Most

importantly, the whole process was full of fun and we have established life-long friendships.

I had an opportunity to help start the Intel Berkeley Research Laboratory, which

allowed me to collaborate with many great researchers: Philip Buonadonna, Brent Chun,


Kevin Fall, David Gay, Wei Hong, Alan Mainwaring, and Matt Welsh.

I am lucky to have had many friends to support me over these years to help me

get through the tough times. I would like to thank Horton Hua, Freddy Mang, Allen Miu,

Ada Poon, Wilson So, Hayden So, and Victor Wen.

This work was supported, in part, by the Defense Department Advanced Re-

search Projects Agency (grants F33615-01-C-1895 and N6601-99-28913), the National Sci-

ence Foundation under Grant No. 0122599, California MICRO program, and Intel Corpo-

ration. Research infrastructure was provided by the National Science Foundation (grant

EIA-9802069).


Chapter 1

Introduction

The information technology revolution over the last forty years has been driven by the miniaturization of technology following the prediction of Moore’s Law. Not only does computing become more powerful as transistor density keeps increasing exponentially, but devices with the same computing power are also shrinking in size. With new fabrication techniques that create micro electro-mechanical structures (MEMS), low-power microscopic sensors can be manufactured at very low cost. By combining CMOS technology with advances in MEMS, it is possible to embed intelligence and sensing capability on a single tiny platform. Together, these developments help bring the vision of potentially dust-sized computing platforms into reality. Low-cost CMOS-based RF radios have also become low-power enough to support low-data-rate communication on these tiny nodes. The result is a new platform, called sensor networks, that is capable of performing wireless communication, some local processing, data storage, and sensing, all within the physical size of a typical coin. Future platforms have the potential to fit within a cubic millimeter of volume.


Besides their small physical size, networks of these sensor nodes form a computing platform that is very different from traditional computing. Nodes are not expected to support a user or even have any user interface. They are stand-alone devices with limited memory, computation power, and energy. With wireless capability, they are not expected to be “plugged” into a wired infrastructure, where power and data bandwidth can be abundant. In fact, with their small physical size, they can easily be embedded into the physical environment to collect interesting information. Although each of these devices is a tiny computation platform of its own, together they can support powerful services in aggregate by interacting and collaborating with each other. In particular, these platforms can collaborate and perform local processing to infer interesting phenomena from noisy observations of the environment. By self-organizing into a network, they can propagate interesting data to the nodes that demand it and move data to an infrastructure for higher-level processing. All in all, this new platform provides a new tier of computing that will make information technology more pervasive and bring it closer to the physical environment.

Recent research and development efforts have rapidly advanced the field of sensor networks. Figure 1.1 shows the evolution of the hardware platforms, called motes, developed at UC Berkeley. While many mote generations are built from off-the-shelf components, a newer generation of motes, such as the SPEC mote, demonstrates the possibility of creating an integrated sensor node on a single chip. Although such a small platform is still in its infancy, all the other sensor-node platforms are already in production and are supported by many kinds of sensing hardware, such as light, temperature, humidity, and acceleration sensors.

On the software front, there has also been an effort to create an operating system customized for this new computing platform. The resulting system, TinyOS [37], provides a programming and runtime environment with flexible hardware abstractions, network stacks, and lightweight concurrency support. Both the hardware platforms and the TinyOS operating system are readily available to researchers for developing new applications and systems to advance this new computing paradigm.

Figure 1.1: Sensornet hardware platform evolution over time. (Platforms shown, with dates: WeC 1/00, Rene 11/00, Mica 1/02, Mica2 9/02, Mica2dot 9/02, SPEC 5/03.)

The potential applications of this new computing technology are rich and span

many different disciplines, including scientific research, military usage, consumer markets,

and applications in the interest of society.

For scientific research, sensor network technology can serve as a wide-area monitoring tool that allows scientists to collect potentially long-term data for understanding both microscopic and macroscopic phenomena in the physical environment. The sensor nodes are expected to be inexpensive enough that many of them can be used to monitor a targeted area at high resolution. Various wildlife habitat monitoring projects, such as the Great Duck Island project at UC Berkeley [53, 74], the James Reserve project at UCLA [18], and the ZebraNet project at Princeton [46], have begun collaborating with biologists and opened a new way to passively monitor wildlife with sensor networks. Figure 1.2 shows a picture of Great Duck Island and the relative positions of the 100 motes deployed on the island. The project used this thesis work to successfully build a multihop network to collect habitat information over the island.

Figure 1.2: Map of Great Duck Island and the locations of all the motes deployed in the year of 2003. (Legend: single hop weather, single hop burrow, multi hop weather, and multi hop burrow motes; scale bar: 10 m.)

For military purposes, sensor networks can be deployed over an open field to

passively collect information about intruding soldiers or vehicles. Since these devices are

small, they are difficult to discover. Furthermore, the networking capability allows them

to control and collect data over large areas using multihop communications. Interesting

potential military applications include detecting and tracking enemy vehicle movements or

even automated pursuit. Recent work in this kind of application can be found in [11, 15, 71].


In commercial applications, sensor networks can provide intelligent indoor lighting

and temperature control in buildings to conserve energy. Profiling electrical energy usage

on the outlets at home provides a novel approach to understanding energy consumption

distribution such that consumers can obtain feedback for more economical energy usage.

Precision agriculture can rely on sensor networks to optimize watering schedules and increase

yield per unit area. Asset management is yet another potential application: sensor networks

can monitor and track important assets during transit or while in storage.

These examples show the wide variety of potential applications that can take advantage of sensor network technology. The point is that research in this new computing paradigm can affect our lives through many different applications.

Compared with 802.11-like mobile wireless networks, sensor networks have different application scenarios and resource limitations, and therefore require a different kind of networking support. First, a sensor network system is likely to be deployed in an uncontrolled environment, where nodes may fail or become obstructed from one another due to environmental effects and changes over time. Second, the lack of infrastructure support requires a different approach to network topology formation than the common single-hop 802.11 wireless local area network. Third, energy constraints limit these nodes to short-range communication. Therefore, sensor networks require a multihop topology, in which nodes communicate locally with nearby neighbors over short-range links and relay messages destined beyond their immediate neighbors. For example, nodes need a multihop topology to propagate messages to a remote gateway for higher-level processing or archival purposes.

Maintaining such a topology can be challenging. For scalability, distributed, local rules should be used rather than a centralized approach. Limited memory restricts the amount of state a routing protocol can maintain on each node. Running in an uncontrolled environment requires the system to adapt robustly to failures and environmental changes without the need for a network administrator. Thus, having the system self-organize into a reliable network for multihop communication and self-adapt to subsequent changes is one of the most fundamental system building blocks for sensor networks.

Such an ad hoc, self-organizing routing problem is not an entirely novel research topic, as there is a rich literature on packet radio networks and mobile computing. Nevertheless, the problem needs to be revisited in the context of sensor networks for several reasons. First, lossy, short-range wireless radios can break assumptions made about connectivity at the routing layer, which can hinder both the robustness and the reliability of the routing protocol. Second, tight resource constraints, together with lossy links, introduce new challenges that routing protocols cannot neglect. Third, traffic patterns in sensor networks are very different from those in traditional wireless computing. Finally, there is still no comprehensive, systematic study of routing issues and performance using real sensor network nodes and traffic patterns.

The major contribution of this thesis is a thorough study of how to achieve a robust and reliable multihop wireless networking system on the Berkeley sensor networking platform. In particular, the routing process must use only simple, distributed local rules and must address many of the issues unique to this computing platform, including limited memory, bandwidth, and processing power. Since the low-power CMOS-based radios used in most sensor networking platforms have connectivity characteristics very different from what the networking community usually assumes, the challenge is to identify these differences and understand their implications for protocol design. These implications lead us to a new understanding of the wireless routing problem for sensor networks: they identify important subproblems and their interactions, introduce important metrics to study, and shape our overall approach of studying the routing process as a whole-system design problem.

We ground our study on extensive empirical measurements and experiments. The usual concept of the communication range is defined by the distance at which a sharp fall-off in connectivity occurs; before this fall-off, communication is considered to be reliable. In reality, we find that the RF communication range on the sensor nodes actually consists of three distinct regions: effective, transitional, and clear. In particular, the transitional region is a region where link quality can vary significantly, and it constitutes a large portion of the communication range. We therefore advocate a probabilistic view of link connectivity and use this perspective throughout the whole routing process. We argue that the process of routing should be separated into three subproblems: link quality characterization, neighborhood management, and cost-based routing. Each of these subproblems is a local process that a node must perform to achieve reliable routing. We carefully study each of these local processes and their interactions; together they provide an effective routing solution as a whole. The solution is implemented in TinyOS, evaluated on actual sensor nodes at different scales, and released to the community. The experience gained in this thesis can provide valuable guidance for the future development of more advanced routing systems for sensor networks.

This thesis is organized into eight chapters. Chapter 2 provides background on sensor networks and on the platform on which we based our study, and points out the corresponding implications for protocol design. Chapter 3 presents an empirical study of the wireless characteristics of our sensor network platform, which motivates treating the problem of routing as three local processes. The next three chapters discuss each of the three subproblems: link estimation in Chapter 4, neighborhood management in Chapter 5, and routing in Chapter 6. In Chapter 7, we combine the three subproblems and evaluate the system as a whole using large-scale, high-level simulations and in-depth empirical experiments. We conclude and discuss future work in Chapter 8.


Chapter 2

Background

We begin by describing the resource constraints found on a typical sensor networking system available today. These constraints guide our design decisions throughout the remaining chapters. Although these constraints may be relaxed as technology advances, designing under them will allow our work to cope with more extreme future platforms such as the SPEC mote [38]. We also provide a brief overview of the TinyOS operating system and the details of the network architecture that we used for our protocol development and empirical study.

This chapter also introduces the design space of multihop routing in sensor networks by analyzing the requirements arising from a set of important sensor network applications today. By surveying the routing protocols in the literature, we discuss how they fit into the design space, motivate why it is necessary to revisit the problem of routing, and identify important subproblems unique to sensor networks. Finally, we present a more detailed roadmap describing a holistic approach to routing, from the lowest level of defining connectivity to the network level of reliable multihop communication.

2.1 Sensor Networking Platform and Implications

The sensor networking open platform developed by the NEST project at UC Berke-

ley [4] provides both hardware (motes) and software systems for researchers to conduct

sensor networking research. We used the Mica and Mica2 hardware platforms in our study;

these motes can be purchased from Crossbow [22]. They are supported by the TinyOS [37]

open-source operating system, which also provides a complete suite of programming and de-

velopment tools. In this section, we describe in detail our hardware and software platform,

the network architecture in TinyOS, and the platform implications for sensor networking

protocol design.

2.1.1 Hardware Platform (Mica Motes)

We used two generations of Berkeley Mica motes [36], Mica and Mica2. Except

for the different RF radios, they are similar in terms of their physical sizes and resource

limitations.

On the Mica platform, each node consists of an 8-bit, 4MHz Atmel Atmega103

microprocessor with 128kB of programmable memory and 4kB of data memory [6]. It

follows the Harvard architecture, with separate program and data memory. The program memory can store read-only data. The network device is an RF Monolithics 916MHz transceiver [5], using amplitude shift keying (ASK) modulation at the physical layer. The processor is capable of driving the radio to deliver 40 kbps of raw data. The RF transmit power of the radio is tunable in TinyOS, with 0 being the maximum transmit power at 1.5dBm and 100 yielding no communication range at all. Each node also has a standard UART interface, allowing it to be configured as a base station for relaying data to a PC. Batteries typically power the entire sensor node, yielding a lifetime of about a week if the node is always on for processing and communication (assuming a current draw of 10mA and a battery capacity of 1800mAh). Figure 2.1 shows the form factor of a Mica node, with the major parts on the top side of the node labeled.

Figure 2.1: A Mica mote. (Labeled parts: Atmel processor, 916MHz antenna, on/off switch, 4MHz clock, 51-pin extension bus, and LEDs.)
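The lifetime figure quoted above follows from dividing battery capacity by current draw. A minimal sketch, assuming an idealized battery that delivers its full rated capacity at a constant 10mA draw (no self-discharge or voltage cutoff):

```python
def lifetime_hours(capacity_mah: float, current_ma: float) -> float:
    """Idealized lifetime: rated capacity divided by a constant current draw."""
    return capacity_mah / current_ma


hours = lifetime_hours(1800, 10)   # always-on Mica node, figures from the text
days = hours / 24
print(f"{hours:.0f} h = {days:.1f} days")  # 180 h = 7.5 days
```

In this model lifetime is inversely proportional to the average current, so halving the average draw doubles the lifetime.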

The second generation of the Mica platform (Mica2) uses an Atmega128L microprocessor [12] with a faster processor clock running at 7.38MHz, but the amount of programmable and data memory remains the same. The network device is a Chipcon CC1000 FSK-based RF transceiver [2], driven by the processor to deliver 38.4 kbps of raw data. Unlike the Mica, where only the transmit power can be adjusted, the Mica2 allows both the RF frequency and the transmit power to be tuned. Our study uses the 433MHz version of the Mica2.

2.1.2 Software Platform (TinyOS)

TinyOS’s design philosophy is to support the natural sensor networking needs for

high concurrency and efficient modularity over a very limited platform while allowing de-

signers the flexibility to innovate new protocols or experiment with new extreme hardware

platforms. TinyOS uses a component-based programming model, with every component

providing and using a set of well-defined interfaces. Programs (applications) and the entire operating system are built by wiring together customized or standard components into a component graph. Since each system functionality is implemented as a component,

programmers can change the system behavior simply by replacing or modifying the com-

ponents. This makes innovating new protocols easy, since no predefined mechanisms or

protocols are hidden or required by TinyOS.

Programming in TinyOS uses a dialect of the C programming language called nesC

[30], which comes with the TinyOS release. The language directly supports the component-

based programming model of TinyOS. By holistically analyzing both the application and

the system as one component graph, cross-layer optimizations for efficiency and code size can be applied more easily. Furthermore, static analysis beyond traditional compiler features is also supported; for example, the nesC compiler can detect potential race conditions at compile time, enhancing overall system robustness.

2.1.3 TinyOS Network Architecture

TinyOS provides an Active Message (AM) abstraction at the link layer. The packet

format is simple, with a 5-byte header, a message payload, and a 2-byte CRC checksum.

The header contains the destination address field (2 bytes), an AM handler field (1 byte), a

group ID (1 byte), and packet length (1 byte). Although the default maximum packet size is small, only 36 bytes, almost all applications are satisfied by it; they either generate very little sensory data per packet or perform their own data aggregation or fragmentation at a higher layer. Note that the destination address is used for link-level addressing, and a promiscuous mode for packet sniffing is also available. The

AM handler acts as a dispatch mechanism by specifying the correct higher-level handler to

invoke for each packet reception. It is analogous to a network port. This form of dispatch

is naturally supported by the nesC language using parameterized interfaces. Link-layer

acknowledgments are supported to acknowledge both link-layer broadcast (on Mica) and

unicast messages. However, we only use unicast message acknowledgments in our study.
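The Active Message header layout described above can be sketched with Python's struct module. This is an illustration, not TinyOS code: the little-endian byte order, the handler ID 7, and the helper names are assumptions; the CRC field is a placeholder because the sketch does not reproduce TinyOS's checksum; and it assumes the 36-byte maximum includes the 5-byte header and 2-byte CRC, which leaves 29 bytes of payload. The handler dictionary plays the role of the AM handler field, analogous to a port number.

```python
import struct

# Header fields from the text: destination address (2 B), AM handler (1 B),
# group ID (1 B), payload length (1 B). Little-endian order is an assumption.
HEADER = struct.Struct("<HBBB")
CRC_LEN = 2
MAX_PACKET = 36                                    # assumed to include header and CRC
MAX_PAYLOAD = MAX_PACKET - HEADER.size - CRC_LEN   # 36 - 5 - 2 = 29 bytes


def build_packet(dest: int, handler: int, group: int, payload: bytes, crc: int = 0) -> bytes:
    """Assemble header + payload + 2-byte CRC field. The CRC value is a caller-supplied
    placeholder; this sketch does not implement TinyOS's checksum."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload exceeds the maximum packet size")
    return HEADER.pack(dest, handler, group, len(payload)) + payload + struct.pack("<H", crc)


# AM-style dispatch: the handler ID selects which upper-layer function receives
# the packet, much like a port number.
handlers = {}


def register(handler_id: int):
    def wrap(fn):
        handlers[handler_id] = fn
        return fn
    return wrap


def dispatch(packet: bytes):
    dest, handler, group, length = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    fn = handlers.get(handler)
    return fn(dest, payload) if fn else None   # no registered handler: drop the packet


@register(7)                                   # hypothetical handler ID
def on_route_msg(dest, payload):
    return ("route", dest, payload)


pkt = build_packet(dest=0xFFFF, handler=7, group=0x10, payload=b"\x01\x02")
print(len(pkt), dispatch(pkt))                 # 9 ('route', 65535, b'\x01\x02')
```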

Message buffer allocation is done statically above the network layer; no copying is done

across the entire network stack for either transmission or reception, except for exchanging

data with the radio at the hardware registers. It is important to note that once the network layer has accepted a message buffer for reception or transmission, the application must not modify that buffer until control of it is returned through the SendDone or Receive event in TinyOS; otherwise the buffer may be corrupted.
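The ownership rule above can be modeled in a few lines. This is a hypothetical Python model of the send path only (class and method names are illustrative, not the TinyOS interfaces): a send request hands the buffer to the stack without copying, a second request is refused while the first is in flight (an assumed policy for this sketch), and the buffer becomes safe to modify only once send_done returns it. The receive path, where the Receive event passes a buffer up to the application, is analogous and omitted.

```python
class SendPath:
    """Hypothetical model: after send() accepts a buffer, the caller must not touch it
    until send_done() hands it back (mirroring the SendDone rule described above)."""

    def __init__(self):
        self.in_flight = None            # buffer currently owned by the network stack

    def send(self, buf: bytearray) -> bool:
        if self.in_flight is not None:
            return False                 # busy: refused, caller keeps ownership (assumed policy)
        self.in_flight = buf             # ownership moves to the stack; no copy is made
        return True

    def send_done(self) -> bytearray:
        buf, self.in_flight = self.in_flight, None
        return buf                       # ownership returns to the application

    def owns(self, buf) -> bool:
        return self.in_flight is buf


stack = SendPath()
msg = bytearray(b"hello")
print(stack.send(msg))                    # True: accepted, msg now belongs to the stack
print(stack.send(bytearray(b"again")))    # False: refused while msg is in flight
# Writing to msg here would corrupt the packet being transmitted.
returned = stack.send_done()
msg[0] = ord("H")                         # safe now: the buffer has been handed back
print(returned is msg, bytes(msg))        # True b'Hello'
```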


Below the AM layer, the TinyOS radio stack provides different levels of support

in software, depending on where the hardware/software boundary lies, which is governed

by the choice of radio. In traditional computing, much of this low-level support, such as the MAC layer, resides in the chip’s firmware and often cannot be accessed or changed. However, the flexible hardware/software boundary in TinyOS allows protocol designers to reach down and change low-level protocols or even registers inside the radio. The radio stacks of the two Mica platforms exemplify this. At the high level, both stacks support the common packet-level interface of Active Messages. Below the packet-level abstraction, however, the two radio stacks are quite different in both architecture and implementation. Although the default TinyOS CSMA-based MAC protocol and link-layer acknowledgment semantics are similar on the two platforms, the underlying mechanisms and choice of parameters differ. We discuss these differences below; a more detailed discussion can be found in [48].

Mica Radio Stack

On the Mica, the RFM radio provides only a bit-level interface. Therefore, the processor must encode and decode each byte using a DC-balanced scheme. The default byte encoding scheme is Single-bit Error Correction and Double-bit Error Detection (SECDED), which has a 1-to-3 encoding overhead ratio. For each packet received with a correct CRC checksum, the receiver sends a link-level acknowledgment. A simple CSMA-based MAC is employed [77]: it adds a random delay before listening for an idle channel, and backs off for a random delay within a predefined window when the channel is busy. Since the RFM provides only the raw baseband signal, with no direct support for carrier sensing or received signal strength indicator (RSSI) readings, carrier sensing is done by monitoring incoming data bits.
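The Mica MAC behavior described above can be sketched as follows. This is a simplified, hypothetical model rather than the TinyOS code: it takes the Mica window sizes from Table 2.1 (up to 64 bit times for both the initial and the congestion backoff, at 20 kbps), draws delays uniformly from those windows, and caps the number of busy checks (max_attempts) as an assumption of this sketch. The channel_busy callback stands in for monitoring incoming data bits.

```python
import random

BIT_TIME_MS = 1 / 20            # one bit time at 20 kbps, in milliseconds
INIT_WINDOW_BITS = 64           # max. initial backoff (Mica, Table 2.1)
CONGEST_WINDOW_BITS = 64        # max. congestion backoff (Mica, Table 2.1)


def csma_delay_ms(channel_busy, rng: random.Random, max_attempts: int = 16):
    """Return the total delay (ms) before transmitting, or None if the channel never
    appears idle within max_attempts checks."""
    delay = rng.randrange(INIT_WINDOW_BITS) * BIT_TIME_MS   # random delay before listening
    for _ in range(max_attempts):
        if not channel_busy():          # idle channel: transmit now
            return delay
        # busy: back off for a random number of bit times within the congestion window
        delay += rng.randrange(1, CONGEST_WINDOW_BITS + 1) * BIT_TIME_MS
    return None


busy = iter([True, True, False])        # channel busy twice, then idle
print(csma_delay_ms(lambda: next(busy), random.Random(1)))
```

With these windows, the longest possible wait in the example (two busy checks, then idle) is 63 + 2 x 64 = 191 bit times, or 9.55 ms.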

Mica2 Radio Stack

On the Mica2, the CC1000 radio provides a byte-level interface and performs

its own bit-level encoding using the standard Manchester encoding scheme. An empirical

forward-error-correction study on this radio suggests that additional byte-level encoding does not provide enough benefit to justify the extra processing cost [43]. Therefore, the default is to use the hardware-based Manchester encoding with no forward error correction. Another major difference is how carrier sensing for detecting an idle channel is done on the CC1000. The Mica2 radio stack improves carrier sensing by performing automatic gain control and on-line estimation of the noise floor in software, and comparing this estimate against the sampled baseband energy at the time of idle channel detection. The Mica radio stack lacks this capability because the RFM radio does not support an accurate received signal strength indicator (RSSI) on the baseband, even though RSSI values can be obtained in the radio stacks on both platforms. Note that RSSI values reported on both the Mica and Mica2 are obtained through the processor’s 10-bit ADC channel; calibration and conversion are required if dBm units are desired. Although the MAC protocol is similar between the
two stacks, the different mechanisms in carrier sensing and data movement make the two MAC layers very different. Furthermore, the parameters used in the two MAC layers also differ because they were tuned empirically in each case; they are summarized in Table 2.1.

Parameter                   Mica (unit: raw bits)    Mica2, TinyOS 1.13 (unit: raw bytes)
Max. initial MAC backoff    64 (3.2 ms @ 20 kbps)    10 (4.1 ms)
Max. congestion backoff     64 (3.2 ms @ 20 kbps)    16 (6.6 ms)
Ack. overhead               48 (1.2 ms @ 40 kbps)    16 (6.6 ms)

Table 2.1: TinyOS Media Access Control Parameters on Mica and Mica2.
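The millisecond values in Table 2.1 follow from the unit counts and bit rates. A small check, assuming the rates quoted in the table and using the fact that 1 kbps is 1 bit per millisecond:

```python
def units_to_ms(units: int, bits_per_unit: int, rate_kbps: float) -> float:
    """Airtime of `units` symbols; 1 kbps equals 1 bit per millisecond."""
    return units * bits_per_unit / rate_kbps


# Mica column: granularity is raw bits.
print(units_to_ms(64, 1, 20))   # max. initial / congestion backoff: 3.2
print(units_to_ms(48, 1, 40))   # acknowledgment overhead: 1.2

# Mica2 column: granularity is raw bytes; derive the implied time per byte.
for n_bytes, ms in [(10, 4.1), (16, 6.6)]:
    print(n_bytes, "bytes ->", round(ms / n_bytes, 3), "ms per byte")
```

The Mica2 entries work out to about 0.41 ms per byte, which corresponds to roughly 8 bits / 0.41 ms, or about 19.5 kbps of byte throughput.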

2.1.4 Implications

These observations of the sensor networking platforms today show that a sensor

node is limited in resources across several dimensions: compute power (1-4 MIPS), data

memory (1s-10s of kB), data bandwidth (10s of kbps), and available energy (battery size).

This implies that protocol design for this space must be simple and keep as little state as

possible due to limited memory. Furthermore, it must minimize communication overhead

since communication is costly in both energy and bandwidth. In fact, the bandwidth available for multihop communication is roughly one third of the channel capacity in a single cell, because a packet occupies a communication cell three times during a multihop relay.
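Applying the one-third rule above to the raw radio rates gives a rough upper bound on multihop throughput. This sketch ignores encoding (such as SECDED on the Mica), MAC backoff, acknowledgments, and headers, so real goodput is lower:

```python
def multihop_capacity_kbps(channel_kbps: float, cell_occupancy: int = 3) -> float:
    """Upper bound on relay throughput when each packet occupies a cell
    `cell_occupancy` times (three, per the text)."""
    return channel_kbps / cell_occupancy


for name, raw_kbps in [("Mica", 40.0), ("Mica2", 38.4)]:
    print(name, round(multihop_capacity_kbps(raw_kbps), 1))
# Mica 13.3
# Mica2 12.8
```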

Another interesting observation is an imbalance in the ratio between compute power and available memory on a sensor node. According to the Amdahl-Case rule, a balanced system should have 1 MByte of memory for each MIPS (million instructions per second) of processor power and for each Mbps (megabit per second) of bandwidth. However, on our sensor network platforms, memory is at least three orders of magnitude smaller than this rule predicts for the available processing power. One reason is that memory takes up a significant amount of chip real estate, which affects the size of a sensor node, especially for a mote-on-a-chip platform such as SPEC [38]. Figure 2.2 shows the layout of the SPEC chip. Visually, the memory consumes about 20%-30% of the chip area, yet the size of the memory (3Kb) is three orders of magnitude smaller than expected given the processor speed (4MHz). We expect that as technology improves, miniaturization will allow more space for memory. However, we believe that the same imbalance may persist, since every silicon-based functionality will scale down at the same rate.

Figure 2.2: Design layout of a SPEC mote. (Labeled blocks: CPU and accelerators, ADC, memory banks, crystal driver, 900 MHz transmitter, frequency synthesizer, and I/O pins.)
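The size of the Amdahl-Case gap on the Mica motes can be checked with a short calculation. This sketch assumes roughly one instruction per clock cycle (so MIPS is approximately the clock rate in MHz), uses the 4kB of data memory stated earlier, and treats 1 MByte as 10^6 bytes; all three are simplifications:

```python
import math


def amdahl_case_gap(mips: float, memory_bytes: int) -> float:
    """Orders of magnitude by which memory falls short of 1 MByte per MIPS."""
    expected_bytes = mips * 1e6
    return math.log10(expected_bytes / memory_bytes)


DATA_MEMORY = 4 * 1024   # 4kB of data memory on both Mica and Mica2
print(round(amdahl_case_gap(4.0, DATA_MEMORY), 2))    # Mica at 4MHz: 2.99
print(round(amdahl_case_gap(7.38, DATA_MEMORY), 2))   # Mica2 at 7.38MHz: 3.26
```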

The use of a low-cost and low-power CMOS based RF transceiver as a primary

communication device has significant implications for protocol design. Figure 2.3 shows

an empirical connectivity graph of a typical communication cell of a node in a 150-node

network using the RFM radio. The node at the center indicated by a cross transmits;


other nodes count the number of packets successfully received. The contour map illustrates how the packet success rate, or link quality, falls off, measured as the percentage of the sender’s packets received with no other traffic interfering. The graph shows that connectivity is very noisy over a large portion of the communication cell, with only a small number of nodes able to receive the sender’s packets well. That is, connectivity is not a clear-cut distinction between connected and not connected; it is probabilistic and can range from 0% to 100%. This empirical observation is important because it breaks the typical assumption of the circular-disc connectivity model, in which communication is good up to some radius r and non-existent beyond; we call this boolean connectivity. As a result, protocols that rely on such an assumption can suffer significantly in real deployments. We use a typical multihop routing process in sensor networks as an example.

Based on observing packets from other nodes and performing a set of local rules,

which may generate additional packets, the network must form and maintain a multihop

routing topology to support some higher-level communication pattern, such as data collection or aggregation into a specific node. For example, the data sink node could announce its desire to receive data along with its network depth, namely zero. Nodes that receive this packet can determine their network depth (one) and start generating packets,

which carry this new network depth and the node’s own address along with the data. A

periphery of depth-two nodes learns about nodes of depth one and can start sourcing data

for the sink. However, these packets will have to be routed through one of the nearer (or

lesser-depth) nodes. Each node hears packets from several neighbors and chooses one with

a smaller depth as its “parent” to route its traffic over the next hop. Progressively, more


distant nodes learn of parents and a spanning tree is formed and continually maintained as

data flows toward the sink. It will route around obstacles and find alternative paths when

nodes fail or join.
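The depth-based tree formation just described can be simulated centrally for illustration. This is a hypothetical sketch, not the protocol developed in this thesis: it processes announcements breadth-first, so each node's first announcement comes from a minimum-depth neighbor, and it treats every link in the (made-up) topology as usable, which is exactly the boolean-connectivity assumption questioned next. It builds the tree once and does not model ongoing maintenance.

```python
from collections import deque


def build_depth_tree(neighbors: dict, sink):
    """Sink announces depth 0; a node hearing an announcement from a node of depth d
    (and not yet attached) adopts that node as parent with depth d + 1.
    neighbors[n] lists the nodes that hear n's broadcasts (boolean connectivity)."""
    depth = {sink: 0}
    parent = {sink: None}
    frontier = deque([sink])
    while frontier:
        sender = frontier.popleft()            # sender broadcasts its depth
        for node in neighbors.get(sender, ()):
            if node not in depth:              # first announcement heard has minimal depth
                depth[node] = depth[sender] + 1
                parent[node] = sender
                frontier.append(node)          # node now advertises its own depth
    return depth, parent


# Hypothetical six-node topology; "S" is the data sink.
links = {
    "S": ["A", "B"],
    "A": ["S", "C"],
    "B": ["S", "C", "D"],
    "C": ["A", "B", "E"],
    "D": ["B", "E"],
    "E": ["C", "D"],
}
depth, parent = build_depth_tree(links, "S")
print(depth)    # {'S': 0, 'A': 1, 'B': 1, 'C': 2, 'D': 2, 'E': 3}
print(parent)   # {'S': None, 'A': 'S', 'B': 'S', 'C': 'A', 'D': 'B', 'E': 'C'}
```

Node C hears both A and B at depth one and picks A only because A's announcement was processed first; nothing in this rule considers how reliable either link is.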

The problem with this elegant, simple algorithm, and with many of its variants,

is that low-power radio communication is lossy as shown in Figure 2.3 and highly variable

due to external sources of interference in the spectrum, contention with other nodes, multi-

path effects, obstructions, and other changes in the environment, as well as node mobility.

Therefore, this simple algorithm’s assumption that hearing a message implies good connectivity breaks down: simply hearing a packet is not a sufficient basis for determining that two nodes are “connected”. This approach can yield poor reliability in shortest-path routing, such as the simple scheme discussed above, because long links with low reliability are likely to be selected, as they tend to yield the shortest paths. Since end-to-end reliability is the product of the link success rates at each hop, selecting unreliable links has an exponential effect on the end-to-end packet success rate. In general, it is likely to be much better to take more hops over more reliable (typically shorter) links.

Therefore, it is essential to build a routing topology upon reliable links, which is the main

focus of this thesis.
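The compounding effect of per-hop losses is easy to quantify. The sketch below assumes independent losses on each hop, and the link qualities are illustrative numbers rather than measurements from this thesis. Without retransmissions, the end-to-end success rate is the product of the per-hop rates; with unlimited link-level retransmissions and reliable acknowledgments, the expected number of transmissions is the sum of 1/p over the hops.

```python
import math


def path_success(link_prrs):
    """End-to-end delivery probability with no retransmissions (independent losses)."""
    return math.prod(link_prrs)


def expected_transmissions(link_prrs):
    """Expected transmissions if each hop retries until success, assuming independent
    losses and reliable acknowledgments: a hop with success rate p needs 1/p on average."""
    return sum(1.0 / p for p in link_prrs)


long_links = [0.3, 0.3]     # two long, lossy hops (illustrative)
short_links = [0.95] * 4    # four shorter, reliable hops (illustrative)
print(round(path_success(long_links), 3), round(path_success(short_links), 3))   # 0.09 0.815
print(round(expected_transmissions(long_links), 2),
      round(expected_transmissions(short_links), 2))                               # 6.67 4.21
```

In this example the longer route over reliable links delivers far more packets without retransmissions, and it also needs fewer transmissions when every hop does retransmit.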

Before we continue, we establish a framework of definitions to clarify our explanations in the rest of the thesis. The usual concept of connectivity is defined by whether a receiver can hear a sender; we call this physical connectivity. In Figure 2.3, physical connectivity is scoped by the contour lines. Boolean connectivity assumes that all links with physical connectivity are good. However, as illustrated in Figure 2.3, this assumption is invalid in sensor networks and raises the question of how to define the communication range. The usual concept of the communication range is defined relative to a small bit-error rate with free-space propagation between a sender and a receiver. However, Figure 2.3 shows that with such a large variation in link quality across different receivers, the communication range becomes a fuzzy concept. In Chapter 3, we return to this issue and describe how we divide the communication range into three different regions.

Figure 2.3: Connectivity of a cell measured using 150 motes on an open tennis court with RFM power setting of 70. (Contour plot of packet reception rate around the transmitter, with contour levels from 0.2 to 0.8; both axes span 0 to 15.)

2.2 Design Space of Routing in Sensor Networks

In sensor networks, the design space of routing is driven by the communication

scenario required by specific applications. Although sensor networking is still in its infancy,


researchers have already been developing a vast set of potential applications. The communi-

cation scenarios in these applications can be quite varied, but they can be mainly classified

into few-to-many data dissemination, many-to-one tree-based routing, and any-to-any rout-

ing.

2.2.1 Network-Wide Dissemination

Network-wide dissemination is one of the basic forms of data communication in

sensor networks. One important scenario, especially for a query system in a sensor network,

is to disseminate interest in data from one or a few source nodes to the whole or a subset of

the network. For example, an application may only be interested in data that satisfies some application-specific predicates at a certain sampling rate. With the interest disseminated throughout the network, only nodes that have discovered matching data need to report. Another general use is issuing commands for application-specific control, as done in an automated pursuit application [71], or for general network-wide retasking. The usual mechanism to support dissemination is to flood the entire network, an approach taken by a few of the important sensor network querying systems, such as Directed Diffusion [40] and TinyDB [52]. For retasking with small updates, Trickle [60] uses randomized local rules to control the send rate, both for scalability and to expedite dissemination. For general retasking, Deluge [39] extends the work in Trickle to disseminate large data objects reliably. More advanced dissemination protocols under development attempt to exploit the geographic locations or semantic information of each node to make dissemination efficient by reducing redundant or useless retransmissions. For example, [51] uses the semantic information of a range query to suppress query propagation to nodes whose data lies outside the range of the query. Many routing protocols use network-wide dissemination as a mechanism to discover and build a network topology. In particular, the reverse paths of the dissemination establish a routing tree topology toward the origin of the dissemination. Many mobile ad hoc routing protocols take this approach to create routing paths on demand. We discuss them further later in this chapter.
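The reverse-path idea above can be illustrated with a basic flood. In this hypothetical sketch, each node rebroadcasts a message only the first time it hears a given message ID and records the neighbor it first heard it from as its parent toward the origin. Delivery is idealized (lossless, collision-free, breadth-first) and the topology is invented; with real lossy links, the first copy heard may well arrive over a poor link.

```python
from collections import deque


def flood(neighbors, origin, msg_id, seen):
    """Flood msg_id from origin. `seen` maps node -> set of message IDs it has already
    forwarded and persists across calls. Returns (reverse-path parents, transmissions)."""
    if msg_id in seen.setdefault(origin, set()):
        return {}, 0                             # already disseminated
    seen[origin].add(msg_id)
    parent = {origin: None}
    transmissions = 0
    queue = deque([origin])
    while queue:
        sender = queue.popleft()
        transmissions += 1                       # each accepting node rebroadcasts once
        for node in neighbors.get(sender, ()):
            ids = seen.setdefault(node, set())
            if msg_id in ids:
                continue                         # duplicate: suppress rebroadcast
            ids.add(msg_id)
            parent[node] = sender                # reverse path back toward the origin
            queue.append(node)
    return parent, transmissions


links = {"S": ["A", "B"], "A": ["S", "B", "C"], "B": ["S", "A", "C"], "C": ["A", "B"]}
seen = {}
parent, tx = flood(links, "S", msg_id=1, seen=seen)
print(parent, tx)                  # {'S': None, 'A': 'S', 'B': 'S', 'C': 'A'} 4
print(flood(links, "S", 1, seen))  # ({}, 0)
```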

2.2.2 Tree-based Routing

Tree-based routing supports a vast set of data-collection applications, such as

environmental or habitat monitoring. In tree-based routing, each node can potentially be

both a router and a data source. The data will either be forwarded to a common destination

or multiple destinations. These destinations are often called the sink nodes. To support such

a communication pattern, all the nodes must self-organize themselves into a network with

a spanning-forest topology, composed of different trees with the tree roots being the sink

nodes. The sink node is a base station or bridge where data is collected for processing on

a more powerful machine, such as a PC computer. Sensor-network querying systems, such

as Cougar [79] or TinyDB [52], have been developed to support in-network processing and

aggregations over a network with tree-based topology. Directed diffusion [40] also operates

in this form of communication scenario, with multihop routing trees built for each data sink

node. Multiple sink nodes can coexist because each sink node may be interested in different

data, and thus they form different trees to collect it. In this case, a forest of multiple trees

can be built by running tree-based routing in each instance.
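A forest for multiple sinks can be sketched by running one tree-building instance per sink, as the paragraph above suggests. This hypothetical sketch uses a minimum-hop rule for each instance (an assumption for illustration) and shows that each node ends up with one next hop per sink, so per-node routing state grows with the number of sinks, a relevant cost given the memory limits discussed earlier.

```python
from collections import deque


def tree_for_sink(neighbors, sink):
    """One tree-routing instance: minimum-hop parent pointers toward `sink`."""
    parent = {sink: None}
    queue = deque([sink])
    while queue:
        sender = queue.popleft()
        for node in neighbors.get(sender, ()):
            if node not in parent:
                parent[node] = sender
                queue.append(node)
    return parent


def build_forest(neighbors, sinks):
    """Run one instance per sink; each node keeps a next hop per sink."""
    table = {}
    for sink in sinks:
        for node, next_hop in tree_for_sink(neighbors, sink).items():
            table.setdefault(node, {})[sink] = next_hop
    return table


links = {"S1": ["A"], "A": ["S1", "B"], "B": ["A", "S2"], "S2": ["B"]}   # hypothetical chain
table = build_forest(links, ["S1", "S2"])
print(table["A"])   # {'S1': 'S1', 'S2': 'B'}
print(table["B"])   # {'S1': 'A', 'S2': 'S2'}
```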


2.2.3 Any-to-Any Routing

Unlike tree-based routing where the final destination of traffic is always the sink

node, any-to-any routing supports data delivery from any node to any node in the network

similar to Internet routing. A few important in-network storage systems in the literature (e.g., [66, 49]) rely on this any-to-any communication infrastructure. As in tree-based routing, every node is both a router and a data source; however, since the destination can be any node, a network-wide addressing and discovery scheme is necessary, which can be challenging given that nodes may be moved or fail and new nodes may join. The common approach in the literature is to use physical geographic information to build a coordinate system for network-wide naming. An alternative approach is to use a virtual coordinate system based on connectivity information, as discussed in [65]. In both cases, different nodes may carry the same network address because they may be at the same location or scope. Since the primary usage scenario is in-network storage, this naturally accounts for local redundancy: the final destination can be a cluster of nodes rather than a single location. Tree-based routing can be viewed as a subproblem of any-to-any routing, but solving any-to-any routing also requires resolving the issues of network-wide naming and discovery. Node mobility can further complicate these issues.
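Geographic naming in which several nodes share one address can be illustrated by quantizing positions into grid cells. The 10m cell size, node names, and positions below are hypothetical, and the systems cited above use their own naming schemes; the point is only that nodes in the same cell form a cluster that answers to one address.

```python
from collections import defaultdict

CELL_SIZE_M = 10.0   # hypothetical naming granularity (meters)


def geo_address(x_m: float, y_m: float) -> tuple:
    """Name a node by the grid cell that contains its position."""
    return (int(x_m // CELL_SIZE_M), int(y_m // CELL_SIZE_M))


positions = {"n1": (3.0, 4.0), "n2": (7.5, 9.9), "n3": (12.0, 4.0)}   # hypothetical
clusters = defaultdict(list)
for node, (x, y) in positions.items():
    clusters[geo_address(x, y)].append(node)
print(dict(clusters))   # {(0, 0): ['n1', 'n2'], (1, 0): ['n3']}
```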

2.2.4 Implications

The design space of routing in sensor networks is different from that in the Internet

and MANET (Mobile Ad Hoc Networks). The Internet is a wide-area wired network with


applications generating many independent flows of traffic originating from and destined to

anywhere in the network. Node failures or link congestion can occur within the Internet,

but the complications from wireless links do not exist. MANET is a local area network;

its traffic pattern consists of many pairs of independent traffic flows. The main research

challenge for this kind of network is mobility handling.

In contrast, a sensor network system typically operates as a collaborative system,

with many data sources routing related traffic to interested sink nodes. Since packets are

often application data units, in-network processing is a key technique for minimizing energy consumption through data aggregation. A simple and robust spanning-forest routing mechanism matches

very well with many of the intended traffic models in sensor networks and also opens up

tree-based aggregation opportunities. While any-to-any routing can also provide tree-based

routing, it is a challenge to build and maintain an any-to-any routing topology efficiently

in a scalable way. In addition, only a few specialized systems in the network storage design

space today require such routing support. Therefore, tree-based or spanning-forest routing appears to be an emerging common networking abstraction, which is consistent with the results of the study in [48].

Another implication is the emerging need to maintain a stable network topology to

perform in-network processing. Node mobility is much less of a concern in sensor networks

as compared with traditional mobile computing since nodes are relatively static. While the

routing system must cope with node failures and connectivity changes due to environmental

effects, frequent route changes may impose a high adaptation overhead on higher-level in-network processing algorithms. Therefore, maintaining a stable yet adaptive routing topology becomes one of the important criteria for routing in sensor networks.

2.3 Detailed Roadmap

From the platform and routing implications that we have discussed in this chapter,

we learn that designing a routing system for sensor networks poses a different set of challenges from those in the Internet and MANET. This thesis seeks to revisit the problem

of routing in the sensor network context by taking a holistic approach. The implications in

this chapter allow us to conceptualize the challenges involved and lead us to identify and

isolate important routing subproblems and understand their interactions. The following is a

detailed road map illustrating the evolution process; we identify critical problems across the

different layers and take a whole-system perspective to evaluate and explore the intricate

interactions among these layers and the global effects.

We begin, in Chapter 3, by providing a rich set of empirical observations showing the lossy and noisy nature of wireless connectivity on our sensor network platforms. Unlike in the boolean connectivity model, connectivity in real deployments can be noisy and time-varying. Our experimental data show that the fall-off in link quality varies substantially across receivers; in fact, the communication range within which all links have good connectivity is surprisingly short in our data. Therefore, even without interference, concluding from a received message that a link has good connectivity is a poor assumption. Nonetheless, many wireless routing protocols today still rely on this boolean connectivity assumption, and the result is a high end-to-end packet loss rate, as discussed before. Link retransmissions at each hop can overcome this unreliability, but they carry a high cost in bandwidth and energy, which are precious resources in sensor networks. This motivates the first subproblem: every node must characterize the link quality to each neighboring node using an on-line link estimation process. In Chapter 4, we discuss such local processes and explore a set of candidate link estimators. By characterizing the link quality of each potential neighbor, higher-level protocols can exploit routes with high reliability in both directions of communication. This approach changes the assumption about connectivity: instead of accepting the boolean connectivity assumption, we treat connectivity as a probabilistic metric defined relative to link estimation. We think of connectivity as a statistical relationship, $P_{ij}(t)$, representing the probability of successful packet transmission from node $i$ to node $j$ at time $t$. We refer to this notion as probabilistic connectivity.
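A minimal sketch of how $P_{ij}(t)$ might be approximated at receiver $j$: keep the outcomes of the last $W$ packets from node $i$ and report the fraction that arrived. The fixed window, the class name `WindowedLinkEstimator`, and the choice to return 0.0 before any evidence are illustrative assumptions, not one of the estimators evaluated in Chapter 4.

```python
from collections import deque


class WindowedLinkEstimator:
    """Hypothetical estimator: P_ij(t) ~ fraction of the last `window`
    packets from node i that node j received."""

    def __init__(self, window: int = 32) -> None:
        # True = packet received, False = packet inferred lost
        self.events: deque = deque(maxlen=window)

    def record(self, received: bool) -> None:
        """Append one success/loss event; the oldest drops out when full."""
        self.events.append(received)

    def estimate(self) -> float:
        """Return the windowed success ratio, or 0.0 before any evidence."""
        if not self.events:
            return 0.0
        return sum(self.events) / len(self.events)


est = WindowedLinkEstimator(window=4)
for outcome in [True, False, True, True, True]:
    est.record(outcome)
print(est.estimate())  # last 4 events are F, T, T, T -> 0.75
```

A longer window gives steadier estimates but reacts more slowly to abrupt changes in link quality; balancing stability against agility is the central design trade-off for any such estimator.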

However, estimating $P_{ij}$ for each neighbor can be costly in memory, since statistical history must be maintained for each node. This issue is particularly prominent in sensor networks because of their high cell density, which results from the short sensing range. Since the connectivity cell is irregular, as in Figure 2.3, a high node density implies that a node can potentially hear many nodes, some near it and some very far away. All of these nodes are neighbors if we take the common concept of neighborhood as defined by physical connectivity; we call them potential neighbors. Since connectivity is probabilistic rather than confined to a specific cell radius, the number of potential neighbors is not bounded: there is always a non-zero probability of hearing a distant node. It is therefore difficult to bound the number of potential neighbors a priori, and non-uniform density in actual deployments further complicates the problem. Each node thus faces the challenge of managing a potentially unbounded number of potential neighbors. In addition, connectivity to each potential neighbor must be estimated in order to identify a reliable subset of them for routing. The tight constraint on data memory makes maintaining link statistics for every potential neighbor impractical, as their number can grow without bound. Therefore, a local process must dynamically manage a notion of neighborhood, consisting of a subset of neighbors suitable for routing, using only constant memory. This subproblem is unique to, and especially important for, routing in sensor networks.

Figure 2.4: A holistic view showing the cross-layer interactions of routing. [Figure panels: Discover & characterize connectivity; Neighbor management (keep the good ones, build a connectivity graph); Select good routes.]
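The constant-memory requirement can be illustrated with a fixed-capacity table. This is a sketch under assumed policies, not the management policy studied in Chapter 5: each entry keeps an exponentially weighted moving average (EWMA) of reception, a newly heard node is admitted only if a slot is free or the worst entry has decayed below a hypothetical `evict_below` threshold, and new entries start optimistically at 1.0. The class and parameter names are illustrative.

```python
class NeighborTable:
    """Constant-memory neighbor table with an illustrative admission policy."""

    def __init__(self, capacity: int, evict_below: float = 0.5,
                 alpha: float = 0.25) -> None:
        self.capacity = capacity        # fixed number of slots
        self.evict_below = evict_below  # hypothetical eviction threshold
        self.alpha = alpha              # EWMA gain for quality updates
        self.quality: dict = {}         # node id -> link-quality estimate

    def observe(self, node: int, received: bool) -> None:
        """Process one reception (True) or inferred loss (False) for `node`."""
        sample = 1.0 if received else 0.0
        if node in self.quality:
            q = self.quality[node]
            self.quality[node] = (1 - self.alpha) * q + self.alpha * sample
            return
        if not received:
            return  # no slot, no history: nothing to update for unknown nodes
        if len(self.quality) < self.capacity:
            self.quality[node] = sample  # optimistic initial estimate
            return
        worst = min(self.quality, key=self.quality.get)
        if self.quality[worst] < self.evict_below:
            del self.quality[worst]      # replace a decayed entry
            self.quality[node] = sample


table = NeighborTable(capacity=2)
table.observe(1, True)
table.observe(2, True)
for _ in range(4):
    table.observe(2, False)   # node 2's estimate decays to about 0.32
table.observe(3, True)        # table full; node 2 is below 0.5, so 3 replaces it
table.observe(4, True)        # worst entry is 1.0, so node 4 is not admitted
print(sorted(table.quality))  # [1, 3]
```

Evicting only below a threshold protects established neighbors from churn, but it also shows the difficulty raised in the text: a real policy needs some way to compare a candidate outside the table against the entries inside it.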

These two subproblems, link quality estimation and neighborhood management under limited memory, have led us to concretely establish our holistic approach by identifying the core underlying problems of routing in sensor networks. Figure 2.4 gives an overview of this holistic perspective on routing. Before exploring any routes, a routing system must first discover a network connectivity graph, which is analogous to building a physical map before deciding how to travel from one place to another. The lossy wireless characteristics of our sensor network platform complicate this map-building process, because link quality must be quantified to determine connectivity. Link quality estimation therefore becomes a fundamental map-building process for routing: each node participates in a distributed fashion to build a distributed connectivity graph in which each link is weighted by its link estimate. We call such a graph the derived connectivity graph.

Above the link estimation process is neighborhood management. It assists the graph-building process by dynamically maintaining a subset of good neighbors with reliable links suitable for routing. In particular, it must maintain this subset using only constant memory on each node, regardless of the actual cell density. The result is a distributed graph, a subgraph of the derived connectivity graph, that continuously adapts to changes in link quality, node failures, and new nodes joining. We call this the logical connectivity graph. In Chapter 5, we present a study of this process and explore which management policies are effective in achieving this goal.
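For illustration only, the logical connectivity graph can be viewed centrally as a per-node filter over the derived connectivity graph. The sketch assumes the derived graph is given as a dictionary `derived[i][j]` holding the link estimate from node `i` to node `j`, keeps only links at or above a quality threshold (0.75 here, an assumed value), and caps each node at `max_neighbors` entries to mimic a constant-size neighbor table. The function name and parameters are hypothetical; in the actual system this filtering happens locally and continuously at each node.

```python
from typing import Dict

Graph = Dict[str, Dict[str, float]]


def logical_graph(derived: Graph, threshold: float = 0.75,
                  max_neighbors: int = 4) -> Graph:
    """Keep, per node, at most `max_neighbors` out-links with estimate >= threshold."""
    logical: Graph = {}
    for node, links in derived.items():
        # Candidate links that meet the quality threshold
        good = [(nbr, p) for nbr, p in links.items() if p >= threshold]
        # Prefer the most reliable links when the per-node budget is exceeded
        good.sort(key=lambda item: item[1], reverse=True)
        logical[node] = dict(good[:max_neighbors])
    return logical


derived = {
    "a": {"b": 0.90, "c": 0.60, "d": 0.80},
    "b": {"a": 0.85, "c": 0.40},
}
print(logical_graph(derived, max_neighbors=1))  # {'a': {'b': 0.9}, 'b': {'a': 0.85}}
```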

The routing process should run on this logical connectivity graph. For example, a node would be a neighbor only if its link quality exceeded some threshold, say 75%. Connectivity is not necessarily symmetric, but nodes can broadcast link estimates to neighbors, or assume that most good links are roughly symmetric and verify the reverse direction for links that are actually used. Route selection using this probabilistic connectivity approach can greatly improve the packet delivery reliability of the shortest-path algorithm discussed before. Moreover, given good estimates of link success rates, we can use more sophisticated path selection rules. For example, we may assume inductively that all nodes nearer the sink maintain an estimate of the path success rate from themselves to the sink. Given an estimate of the local link success rate, a node can combine the two estimates locally, assuming that the success rate along the path from a neighbor is independent of how a packet arrives at that neighbor; it then selects the parent that gives the maximum success rate to the sink and records this value as the path reliability estimate for its own children to use. Together, these routing processes form a routing topology, shown as the top layer in Figure 2.4, which is a subgraph of the logical connectivity graph. The resulting network traffic in turn influences the underlying connectivity graph, which introduces changes all the way up to the routing topology. Understanding these cross-layer interactions is one of the goals of this thesis.
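The inductive parent-selection rule can be written as $R_i = \max_j P_{ij} \cdot R_j$, where $R_j$ (a symbol introduced here for illustration) is neighbor $j$'s advertised path success rate to the sink, with $R_{\text{sink}} = 1$. A minimal sketch of one node's local decision follows; the function name and the dictionary inputs are assumptions, and the independence assumption stated above is what justifies multiplying the two estimates.

```python
from typing import Dict, Optional, Tuple


def choose_parent(link_rate: Dict[str, float],
                  path_rate: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Pick the neighbor maximizing link_rate[j] * path_rate[j].

    link_rate[j]: local estimate of P_ij (this node -> neighbor j).
    path_rate[j]: neighbor j's advertised success rate from j to the sink.
    Returns (parent, estimated path success rate), or None if no candidate.
    """
    best: Optional[Tuple[str, float]] = None
    for nbr, p_link in link_rate.items():
        if nbr not in path_rate:
            continue  # neighbor has not advertised a route to the sink yet
        # Independence assumption: end-to-end success is the product
        estimate = p_link * path_rate[nbr]
        if best is None or estimate > best[1]:
            best = (nbr, estimate)
    return best


parent, reliability = choose_parent({"A": 0.9, "B": 0.6, "C": 0.95},
                                    {"A": 0.7, "B": 0.95})
# A gives 0.9 * 0.7 = 0.63 and B gives 0.6 * 0.95 = 0.57; C has no route yet.
print(parent)  # A
```

The node would then advertise `reliability` to its own children. Note that a hop-count metric could not distinguish between A and B here if both advertised the same distance to the sink.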

In this thesis, we focus on tree-based routing, since it is the most common and important routing service in sensor networks. The goal of Chapter 6 is to explore tree-based routing protocols and cost functions that follow our holistic approach and utilize the derived connectivity graph. We also present an overall architecture that shows how the three local processes in Figure 2.4 combine to yield a complete tree-based routing system. The routing system faces many important underlying issues; we discuss each of them and describe mechanisms to cope with them.

In Chapter 7, we present more information about the architectural framework and the implementation details of our routing system. The goal of that chapter is to evaluate the routing cost functions and protocols discussed in Chapter 6 using extensive simulations and empirical experiments. We evaluate how carrying the probabilistic connectivity concept all the way to the routing layer affects global, network-wide system properties, such as the end-to-end packet delivery success rate, the hop-count distribution of the network, and topology stability. In addition, we study the intricate cross-layer interactions to understand how the three local processes influence each other as a closed-loop system. These analyses help us understand the routing dynamics when a holistic approach is taken.

In Chapter 8, we summarize our contributions and discuss several important extensions we plan to pursue, in particular carrying the holistic approach all the way to the application level so that in-network processing can influence routing decisions.

2.4 Related Work: A High-Level Picture

The related work on wireless ad-hoc routing spans a rich literature, from packet radios to mobile computing and sensor networks. On the one hand, these networks require routing protocols tailored to specific platform characteristics, resource constraints, and application needs. On the other hand, they address many similar issues. One challenge, therefore, is to analyze the protocols from these different networks and to understand both the background that shaped each approach and the role each plays in the overall picture of wireless ad-hoc routing. In this section, we attempt to paint such a picture and provide a high-level discussion of the relevant related work in packet radios, mobile computing, and sensor networks. We defer protocol details to the related work sections of the remaining chapters.

2.4.1 Packet Radio

Packet radio research, such as the DARPA packet-radio project [41], began studying wireless ad-hoc routing protocols in the 1970s. One of the primary goals of these projects was to devise protocols that guide wireless nodes to self-organize into a multihop mobile packet-radio network. The dominant traffic pattern was assumed to consist of many independent flows, similar to the Internet today and to the any-to-any traffic model. Nevertheless, packet radio research set a foundation for what an ad hoc routing protocol should be: scalable, expandable, and robust against network dynamics such as mobility and node failures. These research goals influenced later kinds of wireless networks, such as mobile computing and sensor networks.

Like sensor network nodes, typical packet-radio nodes were primitive, with low-bandwidth wireless technology and limited compute power, memory, and energy. They also suffered from lossy and noisy wireless characteristics. However, instead of taking the probabilistic approach to connectivity that we advocate and building routing topologies on that definition of connectivity, their approach was to use a link estimator to identify links with low link quality and avoid using them for route selection [41]. In this thesis, we take the holistic approach and integrate link estimation into routing cost functions.

There is also work [54, 70] that overcomes the limited memory available for maintaining neighborhood information in dense networks. In [70], neighbor selection is done using a random selection process similar to percolation theory in random graphs; it focuses on the neighbor selection layer under the boolean connectivity assumption. A better approach is presented in [54], which performs link estimation and relies on a candidacy list in which potential neighbors build up link estimates before being considered for insertion into the neighbor table. We discuss the details of these protocols in Chapter 5.

At the routing level, there exist various distance-vector packet-radio protocols that use routing cost functions other than shortest path by hop count. These include Least-interference routing [73], Least-resistance routing [62], and Maximum-minimum residual capacity routing [14]. These metrics enhance the reliability of communication by routing over less congested or less interfered paths. While these factors are important for improving the quality of service in routing, these protocols rely on the boolean connectivity assumption; they neither take a probabilistic view of connectivity similar to ours nor define routing cost functions that directly address the fundamental lossy characteristics that are the theme of this thesis.

2.4.2 Mobile Ad Hoc Networks (MANET)

In the early 1990s, mobile computing networks began to emerge along with the advent of laptop computers and local area wireless networks. The technology became prevalent as short-range, spread-spectrum radios became affordable and protocol specifications became standardized, with IEEE 802.11 [1] being the most widely used. Although mobile computing and packet-radio networks share similar high-level design goals, the degree of mobility in mobile computing is assumed to be high, because users are expected to carry their laptops and move around within an office or a building. The idea is to maintain a multihop network over a set of laptop computers to support any-to-any traffic among them, with users expected to move within an area such that the network remains connected. Supporting mobility efficiently, rather than computing optimal routing paths such as shortest paths, thus became the first priority for routing protocols in MANETs. However, as mobile computing has become pervasive in indoor environments, a significant amount of ad-hoc communication research has been devoted to traffic patterns in which nodes communicate directly with a base station. That is, mobile nodes do not need to forward each other's messages, since each node can communicate directly with one or more base stations, and mobility handling reduces to the problem of handoff from one base station to another. This infrastructure mode has become the dominant form of wireless communication for portable computing.

Building on the packet-radio literature, many important ad-hoc routing protocols for mobile computing have emerged. However, improvements in technology and different usage scenarios have led these protocols to leave out some of the system issues that packet-radio protocols had addressed. In particular, the abundant memory and compute power of a laptop computer relax the constraints on protocol simplicity and the requirement for neighborhood management. Since mobility yields intermittent connectivity, having “some” connectivity is often adequate, as packet losses can often be recovered through link retransmissions. As a result, many of these protocols assume boolean connectivity rather than performing link quality estimation, since such information may become stale quickly or may be handled by underlying link layer mechanisms.


There are improvements over the basic distance-vector protocol. For example, DSDV [58] uses a simple mechanism to convey route “freshness” that guarantees cycle-free topologies and solves the well-known count-to-infinity problem [9]; both problems arise easily when nodes move around, since stale routing information can lead to cycles. Although its routing cost function still uses hop count, DSDV optimizes for path “freshness” before minimizing hop count.
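DSDV's freshness-first preference can be sketched as a comparison between an installed route and a newly advertised one: a higher destination sequence number always wins, and hop count breaks ties only between equally fresh routes. This is a simplified sketch; the `Route` type and `prefer` function are hypothetical names, and protocol details such as settling time and the handling of broken-link sequence numbers are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    next_hop: str
    seq_no: int   # destination sequence number: larger means fresher
    hops: int     # hop-count routing metric


def prefer(current: Route, candidate: Route) -> Route:
    """Return the route to keep: freshness first, then shortest hop count."""
    if candidate.seq_no > current.seq_no:
        return candidate
    if candidate.seq_no == current.seq_no and candidate.hops < current.hops:
        return candidate
    return current


installed = Route("x", seq_no=10, hops=3)
print(prefer(installed, Route("y", seq_no=12, hops=5)).next_hop)  # y: fresher wins
print(prefer(installed, Route("z", seq_no=10, hops=2)).next_hop)  # z: same age, shorter
print(prefer(installed, Route("w", seq_no=8, hops=1)).next_hop)   # x: stale route ignored
```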

Another improvement in the mobile computing literature is source-initiated on-demand routing, which exploits the fact that, since traffic is composed of many independent flows, the system should maintain state only for actual traffic. DSR [44] and AODV [59] fall into this category. In these protocols, the source node, which must know the destination’s network address, initiates route discovery to the destination through some form of flooding, and the reverse path of the flood is used as the routing path. They also assume that link quality is generally good, although AODV does have mechanisms to avoid routing over asymmetric links. The goal of these protocols is to find a path to the destination quickly rather than to optimize a metric such as “shortest paths”.

A special kind of receiver-based, source-initiated on-demand routing called Gradient Routing (GRAd) [67] has also emerged. It supports the same any-to-any traffic model as other on-demand routing protocols. It first establishes a gradient of routing cost, using the same route discovery mechanism to build up reverse-path routing. However, instead of selecting the next node and forwarding a message via a unicast packet, the sender performs all forwarding as a local broadcast, and each receiver decides whether to forward. This form of receiver-based routing is resilient to mobility, but it may carry a high communication overhead due to redundant forwarding.

TORA [57], another on-demand routing protocol, is based on a link reversal technique. It assumes boolean connectivity and links that are generally good in both directions. It first discovers the network through flooding and builds a directed acyclic graph (DAG) rooted at the destination. To handle mobility and link failures, it maintains the discovered DAG through link reversal mechanisms. Since the topology is a DAG, it is always loop-free. The protocol is more complex than DSR and AODV and requires global time synchronization to establish temporal order.

2.4.3 Sensor Networks

The advent of sensor networking in the late 1990s, pioneered and exemplified by Directed Diffusion [40], introduced a form of networking that resembles packet radio but supports very different traffic scenarios and applications. In general, sensor network nodes are relatively immobile; however, connectivity variations and node failures can be quite frequent. Resource constraints are tight in both packet radios and sensor networks, and multihop traffic is the norm. One of the major characteristics of sensor networks is the overlap of communication and computation in the form of in-network processing. Since communication is more expensive than computation in terms of energy, processing within the network helps reduce the amount of multihop communication and increase network lifetime. In-network processing is therefore key to coping with the tight resources available in sensor networks.


Dimension | Packet Radio | Mobile Computing | Sensor Networks
Mobility | Depend on Scenarios | Treat as First Priority | Relatively Immobile, Adapt Slowly
Resource Constraints | Limited | Abundant | Limited
Traffic Pattern | Independent Flows, Multihop, Any-to-Any | Independent Flows, Single-hop to BS, Any-to-Any | Correlated, Multihop, Many-to-one (few), In-network Processing
Addressing | Internet Like Addressing | Internet Protocol | Network Address Free
Source of Variations | Users and the Environment | Mainly from Users | Mainly from the Environment

Table 2.2: Summary of the differences among the different wireless networks.

Directed diffusion provides an example framework for how in-network processing and multihop routing can be done to support the intended applications of sensor networks. The framework does not dictate the underlying protocols and implementations, leaving applications the flexibility to devise new protocols. The idea is that a sink node issues an interest in some particular data, similar to route discovery in on-demand protocols, except that discovery is now destination-initiated and every node in the network can potentially be a source of data. Nodes with the requested data send it back along the reverse paths on the routing tree, with intermediate nodes performing in-network processing such as aggregation. Note that this traffic pattern does not require a network-wide addressing scheme, since nodes need not know the address of the sink node; they simply need to know the link address of the next hop. Table 2.2 summarizes the different dimensions across the types of networks discussed in this section.

The source-initiated on-demand routing protocols, used mainly to support independent flows in mobile computing, do not match the many-to-few data collection traffic pattern of sensor networks. However, as discussed in Section 2.2.1, this network-wide dissemination process can be used to establish a reverse-path routing tree for data collection if route discovery is sink-initiated. That is, sink-initiated tree-based routing faces the same issues as source-initiated on-demand routing, since the discovery process is similar and the reverse path is also used to route data.
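Under the boolean connectivity assumption, sink-initiated discovery can be sketched as a breadth-first flood in which each node adopts, as its parent, the first neighbor from which it hears the interest. The sketch assumes lossless links and equal per-hop delay, exactly the idealization that the measurements in Chapter 3 call into question: with lossy links, the first copy heard may well arrive over a poor link. The function name and the adjacency-list input are illustrative.

```python
from collections import deque
from typing import Dict, List


def reverse_path_tree(neighbors: Dict[str, List[str]], sink: str) -> Dict[str, str]:
    """Simulate a sink-initiated flood; return child -> parent for reached nodes."""
    parent: Dict[str, str] = {}
    heard = {sink}
    queue = deque([sink])
    while queue:
        sender = queue.popleft()           # node rebroadcasting the interest
        for receiver in neighbors.get(sender, []):
            if receiver not in heard:      # only the first copy counts
                heard.add(receiver)
                parent[receiver] = sender  # data will flow back via sender
                queue.append(receiver)
    return parent


topology = {"S": ["a", "b"], "a": ["S", "c"], "b": ["S", "c"], "c": ["a", "b"]}
print(reverse_path_tree(topology, "S"))  # {'a': 'S', 'b': 'S', 'c': 'a'}
```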

For routing in packet radio and mobile computing, topology stability is generally not a concern, since nodes are expected to move anyway and getting data reliably to the destination is the ultimate goal. For sensor networks, however, topology stability is very useful, since it allows high-level in-network processing algorithms to rely on a stable routing tree to perform aggregation. Therefore, achieving topology stability is one of the main goals of our study.

Besides tree-based routing, there is other related work on routing in sensor networks that falls into the any-to-any category, based on establishing either geographic or virtual coordinates, as discussed in Section 2.2. The geographic approach requires additional localization support, such as the Global Positioning System (GPS), while the virtual coordinate approach is still at an early stage of research [65].

Many of these routing protocols for sensor networks assume that lossy connectivity is hidden by low-level mechanisms, so that they can safely operate on a well-defined boolean connectivity graph and rely on link failure mechanisms to adapt to changes. As discussed in the previous section, we take a different approach and expose the underlying connectivity to the routing layer as a probabilistic metric, so that it can make the best routing decisions when different degrees of connectivity are encountered.


Chapter 3

Understanding Link Characteristics

The starting point for development of a practical topology formation and routing

algorithm is to understand the dynamics and loss behavior of wireless connectivity in sensor

networks under various circumstances. Rather than working with a detailed model of the channel or the propagation physics, we have sought a simple characterization of connectivity through empirical studies on the sensor networking platform discussed in Chapter 2. Our experimental results show that connectivity does not resemble the circular-disc model used in many formulations of distributed algorithms. On the contrary, it is irregular, time-varying, and probabilistic. We present a simple model to approximate these empirical results, such

that synthetic packet traces resembling real-world packet losses can be generated to support

higher level protocol design and simulations. We also measure the actual channel capacity

under periodic traffic and the effectiveness of using received signal strength to predict link


quality. All these observations lead us to stress the need to take this probabilistic concept

of connectivity all the way up to the routing level.

3.1 Connectivity, Range, and Link Dynamics

With primitive, low-power radios, sensor networks face wireless characteristics that tend to be noisier and lossier than those of typical wireless computer networks. We therefore carefully characterize the connectivity observed on our sensor network platform. We perform many empirical experiments to study packet loss behavior over distance across many nodes, qualitatively define the structure of the communication range, observe time variations in link quality, and capture the effects of obstructions and node mobility. Although these experiments were performed on two different Mica platforms, the overall results are similar and carry the same implications for higher-level protocols.

3.1.1 Physical Connectivity and Communication Range

We measured packet loss rates between many different pairs of nodes at many different distances over a long period of time. Each node is scheduled to transmit packets at a uniform rate while the other nodes record the successful reception of these packets. Only one node transmits at any given time, and for each transmitter we obtain numerous measurements at different distances. With sequence numbers embedded in all packets, we can infer losses and generate a sequence of success/loss events that constitutes a packet loss trace. We vary the placement and environment of the nodes to explore how they affect connectivity.
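The loss-trace construction can be sketched as follows, assuming each transmitter numbers its packets consecutively, the receiver logs the sequence numbers it actually receives, and the experiment schedule tells us the first and last sequence numbers sent, so that losses at the start and end of a run are also counted. Sequence-number wraparound is ignored, and the function name is an illustrative assumption.

```python
from typing import Iterable, List


def loss_trace(received_seqnos: Iterable[int], first: int, last: int) -> List[int]:
    """Return one event per transmitted packet: 1 = received, 0 = lost."""
    got = set(received_seqnos)  # duplicate receptions collapse to one
    return [1 if seq in got else 0 for seq in range(first, last + 1)]


trace = loss_trace([0, 1, 3, 4, 7], first=0, last=7)
print(trace)                    # [1, 1, 0, 1, 1, 0, 0, 1]
print(sum(trace) / len(trace))  # 0.625 reception rate
```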


One such representative measurement is summarized in Figure 3.1, a scatter plot of how link quality varies with distance for many pairs of Mica nodes. The nodes are placed in a line 3 inches above the ground in an open tennis court; the first 20 nodes are placed 2 feet apart, and the rest are 4 feet apart. Each node is scheduled to transmit 200 packets at 8 packets/s, with a power level setting of 50. A number of other settings show analogous structure. As expected, for a given power setting there is a distance within which essentially all nodes have good connectivity. The size of this effective region increases with transmit power. There is also a point beyond which essentially all nodes have poor connectivity, although even in this clear region some very distant nodes occasionally receive packets successfully. Between these two points is the transitional region, where the average link quality falls off relatively smoothly, but individual pairs exhibit high variation: some relatively close pairs have poor connectivity, while some distant pairs have excellent connectivity. A fraction of the pairs have intermediate loss rates, and asymmetric links are common in the transitional region. This three-region communication structure is observed on both the Mica and Mica 2 platforms.

These observations imply that the usual concept of the communication range of a pair of nodes can be quite misleading when many pairs of nodes are considered together. The general concept of connectivity used in most studies, such as [58, 44, 59, 57], is either a sharp fall-off in link quality at the end of the communication range (as defined by the radio specification) or a degradation in link quality over distance at the same rate for all nodes, since they follow the same path-loss model. In fact, the communication range consists of three distinct regions, with the noisy transitional region making up most of the communication range and being very sensitive to the particular sender and receiver pair. Such a large transitional region can give the false impression that the reliable communication range is very large, especially when a few long reliable links do exist. In a dense deployment, nodes are close to each other and many neighboring nodes fall within the effective region, so good connectivity for routing should exist. If the deployment is too sparse, most neighbors fall in the clear region and a network cannot be established. Moreover, if the network is not dense enough and all links end up in the transitional region, reliable routing is difficult, since the underlying links that make up the derived connectivity graph can vary widely in reliability. We therefore stress the importance of spacing nodes within the effective communication range in actual deployments. One way to achieve this is to measure the effective communication range in each deployment at the desired transmit power and use the resulting estimate to guide inter-node distance. An alternative is to rely on protocols that configure the system to achieve this property and adapt to a given deployment.

[Figure 3.1 plot: reception success rate (0 to 1) versus distance (feet, 0 to 70). Each circle represents the link quality of a directed edge in the topology, with the mean and standard deviation overlaid; edges at the same distance can have very different reliability. The effective, transitional, and clear regions are marked.]

Figure 3.1: Reception probability of all links in a network, with a line topology on a tennis court. Note that each link pair appears twice to indicate link quality in both directions.
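One way to turn per-link measurements like those in Figure 3.1 into an effective-range estimate is sketched below. Given (distance, link quality) pairs for one transmit power, it reports the largest distance up to which every measured link is at least `good`, and the smallest distance beyond which every link is at most `poor`; the span between them is the transitional region. The thresholds 0.9 and 0.1 and the function name are assumptions made for illustration, not definitions from this chapter.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (distance in feet, reception success rate)


def region_boundaries(samples: List[Sample], good: float = 0.9,
                      poor: float = 0.1) -> Tuple[float, Optional[float]]:
    """Return (end of effective region, start of clear region or None)."""
    ordered = sorted(samples)  # by distance, then by quality (worst first on ties)
    effective_end = 0.0
    for dist, quality in ordered:
        if quality < good:
            break              # first link that is not reliably good
        effective_end = dist
    clear_start: Optional[float] = None
    for dist, quality in reversed(ordered):
        if quality > poor:
            break              # farthest link that still gets through
        clear_start = dist
    return effective_end, clear_start


links = [(2, 0.98), (4, 0.95), (6, 0.97), (8, 0.60), (10, 0.92),
         (12, 0.30), (14, 0.05), (16, 0.00), (18, 0.02)]
print(region_boundaries(links))  # (6, 14)
```

The isolated good link at 10 feet does not extend the effective region, which reflects the caution above that a few long reliable links can make the reliable range look larger than it is.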

3.1.2 Time Variations

Our observations so far have focused on the relationship between link quality and distance. We now turn to the time variations of link quality, even when nodes are stationary. We start with a fixed source node sending to a receiver at a given distance in an indoor environment. Figure 3.2 shows a situation where a transmitter sends 8 packets/s to a receiver 15 feet away for the first twenty minutes. Note that the mean is about 20%, but the fluctuations range from ±20% to ±10%, using a sample size of 240 packets. Although there is no observable interference, link quality varies a lot within such a short period. After the first 20 minutes, the same pair of nodes was placed 8 feet apart and remained stationary for more than four hours. Link quality again undergoes abrupt changes in these four hours; it exhibits a mean of about 65% with about a ±10% swing, using a sample size of 240 packets. Despite the fluctuations, this mean and the fluctuation swing remain relatively stable over the course of the experiment. This implies that once a link is characterized, the characterization can predict future link quality over a relatively long time window, provided there is no observable interference from other traffic. However, this is not true in all cases. In some instances, the mean and the degree of variation in link quality change so much over time that link characterization needs to adapt to these changes quickly. Figure 3.3 shows the link quality of a pair of nodes deliberately placed close to the end of their communication range in an indoor laboratory environment. Link quality varies from 0% up to 70% over a period of 7 hours. This evidence suggests that even when nodes are immobile and no observable influences occur in the physical environment, link quality can vary significantly over time. As a result, agility in link estimation becomes an important metric in our study of link estimators in the next chapter.

[Figure 3.2 plot: reception probability (0 to 0.9) versus time (minutes, 0 to 300).]

Figure 3.2: After 20 minutes, the sender is moved from 15 ft to 8 ft from the receiver and remains stationary for four hours.
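The mean-and-swing figures quoted for these experiments (for example, a mean of about 65% with a ±10% swing over 240-packet samples) can be computed from a loss trace by splitting it into consecutive, non-overlapping windows. This sketch assumes a 0/1 trace, drops any incomplete final window, and defines the swing as the largest deviation of a window's rate from the overall mean; the function names and the swing definition are assumptions.

```python
from typing import List, Tuple


def window_rates(trace: List[int], window: int = 240) -> List[float]:
    """Reception rate of each complete, non-overlapping window of the trace."""
    return [sum(trace[i:i + window]) / window
            for i in range(0, len(trace) - window + 1, window)]


def mean_and_swing(rates: List[float]) -> Tuple[float, float]:
    """Mean window rate and the largest deviation of any window from it."""
    mean = sum(rates) / len(rates)
    swing = max(abs(r - mean) for r in rates)
    return mean, swing


# Tiny example with 4-packet windows instead of 240
rates = window_rates([1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1], window=4)
print(rates)                  # [0.75, 0.5, 1.0]
print(mean_and_swing(rates))  # (0.75, 0.25)
```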

3.1.3 Obstructions and Mobility

In many cases, obstruction by a moving object, such as a person, can affect the quality of links between nodes. We attempt to capture such effects by measuring how the packet reception rate changes when a person stands in the vicinity of the receiver. Figure 3.4 shows the result of an experiment in which a person deliberately stands beside the receiver for about five minutes (from minute 15 to minute 20). The reception probability is very sensitive to the person’s position, with discrete changes of substantial magnitude: at some times the person blocks communication, at other times there is no effect, and at still others reception actually improves. These traces are from an indoor environment; however, our outdoor traces show similar behavior.

Figure 3.3: Link quality variation over a 7 hour period in an indoor laboratory environment.

Figure 3.5 shows a more complete scenario that involves moving a sender and receiver pair to different distances. Although mobility is not expected in many sensor networks, it is easy to envision situations where nodes are moved by external forces from the environment, as when wind or moving objects are being sensed. Again, we observe how the packet reception rate of a link changes as we deliberately move the sending node to different distances from the receiver. The experiment begins with the sender placed 14 feet from the receiver. After 9.5 minutes, the sender is placed 8 feet from the receiver. At 17.5 minutes, it is placed 4 feet from the receiver. At 21 minutes, it is moved back to 12 feet from the receiver. Finally, at 26.5 minutes, it is placed 4 feet from the receiver again. The results show a strong correlation between link quality and the distance between the two nodes.

Figure 3.4: Obstruction effects on packet loss behavior. A person deliberately stands beside the receiver in the interval 15-20 minutes.

These experiments reveal instances of abrupt changes and substantial variations in link quality between a pair of nodes. Such instances reinforce the need for link estimation to track these changes quickly and accurately in the derived connectivity graph, so that the routing process operating over that graph can become aware of and adapt to them.

Figure 3.5: Movement effects on packet loss behavior. The transmitter is deliberately moved to different distances at various times.

3.1.4 Irregular Connectivity Cell

So far, our observations have focused on connectivity between a pair of nodes. The lossy nature of the links is best seen by observing a typical connectivity cell over a set of nodes in a two-dimensional field. Figure 2.3 shows such a cell, illustrating how the packet reception probability of a sender falls off over a 150-node network deployed as a grid on an open tennis court. The experiment uses the RFM radio with a power setting of 70. As seen from the graph, the connectivity cell is very irregular, with the effective region covering a much smaller area than the transitional region. Furthermore, there exist many nodes whose probability of reception is less than 20%; these nodes would be treated as neighbors at the protocol level if the boolean connectivity assumption were used. Not shown is the degree of link quality asymmetry, which is expected to be significant in the transitional region.

3.1.5 Implications: Connectivity and Hop-Count

An important implication of this section is a new perspective on the definition of connectivity and a new understanding of communication range and hop-count. The lossy data reported in this section led us to define connectivity relative to link estimation. That is to say, without knowing the actual link quality, connectivity is meaningless. For example, a link with a loss rate above 95% is not useful at all. In the process of building a derived connectivity graph for routing purposes, therefore, the link quality of each edge must be defined for the graph to be meaningful.

With this probabilistic view of connectivity, we can provide a better definition of communication range. Our data shows that the communication range indeed consists of three distinct regions: effective, transitional, and clear. A conservative approach would take the communication range to be the effective region, where the link qualities of all links in that region are above 90% in both directions. As shown in our data, this region is much smaller than the observed connectivity cell.
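The conservative definition of communication range can be made concrete as a small computation over measured links. The sketch below assumes each link is recorded as a hypothetical (distance, forward reception probability, reverse reception probability) tuple and uses the 90% bidirectional threshold stated above; the function name and the sample data are invented for illustration.

```python
from typing import Iterable, Tuple

# One measured link: (distance_ft, forward_prr, reverse_prr).
# The values used below are made up for illustration only.
Link = Tuple[float, float, float]

def effective_range(links: Iterable[Link], threshold: float = 0.9) -> float:
    """Largest measured distance d such that every link at distance <= d
    has reception probability >= threshold in BOTH directions."""
    links = list(links)
    # The nearest link that fails in either direction bounds the region.
    failing = [d for d, fwd, rev in links if min(fwd, rev) < threshold]
    limit = min(failing) if failing else float("inf")
    # Every link strictly closer than `limit` passes by construction.
    return max((d for d, _, _ in links if d < limit), default=0.0)

links = [(2, 0.99, 0.98), (4, 0.97, 0.95), (6, 0.93, 0.91),
         (8, 0.95, 0.60), (10, 0.50, 0.40)]
print(effective_range(links))  # 6
```

Using the nearest failing link as the boundary, rather than scanning links in sorted order, keeps the result correct when several links share the same distance.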

The definition of a hop-count becomes more complicated. The usual concept is to bind hop-count to connectivity. However, when connectivity is probabilistic, the concept of a hop-count needs to be revisited. One could define every node with any physical connectivity as a one-hop neighbor. However, many of these one-hop neighbors would be far away and have very unreliable links. Once they are considered neighbors, they are attractive for routing since they may yield the shortest routing paths. This is why routing protocols that simply rely on hearing a message to define connectivity perform poorly. Therefore, we treat the nodes that a node hears only as potential neighbors. Following our probabilistic perspective on connectivity, one can define a one-hop neighbor relative to link estimation; for example, as a node whose link quality is above some threshold. This in effect creates a logical connectivity graph that the neighborhood management process is responsible for maintaining. We will revisit the concept of a hop-count as we discuss the neighborhood management process in Chapter 5 and the routing process in Chapter 6.
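The distinction between potential neighbors (any node heard) and one-hop neighbors defined relative to link estimation can be sketched as two filters over a hypothetical table of estimated link qualities. The 0.75 threshold and the sample values are assumptions chosen only for illustration; the text does not fix a threshold here.

```python
from typing import Dict, Set

def potential_neighbors(heard: Dict[int, float]) -> Set[int]:
    """Every node heard at least once (hearing-based connectivity)."""
    return set(heard)

def logical_neighbors(heard: Dict[int, float], threshold: float) -> Set[int]:
    """Only nodes whose estimated link quality meets the threshold."""
    return {node for node, quality in heard.items() if quality >= threshold}

# node id -> estimated reception probability (illustrative values)
heard = {1: 0.95, 2: 0.40, 3: 0.05}
print(sorted(potential_neighbors(heard)))      # [1, 2, 3]
print(sorted(logical_neighbors(heard, 0.75)))  # [1]
```

Node 3, a clear-region node heard only occasionally, would be a one-hop neighbor under hearing-based connectivity but is excluded once a quality threshold is applied.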

3.2 Modeling the Observed Link Characteristics

The main objective of this section is to model the essence of the time-varying characteristics of the three-region connectivity structure seen in the previous section, as it applies to a large field of nodes. Capturing these behaviors is an important step towards making simulations of the packet loss dynamics of real networks possible for protocol design and evaluation. Rather than adopting detailed models that explain the complex sources of packet loss in a real network, we abstract these complexities with a probabilistic link behavior model for simulation, built from empirically collected traces.

We first compute the packet loss mean and variance from the traces collected for Figure 3.1 to create a link quality model with respect to distance. For each directed node pair at a given distance, we assign a link quality (packet loss probability) based on the mean and variance extracted from the empirical data, assuming that the variation around the mean follows a normal distribution. An instance showing how this model captures a node's connectivity cell is shown in Figure 3.6; it matches well with the spatial irregularity shown in the empirical observations in Figure 2.3. This model of packet loss characteristics is used in all simulation studies in this thesis.

[Figure 3.6 plot: contours of reception probability (0.1 to 0.9) around the sender on a 10-by-10 grid; axes: grid X coordinate and grid Y coordinate]

Figure 3.6: Cell connectivity of a node in a grid with 8-foot spacing as generated by our link quality model.
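The link quality model can be sketched as follows, assuming a hypothetical table of the mean and standard deviation of reception probability versus distance (the real values come from the Figure 3.1 traces and are not reproduced here). The model above is phrased in terms of loss probability; the sketch works with reception probability, its complement. Each directed pair receives an independent normal draw, clipped to [0, 1], which is what produces asymmetric links in the generated cell. Linear interpolation between tabulated distances is our assumption, not necessarily what the thesis used.

```python
import bisect
import math
import random
from typing import Dict, List, Tuple

# Hypothetical distance profile: (distance_ft, mean, std) of reception probability.
# Placeholder numbers; the thesis derives the real ones from the Figure 3.1 traces.
PROFILE: List[Tuple[float, float, float]] = [
    (0.0, 1.00, 0.00),
    (8.0, 0.95, 0.03),
    (16.0, 0.70, 0.15),
    (24.0, 0.30, 0.20),
    (32.0, 0.05, 0.05),
    (40.0, 0.00, 0.00),
]
DISTANCES = [d for d, _, _ in PROFILE]

def mean_std(distance: float) -> Tuple[float, float]:
    """Linearly interpolate (mean, std) of link quality at a distance."""
    if distance >= DISTANCES[-1]:
        return PROFILE[-1][1], PROFILE[-1][2]
    i = bisect.bisect_right(DISTANCES, distance)  # DISTANCES[i-1] <= d < DISTANCES[i]
    d0, m0, s0 = PROFILE[i - 1]
    d1, m1, s1 = PROFILE[i]
    w = (distance - d0) / (d1 - d0)
    return m0 + w * (m1 - m0), s0 + w * (s1 - s0)

def link_quality(distance: float, rng: random.Random) -> float:
    """Draw one directed link's reception probability, clipped to [0, 1]."""
    mean, std = mean_std(distance)
    return min(1.0, max(0.0, rng.gauss(mean, std)))

def assign_links(positions: Dict[int, Tuple[float, float]],
                 rng: random.Random) -> Dict[Tuple[int, int], float]:
    """Independent draw per DIRECTED pair, so (i, j) and (j, i) may differ."""
    quality = {}
    for i, (xi, yi) in positions.items():
        for j, (xj, yj) in positions.items():
            if i != j:
                quality[(i, j)] = link_quality(math.hypot(xi - xj, yi - yj), rng)
    return quality

# 10 x 10 grid with 8-foot spacing, as in Figure 3.6.
grid = {10 * r + c: (8.0 * c, 8.0 * r) for r in range(10) for c in range(10)}
links = assign_links(grid, random.Random(42))
```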

3.3 Binomial Approximation of Stationary Packet Loss Dynamics

The previous section illustrates how empirical traces are used to assign the distribution of packet losses over distance. In this section, we investigate whether a binomial approximation, or “coin-flipping”, is adequate to capture the instantaneous variations of packet loss, given a fixed average packet loss behavior. Since the outcome of a packet reception is either a loss or a success, a simple model is to treat each packet reception as a Bernoulli trial, with 1 denoting a success and 0 denoting a loss, where p is the probability of success and 1 - p is the probability of loss. For each link, the value of p or 1 - p can be obtained through the packet loss probability model discussed above. The assumption of this binomial approximation is that the trials are independent and identically distributed. In reality, this may not be the case, but we can compare the approximation with the empirically collected traces and evaluate whether it is valid.
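A minimal sketch of the coin-flipping model, assuming a fixed success probability p for a link: each packet is an independent Bernoulli trial, and the windowed reception rate is the quantity that would be compared against empirical traces. The function names and the 30-packet window are illustrative choices.

```python
import random
from typing import List

def bernoulli_trace(p: float, n: int, rng: random.Random) -> List[int]:
    """n independent packet receptions: 1 (success) with probability p, else 0 (loss)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def windowed_rate(trace: List[int], window: int) -> List[float]:
    """Reception rate over consecutive, non-overlapping windows of `window` packets."""
    return [sum(trace[k:k + window]) / window
            for k in range(0, len(trace) - window + 1, window)]

# A link with p = 0.65, observed in windows of 30 packets (window size is arbitrary here).
rates = windowed_rate(bernoulli_trace(0.65, 2400, random.Random(1)), 30)
```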

To investigate whether packet loss in our traces follows the binomial distribution when there is no observable physical influence, we plot quantiles extracted from the stationary portion of our data in Figure 3.2 against quantiles derived from the theoretical binomial distribution. The expected value, or average packet success rate, of the data set is 65%, so we set the expected value of the binomial distribution to this value. The resulting quantile-quantile graph is shown in Figure 3.7. Here, the quantile at a fraction q is the value below which a fraction q of the points fall. If the data in Figure 3.2 follows the binomial distribution, the points should lie along the 45-degree line.
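The quantile-quantile comparison can be sketched with the Python standard library alone. This is an illustration, assuming the empirical series consists of per-window success rates, each computed over a window of n packets; the window length behind Figure 3.7 is not restated here, so n is a free parameter. The theoretical quantiles come from Binomial(n, p) with p set to the measured mean (0.65 for this trace).

```python
import math
from typing import List, Sequence, Tuple

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def binom_quantile(q: float, n: int, p: float) -> int:
    """Smallest k such that P(X <= k) >= q."""
    for k in range(n + 1):
        if binom_cdf(k, n, p) >= q:
            return k
    return n  # guards against floating-point shortfall when q is close to 1

def empirical_quantile(data: Sequence[float], q: float) -> float:
    """Smallest sample value with at least a fraction q of samples at or below it."""
    xs = sorted(data)
    return xs[max(0, math.ceil(q * len(xs)) - 1)]

def qq_pairs(rates: Sequence[float], window: int, p: float,
             levels: Sequence[float]) -> List[Tuple[float, float]]:
    """(theoretical %, empirical %) pairs, to be plotted against the line y = x.
    `rates` are per-window success rates in [0, 1], each over `window` packets."""
    return [(100.0 * binom_quantile(q, window, p) / window,
             100.0 * empirical_quantile(rates, q))
            for q in levels]
```

Points falling below the line at low quantiles and above it at high quantiles indicate that the empirical data is more spread out than the binomial model predicts.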

Figure 3.7 shows a good match when the quantile is near the mean, but a slight deviation at both extremes. This suggests that the empirical data has a larger variance than the binomial model. Nonetheless, Figure 3.7 suggests that the binomial distribution is a fairly good model for approximating the instantaneous dynamics of packet loss. Furthermore, the binomial distribution is also consistent with the macroscopic behavior observed in empirical link quality variations: variation appears significant when packet loss is around 50%, while it is minimal at both extremes (0% and 100%).
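The observation that variation is largest near 50% follows directly from the binomial model. As a short derivation, assume the reception rate is estimated over a window of n packets, each an independent Bernoulli(p) trial (the window length n is a free parameter here, not a value from our experiments):

```latex
\hat{p} = \frac{1}{n}\sum_{k=1}^{n} X_k, \qquad X_k \sim \mathrm{Bernoulli}(p) \text{ i.i.d.}

\mathbb{E}[\hat{p}] = p, \qquad \operatorname{Var}[\hat{p}] = \frac{p(1-p)}{n}

\frac{d}{dp}\,\frac{p(1-p)}{n} = \frac{1-2p}{n} = 0 \;\Rightarrow\; p = \tfrac{1}{2},
\qquad \operatorname{Var}[\hat{p}]\big|_{p=1/2} = \frac{1}{4n},
\qquad \operatorname{Var}[\hat{p}]\big|_{p \in \{0,1\}} = 0
```

For example, with a hypothetical window of n = 100 packets, the standard deviation of the windowed rate is sqrt(0.25/100) = 0.05 at p = 0.5, but only sqrt(0.09/100) = 0.03 at p = 0.9.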

[Figure 3.7 plot: "Empirical Data and Binomial Distribution Quantile-Quantile Plot"; quantile of the empirical data (40 to 90) versus quantile of the theoretical binomial distribution (40 to 90), with the quantile-quantile reference line y = x]

Figure 3.7: Quantile of empirical data against quantile of binomial distribution.

3.4 Synthetic Trace Generation

In this section, we extend our link quality model so that we can synthetically generate packet traces that resemble empirical traces, as a means of initial evaluation. This ability allows us to evaluate protocols or link estimators using mostly synthetic traces, over which we have the full control and information required to drive systematic studies. The previous sections allow us to model packet loss dynamics for a given loss probability. To model changes of link quality resulting from mobility or obstacles at the receivers, we make the reception (success) probability p a piecewise function of time, p(t). To generate a synthetic trace similar to that in Figure 3.5, we define p(t) as the sequence of steps shown in Table 3.1. These values are chosen by partitioning the trace in Figure 3.5 into five regions and, within each region, matching the average of the reception probabilities measured over 30-second intervals.

Figure 3.8: Time series comparison of empirical traces with simulated traces.

t (minutes)     p(t)
0 - 9.5         2.63%
9.5 - 17.5      46.57%
17.5 - 21       83.40%
21 - 26.5       28.22%
26.5 - 30       91.18%

Table 3.1: Definition of p(t) to model Figure 3.5

The resulting trace derived from p(t) using the binomial approximation is surprisingly close to the empirical trace, as shown in Figure 3.8. The simulated trace captures the essence of the empirical trace, except for a smaller degree of variance, which reflects the binomial approximation's tendency to underestimate the variance of the empirical data. This form of synthetic trace generation is used heavily in evaluating the different link quality estimators in Chapter 4.
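Table 3.1 translates directly into a trace generator. The sketch below assumes one packet slot every 1/rate seconds and a rate of 8 packets/s (the data rate quoted for the empirical trace in Chapter 4); both are parameters, and the 30-second averaging mirrors how the table values were derived. The function names are hypothetical.

```python
import random
from typing import List, Tuple

# Table 3.1: (start_min, end_min, reception probability p(t)).
P_OF_T: List[Tuple[float, float, float]] = [
    (0.0, 9.5, 0.0263),
    (9.5, 17.5, 0.4657),
    (17.5, 21.0, 0.8340),
    (21.0, 26.5, 0.2822),
    (26.5, 30.0, 0.9118),
]

def p_at(t_min: float) -> float:
    """Piecewise-constant reception probability at time t (minutes)."""
    for start, end, p in P_OF_T:
        if start <= t_min < end:
            return p
    raise ValueError(f"t = {t_min} min is outside the table")

def synthetic_trace(rate_pps: float, rng: random.Random) -> List[int]:
    """One Bernoulli trial per packet slot; packet k is sent at k / rate_pps seconds."""
    n = int(P_OF_T[-1][1] * 60 * rate_pps)
    return [1 if rng.random() < p_at(k / rate_pps / 60.0) else 0 for k in range(n)]

def reception_series(trace: List[int], rate_pps: float,
                     interval_s: float = 30.0) -> List[float]:
    """Average reception probability over consecutive 30-second intervals."""
    w = int(interval_s * rate_pps)
    return [sum(trace[i:i + w]) / w for i in range(0, len(trace) - w + 1, w)]

trace = synthetic_trace(8.0, random.Random(7))  # 8 packets/s is an assumed rate
series = reception_series(trace, 8.0)           # 60 points over 30 minutes
```

Plotting `series` against time gives a curve of the same shape as the simulated trace in Figure 3.8; because every trial is independent, its interval-to-interval variation is that of the binomial model.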


3.5 Effective Channel Capacity: Single and Multihop

Another important issue that we need to understand about the link layer is the

difference between the channel bit rate, as defined by physical hardware capability, and

the effective channel bandwidth, as defined by the performance of the media access control

(MAC) layer when multiple nodes share and contend for the same wireless channel. Since the

channel is normally shared among different nodes in a common connectivity cell, only one

transmitter can access the channel and send at a given time; otherwise, packet collisions

will occur. The goal of the MAC layer is to arbitrate such channel accesses among the

different senders to avoid collisions. As a result, in order to quantify the actual deliverable

bandwidth at the link layer under heavy traffic conditions from multiple senders, we need

to measure it explicitly for the two Mica platforms. Both platforms use a similar CSMA

MAC layer as discussed in the previous chapter.

Figure 3.9 shows how the channel utilization changes as the offered load increases by adding more transmitting nodes. Each node sends periodically at 10 packets/s. Channel utilization peaks when the offered load is about 30 packets/s, where it reaches about 75%. As the number of nodes increases to 4, the offered load reaches 40 packets/s, which is close to the theoretical capacity, and significant backoff is seen as a result: the effective bandwidth drops back to about 20 packets/s (about 50% utilization).

[Figure 3.9 plot: "Offered Load vs. Aggregate Delivered BW in One Cell"; aggregated delivered bandwidth (packets/s) versus number of nodes (1 to 4), each with an offered load of 10 packets/s]

Figure 3.9: Channel capacity of the Mica/RFM platform using TinyOS 1.0 radio stack.

Figure 3.10 shows the channel capacity for the ChipCon radio on the Mica 2 platform, illustrating a series of improvements in channel utilization under different offered loads, starting from TinyOS-1.1. The new, improved B-MAC [61] not only increases the channel capacity from 42 packets/s to about 50 packets/s, but also achieves 85% channel utilization under congested traffic. Furthermore, channel utilization appears to be sustained at the same level, without much degradation, as the offered load increases.

For multihop traffic, it is important to realize that the effective bandwidth available is only 1/3 of the single-hop utilization measured above. This is a theoretical limit, because each multihop packet occupies a communication cell three times during forwarding: from a child to its one-hop parent, from the one-hop parent to the child's two-hop parent, and finally, from the two-hop parent to the three-hop parent. All three transmissions fall within the one-hop parent's cell, so the packet occupies that cell three times. If the path has more hops, the packet occupies the communication cell of the parent at each hop three times, and thus the effective bandwidth is reduced to 1/3. As a result, bandwidth is a tight resource for multihop traffic.

Figure 3.10: Channel capacity of the Mica2/Chipcon platform using different versions of the TinyOS radio stack.
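The arithmetic behind these utilization figures and the 1/3 multihop bound is simple enough to check directly. This sketch takes a capacity of roughly 40 packets/s for Mica/RFM, implied by a 30 packets/s peak corresponding to 75% utilization; the constant and function names are hypothetical.

```python
RFM_CAPACITY_PPS = 40.0  # approximate capacity implied by the Mica/RFM numbers in the text

def utilization(delivered_pps: float, capacity_pps: float = RFM_CAPACITY_PPS) -> float:
    """Fraction of the raw channel capacity actually delivered."""
    return delivered_pps / capacity_pps

def multihop_bound(single_hop_pps: float, cell_occupancy: int = 3) -> float:
    """Upper bound on end-to-end multihop throughput through one cell, when each
    forwarded packet occupies that cell `cell_occupancy` times."""
    return single_hop_pps / cell_occupancy

peak, congested = 30.0, 20.0  # packets/s, from the Figure 3.9 discussion
print(utilization(peak), utilization(congested))                  # 0.75 0.5
print(multihop_bound(peak), round(multihop_bound(congested), 2))  # 10.0 6.67
```

Even at the single-hop peak, a cell along a multihop path can sustain at most about 10 packets/s of forwarded traffic on this platform.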

3.6 Received Signal Strength and Link Quality

One of the interesting link layer characteristics related to link quality is received signal strength. It would be very attractive if link quality could be reliably inferred simply by measuring the received signal strength of each packet received. Theoretically, the bit error rate (BER) is expected to correlate directly with the received signal-to-noise ratio of the packet, and the packet error rate is a function of the BER and the coding. A measure similar to the signal-to-noise ratio that can be obtained on the current Mica platforms is the received signal strength indicator (RSSI). To determine whether RSSI values can be used as an indicator of link quality, we need to collect experimental data.


The RFM radio on the Mica platform does not directly support RSSI measurements; the baseband signal must be measured indirectly to infer RSSI values. However, the CC1000 radio provides fairly accurate RSSI measurements by default. Therefore, we performed our study with the Mica 2 platform. The experiments were done over an open grass field, with 20 nodes deployed 2 meters apart in a line topology. Each node took its turn to transmit 200 packets, one every 250 ms, while all other nodes listened and collected link reliability statistics. All nodes were situated 3 inches above the ground.

Figure 3.11 shows the relationship between average RSSI values and link reliability from our experiments; each data point represents a link in one direction. Note that lower RSSI values on the mote mean stronger received signal strength. The data shows that if RSSI values are below a threshold of around 300, link quality is very good, at about 90% reliability. In general, the graph shows good correlation between RSSI and link quality. However, the circled region reveals that some links end up having zero reliability because of failing CRC checksums, even though their RSSI values are below the 300 threshold. These links are not asymmetric links; they lie in the clear region, as only a few packets were received and no packets were received at all in the opposite direction. If we used 300 as our RSSI threshold to infer reliable links, these links would yield false positives because they in fact have zero reliability. One may argue that if a stronger threshold, such as 200, is used, weak links will be filtered out and the remaining links will be reliable. We agree that a stronger threshold may achieve this goal; however, when traffic is not controlled as in an experiment, collisions can affect link quality even when the received signal is very strong. Figure 3.12 shows that under other traffic conditions, the reception probability of a link drops below 10% even though the received signal strength stays relatively strong and stable. All in all, these results show that RSSI provides a good hint of link reliability, but unreliable links with strong RSSI values do occur. Furthermore, as with any other threshold-based selection, it is difficult to find a generic threshold that works in all cases. Nevertheless, a strong RSSI value is certainly a useful hint of a potentially reliable link, and it can serve as a mechanism for quickly selecting reliable links in higher-level protocols; one example of this usage is discussed in [71].

Figure 3.11: Relationship of RSSI signal strength and link quality on the Mica2/Chipcon platform.
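A threshold rule of this kind, and the false positives it admits, can be sketched as below. The data points are invented to mirror the circled region in Figure 3.11 (strong RSSI, zero reliability), and the 0.9 reliability cut-off used to call a link reliable is an assumption for illustration.

```python
from typing import List, Tuple

# (mean RSSI reading, measured reliability) for one directed link.
# On the Mica2 mote, a LOWER reading means a STRONGER signal.
RssiLink = Tuple[float, float]

def predicted_reliable(rssi: float, threshold: float = 300.0) -> bool:
    """RSSI rule: accept the link if its reading is below the threshold."""
    return rssi < threshold

def false_positives(links: List[RssiLink], threshold: float = 300.0,
                    min_reliability: float = 0.9) -> List[RssiLink]:
    """Links the RSSI rule accepts but whose measured reliability is poor."""
    return [(rssi, rel) for rssi, rel in links
            if predicted_reliable(rssi, threshold) and rel < min_reliability]

# Illustrative data, including one strong-RSSI link that fails its CRC checks.
links = [(150, 0.98), (240, 0.93), (280, 0.0), (350, 0.40)]
print(false_positives(links))                 # [(280, 0.0)]
print(false_positives(links, threshold=200))  # []
```

Tightening the threshold removes this false positive but also rejects the good link at 240, and no threshold guards against collision-induced losses like those in Figure 3.12.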

3.7 Related Work

Packet loss characteristics in sensor networks have also been studied extensively in other research efforts. A thorough study of the link characteristics of the Mica/RFM platform in indoor, outdoor, and habitat environments appears in [82]. The loss characteristics over distance are similar to what we observed, with a large portion of the communication range (the transitional region) consisting of links with large variations in reliability and in the degree of asymmetry. Furthermore, that study showed that these characteristics persist across the different environments studied, and that forward-error-correction coding does not help to reduce the ratio of unreliable links in the gray (transitional) region. Its authors also found that received signal strength is not a good indicator of link quality. Similar results on packet loss versus distance are also documented in [20]. In another experimental study, done by [17] on both the Mica and Mica2 platforms across different environments, the time variations of packet loss are also similar to our findings. These studies concluded that imperfect hardware calibration across the different radios and antennas on each node is likely the main reason for the wide variations in link quality and asymmetry in the transitional region.

[Figure 3.12 plot, two panels sharing a time axis (0 to 600 minutes): reception probability (0 to 1), and RSSI values (0 to 200)]

Figure 3.12: Example showing strong RSSI values may not be a good indicator for link quality.

There exists extensive prior work on modeling loss characteristics in various wireless networks. For example, [33] used a trace-based approach to modeling wireless errors, and [13] collected error traces on WaveLAN and developed a Gilbert model for packet losses. In addition, [8] collected GSM traces and created a Markov-based channel model. The loss characteristics observed in these experimental studies differ from the results that we observed on our platform. Much of the WaveLAN and GSM trace collection was done over one or a few pairs of nodes. Without a large number of nodes collecting packet loss traces, these traces do not reflect lossy characteristics such as the extent of the transitional region, which we observed in our experiments. Furthermore, these sophisticated wireless platforms potentially react very differently to background interference, environmental effects, and mobility than our low-power radios do. Nonetheless, we draw upon the methodology developed in these studies to build an empirical characterization of our regime and to study how well the established techniques carry over.

3.8 Connectivity: A Probabilistic Perspective

In this chapter, we have shown, through many empirical measurements, that wireless connectivity is far from a circular-disc model and that it is much more appropriate to take a probabilistic perspective. That is, connectivity should be defined relative to link estimation. Not only does a connectivity cell fall off irregularly, but the communication range can also be classified into three distinct connectivity regions, each with very different link quality characteristics. The relationship among node placement, effective communication range, and RF transmit power is an important one to understand at each deployment site. It should be understood to ensure that there are neighboring nodes in the effective communication region connected by reliable links. As a result, blindly using geographic techniques to establish routes would yield poor routing performance.

It is expected that the majority of the links fall in the transitional region, or gray area, where link quality can vary significantly and link quality asymmetry is common. There also exist nodes in the clear region that have low, but non-zero, connectivity. These complications make it difficult to treat a node as a neighbor simply because a message was heard from it, since such a node can fall in the clear region and the probability of hearing it again can be very low. Furthermore, our empirical results show that link quality can also be time-varying even when nodes are immobile and no observable physical influences are present. Therefore, nodes must have an on-line local process that discovers neighbors and maintains statistics to characterize link quality probabilistically for each neighbor; this is the first local process in our holistic approach, shown in Figure 2.4. This process lays the foundation for the network to discover itself and characterize connectivity, so that a derived connectivity graph can be built with each edge carrying bi-directional link qualities. Together with the neighborhood management process, it builds the connectivity graph. We take this probabilistic perspective all the way up to the routing layer, where a reliable routing topology is built upon such a discovered connectivity graph. This holistic approach is a fundamental design choice that we make in coping with the lossy dynamics found in sensor networks. We will discuss the three local processes in detail in the next three chapters.


Chapter 4

Characterizing Connectivity using Link Estimators

Our empirical study of the wireless characteristics of our sensor networking platform has led us to take a probabilistic perspective on connectivity. That is, connectivity should be defined relative to the link quality obtained through link estimation. Thus, an on-line, distributed link estimation process is an important building block for self-organizing network protocols. Following our holistic approach, reliable multihop routing must be built upon a self-discovered connectivity graph. To create such a graph, each node must locally collect statistical measurements of its connectivity quality with respect to its neighboring nodes. Higher-level protocols can use these statistics to select paths that are efficient and reliable for multihop communication.

Designing such an estimator is not as straightforward as it might seem, because it must strike a balance between stability, agility, and resource usage, as a sensor network is highly resource-constrained. Thus, simplicity and efficiency are the two important design principles that we follow. As a result, we take a passive rather than an active approach to link estimation. We propose a general estimator framework that allows us to consider different kinds of estimation schemes within the same evaluation platform. We describe a set of metrics that are important for evaluating the different estimators, and compare the estimators on these metrics in order to find the best link estimator. We also study the intricate relationships among agility, stability, and the amount of history required, which help us understand the effects of tuning each estimator. With the methodology explained, we define the objectives in tuning the estimators, and present many different candidate estimator designs along with the parameters tuned to meet these objectives. This process allows us to fairly identify the best estimator among our candidates. Related work on link quality estimation is narrow in scope, but abundant; we attempt to give an overview of the different techniques that researchers have used. Finally, we state the limitations of our link estimation approach, and address the implications of these limitations for multihop routing protocols that build upon link estimation.

4.1 Link Estimation as Part of Network Self-Organization

Vast networks of low-power wireless devices, such as sensor networks, raise a family of challenges that are variants of classical problems in a qualitatively new setting. One of these is the question of estimating link loss rates from observed packet arrivals. Traditionally, this problem has been addressed in contexts such as determining the degree of error-coding redundancy or the expected number of retransmissions on a link. In sensor networks, the problem arises as part of network self-organization, as nodes must discover neighbors and estimate link qualities among them in order to build a connectivity graph for higher-level processes such as routing.

To realize our probabilistic view of connectivity, each node keeps track of a set of nodes that it hears, either through packets addressed to it or packets it snoops off the channel, and builds a statistical model of the connectivity to and from each of them. Thus, at each node j we would like to obtain a good estimate of P_ij(t), the probability that a packet from node i is received by node j at time t, from the packets j hears (and does not hear), so that we can use it to define the weight of each edge in a connectivity graph. Maintaining many local link success (or loss) rate estimates is essential for self-organization into multihop routing topologies in sensor networks. However, there are several challenges. The storage capacity of the nodes is very constrained and their processors are not very powerful, so the estimators must use very little space and be simple to compute. Furthermore, it is not sufficient for the estimator to converge eventually, since the link status changes fairly quickly. We want the estimator to be agile and to have a small settling time, so that route selection can adapt reasonably quickly to changes in the underlying connectivity. However, there are also transient variations in the link, and we want a stable estimator so that the routing topology does not change chaotically. Moreover, fluctuations and errors in the estimate may introduce (temporary) cycles in the routing graph due to inconsistent partial information used in route selection. These desires are clearly in conflict: stable estimators tend to be less agile and agile estimators tend to be less stable, especially ones that are inexpensive to compute.
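The agility/stability conflict can be illustrated with a generic exponentially weighted moving average (EWMA), used here only as a familiar example and not as a claim about which estimators this chapter evaluates. Agility is measured as the number of samples needed to settle after a step change in link quality; stability as the steady-state standard deviation of the estimate when fed i.i.d. Bernoulli samples. The names and the 0.1 settling tolerance are assumptions.

```python
import math

def ewma_step(estimate: float, sample: float, history_weight: float) -> float:
    """Generic EWMA update; history_weight in [0, 1) is the weight kept on the old estimate."""
    return history_weight * estimate + (1.0 - history_weight) * sample

def settling_steps(history_weight: float, tolerance: float = 0.1) -> int:
    """Samples needed to get within `tolerance` of a step from 0 to 1 (agility)."""
    if not 0.0 <= history_weight < 1.0:
        raise ValueError("history_weight must be in [0, 1)")
    estimate, steps = 0.0, 0
    while 1.0 - estimate > tolerance:
        estimate = ewma_step(estimate, 1.0, history_weight)
        steps += 1
    return steps

def steady_state_std(history_weight: float, p: float) -> float:
    """Std. dev. of the EWMA fed i.i.d. Bernoulli(p) samples (stability):
    Var = (1 - a) / (1 + a) * p * (1 - p), with a = history_weight."""
    a = history_weight
    return math.sqrt((1.0 - a) / (1.0 + a) * p * (1.0 - p))

for a in (0.5, 0.9):
    print(a, settling_steps(a), round(steady_state_std(a, 0.5), 3))
# 0.5 4 0.289
# 0.9 22 0.115
```

A heavier history weight settles more slowly (22 samples versus 4) but fluctuates much less (0.115 versus 0.289 at p = 0.5), which is exactly the tension described above.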

The constraints and conflicts discussed above motivate us to investigate the behavior of a wide range of link estimators in the context of low-power wireless connectivity for the purpose of multihop route selection. The basis for estimation is the sequence of packets that a node observes, so we can view this as a series of binary events over time. We only observe the 1s directly: the arrival of a packet. However, when we receive a packet, we can infer the intervening 0s from the sequence number. Of course, if we stop hearing from a node, the 0s are silent; in general, we cannot know the expected packet rate from a node, even if we know its sample rate, since it may be performing local data compression as well as routing traffic for other nodes. Additional measures are therefore required to estimate silent losses, and these should be incorporated into the design of link quality estimators in general.

Our proposed link estimation process only yields an in-bound quality estimate, because only packet reception statistics are collected. However, the out-bound link quality estimate (the success rate of a node's packets as received by neighboring nodes) is equally important, since it measures the success rate of forwarding a packet from a node to each of its neighbors and reveals whether link asymmetry occurs. Therefore, a simple and efficient mechanism is required for nodes to obtain out-bound link quality estimates from their neighbors. We discuss the details later in this chapter.

4.2 Estimator Design Framework and Methodology

Our goal is to design a link reliability estimator that is responsive yet stable, reasonably accurate, simple with little computational requirement, and memory efficient. While there are potentially many estimation approaches, we focus on those that passively snoop on packet arrivals and maintain statistics to estimate link quality. Such an approach can be generalized into a framework, as shown in Figure 4.1, into which different kinds of estimation techniques can be fitted and evaluated using the same input and output format.

The inputs to the framework include external events from packet arrivals, M, and internal periodic timer events, T. We assume that each packet contains a source ID and a link sequence number. Since a lost packet does not generate any message arrival event, we can only infer packet loss events from gaps in the link sequence number. Therefore, a packet arrival event M is equivalent to signaling zero or more packet loss events followed by a packet success event. If we denote successes as 1s and losses as 0s, M always consists of zero or more 0s followed by a single 1.
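A minimal sketch of how an M event could be derived from link sequence numbers. The 8-bit wrap-around sequence space and the class name are assumptions; duplicates are ignored rather than counted, and a large reordering would be misread as many losses, which a real estimator would need to guard against.

```python
from typing import Dict, List

SEQ_MODULO = 256  # hypothetical 8-bit link sequence number that wraps around

class ArrivalExpander:
    """Turns each packet arrival into an M event:
    zero or more inferred losses (0s) followed by one success (1)."""

    def __init__(self) -> None:
        self.last_seq: Dict[int, int] = {}  # source ID -> last sequence number heard

    def on_arrival(self, src: int, seq: int) -> List[int]:
        prev = self.last_seq.get(src)
        if prev is None:               # first packet from this source: no gap to infer
            self.last_seq[src] = seq
            return [1]
        gap = (seq - prev) % SEQ_MODULO
        if gap == 0:                   # duplicate: not a new arrival event
            return []
        self.last_seq[src] = seq
        return [0] * (gap - 1) + [1]   # gap - 1 packets were missed in between

e = ArrivalExpander()
print(e.on_arrival(7, 10), e.on_arrival(7, 11), e.on_arrival(7, 14))
# [1] [1] [0, 0, 1]
```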

The periodic timer event provides a synchronous input to the estimator that allows it to better estimate losses when message events are infrequent. For example, if a node were to disappear, no later message events would occur, yet the connectivity estimate should go to zero. One temporal assumption is that higher-layer protocols can provide a minimum message rate, R, for neighboring nodes. If R is known, estimators can safely infer the minimum number of packet losses over the time period T and compensate accordingly. Since the minimum message rate is usually much lower than the actual data rate, if R is not known, a conservative R can be used, such that good links can still be estimated correctly while bad links that are heard infrequently will not be mistaken for good ones.
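A sketch of how the timer event could use an assumed minimum message rate R to account for silent losses. The class is hypothetical, and for simplicity it does not reconcile losses padded at a timer event with the same losses later revealed by a sequence-number gap; a real estimator would need to avoid that double counting.

```python
import math
from typing import List, Tuple

class PeriodCounter:
    """Per-neighbor counts between two T events, padded with silent losses
    using an assumed minimum message rate R (packets/s)."""

    def __init__(self, min_rate_pps: float, period_s: float) -> None:
        self.expected = math.floor(min_rate_pps * period_s)  # at least R*T packets per period
        self.received = 0
        self.lost = 0

    def on_m_event(self, outcomes: List[int]) -> None:
        """Accumulate one M event (zero or more 0s followed by a 1)."""
        self.received += outcomes.count(1)
        self.lost += outcomes.count(0)

    def on_timer(self) -> Tuple[int, int]:
        """T event: return (received, lost) for the period, counting silent losses."""
        silent = max(self.expected - (self.received + self.lost), 0)
        result = (self.received, self.lost + silent)
        self.received = self.lost = 0
        return result

c = PeriodCounter(min_rate_pps=1.0, period_s=10.0)
c.on_m_event([0, 1])
c.on_m_event([1])
print(c.on_timer())  # (2, 8): 3 outcomes observed, 7 silent losses inferred
print(c.on_timer())  # (0, 10): neighbor went silent, so its estimate heads to zero
```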

The above process yields an in-bound estimate of the link quality of neighboring nodes. For out-bound estimation, each node must collect from its neighbors their in-bound estimates of the link from itself. That is, nodes must exchange their in-bound link quality estimates with one another to establish bi-directional link quality estimates. Since such a process is similar to a local broadcast, one efficient approach is to piggyback in-bound link estimates on the minimum-rate traffic that link estimation already requires. Such traffic is often realized as route update messages in routing protocols. We will discuss how this is realized in Chapter 6.

Out-bound link estimates can become stale when nodes are moved, obstructed, or disappear. Therefore, a decay mechanism is required to prune such information as it becomes stale. We choose a binary exponential decay mechanism to age an out-bound link estimate that has not been updated for a period of time defined by the parameter OutBoundDecayWindow. That is, if a neighbor's out-bound link estimate has not been updated for a period of OutBoundDecayWindow, it is halved at each T event until it reaches zero or is updated again. We will revisit the effect of this on routing in Chapter 7.
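A minimal sketch of the decay rule follows. It assumes the out-bound estimate is kept as an 8-bit fixed-point value, so that repeated halving really does reach zero, and that OutBoundDecayWindow is counted in T events. The class name and the exact point at which halving begins are also assumptions.

```python
class OutboundEntry:
    """Out-bound estimate for one neighbor, stored as an 8-bit fixed-point
    value (0..255 maps to 0..100%); this encoding is an assumption."""

    def __init__(self, decay_window):
        self.estimate = 0                  # neighbor's in-bound estimate of our link
        self.decay_window = decay_window   # OutBoundDecayWindow, counted in T events
        self.idle = 0                      # T events since the last update

    def on_update(self, reported):
        """A piggybacked estimate arrived from the neighbor."""
        self.estimate = reported
        self.idle = 0

    def on_timer(self):
        """Periodic T event: halve the estimate once the window has elapsed."""
        self.idle += 1
        if self.idle > self.decay_window:
            self.estimate >>= 1            # binary exponential decay; reaches 0


entry = OutboundEntry(decay_window=2)
entry.on_update(200)
for _ in range(4):
    entry.on_timer()
print(entry.estimate)   # 50: halved at the 3rd and 4th T events
```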

Our high-level evaluation methodology is as follows. Each estimator has a continuous tuning space. To make fair comparisons among different estimators, we pick two meaningful points in the tuning space as our tuning objectives. One tunes for the best agility given a stability target; the other tunes for the best stability given an agility target. Each estimator is tuned to meet these objectives before being compared against the others. We use the simulated trace shown in Figure 3.8 as the trace generator output, M, for the tuning process. The data rate was set to 8 packets/s, the same as that of the empirical trace W(t), also shown in Figure 3.8. We also set R to this value in order to get the best out of these estimators.

[Figure 4.1 diagram: a trace generator driven by p produces message arrival events M. The estimator takes M, periodic timer events T, and the minimum data rate constant R as inputs, and outputs a stable estimation P̂ and an agile estimation P̂.]

Figure 4.1: General framework of passive link estimators.

4.2.1 Metrics of Evaluation

We use a set of metrics to evaluate our estimator designs within the framework of Figure 4.1: settling time, crossing time, mean square error, coefficient of variance, and sum of errors. Recall from Section 3.4 that a synthetic trace can be generated using a step function p(t). Table 3.1 shows a p(t) created from one of the empirical traces, which we call W(t). This p(t) is used to generate the input M indicated in Figure 4.1, and P is the current estimate of p.

The following defines the metrics in greater detail. Settling time is the length of time after a step in p(t) before P comes within ±ε of p(t) and remains within that error bound; we use ε = 10%. Crossing time is the length of time after a step in p(t) before P first comes within ±ε of p(t). Since p is known to us at all times, we can compute the mean square error, the average of $(p - P)^2$. It not only captures the degree of error but also places a higher penalty on large overshoots or undershoots. Coefficient of variance measures how stable an estimator is after reaching steady state. Sum of errors captures whether the estimator is biased, which may lead to systematic errors. Finally, memory resources and computational complexity measure the efficiency and simplicity of each estimator design.

One important point concerns how settling time and crossing time are measured. Since both depend on the packet arrival rate, they are measured in packet opportunities rather than raw elapsed time. If the average packet arrival rate is known, the two metrics can be converted back to the time domain.
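The metrics can be computed from a true step function p and an estimate P sampled once per packet opportunity, starting at the step. In this sketch the ±ε band is taken as an absolute difference in probability. The sign convention for the sum of errors (estimate minus truth) is an assumption, and the function names are hypothetical.

```python
import statistics


def crossing_time(p, P, eps=0.10):
    """Packet opportunities after a step until P first lies within eps of p."""
    for i, (true, est) in enumerate(zip(p, P)):
        if abs(est - true) <= eps:
            return i
    return None


def settling_time(p, P, eps=0.10):
    """Packet opportunities after a step until P enters the band for good."""
    settled_at = None
    for i, (true, est) in enumerate(zip(p, P)):
        if abs(est - true) <= eps:
            if settled_at is None:
                settled_at = i
        else:
            settled_at = None              # left the band: start over
    return settled_at


def mean_square_error(p, P):
    return statistics.fmean((a - b) ** 2 for a, b in zip(p, P))


def sum_of_errors(p, P):
    return sum(b - a for a, b in zip(p, P))    # positive means over-estimation


def coefficient_of_variance(P_steady):
    return statistics.pstdev(P_steady) / statistics.fmean(P_steady)


p = [0.8] * 6
P = [0.5, 0.75, 0.95, 0.78, 0.8, 0.81]
print(crossing_time(p, P), settling_time(p, P))   # 1 3
```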

4.2.2 Error, Stability, and Memory Relationship

It is important to understand the tradeoff among estimation stability, agility, and the amount of history used to generate the estimate. Section 3.3 shows that a binomial distribution can adequately approximate the channel variations over periods in which the average link quality remains roughly constant. Under this independent and identically distributed (i.i.d.) assumption, we can use the central limit theorem to relate the number of samples required to the corresponding error bound on our mean estimate (the link quality) at 95% confidence. From that, we can infer the relationship among error, stability, and potential memory requirements.

By the central limit theorem, for a mean estimate of a Bernoulli process to be within ε of the true mean with 95% confidence, the minimum number of samples n must satisfy $n > \frac{4p(1-p)}{\varepsilon^2}$. Here $p \in [0, 1]$ is the true mean, and the constant 4 approximates $z^2$ for $z = 1.96$. Although this approximation is valid only for large n, several relationships can still be learned from it.

First, the true mean p has a non-linear effect on n. The worst case occurs at p = 0.5, which maximizes n for a given ε. Second, changes in ε have an inverse quadratic effect on n: halving the error requires quadrupling the history size. Third, agility trades off against error for the same reason. Because ε scales as $1/\sqrt{n}$, shrinking the history to make the estimator more agile inflates the error; a history four times shorter doubles ε. Finally, the expression shows that achieving a 10% error requires n > 100 for p = 0.5 (or n > 36 for p = 0.9, if we do not take the worst case).

In general, these relationships imply that many samples, $O(1/\varepsilon^2)$, are required to achieve a stable and accurate estimator. Agile estimation is possible, but the error will be large. We will explore these effects in simulation when we study our candidate estimators in detail.
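The numbers quoted above follow directly from the bound. This small sketch evaluates it and the same relation solved for ε; the function names are hypothetical.

```python
import math


def samples_needed(p, eps):
    """Lower bound on n from n > 4 p (1 - p) / eps**2 (95% confidence, z ~ 2)."""
    return 4 * p * (1 - p) / eps ** 2


def error_bound(p, n):
    """The same relation solved for eps: eps = 2 * sqrt(p (1 - p) / n)."""
    return 2 * math.sqrt(p * (1 - p) / n)


print(round(samples_needed(0.5, 0.10)))   # 100 (worst case)
print(round(samples_needed(0.9, 0.10)))   # 36
print(round(samples_needed(0.5, 0.05)))   # 400 (half the error, 4x the history)
print(round(error_bound(0.5, 25), 2))     # 0.2 (4x less history, 2x the error)
```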

4.2.3 Confidence Interval Approximation

Estimating the confidence interval of the estimate P produced by a tuned estimator can be valuable for higher-level protocols. The typical method is to use the normal approximation to the binomial distribution, which is appropriate when n is large and p is around 0.5 [47]. However, since p can range from 0 to 1, we need to know which technique to use for estimating the confidence interval at different values of p. Results from [47] show that the normal approximation has less than 4% error as long as 0.2 ≤ p ≤ 0.8, since the error is symmetric about p = 0.5. Thus, for small p (or, symmetrically, large p), the Poisson approximation should be used to estimate the confidence interval.

The normal approximation is certainly useful, since we would like to estimate semi-good links (p around 0.5). However, for links with very large or very small p, estimating the confidence interval may not be useful at all. Very bad links are not used for routing. For very good links, the variance of P is low and the confidence interval is within our error tolerance even in the worst case. Therefore, we see no urgent need to approximate the confidence interval using the Poisson distribution.

4.3 Estimator Design and Evaluation

In this section, we first introduce the basic terminology used to standardize the descriptions of the six estimator designs that we discuss in detail. We then describe our tuning objectives, so that the estimators can be compared fairly at the end. The six estimator designs are:

- EWMA (exponentially weighted moving average)
- Flip-Flop EWMA
- moving average
- time-weighted moving average
- Flip-Flop packet loss and success interval with EWMA
- WMEWMA (window mean with EWMA)

These estimators were chosen because they are simple and use relatively little storage.

4.3.1 Terminology

We first establish the relevant terminology that we will use to present the different

estimator designs. The symbols and the corresponding definitions are summarized in Table

4.1.

If the input is an M event, a packet must have been received successfully, so we set t to the current timestamp. To calculate m, we extract the sequence number from the successful packet in M and subtract from it the last sequence number heard plus one. In general, the number of missed packets accumulated since the last estimation, l, is max(m, k). The process of maintaining k, t, and the last sequence number heard, together with the calculation of l, applies to all estimators and is orthogonal to each estimator's algorithm. For example, if no messages have been received for the entire period T, then m = 0, k = R × T, and l = k.

Symbol   Definition
P        The current estimation.
T        The periodic timer event.
M        Last message arrival event.
m        The number of currently known missed packets, based on link sequence numbers, since the last estimation.
t        Time stamp of the last M event.
R        Minimum data rate (packets/s).
k        Estimated missed packets based on R and the elapsed time since t.
l        Number of missed packets accumulated since the last estimation.

Table 4.1: Terminology used for describing link estimator design.
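The shared bookkeeping can be sketched as follows. This is an illustration rather than the thesis code. The class name is hypothetical, and measuring k from the later of t and the last estimation is our assumption; it keeps one silent period from being charged twice.

```python
import math


class MissCounter:
    """Per-neighbor bookkeeping for m, k, and l (names follow Table 4.1).

    Hypothetical sketch: k is counted from the later of the last arrival t
    and the last estimation, so the same silence is never charged twice.
    """

    def __init__(self, R):
        self.R = R              # minimum data rate (packets/s)
        self.t = 0.0            # time stamp of the last M event
        self.last_est = 0.0     # time at which l was last consumed
        self.last_seq = None    # last link sequence number heard
        self.m = 0              # misses known from sequence-number gaps

    def on_message(self, seq, now):
        if self.last_seq is not None:
            self.m += seq - (self.last_seq + 1)
        self.last_seq, self.t = seq, now

    def take_losses(self, now):
        """Return l = max(m, k) and reset the accumulators for the next estimate."""
        k = math.floor(self.R * (now - max(self.t, self.last_est)))
        l = max(self.m, k)
        self.m, self.last_est = 0, now
        return l


c = MissCounter(R=1.0)
c.on_message(0, now=0.0)
c.on_message(3, now=1.0)       # packets 1 and 2 were lost
print(c.take_losses(now=1.0))  # 2 (from m)
print(c.take_losses(now=5.0))  # 4 (silent for 4 s at R = 1 packet/s)
```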

4.3.2 Tuning Objectives

We tune each estimator design to satisfy two different objectives: stability and agility.

- For the stability objective, we aim to minimize the settling time while requiring the total error ε < 10%.
- For the agility objective, we aim to minimize the total error while requiring the crossing time (the time for P to come within ±10% of p) to be within 40 packet opportunities.

This crossing-time target is chosen somewhat arbitrarily, since our concern is to reveal the different shortcomings of the estimators when they are tuned to the same objective. For reference, 40 packet opportunities is slightly fewer than half of what the central limit theorem predicts for the most reactive stable estimator. That prediction is n > 100 for a 10% interval with 95% confidence, assuming a binomial distribution.

With these tuned estimators, we compare settling time, crossing time, mean square error, and coefficient of variance, as well as memory resources and computational requirements. Settling time is only meaningful for the stability objective. The agility objective yields a much greater error, which would undermine the meaning of settling time.

4.3.3 Candidate Estimator Design and Evaluation

EWMA (Exponentially Weighted Moving Average)

The exponentially weighted moving average (EWMA) estimator is very simple and memory efficient, requiring only constant storage for the last estimate regardless of tuning. Since EWMA is so simple and widely used, we use it as the basis for comparison with the other estimator designs. EWMA computes a linear combination of the entire history, weighted exponentially. It is reactive to small shifts and is often used as a responsive change detector in statistical process control applications [56].

The estimator works as follows. Let 0 < α < 1 be the tuning parameter. At any M or T event, apply P = P · α once for each of the l missed packets. Then, if it is an M event, compute P = P · α + (1 − α). The implementation of EWMA takes 4 bytes (floating point) or 1 byte (fixed point) to store P, and the computation involves 2 multiplications and 1 addition.
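The following sketch transcribes the update rule directly; the function name is hypothetical. Starting from P = 0, n consecutive successes give P = 1 − α^n, which shows why a gain as high as α = 0.99 rises slowly.

```python
def ewma_update(P, alpha, l, received):
    """One EWMA step at an M or T event.

    P: current estimate; alpha: tuning parameter in (0, 1);
    l: missed packets since the last estimation; received: True for an M event.
    """
    for _ in range(l):            # each inferred loss is a 0 sample
        P = P * alpha
    if received:                  # the arrival itself is a 1 sample
        P = P * alpha + (1 - alpha)
    return P


P = 0.0
for _ in range(200):              # 200 consecutive successes
    P = ewma_update(P, alpha=0.99, l=0, received=True)
print(round(P, 3))                # 0.866, i.e. 1 - 0.99**200
```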

Figure 4.2(a) shows P(t) of the tuned, stable estimator, with α = 0.99. It reveals that to stay within 10% error, EWMA must already be set very close to its maximum gain of 1. With such a large gain, agility is not to be expected: the crossing time for EWMA is 167 packets, while the settling time is close to 180 packets.

Figure 4.2(b) shows the agile version with α = 0.9125. It is probably not a useful estimator, since it has large overshoots and undershoots; this is expected, because EWMA is sensitive to small shifts. Nevertheless, the agile version is good for detecting the disappearance of a neighboring node over a relatively short time.

A small decrease of α from 0.99 to 0.9125 has a large effect on agility and error, something we do not normally see in other contexts. Furthermore, in practice, representing α in fixed point to avoid heavyweight floating-point operations may create extra complexity, since α needs to be quite precise.

Flip-Flop EWMA(αstable, αagile)

[50] suggests that flip-flopping between two EWMA estimators, one tuned for stability and one for agility, provides both agility and stability. That design uses statistical control theory to dynamically estimate an upper error bound, which serves as the switching policy between stability and agility. If the spot value of the stable estimate goes beyond the estimated upper bound, the estimator automatically switches to the agile setting; otherwise, the stable estimate is used.

To explore the effectiveness of such a flip-flop design in our sensor network context, we follow a similar approach using the agile and stable tunings found in the EWMA study above. Since αstable is tuned to have a 10% noise margin, a simpler switching policy is to switch whenever the outputs of the two EWMAs differ by more than 10%. Such a switch can go in two directions, and we simulated both:

- Agile by default. When the agile output deviates by more than 10% from the stable estimate, we fall back on the stable estimate. The resulting P(t) is shown in Figure 4.2(c).
- Stable by default. We switch to the agile estimate when it deviates by more than 10% from the stable estimate, so that sudden changes such as those caused by mobility can be detected much earlier. The resulting P(t) is shown in Figure 4.2(d).
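The simplified 10% switching policy can be sketched as below. The block repeats a small EWMA helper so that it stands alone. The class and parameter names are hypothetical; the defaults are the tunings reported for Figure 4.2(c) and (d).

```python
def ewma_update(P, alpha, l, received):
    for _ in range(l):            # inferred losses
        P *= alpha
    if received:                  # the arrival itself
        P = P * alpha + (1 - alpha)
    return P


class FlipFlopEWMA:
    """Two EWMAs with a fixed switching threshold.

    agile_default=True: report the agile estimate unless it strays more than
    `threshold` from the stable one (Figure 4.2(c)). agile_default=False:
    report the stable estimate unless the agile one strays (Figure 4.2(d)).
    """

    def __init__(self, a_stable=0.99, a_agile=0.9125, threshold=0.10,
                 agile_default=False):
        self.a_stable, self.a_agile = a_stable, a_agile
        self.threshold, self.agile_default = threshold, agile_default
        self.stable = self.agile = 0.0

    def update(self, l, received):
        self.stable = ewma_update(self.stable, self.a_stable, l, received)
        self.agile = ewma_update(self.agile, self.a_agile, l, received)
        diverged = abs(self.agile - self.stable) > self.threshold
        if self.agile_default:
            return self.stable if diverged else self.agile
        return self.agile if diverged else self.stable


ff = FlipFlopEWMA(agile_default=False)
for _ in range(50):
    out = ff.update(0, True)
print(out == ff.agile)   # True: the agile EWMA has pulled ahead by more than 10%
```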

These two graphs suggest that the flip-flop idea does not provide an advantage over simple EWMA in our setting. The fluctuations of the agile estimator are so large that switching to it only introduces instability and error. The study in [50] does not show the dynamics of either estimator separately over time, so it is difficult to isolate why the approach does so much better in that setting.

Moving Average(n)

The moving average estimator is another simple estimator that is widely used for packet loss rate estimation, including in IGRP routers. The algorithm works as follows. Let n be the tuning parameter specifying the maximum number of bits in a sliding history window, h. At any event, append l zeros to the end of the window, then append a 1 if it is an M event. The window h is logically left-shifted by the number of bits inserted. Then $P = \frac{1}{n}\sum_{i=1}^{n} h(i)$.

To avoid a large error in P when there are only a few samples, the estimator gives no estimate (P = 0) while the number of samples is below some threshold, φ. The implementation of this algorithm takes $\lceil n/8 \rceil$ bytes of storage. Computing P involves n bit shifts, 1 addition, and $\log_2 n$ shifts in place of a full division. For ease of implementation, the tuning process takes n in multiples of 8.
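The bit-window implementation can be sketched using a Python integer as the n-bit window, with the newest sample in the least significant bit. The class name and the default φ = 8 are hypothetical choices. The estimate divides by n as in the formula above, so it under-reads until the window fills.

```python
class MovingAverage:
    """Bit-window moving average; newest sample in the least significant bit."""

    def __init__(self, n=144, phi=8):
        self.n = n                  # window size in bits (a multiple of 8)
        self.phi = phi              # minimum samples before estimating (hypothetical)
        self.mask = (1 << n) - 1
        self.h = 0                  # sliding history window
        self.samples = 0            # samples seen, capped at n

    def _push(self, bit):
        self.h = ((self.h << 1) | bit) & self.mask   # shift left, drop oldest
        self.samples = min(self.samples + 1, self.n)

    def update(self, l, received):
        for _ in range(l):          # l inferred losses
            self._push(0)
        if received:                # the arrival itself
            self._push(1)
        if self.samples < self.phi:
            return 0.0              # no estimate yet
        return bin(self.h).count("1") / self.n       # P = sum(h) / n


ma = MovingAverage(n=8, phi=4)
for l in (0, 0, 0, 1):
    P = ma.update(l, received=True)
print(P)   # 0.5: window holds 1,1,1,0,1, i.e. four 1s out of n = 8
```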

Figure 4.2(e) illustrates P(t) of the moving average estimator tuned for the stability objective with n = 144. This estimator achieves settling and crossing times of about 120 packets, much shorter than EWMA tuned for the same error objective. However, at n = 144, or 18 bytes of storage per link estimate, it is expensive to track link quality for a reasonable number of neighboring nodes. Figure 4.2(f) shows the agile case with n = 24. Tuned for the same agility objective, the moving average estimator appears to have less error and variance than EWMA. Comparing Figure 4.2(b) with Figure 4.2(f) shows that EWMA is more sensitive to small changes.

Time Weighted Moving Average (TWMA)(n,w)

The moving average estimator applies the same weight to all packets within the sliding window. A common improvement is to apply a weighting function that places heavier weight on more recent samples, so that the estimate adapts better to temporal changes. The basic algorithm works the same as the moving average, except for the addition of a time-weighting function, w. Thus, the tuning parameters for this estimator are n and w. In our study, we fix a single weighting function, w, and tune only n. While w is not the perfect function, it serves to show the effect of weighting.

The w that we choose is a sequence of coefficients that weight the elements of the sliding window differently. Let h be the sliding window with elements in $\{0, 1\}$, and let s be the number of elements currently in h. Then w is a sequence of length s whose weight is 1 for the most recent $\lceil s/2 \rceil$ elements of h. For the remaining $\lfloor s/2 \rfloor$ samples, the weight decreases linearly from 1 to $1/\lceil s/2 \rceil$, with the smallest weight applied to the stalest element. Therefore, $P = \frac{\sum_{i=1}^{s} w(i)\, h(i)}{\sum_{i=1}^{s} w(i)}$, where $s \le n$.


(a) Stable EWMA (α = 0.99) and p(t). (b) Agile EWMA (α = 0.9125) estimation.

(c) Flip-Flop EWMA (αstable = 0.99, αagile = 0.9125). It uses the stable estimation if the agile estimation goes beyond 10% from the stable estimation.

(d) Flip-Flop EWMA (αstable = 0.99, αagile = 0.9125). It uses the agile estimation if that goes beyond 10% from the stable estimation.

(e) Stable moving average (n = 144) and p(t). (f) Agile moving average (n = 24).

Figure 4.2: P(t) for different estimators in both stable and agile configurations.


The implementation of this estimator also takes $\lceil n/8 \rceil$ bytes to store the n-bit sliding window h. As for computation, since $h(i) \in \{0, 1\}$, the multiplications can be turned into additions, leaving n additions and 1 division. This is more complex than the moving average. Since w differs for different s, a fixed-size lookup table can store all w once n is fixed. For ease of implementation, the tuning process again takes n in multiples of 8.
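The weighting function and estimate can be sketched as below, including the lookup table. The text does not pin down the exact interpolation, so we assume the older ⌊s/2⌋ weights step down evenly from just below 1 to 1/⌈s/2⌉. We also assume samples are ordered most recent first. The function names are hypothetical.

```python
import math


def twma_weights(s):
    """Weights for s samples, ordered most recent first.

    The newest ceil(s/2) samples get weight 1; the remaining floor(s/2)
    decrease linearly toward 1/ceil(s/2), which the stalest sample receives.
    """
    c, rest = math.ceil(s / 2), s // 2
    older = [1 - (1 - 1 / c) * j / rest for j in range(1, rest + 1)]
    return [1.0] * c + older


def twma_estimate(h):
    """h: 0/1 samples, most recent first, with len(h) = s <= n."""
    w = twma_weights(len(h))
    # h(i) is 0 or 1, so w(i) * h(i) reduces to adding w(i) where h(i) == 1.
    return sum(wi for wi, hi in zip(w, h) if hi) / sum(w)


# Weights depend only on s, so they can be precomputed once n is fixed.
n = 8
WEIGHTS = {s: twma_weights(s) for s in range(1, n + 1)}

print(twma_weights(4))                          # [1.0, 1.0, 0.75, 0.5]
print(round(twma_estimate([1, 1, 0, 0]), 3))    # 0.615 (recent successes count more)
print(round(twma_estimate([0, 0, 1, 1]), 3))    # 0.385
```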

Figure 4.3(a) illustrates P(t) of the tuned, stable TWMA estimator with n = 168, and Figure 4.3(b) shows the agile version with n = 32. A visual comparison with the corresponding moving average figures shows that the two are very similar. Both have better settling time and less high-frequency fluctuation than EWMA.

However, the weighting function has two drawbacks. First, it increases the history required to achieve our stability objective from 144 to 168, making TWMA less memory efficient than the moving average. Second, w likely requires floating-point operations, which we try to avoid. Nevertheless, as indicated in Table 4.2, w improves on the moving average by decreasing the average settling time from 122 to 113 packet times while maintaining the same error and variation.

In the agile case, n = 32 also requires more memory than the n = 24 of the moving average. However, as shown in Table 4.3, the resulting mean square error and coefficient of variance are smaller than those of EWMA and the moving average.

Flip-Flop Packet Loss and Success Interval with EWMA (FFPLSI) (αsuccess, αloss, ff)

The packet loss interval is the number of consecutive successful packets between two successive packet loss events; that is, it measures the number of 1s between two 0s. The greater the interval, the better the reception probability. An estimate of the packet loss interval adapts slowly to bursts of packet successes but reacts quickly to bursts of packet losses.

The estimator works as follows. The tuning parameter is αloss. Let I be the current average loss interval, computed using an EWMA. Let i be the most recent count of consecutive successes when a packet loss is detected through either a T or M event. For each i, I = I · αloss + (1 − αloss) · i. At any instant, $P(t) = \frac{I}{I+1}$. The 1 in the denominator avoids division by 0, and the ratio equals the success fraction of a run of I successes followed by a single loss.

The packet success interval is the reverse of the packet loss interval: it measures the number of 0s between two 1s, and its average corresponds to the average burst of errors. Therefore, the greater the interval, the worse the link quality. Unlike the packet loss interval, the packet success interval adapts slowly to bursts of packet losses but reacts quickly to bursts of packet successes.

The computation is similar to that of the packet loss interval. Here I is the current average packet success interval, computed by an EWMA, and i is the most recent count of consecutive losses when a packet success is detected. The tuning parameter is αsuccess. For each i, I = I · αsuccess + (1 − αsuccess) · i. At any instant, $P(t) = 1 - \frac{I}{I+1}$; again, the 1 in the denominator avoids division by 0.

The flip-flop mechanism can capture the best of both worlds. For stability, the packet loss interval should be used when successes are frequent (e.g., P ≥ 50%), while the packet success interval should be used when losses are frequent (e.g., P < 50%). We call this configuration ff=STABLE. For agile estimation the choice is reversed, and we call this configuration ff=AGILE.

Since EWMA is used for averaging, the implementation of this estimator is very efficient. Each entry takes only 2 bytes to store the intervals and 2 bytes (fixed point) or 8 bytes (floating point) to store P. As with EWMA, parameter tuning does not affect the storage requirement.
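FFPLSI can be sketched under the following assumptions. Every 0 closes a (possibly empty) run of 1s and updates the loss interval. Every 1 closes a run of 0s and updates the success interval. The text leaves open which P drives the switch, so the sketch decides between the two candidates using their mean. The class name and this selector are hypothetical.

```python
class FFPLSI:
    """Flip-flop between packet loss interval and packet success interval."""

    def __init__(self, a_success=0.98, a_loss=0.98, ff="STABLE"):
        self.a_success, self.a_loss, self.ff = a_success, a_loss, ff
        self.I_loss = 0.0      # EWMA of the number of 1s between two 0s
        self.I_success = 0.0   # EWMA of the number of 0s between two 1s
        self.ones = 0          # current run of consecutive successes
        self.zeros = 0         # current run of consecutive losses
        self.P = 0.0

    def _sample(self, bit):
        if bit == 0:           # a loss closes the current run of successes
            self.I_loss = self.I_loss * self.a_loss + (1 - self.a_loss) * self.ones
            self.ones, self.zeros = 0, self.zeros + 1
        else:                  # a success closes the current run of losses
            self.I_success = (self.I_success * self.a_success
                              + (1 - self.a_success) * self.zeros)
            self.ones, self.zeros = self.ones + 1, 0

    def update(self, l, received):
        for _ in range(l):
            self._sample(0)
        if received:
            self._sample(1)
        p_loss = self.I_loss / (self.I_loss + 1)                # loss interval
        p_success = 1 - self.I_success / (self.I_success + 1)   # success interval
        frequent_successes = (p_loss + p_success) / 2 >= 0.5    # assumed selector
        if self.ff == "STABLE":
            use_loss = frequent_successes
        else:
            use_loss = not frequent_successes
        self.P = p_loss if use_loss else p_success
        return self.P


est = FFPLSI(ff="STABLE")
for _ in range(400):           # repeating pattern 1 1 0 1: three successes per loss
    est.update(0, True)
    est.update(0, True)
    est.update(1, True)
print(round(est.P, 2))         # 0.75
```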

Figure 4.3(c) shows P(t) for the tuned, stable (ff=STABLE) estimator, with αsuccess = 0.98 and αloss = 0.98. This estimator is very stable and smooth near both extremes, 0 and 100%. However, the slow rising edges show that its settling and crossing times are much larger than those of the other estimators, even EWMA. The agile case is shown in Figure 4.3(d), with αsuccess = 0.85 and αloss = 0.85. This estimator is not a good agile candidate, since its fluctuations are large.

Window Mean with EWMA (WMEWMA) (t, α)

So far, all the estimators discussed update the estimate at every M event. Alternatively, one can perform low-pass filtering by averaging over a time window and adjusting the estimate using the latest average. This average is itself an observation of P, and an EWMA can be applied for further filtering to yield a better estimate. There are two tuning parameters: the time window t, expressed as the number of message opportunities between two T events, and the EWMA parameter α, with 0 < α < 1.

The algorithm works as follows. P is updated only at each T event. Let r be the number of messages received (i.e., the number of 1s from M events) during the time window t. Then, at each T event, the mean is µ = r/(r + l), and P = P · α + (1 − α) · µ.

Each entry in this estimator takes 2 bytes to store r and l, and 1 byte (fixed point) or 4 bytes (floating point) to store P. The computation involves 2 additions, 1 division, and 2 multiplications, and is done per T event rather than per M event. As with EWMA, this estimator's storage requirement is independent of parameter tuning.
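WMEWMA can be sketched as follows. The sketch assumes the caller supplies the sequence-number gap with each arrival and the timer-inferred count k at each T event. Leaving P unchanged for an empty window (r + l = 0) is our assumption, as are the names.

```python
class WMEWMA:
    """Window mean with EWMA: P changes only at T events."""

    def __init__(self, alpha=0.6):
        self.alpha = alpha
        self.P = 0.0
        self.r = 0     # messages received in the current window
        self.m = 0     # losses inferred from sequence gaps in the window

    def on_message(self, gap):
        """M event; `gap` is the number of sequence numbers skipped."""
        self.r += 1
        self.m += gap

    def on_timer(self, k=0):
        """T event; `k` is the timer-inferred loss count for the window."""
        l = max(self.m, k)
        if self.r + l > 0:                  # empty window: keep P (assumption)
            mu = self.r / (self.r + l)      # window mean
            self.P = self.P * self.alpha + (1 - self.alpha) * mu
        self.r = self.m = 0
        return self.P


w = WMEWMA(alpha=0.5)
for gap in (0, 0, 1, 0):       # 4 arrivals and 1 inferred loss, so mu = 0.8
    w.on_message(gap)
print(w.on_timer())            # 0.4
```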

Figure 4.3(e) shows P(t) of the tuned, stable estimator with t = 30 message times and α = 0.6. The observed settling and crossing times are relatively small. Compared with the EWMA case in Figure 4.2(a), the high-frequency components of the estimate are clearly removed. In fact, as shown in Table 4.2, the settling time of this estimator (118 packets) is comparable to that of the fastest estimator, the time-weighted moving average (113 packets).

Figure 4.3(f) shows the agile version with t = 10 message times and α = 0.3. Although windowing has a low-pass filtering effect, a small t actually creates variations to which EWMA is sensitive. As a result, the agile configuration shows no significant improvement over EWMA.

4.4 Candidate Estimator Comparisons

With the controlled study of our candidate estimators in hand, we return to the

question of what is the best estimator relative to the metrics we consider for both tuning

objectives.


(a) Stable TWMA estimation (n = 168) and p(t). (b) Agile TWMA estimation (n = 32).

(c) Stable FFPLSI estimation (αsuccess = 0.98, αloss = 0.98, ff = STABLE) and p(t).

(d) Agile FFPLSI estimation (αsuccess = 0.85, αloss = 0.85, ff = AGILE).

(e) Stable WMEWMA (t = 30, α = 0.6) and p(t). (f) Agile WMEWMA (t = 10, α = 0.3).

Figure 4.3: P(t) for different estimators in both stable and agile configurations.


4.4.1 Stable Estimators

Table 4.2 summarizes the results of the stable estimators. We first look at the sum of errors, which should approach 0 if the estimator is unbiased. For all the estimators, the sum of errors is small, showing that they are not biased. The mean square error penalizes estimators that have large overshoots or undershoots. FFPLSI is very stable at the extremes, so it achieves the smallest mean square error, as expected, though the other estimators come close. The coefficient of variance measures how well the estimator stays close to the true value. FFPLSI has the largest coefficient of variance, while EWMA has the smallest; again, the values for the other estimators are relatively close.

The major determining factor is the settling time. Moving average, TWMA, and WMEWMA all have much smaller settling times than the other estimators. It is desirable to have the most agile estimator that still stays within 10% of the true value, even if it does not have the best mean square error and coefficient of variance. One would hope that the crossing times of EWMA and FFPLSI would be much smaller than their settling times. However, the P(t) figures and Table 4.2 show that the crossing times are only slightly smaller.

Another important constraint is storage space. The moving average and TWMA require storage that grows with n rather than constant storage. WMEWMA therefore appears to be the best choice, as it is well balanced in all dimensions.

4.4.2 Agile Estimators

Table 4.3 summarizes the performance of the agile estimators. For sum of errors,

FFPLSI is five times larger than its stable counterpart. EWMA also increases three times.


Estimator                     Settling   Crossing   Mean Square   Coefficient   Sum of    Storage per     Computation
                              Time       Time       Error         of Variance   Errors    Entry (bytes)
                              (packets)  (packets)  (%²)
EWMA (α = 0.99)               178        167        9.32×10⁻⁴     0.19%         0.10%     1               2 mul, 1 add
Moving Average (n = 144)      122        122        11×10⁻⁴       0.22%         0.26%     ⌈n/8⌉           1 add, n + log(n) shifts
TWMA (n = 168)                113        113        11×10⁻⁴       0.22%         0.23%     ⌈n/8⌉           1 div, n add
FFPLSI (αsuccess = 0.98,      292        271        8.98×10⁻⁴     0.33%         −0.18%    2               4 mul, 2 add
  αloss = 0.98)
WMEWMA (t = 30, α = 0.6)      118        113        13×10⁻⁴       0.27%         0.16%     3               2 mul, 1 div, 2 add

Table 4.2: Simulation results of all estimators in stability settings.

This suggests that these two estimators can be biased in the agile configuration. Our agility settings decrease the settling time by 5 to 10 times relative to the stability settings; however, the mean square error and coefficient of variance increase by roughly the same factor. It appears that an agile estimator is only useful for quickly discovering a significant change in link reliability, such as the disappearance of a node.

4.4.3 Performance based on Empirical Traces

Since our study suggests that WMEWMA is a good estimator, we now focus on its performance with input from empirical traces. Figure 4.4 shows how the WMEWMA estimator, tuned for the stability objective, performs on the empirical trace input that shaped the trace generator. Our chosen estimator tracks the empirical trace well, though the degree of overshoot and undershoot is higher than in simulation. This is expected, since the real trace W(t) has larger variance than the simulated trace. As a result, estimators tuned on the simulated trace should be tuned for more stability when applied in real situations.

Estimator                                Settling   Mean Square   Coefficient   Sum of    Computation
                                         Time       Error         of Variance   Errors
                                         (packets)  (%²)
EWMA (α = 0.9125)                        21         65×10⁻⁴       2.41%         0.28%     2 mul, 1 add
Moving Average (n = 24)                  23         55×10⁻⁴       2.1%          0.24%     1 add, n + log(n) shifts
TWMA (n = 32)                            25         48×10⁻⁴       1.87%         0.22%     1 div, n add
FFPLSI (αsuccess = 0.85, αloss = 0.85)   23         80×10⁻⁴       3.14%         −0.98%    4 mul, 2 add
WMEWMA (t = 10, α = 0.3)                 21         70×10⁻⁴       2.7%          0.18%     2 mul, 1 div, 2 add

Table 4.3: Simulation results of all estimators in agility settings.

4.4.4 Confidence Interval Estimation with WMEWMA

We can improve our link estimator by using the normal approximation to derive confidence intervals for the estimated mean. We use the WMEWMA estimator and study how the confidence interval changes across different link qualities, P. We relax α from 0.6 to 0.5 because real implementations are likely to use bit shifts rather than divisions. Figure 4.5 shows the 95% confidence interval approximation as P changes from 90% to 50%.

According to the normal approximation, the confidence interval is $\left[P - \frac{z\sigma_n}{\sqrt{n}},\; P + \frac{z\sigma_n}{\sqrt{n}}\right]$. Since n is fixed by the estimator tuning and z can be looked up in the normal distribution table, the standard deviation σ_n affects the confidence interval calculation the most. For the binomial distribution, σ_n depends on P: $\sigma_n = \sqrt{P(1-P)}$. Figure 4.5 shows that the 95% confidence interval can vary from 6% to 11% as P changes. Since the expected 10% noise margin agrees with the range of the estimated confidence interval, we believe that an on-line approximation of the confidence interval can be omitted unless higher-level protocols require an accurate estimate of it.

Figure 4.4: Output from the stable WMEWMA estimator using empirical data input.
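The interval computation itself is short. This sketch takes σ_n = √(P(1 − P)) and z = 1.96 for 95% confidence; the function name is hypothetical. The value n = 100 is purely illustrative and is not the effective sample size of WMEWMA(30, 0.5).

```python
import math


def normal_ci(P, n, z=1.96):
    """Normal-approximation interval for a Bernoulli mean.

    sigma_n = sqrt(P (1 - P)); the bounds are clipped to [0, 1].
    """
    half = z * math.sqrt(P * (1 - P)) / math.sqrt(n)
    return max(0.0, P - half), min(1.0, P + half)


for P in (0.9, 0.7, 0.5):
    lo, hi = normal_ci(P, n=100)    # n = 100 is illustrative only
    print(P, round(hi - lo, 3))
# 0.9 0.118
# 0.7 0.18
# 0.5 0.196
```

The width is largest at P = 0.5 and shrinks toward the extremes, which is the dependence on P described above.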

[Figure 4.5 plot: percentage (%, 0 to 100) versus T events (0 to 500), with curves for the 95% confidence interval, the actual probability, and the WMEWMA(30,0.5) estimate.]

Figure 4.5: Confidence interval estimation with respect to the WMEWMA(30,0.5) estimator for different link qualities.

4.5 Alternative Estimation Techniques

The resource constraints of our platform significantly limit the amount of processing and storage available, which narrows the choice of estimators. Even computing a statistically meaningful median raises storage concerns, given that there are potentially many neighboring nodes to estimate. There is a rich literature on estimation techniques such as linear regression, the Kalman filter, or hidden Markov models. Their use is not practical for such a low-level mechanism, however, and some of them may even require a detailed model of the channel, which is difficult to obtain for all kinds of environments. There may exist other packet-level estimation techniques that are more effective than those we have explored. Without a channel model, however, they would be non-Bayesian estimators. Their performance, as approximated by the central limit theorem, would be very close to what we have already achieved with our candidate estimators.

The new IEEE 802.15.4 radios, such as the Chipcon CC2420 [2], provide hardware link quality indicator support on a per-packet basis at the physical layer. Although the measurements are not normalized to a reception probability, such support could be very useful for augmenting our packet-level link quality estimation. The hardware is new, and future work is required to exploit this capability. For the studies in the remaining chapters, we use the WMEWMA estimator and explore its effectiveness with respect to the routing layer.

4.6 Related Work

Passive probing to estimate link reliability is well established for wired networks and for Wi-Fi-type wireless networks. In wired networks, it is widely deployed over the Internet in protocols such as the Interior Gateway Routing Protocol (IGRP) [35] and Enhanced IGRP (EIGRP) [10]. Reliability is measured as the percentage of packets that arrive undamaged on a link; it is reported by the network interface hardware or firmware and is calculated as a moving average. In IGRP, the link reliability of a route equals the minimum link reliability along the path. This is one example of how link estimation is used in the context of routing in the Internet.
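To make the IGRP rule concrete, the sketch below treats a route's reliability as the minimum per-link reliability along the path. The per-link values are fractions in [0, 1] for readability (IGRP itself reports reliability on a scale out of 255), and the moving-average update with gain `alpha` is a generic EWMA standing in for whatever averaging the interface firmware performs; both are assumptions for illustration.

```python
from __future__ import annotations


def update_link_reliability(old: float, sample: float, alpha: float = 0.25) -> float:
    """Generic EWMA of per-interval delivery ratios (alpha is a made-up gain)."""
    return (1.0 - alpha) * old + alpha * sample


def path_reliability(link_reliabilities: list[float]) -> float:
    """IGRP-style route reliability: the weakest link bounds the whole path."""
    if not link_reliabilities:
        raise ValueError("a path needs at least one link")
    return min(link_reliabilities)


link = update_link_reliability(0.95, 0.80)   # one noisy interval pulls the average down
print(round(link, 4))                        # 0.9125
print(path_reliability([0.99, link, 0.97]))  # the middle link limits the route
```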

In wireless networks such as 802.11 [1], a wide range of link estimation techniques have been proposed and implemented. Link-level estimation is necessary because 802.11 only characterizes links at the frame level, and it uses this information together with the signal-to-noise ratio to select an appropriate bandwidth (data rate) setting so that links communicate reliably. Such information is not necessarily exposed by the firmware, and it also depends on whether the radio operates in infrastructure mode or ad-hoc mode.

Received signal strength has been used as an indicator of link quality to systematically study 802.11 link characteristics [28]. However, [24] shows that using received signal strength to infer link quality in 802.11 networks is not accurate, and its authors propose a packet-level window moving average estimator based on periodic broadcasts of link probes. Because link estimation is kept strictly separate from the routing protocol, [24] cannot piggyback estimation information on routing packets; thus, the technique is no longer passive.

Another approach is based on burst-wise carrier-to-interference (C/I) estimates [7], in which a carrier signal estimate is calculated by convolving a training sequence with the channel impulse response; however, this requires support from the radio hardware and an accurate estimate of the channel impulse response. Other methods that use the acknowledgment history of the most recently transmitted packets have also been proposed [16, 55]; however, these mechanisms require explicit packet transmissions to each neighbor.
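The acknowledgment-history idea of [16, 55] can be pictured as a sliding window over the outcomes of the most recent unicast transmissions to one neighbor. The class name and window length below are hypothetical; the point of the sketch is that every sample requires an explicit transmission to that neighbor, which is exactly the cost that passive snooping avoids.

```python
from __future__ import annotations

from collections import deque


class AckHistoryEstimator:
    """Delivery ratio over the last `window` unicast transmissions to one neighbor."""

    def __init__(self, window: int = 16) -> None:
        # The oldest outcome drops out automatically once the window is full.
        self.history: deque[bool] = deque(maxlen=window)

    def record(self, acked: bool) -> None:
        self.history.append(acked)

    def estimate(self) -> float | None:
        if not self.history:
            return None  # nothing has been sent to this neighbor yet
        return sum(self.history) / len(self.history)


est = AckHistoryEstimator(window=4)
for outcome in [True, False, True, True, True]:
    est.record(outcome)
print(est.estimate())  # window holds [False, True, True, True] -> 0.75
```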

There is also a rich literature on network performance estimation, especially in the context of multicast and overlay topology management. Most of these efforts rely on active probing, injecting measurement traffic into the system, since direct measurement is not built in as it is for link interfaces. For example, special multicast probes are used to estimate packet loss rates inside a multicast network and to infer its overall topology [63]. To minimize the power consumed by link estimation, we focus on passive techniques and avoid sending probe packets by piggybacking on route update packets.

Most prior work using passive estimation seeks to estimate a value drawn from a large set, where each observation is itself a direct measurement rather than an event. For example, the round-trip time estimator in TCP [76] adjusts its estimate with each round-trip time measurement. In contrast, we must estimate the probability of reception from discrete boolean events: the arrival of a packet or the silent failure to receive one. Thus, estimators that have proved effective in other regimes may not be effective here. For example, a highly relevant work [50] studies the behavior of a collection of estimators in a context where both responsiveness and stability are desired in the face of sudden changes. By measuring round-trip delay, the authors calculate the latest available bandwidth and filter it with an estimator. They conclude that a flip-flop filter, which switches between two EWMAs with agile and stable gain settings based on statistical process control, provides the best estimator. Since each measurement reveals the latest available bandwidth, the filter can determine whether that measurement falls within the predicted range. If the agile estimate deviates so much that the process is likely to be going out of control, the flip-flop drops the agile estimate and relies on the stable prediction. We do not have the same ability on a sample-by-sample basis and, as observed in our estimator studies, such a flip-flop scheme does not yield significant benefits in our regime.
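For contrast, the flip-flop scheme described for [50] can be sketched as two EWMAs over real-valued measurements, with a control band derived from the moving range of past samples. The gains (0.9 agile, 0.1 stable), the 3-sigma band, the window of 10 moving ranges, and the individuals-chart constant d2 = 1.128 are our assumptions for illustration, not values taken from [50]. The sketch also shows why the scheme needs a real-valued sample at each step: a single packet arrival (a 0 or a 1) carries too little information to tell whether the agile estimate has gone out of control.

```python
from __future__ import annotations


class FlipFlopFilter:
    """Agile/stable EWMA pair with a moving-range control band (illustrative sketch)."""

    D2 = 1.128  # control-chart constant relating mean moving range (n = 2) to sigma

    def __init__(self, agile_gain: float = 0.9, stable_gain: float = 0.1,
                 window: int = 10) -> None:
        self.agile_gain = agile_gain    # weight on the newest sample: reacts quickly
        self.stable_gain = stable_gain  # weight on the newest sample: smooths heavily
        self.window = window            # number of recent moving ranges kept
        self.ranges: list[float] = []
        self.prev: float | None = None
        self.agile = 0.0
        self.stable = 0.0

    def update(self, x: float) -> float:
        if self.prev is None:           # first sample seeds both filters
            self.prev = self.agile = self.stable = x
            return x
        # Control limits use past variability only, before this sample is added.
        sigma = sum(self.ranges) / len(self.ranges) / self.D2 if self.ranges else None
        self.ranges = (self.ranges + [abs(x - self.prev)])[-self.window:]
        self.prev = x
        self.agile += self.agile_gain * (x - self.agile)
        self.stable += self.stable_gain * (x - self.stable)
        if sigma is None or abs(self.agile - self.stable) <= 3.0 * sigma:
            return self.agile           # in control: trust the responsive estimate
        return self.stable              # out of control: fall back on the stable one


f = FlipFlopFilter()
outputs = [f.update(x) for x in [10.0, 10.5, 9.5, 30.0]]
print(outputs)  # the final spike is answered with the stable estimate
```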

4.7 Summary and Multihop Routing Implications

The goal of this chapter has been to explore the design space and select a simple, passive link estimator that performs well according to our metrics and can be implemented efficiently on our resource-constrained platform. These resource limitations allow us to rule out many complex or inefficient estimation techniques and to focus on a few efficient estimators. Without an accurate a priori channel model, these estimators are non-Bayesian. Through a systematic study of six different estimators, we found that the window mean with an exponentially weighted moving average (WMEWMA) performs best overall. In our study, it provides stable estimates with an error within 10%, and its reaction time to large changes in link quality is about 100 packets, which agrees with the lower bound approximated by the central limit theorem. Furthermore, its storage requirement is constant for all tunings, making it an attractive estimator for our resource-constrained platform. Our study has narrowed the estimator design space down to this one estimator and suggests a reasonable parameter setting that yields the above performance.
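Under our reading of the WMEWMA(t, α) notation, the estimator can be sketched as follows: once every t time units the node computes the window mean received/(received + missed), with missed packets inferred from gaps in per-sender sequence numbers, and folds it into a running estimate that gives weight α to history. The class, the 16-bit sequence wraparound, and the explicit `end_window()` call are assumptions. The sketch highlights the constant-storage property: the state is a handful of counters regardless of t and α.

```python
from __future__ import annotations


class WMEWMA:
    """Window mean folded into an EWMA (sketch of WMEWMA(t, alpha))."""

    def __init__(self, alpha: float = 0.5) -> None:
        self.alpha = alpha                  # weight given to the previous estimate
        self.estimate: float | None = None
        self.last_seq: int | None = None
        self.received = 0
        self.missed = 0

    def on_packet(self, seq: int) -> None:
        """Record a packet; gaps in 16-bit sequence numbers count as losses."""
        if self.last_seq is not None:
            gap = (seq - self.last_seq) % 65536
            if gap == 0:
                return                      # duplicate packet
            self.missed += gap - 1
        self.last_seq = seq
        self.received += 1

    def end_window(self) -> float | None:
        """Call once every t time units to fold the window mean into the estimate."""
        total = self.received + self.missed
        if total:
            mean = self.received / total
            if self.estimate is None:
                self.estimate = mean
            else:
                self.estimate = self.alpha * self.estimate + (1 - self.alpha) * mean
        # Simplification: a window with no receptions leaves the estimate unchanged;
        # a real estimator would need to treat that silence as loss.
        self.received = self.missed = 0
        return self.estimate


est = WMEWMA(alpha=0.5)
for s in [1, 2, 4, 5]:      # packet 3 lost
    est.on_packet(s)
first = est.end_window()    # 4 of 5 received
for s in [6, 8]:            # packet 7 lost
    est.on_packet(s)
second = est.end_window()   # 0.5 * 0.8 + 0.5 * (2 / 3)
print(first, second)
```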

Real-time estimation of link reliability is vital in any self-organizing network, wired or wireless, since routing paths should never be constructed over poor links. Understanding how the performance of link estimation may affect higher-level algorithms is important for our holistic design. Our results suggest that one can build either an agile estimator with large errors or a stable estimator with a settling time of about 100 packets. For routing purposes, the stable estimator is the clear choice. Even so, its estimates are only accurate to about 10%. When choosing a parent for routing, fluctuations of up to about 10% should therefore be tolerated before switching to a better alternative. For routing algorithms whose cost metrics aggregate link estimates into routing costs, care must be taken to avoid cycles caused by variations and errors in the estimates.
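One way to apply the 10% tolerance is a hysteresis test when choosing a parent: keep the current parent unless a candidate's estimated link quality beats it by more than the estimator's error band. The function below is a hypothetical sketch that compares raw link-quality estimates only; a routing layer that aggregates costs along the path would compare those costs instead, and the 0.10 margin simply mirrors the accuracy figure quoted in the text.

```python
from __future__ import annotations


def choose_parent(current: str | None, quality: dict[str, float],
                  margin: float = 0.10) -> str | None:
    """Return the parent to use, switching only for a gain larger than `margin`."""
    if not quality:
        return current
    best = max(quality, key=quality.get)
    if current is None or current not in quality:
        return best                     # no valid parent yet: take the best candidate
    if quality[best] > quality[current] + margin:
        return best                     # improvement exceeds the estimation error
    return current                      # within the error band: do not flap


q = {"A": 0.82, "B": 0.88, "C": 0.60}
print(choose_parent("A", q))                  # A (0.88 does not exceed 0.82 + 0.10)
print(choose_parent("A", {**q, "B": 0.95}))   # B
```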

The stability of the estimator can affect the global stability of the topology, especially if the routing cost metric is built upon the estimates. Although all the link estimator designs that we consider are passive, they all rely on an implicit minimum data rate set by the application. This minimum data rate is often realized as beacons (such as route updates) and affects both bandwidth utilization and the rate of topology adaptation. In short, while link estimation is purely an underlying mechanism, the minimum data rate is a policy that higher-level protocols should consider.


Chapter 5

Neighborhood Management under Limited Memory

We have laid the groundwork for using passive link estimators to characterize link quality, which provides the weight of each edge in the connectivity graph used for routing. The next step in our holistic approach is for each node to build up its local neighborhood using a fixed-size neighbor table, which is often small due to the memory constraints of the platform; this logical neighborhood defines the local connectivity options of a node. The union of the local neighborhood information across the entire network thus forms a distributed logical connectivity graph for routing. The usual concept of a neighbor is based on boolean connectivity. With the probabilistic view of connectivity defined by link estimation, a neighborhood becomes a fuzzy concept. Therefore, in this chapter, we revisit the basic concept of neighborhood management under this probabilistic approach. The challenge in this process is to achieve network scalability while using only limited


resources on each node; in typical deployments, there are more potential neighbors than a node can keep track of with its limited memory. Thus, each node must identify a subset of neighbors with reliable connectivity. However, we cannot rely exclusively on link estimation to determine whether a node should be tracked as a neighbor, since estimation itself requires memory. How, then, should we determine whether a potential neighbor is a good one to keep in the neighbor table? We describe a framework for such a local process, borrowing techniques from cache design and from data-stream estimation in the database literature to solve the problem. We present a thorough evaluation of the different techniques and select the best one for the routing study in the remaining chapters. We then survey related work, most of which is found in the packet radio literature, and discuss the overall implications of neighborhood management for routing.

5.1 Dense and Fuzzy Neighborhoods

Chapter 3 shows that the connectivity cell of a node is irregular and that the communication range consists of three distinct regions. To ensure reliable links, nodes are typically spaced within the effective region of the communication range. Since the transitional region is much larger than the effective region and contains a mix of good and bad links, such spacing is likely to create a relatively dense network with many potential neighbors whose links are unreliable and unsuitable for routing. For example, Figure 5.1 illustrates the potential ratio of the number of nodes in the effective region (darker region) to that in the transitional region (lighter region): in a typical deployment, the number of nodes in the effective region is small compared with that in the transitional region. Furthermore, not all the nodes in


the transitional region have bad links. The darker circles in the shaded region represent nodes that have good links suitable for routing. Simply hearing a message is not enough to determine whether a node is in the effective region or whether it has a good or bad link in the transitional region. One simple approach is to apply a link quality threshold and consider only nodes above the threshold as neighbors. While this technique is simple, it is difficult to choose one appropriate value for all deployments. For example, a sparse network would require a very different value from a dense one, and if the node layout is not uniform, a single threshold is unlikely to suit the entire network. In addition, interference from other traffic and environmental effects can cause link quality to fluctuate around the threshold. The result is links coming and going over time, which may lead to network partitions. In the next section, we discuss how memory constraints make this thresholding approach impractical.
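The fluctuation problem with a fixed threshold is easy to see in a toy trace: an estimate that wanders within a few percent of the cut-off repeatedly enters and leaves the neighbor set. The threshold and the trace below are invented purely for illustration.

```python
from __future__ import annotations


def membership(trace: list[float], threshold: float) -> list[bool]:
    """Neighbor / non-neighbor decision at each step under a fixed threshold."""
    return [q >= threshold for q in trace]


def flaps(decisions: list[bool]) -> int:
    """Number of times the link enters or leaves the neighbor set."""
    return sum(a != b for a, b in zip(decisions, decisions[1:]))


trace = [0.72, 0.69, 0.71, 0.68, 0.73, 0.70, 0.67]
d = membership(trace, threshold=0.70)
print(d)         # [True, False, True, False, True, True, False]
print(flaps(d))  # 5 membership changes in 7 estimates
```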

5.2 Challenges of Neighborhood Discovery under Limited Memory

The typical approach to neighbor discovery is to record information about all nodes from which packets are received (potential neighbors), either through passive traffic monitoring or through active probing with beacons. Link quality can then be estimated and used for neighbor discovery. This implies that memory may need to be allocated for every potential neighbor. Even though the link estimator that we have chosen is simple and memory efficient, devoting too much memory to such a low-level operation is not efficient from an overall system point of view. For example, if a neighbor table entry requires 10 bytes to maintain and estimate the quality of a neighboring node, a node that hears 50 nodes would require a 500-byte neighbor table, which is 1/8 of the total memory available on the Mica or Mica 2 platform.

Figure 5.1: Illustration of the potential neighbors of a center node in a dense network. The darker shaded region shows the effective region while the lighter region shows the transitional region. The cross indicates the center node.

Furthermore, in a dense network, not only does a node receive packets from more potential neighbors than it can represent in its neighbor table, but most of these potential neighbors have unreliable links that are unsuitable for routing to begin with. How, then, does a node determine over time which nodes merit its limited neighbor table resources for maintaining link statistics? The problem is that if a node is not in the table and the table is full, there is no place to record its link statistics. As a result, the receiver cannot determine the link quality of this node, and hence cannot decide whether to invest precious memory in it or in the current set of neighbors in the table. The processes of link estimation and neighborhood management are therefore interdependent; in fact, neighborhood management must itself infer link reliability without the use of link estimation. Controlling the transmission power to adjust cell density does not solve the problem either, since the appropriate setting is often application or deployment specific. For example, [64] adjusts the transmit power to control the topology and minimize the energy required to transport data. Fundamentally, sensor network nodes can maintain statistics about only a subset of their potential neighbors. We therefore need an on-line neighborhood selection process that keeps track of a set of good neighbors using a limited-size table, regardless of cell density. The selection criteria for neighbors depend heavily on the nature of the higher-level protocol or application. For routing, there are many ways to define good neighbors, but we first focus on the basic problem of finding the reliable ones without relying on link estimation.
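The 10-byte entry in the memory example can be made concrete with a hypothetical packed layout. The fields below are our own guess at what such an entry might hold (address, last sequence number, window counters, a scaled estimate, a table-management counter, and a routing cost), not the actual structure used on the motes. The arithmetic reproduces the 500-byte figure and its share of the 4 KB of RAM on the Mica and Mica 2.

```python
import struct

# Hypothetical little-endian entry: node id (u16), last seq (u16), received (u8),
# missed (u8), link estimate (u8, scaled 0-255), table counter (u8), routing cost (u16).
ENTRY_FORMAT = "<HHBBBBH"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)

MICA_RAM_BYTES = 4096  # Mica and Mica 2 each provide 4 KB of RAM


def table_bytes(num_neighbors: int) -> int:
    """Memory needed to track every potential neighbor with one entry each."""
    return num_neighbors * ENTRY_SIZE


entry = struct.pack(ENTRY_FORMAT, 17, 1023, 28, 2, 238, 5, 300)
print(ENTRY_SIZE, len(entry))                      # 10 10
print(table_bytes(50))                             # 500
print(round(table_bytes(50) / MICA_RAM_BYTES, 3))  # 0.122, roughly 1/8
```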

5.3 An On-line Neighborhood Selection Process

The neighborhood management process essentially has three components: insertion, eviction, and reinforcement. For each incoming packet on which neighbor analysis is performed, the source is considered for insertion if it does not reside in the table, or for reinforcement if it does. If the source is not present and the table is full, the node must decide whether to evict another node from the table.

We seek to develop a neighborhood management algorithm that keeps a sufficient number of good (reliable) neighbors in the table regardless of cell density. Ultimately, the goodness criterion should reflect which nodes are most useful for routing. For example, we would want to discard nodes with low-quality links, as they are poor routing neighbors. With a link quality distribution like that in Figure 3.1(a), a node in a field of sensors will hear from many more weakly connected, distant nodes than well-connected ones. However, assuming every node transmits at a roughly uniform rate, a node should hear from the well-connected nodes more frequently, since a smaller fraction of their packets are lost. Therefore, we rely on reception frequency (or rate) to infer the likelihood that a link is reliable. Since frequency can depend on traffic, a fairer comparison of link quality is obtained by using only periodic messages, such as beacons. The management algorithm should prevent the table from being polluted by many low-utility neighbors, while still allowing valuable new neighbors to enter.
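The reasoning behind reception frequency as a proxy can be checked with a quick simulation: if every neighbor beacons once per period, the number of beacons received from a neighbor over many periods is a binomial sample whose mean is its link quality times the number of periods. The link qualities, period count, and seed below are arbitrary choices for illustration.

```python
import random


def beacons_heard(link_quality: float, periods: int, rng: random.Random) -> int:
    """Receptions when each period's single beacon arrives with prob. link_quality."""
    return sum(rng.random() < link_quality for _ in range(periods))


rng = random.Random(42)
qualities = {"near": 0.9, "mid": 0.5, "far": 0.2}
counts = {name: beacons_heard(q, 100, rng) for name, q in qualities.items()}
print(counts)  # roughly 90, 50, and 20 receptions, respectively
```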

In this section, we describe such an on-line process and show how it can be approached with traditional cache design techniques. We also take an alternative approach by borrowing techniques from the database community. We focus on passive neighborhood discovery, in which nodes snoop on periodic data messages. Insertions are always performed if the table is not full, while evictions are performed only if the table is full. When the table is full, an adaptive down-sampling insertion policy governs the rate of insertion to prevent the table from being polluted by unreliable neighbors. Each table entry contains the data needed for link estimation, table management data, and all information related to routing, as defined by the routing layer. When a node is evicted from the table, its link estimate is lost along with its routing information.


5.3.1 Adaptive Down-sampling Insertion Policy

Upon hearing from a non-resident node when the table is full, we must determine whether to insert it. No historical information can be used, since no table entry has been allocated for the node. In some cases, geographic information or the signal strength associated with the packet can guide the selection process. However, geographic data is often absent and does not account for obstructions, while signal strength can be highly variable. Therefore, we look for a simple statistical method. To establish a stable set of neighbors, the insertion policy should avoid overrunning the neighbor table with a high rate of insertion. To avoid overrunning the table, the insertion rate must be much less than the rate of reinforcement, so that nodes in the table can stay long enough to be reinforced before being evicted by new insertions. That is, the eviction rate due to new insertions cannot be greater than the rate of reinforcement. For the periodic traffic commonly found in sensor networks, the maximum insertion/eviction rate can be estimated as the data rate multiplied by the number of neighbors with physical connectivity, whereas the reinforcement rate is just the data rate. Controlling the insertion rate is therefore critical, and one simple technique is probabilistic down-sampling: when a node not in the table is encountered, we consider it for insertion only with some probability p. Neighbors already in the table are reinforced as usual. This is similar to the sticky policy described in [32].

The down-sampling rate p in effect controls the insertion rate and needs to adapt to different cell densities. One simple approach is to set p to the ratio of the neighbor table size, |T|, to the number of distinct potential neighbors, N, i.e., p = |T|/N. In the worst case, where every beacon message from all N potential neighbors is received, this ratio admits about |T| insertions per beacon period, enough to evict all |T| entries.


Input: number of neighboring nodes N, neighbor table T, and node to be inserted n
Output: none
Downsampling(N, T, n)
(1) if n ∈ T
(2)     call REINFORCE(n, T)
(3) else
(4)     if rand(0, 1) ≤ min(|T|/N, 1)
(5)         call INSERT(n, T)

Figure 5.2: Downsampling process.

Since not all potential neighbors have good links, many of their beacons are lost and fewer insertions occur in practice, so nodes in the table have a chance to become established before being evicted. Our assumption of periodic messages can be relaxed, because insertion is simply a mechanism that allows nodes to establish themselves as neighbors in the logical connectivity graph; if a node is heard frequently, it is likely to be a good neighbor. This down-sampling process is summarized in Figure 5.2. We investigate the effect of changing this ratio later in the chapter.
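A runnable rendering of the down-sampling step in Figure 5.2 might look like the sketch below. The table is modeled as any mapping keyed by node id, and `insert` / `reinforce` are hypothetical callbacks standing in for whichever replacement policy is plugged in (FIFO, LRH, CLOCK, or FREQUENCY). Following the text rather than the figure alone, insertion is unconditional while the table still has room, and the probability p = min(|T|/N, 1) applies only once it is full.

```python
import random


def downsample(n, table, capacity, num_neighbors, insert, reinforce, rng=random):
    """One step of down-sampled insertion for a packet heard from node n.

    table: mapping of resident node ids; capacity: |T|; num_neighbors: N;
    insert / reinforce: policy callbacks (hypothetical hooks).
    """
    if n in table:
        reinforce(n)                     # resident nodes are always reinforced
    elif len(table) < capacity:
        insert(n)                        # free space: insertion is unconditional
    else:
        p = min(capacity / max(num_neighbors, 1), 1.0)   # p = min(|T|/N, 1)
        if rng.random() <= p:
            insert(n)                    # admitted to the policy's insertion step


table = {"a": 0, "b": 0}                 # full table with |T| = 2
admitted = []
rng = random.Random(7)
for _ in range(1000):
    downsample("x", table, capacity=2, num_neighbors=8,
               insert=admitted.append, reinforce=lambda node: None, rng=rng)
print(len(admitted) / 1000)  # close to p = 2/8 = 0.25
```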

To estimate N, one could apply prior work from the database literature on estimating the number of distinct values in a continuous stream [31]. However, since periodic beacons are present in our case, we simply count the average number of beacons received over a period to determine N.
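The beacon-counting estimate of N and the resulting down-sampling probability could be sketched as follows. Counting beacons per period is what the text describes; the class, the averaging over the last four periods, and the fallback of p = 1 before any estimate exists are our assumptions. Because beacons on lossy links are sometimes missed, this count can understate the number of distinct senders.

```python
from __future__ import annotations

from collections import deque


class NeighborCountEstimator:
    """Estimate N as the average number of beacons heard per beacon period."""

    def __init__(self, periods: int = 4) -> None:
        self.counts: deque[int] = deque(maxlen=periods)  # recent per-period totals
        self.current = 0

    def on_beacon(self) -> None:
        self.current += 1

    def end_period(self) -> None:
        self.counts.append(self.current)
        self.current = 0

    def estimate(self) -> float:
        return sum(self.counts) / len(self.counts) if self.counts else 0.0


def downsample_probability(table_size: int, n_estimate: float) -> float:
    """p = min(|T| / N, 1); with no estimate yet, admit everything."""
    if n_estimate <= 0:
        return 1.0
    return min(table_size / n_estimate, 1.0)


est = NeighborCountEstimator()
for heard in [40, 44, 36, 40]:
    for _ in range(heard):
        est.on_beacon()
    est.end_period()
print(est.estimate(), downsample_probability(10, est.estimate()))  # 40.0 0.25
```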

5.3.2 Cache-Based Eviction and Reinforcement

For on-line eviction and reinforcement, a simple approach is to borrow techniques from traditional cache policies, since they also aim to keep the most frequently used data or instructions in a limited table. We consider FIFO, Least-Recently Heard (LRH, the analogue of LRU), and the CLOCK approximation to LRU. For FIFO, eviction is based on order of entry: the entry that has resided in the table the longest is the candidate for eviction, and no reinforcement is performed. For LRH, a resident entry is marked as most recently heard whenever a message is received from that node, thereby reinforcing it in the table; upon eviction, the entry that has not been heard for the longest time is removed. For the CLOCK algorithm, reinforcement sets the entry's reference bit to 1. On eviction, the table is scanned, clearing reference bits, until an unreferenced entry is found.
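The three cache-style policies can be sketched compactly with a common, hypothetical `hear(node)` entry point that reinforces a resident node or inserts a new one, evicting a victim when the table is full. These are textbook renderings (an `OrderedDict` for FIFO and LRH, a circular hand over reference bits for CLOCK), not the code used in the simulations; starting new CLOCK entries with a cleared reference bit is a design choice, and some CLOCK variants start them at 1.

```python
from collections import OrderedDict


class FIFOTable:
    """Evict in order of insertion; hearing a resident node changes nothing."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: OrderedDict = OrderedDict()

    def hear(self, node) -> None:
        if node in self.entries:
            return                            # FIFO performs no reinforcement
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the longest-resident entry
        self.entries[node] = None

    def resident(self) -> list:
        return list(self.entries)


class LRHTable(FIFOTable):
    """Least-Recently Heard: LRU ordering driven by message receptions."""

    def hear(self, node) -> None:
        if node in self.entries:
            self.entries.move_to_end(node)    # reinforce: now most recently heard
            return
        super().hear(node)                    # front of the order is least recently heard


class ClockTable:
    """CLOCK: reference bits plus a circular hand approximate LRU."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.slots: list = []                 # resident node ids in circular order
        self.ref: list = []                   # one reference bit per slot
        self.hand = 0

    def hear(self, node) -> None:
        if node in self.slots:
            self.ref[self.slots.index(node)] = 1   # reinforce
            return
        if len(self.slots) < self.capacity:
            self.slots.append(node)                # free slot; starts unreferenced
            self.ref.append(0)
            return
        while self.ref[self.hand]:                 # clear bits until an unreferenced slot
            self.ref[self.hand] = 0
            self.hand = (self.hand + 1) % self.capacity
        self.slots[self.hand], self.ref[self.hand] = node, 0
        self.hand = (self.hand + 1) % self.capacity

    def resident(self) -> list:
        return list(self.slots)


for cls in (FIFOTable, LRHTable, ClockTable):
    t = cls(2)
    for n in ["a", "b", "a", "c"]:
        t.hear(n)
    print(cls.__name__, t.resident())  # FIFO keeps b, c; LRH and CLOCK keep a, c
```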

5.3.3 Frequency-Based Eviction and Reinforcement

A similar problem, using limited memory to find the most frequently occurring tokens in a data stream, appears in the database literature. One effective policy is the FREQUENCY algorithm [25], shown in Figure 5.3, which works as follows. It keeps a frequency count for each entry in the table. A node is reinforced by incrementing its count. A new node is inserted into the table if there is an entry with a count of zero that can be replaced; otherwise, the count of every entry is decremented by one and the insertion fails, with the new candidate being dropped. The most frequent entries are thus retained in the table. In contrast with all the cache policies in our study, considering a node for insertion does not always lead to an eviction. This is an important difference, since it affects how well the algorithm can maintain its entries, as the results in Section 5.5 show.


Input: node n to be inserted, neighbor table T
Output: SUCCESS or FAIL
Insert(n, T)
(1) if ∃ an entry e in T where e.counter = 0
(2)     use e to store n in table T
(3)     return SUCCESS
(4) else
(5)     foreach entry e in T
(6)         e.counter = e.counter − 1
(7)     return FAIL

Input: node n, neighbor table T
Output: SUCCESS or FAIL
Reinforce(n, T)
(1) if n is in T's entry e
(2)     e.counter = e.counter + 1
(3)     return SUCCESS
(4) else
(5)     return FAIL

Figure 5.3: Insertion and reinforcement in the FREQUENCY algorithm.
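Figure 5.3 translates into the Python sketch below; this counting scheme closely resembles the classic Misra-Gries frequent-items algorithm. Entries are a dict from node id to counter. Two details are assumptions where the figure is silent: a free slot is always used while the table is not yet full (as the text states), and a newly inserted node starts with a count of 1.

```python
from __future__ import annotations

from collections.abc import Hashable


class FrequencyTable:
    """FREQUENCY neighbor table (Figure 5.3): counters instead of recency."""

    def __init__(self, capacity: int, initial_count: int = 1) -> None:
        self.capacity = capacity
        self.initial_count = initial_count  # assumption: Figure 5.3 leaves this unspecified
        self.counts: dict[Hashable, int] = {}

    def reinforce(self, n: Hashable) -> bool:
        if n in self.counts:
            self.counts[n] += 1
            return True   # SUCCESS
        return False      # FAIL: n is not resident

    def insert(self, n: Hashable) -> bool:
        if len(self.counts) < self.capacity:     # table not yet full: always insert
            self.counts[n] = self.initial_count
            return True
        victim = next((k for k, c in self.counts.items() if c == 0), None)
        if victim is not None:                   # reuse an entry whose count reached 0
            del self.counts[victim]
            self.counts[n] = self.initial_count
            return True
        for k in self.counts:                    # all counts are >= 1 here, none go negative
            self.counts[k] -= 1
        return False                             # candidate dropped, nothing evicted

    def hear(self, n: Hashable) -> bool:
        return self.reinforce(n) or self.insert(n)


t = FrequencyTable(capacity=2)
for node in ["a", "b", "a"]:
    t.hear(node)                     # a: 2, b: 1
print(t.insert("c"), t.counts)       # False {'a': 1, 'b': 0}
print(t.insert("c"), t.counts)       # True {'a': 1, 'c': 1}
```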

5.4 Evaluation Methodology

We explore the effectiveness of the eviction and reinforcement policies described above through simulation. The simulation setup works as follows. We use the probabilistic link model for connectivity derived from Figure 3.1. We simulate a large, dense network of 6400 nodes placed uniformly on an 80x80 grid with 4-foot spacing, so that the effective region covers nodes within 3 grid points in either direction, and we consider the neighborhood of a typical node near the center of the network. In this scenario, such a node has 207 potential neighbors, i.e., nodes from which it hears at least one packet; each node transmits 100 packets in the simulation. Figure 5.4 shows the cumulative distribution function of the link quality of all of this node's potential neighbors. About 30% of the nodes have link quality greater than 75%, while about 40% have link quality less than 25%. As expected, many potential neighbors have unreliable links, and only a small fraction of the links have very good reliability. Repeating the study for different grid spacings shows that this ratio remains roughly constant (25% to 30%) as the number of potential neighbors ranges from 20 to 200. This is expected for a grid layout, but it also holds for any uniformly random layout. For this study, we define a good neighbor to be a potential neighbor with link quality greater than 75%. Recall that the goal of a neighborhood management policy is to retain as many good neighbors in the table as possible, regardless of cell density. To evaluate the different policies, we measure yield, i.e., the fraction of good neighbors that are found in the table more than 75% of the time.

The yield metric captures two aspects of neighborhood management. In a sparse network, most of the neighbors would stay in the table, but only a few have good links. In a dense network, many potential neighbors would have good links, but they may not stay in the table for long because the management policy may be unable to maintain a stable set of neighbors. Yield captures both scenarios.
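Yield can be computed from periodic snapshots of the table. The sketch assumes the simulator records, at each sampling instant, the set of node ids resident in the table, and that the true link qualities are known; the two 0.75 thresholds follow the definitions in the text, including the strict "more than 75% of the time".

```python
from __future__ import annotations


def yield_metric(link_quality: dict[str, float], snapshots: list[set[str]],
                 good_threshold: float = 0.75, residency: float = 0.75) -> float:
    """Fraction of good neighbors resident in more than `residency` of the snapshots."""
    good = [n for n, q in link_quality.items() if q > good_threshold]
    if not good or not snapshots:
        return 0.0
    kept = [n for n in good
            if sum(n in snap for snap in snapshots) / len(snapshots) > residency]
    return len(kept) / len(good)


quality = {"a": 0.90, "b": 0.80, "c": 0.30, "d": 0.95}   # a, b, d are good neighbors
snapshots = [{"a", "b", "d"}, {"a", "c", "d"}, {"a", "b", "d"}, {"a", "b", "d"}]
print(yield_metric(quality, snapshots))  # b is resident exactly 75% of the time: 2/3
```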

5.5 Results

In this section, we first explore the effect of our adaptive down-sampling mechanism on the yield of the FREQUENCY algorithm. We then combine the mechanism with the other management policies and evaluate the yield of each.


Figure 5.4: Cumulative distribution function showing the link quality distribution of the 207 neighbors of a center node in an 80x80 grid network with 4-foot spacing, using our empirical link model. (Plot: CDF of Neighbor Link Quality; x-axis: link quality, 0 to 1; y-axis: F(x), 0 to 1.)

5.5.1 Effect of Adaptive Down-Sampling

Figure 5.5 shows a contour plot of the yield of FREQUENCY at different cell densities and table sizes with the down-sampling insertion policy disabled. In contrast, Figure 5.6 shows the same case with the down-sampling rate p set to 50%. The difference is dramatic: with down-sampling, the contour lines are pushed much further toward the lower-right corner, indicating that a much smaller table can maintain all the good neighbors. Without down-sampling, the table is polluted by many unreliable neighbors, so a larger table is required to maintain the good ones. This demonstrates the importance of the down-sampling insertion policy.


Figure 5.5: Contour plot of the yield of the FREQUENCY algorithm for different cell densities and table sizes with no down-sampling for insertion. (Plot: Table Size vs. Number of Neighbors using FREQ without Downsampling; x-axis: number of neighbors N, 0 to 100; y-axis: table size M, 0 to 100, shown for M < N; contours: yield from 0.1 to 0.9.)

The system remains adaptive as long as p adapts to the changing N. We experimented with different cell densities and found that as long as p is greater than the ratio of table size to number of neighbors, |T|/N, the results are very similar. This is because the insertion rate has already been lowered enough to avoid overrunning the table. Thus, our adaptive scheme follows this ratio to adjust the down-sampling rate for different cell densities.

5.5.2 Eviction and Reinforcement Policy

We evaluate the different policies by fixing the table size and measuring the yield as node density increases. Figure 5.7 shows how the policies perform at different densities with a table size of 40 entries. We can analyze Figure 5.7 by breaking it into three regions. First, as expected, all policies perform well when the table can hold all the potential neighbors.

Figure 5.6: Contour plot of the yield of the FREQUENCY algorithm for different cell densities and table sizes with a down-sampling rate of 50% for insertion. (Plot: Table Size vs. Number of Neighbors using FREQ with Downsampling = 50%; x-axis: number of neighbors N, 0 to 100; y-axis: table size M, 0 to 100, shown for M < N; contours: yield from 0.1 to 0.9.)

Second, when the number of potential neighbors exceeds the table size but the number of good neighbors is still less than the table size, all policies still maintain most of the good neighbors. Third, once the number of good neighbors exceeds the table size, i.e., when the number of potential neighbors is roughly three times the table size, the cache-based policies are unable to hold onto a subset of good neighbors, even though good neighbors are plentiful. In contrast, FREQUENCY retains at least 20 good neighbors, a 50% yield, even at high densities.

Figure 5.7: Number of good neighbors maintainable at different densities with a table size of 40 entries. (Plot: x-axis: number of neighbors, 0 to 250; y-axis: number of good neighbors maintainable, 0 to 35; series: FREQ, LRH, CLOCK, FIFO.)

We continue our evaluation by varying two variables independently: cell density and table size relative to cell density. We vary the cell density from 20 to 220 potential neighbors and the table size from 150% to 50% of the number of good neighbors. The results are shown in Figure 5.8. In each sub-figure, the x-axis shows the number of good neighbors in the cell as the number of potential neighbors increases from 20 to 220, and the y-axis shows the yield. The sub-figures compare the yields of the different management policies as the table size decreases from 1.5 times the number of good neighbors to only half of it. Note that the uniform layout keeps the number of good neighbors at about 25% to 30% of the number of potential neighbors, even as the number of potential neighbors increases. The FREQUENCY policy performs much better than all other policies across all the scenarios. In Figure 5.8(a), where the table size is greater than the number of good neighbors, all policies perform very well. As the table size decreases, all of the cache policies start to degrade significantly. When the table size is only half of the number of good neighbors, the cache policies can no longer maintain any good neighbors, as shown in Figure 5.8(d). In contrast, the FREQUENCY policy maintains a yield of about 30% of the good neighbors even though the table can fit only 50% of them, which is a 60% efficiency. In fact, across all table sizes, the average efficiency of FREQUENCY is 70%. Furthermore, the yield of the FREQUENCY policy fluctuates less across densities than that of the cache policies.

We conclude that FREQUENCY is very effective at maintaining a subset of good neighbors in a fixed-size table, even for densities much greater than the table size. For example, with a table of 32 entries, this policy yields at least 10 good neighbors at all measured densities. One reason the cache policies underperform at high densities is that they are designed to evict an entry on every insertion, whereas FREQUENCY drops the insertion when no entry is replaceable. We believe this is the main reason why FREQUENCY can maintain a stable set of good neighbors even at very high density.

5.6 Other Goodness Metrics

The frequency count goodness metric is the most basic way to infer reliability

for neighborhood management. However, there are many other ways such a metric can

be augmented. For example, the neighborhood management policy can take into account

routing cost, geographical location, energy/lifetime of a neighbor, time scheduling issues,

or aggregation opportunity. A wide variety of metrics can be defined, so the design space is

large. In this study, we take the basic goodness criteria and focus mostly on the frequency

metric. However, in Chapter 7, we augment the route table management policy further by

taking into account routing cost as the goodness metric. The idea is to avoid maintaining sibling nodes (nodes with roughly the same routing cost) in the table, since they are likely to find their own routing paths. Therefore, it is more advantageous to free up the entries for potential parents or children. One simple technique is to apply a threshold on the routing cost difference between neighboring nodes.

[Plot: four panels of yield, the fraction of good neighbors that stay in the table at least 75% of the time (y-axis, 0 to 1), versus the number of good neighbors |S| (x-axis), for FIFO, LRH, CLOCK, and FREQ, with |S|/Total Number of Neighbors shown for reference; panels (c) and (d) also show Ideal. (a) Table size equals 1.5 times the number of good neighbors. (b) Table size equals the number of good neighbors. (c) Table size equals 75% of the number of good neighbors. (d) Table size equals 50% of the number of good neighbors.]

Figure 5.8: Yield for different table sizes and cell densities.
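The sibling rule can be sketched as a simple classification on the routing cost difference. This is a hypothetical illustration: the function name, the integer cost units, and the symmetric threshold are assumptions, not the policy used in Chapter 7.

```c
#include <stdint.h>

typedef enum {
    NEIGHBOR_POTENTIAL_PARENT,  /* clearly closer to the root: worth keeping */
    NEIGHBOR_POTENTIAL_CHILD,   /* clearly farther from the root: worth keeping */
    NEIGHBOR_SIBLING            /* roughly the same cost: candidate to free up */
} NeighborRole;

/* Hypothetical helper: classify a neighbor by its routing cost relative to ours. */
NeighborRole classify_neighbor(uint16_t my_cost, uint16_t neighbor_cost,
                               uint16_t sibling_threshold) {
    int diff = (int)neighbor_cost - (int)my_cost;
    if (diff <= -(int)sibling_threshold) return NEIGHBOR_POTENTIAL_PARENT;
    if (diff >= (int)sibling_threshold) return NEIGHBOR_POTENTIAL_CHILD;
    return NEIGHBOR_SIBLING;
}
```

Entries classified as siblings would be the first ones a table manager gives up when a better-placed neighbor competes for a slot.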

5.7 Related Work

Similar problems were studied in the packet radio literature, which also needed to manage a large set of potential neighbors with limited on-node memory. One such prior work randomly selects neighboring nodes with physical connectivity into the routing table and limits each node to a maximum allowed degree [70]. This approach is similar to creating random graphs, which establish links between potential neighbors with a probability p. When a potential neighbor is heard, a coin is flipped that succeeds with probability p. If it succeeds, a handshake occurs between the two nodes to establish the neighbor relationship. The approach relies on random graph percolation theory, which shows that random selection can create a globally connected graph given a sufficiently high value of p. Once a node has reached its maximum degree, the process of neighbor selection ends. However, to leave room for new neighbors when links die, the protocol keeps the node degree at a steady state below the maximum degree. If the protocol detects that the routing graph is not connected, links are randomly evicted to re-establish neighbor relationships; the process repeats if the partition in the routing graph persists.
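As a rough illustration of the scheme described for [70], the C sketch below performs the coin flip and stops selecting once a target degree below the maximum is reached. The handshake and the random eviction on partition are abstracted away, all names are hypothetical, and the caller supplies the random sample so the logic stays deterministic.

```c
#include <stdbool.h>

/* Hypothetical per-node state. */
typedef struct {
    int degree;         /* current number of logical neighbors */
    int target_degree;  /* steady-state degree, kept below the maximum to leave headroom */
} DegreeState;

/* Called when a potential neighbor is heard. `coin` is a uniform sample in [0, 1)
 * supplied by the caller, so any random number generator can be plugged in. */
bool consider_neighbor(DegreeState *s, double p, double coin) {
    if (s->degree >= s->target_degree) return false;  /* selection has stopped */
    if (coin >= p) return false;                       /* coin flip failed */
    s->degree++;                                       /* handshake assumed to succeed */
    return true;
}
```

Note that nothing in this rule looks at link quality: every heard node has the same chance p, which is exactly the limitation discussed next.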

The above mechanism shows a simple distributed scheme to build logical connectivity graphs. However, it treats every potential neighbor equally; it does not address the issue of lossy connectivity and the need to select neighbors with good links. Our approach integrates link quality into this process: biased sampling reinforces nodes that are heard frequently, and the routing layer has an opportunity to influence the selection process.

Another approach to neighborhood management that considers individual link quality is discussed in [54]. The idea is to build up a candidate list of potential neighbors to be considered for insertion into the neighbor table. The paper assumes that if the node already has a routing path of two or more hops to a potential neighbor in the routing graph, it is not necessary to keep this potential neighbor in the candidate list. Otherwise, the potential neighbor stays in the candidate list while its link estimate builds up. If the candidate is determined to be good (link quality above some threshold), it may replace a table entry if there is one worse than it. If the candidate is bad, it is removed from the candidate list. Unfortunately, no evaluation of the algorithm is presented in [54]. However, this algorithm is close to the holistic approach that we advocate for neighborhood management. Note that the neighbor exclusion criterion requires any-to-any routing support, since a node should not become a new logical neighbor if there is already a route to it.
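Here is a minimal sketch of the candidate-list decision described for [54]. It assumes link quality estimates in [0, 1] and a caller that already knows whether a multihop route to the candidate exists. The action names are hypothetical. The description does not specify what happens to a good candidate when the table has no worse entry, so that case is simply reported to the caller.

```c
#include <stdbool.h>

typedef enum {
    CAND_SKIP,                 /* already reachable over two or more hops */
    CAND_KEEP_ESTIMATING,      /* stay in the candidate list while the estimate builds up */
    CAND_DISCARD,              /* bad candidate: remove from the candidate list */
    CAND_REPLACE_WORST,        /* good candidate that beats the worst table entry */
    CAND_GOOD_NO_WORSE_ENTRY   /* good, but no worse entry exists (behavior unspecified) */
} CandidateAction;

CandidateAction evaluate_candidate(bool has_multihop_route, bool estimate_ready,
                                   double quality, double threshold,
                                   double worst_in_table) {
    if (has_multihop_route) return CAND_SKIP;
    if (!estimate_ready) return CAND_KEEP_ESTIMATING;
    if (quality <= threshold) return CAND_DISCARD;
    if (quality > worst_in_table) return CAND_REPLACE_WORST;
    return CAND_GOOD_NO_WORSE_ENTRY;
}
```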

The problem of neighborhood management has aspects in common with cache

management and with statistical estimation techniques in databases. There is a growing

body of work on gathering statistics on Internet packet streams using memory much smaller

than the number of distinct classes of packets. Heuristics are used in [27] to identify a set of

most frequently occurring packet classes. Two algorithms are presented in [32] to identify all

values with frequency of occurrence exceeding a user specified threshold. A sliding window

approach is used in [23] that can be generalized to estimate statistical information of a data

stream. Finally, [25] showed a simple FREQUENCY algorithm that estimates the frequency


of Internet packet streams with limited space. We explored these different techniques during

our design process and identified the FREQUENCY approach as our solution.

5.8 Multihop Routing Implications

We have demonstrated that it is possible for a local process to maintain a stable subset of good neighbors using a neighbor table much smaller than the actual number of potential neighbors. With such a stable set of neighbors, statistics can be collected for them so that link estimation can be performed. This subset of neighbors defines the local connectivity options of the node. Together, these local subsets form a logical connectivity graph in which each edge is characterized by a link estimate.

The definition of a neighbor is now relative to a set of local rules that judge all the nodes that are heard; only the most competitive ones are kept as neighbors in the neighbor table. It is important to note that deriving such a reliable neighborhood requires no explicit threshold setting, so the problem of setting too high a threshold, which results in network partition, does not occur. Furthermore, because the FREQUENCY algorithm works well regardless of cell density, higher-level protocols can adjust the cell density without affecting the robustness of the routing system. That is, the logical connectivity graph built by the nodes adapts to different cell densities as well as it can with limited resources.

We have shown how to use link reliability as one of the basic criteria for neighborhood selection. This selection criterion can be further augmented to achieve better selections. Since higher-level services may need to specify their own goodness metrics for neighborhood management, designing a flexible neighborhood abstraction and architecture through which different services can cooperate to influence the choice of neighbors is important research that is left as future work. In the next chapter, we revisit the high-level picture of what role neighborhood management plays in the context of network self-organization and study the overall routing problem.


Chapter 6

Cost-Based Routing

With the local processes of link estimation and neighborhood management, a distributed logical connectivity graph is created. Each edge on the graph is characterized by link quality as a probabilistic metric of reliability. Routing protocols should build topologies upon this graph, and the resulting topology is a subgraph of the logical connectivity graph. The primary focus of this chapter is to explore the design of such a routing process to form a stable and reliable routing topology. We first introduce a typical distributed distance-vector tree formation process and extend it to a general framework that supports different kinds of cost-based routing. We focus on tree formation since data collection is the most common communication pattern in sensor networks; it also brings forward the issues that need to be considered in any pattern. Such a tree formation process has to be integrated with the other two processes: link estimation and neighborhood management. We give an overview of how these three processes work together to form a routing subsystem, and present a set of underlying system issues that arise when it is implemented. With the routing framework in place, we discuss the different routing cost functions that run upon the logical connectivity graph. Finally, we survey the relevant related work in the context of wireless ad hoc routing and discuss how the design approaches in the literature differ from ours.

6.1 Distributed Tree Building Process

As discussed in Chapter 2, many of the sensor network applications require a basic

form of tree-based routing for data collection and tree-based in-network processing. In this

section, we focus on a general framework of such a distributed tree building process in the

context of distance-vector based routing with an arbitrary routing cost function. Such a

framework can be extended to form multiple trees rooted at different nodes when a spanning

forest topology is required.

We first discuss the basic process of a distributed distance-vector based routing protocol in the context of tree building. The root of the tree, i.e., the sink node, always has a routing cost of 0. For all other nodes, the routing cost is initialized to infinity. Every node in the network periodically transmits route messages, which contain the node address and the estimated routing cost to the tree root. Upon reception of route messages from neighboring nodes, each node extracts the information from the message and stores it in the neighbor table. As the neighbor table is updated, a local parent selection function is invoked to select the best parent. For basic distance-vector based routing, the best parent is the one that carries the smallest routing cost, or “distance”, to the root of the tree. Once parent selection has identified the best parent, the routing cost of the node is computed by adding the link cost to the routing cost of the parent. The next route message conveys the new cost to neighboring nodes.

Input: Last Routing Message Received M. Neighbor Table T.
Output: Success or Fail.
TreeBuild(M, T)
(1)  if sinkNode
(2)      PathCost = 0
(3)      Parent = itself
(4)  else
(5)      PathCost = ∞
(6)      Parent = nil
(7)      update T with M(SourceAddress, RoutingCost)
(8)      foreach entry e in T
(9)          e.LinkCost = EvalLinkCost(e)
(10)         RoutingCost = EvalCost(e.PathCost, e.LinkCost)
(11)         PathCost = Min(PathCost, RoutingCost)
(12)     Parent = node with the Min(PathCost)
(13) Send Route Message
(14) if Parent = nil
(15)     return FAIL
(16) else
(17)     return SUCCESS

Figure 6.1: Distributed tree building algorithm framework.

The above is formalized in a general framework shown in Figure 6.1. Line 9 in

Figure 6.1 shows the EvalLinkCost function, which defines the cost of a link, and the

EvalCost function on line 10, which combines the link cost with the routing cost of the

path from each potential parent. For example, for the cost functions to perform shortest hop

routing, PathCost and LinkCost would be in the units of hop-counts, and EvalLinkCost

would return 1 for all links in T and EvalCost would be a simple addition operation. One

of the assumptions of such a distributed tree building process is that no matter what the

routing cost function is, the link cost must be positive and EvalCost must always increase


the cost.
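The framework of Figure 6.1 with the shortest-hop cost functions can be sketched in C as follows. This is an illustrative sketch, not the dissertation's implementation. It assumes the table has already been updated with message M, uses 16-bit hop counts with UINT16_MAX standing in for ∞, and introduces the hypothetical types Node and Neighbor. Sending the route message (line 13) is left as a comment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INF_COST  UINT16_MAX  /* stands in for an infinite routing cost */
#define NO_PARENT 0xFFFF      /* hypothetical "nil" parent address */

typedef struct {
    uint16_t id;         /* neighbor address */
    uint16_t path_cost;  /* neighbor's advertised routing cost (hops), INF_COST if unknown */
} Neighbor;

typedef struct {
    bool is_sink;
    uint16_t self_id;
    uint16_t path_cost;
    uint16_t parent;
} Node;

/* Shortest-hop instances of EvalLinkCost and EvalCost. */
static uint16_t eval_link_cost(const Neighbor *e) { (void)e; return 1; }
static uint16_t eval_cost(uint16_t path_cost, uint16_t link_cost) {
    if (path_cost == INF_COST || link_cost == INF_COST) return INF_COST;  /* avoid overflow */
    return (uint16_t)(path_cost + link_cost);
}

/* Parent selection over the (already updated) neighbor table; true on success. */
bool tree_build(Node *n, const Neighbor *table, size_t len) {
    if (n->is_sink) {
        n->path_cost = 0;
        n->parent = n->self_id;
    } else {
        n->path_cost = INF_COST;
        n->parent = NO_PARENT;
        for (size_t i = 0; i < len; i++) {
            uint16_t cost = eval_cost(table[i].path_cost, eval_link_cost(&table[i]));
            if (cost < n->path_cost) {   /* keep the minimum-cost neighbor as parent */
                n->path_cost = cost;
                n->parent = table[i].id;
            }
        }
    }
    /* A real node would now broadcast a route message advertising n->path_cost. */
    return n->parent != NO_PARENT;
}
```

Swapping in other cost metrics only requires replacing eval_link_cost and eval_cost, which is the point of the framework.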

Figure 6.2 shows a more advanced framework that brings the probabilistic nature of connectivity into the routing layer. As discussed in Chapter 4, out-bound link estimates among neighbors are obtained by piggybacking in-bound link estimates on route messages, which are sent periodically to disseminate routing information and to maintain the minimum data rate for link estimation. Line 10 shows this mechanism of obtaining out-bound link estimation feedback from the node that originates the route message. With both in-bound and out-bound link estimates, asymmetric links can be avoided, depending on the actual criteria defined by the EvalLinkCost function. Line 13 shows a simple mechanism that avoids creating two-hop cycles: immediate children are never chosen as potential parents. Line 15 avoids selecting a parent that does not itself have a parent; having a parent, however, does not imply that the potential parent is connected to the root of the tree. Line 17 breaks the cycles that line 13 cannot prevent, when they are detected by the cycle detection mechanism. Line 19 ensures that the potential parent is connected to the root of the tree, a mechanism to cope with the counting-to-infinity problem, which is discussed later in the chapter. Finally, since link quality estimates fluctuate by at least 10%, line 26 provides hysteresis to reduce the occurrence of route flapping. Note that the switching threshold is useful for error-prone routing cost functions such as those derived from link estimates. With the high-level tree building framework presented, we turn to investigating an appropriate routing cost function for sensor networks using this framework.


Input: Last Routing Message Received M. Neighbor Table T.
Output: Success or Fail.
TreeBuild(M, T)
(1)  if sinkNode
(2)      PathCost = 0
(3)      Parent = itself
(4)      return SUCCESS
(5)  else
(6)      OldParent = Parent
(7)      OldPathCost = ∞
(8)      PathCost = ∞
(9)      Parent = nil
(10)     update T with M(SourceAddress, RoutingCost, OutboundLinkEstimation)
(11)     foreach entry e in T
(12)         e.LinkCost = EvalLinkCost(e)
(13)         if e is a child
(14)             continue
(15)         if e has no parent
(16)             continue
(17)         if Cycle is detected with e
(18)             continue
(19)         if e.PathCost != 0 and e.RootConnected is FALSE
(20)             continue
(21)         RoutingCost = EvalCost(e.PathCost, e.LinkCost)
(22)         if e.addr = OldParent
(23)             OldPathCost = RoutingCost
(24)         PathCost = Min(PathCost, RoutingCost)
(25)     Select new Parent with Minimum PathCost
(26)     if (OldPathCost != ∞) and (OldPathCost - PathCost < SwitchThreshold)
(27)         Keep using OldParent and OldPathCost
(28) if Parent = nil
(29)     return FAIL
(30) else
(31)     return SUCCESS

Figure 6.2: Distributed tree building algorithm framework with link estimation incorporated.
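The per-entry filters of lines 13 to 20 and the hysteresis of lines 26 and 27 can be sketched as below. This sketch assumes integer routing costs (for example, scaled link estimates) and treats EvalCost as saturating addition. It keeps the current parent unless the best alternative is cheaper by at least SwitchThreshold. The Candidate type and its boolean fields are hypothetical stand-ins for the neighbor table state.

```c
#include <stdbool.h>
#include <stdint.h>

#define INF_COST  UINT16_MAX
#define NO_PARENT 0xFFFF   /* hypothetical "nil" address */

typedef struct {
    uint16_t id;
    uint16_t path_cost;   /* neighbor's advertised cost; 0 only for the sink */
    uint16_t link_cost;   /* result of EvalLinkCost, INF_COST if unusable */
    bool is_child;        /* line 13: learned by snooping */
    bool has_parent;      /* line 15 */
    bool cycle_detected;  /* line 17 */
    bool root_connected;  /* line 19 */
} Candidate;

static uint16_t add_cost(uint16_t a, uint16_t b) {   /* saturating EvalCost */
    uint32_t s = (uint32_t)a + b;
    return s >= INF_COST ? INF_COST : (uint16_t)s;
}

/* Returns the chosen parent (NO_PARENT if none) and writes its cost to *out_cost. */
uint16_t select_parent(const Candidate *t, int len, uint16_t old_parent,
                       uint16_t switch_threshold, uint16_t *out_cost) {
    uint16_t best = NO_PARENT, best_cost = INF_COST, old_cost = INF_COST;
    for (int i = 0; i < len; i++) {
        const Candidate *e = &t[i];
        if (e->is_child || !e->has_parent || e->cycle_detected) continue;
        if (e->path_cost != 0 && !e->root_connected) continue;
        uint16_t cost = add_cost(e->path_cost, e->link_cost);
        if (e->id == old_parent) old_cost = cost;
        if (cost < best_cost) { best_cost = cost; best = e->id; }
    }
    /* Hysteresis: switch only if the improvement reaches the threshold. */
    if (old_cost != INF_COST && old_cost - best_cost < switch_threshold) {
        best = old_parent;
        best_cost = old_cost;
    }
    *out_cost = best_cost;
    return best;
}
```

With a threshold of 2, a neighbor that is only one cost unit better does not displace the current parent, which is how the margin absorbs estimation noise.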


6.2 Overview of the System Routing Architecture

In this section, we present the underlying details of the core mechanisms supporting the general framework shown in Figure 6.2. We first focus on the core system architecture in Figure 6.3, which captures the high-level interactions of all the components implementing the routing framework.

There are several concurrent processes operating together in Figure 6.3. When a message is received by snooping on the channel, if the source node of the message is not in the table and the message is neither a route message nor an originated data message, the message is dropped, since neighborhood management assumes the same message rate for every node it considers for table insertion. This restriction can be relaxed as discussed in Chapter 5. If the message is not dropped, the source node is considered for insertion into the neighbor table by the Table Management component.
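The admission rule on the receive path can be expressed as a small predicate. This is a hypothetical sketch: the message classification and names are assumptions, and a real implementation would read these properties from the packet header.

```c
#include <stdbool.h>

typedef enum { MSG_ROUTE, MSG_DATA } MsgType;

typedef struct {
    MsgType type;
    bool originated;  /* data generated by the transmitting node itself, not forwarded */
} MsgInfo;

/* True if a snooped message may be processed (entry update or insertion candidate). */
bool consider_message(bool source_in_table, const MsgInfo *m) {
    if (source_in_table) return true;        /* known neighbor: update its entry */
    if (m->type == MSG_ROUTE) return true;   /* route messages are sent at a common rate */
    return m->type == MSG_DATA && m->originated;
}
```

Forwarded data from an unknown node is rejected because its rate reflects the traffic of its descendants, not its own transmissions, which would bias the frequency-based insertion policy.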

If the source node is already in the neighbor table or is about to be inserted into it, the information in the table needs to be updated. Figure 6.4 shows the data structure of the neighbor table. It contains node status and routing entries for neighbors. Its fields include: MAC address of the neighbor, neighbor’s parent address, routing cost, children information, internal management flags, duplicate packet elimination information, reception (in-bound) link quality, send (out-bound) link quality, and link estimator statistics. These fields are updated by the different components depending on the information in each incoming message. For example, the link estimator maintains estimates of the in-bound (reception) link quality of each neighbor in the neighbor table. Out-bound estimates are piggybacked on the route messages; the Neighbor Table component extracts these estimates and stores them accordingly. The Neighbor Table component also decays the out-bound estimate if it is not updated within the period specified by OutBoundDecayWindow, which is defined in Section 4.2.

[Diagram: components Receive, Filter (discards non-data and duplicate packets), Estimator (sniffs and estimates on all messages), Table Management (inserts or discards on route or originated data messages), Neighbor Table (saves information from route messages), Cycle Detection (chooses another parent when a cycle is detected), Parent Selection, Timer (runs parent selection and sends route messages periodically), Application (sends originated data messages), Originating Queue, Forward Queue, and Transmit.]

Figure 6.3: Message flow chart to illustrate the core components for implementing our routing subsystem.

Parent selection is run periodically to identify one of the neighbors for routing. The Timer component generates timing events to run the parent selection component, which broadcasts (locally) a route message to disseminate routing information to neighbors after completing parent selection. That is, parent selection and route message updates run at the same rate. This process is described in the pseudocode previously shown in Figure 6.2. The route messages include the parent address, the estimated routing cost to the sink, and a list of reception link estimates for neighbors. When a route message is received from a node that is resident in the neighbor table, the corresponding entry is updated. Otherwise, the neighbor table manager decides whether to insert the node or drop the update.

#include <stdint.h>

#ifndef ROUTE_TABLE_SIZE
#define ROUTE_TABLE_SIZE 16   // placeholder so the figure compiles; not specified here
#endif

typedef struct TableEntry {
    uint16_t id;               // Neighbor MAC Address
    uint16_t parent;           // Neighbor's Parent's MAC Address
    uint16_t cost;             // Neighbor's Routing Cost
    uint8_t  hop;              // Neighbor's Hop-Count
    uint8_t  rootConnected;    // Neighbor's Path Connection to Root
    uint8_t  childLiveliness;  // For cycle detection
    uint8_t  flags;            // Internal management flags
    uint8_t  lastPacketNo;     // For duplicate packet elimination
    uint16_t missed;           // Estimator statistics
    uint16_t received;         // Estimator statistics
    int16_t  lastSeqno;        // Estimator statistics
    uint8_t  liveliness;       // Outbound decay window
    uint8_t  receiveEst;       // Inbound estimation
    uint8_t  sendEst;          // Outbound estimation
} TableEntry;

TableEntry NeighborTbl[ROUTE_TABLE_SIZE];

Figure 6.4: Typical data structure of the neighbor table. ROUTE_TABLE_SIZE determines the size of the neighbor table.
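The piggybacked feedback can be sketched as follows. The receiver looks for its own address in the sender's list of reception estimates and adopts that value as its out-bound (send) estimate to the sender. The message layout, MAX_EST_ENTRIES, and the function name are assumptions for illustration; only the field names receiveEst and sendEst mirror Figure 6.4.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_EST_ENTRIES 8   /* hypothetical: estimates that fit in one route message */

typedef struct {
    uint16_t id;         /* neighbor address */
    uint8_t  receiveEst; /* sender's in-bound (reception) estimate for that neighbor */
} EstEntry;

typedef struct {
    uint16_t source;     /* sender address */
    uint16_t parent;     /* sender's parent */
    uint16_t cost;       /* sender's routing cost to the sink */
    uint8_t  numEst;
    EstEntry est[MAX_EST_ENTRIES];
} RouteMsg;

/* The sender's reception estimate of us is our out-bound (send) estimate to it. */
bool extract_outbound_est(const RouteMsg *m, uint16_t self, uint8_t *sendEst) {
    for (int i = 0; i < m->numEst && i < MAX_EST_ENTRIES; i++) {
        if (m->est[i].id == self) {
            *sendEst = m->est[i].receiveEst;
            return true;
        }
    }
    return false;  /* not listed: leave sendEst to decay over OutBoundDecayWindow */
}
```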

Data packets originating from the node, i.e., outputs of local sensor processing, are queued for sending with the parent as the destination. Incoming data packets are selectively forwarded through the forwarding queue with the current parent as the destination address, and the neighbor table entry of the node that sent the packet is flagged as a child to avoid cycles in parent selection. Packet sequence numbers are used to suppress forwarding of duplicate packets, as shown in Figure 6.4. When a cycle is detected while forwarding packets, parent selection is triggered with the current parent demoted to break the cycle. Such a cycle detection process can be eliminated if the routing protocol is guaranteed to be cycle-free.


6.3 Underlying System Issues

We now present the issues underlying the routing process described in Figure 6.3. They include the rate of parent change, packet snooping, the counting-to-infinity problem, cycles, duplicate packet elimination, queue management, and the relationship to link quality estimation.

6.3.1 Rate of Parent Change

Regardless of the routing algorithm, routes can be changed whenever the parent

selection algorithm is scheduled to run. For fast adaptation, it is tempting to schedule the

parent selection component to evaluate new routes for every route update received from

neighboring nodes and to generate a new route message if the parent is changed. However,

a domino effect of route changes is likely to be triggered across the entire network, especially when routing costs are very sensitive. This not only creates topology instability but also leads to unbounded message overhead, since each parent change can trigger further route update messages.

To address this issue, we limit the rate of parent change and attempt to bound

the message overhead when the network is unstable. We simply run the parent selection

algorithm synchronously using the timer event. That is, routes are evaluated on a periodic

basis as a route damping mechanism, rather than asynchronously upon receiving a route

update, except when a cycle is detected. Thus, the rate of parent change is bounded over

time. It also bounds the route message update rate and conveniently defines the minimum

data rate for link estimation. Therefore, the rate of parent selection can affect the adaptation


rate for topology and link quality. For sensor networks that are relatively immobile, it is

possible to reduce such a rate once the network has stabilized.
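The damping policy can be sketched as follows. Route updates only refresh stored costs, and a timer event runs parent selection once and emits exactly one route message, so both parent changes and route messages are bounded by the timer rate. The fixed-size slot array and the names are hypothetical, and the immediate re-selection on cycle detection is omitted.

```c
#include <stdint.h>

#define N_NEIGHBORS 4           /* hypothetical table size */
#define INF_COST    UINT16_MAX

typedef struct {
    uint16_t cost_via[N_NEIGHBORS];  /* latest routing cost through each neighbor slot */
    int parent;                      /* slot index of current parent, -1 if none */
    unsigned parent_changes;
    unsigned route_msgs_sent;
} RouterState;

void router_init(RouterState *r) {
    for (int i = 0; i < N_NEIGHBORS; i++) r->cost_via[i] = INF_COST;
    r->parent = -1;
    r->parent_changes = 0;
    r->route_msgs_sent = 0;
}

/* Every received route update only refreshes the table: no re-selection, no transmission. */
void on_route_update(RouterState *r, int slot, uint16_t cost_via) {
    r->cost_via[slot] = cost_via;
}

/* Timer event: run parent selection once, then broadcast one route message. */
void on_timer(RouterState *r) {
    int best = -1;
    for (int i = 0; i < N_NEIGHBORS; i++)
        if (r->cost_via[i] != INF_COST && (best < 0 || r->cost_via[i] < r->cost_via[best]))
            best = i;
    if (best != r->parent) {
        r->parent = best;
        r->parent_changes++;
    }
    r->route_msgs_sent++;
}
```

However many updates arrive between two timer events, at most one parent change and one route message result.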

6.3.2 Packet Snooping

Given that the wireless network is a broadcast medium, a lot of information can

be extracted by snooping packets on the channel. Link estimation is one example. At the

routing level, since each node is a router, snooping on forwarding packets allows a node

to learn about all its children, which is useful to prevent cycle formation. Furthermore,

snooping on a neighboring node’s messages is a quick way to learn about its parent, which

decreases the chance of stale information causing a direct two-hop cycle. The same technique

can also be used to prune children quickly in the case of a network partition. When a node

with an unreachable route receives a forwarding message from its child, it will NACK

by forwarding the child’s message with a ’NO ROUTE’ address. All neighboring nodes,

including its children, that snoop on this packet can quickly learn about an unreachable

route. In fact, this provides a natural feedback deep down into the tree that the routing

path has become invalid.
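Here is a hypothetical sketch of this NACK. The node without a route re-sends the child's packet to a reserved NO_ROUTE_ADDR. Any snooping neighbor whose parent transmitted that packet then learns that its route is invalid. The reserved address value and the packet layout are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define NO_ROUTE_ADDR 0xFFFE   /* hypothetical reserved "NO ROUTE" address */

typedef struct {
    uint16_t src;     /* transmitting node (link-level source) */
    uint16_t dest;    /* next hop, or NO_ROUTE_ADDR as a negative acknowledgment */
    uint16_t origin;  /* node that generated the data */
} DataPkt;

/* At a node with no route: forward the child's packet to NO_ROUTE_ADDR instead of a parent. */
DataPkt forward_or_nack(uint16_t self, bool have_route, uint16_t parent, DataPkt in) {
    DataPkt out = in;
    out.src = self;
    out.dest = have_route ? parent : NO_ROUTE_ADDR;
    return out;
}

/* At a snooping neighbor: our route is invalid if our parent just NACKed. */
bool route_invalidated(uint16_t my_parent, const DataPkt *snooped) {
    return snooped->src == my_parent && snooped->dest == NO_ROUTE_ADDR;
}
```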

Packet snooping requires support from the underlying data link layer. In particular, the data link layer should not generate link-level acknowledgments for packets not destined to the node, while still allowing a higher layer to snoop on different kinds of packets. Our sensornet platform provides this flexibility; however, some other platforms, such as those based on 802.15.4 [3], may need to disable automatic link-level acknowledgments as the price of supporting packet snooping.


6.3.3 Counting-To-Infinity Problem

The classic counting-to-infinity problem occurs when a network partition causes routing distances to increase slowly, so that many messages are required to detect the partition. A simple solution is for nodes that are connected to the tree root to set a flag periodically. The flag is propagated over route messages and is stored in the neighbor table. Other nodes that wish to join the network must select parents with the flag set. Since nodes that are not connected to the tree root cannot set their own flags, the selected routing path must be connected to the root. Every node expires the flag after a period, and the process repeats. If the flag of the current parent remains unset for some time after expiring, the path is assumed to be unreachable due to a network partition. If no other potential parents have their flags set, the node becomes disjoint from the tree; when all nodes become disjoint from the tree, the tree is pruned automatically. This mechanism, which is reflected in line 19 of Figure 6.2, solves the counting-to-infinity problem efficiently and also works for multiple tree roots.
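One way to realize the expiring flag is a small time-to-live per neighbor entry, refreshed by route messages and aged on each timer event. This is a minimal sketch: FLAG_LIFETIME and the function names are hypothetical, and the rootConnected field of Figure 6.4 may be maintained differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_LIFETIME 3  /* hypothetical: timer periods a flag survives without refresh */

typedef struct {
    uint8_t root_connected_ttl;  /* 0 means the flag has expired */
} FlagEntry;

/* Route message from a neighbor: refresh its flag if the neighbor advertises it. */
void on_route_msg_flag(FlagEntry *e, bool advertised) {
    if (advertised) e->root_connected_ttl = FLAG_LIFETIME;
}

/* Timer event: age every flag, so flags that are not refreshed eventually expire. */
void age_flags(FlagEntry *t, int len) {
    for (int i = 0; i < len; i++)
        if (t[i].root_connected_ttl > 0) t[i].root_connected_ttl--;
}

bool root_connected(const FlagEntry *e) { return e->root_connected_ttl > 0; }

/* A node may advertise its own flag only if it is a root or its parent's flag is valid. */
bool own_flag(bool is_root, const FlagEntry *parent) {
    return is_root || (parent != NULL && root_connected(parent));
}
```

After a partition, no refreshes reach the cut-off region, so every flag there expires within FLAG_LIFETIME periods and no node in it can advertise connectivity.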

6.3.4 Cycles

For many-to-one routing over relatively stationary sensor networks, we use simple

mechanisms to mostly avoid loop formation and to break cycles when they are detected,

rather than to employ heavyweight protocols with inter-nodal coordination. DSDV [58]

provides an attractive approach to avoid cycles for mobile networks, but it requires sequence

number propagation and sequence number settling time tuning, which may differ in each

deployment.


We rely on techniques similar to poison-reverse or split-horizon [34]. By monitoring forwarding traffic and snooping on the parent address in each neighbor’s messages, a node can identify its neighboring child nodes, which are then not considered as potential parents. We only need to maintain this information for nodes in the neighbor table. Route invalidation when a node becomes disjoint from a tree, and tree pruning by ‘NACKing’ children’s traffic, are used to prune stale routing information that could otherwise lead to cycles.

Even with these simple mechanisms, cycles may still occur and must be detected.

Since each node is a router and a data source, cycles can be detected quickly when a node

in a loop originates a packet and sees it returning. That is, one of the nodes in a cycle can

detect it. This mechanism works as long as the queue management policy avoids letting

forwarding traffic starve originated traffic. (Otherwise, packets may get stuck in a loop in

the middle of a route without detection.) This level of fairness is an appropriate policy in

any case. Once a cycle is detected, discarding the parent by choosing a new one or becoming

disjoint from the tree will break it. Alternatively, a Time-To-Live field can be added, but

we did not use one in our evaluation.
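The detection rule reduces to checking the origin of every packet handed to the forwarding path. In the sketch below, dropping the looping packet and the demoted field are assumptions made for illustration; the text only specifies that the current parent is discarded and parent selection is rerun.

```c
#include <stdbool.h>
#include <stdint.h>

#define NO_PARENT 0xFFFF  /* hypothetical "no parent" address */

typedef struct {
    uint16_t origin;  /* node that generated the packet */
    uint8_t  seqno;   /* routing-layer sequence number */
} Pkt;

typedef struct {
    uint16_t parent;   /* current parent */
    uint16_t demoted;  /* parent to exclude from the next selection */
} ParentState;

/* Returns true if the packet should be forwarded. If our own packet has come back,
 * we are on a loop: demote the current parent so that parent selection must choose again. */
bool handle_forward(uint16_t self, ParentState *ps, const Pkt *p) {
    if (p->origin == self) {
        ps->demoted = ps->parent;
        ps->parent = NO_PARENT;
        return false;   /* assumption: the looping packet is not forwarded again */
    }
    return true;
}
```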

6.3.5 Duplicate Packet Elimination

Duplicate packets can be created upon retransmission when the ACK is lost. With-

out duplicate packet elimination, they will be forwarded, creating a multiplicative effect and

wasting more bandwidth and energy. To suppress duplicates caused by link retransmissions, the routing layer uses its own sequence number, separate from the link sequence number, and stores the last one received from each neighbor in the neighbor table, as shown in Figure 6.4. When a duplicate packet is received from a child, its sequence number matches the one stored in the neighbor table, and the packet is dropped. This approach relies on in-order packet delivery during retransmission and assumes that the neighbor table is able to track children.
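Here is a minimal sketch of the per-child check, using a field modeled on lastPacketNo in Figure 6.4. The extra seen flag, which prevents the very first packet from being treated as a duplicate, and the function name are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t id;
    uint8_t  lastPacketNo;  /* routing-layer sequence number of the last packet from this child */
    bool     seen;          /* hypothetical: false until the first packet arrives */
} ChildEntry;

/* Returns true if the packet repeats the previous one (a retransmission after a lost ACK). */
bool is_duplicate(ChildEntry *c, uint8_t routing_seqno) {
    if (c->seen && c->lastPacketNo == routing_seqno) return true;
    c->lastPacketNo = routing_seqno;
    c->seen = true;
    return false;
}
```

Because only the most recent number is remembered, the check depends on the in-order delivery assumption stated above.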

6.3.6 Queue Management

Nodes high in the tree forward many more messages than they originate. Care must

be taken to ensure that forwarding messages does not entirely dominate the transmission

queue, since it would prevent the node from originating data and undermine cycle detection.

We separate the forwarding and originating messages into two queues so that upstream

bandwidth is allocated according to a sharing policy that attempts to bias against upstream

forwarding traffic. The policy that we implemented is very simple: assuming that the originating data rate is low compared to that of forwarded messages, we give priority to originating traffic over forwarded traffic from more distant nodes. For data collection, it is possible to estimate the ratio of forwarding to originating packets by counting the descendants of each parent, but a general treatment of fair queuing is beyond the focus of this study.
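The simple sharing policy can be sketched as two ring buffers with strict priority for originated packets. The queue capacity, the integer packet handles, and the function names are hypothetical. Strict priority relies on the stated assumption that the originating data rate is low, so it cannot starve forwarding in practice.

```c
#include <stdbool.h>

#define QCAP 8  /* hypothetical queue capacity */

typedef struct {
    int items[QCAP];  /* packet handles */
    int head;
    int count;
} Queue;

static bool q_push(Queue *q, int v) {
    if (q->count == QCAP) return false;  /* full: caller drops the packet */
    q->items[(q->head + q->count) % QCAP] = v;
    q->count++;
    return true;
}

static bool q_pop(Queue *q, int *v) {
    if (q->count == 0) return false;
    *v = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return true;
}

/* Originated packets go first, so forwarding load cannot starve local data
 * (which would also undermine cycle detection). */
bool next_to_send(Queue *originating, Queue *forwarding, int *pkt) {
    return q_pop(originating, pkt) || q_pop(forwarding, pkt);
}
```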

6.3.7 Relationship to Link Estimation

The usual approach to routing assumes links are either good or bad. Therefore,

link failure detection is employed and is based on a fixed number of consecutive transmission

failures. Our approach is to define connectivity relative to link estimation and incorporate

it with the routing cost function. Thus, the stability and agility of link estimation can

directly affect the stability of the routes and the rate of route adaptation. We will explore


such a stability issue in Chapter 7.

6.4 Cost Metrics for Connectivity-Based Routing

In this section, we propose different routing cost functions for the distance-vector

based tree building process described in Figure 6.2. They include shortest path, shortest

path with threshold, path reliability, and minimum transmission. They all instantiate the

EvalLinkCost and EvalCost functions in Figure 6.2.

The traditional cost function for distance-vector routing is shortest path using

hop-count. In power-rich wired networks with highly reliable links, retransmissions are

infrequent and hop-count adequately captures the underlying cost of packet delivery to

the destination. Hop-count is also well defined in a wired network. For shortest path using

hop-count, the EvalLinkCost and the EvalCost functions would be implemented as follows.

Input: Neighbor Table Entry e.
Output: Routing Cost in Hop-Count.
EvalLinkCost_ShortestPath(e)
(1) return 1

Input: PathCost in Hop-Count, LinkCost in Hop-Count.
Output: Routing Cost in Hop-Count.
EvalCost_ShortestPath(PathCost, LinkCost)
(1) return PathCost + LinkCost

However, with lossy links, as found in many sensor networks, hop-count based

on physical connectivity is not an appropriate cost function. As explained in Chapter 2,

shortest path with hop-count tends to select links at the edge of the connectivity cell,

because these links usually yield paths with minimal hop-count in reaching the destination.


If link estimation is not used, these links on the cell edge are likely to be unreliable for data

delivery. We explore the actual routing performance of the shortest path cost function in the next chapter.

Nevertheless, shortest path routing can still be useful in unreliable networks, given

that we define it relative to some probabilistic connectivity. A simple technique is to apply

shortest path routing only to links that have estimated link quality above a predetermined

threshold. For this shortest path with link quality threshold, the EvalLinkCost and the

EvalCost functions would be implemented as follows.

Input: Neighbor Table Entry e.
Output: Routing Cost in Hop-Count.
EvalLinkCost_ShortestPathLinkThreshold(e)
(1) if e.sendEst > Threshold and e.recvEst > Threshold
(2)     return 1
(3) else
(4)     return ∞

Input: PathCost in Hop-Count, LinkCost in Hop-Count.
Output: Routing Cost in Hop-Count.
EvalCost_ShortestPathLinkThreshold(PathCost, LinkCost)
(1) return PathCost + LinkCost
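The threshold variant can be written in C as below. The sketch assumes link estimates stored as doubles in [0, 1] rather than the 8-bit fields of Figure 6.4, and it uses IEEE INFINITY as the ∞ sentinel so that excluded links never win a minimum-cost comparison. Function names are hypothetical.

```c
#include <math.h>   /* INFINITY, isinf */

typedef struct {
    double sendEst;  /* out-bound link estimate in [0, 1] */
    double recvEst;  /* in-bound link estimate in [0, 1] */
} LinkEst;

/* EvalLinkCost: one hop if the link clears the threshold in both directions. */
double eval_link_cost_sp_threshold(const LinkEst *e, double threshold) {
    if (e->sendEst > threshold && e->recvEst > threshold) return 1.0;
    return INFINITY;  /* link excluded from the logical connectivity graph */
}

/* EvalCost: hop counts add; INFINITY + 1 stays INFINITY. */
double eval_cost_sp(double path_cost, double link_cost) {
    return path_cost + link_cost;
}
```

A strong link in one direction with a poor reverse direction, like `edge` in the checks below, is excluded, which is how the threshold also filters asymmetric links.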

As discussed in Chapter 2, this has the effect of increasing the depth of the network, since links above the threshold are unlikely to be close to the edge of the connectivity cell.

The assumption of this routing cost function is that each link on the logical connectivity

graph should have link quality greater than the threshold in either or both directions; this

assumption may break down in actual deployments. We investigate these issues in the next

chapter.

A different way to incorporate link quality into a routing cost function is path


reliability, which is a product of link qualities along the entire path in the forward direction.

The EvalLinkCost and the EvalCost functions for path reliability would be implemented

as follows.

Input: Neighbor Table Entry e.
Output: Routing Cost in Path Reliability.
EvalLinkCost_PathReliability(e)
(1) if e.sendEst > 0 and e.recvEst > 0
(2)     return e.sendEst
(3) else
(4)     return ∞

Input: PathCost in -log(Path Reliability), LinkCost in Path Reliability.
Output: Routing Cost in -log(Path Reliability).
EvalCost_PathReliability(PathCost, LinkCost)
(1) if LinkCost = ∞
(2)     return ∞
(3) return PathCost - log(LinkCost)
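Parent selection picks the minimum cost, so maximizing the product of forward link qualities q_ℓ (the e.sendEst values) corresponds to minimizing the sum of their negative logarithms. The worked example uses illustrative link qualities, not measured ones. It also shows the metric's preference for several short reliable hops over one weaker hop.

```latex
\max_{P} \; R(P) = \prod_{\ell \in P} q_\ell
\quad\Longleftrightarrow\quad
\min_{P} \; C(P) = -\log R(P) = \sum_{\ell \in P} \bigl(-\log q_\ell\bigr),
\qquad 0 < q_\ell \le 1 .

\text{Two hops, } q_1 = q_2 = 0.9:\quad C = 2\,(-\ln 0.9) \approx 0.211,\quad R = 0.81 .
\qquad
\text{One hop, } q = 0.7:\quad C = -\ln 0.7 \approx 0.357,\quad R = 0.70 .
```

Each term is non-negative, so the cost never decreases along a path, consistent with the framework's requirement that EvalCost always increase the cost.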

Such a metric yields the path with the highest likelihood of reaching the base station without considering any link retransmissions. The logarithm turns multiplications into additions. It is used in [80] to optimize the end-to-end success rate to the base station. While this cost metric does not require any threshold tuning, it has a tendency to exploit short reliable links, which can yield routing paths with many short hops. Furthermore, this routing cost function assumes no link retransmissions, whereas retransmissions are essential for coping with the exponential drop in end-to-end delivery over multiple hops. Thus, we do not study this routing cost function.

An alternative way to utilize link quality information is to use the expected number

of transmissions along the whole path as the cost metric for routing. That is, the best path

is the one that minimizes the total number of transmissions (including retransmissions)

in delivering a packet over potentially multiple hops to the destination. We call this the


Minimum Transmission (MT) metric, which is also proposed in [21]. This metric takes both hop-count and link retransmissions into consideration during route selection. That is,

a link retransmission is similar to increasing the hop-count by one. With links of varying

quality, a longer path with fewer retransmissions may be better than a shorter path with

many retransmissions. In considering the expected number of transmissions of a link, it

is important to determine link quality for both directions since losing an acknowledgment

would also trigger a useless retransmission. The EvalLinkCost and the EvalCost functions

for MT would be implemented as follows.

Input: Neighbor Table Entry e.
Output: Routing Cost in Expected Number of Transmissions.
EvalLinkCost_MT(e)
    if e.sendEst > 0 and e.recvEst > 0
        return (1 / e.sendEst) × (1 / e.recvEst)
    else
        return ∞

Input: PathCost in Expected Number of Transmissions, LinkCost in Expected Number of Transmissions.
Output: Routing Cost in Expected Number of Transmissions.
EvalCost_MT(PathCost, LinkCost)
    return PathCost + LinkCost
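A Python sketch of the two MT functions makes the trade-off between hop count and retransmissions concrete. It assumes link estimates are plain probabilities in [0, 1] and that losses in the two directions are independent, so sendEst × recvEst is the probability that a data packet and its acknowledgment both succeed; the function and variable names are ours, not the thesis's.

```python
import math


def eval_link_cost_mt(send_est: float, recv_est: float) -> float:
    """Expected transmissions on one link: the data packet must get through
    (send_est) and its acknowledgment must come back (recv_est)."""
    if send_est > 0 and recv_est > 0:
        return 1.0 / (send_est * recv_est)
    return math.inf


def eval_cost_mt(path_cost: float, link_cost: float) -> float:
    """MT costs simply add along the path."""
    return path_cost + link_cost


def mt_path_cost(links) -> float:
    """Total expected transmissions over a list of (send_est, recv_est) links."""
    cost = 0.0
    for send_est, recv_est in links:
        cost = eval_cost_mt(cost, eval_link_cost_mt(send_est, recv_est))
    return cost


# One long, lossy hop versus two short, clean hops (illustrative numbers).
one_long_hop = mt_path_cost([(0.5, 0.6)])        # 1 / 0.30, about 3.33 transmissions
two_short_hops = mt_path_cost([(0.9, 0.9)] * 2)  # 2 / 0.81, about 2.47 transmissions
print(two_short_hops < one_long_hop)  # True
```

In this example the longer path wins because it needs fewer expected transmissions overall, which is exactly the case MT is designed to capture.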

Note that MT also eliminates the need for predetermined link quality thresholds.

However, the stability of MT routing is potentially an issue, since it utilizes link estimations

in a non-linear fashion. Thus, for MT a noise margin should be used in parent selection to

enhance stability.
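One way to realize such a noise margin is hysteresis in parent selection: switch only when an alternative is better by more than a fixed margin. The Python sketch below is illustrative; the function name and table format are hypothetical, and the default margin of 0.75 of a transmission is borrowed from the SwitchThreshold value used for the MT implementation in Chapter 7.

```python
def select_parent(current_parent, candidates, switch_threshold=0.75):
    """Periodic parent selection with a noise margin (hysteresis).

    `candidates` maps a hypothetical neighbor id to the total MT cost of routing
    through that neighbor (its path cost plus the link cost to reach it).
    The current parent is kept unless another neighbor is cheaper by more than
    `switch_threshold` expected transmissions.
    """
    if not candidates:
        return None  # nothing to route through
    best = min(candidates, key=candidates.get)
    if current_parent not in candidates:
        return best  # old parent evicted from the table: take the best available
    if candidates[best] + switch_threshold < candidates[current_parent]:
        return best  # improvement is larger than estimator noise: switch
    return current_parent  # difference is within the noise margin: stay put


print(select_parent("A", {"A": 3.0, "B": 2.5}))  # A  (gain 0.5 < 0.75)
print(select_parent("A", {"A": 3.0, "B": 2.0}))  # B  (gain 1.0 > 0.75)
```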


6.5 Related Work

Many ad hoc routing protocols exist in the computing literature. In general, they

can be classified into two categories: table-driven and source-initiated on-demand routing.

The distance-vector based routing protocols, such as Bellman-Ford [45], DSDV [58] and

our protocol in Figure 6.2, fall into the table-driven category. Another kind of table-driven

protocol is link-state routing, such as OSPF [42] in the Internet. Link-state protocols are

not attractive in ad hoc networks because they require significant overhead in maintaining

up-to-date global knowledge of the entire routing topology on each node.

The more recent ad hoc routing protocols in the literature of mobile computing

take the on-demand approach. The on-demand approach suits mobile computing traffic

well, as such traffic comprises many independent pair-wise data flows in the network; routes

are established and maintained to support only the actual data flows, which reduces the

amount of state and protocol overhead.

For sensor network applications, such a model of many independent pair-wise

traffic flows is not common. Instead, each node would originate data that needs to be

forwarded to the data sinks. Thus, source-initiated on-demand routing does not fit well

with the sensor network traffic model. In contrast, sink-initiated on-demand routing, where

an interested sink node would initiate a route discovery to establish reverse-path routes, is

the norm in sensor networks. Since these two kinds of on-demand routing share a common

set of underlying issues that affect routing performance, understanding source-initiated

on-demand routing is also important for sensor network research.


6.5.1 Table-Driven Routing

We discuss, in more detail, some of the key table-driven routing protocols in the

literature. The body of prior work in this space is large; readers interested in probing further can consult a survey of the overall space such as [68].

The Destination-Sequenced Distance-Vector Routing protocol (DSDV) presented

in [58] is a table-driven routing protocol, which improves upon the distributed Bellman-Ford

routing algorithm [45] by providing loop-free topologies and solving the count-to-infinity

problem. Every node maintains a routing table that records the routing distance in hop-

count to all other nodes in the network. In addition, each destination has a sequence number

and sends it along with each route message. The destination increments the number to

provide a temporal order of its route “freshness”. Routes to the destination are constructed

using the latest sequence number, and any stale routes would entail a smaller sequence

number. Thus, a routing path is chosen based on freshness before considering the shortest

hop path; if multiple routing paths to the same destination have the same freshness, the

selection will be based on shortest hop-count. By ensuring that no stale information is used

and the route always descends downhill along the hop-count gradient, no cycles will result.

Instability can occur because the rate of route information propagation over different paths

may vary, and a node may switch its route simply because a fresh new route is found. To

avoid this problem, each node keeps track of the settling time of its best route and does not

change route until the settling time has expired.
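The DSDV route-selection rule amounts to a two-level comparison: freshness first, then hop count. The Python sketch below uses hypothetical names, ignores sequence-number wraparound, and omits the settling-time damping; it only shows why a stale route can never displace a fresher one.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RouteAdvert:
    """Hypothetical record of one route message heard from a neighbor."""
    neighbor: str   # who advertised the route
    seq_no: int     # destination sequence number carried in the message
    hop_count: int  # hops to the destination if we route through `neighbor`


def preferred(a: RouteAdvert, b: RouteAdvert) -> bool:
    """DSDV ordering: a fresher (larger) sequence number always wins;
    among equally fresh routes, the shorter one wins."""
    if a.seq_no != b.seq_no:
        return a.seq_no > b.seq_no
    return a.hop_count < b.hop_count


def best_route(adverts: Iterable[RouteAdvert]) -> Optional[RouteAdvert]:
    best = None
    for adv in adverts:
        if best is None or preferred(adv, best):
            best = adv
    return best


routes = [RouteAdvert("A", 10, 2), RouteAdvert("B", 12, 5), RouteAdvert("C", 12, 3)]
print(best_route(routes).neighbor)  # C: freshest sequence number, then fewest hops
```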

In contrast to the approach we take in defining connectivity relative to link esti-


mation, DSDV assumes connectivity to be bimodal, either good or bad, and relies on link

failure detection to avoid routing over failed links. A link is declared failed if a fixed number

of consecutive packet transmissions occurs without being acknowledged by the receiver. In

wireless networks, where connectivity is lossy, such a mechanism would result in many false

positives of link failures, which lead to network instability. In the next chapter, we compare the routing performance of DSDV with that of our approach in detail.
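To see why a fixed count of consecutive losses misfires on lossy links, the following Python sketch (a hypothetical detector, not DSDV's actual code) simulates a link that never fails outright but completes each round trip with a fixed probability, and counts how often it is nonetheless declared failed. Losses are assumed independent from packet to packet, which simplifies real radio behavior.

```python
import random


class ConsecutiveLossDetector:
    """Declares a link failed after `limit` unacknowledged transmissions in a row."""

    def __init__(self, limit: int):
        self.limit = limit
        self.misses = 0

    def record(self, acked: bool) -> bool:
        """Record one transmission; return True once the link is declared failed."""
        self.misses = 0 if acked else self.misses + 1
        return self.misses >= self.limit


def false_failure_rate(delivery: float, limit: int, packets: int,
                       trials: int, seed: int = 1) -> float:
    """Fraction of trials in which a link that never truly fails, but whose
    round trip (data plus acknowledgment) succeeds with probability `delivery`,
    is still declared failed within `packets` transmissions."""
    rng = random.Random(seed)
    declared = 0
    for _ in range(trials):
        detector = ConsecutiveLossDetector(limit)
        # any() stops at the first transmission that triggers a failure.
        if any(detector.record(rng.random() < delivery) for _ in range(packets)):
            declared += 1
    return declared / trials


# A usable 60% link, a limit of 3 misses, and 200 packets per trial.
print(false_failure_rate(0.6, 3, 200, 500))  # close to 1.0
```

Even though the link delivers most packets, a run of three losses is almost certain to occur somewhere in 200 transmissions, so the detector keeps tearing down a perfectly usable route.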

The Expected Transmission cost function, discussed in [24], is the same as our

MT cost function, since both take connectivity as a probabilistic metric defined by link

estimation and have the routing layer exploit this information. The study was performed

over 802.11 wireless networks using laptop-class computers in a static deployment. Since memory is not a concern on such platforms, the authors did not investigate neighborhood management under memory constraints. They modified the DSDV protocol to

use the expected transmission cost function. Another study in [26] explores the same routing

cost function under mobility and concludes that such a cost function performs best when

the network is static. These studies provide empirical evidence in other wireless networks

that the MT cost function can yield better performance than the typical hop-count based

cost functions.

There are other kinds of protocols that exemplify the flexibility of defining different

kinds of routing cost functions over the same distance-vector based routing framework.

There exist routing cost functions that optimize the network for network lifetime or energy

consumption [19, 69, 72]. While these protocols demonstrate great reduction in energy

consumption, they also assume links are boolean in general and neglect the lossy wireless


characteristics.

In the packet-radio literature, there are table-driven ad hoc routing protocols that

attempt to enhance the reliability of communication by modifying the cost function to

route over less congested or interfered paths, such as Least-interference routing [73], Least-

resistance routing [62], and Maximum-minimum residual capacity routing [14]. These pro-

tocols do not directly address the fundamental issue of lossy connectivity. Instead, they

attempt to form topologies that balance load across the network. Unfortunately, we were unable to find empirical evaluations of these protocols on real packet radio nodes.

6.5.2 Source-Initiated On-Demand Routing

In this section, we discuss some of the key source-initiated on-demand routing

protocols in the mobile computing literature. Many of the issues with this kind of protocol also arise in sink-initiated on-demand routing for sensor networks, such as Directed Diffusion [40].

For on-demand routing, the discovery process generally begins with the source

node flooding the entire network to discover its destination node. The destination node or

intermediate nodes with a route to the destination reply to the source using the reverse path

of the flood. This path will be the routing path for the source to communicate with the

destination. Because the reverse path is used for routing, these kinds of protocols assume

links are symmetric.

The Ad Hoc On-Demand Distance Vector (AODV) protocol presented in [59] is

an on-demand routing protocol that improves upon the DSDV protocol. During route


discovery, each node records the first packet that it receives, discards subsequent redundant

packets, and rebroadcasts the route request packet. The destination replies to the first request that reaches it in the flood. The reply message flows along the reverse path and sets up

the route table for each node in the path. Like DSDV, AODV utilizes destination sequence

numbers to ensure all routes are loop-free and contain the most recent route information.

The Dynamic Source Routing (DSR) protocol discussed in [44] is an on-demand

routing protocol based on the concept of source routing. Each node maintains route caches

containing the source routes that it hears. Route discovery relies on the same flooding

process except that each route request contains the evolving source routing path; a node

adds its address into the path before rebroadcasting the request. If a node has a route in its

cache that can reach the destination, it will reply to the source without rebroadcasting the

route request. If not, the end destination will reply to the source using the shortest path it

hears. Because DSR uses source routing, nodes overhearing the traffic can learn new routes

or improve old routes in the cache. Although the route cache is used to suppress some of

the rebroadcasts, many redundant rebroadcasts still occur across the network.

The Temporally Ordered Routing Algorithm (TORA) [57] is another protocol that

is built upon link reversal. During route discovery, a flood is used to establish a directed

acyclic graph (DAG) rooted at the source, with the destination node being the sink of

the DAG. A “height” metric is used to set up a gradient in such a DAG. Reversing the

links in the DAG is the primary mechanism to deal with node mobility and link failures.

Therefore, nodes need to maintain accurate routing information about all adjacent (one-

hop) nodes and link symmetry is always assumed. The “height” metric is composed of many


parameters, including the logical time of a link failure, a propagation ordering parameter, a

node ID, and other protocol specific information. Since timing is important in determining

the “height”, TORA assumes that all nodes have synchronized clocks.

A different kind of on-demand routing is proposed in GRAd [67]. Like other on-

demand routing protocols, if node A needs to talk to node B, it first floods the network

and B replies using the reverse path. All the intermediate nodes will record the routing

cost to node A in hop-counts. Unlike other on-demand protocols, GRAd exploits the local

broadcast nature of the wireless medium; rather than sending unicast messages to the next

hop for multihop forwarding, messages are sent as local broadcasts carrying the source

address, the end destination address, a sequence number, and the remaining routing cost

to the destination. Neighboring nodes that receive the message and have a lower routing

cost than the remaining routing cost forward the message. That is, route selection becomes

receiver based. Since many nodes may qualify to relay the message, especially in a dense

network, this may seed a local broadcast storm. GRAd relies on the MAC layer’s back-

off to arbitrate the order of accessing the channel during the broadcast storm, which only

provides limited spreading in time among the different rebroadcasting nodes. The sequence number and the source address together form a unique identifier for each message, which GRAd uses to suppress redundant forwarding by removing duplicate messages from the MAC layer’s queue, thereby limiting the extent of the broadcast scope. This approach to routing is resilient to mobility and offers

high reliability; however, the potential dissemination overhead to support such reliability

can be high. We found no empirical data in the literature to measure the extent of this

overhead.
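The receiver-side decision in GRAd can be sketched in a few lines of Python. The sketch is a simplification under stated assumptions: the class and field names are hypothetical, the remaining cost is assumed to drop by one hop at each relay, and duplicates are discarded on reception rather than removed from a MAC queue. It shows the two checks described above: suppression by (source address, sequence number), and forwarding only by nodes whose cost is below the message's remaining cost.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class GradMessage:
    source: int
    destination: int
    seq_no: int
    remaining_cost: int  # routing budget (in hops) still available to the message


@dataclass
class GradNode:
    cost_to: Dict[int, int]  # this node's hop-count cost to each known destination
    seen: Set[Tuple[int, int]] = field(default_factory=set)  # (source, seq_no) handled

    def on_receive(self, msg: GradMessage) -> Optional[GradMessage]:
        """Return the local broadcast to send, or None to stay silent."""
        key = (msg.source, msg.seq_no)
        if key in self.seen:
            return None  # duplicate copy: suppress redundant forwarding
        self.seen.add(key)
        my_cost = self.cost_to.get(msg.destination)
        if my_cost is None or my_cost >= msg.remaining_cost:
            return None  # not close enough to the destination to help
        # Receiver-based forwarding: rebroadcast, spending one hop of the budget.
        return GradMessage(msg.source, msg.destination, msg.seq_no,
                           msg.remaining_cost - 1)


relay = GradNode(cost_to={7: 2})
msg = GradMessage(source=1, destination=7, seq_no=5, remaining_cost=4)
print(relay.on_receive(msg))  # forwarded with remaining_cost=3
print(relay.on_receive(msg))  # None: duplicate suppressed
```

Every neighbor that passes the cost check rebroadcasts, which is why dense neighborhoods can produce the local broadcast storm noted above.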


All of these on-demand routing protocols depend on reverse-path routing based on route discovery flooding, which works well only if links are, as these protocols assume, either good or bad and symmetric. As we have discussed throughout this thesis, these assumptions do not hold in reality, as shown by our empirical findings and other related work. This implies that special attention must be paid to link characteristics before relying on reverse-path routing. Another problem is controlling the timing of rebroadcasts to avoid broadcast storms, which may leave some nodes out of the flood or cause them to select inefficient routing paths. Prior work in [29] has shown

that blindly selecting the first node as parent in the route discovery flood can yield very

unreliable routing paths and the broadcast storm problem can have interesting effects on the

resulting topology. These shortcomings may be acceptable for mobile computing since the

protocols are designed to cope with mobility and an inefficient routing path is better than

disconnection. For sensor networks that are relatively static, optimizing for efficient and

reliable routing paths is an important way to cope with the tight limitation of bandwidth

and energy. Thus, a careful tree-building process using sink-based on-demand routing must

address both the link quality and broadcast-storm issues. One sample design of this careful

tree-building process that addresses both of these issues is discussed in [71].

A more recent work building upon GRAd is GRAdient Broadcast [81]. It is a

sink-initiated on-demand routing protocol that builds a gradient using the RF transmission

energy as the cost metric. Like GRAd, each message has a credit or remaining cost, and

the receiver that has a smaller cost would forward the packet, with a random delay before

each forwarding to avoid potential collisions. However, since energy rather than hop-count


is used as the cost metric, GRAdient Broadcast has a finer granularity with which to scope the

potential number of receivers. Furthermore, the nonlinear decrease of the credit in the

packet allows the protocol to further limit the scope of the travel paths towards the sink.

As with GRAd, we found no experimental data on the performance of this protocol.

6.5.3 Summary

We have discussed a range of important approaches to ad hoc routing that exist

in the literature. None of the work takes the holistic approach that we advocate in defining

connectivity as a probabilistic concept and carrying it from link estimation to neighbor

management and routing cost function. However, it is possible to extend these protocols to

build reliable topologies in sensor networks as long as they run upon a concretely discovered

logical connectivity graph, and use cost functions other than shortest hop in order to exploit

the link quality information over such a connectivity graph. Protocols that require a flooding discovery process must perform it carefully to avoid broadcast-storm problems, which would yield ill-formed trees. The resulting tradeoff in relying on such logical connectivity

graphs is a decrease in responsiveness to mobility, as the logical connectivity graph governs

the rate of adaptation. Nonetheless, since sensor networks are relatively static, this tradeoff

is acceptable for many applications.

The many-to-few routing characteristics reduce the amount of state required at

the routing layer to O(destinations) since the network only needs to know how to route to a

few destinations. However, the majority of the state, if our holistic approach is taken, would

be used for managing local link quality statistics and neighborhood information, which is


governed by the size of the neighbor table. This favors protocols that require a routing table

anyway, such as DSDV, AODV, and TORA. However, it is an extra overhead for protocols

such as DSR, which maintains only a cache of recent routes. Our protocol demonstrates one

of the simplest ways to achieve tree-based routing. With a slight overhead in maintaining

destination information, our work can improve upon DSDV or AODV. The communication

overhead in sending periodic route messages or flooding the entire network periodically as

in AODV can be configured to be the same.

Receiver-based protocols such as GRAd would require O(cell density) state in the worst case, since a node must maintain information about each transmitted packet from all its potential neighbors within a time window. For example, a packet that happens to arrive over a long, normally unreliable link from a potential neighbor would trigger a rebroadcast. Maintaining a neighbor table with a logical connectivity graph is an effective way to bound the state required

and limit the scope of dissemination. However, it would reduce the resilience against mo-

bility. This kind of protocol relies on primitive MAC layer support to deal with broadcast

storm issues. Techniques for careful tree building should be applied here to avoid potential

collisions that may hinder dissemination.

Like DSDV and AODV, our protocol utilizes only one routing path. Other pro-

tocols such as DSR and TORA can use multiple paths to enhance the reliability of data

delivery. Receiver-based routing protocols may yield higher reliability since many redun-

dant routing paths may potentially be used. There certainly exists a tradeoff in the overall

degree of reliability and routing efficiency between the various approaches, but we do not

study their effects in this thesis.


6.6 Summary

Many factors have influenced the design of the distributed routing process that we

present in this chapter. Since the common sensor networking applications are data-collection

oriented, designing an efficient and reliable routing layer that supports a spanning forest

topology, with each sink node maintaining its own tree, becomes an important challenge.

Such a challenge is complicated by the lossy wireless connectivity and limited resources

found on our sensor network platform. When we explore the rich set of available protocols in the literature, we find that most of them assume either that the connectivity graph is given or that it is easy to discover, since links are generally bimodal (good or bad) and symmetric. While these assumptions may be true on those platforms, they do not hold in the sensor networking regime, as we have shown in Chapter 3. This motivates us to define connectivity relative to link estimation and to carry this probabilistic view up to the routing layer.

It opens up new cost functions for routing and clarifies the fundamental concept of a hop,

which is relative to how the cost function views connectivity and how competitive a node

is with respect to the neighborhood selection criteria.

With these understandings, we look back on some of the protocols in the literature

and try to evolve them with our probabilistic view on connectivity. We decide to extend

the tree-based routing protocol based on the classic distributed Bellman-Ford algorithm,

where the routing cost function need not be hop count as long as it increases monotonically.

The end result is a general distance-vector based routing framework that incorporates two

other important local routing processes, link estimation and neighborhood management,

while leaving the routing cost function open. The key remaining question is to identify an


appropriate cost function that runs upon the discovered connectivity graph, with each edge

characterized by link estimations, to yield reliable and stable topologies. We have identified

interesting cost functions in this chapter. In the next chapter, we will implement our routing

framework and the different cost functions and evaluate them extensively through both simulations and empirical experiments.


Chapter 7

Evaluation

In this chapter, we turn our design framework, presented in the previous chap-

ter, into real implementations in order to evaluate and understand the performance of the

different cost functions when they are integrated with the process of link estimation and

neighborhood management. Our evaluation methodology has three levels, spanning from

high-level, large-scale simulations to empirical experiments in real deployment settings.

Each level of the evaluation process allows us to narrow the scope towards a smaller set of

workable solutions. We set up the relevant metrics of evaluation and present our customized

simulation framework for running graph and packet-level simulations. Since simulations can

only approximate reality to a certain degree of fidelity, we evaluate our design with reasonably sized networks and even deliberately drive the network into congestion to observe the effects on the routing protocol. This evaluation process allows us to arrive at a working solution for tree-based multihop routing that can support the common data collection

applications found in sensor networks. In the process of interpreting the data, we gain


an understanding of how performance can be affected by some of the subtle interactions

among the three local routing subprocesses. Understanding these interactions is important for identifying some of the root causes that degrade performance metrics, such as end-to-end success rate and topology stability.

7.1 Evaluation Methodology

Having established the framework for concrete implementations of a variety of

routing protocols and the underlying building blocks in Chapter 6, this section seeks to

compare and evaluate a suite of distance-vector routing protocols in the context of data

collection over a large field of networked sensors. We proceed through three levels of eval-

uation. The ideal behavior of these protocols, with perfect link estimation and no traffic

effects, is assessed on large (400 node) networks using a simple analysis of network graphs

with link qualities obtained from our probabilistic link characterization. The dynamics of

the estimations and the protocols are then captured in abstract terms using a packet-level

simulator. A wide range of protocols is investigated on 100-node networks under simulation.

This narrows the set of choices and sheds light on key factors. The best protocols are then

tested in greater detail on real networks on the scale of 50 nodes.

7.1.1 Candidate Routing Protocols

The set of routing protocols under evaluation includes broadcast, Destination Se-

quenced Distance Vector (DSDV), shortest path, shortest path with threshold, and mini-

mum transmissions. We discuss the details of these protocols in the context of our evalua-


tion.

Broadcast is a simple protocol that builds a network routing topology by periodically flooding the entire network from the root. In each iteration, the root increments a sequence number so that nodes can discard the tree built in the previous round. The parent selection mechanism

is simple. Upon reception of the first broadcast message, each node would select the source

address of the message as its parent for routing. This mechanism builds upon no link

estimation and requires no neighborhood management. This form of routing essentially

captures the route discovery phase in many of the on-demand reverse-path based routing

protocols found in the mobile computing literature, such as DSR [44] and AODV [59]. The

difference is that instead of the source initiating the route discovery flood, the sink node,

being the root of the tree, originates the flood and all the sources take the reverse path

to send data to the sink node.
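The Broadcast protocol's parent selection reduces to a few lines. In the hypothetical Python sketch below, a node adopts the sender of the first beacon it hears in each new round and rebroadcasts once; later copies from the same round are ignored no matter how good their links are, which is the weakness noted in Chapter 6. Sequence-number wraparound and rebroadcast timing are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Beacon:
    sender: int  # node that (re)broadcast this flood message
    seq_no: int  # flood round, incremented by the root every period


class BroadcastTreeNode:
    """First-heard parent selection for one node (hypothetical names)."""

    def __init__(self, node_id: int):
        self.node_id = node_id
        self.parent: Optional[int] = None
        self.round = -1  # last flood round this node has joined

    def on_beacon(self, beacon: Beacon) -> Optional[Beacon]:
        if beacon.seq_no <= self.round:
            return None  # stale or duplicate copy of a flood already handled
        self.round = beacon.seq_no
        self.parent = beacon.sender  # first heard wins, regardless of link quality
        return Beacon(self.node_id, beacon.seq_no)  # rebroadcast to continue the flood


node = BroadcastTreeNode(5)
node.on_beacon(Beacon(sender=2, seq_no=1))
node.on_beacon(Beacon(sender=3, seq_no=1))  # ignored: same round
print(node.parent)  # 2
```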

In Chapter 6, we discuss how such a simple form of tree-building can lead to

unreliable routing trees. Nonetheless, since it is a fundamental mechanism used by many

on-demand routing protocols, we evaluate its performance as well. However, it is

important to point out that better routing trees can be built if the tree building process

is done carefully. In particular, a mechanism is required to control the timing of the re-

broadcast so that the broadcast storm problem can be avoided. Such a mechanism can be

implemented above the MAC layer to dampen the rate of rebroadcast and thus eliminate

the storming effect. Furthermore, instead of relying solely on the first broadcast packet to

build the tree and then discarding the subsequent packets, it is possible to compare these

subsequent packets to the first one and select the best among these choices. The best one


could be a combination of a set of criteria, such as hop count, received signal strength, or

link quality if such information is available. In this thesis, we do not evaluate the perfor-

mance of the routing tree built with these improvements. Nonetheless, these techniques

have been shown to build reasonable routing trees in one of the sensor network applications

for intruder detection [71].

Shortest Path (SP and SP(t)) are the conventional distance-vector routing

cost functions that we have discussed in Chapter 6, following the framework in 6.2. In SP

a node is a neighbor if a packet is ever received from it. For SP(t) a node is a neighbor

if its link quality exceeds a tunable threshold t as shown in Chapter 6. Thus, shortest

path routing is performed within a sub-graph of high quality links. Based on Figure 3.1

in Chapter 3, we consider two values for t. With t = 70%, we consider only links in the

effective region, while leaving a significant noise margin for the estimators. With t = 40%,

we allow most of the good links in the transitional region, resulting in larger, less regular

cells. For the implementation of the link estimators on the real platform, unsigned bytes

are used to represent link quality from 0 to 100%.
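The difference between SP and SP(t) lies entirely in which neighbors are eligible. The following Python sketch is illustrative: the table layout (a neighbor id mapped to a link-quality byte and that neighbor's hop count) and the function names are hypothetical, and it assumes the threshold is compared against a single per-neighbor quality value stored as 0 to 100.

```python
from typing import Dict, Optional, Tuple


def to_percent_byte(quality: float) -> int:
    """Store a link quality in [0, 1] as an unsigned byte holding 0..100 percent."""
    return max(0, min(100, round(quality * 100)))


def eligible(quality_byte: int, threshold_percent: Optional[int]) -> bool:
    """SP (threshold None): any node heard from is a neighbor.
    SP(t): only neighbors whose estimated quality exceeds t percent."""
    return threshold_percent is None or quality_byte > threshold_percent


def choose_parent(table: Dict[int, Tuple[int, int]],
                  threshold_percent: Optional[int] = None) -> Optional[int]:
    """table maps neighbor id -> (quality byte, neighbor's hop count to the root).
    Returns the eligible neighbor with the fewest hops, or None."""
    hops = {n: h for n, (q, h) in table.items() if eligible(q, threshold_percent)}
    return min(hops, key=hops.get) if hops else None


table = {1: (95, 3), 2: (45, 1), 3: (72, 2)}
print(choose_parent(table))      # 2: SP takes the 45% link for the shortest path
print(choose_parent(table, 70))  # 3: SP(70%) only considers nodes 1 and 3
print(choose_parent(table, 40))  # 2: SP(40%) admits the 45% link again
```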

Destination Sequenced Distance Vector (DSDV) We customize the general

DSDV protocol into our framework shown in Figure 6.2 and preserve the essence of the

protocol; a parent is chosen based on the “freshest” sequence number from the root while

maintaining a minimum hop count when possible. That is, only nodes that do not have

failed links and have the latest sequence number from the tree root in the neighbor table

would be considered in the “for” loop on line 11 in Figure 6.2. Similar to SP, DSDV ignores

link quality and considers all nodes it hears as neighbors.


The original DSDV protocol suggests a damping mechanism through the use of a

settling timer to avoid route flapping due to different propagation delays on route messages.

We instead use the periodic parent selection mechanism as route damping in this case. That

is, a parent will be changed only when the period expires or when a node becomes disconnected from the network, for example due to link failure.

To detect link failure, a fixed number of consecutive packet losses to the next hop

is used, as in the original DSDV protocol. When link failure is detected, a node is disconnected from the network and declares the route unreachable to its neighbors through periodic

route messages. Our DSDV exploits packet snooping for early detection of unreachable

routes. Since each node would still send its data traffic using the broadcast address when

its route becomes unreachable, snooping on a parent’s traffic allows our DSDV to detect an

unreachable route without waiting for any route messages.

Minimum Transmission (MT) uses the expected number of transmissions as

its cost metric. In the actual implementation on the sensor nodes, the routing cost com-

putations are done using unsigned 32-bit integers. Each link estimation is represented as

an unsigned byte to avoid floating-point calculations, with 255 representing 100% reliability. The routing costs in EvalLinkCost are computed using these estimates, rounded to

the nearest integer, and scaled by 256 to avoid maintaining floating point numbers. The

SwitchThreshold on line 26 in Figure 6.2 is set to a default value of 0.75 of a transmission.
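The fixed-point arithmetic can be illustrated with integer-only Python. This is a sketch under one reading of the text: the link cost is 256 × 1/(s × r), with s and r stored as bytes where 255 means 100%, rounded to the nearest integer; the constant names are hypothetical, and saturating at 0xFFFFFFFF stands in for the ∞ returned by the MT link cost function on an unsigned 32-bit node.

```python
SCALE = 256               # one expected transmission == 256 cost units
FULL = 255                # link-estimate byte value meaning 100% reliability
COST_INF = 0xFFFFFFFF     # largest unsigned 32-bit value, used as "unreachable"
SWITCH_THRESHOLD = 3 * SCALE // 4  # 0.75 of a transmission == 192 units


def link_cost_fixed(send_byte: int, recv_byte: int) -> int:
    """256 / (s * r) with s = send_byte / 255 and r = recv_byte / 255,
    computed as round(256 * 255 * 255 / (send_byte * recv_byte)) in integers.
    The numerator (16,646,400) fits comfortably in 32 bits."""
    if send_byte == 0 or recv_byte == 0:
        return COST_INF
    denom = send_byte * recv_byte
    return (SCALE * FULL * FULL + denom // 2) // denom  # round half up


def path_cost_fixed(parent_cost: int, link_cost: int) -> int:
    """Add costs, saturating at COST_INF so a 32-bit sum would never wrap."""
    return min(parent_cost + link_cost, COST_INF)


print(link_cost_fixed(255, 255))  # 256: a perfect link costs one transmission
print(link_cost_fixed(128, 255))  # 510: about two transmissions
```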

7.1.2 Evaluation Metrics

We define four important metrics for evaluating the performance of these protocols.


Hop Distribution measures routing depth of nodes throughout the network,

which reflects both end-to-end latency and energy usage.

Path Reliability and End-to-End Success Rate are two ways to estimate the

end-to-end reliability to the root of the tree of all the nodes in the network. Path reliability

approximates the end-to-end reliability of a routing path in the absence of retransmission.

It is calculated like the path reliability routing cost function discussed in Chapter 6. By

taking the product of link quality, in the forwarding direction, along the path from each

node in the network, we can infer the probability of reaching the sink node without any

link retransmissions. We only use this metric for network graph analysis.

For simulations and empirical experiments, we can directly measure the end-to-

end success rate, which is the number of packets received at the sink for a node divided by

the number originated. Link retransmissions, up to a fixed maximum, are performed at each

hop. Losing packets before reaching the sink not only wastes energy and network resources,

but also degrades the quality of the application. Another subtle issue is fairness. Nodes far

away from the sink are likely to have a lower end-to-end success rate than nodes that are

close. The breakdown of the success rate by hop or distance should show this behavior.
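Both the per-node success rate and its breakdown by hop are simple ratios. The Python sketch below assumes hypothetical dictionaries of per-node packet counts and hop counts, such as might be gathered from sink logs; averaging per-node rates within each hop is one of several reasonable ways to aggregate.

```python
from collections import defaultdict
from typing import Dict


def success_rates(originated: Dict[int, int], received: Dict[int, int]) -> Dict[int, float]:
    """Per-node end-to-end success rate: packets received at the sink / packets originated."""
    return {n: received.get(n, 0) / sent for n, sent in originated.items() if sent > 0}


def success_by_hop(rates: Dict[int, float], hops: Dict[int, int]) -> Dict[int, float]:
    """Mean success rate of the nodes at each hop count, to expose unfairness."""
    buckets = defaultdict(list)
    for node, rate in rates.items():
        buckets[hops[node]].append(rate)
    return {h: sum(r) / len(r) for h, r in sorted(buckets.items())}


rates = success_rates({1: 100, 2: 100, 3: 50}, {1: 95, 2: 80, 3: 30})
print(rates)                                      # {1: 0.95, 2: 0.8, 3: 0.6}
print(success_by_hop(rates, {1: 1, 2: 2, 3: 2}))  # hop 1 -> 0.95, hop 2 -> about 0.7
```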

Stability measures the total number of route changes in the network for each

route update cycle since the parent selection mechanism, as shown in Figure 6.2, is run at

the same rate as the route updates. We use such a metric to evaluate the stability of the

routing topology.
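Counting route changes requires only a snapshot of each node's parent per route update cycle. In this Python sketch (hypothetical data layout), a change is counted when a node that had a parent in the previous cycle has a different one in the current cycle; nodes appearing for the first time are not counted, which is an assumption.

```python
from typing import Dict, List


def route_changes(prev: Dict[int, int], curr: Dict[int, int]) -> int:
    """Nodes whose parent differs between two consecutive route update cycles."""
    return sum(1 for node, parent in curr.items()
               if node in prev and prev[node] != parent)


def stability_trace(snapshots: List[Dict[int, int]]) -> List[int]:
    """Route changes per cycle, from one node -> parent snapshot per cycle."""
    return [route_changes(a, b) for a, b in zip(snapshots, snapshots[1:])]


snapshots = [{1: 0, 2: 1, 3: 1}, {1: 0, 2: 1, 3: 2}, {1: 0, 2: 3, 3: 1}]
print(stability_trace(snapshots))  # [1, 2]
```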


[Figure 7.1 plot: Number of Nodes (0 to 160) versus Hop Count (0 to 12) for MT, SP (70%), SP (50%), and SP.]

Figure 7.1: Hop distribution from graph analysis of a 400-node network with 8-foot grid spacing.

7.2 Network Graph Analysis

The first method of evaluation is to explore the different routing cost functions

using high-level graph analysis. That is, given a static connectivity graph with fixed proba-

bilistic link qualities of all edges derived from inter-node distance, we compute optimal trees

using the distributed Bellman-Ford algorithm [45] for each routing cost function, including

SP, SP(70%), SP(50%) and MT. Without packet level dynamics, only hop distribution and

path reliability are meaningful in this case. Nonetheless, this high-level analysis enables us

to explore large scale networks; it also establishes optimistic bounds on routing costs.
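The graph analysis can be reproduced in miniature with Python. The sketch runs a centralized Bellman-Ford relaxation, which reaches the same fixed point a distributed distance-vector protocol converges to on a static graph, and then reads off path reliability along the resulting tree. The input format (a dictionary of directed link qualities), the three-node example topology, and the rule that SP(t) requires the threshold to be exceeded in both directions are assumptions made for illustration.

```python
import math


def bellman_ford_tree(nodes, quality, root, link_cost):
    """Centralized Bellman-Ford over a static graph.

    quality[(u, v)] is the delivery probability from u to v; link_cost(fwd, rev)
    gives the cost for u to send through neighbor v. Returns (cost, parent).
    """
    cost = {n: math.inf for n in nodes}
    parent = {n: None for n in nodes}
    cost[root] = 0.0
    for _ in range(len(nodes) - 1):
        changed = False
        for (u, v), fwd in quality.items():  # candidate: u routes through v
            if u == root:
                continue  # the root never picks a parent
            c = link_cost(fwd, quality.get((v, u), 0.0))
            if cost[v] + c < cost[u]:
                cost[u], parent[u] = cost[v] + c, v
                changed = True
        if not changed:
            break
    return cost, parent


def mt_cost(fwd, rev):
    return 1.0 / (fwd * rev) if fwd > 0 and rev > 0 else math.inf


def sp_cost(threshold=0.0):
    """Hop count over links exceeding `threshold` in both directions."""
    return lambda fwd, rev: 1.0 if fwd > threshold and rev > threshold else math.inf


def tree_path_reliability(parent, quality, node, root):
    """Product of forward link qualities from `node` up to `root` (0 if cut off)."""
    rel = 1.0
    while node != root:
        if parent[node] is None:
            return 0.0
        rel *= quality[(node, parent[node])]
        node = parent[node]
    return rel


# Root 0; node 2 can reach 0 directly over a poor 30% link or via node 1 over 90% links.
q = {(1, 0): 0.9, (0, 1): 0.9, (2, 1): 0.9, (1, 2): 0.9, (2, 0): 0.3, (0, 2): 0.3}
nodes = [0, 1, 2]
for name, fn in [("SP", sp_cost()), ("SP(70%)", sp_cost(0.7)), ("MT", mt_cost)]:
    _, parent = bellman_ford_tree(nodes, q, 0, fn)
    print(name, parent[2], round(tree_path_reliability(parent, q, 2, 0), 2))
# SP 0 0.3
# SP(70%) 1 0.81
# MT 1 0.81
```

On this toy graph, plain SP takes the single poor hop while SP(70%) and MT take two good hops, mirroring the qualitative trend the analysis reports at scale.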

We analyze a network of 400 nodes, organized as a 20 × 20 grid with 8-foot spacing.

The sink node is placed at the corner to maximize network depth. Connectivity information

is derived from the data shown in Figure 3.1. Figure 7.1 shows the expected hop-count


[Figure 7.2 plots: Path Reliability versus Distance (Feet), 0 to 250 feet; panel (a) with a y-axis from 0.3 to 1, panel (b) with a y-axis from 0 to 1.]

Figure 7.2: Path reliability to tree root from graph analysis of a 400-node network with 8-foot grid spacing.


distribution for the four different cost metrics. SP builds a very shallow network, while

the rest yield deeper networks with wider hop distribution. With most nodes being 2 and

3 hops away from the root in a network of 160-foot extent, many of the links must cover

40 to 50 feet. This suggests they are at the border of the transitional and clear region of

Figure 3.1 and have very low quality.

Figure 7.2 shows the corresponding expected path reliability. Indeed, reliability for SP drops below 5% for nodes more than 50 feet away. Protocols that use link quality estimates achieve much higher path reliability by taking more, higher-quality hops. For SP(70%), the lowest expected path reliabilities for two- and three-hop paths are $(0.7)^2 = 49\%$ and $(0.7)^3 = 34.3\%$, respectively. SP(50%) exploits links in the transitional region to take fewer, longer hops, but its reliability suffers as a result. MT takes reliability into account and performs best without the need to set a threshold; the price of this higher path reliability is a slightly higher hop count.
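The percentages above come from multiplying per-hop reception probabilities along a path. A minimal sketch, assuming independent losses and no retransmissions; `path_reliability` and `sp_threshold_floor` are hypothetical names used only for illustration:

```python
import math


def path_reliability(link_qualities):
    """Probability that a packet survives every hop of a path, assuming
    independent per-hop losses and no retransmissions."""
    return math.prod(link_qualities)


def sp_threshold_floor(threshold, hops):
    """Worst-case path reliability under SP(t): every hop sits exactly at t."""
    return threshold ** hops


print(round(sp_threshold_floor(0.7, 2), 3))   # two hops at the 70% threshold
print(round(sp_threshold_floor(0.7, 3), 3))   # three hops at the 70% threshold
print(round(path_reliability([0.95, 0.95]), 4), "vs", path_reliability([0.5]))
```

The last line shows the trade-off in miniature: two 95% hops deliver about 90% of packets, whereas a single 50% hop delivers only half.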

7.3 Effect of Neighborhood Management using Routing Cost

Recall from Section 5.6 that neighbor selection can go beyond basic link quality inference through frequency estimation. In particular, the routing layer can influence the neighbor selection process so that better neighbors are available for routing. In this section, we show one instance of this approach by extending the FREQUENCY algorithm discussed in Chapter 5 to incorporate routing information into neighbor selection.

From the routing layer's perspective, one approach is to steer neighbor selection away from maintaining sibling nodes, which are unlikely to be used for routing. By sibling nodes, we mean neighboring nodes that have almost the same routing cost to the destination. For example, if we consider hop count, neighboring nodes with the same hop count are unlikely to become either a parent or a child. It is therefore better to avoid maintaining these neighbors and to save the precious table entries for neighbors that may be potential parents or children.

To explore the effect of neighborhood selection based on routing cost difference, we simulate the FREQUENCY neighbor table management process using the same simulation method as described in Chapter 5: an 80x80 grid network in which each node transmits 100 packets. In addition, we use the routing tree built with the MT cost function in the network graph analysis of the previous section to determine the routing cost of each node in the network. The modification to the FREQUENCY algorithm lies in the insertion and eviction process, shown in Figure 7.3. A neighbor to be inserted must have a routing cost that differs by at least CostDiff from the node's own routing cost (PathCost in Figure 7.3), where CostDiff is a tunable parameter (lines 1 and 3 of Insert in Figure 7.3). Similarly, priority of eviction is given to neighbors whose absolute routing cost difference is less than CostDiff.
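To make the insertion and reinforcement rules of Figure 7.3 concrete, the following is a minimal Python sketch of a cost-aware FREQUENCY table. It is an illustration under stated assumptions, not the thesis implementation: empty slots are modeled as `None` and treated like entries whose counter is 0, and a newly inserted neighbor starts with a counter of 1; Figure 7.3 specifies neither detail. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    node: int
    path_cost: float
    counter: int = 1          # assumed initial count for a newly inserted neighbor


class CostAwareFrequencyTable:
    """Sketch of Figure 7.3: FREQUENCY table management with a routing cost filter."""

    def __init__(self, size, own_cost, cost_diff):
        self.own_cost = own_cost        # PathCost in Figure 7.3
        self.cost_diff = cost_diff      # CostDiff in Figure 7.3
        self.slots = [None] * size      # None marks an empty slot (acts like counter 0)

    def _is_sibling(self, cost):
        return abs(cost - self.own_cost) < self.cost_diff

    def insert(self, n, c):
        if self._is_sibling(c):                      # lines 1-2: reject likely siblings
            return False
        for i, e in enumerate(self.slots):           # line 3: look for a victim slot
            if e is None or e.counter == 0 or self._is_sibling(e.path_cost):
                self.slots[i] = Entry(n, c)          # lines 4-5
                return True
        for e in self.slots:                         # lines 7-8: decay every counter
            e.counter -= 1
        return False                                 # line 9

    def reinforce(self, n):
        for e in self.slots:
            if e is not None and e.node == n:        # Reinforce lines 1-3
                e.counter += 1
                return True
        return False                                 # Reinforce lines 4-5

    def members(self):
        return {e.node for e in self.slots if e is not None}


table = CostAwareFrequencyTable(size=2, own_cost=3.0, cost_diff=0.25)
table.insert(1, 3.1)       # sibling of a node with cost 3.0: rejected
table.insert(2, 2.0)       # potential parent: accepted
table.insert(3, 4.0)       # potential child: accepted, table now full
print(table.members())
```

Because the decay step only runs when no slot is empty, zeroed, or occupied by a sibling, counters never go negative; a neighbor must keep being reinforced to hold its slot against newcomers.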

Insert(n, c, T)
Input: Node n to be inserted; node n's routing cost, c; neighbor table T.
Output: SUCCESS or FAIL.
(1) if |c − PathCost| < CostDiff
(2)     return FAIL
(3) if ∃ an entry e in T where e.counter = 0 or |e.PathCost − PathCost| < CostDiff
(4)     use e to store n in table T
(5)     return SUCCESS
(6) else
(7)     foreach entry e in T
(8)         e.counter = e.counter − 1
(9)     return FAIL

Reinforce(n)
Input: Node n and neighbor table T.
Output: SUCCESS or FAIL.
(1) if n is in T's entry e
(2)     e.counter = e.counter + 1
(3)     return SUCCESS
(4) else
(5)     return FAIL

Figure 7.3: Insertion and reinforcement in Frequency algorithm using routing cost difference.

To evaluate the effectiveness of this approach, we first examine which neighbors the FREQUENCY algorithm maintains without such routing cost influence; this guides the choice of an appropriate CostDiff value for MT. We choose a center node in a 400-node grid network built using the network graph analysis. Figure 7.4 shows the dynamics of the neighbor table of such a node running only the FREQUENCY algorithm, with a table of only 40 entries. Among the neighbors that spend a relatively long time in the neighbor table, many have a routing cost difference close to 0 with respect to the receiving node, as the top-center portion of the graph indicates. The circle with a cross in its center marks the routing cost difference between this node and its parent; as expected, that difference is around 1 transmission.

[Figure 7.4 plot, "FREQUENCY w/o Cost Threshold (Table Size=40)": Percent of Time Spent in Neighbor Table vs. Routing Cost Difference]

Figure 7.4: Percentage of time spent in the neighbor table by the different neighbors vs. their difference in routing cost relative to the receiving node running the FREQUENCY algorithm. The cross indicates the node chosen as the parent.

Figure 7.4 shows that maintaining neighbors with a very small routing cost difference is inefficient when the table is full, since many potential parents or children have an MT routing cost difference between 1 and 2 transmissions. This shortcoming can be minimized by augmenting the FREQUENCY algorithm with the routing cost influence and setting CostDiff to around a quarter of a transmission, which should eliminate many of the neighbors near the center of Figure 7.4. Figure 7.5 shows the result with this augmentation implemented. Among the neighbors that spend a long time in the table, only a few have a routing cost difference close to 0; instead, the majority have a cost difference of about 1. It is also interesting that neighbors with a cost difference of 2 appear often enough to be maintained in the table. These nodes become more competitive because the routing cost filter reduces the number of candidates competing for neighbor table resources. This is advantageous for routing, since these nodes have a smaller routing cost and can become potential parents.

[Figure 7.5 plot, "FREQUENCY with Cost Threshold=0.25 (Table Size=40)": Percent of Time Spent in Table vs. Routing Cost Difference]

Figure 7.5: Percentage of time spent in the neighbor table by the different neighbors and their difference in routing cost relative to the receiving node running the FREQUENCY algorithm with routing cost filtering. The cross indicates the node chosen as the parent.

These results demonstrate how such a simple technique can effectively achieve the original goal of maintaining non-sibling neighbors. The simulations that follow explore how this technique affects overall routing performance.

7.4 Packet Level Simulations

We turn to packet-level simulations to understand the dynamic behavior of the routing protocols and their interactions with link estimation and neighbor table management. We built a custom discrete-time, event-driven network simulator in MATLAB for all of our packet-level simulations. Since the network graph analysis lets us rule out the shortest path (SP) cost function, the packet-level simulations instead explore the other cost functions, as well as other protocols such as Broadcast and DSDV.

7.4.1 A Packet-Level Simulator

We have developed a packet-level simulator using MATLAB. It is a discrete-time, event-driven simulator that uses the MATLAB graphical environment to visualize the evolution of the network topology. The simulator's architecture is modular, so different radio models, interference models, media access control protocols, routing protocols, and applications can be mixed and matched in each simulation. The simulator also allows the user to configure the network with different kinds of node placement, such as a grid or random layout. Figure 7.6 is a screen shot of the simulator, showing a sample routing tree topology whose root is located at the lower left corner.

Figure 7.6: Screen shot of the packet-level simulator.

We implemented the network stack found in TinyOS 1.0 for this simulator. The simulator also captures the low-level details of link acknowledgment and media access, while its radio connectivity model is based on Figure 3.1. To capture the effect of collisions, we performed an empirical study using a setup similar to that in Section 3.1.1. The Mica nodes were laid out in a line topology 3 inches above the ground on an open tennis court. Instead of scheduling the network to have only one transmitter at a time, we used an RF broadcast to synchronize two transmitters to send packets within one bit time of each other, and we disabled the media access control layer. Nodes not scheduled to transmit recorded traces of all received packets. We varied the transmit power and the transmit schedule.

The resulting traces reveal the collision behavior among three nodes at different distances in our line topology: a sender, a receiver, and a collider that also transmits. To distinguish the sender from the collider, we take the sender to be the node physically closer to the receiver. In general, with the same transmit power on the sender and the collider, noticeable interference is observed only if the collider is within the transitional region of the receiver or closer. That is, the receiver can still receive some packets from the sender if the sender is in the receiver's effective region, even when the collider is at the same or a greater distance from the receiver. If the sender is in the receiver's transitional region, a fraction of the packets from both the collider and the sender are received. Almost no reception is possible if the collider is within the effective region of the receiver.

Based on these empirical observations, instead of pursuing an accurate statistical model of interference and collisions, we approximate the essence of the observed behavior with a simple probabilistic model in simulation. The model builds upon the probabilistic reception link qualities among all nodes obtained from our radio connectivity model. Let $p_{i,j}$ be the probability that node $i$ successfully receives a message from node $j$, and let node $b$ be the receiver and node $a$ the sender. Given a set $K$ of $k$ colliders, the probability that $b$ receives $a$'s message equals $p_{b,a} \prod_{i \in K} (1 - p_{b,i})$. This model captures the effect that, if the colliders are in the effective region, the probability that the receiver decodes the sender's packet in the presence of such strong interference is small. The model also reproduces the probabilistic interference behavior in the transitional region.
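The collision model translates directly into code. In the minimal sketch below, `reception_probability` and the link qualities in the example are hypothetical; the example shows that a collider in the effective region nearly silences the sender, one in the transitional region removes only part of the traffic, and one in the clear region has no effect.

```python
import math


def reception_probability(p, receiver, sender, colliders):
    """Probability that `receiver` decodes `sender`'s packet while every node in
    `colliders` transmits at the same time.

    p[i][j] is the probability that node i receives a packet from node j.
    """
    interference = math.prod(1.0 - p[receiver][i] for i in colliders)
    return p[receiver][sender] * interference


# Hypothetical link qualities as seen by receiver "b".
p = {"b": {"a": 0.9,     # sender in b's effective region
           "c": 0.95,    # collider in the effective region
           "d": 0.3,     # collider in the transitional region
           "e": 0.0}}    # collider in the clear region
for colliders in ([], ["c"], ["d"], ["e"], ["c", "d"]):
    print(colliders, round(reception_probability(p, "b", "a", colliders), 3))
```

With no colliders the product is empty and evaluates to 1, so the model reduces to the plain link quality $p_{b,a}$.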

7.4.2 Simulation Results on Routing

Using the simulator described above, we analyze the candidate routing protocols over a 100-node network placed as a 10x10 grid with 8-foot spacing, with the sink node located at the lower left corner of the grid. Again, 8-foot spacing is chosen because the grid spacing, even diagonally, is close to and within the edge of the effective region shown in Figure 3.1. Each experiment simulates 2000 seconds. Each node offers periodic traffic of one data packet every 10 s and one route packet every 20 s. With such a traffic load, the network is fairly congested, especially with a maximum of 2 retransmissions per link. The route packet generation rate is higher than what would be used in practice; in the Great Duck Island application, for example, each node generates a route packet every 2 minutes. We increase the rate to reduce simulation time: with a simulation time of 2000 seconds, one route packet every 20 s corresponds to 100 parent selection cycles.

We simulated all the protocols except SP, since the graph analysis in the previous section has shown its poor performance, confirming our experience in practice. Protocols that use link estimation run WMEWMA with the stable settings described in Chapter 4. For MT, we additionally consider the effect of using the FREQUENCY algorithm, with the addition of routing cost difference selection, to manage a neighbor table of only twenty entries. We call this case MTTM. All other protocols use a table large enough to hold all neighbors.
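For readers without Chapter 4 at hand, the following is a rough sketch of a window mean with exponentially weighted moving average (WMEWMA) estimator. It assumes that losses are inferred from gaps in a neighbor's sequence numbers (without wraparound), that the window t is counted in packet opportunities rather than time, and that α weights the previous estimate; these details and the class name are illustrative assumptions rather than the Chapter 4 definition. The defaults mirror the (t = 30, α = 0.5) setting used in the testbed experiments later in this chapter.

```python
class WMEWMA:
    """Sketch of a window mean with EWMA link estimator.

    Assumptions: losses are detected from sequence-number gaps (no wraparound),
    the window t counts packet opportunities, and alpha weights the old estimate.
    """

    def __init__(self, t=30, alpha=0.5):
        self.t = t
        self.alpha = alpha
        self.received = 0        # packets heard in the current window
        self.expected = 0        # packets the neighbor sent in the current window
        self.last_seq = None
        self.estimate = None     # undefined until the first window closes

    def on_packet(self, seq):
        gap = 1 if self.last_seq is None else seq - self.last_seq
        self.last_seq = seq
        self.received += 1
        self.expected += gap     # a gap of k means k - 1 packets were missed
        if self.expected >= self.t:
            self._close_window()

    def _close_window(self):
        mean = self.received / self.expected          # window mean success rate
        if self.estimate is None:
            self.estimate = mean
        else:
            self.estimate = self.alpha * self.estimate + (1 - self.alpha) * mean
        self.received = 0
        self.expected = 0


est = WMEWMA(t=4, alpha=0.5)
for seq in (1, 2, 3, 4, 6, 8):   # first window loses nothing, second loses half
    est.on_packet(seq)
print(est.estimate)
```

Because this sketch only closes a window when a packet arrives, a neighbor that falls silent keeps its old estimate; a time-driven window would let such an estimate decay.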

Figure 7.7 shows the resulting hop distributions. These agree fairly well with the graph analysis, even though this network has half the physical extent in each dimension of the network used there. In both evaluation approaches, MT and the two SP(t) cost functions yield a network that is about 10 hops deep, with most nodes having a hop count of 6. Because link quality information is fixed rather than estimated in graph analysis, the quality of long links is stable enough for the routing protocols to exploit. Furthermore, network traffic in the simulations removes some of these long links, yielding longer routing paths. Figure 7.8 shows a CDF of the physical distances covered by all the links in the network under the MT cost function, for both graph analysis and simulation. As expected, most of the routing links in the packet-level simulation cover shorter distances than those in graph analysis. For example, up to 60% of the links are shorter than 12 feet in the packet-level simulation, compared with 20% in graph analysis.

In Figure 7.7, SP(40%), Broadcast, and DSDV all have tight distributions, though wider than that of SP in graph analysis. SP(70%) and MT yield wider spreads in hop distribution and generally take more hops. For DSDV, about 15% of the nodes have no route (an infinite hop count) at the end of the simulation; these nodes have become disconnected from the network as a result of link failures or unreachable routes. Without link quality information, long, unreliable links are likely to be selected for routing, and these links are likely to fail, disconnecting nodes and their children from the network.

Figure 7.9 shows the average actual path reliability, obtained by accumulating the link qualities seen by each packet as it moves through the network. The top graph includes the protocols that use only high-quality links in route formation. These yield relatively high path reliability even at 100 feet (6 to 9 routing hops). The differences between MT and SP(70%) are much smaller than under graph analysis. Because link estimation has at least ±10% error compared with the perfect information available in graph analysis, SP(70%) has fewer opportunities to greedily exploit less reliable links close to 70%. As a result, the actual path reliability of SP(70%) is slightly better than in graph analysis.


[Figure 7.7 plot: Number of Nodes vs. Hop Count (0 to 10, plus Infinite) for MT, MTTM, SP (70%), SP (40%), DSDV, and Broadcast]

Figure 7.7: Hop distribution from simulations.

Note that MTTM shows only a slight drop in path reliability relative to MT in Figure 7.9, while its hop distribution is shallower than those of all the other cost functions based on link estimation. This demonstrates the effectiveness of using routing cost information to influence neighbor selection at the neighborhood management layer. Because sibling nodes are excluded from the neighbor table when it is full, neighbors that are less reliable but cover greater physical distances are maintained, yielding networks with shorter hop counts without sacrificing much path reliability.

In the bottom graph of Figure 7.9, although SP(40%) exploits link estimates in choosing the next hop, its higher tolerance of lossy links yields poor performance similar to that of DSDV and Broadcast. Protocols with similar hop distributions yield similar path reliability over distance, and protocols whose hop distributions peak at a higher hop count yield higher path reliability over distance.


[Figure 7.8 plot, "Empirical CDF of the Hop Distances using MT": Empirical CDF of Hop Distances vs. x (Feet), for Graph Analysis and Packet Level Simulation]

Figure 7.8: Cumulative distribution function of the distances of all the links in the network using MT under graph analysis and packet-level simulation.

[Figure 7.9 plots: Average Actual Path Reliability vs. Distance (Feet); panel (a) MT, MT w/Table Management, SP(70%); panel (b) SP(40%), DSDV, Broadcast]

Figure 7.9: Path reliability over distance from simulations.

Figure 7.10 shows the stability of the routing structures over time in the face of stochastic variations in packet loss and the associated estimation error. Broadcast and DSDV are highly unstable. Broadcast is unstable because its parent selection mechanism is opportunistic: it depends on whether the parent is heard during a route discovery flood, which can differ from one flood to the next. DSDV suffers because poor links trigger link failure detection, which causes nodes to repeatedly join and leave the tree. The other protocols yield stable routing trees, and MTTM is the most stable of them. For all other protocols, the neighbor table size is unlimited and can hold all the neighbors a node can hear. For MTTM, the number of potential parents is limited by the table size. As a result, the neighbor table holds fewer alternative parents while still offering some good ones. Furthermore, the dynamic neighbor management process acts as a low-pass filter that dampens the parent selection process. This result suggests that constrained resources can actually improve selection stability.

[Figure 7.10 plot: Parent changes in 1 route update (20 secs) vs. Time (seconds)]

Figure 7.10: Stability from simulations.

Figure 7.11 shows that, with a maximum of two link-layer retransmissions, the end-to-end success rate is close to 90% for protocols that use high-quality links. SP(40%) suffers non-negligible packet loss. DSDV suffers from nodes joining and leaving the network, while Broadcast performs very poorly even with retransmissions.

No cycles occurred in any of the simulation runs. Furthermore, MTTM shows no significant difference in overall performance from MT; it maintains enough good choices for route formation to succeed.

[Figure 7.11 plot: Average Success Rate vs. Distance (Feet)]

Figure 7.11: End-to-end success rate over distance from simulations.

Packet-level simulations allow us to explore protocol dynamics and investigate protocol design issues beyond the capability of graph analysis. However, our interference and collision model is not adequate to capture reality. Therefore, in the next section we rely on empirical studies to further validate our investigation.

7.5 Empirical Experiments

The preceding high-level evaluations are good ways to explore issues at scale and to identify early designs that perform poorly. However, the models of wireless communication used by these simulations are primitive, and details of the hardware and software systems are often missing. As a result, many issues absent from high-level simulations surface as surprises during real deployment, and performance can be very different.

Our original goal is to build reliable networks that overcome noisy real-world wireless characteristics. To fully evaluate our designs, we therefore evaluate our systems empirically. Our graph analysis and simulation results allow us to narrow the evaluation space and focus on SP(40%), SP(70%), and MT in realistic settings. We implemented these protocols and the WMEWMA link estimator on the TinyOS platform.

7.5.1 Experiments over an Indoor 5x10 Grid Network (Mica)

Our first realistic testbed was a 50-node Mica network placed as a 5x10 grid with 8-foot spacing in the foyer of the Hearst Mining building on the UC Berkeley campus, as shown in Figure 7.12. The nodes were placed on cups 3 inches above the ground, since ground reflection can significantly reduce the range of these radios. The sink node was placed in the middle of the short edge of the grid to avoid potential interference from the metal building supports at the corners of the grid. It was attached to a laptop computer over a serial port for data collection. A typical run lasted about three hours and was performed at night, when pedestrian traffic was low.

We found that to set the radio transmission power levels appropriately and to understand the behavior of the protocols, we had to repeat the connectivity vs. distance study of Figure 3.1 in this indoor setting. We deployed a 10-node line topology diagonally across the foyer with 8-foot inter-node spacing. To obtain several hops while preserving good neighbor connectivity, we sought the lowest power setting at which the effective region still covers the grid spacing. Figure 7.13 shows the reliability scatter plot for a low transmit power setting. The falloff is more complex, presumably due to various multipath effects, even though the space is quite open. At 8 feet, most of the links are above 90%. There is clearly a significant number of reliable long links, a few of them covering more than half of the network extent.

Figure 7.12: Deployment in the foyer of the Hearst Mining building.

We performed the data collection experiments with the above transmission power setting for SP(70%), SP(40%), and MT. The maximum number of link retransmissions was two. The link estimator setting for WMEWMA was (t = 30, α = 0.5). We used a neighbor table size of 30 in all our 50-node experiments. The traffic load was one data packet every 30 s and one route packet every 60 s per node, an average offered load of 2.5 packets/s, which was 30% of the available multihop bandwidth. This load was lower than in the simulation study because of the lower effective bandwidth on real nodes, and all the nodes had a randomized start time to avoid bursty traffic. We also explored the effect of tripling the data rate and route update rate on MT, without any rate control, to deliberately drive the network into congestion. To expedite the warm-up phase of the estimator, the route update rate was one route packet every 10 s for the first 10 minutes.
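The load figures in this subsection are simple arithmetic on the per-node rates, and the roughly 90% utilization quoted later for the congested runs is consistent with them. A small check follows; the function name is hypothetical, only originated packets are counted (forwarded copies are ignored), and the capacity is merely the value implied by the statement that 2.5 packets/s is 30% of the multihop bandwidth.

```python
def offered_load(num_nodes, data_period_s, route_period_s):
    """Aggregate packets/s originated by all nodes, ignoring forwarded copies."""
    return num_nodes * (1.0 / data_period_s + 1.0 / route_period_s)


base = offered_load(50, data_period_s=30, route_period_s=60)
congested = offered_load(50, data_period_s=10, route_period_s=20)   # rates tripled
capacity = base / 0.30        # capacity implied by "2.5 packets/s is 30%"
print(f"base {base:.2f} pkt/s, congested {congested:.2f} pkt/s, "
      f"utilization {congested / capacity:.0%}")
```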


[Figure 7.13 plot: Reception Probability vs. Distance (Feet)]

Figure 7.13: Indoor reception probability of all links of a network in a line topology at a low transmit power setting (70) in the foyer.

[Figure 7.14 plot: Number of Nodes vs. Average Hop Count for MT, SP (40%), and MT Congested]

Figure 7.14: Hop distribution for the indoor 50-node deployment.

Figure 7.14 shows the hop distribution for SP(40%) and MT. SP(70%) does not appear in any figure in this section because it failed to construct a viable routing tree in every case, contrary to what our simulations predicted; we explain why later in this section. From our simulation results, we expected SP(40%) to yield a topology with fewer hops and a narrower distribution than MT. However, the empirical distributions for SP(40%) and MT are quite similar and both surprisingly shallow, given that the transmit power was set just to cover the grid spacing. Also, unlike in the simulation, MT is the shallower of the two. To see why, Figure 7.15 shows a contour plot of the average hop count over the grid. This contour plot aggregates the evolving routing tree over the run of the experiment. The sink node is located at (1,3). Three nodes in column 9 are at one hop, even though nodes in column 6 are at three hops; these are long, stable links with good connectivity. Similarly, nodes in column 4 are at 1 hop, while nodes in column 3 are at 2 hops. The nodes in column 9 are usually at the first level of the tree, and the nodes at (3,6) and (3,7) are generally deep in the tree, but their parents may be neighbors in any direction.

For the congested case, we see a reduction in the reliability of links in the upstream direction, causing more nodes to take more hops; the curve for MT Congested shifts to the right.

[Figure 7.15 plot: contours of average hop count (0.5 to 3) over Grid X Coordinate (1 to 10) and Grid Y Coordinate (1 to 5), with the base station marked]

Figure 7.15: Average Hop over Distance Contour Plot for MT at power 70 for the indoor 50-node deployment.

To see why SP(70%) fails to form a routing tree, even though Figure 7.13 suggests that the average link quality to neighboring nodes should be around 90%, we had each node under MT include in its data packets the link estimates to and from its parent. These average estimates are shown in Figure 7.16. (This information is not available for SP(70%) because few nodes could deliver it.) Similar data is also observed for SP(40%). The estimates hover right around 70% for all nodes whose next hop is not the sink node. (For nodes connected to the sink, the upward link is much less reliable than the downward link.) When a node must also route other traffic, the average link quality decreases. With link qualities this close to 70%, the threshold in SP(70%) is no longer sufficient to maintain a connected subgraph. We return to this issue and examine how it affects network stability in the next section.
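A toy example shows why a fixed threshold is fragile when estimates sit near it. The sketch below, with hypothetical link estimates and a hypothetical helper name, keeps only links at or above the threshold and checks which nodes can still reach the sink; a single estimate dipping just below 70% cuts off every node behind it, whereas a 40% threshold keeps the chain connected. Links are treated as undirected for simplicity, which the real protocol does not assume.

```python
from collections import deque


def reachable_from_sink(links, sink, threshold):
    """Nodes connected to `sink` using only links whose estimate is >= threshold.

    links maps an edge (u, v) to its estimated quality; edges are treated as
    undirected for simplicity.
    """
    adj = {}
    for (u, v), q in links.items():
        if q >= threshold:
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
    seen, frontier = {sink}, deque([sink])
    while frontier:                       # breadth-first search from the sink
        u = frontier.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return seen


# Hypothetical estimates hovering around 70% along a four-node chain.
links = {(0, 1): 0.72, (1, 2): 0.68, (2, 3): 0.71}
print(sorted(reachable_from_sink(links, sink=0, threshold=0.7)))
print(sorted(reachable_from_sink(links, sink=0, threshold=0.4)))
```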

[Figure 7.16 plot: Link Quality to Next Hop (%) vs. Node ID, for Send to Next Hop and Receive from Next Hop]

Figure 7.16: Non-sink node next hop link quality for MT in the foyer.

SP(40%) experiences a similar thresholding problem at high data rates that congest the network. We observed that under congestion, the network partitions because link quality drops below the threshold. As a result, the tree built by SP(40%) cannot sustain itself: nodes detach from the tree, and eventually the mechanism that avoids the counting-to-infinity problem prunes the tree. Packets are then no longer forwarded, which reduces contention; SP(40%) rebuilds the network as the congestion subsides, and this cycle repeats. MT avoids these problems by picking the best available paths, without an arbitrary threshold. Notice also that by tracking link quality, routing protocols can successfully avoid routing over asymmetric links. The rest of the study focuses only on SP(40%), MT, and MT Congested.

Figure 7.17 shows the end-to-end success rate versus distance for MT and SP(40%). MT consistently delivers roughly 80% of the originated data throughout the sensor field. This indicates that the underlying components of the protocol, including link estimation, parent management, and queue management, work together effectively. SP(40%) has a lower success rate, but it is still much more robust than the simulations suggest. Even though this protocol considers links estimated at 40%, many of the links it chooses appear to be of much higher quality.

[Figure 7.17 plot: Success Rate (%) vs. Distance (Feet) for MT, SP (40%), and MT Congested]

Figure 7.17: End-to-end success rate over distance in the foyer.

To further test the robustness of MT, we examine its behavior under a load high enough to cause substantial congestion in the network. At three times the data origination and route update rates, the aggregate offered load is about 7.5 packets/s, which uses about 90% of the multihop channel bandwidth. Although the success rate drops to roughly 50%, the network delivers 1.4 times the absolute data rate to the sink. Even under congestion, the success rate is only slightly affected by distance and network depth.

The average number of link retransmissions per packet delivered to the base station is about 1 along the entire path for MT, SP(40%), and MT Congested. With an average next-hop link quality of 70%, we would expect a higher data success rate. To probe this issue further, we extracted the link quality of nodes sending to the base station throughout the run. The quality of links to the base station is only 50%, dropping to 40% in the congested case. This can be seen in Figure 7.16: nodes 16 and 25 are at depth 1 most of the time and exhibit low parent link quality. We have previously observed on this platform that traffic over the serial port degrades RF reception at the base station, an artifact of the platform's implementation.

To see the effect on nodes deeper in the network, we examined a three-hop node and compared its estimated number of transmissions to reach the base station with the number of transmissions that actually occurred for packets that arrived at the base station. The data is shown in Figure 7.18. With an estimated six transmissions, one retransmission is expected on average at each hop of the three-hop path. However, the packets that reach the base station experience only one retransmission along the entire path. This suggests that a limit of two retransmissions per hop is not enough to carry packets along the path in the face of losses that compound at every hop. With a maximum of three retransmissions per hop, the end-to-end success rate is greater than 90%.
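The arithmetic behind the estimate of six transmissions, and behind the benefit of a larger retransmission limit, can be sketched as follows. The MT link cost used here, 1/(p_fwd · p_rev), is an ETX-style assumption and may differ in detail from the MT definition given earlier in the thesis; the per-hop model assumes independent tries with identical success probability, so its numbers illustrate the trend rather than reproduce the measurements in Figure 7.18. All function names are hypothetical.

```python
def mt_link_cost(p_fwd, p_rev):
    """Expected transmissions on one hop, assuming a try succeeds only when both
    the data packet (forward link) and its acknowledgment (reverse link) arrive."""
    return 1.0 / (p_fwd * p_rev)


def hop_delivery(p_try, max_retx):
    """Probability that a hop succeeds within 1 + max_retx independent tries."""
    return 1.0 - (1.0 - p_try) ** (max_retx + 1)


def end_to_end(p_try, hops, max_retx):
    """Delivery probability over `hops` identical, independent hops."""
    return hop_delivery(p_try, max_retx) ** hops


# Three hops, each needing two expected transmissions (per-try success 0.5).
path_cost = sum(mt_link_cost(1.0, 0.5) for _ in range(3))
print("estimated transmissions:", path_cost)
for r in (2, 3):
    print(f"max_retx={r}: per hop {hop_delivery(0.5, r):.4f}, "
          f"3 hops {end_to_end(0.5, 3, r):.3f}")
```

Under this idealized model, raising the limit from two to three retransmissions lifts each hop's delivery probability from 0.875 to 0.9375 when every try succeeds with probability 0.5.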

Figure 7.18 also shows that the estimated path cost, computed with the MT cost function from the link estimates over the three hops, is very stable. The fluctuations are ±1, which suggests that they reflect changes in hop count. This data supports the conclusion that our WMEWMA estimator performs well on real networks.

[Figure 7.18 plot: Number of Transmissions vs. Time (seconds), Actual and Estimated]

Figure 7.18: Actual and expected routing cost as computed using the MT cost function.

Figure 7.19 shows the stability of the routing tree under MT and MT Congested. For MT, after an early formation stage, the network is fairly stable, although we see a substantial change about every 1,000 seconds. The stability of SP(40%), not shown, is similar to that of MT.

MT Congested exhibits much greater instability. However, it operates at three times the data rate and route update rate, so its time scale is effectively compressed. We return to this issue by analyzing more detailed results from another testbed.

In all of our experiments, no cycles were detected, suggesting that simple cycle avoidance mechanisms are sufficient for relatively immobile networks. The duplicate packet elimination mechanism is also effective: in none of the experiments were duplicate packets received at the base station. The policy for multiplexing originating and forwarding traffic appears to be a minor factor in these experiments, since the forwarding queues in all the nodes are almost empty. Furthermore, even at this low power setting, the number of potential neighbors is quite large. For example, the base station recorded twenty-six neighbors, about half the network. This reinforces the need for neighborhood management.


[Figure 7.19 plots: Number of Parent Changes in 1 Route Update (60 secs) vs. Time (seconds), panels MT congested and MT]

Figure 7.19: Stability of the entire network in the foyer.

7.5.2 Results over a 30-node Irregular Indoor Mica Network

We repeated a similar set of experiments with 30 nodes scattered around an indoor office space of 10,000 square feet. We did not perform any a priori analysis of the relationship between connectivity and distance in this environment; we simply placed nodes at convenient spots and set the transmit power to maximum. Although this network is smaller, the office testbed provides a back channel that lets us periodically archive information from each node.

In this setting, SP(70%) also failed to form a routing tree. Figure 7.20 shows the end-to-end success rate of the algorithms. MT achieves a 90% success rate over six hops, with an average of 1 retransmission. SP(40%) performed the same, with an average of 1.3 retransmissions. The actual results are also close to those predicted by the protocols. MT Congested shows a sharp drop in its end-to-end success rate, falling to almost 0% for a 6-hop node. As in the 50-node experiment, MT Congested also shows a high degree of instability, as shown in Figure 7.21. We return to this issue in the next section.

[Figure 7.20: End-to-end success rate versus hop in an office environment. Average success rate versus hop count; series: MT, SP(40%), MT Congested.]

7.5.3 Results over an Irregular Indoor Mica2 Network

All of the previous empirical results were obtained on the Mica platform with the RFM radio. As discussed in Chapter 2, the newer-generation Mica mote, Mica2, uses a different radio, the Chipcon CC1000. To verify that the same routing protocol also performs well on this new platform, we deployed a 14-node network of Mica2 nodes in the same indoor office environment as the previous 30-node Mica study.

Figure 7.22 shows the end-to-end success rate of this 14-node network using a maximum of two link retransmissions. The results indicate that the end-to-end success rate is close to 100% for most of the nodes, except for one, which has only 90% end-to-end reliability. Furthermore, as in all the experiments on the Mica platform, no duplicate packets were received at the base station. These results suggest that migrating the protocol to the new Mica2 platform should yield performance comparable to or better than that of the Mica platform.

[Figure 7.21: Stability for MT in an office environment. Two panels (MT, MT Congested); number of parent changes in 50 secs versus time (seconds).]

7.6 Network Instability under Congested Traffic

As our previous results show, a congested traffic load can induce network instability, especially when routing decisions are tightly coupled with the dynamically evolving logical connectivity graph. For example, we have strong evidence that link estimations over the same pair of nodes behave differently under different channel utilization, as indicated in Figure 7.23. Under congested traffic, the derived connectivity graph reflects changes in physical connectivity to the routing layer, which may react by changing the network topology. Changes in the network topology affect traffic flow and interference, which in turn are reflected at the derived connectivity layer, making the two layers a single closed-loop system.

[Figure 7.22: End-to-end success rate of MT on Mica2 deployed in an office environment. Percent (%) versus node ID.]

To explore this network-wide instability under congested traffic, we observed changes in the routing topology and in the underlying derived and logical connectivity graphs separately, even though they mutually affect each other. We measured network stability by capturing routing topology changes over time. Changes in the derived and logical connectivity graphs require a different approach. Actual changes in the physical connectivity graph are captured by both link estimation and neighborhood management. We focused on the logical connectivity graph, since it is the graph used by routing; it is also a subgraph of the derived connectivity graph. To capture the estimated changes over time on the logical connectivity graph in an aggregated sense, we relied on the testbed to collect traces from each node, reconstructed the logical connectivity graph, and observed its evolution over time. One way to visualize such an evolution is to divide the traces into non-overlapping time windows and sum all the link estimation changes in the network within each window. Links that are not consistently maintained in the neighbor table are excluded from the summation, since they would not be considered for route selection in the first place. This simple technique captures both the absolute degree of change and the frequency of change in a single unit of measurement.
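The windowed summation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for these experiments: each archived trace record is assumed to be a (time, link, estimate) tuple with estimates in [0, 1]; a link's change within a window is assumed to be the sum of absolute differences between its consecutive archived estimates; and a link's first estimate is counted as a change from 0, which makes the first window register close to 100% change. The function name `window_change_sums` and the `maintained` set are hypothetical.

```python
from collections import defaultdict


def window_change_sums(samples, window_s, maintained):
    """Sum link estimation changes per non-overlapping time window.

    samples:    iterable of (time_s, link, estimate) tuples sorted by time;
                estimate is a link quality in [0, 1] and link is any hashable
                identifier for one directional link.
    window_s:   window length in seconds (e.g. 1600 s = 80 route updates).
    maintained: links consistently kept in the neighbor table; all other
                links are skipped because routing never considers them.
    Returns {window_index: summed absolute change}.
    """
    last = {}                 # most recent estimate seen for each link
    sums = defaultdict(float)
    for t, link, estimate in samples:
        if link not in maintained:
            continue
        previous = last.get(link, 0.0)   # first estimate counts as a change from 0
        sums[int(t // window_s)] += abs(estimate - previous)
        last[link] = estimate
    return dict(sums)


trace = [
    (0, "A->B", 0.8), (50, "B->A", 0.5),      # window 0: both links start estimating
    (120, "A->B", 0.6), (150, "A->B", 0.7),   # window 1: A->B moves by 0.2, then 0.1
    (160, "A->C", 0.9),                       # not in the neighbor table: ignored
]
sums = window_change_sums(trace, 100, {"A->B", "B->A"})
print({w: round(s, 2) for w, s in sums.items()})   # {0: 1.3, 1: 0.3}
```

Dividing a window's sum by the number of maintained directional links gives a per-link figure of the kind quoted in this section (for example, 450 / 600 = 75%).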

With the new indoor mote testbed available in UC Berkeley’s Soda Hall, we explore the internal state of each node by using the wired communication channel to archive traces for debugging and to reconstruct the logical connectivity graph. The testbed consists of Mica2Dot nodes, which are equivalent to Mica2 nodes for the purposes of our experiments, since they use the same radio and processor.

[Figure 7.23: Link estimation of a node to its neighbor over time in an office environment. Two panels (Uncongested, Congested); receive estimation (%) versus time (seconds).]

Figure 7.24 shows the overall network stability over the first 166 minutes of a 14-hour experiment with 21 Mica2 nodes. We deliberately created correlated and bursty traffic, with each node originating one data packet every 4 s and sending one route update message every 20 s. This corresponds to a load of 6.3 packets/s on the network, which uses about 42% of the multihop channel bandwidth on the Mica2. The result confirms that the same network instability issue also arises on the Mica2 platform. Over the entire experiment, we calculate that on average 3.02 parent changes (14.38% of the network) occur per periodic route update.
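As a quick check, the load figures above can be reproduced with a few lines of arithmetic. This assumes that all 21 nodes, including the tree root, originate traffic at the stated rates, which is the assumption that yields 6.3 packets/s. The channel capacity of roughly 15 packets/s is only implied by the two reported numbers; it is not a measured value.

```python
NODES = 21               # nodes in the experiment, including the tree root
DATA_PERIOD_S = 4.0      # each node originates one data packet every 4 s
ROUTE_PERIOD_S = 20.0    # each node sends one route update every 20 s
UTILIZATION = 0.42       # reported share of the multihop channel bandwidth

# Offered load: per-node packet rate times the number of nodes.
offered_load = NODES * (1 / DATA_PERIOD_S + 1 / ROUTE_PERIOD_S)   # packets/s
# Channel capacity implied by the two reported figures (derived, not measured).
implied_capacity = offered_load / UTILIZATION                     # packets/s
# 3.02 parent changes per route update, as a fraction of the 21 nodes.
changing_fraction = 3.02 / NODES

print(f"{offered_load:.1f} pkt/s, {implied_capacity:.1f} pkt/s, {changing_fraction:.2%}")
# 6.3 pkt/s, 15.0 pkt/s, 14.38%
```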

Figure 7.25 shows the corresponding changes on the logical connectivity graph using a time window of 1600 seconds, or 80 route updates. On average, every node has bidirectional link estimations for about 15 stable neighbors, so the connectivity graph has about 600 directional links (20 nodes, excluding the tree root, × 15 neighbors × 2 directions). The graph shows that at the beginning all the links start their estimations, so the changes are close to 100%. Over time, the connectivity graph changes considerably: ignoring the first time window, the total link estimation change averages about 450 per time window over the network, or 75% per time window for each link (450/600). This implies that the logical connectivity graph becomes unstable under congested traffic and affects stability at the routing layer. Furthermore,

while stability is an important metric, it is equally important to observe how the end-to-end success rate varies as we mitigate instability under congested traffic. Figure 7.26 shows the end-to-end success rate under this congested traffic; the success rate is lower than in the case without congested traffic. Because of robustness issues in the serial port communication on the tree root, we use a separate sniffer node to collect the data destined to it. Figures 7.24, 7.25, and 7.26 form the basis of comparison for the rest of our investigation of topology instability; we use ‘Original’ to refer to these cases in the rest of the section.

[Figure 7.24: 21-node network stability under congested load (Original). Stability of a 21-node network under congested traffic: number of parent changes in the network versus number of periodic route updates (1 update = 20 sec).]

[Figure 7.25: Network-wide link estimation changes on the logical connectivity graph over time (Original). Sum of all link estimation changes over each time window versus number of 1600-sec time windows.]

The above analyzes the network as a whole; we next focus on a small part of it. We arbitrarily selected one node and investigated its route changes over the entire course of the 21-node experiment. Figure 7.27(a) shows the percentage of time this node spent with each of the parents it chose. The pie chart shows that it spends most of its time among five different parents. However, it does not show how frequently parent switching occurs, since a node may spend a long time with each parent. This is reflected in Figure 7.27(b), which shows the percentage breakdown of all the parent changes among these five parents: the node switches among them fairly evenly. The node switched parents 479 times over 2548 runs of parent selection; that is, on average, it changes its parent every 5.3 runs of the parent selection algorithm.
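The switching interval reported above is simply the number of parent selection runs divided by the number of switches. Below is a minimal sketch of how it could be computed from a per-run log of parent choices. The log format and the function name `parent_switch_stats` are hypothetical; only the counts 2548 and 479 come from the experiment.

```python
def parent_switch_stats(parents):
    """Count parent switches in a per-run log of parent choices.

    parents: the parent chosen at each run of parent selection, in order.
    Returns (number_of_switches, runs_per_switch).
    """
    switches = sum(1 for prev, cur in zip(parents, parents[1:]) if cur != prev)
    runs_per_switch = len(parents) / switches if switches else float("inf")
    return switches, runs_per_switch


# Toy log: three switches (14->15, 15->17, 17->14) over six runs.
print(parent_switch_stats([14, 14, 15, 15, 17, 14]))   # (3, 2.0)

# The counts reported for the selected node: 2548 runs, 479 switches.
print(round(2548 / 479, 1))                            # 5.3
```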

[Figure 7.26: 21-node network end-to-end success rate under congested load (Original). Percent (%) versus node ID.]

From the routing layer's perspective, these parent changes result from the routing layer reacting to variations in the routing costs derived from the logical connectivity graph. Figure 7.28 shows the in-bound and out-bound link estimation values of four potential parents. The data show that the link quality of each parent fluctuates heavily over time, which pushes the routing cost difference past the parent switching threshold; an alternative parent is selected as a result. This behavior repeats and results in the route flapping shown in Figure 7.27. Interestingly, the in-bound link quality with respect to the old parent fluctuates much more than the out-bound estimation, even though the effective congestion level in the network is the same. In fact, our data in general show that in-bound link quality fluctuates much more than out-bound reception, even though the same estimator is used. These fluctuations far exceed the error expected from the link estimator design. Similar results are observed for other nodes in the experiment.

[Figure 7.27: Route instability of a node: distribution of time spent on different parents (a) and the parent distribution of all the route switches of the node (b).
(a) Distribution of parent choice over time: Node 14 25.39%, Node 15 16.52%, Node 16 9.34%, Node 17 26.49%, Node 18 17.46%, Others 4.8%.
(b) Route flapping frequency with respect to each parent: Node 14 19.83%, Node 15 19.21%, Node 16 19.21%, Node 17 21.5%, Node 18 15.87%, Others 4.38%.]

[Figure 7.28: Variations of link quality estimations of the different parents selected by a node over an experiment with congested traffic. Panels: (a) estimation by node 14, (b) node 15, (c) node 16, (d) node 17; out-bound and in-bound estimates versus number of data dump cycles (1 cycle = 80 sec).]

We probed further into this observation and found an overflow error in the implementation of the link estimator, which causes the in-bound link estimation of a parent node to drop significantly when it forwards many packets under a high traffic load. When the overflow occurs in the link estimation computation, the parent’s link estimation drops sharply and its children automatically switch over to an alternative parent. The in-bound link estimation of the new parent then suffers the same overflow, while the estimation of the old parent rebounds to the correct level, and the cycle repeats. This is one reason why route flapping occurs frequently under congested traffic.
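The text does not spell out where the overflow occurred, so what follows is a hypothetical illustration of this class of failure, not the actual bug. It assumes that the number of packets heard from a neighbor in one estimation window is kept in an 8-bit unsigned counter, while the expected count comes from sequence numbers and does not wrap. Under light load the two agree. Once a busy parent sends more than 255 packets in a window, however, the stored count wraps and the computed reception ratio collapses, producing the kind of sudden drop in in-bound estimates described above.

```python
def window_reception_ratio(received, expected, counter_bits=None):
    """Reception ratio for one estimation window.

    received:     packets actually heard from the neighbor in the window.
    expected:     packets the neighbor sent, e.g. inferred from sequence numbers.
    counter_bits: if given, emulate storing `received` in an unsigned counter
                  of that width (a hypothetical width, not taken from the thesis).
    """
    if counter_bits is not None:
        received %= 1 << counter_bits   # unsigned wraparound, as in C
    return received / expected if expected else 0.0


# Lightly loaded neighbor: 45 of 50 packets heard; no wraparound.
print(window_reception_ratio(45, 50, counter_bits=8))              # 0.9
# Busy parent forwarding many packets: 270 of 300 heard.
print(window_reception_ratio(270, 300))                            # 0.9
print(round(window_reception_ratio(270, 300, counter_bits=8), 3))  # 0.047
```

In a sketch like this, the fix is to use a counter wide enough for the largest per-window packet count, or to saturate rather than wrap.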

With the overflow problem fixed, the large fluctuations in link estimations seen in Figure 7.28 are eliminated: under the same traffic load, Figure 7.29 shows that the link estimation variations of the different parents have become less chaotic.

Since route flapping is partly caused by routing cost changes resulting from large fluctuations in link estimations, we expect that, with the overflow error fixed, routing cost fluctuations will be smaller and the network topology more stable. Figure 7.30 shows the stability of the network for the first 166 minutes of the experiment, using the same traffic load and setup as in Figure 7.24. The graph shows a modest improvement in network stability, from 3.02 parent changes per parent selection run to 2.49, a 17.5% improvement.

[Figure 7.29: Variations of link quality estimations of the different parents selected by a node over an experiment, with congested traffic and the overflow error fixed. Panels: (a) estimation by node 15, (b) node 16, (c) node 17, (d) node 18; out-bound and in-bound link quality versus number of debug dump cycles (1 cycle = 80 sec).]

[Figure 7.30: 21-node network stability under congested load with overflow error fixed. Stability of a 21-node network under congested traffic: number of parent changes in the network.]

To better understand why nodes switched parents, it is important to collect information about the routing cost difference between the old and new parents, since the main reason for such changes is that a new parent with a lower routing cost is found. The instability found in our data indicates that the switching threshold of 0.75 of a transmission is too small. This leads us to explore the distribution of such routing cost differences across the whole network. Figure 7.31 shows the empirical cumulative distribution functions derived from the routing cost differences collected during route changes over the entire network, with data from before and after fixing the overflow error. Comparing Figure 7.31(a) with Figure 7.31(b), we learn that correcting the overflow error shortens the long tail of large cost differences: the fraction of switching cost differences greater than 4 transmissions is reduced from 20% to 5%, which corresponds to the reduction in network instability discussed above. While this reduction is important, it also shows that other sources of instability remain. Although one could use a larger switching threshold to reduce route flapping, doing so would make the system retain less reliable parents or routing paths for long periods, which would hurt the end-to-end success rate of data delivery.
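To make the role of the switching threshold concrete, here is a minimal sketch of parent selection with hysteresis. It assumes a simple rule: the best candidate replaces the current parent only if its routing cost, in expected transmissions, is lower by more than the threshold. The function name `select_parent` and the toy costs are hypothetical. The threshold values 0.75 and 1.5 are those discussed in this chapter, but this is not presented as the protocol's actual code.

```python
def select_parent(current, costs, threshold):
    """Parent selection with a switching threshold (hysteresis).

    current:   current parent id, or None if the node has no parent yet.
    costs:     {neighbor_id: routing cost to the root via that neighbor},
               in expected transmissions.
    threshold: minimum cost improvement, in transmissions, required to switch.
    """
    best = min(costs, key=costs.get)
    if current not in costs:          # no parent yet, or parent left the table
        return best
    if costs[best] < costs[current] - threshold:
        return best                   # improvement exceeds the threshold
    return current                    # otherwise keep the current parent


costs = {14: 3.0, 15: 2.0, 17: 2.6}
print(select_parent(14, costs, 0.75))   # 15: saves 1.0 transmission > 0.75
print(select_parent(14, costs, 1.5))    # 14: saving of 1.0 is within 1.5
```

A larger threshold suppresses switches driven by small cost fluctuations, at the price of staying longer on a parent that has genuinely become worse. This is the trade-off described above.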

Correcting the overflow error has a much larger effect on the stability of the logical connectivity graph itself. Figure 7.32 shows that the average amount of link estimation change has been reduced from 450 per time window to 208, or 35% per time window for each link, which is more than a 50% reduction. This is expected, since the overflow error introduces significant fluctuations in link estimations. The end-to-end success rate, shown in Figure 7.33, is about the same as before. In the next section, we explore other issues that may induce topology instability and introduce various techniques to address them.

[Figure 7.31: Empirical cumulative distribution functions of the parent switching cost difference of a 21-node network under congested load, with and without the overflow error. F(x) versus switching cost difference; (a) with overflow error in link estimation, (b) with overflow error fixed.]

[Figure 7.32: Sum of link estimation changes on the connectivity graph over time; series: Original, Overflow Error Fixed.]

[Figure 7.33: End-to-end success rate, percent (%) versus node ID; series: Original, Overflow Error Fixed.]

7.7 Techniques to Mitigate Network Instability

In this section, we discuss some of the issues that may affect overall topology stability. These issues arise at different layers of routing and exemplify how the different routing subproblems interact and influence each other. Understanding these intricate interactions allows us to explore techniques to mitigate instability.

7.7.1 Out-bound Estimation Decay Window

Recall from Chapter 4 that the OutBoundDecayWindow parameter controls how long to wait before decaying a stale out-bound estimation. Since a binary exponential decay can impose a heavy penalty, if OutBoundDecayWindow is not chosen appropriately, an out-bound link estimation will be decayed significantly during a congested period; this acts like noise on the estimations and leads to network instability. A more conservative value should therefore be chosen; in particular, it should take into account possible losses of route packets. Furthermore, since each route packet can convey the out-bound estimations of only a subset of size S of the logical neighbors in the neighbor table T, the ratio of the neighbor table size |T| to S should be used to set OutBoundDecayWindow. S is determined by the difference between the packet payload size and the size of a route message. If a decay penalty should be applied only after n consecutive losses of out-bound estimations, OutBoundDecayWindow should be set to at least n|T|/S. Our previous experiments set OutBoundDecayWindow arbitrarily rather than following this guideline.
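The sizing rule for OutBoundDecayWindow can be written out as a small calculation. Several assumptions are involved here. We read "the difference between the packet payload size and the size of a route message" as the payload bytes left after the route message's fixed fields, divided by the bytes per neighbor entry. We also assume the window is counted in route messages received from the neighbor. All byte counts and the table size below are hypothetical, chosen only for illustration, and both function names are ours.

```python
import math


def entries_per_route_message(payload_bytes, route_header_bytes, entry_bytes):
    """S: how many neighbors' out-bound estimates fit in one route message."""
    return (payload_bytes - route_header_bytes) // entry_bytes


def min_outbound_decay_window(n_losses, table_size, entries_per_message):
    """Smallest OutBoundDecayWindow (in route messages) satisfying n|T|/S.

    A given neighbor's estimate is carried only once every |T|/S messages,
    so tolerating n consecutive losses of it needs about n|T|/S messages.
    """
    return math.ceil(n_losses * table_size / entries_per_message)


# Hypothetical sizes, for illustration only.
S = entries_per_route_message(payload_bytes=29, route_header_bytes=7, entry_bytes=3)
print(S)                                     # 7
print(min_outbound_decay_window(6, 16, S))   # 14
```

Rounding up with `math.ceil` keeps the window at or above n|T|/S, so the tolerance is never shorter than the guideline asks for.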

7.7.2 Spreading Route Update Messages

Route update messages also consume bandwidth. Although their share of bandwidth can be small, it is important to avoid creating correlated, bursty route update traffic, which leads to congestion and affects topology stability. Furthermore, if the application and the routing layer cooperate, the two traffic flows can be sent at different times to avoid potential congestion. If the application sends periodic messages, which is the common case in sensor networks, such cooperation can be achieved by phase-shifting the route update traffic relative to the application traffic.
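To illustrate phase shifting on a single node, the sketch below computes the smallest gap between a data send and a route update send over one hyperperiod, for a chosen offset between the two schedules. The periods are taken from the congested-traffic experiment (data every 4 s, route updates every 20 s). The helper `min_send_gap` and the restriction to whole-second periods and phases are assumptions made for illustration. Because the route period is a multiple of the data period, offsetting route updates by half a data period maximizes the separation.

```python
import math


def min_send_gap(data_period, route_period, data_phase, route_phase):
    """Smallest gap (s) between a data send and a route-update send on one
    node, over one hyperperiod. Periods and phases are whole seconds."""
    hyper = math.lcm(data_period, route_period)
    data_times = range(data_phase, data_phase + hyper, data_period)
    route_times = range(route_phase, route_phase + hyper, route_period)
    gaps = []
    for d in data_times:
        for r in route_times:
            diff = (r - d) % hyper            # circular distance within the hyperperiod
            gaps.append(min(diff, hyper - diff))
    return min(gaps)


# Data every 4 s, route updates every 20 s.
print(min_send_gap(4, 20, data_phase=0, route_phase=0))   # 0: updates coincide with data
print(min_send_gap(4, 20, data_phase=0, route_phase=2))   # 2: offset by half a data period
```

This only separates a node's own two flows. Avoiding correlated updates across nodes would additionally need per-node offsets or jitter, which this sketch does not model.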

7.7.3 Estimator Tuning and Confidence Interval

Since variations in link estimations can have a great impact on the logical connectivity graph and the network topology built on it, techniques that help stabilize link estimations can help stabilize both layers. Recall from Chapter 4 that the link estimator can be tuned to increase stability at the cost of agility, which can reduce topology instability. However, this approach slows adaptation to connectivity changes and does not address the inherent variations caused by congested traffic. A better approach is to apply a confidence interval to filter out noise fluctuations in link estimations: if new estimations fall within the confidence interval, no new information is gained. Results from Chapter 4 show that the confidence interval varies from 6% to 11% when the stable WMEWMA estimator setting is used, depending on the actual link quality. Instead of approximating the confidence interval on-line, a simple approach is to assume the largest variation and apply the corresponding confidence interval for each level of link quality. With this approach, the link estimation is updated only if the new estimation falls outside the confidence interval. This requires a slight increase in memory footprint, since extra memory is allocated to maintain an instantaneous in-bound link estimation for each neighbor in the table. Once the instantaneous estimation falls outside the confidence interval, the link estimation on the logical connectivity graph is updated.
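The filtering rule just described can be sketched as a small per-neighbor record. It holds the published estimate that routing sees and the extra instantaneous estimate, and it publishes a new value only when that value leaves the interval around the current one. As a simplifying assumption, the sketch uses a single fixed half-width, matching the ±13% setting used in the evaluation below, rather than a per-link-quality table. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FilteredLinkEstimate:
    """Per-neighbor link estimate with confidence-interval filtering.

    published:     estimate exposed to routing on the logical connectivity graph.
    instantaneous: latest raw estimator output (the extra per-neighbor memory).
    """
    published: float
    instantaneous: float = field(init=False)

    def __post_init__(self):
        self.instantaneous = self.published

    def update(self, new_estimate, half_width):
        """Store the raw estimate; publish it only if it falls outside
        published +/- half_width. Returns True when the published value changes."""
        self.instantaneous = new_estimate
        if abs(new_estimate - self.published) > half_width:
            self.published = new_estimate
            return True
        return False


link = FilteredLinkEstimate(published=0.80)
print(link.update(0.85, half_width=0.13), link.published)   # False 0.8
print(link.update(0.60, half_width=0.13), link.published)   # True 0.6
```

Supporting a quality-dependent interval would only require computing `half_width` from the current published value before calling `update`.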

7.7.4 Technique Evaluation

To explore the effectiveness of these stabilizing techniques, we modified our protocol in four ways: adding a confidence interval that filters out all link estimation variations within ±13%; phase-shifting route update messages to avoid contention with application traffic; increasing OutBoundDecayWindow to accommodate up to 6 consecutive losses of out-bound estimations before exponentially decaying the estimates; and increasing the switching threshold from 0.75 to 1.5 transmissions, which according to Figure 7.31(b) should eliminate close to 40% of parent switches. We use the same traffic generation and the same 21-node network to evaluate these techniques.

Figure 7.34 shows the network-wide stability over the entire 21-hour experiment with all the stabilizing techniques discussed above added. Different periods show different degrees of stability. Visually, the network topology is much more stable than in the previous results in Figure 7.24 and Figure 7.30. Quantitatively, the number of parent changes per parent selection cycle decreases significantly, from 2.49 in Figure 7.30 to 0.528, a 78.8% reduction.

[Figure 7.34: 21-node network stability under congested load with stabilizing techniques. Number of parent changes in the network.]

Figure 7.35 shows the new empirical cumulative distribution function of the cost difference in parent switching for all the nodes in the network. It shows that these techniques eliminate switches with cost differences below 3 transmissions; that is, they suppress switches that bring only small cost improvements. However, more than 30% of the switching cost differences are still greater than 4 transmissions in a 2- to 3-hop network, which suggests that sources of instability inducing large fluctuations in routing cost remain.

[Figure 7.35: Empirical cumulative distribution function of the parent switching cost difference of a 21-node network under congested traffic, with stabilizing techniques including confidence interval filtering, a larger parent switching threshold, phase-shifted route update messages, and OutBoundDecayWindow tolerating up to 6 consecutive losses. F(x) versus switching cost difference.]

We next explore the changes in the logical connectivity graph as seen by the routing layer. Figure 7.36 shows that overall connectivity changes in the underlying connectivity graph drop by another 50%. Furthermore, stability does not come at the cost of a lower end-to-end success rate, as shown in Figure 7.37; the resulting end-to-end success rate is similar to that before applying the stabilizing techniques.

The combination of these techniques has increased stability at both the network layer and the logical connectivity graph. The result suggests that stability is not achieved by making the derivation of the logical connectivity graph insensitive to physical connectivity changes. Instead, the observed improvement comes from achieving stability at two layers that inherently influence each other.


[Figure 7.36: Sum of link estimation changes on the connectivity graph over time; series: Original, Overflow Error Fixed, w/Stabilizing Tech.]

[Figure 7.37: End-to-end success rate, percent (%) versus node ID; series: Original, Overflow Error Fixed, w/Stabilizing Tech.]

7.7.5 Link Estimation of the Root Node and Stability

Since upstream traffic flows towards the root of the tree, the communication cell of the tree root is likely to be one of the most congested. Furthermore, since the tree root is a data sink, its own wireless transmissions consist only of the minimum data rate, i.e., route updates. Under a congested traffic load, packet loss is potentially high, and such a minimum data rate, which is often relaxed below what the link estimator settings require, can yield large fluctuations in link estimations. While this relaxation works for nodes that have data traffic to make up the difference, the tree root has no data traffic over the wireless channel to give other nodes adequate samples for estimating its link quality over the link estimator time window. Figure 7.38 shows how the link estimation of the tree root, as performed by a node physically close to it, fluctuates significantly under the same traffic load as before. Note that both in-bound and out-bound link estimations suffer a similar degree of fluctuation. Since all routing costs in the network are directly affected by the link estimation of the tree root, such fluctuations create instability over the entire network.

[Figure 7.38: Link quality of the tree root as estimated by a near-by node using the minimum data rate relaxation under congested load. Out-bound and in-bound link quality.]

Increasing the minimum data rate at the tree root is one way to solve this problem. This maintains the same level of agility but reduces the available bandwidth, which is critical at the tree root since all upstream data flows towards it. An alternative is to keep the minimum data rate but fall back on the settings required by the link estimator. This hurts agility, since more samples are required to derive an estimation. Figure 7.39 shows the improved link estimation of the tree root. The large fluctuations are eliminated, and the link estimations remain relatively stable and smooth even under highly congested traffic. Furthermore, the large difference in link quality between the in-bound and out-bound estimations in Figure 7.38 is removed.
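Why do too few samples produce such noisy estimates? An idealized calculation makes the point. If each of n packets in an estimation window is received independently with probability p, the measured reception ratio has standard deviation sqrt(p(1 - p)/n). This binomial model is a simplifying assumption, since real losses are bursty and correlated, and the sketch below is not the WMEWMA estimator itself. It only shows how quickly the spread of a per-window estimate shrinks as the number of samples grows. The function name is ours.

```python
import math


def reception_ratio_std(p, n):
    """Standard deviation of a reception ratio measured from n packets,
    each received independently with probability p (binomial model)."""
    return math.sqrt(p * (1 - p) / n)


# A link with 80% true reception quality, measured from windows of various sizes.
for n in (4, 16, 100):
    print(n, round(reception_ratio_std(0.8, n), 3))
# 4 0.2
# 16 0.1
# 100 0.04
```

Under this model, a tree root that contributes only a handful of route updates per window yields estimates that can swing by tens of percentage points, whereas falling back on the estimator's required sample count keeps the spread small.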

With this instability in link estimation corrected, we reexamine the overall network

stability as shown in Figure 7.40. The result shows that the network is extremely stable,

with an average of 0.1 parent changes per route update message, a 97% reduction from the

original data in Figure 7.24.
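The relative reductions quoted so far can be cross-checked from the reported per-update rates. The helper name `relative_reduction` is ours, and the inputs are the values reported in this section. The last figure comes out to 96.7%, which rounds to the 97% stated above.

```python
def relative_reduction(before, after):
    """Fractional reduction from `before` to `after`."""
    return (before - after) / before


# Parent changes per route update (or per parent selection run), as reported.
steps = [
    ("overflow fixed, vs. Original", 3.02, 2.49),
    ("stabilizing techniques, vs. overflow fixed", 2.49, 0.528),
    ("tree-root estimation fixed, vs. Original", 3.02, 0.1),
]
for label, before, after in steps:
    print(f"{label}: {relative_reduction(before, after):.1%}")
# overflow fixed, vs. Original: 17.5%
# stabilizing techniques, vs. overflow fixed: 78.8%
# tree-root estimation fixed, vs. Original: 96.7%
```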

[Figure 7.39: Link quality of the tree root as estimated by a near-by node under congested traffic load, with the relaxation in link estimation removed. Out-bound and in-bound link quality.]

Figure 7.41 shows the new empirical cumulative distribution function of the cost difference in parent switching for all nodes in the network. Note that the long tail signaling large parent switching costs disappears. Furthermore, all of the parent switching cost differences are within 3 to 4 transmissions, as expected in a 2- to 3-hop network.

The corresponding logical connectivity graph changes are shown in Figure 7.42. The overall connectivity changes are reduced from an average of about 94 per time window to 32, another 66% reduction. Figure 7.43 shows that the resulting end-to-end success rate remains similar to before. These results suggest that instability in the link estimation of the tree root has a significant influence on instability at the routing layer. At the same time, the large reduction in changes to the logical connectivity graph suggests that enhancing stability at the routing layer has an indirect effect in stabilizing the logical connectivity graph. We investigate this relationship in the opposite direction by relaxing the parent switching threshold to its original setting of 0.75 of a transmission and exploring how a more unstable routing topology affects the logical connectivity graph.

[Figure 7.40: 21-node network stability under congested load, with relaxation in link estimation of the tree root removed. Number of parent changes in the network.]

[Figure 7.41: Empirical cumulative distribution function of the parent switching cost difference of a 21-node network under congested load, with relaxation in link estimation of the tree root removed. F(x) versus switching cost difference.]

Figure 7.44 shows the resulting network stability graph. The network is very stable, with a 14-hour average of 0.14 parent changes per route update, compared with 0.1 in Figure 7.40. The logical connectivity graph changes are shown in Figure 7.45. They show that the derived connectivity captures 56% more changes, with average connectivity changes increasing from 32 to 50 per time window. That is, relaxing route selection increases instability in the routing topology, which in turn appears to increase instability in the logical connectivity graph. While our results cannot establish a causal relationship between the stability of the routing topology and that of the logical connectivity graph, we do observe a positive correlation between the two. We observe no significant change in the end-to-end success rate in Figure 7.46. In fact, relaxing the parent switching threshold yields the best end-to-end success rate; this is expected, since the system is then less tolerant of unreliable links.

[Figure 7.42: Sum of link estimation changes on the connectivity graph over time; series: Original, Overflow Error Fixed, w/Stabilizing Tech, w/Stabilizing Tech & Tree Root Est. Fixed.]

[Figure 7.43: End-to-end success rate, percent (%) versus node ID; series: Original, Overflow Error Fixed, w/Stabilizing Tech, w/Stabilizing Tech & Tree Root Est. Fixed.]

[Figure 7.44: 21-node network stability under congested load, with the parent switching threshold relaxed to its original setting (0.75 transmission). Number of parent changes in the network.]

7.7.6 Adaptivity and Stability

Our previous experiments investigated instability from the perspective of the routing layer, and we have seen how stability at the routing layer can affect stability of the logical connectivity graph. To further investigate adaptivity and stability, we perturb the actual link connectivity in the network and observe how the logical connectivity graph and the routing topology adapt to these changes. Note that we capture physical connectivity changes through the approximations seen in the logical connectivity graph. We added a new node near one end of the network, away from the tree root, which generates interfering traffic at 10 packets/s during alternate 10-minute periods. The interfering traffic is cyclic because constant interference would cause the network to adapt only once.

Figure 7.47 shows the variations in connectivity compared with the case without our induced interference. The interfering traffic doubled the number of link quality changes across the whole network relative to the case with a parent switching threshold of 0.75 transmission. The end-to-end success rate is slightly lower than before because of the interfering traffic.

Examining network topology stability in Figure 7.49, we observe that the rate of parent changes per route update has increased from 0.1 in Figure 7.40 and 0.14 in

[Figure: sum of all link estimation changes over each time window. Legend: Original; Overflow Error Fixed; w/Stabilizing Tech; w/Stabilizing Tech & Tree Root Est. Fixed; w/Stabilizing Tech & Tree Root Est. Fixed & 0.75 Switching Thresh.]

[Figure: per-node percentage (x-axis: node ID; y-axis: percent). Legend: Original; Overflow Error Fixed; w/Stabilizing Tech; w/Stabilizing Tech & Tree Root Est. Fixed; w/Stabilizing Tech & Tree Root Est. Fixed & 0.75 Switching Thresh.]

[Figure: sum of all link estimation changes over each time window. Legend: w/Stabilizing Tech & Tree Root Est. Fixed & 0.75 Switching Thresh.; w/Stabilizing Tech & Tree Root Est. Fixed & 0.75 Switching Thresh. & Interference.]

Figure 7.44 to 0.3. This result confirms that our system is adaptive to physical connectivity

changes.
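The text does not spell out how the rate of parent changes per route update is computed. The sketch below shows one plausible normalisation over a hypothetical trace format (one dictionary per route update, mapping node id to chosen parent). Both the format and the normalisation are assumptions.

```python
from typing import Dict, List, Optional

# Hypothetical trace format: one dict per route update round, mapping
# node id -> parent chosen after that round (None if the node has no parent).
Trace = List[Dict[int, Optional[int]]]


def parent_change_rate(trace: Trace) -> float:
    """Network-wide parent changes per route update.

    For each pair of consecutive rounds, count the nodes whose parent differs,
    then divide by the number of round-to-round transitions. This
    normalisation is an assumption, not a definition taken from the thesis.
    """
    if len(trace) < 2:
        return 0.0
    changes = 0
    for prev, cur in zip(trace, trace[1:]):
        changes += sum(1 for node, parent in cur.items()
                       if node in prev and prev[node] != parent)
    return changes / (len(trace) - 1)


trace = [{1: 0, 2: 1}, {1: 0, 2: 0}, {1: 0, 2: 0}, {1: 2, 2: 0}]
print(parent_change_rate(trace))  # 2 changes over 3 transitions
```

Dividing by the number of nodes instead would give a per-node rate. The two normalisations differ only by a constant factor for a fixed network size.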

A different kind of change that we induce on the logical connectivity graph is node failure. Figure 7.50 shows an instance where a node with 6 children was deliberately disabled around the 197th route update. The link estimator is updated every 10 route updates, starting from 0, so its next update occurs at the 200th route update. This leaves 3 route updates (1 minute) to detect the failure. By the 200th route update, all six children had detected the failure and switched to alternative parents, as shown in Figure 7.50. The reaction includes a second phase of parent changes around the 210th to 240th route updates. This suggests that reacting to the node failure created a new topology, which in turn influences physical connectivity as reflected in the logical

7.8. SUMMARY 203

[Figure 7.48: 21-node network end-to-end success rate under congested load, with periodic interfering traffic (x-axis: node ID; y-axis: percent). Legend: w/Stabilizing Tech & Tree Root Est. Fixed & 0.75 Switching Thresh.; w/Stabilizing Tech & Tree Root Est. Fixed & 0.75 Switching Thresh. & Interference.]

connectivity graph. Therefore, the whole system requires some time to readjust before settling into a stable topology again. Overall, these results suggest that our stabilizing techniques can yield a routing system that creates stable topologies and still adapts to underlying connectivity changes.
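The failure-detection timing above is simple modular arithmetic. The sketch reproduces it under two stated assumptions. First, the estimator fires at rounds 0, 10, 20, and so on. Second, a failure is picked up only by an update strictly after the failure round. The 20 s route update period is inferred from "3 route updates (1 minute)" and is not stated directly.

```python
from typing import Tuple

ESTIMATOR_PERIOD = 10  # link estimator runs every 10 route updates (from the text)
ROUTE_UPDATE_S = 20.0  # inferred from "3 route updates (1 minute)": about 20 s each


def next_estimator_update(failure_round: int, period: int = ESTIMATOR_PERIOD) -> int:
    """First estimator update strictly after failure_round, for a schedule
    that fires at rounds 0, period, 2*period, ..."""
    return (failure_round // period + 1) * period


def detection_slack(failure_round: int) -> Tuple[int, float]:
    """Route updates, and approximate seconds, until the estimator next runs."""
    rounds = next_estimator_update(failure_round) - failure_round
    return rounds, rounds * ROUTE_UPDATE_S


print(next_estimator_update(197), detection_slack(197))  # 200 (3, 60.0)
```

The worst case occurs when a failure lands just after an estimator update. The slack then approaches a full estimator period of 10 route updates.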

7.8 Summary

Our concrete results from high-level simulations and empirical studies on real nodes lead us to conclude that our proposed routing framework, using the MT routing cost function, is well integrated with the logical connectivity graph built by the underlying local processes of link estimation and neighborhood management. Together they provide


[Figure 7.49: 21-node network stability under congested load, with periodic interfering traffic (y-axis: number of parent changes in the network).]

a self-organizing ability to form reliable and stable routing topologies. Our simulation results confirm our expectation that routing protocols that do not take a probabilistic approach to connectivity suffer significant losses. We observe that neighbor management under constrained resources can yield better topology stability and shorter hop distributions when neighbor selection interacts with the routing layer. Our simulation results eliminated all cost functions that do not define connectivity relative to link estimation. These results led us to explore the connectivity-based cost functions on real nodes. To our surprise, our empirical studies revealed various issues that we neither expected nor observed in simulation. In particular, protocols that define connectivity with a link quality cut-off can suffer network partitions. Link quality can vary with traffic and fluctuate around the threshold, resulting in intermittent connectivity. In contrast, MT requires no such threshold; link cost simply increases as link quality decreases. Therefore, MT


[Figure 7.50: 21-node network stability under congested load, with one of the nodes disabled in the middle of the experiment (y-axis: number of parent changes in the network; annotation: a node with 6 children is deliberately disabled).]

adapts well when link quality fluctuates. However, because MT builds upon individual link estimates, these fluctuations can also lead to instability in the network. Link quality can fluctuate for many reasons, such as congested traffic and a noisy environment. We also found that link quality is traffic dependent. This implies that the logical connectivity graph and the network topology depend upon each other and cannot be separated, so our investigation of topology stability must explore these two layers together. We introduced a set of techniques to mitigate instability, such as a confidence interval filter for link estimates. We also identified a subtle overflow error in the implementation of the link estimator that leads to instability under congested traffic. Finally, the assumption that the link estimator requirement can be relaxed by relying on the minimum data rate alone does


not work well for tree roots. Since all routing costs in the network are derived from the link estimates at the tree root, poor estimates of these links lead to instability across the whole network. We have shown that stability is achieved without suppressing adaptivity. In addition, the end-to-end success rate of data delivery is not much affected by the level of network stability. Experiments have demonstrated that our routing scheme is stable yet adaptive to connectivity changes and node failure. Our study shows a strong positive correlation in stability among the derived connectivity graph, the logical connectivity graph, and the network topology (the latter two are subgraphs of the former). It also exposes the intricate interactions between the two layers, which reinforces our theme of understanding routing in wireless networks through a holistic approach. In other words, protocols should not neglect the underlying connectivity and focus only on the routing layer, since the two work as one system.
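The contrast between a link quality cut-off and the MT cost can be illustrated with a small sketch. Here the MT link cost is assumed to be the expected number of transmissions, 1/(p_fwd * p_rev), in the style of ETX. The thresholded link simply exists or does not. The link quality samples and the 0.70 cut-off are hypothetical values chosen for illustration.

```python
import math
from typing import List


def mt_link_cost(p_fwd: float, p_rev: float) -> float:
    """Expected transmissions over one hop, assuming an ETX-style model in which
    the data packet (p_fwd) and its acknowledgment (p_rev) succeed independently."""
    p = p_fwd * p_rev
    return math.inf if p <= 0.0 else 1.0 / p


def threshold_link(quality: float, cutoff: float) -> bool:
    """Cut-off style connectivity: the link exists only if quality >= cutoff."""
    return quality >= cutoff


def count_flaps(samples: List[float], cutoff: float) -> int:
    """Number of times cut-off connectivity toggles across successive estimates."""
    states = [threshold_link(q, cutoff) for q in samples]
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


# Hypothetical (symmetric) link-quality estimates hovering around a 0.70 cut-off.
estimates = [0.72, 0.68, 0.71, 0.69, 0.73]
print(count_flaps(estimates, 0.70))                       # link appears and disappears
print([round(mt_link_cost(q, q), 2) for q in estimates])  # cost moves smoothly
```

Under the cut-off, the same link toggles four times across five estimates. Under MT it stays usable, and its cost varies only between roughly 1.9 and 2.2 expected transmissions.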


Chapter 8

Concluding Remarks

A theme that has developed throughout this thesis is the importance of understanding the underlying issues of a system within the framework of the higher-level problem that motivated the study in the first place. Our goal of a self-organizing multihop routing protocol for sensor networks exemplifies this theme. We analyzed the platform constraints and collected extensive evidence on the lossy characteristics of the wireless channel. From this, we arrived at a new perspective on wireless connectivity and changed our approach to the problem of routing. We decompose the distributed routing process into three local subproblems: link quality estimation, neighborhood management, and connectivity-based route selection. These processes interact and build upon each other to support a multihop routing system for sensor networks. At the lowest level, we define probabilistic connectivity relative to link estimation: each node must have a link estimator to characterize the physical connectivity of the nodes that it can hear. Above it is the neighborhood management process. This process decides how the node should invest its scarce memory across the potential neighbor set to maintain both link estimation statistics and routing information. Regardless of cell density, it should identify a logical subset of neighbors, no larger than the neighbor table, that benefits routing. Together these two processes form a distributed logical connectivity graph, with each edge's connectivity defined through link estimation. The remaining subproblem is to identify a routing cost function that builds a network topology over this weighted logical connectivity graph. This holistic approach to routing demonstrates our general theme of conducting a study on a real system, which is a key underlying contribution of this thesis.

Identifying the subproblems of a routing process allows us to tease apart the relevant issues, study each one individually, and evaluate them both separately and collectively as a system. Our treatment of network stability exemplifies the need to tease apart the intricate relationships between routing and the underlying topology management. We have shown that the two are inseparable, since they mutually affect each other and form a closed-loop system. As a result, one cannot simply separate the two layers and focus only on the routing layer. Our holistic approach leads to specializations that yield simple designs, which are key to success in sensor networks. Whole-system analysis also provides the flexibility for interposition and cross-layer optimization.

For link estimation, WMEWMA is a simple, memory-efficient link estimator that reacts quickly yet is stable enough for path characterization in connectivity-based routing. Both analysis and simulation show that, in the worst case, a packet-level link estimate with 10% error at 95% confidence requires roughly one hundred packet samples to react to link quality changes. This understanding is important because it bounds how quickly routing can meaningfully react. Some mobile networks change structure more rapidly than this regime allows. Such networks require link quality to be estimated quickly through other mechanisms, such as the link quality indicator provided by 802.15.4 radios. A future direction is to explore the correlation between such an indicator and the actual packet-level link quality. We would also need to observe whether the indicator can accurately track link quality variations under various conditions.
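The sketch below connects the two quantities in this paragraph. It is not the thesis's implementation. The "roughly one hundred packets" figure is reproduced with a normal-approximation (Wald) interval for a Bernoulli success rate at the worst case p = 0.5, and the thesis's own analysis may use a different interval estimator. The `WMEWMA` class detects losses from sequence-number gaps. It computes a success ratio per window and folds that ratio into an exponentially weighted moving average. The window is closed explicitly by the caller, and the `alpha` value is an assumption.

```python
import math
from statistics import NormalDist
from typing import Optional


def required_samples(confidence: float = 0.95, abs_error: float = 0.10,
                     p: float = 0.5) -> int:
    """Packets needed for a success-rate estimate within abs_error at the given
    confidence, using the normal approximation; p = 0.5 is the worst case."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil(z * z * p * (1.0 - p) / abs_error ** 2)


class WMEWMA:
    """Window mean with EWMA smoothing (sketch).

    Gaps in sequence numbers count as lost packets. At the end of each window
    the window's success ratio is blended into the running estimate, with
    `alpha` as the weight kept on history (value assumed, not from the text).
    """

    def __init__(self, alpha: float = 0.6) -> None:
        self.alpha = alpha
        self.estimate: Optional[float] = None
        self.last_seq: Optional[int] = None
        self.received = 0
        self.missed = 0

    def on_packet(self, seq: int) -> None:
        if self.last_seq is not None:
            if seq <= self.last_seq:
                return                        # duplicate or reordered: ignore
            self.missed += seq - self.last_seq - 1
        self.last_seq = seq
        self.received += 1

    def end_window(self) -> Optional[float]:
        total = self.received + self.missed
        if total > 0:
            mean = self.received / total
            if self.estimate is None:
                self.estimate = mean          # first window seeds the average
            else:
                self.estimate = self.alpha * self.estimate + (1 - self.alpha) * mean
        self.received = self.missed = 0
        return self.estimate


print(required_samples())  # 97 packets for 95% confidence, 10% error
```

Losses at the very end of a window are only noticed when the next packet arrives, so they are charged to the following window. For a periodic-traffic estimator this is a small, bounded bias.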

For neighborhood management, memory constraints along with our refined definition of connectivity require a new approach to neighbor selection. The limited neighbor table size forces a node to select a logical set of neighbors suitable for routing. It must choose this set from all nodes with physical connectivity (potential neighbors), which could be very numerous. The key challenge in this neighbor selection process is to choose a good set of reliable neighbors for routing using a fixed-size neighbor table, regardless of the number of potential neighbors. Our study shows that the FREQUENCY algorithm performs well in maintaining a subset of good neighbors in a constrained neighbor table regardless of cell density. Augmenting the FREQUENCY algorithm with the routing cost demonstrates one way in which the routing layer can influence the neighbor selection process. An intelligent selection process can yield better network stability and hop distribution even under resource constraints.
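A frequency-based neighbor table can be sketched as follows. The sketch is modeled on the frequent-items counting algorithms cited in this thesis ([25], [32]), which keep a small set of counters and decay all of them when a new item arrives at a full table. The exact FREQUENCY policy used in the thesis may differ, and the class and method names are hypothetical.

```python
from typing import Dict


class FrequencyTable:
    """Fixed-size neighbor table driven by frequency counts (sketch).

    Each overheard packet reinforces its sender's count. A newcomer is admitted
    only when a slot is free; otherwise every count is decremented, and entries
    that reach zero are evicted, freeing slots for future newcomers.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.counts: Dict[int, int] = {}

    def hear(self, node_id: int) -> bool:
        """Process one packet overheard from node_id; return True if the node
        is in the table afterwards."""
        if node_id in self.counts:
            self.counts[node_id] += 1         # reinforce an existing neighbor
            return True
        if len(self.counts) < self.capacity:
            self.counts[node_id] = 1          # free slot: admit the newcomer
            return True
        for nid in list(self.counts):         # table full: decay every entry
            self.counts[nid] -= 1
            if self.counts[nid] == 0:
                del self.counts[nid]
        return False                          # newcomer not admitted this time


table = FrequencyTable(capacity=2)
for sender in [1, 1, 1, 2, 2, 3, 3, 3]:
    table.hear(sender)
print(table.counts)  # frequently heard node 1 survives; 3 displaced the weaker 2
```

Nodes heard often, which tend to be the nodes with good links, accumulate counts that protect them from churn. Rarely heard nodes cannot displace them, no matter how many such nodes exist in the cell.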

With these new concepts of connectivity and neighborhood, the concept of a hop needs to be revisited. A one-hop neighbor is defined relative to its competitiveness among the other neighbors under the neighbor selection criteria. Only the most competitive nodes are inserted into the table and can become one-hop neighbors on the logical connectivity graph. Thus, the routing topology is a subgraph of this connectivity graph.

For the routing process, we used a routing framework derived from distance-vector protocols to build tree topologies suitable for sensor network data-collection applications. Our study concludes that Minimum Transmissions (MT) is an effective routing cost function. It does not require a predefined link quality threshold and is robust under varying connectivity characteristics. We deliberately drove the network into congestion in our study. We then provided effective cross-layer methods to alleviate the resulting instability without sacrificing adaptivity or lowering the end-to-end success rate.
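Parent selection in a distance-vector tree with an MT-style cost can be sketched as follows. Path cost is the link's expected transmissions plus the neighbor's advertised cost to the root. A parent switching threshold adds hysteresis: the node keeps its current parent unless an alternative is cheaper by at least the threshold. This reading of the 0.75-transmission setting mentioned in Chapter 7 is an assumption, as are the function names.

```python
from typing import Dict, Optional, Tuple

# Hypothetical neighbor state: id -> (link_cost, advertised_cost_to_root),
# both in expected transmissions.
Neighbors = Dict[int, Tuple[float, float]]


def path_cost(link_cost: float, advertised_cost: float) -> float:
    """MT path cost through a neighbor: our link's cost plus its cost to the root."""
    return link_cost + advertised_cost


def choose_parent(current: Optional[int], neighbors: Neighbors,
                  switch_threshold: float = 0.75) -> Optional[int]:
    """Pick the minimum-cost parent, but keep the current one unless the best
    alternative improves the path cost by at least switch_threshold."""
    if not neighbors:
        return None
    best = min(neighbors, key=lambda n: path_cost(*neighbors[n]))
    if current is None or current not in neighbors:
        return best                           # no usable parent: take the best
    cur_cost = path_cost(*neighbors[current])
    best_cost = path_cost(*neighbors[best])
    return best if best_cost + switch_threshold <= cur_cost else current


nbrs = {1: (1.2, 2.0), 2: (1.0, 1.8)}   # costs 3.2 vs 2.8: gain 0.4 < 0.75
print(choose_parent(1, nbrs))           # stays with parent 1
```

A larger threshold suppresses parent flapping when estimates jitter, at the price of keeping a slightly worse path for longer. This is the stability and adaptivity trade-off examined in Chapter 7.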

This thesis work can be extended in many directions. One natural extension is supporting bidirectional traffic, from the sources to the sink and vice versa, over the same tree topology. The sink can participate in the routing process and direct traffic back to the sources. It can learn the entire network topology and use source routing to send traffic downstream by embedding the explicit routing path in the packet. This provides basic few-to-many routing on top of the many-to-few routing topologies. Such few-to-many routing support is useful for disseminating queries or commands to a particular node or region.
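A minimal sketch of the downstream source routing described above follows. It assumes the sink has collected every node's current parent into a hypothetical `parents` map. The route to a destination is found by walking parent pointers up to the sink and reversing the walk. A loop check guards against stale parent information.

```python
from typing import Dict, List


def source_route(parents: Dict[int, int], sink: int, dest: int) -> List[int]:
    """Downstream route sink -> dest built from parent pointers (node -> parent)."""
    path = [dest]
    seen = {dest}
    node = dest
    while node != sink:
        if node not in parents:
            raise ValueError(f"node {node} has no known parent")
        node = parents[node]
        if node in seen:
            raise ValueError("routing loop in parent map")
        seen.add(node)
        path.append(node)
    path.reverse()                            # walked upward; packets travel downward
    return path


parents = {1: 0, 2: 1, 3: 1, 4: 2}   # hypothetical tree rooted at sink 0
print(source_route(parents, 0, 4))    # [0, 1, 2, 4]
```

The resulting list is what the sink would embed in the packet header. Header size therefore grows with hop count, which is one reason this suits occasional queries and commands rather than bulk traffic.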

The sink node can also act as a rendezvous point to support any-to-any routing, similar to the landmark routing approach presented in [75]. The resulting routing paths may not be efficient. However, the overhead of route formation is small, since the same tree topologies are reused and any-to-any traffic is expected to be infrequent. More intelligent mechanisms can be used to cope with this inefficiency. For example, assuming the network is relatively stable, a centralized approach could use the sink node to set up a virtual circuit between the source and the destination for packet delivery. A distributed approach could adopt the source-initiated, on-demand routing used in mobile computing to establish routes between a source and a destination. However, the process must rely on the discovered logical connectivity graph to select reverse paths back to the source. There are many ways to provide such any-to-any routing support, but the important criterion is to fall back on the logical connectivity graph for route selection.
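Rendezvous routing through the sink can be sketched on the same kind of collection tree. The packet climbs the tree from the source to the sink, then descends the tree to the destination. The `parents` map is a hypothetical snapshot of the tree, and the example is chosen to show the inefficiency the text mentions.

```python
from typing import Dict, List


def path_to_root(parents: Dict[int, int], sink: int, node: int) -> List[int]:
    """Upstream path node -> sink obtained by following parent pointers."""
    path = [node]
    while node != sink:
        node = parents[node]
        if node in path:
            raise ValueError("routing loop in parent map")
        path.append(node)
    return path


def via_sink(parents: Dict[int, int], sink: int, src: int, dst: int) -> List[int]:
    """Any-to-any route using the sink as rendezvous: up from src, down to dst."""
    up = path_to_root(parents, sink, src)
    down = list(reversed(path_to_root(parents, sink, dst)))
    return up + down[1:]                      # drop the duplicated sink


parents = {1: 0, 2: 1, 3: 1, 4: 2}   # hypothetical tree rooted at sink 0
print(via_sink(parents, 0, 3, 4))     # [3, 1, 0, 1, 2, 4]
```

Here node 1 is a common ancestor of both endpoints, so 3, 1, 2, 4 would suffice. The detour through the sink is the price of reusing the tree without any extra route formation.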

The lessons we learned from this thesis also apply to receiver-based routing such as GRAd [67]. The key issue in this kind of routing is to rely on mechanisms that narrow the scope of dissemination and perform suppression well to reduce duplicate forwarding. Relying on basic MAC-layer support is often not adequate, as demonstrated by our broadcast simulations and by [29]. Lessons from the careful tree building using broadcast discussed in [71] can be used to avoid local broadcast storms and enhance the suppression mechanism. Furthermore, dissemination can be scoped better if the policy considers only receivers on the logical connectivity graph as forwarding candidates. However, such an approach would limit the ability to cope with mobility, one of the main advantages of receiver-based routing, because the rate of adaptation is then bounded by the connectivity graph.

A new research direction in sensor networking is to extend our holistic theme to the application layer and enable more cross-layer optimizations for better performance. One example is query-optimized routing, where applications may want to influence routing and the rate of topology adaptation to make in-network processing more efficient. A list of potential routing parents can be provided to the application layer. The application can then use its semantic information to direct the routing layer toward a topology better suited to in-network processing. Once a query-optimized routing topology is formed, the overhead for the in-network processing algorithm to adapt to topological changes can be quite significant. Therefore, it may be advantageous to keep the same topology and accept more link retransmissions to overcome link quality variations. Such a whole-system approach requires a new networking architecture. This architecture must provide a set of tightly coupled interfaces between the application and the routing layer. It must also retain the flexibility to select, and coexist with, different routing protocols and networking services. This direction is discussed in more detail in [78].

In conclusion, our study crosses several system layers and illustrates interactions among global network structure, high-level protocols, and the underlying low-level issues. The empirical data should shed light on application deployment by illustrating the trade-offs among transmit power, inter-nodal spacing for data sampling, average and maximum network hop count, and overall network load. Although new generations of radios will have different connectivity characteristics from the two that we sampled, the observed three-region structure is expected to persist. Link estimation, neighborhood table management, and reliability-based cost metrics are likely to remain core issues for reliable routing in wireless networks.


Bibliography

[1] ANSI/IEEE Std 802.11, 1999 Edition.

[2] Chipcon. http://www.chipcon.com.

[3] IEEE 802.15.4-2003 Standard. http://standards.ieee.org/getieee802/download/802.15.4-2003.pdf.

[4] Network of Embedded Systems Technology. http://webs.cs.berkeley.edu.

[5] RF Monolithics. http://www.rfm.com/products/data/tr1000.pdf.

[6] Atmega 103L (Complete) Datasheet.

[7] 3GPP TS 05.09. Digital cellular telecommunications system (Phase 2+); Link Adaptation. In ETSI TS 101 709, version 8.5.0, Release 1999.

[8] A. Konrad, B. Y. Zhao, A. D. Joseph, and Reiner Ludwig. Explicit Loss Notification and Wireless Web Performance. In The Fourth International Workshop on Modeling, Analysis, and Simulation of Wireless and Mobile Systems (MSWIM), Rome, Italy, 2001. ACM Press.

BIBLIOGRAPHY 214

[9] A. S. Tanenbaum. Computer Networks, 1989.

[10] Bob Albrightson, J. J. Garcia-Luna-Aceves, and Joanne Boyle. EIGRP - A Fast Routing Protocol Based On Distance Vectors. In Proceedings of Networld/Interop, Paris, France, May 1994. Interop Europe.

[11] A. Arora, P. Dutta, S. Bapat, V. Kulathumani, H. Zhang, V. Naik, V. Mittal, H. Cao, M. Gouda, Y. Choi, T. Herman, S. Kulkarni, U. Arumugam, M. Nesterenko, A. Vora, and M. Miyashita. Line in the Sand: A Wireless Sensor Network for Target Detection, Classification, and Tracking. In Special Issue of Elsevier Computer Networks on Future Advances in Military Wireless Communications, October 2004.

[12] Atmel, Inc. Atmega 128L Preliminary (Complete) Datasheet.

[13] H. Balakrishnan and R. Katz. Explicit Loss Notification and Wireless Web Performance. In IEEE Globecom Internet Mini-Conference, Sydney, Australia, November 1998. IEEE.

[14] D. Beyer, M. Frankel, J. Hight, D. Lee, M. Lewis, P. McKenney, J. Naar, R. Ogier, N. Shacham, and W. Zaumen. Packet Radio Network Research, Development, and Application. In Proceedings of the SHAPE Packet Radio Symposium, 1989.

[15] B. Blum, P. Nagaraddi, A. Wood, T. Abdelzaher, S. Son, and J. Stankovic. An Entity Maintenance and Connection Service for Sensor Networks. In The First International Conference on Mobile Systems, Applications, and Services (MobiSys ’03), San Francisco, California, May 2003.

[16] S. Bruhn, P. Blocher, K. Hellwig, and J. Sjoberg. Concepts and solutions for link adaptation and inband signaling for GSM AMR speech coding standard. In Proceedings of IEEE 49th Vehicular Technology Conference. IEEE, July 1999.

[17] Alberto Cerpa, Naim Busek, and Deborah Estrin. SCALE: A tool for Connectivity Assessment in Lossy Environments. Technical Report CENS-TR 0021, Center For Embedded Networked Sensing, UCLA.

[18] Alberto Cerpa, Jeremy Elson, Deborah Estrin, Lewis Girod, Michael Hamilton, and Jerry Zhao. Habitat monitoring: application driver for wireless communications technology. In SIGCOMM Comput. Commun. Rev., volume 31, pages 20–41, San Diego, California, 2001. ACM Press.

[19] Jae-Hwan Chang and Leandros Tassiulas. Energy Conserving Routing in Wireless Ad-hoc Networks. In INFOCOM, volume (1), pages 22–31, 2000.

[20] Y. Choi, Mohamed G. Gouda, Moon C. Kim, and Anish Arora. The Mote Connectivity Protocol. Technical Report TR-03-08, Department of Computer Sciences, The University of Texas at Austin, 2003.

[21] Douglas De Couto, Daniel Aguayo, Benjamin Chambers, and Robert Morris. Performance of Multihop Wireless. In First Workshop on Hot Topics in Networks (HotNets-I), Princeton, New Jersey, October 2002.

[22] Crossbow, Inc. www.xbow.com.

[23] M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. In 13th Annual ACM-SIAM Symposium on Discrete Algorithms, 2002.

[24] Douglas S. J. De Couto, Daniel Aguayo, John Bicket, and Robert Morris. A high-throughput path metric for multi-hop wireless routing. In Proceedings of the 9th annual international conference on Mobile computing and networking, pages 134–146. ACM Press, 2003.

[25] E. D. Demaine, A. Lopez-Ortiz, and J. I. Munro. Frequency estimation of internet packet streams with limited space. In Proceedings of the 10th Annual European Symposium on Algorithms ESA 2002, pages 348–360, September 2002.

[26] Richard Draves, Jitu Padhye, and Brian Zill. Comparison of Routing Metrics for Static Multi-Hop Wireless Networks. In ACM SIGCOMM, Portland, Oregon, August 2004. ACM Press.

[27] C. Estan and G. Varghese. New directions in traffic measurement and accounting. In ACM SIGCOMM Internet Measurement Workshop, 2001.

[28] Gregor Gaertner and Vinny Cahill. Understanding Link Quality in 802.11 Mobile Ad Hoc Networks. In IEEE Internet Computing, Jan/Feb Issue 2004.

[29] Deepak Ganesan, Bhaskar Krishnamachari, Alec Woo, David Culler, Deborah Estrin, and Stephen Wicker. Complex Behavior at Scale: An Experimental Study of Low-Power Wireless Sensor Networks. Technical Report UCLA/CSD-TR 02-0013, University of California, Los Angeles, February 2002.

[30] David Gay, Philip Levis, Robert von Behren, Matt Welsh, Eric Brewer, and David Culler. The nesC language: A holistic approach to networked embedded systems. In Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation, pages 1–11. ACM Press, 2003.

[31] Philip G. Gibbons and Y. Matias. New sampling-based summary statistics for improving approximate query answers. In Proceedings of ACM SIGMOD International Conference on Management of Data, pages 311–342, June 1998.

[32] G. Manku and R. Motwani. Approximate frequency counts over data streams. In Proceedings of the 28th International Conference on Very Large Data Bases, August 2002.

[33] G. T. Nguyen and R. Katz. A trace-based approach for modeling wireless channel behavior. In Proceedings of the Winter Simulation Conference, pages 597–604, 1996.

[34] C. Hedrick. Routing information protocol. In RFC 1058, June 1988.

[35] C. Hedrick. An Introduction to IGRP. Technical report, Rutgers - The State University of New Jersey Technical Publication, Laboratory for Computer Science, August 1991.

[36] Jason Hill and David Culler. Mica: a wireless platform for deeply embedded networks. IEEE Micro, 22(6):12–24, Nov/Dec 2002.

[37] Jason Hill, Robert Szewczyk, Alec Woo, Seth Hollar, David E. Culler, and Kristofer S. J. Pister. System Architecture Directions for Networked Sensors. In Architectural Support for Programming Languages and Operating Systems, pages 93–104, Boston, MA, USA, November 2000. ACM Press.

[38] Jason Lester Hill. System Architecture for Wireless Sensor Networks. PhD thesis, University of California, Berkeley, 2003.

[39] Jonathan W. Hui and David Culler. The Dynamic Behavior of a Data Dissemination Protocol for Network Programming at Scale. In Proceedings of the second international conference on Embedded networked sensor systems. ACM Press, 2004.

[40] Chalermek Intanagonwiwat, Ramesh Govindan, and Deborah Estrin. Directed diffusion: a scalable and robust communication paradigm for sensor networks. In Proceedings of the International Conference on Mobile Computing and Networking Mobicom, pages 56–67, August 2000.

[41] J. Jubin and J. D. Tornow. The DARPA packet radio network protocols. In Proceedings of the IEEE, January 1987.

[42] J. Moy. OSPF Version 2. In RFC 1247, July 1994.

[43] Jaein Jeong and Cheng Tien Ee. Forward Error Correction in Sensor Networks. Technical report, UC Berkeley, May 2003.

[44] D. Johnson and D. Maltz. Dynamic source routing in ad hoc wireless networks. In Mobile Computing, pages 153–181. Kluwer Academic Publishers, 1996.

[45] L. R. Ford Jr. and D. R. Fulkerson. Flows in Networks. Princeton University Press, 1962.

[46] Philo Juang, Hidekazu Oki, Yong Wang, Margaret Martonosi, Li Shiuan Peh, and Daniel Rubenstein. Energy-efficient computing for wildlife tracking: design tradeoffs and early experiences with ZebraNet. In Proceedings of the 10th international conference on Architectural support for programming languages and operating systems, pages 96–107, San Jose, California, 2002. ACM Press.

[47] Lawrence M. Lemmis and Kishor S. Trivedi. A Comparison of Approximate Interval Estimators for the Bernoulli Parameter. In The American Statistician, volume 50(1), pages 63–68, 1996.

[48] Philip Levis, Sam Madden, David Gay, Joe Polastre, Robert Szewczyk, Alec Woo, Eric Brewer, and David Culler. The Emergence of Networking Abstractions and Techniques in TinyOS. In First Symposium on Networked Systems Design and Implementation. USENIX, 2004.

[49] Xin Li, Young Jin Kim, Ramesh Govindan, and Wei Hong. Multi-dimensional range queries in sensor networks. In Proceedings of the first international conference on Embedded networked sensor systems, pages 63–75. ACM Press, 2003.

[50] M. Kim and B. Noble. Mobile Network Estimation. In ACM Mobicom, 2001.

[51] Samuel Madden, Michael J. Franklin, Joseph M. Hellerstein, and Wei Hong. The Design of an Acquisitional Query Processor For Sensor Networks. In ACM SIGMOD, 2003.

[52] Samuel Madden, Wei Hong, Joe Hellerstein, and Michael Franklin. TinyDB Web Page. http://telegraph.cs.berkeley.edu/tinydb.

[53] Alan Mainwaring, Joseph Polastre, Robert Szewczyk, David Culler, and John Anderson. Wireless Sensor Networks for Habitat Monitoring. In ACM International Workshop on Wireless Sensor Networks and Applications (WSNA’02), Atlanta, GA, USA, September 2002. ACM Press.

[54] B. Miller. Limiting Logical Neighborhood Size. In SURAN Program Technical Note (SRNTN) 43. Rockwell Inc., September 1986. Available from Defense Technical Information Center (DTIC).

[55] E. Modiano. An adaptive algorithm for optimizing the packet size used in wireless ARQ protocols. In Wireless Networks, volume 5, pages 279–286, July 1999.

[56] D. C. Montgomery. Introduction to statistical quality control. John Wiley & Sons, Inc., 3rd edition, 1997.

[57] Vincent D. Park and M. Scott Corson. A Highly Adaptive Distributed Routing Algorithm for Mobile Wireless Networks. In INFOCOM (3), pages 1405–1413, 1997.

[58] Charles E. Perkins and Parvin Bhagwat. Highly dynamic destination-sequenced distance-vector routing (DSDV) for mobile computers. In Proceedings of the ACM SIGCOMM, pages 234–244, August 1994.

[59] Charles E. Perkins and Elizabeth M. Royer. Ad hoc On-Demand Distance-Vector (AODV) Routing. In Proceedings of the 2nd IEEE Workshop on Mobile Computing Systems and Applications, 1999.

[60] Philip Levis, Neil Patel, David Culler, and Scott Shenker. Trickle: A Self-Regulating Algorithm for Code Propagation and Maintenance in Wireless Sensor Networks.

[61] Joseph Polastre, Jason Hill, and David Culler. Versatile Low Power Media Access for Wireless Sensor Networks. In Proceedings of the second international conference on Embedded networked sensor systems. ACM Press, 2004.

[62] M. Pursley and H. Russell. Adaptive Forwarding and Routing in Frequency-Hop Spread-Spectrum Packet Radio Networks with Partial-Band Jamming. In Proceedings of the MILCOM ’89 Conference, pages 230–234, 1989.

[63] R. Cáceres, N. G. Duffield, J. Horowitz, F. Lo Presti, and D. Towsley. Loss-based Inference of Multicast Network Topology. In IEEE Conference on Decision and Control, 1999.

[64] Ram Ramanathan and Regina Rosales-Hain. Topology Control of Multihop Wireless Networks Using Transmit Power Adjustment. In IEEE Infocom, March 2000.

[65] Ananth Rao, Christos Papadimitriou, Scott Shenker, and Ion Stoica. Geographic routing without location information. In Proceedings of the 9th annual international conference on Mobile computing and networking, pages 96–108. ACM Press, 2003.

[66] Sylvia Ratnasamy, Brad Karp, Scott Shenker, Deborah Estrin, Ramesh Govindan, Li Yin, and Fang Yu. Data-centric storage in sensornets with GHT, a geographic hash table. Mob. Netw. Appl., 8(4):427–442, 2003.

[67] Robert Poor. Gradient Routing in Ad Hoc Networks. www.media.mit.edu/pia/Research/ESP/texts/poorieeepaper.pdf.

[68] E. Royer and C. Toh. A Review of Current Routing Protocols for Ad-Hoc Mobile Wireless Networks. In IEEE Personal Communications, April 1999.

[69] Arvind Sankar and Zhen Liu. Maximum Lifetime Routing in Wireless Ad-hoc Networks. In INFOCOM, 2004.

[70] N. Shacham and D. Beyer. A Distributed Protocol for Reducing Neighborhood Size in Radio Networks. In Proceedings of the International Conference on Communications ICC ’88, June 1988.

[71] Cory Sharp, Shawn Schaffert, Alec Woo, Naveen Sastry, Chris Karlof, Shankar Sastry, and David Culler. Design and Implementation of a Sensor Network System for Vehicle Tracking and Autonomous Interception. Technical report, UC Berkeley, 2004.

[72] Suresh Singh, Mike Woo, and C. S. Raghavendra. Power-aware routing in mobile ad hoc networks. In Proceedings of the 4th annual ACM/IEEE international conference on Mobile computing and networking, pages 181–190. ACM Press, 1998.

[73] J. Stevens. Spatial reuse through dynamic power and routing control in common-channel random-access packet radio networks. In SURAN Program Technical Note (SPRNTN) 59. Rockwell Inc., August 1988. Available from Defense Technical Information Center (DTIC).

[74] Robert Szewczyk, Alan Mainwaring, Joseph Polastre, and David Culler. An Analysis of a Large Scale Habitat Monitoring Application. 2004.

[75] P. F. Tsuchiya. The landmark hierarchy, a new hierarchy for routing in very large networks. In Special Interest Group on Data Communication (SIGCOMM), pages 36–42, 1988.

[76] V. Jacobson. Congestion avoidance and control. In ACM SIGCOMM, pages 314–329, Stanford, California, 1988. ACM Press.

[77] Alec Woo and David Culler. A Transmission Control Scheme for Media Access in Sensor Networks. In International Conference on Mobile Computing and Networking (MobiCom 2001), page 221, Rome, Italy, July 2001.

[78] Alec Woo, Sam Madden, and Ramesh Govindan. Networking support for query processing in sensor networks. Commun. ACM, 47(6):47–52, 2004.

[79] Yong Yao and J. E. Gehrke. Query Processing in Sensor Networks. In The First Biennial Conference on Innovative Data Systems Research, January 2003.

[80] Mark D. Yarvis, W. Steven Conner, Lakshman Krishnamurthy, Alan Mainwaring, Jasmeet Chhabra, and Brent Elliott. Real-World experiences with an interactive ad hoc sensor network. In International Conference on Parallel Processing Workshops, August 2002.

[81] Fan Ye, Gary Zhong, Songwu Lu, and Lixia Zhang. GRAdient Broadcast: A Robust Data Delivery Protocol for Large Scale Sensor Networks. In ACM Wireless Networks (WINET), volume 11, March 2005.

[82] Jerry Zhao and Ramesh Govindan. Understanding Packet Delivery Performance in Dense Wireless Sensor Networks. In Proceedings of the first international conference on Embedded networked sensor systems, pages 1–13. ACM Press, 2003.
