MULTISCALE DISTRIBUTED ESTIMATION

WITH APPLICATIONS TO GPS AUGMENTATION

AND NETWORK SPECTRA

A DISSERTATION

SUBMITTED TO THE DEPARTMENT OF AERONAUTICS AND ASTRONAUTICS

AND THE COMMITTEE ON GRADUATE STUDIES

OF STANFORD UNIVERSITY

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

Christina Selle

June 2010

http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/zg345yw6848

© 2010 by Christina Selle. All Rights Reserved.

Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.


I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Matthew West, Primary Adviser


Sanjay Lall, Co-Adviser


Per Enge

Approved for the Stanford University Committee on Graduate Studies.

Patricia J. Gumport, Vice Provost Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.

Abstract

Distributed estimation uses a network of sensors to measure a set of variables. The computation tasks required for finding the optimal estimate can be divided among the sensor nodes in a way that can be implemented as an iterative process using nodes with little computational power. Most algorithms for distributed estimation work for small networks, but convergence rates decrease with network size, making them impractical for use in large networks. We present a consensus algorithm with a convergence rate that scales logarithmically with network size by arranging nodes in a multigrid network structure. The algorithm can adapt to changes in the network structure and allows for selection of several parameters, representing a trade-off between performance and robustness of the network. We also describe how the algorithm is adapted to account for time-varying measurements and measurement weights.

We present two applications of these methods. Our first application is an algorithm that allows us to determine the spectral properties of a state transition matrix on the network. Since the convergence rate of a consensus algorithm is related to the spectral properties of the state transition matrix, we can use this information to evaluate the effects of changes to the network structure.

Our second application is a distributed GPS augmentation system. Traditional GPS augmentation systems use reference receivers to find a set of error correction values, which are broadcast to surrounding mobile receivers. Our distributed augmentation system uses only mobile receivers with unknown locations, which are able to obtain a set of correction values by sharing and processing data in a distributed network. The resulting method can be used to improve GPS point positioning accuracy in areas where fixed augmentation systems are not available.


Acknowledgments

This work was supported by a William R. and Sara Hart Kimball Stanford Graduate Fellowship, and I am deeply thankful to the Kimball family for this support.

I would like to thank my adviser, Matt West, for all of the great ideas, guidance, advice, suggestions, encouragement, LaTeX tips, and math lessons he shared with me during my time at Stanford.

I also want to express my gratitude to Sanjay Lall, who stepped in as my official Stanford adviser half-way through my journey. Per Enge and Sheri Sheppard have been great professors for me to work with as a teaching assistant, and have also provided valuable advice.

Sigrid Close and Ellen Kuhl provided some helpful feedback during and after my Ph.D. oral examination.

I could not have made it through graduate school without the support of my friends and family, including my parents Hartmut and Marie-Luise Mester, my sister Mareike Mester, my grandparents, my friends Adam Grossman, Fraser Cameron, Marianne Karplus, Tracy Rubin, the Carlstrom family, and the group of Aeronautics and Astronautics graduate students who shared this journey with me.

Last but not least, I would like to thank my husband Andrew Selle for all of his love and support.


Contents

Abstract
Acknowledgments
1 Introduction
2 Multiscale consensus algorithms
2.1 Introduction
2.2 Construction of a multilevel network
2.3 Invariant distribution offset factor determination
2.4 Adjusting self-weights for improved performance
2.5 Adjusting the network for broken edges and nodes
2.6 Performance and Robustness trade-offs
2.7 Two dimensional numerical example
2.8 Measurement updates
2.9 Sensor weights
2.10 Network spectral properties
2.11 Conclusion
3 Distributed GPS augmentation
3.1 Introduction
3.2 Position solution for a single receiver
3.2.1 Gauss-Newton method
3.2.2 Gradient descent method
3.2.3 Newton’s method
3.2.4 Comparison of different methods
3.3 Multiple receivers with delay estimation
3.3.1 Gauss-Newton method
3.3.2 Accuracy and sensitivity to random errors
3.3.3 Regularized delay estimation
3.4 Distributed delay estimation
3.4.1 Regularized distributed delay estimation
3.4.2 Comparison of the different methods
3.5 Performance Comparison
3.6 Multigrid methods for distributed delay estimation
3.7 Conclusion
4 Distributed spectral methods
4.1 Introduction and Assumptions
4.2 Spectral methods for symmetric matrices
4.3 Adapting spectral methods for distributed networks
4.4 Spectral methods for nonsymmetric matrices
4.5 Distributed concurrent computation of eigenvalues
4.6 Numerical Example
4.7 Using spectral information for supernode placement
4.8 Conclusion
A Distributed spectral algorithms
A.1 Power method
A.2 QR-factorization


List of Tables

3.1 Typical GPS error budget (RMS values).


List of Figures

2.1 Simple two-level network with five base-level nodes (gray) and two supernodes (black). The base-level nodes form a ring.
2.2 Transition matrix for Theorem 2.2.1
2.3 Linear system for Theorem 2.2.1.
2.4 State transition matrix with adjusted supernode self-weights.
2.5 Comparison of convergence for a ring network with three levels using Metropolis weights and the multigrid weights described here with and without supernode self-weight adjustments.
2.6 Spectral gap vs. number of nodes in the base level
2.7 Centralization Robustness vs. Performance trade-off
2.8 Performance vs. Robustness for various α and β values. The red cross indicates the parameter values chosen for subsequent numerical examples.
2.9 Example network layout
2.10 Convergence results for the example network.
2.11 Eigenvalues of network with 400 base level nodes with various numbers of levels.
2.12 Selected eigenvectors of a single level ring with 400 nodes.
2.13 Convergence times of different ring-shaped networks given the eigenvectors of a single level ring as starting value.
2.14 Eigenvalues of network with various numbers of levels.
2.15 $v_2$ of the base level of the network shown in figure 2.9.
2.16 $v_6$ of the base level of the network shown in figure 2.9.
2.17 $v_{30}$ of the base level of the network shown in figure 2.9.
2.18 Convergence times of different networks given the eigenvectors of a single level network as starting value.
3.1 Convergence for 50 receivers without delay estimation
3.2 Effect of including delay estimation on position estimates
3.3 Convergence for 500 receivers without delay estimation
3.4 Convergence for 50 receivers with delay estimation
3.5 Convergence for 500 receivers with delay estimation
3.6 Mean positioning error as a function of the number of satellites
3.7 Mean objective value function per receiver as a function of the number of receivers, with and without delay estimation, and for a hypothetical case where correlated delays are set to zero.
3.8 Ratio of total position error without and with delay estimation
3.9 Ratio of total position error without and with delay estimation in an extended network
3.10 Ratio of total position error without and with delay estimation with large multipath error
3.11 Ratio of total position error without and with delay estimation in an extended network with large multipath errors
3.12 Example network layout
3.13 Positioning error convergence for the receiver network example.
3.14 Objective function value for the receiver network example.
4.1 Convergence of the orthogonal basis for the distributed QR method
4.2 Convergence of the orthogonal basis for the distributed power method
4.3 Convergence of the eigenvectors for the distributed QR method.
4.4 Convergence of the eigenvectors for the distributed power method.
4.5 Convergence of the eigenvalues for the distributed QR method.
4.6 Convergence of the eigenvalues for the distributed power method.
4.7 Network from figure 2.9 in $v_2$-$v_3$ space.
4.8 Final supernode placements in $v_2$-$v_3$ space.
4.9 Final supernode placements in x-y space.


Chapter 1

Introduction


This thesis describes a distributed multigrid consensus algorithm, as well as applications of this algorithm to GPS augmentation and graph-spectrum computations.

Distributed estimation algorithms are used to provide optimal estimates of a variable based on a set of measurements taken by a network of sensors. Compared with centralized algorithms, distributed estimation algorithms have both advantages and disadvantages. While centralized algorithms require a single processor that is capable of running the estimation algorithm, distributed methods divide the computation into smaller tasks that can be performed by nodes with lower computational capabilities. For the algorithms described in this thesis, we assume that the sensors themselves have some computational capabilities and form the network of nodes that runs the distributed estimation algorithms. Distributed methods can also be more robust than centralized methods, in many cases making it possible to obtain good results even if some of the nodes or communication links in the network fail.

The networks used here for distributed estimation can be modeled as graphs, where the sensors are the nodes or vertices of the graph, and the communication links between nodes form the edges. Every node can store some limited amount of data for later use, and thus is modeled as having a self-loop. We also assume that all communication links are two-way links, but that weights associated with different directions of transmission between two nodes do not have to be equal, making the network a directed graph. At every discrete time step, each node receives data from adjacent nodes, and updates its stored variables.

Chapter 2 describes an algorithm for distributed consensus. While consensus is a very basic operation for a distributed network to perform, many complex computations can be reduced to a combination of consensus steps and simple operations that each node in the network can perform individually. The consensus algorithm described in chapter 2 differs from other consensus algorithms in that it uses a multigrid network structure. Multigrid methods are commonly used to improve the convergence rates of algorithms for solving differential equations by using several levels of increasing resolution in the discretization. Chapter 2 shows how a multigrid structure can be created to run a consensus algorithm in a distributed network. In addition, the performance and robustness trade-offs of this algorithm are studied, and convergence rates and their dependence on noise characteristics are compared to those of single-level networks. Chapter 2 also proposes extensions of the basic multigrid algorithm for measurement updates and for assigning weights to the node measurements.

Chapter 3 describes how distributed methods, including the multigrid algorithm from chapter 2, can be used to create a distributed GPS augmentation system. Traditional GPS augmentation systems use a reference station to create error corrections, which are broadcast to mobile receivers and used in point positioning. The augmentation system described here does not use a fixed reference receiver, but instead calculates correction terms based only on the measurements obtained from a network of mobile receivers. If distributed methods are used, the augmentation system also does not require the use of a centralized station to compute the corrections, since all computation is done by the network of receivers.

Chapter 4 describes a distributed eigenvalue method for nonsymmetric matrices. Most eigenvalue methods are difficult to adapt to distributed systems due to their dependence on matrix factorization, but the algorithm presented here can be reduced to a series of consensus processes and simple computations, and can therefore be run on a distributed network. This is of particular interest since it can be used to find a worst-case estimate of the convergence rate of a consensus algorithm, and thus to monitor the status of the network if its structure changes over time.

If an appropriate number of levels is selected in the construction of the multigrid networks described in this thesis, the convergence rate scales logarithmically with network size, making the method practical for use in very large networks. Since microcontrollers and microprocessors are included in a wide variety of devices, and wireless communication is becoming increasingly ubiquitous, there are many potential application areas where distributed estimation and control could be applied to large networks.

The main contributions of this thesis can be summarized as follows. A novel multigrid algorithm for distributed consensus is presented, along with an analysis of the trade-offs between robustness and performance that occur when various parameters are selected for this algorithm. The convergence of this consensus method is compared to that of other single-level methods under various noise conditions. Chapter 3 includes an algorithm for a distributed GPS augmentation system, which differs from existing augmentation systems in that it requires neither stationary reference receivers with known positions nor reference stations for centralized computations. Chapter 4 extends an existing distributed eigenvalue method for symmetric matrices to nonsymmetric matrices, while also describing how the power method can be adapted for distributed systems. The spectral information of a network is then used to determine appropriate supernode locations in the network.

A review of the relevant literature is provided in the introduction section of each of the individual chapters in this thesis. Conclusions and some ideas for future work are provided at the end of each chapter.

Chapter 2

Multiscale networks for distributed consensus algorithms


2.1 Introduction

Distributed consensus algorithms [11][30][45] allow a network of computational nodes to iteratively exchange information between neighbors in order to compute the global average of a quantity. They can be used as the basis for many applications, such as distributed optimization methods [16] or control schemes [6][33]. While typically less efficient than a centralized algorithm, consensus methods have the advantages of distributing the work across all nodes in the network and of being robust to node and connection failure.

The general framework for consensus methods considers each node synchronously updating its own value to a weighted average of the current values of its neighbors (as distinct from asynchronous gossip algorithms [3], for example). One of the most natural questions, therefore, is what graph structure and what weights should be chosen to give the fastest convergence of the algorithm to the consensus value while guaranteeing convergence [7].

The choice of optimal weights has been investigated in depth by [2][39][44], who used convex optimization and semidefinite programming to find the weights that minimize the magnitude of the second largest eigenvalue of the Markov chain defined by the consensus update. While such an approach gives the optimal choice of weights, it requires a centralized scheme for solving the optimization problem for the weights.

An alternative to solving for the optimal weights is to choose a graph structure that gives fast consensus with some weight choice. This can be done by optimization [15], or by using a heuristic such as taking advantage of the fact that small-world networks [42] have fast consensus [40] and thus trying to add edges or nodes to enhance this property. Other possibilities include making the node updates random [19] or otherwise time-varying [20][29]. Networks can also have time-varying inputs or topologies due to the nature of the network rather than the consensus algorithm [31].

In this chapter we present an alternative scheme for producing a network to achieve fast consensus, based on the idea of multiscale networks. A simple example of a multiscale network with two levels is shown in figure 2.1. Figure 2.9 in section 2.7 shows a more complex network with three levels that is used for numerical examples throughout this thesis. We observe that a regular consensus method is similar to using Jacobi’s method to solve the equation Lx = 0, where L is the graph Laplacian or a similar matrix. Unfortunately, the convergence rate of Jacobi’s method is poor and scales badly as system size grows [36]. This is because errors that vary slowly across the network are driven to zero only slowly by the Jacobi iteration, which uses only nearest-neighbor updates. One standard way of overcoming these deficiencies is to use multilevel algorithms, such as the multigrid method [41], where coarsened versions of the base-level graph are used to enhance the decay of slowly varying components.

Figure 2.1: Simple two-level network with five base-level nodes (gray) and two supernodes (black). The base-level nodes form a ring.
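The following minimal NumPy sketch illustrates this scaling. It is not taken from this thesis and rests on stated assumptions: a single-level ring with Metropolis weights (every node has degree two, so all weights equal 1/3), and the hypothetical helper names `ring_transition_matrix` and `spectral_gap`. It computes the spectral gap 1 − |λ₂| for rings of increasing size. Doubling the ring size divides the gap by roughly four, so the number of iterations needed grows roughly quadratically with the number of nodes.

```python
import numpy as np


def ring_transition_matrix(n):
    """Single-level ring with Metropolis weights: every node has degree 2,
    so the self-weight and both neighbor weights equal 1/3."""
    shift = np.roll(np.eye(n), 1, axis=1)  # row i has a 1 in column i+1 (mod n)
    return (np.eye(n) + shift + shift.T) / 3.0


def spectral_gap(P):
    """1 - (second largest eigenvalue magnitude) of a symmetric stochastic matrix."""
    magnitudes = np.sort(np.abs(np.linalg.eigvalsh(P)))
    return 1.0 - magnitudes[-2]


gaps = {n: spectral_gap(ring_transition_matrix(n)) for n in (20, 40, 80, 160)}
for n in (20, 40, 80):
    # Ratio of the gap at n nodes to the gap at 2n nodes; it approaches 4,
    # i.e. the gap shrinks roughly like 1/n^2.
    print(n, gaps[n] / gaps[2 * n])
```

For this ring the eigenvalues are 1/3 + (2/3)cos(2πk/n), so the gap is (2/3)(1 − cos(2π/n)), which the numerical values reproduce.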

We build on this insight and give an algorithm for constructing multilevel networks for consensus. The basic multilevel network construction is presented in section 2.2, with a heuristic for adjusting the weights to enhance convergence in section 2.4. An algorithm for adjusting the edge weights in the presence of node and edge failures is given in section 2.5, and the trade-off between performance and robustness is investigated numerically in section 2.6. Section 2.7 presents a numerical example for a randomly generated graph embedded in 2D. Section 2.8 describes how the algorithm can be used if node measurements are time-varying, and section 2.9 presents equations for adjusting the algorithm to calculate a weighted average of node measurements. Finally, in section 2.10 we present some examples of the changes to the network spectral properties due to adding a multigrid structure, and of how this influences correlations between noise spatial frequency and convergence rates.

While the algorithm described in this chapter only finds the mean of a single variable, it can be used as a basis for performing many more complex computations. For example, the variance of the node measurements can be found using a sequence of two consensus operations: the first finds the mean of the measurement values, and the second averages the squared deviations from the mean. Some applications, including the distributed GPS augmentation system described in chapter 3 and the spectral algorithm described in chapter 4, require adding vectors. This can be done by simply letting the state x of the network be a matrix, where each node stores the information contained in one row of the state matrix.
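The two-step variance computation and the matrix-valued state can be sketched as follows. This is a minimal illustration under assumptions, not the thesis implementation. A single-level 30-node ring with Metropolis weights stands in for the network, a fixed number of iterations of x(t+1) = Pᵀx(t) stands in for convergence, and the helper names are hypothetical.

```python
import numpy as np


def ring_transition_matrix(n):
    """Ring with Metropolis weights (all weights 1/3); a stand-in network."""
    shift = np.roll(np.eye(n), 1, axis=1)
    return (np.eye(n) + shift + shift.T) / 3.0


def consensus(P, x0, steps=5000):
    """Iterate x(t+1) = P^T x(t). x0 may be a vector (one value per node)
    or a matrix with one row per node (one column per vector component)."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = P.T @ x
    return x


rng = np.random.default_rng(0)
n = 30
measurements = rng.normal(5.0, 2.0, size=n)
P = ring_transition_matrix(n)

# First consensus: every node ends up holding (approximately) the mean.
mean_est = consensus(P, measurements)
# Second consensus: each node uses its own mean estimate locally, then the
# network averages the squared deviations, giving the (population) variance.
var_est = consensus(P, (measurements - mean_est) ** 2)

# Vector-valued quantities: one row per node, one column per component.
rows = rng.normal(size=(n, 3))
row_mean = consensus(P, rows)  # every row approaches rows.mean(axis=0)
```

Because every node runs the same local update, the second round needs nothing beyond what each node already holds after the first round.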

2.2 Construction of a multilevel network

By a multilevel network we mean one where nodes are arranged in levels or classes. Not all nodes have the same connection structure; instead, nodes are grouped. In a spatially embedded network, lower levels contain more nodes and have physically short-range connections, while higher levels contain fewer nodes that have longer-range connections. This mimics the multiscale structure generated by multilevel algorithms such as multigrid [41]. We refer to nodes in all upper levels as “supernodes”, to distinguish them from the nodes in the base level.

A consensus problem starts with a network with a set of nodes N and a set of edges E connecting these nodes. Each node is given an initial value, and the purpose of the consensus algorithm is to find the mean of the initial states of all nodes. The initial values of all nodes are stored in the vector x(0). Starting with the initial values, at any time step t each node i takes a weighted average of the state values of its neighboring nodes to compute its own new state value $x_i(t+1)$. This process can be represented as a multiplication with a state transition matrix P:

$$x(t+1) = P^T x(t) \qquad (2.2.1)$$

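For a single-level network, Metropolis weights can be used to propagate the state as described in [45]. With Metropolis weights, the state transition matrix is

$$P_{ij} = \begin{cases} \dfrac{1}{1+\max\{d_i, d_j\}} & \text{if } \{i,j\} \in E \\[4pt] 1 - \sum_{\{i,k\} \in E} P_{ik} & \text{if } i = j \\[4pt] 0 & \text{otherwise,} \end{cases} \qquad (2.2.2)$$

where $d_i$ is the degree of node i.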

This is equivalent to the evolution of probability distributions in Markov chains, and we assume irreducibility and aperiodicity so that the state converges to a unique final state π, where

$$P^T \pi = \pi \qquad (2.2.3)$$

State transition matrices that result from applying Metropolis weights are symmetric, and all row and column sums are equal to one. The invariant distribution is therefore uniform: every entry of π equals the average of the initial states of the nodes,

$$\pi_i = \frac{1}{n}\sum_{k=1}^{n} x_k(0), \qquad i = 1, \dots, n. \qquad (2.2.4)$$
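As an illustrative sketch, the following NumPy code builds the Metropolis weight matrix (2.2.2) for a small undirected graph and checks that it is symmetric with unit row and column sums. It then iterates (2.2.1) until every node holds the average of the initial values, as in (2.2.4). The six-node graph, the initial values, and the function name `metropolis_weights` are hypothetical choices made for this example.

```python
import numpy as np


def metropolis_weights(n, edges):
    """State transition matrix with Metropolis weights, following eq. (2.2.2).

    `edges` lists each undirected link {i, j} once; self-loops are implied.
    """
    degree = np.zeros(n, dtype=int)
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    P = np.zeros((n, n))
    for i, j in edges:
        P[i, j] = P[j, i] = 1.0 / (1.0 + max(degree[i], degree[j]))
    # Self-weight: whatever is left so that every row sums to one.
    P[np.diag_indices(n)] = 1.0 - P.sum(axis=1)
    return P


# A hypothetical six-node ring with one chord (0, 3).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
P = metropolis_weights(6, edges)

x0 = np.array([3.0, -1.0, 4.0, 1.0, 5.0, 9.0])
x = x0.copy()
for t in range(300):
    x = P.T @ x  # eq. (2.2.1)

print(np.allclose(P, P.T), np.allclose(P.sum(axis=1), 1.0))  # symmetric, rows sum to one
print(x)  # every entry approaches the average of x0
```

Each weight depends only on the degrees of the two endpoints, which is why a node can compute its row of P from information held by its direct neighbors.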

Metropolis weights can be computed quickly by the distributed network, and can be efficient for single-level networks. However, they result in inefficiencies when applied to multilevel networks. In particular, Metropolis weights for connections between supernodes in upper levels of the network are smaller than they would need to be to maximize the convergence rate, since Metropolis weights take into account only the degree of a node, but not other aspects of the geometry of the network, such as the length of edges in a spatial embedding.

One method for constructing multilevel networks and finding their state transition matrices and invariant distributions is to first generate the base level, and then add the upper levels. Each superior level is generated by making an identical copy of the next lower level and merging several nodes into supernodes. The nodes in each level are connected to their equivalent nodes in the levels directly above and below.

This method can be used for constructing multiscale networks based on an arbitrary layout of the base-level network. It does, however, constrain the construction of the upper levels and the connections between levels, in that connections between supernodes must mirror the connections in the lowest level. It can therefore be applied in situations where the geometry of the upper levels of the network can be chosen to fit these constraints, or when the layers of supernodes are created by selecting some of the regular nodes in the base level to double as supernodes, and the base-level connections between nodes are also used to implement supernode edges.

The first step in creating the multilevel network is to duplicate the base level to create the upper levels, and to connect each node to its corresponding node in the levels directly above and below, giving a so-called ladder network. The connections between different levels initially all have equal weights going up and down. For such a network, the invariant distribution of each level is equal to the invariant distribution π of the original base level, so that the overall invariant distribution is

$$\bar{\pi} = \frac{1}{n}\left[\pi^T, \pi^T, \dots, \pi^T\right]^T \qquad (2.2.5)$$

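Next, weights are added for connections between different levels, so that the values a node receives from superior levels can be given more weight than those from inferior levels. Using coefficients $\alpha_1, \alpha_2, \dots, \alpha_n$ to denote weights for connections between nodes in each level, and $\beta_1, \beta_2, \dots, \beta_{n-1}$ for weights of connections between levels, the new state transition matrix of the ladder network, $\mathcal{P}$, is

$$\mathcal{P} = \begin{bmatrix}
\alpha_1 P & (1-\alpha_1) I & 0 & \cdots & 0 & 0 \\
\beta_1 I & \alpha_2 P & (1-\alpha_2-\beta_1) I & \cdots & 0 & 0 \\
0 & \beta_2 I & \alpha_3 P & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & (1-\alpha_n) I & \alpha_n P
\end{bmatrix} \qquad (2.2.6)$$

The merging of nodes to form q supernodes from p nodes in a level is described by the transformation matrix $B_i \in \mathbb{R}^{p \times q}$, whose entry in row j and column k is 1 if and only if the original node j is merged into supernode k. $B_i$ thus describes the mapping from the base level of nodes to level i. Note that $B_1 = I$. The transformation from the ladder network to the final network is described by

$$B = \operatorname{diag}(B_1, B_2, \dots, B_n) \qquad (2.2.7)$$

$$\bar{P} = \left(B^T B\right)^{-1} B^T \mathcal{P} B = B^\dagger \mathcal{P} B \qquad (2.2.8)$$

$$\bar{P} = \begin{bmatrix}
\alpha_1 P & (1-\alpha_1) B_2 & 0 & \cdots & 0 & 0 \\
\beta_1 B_2^\dagger & \alpha_2 B_2^\dagger P B_2 & (1-\alpha_2-\beta_1) B_2^\dagger B_3 & \cdots & 0 & 0 \\
0 & \beta_2 B_3^\dagger B_2 & \alpha_3 B_3^\dagger P B_3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & (1-\alpha_n) B_n^\dagger B_{n-1} & \alpha_n B_n^\dagger P B_n
\end{bmatrix} \qquad (2.2.9)$$

Figure 2.2: Transition matrix for Theorem 2.2.1. $B^\dagger$ denotes the pseudoinverse of B, i.e. $B^\dagger = \left(B^T B\right)^{-1} B^T$.

$$\begin{bmatrix}
\alpha_1 - 1 & \beta_1 & 0 & \cdots & 0 & 0 \\
0 & \alpha_2 + \beta_1 - 1 & \beta_2 & \cdots & 0 & 0 \\
0 & 0 & \alpha_3 + \beta_2 - 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & \alpha_{n-1} + \beta_{n-2} - 1 & \beta_{n-1} \\
1 & 1 & 1 & \cdots & 1 & 1
\end{bmatrix}
\begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \gamma_3 \\ \vdots \\ \gamma_{n-1} \\ \gamma_n \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \qquad (2.2.10)$$

Figure 2.3: Linear system for Theorem 2.2.1.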

Theorem 2.2.1. A multilevel network constructed as described above will have the state transition matrix (2.2.9) in figure 2.2 and the invariant distribution $\bar{\pi}$ given by

$$\bar{\pi} = \left[(\gamma_1 \pi)^T, (\gamma_2 B_2^T \pi)^T, (\gamma_3 B_3^T \pi)^T, \dots, (\gamma_n B_n^T \pi)^T\right]^T, \qquad (2.2.11)$$

where the coefficients $\gamma_1, \gamma_2, \dots, \gamma_n$ are found by solving the linear system (2.2.10) shown in figure 2.3.

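Proof. Substituting the ladder-network matrix from eq. (2.2.6) into eq. (2.2.8) yields the state transition matrix shown in figure 2.2. Given $\bar{P}$, we can show that $\bar{\pi}$ in equation (2.2.11) is indeed the invariant distribution:

$$\bar{P}^T \bar{\pi} = \begin{bmatrix}
\alpha_1 \gamma_1 \pi + \beta_1 \gamma_2 \pi \\
\left((1-\alpha_1)\gamma_1 + \alpha_2 \gamma_2 + \beta_2 \gamma_3\right) B_2^T \pi \\
\vdots \\
\left((1-\alpha_{i-1}-\beta_{i-2})\gamma_{i-1} + \alpha_i \gamma_i + \beta_i \gamma_{i+1}\right) B_i^T \pi \\
\vdots \\
\left((1-\alpha_{n-1}-\beta_{n-2})\gamma_{n-1} + \alpha_n \gamma_n\right) B_n^T \pi
\end{bmatrix} = \bar{\pi} \qquad (2.2.12)$$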

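Since the α and β coefficients are known, this can be written as a system of linear equations. Omitting the last row, which is redundant since each column of the original system sums to zero, and adding the condition that the sum of the γ’s has to be one, we get

$$\begin{bmatrix}
\alpha_1 - 1 & \beta_1 & \cdots & 0 \\
1 - \alpha_1 & \alpha_2 - 1 & \cdots & 0 \\
0 & 1 - \alpha_2 - \beta_1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \beta_{n-1} \\
1 & 1 & \cdots & 1
\end{bmatrix}
\begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \gamma_3 \\ \vdots \\ \gamma_{n-1} \\ \gamma_n \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \qquad (2.2.13)$$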

The linear system in figure 2.3 is constructed by taking the sum of each row except the last with all rows above it. As long as $\alpha_i + \beta_{i-1} < 1$ for all i, there is a unique solution. Given this solution, each node can determine the consensus value from the invariant distribution. ∎

The resulting invariant distribution is not uniform, so in order to determine the consensus value, the state of each node has to be multiplied by a factor that can be obtained by solving the linear system above for the invariant distribution.
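The construction can be checked numerically with a small sketch. Every concrete choice below is an assumption made for illustration: a 12-node Metropolis ring as the base level, three levels obtained by merging consecutive groups of 3 and 6 ring nodes, α = (0.5, 0.5, 0.6), β = (0.3, 0.4), and the hypothetical helper names. Every block row of the transition matrix must sum to one, so the sketch also assumes α_n + β_{n−1} = 1 and uses β_{n−1} as the weight from the top level down.

The sketch then proceeds in four steps:

1. It forms the multilevel matrix block by block, as in figure 2.2.
2. It solves (2.2.13) for the γ coefficients.
3. It checks that the distribution (2.2.11) is invariant.
4. It recovers the consensus value. Each limit state equals π̄ᵢ times the sum of all initial values, so dividing by N π̄ᵢ (with N the total number of nodes in all levels) gives the mean.

```python
import numpy as np


def ring_transition_matrix(p):
    """Base level: ring with Metropolis weights (all weights 1/3)."""
    shift = np.roll(np.eye(p), 1, axis=1)
    return (np.eye(p) + shift + shift.T) / 3.0


def merge_matrix(p, group):
    """B in R^{p x q}: B[j, k] = 1 iff base node j is merged into supernode k.
    Hypothetical merging rule: consecutive groups of `group` ring nodes."""
    B = np.zeros((p, p // group))
    B[np.arange(p), np.arange(p) // group] = 1.0
    return B


def multilevel_matrix(P, Bs, alpha, beta):
    """Block form of the multilevel transition matrix (figure 2.2), levels indexed from 0."""
    n = len(Bs)
    pinv = [np.linalg.pinv(B) for B in Bs]  # equals (B^T B)^{-1} B^T for full column rank
    blocks = [[np.zeros((Bs[i].shape[1], Bs[j].shape[1])) for j in range(n)]
              for i in range(n)]
    for i in range(n):
        blocks[i][i] = alpha[i] * pinv[i] @ P @ Bs[i]
        if i > 0:  # weight on the level below
            blocks[i][i - 1] = beta[i - 1] * pinv[i] @ Bs[i - 1]
        if i < n - 1:  # weight on the level above
            up = 1.0 - alpha[i] - (beta[i - 1] if i > 0 else 0.0)
            blocks[i][i + 1] = up * pinv[i] @ Bs[i + 1]
    return np.block(blocks)


def gamma_coefficients(alpha, beta):
    """Solve the linear system (2.2.13) for gamma_1, ..., gamma_n."""
    n = len(alpha)
    M, rhs = np.zeros((n, n)), np.zeros(n)
    for k in range(n - 1):
        if k > 0:
            M[k, k - 1] = 1.0 - alpha[k - 1] - (beta[k - 2] if k > 1 else 0.0)
        M[k, k] = alpha[k] - 1.0
        M[k, k + 1] = beta[k]
    M[n - 1, :] = 1.0  # the gammas sum to one
    rhs[n - 1] = 1.0
    return np.linalg.solve(M, rhs)


p = 12
alpha, beta = [0.5, 0.5, 0.6], [0.3, 0.4]  # illustrative; alpha_n + beta_{n-1} = 1
P = ring_transition_matrix(p)
Bs = [np.eye(p), merge_matrix(p, 3), merge_matrix(p, 6)]  # 12 -> 4 -> 2 nodes
Pbar = multilevel_matrix(P, Bs, alpha, beta)

gamma = gamma_coefficients(alpha, beta)
pi = np.full(p, 1.0 / p)  # uniform invariant distribution of the base level
pibar = np.concatenate([g * (B.T @ pi) for g, B in zip(gamma, Bs)])  # eq. (2.2.11)

# Consensus on the multilevel network, then rescaling by the invariant distribution.
x0 = np.random.default_rng(0).normal(size=Pbar.shape[0])
x = x0.copy()
for _ in range(5000):
    x = Pbar.T @ x
consensus_value = x / (Pbar.shape[0] * pibar)  # every entry approaches x0.mean()
```

For these parameters the system (2.2.13) gives γ = (2/7, 10/21, 5/21).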

2.3 Invariant distribution offset factor determination through consensus

As an alternative to solving the equations presented above to derive the consensus value from the invariant distribution, the factors can also be found by using the consensus method itself. This also makes it easy to relax the assumption that every node starts out with a measurement value. Until now it was assumed that each node had access to a unique measurement, but this might not be the case in some implementations of this method. In a real-life situation, a sensor could malfunction while the computation and communication capabilities of the node continue to work normally. Another scenario where this occurs is when supernodes are implemented using the hardware of an already existing network.

Theorem 2.3.1. Let the elements of the vector κ be the factors that the consensus value m has to be multiplied by to obtain the invariant distribution,

$$\hat{\pi}_i = \kappa_i m. \qquad (2.3.1)$$

Let $x_k(0) = 0$ if node k has no measurement available for inclusion in the consensus process. Also, let η be a vector such that $\eta_i = 1$ if node i has a measurement, and $\eta_i = 0$ otherwise. Then κ can be found by applying the consensus method to η:

$$\kappa = \left(P^T\right)^\infty \eta \qquad (2.3.2)$$

Proof. Let m be the consensus value, which is the mean of all node measurement values:

$$m = \frac{\sum_{k=1}^{n} x_k(0)}{\sum_{k=1}^{n} \eta_k} \qquad (2.3.3)$$

The invariant distribution can be expressed in terms of κ and m as

$$\hat{\pi}_i = \kappa_i m \qquad (2.3.4)$$

Now, for the consensus process that uses η as the initial state vector,

$$\hat{\pi}_i = \kappa_i \frac{\sum_{k=1}^{n} \eta_k}{\sum_{k=1}^{n} \eta_k} = \kappa_i \qquad (2.3.5)$$

∎


In the case where every node has a measurement value, this simplifies to

$$\kappa = \left(P^T\right)^\infty \mathbf{1} \qquad (2.3.6)$$
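Theorem 2.3.1 can be illustrated with a minimal sketch. A random row-stochastic matrix with positive entries stands in for the multilevel transition matrix. This is an assumption: the theorem only relies on the iteration x(t+1) = Pᵀx(t) converging. Nodes 2 and 4 are hypothetical nodes without measurements. Running the iteration on η gives κ, and dividing the limit of the measurement run by κ recovers the mean of the available measurements at every node.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
# Stand-in transition matrix (assumption): positive entries, rows sum to one.
A = rng.random((N, N)) + 0.1
P = A / A.sum(axis=1, keepdims=True)

eta = np.array([1, 1, 0, 1, 0, 1, 1, 1], dtype=float)  # 1 = node has a measurement
x0 = np.where(eta == 1, rng.normal(10.0, 3.0, size=N), 0.0)  # x_k(0) = 0 without one


def run_consensus(P, x, steps=500):
    """Apply x <- P^T x repeatedly; approximates (P^T)^infinity x."""
    for _ in range(steps):
        x = P.T @ x
    return x


kappa = run_consensus(P, eta)  # eq. (2.3.2)
x_inf = run_consensus(P, x0)   # limit of the consensus run on the measurements
m_est = x_inf / kappa          # every entry approaches sum(x0) / sum(eta)
```

Both runs use the same matrix, so they can be executed concurrently, with each node storing a two-column state as described in section 2.1.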

2.4 Adjusting self-weights for improved performance

The method for constructing the propagation matrix has one deficiency. Because supernodes are constructed by merging several nodes in one level into one supernode, the self-weights of the supernodes are on average significantly higher than the weights for transmitting states between supernodes in a level. The convergence rate can be improved by reducing the self-weights of the supernodes so that they are on average equal to the weights between nodes. This can be done by taking advantage of the fact that, since $P^T\pi = \pi$, for any scalar $\delta$

$$\left((1+\delta)P^T - \delta I\right)\pi = (1+\delta)\pi - \delta\pi = P^T\pi \tag{2.4.1}$$

Such an adjustment is applied to all submatrices that describe the connections between supernodes in their respective level, i.e. all block matrices on the diagonal of $\bar{P}$ except the first one, which describes the connections between the base-level nodes. The $\delta$ coefficients for each level are chosen such that the mean weight for connections between nodes is equal to the mean self-weight.
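The identity (2.4.1) can be seen on a single hypothetical block. This Python sketch assumes the same illustrative 4-node matrix `P` used above (invariant distribution $\pi = (1/6, 1/3, 1/3, 1/6)$) and an arbitrarily chosen $\delta = 0.5$; it does not evaluate the dissertation's rule (2.4.2) for $\delta_j$. It only confirms that the shifted matrix keeps row sums of 1, nonnegative entries, and the same invariant distribution, with the bound $\delta \le d_{\min}/(1-d_{\min})$ taken from the second term of (2.4.2).

```python
def shift_self_weights(P, delta):
    """Return (1 + delta) * P - delta * I (rows still sum to 1)."""
    n = len(P)
    return [[(1 + delta) * P[i][j] - (delta if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def apply_transpose(P, v):
    """Compute P^T v."""
    n = len(v)
    return [sum(P[j][i] * v[j] for j in range(n)) for i in range(n)]


# Hypothetical block with invariant distribution pi (P^T pi = pi).
P = [
    [0.50, 0.50, 0.00, 0.00],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.00, 0.00, 0.50, 0.50],
]
pi = [1 / 6, 1 / 3, 1 / 3, 1 / 6]

# Largest delta that keeps every self-weight nonnegative.
d_min = min(P[i][i] for i in range(len(P)))
delta_max = d_min / (1 - d_min)

delta = 0.5                      # illustrative choice with delta <= delta_max
Pa = shift_self_weights(P, delta)
print(apply_transpose(Pa, pi))   # equal to pi up to rounding, as in (2.4.1)
```

For this particular matrix the eigenvalues of `P` are 1, 0.75, 0.25 and 0, so those of `Pa = 1.5 P - 0.5 I` are 1, 0.625, -0.125 and -0.5. The SLEM therefore drops from 0.75 to 0.625 while the invariant distribution is unchanged.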

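Theorem 2.4.1. If the multilevel network with state transition matrix $\bar{P}$ in equation (2.2.9) has weight changes given by

$$\delta_j = \min\left\{ \frac{a-b}{1-a-b},\; \frac{\min\{\operatorname{diag}(A_j)\}}{1-\min\{\operatorname{diag}(A_j)\}} \right\} \tag{2.4.2}$$

$$a = \frac{1}{n}\operatorname{trace}(A_j) \tag{2.4.3}$$

$$b = \frac{1}{n^2-n}\left(\|A_j\|_1 - \operatorname{trace}(A_j)\right) \tag{2.4.4}$$

$$A_j = B_j^{\dagger} P B_j \tag{2.4.5}$$

then it will have the same invariant distribution as the unmodified network. With these changes, the new state transition matrix is equation (2.4.7) in figure 2.4.

$$\bar{P}_a = \begin{bmatrix}
\alpha_1 P & (1-\alpha_1) B_2 & 0 & \cdots & 0 \\
\beta_1 B_2^{\dagger} & \alpha_2\left((1+\delta_2) B_2^{\dagger} P B_2 - \delta_2 I\right) & (1-\alpha_2-\beta_1) B_2^{\dagger} B_3 & \cdots & 0 \\
0 & \beta_2 B_3^{\dagger} B_2 & \alpha_3\left((1+\delta_3) B_3^{\dagger} P B_3 - \delta_3 I\right) & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \alpha_n\left((1+\delta_n) B_n^{\dagger} P B_n - \delta_n I\right)
\end{bmatrix} \tag{2.4.7}$$

Figure 2.4: State transition matrix with adjusted supernode self-weights.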

Proof. Adjusting the blocks on the diagonal of $\bar{P}$ as described above does not change the products of those entries with the corresponding parts of the invariant distribution:

$$\left((1+\delta_i) B_i^{\dagger} P B_i - \delta_i I\right) \gamma_i B_i^T \pi = B_i^{\dagger} P B_i\, \gamma_i B_i^T \pi \tag{2.4.6}$$

Therefore, the invariant distribution $\bar{\pi}$ remains the same when the supernode self-weights are adjusted. ∎

While adjusting supernode self-weights does not necessarily result in optimal values for $\bar{P}_a$, it is a heuristic that yields significant improvements in the spectral gap $\rho = 1 - \lambda_2$. Figure 2.5 demonstrates the effect that adjusting supernode self-weights has on the convergence rate. For a ring-shaped network with three levels of nodes, three methods were used to construct the state transition matrix: Metropolis weights, and the method described in the previous section with and without supernode self-weight adjustments. The computational cost of generating the networks was not taken into account here, since it is assumed that networks are used for multiple computations.

Using the multigrid method, an initial improvement in the convergence rate compared to Metropolis weights was achieved, because averaging of the states of nodes connected to the same supernode is accelerated. However, since connections between supernodes are weak, convergence slows down after a few steps. With the supernode self-weight adjustment, a significantly higher convergence rate is maintained even after these initial steps.


Figure 2.5: Comparison of convergence for a ring network with three levels using Metropolis weights and the multigrid weights described here with and without supernode self-weight adjustments. (Axes: residual norm $\|r_i\|/\|r_0\|$ vs. time $t$; curves: Metropolis, weights not adjusted, weights adjusted.)

2.5 Adjusting the network for broken edges and nodes

In order to be robust, the network should continue to function when one or more of its edges or nodes stop functioning, as long as the network is still connected. A broken node is a special case of multiple broken edges, since it is equivalent to breaking all edges of the affected node and removing it from the network.

One simple method for adjusting for a broken edge is for the adjacent nodes to modify their self-weights so that the row sums of the weight matrix are again equal to 1. Affected nodes only need to know the weights of their remaining edges to do this. When this method is used, the invariant distribution does not change, as long as the network is still connected. This can be shown by considering the joint probability matrix $W$, where

$$W_{ij} = P_{ij}\,\pi_i \tag{2.5.1}$$

Since $\sum_j P_{ji}\pi_j = (P^T\pi)_i = \pi_i$, the column sums of $W$ are equal to the invariant distribution:

$$\sum_j W_{ji} = \pi_i \tag{2.5.2}$$

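When the edge between nodes $p$ and $q$ is broken, $P$ is adjusted in the following way:

$$P'_{pq} = P'_{qp} = 0 \tag{2.5.3}$$

$$P'_{pp} = P_{pp} + P_{pq} \tag{2.5.4}$$

$$P'_{qq} = P_{qq} + P_{qp} \tag{2.5.5}$$

This results in the following adjustments to $W$:

$$W'_{pq} = W'_{qp} = 0 \tag{2.5.6}$$

$$\frac{W'_{pp}}{\pi'_p} = \frac{W_{pp}}{\pi_p} + \frac{W_{pq}}{\pi_p} \tag{2.5.7}$$

$$\frac{W'_{qq}}{\pi'_q} = \frac{W_{qq}}{\pi_q} + \frac{W_{qp}}{\pi_q} \tag{2.5.8}$$

These adjustments preserve the symmetry of $W$. The column sums of $W'$ are:

$$\sum_j W'_{ji} = \sum_j W_{ji} = \pi_i = \pi'_i \quad \text{for } i \neq p, q \tag{2.5.9a}$$

$$\sum_j W'_{jp} = \sum_{j \neq p,q} W'_{jp} + W'_{pp} + W'_{qp} = \sum_{j \neq p,q} W_{jp} + (W_{pp} + W_{pq})\frac{\pi'_p}{\pi_p} \quad \text{for } i = p \tag{2.5.9b}$$

$$\sum_j W'_{jq} = \sum_{j \neq p,q} W'_{jq} + W'_{qq} + W'_{pq} = \sum_{j \neq p,q} W_{jq} + (W_{qq} + W_{qp})\frac{\pi'_q}{\pi_q} \quad \text{for } i = q \tag{2.5.9c}$$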


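Theorem 2.5.1. If the multilevel network with state transition matrix $\bar{P}_a$ in equation (2.4.7) has some edges removed but remains connected, then updating the transition matrix by (2.5.3)–(2.5.5) ensures that the invariant distribution remains unchanged.

Proof. The column sums of $W'$ equal $\pi'$, and by (2.5.2) $\sum_{j \neq p,q} W_{jp} = \pi_p - W_{pp} - W_{qp}$ and $\sum_{j \neq p,q} W_{jq} = \pi_q - W_{qq} - W_{pq}$. Equations (2.5.9b) and (2.5.9c) can therefore be solved for $\pi'_p$ and $\pi'_q$, and the symmetry $W_{pq} = W_{qp}$ gives

$$\pi'_p = \frac{(\pi_p - W_{pp} - W_{qp})\,\pi_p}{\pi_p - W_{pp} - W_{pq}} = \pi_p \tag{2.5.10}$$

$$\pi'_q = \frac{(\pi_q - W_{qq} - W_{pq})\,\pi_q}{\pi_q - W_{qq} - W_{qp}} = \pi_q \tag{2.5.11}$$

Therefore, $\pi'_i = \pi_i$ for all $i$. ∎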

If the network becomes disconnected as a result of broken edges, or if one or more

nodes break, the resulting invariant distribution of the remaining or partial network

is not the same as that of the original network, since information is lost in the process.

However, the method for adjusting the network described above can still be used to

determine the average of the values of the remaining nodes at the time the network

was disconnected.
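The broken-edge rule (2.5.3)–(2.5.5) can be exercised on a small example. This Python sketch assumes hypothetical symmetric edge weights on a 4-node ring with one chord and unit self-loops, and row-normalizes them into `P`. Such a `P` is reversible, so the matrix $W$ of (2.5.1) is symmetric as the section assumes, and its invariant distribution (4, 3, 4, 3)/14 is not uniform. The helper names are illustrative.

```python
def normalize_rows(W):
    """Row-normalize symmetric edge weights into a transition matrix P."""
    return [[w / sum(row) for w in row] for row in W]


def apply_transpose(P, v):
    """Compute P^T v."""
    n = len(v)
    return [sum(P[j][i] * v[j] for j in range(n)) for i in range(n)]


def break_edge(P, p, q):
    """Drop edge p-q and fold its weights into the self-weights, (2.5.3)-(2.5.5)."""
    P2 = [row[:] for row in P]
    P2[p][p] += P2[p][q]
    P2[q][q] += P2[q][p]
    P2[p][q] = P2[q][p] = 0.0
    return P2


# Hypothetical symmetric edge weights: ring 0-1-2-3-0, chord 0-2, unit self-loops.
W = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 0, 1, 1],
]
P = normalize_rows(W)
total = sum(sum(row) for row in W)
pi = [sum(row) / total for row in W]   # (4, 3, 4, 3) / 14, not uniform

P_broken = break_edge(P, 0, 2)         # removing the chord keeps the network connected
print(apply_transpose(P_broken, pi))   # still equal to pi up to rounding
```

Only nodes 0 and 2 change their own rows, which matches the observation that affected nodes need nothing beyond the weights of their remaining edges.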

2.6 Performance and Robustness trade-offs

There are many useful measures of performance for consensus algorithms. One such performance measure is the second largest eigenvalue modulus (SLEM) [34][24]. The SLEM determines the worst-case asymptotic convergence rate. It is the rate that applies when the initial state is aligned with the second eigenvector, and it is also the rate that remains once the differences in node states along all other eigenvectors of the system have been sufficiently reduced.

Figure 2.6 shows the spectral gap $\rho = 1 - \mathrm{SLEM}$ for multilevel networks with various numbers of levels. In these networks, every node is connected to its two neighboring nodes within its level, so that each level forms a ring. In addition, each node is connected to one supernode in the level above. Each supernode in one of the upper levels has the same number of subnodes.

Figure 2.6: Spectral gap vs. number of nodes n in the base level for networks with various numbers of levels N. (Axes: spectral gap ρ, logarithmic from 10^-5 to 10^0, vs. number of nodes n from 1 to 1000; curves for N = 1 to N = 6.)

As demonstrated in the figure, the spectral gap for a single-level network is in-

versely proportional to the square of the number of nodes in the network. However,

if the number of levels in the network is allowed to vary and is sufficiently large, it

scales logarithmically instead.
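The $1/n^2$ behavior of the single-level curve can be made explicit for a plain ring. The derivation below assumes the uniform weights of 1/3 that underlie the ring eigenvalues (2.10.1), uses the largest nontrivial eigenvalue ($k = 1$), and assumes $n \ge 4$ so that this eigenvalue is the SLEM. The weights behind figure 2.6 may differ, which would change the constant but not the scaling.

```latex
\rho = 1 - \lambda_2
     = 1 - \left(\frac{1}{3} + \frac{2}{3}\cos\frac{2\pi}{n}\right)
     = \frac{2}{3}\left(1 - \cos\frac{2\pi}{n}\right)
     \approx \frac{2}{3}\cdot\frac{1}{2}\left(\frac{2\pi}{n}\right)^{2}
     = \frac{4\pi^{2}}{3n^{2}}
```

For $n = 400$, for example, this gives $\rho \approx 8.2 \times 10^{-5}$.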

One simple measure of robustness is the connectivity of the network. Additional measures of robustness are necessary to evaluate how the network convergence rate is affected by failures of edges or nodes that do not disconnect the network. One such measure of robustness is the worst-case spectral gap of a network with a specific number of broken edges or nodes.

Another measure of performance that can be used is the inverse of the number of

steps tc required for convergence of node values to within a small error margin of the

invariant distribution. Similarly, robustness can be defined as the ratio between the

number of steps required for convergence for the intact network and for a network

with a number of broken edges or nodes.
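The step-count measures can be computed directly by simulation. This Python sketch makes several assumptions: a hypothetical single-level ring of 20 nodes with weight 1/3 on every self-loop and edge, a unit impulse as the initial disagreement, a relative tolerance of $10^{-6}$, and a single edge broken with the self-weight rule of section 2.5. It reports $t_c$, the damaged-network count $t'_c$, performance $1/t_c$ and robustness $t_c/t'_c$. A multilevel matrix could be substituted for `P` without changing the helpers.

```python
import math


def ring_matrix(n):
    """Single-level ring: weight 1/3 on each self-loop and on each ring edge."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][i] = 1 / 3
        P[i][(i + 1) % n] = 1 / 3
        P[i][(i - 1) % n] = 1 / 3
    return P


def break_edge(P, p, q):
    """Drop edge p-q and fold its weights into the self-weights."""
    P2 = [row[:] for row in P]
    P2[p][p] += P2[p][q]
    P2[q][q] += P2[q][p]
    P2[p][q] = P2[q][p] = 0.0
    return P2


def steps_to_converge(P, x0, tol=1e-6, max_steps=100_000):
    """Smallest t with ||x(t) - x_inf|| <= tol * ||x(0) - x_inf||.

    Assumes P is symmetric (hence doubly stochastic), so x_inf is the plain mean.
    """
    n = len(x0)
    mean = sum(x0) / n

    def residual(x):
        return math.sqrt(sum((xi - mean) ** 2 for xi in x))

    r0 = residual(x0)
    x = list(x0)
    for t in range(1, max_steps + 1):
        x = [sum(P[j][i] * x[j] for j in range(n)) for i in range(n)]
        if residual(x) <= tol * r0:
            return t
    raise RuntimeError("no convergence within max_steps")


n = 20
x0 = [1.0] + [0.0] * (n - 1)          # hypothetical initial disagreement
P = ring_matrix(n)
t_c = steps_to_converge(P, x0)
t_c_broken = steps_to_converge(break_edge(P, 0, n - 1), x0)

performance = 1 / t_c                 # inverse of steps to convergence
robustness = t_c / t_c_broken         # intact count divided by damaged count
print(t_c, t_c_broken, performance, robustness)
```

Breaking one ring edge turns the ring into a path, which mixes more slowly, so the robustness ratio comes out below 1. A worst-case value would take the minimum over all choices of broken edges.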


Figure 2.7: Centralization robustness vs. performance trade-off: single node failure worst-case performance. (Axes: time to convergence $t_c$ vs. number of supernodes $n_2$; curves: worst-case single node failure, intact network.)

In constructing a multilevel network, one can choose a number of parameters that influence the performance and robustness of the network. The extreme cases are often equivalent either to a single-level distributed network, which is very robust but has low performance, or to a network with a single supernode, which has high performance and low robustness.

The first choices to make are the number of levels and the ratio of nodes per

supernode for each level. The effects of the number of levels on the SLEM for a ring-

shaped network are shown in figure 2.6. Figure 2.7 shows an example of the number

of time steps required for convergence of a ring-shaped network with two levels and

40 base nodes as a function of the number of supernodes n2 in the second level. In

the case where all nodes are functioning, the convergence rate is lower for networks

with more supernodes. However, if any one of the supernodes breaks, the time to

convergence increases dramatically for a network with few supernodes, while networks

with more supernodes are not affected as much. In this case, adding more than six supernodes to a network does not lead to faster convergence if one of them breaks, since the slowdown that the additional supernodes cause in the intact network outweighs the benefit of the added robustness. However, if several nodes malfunction, having additional supernodes can be beneficial. While the ideal number of levels depends primarily on the number of nodes in the network, the best ratio between the numbers of nodes in different levels depends on the expected failure rate of nodes and edges, as well as on the desired level of robustness.

Figure 2.8: Performance vs. robustness for various α and β values. The red cross indicates the parameter values chosen for subsequent numerical examples. (Axes: robustness $t_c/t'_c$ vs. performance $1/t_c$; shown: Pareto frontier and the coefficients used for the examples below.)

Additional parameters that have to be chosen are the α and β coefficients in the state transition matrix $\bar{P}$ (figure 2.2). Selecting large values for the coefficients that govern data exchanges between supernodes and from supernodes to base nodes yields high performance and low robustness, while giving base-level nodes more weight increases robustness and lowers performance. Figure 2.8 shows the Pareto frontier of all possible combinations of these coefficients for a ring-shaped network with three levels and 64 base-level nodes. The times to convergence for the intact network and for a network with ten broken edges were used to evaluate performance and robustness. In this case, the majority of possible combinations of the four α and β parameters are not on the Pareto frontier and should not be selected. Each point on the Pareto frontier represents a different performance-robustness trade-off, and the selection of a specific parameter combination depends on the desired level of performance or robustness.

Figure 2.9: Example network layout (connections between different levels are not shown), where upper levels use larger nodes and thicker edges. (Both axes span 0 to 10.)

2.7 Two dimensional numerical example

To demonstrate how the algorithm described above might be used in a real network,

a two dimensional network consisting of 324 randomly positioned nodes was created.

The probability of having an edge between any two nodes in the base level was in-

versely proportional to the square of the distance between the nodes. Two supernode

levels were created by dividing the base level layout into a 6 × 6 grid for the second

level, and a 2× 2 grid for the third level, and selecting the node closest to the center

of each grid square to double as a supernode. The layout of this network is shown in

figure 2.9.
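The layout construction can be sketched in Python under stated assumptions. The 10 × 10 region is read off the axes of figure 2.9. The edge probability is taken as $\min(1, c/d^2)$ with a hypothetical constant `c`, since the text only states proportionality to $1/d^2$. Only the base-level edges and the choice of supernodes are shown, not the inter-level weights, and the helper names are illustrative.

```python
import random


def random_edges(nodes, c, rng):
    """Connect each pair with probability min(1, c / d^2); c is a hypothetical constant."""
    edges = []
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            d2 = (nodes[i][0] - nodes[j][0]) ** 2 + (nodes[i][1] - nodes[j][1]) ** 2
            p = 1.0 if d2 == 0 else min(1.0, c / d2)
            if rng.random() < p:
                edges.append((i, j))
    return edges


def supernodes_from_grid(nodes, size, cells):
    """For each cell of a cells x cells grid over [0, size]^2, pick the node
    closest to the cell center to double as a supernode."""
    width = size / cells
    chosen = []
    for gx in range(cells):
        for gy in range(cells):
            cx, cy = (gx + 0.5) * width, (gy + 0.5) * width
            chosen.append(min(range(len(nodes)),
                              key=lambda k: (nodes[k][0] - cx) ** 2 + (nodes[k][1] - cy) ** 2))
    return chosen


rng = random.Random(0)
size = 10.0
nodes = [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(324)]
edges = random_edges(nodes, c=0.05, rng=rng)
level2 = supernodes_from_grid(nodes, size, 6)  # 6 x 6 grid -> 36 supernodes
level3 = supernodes_from_grid(nodes, size, 2)  # 2 x 2 grid -> 4 supernodes
print(len(edges), len(level2), len(level3))
```

Because supernodes are existing base nodes, no extra hardware is needed; the upper levels only add communication links between the chosen nodes.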

Figure 2.10 shows the convergence of the node values to the invariant distribution for both the multigrid network and a network consisting of the base level only. As expected, the multigrid network converges significantly faster. Also plotted is a case where each edge is functional with probability 0.5 at any time step. While this decreases the convergence rate, the multigrid network still performs significantly better than the single-level network.

Figure 2.10: Convergence results for the example network. (Axes: residual norm $\|r_i\|/\|r_0\|$, logarithmic from 10^-12 to 10^0, vs. time $t$; curves: single level Metropolis, single level with broken edges, three level multigrid, three levels with broken edges.)

2.8 Measurement updates

The method described in the previous sections is applicable to situations where each

node takes only one measurement. In many potential applications, the value that is

being estimated changes over time, and nodes update their measurements periodi-

cally. One option to handle measurement updates would be to restart the consensus

process with each new set of sensor measurements. However, this can be ineffective,

especially if variations between different sensors are larger than variations of a par-

ticular sensor’s values over time, since all progress towards consensus based on the

previous values would be discarded. In addition, it would require all nodes to per-

form measurement updates at the same prearranged time, and would not allow for


unscheduled asynchronous updates.

The following theorem describes a way of updating the state of a node to incor-

porate a new measurement without restarting the consensus process. It can also be

applied if only some nodes, or just a single node, update their measurements, and unlike

restarting the consensus process, it does not require any synchronized action between

nodes. The only disadvantage of this method is that nodes need to store their previous

measurement values in addition to their current state.

Theorem 2.8.1. Let $y$ be a vector of previous measurement values, and let $y'$ be a vector of updated measurement values. Update the state vector as follows:

$$x' = x + (y' - y) \tag{2.8.1}$$

Then the new invariant distribution reflects the mean of the new measurement values, i.e.

$$\hat{\pi}' = (P^T)^{\infty} y' \tag{2.8.2}$$

Proof. If the measurement update is performed at time $t$, then the node states before and after the measurement update are

$$x(t) = (P^T)^t y \tag{2.8.3}$$

$$x'(t) = (P^T)^t y + (y' - y) \tag{2.8.4}$$

Since $(P^T)^{\infty}(P^T)^t = (P^T)^{\infty}$, the new invariant distribution is

$$\hat{\pi}' = (P^T)^{\infty} x'(t) = (P^T)^{\infty}(P^T)^t y + (P^T)^{\infty} y' - (P^T)^{\infty} y = (P^T)^{\infty} y' \tag{2.8.5}$$

∎
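Theorem 2.8.1 can be illustrated with a minimal Python sketch. It assumes the same hypothetical 4-node matrix `P` as in the earlier examples and a fixed number of iterations in place of the limit. Ten consensus steps are run, then only node 2 applies the local update (2.8.1). Each node's final value divided by $\kappa_i$ from (2.3.6) matches the mean of the new measurements, even though the consensus process was never restarted.

```python
def run(P, x, steps):
    """Apply x <- P^T x the given number of times."""
    n = len(x)
    for _ in range(steps):
        x = [sum(P[j][i] * x[j] for j in range(n)) for i in range(n)]
    return x


# Hypothetical matrix with a non-uniform invariant distribution.
P = [
    [0.50, 0.50, 0.00, 0.00],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.00, 0.00, 0.50, 0.50],
]
y = [3.0, 4.0, 5.0, 7.0]        # initial measurements
x = run(P, y, 10)               # consensus in progress, not yet converged

# Node 2 alone takes a new measurement and applies (2.8.1) locally.
y_new = list(y)
y_new[2] = 9.0
x = [xi + (yn - yo) for xi, yn, yo in zip(x, y_new, y)]

x_inf = run(P, x, 500)
kappa = run(P, [1.0] * len(y), 500)   # (2.3.6): every node has a measurement
estimates = [xi / ki for xi, ki in zip(x_inf, kappa)]
print(estimates)                      # each close to mean(y_new) = 5.75
```

The update works because consensus preserves the sum of the node states: adding $y' - y$ shifts that sum by exactly the change in the measurements.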

2.9 Sensor weights

The methods described above lead to a consensus that reflects the mean of the mea-

surement values of all nodes. In this section, we describe how to adapt the methods


to allow for giving nodes unequal weights, so that nodes that have access to more

accurate measurements can be given higher weights than nodes with less accurate

measurements.

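Theorem 2.9.1. Let $y_i$ be a measurement value associated with node $i$ and let $\phi_i$ be the weight assigned to it. Then the weighted average of the measurements of all nodes in the network can be found by running two separate consensus processes on variables $x$ and $z$ with initial values as defined below:

$$x_i(0) = \phi_i y_i \tag{2.9.1}$$

$$z_i(0) = \phi_i \tag{2.9.2}$$

After both consensus processes converge, the weighted average of $y_i$ is obtained at each node by dividing $x_i$ by $z_i$:

$$\frac{x_i(\infty)}{z_i(\infty)} = \frac{\sum_{k=1}^{n} \phi_k y_k}{\sum_{k=1}^{n} \phi_k} \tag{2.9.3}$$

Proof. Every node holds an initial value, so with $\eta = \mathbf{1}$ the consensus value of each process is the mean of its $n$ initial values, and equation (2.3.4) gives

$$x_i(\infty) = \frac{\kappa_i}{n}\sum_{k=1}^{n} x_k(0) \tag{2.9.4}$$

$$z_i(\infty) = \frac{\kappa_i}{n}\sum_{k=1}^{n} z_k(0) \tag{2.9.5}$$

The factors $\kappa$ are the same for both consensus processes, since they depend only on the transition matrix and on which nodes hold values. Therefore

$$\frac{x_i(\infty)}{z_i(\infty)} = \frac{\kappa_i \sum_{k=1}^{n} \phi_k y_k}{\kappa_i \sum_{k=1}^{n} \phi_k} = \frac{\sum_{k=1}^{n} \phi_k y_k}{\sum_{k=1}^{n} \phi_k} \tag{2.9.6}$$

∎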

With this method it is even possible to change the sensor weights from $\phi_i$ to new values $\phi'_i$ at some time during the consensus process, by using the method described in the previous section and applying equation (2.8.1) to both $x$ and $z$, i.e.

$$x'_i = x_i + (\phi'_i y_i - \phi_i y_i) \tag{2.9.7}$$

$$z'_i = z_i + (\phi'_i - \phi_i) \tag{2.9.8}$$

Note that the sensor weights do not need to sum to one here, since the result is normalized by dividing by $z$. This is particularly useful, because a node can change its weight by simply altering its own stored values of $x_i$ and $z_i$, and no additional interaction with other nodes is required.

If the noise in the node measurements is expected to be independent for each node and normally distributed, setting the sensor weights equal to the inverse of the variance $\sigma_i^2$ of each node minimizes the variance of the resulting weighted average:

$$\phi_i = \frac{1}{\sigma_i^2} \tag{2.9.9}$$

In some situations, the nodes might be able to provide a time-varying estimate of the accuracy of their measurements. The equations above can then be used to update the node weights to reflect this change in the estimated accuracy.
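Theorem 2.9.1, the weight change (2.9.7)–(2.9.8) and the inverse-variance weights (2.9.9) can be combined in one short Python sketch. The measurements, noise levels and matrix `P` are hypothetical. Partway through the process node 3 reports a smaller $\sigma$ and updates only its own $x_3$ and $z_3$. At convergence, every node's ratio $x_i/z_i$ equals the weighted average computed directly with the new weights.

```python
def run(P, x, steps):
    """Apply x <- P^T x the given number of times."""
    n = len(x)
    for _ in range(steps):
        x = [sum(P[j][i] * x[j] for j in range(n)) for i in range(n)]
    return x


P = [
    [0.50, 0.50, 0.00, 0.00],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.00, 0.00, 0.50, 0.50],
]
y = [10.0, 12.0, 11.0, 20.0]       # hypothetical measurements
sigma = [1.0, 1.0, 2.0, 4.0]       # hypothetical noise standard deviations
phi = [1 / s ** 2 for s in sigma]  # (2.9.9): inverse-variance weights

x = [p * v for p, v in zip(phi, y)]  # (2.9.1)
z = list(phi)                        # (2.9.2)
x, z = run(P, x, 10), run(P, z, 10)  # consensus partly done

# Node 3 now reports sigma = 1 and updates only its own state, (2.9.7)-(2.9.8).
new_phi3 = 1.0
x[3] += (new_phi3 - phi[3]) * y[3]
z[3] += new_phi3 - phi[3]
phi[3] = new_phi3

x, z = run(P, x, 500), run(P, z, 500)
estimates = [xi / zi for xi, zi in zip(x, z)]
expected = sum(p * v for p, v in zip(phi, y)) / sum(phi)
print(estimates, expected)   # all estimates agree with expected (about 13.77)
```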

2.10 Network spectral properties and convergence rates

In this section we study how spectral properties of a network are influenced by adding

supernode levels to a network. Results presented in previous sections have shown that

multigrid methods can reduce the second-largest eigenvalue λ2 and thereby increase

the spectral gap ρ = 1−λ2 of a network. To show the effects on additional eigenvalues

and eigenvectors of the network, we start with the example of a ring-shaped network,

where the base level network consists of a simple ring of nodes, and every node has

exactly two neighbors. For such a ring-shaped network with n nodes, the eigenvalues

are given by the following expression, where $k$ takes values from 0 to $n/2$ for even $n$, and from 0 to $(n-1)/2$ for odd $n$:

$$\lambda_{2k-1} = \frac{1}{3} + \frac{2}{3}\cos\left(\frac{2\pi k}{n}\right) \tag{2.10.1}$$

For $k$ larger than 0 and smaller than $n/2$, the multiplicity of the eigenvalue is 2. The eigenvectors have the following forms, where $v_{k,i}$ is the $i$-th entry of the $k$-th eigenvector, and the vectors $v'$ have to be normalized to obtain the eigenvectors $v$:

$$v'_{2k-1,i} = \sin\left(\frac{2\pi k i}{n}\right) \tag{2.10.2}$$

$$v'_{2k,i} = \cos\left(\frac{2\pi k i}{n}\right) \tag{2.10.3}$$

The eigenvalue moduli for a ring-shaped network of 400 nodes are shown in figure 2.11.

Three of the eigenvectors are shown in figure 2.12. Overall, for a ring-shaped network,

eigenvectors corresponding to eigenvalues with high moduli are low frequency sinu-

soids, and eigenvectors corresponding to low modulus eigenvalues are high frequency

sinusoids. If a consensus process is run on such a network, high-spatial-frequency

noise therefore is averaged out quickly, while low-spatial-frequency noise persists for

a larger number of time steps.
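Equations (2.10.1)–(2.10.3) and the frequency argument can be checked with a short Python sketch. It assumes weight 1/3 on every self-loop and ring edge, which reproduces (2.10.1). It confirms that the sine and cosine vectors are eigenvectors with the stated eigenvalues, and it estimates how many steps a pure mode needs to shrink by a factor of $10^{-6}$. The function names are illustrative.

```python
import math


def ring_matrix(n):
    """Single-level ring with weight 1/3 on each self-loop and each ring edge."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][i] = 1 / 3
        P[i][(i + 1) % n] = 1 / 3
        P[i][(i - 1) % n] = 1 / 3
    return P


def max_mode_error(n, k):
    """Largest |(P v)_i - lambda_k v_i| over the sine and cosine modes (2.10.2)-(2.10.3)."""
    P = ring_matrix(n)
    lam = 1 / 3 + 2 / 3 * math.cos(2 * math.pi * k / n)   # (2.10.1)
    worst = 0.0
    for wave in (math.sin, math.cos):
        v = [wave(2 * math.pi * k * i / n) for i in range(n)]
        Pv = [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
        worst = max(worst, max(abs(a - lam * b) for a, b in zip(Pv, v)))
    return worst


def steps_for_mode(n, k, tol=1e-6):
    """Smallest t with |lambda_k|^t <= tol, i.e. steps for mode k to shrink by tol."""
    lam = abs(1 / 3 + 2 / 3 * math.cos(2 * math.pi * k / n))
    return math.ceil(math.log(tol) / math.log(lam))


print(max(max_mode_error(12, k) for k in range(1, 6)))   # round-off level only
print(steps_for_mode(400, 1), steps_for_mode(400, 100))  # slow vs. fast mode
```

For $n = 400$ the lowest-frequency mode ($k = 1$) needs roughly $1.7 \times 10^5$ steps, while the mode with $k = 100$ needs 13.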

Figure 2.11 also shows the eigenvalues of multigrid networks that use the simple

ring-shaped network as their base layer. The eigenvalues shown are for networks

with three and six layers. Both multigrid networks have the same number of nodes

in the top level, so that the three layer network represents a relatively centralized

network, and the six layer network represents a more robust network. As expected, the

eigenvalues of the multigrid network are lower in magnitude than those of the single

layer network, with the three level network having the overall smallest eigenvalues.

Most importantly, λ2 is significantly lower for the multigrid networks.

To study how the eigenvalues and eigenvectors relate to convergence rates, a consensus algorithm was run on the ring-shaped networks with the eigenvectors of the single-level ring as input. For each of the eigenvectors, the process was started with each node initialized to the corresponding entry of the eigenvector, and the number of steps until the invariant distribution was reached (within a tolerance) was recorded. The results are shown in figure 2.13. For the single-level network, the number of steps to convergence decreases with eigenvector number, as the corresponding eigenvalue modulus decreases. For the multilevel networks, on the other hand, the convergence rates are nearly constant across all eigenvectors. This indicates that the consensus algorithms for multigrid networks described above eliminate high- and low-frequency noise in approximately the same number of time steps. Multigrid networks are therefore particularly useful if low-spatial-frequency noise is present, while single-level networks can be used for eliminating high-spatial-frequency noise.

Figure 2.11: Eigenvalues of network with 400 base level nodes with various numbers of levels. (Axes: eigenvalue modulus vs. eigenvalue number / n; curves: 1 level, 3 levels, 6 levels.)

Figure 2.12: Selected eigenvectors of a single level ring with 400 nodes. (Axes: eigenvector value vs. node number; curves: $v_2$, $v_6$, $v_{40}$.)

Figure 2.13: Convergence times of different ring-shaped networks given the eigenvectors of a single level ring as starting value. (Axes: steps to convergence, logarithmic from 10^0 to 10^6, vs. eigenvector number; curves: single level ring, three level ring, six level ring.)

Figures 2.14 through 2.18 show similar results for the network shown in figure 2.9 instead of the ring-shaped network. Figure 2.14 compares the eigenvalues of the base level network to those of the multigrid network, showing significantly lower eigenvalues for the multigrid network. The next three figures are plots of some selected eigenvectors of the base network. Unlike the ring-shaped network, the node layout of this two-dimensional network cannot be represented by the node number alone. Therefore, in the plots of those eigenvectors, the two plot axes represent the location of the node, while the color indicates the value of the eigenvector entry corresponding to that node. While the eigenvectors for this irregular network are not sinusoids, eigenvectors corresponding to large eigenvalues still usually vary slowly across neighboring nodes and represent low-spatial-frequency noise, while eigenvectors corresponding to smaller eigenvalues vary quickly across neighboring nodes.

Figure 2.14: Eigenvalues of network with various numbers of levels. (Axes: eigenvalue modulus vs. eigenvalue number / n; curves: 1 level, 3 levels.)

Figure 2.18 is the equivalent of figure 2.13, but for the two-dimensional network. Just as for the ring-shaped network, the time until convergence stays almost constant for all initial conditions.

Figure 2.15: $v_2$ of the base level of the network shown in figure 2.9.

Figures 2.16 and 2.17: Further selected eigenvectors of the base level of the network shown in figure 2.9.

Figure 2.18: Convergence times of different networks given the eigenvectors of a single level network as starting value. (Axes: steps to convergence, logarithmic from 10^0 to 10^4, vs. eigenvector number; curves: single level network, multilevel network.)

2.11 Conclusion

In this chapter we introduced a new multilevel and multiscale network construction that accelerates distributed consensus algorithms run on the network. We also gave update rules that show how the consensus transition matrix should be adjusted in response to node and edge failure to ensure that the invariant distribution is preserved.

Using our multilevel construction we were able to explore the trade-off between heav-

ily weighting the coarsest levels of the network, resulting in high performance but

low robustness to failure, and more heavily weighting the fine base level, giving high

robustness but low performance. The algorithms presented constitute a heuristic

method. A detailed mathematical model for the resulting improvements in conver-

gence rates would be an interesting area for future work, but is beyond the scope here.

The accelerated performance of consensus methods on such multilevel networks was

demonstrated with an example of a random network embedded in 2D. The spectral

properties of this example network, as well as a ring-shaped network, were studied

and the times until convergence with different initial noise conditions for the multigrid

network were compared to the base network, indicating that the multigrid methods

described here are particularly useful in the presence of noise that varies slowly across

neighboring nodes. Furthermore, we described how time-varying node measurements

can be incorporated, and how the algorithm can be adapted if weights are introduced

for individual node measurements.

Chapter 3

Distributed GPS augmentation


3.1 Introduction

The purpose of this chapter is to describe a method that increases the point position-

ing accuracy of global navigation satellite systems such as GPS by sharing information

about measurement errors between receivers. Unlike other augmentation systems, the

method described here does not require the placement of reference receivers, but in-

stead uses a network of mobile receivers to compute error corrections.

GPS signals are subject to several sources of errors, including differences between the true ephemeris of the satellite and the values that are broadcast in the navigation message, ionospheric and tropospheric signal delays, multipath errors, and receiver noise. While models for ionospheric and tropospheric delays are available and commonly used in receivers, the differences between actual and modeled delays are significant and make up the majority of the error in pseudorange measurements. For single frequency receivers using standard models of these delays, the RMS (root-mean-square) difference between the modeled and actual delays is 5 m for ionospheric delays and 1 m for tropospheric delays, while the RMS range error due to ephemeris errors and spacecraft clock modeling errors is 3 m [23]. Receiver noise and multipath effects are usually between 0.5 m and 1 m each, depending on the quality of the receiver, the type of antenna used, and the topography of the terrain. Multipath errors can be significantly larger in some areas, including urban canyons. Unless otherwise noted, we used a 1 m error for combined multipath and receiver noise, which is appropriate for locations with a relatively clear sky.

Error source                                   RMS error
Ephemeris and satellite clock model errors     3 m
Tropospheric delay model error                 1 m
Ionospheric delay model error                  5 m
Receiver noise and multipath                   1 m

Table 3.1: Typical GPS error budget (RMS values).
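The entries of Table 3.1 can be combined into a single figure with a short Python sketch. It assumes, for illustration only, that the four error sources are independent, so that their variances add (root-sum-square). It also computes how much of that variance comes from the sources that nearby receivers share, i.e. everything except receiver noise and multipath.

```python
import math

# Table 3.1 (RMS values in meters). Assumption: independent error sources.
error_budget_m = {
    "ephemeris and satellite clock": 3.0,
    "tropospheric delay model": 1.0,
    "ionospheric delay model": 5.0,
    "receiver noise and multipath": 1.0,
}

total = math.sqrt(sum(v ** 2 for v in error_budget_m.values()))
print(f"combined RMS pseudorange error: {total:.1f} m")  # 6.0 m

# Variance shared by nearby receivers (all sources except receiver noise and multipath).
common = total ** 2 - error_budget_m["receiver noise and multipath"] ** 2
print(f"common-mode share of variance: {common / total ** 2:.0%}")  # 97%
```

Under this assumption almost all of the pseudorange error variance is common to receivers in a small area, which is what makes shared corrections worthwhile.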

A number of methods for reducing these errors are currently being used or will

be available in the future. Ionospheric errors can be mostly eliminated by the use

of dual-frequency receivers, which are currently not commercially sold, but will be

available to users of GPS and Galileo in the future. There are also a variety of

differential GPS (DGPS) methods that are commonly used. DGPS systems typically

consist of one or more reference receivers with a known position. Based on the


positions and measurements that are computed by these reference receivers, a set of

corrections can be computed and transmitted to other receivers in the area. A mobile

receiver in the vicinity of the reference station can then apply those corrections to

its own pseudorange measurements. Corrections can take the form of scalar values

that are estimates of the total errors in the pseudorange for each satellite. The

accuracy of these types of corrections for the mobile receivers naturally depends on

the distance to the reference station. Wide-area DGPS systems on the other hand

broadcast vector corrections, where the exact correction that is applied by the user

depends on the user’s location. These systems can also estimate various types of

errors (such as ephemeris and ionospheric delays) separately. Scalar corrections are

generally sufficient if the distance between the reference station and the user is less

than 100 km [17].

One DGPS system that is widely used today in the United States is the Wide Area

Augmentation System (WAAS) [13], which employs reference receivers in locations

across the United States to provide pseudorange vector corrections that eliminate

most of the error due to ionospheric delays, ephemeris errors, and satellite clock

biases. While WAAS is very effective in improving positioning accuracy, the distance

between a user and the closest reference station might be considerable, which limits

the accuracy that can be achieved. Several other countries have implemented similar

systems, including the European EGNOS [14] and the Japanese MSAS systems. At

some airports in the United States, Local Area Augmentation Systems (LAAS) [12]

are used to provide scalar GPS corrections, which are more accurate near the reference

stations, but are typically available only to aviation users in that particular area.

There are, however, still many areas in the world where GPS augmentation is not

available, creating a problem for users that require high positioning accuracy. Some of

the uses of GPS that require a higher accuracy than can be provided by un-augmented

GPS include agriculture and navigation for vision-impaired people [22].

The augmentation system proposed here does not rely on reference receivers, and

could therefore function in any region of the world. A receiver needs to have access to

signals from at least four satellites (for the three position coordinates and the receiver clock bias) in order to obtain a position fix, but GPS receivers often


receive signals from five or more satellites at a time. Having more information avail-

able than required for a position estimate enables receivers to produce estimates of

the signal delays due to ionospheric, tropospheric, and ephemeris errors. After com-

puting a position estimate, the pseudorange residuals of a receiver form an estimate

of the signal delays. If produced by a single receiver, such an estimate would not

be very accurate due to the presence of receiver noise and possibly multipath effects,

but if a number of receivers collaborate, pseudorange corrections can be computed

from the combined data from all receivers. Currently, most GPS receivers do not

use additional pseudorange data to compute error corrections, while some receivers

use the additional pseudoranges from more than four satellites to perform integrity

checks. Since GPS receivers have become a common feature of cell phones and other

devices, it is conceivable that a network of receivers could be created within a rela-

tively small area, so that all receivers experience similar errors due to tropospheric

and ionospheric delays and ephemeris errors. Receiver noise and multipath errors

differ between receivers even if placed in close proximity to one another.

The augmentation system proposed below would find scalar pseudorange corrections. It is, however, conceivable that the methods used could be extended to finding vector corrections, as well as to performing integrity checks. We also assume that the receivers do not use carrier-phase methods.

While all examples included here are based on the GPS constellation of satellites,

the same methods apply to receivers of other satellite navigation systems, and could

also be used for receivers that receive signals from multiple systems. These algorithms

might also be useful for systems that combine GPS with other positioning signals,

such as TV transmissions [32].

Methods for using distributed networks with GPS receivers for node localization are described in [28]. In these types of networks, some nodes either do not have a GPS receiver or do not get a clear signal at their location, and ranging between nodes in the network is used to enable positioning for those nodes. In contrast, in the work described here we assume that every node is capable of finding a position estimate individually, and the interaction between nodes is used to improve positioning accuracy without directly measuring ranges between nodes.


The algorithms proposed here could be implemented using a variety of different means of communication between the nodes. Given the ubiquity of GPS receivers in smartphones, it is conceivable that wireless communication via WiFi or cell phone networks could be used. The use of networks of GPS-enabled cell phones is currently being studied for traffic monitoring [43]. Several existing DGPS systems, ranging from small local systems such as the one described in [38] to wide area systems such as NASA’s Global Differential GPS System [25][26], currently use the Internet to send corrections to users. All of these systems feature fixed reference stations.

3.2 Position solution for a single receiver

This section describes algorithms for finding a position estimate for a single receiver.

These algorithms will form the basis for the development of algorithms for networks

of receivers in later sections of this chapter.

Let ρ be the vector of measured pseudoranges to GPS satellites, $s_j$ the position of satellite j, x the position of the receiver, b the clock bias of the receiver, ε a vector containing delays (i.e., errors that are experienced by all receivers in a specific area) associated with the satellite pseudoranges, and ν a random noise vector. The following equation relates these quantities:

$$\rho_j = \|s_j - x\| + b + \varepsilon_j + \nu_j \tag{3.2.1}$$

As a first step, we describe methods for finding a position solution that do not attempt to find the correlated delays, and assume these to be part of the noise vector ν, so that

$$\rho_j = \|s_j - x\| + b + \nu_j \tag{3.2.2}$$

We want to find estimates of the position of the receiver x and the receiver clock bias b, and therefore define a state vector y that includes these variables:

$$y = \begin{bmatrix} x \\ b \end{bmatrix} \tag{3.2.3}$$


The least-squares optimal position solution is the estimate of the receiver location that minimizes the following objective function, where $N_s$ is the number of satellites for which pseudorange measurements are available to the receiver:

$$f(y) = \frac{1}{2}\sum_{j=1}^{N_s}\left(\|s_j - x\| + b - \rho_j\right)^2 \tag{3.2.4}$$

This function has a unique minimum if more than four satellites are available [1]. A large variety of methods could be used to find the minimum of the objective function; the Gauss-Newton method is commonly used in practice. Since our objectives go beyond exploring single-receiver solutions, we also include a discussion of additional algorithms, which are not commonly used for single-receiver point positioning, and compare their performance later in this chapter.

3.2.1 Single receiver Gauss-Newton method

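The standard method for solving this nonlinear least-squares problem is the Gauss-Newton method. Starting with an initial guess y(0), we iterate the following until convergence:

$$\delta\rho_j(t) = \|s_j - x(t)\| + b(t) - \rho_j \tag{3.2.5}$$

$$\delta y(t) = -\left[\left(G(t)^T G(t)\right)^{-1} G(t)^T\right]\delta\rho(t) \tag{3.2.6}$$

$$y(t+1) = y(t) + \delta y(t) \tag{3.2.7}$$

Because δρ is defined as predicted minus measured pseudorange, the step in 3.2.6 carries a minus sign: it minimizes the linearized residual $\|\delta\rho(t) + G(t)\,\delta y(t)\|$.

The matrix G, also called the observation or navigation matrix, is the Jacobian

$$G = \frac{\partial(\delta\rho)}{\partial y} = \begin{bmatrix} -\ell_1^T & 1 \\ -\ell_2^T & 1 \\ \vdots & \vdots \\ -\ell_{N_s}^T & 1 \end{bmatrix} \tag{3.2.8}$$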

The line-of-sight vector $\ell_j$ is the unit vector pointing from the estimated receiver position to satellite j, i.e.

$$\ell_j = \frac{s_j - x}{\|s_j - x\|} \tag{3.2.9}$$

Because the visible satellites are distributed across the sky, G has full column rank in practice, and $G^TG$ is invertible as long as at least four satellites are in view. Degenerate geometries, for example satellites whose line-of-sight vectors all lie on a common cone around the receiver, make $G^TG$ singular, but they rarely occur in practice.
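A minimal Python/NumPy sketch of the iteration in equations 3.2.5 to 3.2.9 follows. It is an illustration under stated assumptions rather than the code used for the simulations in this chapter: the satellite positions form a hypothetical geometry in a local east-north-up frame about 20 000 km from the receiver, the clock bias is expressed in meters, and the name `gauss_newton_position` is ours. The step is computed with `numpy.linalg.lstsq`, which for a full-column-rank G gives the same result as equation 3.2.6 without forming $G^TG$ explicitly.

```python
import numpy as np


def gauss_newton_position(sats, rho, y0, iters=10, tol=1e-6):
    """Single-receiver Gauss-Newton solve of equations 3.2.5-3.2.9.

    sats: (Ns, 3) satellite positions [m]
    rho:  (Ns,)   measured pseudoranges [m]
    y0:   initial state [x1, x2, x3, b], clock bias b in meters
    """
    y = np.asarray(y0, dtype=float).copy()
    for _ in range(iters):
        diff = sats - y[:3]                              # s_j - x
        dist = np.linalg.norm(diff, axis=1)              # ||s_j - x||
        drho = dist + y[3] - rho                         # residual, eq. 3.2.5
        los = diff / dist[:, None]                       # line-of-sight vectors, eq. 3.2.9
        G = np.hstack([-los, np.ones((len(rho), 1))])    # observation matrix, eq. 3.2.8
        # Least-squares solution of G dy = -drho, i.e. eq. 3.2.6
        dy = np.linalg.lstsq(G, -drho, rcond=None)[0]
        y += dy                                          # eq. 3.2.7
        if np.linalg.norm(dy) < tol:
            break
    return y


# Hypothetical geometry: six satellites 20 000 km away in an east-north-up frame
az = np.deg2rad([0.0, 70.0, 140.0, 210.0, 280.0, 330.0])
el = np.deg2rad([60.0, 35.0, 20.0, 45.0, 15.0, 75.0])
sats = 2.0e7 * np.column_stack(
    [np.cos(el) * np.sin(az), np.cos(el) * np.cos(az), np.sin(el)])

x_true, b_true = np.array([100.0, -250.0, 30.0]), 1500.0
rng = np.random.default_rng(0)
rho = np.linalg.norm(sats - x_true, axis=1) + b_true + rng.normal(0.0, 1.0, 6)

y_hat = gauss_newton_position(sats, rho, np.zeros(4))
```

With noise-free pseudoranges the same call recovers the true state to within rounding error after a few iterations, because the problem is nearly linear at satellite distances.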

3.2.2 Single receiver gradient descent method

For a single receiver, the position solution can also be found with a gradient descent algorithm. This algorithm uses the gradient ∇f(y(t)) of the objective function shown in equation 3.2.4. Defining $g_k$ to be the k-th column of G, the k-th element of the gradient can be expressed as

$$\left[\nabla f(y(t))\right]_k = \delta\rho(t)^T g_k(t) \tag{3.2.10}$$

The descent direction is equal to the negative gradient:

$$\Delta y(t) = -\nabla f(y(t)) \tag{3.2.11}$$

Backtracking line search is used to determine the step size τ as described in [4]: starting from τ = 1,

$$\text{while } f\left(y(t) + \tau\Delta y(t)\right) > f\left(y(t)\right) + \alpha\tau\,\nabla f(y(t))^T\Delta y(t), \text{ set } \tau = \beta\tau \tag{3.2.12}$$

where α ∈ (0, 0.5) and β ∈ (0, 1) are scalar parameters.

The convergence rate can be improved by scaling the clock bias so that its expected value has the same order of magnitude as the position of the receiver. In the computation of the gradient, the column $g_4$ has to be adjusted accordingly.
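The gradient descent iteration with backtracking line search can be sketched as follows in Python/NumPy. The parameter values α = 0.25 and β = 0.5, the iteration count, the function names, and the satellite geometry are illustrative assumptions; the clock bias is left unscaled, so convergence is expected to be slow, as discussed above.

```python
import numpy as np


def objective_and_gradient(sats, rho, y):
    """Objective of eq. 3.2.4 and its gradient, eq. 3.2.10, for one receiver."""
    diff = sats - y[:3]
    dist = np.linalg.norm(diff, axis=1)
    drho = dist + y[3] - rho                                          # eq. 3.2.5
    G = np.hstack([-diff / dist[:, None], np.ones((len(rho), 1))])    # eq. 3.2.8
    return 0.5 * drho @ drho, G.T @ drho


def gradient_descent(sats, rho, y0, iters=500, alpha=0.25, beta=0.5):
    """Gradient descent with backtracking line search, eqs. 3.2.11-3.2.12.

    No clock-bias scaling is applied, so convergence is slow.
    """
    y = np.asarray(y0, dtype=float).copy()
    history = []
    for _ in range(iters):
        f, g = objective_and_gradient(sats, rho, y)
        history.append(f)
        step = -g                                   # descent direction, eq. 3.2.11
        tau = 1.0
        # Shrink tau until the sufficient-decrease condition of eq. 3.2.12 holds;
        # the lower bound on tau only guards against an endless loop.
        while (tau > 1e-12 and
               objective_and_gradient(sats, rho, y + tau * step)[0]
               > f + alpha * tau * (g @ step)):
            tau *= beta
        y += tau * step
    return y, history


# Hypothetical geometry: six satellites 20 000 km away in an east-north-up frame
az = np.deg2rad([0.0, 70.0, 140.0, 210.0, 280.0, 330.0])
el = np.deg2rad([60.0, 35.0, 20.0, 45.0, 15.0, 75.0])
sats = 2.0e7 * np.column_stack(
    [np.cos(el) * np.sin(az), np.cos(el) * np.cos(az), np.sin(el)])
x_true, b_true = np.array([100.0, -250.0, 30.0]), 1500.0
rho = np.linalg.norm(sats - x_true, axis=1) + b_true

y_gd, history = gradient_descent(sats, rho, np.zeros(4))
```

Recording the objective at each step makes it easy to compare the convergence of this method with the Gauss-Newton iteration.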

3.2.3 Single receiver Newton’s method

Newton’s method is a second-order method that is conceptually similar to the gradient descent method described above, but uses the Hessian in addition to the gradient to find the step direction:

$$\Delta y_{nt}(t) = -\nabla^2 f(y(t))^{-1}\,\nabla f(y(t)) \tag{3.2.13}$$

The gradient ∇f(y(t)) can be found using the equations presented in the previous section. The first derivative of the objective function with respect to the clock bias is:

$$\frac{\partial f(y)}{\partial b} = \sum_{j=1}^{N_s}\left(\|s_j - x\| + b - \rho_j\right) \tag{3.2.14}$$

Let $x^{(k)}$ for k = 1, …, 3 be the k-th element of x, and let $s_j^{(k)}$ be the k-th component of $s_j$. The derivative of f(y) with respect to $x^{(k)}$ is:

$$\frac{\partial f(y)}{\partial x^{(k)}} = \sum_{j=1}^{N_s}\frac{x^{(k)} - s_j^{(k)}}{\|x - s_j\|}\left(\|x - s_j\| + b - \rho_j\right) \tag{3.2.15}$$

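Let $\delta\rho_j$ be the j-th component of δρ. The elements of the Hessian are:

$$\frac{\partial^2 f(y)}{\partial b^2} = N_s \tag{3.2.16}$$

$$\frac{\partial^2 f(y)}{\partial x^{(k)}\,\partial b} = \sum_{j=1}^{N_s}\frac{x^{(k)} - s_j^{(k)}}{\|x - s_j\|} \tag{3.2.17}$$

$$\frac{\partial^2 f(y)}{\partial x^{(k)}\,\partial x^{(\ell)}} = \sum_{j=1}^{N_s}\frac{\left(x^{(k)} - s_j^{(k)}\right)\left(x^{(\ell)} - s_j^{(\ell)}\right)}{\|x - s_j\|^2}\left(1 - \frac{\delta\rho_j}{\|x - s_j\|}\right) \tag{3.2.18}$$

$$\frac{\partial^2 f(y)}{\partial x^{(k)}\,\partial x^{(k)}} = \sum_{j=1}^{N_s}\left[\frac{\left(x^{(k)} - s_j^{(k)}\right)^2}{\|x - s_j\|^2}\left(1 - \frac{\delta\rho_j}{\|x - s_j\|}\right) + \frac{\delta\rho_j}{\|x - s_j\|}\right] \tag{3.2.19}$$

where k ≠ ℓ in equation 3.2.18.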

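The equations presented here for Newton’s method are more complex than those for the Gauss-Newton method and require finding second derivatives. Many of the individual terms presented above are, however, similar or even identical to the corresponding terms for the Gauss-Newton method. In fact, the Hessian equals $G^TG$ plus the terms containing $\delta\rho_j$; the Gauss-Newton method drops these terms, which are small when the residuals are small compared with the satellite distances.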
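The gradient and Hessian expressions in equations 3.2.14 to 3.2.19 translate directly into array operations. The sketch below (Python/NumPy; function names are ours) vectorizes the sums over satellites and returns the Newton step of equation 3.2.13. The position block is written as $\sum_j w_j u_j u_j^T + \left(\sum_j \delta\rho_j/\|x - s_j\|\right) I$ with $u_j = (x - s_j)/\|x - s_j\|$ and $w_j = 1 - \delta\rho_j/\|x - s_j\|$, which reproduces equation 3.2.18 off the diagonal and equation 3.2.19 on it. The small satellite geometry is hypothetical and chosen so that the $\delta\rho_j$ terms are not negligible.

```python
import numpy as np


def gradient_and_hessian(sats, rho, y):
    """Analytic gradient (eqs. 3.2.14-3.2.15) and Hessian (eqs. 3.2.16-3.2.19)."""
    d = y[:3] - sats                        # x - s_j, shape (Ns, 3)
    r = np.linalg.norm(d, axis=1)           # ||x - s_j||
    drho = r + y[3] - rho                   # delta rho_j
    u = d / r[:, None]                      # (x - s_j) / ||x - s_j||
    grad = np.append(u.T @ drho, drho.sum())

    H = np.empty((4, 4))
    w = 1.0 - drho / r                      # (1 - delta rho_j / ||x - s_j||)
    # Position block: eq. 3.2.18 off the diagonal, eq. 3.2.19 on it
    H[:3, :3] = (u * w[:, None]).T @ u + np.sum(drho / r) * np.eye(3)
    H[:3, 3] = H[3, :3] = u.sum(axis=0)     # eq. 3.2.17
    H[3, 3] = len(rho)                      # eq. 3.2.16
    return grad, H


def newton_step(sats, rho, y):
    """Newton step of eq. 3.2.13."""
    grad, H = gradient_and_hessian(sats, rho, y)
    return -np.linalg.solve(H, grad)


# Hypothetical small-scale geometry, chosen so that the delta-rho terms matter
sats_s = np.array([[10.0, 0.0, 5.0], [0.0, 12.0, 4.0], [-9.0, -3.0, 6.0],
                   [3.0, -11.0, 8.0], [1.0, 2.0, 15.0]])
rho_s = np.array([12.0, 13.5, 11.0, 14.0, 15.5])
y_s = np.array([0.5, -0.3, 0.2, 1.0])
grad_s, H_s = gradient_and_hessian(sats_s, rho_s, y_s)
step_s = newton_step(sats_s, rho_s, y_s)
```

Comparing the analytic result against finite differences of equation 3.2.4 is a quick way to catch sign errors in such expressions.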


Figure 3.1: Convergence of total objective function divided by the number of receivers for 50 receivers without delay estimation. [Plot of objective function value versus time step, both axes logarithmic; curves: gradient descent, Newton’s method, Gauss-Newton.]

3.2.4 Comparison of different single receiver methods

Figure 3.1 shows the convergence of the sum of the objective function values for 50 receivers. For this and the other simulations, we assumed that the receivers were located in an area of six minutes in latitude by eight minutes in longitude, roughly equal to the city of San Francisco in size and geographic location. The actual ephemeris data of the GPS constellation was used. Pseudoranges for the simulation were created by adding, to the sum of the receiver-satellite distances and the receiver clock biases, errors generated as zero-mean Gaussian random variables with the standard deviations given in table 3.1. A probability function was used to simulate the effect of buildings and other objects that might obscure the field of view of the receiver, making high-elevation satellites more likely to be visible.

As expected, the Gauss-Newton method has the best convergence, requiring only a

few steps. Newton’s method also results in fast convergence, but the gradient descent

algorithm is slow and takes a large number of steps to converge.


3.3 Point positioning for multiple receivers with delay estimation

The errors associated with pseudorange measurements can be divided into three types: (1) the clock bias associated with a particular receiver, which changes the pseudoranges to all satellites seen by that receiver equally; (2) ionospheric and tropospheric delays, as well as satellite clock biases, which are associated with each satellite and are experienced equally by all receivers in a specific location; and (3) an uncorrelated random error consisting of multipath errors and receiver noise. Let ε be a vector of the correlated delays, consisting of one scalar per satellite, and let $\varepsilon_j$ denote the signal delay associated with satellite j. The following sections describe how a network of GPS receivers can estimate ε and use this estimate to improve positioning accuracy for its individual receivers. If enough information is available, the delays can be estimated along with the receiver positions in the process of finding a navigation solution. In this section, we describe how this can be done in a centralized way using the Gauss-Newton method. We want to minimize the total sum of squares of the error over all receivers. After including the delays, the new objective function for a set of $N_r$ receivers which receive signals from the same set of $N_s$ satellites is

$$f(y) = \frac{1}{2}\sum_{i=1}^{N_r}\sum_{j=1}^{N_s}\left(\|s_j - x_i\| + b_i + \varepsilon_j - \rho_{i,j}\right)^2 \tag{3.3.1}$$

Here, $x_i$ denotes the estimated position of receiver i, $b_i$ is the estimated clock bias of receiver i, $\rho_{i,j}$ is the measured pseudorange from receiver i to satellite j, and y is a vector that contains all $x_i$ and $b_i$ as well as the delays.

Let $\eta_{i,j}$ be 1 if receiver i receives a signal from satellite j, and 0 if the receiver does not have access to a signal from that satellite. Then a more general form of the objective function, which does not require all receivers to see the same set of satellites, is

$$f(y) = \frac{1}{2}\sum_{i=1}^{N_r}\sum_{j=1}^{N_s}\eta_{i,j}\left(\|s_j - x_i\| + b_i + \varepsilon_j - \rho_{i,j}\right)^2 \tag{3.3.2}$$
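The masked objective of equation 3.3.2 can be evaluated for all receivers at once by broadcasting the receiver-satellite distances into an $N_r \times N_s$ array. In this sketch (Python/NumPy, with a hypothetical random scenario), pseudoranges for unobserved pairs may be stored as NaN and are excluded by the mask; the last lines check numerically the clock-bias/delay ambiguity of equation 3.3.4.

```python
import numpy as np


def multi_receiver_objective(sats, X, b, eps, rho, eta):
    """Masked multi-receiver objective of eq. 3.3.2.

    sats: (Ns, 3) satellite positions    X:   (Nr, 3) receiver positions
    b:    (Nr,)   clock biases [m]       eps: (Ns,)   correlated delays [m]
    rho:  (Nr, Ns) pseudoranges; entries with eta == 0 are ignored (may be NaN)
    eta:  (Nr, Ns) visibility indicators eta_ij
    """
    dist = np.linalg.norm(sats[None, :, :] - X[:, None, :], axis=2)   # ||s_j - x_i||
    resid = dist + b[:, None] + eps[None, :] - rho
    resid = np.where(np.asarray(eta, dtype=bool), resid, 0.0)          # drop unseen pairs
    return 0.5 * np.sum(resid ** 2)


# Hypothetical scenario: 4 receivers, 5 satellites, receiver 0 misses the third satellite
rng = np.random.default_rng(3)
sats = rng.normal(0.0, 2e7, (5, 3))
X = rng.normal(0.0, 1e3, (4, 3))
b = rng.normal(0.0, 100.0, 4)
eps = rng.normal(0.0, 5.0, 5)
rho = np.linalg.norm(sats[None] - X[:, None], axis=2) + rng.normal(0.0, 3.0, (4, 5))
eta = np.ones((4, 5), dtype=int)
eta[0, 2], rho[0, 2] = 0, np.nan

f_ref = multi_receiver_objective(sats, X, b, eps, rho, eta)
f_shift = multi_receiver_objective(sats, X, b + 7.0, eps - 7.0, rho, eta)  # eq. 3.3.4
```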


3.3.1 Multi-receiver Gauss-Newton method

To solve this least-squares problem, we need to add the delays ε to the vector of variables to be estimated. We therefore create a vector y, which consists of the estimated positions and clock biases of all receivers, and the correlated delays. If $y_i = [x_i^T, b_i]^T$ for i = 1, …, $N_r$, then this vector y is

$$y = \left[y_1^T, y_2^T, \ldots, y_{N_r}^T, \varepsilon^T\right]^T \tag{3.3.3}$$

There is, however, a problem that arises with estimating all components of y simultaneously: there is an ambiguity in estimating all clock biases and correlated delays, since adding a constant c to all estimated clock biases and subtracting it from all estimated delays produces no change in the objective function value, i.e.,

$$f\left(x, b + c\mathbf{1}, \varepsilon - c\mathbf{1}\right) = f\left(x, b, \varepsilon\right) \tag{3.3.4}$$

It is possible to resolve this ambiguity by assuming that either the clock biases or the delays have zero mean. With either assumption, the accuracy of the position estimate is not affected by the validity of the assumption, i.e., the position solution will be accurate even if the assumption does not hold. The accuracy of the time estimate is, however, affected if the clock biases are assumed to be zero-mean but are not, as is likely to be the case for a receiver network. If, for example, the clock biases have a mean of $\bar b$ but are assumed to be zero-mean, the delays ε in the final solution would be offset by $\bar b$ from the values they would have if the clock biases were zero-mean; the estimated positions would not be affected, but the estimated receiver times would also be offset by $\bar b$. In the following algorithms, we take the delays ε to be zero-mean in order to obtain an accurate time estimate. We therefore remove one of the delays, $\varepsilon_{N_s}$, from y, and compute it instead from the estimates of the remaining delays, i.e.

$$\varepsilon_{N_s} = -\sum_{k=1}^{N_s-1}\varepsilon_k \tag{3.3.5}$$


In order for the least squares problem to have a unique solution, it is necessary that

the total number of pseudorange measurements across all receivers is at least equal

to the number of entries of y, which is 4Nr +Ns − 1.

Let $\rho_i$ be a vector that contains all of the pseudorange measurements for receiver i. We then create a vector ρ that contains all pseudorange measurements of all receivers, i.e.

$$\rho = \left[\rho_1^T, \rho_2^T, \ldots, \rho_{N_r}^T\right]^T \tag{3.3.6}$$

Let $G_i$ be the observation matrix used for solving the single-receiver point positioning problem for receiver i, as described in equation 3.2.8. The observation matrix G for the multi-receiver problem contains the single-receiver observation matrices in a block diagonal:

$$G = \begin{bmatrix} G_1 & 0 & 0 & \cdots & 0 & B_1 \\ 0 & G_2 & 0 & \cdots & 0 & B_2 \\ 0 & 0 & G_3 & \cdots & 0 & B_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & G_{N_r} & B_{N_r} \end{bmatrix} \tag{3.3.7}$$

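If a receiver has access to signals from all satellites that are visible to any other receiver, then the matrix $B_i$ associated with that receiver, i.e., the derivative of its residuals $\delta\rho_i$ with respect to the $N_s - 1$ delays kept in y, is

$$B_i = \frac{\partial(\delta\rho_i)}{\partial\varepsilon} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -1 & -1 & -1 & \cdots & -1 \end{bmatrix} \tag{3.3.8}$$

where the last row follows from equation 3.3.5.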

If a receiver does not receive a signal from a specific satellite, then the correspond-

ing row of Bi in equation 3.3.8 is removed, so that the number of rows of Bi is equal

to the number of pseudorange measurements of that receiver.

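Given these definitions of ρ, y, and G, we can now write the iterative equations for solving the multi-receiver least-squares problem with delay estimation using the Gauss-Newton method, where δρ(t) now stacks the residuals $\|s_j - x_i(t)\| + b_i(t) + \varepsilon_j(t) - \rho_{i,j}$ of all receivers. At every time step t,

$$\delta y(t) = -\left[\left(G(t)^T G(t)\right)^{-1} G(t)^T\right]\delta\rho(t) \tag{3.3.9}$$

$$y(t+1) = y(t) + \delta y(t) \tag{3.3.10}$$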
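A compact, dense-matrix sketch of the centralized Gauss-Newton iteration with delay estimation is given below (Python/NumPy). It builds G row block by row block as in equation 3.3.7, keeps only the rows of equation 3.3.8 that correspond to satellites a receiver actually sees, and enforces the zero-mean delays of equation 3.3.5 through the matrix returned by the helper `delay_map` (a name of our own). The scenario, with receivers spread over roughly 100 km × 100 km × 10 km, noise-free pseudoranges, and initial positions off by about 100 m, is an assumption chosen so that the problem is well conditioned. A practical implementation would exploit the block structure of G rather than forming it densely.

```python
import numpy as np


def delay_map(Ns):
    """Matrix of eq. 3.3.8: maps the Ns-1 delays kept in y to all Ns delays."""
    return np.vstack([np.eye(Ns - 1), -np.ones((1, Ns - 1))])


def multi_receiver_gauss_newton(sats, rho, eta, Y0, iters=20, tol=1e-6):
    """Centralized Gauss-Newton with delay estimation (eqs. 3.3.3-3.3.10).

    Y0: (Nr, 4) initial [x, b] for each receiver. Returns (Y, eps), where eps
    holds all Ns delays and has zero mean by construction (eq. 3.3.5).
    """
    Nr, Ns = eta.shape
    T = delay_map(Ns)
    n = 4 * Nr + Ns - 1                                    # number of unknowns
    y = np.concatenate([np.asarray(Y0, float).ravel(), np.zeros(Ns - 1)])
    for _ in range(iters):
        Y, eps = y[:4 * Nr].reshape(Nr, 4), T @ y[4 * Nr:]
        blocks, resid = [], []
        for i in range(Nr):
            vis = np.flatnonzero(eta[i])                   # satellites seen by receiver i
            diff = sats[vis] - Y[i, :3]
            dist = np.linalg.norm(diff, axis=1)
            resid.append(dist + Y[i, 3] + eps[vis] - rho[i, vis])
            rows = np.zeros((len(vis), n))
            rows[:, 4 * i:4 * i + 3] = -diff / dist[:, None]   # G_i, eq. 3.2.8
            rows[:, 4 * i + 3] = 1.0
            rows[:, 4 * Nr:] = T[vis]                      # B_i: rows of eq. 3.3.8 kept
            blocks.append(rows)
        G, drho = np.vstack(blocks), np.concatenate(resid)  # eqs. 3.3.6-3.3.7
        dy = np.linalg.lstsq(G, -drho, rcond=None)[0]       # eq. 3.3.9
        y += dy                                             # eq. 3.3.10
        if np.linalg.norm(dy) < tol:
            break
    return y[:4 * Nr].reshape(Nr, 4), T @ y[4 * Nr:]


# Hypothetical scenario: 8 receivers, six satellites 20 000 km away, no noise
az = np.deg2rad([0.0, 70.0, 140.0, 210.0, 280.0, 330.0])
el = np.deg2rad([60.0, 35.0, 20.0, 45.0, 15.0, 75.0])
sats = 2.0e7 * np.column_stack(
    [np.cos(el) * np.sin(az), np.cos(el) * np.cos(az), np.sin(el)])
rng = np.random.default_rng(1)
X_true = rng.uniform([-5e4, -5e4, 0.0], [5e4, 5e4, 1e4], size=(8, 3))
b_true = rng.uniform(-300.0, 300.0, 8)
eps_true = rng.normal(0.0, 5.0, 6)
eps_true -= eps_true.mean()                         # zero-mean delays, eq. 3.3.5
rho = (np.linalg.norm(sats[None] - X_true[:, None], axis=2)
       + b_true[:, None] + eps_true[None, :])
eta = np.ones((8, 6), dtype=int)
eta[0, 5] = 0                                       # receiver 0 misses the sixth satellite
Y0 = np.hstack([X_true + rng.normal(0.0, 100.0, (8, 3)), np.zeros((8, 1))])
Y_hat, eps_hat = multi_receiver_gauss_newton(sats, rho, eta, Y0)
```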

3.3.2 Accuracy and sensitivity to random errors

To obtain good accuracy for single-receiver point positioning, it is important that the satellites seen by the receiver are located in different parts of the sky relative to the receiver. If a large portion of the sky is occluded, and the angle between the satellites as seen by the receiver is small, then the accuracy of the position solution is low. A commonly used measure for this is the dilution of precision (DOP), where the various DOPs are functions of the diagonal entries of the covariance matrix

$$W = \left(G^T G\right)^{-1} \tag{3.3.11}$$
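The DOP values can be read off W directly. The sketch below (Python/NumPy) assumes that G is expressed in a local east-north-up frame with the clock bias in meters, which is what makes the horizontal and vertical split meaningful. The two satellite geometries, one spread across the sky and one clustered near the zenith, are hypothetical examples, and the function names are ours.

```python
import numpy as np


def observation_matrix(az_deg, el_deg):
    """G of eq. 3.2.8 from satellite azimuths and elevations (east-north-up frame)."""
    az, el = np.deg2rad(az_deg), np.deg2rad(el_deg)
    los = np.column_stack([np.cos(el) * np.sin(az),
                           np.cos(el) * np.cos(az),
                           np.sin(el)])
    return np.hstack([-los, np.ones((len(az), 1))])


def dops(G):
    """Common DOP values from the diagonal of W = (G^T G)^{-1}, eq. 3.3.11."""
    d = np.diag(np.linalg.inv(G.T @ G))
    return {"GDOP": np.sqrt(d.sum()),
            "PDOP": np.sqrt(d[:3].sum()),
            "HDOP": np.sqrt(d[:2].sum()),
            "VDOP": np.sqrt(d[2]),
            "TDOP": np.sqrt(d[3])}


spread = dops(observation_matrix([0, 90, 180, 270, 45], [15, 15, 15, 15, 90]))
clustered = dops(observation_matrix([0, 90, 180, 270], [80, 80, 80, 85]))
```

The clustered geometry yields a much larger PDOP, which is the single-receiver counterpart of the conditioning problem discussed next.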

In estimating the correlated delays of the pseudoranges, a similar issue arises. In order to get a very accurate delay estimate, it is necessary that the receivers are not located too close to one another. The inverse of the covariance matrix for the multi-receiver problem is

$$W^{-1} = G^T G = \begin{bmatrix} G_1^T G_1 & 0 & 0 & \cdots & G_1^T B_1 \\ 0 & G_2^T G_2 & 0 & \cdots & G_2^T B_2 \\ 0 & 0 & G_3^T G_3 & \cdots & G_3^T B_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ B_1^T G_1 & B_2^T G_2 & B_3^T G_3 & \cdots & \sum_{i=1}^{N_r} B_i^T B_i \end{bmatrix} \tag{3.3.12}$$

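If the estimated positions of the receivers are all in the same location, then $G^TG$ is a singular matrix, and a position solution cannot be found using the algorithm described above. In that case all receivers that see the same satellites share the same $G_i$ and $B_i$, and since $[G_i\;\; B_i]$ has $N_s + 3$ columns but only $N_s$ rows, a common change of all receiver estimates combined with a compensating change of the delays leaves every residual unchanged. If the receivers are not in exactly the same positions but located within short distances of one another, then $G^TG$ can be ill-conditioned, leading to large errors.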
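The loss of conditioning for closely spaced receivers is easy to reproduce numerically. This sketch (Python/NumPy; the receiver layout, satellite geometry, and function name are hypothetical) assembles the multi-receiver G of equation 3.3.7 for receivers that all see the same satellites, scales the receiver layout from zero to 100 km, and records the rank of G and the condition number of $G^TG$.

```python
import numpy as np


def multi_receiver_G(sats, X):
    """Multi-receiver observation matrix of eq. 3.3.7, all satellites visible."""
    Nr, Ns = len(X), len(sats)
    B = np.vstack([np.eye(Ns - 1), -np.ones((1, Ns - 1))])      # eq. 3.3.8
    G = np.zeros((Nr * Ns, 4 * Nr + Ns - 1))
    for i, x in enumerate(X):
        diff = sats - x
        rows = slice(i * Ns, (i + 1) * Ns)
        G[rows, 4 * i:4 * i + 3] = -diff / np.linalg.norm(diff, axis=1)[:, None]
        G[rows, 4 * i + 3] = 1.0
        G[rows, 4 * Nr:] = B
    return G


# Hypothetical geometry: six satellites about 20 000 km away, five receivers
az = np.deg2rad([0.0, 70.0, 140.0, 210.0, 280.0, 330.0])
el = np.deg2rad([60.0, 35.0, 20.0, 45.0, 15.0, 75.0])
sats = 2.0e7 * np.column_stack(
    [np.cos(el) * np.sin(az), np.cos(el) * np.cos(az), np.sin(el)])
layout = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                   [-1.0, -1.0, 0.1], [0.5, -0.7, 0.2]])   # unit layout, scaled below

n_unknowns = 4 * len(layout) + len(sats) - 1
rank_same_place = np.linalg.matrix_rank(multi_receiver_G(sats, 0.0 * layout))
cond = {}
for spread in [1e2, 1e3, 1e4, 1e5]:                          # meters
    G = multi_receiver_G(sats, spread * layout)
    cond[spread] = np.linalg.cond(G.T @ G)
```

For identical positions the rank falls below the number of unknowns $4N_r + N_s - 1$, and the condition number of $G^TG$ grows roughly with the inverse square of the spread, since the differences between the receivers' line-of-sight vectors shrink in proportion to the baseline.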


Figure 3.2: Effect of including delay estimation on receiver position estimates: gray dots show the actual receiver positions relative to the geographic center of the network, black dots show the estimated positions that minimize the objective function in 3.3.2. [Plot axes: east distance from reference point (m), −6000 to 6000; north distance from reference point (m), −7500 to 7500.]

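Another way to look at this issue is to consider the Taylor expansion of the multi-receiver problem near a point $y_0$. Let $y^{(k)}$ be the k-th entry of y; then

$$f(y) \approx f(y_0) + \sum_{j=1}^{N_s}\sum_{i=1}^{N_r}\sum_{k=1}^{4N_r+N_s-1}\eta_{i,j}\left(\|s_j - x_{i,0}\| + b_{i,0} + \varepsilon_{j,0} - \rho_{i,j}\right)\left.\frac{\partial}{\partial y^{(k)}}\left(\|s_j - x_i\| + b_i + \varepsilon_j - \rho_{i,j}\right)\right|_{y_0}\left(y^{(k)} - y_0^{(k)}\right) \tag{3.3.13}$$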

The constant term is not affected by any changes in the estimated position. If we take $y_0$ to be the point where the estimated pseudoranges $\|s_j - x_i\| + b_i + \varepsilon_j$ are equal to the measured pseudoranges $\rho_{i,j}$, then every residual factor in the linear term is zero, so the linear term also vanishes regardless of the change in the estimated position. The quadratic term in this case would be nonzero, but relatively small for realistic values of y.

Figure 3.2 demonstrates what can happen to the final position estimate as a result of the issues described above: the estimated positions of the receivers appear to be shifted significantly, as a group, from their true positions. The estimated delays for this example are very large. As the receivers are moved toward one of the satellites, the corresponding estimated delay is shortened, resulting in very little overall change in the objective function. Without noise, these shifts would not occur, but even small amounts of noise can cause shifts of the estimated position to produce minor reductions in the objective function value.

Several approaches can be taken to mitigate this problem. Since the shifts in the estimated position are related to large estimated pseudorange delays, it helps to penalize large delays in the objective function. This is described in the following section.

3.3.3 Regularized delay estimation

If we take the single-receiver positioning solution for each receiver, evaluate the multi-receiver objective function (equation 3.3.1) at those points, and compare that to the value of the objective function at the true positions of the receivers, the difference is small but significant for scenarios with typical noise. Comparing the value at the true positions to the value at the solution from the multi-receiver positioning algorithm described above, the difference in the objective function is very small, even though the differences in the receiver positions and correlated delays are large. Regularized least-squares methods can provide a solution in which large values of ε are penalized, so that the magnitude of ε can only be significantly larger than zero if this provides a significant improvement in the objective function. Regularized least squares therefore prevents the position solution for the entire network from shifting as seen in figure 3.2.

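With the penalty on ε added with coefficient µ, the new objective function is:

$$f_{reg}(y) = \frac{1}{2}\sum_{i=1}^{N_r}\sum_{j=1}^{N_s}\eta_{i,j}\left(\|s_j - x_i\| + b_i + \varepsilon_j - \rho_{i,j}\right)^2 + \mu\|\varepsilon\|^2 \tag{3.3.14}$$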

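For finding the regularized least-squares solution, it helps to express ε as a linear function of y:

$$\|\varepsilon\| = \left\|\begin{bmatrix} 0 & B \end{bmatrix} y\right\| = \|F y\| \tag{3.3.15}$$

where the zero block has $4N_r$ columns and $B \in \mathbb{R}^{N_s \times (N_s-1)}$ maps the $N_s - 1$ delays stored in y to the full delay vector, with $\varepsilon_{N_s}$ given by equation 3.3.5. B therefore has the form of the matrix $B_i$ in equation 3.3.8 for a receiver that sees all satellites:

$$B = \begin{bmatrix} I_{N_s-1} \\ -\mathbf{1}^T \end{bmatrix} \tag{3.3.16}$$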

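The update equation for the Gauss-Newton method, obtained by minimizing the linearization of equation 3.3.14 with respect to δy, is

$$\delta y(t) = -\left(G(t)^T G(t) + 2\mu F^T F\right)^{-1}\left(G(t)^T\delta\rho(t) + 2\mu F^T F\,y(t)\right) \tag{3.3.17}$$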
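Because F in equation 3.3.15 is constant, each step of equation 3.3.17 is an ordinary linear solve. The sketch below (Python/NumPy; the function name is ours) implements one step and applies it to a hypothetical linear stand-in for the linearized problem (random G, residual δρ = Gy − ρ), for which a single step lands exactly on the regularized least-squares solution. Increasing µ shrinks the norm ‖Fy‖ of the estimated delays.

```python
import numpy as np


def regularized_gn_step(G, F, y, drho, mu):
    """One Gauss-Newton step for the regularized objective, eq. 3.3.17."""
    A = G.T @ G + 2.0 * mu * (F.T @ F)
    rhs = G.T @ drho + 2.0 * mu * (F.T @ (F @ y))
    return y - np.linalg.solve(A, rhs)


# Hypothetical linear stand-in: 12 residuals, 3 "position" unknowns, Ns = 3 delays
rng = np.random.default_rng(2)
G = rng.normal(size=(12, 5))
Ns = 3
B = np.vstack([np.eye(Ns - 1), -np.ones((1, Ns - 1))])   # eq. 3.3.16
F = np.hstack([np.zeros((Ns, 3)), B])                    # eq. 3.3.15
rho = rng.normal(size=12)
y0 = np.zeros(5)

solutions = {}
for mu in [0.0, 1.0, 100.0]:
    solutions[mu] = regularized_gn_step(G, F, y0, G @ y0 - rho, mu)
```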

3.4 Distributed delay estimation

The method described in the previous section works if the number of nodes is small

and the computation is performed in a centralized way. If the number of nodes is

large, it becomes difficult to find the pseudoinverse of G. We would like to perform

the computations in a decentralized way, where each receiver has access to limited

information about the other receivers, and where the amount of computation to be

performed by each receiver is small, so the method described above is not suitable for

our purposes. Any method for finding the positions of the receivers in a decentralized

way should yield the same solution as the centralized method described above.

The previous section explains how the optimization problem can be solved by using

a least-squares method which varies all components of y concurrently to minimize

f(y). To be able to solve the problem in a decentralized way, each step in the

optimization process can instead be broken down into two parts: First, new values

of the position estimates x and the clock bias estimates b are found that reduce the

value of the objective function f(y) while keeping the delays ε constant, and then

the minimizing values of ε are found while keeping the estimated receiver positions

constant.

The first part of the time step is then equivalent to finding the position solution for each receiver independently, and can be completed using any of the single-receiver methods described above. The only modification necessary is that the previous estimate of the delays needs to be included, so that equation 3.2.5 becomes

$$\delta\rho_j(t) = \|s_j - x(t)\| + b(t) + \varepsilon_j(t) - \rho_j \tag{3.4.1}$$

Minimizing the norm of the pseudorange residuals with respect to the delays while keeping the receiver position estimates constant is relatively simple. The least-squares solution is equivalent to setting the delay associated with each satellite equal to the mean of the pseudorange residuals for that satellite, computed without including the previous delay estimate. To show that this is indeed the minimum, we take the derivative of equation 3.3.2 with respect to $\varepsilon_j$ and set it to zero:

$$\frac{\partial f(y)}{\partial\varepsilon_j} = \sum_{i=1}^{N_r}\eta_{i,j}\left(\|s_j - x_i\| + b_i + \varepsilon_j - \rho_{i,j}\right) = 0 \tag{3.4.2}$$

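This equation is then solved for $\varepsilon_j$ to find the following update equation for $\varepsilon_j$:

$$\varepsilon_j(t) = \frac{1}{\sum_{i=1}^{N_r}\eta_{i,j}}\sum_{i=1}^{N_r}\eta_{i,j}\left(\rho_{i,j} - \|s_j - x_i(t)\| - b_i(t)\right) \tag{3.4.3}$$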

While this equation for updating the delay estimates requires coordination and sharing of information between nodes, the only distributed operations that are necessary are sums over the receivers (of the masked residuals and of the visibility indicators $\eta_{i,j}$), which can be computed using a consensus method. This is fairly straightforward, and a variety of distributed consensus methods are available and could be used for the implementation, including the multigrid method described in chapter 2. Given such a consensus method, there are two ways in which it could be integrated here. The first possibility would be to run the consensus method until some convergence criterion is reached every time ε is updated. The second would be to perform only a fixed number of steps of the consensus process each time ε is updated, continuing the process from where it left off and using the new values of x and b at each time step to update the node variables that serve as inputs to the consensus process. This requires that the consensus algorithm can handle dynamic updates of the initial node values, but it also has a few advantages: it is likely to lead to faster overall convergence, since it does not require completing a full consensus process at each update of x and b, and it requires less coordination between nodes, since convergence does not have to be detected at each time step and the sequence of operations is predetermined.
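The multigrid consensus method of chapter 2 is not reproduced here. As a simpler stand-in, the sketch below (Python/NumPy) uses Metropolis-Hastings averaging weights, a standard choice for distributed averaging on an undirected graph, to compute at every node the two sums in equation 3.4.3: node i contributes the vectors $\eta_{i,\cdot}\,r_{i,\cdot}$ and $\eta_{i,\cdot}$, where $r_{i,j}$ stands for the residual $\rho_{i,j} - \|s_j - x_i\| - b_i$, and the ratio of the two network averages equals $\varepsilon_j$. The `inject` function illustrates the second integration option described above: when a receiver's residuals change, it adds the change to its own state, which keeps the network average equal to the average of the current inputs. This is one common technique and not necessarily the adjustment method of section 2.8; the ring topology, the random residuals, and all function names are hypothetical. For µ > 0 (equation 3.4.5) each node would additionally need to know $N_r$, because the 2µ term does not scale with the network size.

```python
import numpy as np


def metropolis_weights(adj):
    """Metropolis-Hastings weights for an undirected graph.

    adj: (n, n) symmetric boolean adjacency matrix without self-loops.
    The result is symmetric, and its rows and columns sum to one.
    """
    adj = np.asarray(adj, dtype=bool)
    deg = adj.sum(axis=1)
    n = len(adj)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W


def consensus(W, state, steps):
    """Run `steps` rounds of x <- W x; each node only mixes with its neighbours."""
    for _ in range(steps):
        state = W @ state
    return state


def inject(state, old_inputs, new_inputs):
    """Add each node's input change; W preserves the sum of the states."""
    return state + (new_inputs - old_inputs)


def delay_estimates(state, Ns):
    """Local estimate of eq. 3.4.3 at every node: ratio of the averaged sums."""
    return state[:, :Ns] / state[:, Ns:]


n, Ns = 6, 4
adj = np.zeros((n, n), dtype=bool)
for i in range(n):                                   # hypothetical ring topology
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = True
W = metropolis_weights(adj)

rng = np.random.default_rng(4)
eta = np.ones((n, Ns))
resid = rng.normal(0.0, 5.0, (n, Ns))                # stand-in for r_ij
inputs = np.hstack([eta * resid, eta])
state = consensus(W, inputs, 20)                     # only a few steps

# Receivers update x and b, so their residuals change; inject and continue
resid_new = resid + rng.normal(0.0, 0.5, (n, Ns))
inputs_new = np.hstack([eta * resid_new, eta])
state = consensus(W, inject(state, inputs, inputs_new), 200)
eps_local = delay_estimates(state, Ns)
eps_central = (eta * resid_new).sum(axis=0) / eta.sum(axis=0)
```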

The two steps of minimizing with respect to the estimated receiver positions and with respect to the delays can be alternated until the solution converges. If the receiver positions are completely unknown, it can be helpful to perform a few iterations on the receiver positions only, until the solution gets reasonably close to the true position, since the delays are typically small. This is particularly useful for methods that converge slowly, such as gradient descent, but does not yield significant improvements in convergence time if the Gauss-Newton method is used. Instead of alternating the two steps, it is also possible to perform multiple iterations of the position update before each ε update. The results presented below, however, use alternating single iterations of each step.
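The alternating scheme can be simulated in a single process by replacing the consensus sums with exact sums. The sketch below (Python/NumPy; all function names and the scenario are hypothetical) performs a few position-only Gauss-Newton iterations with ε = 0, then alternates one delay update (equation 3.4.5, with µ = 0.1 as in section 3.4.2) with one Gauss-Newton position step per receiver (equation 3.4.1), recording the objective after each round.

```python
import numpy as np


def position_step(sats, rho_i, eta_i, y_i, eps):
    """One Gauss-Newton step for one receiver with the delays held fixed (eq. 3.4.1)."""
    vis = np.flatnonzero(eta_i)
    diff = sats[vis] - y_i[:3]
    dist = np.linalg.norm(diff, axis=1)
    drho = dist + y_i[3] + eps[vis] - rho_i[vis]
    G = np.hstack([-diff / dist[:, None], np.ones((len(vis), 1))])
    return y_i + np.linalg.lstsq(G, -drho, rcond=None)[0]


def delay_step(sats, rho, eta, Y, mu=0.0):
    """Delay update of eq. 3.4.5; in a network the two sums come from consensus."""
    dist = np.linalg.norm(sats[None] - Y[:, None, :3], axis=2)
    resid = np.where(eta, rho - dist - Y[:, 3:4], 0.0)
    return resid.sum(axis=0) / (eta.sum(axis=0) + 2.0 * mu)


def objective(sats, rho, eta, Y, eps):
    """Unregularized objective of eq. 3.3.2."""
    dist = np.linalg.norm(sats[None] - Y[:, None, :3], axis=2)
    r = np.where(eta, dist + Y[:, 3:4] + eps - rho, 0.0)
    return 0.5 * np.sum(r ** 2)


def alternate(sats, rho, eta, Y0, warmup=3, rounds=20, mu=0.0):
    """Position-only warm-up, then alternating single delay and position updates."""
    Y, eps = np.array(Y0, dtype=float), np.zeros(sats.shape[0])
    for _ in range(warmup):
        Y = np.array([position_step(sats, rho[i], eta[i], Y[i], eps) for i in range(len(Y))])
    history = [objective(sats, rho, eta, Y, eps)]
    for _ in range(rounds):
        eps = delay_step(sats, rho, eta, Y, mu)
        Y = np.array([position_step(sats, rho[i], eta[i], Y[i], eps) for i in range(len(Y))])
        history.append(objective(sats, rho, eta, Y, eps))
    return Y, eps, history


# Hypothetical scenario: 20 receivers within about 10 km, noise-free pseudoranges
az = np.deg2rad([0.0, 70.0, 140.0, 210.0, 280.0, 330.0])
el = np.deg2rad([60.0, 35.0, 20.0, 45.0, 15.0, 75.0])
sats = 2.0e7 * np.column_stack(
    [np.cos(el) * np.sin(az), np.cos(el) * np.cos(az), np.sin(el)])
rng = np.random.default_rng(6)
Nr = 20
X_true = rng.uniform([-5e3, -5e3, 0.0], [5e3, 5e3, 100.0], size=(Nr, 3))
b_true = rng.uniform(-300.0, 300.0, Nr)
eps_true = rng.normal(0.0, 5.0, 6)
rho = np.linalg.norm(sats[None] - X_true[:, None], axis=2) + b_true[:, None] + eps_true
eta = np.ones((Nr, 6), dtype=bool)
Y_hat, eps_hat, hist = alternate(sats, rho, eta, np.zeros((Nr, 4)), mu=0.1)
```

In this noise-free example the first delay update is expected to remove most of the common residual, so the recorded objective drops sharply; as discussed in section 3.3.2, however, a low objective value by itself does not guarantee accurate receiver positions.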

3.4.1 Regularized distributed delay estimation

For distributed delay estimation, the same accuracy problems arise as for the centralized method described above. Small changes in the random errors can still cause large shifts in the position solutions, and therefore regularized least-squares methods are still useful for keeping the values of ε low.

Just as in the centralized case, we minimize the regularized least-squares objective function (equation 3.3.14). In the two-step process of the distributed optimization method, the first step of minimizing with respect to the positions and clock biases does not change when the regularization term is added, since the extra term does not contain those variables. For the minimization with respect to ε, we find the partial derivative of the new objective function with respect to $\varepsilon_j$ and set it to zero:

$$\frac{\partial f_{reg}}{\partial\varepsilon_j} = \sum_{i=1}^{N_r}\eta_{i,j}\left(\|s_j - x_i\| + b_i + \varepsilon_j - \rho_{i,j}\right) + 2\mu\varepsilon_j = 0 \tag{3.4.4}$$

$$\Rightarrow\quad \varepsilon_j(t) = \frac{\sum_{i=1}^{N_r}\eta_{i,j}\left(\rho_{i,j} - \|s_j - x_i(t)\| - b_i(t)\right)}{\sum_{i=1}^{N_r}\eta_{i,j} + 2\mu} \tag{3.4.5}$$
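Both delay updates reduce to per-satellite sums over receivers. The following sketch (Python/NumPy; the function name and scenario are hypothetical, with exact receiver positions and clock biases) implements equation 3.4.5, which reduces to equation 3.4.3 for µ = 0. With consistent pseudoranges, µ = 0 recovers the true delays, while µ > 0 shrinks each delay by the factor $n_j/(n_j + 2\mu)$, where $n_j = \sum_i \eta_{i,j}$ is the number of receivers that see satellite j.

```python
import numpy as np


def update_delays(sats, X, b, rho, eta, mu=0.0):
    """Per-satellite delay update of eq. 3.4.5 (eq. 3.4.3 when mu = 0).

    With mu = 0, a satellite seen by no receiver gives 0/0 (NaN, with a warning).
    """
    dist = np.linalg.norm(sats[None, :, :] - X[:, None, :], axis=2)   # ||s_j - x_i||
    eta = np.asarray(eta, dtype=bool)
    resid = np.where(eta, rho - dist - b[:, None], 0.0)   # rho_ij - ||s_j - x_i|| - b_i
    return resid.sum(axis=0) / (eta.sum(axis=0) + 2.0 * mu)


# Hypothetical scenario: 10 receivers, 5 satellites, random visibility pattern
rng = np.random.default_rng(5)
sats = rng.normal(0.0, 2e7, (5, 3))
X = rng.uniform(-5e3, 5e3, (10, 3))
b = rng.uniform(-300.0, 300.0, 10)
eps_true = rng.normal(0.0, 5.0, 5)
eta = rng.random((10, 5)) < 0.8
eta[0] = True                                  # every satellite is seen at least once
rho = np.linalg.norm(sats[None] - X[:, None], axis=2) + b[:, None] + eps_true

eps_ls = update_delays(sats, X, b, rho, eta)              # eq. 3.4.3
eps_reg = update_delays(sats, X, b, rho, eta, mu=0.1)     # eq. 3.4.5
```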


3.4.2 Comparison of the different methods

The first part of each time step is equivalent to finding the single-receiver position solution for each receiver individually, and any of the three methods described in section 3.2 can be used for that step. We previously determined that if we do not attempt to estimate the delays, the Gauss-Newton method converges fastest. Figures 3.4 and 3.5 show that this is also true with delay estimation. Simulations were run for networks of 50 and 500 receivers, using the same receiver locations and pseudorange errors that were used for the simulation without delay estimation. The regularization method described in section 3.4.1 was implemented with µ = 0.1. For both network sizes, the Gauss-Newton method performed best, while the convergence of the gradient descent method was very slow. Based on these results, we use the Gauss-Newton method for all further simulations presented below.

The figures show the value of the objective function divided by the number of

receivers as a function of time. The final value of f(y) after convergence is lower

for the simulations that included delay estimation. The objective function value per

receiver for the 500 receiver network with delay estimation is lower than for the 50

receiver network, although the difference is small.

3.5 Performance Comparison

Figure 3.6 shows what kind of improvements in positioning accuracy would be achieved

if the correlated delays could be determined exactly. The figure shows the mean po-

sitioning error for a receiver located in San Francisco over the course of a day as

a function of the number of satellites in view, using both clear-sky cases and cases

where some of the satellites above the horizon were blocked out. Errors are plotted

both for regular pseudoranges that are influenced by all of the errors mentioned in

section 3.1, as well as for pseudorange data that did not include any of the correlated

delays.

As could be expected, the positioning errors are dramatically reduced if the correlated delays are removed. The results show that in the absence of delay estimation or some other augmentation method, positioning errors can be fairly large if satellite visibility is poor. For four visible satellites, the mean error is larger than 75 meters, which poses a challenge for many common applications of GPS. For those types of application, augmentation can be a necessity.

Figure 3.3: Convergence of total objective function divided by the number of receivers for 500 receivers without delay estimation. [Plot of objective function value versus time step, both axes logarithmic.]

Figure 3.4: Convergence of total objective function divided by the number of receivers for 50 receivers with delay estimation. [Plot of objective function value versus time step, both axes logarithmic.]

Figure 3.5: Convergence of total objective function divided by the number of receivers for 500 receivers with delay estimation. [Plot of objective function value versus time step, both axes logarithmic.]

Figure 3.6: Mean positioning error as a function of the number of satellites from which signals are received, for pseudorange errors as described in table 3.1. Random errors include receiver noise and multipath. [Plot of mean position error (m) versus number of satellites visible, 4 to 13; curves: with correlated errors, random errors only.]

Figure 3.7: Mean objective function value per receiver as a function of the number of receivers, with and without delay estimation, and for a hypothetical case where correlated delays are set to zero. [Plot of f(y)/N_r (m²) versus number of receivers, 1 to 1000; curves: without delay estimation, with delay estimation, without delays.]

Figure 3.7 shows the average objective function value per receiver for three cases: regular single-receiver point positioning as described in section 3.2 with all errors included, single-receiver point positioning for pseudorange measurements with the correlated delays removed, and multi-receiver point positioning with delay estimation as described in section 3.3. The results indicate that multi-receiver positioning with delay estimation is very effective in reducing the value of the objective function, lowering it almost to the values that would be achieved in the absence of correlated delays.

Figure 3.8 shows the accuracy improvement that is achieved by using delay estimation for networks with various numbers of receivers. The accuracy improvement factor in the figure is the ratio between the mean positioning error without and with delay estimation. The results show that for a network of receivers spread across the area described above, the positioning error can be reduced by a factor of two or more if the network consists of 30 or more receivers. Increasing the network size beyond 30 receivers does not yield significant further improvements in positioning accuracy. The reason for this lies in the conditioning problems described in section 3.3.2, which are particularly severe for networks with short receiver baselines. If the receivers are spread out more, especially in altitude, a much larger accuracy improvement can be achieved, and larger network sizes are required to get the full benefit. Figure 3.9 shows the accuracy improvement if the receivers are spread out over 10 000 m in altitude, 40 minutes in latitude, and 80 minutes in longitude. Unfortunately, this kind of geometry is not feasible for most geographic areas without the use of airborne receivers. As mentioned before, spreading the receivers too much will result in reduced correlation between the errors experienced by the different receivers, which is not modeled in the simulations presented here. Note that for very small networks, the accuracy improvement of the more spread-out network is smaller than for the original network, because the receivers are less likely to have the same satellites in view.

Figure 3.8: Ratio of total position error for algorithm without delay estimation to error with delay estimation as a function of the number of receivers for a network covering a space of 7.3 km × 11.7 km × 100 m. Each data point represents the mean position error per receiver for 100 trials with different noise and satellite visibility. Error bars represent one standard deviation. [Plot of accuracy improvement factor, 0 to 5, versus number of receivers, 5 to 500.]

Figure 3.9: Ratio of total position error for algorithm without delay estimation to error with delay estimation as a function of the number of receivers for a network covering a space of 73 km × 117 km × 10 000 m. [Plot of accuracy improvement factor, 0 to 5, versus number of receivers, 5 to 500.]

Figure 3.10: Ratio of total position error for algorithm without delay estimation to error with delay estimation as a function of the number of receivers for a network covering a space of 7.3 km × 11.7 km × 100 m with random and multipath errors of 20 m. [Plot of accuracy improvement factor, 0 to 5, versus number of receivers, 5 to 500.]

One potential remedy for the decreased correlation of pseudorange errors between receivers in a spread-out network would be to model the correlated delays not as single values, but as functions of geographic location with a number of parameters, and to use an extended version of the methods described in the preceding sections to estimate these parameters.

Figures 3.10 and 3.11 are equivalent to figures 3.8 and 3.9, but with random and multipath errors of 20 m. The plots show no significant change in positioning accuracy when delay estimation is used. This indicates that, for the network sizes studied here, delay estimation is ineffective when multipath errors are very large, although it also does not degrade point positioning accuracy.


[Figure 3.11 plot: accuracy improvement factor (0 to 5) versus number of receivers (5 to 500, log scale)]

Figure 3.11: Ratio of total position error for algorithm without delay estimation to error with delay estimation as a function of the number of receivers for a network covering a space of 73 km × 117 km × 10 000 m with random and multipath errors of 20 m.


3.6 Multigrid methods for distributed delay estimation

The method for point positioning using distributed networks described above can be

implemented using a variety of different consensus algorithms for finding the delay

estimates. For small networks, a Metropolis algorithm [45] can be used, while the

algorithm described in chapter 2 would be appropriate for large networks. The method

for adjusting node measurement values described in section 2.8 is useful for updating

the residual values for each node after x and b are updated, so the consensus process

does not have to be run to complete convergence at each time step. To demonstrate

this implementation, this section describes the results from a numerical simulation of

point positioning in such a multigrid network.

For this simulation, we used the same network structure as for the two-dimensional

example in chapter 2. The x- and y-coordinates of the nodes in the network were

scaled so that the network would cover the 7.3 km × 11.7 km area around San Francisco described above. The resulting network, including the latitudes and longitudes

of the nodes, is shown in figure 3.12. In addition, random values of altitude between

0 and 100 m were assigned to the nodes.

Figures 3.13 and 3.14 show the convergence of the positioning error and objective

function for this simulation, comparing the multigrid algorithm with a solution each

receiver obtained individually without delay estimation. For the multigrid simulation,

each time step consists of one iteration with respect to x and b, followed by one

iteration of optimizing with respect to ε with eight steps in the consensus process.

The use of delay estimation reduces the positioning error by 53% in this case, which is consistent with the results shown in figure 3.8.

[Figure 3.12 plot: node locations, latitude 37.72° to 37.78° versus longitude −121.65° to −121.5°]

Figure 3.12: Example network layout (connections between different levels not shown).

[Figure 3.13 plot: total position error norm (m, log scale, 10^1 to 10^7) versus time step (1 to 20, log scale); curves: multigrid method, single receiver]

Figure 3.13: Positioning error convergence for the receiver network example.

[Figure 3.14 plot: mean objective function value (m², log scale, 10^0 to 10^15) versus time step (1 to 20, log scale); curves: multigrid method, single receiver]

Figure 3.14: Objective function value for the receiver network example.

3.7 Conclusion

In this chapter we described a method that estimates the portion of GPS pseudorange measurement errors that is experienced by all receivers located in a specific area, including errors due to inaccurate ephemerides, as well as tropospheric and ionospheric delays. The resulting estimates of correlated delays are used to improve the point positioning accuracy of the receivers. We found that spreading the receivers far apart produces a receiver network geometry with a better-conditioned solution, although it also reduces the correlation between the errors experienced by different receivers. Because of these competing effects, the achievable accuracy improvements are smaller than those possible with augmentation methods that use a network of fixed reference receivers. Even so, our simulations show that distributed augmentation can still reduce positioning errors significantly. Since the algorithms described here do not rely on reference receivers, they could be particularly useful in places where no augmentation networks exist, which includes most places on the Earth outside North America and Europe.

We also showed that the correlated delays can be estimated using either centralized

or distributed computation. The multigrid algorithms described in chapter 2 can be

applied here, making distributed estimation feasible even for very large networks of

receivers.

The methods described here model the pseudorange correlated delays as a single variable, assuming that the receivers are located close enough together to experience the same delays. For future work, it would be interesting to combine the algorithm presented here with regression analysis methods to estimate the pseudorange delays as a function of geographic location. This would make it more reasonable to include receivers that are located farther apart, and it might make it possible to reduce errors further by including more receivers in the network, resulting in relatively large networks that could take full advantage of the multigrid methods.

Chapter 4

Spectral methods for distributed networks


CHAPTER 4. DISTRIBUTED SPECTRAL METHODS 66

4.1 Introduction and Assumptions

In chapter 2 we used the spectral gap of the state transition matrix as a measure of

performance of a distributed network for the purpose of running consensus algorithms.

We also pointed out that the eigenvectors corresponding to the largest eigenvalues of

the state transition matrix are an indicator for the types of noise that result in slow

convergence. In this chapter we describe how the distributed network itself can be

used to find some spectral properties of the state transition matrix, focusing on the

largest eigenvalues and corresponding eigenvectors.

For the types of state transition matrices described in chapter 2, the largest eigenvalue is always one, and the corresponding eigenvector is equivalent to the invariant distribution of the system. We furthermore assume that all eigenvalues are real, unique, and well separated. It is possible to construct networks with relatively symmetric and regular topologies that have repeated eigenvalues, but we will not consider these networks here.

There is of course a large variety of eigenvalue methods that are commonly used

for large systems, including the Jacobi method, Lanczos’ method [21], Davidson’s

method [10], Krylov subspace methods, and combinations and variations thereof, such

as in [37]. Most of these algorithms are difficult to adapt for distributed networks,

since they rely on matrix factorizations. Many methods also only work for symmetric

matrices.

Kempe and McSherry developed an algorithm for finding the spectral properties of a distributed network [18], and we build on that algorithm in this chapter. However, this method is based on the assumption that the state transition matrix is symmetric, which is a valid assumption for many networks, such as those using Metropolis weights [45] with a uniform invariant distribution. The multiscale methods we presented in chapter 2, however, result in nonsymmetric state transition matrices. In the following sections, we describe how the method in [18] can be extended to nonsymmetric matrices. We also describe another algorithm, based on the power method, that can be used as an alternative. In section 4.6 we study the performance of the two algorithms with a numerical example. The power method is used in [5] and [46] to find the largest and second-largest eigenvalues and eigenvectors of a symmetric matrix with distributed computation; [46] then uses this result to control connectivity in a network of robots.

In addition to monitoring the worst-case convergence rate of the network, knowledge of the spectral properties can also be used to provide some guidance for supernode placements in multigrid networks. As shown in [8], [9], and [27], given the spectral properties of the state transition matrix, a diffusion map of the network can be created. In section 4.7 we use a map of the largest eigenvectors of a network as a guide for placing supernodes, and present a heuristic for fast selection of good placements.

4.2 Spectral methods for symmetric matrices

Many of the most efficient spectral methods that are in common use today are difficult

to adapt for distributed computation, since they require operations such as matrix

factorization that cannot easily be broken down into small parts that can be performed

by individual nodes in a network. The simplest types of operations that can be

performed by a distributed network are consensus computations and multiplication

of a vector by the state transition matrix. We will therefore only consider spectral

methods that can be reduced to a series of operations that are either of these two

types, or that are simple enough for nodes to perform individually without knowing

the states of other nodes. In this section, we present two methods that can be adapted

for distributed systems. Since the fully distributed versions of these algorithms are

rather complex, we first present their centralized forms. Section 4.3 describes which

modifications should be made for applications in distributed networks.

One method that can be used in distributed networks is the power method. In its simplest form, it can be used to find the eigenvalue with the largest modulus and its corresponding eigenvector. This is done by repeatedly multiplying a random vector q by a matrix P and normalizing the result. The power method can therefore be expressed as iterating over the following two equations, where $q(t) \to v_1$ as $t \to \infty$.

$$q(t+1) = P\,q(t) \tag{4.2.1}$$

$$q(t) \leftarrow \frac{q(t)}{\lVert q(t) \rVert} \tag{4.2.2}$$

If the largest eigenvalue is either very large or close to zero, the normalization should

be done at every time step to keep the size of the vector q(t) within a reasonable

range. For eigenvalues close to one, it can be sufficient to normalize once every few

time steps.
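The following Python sketch illustrates equations 4.2.1 and 4.2.2 on a small example. The matrix P is a hypothetical stand-in for a state transition matrix: a four-node path in which each node keeps half of its value and splits the other half evenly among its neighbors. As a result, the columns of P sum to one and the largest eigenvalue is 1. The function names are ours, and the eigenvalue estimate $q^T P q$ is one simple choice among several.

```python
import math

def matvec(P, x):
    """Multiply the matrix P (a list of rows) by the vector x."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]

def normalize(x):
    nrm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / nrm for xi in x]

def power_method(P, q0, steps=200, normalize_every=1):
    """Power method (eqs. 4.2.1-4.2.2): multiply by P, renormalize every
    `normalize_every` steps; return (eigenvalue estimate, unit vector)."""
    q = normalize(q0)
    for t in range(1, steps + 1):
        q = matvec(P, q)                      # eq. 4.2.1
        if t % normalize_every == 0:
            q = normalize(q)                  # eq. 4.2.2
    q = normalize(q)
    lam = sum(a * b for a, b in zip(q, matvec(P, q)))   # q^T P q for a unit q
    return lam, q

# Hypothetical column-stochastic P for the path 0-1-2-3.
P = [[0.50, 0.25, 0.00, 0.00],
     [0.50, 0.50, 0.25, 0.00],
     [0.00, 0.25, 0.50, 0.50],
     [0.00, 0.00, 0.25, 0.50]]

lam, q = power_method(P, [1.0, 0.0, 0.0, 0.0])
lam5, q5 = power_method(P, [1.0, 0.0, 0.0, 0.0], normalize_every=5)
```

For this P the iteration converges to $(1, 2, 2, 1)/\sqrt{10}$ with eigenvalue estimate 1. Because λ1 = 1 here, normalizing only every fifth step gives the same result. This matches the remark above that occasional normalization suffices when the largest eigenvalue is close to one.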

For the type of distributed system we are interested in here, the power method in this simple form is not very useful, since the largest eigenvalue is already known to be 1, and the corresponding eigenvector is equal to the invariant distribution.

A shifted version of the power method can be used to find additional eigenvalues and eigenvectors. Once the first eigenvalue is known, the power method can be run using the shifted matrix $(P - \lambda_1 v_1 v_1^T)$, where $v_1$ is the unit eigenvector associated with the largest eigenvalue. Because P is symmetric, its eigenvectors are orthogonal, so this shift sets the first eigenvalue to zero and leaves the others unchanged. Thus, the second eigenvalue and associated eigenvector can be found, which can then be used to find additional eigenvectors and eigenvalues. Instead of a single vector q, we start the algorithm with a matrix $Q \in \mathbb{R}^{N \times n}$, where N is the number of nodes (or the size of P), and n is the number of eigenvalues we want to find. We let $q_j$ be the j-th column of Q. Using the following equations, $q_j(t) \to v_j$ as $t \to \infty$ [35].

$$q_j(t+1) = \Big(P - \sum_{k=1}^{j-1} \lambda_k v_k(t) v_k^T(t)\Big) q_j(t) \tag{4.2.3}$$

$$q_j(t) \leftarrow \frac{q_j(t)}{\lVert q_j(t) \rVert} \tag{4.2.4}$$
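A minimal Python sketch of equations 4.2.3 and 4.2.4 follows. It assumes a symmetric P, here a hypothetical Metropolis-weight matrix for a four-node path whose eigenvalues are 1, (1+√2)/3, 1/3 and (1−√2)/3. Each $q_j$ starts at a unit coordinate vector. The current estimates $q_k$ and $q_k^T P q_k$ stand in for $v_k$ and $\lambda_k$. All columns are updated concurrently from the values of the previous step. This is a centralized illustration only, not the distributed implementation.

```python
import math

def matvec(P, x):
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def normalize(x):
    nrm = math.sqrt(dot(x, x))
    return [a / nrm for a in x]

def shifted_power_method(P, n, steps=300):
    """Concurrent deflated power iteration for a symmetric P (eqs. 4.2.3-4.2.4)."""
    N = len(P)
    qs = [[1.0 if r == j else 0.0 for r in range(N)] for j in range(n)]  # q_j(0) = e_j
    for _ in range(steps):
        lams = [dot(q, matvec(P, q)) for q in qs]        # current lambda_k estimates
        new_qs = []
        for j, qj in enumerate(qs):
            x = matvec(P, qj)
            for k in range(j):                           # subtract lambda_k v_k (v_k^T q_j)
                c = lams[k] * dot(qs[k], qj)
                x = [a - c * b for a, b in zip(x, qs[k])]
            new_qs.append(normalize(x))                  # eq. 4.2.4
        qs = new_qs
    return [dot(q, matvec(P, q)) for q in qs], qs

# Hypothetical symmetric P: Metropolis weights on the path 0-1-2-3.
P = [[2/3, 1/3, 0.0, 0.0],
     [1/3, 1/3, 1/3, 0.0],
     [0.0, 1/3, 1/3, 1/3],
     [0.0, 0.0, 1/3, 2/3]]

lams, qs = shifted_power_method(P, 4)
```

The deflation subtracts $\lambda_k v_k v_k^T$, so it relies on orthogonal eigenvectors. For a nonsymmetric P the columns would instead converge to an orthonormal basis, as discussed at the end of this section.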

Another simple shift can be used to find the eigenvector associated with the eigenvalue at the opposite end of the spectrum from λ1, that is, the eigenvalue farthest from λ1. In the specific case considered here, since λ1 = 1, this is the smallest eigenvalue. To find the associated eigenvector, one can simply run the power method on the matrix P − σI, whose eigenvalues are λi − σ. Any σ larger than λ1 (and some that are smaller) will work. The eigenvalue of P is recovered by adding σ back to the eigenvalue found for P − σI.
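The shift can be checked on the same hypothetical Metropolis matrix. In this sketch, σ = 2 is an arbitrary choice larger than λ1 = 1. The power method on P − σI converges to the eigenvector of the smallest eigenvalue, (1 − √2)/3. Adding σ to the Rayleigh quotient of the shifted matrix recovers that eigenvalue. The starting vector $e_1$ is assumed to have a nonzero component along the target eigenvector, which holds for this P.

```python
import math

def matvec(P, x):
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]

def smallest_eigenpair(P, sigma, steps=300):
    """Power method on P - sigma*I. With sigma > lambda_1, the dominant eigenvalue
    of the shifted matrix is lambda_min - sigma; adding sigma recovers lambda_min."""
    N = len(P)
    q = [1.0] + [0.0] * (N - 1)                      # e_1 (assumed to overlap v_min)
    for _ in range(steps):
        x = [pq - sigma * qi for pq, qi in zip(matvec(P, q), q)]   # (P - sigma I) q
        nrm = math.sqrt(sum(a * a for a in x))
        q = [a / nrm for a in x]
    mu = sum(qi * (pq - sigma * qi) for qi, pq in zip(q, matvec(P, q)))  # q^T (P - sigma I) q
    return mu + sigma, q

# Hypothetical symmetric P: Metropolis weights on the path 0-1-2-3.
P = [[2/3, 1/3, 0.0, 0.0],
     [1/3, 1/3, 1/3, 0.0],
     [0.0, 1/3, 1/3, 1/3],
     [0.0, 0.0, 1/3, 2/3]]

lam_min, v_min = smallest_eigenpair(P, sigma=2.0)
```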

An alternative to the power method is the algorithm described in [18]. The main idea behind this algorithm is similar to the power method in that a matrix is alternately multiplied with P and orthonormalized. The method differs from the power method in the way the orthonormalization is performed.

Theorem 4.2.1. Let Q(0) be a random matrix and P be the symmetric state transition matrix of a network. At each time step, a matrix W is computed:

$$W(t) = P\,Q(t-1) \tag{4.2.5}$$

Consider the QR factorization W(t) = Q(t)R(t). Let K be equal to $W^T W$, so that

$$K = W^T W = R^T Q^T Q R = R^T R \tag{4.2.6}$$

The matrix R is found by computing the Cholesky factorization of K. Its inverse is then used to compute an updated value for Q:

$$Q(t) = W(t) R^{-1}(t) \tag{4.2.7}$$

As t → ∞, the columns of Q converge to the eigenvectors of P.

Proof. A proof of this theorem is given in [18]. □
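The iteration of theorem 4.2.1 can be sketched in pure Python as follows. The helpers `cholesky_upper` and `solve_right_upper` are our own small routines, not library calls. `random.Random(seed)` provides the random starting matrix Q(0). P is again the hypothetical symmetric Metropolis matrix for a four-node path. The three columns of Q should therefore converge, up to sign, to the eigenvectors for 1, (1+√2)/3 and 1/3.

```python
import math
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def cholesky_upper(K):
    """Upper-triangular R with K = R^T R, for symmetric positive definite K."""
    n = len(K)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return transpose(low)              # K = low low^T, so R = low^T

def solve_right_upper(W, R):
    """Return X = W R^{-1} by solving x R = w for each row w (forward substitution)."""
    n = len(R)
    X = []
    for w in W:
        x = [0.0] * n
        for c in range(n):
            x[c] = (w[c] - sum(x[m] * R[m][c] for m in range(c))) / R[c][c]
        X.append(x)
    return X

def qr_cholesky_iteration(P, n, steps=300, seed=1):
    """Orthogonal iteration with Cholesky-based orthonormalization (theorem 4.2.1)."""
    rng = random.Random(seed)
    Q = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in P]   # random Q(0)
    for _ in range(steps):
        W = matmul(P, Q)               # eq. 4.2.5
        K = matmul(transpose(W), W)    # eq. 4.2.6: K = W^T W = R^T R
        R = cholesky_upper(K)
        Q = solve_right_upper(W, R)    # eq. 4.2.7: Q = W R^{-1}
    return Q

# Hypothetical symmetric P: Metropolis weights on the path 0-1-2-3.
P = [[2/3, 1/3, 0.0, 0.0],
     [1/3, 1/3, 1/3, 0.0],
     [0.0, 1/3, 1/3, 1/3],
     [0.0, 0.0, 1/3, 2/3]]

Q = qr_cholesky_iteration(P, 3)
```

Since $Q^T Q = R^{-T} W^T W R^{-1} = R^{-T} R^T R R^{-1} = I$, the columns of Q(t) are orthonormal after every step. In a distributed setting, only the small n × n matrix K has to be assembled across the network.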

While these algorithms only yield the eigenvectors for symmetric matrices P , they

can be run using a nonsymmetric P . The columns of the resulting matrix Q do not

converge to the eigenvectors in this case, but instead form an orthonormal basis,

where the first j columns span the same space as the first j eigenvectors. We take

advantage of this fact when we extend these methods to nonsymmetric matrices in

section 4.4.

4.3 Adapting spectral methods for distributed networks

As presented in the previous section, the two spectral methods are appropriate for centralized systems. This section describes how they can be adapted for distributed networks. We assume that vectors of length N are stored by the system by storing each scalar component of the vector in the corresponding node. Adding vectors and multiplying vectors by scalars can thus be performed with only node-local computations. Multiplying a vector by the transition matrix P is also possible in an efficient manner, involving only nearest-neighbor exchanges of information.

To find the mean of a vector $a \in \mathbb{R}^N$, we multiply the vector repeatedly by P. The converged value $\bar a$ is then given by the following equation, where $T_F$ is an integer that is large enough for the algorithm to converge.

$$\bar a = P^{T_F} a \tag{4.3.1}$$

As described in chapter 2, the mean of a is related to $\bar a$ by the offset factors κ. These can be found either by solving a system of linear equations or by simply running the same consensus algorithm on the vector of ones, $\mathbf{1}$:

$$\kappa = P^{T_F} \mathbf{1} \tag{4.3.2}$$

Each node i can then find the mean of the entries of a by dividing the corresponding entry of $\bar a$, $\bar a^{(i)}$, by the i-th entry of κ:

$$\frac{\mathbf{1}^T a}{N} = \frac{\bar a^{(i)}}{\kappa^{(i)}} \tag{4.3.3}$$

If we need to find the sum of the entries of a vector, we simply multiply the mean by the number of nodes N.
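The following sketch shows why the division by κ is needed. The transition matrix is hypothetical: each node keeps half of its value and sends the other half, split evenly, to its neighbors. This makes P column-stochastic with a nonuniform invariant distribution proportional to the node degrees. `consensus` simply applies P repeatedly, standing in for $T_F$ rounds of neighbor exchanges.

```python
def transition_matrix(neighbors):
    """Hypothetical column-stochastic P: node j keeps half of its value and
    sends the other half, split evenly, to its neighbors."""
    N = len(neighbors)
    P = [[0.0] * N for _ in range(N)]
    for j, nbrs in enumerate(neighbors):
        P[j][j] = 0.5
        for k in nbrs:
            P[k][j] += 0.5 / len(nbrs)
    return P

def consensus(P, x, steps):
    """Apply P `steps` times (one round of neighbor exchanges per application)."""
    for _ in range(steps):
        x = [sum(p * xi for p, xi in zip(row, x)) for row in P]
    return x

P = transition_matrix([[1], [0, 2], [1, 3], [2]])   # four-node path 0-1-2-3
N, T_F = len(P), 200
kappa = consensus(P, [1.0] * N, T_F)                 # eq. 4.3.2
a = [4.0, 0.0, 3.0, 1.0]
a_bar = consensus(P, a, T_F)                         # eq. 4.3.1
mean_at_node = [a_bar[i] / kappa[i] for i in range(N)]   # eq. 4.3.3
sum_at_node = [N * m for m in mean_at_node]
```

Here κ converges to (2/3, 4/3, 4/3, 2/3), while $\bar a$ by itself is (4/3, 8/3, 8/3, 4/3). Without the division by κ, the interior nodes would therefore report 8/3 instead of the true mean 2.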

For the power method, both equations 4.2.3 and 4.2.4 need to be modified. Equation 4.2.3 can be written as

$$q_j(t+1) = P q_j(t) - \sum_{k=1}^{j-1} \lambda_k v_k(t) \big(v_k^T(t) q_j(t)\big) \tag{4.3.4}$$

The challenge in distributing this computation is finding $v_k^T(t) q_j(t)$. To do that, we introduce a set of n matrices $G_k$, where for each k < n,

$$G_k(t) = \operatorname{diag}\big(v_k(t)\big) Q(t) \tag{4.3.5}$$

The matrix Q in the equation above contains all vectors $q_j$, each vector being a column of Q. The expression $v_k^T q_j$ is then equal to the sum of the entries in the j-th column of $G_k$. The column sums of $G_k$ can be found by running a consensus algorithm on each column of $G_k$, obtaining the column mean as in equation 4.3.3, and multiplying it by N.

The second difficulty in using the shifted power method on a distributed network arises with the normalization of $q_j$. This norm can also be computed using a consensus algorithm. Let $h_j \in \mathbb{R}^N$ be the element-wise square of $q_j$, so that for i = 1, …, N, the i-th entry of $h_j$ is

$$h_j^{(i)} = \big(q_j^{(i)}\big)^2 \tag{4.3.6}$$

After running a consensus algorithm on $h_j$, each node i can compute the norm of $q_j$ from its entry $\bar h_j^{(i)}$ of the converged vector:

$$\bar h_j = P^{T_F} h_j \tag{4.3.7}$$

$$\lVert q_j \rVert = \sqrt{\frac{\bar h_j^{(i)} N}{\kappa^{(i)}}} \tag{4.3.8}$$
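The two reductions in this section can be sketched with the same hypothetical degree-weighted path matrix. Each node forms a local product: either one entry of a column of $G_k$, or the entry $h_j^{(i)}$. A single consensus run followed by rescaling with $N/\kappa^{(i)}$ then gives every node the inner product $v_k^T q_j$ or the squared norm. The vectors are made-up example data.

```python
import math

def transition_matrix(neighbors):
    """Hypothetical column-stochastic P: node j keeps half of its value and
    sends the other half, split evenly, to its neighbors."""
    N = len(neighbors)
    P = [[0.0] * N for _ in range(N)]
    for j, nbrs in enumerate(neighbors):
        P[j][j] = 0.5
        for k in nbrs:
            P[k][j] += 0.5 / len(nbrs)
    return P

def consensus(P, x, steps):
    """Apply P `steps` times (one round of neighbor exchanges per application)."""
    for _ in range(steps):
        x = [sum(p * xi for p, xi in zip(row, x)) for row in P]
    return x

def distributed_sum(P, kappa, local, steps):
    """Every node's estimate of sum(local): N * (P^steps local)^(i) / kappa^(i)."""
    N = len(P)
    bar = consensus(P, local, steps)
    return [N * bar[i] / kappa[i] for i in range(N)]

P = transition_matrix([[1], [0, 2], [1, 3], [2]])   # four-node path 0-1-2-3
T_F = 200
kappa = consensus(P, [1.0] * len(P), T_F)

v_k = [0.5, -1.0, 2.0, 0.25]   # node i holds entry i of v_k
q_j = [1.0, 2.0, -0.5, 4.0]    # node i holds entry i of q_j

# v_k^T q_j: node i's local term is the (i, j) entry of G_k = diag(v_k) Q (eq. 4.3.5).
dot_at_node = distributed_sum(P, kappa, [v * q for v, q in zip(v_k, q_j)], T_F)
# ||q_j||: node i's local term is h_j^(i) = (q_j^(i))^2 (eqs. 4.3.6-4.3.8).
norm_at_node = [math.sqrt(s) for s in distributed_sum(P, kappa, [q * q for q in q_j], T_F)]
```

Every node ends up with $v_k^T q_j = -1.5$ and $\lVert q_j \rVert = \sqrt{21.25}$.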

For the QR-factorization method, the only challenge for running the algorithm in a distributed network is computing $W^T W$. As described in [18], this can also be found using a consensus algorithm. Let $w^{(i)}$ be the i-th row of W, written as a column vector. Each node i can then compute the n × n matrix $w^{(i)} w^{(i)T}$, and summing each entry of this matrix across all nodes yields $W^T W = \sum_{i=1}^{N} w^{(i)} w^{(i)T}$.
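The assembly of $W^T W$ can be sketched the same way. Each node forms the outer product of its own row of W. One consensus run per entry of the n × n result, rescaled by $N/\kappa^{(i)}$, then gives every node a copy of $W^T W$. The matrix W and the path network are hypothetical example data.

```python
def transition_matrix(neighbors):
    """Hypothetical column-stochastic P: node j keeps half of its value and
    sends the other half, split evenly, to its neighbors."""
    N = len(neighbors)
    P = [[0.0] * N for _ in range(N)]
    for j, nbrs in enumerate(neighbors):
        P[j][j] = 0.5
        for k in nbrs:
            P[k][j] += 0.5 / len(nbrs)
    return P

def consensus(P, x, steps):
    """Apply P `steps` times (one round of neighbor exchanges per application)."""
    for _ in range(steps):
        x = [sum(p * xi for p, xi in zip(row, x)) for row in P]
    return x

P = transition_matrix([[1], [0, 2], [1, 3], [2]])   # four-node path 0-1-2-3
N, T_F = len(P), 200
kappa = consensus(P, [1.0] * N, T_F)

W = [[1.0, 2.0],    # node i holds row i of W
     [0.0, 1.0],
     [3.0, -1.0],
     [1.0, 1.0]]
n = len(W[0])

# Node-local step: node i forms the outer product of its row, flattened row-major.
local = [[W[i][r] * W[i][c] for r in range(n) for c in range(n)] for i in range(N)]

# One consensus per entry, rescaled by N / kappa^(i), gives every node W^T W.
K_at_node = [[0.0] * (n * n) for _ in range(N)]
for e in range(n * n):
    bar = consensus(P, [local[i][e] for i in range(N)], T_F)
    for i in range(N):
        K_at_node[i][e] = N * bar[i] / kappa[i]
```

For this W, every node ends up with $W^T W$ = [[11, 0], [0, 7]], stored row-major as [11, 0, 0, 7].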

Whenever one of the algorithms mentioned here requires a consensus algorithm for a specific step, we can either run the consensus process to convergence at each iteration of the spectral method or perform only a fixed number of iterations of the consensus process. In the numerical examples we present below, we selected a fixed number of iterations for each consensus process. In addition, instead of restarting the consensus process anew at each iteration, we used the equations for updating sensor measurements presented in section 2.8. This improves convergence of the consensus processes once the spectral algorithm approaches the limit.
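The incremental update mentioned above can be sketched as follows. We assume the form used in the appendix equations, $x(t) = P^M\big(x(t-1) + a(t) - a(t-1)\big)$, with all variables initially zero. `inputs` is a made-up sequence of node values that drifts for ten steps and then settles. Because P preserves the sum of a vector, $\mathbf{1}^T x(t) = \mathbf{1}^T a(t)$ at every step, so the consensus state keeps tracking the current inputs instead of being restarted.

```python
def transition_matrix(neighbors):
    """Hypothetical column-stochastic P: node j keeps half of its value and
    sends the other half, split evenly, to its neighbors."""
    N = len(neighbors)
    P = [[0.0] * N for _ in range(N)]
    for j, nbrs in enumerate(neighbors):
        P[j][j] = 0.5
        for k in nbrs:
            P[k][j] += 0.5 / len(nbrs)
    return P

def consensus(P, x, steps):
    """Apply P `steps` times (one round of neighbor exchanges per application)."""
    for _ in range(steps):
        x = [sum(p * xi for p, xi in zip(row, x)) for row in P]
    return x

def tracking_consensus(P, inputs, M):
    """x(t) = P^M (x(t-1) + a(t) - a(t-1)), starting from x = a = 0.
    Returns the tracked state after each time step."""
    N = len(P)
    x, a_prev, history = [0.0] * N, [0.0] * N, []
    for a in inputs:
        x = consensus(P, [xi + ai - pi for xi, ai, pi in zip(x, a, a_prev)], M)
        a_prev = a
        history.append(x)
    return history

P = transition_matrix([[1], [0, 2], [1, 3], [2]])   # four-node path 0-1-2-3
N = len(P)
kappa = consensus(P, [1.0] * N, 200)

# Made-up node values: node 3 drifts for ten steps, then all values settle at [4, 0, 3, 1].
inputs = [[4.0, 0.0, 3.0, 1.0 + 0.5 * max(0, 10 - t)] for t in range(60)]
history = tracking_consensus(P, inputs, M=5)
mean_at_node = [history[-1][i] / kappa[i] for i in range(N)]
```

After the inputs stop changing, $x^{(i)}(t)/\kappa^{(i)}$ converges at every node to the mean of the latest inputs, which is 2 here.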

4.4 Adapting spectral methods for nonsymmetric matrices

As described above, several methods for finding the eigenvectors of a symmetric matrix can be used to find an orthonormal basis for the eigenvectors of a nonsymmetric matrix. This section describes how the actual eigenvectors can be found once such an orthonormal basis is known. The method below works for nonsymmetric matrices, but we assume here that all eigenvalues are unique and well separated.

From one of the methods described above we get a matrix Q, where any column

qj of Q is a linear combination of the first j eigenvectors (sorted by magnitude of

the corresponding eigenvalue). For each qj we can then use a method based on the

following theorem to eliminate components of the first j − 1 eigenvectors to obtain

the eigenvector vj.

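Theorem 4.4.1. Let $q_k$ be a unit vector that is a linear combination of the first k eigenvectors of a matrix P that has real, well separated eigenvalues. Then

$$\Big(\prod_{i=1}^{k} (\lambda_i I - P)\Big) q_k = 0 \tag{4.4.1}$$

Proof. The vector $q_k$ can be expressed as a linear combination of the first k eigenvectors:

$$q_k = \sum_{j=1}^{k} \alpha_j v_j \tag{4.4.2}$$

Plugging this into the left-hand side of equation 4.4.1 yields

$$\Big(\prod_{i=1}^{k} (\lambda_i I - P)\Big) q_k = \sum_{j=1}^{k} \Big(\Big(\prod_{i=1}^{k} (\lambda_i I - P)\Big) \alpha_j v_j\Big) \tag{4.4.3}$$

Since $P v_j = \lambda_j v_j$, each factor acts on $v_j$ as a scalar, and the factor with i = j vanishes:

$$\Big(\prod_{i=1}^{k} (\lambda_i I - P)\Big) \alpha_j v_j = \Big(\prod_{i=1}^{k} (\lambda_i - \lambda_j)\Big) \alpha_j v_j = 0 \quad \text{for all } j \le k \tag{4.4.4}$$

$$\Rightarrow \sum_{j=1}^{k} \Big(\Big(\prod_{i=1}^{k} (\lambda_i I - P)\Big) \alpha_j v_j\Big) = 0 \tag{4.4.5}$$

□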

One method for finding eigenvector $v_k$, assuming $\lambda_1$ through $\lambda_{k-1}$ have already been determined, is to first eliminate all components of $q_k$ that are not parallel to $v_k$ using the result stated in the theorem above. The resulting vector is then normalized to obtain the eigenvector.

$$\tilde v_k = \Big(\prod_{i=1}^{k-1} (\lambda_i I - P)\Big) q_k \tag{4.4.6}$$

$$v_k = \frac{\tilde v_k}{\lVert \tilde v_k \rVert} \tag{4.4.7}$$

The product in equation 4.4.6 includes only the first k − 1 factors. The component of $q_k$ along $v_k$ therefore remains, scaled by the nonzero factor $\prod_{i=1}^{k-1} (\lambda_i - \lambda_k)$, and the resulting vector $\tilde v_k$ only has to be normalized to obtain the k-th eigenvector.
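To illustrate equations 4.4.6 and 4.4.7 on a nonsymmetric matrix, the sketch below builds a hypothetical test case $P = U T U^T$. U is a Householder reflection and T is upper triangular. By construction, P has eigenvalues 1, 0.5 and −0.25. The columns of U form an orthonormal basis whose first k columns span the first k eigenvectors, which is exactly the kind of basis Q that the symmetric methods deliver for a nonsymmetric P. As in the text, the eigenvalues are assumed to be known.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def matvec(P, x):
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]

def householder(u):
    """Symmetric orthogonal matrix I - 2 u u^T / (u^T u)."""
    s = sum(x * x for x in u)
    n = len(u)
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] / s for j in range(n)]
            for i in range(n)]

# Hypothetical nonsymmetric test matrix P = U T U^T. T is upper triangular with
# diagonal (1, 0.5, -0.25), so P has those eigenvalues, and the first k columns
# of U span the eigenvectors of the first k eigenvalues.
U = householder([1.0, 1.0, 2.0])
T = [[1.0, 0.3, 0.2],
     [0.0, 0.5, 0.3],
     [0.0, 0.0, -0.25]]
P = matmul(matmul(U, T), U)        # U is symmetric, so U^T = U
lambdas = [1.0, 0.5, -0.25]

def extract_eigenvector(P, Q, lambdas, k):
    """Eqs. 4.4.6-4.4.7 (0-based k): apply (lambda_i I - P) for all i < k to
    column k of Q, then normalize."""
    v = [row[k] for row in Q]
    for i in range(k):
        Pv = matvec(P, v)
        v = [lambdas[i] * vr - pvr for vr, pvr in zip(v, Pv)]
    nrm = math.sqrt(sum(x * x for x in v))
    return [x / nrm for x in v]

vs = [extract_eigenvector(P, U, lambdas, k) for k in range(3)]
```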

4.5 Distributed concurrent computation of eigenvalues

The shifted power method described above finds the eigenvalues of a matrix successively, so that eigenvalue j − 1 must already be known before the j-th eigenvalue can be found. In addition, the orthonormal basis for all eigenvectors has to be determined before any of the eigenvalues are found. Since both finding the orthonormal basis and finding individual eigenvalues are iterative processes, we can use estimates of some of the values required for the next step in the process, instead of waiting for one computational step to converge before starting the next. For example, instead of waiting for the computation of $\lambda_{j-1}$ to converge before starting to compute $\lambda_j$, we use the current estimate of $\lambda_{j-1}$ at each time step to find an approximation for $\lambda_j$. This is particularly useful for distributed networks, where detecting convergence and starting a computational process in all nodes in a synchronized way are much more difficult than in a centralized system. This method is also useful if the state transition matrix changes slowly over time, since the process does not have to be restarted from the beginning when P changes.

Theorem 4.5.1. Let $Q(t) \in \mathbb{R}^{N \times n}$ be a matrix with columns $q_i(t)$, so that for any k, the first k columns of Q(t) are an estimate for a set of vectors that form an orthonormal basis for the space spanned by the first k eigenvectors of a matrix $P \in \mathbb{R}^{N \times N}$. P has real, well separated eigenvalues, and the eigenvectors are sorted in order of descending eigenvalue modulus. We assume that the estimate Q(t) is updated at each time step, and Q(t) → Q as t → ∞, where the columns of Q are orthonormal and exactly span the same spaces as the eigenvectors. For each i ≤ n we define a matrix $F_i \in \mathbb{R}^{N \times n}$, so that at each time step t,

$$F_1(t) = Q(t) \tag{4.5.1}$$

$$F_i(t) = \big(L_{i-1}(t-1) - P\big) F_{i-1}(t-1) \quad \text{for all } i > 1 \tag{4.5.2}$$

For each i = 1, …, n, let $\tilde v_i$ be equal to the i-th column of $F_i$. If $e_i \in \mathbb{R}^n$ has a 1 in the i-th entry, with all other entries being 0, then

$$\tilde v_i(t) = F_i(t) e_i \tag{4.5.3}$$

We also define a set of n diagonal matrices $L_i$, assuming all entries of $\tilde v_i(t)$ are nonzero:

$$L_i(t) = \operatorname{diag}\big(P \tilde v_i(t)\big) \operatorname{diag}\big(\tilde v_i(t)\big)^{-1} \tag{4.5.4}$$

Then as t → ∞, we have $\tilde v_i \to a_i v_i$, where $a_i$ is a scalar value and $v_i$ is the i-th eigenvector of P. In addition, the diagonal entries of $L_i$ all converge to $\lambda_i$.

Proof. First, we show that $\tilde v_1 \to v_1$ (up to sign) and $L_1 \to \lambda_1 I$:

$$F_1(\infty) = Q(\infty) = Q \tag{4.5.5}$$

$$\tilde v_1(\infty) = F_1(\infty) e_1 = q_1 = v_1 \tag{4.5.6}$$

$$L_1(\infty) = \operatorname{diag}(P v_1) \operatorname{diag}(v_1)^{-1} = \lambda_1 I \tag{4.5.7}$$

We prove convergence of additional eigenvectors and eigenvalues by induction. Assume that for some k, $L_i \to \lambda_i I$ for all i < k. Then,

$$F_k(\infty) = (\lambda_{k-1} I - P) F_{k-1}(\infty) = \Big(\prod_{i=1}^{k-1} (\lambda_i I - P)\Big) Q \tag{4.5.8}$$

Next, we use the fact that multiplying a matrix by $e_k$ is equivalent to extracting the k-th column, and we apply the result from equation 4.4.6 to get

$$\tilde v_k(\infty) = F_k(\infty) e_k = \Big(\prod_{i=1}^{k-1} (\lambda_i I - P)\Big) Q e_k = \Big(\prod_{i=1}^{k-1} (\lambda_i I - P)\Big) q_k = a_k v_k \tag{4.5.9}$$

The eigenvector $v_k$ and the scalar $a_k$ can be found by normalizing $\tilde v_k$. Given either $\tilde v_k$ or $v_k$, we can now show that equation 4.5.4 yields the eigenvalue $\lambda_k$ by substituting $a_k v_k$ for $\tilde v_k$:

$$L_k(\infty) = \operatorname{diag}(a_k P v_k) \operatorname{diag}(a_k v_k)^{-1} = \operatorname{diag}(P v_k) \operatorname{diag}(v_k)^{-1} = \lambda_k I \tag{4.5.10}$$

This shows that $\lambda_k$ can be found using the equations above if $\lambda_i$ is known for all i < k. We also showed that the equations above yield $\lambda_1$. By induction, and assuming that Q(t) → Q, we therefore have $\tilde v_k \to a_k v_k$ and $L_k \to \lambda_k I$ for all k ≤ n. □


The algorithm described in theorem 4.5.1 can easily be performed by a distributed network. We assume that the process used for finding Q stores each row of Q(t) in the node corresponding to the row number. Similarly, each row of every $F_i$ is stored in the corresponding node. Each node therefore needs space for $n^2$ values to store all $F_i$.

Computational time and storage space can be reduced by noting that, for each $F_i$, only columns i through n are needed. Column i yields $\tilde v_i$, and columns i + 1 through n are propagated to $F_{i+1}$. The first i − 1 columns are not needed, and they converge to zero after Q converges, since by theorem 4.4.1 the product $\prod_{l=1}^{i-1}(\lambda_l I - P)$ annihilates $q_j$ for all j < i. Storing only the needed columns reduces the storage per node to n(n + 1)/2 values.

Since P is the state transition matrix of the network itself, multiplying a matrix by P is straightforward, as long as each node has access to the row of the matrix that corresponds to its node number in the network. We can therefore easily find $PF_i$ for all i. Multiplying a matrix by a diagonal matrix is also simple, as long as each node knows the corresponding entry of the diagonal matrix. Equation 4.5.4 can therefore be computed in a distributed fashion, as long as the (j, j) entry of each $L_i$ is stored in node j.
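The following is a centralized sketch of theorem 4.5.1. It reuses the hypothetical nonsymmetric test matrix $P = U T U^T$ from section 4.4 and assumes that Q has already converged to U. Each node r would hold row r of every $F_i$ and the r-th diagonal entry of every $L_i$; here those are simply list entries. During start-up, an entry of $L_i$ is left at zero while the corresponding entry of $\tilde v_i$ is still zero. This is our own assumption, made to avoid division by zero. With a fixed Q, the estimate of $\lambda_i$ becomes exact after i steps because of the one-step lag in equation 4.5.2.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def matvec(P, x):
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]

def householder(u):
    """Symmetric orthogonal matrix I - 2 u u^T / (u^T u)."""
    s = sum(x * x for x in u)
    n = len(u)
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] / s for j in range(n)]
            for i in range(n)]

# Hypothetical nonsymmetric test matrix with eigenvalues 1, 0.5, -0.25; Q is assumed converged to U.
U = householder([1.0, 1.0, 2.0])
T = [[1.0, 0.3, 0.2], [0.0, 0.5, 0.3], [0.0, 0.0, -0.25]]
P = matmul(matmul(U, T), U)
Q = U

def concurrent_eigen(P, Q, n, steps):
    """Iterate eqs. 4.5.1-4.5.4. F[i] is N x n; L[i] holds the diagonal of L_i
    (entry r would live in node r)."""
    N = len(P)
    F = [[[0.0] * n for _ in range(N)] for _ in range(n)]   # all variables start at zero
    L = [[0.0] * N for _ in range(n)]
    for _ in range(steps):
        new_F = [[row[:n] for row in Q]]                     # eq. 4.5.1
        for i in range(1, n):                                # eq. 4.5.2, values from t-1
            PF = matmul(P, F[i - 1])
            new_F.append([[L[i - 1][r] * F[i - 1][r][c] - PF[r][c] for c in range(n)]
                          for r in range(N)])
        new_L = []
        for i in range(n):
            v = [new_F[i][r][i] for r in range(N)]           # eq. 4.5.3: column i of F_i
            Pv = matvec(P, v)
            # eq. 4.5.4 entry by entry; stays 0 while v^(r) is still zero (start-up)
            new_L.append([Pv[r] / v[r] if abs(v[r]) > 1e-12 else 0.0 for r in range(N)])
        F, L = new_F, new_L
    vecs = []
    for i in range(n):
        v = [F[i][r][i] for r in range(N)]
        nrm = math.sqrt(sum(x * x for x in v))
        vecs.append([x / nrm for x in v])                    # normalized eigenvectors
    return L, vecs

L, vecs = concurrent_eigen(P, Q, n=3, steps=10)
```

After the loop, every node's diagonal entry of $L_i$ equals $\lambda_i$, and the normalized columns are eigenvectors of P.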

The complete set of equations for the power method and QR-factorization algorithms is given in appendix A.

4.6 Numerical Example

In this section, we present results from the implementation of the spectral algorithms presented in this chapter. The network used for our example is the three-level network described in section 2.7 and shown in figure 2.9, which consists of 364 nodes. We ran both the power method and QR-factorization algorithms as outlined in appendix A on this network, using M = 20 steps for each consensus process at each iteration. Figures 4.1 through 4.6 show the convergence of the estimates of the first four eigenvectors and eigenvalues, as well as the convergence of the estimates of the vectors q forming the orthogonal basis.

The figures show that convergence is generally somewhat smoother for the QR-factorization method than for the power method. For both methods, the accuracy of the final converged value decreases for each successively smaller eigenvalue. This is to be expected, since the larger eigenvalues are used directly or indirectly in the orthonormalization process. As a result, these methods would not be useful in practice for finding most or all of the eigenvalues and eigenvectors of a matrix.

[Figure 4.1 plot: residual norm of the columns of Q (0 to 1.5) versus time step (0 to 2000); curves q1, q2, q3, q4]

Figure 4.1: Convergence of the orthogonal vector basis for the distributed QR method.

[Figure 4.2 plot: residual norm of the columns of Q (0 to 1.5) versus time step (0 to 2000); curves q1, q2, q3, q4]

Figure 4.2: Convergence of the orthogonal vector basis for the distributed power method.

[Figure 4.3 plot: residual norms of the eigenvectors (log scale, 10^−12 to 10^0) versus time step (0 to 2000); curves v1, v2, v3, v4]

Figure 4.3: Convergence of the eigenvectors for the distributed QR method.

[Figure 4.4 plot: residual norms of the eigenvectors (log scale, 10^−12 to 10^0) versus time step (0 to 2000); curves v1, v2, v3, v4]

Figure 4.4: Convergence of the eigenvectors for the distributed power method.

[Figure 4.5 plot: residual of the eigenvalues (log scale, 10^−12 to 10^4) versus time step (0 to 2000); curves ℓ1, ℓ2, ℓ3, ℓ4]

Figure 4.5: Convergence of the eigenvalues for the distributed QR method.

[Figure 4.6 plot: residual of the eigenvalues (log scale, 10^−12 to 10^4) versus time step (0 to 2000); curves ℓ1, ℓ2, ℓ3, ℓ4]

Figure 4.6: Convergence of the eigenvalues for the distributed power method.

4.7 Using spectral information for supernode placement

After determining some of the spectral properties of a distributed network, we can use

this information to monitor the health of the distributed network, since the largest

eigenvalues are an indicator for the worst-case convergence rates of the network.

The spectral information can also be used to adjust the structure of the network in an attempt to improve convergence rates. The structure of the network could be adjusted by moving nodes or adding edges to the network. Suppose, as in some of the examples in this thesis, that we start with a given base-level network and add a number of supernodes at the locations of selected regular nodes. In that case, it would be particularly useful to use spectral methods to determine where in the network supernodes should be added. In theory, we could test all possible supernode locations and select the ones that result in the largest spectral gap. However, the number of possible configurations grows combinatorially with network size, so this is likely not practical. Instead, this section describes a heuristic that, while not necessarily determining the globally optimal supernode locations, can at least be used to select locations that result in reasonably good convergence rates.

We showed in section 2.10 that, when each node is associated with its corresponding entry of the eigenvectors, the largest eigenvectors of a network tend to vary slowly across the network, so that adjacent nodes have similar entries of these vectors. Suppose we want to find several locations for supernodes, spread over various areas of the network and not too close to one another. The eigenvectors can then serve as a guideline for the placements. It would also be possible to use the physical location of the base-level nodes as a criterion, but in some networks, the physical location of a node might not be known to the node. In addition, two nodes that are geometrically close to one another might not be close to one another in the network if there is no edge connecting them. The previous sections showed that it is possible to find the eigenvectors of the network even if the physical locations of the nodes and the network topology are unknown. The method described in this section places supernodes so that the distance between supernodes is large in a coordinate system that uses the entries of the second and third eigenvectors as coordinates. Figure 4.7 shows the base-level nodes of our two-dimensional example from section 2.7 plotted in this coordinate system.

[Figure 4.7 plot: base-level nodes with v2 (−0.1 to 0.15) on the horizontal axis and v3 (−0.3 to 0.1) on the vertical axis]

Figure 4.7: Network from figure 2.9 in v2-v3 space.

One method for finding placements for supernodes is to start with a set of $N_S$ randomly selected nodes and place supernodes there. Then, in an iterative process, the supernodes hop from their current placement to an adjacent node in the network. At each time step, a supernode moves to the neighboring node that maximizes the sum of squares of its distances to the other supernodes in v2-v3 space. After several iterations, the supernodes are located far from each other in the network, and connections between supernodes help to connect distant parts of the network.

[Figure 4.8 plot: final supernode placements with v2 (−0.1 to 0.15) on the horizontal axis and v3 (−0.3 to 0.1) on the vertical axis]

Figure 4.8: Final supernode placements in v2-v3 space.

The eigenvector values used to find the node locations in v2-v3 space can be either the eigenvectors of the base level or the eigenvectors of the entire network with supernodes added. If the eigenvectors of the entire network are used, it can be difficult to achieve convergence to a final set of supernode placements, since the node locations in v2-v3 space change whenever a supernode moves. In addition, having to recompute the eigenvectors at each time step increases the computational complexity. The examples below therefore use the eigenvectors of the base level.
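The hopping heuristic can be sketched as below. The spectral coordinates `coords` are assumed to be given; in the method above they would be the base-level entries of $v_2$ and $v_3$. The example uses a hypothetical ten-node path with coordinates (i, 0). Two details are our own assumptions rather than part of the described method. First, a supernode may also stay where it is. Second, the supernodes move one at a time in a fixed order until none of them moves.

```python
def squared_spread(node, others, coords):
    """Sum of squared distances in (v2, v3) space from `node` to the other supernodes."""
    return sum((coords[node][0] - coords[o][0]) ** 2 + (coords[node][1] - coords[o][1]) ** 2
               for o in others)

def place_supernodes(neighbors, coords, start, max_sweeps=100):
    """Greedy hopping: each supernode moves to the adjacent node (or stays put)
    that maximizes its squared spread; stop when no supernode moves."""
    positions = list(start)
    for _ in range(max_sweeps):
        moved = False
        for s, node in enumerate(positions):
            others = positions[:s] + positions[s + 1:]
            candidates = [node] + list(neighbors[node])     # ties keep the current node
            best = max(candidates, key=lambda c: squared_spread(c, others, coords))
            if best != node:
                positions[s] = best
                moved = True
        if not moved:
            break
    return positions

# Hypothetical example: a 10-node path whose spectral coordinates are (i, 0).
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)]
coords = [(float(i), 0.0) for i in range(10)]
final = place_supernodes(neighbors, coords, start=[3, 5])
```

Starting from nodes 3 and 5, the two supernodes walk apart and stop at the ends of the path, nodes 0 and 9.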


[Figure 4.9 plot: final supernode placements in the x-y plane, x and y from 0 to 10]

Figure 4.9: Final supernode placements in x-y space.

Figure 4.8 shows the final position in v2-v3 space of six supernodes that were added to the base-level network in figure 4.7. Figure 4.9 shows these supernode locations in the regular x-y space. The figures show that the supernodes are located far from one another both in v2-v3 space and in the network. The two supernodes at the bottom of the plot in x-y space are not very far from one another geometrically. However, the underlying base-level nodes are not linked, so the distance between them across the network is considerable.


4.8 Conclusion

Two algorithms for finding eigenvalues and eigenvectors of the state transition matrices of distributed networks were presented in this chapter. The algorithms can be run on the distributed network itself, even if the state transition matrices are nonsymmetric. The algorithms perform well for finding the largest eigenvectors of a matrix, but accuracy decreases and time to convergence increases with each additional eigenvector that is to be found. The largest eigenvalues and corresponding eigenvectors are of particular importance in determining the worst-case convergence rates and corresponding noise types for a distributed network. These methods can therefore be used to monitor the system health of such a network.

We also showed how the spectral properties of a distributed network can be used

to guide the placement of supernodes in a multilevel network.

Appendix A

Distributed spectral algorithms for nonsymmetric matrices

The following equations can be used for finding the first n eigenvalues and eigenvectors of a nonsymmetric matrix P in a distributed network. The steps of finding the orthonormal basis Q for the eigenvectors and of finding the actual eigenvectors are interleaved, so that a rough estimate of the eigenvalues is available while Q is being computed. To make clear how this can be implemented on a distributed network, the equations are written in a form that emphasizes which vector and matrix components are available to individual nodes. The superscript (i) indicates the i-th entry of a vector or the i-th row of a matrix, and the superscript (i,j) denotes the (i,j) entry of a matrix; both are stored in node i. As a result, there are two types of operations: some equations below (such as equation A.0.1) depend only on variables that are all stored within the same node, while others (such as equation A.0.2) represent a consensus process, in which a vector or matrix is multiplied by the state transition matrix P.
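A small simulation makes the two types of operations concrete. This sketch is an illustration, not the dissertation's implementation. It writes a consensus step node by node. Node i forms P^(i)x using only the entries x^(j) of the nodes j with a nonzero weight P^(i,j), which is what makes the step a one-hop communication. The four-node directed ring used as P is hypothetical.

```python
import numpy as np


def consensus_step(P, x):
    """One consensus step, written node by node.

    Node i computes P^(i) x from its own value and the values of the nodes j
    with P[i, j] != 0, i.e. the nodes it receives messages from."""
    new_x = np.empty(len(x))
    for i in range(len(x)):
        senders = np.flatnonzero(P[i])               # nodes that node i listens to
        inbox = {int(j): x[j] for j in senders}      # one message per link
        new_x[i] = sum(P[i, j] * xj for j, xj in inbox.items())
    return new_x


def local_square(x):
    """A purely local operation: every node squares its own entry, no messages."""
    return x ** 2


# Hypothetical four-node directed ring (nonsymmetric P).
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.5, 0.0, 0.0, 0.5]])
x = np.array([1.0, 2.0, 3.0, 4.0])
print(consensus_step(P, x))   # [1.5 2.5 3.5 2.5]
print(local_square(x))        # [ 1.  4.  9. 16.]
```

The node-by-node loop produces the same vector as the matrix product P @ x. The loop form only makes explicit which values must cross a link.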

As an initial step before running either one of the algorithms, the offset factors κ

should be determined by running the following process to convergence at each node:

$$\kappa^{(i)}(0) = 1 \tag{A.0.1}$$

$$\kappa^{(i)}(t+1) = P^{(i)}\kappa(t) \tag{A.0.2}$$
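The role of the offset factors can be checked numerically. The sketch below assumes that P is primitive and column-stochastic, meaning every column sums to 1 so the sum of the values over all nodes is preserved. This assumption is ours and is not stated in this appendix.

Under it, κ(t) converges to N times the Perron vector π of P, normalized so that its entries sum to 1. Running the same consensus on the squares x_i² drives node i to π_i·Σx². Multiplying by N/κ^(i) therefore gives every node the same estimate of Σx², which is the rescaling that appears as N/κ^(i) in the normalization steps of both algorithms. The six-node ring and its random weights are hypothetical.

```python
import numpy as np


def offset_factors(P, steps=500):
    """(A.0.1)-(A.0.2): start from kappa = 1 at every node and apply P repeatedly."""
    kappa = np.ones(P.shape[0])
    for _ in range(steps):
        kappa = P @ kappa
    return kappa


def sum_of_squares_estimates(P, x, kappa, steps=500):
    """Each node's estimate of sum(x**2): square locally, mix with P, rescale by N / kappa_i."""
    N = len(x)
    h = x ** 2                 # local step
    for _ in range(steps):
        h = P @ h              # consensus steps
    return h * N / kappa       # one estimate per node


# Hypothetical 6-node ring with self-loops and random positive weights.
rng = np.random.default_rng(0)
N = 6
A = np.eye(N) + np.roll(np.eye(N), 1, axis=1) + np.roll(np.eye(N), -1, axis=1)
W = A * rng.uniform(0.5, 1.5, size=(N, N))
P = W / W.sum(axis=0, keepdims=True)   # assumption: columns sum to 1

kappa = offset_factors(P)
x = np.arange(1.0, N + 1)
print(sum_of_squares_estimates(P, x, kappa))   # every entry close to 91.0
```

The sum of 1² through 6² is 91, so each printed entry should be close to 91.0. Without the factor N/κ^(i), node i would instead hold π_i·Σx², which differs from node to node unless P is doubly stochastic.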


A.1 Power method

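The equations for using the power method are as follows. To initialize, set the vectors $q_j$ to a set of linearly independent unit vectors, and let

$$q_j(0) = v_j(0) \tag{A.1.1}$$

All other variables are initially set to zero. In the consensus steps (A.1.3), (A.1.6) and (A.1.10), a tilde marks the running value that each node holds after mixing with $P^M$; the same symbol without a tilde is the locally computed quantity.

$$G^{(i,j)}_k(t) = v^{(i)}_k(t-1)\, q^{(i)}_j(t-1) \tag{A.1.2}$$

$$\tilde{G}_k(t) = P^M\left(\tilde{G}_k(t-1) + G_k(t) - G_k(t-1)\right) \tag{A.1.3}$$

$$q^{(i)}_j(t) = P^{(i)} q_j(t-1) - \sum_{k=1}^{j-1} \ell^{(i)}_k(t)\, v^{(i)}_k(t)\, \frac{\tilde{G}^{(i,j)}_k(t)\, N}{\kappa^{(i)}} \tag{A.1.4}$$

$$h^{(i)}_j(t) = \left(q^{(i)}_j(t)\right)^2 \tag{A.1.5}$$

$$\tilde{h}_j(t) = P^M\left(\tilde{h}_j(t-1) + h_j(t) - h_j(t-1)\right) \tag{A.1.6}$$

$$q^{(i)}_j(t) \leftarrow q^{(i)}_j(t)\sqrt{\left|\frac{\kappa^{(i)}}{\tilde{h}^{(i)}_j(t)\, N}\right|} \tag{A.1.7}$$

$$v_j(t) = q_j(t) - \left(\prod_{k=1}^{j-1}\left(\operatorname{diag}(\ell_k(t-1)) - P\right)\right) v_j(t-1) \tag{A.1.8}$$

$$m^{(i)}_j(t) = \left(v^{(i)}_j(t)\right)^2 \tag{A.1.9}$$

$$\tilde{m}_j(t) = P^M\left(\tilde{m}_j(t-1) + m_j(t) - m_j(t-1)\right) \tag{A.1.10}$$

$$v^{(i)}_j(t) \leftarrow v^{(i)}_j(t)\sqrt{\left|\frac{\kappa^{(i)}}{\tilde{m}^{(i)}_j(t)\, N}\right|} \tag{A.1.11}$$

$$\ell^{(i)}_j(t) = \frac{P^{(i)} v_j(t)}{v^{(i)}_j(t)} \tag{A.1.12}$$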
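The consensus steps (A.1.3), (A.1.6) and (A.1.10) all have the form x(t) = P^M(x(t−1) + g(t) − g(t−1)), where g is the quantity computed locally at the current iteration. The following is a minimal sketch of why this works, again assuming a column-stochastic P. Injecting only the change g(t) − g(t−1) keeps the sum of x over all nodes equal to the sum of the current local values. As the local values settle, x settles to π times that sum, and the factor N/κ^(i) turns it into the same estimate at every node. The mixing depth M = 3, the ring, and the decaying perturbations are hypothetical choices for the example.

```python
import numpy as np


def track(P, M, x_prev, g_now, g_prev):
    """Tracking consensus update x(t) = P^M (x(t-1) + g(t) - g(t-1))."""
    return np.linalg.matrix_power(P, M) @ (x_prev + g_now - g_prev)


# Hypothetical 6-node ring with self-loops; assumption: columns of P sum to 1.
rng = np.random.default_rng(1)
N, M = 6, 3
A = np.eye(N) + np.roll(np.eye(N), 1, axis=1) + np.roll(np.eye(N), -1, axis=1)
W = A * rng.uniform(0.5, 1.5, size=(N, N))
P = W / W.sum(axis=0, keepdims=True)
kappa = np.linalg.matrix_power(P, 1000) @ np.ones(N)   # offset factors, (A.0.1)-(A.0.2)

g_final = rng.normal(size=N)   # values the local quantities settle to
g_prev = np.zeros(N)           # all variables start at zero
x = np.zeros(N)
for t in range(1, 200):
    g_now = g_final + 0.5 ** t * rng.normal(size=N)    # local values still changing
    x = track(P, M, x, g_now, g_prev)
    assert np.isclose(x.sum(), g_now.sum())            # network-wide sum is tracked exactly
    g_prev = g_now

print(x * N / kappa)   # each node's estimate of g_final.sum()
```

The assertion inside the loop checks the sum-preservation property at every iteration, and the final estimates agree with g_final.sum() up to rounding error. The update injects the change in the local value rather than the raw value. Because of this, the local quantities can keep moving while the network-wide estimate follows them, which is presumably what lets the normalization and eigenvector updates be interleaved.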


A.2 QR-factorization

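The equations for the method using the QR-factorization are given below. In the consensus steps (A.2.3) and (A.2.11), a tilde marks the running value that each node holds after mixing with $P^M$; the same symbol without a tilde is the locally computed quantity.

$$W^{(i)}_j(t) = P^{(i)} u_j(t-1) \tag{A.2.1}$$

$$K^{(i)}_{j,k}(t) = W^{(i,j)}(t)\, W^{(i,k)}(t) \tag{A.2.2}$$

$$\tilde{K}_{j,k}(t) = P^M\left(\tilde{K}_{j,k}(t-1) + K_{j,k}(t) - K_{j,k}(t-1)\right) \tag{A.2.3}$$

Each node then individually forms a matrix $K_i \in \mathbb{R}^{n\times n}$ with entries

$$K^{(j,k)}_i(t) = \frac{\tilde{K}^{(i)}_{j,k}(t)\, N}{\kappa^{(i)}} \tag{A.2.4}$$

and computes its Cholesky factorization $K_i = R_i^\top R_i$, where $R_i \in \mathbb{R}^{n\times n}$ is upper triangular. Then

$$\ell^{(i)}_j = R^{(j,j)}_i \tag{A.2.5}$$

$$S_i = R_i^{-1} \tag{A.2.6}$$

$$U^{(i)} = W^{(i)} S_i \tag{A.2.7}$$

$$u^{(i)}_j = U^{(i,j)} \tag{A.2.8}$$

The second half of the algorithm is the same as for the power method:

$$v_j(t) = u_j(t) - \left(\prod_{k=1}^{j-1}\left(\operatorname{diag}(\ell_k(t-1)) - P\right)\right) v_j(t-1) \tag{A.2.9}$$

$$m^{(i)}_j(t) = \left(v^{(i)}_j(t)\right)^2 \tag{A.2.10}$$

$$\tilde{m}_j(t) = P^M\left(\tilde{m}_j(t-1) + m_j(t) - m_j(t-1)\right) \tag{A.2.11}$$

$$v^{(i)}_j(t) \leftarrow v^{(i)}_j(t)\sqrt{\left|\frac{\kappa^{(i)}}{\tilde{m}^{(i)}_j(t)\, N}\right|} \tag{A.2.12}$$

$$\ell^{(i)}_j(t) = \frac{P^{(i)} v_j(t)}{v^{(i)}_j(t)} \tag{A.2.13}$$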
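Steps (A.2.1) through (A.2.8) amount to one step of orthogonal (subspace) iteration. In that step, the QR factorization of W = PU is obtained through a Cholesky factorization of WᵀW, a technique commonly called Cholesky QR. The centralized sketch below computes WᵀW exactly, whereas in the distributed algorithm each node uses its consensus estimate from (A.2.4) instead. The symmetric 8×8 test matrix with prescribed eigenvalues is hypothetical and was chosen so that the result can be checked.

```python
import numpy as np


def cholesky_qr(W):
    """QR factorization W = U R via Cholesky: W^T W = R^T R, then U = W R^{-1}."""
    K = W.T @ W                         # Gram matrix (cf. A.2.2-A.2.4)
    R = np.linalg.cholesky(K).T         # numpy returns lower L with K = L L^T, so R = L^T
    U = np.linalg.solve(R.T, W.T).T     # U = W R^{-1} without forming R^{-1} (cf. A.2.6-A.2.7)
    return U, R


# Hypothetical symmetric test matrix with known eigenvalues.
rng = np.random.default_rng(0)
N, n = 8, 2
Q, _ = np.linalg.qr(rng.normal(size=(N, N)))
eigs = np.array([1.0, 0.8, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05])
P = Q @ np.diag(eigs) @ Q.T

U = rng.normal(size=(N, n))             # initial block of n vectors
for _ in range(200):
    W = P @ U                           # (A.2.1): one multiplication by P
    U, R = cholesky_qr(W)               # (A.2.2)-(A.2.8), centralized

print(np.diag(R))                       # approaches [1.0, 0.8]
```

After enough iterations, U has orthonormal columns and the diagonal of R approaches the two largest eigenvalues, 1.0 and 0.8. These are the values that (A.2.5) reads off at each node. Because the Cholesky factor has a positive diagonal, R_jj estimates |λ_j| in this symmetric example, so this step alone would not recover the sign of a negative eigenvalue.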

Bibliography

[1] J.S. Abel and J.W. Chaffee. Existence and uniqueness of GPS solutions. IEEE Transactions on Aerospace and Electronic Systems, 27, 1991.

[2] S. Boyd, P. Diaconis, and L. Xiao. Fastest mixing Markov chain on a graph. SIAM Review, 46(4):667–689, 2004.

[3] S. Boyd, A. Ghosh, and B. Prabhakar. Randomized gossip algorithms. IEEE/ACM Transactions on Networking, 14:2508–2530, 2006.

[4] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[5] G. Canright, K. Engo-Monsen, and M. Jelasity. Efficient and robust fully distributed power method with an application to link analysis. Technical Report UBLCS-2005-17, University of Bologna, Department of Computer Science, Bologna, Italy, 2005.

[6] R. Carli, F. Fagnani, A. Speranzon, and S. Zampieri. Communication constraints in the average consensus problem. Automatica, 44(3):671–684, 2008.

[7] S. Chatterjee and E. Seneta. Towards consensus: some convergence theorems on repeated averaging. Journal of Applied Probability, pages 89–97, 1977.

[8] R.R. Coifman, S. Lafon, A.B. Lee, M. Maggioni, B. Nadler, F. Warner, and S.W. Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proc. Natl. Acad. Sci., 102(21):7426–7431, 2005.


[9] R.R. Coifman, S. Lafon, A.B. Lee, M. Maggioni, B. Nadler, F. Warner, and S.W. Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data: Multiscale methods. Proc. Natl. Acad. Sci., 102(21):7432–7437, 2005.

[10] E.R. Davidson. The iterative calculation of a few of the lowest eigenvalues and corresponding eigenvectors of large real-symmetric matrices. Journal of Computational Physics, 17:87–94, 1975.

[11] M.H. DeGroot. Reaching a consensus. Journal of the American Statistical Association, pages 118–121, 1974.

[12] P. Enge. Local area augmentation of GPS for the precision approach of aircraft. Proceedings of the IEEE, 87(1), 1999.

[13] P. Enge and A.J. Van Dierendonck. Wide Area Augmentation System. In B. Parkinson and J. Spilker, editors, Global Positioning System: Theory and Applications, volume 2. AIAA, 1996.

[14] L. Gauthier, P. Michel, J. Ventura-Traveset, and J. Benedicto. EGNOS: The first step in Europe’s contribution to the global navigation satellite system. ESA Bulletin, 105:35–43, 2001.

[15] A. Ghosh and S. Boyd. Growing well-connected graphs. In Proceedings of the 45th IEEE Conference on Decision and Control, pages 6605–6611, 2006.

[16] B. Johansson, M. Rabi, and M. Johansson. A simple peer-to-peer algorithm for distributed optimization in sensor networks. In Proceedings of the 46th IEEE Conference on Decision and Control, pages 4705–4710, 2007.

[17] C. Kee. Wide Area Differential GPS. In B. Parkinson and J. Spilker, editors, Global Positioning System: Theory and Applications, volume 2. AIAA, 1996.

[18] D. Kempe and F. McSherry. A decentralized algorithm for spectral analysis. Journal of Computer and System Sciences, 74:70–83, 2008.


[19] J.H. Kim, M. West, S. Lall, E. Scholte, and A. Banaszuk. Stochastic multiscale approaches to consensus problems. In Proceedings of the 47th IEEE Conference on Decision and Control, pages 5551–5557, 2008.

[20] J.H. Kim, M. West, E. Scholte, and S. Narayanan. Multiscale consensus for decentralized estimation and its application to building systems. In American Control Conference, pages 888–893, 2008.

[21] C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. Journal of Research of the National Bureau of Standards, 45(4):255–282, 1950.

[22] J. Loomis, R. Golledge, and R. Klatzky. GPS-based navigation systems for the visually impaired. In W. Barfield and T. Caudell, editors, Fundamentals of Wearable Computers and Augmented Reality, pages 429–446. Lawrence Erlbaum Associates Publishers, 2001.

[23] P. Misra and P. Enge. Global Positioning System Signals, Measurements, and Performance. Ganga-Jamuna Press, 2006.

[24] R. Montenegro and P. Tetali. Mathematical aspects of mixing times in Markov chains. Now Publishers Inc, 2006.

[25] R. Muellerschoen, W. Bertiger, M. Lough, D. Stowers, and D. Dong. An Internet-based global differential GPS system, initial results. In ION National Technical Meeting, Anaheim, CA, 2000.

[26] R. Muellerschoen, B. Iijima, R. Meyer, and Y. Bar-Sever. Real-time point-positioning performance evaluation of single-frequency receivers using NASA’s Global Differential GPS System. In ION GNSS Meeting, Long Beach, CA, 2004.

[27] B. Nadler, S. Lafon, R. Coifman, and I. Kevrekidis. Diffusion maps, spectral clustering and reaction coordinates of dynamical systems. Applied and Computational Harmonic Analysis, 21(1):113–127, 2006. Diffusion Maps and Wavelets.


[28] D. Niculescu and B. Nath. Ad hoc positioning system (APS). In IEEE Global Telecommunications Conference (GLOBECOM ’01), volume 5, pages 2926–2931, 2001.

[29] R. Olfati-Saber. Ultrafast consensus in small-world networks. In Proceedings of the 2005 American Control Conference, pages 2371–2378, 2005.

[30] R. Olfati-Saber, J.A. Fax, and R.M. Murray. Consensus and cooperation in networked multi-agent systems. Proceedings of the IEEE, 95(1):215–233, 2007.

[31] R. Olfati-Saber and R.M. Murray. Consensus problems in networks of agents with switching topology and time-delays. IEEE Transactions on Automatic Control, 49(9):1520–1533, September 2004.

[32] M. Rabinowitz and J.J. Spilker Jr. A new positioning system using television synchronization signals. IEEE Transactions on Broadcasting, 51(1):51–61, March 2005.

[33] W. Ren, R.W. Beard, and E.M. Atkins. A survey of consensus problems in multi-agent coordination. In Proceedings of the 2005 American Control Conference, pages 1859–1864, 2005.

[34] J. Rosenthal. Convergence rates of Markov chains. SIAM Review, 37(3):387–405, 1995.

[35] Y. Saad. Numerical Methods for Large Eigenvalue Problems. Manchester University Press, 1992.

[36] Y. Saad. Iterative Methods for Sparse Linear Systems. SIAM, second edition, 2003.

[37] G. Sleijpen and H. Van der Vorst. A Jacobi-Davidson iteration method for linear eigenvalue problems. SIAM Review, 42(2):267, 2000.

[38] M.G. Soares, B. Malheiro, and F.J. Restivo. A distributed system for the dissemination of DGPS data through the Internet. In Proceedings of the International Conference on Advances in Infrastructure for Electronic Business, Education, Science, Medicine and Mobile Technologies on the Internet (SSGRR 2003), 2003.

[39] J. Sun, S. Boyd, L. Xiao, and P. Diaconis. The fastest mixing Markov process on a graph and a connection to a maximum variance unfolding problem. SIAM Review, 48(4):681, 2006.

[40] A. Tahbaz-Salehi and A. Jadbabaie. Small world phenomenon, rapidly mixing Markov chains, and average consensus algorithms. In Proceedings of the 46th IEEE Conference on Decision and Control, pages 276–281, 2007.

[41] U. Trottenberg, A. Schuller, and C.W. Oosterlee. Multigrid Methods. Academic Press, 2000.

[42] D.J. Watts and S.H. Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393:440–442, 1998.

[43] D.B. Work, O.-P. Tossavainen, S. Blandin, A.M. Bayen, T. Iwuchukwu, and K. Tracton. An ensemble Kalman filtering approach to highway traffic estimation using GPS enabled mobile devices. In Proceedings of the 47th IEEE Conference on Decision and Control, pages 5062–5068, December 9–11, 2008.

[44] L. Xiao and S. Boyd. Fast linear iterations for distributed averaging. Systems and Control Letters, 53(1):65–78, 2004.

[45] L. Xiao, S. Boyd, and S. Lall. A space-time diffusion scheme for peer-to-peer least-squares estimation. In IPSN ’06: Proceedings of the 5th International Conference on Information Processing in Sensor Networks, pages 168–176, 2006.

[46] P. Yang, R.A. Freeman, G.J. Gordon, K.M. Lynch, S.S. Srinivasa, and R. Sukthankar. Decentralized estimation and control of graph connectivity for mobile sensor networks. Automatica, 46(2):390–396, 2010.
