
Auton Robot (2014) 37:383-400. DOI 10.1007/s10514-014-9411-2

Approximate representations for multi-robot control policies that maximize mutual information

Benjamin Charrow · Vijay Kumar · Nathan Michael

Received: 5 October 2013 / Accepted: 14 July 2014 / Published online: 23 August 2014
© Springer Science+Business Media New York 2014

Abstract We address the problem of controlling a small team of robots to estimate the location of a mobile target using non-linear range-only sensors. Our control law maximizes the mutual information between the team's estimate and future measurements over a finite time horizon. Because the computations associated with such policies scale poorly with the number of robots, the time horizon associated with the policy, and typical non-parametric representations of the belief, we design approximate representations that enable real-time operation. The main contributions of this paper include the control policy, an algorithm for approximating the belief state, and an extensive study of the performance of these algorithms using simulations and real world experiments in complex, indoor environments.

    1 Introduction

Several important applications of robotics like environmental monitoring, cooperative mapping, and search and rescue require information to be gathered quickly and efficiently. In this work, we consider a related information gathering problem where a team of mobile robots must estimate the location of a non-adversarial mobile target using range-only sensors. Given the target's mobility and the limited information range-only sensors provide, the team must quickly determine which movements will yield useful measurements.

B. Charrow (B) · V. Kumar
University of Pennsylvania, Philadelphia, PA, USA
e-mail: [email protected]

V. Kumar
e-mail: [email protected]

N. Michael
Carnegie Mellon University, Pittsburgh, PA, USA
e-mail: [email protected]

These problems can be thought of as active perception problems where one must find a control policy that is optimal in an information-theoretic sense (Grocholsky 2002; Stump et al. 2009). Maximizing mutual information has been a particularly successful approach (Julian et al. 2011; Singh et al. 2009; Hoffmann and Tomlin 2010; Charrow et al. 2014; Ryan and Hedrick 2010), as this policy seeks to gather measurements that will cause the greatest decrease in the uncertainty (i.e., entropy) of the estimate. Most mutual information based control laws are computationally tractable because they are greedy maximizations over a single time step or are used in offline or single robot scenarios. However, we cannot make any of these assumptions in our problem. Given that the target is mobile, the team cannot generate plans offline. The team must coordinate their movements because of the limited information that range measurements provide about the target's state. Further, maximizing mutual information over a finite time horizon involves minimizing a cost functional over the set of possible trajectories, a problem that is computationally intractable. Even if the time horizon and the trajectory are discretized, calculating mutual information requires integration over the joint measurement space of the team, which grows with the time horizon of the plan and size of the team.

To address these difficulties we develop a new approximation and theoretical design criterion which enables us to build a real-time mutual information controller for our mobile target tracking problem. A primary contribution is a method to approximate mutual information by approximating the distribution over the target's predicted location. While this approximation necessarily introduces error into the control law, we show that this error is bounded for Gaussian measurement models. Importantly, this bound yields insight into the degree to which we can speed up the calculation of mutual information via approximation without significantly affecting the team's performance. The other primary contribution is a bound on the difference in mutual information between two control inputs as a function of the noise of the range sensors. This analysis further aids in the design of a real-time system by enabling us to appropriately design the set of control inputs.

We evaluate the proposed methodology and design trade-offs through simulations and real world experiments. Our results show that the approximation significantly decreases the time to compute mutual information and enables the team to quickly and successfully estimate the mobile target's position.

In this work, we focus on range-only RF-based sensors as we are interested in addressing the problem of target localization and tracking in search and rescue scenarios where the location and motion of targets are unknown, but the target is equipped with an RF ranging device (as found in many modern mobile phones; Park et al. 2010). As we will show, these sensors enable operation in a variety of environments without requiring line-of-sight to the target, which is often the case in search and rescue scenarios. However, as the proposed mutual information based techniques are broadly applicable, the presentation of these methods in Sect. 4 remains general.

This article expands and refines our previous work on approximations for information-theoretic control with range-only sensors (Charrow et al. 2013). The primary addition in this work is a new theoretical bound on how the entropy of a Gaussian mixture model changes when the means of its components are perturbed (Sect. 4.1). This bound enables a better analysis of how approximating the belief of the target's location affects mutual information (Sect. 4.2) and the degree to which different control inputs will have different information content (Sect. 4.3). This paper also addresses an oversight in our previous work, in which we incorrectly argued that the approximations are guaranteed to introduce bounded error into the calculation of mutual information (Sect. 4.4). Finally, we present additional numerical studies to validate various approximations we make (Sect. 5).

    2 Related Work

There is extensive work on information based control. Grocholsky (2002) and Stump et al. (2009) developed controllers to localize static targets by maximizing the rate of change of the Fisher information matrix, but their work assumes a Gaussian belief. Hoffmann and Tomlin (2010) maximized mutual information and used a particle filter for the belief, but limited coordination between robots to keep the observation space small. We previously built on these ideas by using an algorithm of Huber et al. (2008) to approximate entropy (Charrow et al. 2014). Hollinger and Sukhatme (2013) describe sampling based strategies for maximizing a variety of information metrics. Julian et al. (2011) developed distributed algorithms for calculating mutual information to estimate the state of static environments using multi-robot teams. Ryan and Hedrick (2010) minimized entropy over a receding horizon to track a mobile car with a fixed wing plane. We use dynamically simpler ground robots, but our approximations enable us to calculate control inputs for multiple robots in real-time. Recognizing that minimizing communication between robots can be important, Kassir et al. (2012) developed a distributed information based control law that explicitly modeled communication costs. Mutual information and entropy have also been applied to visual servoing problems (Dame and Marchand 2011) and active object modeling (Whaite and Ferrie 1997).

Krause and Guestrin (2005) derived performance guarantees for greedy maximizations of mutual information and other submodular set functions. These results were applied to mobile robot environmental monitoring problems by Singh et al. (2009) and Binney et al. (2013). However, these performance guarantees only hold in offline settings where teams do not update their actions based on measurements they receive. Golovin and Krause (2011) generalized these ideas to online settings, deriving performance guarantees for adaptive submodular functions. Unfortunately, mutual information is not an adaptive submodular function and the guarantees require all actions to be available to all robots at all times, making these results not directly applicable to this work.

The multi-robot range-only tracking problem we study is similar to probabilistic pursuit-evasion games (Chung et al. 2011). The problem also relates to work by Hollinger et al. (2011) on mobile target estimation with fixed and uncontrollable range radios. Djugash and Singh (2012) also developed a decentralized estimation algorithm for localizing static and mobile nodes using range-only sensors. In contrast, we focus on controlling mobile robots to localize a mobile target.

    3 Mutual Information Based Control

In this section we describe a mutual information based control law for the mobile target tracking problem. The proposed approach considers a team of cooperating robots which concurrently measure the distance to the target via their RF sensors. The team shares observations (following Fig. 1) to jointly estimate the target's location (Sect. 3.1). By predicting where the target will go next and what measurements they are likely to receive (Sect. 3.2), the team chooses control directions that maximize the predicted reduction in entropy of the target's uncertainty over a finite time horizon (Sect. 3.3).

Our control and estimation strategies are centralized and require communication throughout the team. This limitation is not significant in this setting as typical range sensors are RF-based. If the team was in an environment where they could not communicate, they could not make measurements either.


Fig. 1 System overview with three robots. In this example, Robot 3 aggregates range measurements and estimates the location of the target. This estimate is used to predict the target's future locations over several time steps. The control law selects the trajectory that maximizes the mutual information between the target's predicted location and measurements the team expects to make. This trajectory is sent to the other robots which follow it

    3.1 Target Estimation

Range measurements are a non-linear function of the target state and can easily lead to complex belief distributions such as rings and crescents (Charrow et al. 2014; Djugash and Singh 2012; Hoffmann and Tomlin 2010). Due to the limited information they provide, range measurements can also lead to multiple valid hypotheses. These beliefs cannot be well approximated by a single Gaussian and so we estimate the target's 2D position at time t using a particle filter with the typical recursive Bayesian filter equation (Thrun et al. 2008):

p(x_t | z_{1:t}) = η p(z_t | x_t) ∫ p(x_{t-1} | z_{1:t-1}) p(x_t | x_{t-1}) dx_{t-1}    (1)

where η is a normalization constant, x_t is the 2D position of the target at time t, and z_t is the vector of range measurements the team receives at time t. For ease of reference, Table 1 describes all relevant variables. This makes the standard assumptions that the target's state is a Markov process and

Table 1 Notation

Symbol   Description
x        Mobile target's 2D position
z        Range measurement
t        Current time
σ_m²     Measurement noise
σ_p²     Process noise
τ        Future time interval
c        Trajectory for the team to follow
T        Length of future time interval
R        Size of team
M        Number of particles for target prediction

that measurements made at different points in time or by different robots are conditionally independent given the state.

Similar to other work on predicting the motion of an uncooperative target (Ryan and Hedrick 2010; Vidal et al. 2002) we use a Gaussian random walk for the process model: p(x_t | x_{t-1}) = N(x_t; x_{t-1}, σ_p² I) where I is the identity matrix and N(x; μ, Σ) is a Gaussian with mean μ and covariance Σ. Throughout this work we assume that the measurement model, p(z_t | x_t), can be modeled as the true distance between the target and measuring robot perturbed by Gaussian noise. Previous experimental work by Djugash and Singh (2012) and the authors (Charrow et al. 2014) shows this assumption is reasonable when the target and robot have line of sight and that there are techniques for mitigating it when they do not.
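The filter update above can be sketched in a few lines of NumPy. This is a minimal illustration of Eq. (1) with the stated process and measurement models, not the authors' implementation; the function name and the default noise values sigma_p and sigma_m are our placeholder assumptions.

```python
import numpy as np

def pf_update(particles, weights, robot_positions, ranges,
              sigma_p=0.5, sigma_m=1.0, rng=None):
    """One step of the recursive filter (1) on a particle set.

    particles: (M, 2) hypotheses for the target's 2D position
    weights:   (M,) normalized particle weights
    robot_positions: (R, 2) positions of the measuring robots
    ranges:    (R,) range measurements z_t received by the team
    """
    rng = np.random.default_rng() if rng is None else rng
    # Predict with the Gaussian random walk process model p(x_t | x_{t-1}).
    particles = particles + rng.normal(0.0, sigma_p, particles.shape)
    # Weight by the Gaussian range likelihood; measurements from different
    # robots are conditionally independent given the state, so they multiply.
    for pos, z in zip(robot_positions, ranges):
        predicted = np.linalg.norm(particles - pos, axis=1)
        weights = weights * np.exp(-0.5 * ((z - predicted) / sigma_m) ** 2)
    weights = weights / weights.sum()  # eta, the normalization constant
    return particles, weights
```

A resampling step (e.g., KLD sampling, as used later in the paper) would follow in a full filter but is omitted here.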

Particle filters are expressive, but can also be computationally expensive, requiring many samples to accurately estimate a distribution. While we use KLD sampling (Fox 2003) to keep the number of particles small, it is reasonable to ask whether the target's position should be estimated with a parametric filter such as the Extended Kalman Filter (EKF) or Unscented Kalman Filter (UKF). However, the EKF and UKF are only appropriate when the belief can be well approximated as a single Gaussian. This could be helpful once the team's estimate has converged, but at that point in time the particle filter should be able to represent the belief using a few hundred particles, making it fast for both estimation and control. Using a filter like the EKF also has its drawbacks. Hoffmann and Tomlin (2010) describe how linearizing the measurement model introduces error into the calculation of mutual information, potentially hurting the team's overall performance.

    3.2 Target and Measurement Prediction

To determine how they should move, the team must predict where the target will go and what measurements they will receive. Specifically, if the team plans a trajectory over the time interval τ = t+1 : t+T, they need to estimate the


target's future state: x_τ = [x_{t+1}, ..., x_{t+T}]. This can be done using the current belief and process model:

p(x_τ | z_{1:t}) = ∫ p(x_t | z_{1:t}) ∏_{i=t+1}^{t+T} p(x_i | x_{i-1}) dx_t ≈ Σ_{j=1}^{M} w^j δ(x_τ − x_τ^j)    (2)

where δ(·) is the Dirac delta function, x_τ^j is the trajectory of the jth particle, and w^j is its weight.

Over this time interval the team will receive range measurements z_τ(c) = [z_{t+1}(c_{t+1}), ..., z_{t+T}(c_{t+T})] where z_k is the vector of measurements the team receives at time k and c is the trajectory the team follows. The measurement distribution can be calculated by marginalizing over the state and applying the conditional independence assumption of the measurements:

p(z_τ(c) | z_{1:t}) = ∫ p(x_τ | z_{1:t}) ∏_{i=t+1}^{t+T} p(z_i(c_i) | x_i) dx_τ    (3)

Because the measurement model is Gaussian, substituting (2) into (3) results in a Gaussian mixture model representation of the measurement density. If the team has R robots, then after applying the conditional independence assumption the jth mixture component has a weight of w^j with a distribution equal to ∏_{i=t+1}^{t+T} ∏_{r=1}^{R} p(z_i^r(c_i^r) | x_i = x_i^j). Individual range measurements are 1-dimensional, which means the measurement space is RT-dimensional.
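Concretely, each predicted particle trajectory contributes one RT-dimensional mixture component whose mean stacks the predicted ranges. A sketch under the constant-noise measurement model (function and variable names are ours, not from the paper):

```python
import numpy as np

def measurement_gmm(particle_trajs, weights, team_traj):
    """Component means of the measurement mixture (3).

    particle_trajs: (M, T, 2) sampled target trajectories x_tau^j
    weights:        (M,) particle weights w^j
    team_traj:      (T, R, 2) planned robot positions c
    Returns (means, weights): means has shape (M, T*R); component j stacks
    the predicted ranges ||c_k^r - x_k^j||_2 over all steps k and robots r,
    and every component shares the covariance sigma_m^2 * I.
    """
    M = particle_trajs.shape[0]
    means = np.empty((M, team_traj.shape[0] * team_traj.shape[1]))
    for j in range(M):
        # (T, R) matrix of distances from each robot pose to particle j
        d = np.linalg.norm(team_traj - particle_trajs[j][:, None, :], axis=2)
        means[j] = d.ravel()
    return means, weights
```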

    3.3 Information Based Control

    3.3.1 Control Law

Our control law maximizes the mutual information between the future state of the target and future measurements made by the team. By considering multiple measurements over time, the control law produces a better trajectory than a greedy maximization would. Formally, at time t the team plans how it will move over the time interval τ by considering a set of trajectories, C, that they can follow. We define a trajectory as a discrete sequence of poses for each team member, so that a trajectory c ∈ C can be expressed as c = [c_{t+1}, ..., c_{t+T}] where c_k^r is the 2D position of robot r at time k (we ignore orientation as it does not affect range measurements). After generating C, the team selects the trajectory that maximizes mutual information:

c* = argmax_{c ∈ C} I[z_τ(c), x_τ] = argmax_{c ∈ C} H[z_τ(c)] − H[z_τ(c) | x_τ]    (4)

Fig. 2 Representative trajectories for a single robot using a finite set of motion primitives with a finite horizon. a Points on a circle. b Indoor planner. The black boxes are the starting locations, the black circles are the final destinations, and the hollow circles show intermediate points

where I[z_τ(c), x_τ] is the mutual information between future measurements and the state, which can be calculated as the difference between H[z_τ(c)], the differential entropy of future measurements, and H[z_τ(c) | x_τ], the conditional differential entropy of future measurements (Cover and Thomas 2004). Although it is not explicit, both entropies are conditioned on measurements made up to time t. Both measurement distributions and their entropies change as a function of where the team travels.

To generate C, we use motion primitives. For example, in an open environment the set of trajectories could be composed of straight line trajectories whose final destinations are uniformly spaced on a circle (Fig. 2a). When non-convex obstacles are present (e.g., in an office building), C could be generated by selecting final destinations and determining intermediate points by interpolating along the shortest valid path (Fig. 2b).
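The circular primitive set of Fig. 2a can be generated as follows. This is a sketch; the number of directions, radius, and horizon are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

def circle_primitives(start, radius, T, n_directions=8):
    """Straight-line trajectories to destinations spaced uniformly on a
    circle, each discretized into T waypoints (cf. Fig. 2a).

    Returns an (n_directions, T, 2) array: candidate trajectories c for
    one robot, with c[i, k] the 2D position at step k along direction i.
    """
    start = np.asarray(start, dtype=float)
    angles = 2 * np.pi * np.arange(n_directions) / n_directions
    goals = start + radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    fractions = np.arange(1, T + 1) / T      # interpolation along the line
    return start + fractions[None, :, None] * (goals - start)[:, None, :]
```

A team-level trajectory set would combine one such primitive per robot; the indoor variant (Fig. 2b) would instead interpolate along shortest collision-free paths.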

    3.3.2 Calculating the Objective

To evaluate the objective for a trajectory c, we separately evaluate the conditional measurement entropy and measurement entropy.

Calculating the conditional entropy is straightforward. Using its definition and the assumed conditional independence of the measurements given the state:

H[z_τ(c) | x_τ] ≈ Σ_{i=1}^{M} w^i Σ_{j=t+1}^{t+T} Σ_{r=1}^{R} H[z_j^r(c_j^r) | x_j = x_j^i]    (5)

M is the number of particles used to predict the target's future state (2) and the approximate equality is due to the particle representation. Because each measurement comes from a Gaussian distribution, the entropies in the summand can be calculated analytically, making the running time Θ(MRT).
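Since each summand is the entropy of a 1-D Gaussian, 0.5 log(2πe σ²), the sum in (5) can be evaluated directly. A sketch (names are ours; the caller supplies per-measurement standard deviations, and with a constant σ_m the sum collapses to a constant):

```python
import numpy as np

def conditional_entropy(weights, sigmas):
    """Evaluate (5) analytically.

    weights: (M,) particle weights, summing to 1
    sigmas:  (M, T, R) std-dev of each predicted range measurement
    Each term is the closed-form 1-D Gaussian entropy
    0.5 * log(2*pi*e*sigma^2), so the sum runs in Theta(M*T*R).
    """
    h = 0.5 * np.log(2 * np.pi * np.e * np.asarray(sigmas) ** 2)
    return float(np.sum(np.asarray(weights)[:, None, None] * h))
```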


Unfortunately, as shown in Sect. 3.2 the measurement density, p(z_τ(c) | z_{1:t}), is a Gaussian mixture model (GMM) and there is no analytic method for calculating its entropy. Numerical integration methods can be used, but the domain of integration is RT-dimensional, making it large even when there are a few robots over short time scales. This necessitates the use of algorithms like Monte Carlo integration which are computationally expensive and unsuitable for real-time performance.

However, there is a deterministic approximation for the entropy of a GMM developed by Huber et al. (2008). This algorithm uses a truncated Taylor series expansion of the logarithmic term in the entropy which allows the integration to be performed analytically. We use the 0th order approximation:

H[g] ≈ −Σ_{k=1}^{M} w^k log Σ_{j=1}^{M} w^j N(μ^k; μ^j, Σ^j)    (6)

where g(x) = Σ_{i=1}^{M} w^i N(x; μ^i, Σ^i). While this approximation is not tight, we show experimentally that it suffices for information based control.
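For components sharing an isotropic covariance σ² I, as produced by our measurement model, (6) reduces to pairwise distances between component means. A sketch (our naming, not the authors' code):

```python
import numpy as np

def gmm_entropy_taylor0(weights, means, sigma):
    """0th-order Taylor approximation (6) of GMM entropy for a mixture
    with components N(mu^i, sigma^2 * I)."""
    means = np.asarray(means, dtype=float)
    w = np.asarray(weights, dtype=float)
    k = means.shape[1]
    # Pairwise squared distances between the component means.
    d2 = ((means[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    # N(mu^k; mu^j, sigma^2 I) evaluated for every pair (k, j).
    dens = (2 * np.pi * sigma ** 2) ** (-k / 2) * np.exp(-0.5 * d2 / sigma ** 2)
    return float(-(w * np.log(dens @ w)).sum())
```

For a single 2-D component this returns log(2πσ²) rather than the true entropy log(2πeσ²), underestimating by 1 nat per dimension pair and illustrating that the approximation is not tight.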

The time complexity of (6) is Θ(M²RT). The dependence on the number of particles which represent the target's future state, M, is problematic as it grows exponentially with the time horizon, T. While there are algorithms for reducing the number of components in a GMM, their time complexity is also quadratic in the number of components (Runnals 2007; Huber and Hanebeck 2008). In Sect. 4.2 we describe a different approach which reduces the number of components by approximating the distribution over the target's future state and prove that such approximations result in similar distributions over measurements.

    4 Approximate Representations

In Sect. 3 we outlined a general control law for a multi-robot team. However, it is computationally infeasible as the team's size and time horizon grow. It is also unclear which trajectories to consider. We address these problems in Sect. 4.2 where we analyze how an approximation of the target's future state affects the objective function, and in Sect. 4.3 where we develop a theoretical design criterion for generating trajectories. These analyses use an information theoretic relationship that we prove in Sect. 4.1.

    4.1 Bounding Entropy Difference

Intuitively, if two Gaussian mixture models (GMMs) are similar, then their entropies should be similar as well. In particular, one would expect that if the means of individual components of a GMM are slightly perturbed, then the entropy of the GMM would not change substantially. Our primary theoretical contribution is proving that this intuition is correct. We do this by building on a previously known result that relates the L1 distance between discrete distributions and their difference in entropies. For clarity of presentation, all proofs are deferred to the Appendix.

The L1 distance is one way of measuring the difference between distributions. For two densities f and g, it is defined as ‖f − g‖₁ = ∫ |f(x) − g(x)| dx. To concisely state the following results we also define φ(x) = N(x; 0, 1) as the standard 1-dimensional Gaussian, Φ(x) as its cumulative distribution function, ‖x‖_Σ = √(xᵀ Σ⁻¹ x) as the Mahalanobis distance, and |Σ| as the determinant of a matrix.

Theorem 1 (L1 bound on entropy) Let f and g be probability density functions such that ‖f − g‖₁ ≤ 1/2, 0 ≤ f(x) ≤ 1 and 0 ≤ g(x) ≤ 1. Then

|H[f] − H[g]| ≤ −‖f − g‖₁ log ‖f − g‖₁ + ‖f − g‖₁ H[ |f(x) − g(x)| / ‖f − g‖₁ ]

Cover and Thomas (2004, Thm. 17.3.3) prove this result for discrete distributions, but the proof extends to continuous distributions whose likelihood is between 0 and 1. Using the following lemmas, we use Theorem 1 to prove that the entropy of a GMM does not vary substantially if the means of its components are perturbed.

Lemma 1 (L1 between Gaussians with same covariance) If f(x) = N(x; μ₁, Σ) and g(x) = N(x; μ₂, Σ) are k-dimensional Gaussians then

‖f − g‖₁ = 2(Φ(α) − Φ(−α)) = 2(2Φ(α) − 1)

where α = ‖μ₁ − μ₂‖_Σ / 2.

Lemma 2 (Entropy bound of normalized difference of Gaussians) If f(x) = N(x; μ₁, Σ) and g(x) = N(x; μ₂, Σ) are k-dimensional Gaussians with μ₁ ≠ μ₂ then

H[ |f − g| / ‖f − g‖₁ ] ≤ λ + log √((2πe)ᵏ |Σ|)

where λ = 1 + α² + 2α φ(α) / (2Φ(α) − 1) with α = ‖μ₁ − μ₂‖_Σ / 2.
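Lemma 1 is easy to check numerically in one dimension. A sketch comparing the closed form against direct numerical integration of the L1 distance (the helper name is ours):

```python
import math
import numpy as np

def l1_gaussians(mu1, mu2, cov):
    """Closed-form L1 distance of Lemma 1: 2*(2*Phi(alpha) - 1), where
    alpha = ||mu1 - mu2||_Sigma / 2 is half the Mahalanobis distance."""
    diff = np.atleast_1d(np.asarray(mu1, float) - np.asarray(mu2, float))
    cov = np.atleast_2d(np.asarray(cov, float))
    alpha = 0.5 * math.sqrt(float(diff @ np.linalg.solve(cov, diff)))
    Phi = 0.5 * (1.0 + math.erf(alpha / math.sqrt(2.0)))  # std normal CDF
    return 2.0 * (2.0 * Phi - 1.0)
```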

Theorem 2 (Bound on Entropy Difference of GMMs) Let f(x) = Σ_{i=1}^{N} w_i f_i(x) be a k-dimensional GMM where f_i(x) = N(x; μ_i, Σ_i) and |Σ_i| ≥ (2π)⁻ᵏ so 0 ≤ f_i(x) ≤ 1. If g(x) = Σ_{i=1}^{N} w_i g_i(x) where g_i(x) = N(x; μ_i + δ_i, Σ_i) is the GMM created by perturbing the means of components of f by δ_i such that ‖f_i − g_i‖₁ ≤ 1/2 for all i then:

|H[f] − H[g]| ≤ Σ_{i=1}^{N} w_i ‖f_i − g_i‖₁ ( log [ √((2πe)ᵏ |Σ_i|) / (w_i ‖f_i − g_i‖₁) ] + λ_i )

where λ_i = 1 + α_i² + 2α_i φ(α_i) / (2Φ(α_i) − 1) with α_i = ‖δ_i‖_{Σ_i} / 2 and ‖f_i − g_i‖₁ = 2(2Φ(α_i) − 1).

There are two primary limitations to Theorem 2, both of which come from Theorem 1. The first is that the bound only applies to GMMs where the determinant of the covariance of individual components is not too small; this is necessary to ensure that the maximum likelihood of each component is at most 1. This issue is not significant for our application as the covariance is determined by the noise of range measurements, which is typically quite large (Charrow et al. 2014). Another limitation is the L1 constraint between components, which requires that each perturbation not be large relative to the covariance. This requirement can be relaxed by using a variant of Theorem 1. The L1 constraint arises in the proof of Theorem 1 from an inequality of the entropy function, h(t) = −t log t. Fannes (1973) proved a similar inequality for h(t) that is weaker, but does not have the L1 constraint; this inequality can be substituted into the proof for Theorem 1 to eliminate the requirement that components be near each other.

There are other results that connect similarity metrics between distributions to the difference of their entropies. For example, Silva and Parada (2011, Thm. 2) recently proved that in the limit as the KL divergence between bounded continuous distributions goes to 0, their entropies become equal. Research on rate distortion theory often looks at similar problems (e.g., how to find approximations of distributions without exceeding a certain distortion). However, we are not aware of prior work which bounds the entropy difference of continuous densities like GMMs whose support is unbounded.

    4.2 Approximating Belief

As discussed in Sect. 3.3, the number of particles needed to predict the location of the target grows exponentially in time. This complicates calculating the entropy of the measurement distribution (3), as it depends on the predictive distribution of the target (2). In this section, we show that a simple approximation of the predictive distribution can speed up calculations of the objective (4), without significantly affecting its value.

Our approach for approximating a particle set is to replace subsets of similar particles with their average. Specifically, we partition the state space with a regular square grid and create an approximate particle set by replacing particles in

Fig. 3 a The original particle set (circles) can be approximated by a smaller set of particles (triangles) using a grid. b The different particle sets result in similar distributions over range measurements (e.g., peaks at d_1 and d_2)

the same cell with their weighted average. Figure 3 shows an example of this process. In Fig. 3a the original particle set representing a multi-modal distribution over the target's location is reduced via a grid. The new set is smaller, but it slightly changes the measurement distribution as shown in Fig. 3b.
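The grid-based reduction can be sketched as follows (a minimal illustration with our own naming, not the authors' code):

```python
import numpy as np

def merge_particles_grid(particles, weights, cell_len):
    """Approximate a particle set by replacing all particles that fall in
    the same grid cell with their weighted average (Sect. 4.2).

    particles: (M, d) positions, weights: (M,) normalized weights.
    Returns the reduced particle set and its aggregated weights.
    """
    cells = np.floor(particles / cell_len).astype(np.int64)
    _, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    n = int(inverse.max()) + 1
    new_w = np.zeros(n)
    np.add.at(new_w, inverse, weights)                       # weight sums
    new_p = np.zeros((n, particles.shape[1]))
    np.add.at(new_p, inverse, weights[:, None] * particles)  # weighted sums
    return new_p / new_w[:, None], new_w
```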

To analyze the error introduced by this approximation, we assume that the range based measurement model has a constant variance σ_m² and is unbiased. With these assumptions, the conditional entropy (5) is H[z_τ(c) | x_τ] ≈ (TR/2) log(2πe σ_m²), which doesn't depend on the number of particles or their location. Thus, the approximation does not introduce any error for this term.

The approximation does introduce error for the entropy of the measurement density, H[z_τ(c)]. We use Theorem 2 to bound this error, by showing that the approximate measurement distribution is a perturbed version of the original measurement distribution. Let X be the original set of particles and f(z) = Σ_{j=1}^{M} w^j N(z; μ_f^j, σ_m² I) be the Gaussian mixture model (GMM) it creates with the trajectory c and measurement model, using superscripts to index components. Let G(j) be a map from the index of a particle in X to the index of the corresponding particle in the approximate particle set Y. The density Y creates is g(z) = Σ_{i=1}^{N} v^i N(z; μ_g^i, σ_m² I), where the ith component's mean and weight is determined by the particles in X that map to the ith particle in Y. While f(z) and g(z) may have a different number of components, M ≥ N, the expressions are related in that each component in f(z) corresponds to a single component in g(z). This means the approximated measurement distribution can be re-expressed as g(z) = Σ_{j=1}^{M} w^j N(z; μ_f^j + δ^j, σ_m² I) where δ^j is the difference between μ_f^j and μ_g^{G(j)}. Each δ^j can be bounded. Note that measurements are unbiased. If c_k^r is robot r's position at time k and x_k^{X,j} is the position of the jth particle in X at time k, the corresponding element of μ_f^j is μ_{f,k,r}^j = ‖c_k^r − x_k^{X,j}‖₂. Similarly, if x_k^{Y,i} is the particle in Y that lies in the same cell as x_k^{X,j}, then μ_{g,k,r}^i = ‖c_k^r − x_k^{Y,i}‖₂.


Because x_k^{X,j} and x_k^{Y,i} lie in the same cell, the farthest they can be apart is ‖x_k^{X,j} − x_k^{Y,i}‖₂ ≤ √2 L where L is the cell's length. Thus, each component of the perturbation is at most √2 L. As the perturbation vector is RT-dimensional, for all j:

‖δ^j‖_{σ_m² I} ≤ √2 L √(RT) / σ_m    (7)

This bounds the Mahalanobis distance between the mixtures' corresponding means. When the difference is small, Theorem 2 bounds the difference in entropies. Importantly, the bound shows that if the length of a grid cell is below the standard deviation of the measurement model, then the error from the approximation will be negligible.
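Bound (7) directly gives a design rule for the grid resolution. A sketch (the error budget eps and the helper names are our illustrative choices):

```python
import math

def perturbation_bound(cell_len, R, T, sigma_m):
    """Right-hand side of (7): worst-case Mahalanobis perturbation of a
    component mean caused by a cell_len x cell_len grid."""
    return math.sqrt(2.0) * cell_len * math.sqrt(R * T) / sigma_m

def max_cell_len(eps, R, T, sigma_m):
    """Largest grid cell length that keeps the bound (7) at or below eps."""
    return eps * sigma_m / (math.sqrt(2.0) * math.sqrt(R * T))
```

Note how the admissible cell length shrinks with the team size R and horizon T but grows with the sensor noise σ_m, matching the observation that noisier sensors tolerate coarser grids.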

    4.3 Selecting Motion Primitives

A central aspect of our approach is the use of motion primitives to generate trajectories. To select an informative trajectory quickly, we wish to generate as few as possible without ignoring important directions. In this section we show that mutual information cannot change significantly when the team's trajectory changes on the order of the standard deviation of the measurement model. This suggests that generating multiple trajectories that are close to one another is computationally wasteful as their mutual information will be similar. It also suggests that the intermediate waypoints from motion primitives (e.g., Fig. 2) should be spaced as a function of the standard deviation of the measurement model. We exploit both of these observations when designing the system.

Let c be a trajectory and f(z) = Σ_{i=1}^{M} w^i N(z; μ_f^i, σ_m² I) be the measurement density it determines when combined with the measurement model and X, the particle set representing the predicted motion of the target. If Δ is a perturbation to the team's position at each point in time, c + Δ is a new trajectory with measurement density g(z) = Σ_{i=1}^{M} w^i N(z; μ_g^i, σ_m² I). X is the same for c and c + Δ, so f(z) and g(z) have the same number of components with identical weights. Similar to the last section, the conditional entropy of the measurements is the same for the original and perturbed trajectories. Consequently, to determine the difference in mutual information, we need to determine the difference in entropies between f(z) and g(z).

The entropy of f(z) and g(z) are not necessarily equal as the means of individual components may change. To bound this difference, consider the distance between the ith particle and the rth robot at time k for both trajectories: μ_{f,k,r}^i = ‖x_k^i − c_k^r‖₂ and μ_{g,k,r}^i = ‖x_k^i − (c_k^r + Δ_k^r)‖₂. The triangle inequality tells us |μ_{f,k,r}^i − μ_{g,k,r}^i| ≤ ‖Δ_k^r‖₂. Defining δ^i as a perturbation to μ_f^i that transforms it to μ_g^i, and summing over all time steps and robots:

‖δ^i‖_{σ_m² I} = ‖μ_g^i − μ_f^i‖_{σ_m² I} ≤ Σ_{k=t+1}^{t+T} Σ_{r=1}^{R} ‖Δ_k^r‖₂ / σ_m    (8)

When the change in a robot's position is small relative to the standard deviation of the measurement model, the mixture components will not be significantly perturbed. By Theorem 2, this means the entropy of the mixture will not change significantly.

4.4 Bounding Entropy Difference using Kullback–Leibler Divergence

The Kullback–Leibler divergence (KL divergence or KLD) is one way of measuring the distance between two densities. For two densities f and g, it is defined as \(\mathrm{KL}[f \| g] = E_f\left[\log f(x)/g(x)\right]\). In previous work, we argued that the KL divergence can be used to bound the difference in entropies of GMMs. However, this reasoning was flawed and we incorrectly concluded that the bounds provided useful insight into how the entropy of a GMM can change. To correct the record, we detail this prior result and clarify the source of the issue.

Lemma 3 (KLD Bound on Entropy for Gaussians) If \(f(x)\) and \(g(x)\) are two k-dimensional Gaussian densities:

\[|H[f] - H[g]| < \min\{\mathrm{KL}[f \| g], \mathrm{KL}[g \| f]\} + \frac{k}{2}\]
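Lemma 3 can be sanity-checked with the standard closed forms for Gaussian entropy and KL divergence. The snippet below uses two hypothetical 2D Gaussians chosen for illustration; the formulas are textbook identities, not results specific to this paper.

```python
import numpy as np

def gauss_entropy(cov):
    """Differential entropy of N(mu, cov): 0.5*log((2*pi*e)^k * |cov|)."""
    k = cov.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** k * np.linalg.det(cov))

def gauss_kl(mu0, cov0, mu1, cov1):
    """KL[f || g] for Gaussians f = N(mu0, cov0) and g = N(mu1, cov1)."""
    k = len(mu0)
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - k
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

# Two illustrative 2D Gaussians with different means and covariances.
mu_f, cov_f = np.array([0.0, 0.0]), np.diag([1.0, 2.0])
mu_g, cov_g = np.array([1.0, -1.0]), np.diag([0.5, 3.0])

lhs = abs(gauss_entropy(cov_f) - gauss_entropy(cov_g))
rhs = min(gauss_kl(mu_f, cov_f, mu_g, cov_g),
          gauss_kl(mu_g, cov_g, mu_f, cov_f)) + len(mu_f) / 2
```

The entropy difference depends only on the covariances, while the KL terms also grow with the mean separation, which is one reason the bound is loose.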

Lemma 4 (Entropy Upper Bound for Mixture Models) If \(f(x) = \sum_i w_i f_i(x)\) is a mixture model density then \(H[f] \le H[w] + \sum_i w_i H[f_i]\), where \(H[w]\) is the discrete entropy of the mixture weights (proved by Huber et al. (2008)).

Lemma 5 (Entropy Lower Bound for Mixture Models) If \(f(x) = \sum_i w_i f_i(x)\) is a mixture model, \(H[f] \ge \sum_i w_i H[f_i]\).

Theorem 3 (Entropy Difference for Mixture Models) If \(f(x) = \sum_{i=1}^{N} w_i f_i(x)\) and \(g(x) = \sum_{i=1}^{N} w_i g_i(x)\) are mixture models whose components have equal weights then

\[|H[f] - H[g]| \le H[w] + \sum_{i=1}^{N} w_i\, |H[f_i] - H[g_i]|\]

where \(H[w]\) is the discrete entropy of the mixture weights.
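Theorem 3 can be verified numerically. The sketch below builds two 1D GMMs with shared weights and a shared component variance (so each \(|H[f_i] - H[g_i]| = 0\) and the bound reduces to \(H[w]\)), estimating the mixture entropies by gridded integration; all parameter values are illustrative.

```python
import numpy as np

def normal_pdf(x, mu, sig):
    return np.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))

def entropy_1d(pdf_vals, dx):
    """Riemann-sum estimate of differential entropy on a fine grid."""
    p = np.clip(pdf_vals, 1e-300, None)   # avoid log(0); tiny p contributes ~0
    return -np.sum(p * np.log(p)) * dx

x = np.linspace(-30.0, 30.0, 200001)
dx = x[1] - x[0]
w = np.array([0.5, 0.5])
sig = 1.0
f_mus, g_mus = [0.0, 4.0], [1.0, 3.0]   # g's components are shifted copies of f's

f = sum(wi * normal_pdf(x, m, sig) for wi, m in zip(w, f_mus))
g = sum(wi * normal_pdf(x, m, sig) for wi, m in zip(w, g_mus))

lhs = abs(entropy_1d(f, dx) - entropy_1d(g, dx))
# Components share a covariance, so each |H[f_i] - H[g_i]| = 0 and the
# bound of Theorem 3 reduces to the weight entropy H[w] = log 2.
rhs = -np.sum(w * np.log(w))
```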

Theorem 3 does not connect the difference in entropies between f and g to their similarity as measured by KL divergence or a similar quantity. In previous work, we listed a slight variant of it (Charrow et al. 2013, Thm. 1), which used Lemma 3 to bound the summand in Theorem 3. While that is valid, it is not relevant for GMMs where each component has the same covariance, because the summand is always 0: the entropy of a Gaussian depends only on its covariance, so when corresponding components have the same covariance, they have the same entropy, making their difference 0. Because the fixed covariance noise model results in GMMs whose components have identical covariance, it is incorrect to conclude, as we previously did, that a bounded KL divergence between components gives insight into the entropy difference between the GMMs.

A challenge to the pragmatic contribution of Theorem 3 is that the upper and lower bounds used to prove it are strong at different times. The upper bound on the entropy (Lemma 4) is tightest when each of the components is separated and their shared support is minimal (Huber et al. 2008). In contrast, the lower bound (Lemma 5) is tightest when each component is essentially identical. It is also worth noting that the bound on entropy difference is looser as the mixture weights come closer to each other. In the worst case, w is the uniform distribution, meaning \(H[w] = \log N\) (Cover and Thomas 2004). Note that Theorem 2 has no such offset term, making it a more insightful bound. Consequently, Theorem 2 in this paper enables us to reach the same conclusions we previously made, which we believe resolves the issue.

    5 Experiments

This section describes a series of simulations and real-world experiments which evaluate the proposed methodology and design trade-offs. All of our software is written in C++, run on an Intel Core i7 processor, and interfaced using the Robot Operating System (ROS 2013).

    5.1 Computational Performance

In Sect. 3.3 we discussed how the primary computational challenge of the objective is the calculation of the measurement density's entropy. We now study the performance of Huber's 0th order Taylor approximation and show that the approximation algorithm in Sect. 4.2 makes it tractable for longer time horizons. We also discuss the impact of using an alternative filter for estimation and calculating the objective.

To assess the performance of the Taylor approximation, we compare it to the Vegas algorithm (a Monte Carlo integration algorithm provided by the Cuba library (Hahn 2013)) on a variety of mixture models. To generate a mixture we sample particles and a team trajectory uniformly at random and then apply the measurement model. Figure 4 shows the time to calculate the entropy of a single mixture using different numbers of particles and measurement dimensions. Teams often have to evaluate dozens to hundreds of trajectories, so calculations longer than a fraction of a second make the control law infeasible. The data show that unlike Monte Carlo integration, the Taylor approximation is feasible in higher dimensional spaces when the number of particles is not large. Its lack of growth with dimension is not surprising, as its dependence on dimension comes from simple matrix operations.

Fig. 4 Monte Carlo integration vs. 0th order Taylor approximation for evaluating mutual information. The Taylor approximation is more efficient
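For reference, the 0th order Taylor expansion of Huber et al. (2008) replaces the intractable integral \(-\int f(z)\log f(z)\,dz\) with \(-\sum_i w_i \log f(\bar{f}^i)\): the log-density is expanded about each component mean and only the constant term is kept. A minimal sketch for the isotropic-covariance mixtures used here (parameter values are illustrative, and this is not the paper's optimized C++ implementation):

```python
import numpy as np

def gmm_pdf(z, weights, means, sigma):
    """Density of a GMM with isotropic covariance sigma^2*I at points z (n, d)."""
    d = means.shape[1]
    norm = (2 * np.pi * sigma ** 2) ** (-d / 2)
    # Squared distance between every query point and every component mean.
    d2 = ((z[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return (weights * norm * np.exp(-0.5 * d2 / sigma ** 2)).sum(axis=1)

def taylor0_entropy(weights, means, sigma):
    """0th order expansion: H[f] is approximated by -sum_i w_i log f(mean_i)."""
    return -np.sum(weights * np.log(gmm_pdf(means, weights, means, sigma)))

rng = np.random.default_rng(1)
M, d = 50, 4                      # illustrative mixture size and dimension
weights = np.full(M, 1.0 / M)
means = rng.uniform(0.0, 20.0, size=(M, d))
approx = taylor0_entropy(weights, means, sigma=2.0)
```

The cost is one pairwise-distance evaluation between component means, which explains the mild growth with dimension seen in Fig. 4. Note the approximation is biased: for a single Gaussian component it returns the true entropy minus \(d/2\).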

    5.2 Effects of Approximations

To understand the effect of the approximations on the team's decisions, we visualize their impact on the mutual information cost-surface in a few typical scenarios.

Figure 5 depicts the effect of approximating the belief on the calculation of mutual information as two robots are constrained to move along a parameterized line from [0,1]. Given a circular belief distribution represented by a particle filter (M = 2500, Fig. 5a), mutual information is computed using Monte Carlo integration for all possible locations of the team with range-only observations (\(\sigma_p^2 = 3\) m, Fig. 5b). The value of mutual information matches intuition; as the robots move away from the center of the belief (0.3 on the line), measurements become increasingly informative with corresponding increases in mutual information. An approximate belief representation with fewer particles (N = 26, Fig. 5d) corresponding to a grid length of 3.0 m yields mutual information values that are similar to the original distribution (Fig. 5e). The mean absolute error between the original and approximate mutual information is 0.03.

It is also important to understand the effect of approximating the belief on the Taylor approximation, as that is how we calculate mutual information in simulations and experiments. Comparing Fig. 5c–f, it is clear that the approximation introduces limited error for the Taylor approximation as well; the mean absolute error is 0.06. While this is larger than it was for numerical integration, it still represents a small change.

Fig. 5 Effect of approximating belief on mutual information (MI) as two robots move along a parameterized line from [0,1] (blue, a and d). Approximating belief introduces limited error into mutual information when calculating it via Monte Carlo integration (b vs. e) or the Taylor approximation (c vs. f) (Color figure online)

To further verify that the approximation introduces limited error, Fig. 6 shows its effect when the belief is distributed as a crescent or two separate hypotheses. These distributions are useful examples as they often arise when using range-only measurements (Djugash and Singh 2012; Charrow et al. 2014). As in Fig. 5, it is clear that the belief approximation introduces limited error for both Monte Carlo integration (mean absolute error below 0.04) and the Taylor approximation (mean absolute error below 0.09). It also significantly reduces the number of particles, resulting in a substantial computational speedup (M = 1500 to N = 12 for the crescent and M = 500 to N = 8 for two hypotheses).

Looking at the Taylor approximation in isolation might suggest that it performs poorly. For example, it sometimes results in a negative value for mutual information, which is a non-negative quantity (Cover and Thomas 2004). However, for the purposes of selecting a control action, the location of the maxima matters more than their value. Given this, by comparing Fig. 5b and c it is clear that using the Taylor approximation will cause similar actions to be selected: the maxima occur at the same place in both plots, demonstrating that the Taylor approximation performs well.

The previous plots demonstrate that the approximation introduces relatively small amounts of error in typical scenarios. However, they do not illustrate the theoretical bound on entropy difference (Theorem 2). A simple example comparing the entropy of a single Gaussian to a two-component mixture model makes the bound clearer. Let \(f(x) = \mathcal{N}(x; 0, 1)\) be the standard 1D Gaussian and \(g(x; \mu) = 0.5\,\mathcal{N}(x; 0, 1) + 0.5\,\mathcal{N}(x; \mu, 1)\) be a 2-component mixture model where the mean of the second component is \(\mu\). Figure 7 shows the actual entropy difference, \(|H[f] - H[g]|\), between these mixtures along with the upper bound from Theorem 2 for various values of \(\mu\). When \(\mu\) is close to 0 the mixtures are similar and have similar entropies, which the bound captures. As \(\mu\) increases, the bound is weaker, becoming much larger than the true entropy difference. Despite this fact, Theorem 2 still provides useful theoretical insight into the effect of the approximation.
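The saturation visible in Fig. 7 can be reproduced directly: as \(\mu\) grows the components separate, \(H[g] \to \log 2 + H[f]\), and the actual difference approaches \(\log 2 \approx 0.69\). A sketch using gridded numerical integration:

```python
import numpy as np

def normal_pdf(x, mu):
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

def entropy(pdf_vals, dx):
    """Riemann-sum estimate of differential entropy."""
    p = np.clip(pdf_vals, 1e-300, None)
    return -np.sum(p * np.log(p)) * dx

x = np.linspace(-25.0, 35.0, 400001)
dx = x[1] - x[0]
h_f = entropy(normal_pdf(x, 0.0), dx)   # entropy of the standard Gaussian

mus = [0.0, 1.0, 5.0, 10.0]
diffs = []
for mu in mus:
    g = 0.5 * normal_pdf(x, 0.0) + 0.5 * normal_pdf(x, mu)
    diffs.append(abs(h_f - entropy(g, dx)))
# diffs grows from 0 toward log 2 as the second component moves away.
```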

    5.3 Flexibility of Motion and Number of Robots

The team can presumably gather better measurements by evaluating more trajectories, but this adds computational expense, and additional trajectories may not be helpful. To examine these trade-offs in a controlled setting, we run simulations in an open environment and vary the number of robots in the team and the destination points they use to generate trajectories (see Fig. 2a). We evaluate (1) how quickly the team obtains a stable and accurate estimate and (2) how accurate the estimate is over longer periods of time.

Fig. 6 Effect of approximating different beliefs on mutual information (MI) as two robots move along a parameterized line from [0,1] (blue, a, d, g and j). The approximation introduces limited error for Monte Carlo integration and the Taylor approximation (Color figure online)

Fig. 7 Comparison of actual entropy difference to bound from Theorem 2 for mixture models that differ by one component. The theoretical bound is not sharp, but becomes tighter as the components become more similar

For each of these experiments the team moves at 0.2 m/s while the target moves at 0.15 m/s. All members of the team start in the same location with the target 10 m to their right. The target repeatedly drives straight for 5 m and then selects a new direction uniformly at random. To make meaningful comparisons across experiments, the target always follows the same random path. The simulator generates range measurements between the target and each team member at 10 Hz using a Gaussian whose mean and variance are equal to the true distance. The simulator is asynchronous; the target moves even if the team is planning.

At each point in time, the team plans a 15 s trajectory (simulating 3 target steps) and evaluates mutual information at 3 points along each trajectory. Although each robot will make significantly more than 3 measurements while executing its trajectory, it is not necessary to evaluate mutual information at each position where a measurement will be received. As described in Sect. 4.3, mutual information at nearby locations must be similar, so densely evaluating it will not further distinguish which trajectories are more informative than others.

To ensure that the motion model of the filter (1) generates samples in all areas where the target could be, we set the standard deviation to \(\sigma_p = 0.3\), a small multiple of the maximum speed of the target multiplied by the sampling interval. To predict the target's future location (2), the team simulates a step every 5 s with \(\sigma_p = 0.25\).

The team acquires the target once they sustain an estimate for 10 s with a mean error below 1.5 m and a spread (i.e., the square root of the determinant of the covariance) below 1.5. We chose these values as they are close to the best the estimator can do. Table 2 shows acquisition times for 5 trials with a variable number of robots and destination points.

Table 2 Effect of motion primitives and team size on the time to acquire the target

             2 Robots           3 Robots
3 Points     80.55 s (22.68)    42.57 s (3.55)
4 Points     56.00 s (6.07)     49.00 s (15.24)
5 Points     56.17 s (7.31)     51.83 s (15.94)
6 Points     55.10 s (12.24)    69.71 s (9.94)

Mean and standard deviation are shown for 5 trials

Surprisingly, the 3 robot team does worse with more destination points. This is a direct consequence of the exponential growth in the number of trajectories the control law evaluates, which makes the robots spend a long time planning their initial trajectories. In contrast, the 2 robot team's acquisition time improves when considering more than 3 destination points, after which point it does not change. Planning time is not an issue for the 2 robot team (they have at most 36 trajectories) which suggests that they are primarily limited by their team size. This is emphasized by the 10 s difference in performance between the best 2 robot time and the best 3 robot time.
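The acquisition test itself is simple to state in code. A sketch using the thresholds from the text; the snapshot format and helper name are our own, not the paper's implementation:

```python
import numpy as np

def acquired(history, dt, err_thresh=1.5, spread_thresh=1.5, hold=10.0):
    """history: sequence of (mean_error_m, covariance) filter snapshots taken
    every dt seconds. The target counts as acquired once both the error and
    the spread (square root of the determinant of the covariance) stay below
    their thresholds for `hold` consecutive seconds."""
    needed = int(np.ceil(hold / dt))
    run = 0
    for err, cov in history:
        spread = np.sqrt(np.linalg.det(cov))
        run = run + 1 if (err < err_thresh and spread < spread_thresh) else 0
        if run >= needed:
            return True
    return False
```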

To assess the long term performance of the team, we ran the same set of experiments for 5 min. Figure 8a shows the estimate's error over time. While the target is acquired at different points in time, the final error is the same for all parameters. The computational difficulties that teams with more trajectories have are not present once the filter has converged, as the belief can be represented with fewer particles. Figure 8b shows the long term error from a separate experiment where the target moves 15 m to the right of its starting location and then back. This motion makes it harder to obtain and maintain an accurate estimate because the target moves away from the team for a long time without periodically stopping and turning. The similar results across experiments indicate that the estimation and control approach work well.

    5.4 Indoor Experiment

To study the real world performance of a team, we run two real world experiments where two mobile robots track a third mobile robot as it moves in a loop through an office building as shown in Fig. 9. The target's path is 55 m long and it moves at 0.12 m/s, traversing the loop in about 8 min. The team starts across the map, travels at 0.2 m/s, and plans using destination points that are 6.0 m away. Each robot is equipped with a Hokuyo-URG04LX laser scanner for localization, an 802.11s wireless mesh card for communication, and a range sensor that is commercially available as part of the nanoPAN 5375 Development Kit (nanoPAN 5375 Development Kit 2013). The nanoPAN 5375's measurements often have an error larger than 2.0 m (Charrow et al. 2014).


Fig. 8 Mean error of estimate for various motion primitives with 3 robots: a random walk, b back and forth. The number of primitives does not affect the long term error regardless of how the target moves

In our first experiment, the target traverses its loop once. Figure 10a shows the error of the estimate over 7 trials. Overall, the team is able to obtain a good estimate quickly and maintain it. The mean root mean square error (RMSE) from t = 60 s to the end of the experiment was 2.43 m with a standard deviation of 0.68. The authors previously achieved errors of 1.0–2.0 m when localizing stationary nodes with the same sensor (Charrow et al. 2014). Given that the target is moving in this experiment, we believe these results indicate that the team performs well.

In our second experiment, we assess the longer term performance of our approach by having the target traverse the same loop 4 times. Figure 10b shows the error of 2 different trials and Fig. 11 shows the filter's evolution in trial 1. Again, the team does a good job tracking the target and obtains RMSEs of 3.88 and 2.83 m. These slightly higher error rates are due to multiple hypotheses that sometimes arise as a result of the limited information range measurements provide. Given the limited travel directions in this environment, it is also difficult for the team to obtain measurements that remove ambiguities. Figures 11d and 10b at \(t \approx 1{,}000\) s show this happening. However, whenever the error increases, it eventually decreases. It is also important to note that when the estimate's error increases, its covariance also increases; the rise in error is not indicative of the team being overly confident in a bad estimate.

Fig. 9 Indoor experimental setup. The target starts at the blue X and moves in a loop. The team starts at the red circle (Color figure online)

    6 Conclusion

There are many interesting directions for future research. For instance, in this work the team follows a sequence of waypoints at a fixed velocity, but could presumably increase their performance by moving faster or slower at various points in time. We would also like to investigate integrating other sensors. Finally, we are motivated by the theoretical results of this work and seek to prove stronger relationships between similarity measures of continuous distributions and the difference of their entropies.

In this paper we present a control policy for maximizing mutual information over a finite time horizon to enable a team of robots to estimate the location of a mobile target using range-only sensors. To enable calculation of our policy in real time, we develop an approximation of the belief. By proving that the entropy of a Gaussian mixture model cannot change substantially when its components are perturbed, we argue that these approximations do not significantly affect the team's behavior. We also develop a theoretical design criterion for generating motion primitives by connecting the incremental motions the team makes to the variance of the measurement model. We validated our approach using simulations and real world experiments in which a team of robots successfully tracks a mobile target.

Acknowledgments We gratefully acknowledge the support of ONR Grant N00014-07-1-0829, ARL Grant W911NF-08-2-0004, and AFOSR Grant FA9550-10-1-0567. Benjamin Charrow was supported by an NDSEG fellowship from the Department of Defense.


Fig. 10 Distance from the estimate's mean to the target's true location as the target drives around: a 1 loop (7 trials), b 4 loops (2 trials)

Fig. 11 Evolution of particle filter for trial 1 of the 4 loop experiment. Blue dots represent particles and the red X shows the filter's mean. Gray boxes with numbers show the team's position. a Initially, the target's location is uncertain. b–c By moving intelligently, the team accurately estimates the target's location. d–e Multiple hypotheses can arise due to the target's motion and the limited information of range measurements, but they are eventually eliminated (Color figure online)

    Appendix: Proofs

    Integrating Gaussians over a half-space

Proving Lemmas 1 and 2 requires integrating Gaussian functions over a half-space. The following identities will be used several times. Owen (1980) gives these identities for 1-dimensional Gaussians, but we have been unable to find a reference for the multivariate case. For completeness, we prove them here.

Lemma 6 If \(f(x) = \mathcal{N}(x; \mu, \Sigma)\) is a k-dimensional Gaussian and \(A = \{x : a^T x + b > 0\}\) is a half-space then

\[\int_A f(x)\,dx = \Phi(p) \tag{9}\]

\[\int_A x f(x)\,dx = \Phi(p)\,\mu + \phi(p)\,\frac{\Sigma a}{\sqrt{a^T \Sigma a}} \tag{10}\]

\[\int_A x x^T f(x)\,dx = \Phi(p)\,(\Sigma + \mu\mu^T) + \phi(p)\left(\frac{\Sigma a \mu^T + \mu a^T \Sigma}{\sqrt{a^T \Sigma a}} - p\,\frac{\Sigma a a^T \Sigma}{a^T \Sigma a}\right) \tag{11}\]

where \(p = (a^T \mu + b)/\sqrt{a^T \Sigma a}\) is a scalar, \(\phi(x) = \mathcal{N}(x; 0, 1)\) is the PDF of the standard 1-dimensional Gaussian, and \(\Phi(x)\) is its CDF.
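Identity (9) is easy to validate by Monte Carlo: draw samples from \(\mathcal{N}(\mu, \Sigma)\) and compare the empirical fraction landing in the half-space to \(\Phi(p)\). A sketch with randomly generated (illustrative) parameters:

```python
import numpy as np
from math import erf, sqrt

def Phi(x):
    """CDF of the standard 1D Gaussian."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

rng = np.random.default_rng(2)
k = 3
mu = rng.normal(size=k)
L = rng.normal(size=(k, k))
Sigma = L @ L.T + k * np.eye(k)   # random symmetric positive-definite covariance
a, b = rng.normal(size=k), rng.normal()

# Closed form from (9): mass of N(mu, Sigma) on the half-space a^T x + b > 0.
p = (a @ mu + b) / np.sqrt(a @ Sigma @ a)
closed = Phi(p)

# Monte Carlo estimate of the same probability.
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
estimate = np.mean(samples @ a + b > 0)
```

Identities (10) and (11) can be checked the same way by averaging \(x\) and \(xx^T\) over the samples that fall in the half-space.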

Proof of (9) All of these integrals are evaluated by making the substitution \(x = Ry\), where R is a rotation matrix that makes the half-space A axis-aligned. Specifically, define R such that \(a^T R = \|a\| e_1^T\), where \(e_1\) is a k-dimensional vector whose first entry is 1 and all others are 0. This substitution is advantageous because it makes the limits of integration \((-\infty, \infty)\) for all components of y except \(y_1\).

Because \(R^T x = y\), y is a k-dimensional Gaussian with density \(q(y) = \mathcal{N}(y; R^T \mu, R^T \Sigma R)\). The determinant of the Jacobian of y is the determinant of a rotation matrix: \(|\partial y / \partial x| = |R^T| = 1\). Substituting:

\[\int_{a^T x + b > 0} f(x)\,dx = \int_{(a^T R) y + b > 0} |R^T|\, q(y)\,dy = \int_{\|a\| y_1 + b > 0} q(y)\,dy = \int_{-b/\|a\|}^{\infty} q_1(y_1)\,dy_1\]


where \(q_1(y_1) = \mathcal{N}(y_1; e_1^T(R^T \mu), e_1^T(R^T \Sigma R) e_1)\) is the density for the first component of y. The final step follows as the limits of integration marginalize \(q(y)\). To simplify the remaining integral, apply the definition of R, \(q_1(y_1) = \mathcal{N}(y_1; \mu^T a / \|a\|,\ a^T \Sigma a / \|a\|^2)\), and use \(1 - \Phi(x) = \Phi(-x)\):

\[\int_{-b/\|a\|}^{\infty} q_1(y_1)\,dy_1 = 1 - \Phi\left(\frac{-b - \mu^T a}{\sqrt{a^T \Sigma a}}\right) = \Phi(p) \qquad \square\]

Proof of (10) First, we perform a change of variables so that the integral is evaluated over the standard multivariate Gaussian \(g(x) = \mathcal{N}(x; 0, I)\). This substitution is \(x = \Sigma^{1/2} y + \mu\), which can be seen by noting that (1) \(|\partial y / \partial x| = |\Sigma|^{-1/2}\) and (2) \(f(x) = |\Sigma|^{-1/2}\, g(\Sigma^{-1/2}(x - \mu))\):

\[\int_A x f(x)\,dx = \int_B (\Sigma^{1/2} y + \mu)\, g(y)\,dy = \Sigma^{1/2} \int_B y\, g(y)\,dy + \mu \int_B g(y)\,dy \tag{12}\]

where \(B = \{y : (a^T \Sigma^{1/2}) y + (a^T \mu + b) > 0\}\) is the transformed half-space.

To evaluate the first term in (12), we calculate \(\int_C y\, g(y)\,dy\), where \(C = \{y : c^T y + d > 0\}\) is a new generic half-space. This integral is easier than the original problem as \(g(y)\) is the density of a zero-mean Gaussian with identity covariance. To proceed, substitute \(y = Rz\) where R is a rotation matrix that makes C axis-aligned (i.e., \(c^T R = \|c\| e_1^T\)) and observe that \(g(Rz) = g(z)\):

\[\int_C y\, g(y)\,dy = \int_{\|c\| z_1 + d > 0} Rz\, g(Rz)\, |R^T|\,dz = R e_1 \int_{-d/\|c\|}^{\infty} z_1\, \phi(z_1)\,dz_1 = R e_1\, \phi\!\left(\frac{d}{\|c\|}\right) = \frac{c}{\|c\|}\, \phi\!\left(\frac{d}{\|c\|}\right) \tag{13}\]

\(e_1\) appears because \(g(x)\) is 0-mean; the only non-zero component of the integral is from \(z_1\). The 1-dimensional integral follows as \(d\phi(x)/dx = -x\phi(x)\).

To finish, (12) can be evaluated using the formulas in (9) and (13):

\[\int_A x f(x)\,dx = \Sigma^{1/2}\, \frac{\Sigma^{1/2} a}{\sqrt{a^T \Sigma a}}\, \phi(p) + \mu\, \Phi(p) = \Phi(p)\,\mu + \phi(p)\,\frac{\Sigma a}{\sqrt{a^T \Sigma a}} \qquad \square\]

Proof of (11) Similar to the last proof, make the substitution \(x = \Sigma^{1/2} y + \mu\) with \(g(y)\) as the standard multivariate Gaussian and expand terms:

\[\int_A x x^T f(x)\,dx = \Sigma^{1/2}\left(\int_B y y^T g(y)\,dy\right) \Sigma^{1/2} + \Sigma^{1/2}\left(\int_B y\, g(y)\,dy\right) \mu^T + \mu \left(\int_B y^T g(y)\,dy\right) \Sigma^{1/2} + \mu\mu^T \int_B g(y)\,dy \tag{14}\]

where \(B = \{y : (a^T \Sigma^{1/2}) y + (a^T \mu + b) > 0\}\) is the transformed half-space.

To evaluate (14), we only need a formula for the first integral; the previous proofs have expressions for the other 3 integrals. To evaluate the first integral, let \(C = \{y : c^T y + d > 0\}\) be a new half-space and use the same rotation substitution \(y = Rz\) as in the last proof:

\[\int_C y y^T g(y)\,dy = R \left(\int_{z_1 \|c\| + d > 0} z z^T g(z)\,dz\right) R^T = R \left(\Phi\!\left(\frac{d}{\|c\|}\right) I - \frac{d}{\|c\|}\, \phi\!\left(\frac{d}{\|c\|}\right) e_1 e_1^T\right) R^T = \Phi\!\left(\frac{d}{\|c\|}\right) I - \frac{d}{\|c\|}\, \phi\!\left(\frac{d}{\|c\|}\right) \frac{c c^T}{\|c\|^2} \tag{15}\]

The previous integral can be evaluated by analyzing each scalar component. g is the standard multivariate Gaussian, so \(g(z) = \prod_{i=1}^{k} \phi(z_i)\). There are three types of terms:

\(z_1^2\): The limits of integration marginalize g and the resulting integral can be solved using integration by parts:

\[\int_{z_1 \|c\| + d > 0} z_1^2\, g(z)\,dz = \int_{-d/\|c\|}^{\infty} z_1^2\, \phi(z_1)\,dz_1 = \Phi\!\left(\frac{d}{\|c\|}\right) - \frac{d}{\|c\|}\, \phi\!\left(\frac{d}{\|c\|}\right)\]

\(z_i^2\ (i > 1)\):

\[\int_{z_1 \|c\| + d > 0} z_i^2\, g(z)\,dz = \int_{-d/\|c\|}^{\infty} \phi(z_1)\,dz_1 \int_{-\infty}^{\infty} z_i^2\, \phi(z_i)\,dz_i = \Phi\!\left(\frac{d}{\|c\|}\right)\]

The integral over \(z_i\) is 1 because it is the variance of the standard normal.

\(z_i z_j\ (i \neq j,\ i \neq 1)\): These indices cover all non-diagonal elements.

\[\int_{z_1 \|c\| + d > 0} z_i z_j\, g(z)\,dz = \cdots \int_{-\infty}^{\infty} z_j\, \phi(z_j)\,dz_j \int z_i\, \phi(z_i)\,dz_i \cdots = 0\]

\(i \neq 1\), so the limits of integration for \(z_i\) are the real line. Because g is 0-mean, the integral is 0.

We now have formulas for all of the terms in (14). Recall that \(B = \{x : (\Sigma^{1/2} a)^T x + (a^T \mu + b) > 0\}\). Using \(p = (a^T \mu + b)/\sqrt{a^T \Sigma a}\), (9), (10), and (15):

\[\int_{a^T x + b > 0} x x^T f(x)\,dx = \Sigma^{1/2}\left(\Phi(p)\, I - p\, \phi(p)\, \frac{\Sigma^{1/2} a a^T \Sigma^{1/2}}{a^T \Sigma a}\right) \Sigma^{1/2} + \Sigma^{1/2}\, \phi(p)\, \frac{\Sigma^{1/2} a}{\sqrt{a^T \Sigma a}}\, \mu^T + \mu\, \phi(p)\, \frac{a^T \Sigma^{1/2}}{\sqrt{a^T \Sigma a}}\, \Sigma^{1/2} + \Phi(p)\, \mu\mu^T\]
\[= \Phi(p)\,(\Sigma + \mu\mu^T) + \phi(p)\left(\frac{\Sigma a \mu^T + \mu a^T \Sigma}{\sqrt{a^T \Sigma a}} - p\, \frac{\Sigma a a^T \Sigma}{a^T \Sigma a}\right)\]

which is the formula we sought. \(\square\)

6.1 Proof of Lemma 1

The norm can be evaluated by splitting the integral up into regions where the absolute value disappears. Let A be the set where \(f(x) > g(x)\) and \(A^c\) be the complement of A, where \(f(x) \le g(x)\). Noting that \(\int_A g(x) = 1 - \int_{A^c} g(x)\):

\[\int |f(x) - g(x)|\,dx = 2 \int_A f(x) - g(x)\,dx \tag{16}\]

f and g have the same covariance, so f is bigger than g on a half-space \(A = \{x : a^T x + b > 0\}\) where \(a = \Sigma^{-1}(\mu_1 - \mu_2)\) and \(b = (\mu_1 + \mu_2)^T \Sigma^{-1}(\mu_2 - \mu_1)/2\). This means (16) can be evaluated using Lemma 6. To do this, we need to evaluate p for \(\int_A f(x)\,dx\) and \(\int_A g(x)\,dx\). There are three relevant terms:

\[\mu_1^T a + b = \frac{\|\Sigma^{-1/2}(\mu_1 - \mu_2)\|_2^2}{2} \tag{17}\]

\[\mu_2^T a + b = -\frac{\|\Sigma^{-1/2}(\mu_1 - \mu_2)\|_2^2}{2} \tag{18}\]

\[a^T \Sigma a = \|\Sigma^{-1/2}(\mu_1 - \mu_2)\|_2^2 \tag{19}\]

Using \(u = \|\Sigma^{-1/2}(\mu_1 - \mu_2)\|_2 / 2\) we get \((a^T \mu_1 + b)/\sqrt{a^T \Sigma a} = u\) and \((a^T \mu_2 + b)/\sqrt{a^T \Sigma a} = -u\). Plugging these values into Lemma 6 completes the proof. \(\square\)
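Combining (16) with Lemma 6 gives the closed form \(\|f - g\|_1 = 2(\Phi(u) - \Phi(-u))\); in 1D, \(u = |\mu_1 - \mu_2|/(2\sigma)\). This can be checked against direct numerical integration (parameter values below are illustrative):

```python
import numpy as np
from math import erf, sqrt

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def normal_pdf(x, mu, sig):
    return np.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))

mu1, mu2, sig = 0.0, 1.0, 1.0
u = abs(mu1 - mu2) / (2 * sig)      # 1D case of ||Sigma^{-1/2}(mu1 - mu2)|| / 2
closed = 2 * (Phi(u) - Phi(-u))     # ||f - g||_1 from Lemma 1 and Lemma 6

# Direct numerical integration of |f - g| on a fine grid.
x = np.linspace(-15.0, 16.0, 400001)
dx = x[1] - x[0]
numeric = np.sum(np.abs(normal_pdf(x, mu1, sig) - normal_pdf(x, mu2, sig))) * dx
```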

    6.2 Proof of Lemma 2

Let X be a random variable whose density is \(|f(x) - g(x)| / \|f - g\|_1\). Calculating the entropy of X is difficult as the expression involves the log of the absolute value of the difference of exponentials. Fortunately, the covariance of X can be calculated. This is useful because the maximum entropy of any distribution with covariance \(\Lambda\) is \((1/2) \log\big((2\pi e)^k\, |\Lambda|\big)\), the entropy of the multivariate Gaussian (Cover and Thomas 2004, Thm. 8.6.5). By explicitly calculating the determinant of the covariance of X, we prove the desired bound.

To calculate X's covariance, use the formula \(\operatorname{cov} X = E[X X^T] - E[X]\,E[X]^T\). Similar to the proof of Lemma 1, we evaluate the mean by breaking the integral up into a region A where \(f(x) > g(x)\) and \(A^c\) where \(g(x) \ge f(x)\):

\[E[X] = \frac{1}{\|f - g\|_1}\left(\int_A x\,(f(x) - g(x))\,dx + \int_{A^c} x\,(g(x) - f(x))\,dx\right) \tag{20}\]

As before, f and g have the same covariance, so \(A = \{x : a^T x + b > 0\}\) and \(A^c = \{x : (-a)^T x + (-b) \ge 0\}\) are half-spaces with \(a = \Sigma^{-1}(\mu_1 - \mu_2)\) and \(b = (\mu_1 + \mu_2)^T \Sigma^{-1}(\mu_2 - \mu_1)/2\).

Each of the terms in (20) is a Gaussian function integrated over a half-space, which can be evaluated using Lemma 6. To simplify the algebra, we evaluate the integrals involving f and g separately. This involves calculating a few terms, three of which are repeats: (17), (18), and (19). The other term is \(\Sigma a = \mu_1 - \mu_2\). As the difference of the means will arise repeatedly, define \(\Delta = \mu_1 - \mu_2\). Noting \(\sqrt{a^T \Sigma a} = 2u\) and \(\phi(-x) = \phi(x)\), the integrals involving f are:

\[\int_A x f(x)\,dx - \int_{A^c} x f(x)\,dx = \left(\Phi(u)\,\mu_1 + \phi(u)\,\frac{\Delta}{2u}\right) - \left(\Phi(-u)\,\mu_1 - \phi(u)\,\frac{\Delta}{2u}\right) = (\Phi(u) - \Phi(-u))\,\mu_1 + \phi(u)\,\frac{\Delta}{u} \tag{21}\]

Next, evaluate the integrals in (20) involving g. This can be done by pattern matching from (21). The main change is that \(\mu_1\) becomes \(\mu_2\) and the sign of p in Lemma 6 flips, meaning the signs of the arguments to \(\Phi(\cdot)\) and \(\phi(\cdot)\) flip (see (17) and (18)):

\[\int_A x g(x)\,dx - \int_{A^c} x g(x)\,dx = (\Phi(-u) - \Phi(u))\,\mu_2 + \phi(u)\,\frac{\Delta}{u} \tag{22}\]


398 Auton Robot (2014) 37:383–400

Subtracting (22) from (21), recognizing $\varphi(x) = \varphi(-x)$, and dividing by $\|f - g\|_1 = 2(\Phi(\delta) - \Phi(-\delta))$ simplifies (20):

$E[X] = \frac{\mu_1 + \mu_2}{2}$ (23)
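A quick one-dimensional check of (23): the mean of the normalized density $|f - g|/\|f - g\|_1$ is the midpoint of the two means. The test values below are arbitrary:

```python
import math

def density(x, mu, sigma):
    """Scalar Gaussian density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

mu1, mu2, sigma = -0.5, 2.0, 0.8
n, lo, hi = 200000, -12.0, 14.0
dx = (hi - lo) / n

# Accumulate the L1 mass and first moment of |f - g| by a midpoint Riemann sum.
l1 = mean = 0.0
for i in range(n):
    x = lo + (i + 0.5) * dx
    d = abs(density(x, mu1, sigma) - density(x, mu2, sigma)) * dx
    l1 += d
    mean += x * d
mean /= l1  # normalize: mean of the density |f - g| / ||f - g||_1
print(abs(mean - (mu1 + mu2) / 2.0))  # small: matches (23) up to integration error
```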

We now evaluate $X$'s second moment. Split the integral up over $A$ and $A^c$:

$E[XX^T] = \frac{1}{\|f - g\|_1}\left[\int_A x x^T (f(x) - g(x))\,dx + \int_{A^c} x x^T (g(x) - f(x))\,dx\right]$ (24)

Once again, we evaluate this expression by separately evaluating the integrals involving $f$ and $g$.

Starting with $f$ and using Lemma 6 with $\delta$ and $-\delta$:

$\int_A x x^T f(x)\,dx = \Phi(\delta)(\Sigma + \mu_1\mu_1^T) + \varphi(\delta)\left(\frac{\Delta\mu\,\mu_1^T + \mu_1\Delta\mu^T}{2\delta} - \frac{\delta\,\Delta\mu\Delta\mu^T}{4\delta^2}\right)$ (25)

$\int_{A^c} x x^T f(x)\,dx = \Phi(-\delta)(\Sigma + \mu_1\mu_1^T) - \varphi(\delta)\left(\frac{\Delta\mu\,\mu_1^T + \mu_1\Delta\mu^T}{2\delta} - \frac{\delta\,\Delta\mu\Delta\mu^T}{4\delta^2}\right)$ (26)

Taking the difference of (25) and (26):

$\int_A x x^T f(x)\,dx - \int_{A^c} x x^T f(x)\,dx = (\Phi(\delta) - \Phi(-\delta))(\Sigma + \mu_1\mu_1^T) + \frac{\varphi(\delta)}{\delta}\left(\Delta\mu\,\mu_1^T + \mu_1\Delta\mu^T - \frac{\Delta\mu\Delta\mu^T}{2}\right)$ (27)

To evaluate the integrals in (24) involving $g$, we can pattern match using (25) and (26). As in the derivation of (22), this involves negating the $p$ terms and replacing $\mu_1$ with $\mu_2$.

$\int_A x x^T g(x)\,dx - \int_{A^c} x x^T g(x)\,dx = (\Phi(-\delta) - \Phi(\delta))(\Sigma + \mu_2\mu_2^T) + \frac{\varphi(\delta)}{\delta}\left(\Delta\mu\,\mu_2^T + \mu_2\Delta\mu^T + \frac{\Delta\mu\Delta\mu^T}{2}\right)$ (28)

Note the sign difference of the $\Delta\mu\Delta\mu^T$ term compared to (27).

To finish calculating the second moment, subtract (28) from (27) and divide by $\|f - g\|_1$, simplifying (24):

$E[XX^T] = \frac{1}{2}\left(2\Sigma + \mu_1\mu_1^T + \mu_2\mu_2^T\right) + \frac{\varphi(\delta)}{\delta\,\|f - g\|_1}\Delta\mu\Delta\mu^T$

We can now express the covariance of $X$:

$\mathrm{cov}\,X = \Sigma + \frac{\Delta\mu\Delta\mu^T}{4} + \frac{\varphi(\delta)}{\delta\,\|f - g\|_1}\Delta\mu\Delta\mu^T = \Sigma + \left(\delta^2 + \frac{2\delta\varphi(\delta)}{2\Phi(\delta) - 1}\right)\frac{\Delta\mu\Delta\mu^T}{4\delta^2}$ (29)

which follows as $(\mu_1\mu_1^T + \mu_2\mu_2^T)/2 - (\mu_1 + \mu_2)(\mu_1 + \mu_2)^T/4 = \Delta\mu\Delta\mu^T/4$.
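In one dimension, (29) reduces to $\mathrm{cov}\,X = \sigma^2(1 + \gamma)$ with $\gamma = \delta^2 + 2\delta\varphi(\delta)/(2\Phi(\delta) - 1)$, since $\Delta\mu^2/(4\delta^2) = \sigma^2$. This can be verified against a direct numerical computation of the variance (the test values below are arbitrary):

```python
import math

def density(x, mu, sigma):
    """Scalar Gaussian density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def std_pdf(x):  # standard normal density, phi
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def std_cdf(x):  # standard normal CDF, Phi
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mu1, mu2, sigma = 0.0, 1.5, 0.6
delta = abs(mu1 - mu2) / (2.0 * sigma)

# First two moments of the normalized density |f - g| / ||f - g||_1.
n, lo, hi = 400000, -10.0, 12.0
dx = (hi - lo) / n
l1 = m1 = m2 = 0.0
for i in range(n):
    x = lo + (i + 0.5) * dx
    d = abs(density(x, mu1, sigma) - density(x, mu2, sigma)) * dx
    l1 += d
    m1 += x * d
    m2 += x * x * d
m1 /= l1
m2 /= l1
var_numeric = m2 - m1 * m1

gamma = delta ** 2 + 2.0 * delta * std_pdf(delta) / (2.0 * std_cdf(delta) - 1.0)
var_closed = sigma ** 2 * (1.0 + gamma)  # 1-D instance of (29)
print(abs(var_numeric - var_closed))  # small: integration error only
```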

To calculate the determinant of the covariance, factor $\Sigma$ out of (29) and define $\gamma = \delta^2 + \frac{2\delta\varphi(\delta)}{2\Phi(\delta) - 1}$:

$|\mathrm{cov}\,X| = |\Sigma|\left|I + \gamma\,\frac{\Sigma^{-1}\Delta\mu\Delta\mu^T}{4\delta^2}\right| = |\Sigma|\,(1 + \gamma)$ (30)

The last step follows from the eigenvalues. $\Sigma^{-1}\Delta\mu\Delta\mu^T$ only has one non-zero eigenvalue; it is a rank one matrix as $\Sigma^{-1}$ is full rank and $\Delta\mu\Delta\mu^T$ is rank one. The trace of a matrix is the sum of its eigenvalues, so $\mathrm{tr}(\Sigma^{-1}\Delta\mu\Delta\mu^T) = \mathrm{tr}(\Delta\mu^T\Sigma^{-1}\Delta\mu) = \|\Sigma^{-1/2}\Delta\mu\|^2 = 4\delta^2$ is the non-zero eigenvalue. Consequently, the only non-zero eigenvalue of $\gamma\Sigma^{-1}\Delta\mu\Delta\mu^T/4\delta^2$ is $\gamma$. Adding $I$ to a matrix increases all its eigenvalues by 1, so the only eigenvalue of $I + \gamma\Sigma^{-1}\Delta\mu\Delta\mu^T/4\delta^2$ that is not 1 has a value of $1 + \gamma$. The determinant of a matrix is the product of its eigenvalues, so $|I + \gamma\Sigma^{-1}\Delta\mu\Delta\mu^T/4\delta^2| = 1 + \gamma$. As discussed at the beginning of the proof, substituting (30) into the expression for the entropy of a multivariate Gaussian achieves the desired upper bound.

    6.3 Proof of Theorem 2

To prove the theorem, we build on Theorem 1. Unfortunately, it cannot be directly applied because it is difficult to evaluate (1) the L1 norm between GMMs and (2) the entropy of the normalized difference of mixture models. However, Lemmas 1 and 2 provide a way to evaluate these quantities when the densities are individual Gaussians. To exploit this fact, we split the problem up and analyze the difference in entropies between GMMs that only differ by a single component.

To start, define $d_j(x) = \sum_{i=1}^{j} w_i g_i(x) + \sum_{i=j+1}^{n} w_i f_i(x)$. $d_j$ is a GMM whose first $j$ components are the first $j$ components in $g$ and last $n - j$ components are the last $n - j$ components in $f$. Note that $d_0(x) = f(x)$ and $d_n(x) = g(x)$. Using the triangle inequality:


$|H[f] - H[g]| = |H[d_0] - H[d_1] + H[d_1] - H[d_n]| \le |H[d_0] - H[d_1]| + |H[d_1] - H[d_n]| \le \sum_{j=1}^{n} |H[d_{j-1}] - H[d_j]|$

where the last step applied the same trick $n - 2$ more times.

Because $d_{j-1}(x) - d_j(x) = w_j(f_j(x) - g_j(x))$, each term in the summand can be bounded using Theorem 1:

$|H[d_{j-1}] - H[d_j]| \le -\|w_j(f_j - g_j)\|_1 \log \|w_j(f_j - g_j)\|_1 + \|w_j(f_j - g_j)\|_1\, H\!\left[\frac{|f_j(x) - g_j(x)|}{\|f_j - g_j\|_1}\right]$ (31)

Because $f_j(x)$ and $g_j(x)$ are Gaussians with the same covariance, we can apply Lemmas 1 and 2 to complete the proof.
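The telescoping construction relies on each hybrid mixture $d_j$ differing from $d_{j-1}$ in exactly one weighted component, so that $d_{j-1} - d_j = w_j(f_j - g_j)$. A small check with arbitrary 3-component mixtures (the weights and means below are illustrative, not from the paper's experiments):

```python
import math

def density(x, mu, sigma):
    """Scalar Gaussian density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Two 1-D GMMs f and g sharing weights and per-component variance.
w = [0.5, 0.3, 0.2]
f_means = [0.0, 3.0, -2.0]
g_means = [0.4, 2.5, -1.5]
sigma = 0.8

def d(j, x):
    """Hybrid mixture d_j: first j components from g, remaining n - j from f."""
    means = g_means[:j] + f_means[j:]
    return sum(wi * density(x, m, sigma) for wi, m in zip(w, means))

# d_0 = f, d_n = g, and d_{j-1}(x) - d_j(x) = w_j (f_j(x) - g_j(x)) at every x.
errs = []
for x in [-3.0, -0.7, 0.0, 1.2, 2.9]:
    for j in range(1, 4):
        lhs = d(j - 1, x) - d(j, x)
        rhs = w[j - 1] * (density(x, f_means[j - 1], sigma)
                          - density(x, g_means[j - 1], sigma))
        errs.append(abs(lhs - rhs))
print(max(errs))  # ~0: each swap changes exactly one weighted component
```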

    References

Binney, J., Krause, A., & Sukhatme, G. S. (2013). Optimizing waypoints for monitoring spatiotemporal phenomena. International Journal of Robotics Research, 32(8), 873–888.

Charrow, B., Kumar, V., & Michael, N. (2013). Approximate representations for multi-robot control policies that maximize mutual information. In Proceedings of Robotics: Science and Systems, Berlin, Germany.

Charrow, B., Michael, N., & Kumar, V. (2014). Cooperative multi-robot estimation and control for radio source localization. The International Journal of Robotics Research, 33, 569–580.

Chung, T., Hollinger, G., & Isler, V. (2011). Search and pursuit-evasion in mobile robotics. Autonomous Robots, 31(4), 299–316.

Cover, T. M., & Thomas, J. A. (2004). Elements of information theory. New York: Wiley.

Dame, A., & Marchand, E. (2011). Mutual information-based visual servoing. IEEE Transactions on Robotics, 27(5), 958–969.

Djugash, J., & Singh, S. (2012). Motion-aided network SLAM with range. International Journal of Robotics Research, 31(5), 604–625.

Fannes, M. (1973). A continuity property of the entropy density for spin lattice systems. Communications in Mathematical Physics, 31(4), 291–294.

Fox, D. (2003). Adapting the sample size in particle filters through KLD-sampling. International Journal of Robotics Research, 22(12), 985–1003.

Golovin, D., & Krause, A. (2011). Adaptive submodularity: Theory and applications in active learning and stochastic optimization. The Journal of Artificial Intelligence Research, 42(1), 427–486.

Grocholsky, B. (2002). Information-theoretic control of multiple sensor platforms. PhD thesis, Australian Centre for Field Robotics.

Hahn, T. (2013, January). Cuba. http://www.feynarts.de/cuba/.

Hoffmann, G., & Tomlin, C. (2010). Mobile sensor network control using mutual information methods and particle filters. The IEEE Transactions on Automatic Control, 55(1), 32–47.

Hollinger, G., & Sukhatme, G. (2013). Sampling-based motion planning for robotic information gathering. In Proceedings of Robotics: Science and Systems, Berlin, Germany.

Hollinger, G., Djugash, J., & Singh, S. (2011). Target tracking without line of sight using range from radio. Autonomous Robots, 32(1), 1–14.

Huber, M., & Hanebeck, U. (2008). Progressive Gaussian mixture reduction. In International Conference on Information Fusion.

Huber, M., Bailey, T., Durrant-Whyte, H., & Hanebeck, U. (2008, August). On entropy approximation for Gaussian mixture random vectors. In Conference on Multisensor Fusion and Integration for Intelligent Systems, Seoul, Korea (pp. 181–188).

Julian, B. J., Angermann, M., Schwager, M., & Rus, D. (2011, September). A scalable information theoretic approach to distributed robot coordination. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, San Francisco, USA (pp. 5187–5194).

Kassir, A., Fitch, R., & Sukkarieh, S. (2012, May). Decentralised information gathering with communication costs. In Proceedings of the IEEE International Conference on Robotics and Automation, Saint Paul, USA (pp. 2427–2432).

Krause, A., & Guestrin, C. (2005). Near-optimal nonmyopic value of information in graphical models. In Conference on Uncertainty in Artificial Intelligence (pp. 324–331).

nanoPAN 5375 Development Kit. (2013, September). http://nanotron.com/EN/pdf/Factsheet_nanoPAN_5375_Dev_Kit.pdf.

Owen, D. (1980). A table of normal integrals. Communications in Statistics-Simulation and Computation, 9(4), 389–419.

Park, J. G., Charrow, B., Battat, J., Curtis, D., Minkov, E., Hicks, J., Teller, S., & Ledlie, J. (2010). Growing an organic indoor location system. In Proceedings of International Conference on Mobile Systems, Applications, and Services, San Francisco, CA.

ROS. (2013, January). http://www.ros.org/wiki/.

Runnalls, A. (2007). Kullback-Leibler approach to Gaussian mixture reduction. IEEE Transactions on Aerospace and Electronic Systems, 43(3), 989–999.

Ryan, A., & Hedrick, J. (2010). Particle filter based information-theoretic active sensing. Robotics and Autonomous Systems, 58(5), 574–584.

Silva, J. F., & Parada, P. (2011). Sufficient conditions for the convergence of the Shannon differential entropy. In IEEE Information Theory Workshop, Paraty, Brazil (pp. 608–612).

Singh, A., Krause, A., Guestrin, C., & Kaiser, W. J. (2009). Efficient informative sensing using multiple robots. The Journal of Artificial Intelligence Research, 34(1), 707–755.

Stump, E., Kumar, V., Grocholsky, B., & Shiroma, P. (2009). Control for localization of targets using range-only sensors. International Journal of Robotics Research, 28(6), 743.

Thrun, S., Burgard, W., & Fox, D. (2008). Probabilistic robotics. Cambridge: MIT Press.

Vidal, R., Shakernia, O., Kim, H. J., Shim, D., & Sastry, S. (2002). Probabilistic pursuit-evasion games: Theory, implementation, and experimental evaluation. IEEE Transactions on Robotics and Automation, 18(5), 662–669.

Whaite, P., & Ferrie, F. P. (1997). Autonomous exploration: Driven by uncertainty. The IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(3), 193–205.


Benjamin Charrow is a Ph.D. candidate in the Computer and Information Science Department at the University of Pennsylvania. He works on information-based control for multi-robot teams.

Vijay Kumar is the UPS Foundation Professor and the Deputy Dean for Education in the School of Engineering and Applied Science at the University of Pennsylvania. He received his Ph.D. in Mechanical Engineering from The Ohio State University in 1987. He has been on the Faculty in the Department of Mechanical Engineering and Applied Mechanics with a secondary appointment in the Department of Computer and Information Science at the University of Pennsylvania since 1987. He is a Fellow of the American Society of Mechanical Engineers (ASME) and the Institute of Electrical and Electronics Engineers (IEEE).

Nathan Michael is an Assistant Research Professor in the Robotics Institute at Carnegie Mellon University in Pittsburgh, Pennsylvania. Previously, he received a Ph.D. from the Department of Mechanical Engineering at the University of Pennsylvania (2008) before transitioning into a Research Faculty position in the same department (2010). His research focuses on enabling autonomous ground and aerial vehicles to robustly operate in uncertain environments with emphasis on robust state estimation, control, and cooperative autonomy.
