Vision Aided Inertial Navigation
Advanced Robotics — publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/tadr20

Tightly-coupled stereo vision-aided inertial navigation using feature-based motion sensors
E. Asadi (a) and C.L. Bottasso (a,b)
(a) Department of Aerospace Science and Technology, Politecnico di Milano, Milano 20156, Italy
(b) Wind Energy Institute, Technische Universität München, Garching bei München 85748, Germany
Published online: 13 Jan 2014.

To cite this article: E. Asadi & C.L. Bottasso (2014) Tightly-coupled stereo vision-aided inertial navigation using feature-based motion sensors, Advanced Robotics, 28:11, 717-729, DOI: 10.1080/01691864.2013.870496
To link to this article: http://dx.doi.org/10.1080/01691864.2013.870496
Advanced Robotics, 2014
Vol. 28, No. 11, 717-729, http://dx.doi.org/10.1080/01691864.2013.870496
FULL PAPER

Tightly-coupled stereo vision-aided inertial navigation using feature-based motion sensors

E. Asadi (a) and C.L. Bottasso (a,b)
(a) Department of Aerospace Science and Technology, Politecnico di Milano, Milano 20156, Italy; (b) Wind Energy Institute, Technische Universität München, Garching bei München 85748, Germany

(Received 31 March 2013; revised 6 August 2013; accepted 22 September 2013)
A tightly-coupled stereo vision-aided inertial navigation system is proposed in this work, as a synergistic incorporation of vision with other sensors. In order to avoid the loss of information possibly resulting from visual preprocessing, a set of feature-based motion sensors and an inertial measurement unit are directly fused together to estimate the vehicle state. Two alternative feature-based observation models are considered within the proposed fusion architecture. The first model uses the trifocal tensor to propagate feature points by homography, so as to express geometric constraints among three consecutive scenes. The second one is derived by using a rigid body motion model applied to three-dimensional (3D) reconstructed feature points. A kinematic model accounts for the vehicle motion, and a Sigma-Point Kalman filter is used to achieve a robust state estimation in the presence of non-linearities. The proposed formulation is derived for a general platform-independent 3D problem, and it is tested and demonstrated with a real dynamic indoor data-set alongside a simulation experiment. Results show improved estimates compared with a classical visual odometry approach and with a loosely-coupled stereo vision-aided inertial navigation system, even in GPS (Global Positioning System)-denied conditions and when magnetometer measurements are not reliable.

Keywords: vision-aided inertial navigation; sensor fusion; tight-coupling; trifocal constraint
1. Introduction
It appears that some animal species have evolved sophisticated behavioral capabilities by, among other means, fusing together specialized multiple sensory systems, as for example in the case of some flying insects.[1] Mimicking this situation, the sensor fusion concept is being actively exploited in many robotics applications using both proprioceptive and exteroceptive sensors.[2]
Stereo vision systems [3] are amongst the most information-rich sensors available today, and perform in a manner similar to animal binocular vision.[4] Eyes provide the brain with two views of a scene, which are in turn combined and processed by the brain itself to build a complex representation of the environment, including features and depth information. Vision-based systems have been gaining increased attention in the field of autonomous control and navigation in ground,[5-7] underwater,[8] and also aerial [9,10] applications. Several attempts have already been documented in the design and implementation of robust visual odometry systems relying on monocular or binocular vision.[11-13] Egomotion estimation through the use of the trifocal tensor using monocular vision was described in [14], and then extended in [12] to stereo visual odometry by considering a simple constant velocity motion. Visual odometry approaches based on a quadrifocal relationship have also been proposed.[15]

Corresponding author. Email: [email protected]
However, reliance on sole visual information is a potential weakness of visual odometry methods for many applications, as scenes may occasionally become not sufficiently information rich and vision may be impaired by several external factors. Some authors have addressed this problem by combining vision and inertial sensor information, as this may enhance the quality of state estimates, with a positive cascading effect on the control systems that make use of such estimates. Inertial measurements have been used as model inputs [16] or states,[17-19] using variants of the Kalman filtering approach to robustly estimate motion. For example, a combination of vision-based motion sensors with inertial navigation via three-dimensional (3D) reconstruction was presented in [20]. An interesting characteristic of the vision-aided inertial approach is that it may significantly improve estimation even in difficult situations, such as GPS (Global Positioning System)-denied conditions, for example under vegetation cover or indoors, or in the presence of disruptions in the tracking of visual features, for example when the vehicle undergoes rapid motions.
© 2014 Taylor & Francis and The Robotics Society of Japan

As described in [21], there are two possible architectures for the fusion of an inertial navigation system (INS) with a
vision unit. The first, termed loosely-coupled approach,[22-24] uses separate INS and structure from motion (SfM) blocks that exchange information. In the second, termed tightly-coupled approach,[20,25] all information comprising the raw vision data and the inertial measurements are fused within a single high-order estimation filter.

In most tightly-coupled vision-aided systems, vehicle states as well as visual landmarks are jointly estimated,[17,25-28] similarly to simultaneous localization and mapping (SLAM) approaches.[29] These methods are capable of handling the displacement measurement provided by the vision system, while bounding the position estimation error; however, their computational requirements can be significant due to the state vector augmentation that they perform. This may hinder their use in real-time demanding applications.
There are also approximate SLAM solutions such as [30,31], providing less computational complexity and sub-optimal estimates. Other SLAM approaches may reduce the amount of information that they process by incorporating only high-level features (e.g. corners, junctions, and lines), in order to achieve real-time capabilities. This way, the problem of data association is simplified by applying the correspondence search over a smaller number of features. As considered in [32,33], revisiting landmarks that were previously tracked would not be of significant benefit for long-distance navigation. The idea explored in [32,33] is based on incorporating high-level landmarks that can be easily tracked for long periods of time using a track maintenance algorithm. This way only high-level landmarks are augmented to the state vector, while stale landmarks (i.e. correspondences that have been unsuccessful for a given period of time) are pruned by removing the associated filter states. However, the resulting estimates of the vehicle states are suboptimal compared to those obtained by incorporating all feature points, including both low- and high-level features.
There are some other approaches [19,34] incorporating only frame-to-frame measurements with a reduced computational complexity in comparison to SLAM. In this case, relative-state measurements are processed via a sole localization approach, while delayed states corresponding to previous poses are maintained by expanding the state vector. A formal approach to this problem was also proposed in [2], which developed a stochastic cloning framework based on the indirect form of the Kalman filter. This method also considers augmenting the state vector with two instances of the vehicle states together with a copy of the last measurements.
This work proposes a tightly-coupled stereo vision-aided inertial navigation system (TC-SVA-INS), resulting in an algorithm that is suitable for real-time applications with fast dynamics, as for example the flight of small autonomous vehicles. In order to avoid the loss of information resulting from the preprocessing of visual data,[35] this paper uses a tightly-coupled approach that relies on directly fusing tracked feature points together with inertial measurements to yield state estimates. In this work, two alternative feature-based observation models are embedded within the proposed tightly-coupled fusion architecture. The first model is derived via the trifocal tensor, which encapsulates geometric constraints amongst three different scenes. Some preliminary results concerning this model are presented in [36], while a comprehensive assessment of its performance is presented here in comparison to other possible alternative approaches. The second model is derived via rigid body motion applied to 3D reconstructed feature points. In order to reduce the computational cost of processing relative-state measurements,[2] here the standard state estimation method is used in conjunction with a simplified observation model. A comparison of these two tightly-coupled solutions with a loosely-coupled one is provided as well.
The proposed approach is shown to limit the degradation of position estimation in GPS-denied conditions. Differently from other visual odometry approaches, a kinematic motion model is used here to better reflect the actual dynamic behavior of the vehicle. The formulation, being derived for arbitrary 3D motion using a quaternion parameterization of finite rotations, is independent of the platform and can be applied to any vehicle. Outlier rejection is achieved via egomotion estimation, which is performed outside of the sensor fusion loop. The Sigma-Point Kalman filter (SPKF) is used to perform data fusion because of its robustness in the presence of non-linearities. The TC-SVA-INS method could also be used in parallel with low-frequency SLAM approaches in a decentralized architecture to enhance its precision, where information is distributed between two loops,[37,38] although this possibility is not explored here.
Since efficiency in terms of computational effort and reduced complexity is of the utmost importance in a real-time multi-sensory application, a straightforward implementation of the standard estimator without any state augmentation is highly desirable. Our approach does not utilize any state vector augmentation, neither the one that SLAM methods perform nor the form of augmentation proposed in localization methods. This way, the approach allows for a rather high number of relative-state measurements to be fused with inertial information via a tightly-coupled architecture using a single standard estimator. This results in a reliable frame-to-frame displacement estimation and robust outlier rejection by benefiting from all available visual information (i.e. both low- and high-level features). Additionally, a reduced information loss is achieved in comparison with a loosely-coupled architecture. The computational cost of the method does not grow with the environment size. Data association is simple and readily provided by a tracking algorithm, as only frame-to-frame features are used. The tracking system in our work does not use any kind of maintenance algorithm. This way, the
correlation between consecutive measurements is neglected, as previously assumed in [39,40] and as suggested in [2] to reduce computational complexity.

The effectiveness of the proposed formulation is demonstrated with the help of both an aerial simulation experiment and a real data-set gathered in a dynamic indoor case by the Rawseeds project.[41]
2. Classical inertial navigation
In this work, tracked feature points are fused via a tightly-coupled architecture with a standard inertial measurement unit based on triaxial accelerometers and gyros, in addition to a triaxial magnetometer, a sonar altimeter, and a GPS. The inertial measurement formulation is based on previously presented work,[20,42] which is briefly reviewed next.
2.1. Kinematics
The inertial frame of reference is centered at point O and
denoted by a triad of unit vectors E .=(e1, e2, e3), pointing
North, East, andDown (NED navigationalsystem).A body-
attached frame has origin in the generic material point B of
the vehicle and has a triad of unit vectors B .=(b1,b2,b3)
(see Figure1).
The components of the acceleration in the body-attached frame are sensed by an accelerometer located at point $A$ on the vehicle. The accelerometer yields a reading $a_{acc}$ affected by noise $n_{acc}$. Gyroscopes measure the body-attached components of the angular velocity vector, yielding a reading $\omega_{gyro}$ affected by a noise disturbance $n_{gyro}$. The kinematic equations, describing the motion of the body-attached reference frame with respect to the inertial one, can be written as

$\dot{v}^E_B = g^E + R \big( a_{acc} + \omega^B \times \omega^B \times r^B_{BA} + h(\omega_{gyro}, n_{gyro}) \times r^B_{BA} \big) + R \, n_{acc}$,   (1a)
$\dot{\omega}^B = h(\omega_{gyro}, n_{gyro})$,   (1b)
$\dot{r}^E_{OB} = v^E_B$,   (1c)
$\dot{q} = T(\omega^B) \, q$,   (1d)
where $v_B$ is the velocity of point $B$, $\omega$ is the angular velocity, and $\alpha$ the angular acceleration. $r_{BA}$ is the position vector from point $B$ to point $A$, while $r_{OB}$ is from point $O$ to point $B$. In the previous expression, $g^E = (0, 0, g)^T$ indicates the components of the acceleration of gravity, $R = R(q)$ are the components of the rotation tensor that brings triad $\mathcal{E}$ into triad $\mathcal{B}$, while $q$ are rotation parameters, which are chosen here as quaternions, so that matrix $T$ can be written as

$T(\omega^B) = \dfrac{1}{2} \begin{pmatrix} 0 & -\omega^{B\,T} \\ \omega^B & -\omega^B \times \end{pmatrix}$.   (2)
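For illustration, the quaternion kinematics (1d) with the rate matrix (2) can be sketched as follows (a minimal NumPy sketch, assuming a scalar-first quaternion $q = (q_0, q_v)$ and body-frame angular velocity; function names are our own):

```python
import numpy as np

def skew(w):
    """Skew-symmetric matrix such that skew(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def quat_rate_matrix(w):
    """Matrix T(omega) of Equation (2), for a scalar-first quaternion."""
    T = np.zeros((4, 4))
    T[0, 1:] = -w          # first row: (0, -omega^T)
    T[1:, 0] = w           # first column: omega
    T[1:, 1:] = -skew(w)   # lower-right block: -omega x
    return 0.5 * T

def integrate_quaternion(q, w, dt):
    """One explicit Euler step of qdot = T(omega) q, with renormalization."""
    q = q + dt * quat_rate_matrix(w) @ q
    return q / np.linalg.norm(q)
```

The renormalization step keeps the quaternion on the unit sphere despite the first-order integration error.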
The angular acceleration at time $t_k$ is computed according to the following three-point stencil formula, based on a parabolic interpolation:

$h(t_k) = \big( 3\,\omega_{gyro}(t_k) - 4\,\omega_{gyro}(t_{k-1}) + \omega_{gyro}(t_{k-2}) \big) / (2\Delta t)$,   (3)

where $\Delta t = t_k - t_{k-1} = t_{k-1} - t_{k-2}$ (cf. [20]).
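In code, the backward stencil (3) is a one-liner; being based on a parabolic interpolation, it is exact whenever the angular velocity varies at most quadratically over the three samples (sketch, names illustrative):

```python
import numpy as np

def angular_acceleration(w_k, w_km1, w_km2, dt):
    """Three-point backward stencil of Equation (3): second-order estimate
    of the angular acceleration from the last three gyro readings, assuming
    a constant sampling interval dt."""
    return (3.0 * w_k - 4.0 * w_km1 + w_km2) / (2.0 * dt)
```

For a linearly varying angular velocity $\omega(t) = \alpha\, t$, the stencil recovers the slope $\alpha$ exactly.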
2.2. Sensors
A GPS is located at point $G$ on the vehicle (see Figure 1). The inertial components of the velocity and position vectors of point $G$, noted respectively $v^E_G$ and $r^E_{OG}$, can be expressed in terms of $v^E_B$, $\omega^B$, $r^E_{OB}$, and $q$ as

$v^E_G = v^E_B + R \big( \omega^B \times r^B_{BG} \big)$,   (4a)
$r^E_{OG} = r^E_{OB} + R \, r^B_{BG}$,   (4b)

where $r_{BG}$ is the position vector from point $B$ to point $G$. The GPS yields measurements of the position and velocity of point $G$ affected by their respective noise vectors $n_{r_{gps}}$ and $n_{v_{gps}}$, i.e.

$v_{gps} = v^E_G + n_{v_{gps}}$,   (5a)
$r_{gps} = r^E_{OG} + n_{r_{gps}}$.   (5b)
A sonar altimeter measures the distance $h$ along the body-attached vector $b_3$, between its location at point $S$ on the vehicle and point $T$ on the terrain, as shown in Figure 1, resulting in the expression

$h = -\, r^E_{OB_3} / R_{33} - s$,   (6)
Figure 1. Reference frames and location of sensors.
where $r^E_{OB_3} = r_{OB} \cdot e_3$ and $R = [R_{ij}]$, $i, j = 1, 2, 3$. The sonar altimeter yields a reading $h_{sonar}$ affected by noise $n_{sonar}$, i.e.

$h_{sonar} = h + n_{sonar}$.   (7)
Finally, we consider a magnetometer that senses the magnetic field $m$ of the Earth in the body-attached system $\mathcal{B}$. The inertial components $m^E$ are assumed to be known and constant in the area of operation of the vehicle. Hence, we get

$m^B = R^T m^E$.   (8)

The magnetometer yields a measurement $m_{magn}$ affected by noise $n_{magn}$, i.e.

$m_{magn} = m^B + n_{magn}$.   (9)
3. Vision system
Considering a pair of stereo cameras located on the vehicle (see Figure 1), a triad of unit vectors $\mathcal{C} \doteq (c_1, c_2, c_3)$ has origin at the optical center $C$ of the left camera, where $c_1$ is directed along the horizontal scanlines of the image plane and $c_3$ is parallel to the optical axis, pointing towards the scene. The right camera has its optical axis and scanlines parallel to those of the left camera, i.e. $\mathcal{C}' \parallel \mathcal{C}$, where we use the symbol $(\cdot)'$ to indicate quantities of the right camera. The origin $C'$ of triad $\mathcal{C}'$ has a distance $b = b\,c_1$ from $C$, where $b$ is the stereo baseline.

The SVA-INS architecture investigated here is based on tracking scene points among stereo images and across time steps. The identification and matching of feature points begins with the acquisition of the images and their rectification. In this work, strong corners are extracted and then tracked using a dense disparity map obtained by the efficient large-scale stereo (ELAS) matching method.[43]
3.1. Outlier detection
Having a set of corresponding feature points between left and right images and from the previous frame to the current one, various methods can be used to detect and remove outliers. This process is here performed by an optimization approach based on the minimization of the sum of projection errors, merged into the random sample consensus (RANSAC) algorithm similarly to what is done in [44]. However, differently from [44], this procedure is used here to reliably identify outliers in dynamic environments, instead of estimating camera motion.
Consider a feature point $P$, whose position vectors with respect to frames $\mathcal{C}$ and $\mathcal{C}'$ are noted $d$ and $d'$, respectively. The projections of $P$ onto the image planes are noted $p^C$ and $p'^{C'}$. Assuming an ideal pinhole camera model, the homogeneous components of the position vectors are indicated by the symbol $(\tilde{\cdot})$ and noted $\tilde{d}^C = (d_x, d_y, d_z, 1)^T$ and $\tilde{p}^C = (p_u, p_v, 1)^T$, and they are related by perspective projection.
Detection of outliers via egomotion estimation starts by a 3D reconstruction that implies the projection of each feature point from the stereo image planes to the 3D world, using the triangulation method [45] and a disparity map $dp$ obtained by ELAS. This results in the expressions

$d_z^{C_{k-1}} = f\,b / dp$,   (10a)
$d_x^{C_{k-1}} = \big( p_u^{C_{k-1}} - c_u \big)\, d_z^{C_{k-1}} / f$,   (10b)
$d_y^{C_{k-1}} = \big( p_v^{C_{k-1}} - c_v \big)\, d_z^{C_{k-1}} / f$,   (10c)

where the camera calibration parameters, focal length $f$ and optical center $c_u$ and $c_v$, are assumed to be known.
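The reconstruction (10) maps one tracked pixel and its disparity to a 3D point in the left-camera frame; a minimal sketch (parameter names are illustrative, and the optical-centre offsets follow the pinhole projection of Equation (11)):

```python
import numpy as np

def triangulate(p_u, p_v, disparity, f, b, c_u, c_v):
    """Stereo triangulation of Equation (10): recover the 3D point in the
    left-camera frame from one pixel (p_u, p_v) and its disparity, given
    focal length f, baseline b, and optical center (c_u, c_v)."""
    d_z = f * b / disparity            # (10a): depth from disparity
    d_x = (p_u - c_u) * d_z / f        # (10b)
    d_y = (p_v - c_v) * d_z / f        # (10c)
    return np.array([d_x, d_y, d_z])
```

For example, with f = 500 px, baseline b = 0.12 m and a disparity of 6 px, a pixel at the optical centre triangulates to a point 10 m straight ahead.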
Next, each projected 3D point in the camera frame $C_{k-1}$ is transformed to the new frame $C_k$ according to the camera translational ($\Delta r_{ego}$) and rotational ($\Delta \psi_{ego}$) motion parameters. The projected 3D point in the new frame is then re-projected to its image plane to obtain $\hat{p}^{C_k}$. These steps can be expressed as

$\begin{pmatrix} \hat{p}_u^{C_k} \\ \hat{p}_v^{C_k} \\ 1 \end{pmatrix} \simeq \begin{pmatrix} f & 0 & c_u \\ 0 & f & c_v \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} R(\Delta\psi_{ego}) & \Delta r_{ego} \end{pmatrix} \begin{pmatrix} d_x^{C_{k-1}} \\ d_y^{C_{k-1}} \\ d_z^{C_{k-1}} \\ 1 \end{pmatrix}$,   (11)

where $R$ is the rotation matrix associated with the rotation parameters $\Delta\psi_{ego}$. An identical relationship is obtained for the right camera.
The position of a feature obtained by the tracking procedure at the same time step $t_k$ is noted $p_{vsn}^{C_k}$. Thus, the re-projection error $\Delta p$ for all feature points in both views can be readily calculated as

$\Delta p = \sum_{i=1}^{N} \Big( \big\| p_{vsn}^{C_k}(i) - \hat{p}^{C_k}(i) \big\|^2 + \big\| p'^{\,C'_k}_{vsn}(i) - \hat{p}'^{\,C'_k}(i) \big\|^2 \Big)$.   (12)
A Gauss-Newton approach is used to iteratively minimize (12). Motion parameters are independently estimated based on a given number (typically 50 here) of triple sample points selected at random, to find the best subset of inliers with reprojection errors smaller than a given threshold. Finally, a subset of inliers is selected to be incorporated in the data fusion algorithm. These inliers are chosen by looking at corresponding points in stereo views that satisfy the equation $p'^T F_{ba}\, p = 0$, up to a given tight threshold, where $F_{ba}$ is the fundamental matrix between stereo frames.
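The sampling loop above can be sketched as follows. Note this is a simplified stand-in: the paper's Gauss-Newton minimization of (12) over minimal three-point samples is replaced here by a closed-form 3D-3D alignment (Kabsch), only the left-camera term of (12) is scored, and all names are illustrative:

```python
import numpy as np

def kabsch(P, Q):
    """Closed-form rigid alignment Q ~ R @ P + t (stand-in for the paper's
    Gauss-Newton refinement). P and Q are 3 x n point sets."""
    cP, cQ = P.mean(axis=1, keepdims=True), Q.mean(axis=1, keepdims=True)
    U, _, Vt = np.linalg.svd((Q - cQ) @ (P - cP).T)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])  # guard against reflections
    R = U @ D @ Vt
    return R, (cQ - R @ cP).ravel()

def project(f, c_u, c_v, X):
    """Pinhole projection of one 3D point to pixel coordinates."""
    return np.array([f * X[0] / X[2] + c_u, f * X[1] / X[2] + c_v])

def ransac_inliers(pts_prev, pts_curr, obs_px, f, c_u, c_v,
                   n_iter=50, thresh=1.0, seed=0):
    """RANSAC over minimal 3-point samples: estimate the egomotion, reproject
    the previous 3D points into the current image, and keep the points whose
    reprojection error stays below the threshold."""
    rng = np.random.default_rng(seed)
    n, best = pts_prev.shape[1], np.zeros(pts_prev.shape[1], dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(n, size=3, replace=False)
        R, t = kabsch(pts_prev[:, idx], pts_curr[:, idx])
        pred = R @ pts_prev + t[:, None]
        err = np.linalg.norm(
            np.stack([project(f, c_u, c_v, pred[:, i]) for i in range(n)],
                     axis=1) - obs_px, axis=0)
        inl = err < thresh
        if inl.sum() > best.sum():
            best = inl
    return best
```

A sample contaminated by an outlier yields a poor motion model and thus a small consensus set, so the loop naturally settles on an all-inlier hypothesis.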
4. Tightly-coupled vision-aided inertial navigation
Each pair of tracked feature points across two consecutive frames provides an implicit measure of motion, this way effectively playing the role of a feature-based motion sensor. A tightly-coupled architecture is developed by
incorporating in the process the selected subset of tracked
feature points to yield an estimate of the vehicle states.
The estimator is based on the following state-space model:

$\dot{x}(t) = f\big( x(t), u(t), \nu(t) \big)$,   (13a)
$y(t_k) = h\big( x(t_k) \big)$,   (13b)
$z(t_k) = y(t_k) + \mu(t_k)$,   (13c)

where the state vector $x$ is defined as

$x \doteq \big( v_B^{E\,T}, \omega^{B\,T}, r_{OB}^{E\,T}, q^T \big)^T$.   (14)

Function $f(\cdot,\cdot,\cdot)$ in (13a) represents in compact form the rigid body kinematics expressed by (1). The input vector $u$ is defined by the measurements provided by the accelerometers $a_{acc}$ and gyros $\omega_{gyro}$, i.e. $u \doteq (a_{acc}^T, \omega_{gyro}^T)^T$, while $\nu \doteq (n_{acc}^T, n_{gyro}^T)^T$ is its associated measurement noise vector. The observation models for GPS, sonar, and magnetometer are given by Equations (4), (6), and (8), respectively.

For the implementation of the vision system, a discrete model is needed that provides a relationship among feature points in three scenes. Hence, a set of feature points in two old scenes are propagated by an observation model to their respective positions in a new frame. Two alternative vision-based models are considered within the proposed tightly-coupled fusion architecture by using:

- a trifocal tensor to propagate feature points by geometric constraints among three scenes; or
- a rigid body model applied to 3D reconstructed feature points.

The two approaches are described next.
4.1. Vision-based model via trifocal constraints
Given a pair of matched points in two scenes, $p^{C_{k-1}} \leftrightarrow p'^{C'_{k-1}}$, and the camera projection matrices, it is possible to extract the position of the points in a third scene by the use of the trifocal tensor.[12,45] In order to implement this property as a vision-based observation model directly within the estimator, it is useful to derive the resulting projection matrices in terms of $v_B^E$, $\omega^B$, and $q$, which are the elements of the estimator state vector $x$.

Consider a feature point $P$ projected onto the image planes of one stereo camera at time $t_{k-1}$ (Figure 2), where its position components with respect to the left camera local reference frame are $d(t_{k-1})^{C_{k-1}}$. The components can be transformed to the right frame at $t_{k-1}$ by a translation along the stereo baseline, and to the left frame at $t_k$ via translational ($\Delta r_c$) and rotational ($R_c$) transformations between the two consecutive frames, as

$d'(t_{k-1})^{C'_{k-1}} = d(t_{k-1})^{C_{k-1}} - b$,   (15a)
$d(t_k)^{C_k} = R_c\, d(t_{k-1})^{C_{k-1}} - \Delta r_c$.   (15b)
Figure 2. Geometry for the derivation of the discrete vision-based model via trifocal constraints.
By expressing the velocity of the camera optical center $C$ in terms of the velocity of point $B$, the translational components can be expressed in terms of the states $x$, i.e.

$\Delta r_c = \Delta t\, C^T \big( R(t_k)^T v_B^E(t_k) + \omega^B(t_k) \times c^B \big)$,   (16)

where matrix $C$ indicates the constant-in-time rotation tensor that brings triad $\mathcal{B}$ into $\mathcal{C}$, while $c \doteq r_{BC}$ is the position vector of point $C$ with respect to point $B$.

The rotational motion between two consecutive frames can be derived by composing the corresponding rotation tensors, as

$R_c = C^T R(t_k)^T R(t_{k-1})\, C$.   (17)

By introducing the angular velocity according to $I - R(t_k)^T R(t_{k-1}) = \Delta t\, \omega(t_k)^B \times$, the rotational motion can be expressed in terms of the states $x$ as

$R_c = I - \Delta t\, C^T \omega(t_k)^B \!\times C$.   (18)
Projections that map feature coordinates in local reference frames to the corresponding homogeneous image coordinates in each of three scenes can be obtained via (15) as

$\tilde{p}(t_{k-1})^{C_{k-1}} \simeq K_L \, [\, I \mid 0 \,] \, \tilde{d}(t_{k-1})^{C_{k-1}}$,   (19a)
$\tilde{p}'(t_{k-1})^{C'_{k-1}} \simeq K_R \, [\, I \mid -b \,] \, \tilde{d}(t_{k-1})^{C_{k-1}}$,   (19b)
$\tilde{p}(t_k)^{C_k} \simeq K_L \, [\, R_c \mid -\Delta r_c \,] \, \tilde{d}(t_{k-1})^{C_{k-1}}$.   (19c)

Here, $P_a = K_L \, [\, I \mid 0 \,]$, $P_b = K_R \, [\, I \mid -b \,]$, and $P_c = K_L \, [\, R_c \mid -\Delta r_c \,]$ are projection matrices, while $K_L$ and $K_R$ are the left and the right camera calibration matrices, respectively.
The trifocal tensor, $T$, can be derived from the projection matrices,[45] using standard matrix-vector notation,1 yielding

$T_i^{qr} = (-1)^{i+1} \det \begin{pmatrix} \sim\! P_a^i \\ P_b^q \\ P_c^r \end{pmatrix}$,   (20)

where $\sim\! P_a^i$ denotes matrix $P_a$ with its $i$-th row removed, and $P_b^q$ and $P_c^r$ are the $q$-th and $r$-th rows of $P_b$ and $P_c$, respectively.

Next, a point $p_a$ in the first frame is transferred to the third one, $p_c$, by a homography mapping via a line passing through the corresponding point $p_b$ in the second frame, i.e.

$p_c^r = p_a^i \, l_b^q \, T_i^{qr}$.   (21)

A convenient choice for $l_b$ was suggested in [45], as the line perpendicular to $F_{ba}\, p(t_{k-1})$. Equations (16)-(21) provide a non-linear discrete observation model, which can be used to predict the output of the feature-based motion sensors $p(t_k)^{C_k}$, as described later on in Section 4.3.
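Equation (20) can be evaluated directly from the three projection matrices, and the transfer (21) is then a tensor contraction; a small NumPy sketch (0-based indices, so the 1-based sign $(-1)^{i+1}$ becomes $(-1)^i$; names are illustrative):

```python
import numpy as np

def trifocal_tensor(Pa, Pb, Pc):
    """Equation (20): T[i, q, r] = (-1)^i * det of the 4x4 matrix stacking
    Pa with row i removed (2x4), row q of Pb, and row r of Pc.
    Projection matrices are 3x4; indices are 0-based."""
    T = np.zeros((3, 3, 3))
    for i in range(3):
        Pa_red = np.delete(Pa, i, axis=0)  # ~Pa^i: Pa without row i
        for q in range(3):
            for r in range(3):
                M = np.vstack([Pa_red, Pb[q:q + 1], Pc[r:r + 1]])
                T[i, q, r] = (-1.0) ** i * np.linalg.det(M)
    return T

def transfer_point(T, pa, lb):
    """Equation (21): homography transfer p_c^r = p_a^i l_b^q T[i, q, r]."""
    return np.einsum('i,q,iqr->r', pa, lb, T)
```

The transferred point agrees, up to homogeneous scale, with the direct projection of the same 3D point into the third view, for any non-degenerate line through the second-view correspondence.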
4.2. Vision-based model via 3D reconstruction

An alternative to the trifocal tensor approach is discussed here. In this model, the reconstructed 3D points, obtained by (10), are propagated across time by directly using the equations of rigid body motion. A preliminary version of this model was previously presented in [20] and evaluated via a different framework and simulation experiments. The input/output structure of the model is modified in this work from 3D-3D to 3D-two-dimensional (2D), which not only reduces the computational effort but also enhances the quality of the estimates.
Considering two consecutive time instants $t_{k-1}$ and $t_k$, the following vector closure relationship (see Figure 3) can be derived for both the left and right cameras

$r(t_k) + c(t_k) + d(t_k) = r(t_{k-1}) + c(t_{k-1}) + d(t_{k-1})$,   (22)

where $r \doteq r_{OB}$ is the position vector of the reference point $B$ on the vehicle with respect to the origin $O$ of the inertial frame.

Figure 3. Geometry for the derivation of the discrete vision-based model via 3D reconstruction.

By expressing each of the terms of Equation (22) in convenient reference frames, we obtain

$d(t_k)^{C_k} = -C^T R(t_k)^T \big( r(t_k)^E - r(t_{k-1})^E \big) - C^T \big( I - R(t_k)^T R(t_{k-1}) \big) \big( c^B + C\, d(t_{k-1})^{C_{k-1}} \big) + d(t_{k-1})^{C_{k-1}}$.   (23)
This expression depends on the absolute position vector $r$, a quantity that however cannot be observed by a vision system, which only senses relative distances. Hence, since in the absence of GPS measurements the observed absolute position will drift away from the true one, it is advisable to rewrite the above equation in terms of velocities. By setting

$I - R(t_k)^T R(t_{k-1}) = \Delta t\, \omega(t_k)^B \times$,   (24a)
$r(t_k)^E - r(t_{k-1})^E = \Delta t\, v^E(t_k)$,   (24b)

we have

$d(t_k)^{C_k} = -\Delta t\, C^T \big( R(t_k)^T v^E(t_k) + \omega^B(t_k) \times ( c^B + C\, d(t_{k-1})^{C_{k-1}} ) \big) + d(t_{k-1})^{C_{k-1}}$.   (25)
The left-hand side of Equation (25), $d(t_k)^{C_k}$, may be computed by stereo reconstruction from tracked feature points at time $t_k$ using the triangulation approach, as proposed in [20]. In other words, the 2D feature points are first mapped to the 3D world and then used as visual observations within the filter. This way, however, unknown errors are propagated to the observations.

To avoid this problem, the model is modified here by re-projecting the output of Equation (25), i.e. the predicted 3D feature points, to the 2D image plane of the current scene, so as to compute $p(t_k)^{C_k}$ as the final output of the vision-based model, giving

$\begin{pmatrix} p_u(t_k)^{C_k} \\ p_v(t_k)^{C_k} \end{pmatrix} = \begin{pmatrix} f/d_z(t_k)^{C_k} & 0 & c_u \\ 0 & f/d_z(t_k)^{C_k} & c_v \end{pmatrix} \begin{pmatrix} d_x(t_k)^{C_k} \\ d_y(t_k)^{C_k} \\ 1 \end{pmatrix}$.   (26)

In this case, the left-hand side of Equation (26), $p(t_k)^{C_k}$, is directly available from the tracking system at time $t_k$.

Equations (25) and (26) provide an alternative non-linear discrete model to predict the output of the feature-based motion sensors $p(t_k)^{C_k}$, and its use for multi-sensor fusion is described next.
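A compact sketch of the resulting 3D-to-2D observation model, combining (25) and (26) for a single feature (a NumPy sketch; variable names are our own, with C and R the body-to-camera and inertial-to-body rotation matrices of the text):

```python
import numpy as np

def skew(w):
    """Skew-symmetric matrix such that skew(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def predict_feature(d_prev, v_E, w_B, R, C, c_B, dt, f, c_u, c_v):
    """Vision-based model via 3D reconstruction: propagate a reconstructed
    feature point with Equation (25), then re-project it with Equation (26)."""
    # Equation (25): rigid-body propagation of the 3D point in the camera frame
    d = -dt * C.T @ (R.T @ v_E + skew(w_B) @ (c_B + C @ d_prev)) + d_prev
    # Equation (26): pinhole re-projection to the current image plane
    return np.array([f * d[0] / d[2] + c_u, f * d[1] / d[2] + c_v])
```

For a stationary vehicle the propagation leaves the point unchanged and the output reduces to a plain pinhole projection; forward motion shrinks the depth and moves off-axis features outward in the image.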
4.3. Vision-augmented fusion
The two alternative vision-based models presented above are separately incorporated in the sensor fusion algorithm. The first implementation, termed tightly-coupled stereo vision-aided inertial navigation via trifocal constraints (TC-SVA-INS-TC), is based on Equations (16)-(21), which can be appended to Equations (4), (6), and (8), defining a vision-augmented observation model, expressed in compact form by function $h(\cdot)$ in (13b). The other, called tightly-coupled stereo vision-aided inertial navigation via 3D reconstruction (TC-SVA-INS-3D), is based on Equations (10) and (25)-(26). As for the previous method, these equations can be appended to Equations (4), (6), and (8) to form the vision-augmented observation model.

For each feature point in the subset of best inliers, we include in the augmented vector a new output for both the left and right cameras:

$y = \big( v_G^{E\,T}, r_{OG}^{E\,T}, h, m^{B\,T}, \ldots, p(t_k)^{C_k\,T}, p'(t_k)^{C'_k\,T}, \ldots \big)^T$.   (27)

Considering the tracking results at time $t_k$, these yield visual measurements $p_{vsn}$ affected by noise $n_{vsn}$:

$p_{vsn} = p(t_k)^{C_k} + n_{vsn}$.   (28)

Accordingly, (28) is appended to the measurement equations, where the vision-augmented measurement and noise vectors are defined as

$z \doteq \big( v_{gps}^T, r_{gps}^T, h_{sonar}, m_{magn}^T, \ldots, p_{vsn}^T, p'^{\,T}_{vsn}, \ldots \big)^T$,   (29a)
$\mu \doteq \big( n_{v_{gps}}^T, n_{r_{gps}}^T, n_{sonar}, n_{magn}^T, \ldots, n_{vsn}^T, n'^{\,T}_{vsn}, \ldots \big)^T$.   (29b)

The SPKF [46,47] is here preferred to the extended Kalman filter, and it is used for handling the non-linearities in the observation model.
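The working principle behind the SPKF's handling of non-linearities can be illustrated with the unscented transform, which propagates a symmetric set of sigma points through the non-linear observation model instead of linearizing it (a generic sketch with the basic kappa-parameterization, not the paper's exact filter; names are illustrative):

```python
import numpy as np

def sigma_points(x, P, kappa):
    """Symmetric sigma-point set: 2n+1 points capturing mean x and
    covariance P, with weights summing to one."""
    n = len(x)
    S = np.linalg.cholesky((n + kappa) * P)  # matrix square root, columns used
    pts = [x] + [x + S[:, i] for i in range(n)] + [x - S[:, i] for i in range(n)]
    w = np.full(2 * n + 1, 0.5 / (n + kappa))
    w[0] = kappa / (n + kappa)
    return np.array(pts), w

def unscented_transform(h, x, P, kappa=1.0):
    """Propagate mean and covariance through a non-linear observation h."""
    pts, w = sigma_points(x, P, kappa)
    Y = np.array([h(p) for p in pts])
    y = w @ Y                                   # predicted measurement mean
    Pyy = (Y - y).T @ (w[:, None] * (Y - y))    # predicted measurement covariance
    return y, Pyy
```

For a linear observation the transform is exact, reproducing the mean and covariance of a Kalman filter measurement prediction; its advantage appears when $h$ is non-linear, as in the vision-based models above.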
5. Loosely-coupled vision-aided inertial navigation
In contrast to the TC-SVA-INS method, one may estimate the motion parameters directly using a separate SfM block and subsequently fuse its output with the INS, obtaining a loosely-coupled stereo vision-aided inertial navigation system (LC-SVA-INS). Although less computational effort is required by the loosely-coupled approach, the preprocessing of visual measurements results in the approximation of statistical distributions,[35] in turn leading to a loss of information and accuracy, as will be shown later on in the results section. Nonetheless, this method is considered here for comparison purposes, and it is therefore briefly reviewed next.

The camera motion parameters are computed at the end of the outlier detection procedure by refining the egomotion estimation based on the best subset of inliers (the same used in TC-SVA-INS). Then these estimated parameters are fed to the Kalman filter as visual information.
The estimator is basedon the state-space model expressed
by Equations (13). The relative camera motion between
two consecutive frames, noted, respectively, rc for the
displacement andcfor the rotation, can be expressed in
terms of the states x as
r c =tCT(RTvEB +
B cB), (30a)
c =tCTB. (30b)
These equations represent a vision-based observation model that can be appended to the GPS observation model (5), defining $h(\cdot)$ in Equation (13b). Accordingly, the vector of outputs $\mathbf{y}$ is defined as

$$\mathbf{y} = \left( \mathbf{v}_G^{E\,T},\ \mathbf{r}_{OG}^{E\,T},\ \ldots,\ \Delta\mathbf{r}_c^T,\ \Delta\boldsymbol{\theta}_c^T \right)^T. \qquad (31)$$
The egomotion estimation yields measurements of the relative motion of the camera affected by noise, i.e.

$$\Delta\mathbf{r}_{\mathrm{ego}} = \Delta\mathbf{r}_c + \mathbf{n}_{\Delta r_{\mathrm{ego}}}, \qquad (32a)$$
$$\Delta\boldsymbol{\theta}_{\mathrm{ego}} = \Delta\boldsymbol{\theta}_c + \mathbf{n}_{\Delta\theta_{\mathrm{ego}}}. \qquad (32b)$$
The definition of model (13) is complemented by the vector of measurements $\mathbf{z}$ and the associated noise vector

$$\mathbf{z} \doteq \left( \mathbf{v}_{\mathrm{gps}}^T,\ \mathbf{r}_{\mathrm{gps}}^T,\ \ldots,\ \Delta\mathbf{r}_{\mathrm{ego}}^T,\ \Delta\boldsymbol{\theta}_{\mathrm{ego}}^T \right)^T, \qquad (33a)$$
$$\boldsymbol{\nu} \doteq \left( \mathbf{n}_{v_{\mathrm{gps}}}^T,\ \mathbf{n}_{r_{\mathrm{gps}}}^T,\ \ldots,\ \mathbf{n}_{\Delta r_{\mathrm{ego}}}^T,\ \mathbf{n}_{\Delta\theta_{\mathrm{ego}}}^T \right)^T. \qquad (33b)$$
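For illustration, the loosely-coupled observation model of Equations (30) can be evaluated as below. The reading of the symbols (C body-to-camera rotation, R attitude matrix, v the vehicle velocity, omega_B the body angular rate, c_B the camera offset in the body frame) is an assumption made for this sketch, since the full notation of Equations (13) is defined earlier in the paper.

```python
import numpy as np

def relative_camera_motion(dt, C, R, v, omega_B, c_B):
    """Predicted frame-to-frame camera displacement and rotation, cf. Equations (30)."""
    dr_c = dt * C.T @ (R.T @ v + np.cross(omega_B, c_B))   # Eq. (30a)
    dth_c = dt * C.T @ omega_B                             # Eq. (30b)
    return dr_c, dth_c
```

These predictions are compared with the SfM egomotion output, Equations (32), to form the filter innovation.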
6. Experiments and results
6.1. Simulation test
A Matlab/Simulink simulator was developed that includes
a flight mechanics model of a small RUAV, models of in-
ertial navigation sensors, magnetometer, GPS, and their
noise models. The simulator is used in conjunction with the
OGRE graphics engine, for rendering a virtual environment
scene and simulating the image acquisition process. All
sensor measurements are simulated (see Table 1) as the helicopter flies at an altitude of 2 m following a rectangular path at a constant speed of 2 m/s within a small village, composed of houses and several other objects with realistic textures (see Figure 4). Navigation measurements are provided at a rate of 100 Hz, while stereo images are provided at a rate of 2 Hz. The GPS is also available at a rate of 1 Hz. To show the capability of the system in ensuring an accurate estimation of the vehicle states even without GPS, four temporary GPS signal losses are simulated to occur at t = 15, 40, 65, 90 s, with the signal regained after 10 s at times t = 25, 50, 75, 100 s.
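The measurement schedule of this scenario can be sketched as follows. This is an illustrative reconstruction of the timing logic (100 Hz navigation, 2 Hz stereo, 1 Hz GPS, four 10 s outages), not code from the Matlab/Simulink simulator itself.

```python
GPS_OUTAGES = ((15.0, 25.0), (40.0, 50.0), (65.0, 75.0), (90.0, 100.0))

def gps_available(t, outages=GPS_OUTAGES):
    """True when the simulated GPS signal is valid at time t (seconds)."""
    return not any(t0 <= t < t1 for (t0, t1) in outages)

def sensors_due(k, dt=0.01):
    """Measurements available at step k of the 100 Hz simulation loop."""
    t = k * dt
    due = ["inertial"]                      # inertial data at every step
    if k % 50 == 0:
        due.append("stereo")                # stereo pair at 2 Hz
    if k % 100 == 0 and gps_available(t):
        due.append("gps")                   # GPS at 1 Hz, unless masked
    return due
```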
The estimated position and velocity obtained with the
TC-SVA-INS-3D method are shown in Figures 5(a) and (c),
respectively, using a thick solid line, while the estimated
Table 1. Sensors and vibration noise levels.

Sensor          Noise level
Gyro            50 (deg/s)^2
Accelerometer   0.5 (m/s^2)^2
Magnetometer    1 × 10^-4 (Gauss)^2
GPS             2 (m)^2
724 E. Asadi and C.L. Bottasso
Figure 4. Simulated small village environment and flight trajectory.
position and velocity obtained with the TC-SVA-INS-TC
method are plotted using a thin solid line. Similar plots are
also provided for the errors of position and velocity esti-
mates in Figures 5(b) and (d), respectively. The two vertical
dashed and solid lines indicate the time instants when the
GPS signal is lost and reacquired, respectively. The results
show a successful performance of both approaches in the case of temporary loss of the GPS signal. The error of the position estimates, as shown in Figure 5(b), increases smoothly while the GPS signal is masked, but this does not appear to cause any appreciable degradation in the quality of the velocity and position estimates.
6.2. Real case study
The formulation given above was derived for a general
3D motion, such as the one of an aerial vehicle. However,
Figure 5. Results for the simulation experiment using two approaches: TC-SVA-INS-TC and TC-SVA-INS-3D. The two vertical dashed and solid lines indicate the time instants when the GPS signal is lost and reacquired, respectively.
Advanced Robotics 725
Figure 6. Plan view of the building at the Università di Milano-Bicocca explored by the ground vehicle, together with the matched reference path.
lacking a 3D data-set, a ground robot data-set was used here
to validate the concept and illustrate the basic performance
of the proposed method. The example is taken from the
Rawseeds project,[41] which provides several multi-sensor
data-sets with their associated ground truths. Specifically,
a dynamic indoor data-set was used to assess the performance of a pure vision-aided inertial system in GPS-denied conditions and with unreliable magnetometer readings. The location explored by the vehicle includes two nearby buildings,
shown together with the reference path in Figure 6. The
data-set is illustrative of practical realistic conditions, since
the number of tracked points decreases in a very marked
way when the vehicle reaches the end of corridors, because
of textureless walls, and tracking is occasionally disrupted
when the vehicle turns rapidly. In all subsequent examples,
stereo images are processed and fused at 5 Hz, while inertial
measurements are incorporated at 120 Hz.
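The interleaving of the two rates can be sketched as below; the filter object with predict/update methods is hypothetical, standing in for whichever estimator (tightly- or loosely-coupled) is being run.

```python
def run_fusion(filt, imu_stream, vision_stream, imu_rate=120, vision_rate=5):
    """Interleave high-rate inertial prediction with low-rate vision updates."""
    vision_stream = iter(vision_stream)
    ratio = imu_rate // vision_rate              # 24 inertial samples per stereo frame
    for k, imu in enumerate(imu_stream):
        filt.predict(imu, dt=1.0 / imu_rate)     # propagate with inertial data
        if k % ratio == ratio - 1:
            filt.update(next(vision_stream))     # fuse the newly arrived stereo frame
```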
The estimated trajectory obtained with the visual odom-
etry method is depicted in Figure 7. The method exhibits a
significant degradation of the estimation quality caused by
the loss of tracking continuity.
Several views of the dynamic environment captured by
the left camera are depicted in Figure 8, which also reports tracked feature points across time and detected outliers. The figure shows that the outlier detection method is capable of successfully identifying tracked feature points on moving
objects.
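The consensus-style outlier rejection used here is a RANSAC scheme in the spirit of [12]; a generic sketch follows. The translation-only motion model in the usage part is a deliberate simplification, chosen only to illustrate the inlier/outlier classification, not the paper's actual egomotion model.

```python
import random
import numpy as np

def ransac_inliers(src, dst, fit, residual, n_min, iters=200, tol=0.5, seed=0):
    """Return a boolean mask of the largest consensus set of correspondences."""
    rng = random.Random(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.sample(range(len(src)), n_min)   # minimal random subset
        model = fit(src[idx], dst[idx])            # hypothesize a motion model
        ok = residual(model, src, dst) < tol       # test all correspondences
        if ok.sum() > best.sum():
            best = ok
    return best

# Usage: pure-translation model, 17 consistent tracks plus 3 moving-object outliers.
src = np.arange(40, dtype=float).reshape(20, 2)
dst = src + np.array([1.0, 0.0])
dst[:3] += np.array([10.0, 5.0])                   # features on a moving object
inliers = ransac_inliers(
    src, dst,
    fit=lambda s, d: (d - s).mean(axis=0),
    residual=lambda t, s, d: np.linalg.norm(d - s - t, axis=1),
    n_min=2)
```

The best inlier subset then refines the egomotion estimate, as described for the loosely-coupled pipeline above.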
The estimated trajectory obtained with the LC-VA-INS
method is shown in Figure 9(a) using a thick solid line,
while the ground truth is plotted using a thin solid line. Since
the reconstruction was performed without absolute position
information, trajectories are depicted after realignment with
the ground truth. Figure 9(b) shows a time history plot of the velocity estimates in the x direction (thick solid line), together with the error of the velocity estimates w.r.t. the ground truth
Figure 7. Results for the visual odometry approach.[12] Thin line: ground truth; thick line: trajectory estimate; stars: points where feature tracking is disrupted.
(thin solid line). Despite locally weak tracking at difficult points in the path, particularly at the corners of corridors, the estimates are significantly better than in the visual odometry
case, because of the incorporation of inertial measurements.
However, the position estimation errors are considerable in
parts of the path where the quality of egomotion estimation
is low.
Figures 10(a) and 11(a) show the trajectories obtained with the proposed tightly-coupled methods, respectively TC-SVA-INS-TC and TC-SVA-INS-3D. Similarly, Figures 10(b) and 11(d) show time history plots of the x velocity component. In both cases, path and velocity estimates are reasonably reliable and significantly better than in the visual odometry and LC-SVA-INS cases, because of the use of
inertial measurements and thanks to the tightly-coupled
fusion process. The position estimate errors keep growing
throughout the navigation, which is not an unexpected out-
come because of the lack of absolute positions in the sensor
fusion process. Nevertheless, the quality of state estimates
as well as of velocities is significantly improved with re-
spect to the other methods considered in this work. Furthermore, it appears that TC-SVA-INS-3D is slightly superior to
TC-SVA-INS-TC.
7. Conclusions
A tightly-coupled vision-aided INS was proposed in this work, which synergistically incorporates vision with other
sensors. The approach reduces loss of information by pro-
cessing all effective visual data directly within the fusion
algorithm. Two alternative feature-based observation mod-
els were described and included in the proposed fusion
architecture; one uses geometric constraints encapsulated
Figure 8. Performance of the outlier rejection algorithm in a dynamic environment. Tracked feature points are colored based on their distance with respect to the camera, while detected outliers are shown in blue.
by the trifocal tensor, while the other uses rigid body motion
applied to 3D reconstructed feature points. The formulation
was derived for an arbitrary 3D motion, making the algo-
rithm platform-independent.
The proposed approach was demonstrated in the case of temporary loss of the GPS signal, using a simulation of an autonomous helicopter flying in an urban environment. Furthermore, a
real data-set from the Rawseeds project was used to assess
Figure 9. Results for the loosely-coupled stereo vision-aided INS.
Figure 10. Results for the tightly-coupled stereo vision-aided inertial navigation via trifocal constraints.
the performance of the approach in an indoor 2D case, in
GPS-denied conditions and without reliable magnetometer
measurements. Results indicate that the proposed TC-SVA-
INS methods significantly improve the quality of state estimation with respect to a classical visual odometry approach
and also with respect to a loosely-coupled implementation.
Of the two alternative feature-based observation models, the
one using 3D reconstruction and rigid body motion appears
to offer the best results in terms of accuracy.
Additional work is being conducted to further optimize
the performance of the proposed method by the imple-
mentation of exact models via stochastic cloning. These
efforts will also consider the mitigation of the effects on the
estimator performance caused by delays due to the online
processing of visual information.
If the computational burden is not of particular concern,
a combination of the proposed TC-SVA-INS approach with SLAM can be considered, in order to enhance the precision
of the SLAM method. This combination may be developed
within a decentralized or a centralized architecture. In a
decentralized form, the TC-SVA-INS method can be used
in parallel with a low frequency SLAM approach, where the
information is distributed between two loops. A decentralized implementation is more suitable for real-time applications than a centralized one. In a centralized combination, high-level features can be included in the state vector
Figure 11. Results for the tightly-coupled stereo vision-aided inertial navigation via 3D reconstruction.
in a similar way to the SLAM approach, while low-level features are incorporated in the same estimator via the proposed TC-SVA-INS formulation. These ideas are currently being
investigated and will be reported in future publications.
Note
1. The ij entry of matrix A is denoted by a_ij, index i being the row number and index j being the column number; a_i indicates the i-th row of matrix A, and A_{\i} represents matrix A without row i.
Notes on contributors

E. Asadi received the BSc degree from Kashan University, Kashan, Iran, in 2002, and the MSc degree from Yazd University, Yazd, Iran, in 2005, both in Mechanical Engineering. Currently, he is a PhD student in Aerospace Engineering at Politecnico di Milano, Milan, Italy, where he works with the POLI-Rotorcraft research group. His current research interests include sensor fusion, Kalman filtering, vision-aided inertial navigation, and simultaneous localization and mapping (SLAM).

Carlo L. Bottasso received a PhD degree in Aerospace Engineering from the Politecnico di Milano, Italy, in 1993. Currently, he is the Chair of Wind Energy at the Technische Universität München (TUM), Germany, and Professor of Flight Mechanics with the Department of Aerospace Science and Technology, Politecnico di Milano, Italy. He has held visiting positions at various institutions, including Rensselaer Polytechnic Institute, Georgia Institute of Technology, Lawrence Livermore National Laboratory, NASA Langley, and NREL, among others. His research interests and areas of expertise include the flight mechanics and aeroelasticity of rotorcraft vehicles, aeroelasticity and active control of wind turbines, and flexible multibody dynamics.
References

[1] Campolo D, Schenato L, Pi L, Deng X, Guglielmelli E. Attitude estimation of a biologically inspired robotic housefly via multimodal sensor fusion. Adv. Robotics. 2009;23:955–977.
[2] Mourikis AI, Roumeliotis SI, Burdick JW. SC-KF mobile robot localization: a stochastic cloning-Kalman filter for processing relative-state measurements. IEEE Trans. Robotics. 2007;23:717–730.
[3] Konolige K, Agrawal M, Bolles R, Cowan C, Fischler M, Gerkey B. Experimental robotics: outdoor mapping and navigation using stereo vision. Berlin: Springer; 2008. p. 179–190.
[4] Jung BS, Choi SB, Ban SW, Lee M. A biologically inspired active stereo vision system using a bottom-up saliency map model. In: Artificial Intelligence and Soft Computing - ICAISC; 2004. p. 730–735.
[5] Bonin-Font F, Ortiz A, Oliver G. Visual navigation for mobile robots: a survey. J. Intell. Rob. Syst. 2008;53:263–296.
[6] Jia S, Sheng J, Shang E, Takase K. Robot localization in indoor environments using radio frequency identification technology and stereo vision. Adv. Robotics. 2008;22:279–297.
[7] Irie K, Yoshida T, Tomono M. Outdoor localization using stereo vision under various illumination conditions. Adv. Robotics. 2012;26:327–348.
[8] Dalgleish FR, Tetlow JW, Allwood RL. Vision-based navigation of unmanned underwater vehicles: a survey. Part 2: Vision-based station-keeping and positioning. IMarEST Proc., Part B: Marine Design and Operations. 2005;8:13–19.
[9] Liu YC, Dai QH. Vision aided unmanned aerial vehicle autonomy: an overview. Proceedings of the 3rd International Congress on Image and Signal Processing; Yantai, China; 2010. p. 417–421.
[10] Moore RJD, Thurrowgood S, Bland D, Soccol D, Srinivasan MV. A bio-inspired stereo vision system for guidance of autonomous aircraft. In: Advances in Theory and Applications of Stereo Vision; 2011. ISBN 978-953-307-516-7.
[11] Nister D, Naroditsky O, Bergen J. Visual odometry for ground vehicle applications. J. Field Robotics. 2006;23:3–20.
[12] Kitt B, Geiger A, Lategahn H. Visual odometry based on stereo image sequences with RANSAC-based outlier
rejection scheme. Proceedings of the IEEE Intelligent Vehicles Symposium; San Diego, CA, USA; 2010. p. 486–492.
[13] Iida F. Biologically inspired visual odometer for navigation of a flying robot. Robot. Auton. Syst. 2003;44:201–208.
[14] Yu YK, Wong KH, Chang MY, Or SH. Recursive camera-motion estimation with the trifocal tensor. IEEE Trans. Syst. Man Cybernet. B. 2006;36:1081–1090.
[15] Comport AI, Malis E, Rives P. Real-time quadrifocal visual odometry. Int. J. Robot. Res. 2010;29:245–266.
[16] Roumeliotis SI, Johnson AE, Montgomery JF. Augmenting inertial navigation with image-based motion estimation. Proceedings of the IEEE International Conference on Robotics and Automation; Washington, DC, USA; 2002. p. 4326–4333.
[17] Qian G, Chellappa R, Zheng Q. Robust structure from motion estimation using inertial data. J. Opt. Soc. Am. A. 2001;18:2982–2997.
[18] Veth MJ, Raquet JF, Pachter M. Stochastic constraints for efficient image correspondence search. IEEE Trans. Aero. Electron. Syst. 2006;42:973–982.
[19] Mourikis AI, Roumeliotis SI. A multi-state constraint Kalman filter for vision-aided inertial navigation. Proceedings of the IEEE International Conference on Robotics and Automation; Roma, Italy; 2007. p. 3565–3572.
[20] Bottasso CL, Leonello D. Vision-aided inertial navigation by sensor fusion for an autonomous rotorcraft vehicle. Proceedings of the AHS International Specialists' Meeting on Unmanned Rotorcraft; Scottsdale, AZ, USA; 2009. p. 324–334.
[21] Corke P, Lobo J, Dias J. An introduction to inertial and visual sensing. Int. J. Robot. Res. 2007;26:519–535.
[22] Weiss S, Siegwart R. Real-time metric state estimation for modular vision-inertial systems. Proceedings of the IEEE International Conference on Robotics and Automation; Shanghai, China; 2011. p. 4531–4537.
[23] Tardif JP, George M, Laverne M, Kelly A, Stentz A. A new approach to vision-aided inertial navigation. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems; 2010. p. 4161–4168.
[24] Goulding JR. Biologically-inspired image-based sensor fusion approach to compensate gyro sensor drift in mobile robot systems that balance. Proceedings of the Conference on Multisensor Fusion and Integration for Intelligent Systems; Arizona, USA; 2010. p. 102–108.
[25] Strelow D, Singh S. Motion estimation from image and inertial measurements. Int. J. Robot. Res. 2004;23:1157–1195.
[26] Veth M, Anderson R, Webber F, Nielsen M. Tightly-coupled INS, GPS, and imaging sensors for precision geolocation. Proceedings of the Institute of Navigation National Technical Meeting; San Diego, CA, USA; 2008.
[27] Ong LL, Ridley M, Kim JH, Nettleton E, Sukkarieh S. Six DoF decentralised SLAM. Proceedings of the Australasian Conference on Robotics and Automation; 2003. p. 10–16.
[28] Chen J, Pinz A. Structure and motion by fusion of inertial and vision-based tracking. Proceedings of the 28th OAGM/AAPR Conference, Digital Imaging in Media and Education; 2004. p. 55–62.
[29] Durrant-Whyte H, Bailey T. Simultaneous localisation and mapping (SLAM): Part I the essential algorithms. IEEE Robotics Autom. Mag. 2006;13:99–110.
[30] Newman P, Leonard J, Tardos JD, Neira J. Explore and return: experimental validation of real-time concurrent mapping and localization. Proceedings of the IEEE International Conference on Robotics and Automation; Washington, DC, USA; 2002. p. 1802–1809.
[31] Julier SJ, Uhlmann JK. Simultaneous localisation and map building using split covariance intersection. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems; Maui, HI, USA; 2001. p. 1257–1262.
[32] Veth M, Raquet J. Two-dimensional stochastic projections for tight integration of optical and inertial sensors for navigation. Proceedings of the Institute of Navigation National Technical Meeting; 2006. p. 587–596.
[33] Ebcin S, Veth M. Tightly-coupled image-aided inertial navigation using the unscented Kalman filter. Proceedings of the 20th International Technical Meeting of the Satellite Division of The Institute of Navigation; Fort Worth, TX, USA; 2007. p. 1851–1860.
[34] Roumeliotis SI, Burdick JW. Stochastic cloning: a generalized framework for processing relative state measurements. Proceedings of the IEEE International Conference on Robotics and Automation; 2002. p. 1788–1795.
[35] Dijkstra F, Luinge H, Schön TB. Tightly coupled UWB/IMU pose estimation. Proceedings of the IEEE International Conference on Ultra-Wideband; 2009. p. 688–692.
[36] Asadi E, Bottasso CL. Tightly-coupled vision-aided inertial navigation via trifocal constraints. Proceedings of the IEEE International Conference on Robotics and Biomimetics; Guangzhou, China; 2012. p. 85–90.
[37] Asadi E, Bozorg M. A decentralized architecture for simultaneous localization and mapping. IEEE/ASME Trans. Mech. 2009;14:64–71.
[38] Weiss S, Achtelik M, Lynen S, Chli M, Siegwart R. Real-time onboard visual-inertial state estimation and self-calibration of MAVs in unknown environments. Proceedings of the IEEE International Conference on Robotics and Automation; St. Paul, MN, USA; 2012. p. 957–964.
[39] Konolige K. Large-scale map-making. Proceedings of the AAAI National Conference on Artificial Intelligence; San Jose, CA, USA; 2004. p. 457–463.
[40] Hoffman BD, Baumgartner ET, Huntsberger TL, Schenker PS. Improved state estimation in challenging terrain. Auton. Robot. 1999;6:113–130.
[41] Bonarini A, Burgard W, Fontana G, Matteucci M, Sorrenti DG, Tardos JD. RAWSEEDS: robotics advancement through web-publishing of sensorial and elaborated extensive data sets. Proceedings of the IROS'06 Workshop on Benchmarks in Robotics Research; 2006.
[42] Willner D, Chang CB, Dunn KP. Kalman filter algorithms for a multi-sensor system. Proceedings of the IEEE Conference on Decision and Control; 1976. p. 570–574.
[43] Geiger A, Roser M, Urtasun R. Efficient large-scale stereo matching. Lecture Notes in Computer Science. Springer; 2011. p. 25–38.
[44] Geiger A, Ziegler J, Stiller C. StereoScan: dense 3D reconstruction in real-time. Proceedings of the IEEE Intelligent Vehicles Symposium; Baden-Baden, Germany; 2011. p. 963–968.
[45] Hartley R, Zisserman A. Multiple view geometry in computer vision. Cambridge University Press; 2004.
[46] Van der Merwe R, Wan E, Julier SJ. Sigma-point Kalman filters for nonlinear estimation and sensor fusion: applications to integrated navigation. Proceedings of the AIAA GNC Conference and Exhibition; Providence, RI, USA; 2004. p. 1735–1764.
[47] Xiong K, Wei CL, Liu LD. Robust unscented Kalman filtering for nonlinear uncertain systems. Asian J. Control. 2010;12:426–433.