
Active Learning for Robot Exploration
Bayesian Optimization for Object Grasping

José Miguel Silva do Carmo Nogueira

Thesis to obtain the Master of Science Degree in

Electrical and Computer Engineering

Supervisor: Prof. Alexandre José Malheiro Bernardino

Examination Committee

Chairperson: Prof. João Fernando Cardoso Silva Sequeira
Supervisor: Prof. Alexandre José Malheiro Bernardino

Members of the Committee: Prof. João Manuel de Freitas Xavier

May 2017


Good luck with the game, good luck with the game.
José Nogueira


Dedication

This work is for you, my great friend. For all the hours, for all the help. May I have been to you everything that you were to me. I will never forget you.


Acknowledgments

Thanks to all my Vislab colleagues during my short time as a researcher.

A great thank you to Lorenzo Jamone and Ruben Martinez-Cantin, my fellow research partners

and great teachers.

And to my mentor and true friend, Alexandre Bernardino, my greatest appreciation.

This work is not only my own. It belongs, in part, to all the giants on whose shoulders I stood.


Abstract

The thesis proposal addressed here is Active Learning for Robot Exploration, which concerns learning an unknown quantity through experimentation. Specifically, we present the problem of Object Grasping Optimization, where we wish to learn the optimal way to grasp an object. We discuss how we evaluate grasp quality, how we use the information gathered during exploration to decide where to grasp next, which inherent problems may affect the learning activity, and how we propose to solve them.

The learning strategy employed here is Bayesian Optimization: a global optimization method in which prior beliefs are used to form a stochastic model of an objective function we wish to learn. This model is then used to determine where to sample next over the input space, given an active learning criterion. Bayesian Optimization is an optimization technique for black-box functions, known as one of the most sample-efficient trial-and-error techniques, at the cost of extra computation. It is mainly used with functions that are expensive to evaluate (in terms of cost, energy, or time) and has been called the intelligent brute-force algorithm.

The two main problems we address in this work are objective functions with varying smoothness properties over the input space and the presence of input noise, both of which reduce the effectiveness of Bayesian Optimization. Both problems are noticeable aspects of Object Grasping, as well as of many other applications.

These problems are tackled by pairing Bayesian Optimization with a heteroscedastic regression model, Treed Gaussian Processes, and by formulating a new learning criterion and a new rule for selecting the best candidate for the optimum, which we denote Unscented Bayesian Optimization.

A Treed Gaussian Process is a stochastic model which is, essentially, a composition of several Gaussian Processes, each responsible for an exclusive partition of the input space with different smoothness and noise parameters. Unscented Bayesian Optimization uses the unscented transform to compute the expected value and variance of the stochastic model, in order to improve optimization and active search in the presence of input noise modeled by a covariance matrix.

The results presented highlight how our methods outperform classical Bayesian Optimization, both on synthetic problems and in realistic robot grasp simulations.


Acronyms

BO Bayesian Optimization. 2–4, 6, 8, 9, 12, 21–23, 25, 27, 28, 30, 32–36, 38, 40, 42

BO-GP Bayesian Optimization using Gaussian Processes. 25, 28–31, 33, 35

BO-TGP Bayesian Optimization using Treed Gaussian Processes. 25, 28–33, 35, 42

GF Gramacy 2-D Exponential Function. 25, 26, 28–30

GM Mixture of 2D Gaussian distributions. 25, 26, 28–30, 34, 35, 40

GP Gaussian Process. 2, 4, 6, 8–10, 13, 15, 22, 27–33

GPs Gaussian Processes. 3, 6, 8, 9, 15, 16, 25, 27, 30

MCMC Markov Chain Monte Carlo. 13

RKHS 1D Reproducing Kernel Hilbert Space Function. 25–30, 33–35, 40

SP Stochastic Process. 2, 3, 9, 12, 18

TGP Treed Gaussian Process. 4, 15, 22, 27–33, 42

TGPs Treed Gaussian Processes. 15, 16, 27, 28, 30

UBO Unscented Bayesian Optimization. 4, 23, 34–36, 38, 40, 42

UO Unscented Outcome. 22


List of symbols

x∗ Global optimum.

xt+1 Next query.

X Input space.

d Number of input space dimensions.

f Objective function.

ε Observational noise.

y Observation. Query output.

D1:t Dataset. Samples up to iteration t.

u Learning criterion function.

yt Observation at iteration t.

xt Query at iteration t.

εt Observational noise at iteration t.

GP Gaussian Process.

µ Mean.

k Kernel function. Covariance function.

kν=5/2 Matérn 5/2 covariance function.

σp Signal variance hyper-parameter of the objective function.

σn Observational noise hyper-parameter.

σx Input space noise hyper-parameter.

li Kernel length on dimension i hyper-parameter.

θ Hyper-parameters.

θi Hyper-parameters i-th sample.

Θ Hyper-parameters’ samples.

µi(xq) Gaussian Process’ expected value for query.

σ2i (xq) Gaussian Process’ variance value for query.

Φ Normal distribution’s cumulative distribution function.


φ Normal distribution’s probability density function.

ξ Auxiliary exploitation/exploration parameter.

hi TGP binary test on feature i.

τ TGP binary test threshold.

wk Set of all TGP node weights, with order k.

x0, x(i)+, x(i)− Sigma points.

ω0, ω(i)+, ω(i)− Unscented weights.

Σx Input space noise covariance matrix.

δx, δy, δz, θx, θy, θz, s1 Simox application input search space variables.

xmci Monte Carlo input space i-th sample.

ymc Monte Carlo sample observation.


Contents

Acronyms viii

1 Introduction 1

1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 Report Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Related Work 5

2.1 Gaussian Process and Bayesian Optimization’s Fundamental Literature . . . . . . . . . 6

2.2 Learning Criterion Function Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.3 Learning Criteria Functions Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.4 Bayesian Optimization Toolbox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.5 Simox - Robotics Simulation Environment . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.6 Grasping References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.7 Noise Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.8 Heteroscedasticity and varying smoothness properties . . . . . . . . . . . . . . . . . . . 9

2.9 Other honorable contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.9.1 Curse of Dimensionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.9.2 Multi-Valued Objective Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3 Theoretical Concepts 11

3.1 Bayesian Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3.2 Gaussian Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3.3 Treed Gaussian Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.3.1 Tree Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.3.2 Hyper-parameter estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.4 Learning Criteria Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.4.1 Expected Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.5 Unscented Bayesian Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.5.1 Unscented transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.5.2 Computing the unscented transformation . . . . . . . . . . . . . . . . . . . . . . 20

3.5.3 Unscented expected improvement . . . . . . . . . . . . . . . . . . . . . . . . . . 21


3.5.4 Unscented optimal incumbent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

4 Experiments 24

4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.2 Synthetic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.3 Robot Grasp Simulator - Simox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.4 Results BO-GP Vs. BO-TGP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

4.4.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

4.4.2 Synthetic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4.4.3 Robot Grasp Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4.4.3.A Simox Metric Signal profile . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.5 Results BO Vs. UBO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.5.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.5.2 Synthetic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.5.3 Robot Grasp Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

5 Conclusions 41


1 Introduction


1.1 Motivation

Active Learning, or Optimal Experimental Design, is a widely known technique in robotics for online Machine Learning, where both prior data and samples acquired during task execution are used to pursue a certain learning objective. Such objectives may correspond to parameter optimization, function estimation, among others.

In the present problem, active learning will be employed to determine an objective function: a grasp metric which measures the quality of a grasp. This quality can be intuitively perceived as the volume of the convex envelope originated by the points of contact on the object's surface and their corresponding friction cones.

It must be noted that learning the grasp configuration which yields a high-valued metric may not be an easy task. To grasp an object one must consider gripper motion and object reachability, all of which involve risk of collision, material wear, and time as a resource. Furthermore, in this setting controller noise is very common, as is varying smoothness of the metric along the object surface: near an object's edge the returned grasp metric may be highly variable, depending on the real position of the grasping device, while a grasp near a smooth surface will return a much less variable metric over multiple tries. These aspects make learning such metrics non-trivial. Moreover, we wish not only to learn the configuration of the end-effector¹ which leads to the highest-valued grasp metric, but also to take into account its consistency given the presence of imprecision (noise) in the end-effector control.

In a broader perspective, we wish to learn about a specific metric or quantity whose behavior we do not know beforehand. The knowledge acquired during the learning process will take the form of decreased uncertainty about the estimated metric over the input space and of determining where this metric (or its low-uncertainty estimate) has high values.

The estimation of these objective functions will be done through a realization of a Stochastic Process (SP). One of the advantages of such descriptors is that one can estimate both the expected value and higher-order statistics at any point from only a limited set of sampled points. These statistics give important information about where to explore next.

Bayesian Optimization (BO) typically uses such processes to describe the objective function which we wish to optimize². Its usefulness shows when we are trying to optimize functions which are expensive to evaluate (the sample budget is a restriction) or when the optimization problem is non-convex. BO methods have shown positive results in terms of the number of samples required to approach a function's optima [1], [2], [3]. This feature derives from their ability to include prior belief when choosing where to sample next and to represent uncertainty over the estimated values.

A Gaussian Process (GP) is one SP commonly used in BO. In a GP the input domain is continuous and every point in the domain is associated with a normally distributed random variable. More specifically, any finite set of samples has a joint Gaussian distribution, and any query point is also normally distributed, with mean and variance values.

¹ grasp manipulator with haptic sensors
² Global maximization or minimization.
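The GP predictions just described (any query point is normally distributed, with a mean and a variance) can be sketched in a few lines. The sketch below is illustrative only, assuming a squared-exponential kernel with unit signal variance; function and variable names are ours, not from any toolbox used in this thesis:

```python
import numpy as np

def sq_exp_kernel(A, B, length=1.0, signal_var=1.0):
    """Squared-exponential covariance between two sets of 1-D points."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return signal_var * np.exp(-0.5 * d2 / length**2)

def gp_posterior(x_train, y_train, x_query, noise_var=1e-4):
    """Posterior mean and variance of a zero-mean GP at the query points."""
    K = sq_exp_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_star = sq_exp_kernel(x_train, x_query)
    k_ss = sq_exp_kernel(x_query, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = k_star.T @ alpha               # posterior mean at each query
    v = np.linalg.solve(L, k_star)
    var = np.diag(k_ss) - np.sum(v**2, axis=0)  # posterior variance
    return mean, var

x = np.array([0.0, 1.0, 2.0])
y = np.sin(x)
mu, var = gp_posterior(x, y, np.array([1.0, 3.0]))
# At a training point the mean matches the observation and the variance
# is near zero; far from the data, the variance reverts toward the prior.
```

The variance output is exactly the "uncertainty over the estimated values" that BO exploits when deciding where to sample next.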


There exists a great variety of applications using Gaussian Processes (GPs) in active learning and data estimation, ranging from bipedal walking learning [4], estimating sensor network data [5], configuration of parameterized algorithms [6], parameter-based procedural animation design [7], and path planning [8], to the problem at hand: grasping optimization [9], [10], [11].

Despite the popularity of GPs and their use in BO, some of the GPs' limitations threaten applications where the input space is noisy or where the objective function has different smoothness behaviors over the input space: GPs do not model input noise in their regression model, and their covariance functions keep identical kernel lengths throughout the input space, i.e., the same degree of smoothness. These two aspects naturally hinder BO performance in such applications.

The learning problem at hand, grasp optimization, is an example where these kinds of problems may occur. This dissertation sets out to expose and address these issues with state-of-the-art solutions for BO and to evaluate their effectiveness.

1.2 Problem Statement

The general Bayesian Optimization problem is to find the global optimum (either the maximum or the minimum value) of an unknown function f : ℝᵈ → ℝ over a compact input domain X. In this case, we wish to maximize f(·), which represents our grasping metric, with x our parameterized end-effector configuration:

x∗ = argmax_{x ∈ X ⊂ ℝᵈ} f(x), d ≥ 1 (1.1)

However, we do not inspect the values f(x) directly. Instead, we observe a value y which adds observational noise, assumed to follow a normal distribution, to our metric (eq. 1.2):

y(x) = f(x) + ε, ε ∼ N(0, σn(x)) (1.2)

Since we only know y(x), we must estimate f through a probability distribution over functions, an SP, which maximizes

P(f | D1:t) ∝ P(D1:t | f) P(f) (1.3)

where D1:t = {x1:t, y1:t} are the observations made so far. Equation 1.3 is Bayes' theorem applied to this optimization, hence the name Bayesian Optimization.

Under appropriate conditions (see [3]), BO will converge to the optimal value x∗. One of these conditions is related to how the optimization chooses the next sample xt+1, through a function known as the learning criterion u : ℝᵈ → ℝ. This function serves as a way to evaluate information gain in the search for the global optimum, and it considers, implicitly or explicitly, a trade-off between exploration (searching where the function estimate has high variance) and exploitation (searching where the objective function has high expected values):

xt+1 = argmax_{x ∈ X ⊂ ℝᵈ} u(x) (1.4)
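Equations 1.1 to 1.4 compose into a simple loop: fit the surrogate on D1:t, maximize the criterion u to get xt+1, observe a noisy y, and repeat. Below is a minimal sketch of that loop, not the implementation used in this thesis: a candidate grid stands in for the inner optimization of u, an upper-confidence-bound rule stands in for u itself, and the objective, kernel, and parameters are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):                       # "unknown" objective (known here, for the demo)
    return np.sin(3 * x) * x

def kernel(A, B, length=0.3):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / length**2)

def surrogate(x_tr, y_tr, x_q, noise=1e-3):
    """GP posterior mean/variance on the query points (unit signal variance)."""
    K = kernel(x_tr, x_tr) + noise * np.eye(len(x_tr))
    ks = kernel(x_tr, x_q)
    mu = ks.T @ np.linalg.solve(K, y_tr)
    var = 1.0 - np.sum(ks * np.linalg.solve(K, ks), axis=0)
    return mu, np.maximum(var, 1e-12)

grid = np.linspace(0.0, 2.0, 201)              # compact input space X
X_t = list(rng.uniform(0.0, 2.0, 3))           # initial design
Y_t = [f(x) + 0.01 * rng.standard_normal() for x in X_t]   # noisy observations, eq. (1.2)

for t in range(15):                            # fixed sample budget
    mu, var = surrogate(np.array(X_t), np.array(Y_t), grid)
    u = mu + 2.0 * np.sqrt(var)                # exploitation + exploration
    x_next = grid[np.argmax(u)]                # eq. (1.4) on the candidate grid
    X_t.append(x_next)
    Y_t.append(f(x_next) + 0.01 * rng.standard_normal())

x_best = X_t[int(np.argmax(Y_t))]              # incumbent after the budget
```

The fixed 15-iteration budget mirrors the budgeted setting described below; the final incumbent is the best observation made so far.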


Typically, BO problems are considered under a fixed budget of samples available during the optimization process. The specified budget reflects how we evaluate the cost of sampling one point; this cost comes from material wear, risk of damage or collision, and computational cost. In fact, BO complexity scales steeply with the number of samples taken: O(n³) [12].

To evaluate the performance of this kind of optimization correctly, one should weigh the difference between the estimated global optimum and the real global optimum (if known) after using the budget. Another standard procedure is to plot the progression of the best sample taken over the number of iterations and compare it with other learning methods. The latter will be used throughout this thesis, since it is a standard procedure in the BO community.

As said in Section 1.1, BO will be used to optimize a grasp metric over a continuous space of feasible parametric end-effector configurations. This metric may have different characteristics in terms of smoothness, average value, and observational noise, related to surface characteristics of the object that are unknown beforehand. It is also important to take into account that mechanical imprecision may exist, adding noise to the input space. The challenge at hand is to consider, implicitly or explicitly, these three critical points, which influence BO performance in the grasping setting.

1.3 Report Outline

The remainder of this dissertation is structured in four sections.

Section 2 covers the state-of-the-art articles and other published works considered in this dissertation: Bayesian Optimization and the Stochastic Processes used within it, the grasp learning setting, and noise and heteroscedasticity modeling.

Section 3 presents the theoretical basis for this work: the Bayesian Optimization algorithm, the Gaussian Process model, learning criteria, the Treed Gaussian Process model, and the Unscented Bayesian Optimization variant.

Section 4 presents all results and discusses them in detail. It is divided into two main sets of results: one for the GP vs. the Treed Gaussian Process (TGP), and one for BO vs. Unscented Bayesian Optimization (UBO).

Section 5 concludes with this work's application spectrum and the relevant conclusions drawn from all the experiments carried out in this work.


2 Related Work


2.1 Gaussian Process and Bayesian Optimization's Fundamental Literature

Two of the most cited works related to BO and GPs are the MIT Press book by Carl Rasmussen and Christopher Williams [13] and a publication by Eric Brochu, Vlad Cora, and Nando de Freitas [3].

Rasmussen and Williams [13] define GPs theoretically and practically. They address both regression and classification problems using GPs, remark on the importance of covariance functions (kernel functions), and show how to estimate the kernels' hyper-parameters.

Brochu et al. [3] is a tutorial on BO using GPs. It states BO's requirements and restrictions and presents the GP regression equations and the most commonly used kernels. One of its important contributions is the exposition of three learning criterion functions typically used in BO. The first is the probability of improvement (PI) [14], a naive attempt that uses only the mean and variance outputs of the GP. The other two, the expected improvement (EI) [15] and the upper confidence bound (UCB) [5], are by far more robust functions and yield better results (see [5] for an experimental comparison). The tutorial also shows applications where Bayesian Optimization has been applied and tested, such as Sony AIBO ERS-7 gait parameter learning, parameter learning for neural network controllers, simulated driving task learning, and more.

2.2 Learning Criterion Function Optimization

Using a learning criterion function demands optimizations of its own in order to obtain the next point to sample, xt+1, as in equation 1.4. An algorithm often used [3, 6, 7, 9] to solve this optimization problem is the DIRECT algorithm [16], which is applicable to real-valued Lipschitz-continuous functions on bounded input domains.
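DIRECT itself works by systematically subdividing the domain into hyper-rectangles; as a much simpler stand-in, the sketch below conveys the basic idea of derivative-free criterion maximization over a bounded domain by repeated evaluation and zooming. This is not the DIRECT algorithm and not from any toolbox referenced here; all names are illustrative:

```python
import numpy as np

def maximize_criterion(u, lo, hi, rounds=6, pts=33):
    """Derivative-free maximization of u on [lo, hi]: evaluate u on a
    grid, then repeatedly zoom into the neighborhood of the best point.
    A crude stand-in for DIRECT-style global search on a bounded domain
    (unlike DIRECT, this zoom can miss a distant global optimum)."""
    for _ in range(rounds):
        xs = np.linspace(lo, hi, pts)
        best = xs[np.argmax(u(xs))]
        width = (hi - lo) / 4
        lo, hi = max(lo, best - width), min(hi, best + width)
    return best

# Example: maximize a smooth criterion with its peak at x = 0.7.
x_next = maximize_criterion(lambda x: -(x - 0.7) ** 2, 0.0, 2.0)
# converges toward the maximizer x = 0.7
```

In BO, `u` would be the learning criterion built from the surrogate's mean and variance, and `x_next` the query of equation 1.4.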

2.3 Learning Criteria Functions Comparison

In the paper by Srinivas et al. [5], the authors present an alternative learning criterion function, the upper confidence bound criterion, formalize its exposition, and compare it with other standard and commonly used functions. With this function it is possible to determine bounds on the rate of convergence of BO using Gaussian Processes. It also grants an explicit parameter that controls the exploitation-exploration trade-off. The comparison between the different learning criteria used in this dissertation may be examined in table 2.1.
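Table 2.1 notes that EI needs both the normal probability density and cumulative distribution functions; this follows from its closed form under a Gaussian predictive distribution. With predictive mean µ(x), standard deviation σ(x), incumbent y_best, and exploration parameter ξ: EI(x) = (µ − y_best − ξ)Φ(z) + σφ(z), with z = (µ − y_best − ξ)/σ. A sketch of this standard formula (for maximization; not tied to any particular toolbox):

```python
import math

def phi(z):   # standard normal probability density function
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def Phi(z):   # standard normal cumulative distribution function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def expected_improvement(mu, sigma, y_best, xi=0.01):
    """Closed-form EI for maximization under a Gaussian predictive
    distribution; xi is the (non-linear) exploration parameter noted
    in table 2.1."""
    if sigma <= 0:
        return max(mu - y_best - xi, 0.0)
    z = (mu - y_best - xi) / sigma
    return (mu - y_best - xi) * Phi(z) + sigma * phi(z)

# A point predicted well above the incumbent has high EI; one predicted
# far below it, with little uncertainty, has EI near zero.
```

The σφ(z) term is what keeps EI positive at uncertain points even when µ is below the incumbent, giving the implicit exploitation/exploration trade-off listed in the table.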

2.4 Bayesian Optimization Toolbox

In this thesis, we used and extended BO algorithms with BayesOpt [12], a toolbox written in C++ by Ruben Martinez-Cantin. This toolbox offers a variety of stochastic surrogate models, covariance functions, learning criterion functions, and optimization algorithms. It is also a state-of-the-art


Table 2.1: Learning Criteria Functions' Comparison

PI (probability of improvement)
- Positive aspects: Simple, heuristic function.
- Negative aspects: Purely exploitation oriented; no exploration term. Needs the normal cumulative distribution function. Unknown rate of convergence.
- Experimental results [5]: The worst results of the three functions.

EI (expected improvement)
- Positive aspects: Includes an implicit trade-off between exploitation and exploration. Represents the maximum likelihood estimator for the improvement metric.
- Negative aspects: The exploration/exploitation parameter does not give linear control over this adjustment. Needs both the normal probability density and cumulative distribution functions. Unknown rate of convergence.
- Experimental results [5]: Performance similar to UCB, although UCB is slightly better.

UCB (upper confidence bound)
- Positive aspects: Includes an explicit trade-off between exploitation and exploration. Linear equation. Known bounds for the rates of convergence.
- Negative aspects: The bounds on the rates of convergence are not trivially calculated and are deeply connected with the trade-off parameter; incorrect tuning of this parameter may easily lead to local-optimum convergence.
- Experimental results [5]: Performance similar to EI, although UCB is slightly better.

optimization toolbox, whose performance, in terms of optimality gap and run time, compares favorably with other similar software. In this work, we augmented BayesOpt's features and API to incorporate our new contributions.

2.5 Simox - Robotics Simulation Environment

In order to run experiments and collect data for the grasp learning experiments, we used Simox [17], a simulator written in C++ by Nikolaus Vahrenkamp. Simox provides a set of three toolboxes responsible for, respectively: modeling the physics and visuals of objects and robots; path planning for robot actuators; and grasp quality assessment. The simulator is targeted at grasping tasks and includes a model of the iCub, with remarkable hand representations.

2.6 Grasping References

Although this work does not directly address the choice of a grasping metric (it implements the one from the Simox toolbox), some other works on grasp quality metrics were considered for this thesis.

Under the grasping optimization framework, Veiga [9] uses a Bimodal Wrench Space Analysis Metric (BWSAM) to sample grasping quality over an object. This metric is particularly suited since it manages both situations where a force-closure grip is performed (the object is correctly gripped) and


situations of non-force-closure (the gripping device may or may not touch the object, and the object may fall if an external force is applied). That work also considers a soft-touch contact approach¹ to reduce noise in the grasping metric. In terms of BO, Veiga uses regular Gaussian Process regression with a Matérn kernel, the expected improvement learning criterion, and no hyper-parameter optimization.

Similarly to the previous approach, Dragiev et al. [10] use visual and haptic sensors to describe the grasping metric as a potential field, which allows easy determination of the GP's prior using vision. The potential field is subsequently used as a trajectory-guidance metric for the grasping device and is modeled through a Gaussian Process (GPISP). Although this work is important in terms of grasping, its implementation does not consider BO.

Montesano and Lopes [11] considered the usage of Beta distributions to model the success of each grasp. The input query is evaluated in terms of visual descriptors captured by a camera. The objective function, in this case, is not real-valued: the grasping metric is a binary outcome, success or failure. The relation between the input space and the outcome metric is modeled with Beta Processes (BP), which are used in their Bayesian Optimization. The authors define a set of active learning criteria, similar to PI and EI for regular GPs. The experiments performed only compare performance between the developed learning criteria; no comparison is made with existing solutions for grasping optimization. This way of grading grasps might be an efficient way to handle observational noise in this application.

As a final reference, Henriques [18] addresses end-effector calibration, parametrization, and path planning. This work is particularly important for this dissertation since it was developed for the iCub's hand and defines hand-closure parametrization as a suitable learning input space for grasping tasks. With a dataglove, Henriques collected data for a set of dexterous grasp types in order to map the motor actuators into a set of eigen-components, or grasp synergies. Mapping the actuators into the synergy space reduces the dimensionality of the controlled variables. He also claims that this description is less susceptible to calibration errors.

2.7 Noise Modeling

As previously said in Chapter 1, noise handling is an important consideration in grasping applications (and in other real-world tasks). Describing observational noise with a single hyper-parameter, as regular GPs already do, might be insufficient and lead to poor optimization. Researchers have begun to worry about this issue, but only a few address it within the BO framework.

Tesch et al. [19] evaluate binary outcomes for stochastically modeled functions, just as Montesano and Lopes [11] do. Their work falls under the BO paradigm and performs a transformation over a regular GP to obtain a new metric for binary classification. This metric is then used to define a new learning criterion. The approach is tested on synthetic functions and later on a snake robot. The experiments only compare results between different learning criteria, using either the regular GP or the

¹ considers a vicinity of contact points around the collision point with the object


surrogate model with a specific designed learning criterion. The authors also purpose a new explo-

ration bound called estimation bounds that forbids exploration in certain areas: mainly those that have

already been exploited a lot. These implementations may perform well under noise conditions and

are yet to be tested in this setting.

McHutchon and Rasmussen [20] considered the modeling of noise in the input space. This noise is subsequently included in the Gaussian Process, yielding the Noisy Input GP (NIGP). No Bayesian Optimization is performed. This implementation may model input noise well for grasping applications.

2.8 Heteroscedasticity and varying smoothness properties

Heteroscedasticity is a non-uniformity in the variability (or, in statistical terms, the variance) of the dispersion of a certain measure. In Bayesian Optimization, the concept is also used to refer to non-stationarity of the objective function. In other words, a heteroscedastic objective function is not correctly modeled with fixed correlation parameters (which translate into fixed smoothness properties throughout the input space). Some of the most recent approaches to modeling different smoothness properties in objective functions use heteroscedastic regression models to address this issue.

Works from Le et al. [21], Kersting et al. [22], Kuindersma et al. [23] consider non-parametric

observational noise for SP modeling - this is called heteroscedastic noise. Le et al. [21] define jointly

the objective function and noise modeling with a series of equations - called Heteroscedastic GP

(HGP) regression. Regression is tested afterwards with synthetic data and supplied information from

different sources. This method outperforms GP in their experiments. It is also important to remark that

this method is later compared with McHutchon and Rasmussen [20], having slightly lower results in those experiments' settings. Kersting et al. [22] propose an identical approach to the noise problem,

only this time both objective function and noise are modeled by two different GPs. Similar experiments

were performed, with identical kind of results and conclusions: heteroscedastic models perform well

under variational noise conditions. Kuindersma et al. [23] use an almost identical approach to that of Kersting et al. [22], only this time it is applied to BO rather than to a data-fitting problem. The authors

define a particular active learning criterion which will exploit the new description for the variational

noise and the expected value for the objective function. Their work is applied to pendulum control and

robot balance recovery.

2.9 Other honorable contributions

2.9.1 Curse of Dimensionality

One of the drawbacks of applying GPs is the curse of dimensionality: BO's rate of convergence is highly influenced by the number of dimensions of the input space. Wang et al. [6] suggest a solution to this problem. Their work assumes that only a subset of all dimensions has a predominant effect on the objective function's output. They use a randomly sampled matrix as a linear transformation between the higher-dimensional input space and the lower (effective) dimensional one. Under certain conditions, one may use the presented theorems, which state that if the higher-dimensional problem


converges to the optimal solution, then the lower dimensional one shall also converge to the same

optimum with high probability.

2.9.2 Multi-Valued Objective Functions

During the study of the state of the art, we considered the usage of Multi-Valued Functions (functions that can output different values for the same input entry). This subject is of great importance in robotics (inverse kinematics of serial robots, for example), and it might eventually be applicable to this case study, since GPs cannot represent such functions. Damas and Santos-Victor [24] tackle this issue. Unfortunately, their work only sets the problem in terms of regression, in which all training data is sampled beforehand. Additional work is required to include the selection of the linear experts in an active learning sampling law. The proposed algorithm is compared with two other Multi-Valued representation methods, showing better results.


3 Theoretical Concepts


3.1 Bayesian Optimization

Bayesian Optimization is an optimization technique used to find the global optimum of an objective function which is either expensive to evaluate or has no closed-form expression. It makes use of Bayes' Theorem to include prior beliefs about how we think the function behaves (smoothness, parametric signal and noise modeling, sampled points) in order to estimate the target objective function. This is done by using a probabilistic surrogate model, typically an SP, that is a distribution over the family of functions P(f) to which the target function f() belongs.

BO also incorporates a decision-making process that takes all the information captured in the surrogate model and selects, via a learning criterion, the next query point in order to maximize the objective. In that way, BO can be understood as active learning applied to learning the location of the optimum.

Algorithm 3.1 Bayesian Optimization
1: for t = 1, 2, ..., n do
2:     Update the SP with all the available dataset and prior information
3:     Find x_t = argmax_x u(x | D_{1:t-1}), where u(·) is the learning criterion
4:     Sample y_t = f(x_t) + ε_t; augment the dataset with the new observation {x_t, y_t}
5: end for

In line 2 of Algorithm 3.1, one has the option to update (estimate) the hyper-parameters of the SP. These hyper-parameters determine our prior belief about the distribution of f() over the function space. By doing this estimation we actively adapt the hyper-parameters to the sampled data during the learning process.
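The loop above can be sketched in code. The following is an illustrative toy, not the actual BayesOpt implementation used later in this work: the SP posterior is replaced by a crude nearest-neighbor stand-in, and the learning criterion u(·) by a simple upper-confidence-bound rule maximized over a random candidate set. All names and constants here are assumptions made for the sketch.

```python
import random

def bayes_opt(f, bounds, n_init=5, n_iter=25, seed=0):
    """Toy sketch of Algorithm 3.1 (illustrative only, not a GP-based BO)."""
    rng = random.Random(seed)
    lo, hi = bounds
    X = [rng.uniform(lo, hi) for _ in range(n_init)]   # initial design
    Y = [f(x) for x in X]

    def surrogate(x):
        # crude stand-in for the SP posterior: value of the nearest sample,
        # with an uncertainty proxy that grows with distance to that sample
        dist, y = min((abs(x - xi), yi) for xi, yi in zip(X, Y))
        return y, dist

    def u(x, beta=2.0):
        # simple UCB-style learning criterion u(x) = mu(x) + beta * sigma(x)
        mu, sigma = surrogate(x)
        return mu + beta * sigma

    for _ in range(n_iter):
        candidates = [rng.uniform(lo, hi) for _ in range(200)]
        xt = max(candidates, key=u)     # x_t = argmax_x u(x | D_{1:t-1})
        X.append(xt)
        Y.append(f(xt))                 # sample y_t; augment the dataset
    best = max(range(len(Y)), key=Y.__getitem__)
    return X[best], Y[best]
```

Running it on a simple 1-D function drives the samples toward the region of the maximum while still covering the rest of the interval.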


3.2 Gaussian Process

A Gaussian Process is a surrogate model which describes the target function at a specific input query, f(x), by its mean µ and covariance k. In this case, the target function is the grasping metric which we wish to maximize by choosing an appropriate end-effector configuration.

$$f(\mathbf{x}) \sim \mathcal{GP}\left(\mu(\mathbf{x}),\, k(\mathbf{x}, \mathbf{x}')\right) \qquad (3.1)$$

The covariance function determines the smoothness properties of the objective function we are sampling. It quantifies how much the output f(x) is correlated over the input space. In this work we used the Matern class covariance function (eq. 3.2, fig. 3.1), with ν = 5/2.

$$k_{\nu=5/2}(\mathbf{x}_j, \mathbf{x}_{j'}) = \sigma_p^2 \prod_{i=1}^{d} \left(1 + \frac{\sqrt{5}\,r}{l_i} + \frac{5 r^2}{3 l_i^2}\right) \exp\left(-\frac{\sqrt{5}\,r}{l_i}\right) \qquad (3.2)$$

σ_p and l_i are called the hyper-parameters θ of the GP.

Figure 3.1: Matern class covariance function with ν = 5/2
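For the one-dimensional case (d = 1), eq. 3.2 reduces to a single factor, which can be written down directly. A minimal sketch, using the simplification 5r²/(3l²) = s²/3 with s = √5·r/l:

```python
import math

def matern52(x, x_prime, sigma_p=1.0, length=1.0):
    """Matern nu = 5/2 kernel of eq. 3.2 for scalar inputs (d = 1), with
    r = |x - x'|; sigma_p and length (l) are the hyper-parameters theta."""
    r = abs(x - x_prime)
    s = math.sqrt(5.0) * r / length
    # note that 5 r^2 / (3 l^2) equals s^2 / 3
    return sigma_p ** 2 * (1.0 + s + s * s / 3.0) * math.exp(-s)
```

At r = 0 the kernel equals the signal variance σ_p², and it decays monotonically with distance, which is what encodes the smoothness prior.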

The hyper-parameters are estimated (with the exception of σ_p, which is fixed) using Slice Sampling, a Markov Chain Monte Carlo (MCMC) algorithm [12, 25], in order to maximize their log-marginal-likelihood (eq. 3.3). Essentially, it is a pseudo-random sampling algorithm which approximates an arbitrary probability density function (in this case, the hyper-parameter likelihood function). The state of the MCMC chain (with m samples) results in a set of hyper-parameter particles Θ = {θ_i}, i = 1..m, our estimates.
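A minimal one-dimensional slice sampler (in the stepping-out and shrinkage style of Neal's algorithm) illustrates the kind of MCMC machinery involved. This is a didactic sketch, not the BayesOpt implementation; the bracket width w is an assumed tuning constant.

```python
import math
import random

def slice_sample(logp, x0, n=2000, w=1.0, seed=0):
    """Minimal 1-D slice sampler: logp is an unnormalized log-density,
    w the initial bracket width; returns n (correlated) samples."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n):
        # draw the auxiliary "height" that defines the horizontal slice
        log_y = logp(x) + math.log(1.0 - rng.random())
        u = rng.random()
        lo, hi = x - w * u, x + w * (1.0 - u)
        while logp(lo) > log_y:   # step out to the left
            lo -= w
        while logp(hi) > log_y:   # step out to the right
            hi += w
        while True:               # sample from the slice, shrinking on rejects
            x_new = rng.uniform(lo, hi)
            if logp(x_new) > log_y:
                x = x_new
                break
            if x_new < x:
                lo = x_new
            else:
                hi = x_new
        samples.append(x)
    return samples
```

Applied to a standard normal log-density, the empirical mean and variance of the chain approach 0 and 1.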


$$2 \log p\left(\mathbf{y}\,|\,\mathbf{x}_{1:t}, \theta\right) = -\mathbf{y}^{\top}\left(\mathbf{K}^{\theta}_{t} + \sigma_n^2 \mathbf{I}\right)^{-1}\mathbf{y} - \log\left|\mathbf{K}^{\theta}_{t} + \sigma_n^2 \mathbf{I}\right| - t \log(2\pi) \qquad (3.3)$$

Without loss of generality, we consider our prior belief of the objective function with zero mean and

covariance k.

$$\begin{bmatrix} \mathbf{f} \\ f_q \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0}, \begin{bmatrix} K(X,X) + \sigma_n^2 \mathbf{I} & k(X,\mathbf{x}_q) \\ k(\mathbf{x}_q,X) & k(\mathbf{x}_q,\mathbf{x}_q) \end{bmatrix}\right) \qquad (3.4)$$

where K(X,X), k(X,x_q), k(x_q,X) and k(x_q,x_q) denote, respectively, the n × n, n × 1, 1 × n and 1 × 1 covariance matrices between pairs of samples. The subscript q marks a new query and σ_n parameterizes the observational noise. X denotes all query points in the current dataset.

To obtain the posterior distribution and estimate the target function value at a new query point x_q, with kernel k_i conditioned on the i-th hyper-parameter sample, k_i = k(·, ·|θ_i), we consider the joint Gaussian prior with the observed data points X, which gives:

$$f_q \,|\, \mathbf{x}_q, X, \mathbf{y} \sim \sum_{i=1}^{m} \mathcal{N}\left(\mu_i(\mathbf{x}_q),\, \sigma_i^2(\mathbf{x}_q)\right) \qquad (3.5)$$

where,

$$\begin{aligned} \mu_i(\mathbf{x}_q) &= k_i(\mathbf{x}_q, X)\, K_i(X,X)^{-1}\, \mathbf{y} \\ \sigma_i^2(\mathbf{x}_q) &= k_i(\mathbf{x}_q, \mathbf{x}_q) - k_i(\mathbf{x}_q, X)\, K_i(X,X)^{-1}\, k_i(X, \mathbf{x}_q) \end{aligned} \qquad (3.6)$$

Note that, because we use a sampling distribution of θ, the predictive distribution at any point xq

is a mixture of Gaussians.
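Equations 3.6 can be sketched directly for a single hyper-parameter sample (m = 1), assuming a zero-mean prior, unit signal variance, scalar inputs and a small jitter standing in for σ_n². The toy Gaussian-elimination solver below replaces the Cholesky factorization a real implementation would use.

```python
import math

def k_matern(r, length=1.0):
    # Matern 5/2 kernel on a scalar distance r (unit signal variance assumed)
    s = math.sqrt(5.0) * abs(r) / length
    return (1.0 + s + s * s / 3.0) * math.exp(-s)

def solve(A, b):
    # toy linear solver: Gaussian elimination with partial pivoting
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_predict(X, y, xq, length=1.0, noise=1e-6):
    """Eq. 3.6 for one hyper-parameter sample: posterior mean and variance."""
    K = [[k_matern(xi - xj, length) + (noise if i == j else 0.0)
          for j, xj in enumerate(X)] for i, xi in enumerate(X)]
    kq = [k_matern(xq - xi, length) for xi in X]
    alpha = solve(K, y)          # K^{-1} y
    v = solve(K, kq)             # K^{-1} k(X, xq)
    mu = sum(ki * ai for ki, ai in zip(kq, alpha))
    var = k_matern(0.0, length) - sum(ki * vi for ki, vi in zip(kq, v))
    return mu, max(var, 0.0)
```

At a training point the posterior collapses onto the observation (variance near zero); far from all data it reverts to the prior (mean near zero, variance near one).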


3.3 Treed Gaussian Process

A Treed Gaussian Process differs from a GP in that it is a partially non-stationary regression model: unlike a GP, it allows the model's parametrization to change over the input space. In other words, it can model objective functions whose smoothness varies over the input space. The main difference between this model and the ones from Section 2.8 is that non-stationarity arises from a partition of the input space, where each partition has its own hyper-parameters, rather than having them vary continuously over the input space.

In Assel et al. [26], Treed Gaussian Processes (TGPs) were used to address the performance decrease of traditional Bayesian learning (using GPs) in the presence of heteroscedasticity. Here, the heteroscedasticity concept is applied to non-stationarity (different smoothness behaviors). That work showed that using TGPs as surrogate models yielded better results and faster convergence in their experiments.

A TGP can be described as a decision tree (fig. 3.2), where each leaf node corresponds to a single GP over its respective compact interval of the input space. This means that for any point of the input domain, there is exactly one GP which models the objective function in its corresponding compact interval, according to equations 3.6. The union of all leaves' intervals covers the whole input space and the intersection of any two of these intervals is empty. L denotes the set of all leaves of the TGP.

Figure 3.2: Treed Gaussian Process

To understand how to traverse the TGP, or in other words, how to determine which leaf governs a specific point of the input space, one must understand how the non-leaf nodes work. In each non-leaf node a binary test is performed on x to determine to which child node x belongs. This test can be explicitly written as h_i(x) > τ, with i = 1..d and d the number of dimensions of x. Here i indexes a feature of x and τ is called the threshold. The function h_i is expressed by:


$$h_i(\mathbf{x}) = x_i, \quad x_i = i\text{-th component of } \mathbf{x} \qquad (3.7)$$

Each one of these binary outcomes corresponds to exactly one of the child nodes.
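The traversal just described amounts to walking down the tree, applying the test h_i(x) > τ at every internal node. A minimal sketch (the Node class and leaf_id field are illustrative names, not the actual implementation):

```python
class Node:
    """Internal node (feature/tau set) or leaf (leaf_id set).
    In a TGP each leaf would hold its own GP and hyper-parameters."""
    def __init__(self, feature=None, tau=None, left=None, right=None, leaf_id=None):
        self.feature, self.tau = feature, tau
        self.left, self.right = left, right
        self.leaf_id = leaf_id

def find_leaf(node, x):
    """Return the leaf governing x: test h_i(x) = x[i] > tau at each node."""
    while node.leaf_id is None:
        node = node.right if x[node.feature] > node.tau else node.left
    return node.leaf_id
```

For example, a tree splitting [0,1]² first on x_0 > 0.5 and then, on the left branch, on x_1 > 0.3 routes each query to one of three leaves.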

3.3.1 Tree Construction

To construct the tree which models the current state of the learning process, we start with a tree composed of only one node, which governs the whole input space. Then, this tree is split recursively until splitting is no longer viable.

We wish to split (if possible) every node into two children, resulting in an overall uncertainty reduction with respect to the original node, while also guaranteeing that the two new child nodes have a minimum number of samples. This last detail is crucial for hyper-parameter optimization, since with a low number of samples hyper-parameter estimation can be compromised.

The uncertainty of a node A is defined as:

$$U(A) = \frac{1}{|A|} \sum_{y_i \in A} \left(\bar{y}_A - y_i\right)^2 \qquad (3.8)$$

where ȳ_A is the average output of the samples in A, |A| is the number of samples and y_i is the output of sample i.

Node A is split on feature i and threshold τ into two child nodes A′_{h,τ} and A′′_{h,τ} if the split corresponds to an overall uncertainty reduction, does not violate the minimum number of samples per leaf, and maximizes the following equation:

$$I\left(A, A'_{h,\tau}, A''_{h,\tau}\right) = U(A) - \frac{\left|A'_{h,\tau}\right|}{|A|}\, U\left(A'_{h,\tau}\right) - \frac{\left|A''_{h,\tau}\right|}{|A|}\, U\left(A''_{h,\tau}\right) \qquad (3.9)$$

Equations 3.7 and 3.9 imply that these splits occur at the sampled points. Therefore we have a finite and discrete set of features i and thresholds τ over which we wish to maximize eq. 3.9. It was also shown in Assel et al. [26] that this strategy allows maintaining low variance in the vicinity of the splits (since the surrogate model's variance is lowest at the sampled queries). In this work we maximize the previous equation by brute-force search.
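The splitting rule of eqs. 3.8 and 3.9 can be sketched as a brute-force search over the sampled features and thresholds. Function and parameter names below are illustrative:

```python
def uncertainty(ys):
    # eq. 3.8: mean squared deviation from the node's average output
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((m - y) ** 2 for y in ys) / len(ys)

def best_split(X, y, min_samples=2):
    """Brute-force search over features i and thresholds tau taken at the
    sampled points, maximizing the uncertainty reduction of eq. 3.9.
    Returns (gain, feature, tau) or None if no valid split exists."""
    n, d = len(X), len(X[0])
    U = uncertainty(y)
    best = None
    for i in range(d):
        for tau in sorted({x[i] for x in X}):
            left = [yk for xk, yk in zip(X, y) if xk[i] <= tau]
            right = [yk for xk, yk in zip(X, y) if xk[i] > tau]
            if len(left) < min_samples or len(right) < min_samples:
                continue  # respect the minimum-samples-per-leaf constraint
            gain = (U - len(left) / n * uncertainty(left)
                      - len(right) / n * uncertainty(right))
            if best is None or gain > best[0]:
                best = (gain, i, tau)
    return best
```

On a dataset with two clearly separated output clusters, the search recovers the separating threshold and a gain equal to the parent's full uncertainty.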

3.3.2 Hyper-parameter estimation

As explained in Section 3.2, we use the log-marginal-likelihood to perform hyper-parameter opti-

mization for GPs.

For TGPs, an aggregation technique is used which allows the GP associated with leaf j ∈ L to optimize its hyper-parameters using its own samples as well as samples from other leaves.


For the sake of notation, let y^(j) denote the data in node j, and let y^(j\i) denote the data in node j excluding the data in node i. Let δ_j be the depth of node j, such that the root node has depth equal to zero. Let ρ_j be the list of nodes in the path from node j to the root, and let ρ_{ji} be the i-th element of the list ρ_j, such that ρ_{j0} = j and ρ_{jδ_j} = 0 (root).

We then consider the weighted marginal pseudo-likelihood decomposition (proposed in Assel et al.

[26]) as follows:

$$p\left(\mathbf{y}\,|\,\mathbf{x}_{1:t},\theta\right) \approx p^{w_{j0}}\left(\mathbf{y}^{(j)}\,|\,\mathbf{x}^{(j)},\theta\right) \times \prod_{i=1}^{|\rho_j|} p^{w_{ji}}\left(\mathbf{y}^{(\rho_{ji}\backslash\rho_{j,i-1})}\,|\,\mathbf{x}^{(\rho_{ji}\backslash\rho_{j,i-1})},\theta\right) \qquad (3.10)$$

Using eqs. 3.3 and 3.10, we obtain the weighted log-marginal-likelihood:

$$\log p\left(\mathbf{y}\,|\,\mathbf{x}_{1:t},\theta\right) = w_{j0} \log p\left(\mathbf{y}^{(j)}\,|\,\mathbf{x}^{(j)},\theta\right) + \sum_{i=1}^{|\rho_j|} w_{ji} \log p\left(\mathbf{y}^{(\rho_{ji}\backslash\rho_{j,i-1})}\,|\,\mathbf{x}^{(\rho_{ji}\backslash\rho_{j,i-1})},\theta\right) \qquad (3.11)$$

One should think of the set (y^(ρ_{ji}\ρ_{j,i−1}), x^(ρ_{ji}\ρ_{j,i−1})) as all samples of the i-th order parent of node j except the samples of the (i−1)-th order parent of node j. For example, the 0-th order parent of node j is j itself, the 1-st order parent is its direct father node and the 2-nd order parent is its grandfather node.

The purpose of using this aggregation technique for estimating the hyper-parameters is not only to make the leaves not totally independent from each other in terms of regression, but also to allow lower values for the minimum number of samples per leaf, so that hyper-parameter optimization is not compromised. One should notice, however, that by reinforcing this cross-effect we may also lose the opportunity to model the objective function more accurately.

As for the weights wji , we propose a different approach from Assel et al. [26], in light of what was

said in the previous paragraph. In their work they use a fixed formula to calculate weights. We explore

alterations to this formula and compare their performance. We consider the weights as:

$$w^{(k)}_{ji} = \left(\frac{2}{1 + \delta_j - \delta_i}\right)^{k} \qquad (3.12)$$

Higher values of k promote greater independence between leaves when estimating the hyper-parameters. In Assel et al. [26], k was equal to 1. We denote the set of all weights over j and i as w_k. From now on, we also denote by w_∞ the case where each leaf only uses its own data for hyper-parameter estimation, i.e. w^(∞)_{j0} = 1 and w^(∞)_{ji} = 0, ∀i ≠ 0.


3.4 Learning Criteria Functions

The Learning Criteria functions u(·) considered here were already referenced in Section 2.3. They are used to choose the next query x_q so as to gain the maximum information about the objective function we are trying to maximize, and to search where the global optimum is most likely to be located. Their usefulness comes from their sample-efficient results and their ability to optimize non-convex functions (as long as they are Lipschitz continuous¹). These criteria are expected to have distinct sampling behaviors over the course of the optimization. It is also important to notice that these functions need a global maximization method in order to determine the next point to be sampled (eq. 1.4). For the remainder of this work, we will only consider the expected improvement criterion and its unscented variation (presented in Section 3.5).

3.4.1 Expected Improvement

This criterion considers the expected value of the Improvement metric:

$$I(\mathbf{x}) = \max\left\{0,\; f(\mathbf{x}_{t+1}) - f(\mathbf{x}^{+})\right\} \qquad (3.13)$$

where f(xt+1) and f(x+) represent, respectively, the objective function value for the next query can-

didate and the maximum observation for D1:t.

The probability of I can be calculated from the normal density function:

$$P(I(\mathbf{x})) = \frac{1}{\sqrt{2\pi}\,\sigma(\mathbf{x})} \exp\left(-\frac{\left(\left(\mu(\mathbf{x}) - f(\mathbf{x}^{+})\right) - I\right)^2}{2\sigma^2(\mathbf{x})}\right) \qquad (3.14)$$

where µ(x) and σ²(x) represent, respectively, the estimated mean and variance of f(x) given by the SP. In the BayesOpt library using MCMC samples, these values are µ_{i=0} and σ²_{i=0} from equation 3.6.

The last equation yields,

$$EI(\mathbf{x}) = \begin{cases} \left(\mu(\mathbf{x}) - f(\mathbf{x}^{+})\right)\Phi(Z) + \sigma(\mathbf{x})\,\phi(Z) & \text{if } \sigma(\mathbf{x}) > 0 \\ 0 & \text{otherwise} \end{cases}, \qquad Z = \frac{\mu(\mathbf{x}) - f(\mathbf{x}^{+})}{\sigma(\mathbf{x})} \qquad (3.15)$$

The EI criterion can be further improved by including an auxiliary parameter ξ, which tunes the exploitation-exploration trade-off (typically ξ = 0.01, scaled by the objective function's signal variance σ_p [3]).

$$EI(\mathbf{x}) = \begin{cases} \left(\mu(\mathbf{x}) - f(\mathbf{x}^{+}) - \xi\right)\Phi(Z_\xi) + \sigma(\mathbf{x})\,\phi(Z_\xi) & \text{if } \sigma(\mathbf{x}) > 0 \\ 0 & \text{otherwise} \end{cases}, \qquad Z_\xi = \frac{\mu(\mathbf{x}) - f(\mathbf{x}^{+}) - \xi}{\sigma(\mathbf{x})} \qquad (3.16)$$

¹ A strong form of uniform continuity for functions.
² φ(·) and Φ(·) denote the PDF and CDF of the standard normal distribution, respectively.
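Equation 3.16 translates almost directly into code. A small sketch, where mu and sigma would come from the surrogate posterior of eq. 3.6 (setting xi to 0 recovers eq. 3.15):

```python
import math

def norm_pdf(z):
    # standard normal PDF phi(z)
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    # standard normal CDF Phi(z)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, f_best, xi=0.01):
    """EI of eq. 3.16; mu and sigma are the surrogate posterior mean and
    standard deviation at the candidate, f_best the best observation."""
    if sigma <= 0.0:
        return 0.0
    z = (mu - f_best - xi) / sigma
    return (mu - f_best - xi) * norm_cdf(z) + sigma * norm_pdf(z)
```

Note that EI is always non-negative and grows with both the predicted mean and the predictive uncertainty, which is precisely the exploitation-exploration trade-off being tuned by ξ.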


3.5 Unscented Bayesian Optimization

In this dissertation, we consider the input noise during the decision process, to explore and select the regions that are safe, rather than modeling it during the function modeling stage [20]. That is, the regions that guarantee good results even if the experiment/trial is repeated several times in the same vicinity. This contribution is twofold: we present the unscented expected improvement and the unscented optimal incumbent. Both methods are based on the unscented transformation [27, 28], which was initially developed for tracking and filtering applications.

3.5.1 Unscented transformation

The unscented transformation is a method to propagate probability distributions through nonlinear transformations, with a trade-off between computational cost and accuracy. It is based on the principle that it is easier to approximate a probability distribution than to approximate an arbitrary nonlinear function. The unscented transformation uses a set of deterministically selected samples from the original distribution (called sigma points) and transforms them through the nonlinear function f(·). Then, the transformed distribution is computed as a weighted combination of the transformed sigma points.

One of the advantages of the unscented transformation is that the mean and covariance estimates of the new distribution are accurate to the third order of the Taylor series expansion of f(·) provided that the original distribution is Gaussian, or up to the second order of the expansion for any other prior. Figure 3.3 highlights the differences between approximating the distribution using sigma points (UT) and using standard first-order Taylor linearization (Lin.). The distribution from the UT is closer to the real distribution. Because the prior and posterior distributions are both Gaussians, the unscented transformation is a linearization method. However, because the linearization is based on the statistics of the distribution, it is often referred to in the literature as statistical linearization.

Another advantage of the unscented transformation is its computational cost. For a d-dimensional input space, the unscented transformation requires a set of 2d + 1 sigma points. Thus, the computational cost is negligible compared to other alternatives for distribution approximation such as Monte Carlo, which requires a large number of samples, or numerical integration such as Gaussian quadrature, which has an exponential cost in d. Van der Merwe [29] proved that the unscented transformation is part of the more general family of sigma point filters, which achieve similar performance results. Other sigma point methods are the central difference filter (CDF) [30] and the divided difference filter (DDF) [31].


Figure 3.3: Propagation of a normal distribution through a nonlinear function. The first-order Taylor expansion (dotted) only uses information of the function at the mean point to compute the linear approximation, while the UT (dashed) approaches the function with a linear regression of several sigma points. The actual distribution is the solid one. (Adapted from [29])

3.5.2 Computing the unscented transformation

Assuming that the prior distribution is a Gaussian distribution x ∼ N(x̄, Σ_x), the 2d + 1 sigma points of the unscented transformation are computed with the following sampling strategy:

$$\begin{aligned} \mathbf{x}^{(0)} &= \bar{\mathbf{x}} \\ \mathbf{x}^{(i)}_{+} &= \bar{\mathbf{x}} + \left(\sqrt{(d+k)\,\Sigma_x}\right)_i \quad \forall i = 1..d \\ \mathbf{x}^{(i)}_{-} &= \bar{\mathbf{x}} - \left(\sqrt{(d+k)\,\Sigma_x}\right)_i \quad \forall i = 1..d \end{aligned} \qquad (3.17)$$

where (√·)_i is the i-th row or column of the corresponding matrix square root. Here, k is a free parameter that can be used to tune the scale of the sigma points. Although it may break the positive-definiteness requirement, the original authors [28] recommended k = −3 or k = 1. To alleviate the potential numerical problems raised by negative k values and to increase the expressiveness of the method, the authors later introduced the scaled unscented transform [32]. However, for our application such extra complexity is unnecessary.

For these sigma points, the weights are defined as:

$$\begin{aligned} \omega^{(0)} &= \frac{k}{d+k} \\ \omega^{(i)}_{+} &= \frac{1}{2(d+k)} \quad \forall i = 1..d \\ \omega^{(i)}_{-} &= \frac{1}{2(d+k)} \quad \forall i = 1..d \end{aligned} \qquad (3.18)$$

Then, the transformed distribution is computed as x′ ∼ N(x̄′, Σ′_x), where:

$$\bar{\mathbf{x}}' = \sum_{i=0}^{2d} \omega^{(i)} f\left(\mathbf{x}^{(i)}\right) \qquad (3.19)$$
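The sigma-point construction of eqs. 3.17-3.19 can be sketched for a diagonal covariance, where the matrix square root is elementwise. This is an illustrative implementation, with k left as the free scale parameter:

```python
import math

def unscented_mean(f, x, var, k=1.0):
    """Propagate N(x, diag(var)) through f with the 2d+1 sigma points of
    eq. 3.17 and the weights of eq. 3.18 (diagonal covariance assumed, so
    the matrix square root reduces to elementwise square roots)."""
    d = len(x)
    pts = [list(x)]                 # x^(0) = mean
    w = [k / (d + k)]               # omega^(0)
    for i in range(d):
        step = math.sqrt((d + k) * var[i])
        for sign in (1.0, -1.0):    # x^(i)+ and x^(i)-
            p = list(x)
            p[i] += sign * step
            pts.append(p)
            w.append(1.0 / (2.0 * (d + k)))
    # eq. 3.19: weighted combination of the transformed sigma points
    return sum(wi * f(p) for wi, p in zip(w, pts))
```

For a linear f the propagated mean is exact, and for a quadratic under a Gaussian prior it matches the true second moment, in line with the accuracy claims above.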


3.5.3 Unscented expected improvement

BO is about selecting the most interesting point at each iteration. This is done using criteria designed to select the point that has the highest potential to become the optimum. However, all those methods assume that the observed value is exactly the outcome of the query plus some observation noise; that is, they assume that the query itself is deterministic. This is not true for the grasp optimization problem, where noise over the grip configuration controller (the input space) may cause more pronounced estimation errors, because the input noise is not modeled.

Instead, we are going to assume that the query is a probability distribution. Thus, instead of analyzing the outcome of the criterion, we analyze the posterior distribution that results from transforming the query distribution through the acquisition function. For the remainder of the section, we assume that the input distribution corresponds to the input noise at each query point x_q of the BO process. That is, each query point is distributed according to an isotropic multivariate normal distribution N(0, Iσ_x).

For the purpose of safe BO, we will use the expected value of the transformed distribution as the

acquisition function. In this case, we will use the expected improvement. Therefore, the unscented

expected improvement is computed as:

$$UEI(\mathbf{x}) = \sum_{i=0}^{2d} \omega^{(i)}\, EI\left(\mathbf{x}^{(i)}\right) \qquad (3.20)$$

where x^(i) and ω^(i) are computed according to equations (3.17) and (3.18) respectively, using Σ′_x = Iσ_x. Note that we only compute the expected value of the transformed distribution, x̄′ = UEI(x). This value is enough to take a decision considering the risk due to input noise. However, the value of Σ′_x also represents the output uncertainty and can be used as a meta-analysis tool; that is, it can be used as a risk on the estimation of the risk.
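Eq. 3.20 applies the unscented transformation to the acquisition function itself. A sketch, assuming the isotropic input noise is parameterized by a standard deviation sigma_x per dimension (acq stands for any acquisition function, e.g. the EI of eq. 3.16):

```python
import math

def unscented_ei(acq, x, sigma_x, k=1.0):
    """UEI of eq. 3.20: weighted sum of the acquisition acq at the 2d+1
    sigma points of the isotropic input-noise distribution around x."""
    d = len(x)
    step = math.sqrt(d + k) * sigma_x
    total = k / (d + k) * acq(list(x))     # central sigma point
    for i in range(d):
        for sign in (1.0, -1.0):
            p = list(x)
            p[i] += sign * step
            total += acq(p) / (2.0 * (d + k))
    return total
```

Evaluated at a narrow peak, the UEI is heavily discounted relative to a broad peak of the same height, which is exactly the safety effect sought.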

3.5.4 Unscented optimal incumbent

The unscented expected improvement can be used to drive the search procedure towards safe regions. However, because the target function is unknown by definition, the sampling procedure can still query good outcomes in unsafe areas. In grasping experiments, for example, one may still take samples (although less frequently than without the unscented expected improvement) with high-valued observations and high signal variance in their vicinities (unsafe areas for grasping).

Furthermore, in BO there is a final decision that is independent of the acquisition function employed. Once the optimization process is stopped after sampling N queries, we still need to decide which point is the best. Moreover, after every iteration, we may need to declare which point is the incumbent best observation. If the final decision about the incumbent is based on the greedy policy of selecting the sample with the best outcome x*, such that y_best = f(x*), we may select an unsafe query.

Instead, we propose to apply the unscented transformation also to the selection of the optimal incumbent x*, based on the function outcome f(). This would require evaluating f() at the 2d + 1 sigma points for each observation. However, the main idea of BO is to reduce the number of evaluations of f(). Therefore, we evaluate the sigma points on the GP prediction µ(). Thus, let us define the Unscented Outcome (UO) as:

$$UO(\mathbf{x}) = \sum_{i=0}^{2d} \omega^{(i)} \sum_{j=1}^{m} \mu_j\left(\mathbf{x}^{(i)}\right) \qquad (3.21)$$

where the inner sum of the µ_j(x^(i)) is the prediction of the GP or TGP according to equation (3.6), integrated over the kernel hyper-parameters and evaluated at the sigma points of equation (3.17).

Under these conditions, the incumbent of the optimal solution x∗ corresponds to:

$$\mathbf{x}^{*} = \arg\max_{\mathbf{x}} UO(\mathbf{x}) \qquad (3.22)$$

In the BO literature, when f() represents a stochastic function with large output noise, it is common to return the expected value of the GP at the optimum query, instead of the optimum observation. Note that our method is also valid under those conditions.
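Eqs. 3.21-3.22 can be sketched in the same style, here for a single surrogate mean (m = 1) and isotropic input noise with standard deviation sigma_x; the names are illustrative:

```python
import math

def unscented_outcome(mu, x, sigma_x, k=1.0):
    """UO of eq. 3.21, sketched for a single surrogate mean mu() (m = 1)."""
    d = len(x)
    step = math.sqrt(d + k) * sigma_x
    total = k / (d + k) * mu(list(x))
    for i in range(d):
        for sign in (1.0, -1.0):
            p = list(x)
            p[i] += sign * step
            total += mu(p) / (2.0 * (d + k))
    return total

def unscented_incumbent(queries, mu, sigma_x):
    """Eq. 3.22 restricted to the sampled queries: return the query whose
    whole noise neighborhood, not just its center, looks best."""
    return max(queries, key=lambda x: unscented_outcome(mu, x, sigma_x))
```

On a function shaped like Fig. 3.4 (a tall narrow peak and a lower but broad peak), the unscented incumbent picks the broad, safe maximum.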

As an illustrative and motivational example for the unscented optimal incumbent, observe the function in Fig. 3.4. In this case, the maximum of the function is at x ≈ 0.87. However, this maximum is very risky; that is, small variations in x result in large deviations from the optimal outcome. On the other hand, the local maximum at x ≈ 0.07 is much safer: even if there is noise in x, repeated queries will produce similar outcomes. In this case, if we assume input noise of σ_x = 0.05 and compute the unscented transformation of that noise through the function, we can see that the sigma points centered at the leftmost maximum have a higher unscented outcome than the sigma points centered at the global maximum. We can conclude that the expected posterior value of the smooth local maximum would be larger than the value at the narrow global maximum.

Figure 3.4: RKHS function as in https://github.com/iassael/bo-benchmark-rkhs

In summary, our method uses the unscented transformation to compute the decision functions in BO, assuming that each query is a probability distribution (due to the input noise) instead of a deterministic value. We found that, for BO, we need to consider the unscented version of the acquisition function, for which we propose the unscented expected improvement. Furthermore, we also need to take into consideration the decision of selecting the best observation or the potential optimum. In this


case, we propose the unscented optimal incumbent as a robust selection method. Overall, we refer to this version of BO as UBO.


4 Experiments


4.1 Overview

In this section we describe the methods used and the experiments carried out in this work. Results will be evaluated separately in two main subsections: one comparing Bayesian Optimization using Treed Gaussian Processes (BO-TGP) against Bayesian Optimization using Gaussian Processes (BO-GP, or simply BO); the other assessing the benefits of Unscented Bayesian Optimization (UBO) with respect to classical Bayesian Optimization (BO). We opted to use GPs for the unscented experiments since they are the standard regression models in BO, which makes the comparison more standard for both contributions.

For both sets of experiments, we first illustrate each type of optimization with synthetic functions (with low input space dimensionality) that allow us to visually understand the proposed contributions. Then, we show the results of autonomous grasping exploration of daily life objects with a dexterous robot hand using realistic simulations [17].

In all BO experiments of this work, we have used an extended version of the BayesOpt software [12] with the proposed methods. For the kernel function, we used the standard choice of the Matern kernel with ν = 5/2. We used slice sampling for the kernel hyper-parameter optimization (as explained in Section 3.2).

4.2 Synthetic Functions

The synthetic functions presented here are the 1-D Reproducing Kernel Hilbert Space function (RKHS) [33], the Gramacy 2-D exponential function (GF) [34] and a specially designed mixture of 2-D Gaussian distributions (GM) (courtesy of Ruben Martinez-Cantin); see Fig. 4.1.

All three functions will be used for the BO-TGP vs. BO-GP experiments. Results from these experiments carry important information since the functions have different degrees of smoothness over their input space (which may influence the effectiveness of standard BO).

As for the UBO experiments, only the RKHS and GM functions will be used. We chose them because both have multiple local maxima, with one global maximum located at a narrow peak. The global maximum of both functions represents a region of high risk in the presence of significant input noise. This is not the case for the GF function, which was left out of that set of experiments.

4.3 Robot Grasp Simulator - Simox

We use the Simox simulation toolbox [17] as the simulation environment for the robot's exploratory task. Simox simulates the iCub robot's hand grasping arbitrary objects.

Given an initial pose for the robot hand and a finger joint trajectory, the simulator runs until all fingers are in contact with the object's surface, and subsequently computes a grasp quality metric using the wrench-space analysis from the Simox toolbox [17]. We use a representation of the iCub's left hand, which can move freely in space (Fig. 4.3), and a few static objects, shown in Fig. 4.2.


Page 40: Active Learning for Robot Exploration · Bayesian Optimization is an optimization technique for black-box functions, known as one ... These problems will be tackled by using a heteroscedastic

Figure 4.1: Synthetic Functions. (a) RKHS function. (b) GM function. (c) GF function.

(a) Water bottle (b) Mug (c) Glass (d) Drill

Figure 4.2: Objects used in the simulations with corresponding initial robot hand configuration.


The robot hand is initially placed with the palm facing parallel to one of the facets, at a fixed distance from the object's bounding box, and with the thumb aligned with one of the neighboring facets (Fig. 4.3). This setup uniquely defines the default pose of the hand with respect to the object. The learning goal is then to find the optimal grasp pose by choosing incremental translations and rotations (δx, δy, δz, θx, θy, θz) of the hand's pose with respect to its default pose.

Figure 4.3: The Simox robot grasp simulator. The iCub's left hand is used to perform grasping trials on arbitrary objects, in this case a glass. The red lines around the glass represent an Object-Oriented Bounding Box, whose facets are used to set up the initial hand configuration.

As for grasp closure primitives, we implement a parametrization similar to Henriques [18]. This parametrization maps hand closure synergies into the motor control space and then to the hand joints' values. We define distinct types of grasps, which are modeled with a principal component description. Each grasp closure is controlled by one parameter, associated with the most energetic component and the closure motion, and by a set of additional parameters s_i that control different finger position adjustments. These low-energy components can be used as learning variables to optimize the grasp, in addition to the translation and rotation parameters.

Specifically, for the experiments we used a power grasp posture synergy with only one component for posture adjustments (s1). The advantage of the BO methodology as a black-box optimization is that the system is agnostic to the selected parametrization, which can easily be replaced.
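A sketch of the synergy mapping described above (the basis below is random and merely hypothetical; the real principal components come from Henriques' grasp data [18]):

```python
import numpy as np

rng = np.random.default_rng(0)
mean_posture = np.zeros(9)          # hypothetical 9-joint hand
pc = rng.standard_normal((2, 9))    # rows: closure component, adjustment s1

def synergy_to_joints(g, s1):
    """Map the closure parameter g and the low-energy adjustment s1
    to hand joint values through a principal-component basis."""
    return mean_posture + g * pc[0] + s1 * pc[1]
```

Since the mapping is linear, BO only sees (g, s1) as two additional black-box inputs, which is exactly why the parametrization can be swapped out without changing the optimizer.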

4.4 Results BO-GP Vs. BO-TGP

4.4.1 Motivation

Before presenting and reviewing this section’s experimental results, we show a model regression

example for the RKHS to illustrate the practical differences between GPs and TGPs as surrogate

models. See figure 4.4.

(a) BO-GP (b) BO-TGP

Figure 4.4: Model regression with GP vs. TGP.

In this example, we took 27 random samples of the RKHS function into a GP and a TGP, respectively. The main differences between the two surrogate models are that TGPs can better estimate (compared to GPs) the expected value of the objective function f(·), and that they show much less variance between samples. Note that the latter fact decreases the chances of using samples where the variance would be high due to inadequate estimation over smooth areas (as can be seen in sub-figure 4.4(a); revisit section 3.4).

This happens as a result of the TGPs' capability of separating regions of the input space with different smoothness characteristics. In turn, better model regression improves the learning criterion's effectiveness by giving more accurate information about the objective function, which benefits BO overall.
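The region-separation idea can be sketched as follows, assuming the fully independent (w∞) case and a squared-exponential kernel as a stand-in for the Matérn kernel used in the thesis:

```python
import numpy as np

def gp_posterior_mean(X, y, Xs, ls, sn=1e-6):
    """GP posterior mean with a squared-exponential kernel of
    lengthscale ls and noise variance sn."""
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)
    K = k(X, X) + sn * np.eye(len(X))
    return k(Xs, X) @ np.linalg.solve(K, y)

def tgp_posterior_mean(X, y, Xs, split, ls_left, ls_right):
    """Treed GP, w_inf style: one independent GP per side of `split`,
    each with its own lengthscale (i.e., its own smoothness)."""
    mu = np.empty_like(Xs)
    for tr, te, ls in [(X < split, Xs < split, ls_left),
                       (X >= split, Xs >= split, ls_right)]:
        mu[te] = gp_posterior_mean(X[tr], y[tr], Xs[te], ls)
    return mu
```

Allowing each leaf its own lengthscale is what lets the surrogate stay flat over smooth regions while still tracking a narrow peak elsewhere.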

4.4.2 Synthetic Functions

We have performed 100 runs of BO for all three functions (RKHS, GF, GM) and each optimization procedure, using GP and TGP (with different weights wk; see sub-subsection 3.3.2).

For RKHS, each run has 5 initial random samples and the optimization performs 45 iterations, with σp = 3.93 and sl = 8¹; for GF, each run has 20 initial samples and the optimization performs 40 iterations, with σp = 0.2² and sl = 15; for GM, each run has 30 initial samples and the optimization performs 110 iterations, with σp = 0.642 and sl = 15. For all functions we used σn = 10⁻⁶.

Results for these experiments are shown in figure 4.5 and table 4.1.

From RKHS and GM results we can clearly see that BO-TGP finds the optimal value more frequently (and more quickly) than BO-GP. This happens since TGPs can have different kernel bandwidths l_i

¹ The minimum number of samples per TGP leaf.
² It is common to randomly sample the objective function and estimate σp as the standard deviation of those samples. Since this may not be possible in all cases, we assume the worst possible case, where the objective function takes only two possible values, the global maximum and the global minimum. This yields σp = (f(x⁺) − f(x⁻))/2.


(a) RKHS (b) GF (c) GM

Figure 4.5: GP vs. TGP results on the synthetic functions. Best sample observation y over the number of iterations. Black: BO-GP. Blue: BO-TGP with w1. Green: BO-TGP with w2. Red: BO-TGP with w3. Magenta: BO-TGP with w∞. Dashed red: first TGP node split. Shaded regions represent the standard deviation of the best observation at each iteration.


| Function | GP | TGP w1 | TGP w2 | TGP w3 | TGP w∞ |
| --- | --- | --- | --- | --- | --- |
| RKHS | 5.3279 ± 0.0642 | 5.3993 ± 0.0638 | 5.4472 ± 0.0621 | 5.4375 ± 0.0627 | 5.7305 ± 0.0128 |
| GF | 0.4081 ± 0.0093 | 0.4008 ± 0.0079 | 0.3974 ± 0.0108 | 0.3900 ± 0.0118 | 0.4106 ± 0.0074 |
| GM | 0.1228 ± 0.0011 | 0.1295 ± 0.0001 | 0.1297 ± 8.1e−04 | 0.1302 ± 8.3e−04 | 0.1309 ± 6.5e−04 |

Table 4.1: Results at the last iteration of the BO process (means and standard deviations over all runs). Entries are y(x*) ± σ(y(x*)).

over the input space and, therefore, better estimate the expected value and variance of the objective function f(·). This improves exploration under the given learning criterion and allows BO to use the sample budget more efficiently: BO may search for other possible local optima instead of wasting samples where, as with GPs, the variance would otherwise be high (and the expected value low).

As for the GF function, BO-TGP does not show significantly better results than BO-GP. While TGPs help BO by better estimating the objective function and improving the exploration of multiple local optima, the GF function has only one global/local maximum. GPs may give a worse estimation for GF but, by having higher overall process variance, they tend to make BO reach the single global optimum at a similar convergence rate.

As a final note, our GF results contradict those of Assael et al. [26], which we could replicate using σn = 10⁻³ and σp = 1. We later confirmed that these were exactly the values used throughout their experiments. As we increase the parameter σn, the surrogate model's overall variance also increases (see equations 3.6 and 3.2). This leads to lower rates of convergence for BO in general (as can be seen by comparison with the results from Assael et al. [26]). However, this is less impactful for BO-TGP due to its ability to reduce the overall process variance (as explained in section 4.4.1), which makes BO-TGP obtain better results in their setting. This observation explains their results in comparison with the ones presented here.

4.4.3 Robot Grasp Simulations

We have performed 30 runs of BO for all the proposed objects (Water bottle, Mug, Glass, Drill) and each optimization procedure, using GP and TGP (with different weights wk; see subsection 3.3.2). The water bottle was evaluated from two different facets (one from the side and one from the top), while all the other objects were only evaluated from one of the sideways facets.

Each run has 30 initial samples and the optimization performs 120 iterations, with σp = 0.35 and σn = 10⁻⁴. The input search space is composed of (δx, δy, δz, θx, θy, θz, s1).

Results for these experiments are shown in figure 4.6 and table 4.2.

From the results, we can observe that BO-TGP is generally better than BO-GP. The difference, this time, is that it is not clear which k to use.

As these experiments were performed on a 7-D learning optimization task, we cannot directly visualize explanations for these observations.

Aside from the Drill experiment (figure 4.6(e)), higher values of k seem to have better results


(a) Water bottle - Sideways (b) Water bottle - Top

(c) Mug - Sideways (d) Glass - Sideways

(e) Drill - Sideways

Figure 4.6: GP vs. TGP results in the Simox simulation environment. Black: BO-GP. Blue: BO-TGP with w1. Green: BO-TGP with w2. Red: BO-TGP with w3. Magenta: BO-TGP with w∞. Dashed red: first TGP node split. Shaded regions represent the standard deviation of the best observation at each iteration.


| Object | GP | TGP w1 | TGP w2 | TGP w3 | TGP w∞ |
| --- | --- | --- | --- | --- | --- |
| Water bottle - Sideways | 0.5867 ± 0.0339 | 0.6080 ± 0.0231 | 0.6008 ± 0.0239 | 0.5775 ± 0.0320 | 0.6001 ± 0.0222 |
| Water bottle - Top | 0.5200 ± 0.0346 | 0.5567 ± 0.0260 | 0.5546 ± 0.0194 | 0.5689 ± 0.0261 | 0.5761 ± 0.0255 |
| Mug - Sideways | 0.1704 ± 0.0138 | 0.1620 ± 0.0145 | 0.1678 ± 0.0180 | 0.1847 ± 0.0156 | 0.1778 ± 0.0135 |
| Glass - Sideways | 0.4526 ± 0.0263 | 0.4394 ± 0.0180 | 0.4444 ± 0.0267 | 0.4803 ± 0.0229 | 0.4584 ± 0.0260 |
| Drill - Sideways | 0.1328 ± 0.0095 | 0.1209 ± 0.0080 | 0.1227 ± 0.0087 | 0.1216 ± 0.0117 | 0.1239 ± 0.0096 |

Table 4.2: Results at the last iteration of the BO process (means and standard deviations over all runs). Entries are y(x*) ± σ(y(x*)).

overall, specifically k = 3 and k = ∞ (similar to the synthetic results). Still, it is not possible to conclude that increasing k yields progressively better BO-TGP performance. As k models the data cross-correlation between leaves, different values of k may yield different BO-TGP performance depending on the objective function's characteristics.

4.4.3.A Simox Metric Signal Profile

One thing we observed before performing the BO experiments was that the Simox grasping metric had very high signal variance within small vicinities of the input space (see figure 4.7).

Figure 4.7: Simox metric profile for the water bottle. The horizontal axis represents δx and the vertical axis represents the metric value. The default pose was chosen arbitrarily.

This aspect may worsen BO performance and can affect BO-TGP to a higher degree: since each TGP leaf has far fewer points, hyper-parameter optimization becomes more susceptible to the data distribution.

As a final experiment for the BO-TGP section, we decided to see how BO-TGP and BO-GP are affected by the presence of observational noise, with the objective of simulating the grasping metric in figure 4.7.

The signal noise is defined as y = f(x) − |εy|, with εy ∼ N(0, σε).
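A sketch of this one-sided observation model (the noise can only degrade the measured grasp quality, never improve it):

```python
import numpy as np

def noisy_eval(f, x, sigma_eps, rng):
    """Observation model y = f(x) - |eps|, eps ~ N(0, sigma_eps):
    noise only lowers the observed value, as with the Simox metric."""
    return f(x) - abs(rng.normal(0.0, sigma_eps))
```

Every observation is therefore a lower bound on the true metric, which biases the surrogate downward as well as adding variance.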

(a) εy = 0 (b) εy ∼ N (0, 0.2)

Figure 4.8: GP vs. TGP results. Simox simulation environment. Black: BO-GP. Blue: BO-TGP with w1. Green: BO-TGP with w2. Red: BO-TGP with w3. Magenta: BO-TGP with w∞. Dashed red: first TGP node split. Shaded regions represent the standard deviation of the best observation at each iteration.

| Function | GP | TGP w1 | TGP w2 | TGP w3 | TGP w∞ |
| --- | --- | --- | --- | --- | --- |
| εy = 0 | 5.3279 ± 0.0642 | 5.3993 ± 0.0638 | 5.4472 ± 0.0621 | 5.4375 ± 0.0627 | 5.7305 ± 0.0128 |
| εy ∼ N(0, 0.2) | 5.2759 ± 0.0773 | 5.2074 ± 0.0781 | 5.2118 ± 0.0831 | 5.3287 ± 0.0674 | 5.2039 ± 0.0629 |

Table 4.3: Results at the last iteration of the BO process (means and standard deviations over all runs). Entries are y(x*) ± σ(y(x*)).

The results from table 4.3 and figure 4.8 show a tendency similar to the one observed in 4.4.3. All BO optimizations suffered from the presence of the noise εy, but BO-TGP seems to suffer a much greater decrease in performance in the presence of observational noise than BO-GP (BO-GP was better in all but one case).

4.5 Results BO Vs. UBO

4.5.1 Motivation

Before presenting and reviewing all experimental results, we show a regression example for the

RKHS function to illustrate the practical differences between the unscented expected improvement


and the expected improvement. See figure 4.9.

(a) Expected Improvement (b) Unscented Expected Improvement

Figure 4.9: RKHS Posterior - σx = 0.01

In this example, we took 40 random samples of the RKHS function into a Gaussian Process. In sub-figure (a) we can see the expected value and variance of the objective function given by the GP, while sub-figure (b) shows the expected value and variance for the unscented criterion, given by:

\[ U_{y_q} = \sum_{i=1}^{m} \omega^{(i)} y_q^{(i)}, \qquad U_{\sigma_q^2} = \sum_{i=1}^{m} \omega^{(i)} \sigma_q^{2\,(i)} \tag{4.1} \]
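Equation (4.1) can be sketched as follows, with `mu_fn` and `var_fn` standing in for the GP posterior mean and variance; the 2d+1 symmetric sigma points and the κ value below are our assumptions, following the standard unscented transform rather than the exact BayesOpt internals:

```python
import numpy as np

def unscented_outcome(mu_fn, var_fn, xq, sigma_x, kappa=1.0):
    """Unscented estimate (eq. 4.1) of the posterior mean and variance
    at query xq under Gaussian input noise of std sigma_x."""
    d = len(xq)
    spread = np.sqrt(d + kappa) * sigma_x
    pts = ([xq] + [xq + spread * e for e in np.eye(d)]
                + [xq - spread * e for e in np.eye(d)])
    w = np.array([kappa / (d + kappa)] + [1.0 / (2 * (d + kappa))] * (2 * d))
    mu = sum(wi * mu_fn(p) for wi, p in zip(w, pts))
    var = sum(wi * var_fn(p) for wi, p in zip(w, pts))
    return mu, var
```

Because the weights sum to one and the sigma points are symmetric around xq, the estimate is exact whenever the posterior mean is linear in the vicinity of the query.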

In comparison, one can see that the unscented expected values of both the GP's mean and variance are smoother. But the most important feature is that the expected mean value for the global maximum (which has high risk) is now lower than the value of the local maximum at x ≈ 0.078. Therefore, under the unscented expected improvement, the local maximum at x ≈ 0.078 will be considered the learning optimum and its vicinity will be chosen more frequently than under the expected improvement criterion.

4.5.2 Synthetic Functions

To reproduce the effect of the input noise, we queried the resulting function using 100 Monte Carlo samples drawn according to the input noise distribution at each iteration. By analyzing the outcome of the samples we can estimate, for the query xq, the sample mean (y_mean(x_i^mc)) and the sample standard deviation of the optimum (y_std(x_i^mc)).
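This Monte Carlo evaluation can be sketched as:

```python
import numpy as np

def mc_outcome(f, xq, sigma_x, n=100, rng=None):
    """Evaluate a chosen optimum xq under input noise: return the
    sample mean and std of f over n perturbed queries."""
    if rng is None:
        rng = np.random.default_rng()
    ys = np.array([f(xq + rng.normal(0.0, sigma_x, size=np.shape(xq)))
                   for _ in range(n)])
    return ys.mean(), ys.std()
```

An optimum sitting on a narrow peak will show a low mean and a high standard deviation under this evaluation, which is exactly the risk signal the UBO experiments measure.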

We have performed 100 runs of Bayesian Optimization for both functions (RKHS, GM) and each optimization procedure (BO and UBO).

For RKHS each run has 5 initial random samples and the optimization performs 45 iterations. The

input noise is set as σx = 0.01. For GM each run has 30 initial samples and the optimization performs


90 iterations. The input noise is set as σx = 0.1. All other optimization parameters are identical to the

BO-TGP vs BO-GP experiments.

In Fig. 4.10 and Fig. 4.11 we show the statistics over the different runs for the evaluation criteria

with respect to the number of iterations. The shaded region represents the 95% confidence interval.

(a) ymc (x∗) (b) std (ymc (x∗))

Figure 4.10: RKHS Results

(a) ymc (x∗) (b) std (ymc (x∗))

Figure 4.11: GM Results

For both functions, we can observe that UBO quickly overcomes the results of BO. As soon as the

random exploration phase finishes and the optimization starts, the UBO computes less risky solutions,

as demonstrated by the higher expected return value and lower standard deviation. In table 4.4 we


show the numeric results obtained at the last iteration. We also show values for the worst sample of the Monte Carlo runs³. The worst case for UBO is always more favorable than the worst case for BO, by a large margin.

4.5.3 Robot Grasp Simulations

We have performed 30 runs of the robotic grasp simulation for each object and each optimization criterion. The robot hand posture with respect to the objects is initialized as shown in Fig. 4.2. The input search space, in this case, was composed of (δx, δy).

Each run starts with 40 initial random samples and proceeds with 60 iterations of Bayesian optimization, for a total of 100 evaluations. In this case, we assume that the function is stochastic, due to small numerical errors in the simulation, with σy = 10⁻⁴. We also assume an input noise of σx = 0.03 (note that the input space was normalized in advance to the unit hypercube [0, 1]^d). In each iteration we sample 20 times at the query point with input noise to compute the expected outcome. The results can be observed in figures 4.12, 4.13, 4.14 and 4.15. We note that the plots seem noisier than those of the synthetic functions. This is due to the lower number of samples at the query points, chosen for the sake of computation time, as each sample requires running the full grasp simulation. Note also that those samples are only used for evaluation purposes and would not be required for the optimization process.
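The normalization to the unit hypercube mentioned above is a simple affine map (a generic sketch, not the BayesOpt internals):

```python
import numpy as np

def to_unit_cube(x, lower, upper):
    """Map a point from the original search box to [0, 1]^d."""
    return (np.asarray(x, float) - lower) / (np.asarray(upper) - lower)

def from_unit_cube(u, lower, upper):
    """Map a point in [0, 1]^d back to the original search box."""
    return np.asarray(lower) + np.asarray(u, float) * (np.asarray(upper) - lower)
```

Normalization matters here because the assumed input noise σx = 0.03 is expressed in normalized units, i.e., it corresponds to 3% of each dimension's range regardless of the physical scale of the search box.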

(a) ymc (x∗) (b) std (ymc (x∗))

Figure 4.12: Water bottle. Input Space Noise σx = 0.01

It can be observed that, for the water bottle and glass, the UBO method has clear advantages

over BO. As soon as the initial sampling phase finishes, UBO obtains higher mean values and lower

standard deviations. For the drill, the UBO eventually overcomes the BO, but at later iterations, which

might imply that the unsafe optimum is difficult to find, but still exists. Looking at the quantitative results shown in Table 4.4, we can see that, at the end of the optimization, UBO is better than BO

³ Worst cases are not shown graphically due to lack of space, but they are coherent with the evolution of the means.


(a) ymc (x∗) (b) std (ymc (x∗))

Figure 4.13: Mug. Input Space Noise σx = 0.03

(a) ymc (x∗) (b) std (ymc (x∗))

Figure 4.14: Glass. Input Space Noise σx = 0.03


(a) ymc (x∗) (b) std (ymc (x∗))

Figure 4.15: Drill. Input Space Noise σx = 0.03

in all criteria, except for the mean output value for the mug. For the mug, 100 iterations are not enough to obtain better mean values. We can see that the mug and drill objects are more challenging due to their lack of rotational symmetry. Since the optimization is only done over the translation parameters, the method is missing exploration in the rotational degrees of freedom. Furthermore, in the mug's case, the chosen facet was the one that contains the mug's handle. Trying to learn a grasp in this setting is much harder than in the other cases since, for the same input-space volume, the percentage of configurations that return a good metric is much smaller. For the water bottle and the glass, the rotational degrees of freedom are not as important because these objects are rotationally symmetric.

In Fig. 4.16 we illustrate four grasps of the water bottle explored during the experiments. Two of the grasps are performed in a safe region, while the other two are explored in an unsafe region. Although the unsafe zone contains the observation with the highest value, it also carries a higher risk of yielding a low value in its vicinity.

As a final experiment, we ran UBO vs. BO for the glass object using all the learning variables from section 4.4.3. We obtained similar results; see figure 4.17.


(a) Safe-zone, y = 0.413 (b) Safe-zone, y = 0.418

(c) Unsafe-zone, y = 0.439 (d) Unsafe-zone, y = 0.377

Figure 4.16: Grasp safety. In this example the best grasp is at an unsafe zone (c). However, a bad grasp is in its vicinity (d). The unscented Bayesian optimization chooses grasps with lower risk at the safe zone, (a) and (b), where performance is robust to input noise.


(a) ymc (x∗) (b) std (ymc (x∗))

Figure 4.17: Glass. 7 Dimensions. Input Space Noise σx = 0.03

Synthetic Problems

| Function | ymc(x*) BO | ymc(x*) UBO | worst ymc(x*) BO | worst ymc(x*) UBO | std(ymc(x*)) BO | std(ymc(x*)) UBO |
| --- | --- | --- | --- | --- | --- | --- |
| RKHS | 4.863 | 4.934 | 2.881 | 4.657 | 0.554 | 0.065 |
| GM | 0.080 | 0.093 | 0.023 | 0.053 | 0.027 | 0.014 |

Simulation - Simox

| Object | ymc(x*) BO | ymc(x*) UBO | worst ymc(x*) BO | worst ymc(x*) UBO | std(ymc(x*)) BO | std(ymc(x*)) UBO |
| --- | --- | --- | --- | --- | --- | --- |
| Bottle | 0.550 | 0.567 | 0.390 | 0.430 | 0.077 | 0.065 |
| Mug | 0.119 | 0.114 | 0.051 | 0.059 | 0.029 | 0.027 |
| Glass | 0.421 | 0.452 | 0.080 | 0.252 | 0.184 | 0.087 |
| Drill | 0.101 | 0.108 | 0.050 | 0.068 | 0.030 | 0.018 |

Table 4.4: Results at the last iteration of the BO process (means and standard deviations over all runs).


5 Conclusions


This dissertation presented novel Bayesian Optimization methods for robust grasping, comparing Gaussian Processes vs. Treed Gaussian Processes and Bayesian Optimization vs. Unscented Bayesian Optimization. Both contributions were proposed to address classical BO problems: heteroscedasticity and proper input noise modeling. These problems affect the performance of BO in many active learning problems. The specific application here was to find the optimal way to grasp an object (using the Simox simulator).

With the work developed during this dissertation, we also contributed a paper to the IROS 2016 International Conference [35], and we worked with Ruben Martinez-Cantin to extend his Bayesian Optimization C++ toolbox with BO-TGP and UBO implementations (https://github.com/josemscnogueira/bayesopt/tree/tgp_cuei).

Results showed that both implementations outperformed classic BO in the synthetic problem cases. BO-TGP achieves better results when the objective function presents multiple local optima and different smoothness behaviors, while UBO finds much safer optima when we consider the presence of input noise.

These results remained true for the Simox simulated experiments. However, for BO-TGP we saw that observational noise greatly affects its performance. We verified that the Simox simulation environment did indeed have this type of noise, which explains why the results were not as good as expected compared to the synthetic ones. It would be interesting, in the future, to include some sort of high-frequency noise reduction in the Simox grasping environment (such as a soft-touch implementation).

We could not conclude that using higher degrees of independence between the TGP leaves' data is always beneficial, for lack of consistent results. Despite this, our synthetic results (and some of the Simox experiments) suggest that less data correlation between TGP leaves leads to the selection of higher-valued optima when the objective function presents different smoothness characteristics.

UBO results demonstrated that using plain BO may lead to inconsistent final optima values when input noise is considered. We have shown that using UBO under input noise conditions leads to better average optima compared with BO.

We presented these two methods with the main purpose of performing robust and safe grasping of unknown objects by haptic exploration. The potential interest of both methods goes beyond grasping, or even robotics. Bayesian optimization is currently used in many applications: engineering, computer science, economics, simulation, experimental design, biology and artificial intelligence. In all those fields, there are many situations where input noise, heteroscedasticity or uncertainty may arise, and in which safe optimization is therefore fundamental.

Although this dissertation's work has reached an end, we would like to propose several objectives that would continue this line of research and address some of the problems left to solve:

• We have shown in 4.4.3.A that Simox's grasp metric has very high signal variance. This happens because the Simox simulator uses only one contact point per finger node. Bayesian Optimization would benefit greatly if soft touch were implemented (more contact points per surface). It would also be worthwhile to explore alternatives to Simox.


• Research on TGPs is not yet complete. Tree construction uses a recursive algorithm that does not guarantee that the final tree minimizes the overall node uncertainty (eq. 3.8). It also uses a brute-force search to select splitting points, which scales poorly with the number of samples and the input space's dimensionality. There is also no rule for selecting the minimum number of samples per leaf, nor an understanding of how this choice changes the learning process.

• TGP results should be compared with other heteroscedastic models, specifically ones that model the objective function and the hyper-parameters jointly.

• UEI could be changed to also use information about the sigma points' variance instead of only the mean value. It would also be worthwhile to explore other learning criteria that consider input noise and compare them to UEI.
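The brute-force split search criticized in the first point above can be sketched in one dimension as follows (using the summed per-leaf variance as a simple stand-in for the node uncertainty of eq. 3.8; the cost function and the helper name are ours):

```python
import numpy as np

def best_split(X, y, s_l=3):
    """Exhaustively try every split position along one dimension and
    keep the one minimizing the summed per-leaf variance, subject to
    a minimum of s_l samples per leaf."""
    order = np.argsort(X)
    Xs, ys = X[order], y[order]
    best, best_cost = None, np.inf
    for i in range(s_l, len(Xs) - s_l + 1):
        cost = i * ys[:i].var() + (len(ys) - i) * ys[i:].var()
        if cost < best_cost:
            best, best_cost = 0.5 * (Xs[i - 1] + Xs[i]), cost
    return best
```

The cost of this search grows with the number of samples in every candidate dimension, which is why smarter split selection is listed as future work.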


Bibliography

[1] J. Mockus. Application of bayesian approach to numerical methods of global and stochastic

optimization. Journal of Global Optimization, 4(4):347–365, June 1994.

[2] D. Jones, M. Schonlau, and W. Welch. Efficient global optimization of expensive black-box func-

tions. Journal of Global Optimization, 13(4):455–492, December 1998.

[3] E. Brochu, V. Cora, and N. Freitas. A tutorial on bayesian optimization of expensive cost func-

tions, with application to active user modeling and hierarchical reinforcement learning. Technical

report, December 2010. arXiv:1012.2599.

[4] D. Luo, Y. Wang, and X. Wu. Active online learning of the bipedal walking. Technical report,

Peking University, 2011.

[5] N. Srinivas, A. Krause, S. Kakade, and M. Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. Technical report, June 2010. arXiv:0912.3995v4.

[6] Z. Wang, M. Zoghi, N. Freitas, D. Matheson, and F. Hutter. Bayesian optimization in a billion dimensions via random embeddings. Technical report, January 2013. arXiv:1301.1942v1.

[7] E. Brochu, T. Brochu, and N. Freitas. A bayesian interactive optimization approach to procedural

animation design. Technical report, University of British Columbia, 2010.

[8] R. Martinez-Cantin, N. Freitas, and J. Castellanos. Analysis of particle methods for simultaneous

robot localization and mapping and a new algorithm: Marginal-slam. Robotics and Automation,

2007 IEEE International Conference on Robots and Automation, pages 2415–2420, April 2007.

[9] F. Veiga. Robotic grasp optimization from contact force analysis. Master's dissertation in electrical and computer engineering, Instituto Superior Técnico, April 2012.

[10] S. Dragiev, M. Toussaint, and M. Gienger. Uncertainty aware grasping and tactile exploration. 2013 IEEE International Conference on Robotics and Automation (ICRA), pages 113–119, May 2013.

[11] L. Montesano and M. Lopes. Active learning of visual descriptors for grasping using non-

parametric smoothed beta distributions. Robotics and Autonomous Systems Journal, August

2011.


[12] Ruben Martinez-Cantin. BayesOpt: A Bayesian optimization library for nonlinear optimization,

experimental design and bandits. Journal of Machine Learning Research, 15:3735–3739, 2014.

[13] C. Rasmussen and C. Williams. Gaussian Processes for Machine Learning. The MIT Press,

2006. ISBN 026218253X.

[14] H. Kushner. A new method of locating the maximum of an arbitrary multipeak curve in the presence of noise. Journal of Basic Engineering, 86(1):97–106, March 1964.

[15] J. Mockus, V. Tiesis, and A. Zilinskas. Toward Global Optimization. Dixon, 2nd edition, 1978.

Chapter: The Application of Bayesian Methods for Seeking the Extremum.

[16] D. Finkel. DIRECT optimization algorithm user guide. Technical report, Center for Research in Scientific Computation, North Carolina State University, Raleigh, NC 27695-8205, March 2003.

[17] V. Vahrenkamp. Simox - a robotics toolbox for simulation, motion and grasp planning. http:

//simox.sourceforge.net/, 2013. Accessed: 2015-05-24.

[18] M. Henriques. Controlo e planeamento de mãos robóticas antropomórficas utilizando sinergias (Control and planning of anthropomorphic robotic hands using synergies). Master's dissertation in mathematics and applications, Instituto Superior Técnico, July 2013.

[19] M. Tesch, J. Schneider, and H. Choset. Expensive function optimization with stochastic binary

outcomes. Technical report, Robotics Institute, Carnegie Mellon University, 2013.

[20] A. McHutchon and C. Rasmussen. Gaussian process training with input noise. Technical report, Cambridge University, 2011.

[21] Q. Le, A. Smola, and S. Canu. Heteroscedastic gaussian process regression. Technical report,

Australian National University, 0200 ACT, Australia, 2005.

[22] K. Kersting, C. Plagemann, P. Pfaff, and W. Burgard. Most likely heteroscedastic gaussian pro-

cess regression. Technical report, University of Freiburg, 2007.

[23] S. Kuindersma, R. Grupen, and A. Barto. Variational bayesian optimization for runtime risk-sensitive control. Technical report, University of Massachusetts Amherst, 2012.

[24] B. Damas and J. Santos-Victor. Online learning of single and multi-valued functions with an

infinite mixture of linear experts. Neural Computation, 25(11), November 2013.

[25] Jasper Snoek, Hugo Larochelle, and Ryan Adams. Practical Bayesian optimization of machine

learning algorithms. In NIPS, pages 2960–2968, 2012.

[26] J.-A. M. Assael, Z. Wang, B. Shahriari, and N. de Freitas. Heteroscedastic treed Bayesian optimisation. Technical report, March 2015. arXiv:1410.7172v2.

[27] E. Wan and R. Van Der Merwe. The unscented kalman filter for nonlinear estimation. In Proceed-

ings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control

Symposium (Cat. No.00EX373). Institute of Electrical & Electronics Engineers (IEEE), 2000. doi:

10.1109/asspcc.2000.882463. URL http://dx.doi.org/10.1109/ASSPCC.2000.882463.


[28] S. Julier and J. Uhlmann. Unscented filtering and nonlinear estimation. Proceedings of the

IEEE, 92(3):401–422, March 2004. doi: 10.1109/jproc.2003.823141. URL http://dx.doi.org/

10.1109/JPROC.2003.823141.

[29] R. van der Merwe. Sigma-Point Kalman Filters for Probabilistic Inference in Dynamic State-

Space Models. PhD thesis, OGI School of Science & Engineering, Oregon Health & Science

University, April 2004.

[30] K. Ito and K. Xiong. Gaussian filters for nonlinear filtering problems. IEEE Transactions on

Automatic Control, 45(5):910–927, 2000.

[31] M. Nørgaard, N.K. Poulsen, and O. Ravn. New developments in state estimation for nonlinear

systems. Automatica, 36(11):1627–1638, November 2000.

[32] S. Julier and J.K. Uhlmann. The scaled unscented transformation. In IEEE American Control

Conf., pages 4555–4559, Anchorage AK, USA, 8–10 May 2002.

[33] Z. Wang, J.-A. M. Assael, and N. Freitas. RKHS 1D function for Bayesian optimization tasks. https://github.com/iassael/function-rkhs, 2014. Accessed: 2015-10-03.

[34] R. Gramacy and H. Lee. Bayesian treed gaussian process models. Technical report, University

of California, Santa Cruz, 2006. arXiv:0710.4536.

[35] Jose Nogueira, Ruben Martinez-Cantin, Alexandre Bernardino, and Lorenzo Jamone. Unscented

bayesian optimization for safe robot grasping. In 2016 IEEE/RSJ International Conference on

Intelligent Robots and Systems (IROS). IEEE, oct 2016. doi: 10.1109/iros.2016.7759310. URL

https://doi.org/10.1109/iros.2016.7759310.
