Transcript of Differentiable Gaussian Process Motion Planning


Differentiable Gaussian Process Motion Planning

Mohak Bhardwaj1, Byron Boots1, and Mustafa Mukadam2

1University of Washington, 2Facebook AI Research

Abstract— Modern trajectory optimization based approaches to motion planning are fast, easy to implement, and effective on a wide range of robotics tasks. However, trajectory optimization algorithms have parameters that are typically set in advance (and rarely discussed in detail). Setting these parameters properly can have a significant impact on the practical performance of the algorithm, sometimes making the difference between finding a feasible plan or failing at the task entirely. We propose a method for leveraging past experience to learn how to automatically adapt the parameters of Gaussian Process Motion Planning (GPMP) algorithms. Specifically, we propose a differentiable extension to the GPMP2 algorithm, so that it can be trained end-to-end from data. We perform several experiments that validate our algorithm and illustrate the benefits of our proposed learning-based approach to motion planning.

I. INTRODUCTION

Robot motion planning is a challenging problem, as it requires searching for collision-free paths while satisfying robot and task-related constraints for high-dimensional systems with limited on-board computation. Trajectory optimization is a powerful approach to effectively solving the planning problem, and state-of-the-art algorithms can find smooth, collision-free trajectories in almost real-time for complex systems such as robot manipulators [1]–[3]. Although these approaches are easy to implement and generally applicable to a wide range of tasks, they have certain parameters which can strongly affect their performance in practice. This leads to two major problems: (i) there is no formal way of setting parameters for a given task, so manual tuning is required, which can be time-consuming and arduous; and (ii) the planner needs to be re-tuned if the distribution of obstacles in the environment changes significantly, making it brittle in practice, i.e. a planner that works well on one type of environment might completely fail on another. The above issues need to be addressed in order to create flexible robotic systems that can work seamlessly across a variety of environments and tasks. In order to do so, we ask the following question: can we leverage past experience to learn parameters of the planner in a way that directly improves its expected performance over the distribution of problems it encounters?

In this work, we focus on GPMP2 [4], a state-of-the-art motion planning algorithm that formulates trajectory optimization as inference on a factor graph and finds solutions by solving a nonlinear least squares optimization problem,

This work was done while all authors were affiliated with the Georgia Institute of Technology. This work was supported in part by ARL DCIST CRA W911NF-17-2-0181. The authors would also like to thank Alexander Lambert for help with code for generating planning environments.

[Fig. 1 graphic: the environment bitmap and signed distance field (ENV IMG+SDF) feed the trainable module W, which outputs the learned parameters φL; the planning module P takes the current trajectory θi together with the fixed parameters φF and φL, builds the factor graph, linearizes it, and solves Aδθ = b for the update δθ to produce θi+1; a convergence check C loops back while unconverged (no) and otherwise terminates (yes), starting from the initialization θ0.]

Fig. 1: The computational graph of dGPMP2 where φF are user defined planning parameters that are fixed and φL are learned planning parameters. See Section IV for details.

where the inverse covariances of the factors manifest as weights in the objective function. While GPMP2 has been shown to be a leading optimization-based approach to motion planning [3], in Section III-B we illustrate its sensitivity to its objective function parameters (specifically factor covariances). To contend with this problem, we leverage the key insight that GPMP2 can be rebuilt as a fully differentiable computational graph and learn the parameters for its objective function from data in an end-to-end fashion. This allows us to develop a learning strategy that can improve GPMP2's performance on a given distribution of problems. Our differentiable version can be trained in a self-supervised manner or from expert demonstrations to predict covariances that are time and space varying, in contrast to fixed, hand-tuned covariances, as used in the vanilla approach. Building on top of a structured planner offers interpretability and allows us to explicitly incorporate planning constraints such as collision avoidance and velocity limits. This work is intended as a preliminary investigation into learning structured planners from high-dimensional inputs. We perform several experiments in simulated 2D environments to demonstrate the benefits of our approach, which we call Differentiable GPMP2 (dGPMP2), illustrated in Fig. 1.

II. RELATED WORK

Machine learning has been used to accelerate motion planning by combining reinforcement learning (RL) with sampling based planning [5], learning cost functions from demonstration [6], learning efficient heuristics [7], and learning collision checking policies [8]. As machine learning becomes more accessible, there has been a growing interest in using deep learning for planning, such as end-to-end networks that perform value iteration [9] or learning a latent space embedding and a dynamics model that is suitable for planning by gradient descent within a goal directed policy [10]. Such approaches have demonstrated that learning


to plan is a promising research direction as it allows the agent to explicitly reason about future actions. However, learning-based approaches still fall short on several fronts. Combining learning and planning in a way where domain knowledge, constraints, and uncertainty are properly handled is challenging, and learned representations are often difficult to interpret.

Recent work in structured learning techniques offers avenues towards contending with these challenges. Several methods have focused on incorporating optimization within neural network architectures. For example, [11] implicitly learns to perform nonlinear least squares optimization by learning an RNN that predicts its update steps, [12] learns to perform gradient descent, and [13] utilizes an ODE solver within its network. Other methods like [14] learn a sequential quadratic program as a layer in a network, which was later extended to solve model predictive control [15]. [16] learns structured dynamics models for reactive visuomotor control. Taking inspiration from this body of work, in this paper we present a differentiable inference-based motion planning technique that through its structure allows us to combine the strengths of both traditional model-based methods and modern learning methods, while mitigating their respective weaknesses.

Another related field of work is on automatic parameter tuning of motion planning algorithms. Approaches such as [17], [18] treat the planner as a black-box and use machine learning tools such as Bayesian optimization, bandits, and random forests to optimize a single configuration of parameters that improves planner performance. However, such a single configuration does not adapt to changes in the environment distribution, which hinders generalization. Additionally, the number of parameters optimized in these approaches is far fewer than ours and they do not incorporate high dimensional inputs such as images.

III. BACKGROUND

We begin by reviewing the GPMP2 [3] planner that we will later reconstruct as a differentiable computational graph. Then, we discuss limitations of GPMP2 with respect to its sensitivity to objective function parameters, thus motivating our learning algorithm.

A. Planning as inference on factor graphs

We take a probabilistic inference perspective on motion planning as described in the GPMP2 framework [4]. The planning problem is posed as computing the maximum a posteriori (MAP) trajectory given a prior over trajectories and a likelihood of events of interest that the trajectory must satisfy. By selecting appropriate distributions, sparsity can be induced in the MAP problem, which allows for efficient inference. Following [4], we describe the essential components of GPMP2 here.

The Prior: In GPMP2, a continuous-time Gaussian process (GP) is used to define a prior distribution over trajectories, θ(t) ∼ GP(µ(t), K(t, t′)), where µ(t) is the mean function and K(t, t′) is the kernel. For the purposes of our approach, we represent the trajectory using N support states, θ = [θ1, . . . , θN]^T, at different points in time and define the mean vector and covariance matrix as

$$\boldsymbol{\mu} = [\mu_1, \ldots, \mu_N]^T, \qquad \mathcal{K} = [\mathcal{K}(t_i, t_j)]_{1 \le i, j \le N}, \qquad (1)$$

and this GP defines a prior on the space of trajectories

$$P(\boldsymbol{\theta}) \propto \exp\Big\{ -\tfrac{1}{2} \|\boldsymbol{\theta} - \boldsymbol{\mu}\|^2_{\mathcal{K}} \Big\}, \qquad (2)$$

where ‖θ − µ‖²_K := (θ − µ)^T K⁻¹ (θ − µ) is the Mahalanobis distance. The GP prior distribution is generated by an LTV-SDE [19]

$$\dot{\boldsymbol{\theta}}(t) = \mathbf{A}(t)\boldsymbol{\theta}(t) + \mathbf{B}(t)\mathbf{u}(t) + \mathbf{w}(t), \qquad (3)$$

where A(t) and B(t) are system matrices, u(t) is a bias term, and w(t) ∼ GP(0, Qc δ(t − t′)) is a white noise process with Qc being the power spectral density matrix of the system and δ being the Dirac delta function. The first and second order moments of the solution to Eq. (3) give us the mean and covariance of the desired GP prior. The resulting inverse kernel matrix of the GP has an exactly sparse block-tridiagonal structure, making it ideal for fast inference. Here, we use the constant velocity prior model, where the covariance for a single time step is specified by

$$\mathbf{Q}_{t_i, t_{i+1}} = \begin{bmatrix} \tfrac{1}{3}\Delta t_i^3 \mathbf{Q}_c & \tfrac{1}{2}\Delta t_i^2 \mathbf{Q}_c \\ \tfrac{1}{2}\Delta t_i^2 \mathbf{Q}_c & \Delta t_i \mathbf{Q}_c \end{bmatrix}, \qquad (4)$$

where ∆ti = ti+1 − ti. The full GP covariance is obtained by composing Qti,ti+1 at every time step along with the start and goal covariances, Ks and Kg. Please refer to [4] and [19] for details. One important thing to note here is that the GP prior covariance is completely parameterized by the power spectral density matrix Qc.
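For concreteness, the short sketch below (NumPy, with hypothetical helper names) assembles the single-step constant-velocity covariance block of Eq. (4) and evaluates the Mahalanobis prior cost used in Eq. (2); it assumes a 2D point robot so that Qc is 2×2.

```python
import numpy as np

def constant_velocity_Q(dt, Qc):
    """GP prior covariance between consecutive support states, Eq. (4).

    Qc : (d, d) power spectral density matrix of the white-noise process.
    Returns the (2d, 2d) block for a state stacked as [position; velocity].
    """
    coeffs = np.array([[dt**3 / 3.0, dt**2 / 2.0],
                       [dt**2 / 2.0, dt]])
    return np.kron(coeffs, Qc)          # block structure [[c1*Qc, c2*Qc], [c2*Qc, c3*Qc]]

def gp_prior_cost(theta, mu, K_inv):
    """Mahalanobis cost 0.5 * ||theta - mu||^2_K appearing in Eqs. (2) and (6)."""
    r = theta - mu
    return 0.5 * r @ K_inv @ r

# Example: 2D point robot (d = 2), Qc = 0.5 * I, 0.1 s between support states.
Qc = 0.5 * np.eye(2)
Q_block = constant_velocity_Q(0.1, Qc)   # shape (4, 4)
```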

The Likelihood function: The likelihood function is used to capture planning requirements in the form of events e that the trajectory must satisfy. These include constraints such as collision avoidance, joint or velocity limits, or other task relevant objectives. We define the likelihood function as a distribution in the exponential family given by

$$L(\boldsymbol{\theta}; e) \propto \exp\Big\{ -\tfrac{1}{2} \|h(\boldsymbol{\theta})\|^2_{\boldsymbol{\Sigma}} \Big\}, \qquad (5)$$

where h(θ) is a vector-valued cost function and e are the events of interest.

Inference: Given the prior and likelihood, the MAP problem can be solved as

$$\boldsymbol{\theta}^* = \operatorname*{argmax}_{\boldsymbol{\theta}} \{ P(\boldsymbol{\theta} \mid e) \} = \operatorname*{argmin}_{\boldsymbol{\theta}} \big\{ -\log \big( P(\boldsymbol{\theta}) L(\boldsymbol{\theta}; e) \big) \big\}$$
$$\boldsymbol{\theta}^* = \operatorname*{argmin}_{\boldsymbol{\theta}} \Big\{ \tfrac{1}{2} \|\boldsymbol{\theta} - \boldsymbol{\mu}\|^2_{\mathcal{K}} + \tfrac{1}{2} \|h(\boldsymbol{\theta})\|^2_{\boldsymbol{\Sigma}} \Big\}. \qquad (6)$$

In general, h(θ) can be non-linear and thus the above equation is a Nonlinear Least Squares (NLLS) problem which can be solved using iterative approaches like the Gauss-Newton or Levenberg-Marquardt (LM) algorithms. At any iteration i, these algorithms proceed by first linearizing the cost function around the current estimate of the trajectory,


θi, using a Taylor expansion h(θ) = h(θi) + H δθ, where H = ∂h/∂θ evaluated at θ = θi, and then solving the following linear system to find the update, δθ:

$$\big( \mathcal{K}^{-1} + \mathbf{H}^T \boldsymbol{\Sigma}^{-1} \mathbf{H} \big) \, \delta\boldsymbol{\theta} = -\mathcal{K}^{-1} (\boldsymbol{\theta}_i - \boldsymbol{\mu}) - \mathbf{H}^T \boldsymbol{\Sigma}^{-1} h(\boldsymbol{\theta}_i). \qquad (7)$$

Gauss-Newton optimization in particular updates the current estimate with the following rule

$$\boldsymbol{\theta}_{i+1} = \boldsymbol{\theta}_i + \delta\boldsymbol{\theta}. \qquad (8)$$
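To make this update concrete, here is a minimal dense sketch of Eqs. (7)-(8) (illustrative function and variable names; the actual GPMP2 implementation exploits the sparse factor-graph structure rather than forming these matrices densely):

```python
import numpy as np

def gauss_newton_step(theta_i, mu, K_inv, Sigma_inv, h, H):
    """One GPMP2-style update: build and solve Eq. (7), then apply Eq. (8).

    theta_i : current (flattened) trajectory estimate
    h, H    : cost vector h(theta_i) and its Jacobian evaluated at theta_i
    """
    A = K_inv + H.T @ Sigma_inv @ H                      # left-hand side of Eq. (7)
    b = -K_inv @ (theta_i - mu) - H.T @ Sigma_inv @ h    # right-hand side of Eq. (7)
    delta = np.linalg.solve(A, b)                        # GPMP2 instead solves the sparse system on the factor graph
    return theta_i + delta                               # Eq. (8)
```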

GPMP2 exploits the sparsity of the linear system in Eq. (7) to formulate MAP inference on a factor graph and solve it efficiently. While GPMP2 is a state-of-the-art method that outperforms several leading sampling and optimization based approaches to motion planning [3], it still has some practical limitations with respect to setting the parameters in its objective in Eq. (6). Next, we will discuss these limitations in-depth with a few examples.

B. Sensitivity to objective function parameters

The performance of GPMP2 is dependent on the values of QC (the parameter that governs the covariance of the GP prior) and Σ (the covariance of the likelihood) as per its objective function from Eq. (6). For example, for collision avoidance, the distribution of obstacles in the environment affects what relative settings of QC and obstacle covariance σobs (such that Σ = σ²obs × I) will be effective in solving the planning problem.

Different datasets require different relative settings of parameters. Due to the nonlinear interactions between these parameters it might not be possible to find a fixed setting that will always work, and in practice it can be a tedious task to find a setting that works for many different environments. For example, in environments like the one in Fig. 2a-2b, where the planner needs to find a trajectory that goes around the cluster of obstacles, a small obstacle covariance is required to make the planner navigate around the "tarpit." But, at the same time, if a large dynamics covariance is used, it might try to squeeze in between obstacles where the cost can have a local minimum. So a smaller dynamics covariance is needed as well. Another example is shown in Fig. 2c-2d with dispersed obstacles near the start and goal. Here an entirely different setting of covariances is effective. Since obstacles are small and diffused, solutions can generally be found close to the straight line initialization. A smaller dynamics covariance helps with that. Also, the start and goal can be very near obstacles, which means that a small obstacle covariance might lead to solutions that violate the start and goal constraints. Having a smaller obstacle covariance can also lead to trajectories that are very long and convoluted as they try to stay far away from obstacles.

Small changes in parameters can lead to trajectories lying in different homotopy classes. For example, Fig. 2e-2f illustrates how even minor changes in the obstacle covariance can lead to significant changes in the resulting trajectories. This makes tuning covariances harder, as the effects are further aggravated over large datasets with diverse environments, leading to inconsistent results.

(a) σobs = 0.1, QC = 0.5 × I   (b) σobs = 0.01, QC = 0.5 × I
(c) σobs = 0.1, QC = 0.5 × I   (d) σobs = 0.01, QC = 0.5 × I
(e) σobs = 0.03, QC = I        (f) σobs = 0.01, QC = I

Fig. 2: (a)-(b) tarpit dataset (robot radius = 0.4m, safety distance = 0.4m). For the same QC, a smaller σobs is required to encourage the planner to navigate around obstacles. (c)-(d) forest dataset (robot radius = 0.2m, safety distance = 0.2m). For the same QC, a larger σobs is required to focus on finding solutions near the straight line trajectory. (e)-(f) multi_obs dataset (robot radius = 0.4m, safety distance = 0.4m). A small change in obstacle covariance can lead to significant changes in the trajectory. In all figures, the red dashed trajectories are the initializations and the blue trajectories are the optimized solutions.

With sufficient domain expertise, the parameters can be hand-tuned. However, this process can be very inefficient and becomes increasingly hard for problems in higher dimensions or when complex constraints are involved. An ideal setup would be to have an algorithm that can predict appropriate parameters automatically for each problem. Therefore, in this work, we rebuild the GPMP2 algorithm as a fully differentiable computational graph, such that these parameters can be specified by deep neural networks which can be trained end-to-end from data. When deployed, our differentiable GPMP2 approach (dGPMP2) can then automatically select its own parameters given a particular motion planning problem.

IV. A STRUCTURED COMPUTATIONAL GRAPH FOR MOTION PLANNING

In this section, we first explain how GPMP2 can be interpreted as a differentiable computation graph. Then, we explain how learning can be incorporated in the framework and finally, we show how the entire system can be trained end-to-end from data.

A. Differentiable GPMP2

Our architecture consists of two main components: a planning module P that is differentiable but has no learnable parameters and a trainable module W that can be implemented using a differentiable function approximator such as a neural network, as shown in Fig. 1. As discussed in Section III, GPMP2 performs trajectory optimization via MAP inference on a factor graph by solving an iterative nonlinear optimization, where at any iteration the factor


graph is linearized at the current estimate of the trajectory to produce the linear system in Eq. (7) and an update step is computed by solving that linear system. At a high level, our planning module P implements this update step as a computational graph. The trainable module W is then set up to parameterize some desired planning parameters and outputs these as φL at every iteration. These parameters correspond to factor covariances used by P to construct the linearized factor graph. Additionally, P takes as input a set of fixed planning parameters φF to allow parameters that can be user-specified and are not being learned, for example, obstacle safety distance and covariances of constraint factors like start, goal, and velocity. The key insight is that since all operations for solving Eq. (7) are differentiable, we can easily differentiate through it using standard autograd tools [20] and thus train W in an end-to-end fashion from data.

Similar to GPMP2, during the forward pass, dGPMP2 iteratively optimizes the trajectory where at the ith iteration, the planning module P takes the current estimate of the trajectory θi and planning parameters φL and φF as inputs (where φL is the output of the trainable module W and φF are user-defined and fixed) and produces the next estimate θi+1, as shown in Fig. 1. The new estimate then becomes the input for the next iteration. This process continues until θi+1 passes a specified convergence check or a maximum of T iterations, and the optimization terminates (C). At the end of the optimization, we roll out a complete differentiable computation graph for the motion planner.
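The unrolled loop can be sketched as follows (a minimal illustration, not the authors' implementation; W, P, and the argument names are placeholders for the modules in Fig. 1, assumed to be built with a differentiable framework such as PyTorch so that gradients flow through every iteration):

```python
def dgpmp2_forward(theta0, env_feats, W, P, phi_F, T=10):
    """Unroll the differentiable planner for T iterations (Fig. 1).

    W : trainable module mapping (env_feats, theta) -> learned covariances phi_L
    P : differentiable planning module performing one linearize-and-solve update
    """
    theta, trajectories = theta0, []
    for _ in range(T):
        phi_L = W(env_feats, theta)        # predict per-state factor covariances
        theta = P(theta, phi_L, phi_F)     # one Gauss-Newton step on the factor graph
        trajectories.append(theta)
    return trajectories                    # intermediate estimates stay in the autograd graph

# Training sketch: backpropagate a loss summed over the intermediate trajectories (BPTT).
# loss = sum(loss_fn(th) for th in dgpmp2_forward(theta0, env_feats, W, P, phi_F))
# loss.backward(); optimizer.step()
```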

Notation: θi refers to the full trajectory estimate at the ith iteration of the optimization (i = 1, . . . , T), while θi denotes the ith support state along the trajectory (i = 1, . . . , N).

The planning module: θi is fed into the planning module along with a signed distance field of the environment and additional planning parameters (φF and φL) such as factor covariances, safety distance, robot kinematics, start-goal constraints, and other task related constraints. These inputs are used to construct the linear system in Eq. (7) corresponding to the linearized factor graph of the planning problem. Similar to standard GPMP2, constraints are implemented as factors with fixed small covariances and the likelihood function for obstacle avoidance is the hinge loss function (see Section V) with covariance Σ. The trajectory update δθi is then computed by solving this linear system, using Cholesky decomposition of the normal equations [3], [21], and the new trajectory θi+1 is computed using a Gauss-Newton step. The above procedure is fully differentiable and allows computing gradients in the backwards pass with respect to θi, the GP covariance K, and the likelihood function covariance Σ.

The trainable module: The trainable module W outputs planning parameters φL. These correspond to covariances of factors in Eq. (7) that we wish to learn from data. In practice, we can choose to learn the GP covariance K, the likelihood covariance Σ, or both. Additionally, this approach allows us to learn individual covariances for different states along the trajectory [θ1, . . . , θN] and different iterations of the optimization, thus offering much more expressiveness than a single hand-tuned covariance. We implement W as a feed-forward convolutional neural network that takes as input the bitmap image of the environment, the signed distance field and the current trajectory θi, and outputs a parameter vector φL^i at every iteration i. Note that, given our architecture, W can be customized as per individual needs based on problem requirements or parameters chosen to be learned.

After a forward pass, we roll out a fully differentiable computation graph that outputs a sequence of trajectories {θ1, . . . , θT}. Then we evaluate a loss function on this sequence and backpropagate that loss to update the parameters of W such that it produces parameters φL that allow us to optimize for better quality trajectories on the dataset as measured by the loss. We explain our loss function and the training procedure in detail below.

B. Learning factor graph covariances

Imitation loss: Consider the availability of expert demonstrations for a planning problem. These may be provided by an asymptotically optimal (but slow) motion planner [22] or by human demonstration [23]. dGPMP2 can be trained to produce similar trajectories by minimizing an error metric between the demonstrations and the learner's output with

$$\mathcal{L}_{imitation} = \|\boldsymbol{\theta}^e - \boldsymbol{\theta}\|_2^2, \qquad (9)$$

where θe is the expert's demonstrated trajectory and the metric is the L2 norm.

Task loss: Naively trying to match the expert can be problematic for a motion planner. For example, when equally good paths lie in different homotopy classes, the learner may land in a different one than the expert. In this case, penalizing for not matching the expert may be excessively conservative. If using human demonstrations as an expert, a realizability gap can arise when the planner has different constraints as compared with the human. Thus, we use an external task loss as a regularizer that encourages smoothness and obstacle avoidance, while respecting start and goal constraints, as is often used in motion planning [24]:

$$\mathcal{L}_{plan} = \mathcal{F}_{smooth} + \lambda \times \mathcal{F}_{obs}, \qquad (10)$$

where Fsmooth corresponds to the GP prior error and Fobs is the obstacle cost that are described in Eq. (6), and λ is a user specified parameter. In practice, the performance is not sensitive to the setting of λ. Then, the overall loss for a single trajectory is L = Limitation + Lplan. Note that our framework allows for any choice of loss function depending on the application.

Training: During training we roll out our learner for a fixed number of iterations T and use Backpropagation Through Time (BPTT) [25] on the sum of losses of the intermediate trajectories in order to update the parameters of the trainable module W. Then, the total loss minimized for our learner over a batch of size K is

$$\mathcal{L}_{total} = \frac{1}{K} \frac{1}{T} \sum_{k=1}^{K} \sum_{i=1}^{T} \mathcal{L}_{k,i}. \qquad (11)$$
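As an illustration of how these losses combine during training, the following sketch (hypothetical names; the gp_prior_error and obstacle_cost callables stand in for the Fsmooth and Fobs terms of Eq. (10)) accumulates the per-iteration losses of Eqs. (9)-(11) over a batch of unrolled rollouts, assuming the trajectories are PyTorch tensors so the result can be backpropagated through time:

```python
import torch

def imitation_loss(theta, theta_expert):
    """Eq. (9): squared L2 distance between learner and expert trajectories."""
    return torch.sum((theta_expert - theta) ** 2)

def task_loss(theta, gp_prior_error, obstacle_cost, lam=1.0):
    """Eq. (10): smoothness (GP prior error) plus weighted obstacle cost."""
    return gp_prior_error(theta) + lam * obstacle_cost(theta)

def total_loss(rollouts, experts, gp_prior_error, obstacle_cost, lam=1.0):
    """Eq. (11): average over a batch of K problems and T unrolled iterations."""
    K, T = len(rollouts), len(rollouts[0])
    loss = 0.0
    for thetas, theta_e in zip(rollouts, experts):        # one rollout per planning problem
        for theta in thetas:                              # intermediate trajectories theta_1..theta_T
            loss = loss + imitation_loss(theta, theta_e) \
                        + task_loss(theta, gp_prior_error, obstacle_cost, lam)
    return loss / (K * T)
```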


V. EXPERIMENTAL EVALUATION

We test our approach on 2D navigation problems with complex environment distributions and problems with user-specified velocity constraints. Many real world motion planning problems such as warehouse automation (KIVA systems), extra-terrestrial rovers, in-home robots (Roomba), navigation from satellite data, and last mile delivery, among others, are inherently 2D. These problems are challenging partly because of local minima generated by complex distributions of obstacles and other constraints such as velocity limits. The sensitivity of planners to parameter settings further adds to the difficulty. This is captured by the datasets we consider, where the forest distribution consists of small obstacles scattered around the workspace and requires the robot to squeeze through several narrow corridors, and the tarpit distribution contains a small number of larger obstacles clumped together near the center of the workspace and requires the robot to avoid the cluster of obstacles entirely. It is challenging for a single planner with fixed parameters to solve problems from both distributions.

A. Implementation details

All our experiments and training are performed on a desktop with 8 Intel Core i7-7700K @ 4.20GHz CPUs, 32GB RAM and a 12GB NVIDIA Titan Xp. We consider a 2D point robot in a cluttered environment and planning is done in a state space θi = [x, y, ẋ, ẏ]^T. The robot is represented as a circle with radius r centered on its center of mass and the environment is a binary occupancy grid. A Euclidean signed distance field is computed from the occupancy grid to evaluate distance to obstacles and check collisions. We utilize the same collision likelihood factor as GPMP2 [4], h(θi) = c(x(θi)), where x(θi) = [x, y]^T is the position coordinates of the center of mass and the hinge loss cost function c is

$$c(\mathbf{x}) = \begin{cases} -d(\mathbf{x}) + \epsilon & d(\mathbf{x}) \le \epsilon \\ 0 & \text{otherwise} \end{cases} \qquad (12)$$

where ε = r + εsafe with εsafe as a user defined safety distance, and d is the signed distance. In our current experiments, we consider σobs as the learned parameter φL and QC, εsafe, Ks, Kg to be the fixed parameters φF. Although performance of the planner depends on both QC and Σ, for our task they trade off against each other and thus we can achieve a similar behavior by varying one relative to the other. Since in our setup the environment changes, learning the likelihood covariance Σ is more relevant. In other problem domains learning QC instead might be more relevant, such as in [23]. It is important to note the difference in expressiveness of Σ between GPMP2, where Σ = σ²obs × I, and dGPMP2, where Σ = diag(σ²obs,1, . . . , σ²obs,N) with each σobs,i being a function of the current trajectory and the environment.
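A small sketch of this obstacle likelihood and the per-state covariance it produces (illustrative names; the sdf_query callable is a stand-in for the signed distance field lookup):

```python
import torch

def hinge_collision_cost(xy, sdf_query, eps):
    """Eq. (12): hinge loss on the signed distance d(x) at the robot's center of mass.

    xy        : (N, 2) positions along the trajectory
    sdf_query : callable returning the signed distance to the nearest obstacle, shape (N,)
    eps       : robot radius r plus the user-defined safety distance eps_safe
    """
    d = sdf_query(xy)
    return torch.clamp(eps - d, min=0.0)      # zero cost once the robot is eps away from obstacles

def obstacle_inverse_covariance(sigma_obs):
    """dGPMP2 predicts one sigma_obs per state, so Sigma = diag(sigma_obs_1^2, ..., sigma_obs_N^2);
    its inverse is what enters the weighted linear system of Eq. (7)."""
    return torch.diag(1.0 / sigma_obs ** 2)
```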

Loss function: It can be expensive to gather a large number of human demonstrations to train the planner. Hence, we use a self-supervised approach. Sampling based asymptotically optimal planning methods such as RRT* [22] are effective in finding good homotopy classes to serve as an initialization for local trajectory optimizers, but can be slow to converge and produce non-smooth solution paths. We use a combination of RRT* and GPMP2 as our expert. Expert trajectories are generated by first running RRT* and are then optimized with GPMP2 to yield smooth solutions. This allows dGPMP2 to learn by utilizing the best combination of local and global planning. We use the loss function defined in Section IV-B with this expert.

(a) Expert   (b) GPMP2, σobs = 0.15   (c) GPMP2, σobs = 0.01   (d) dGPMP2

Fig. 3: Example comparison of (d) dGPMP2 against (b)-(c) GPMP2 (fixed hand tuned covariances) and (a) Expert on forest (top row) and tarpit (bottom row) datasets. Hand tuned covariances that work well on one distribution of obstacles fail on the other and vice versa. By imitating the expert, dGPMP2 is able to perform consistently across different environment distributions. Green circle is start, cyan is goal, dashed red line is initialization, and Qc = 0.5 × I, r = 0.4m for all. A trajectory is in collision if at any state the signed distance between the robot center of mass and the nearest obstacle is less than or equal to r.

Network architecture: For W we use a standard feed-forward neural network model consisting of convolutional and fully connected layers. The network consists of 5 convolutional layers with [16, 16, 16, 32, 32] filters respectively, all 3x3 in size. This is followed by two fully connected layers with [1000, 640] hidden units. We use ReLU activation with batch normalization in all layers and a dropout probability of 0.5 in the fully connected layers. The input to the neural network is a 128x128 bitmap of the environment stacked on top of the Euclidean signed distance field of the same dimensions. Backpropagation is performed for a fixed number of iterations, T = 10. At every iteration, the network outputs a different likelihood covariance for each state along the trajectory.
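The description above can be sketched as the following PyTorch module (a hypothetical reconstruction, not the authors' code: strides, padding, the output nonlinearity, and the omission of the trajectory input are all assumptions made to keep the example self-contained):

```python
import torch.nn as nn

class CovarianceNet(nn.Module):
    """Sketch of the trainable module W: conv layers over the 2-channel
    (bitmap + SDF) 128x128 input, followed by fully connected layers.
    Filter counts and sizes follow the text; strides and pooling are assumptions."""
    def __init__(self, n_states=100):
        super().__init__()
        filters = [16, 16, 16, 32, 32]
        layers, in_ch = [], 2
        for out_ch in filters:
            layers += [nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                       nn.BatchNorm2d(out_ch), nn.ReLU()]
            in_ch = out_ch
        self.conv = nn.Sequential(*layers)                 # 128 -> 4 spatially after 5 stride-2 convs
        self.fc = nn.Sequential(
            nn.Linear(32 * 4 * 4, 1000), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(1000, 640), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(640, n_states))
        self.softplus = nn.Softplus()                      # keep predicted covariances positive (assumption)

    def forward(self, env_and_sdf):                        # (B, 2, 128, 128)
        z = self.conv(env_and_sdf).flatten(1)
        return self.softplus(self.fc(z))                   # one sigma_obs per trajectory state
```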

Comparing planners: The convergence of the optimization is based on the following criteria: a tolerance on the relative change in error across iterations tol(δerror), the magnitude of the update tol(δθ), and a maximum number of iterations Tmax. On convergence the final trajectory is returned. We report the following metrics on a test set of environments: (i) success, the percentage of problems solved, i.e. when a collision free trajectory is found, (ii) average gp_mse, the mean-squared GP error measuring smoothness, and (iii) collision_intensity, the average portion of the trajectory spent in collision when a collision occurs.
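For instance, the termination check C can be written as a simple predicate (the tolerance values below are placeholders, not the ones used in the experiments):

```python
def converged(rel_error_change, update_norm, i, tol_err=1e-4, tol_step=1e-3, t_max=100):
    """Termination criterion C: small relative change in error, small update magnitude,
    or the maximum number of iterations reached."""
    return rel_error_change < tol_err or update_norm < tol_step or i >= t_max
```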

We test our framework on two different planning tasks to demonstrate (i) how learning covariances improves performance and (ii) how the planner's structure allows us to incorporate constraints. We compare against a baseline GPMP2 with hand-tuned parameters. However, we do not compare against other sampling and optimization-based planners and refer the reader to [3] for benchmarks of GPMP2 against leading sampling and optimization-based planners.

TABLE I: Comparison of dGPMP2 versus GPMP2 with fixed hand tuned covariances. dGPMP2 learns the obstacle covariance σobs using a training set of 5000 environments. QC = 0.5 × I for all. Total trajectory time is 10s with 100 states along the trajectory and λ = 1.0 for training.

                              GPMP2          GPMP2          dGPMP2
                              σobs = 0.15    σobs = 0.01
  success (forest only)       71.02          52.18          66.67
  success (tarpit only)       55.56          74.08          68.00
  success (mixed)             62.67          64.00          67.33
  gp_mse                      0.002          0.0484         0.0015
  num_iters                   55.69          86.74          50.00
  coll_intensity              0.0464         0.0414         0.0374

B. Learning on complex distributions

In this experiment, we show that if the planner's parameters are fixed, performance can be highly sensitive to the distribution of obstacles in the environment. However, if a function can be learned to set the parameters based on the current planning problem, this can help the planner achieve uniformly good performance across different obstacle distributions. We construct a hybrid dataset which is a mixture of two distinct distributions of obstacles, forest and tarpit, as shown in Fig. 3. We use a test set of 150 randomly sampled environments from this mixed dataset and further subdivide it into two sets for each of the constituent distributions (roughly equal in proportion). We then hand-tuned parameters for GPMP2 to find the best covariances for the individual distributions and compared them against dGPMP2 on three different test sets: two for the individual distributions and one for a mixed set (a roughly equal split of the two distributions). Since there is no formal mathematical procedure for tuning parameters or even well-known heuristics, we rely on a manual line-search. Although this can likely be automated to find the best static covariances for one given environment distribution, it is not practical when the environment distribution changes or when the parameters need to be adapted based on the location of the robot in the environment or the time-step on the trajectory.

The results in Table I show that for GPMP2 the best parameters on one distribution perform poorly on the other distribution in terms of success, although their performances on the mixed dataset are similar. Conversely, dGPMP2 has uniform and consistent performance across both distributions even though it is only trained on the mixed dataset. This demonstrates that dGPMP2 does not require manual tuning for every distribution of planning environments, but can automatically predict the covariances to use based on the current trajectory and environment, as can be seen in Fig. 3. Additionally, dGPMP2 has the lowest gp_mse on the mixed dataset, meaning the trajectories produced are still smooth. dGPMP2 also converges in fewer iterations than GPMP2 due to the covariance being more expressive and varying over iterations.

TABLE II: Performance of dGPMP2 with velocity constraints on different combinations of training and testing. Mild constraints are vxmax = 1.5m/s, vymax = 1.5m/s, and maxtime = 15s; tight constraints are vxmax = 1.0m/s, vymax = 1.0m/s, and maxtime = 10s for the same start and goal. maxtime is the maximum time allowed for the trajectory.

  Training condition       Mild      Mild      Tight
  Testing condition        Mild      Tight     Tight
  success                  96        96        98.12
  constraint_violation     0.0022    0.104     0.097

Limitations: Since BPTT is known to have issues with exploding and vanishing gradients for long sequences, we use a small number of iterations (T = 10) during training, which prevents the learner from sufficiently exploring during training. The network architecture is a simple feed-forward network and does not have any memory, and hence the learner does not learn to escape local minima very well. We believe that these issues can be addressed in the future using learning techniques such as Truncated Backpropagation Through Time (TBPTT) [26], policy gradient methods [27], [28], and recurrent networks such as LSTMs [29].

C. Planning with velocity constraints

We show that our learning method can explicitly incorporate planning constraints by including velocity limit factors into the optimization. We use a hinge loss similar to the obstacle cost to bound the robot velocities ẋ and ẏ and set the covariance to a low value, Kv = 10⁻⁴, analogous to the joint limit factors in [3]. We evaluate the average constraint_violation on a dataset with multiple randomly placed obstacles and study the effect of incorporating constraints during training. Table II shows a comparison between dGPMP2 trained with mild constraints and tested on problems with mild and tight constraints versus dGPMP2 trained using tight constraints and tested on problems with tight constraints (details in the Table II caption). We see that, by incorporating tight constraints during training, dGPMP2 can learn to handle tight constraints while avoiding obstacles. This illustrates that dGPMP2 can successfully incorporate constraints within its structure, and that the method can learn to plan while respecting user-defined planning constraints.
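A velocity-limit factor of this form could be sketched as follows (a minimal illustration consistent with the hinge formulation described above; the exact factor used in the experiments may differ):

```python
import torch

def velocity_limit_cost(vel, v_max):
    """Hinge cost that activates when a velocity component exceeds its limit,
    mirroring the obstacle hinge loss of Eq. (12); paired with a small fixed
    covariance (e.g. K_v = 1e-4) so the factor acts as a soft constraint.

    vel   : (N, 2) tensor of velocities [xdot, ydot] along the trajectory
    v_max : (2,) tensor of per-axis limits, e.g. 1.0 m/s each for the tight setting
    """
    return torch.clamp(torch.abs(vel) - v_max, min=0.0)
```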

VI. DISCUSSION

In this work, we developed dGPMP2, a novel differentiable motion planning algorithm, by reformulating GPMP2 as a differentiable computational graph. Our method learned to predict objective function parameters as part of the differentiable planner and demonstrated competitive performance against planning with fixed, hand-tuned parameters. Our experimental results show that this strategy is an effective way to leverage experience to further improve upon traditional state-of-the-art motion planning algorithms. We currently limited our experiments to only point robots in 2D environments to investigate the properties of the algorithm in a controlled setting. However, since the formulation was built on the GPMP2 planner, we believe that it can be extended to handle more complicated motion planning problems including articulated robots in 3D workspaces.


REFERENCES

[1] N. Ratliff, M. Zucker, J. A. Bagnell, and S. Srinivasa, "CHOMP: Gradient optimization techniques for efficient motion planning," in ICRA, 2009.

[2] J. Schulman, J. Ho, A. Lee, I. Awwal, H. Bradlow, and P. Abbeel, "Finding locally optimal, collision-free trajectories with sequential convex optimization," in RSS, 2013.

[3] M. Mukadam, J. Dong, X. Yan, F. Dellaert, and B. Boots, "Continuous-time Gaussian process motion planning via probabilistic inference," The International Journal of Robotics Research (IJRR), vol. 37, no. 11, pp. 1319–1340, 2018.

[4] J. Dong, M. Mukadam, F. Dellaert, and B. Boots, "Motion planning as probabilistic inference using Gaussian processes and factor graphs," in Proceedings of Robotics: Science and Systems (RSS-2016), 2016.

[5] A. Faust, K. Oslund, O. Ramirez, A. Francis, L. Tapia, M. Fiser, and J. Davidson, "PRM-RL: Long-range robotic navigation tasks by combining reinforcement learning and sampling-based planning," in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 5113–5120.

[6] N. D. Ratliff, D. Silver, and J. A. Bagnell, "Learning to search: Functional gradient techniques for imitation learning," Autonomous Robots, vol. 27, no. 1, pp. 25–53, 2009.

[7] M. Bhardwaj, S. Choudhury, and S. Scherer, "Learning heuristic search via imitation," in Conference on Robot Learning, 2017, pp. 271–280.

[8] M. Bhardwaj, S. Choudhury, B. Boots, and S. Srinivasa, "Leveraging experience in lazy search," 2019.

[9] A. Tamar, Y. Wu, G. Thomas, S. Levine, and P. Abbeel, "Value iteration networks," in Advances in Neural Information Processing Systems, 2016, pp. 2154–2162.

[10] A. Srinivas, A. Jabri, P. Abbeel, S. Levine, and C. Finn, "Universal planning networks: Learning generalizable representations for visuomotor control," in International Conference on Machine Learning, 2018, pp. 4739–4748.

[11] R. Clark, M. Bloesch, J. Czarnowski, S. Leutenegger, and A. J. Davison, "Learning to solve nonlinear least squares for monocular stereo," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 284–299.

[12] M. Andrychowicz, M. Denil, S. Gómez, M. W. Hoffman, D. Pfau, T. Schaul, B. Shillingford, and N. de Freitas, "Learning to learn by gradient descent by gradient descent," in Advances in Neural Information Processing Systems 29, 2016, pp. 3981–3989.

[13] T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud, "Neural ordinary differential equations," in Advances in Neural Information Processing Systems, 2018, pp. 6572–6583.

[14] B. Amos and J. Z. Kolter, "OptNet: Differentiable optimization as a layer in neural networks," in Proceedings of the 34th International Conference on Machine Learning - Volume 70. JMLR.org, 2017, pp. 136–145.

[15] B. Amos, I. Jimenez, J. Sacks, B. Boots, and J. Z. Kolter, "Differentiable MPC for end-to-end planning and control," in Advances in Neural Information Processing Systems, 2018, pp. 8299–8310.

[16] A. Byravan, F. Leeb, F. Meier, and D. Fox, "SE3-Pose-Nets: Structured deep dynamics models for visuomotor control," in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 1–8.

[17] J. Cano, Y. Yang, B. Bodin, V. Nagarajan, and M. O'Boyle, "Automatic parameter tuning of motion planning algorithms," in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018, pp. 8103–8109.

[18] R. Burger, M. Bharatheesha, M. van Eert, and R. Babuška, "Automated tuning and configuration of path planning algorithms," in 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017, pp. 4371–4376.

[19] T. D. Barfoot, C. H. Tong, and S. Särkkä, "Batch continuous-time trajectory estimation as exactly sparse Gaussian process regression," in Robotics: Science and Systems (RSS), 2014.

[20] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, "Automatic differentiation in PyTorch," 2017.

[21] F. Dellaert and M. Kaess, "Square root SAM: Simultaneous localization and mapping via square root information smoothing," The International Journal of Robotics Research, vol. 25, no. 12, pp. 1181–1203, 2006.

[22] S. Karaman and E. Frazzoli, "Sampling-based algorithms for optimal motion planning," The International Journal of Robotics Research, vol. 30, no. 7, pp. 846–894, 2011.

[23] M. A. Rana, M. Mukadam, S. R. Ahmadzadeh, S. Chernova, and B. Boots, "Towards robust skill generalization: Unifying learning from demonstration and motion planning," in Proceedings of the 2017 Conference on Robot Learning (CoRL), 2017.

[24] M. Zucker, N. Ratliff, A. D. Dragan, M. Pivtoraiko, M. Klingensmith, C. M. Dellin, J. A. Bagnell, and S. S. Srinivasa, "CHOMP: Covariant Hamiltonian optimization for motion planning," The International Journal of Robotics Research, vol. 32, no. 9-10, pp. 1164–1193, 2013.

[25] D. E. Rumelhart, G. E. Hinton, R. J. Williams et al., "Learning representations by back-propagating errors," Cognitive Modeling, vol. 5, no. 3, p. 1, 1988.

[26] R. J. Williams and J. Peng, "An efficient gradient-based algorithm for on-line training of recurrent network trajectories," Neural Computation, vol. 2, no. 4, pp. 490–501, 1990.

[27] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, "Trust region policy optimization," in ICML, 2015.

[28] J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, "High-dimensional continuous control using generalized advantage estimation," arXiv preprint arXiv:1506.02438, 2015.

[29] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, Nov. 1997.