
  • Computational Sensorimotor Learning

    Pulkit Agrawal

    Electrical Engineering and Computer Sciences University of California at Berkeley

    Technical Report No. UCB/EECS-2018-133 http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-133.html

    September 23, 2018

  • Copyright © 2018, by the author(s). All rights reserved.

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

  • Computational Sensorimotor Learning

    By

    Pulkit Agrawal

    A dissertation submitted in partial satisfaction of the

    requirements for the degree of

    Doctor of Philosophy

    in

    Computer Science

    in the

    Graduate Division

    of the

    University of California, Berkeley

    Committee in charge:

    Professor Jitendra Malik, Chair
    Professor Alexei A. Efros
    Professor Jack Gallant

    Summer 2018

  • Computational Sensorimotor Learning

    Copyright 2018 by

    Pulkit Agrawal


    Abstract

    Computational Sensorimotor Learning

    by

    Pulkit Agrawal

    Doctor of Philosophy in Computer Science

    University of California, Berkeley

    Professor Jitendra Malik, Chair

    Our fascination with human intelligence has historically led AI research to directly build autonomous agents that solve intellectually challenging problems such as chess and Go. The same philosophy of direct optimization has percolated into the design of systems for image and speech recognition and language translation. But today's AI systems are brittle and very different from humans in how they solve problems, as evidenced by their severely limited ability to adapt or generalize. Evolution took a very long time to develop the sensorimotor skills of an ape (approximately 3.5 billion years) and a comparatively short time to develop apes into present-day humans (approximately 18 million years) who can reason and make use of language. There is probably a lesson to be learned here: by the time organisms with simple sensorimotor skills had evolved, they had likely also developed the apparatus needed to support more complex forms of intelligence later on. In other words, by spending a long time solving simple problems, evolution prepared agents for more complex ones. The same principle is probably at play when humans rely on what they already know to find solutions to new challenges. The principle of incrementally increasing complexity, as evidenced in evolution, child development, and the way humans learn, may therefore be vital to building human-like intelligence.

    The currently prominent theory in developmental psychology suggests that seemingly frivolous play is a mechanism by which infants conduct experiments to incrementally increase their knowledge. Infants' experiments, such as throwing objects, hitting two objects against each other, or putting them in their mouths, help them understand how forces affect objects, how objects feel, how different materials interact, and so on. In this way, play prepares infants for later life by laying down the foundation of a high-level framework of experimentation for quickly understanding how things work in new (and potentially non-physical or abstract) environments and for constructing goal-directed plans.

    I have used ideas from infant development to build mechanisms that allow robots to learn about their environment through experimentation. Results show that such learning allows an agent to adapt to new environments and to reuse its past knowledge to quickly succeed at novel tasks.
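    The experimentation-driven learning summarized above (developed in Chapter 6 as prediction error as a curiosity reward) can be caricatured in a few lines. The following is a toy sketch only: the tabular world, the greedy least-visited policy, and the running-average forward model are illustrative assumptions, not the models built in this dissertation. The agent seeks out transitions its forward model predicts poorly, and its intrinsic reward vanishes once the model becomes accurate.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    N_STATES, N_ACTIONS = 5, 2
    # Hidden deterministic dynamics the agent must discover by experimenting.
    true_dynamics = rng.integers(0, N_STATES, size=(N_STATES, N_ACTIONS))

    # Learned forward model: running estimate of next state for each (s, a).
    predicted_next = np.zeros((N_STATES, N_ACTIONS))
    visit_count = np.zeros((N_STATES, N_ACTIONS))

    def curiosity_reward(s, a, s_next):
        """Intrinsic reward = forward-model prediction error for (s, a)."""
        return abs(s_next - predicted_next[s, a])

    state = 0
    for step in range(200):
        # Experiment: prefer the least-tried action, whose outcome is
        # the most uncertain under the learned forward model.
        action = int(np.argmin(visit_count[state]))
        next_state = int(true_dynamics[state, action])

        r_int = curiosity_reward(state, action, next_state)  # high when surprised

        # Update the forward model toward the observed outcome.
        visit_count[state, action] += 1
        lr = 1.0 / visit_count[state, action]
        predicted_next[state, action] += lr * (next_state - predicted_next[state, action])

        state = next_state

    # After enough experimentation the model is accurate, so the
    # curiosity reward for a familiar transition has decayed to zero.
    print(curiosity_reward(state, 0, int(true_dynamics[state, 0])))
    ```

    The key design choice mirrored here is that "boredom" is automatic: curiosity is not a fixed exploration bonus but the agent's own prediction error, so it fades exactly where the agent has already learned the dynamics.
    
    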


    To my parents – Sadhana & Ratan Agrawal


    Contents

    Acknowledgments vi

    1 Introduction . . . 1
        1.1 Today's Artificial Intelligence . . . 1
        1.2 Problem Formulation . . . 3
            1.2.1 Task Communication . . . 4
        1.3 Learning Sensorimotor Behavior . . . 10
            1.3.1 Reinforcement Learning (RL) . . . 11
            1.3.2 Learning from Demonstration/Imitation Learning . . . 13
        1.4 Classical Model Based Control . . . 13
            1.4.1 System Identification . . . 14
            1.4.2 State Estimation . . . 16
            1.4.3 Is the engineering wisdom of modularization the way to go? . . . 16
        1.5 Core problem of Artificial General Intelligence . . . 18
        1.6 Summary of the Proposed Solution . . . 19

    2 Learning to See by Moving . . . 22
        2.1 Related Work . . . 24
        2.2 A Simple Model of Motion-based Learning . . . 25
            2.2.1 Two Stream Architecture . . . 26
            2.2.2 Shorthand for CNN architectures . . . 26
            2.2.3 Slow Feature Analysis (SFA) Baseline . . . 26
            2.2.4 Proof of Concept using MNIST . . . 28
        2.3 Learning Visual Features From Egomotion in Natural Environments . . . 29
            2.3.1 KITTI Dataset . . . 30
            2.3.2 SF Dataset . . . 30
            2.3.3 Network Architecture . . . 31
        2.4 Evaluating Motion-based Learning . . . 32
            2.4.1 Scene Recognition . . . 32
            2.4.2 Object Recognition . . . 33
            2.4.3 Intra-Class Keypoint Matching . . . 34
            2.4.4 Visual Odometry . . . 35
        2.5 Discussion . . . 35

    3 A Model for Intuitive Physics . . . 38
        3.1 Data . . . 40
        3.2 Method . . . 40
            3.2.1 Model . . . 42
            3.2.2 Evaluation Procedure . . . 43
            3.2.3 Blob Model . . . 44
        3.3 Results . . . 45
            3.3.1 Forward model regularizes the inverse model . . . 46
        3.4 Related Work . . . 47
        3.5 Discussion . . . 49

    4 Learning from Experts . . . 53
        4.1 A Framework for Learning by Observation . . . 54
            4.1.1 Learning a Model to Imitate . . . 55
        4.2 Imitating Visual Demonstrations . . . 56
            4.2.1 Goal Recognizer . . . 57
        4.3 Evaluation Procedure . . . 57
            4.3.1 Baseline . . . 58
        4.4 Results . . . 58
            4.4.1 Importance of Imitation . . . 59
            4.4.2 Generalization to other ropes . . . 60
        4.5 Expert Guided Exploration . . . 61
        4.6 Related Work . . . 62

    5 Revisiting Forward and Inverse Models . . . 64
        5.1 Forward Consistency Loss . . . 64
        5.2 Experiments . . . 67
            5.2.1 Ablations and Baselines . . . 68
            5.2.2 3D Navigation in VizDoom . . . 71
        5.3 Conclusions . . . 73

    6 Exploration . . . 75
        6.1 Curiosity-Driven Exploration . . . 77
            6.1.1 Prediction error as curiosity reward . . . 78
            6.1.2 Self-supervised prediction for exploration . . . 79
        6.2 Experimental Setup . . . 80
        6.3 Experiments . . . 82
            6.3.1 Sparse Extrinsic Reward Setting . . . 83
            6.3.2 No Reward Setting . . . 86
            6.3.3 Generalization to Novel Scenarios . . . 87
        6.4 Related Work . . . 90
        6.5 Discussion . . . 90

    7 Initial State . . . 93
        7.1 Investigating Human Priors for Playing Games . . . 93
        7.2 Method . . . 96
        7.3 Quantifying the importance of object priors . . . 96
            7.3.1 Semantics . . . 97
    7