Space-Indexed Dynamic Programming: Learning to
Follow Trajectories
J. Zico Kolter, Adam Coates, Andrew Y. Ng, Yi Gu, Charles DuHadway
Computer Science Department, Stanford University
July 2008, ICML
Outline
• Reinforcement Learning and Following Trajectories
• Space-indexed Dynamical Systems and Space-indexed Dynamic Programming
• Experimental Results
Reinforcement Learning and Following Trajectories
Trajectory Following
• Consider task of following trajectory in a vehicle such as a car or helicopter
• State space too large to discretize, can’t apply tabular RL/dynamic programming
• Dynamic programming algorithms w/ non-stationary policies seem well-suited to task
  – Policy Search by Dynamic Programming (Bagnell et al.), Differential Dynamic Programming (Jacobson and Mayne)
Dynamic Programming
Divide control task into discrete time steps: t = 1, t = 2, t = 3, t = 4, t = 5, …
Proceeding backwards in time, learn policies for
t = T, T-1, …, 2, 1
Learned policies, in order: $\pi_5$, $\pi_4$, $\pi_3$, $\pi_2$, $\pi_1$
Key Advantage: Policies are local (only need to perform well over small portion of state space)
Problems with Dynamic Programming
Problem #1: Policies from traditional dynamic
programming algorithms are time-indexed
Suppose we learned policy $\pi_5$ assuming a given distribution over states at t = 5
But, due to natural stochasticity of the environment, the car is actually somewhere else at t = 5
Resulting policy will perform very poorly
Partial Solution: Re-indexing
Execute policy closest to current location, regardless of time
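Re-indexing can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the waypoint list, the stub policies, and the function names `nearest_policy_index` and `reindexed_action` are all assumptions made for the example.

```python
import math

def nearest_policy_index(position, waypoints):
    """Index of the trajectory waypoint closest to `position` (Euclidean distance)."""
    return min(range(len(waypoints)),
               key=lambda i: math.dist(position, waypoints[i]))

def reindexed_action(state, position, waypoints, policies):
    """Choose the policy by location rather than by elapsed time."""
    i = nearest_policy_index(position, waypoints)
    return policies[i](state)

# Hypothetical straight-line trajectory with one policy per waypoint.
waypoints = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
# Stub policies (assumption): each one just reports its own index.
policies = [lambda s, k=k: k for k in range(len(waypoints))]

# The clock might say t = 3, but the car has only reached x = 1.2,
# so the policy learned for waypoint 1 is executed.
chosen = reindexed_action(state=None, position=(1.2, 0.1),
                          waypoints=waypoints, policies=policies)
```

Note that each policy was still trained on the state distribution for its own time step, which is why this is only a partial solution.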
Problem #2: Uncertainty over future states makes it hard to learn any good policy
Due to stochasticity, large uncertainty over states in distant future
DP algorithms require learning policy that performs well over entire distribution
[Figure: distribution over states at time t = 5]
Space-Indexed Dynamic Programming
• Basic idea of Space-Indexed Dynamic Programming (SIDP):
Perform DP with respect to space indices (planes tangent to trajectory)
Space-Indexed Dynamical Systems and Dynamic Programming
Difficulty with SIDP
• No guarantee that taking single action will move to next plane along trajectory
• Introduce notion of space-indexed dynamical system
Time-Indexed Dynamical System
• Creating time-indexed dynamical systems:
$\dot{s} = f(s, u)$
where $s$ is the current state, $u$ is the control action, and $\dot{s}$ is the time derivative of the state
Euler integration:
$s_{t+\Delta t} = s_t + f(s_t, u_t)\,\Delta t$
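The Euler update can be sketched as follows. The kinematic model `car_dynamics` (state = x, y, heading; control = turn rate; unit speed) is a toy assumption for illustration and is not taken from the slides.

```python
import math

def euler_step(f, s, u, dt):
    """One Euler step: s_{t+dt} = s_t + f(s_t, u_t) * dt."""
    s_dot = f(s, u)
    return tuple(si + di * dt for si, di in zip(s, s_dot))

def car_dynamics(s, u):
    """Toy kinematic car (assumption): unit forward speed, u is the turn rate."""
    x, y, theta = s
    return (math.cos(theta), math.sin(theta), u)

# Drive straight for 10 steps of 0.1 s each.
s = (0.0, 0.0, 0.0)
for _ in range(10):
    s = euler_step(car_dynamics, s, u=0.0, dt=0.1)
```

With a fixed dt, the index of each state is a time step, which is what makes the resulting system time-indexed.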
Space-Indexed Dynamical Systems
• Creating space-indexed dynamical systems:
• Simulate forward until the vehicle hits the next tangent plane
[Figure: tangent planes at space index d and space index d+1]
$\dot{s} = f(s, u)$
$s_{d+1} = s_d + f(s_d, u_d)\,\Delta t(s_d, u_d)$
$\Delta t(s, u) = \dfrac{(\dot{s}^\star_{d+1})^T (s^\star_{d+1} - s)}{(\dot{s}^\star_{d+1})^T \dot{s}}$
Here $s^\star_{d+1}$ is the point where the target trajectory crosses plane d+1 and $\dot{s}^\star_{d+1}$ is the trajectory direction there; $\Delta t$ is the value for which the Euler step lands on that plane, i.e. $(\dot{s}^\star_{d+1})^T (s + \dot{s}\,\Delta t - s^\star_{d+1}) = 0$
(Positive solution exists as long as controller makes some forward progress)
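A space-indexed step can be sketched by solving for the time at which a straight-line (Euler) step from the current state reaches the next plane, taken here to pass through the target point `s_star_next` with normal `s_star_dot_next`. The function names and the two-dimensional toy dynamics are assumptions for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))

def delta_t(s, s_dot, s_star_next, s_star_dot_next):
    """Time for the step s + s_dot * dt to land on the plane through
    s_star_next whose normal is s_star_dot_next."""
    denom = dot(s_star_dot_next, s_dot)
    if denom <= 0.0:
        # The controller is not moving toward the next plane.
        raise ValueError("no forward progress toward the next plane")
    offset = tuple(a - b for a, b in zip(s_star_next, s))
    return dot(s_star_dot_next, offset) / denom

def space_indexed_step(f, s, u, s_star_next, s_star_dot_next):
    """Advance from plane d to plane d+1; returns (next state, elapsed time)."""
    s_dot = f(s, u)
    dt = delta_t(s, s_dot, s_star_next, s_star_dot_next)
    return tuple(si + di * dt for si, di in zip(s, s_dot)), dt

def toy_dynamics(s, u):
    """Toy model (assumption): unit forward velocity, lateral velocity u."""
    return (1.0, u)

# Start slightly off-track; the next plane is x = 2 with normal along x.
s_next, dt = space_indexed_step(toy_dynamics, (0.0, 0.2), 0.5,
                                s_star_next=(2.0, 0.0),
                                s_star_dot_next=(1.0, 0.0))
```

In this example the step takes 2 time units and the resulting state lies exactly on the plane x = 2. The elapsed time varies with the state and action, while the spatial index always advances by one.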
Space-Indexed Dynamical Systems
• Result is a dynamical system indexed by spatial-index variable d rather than time
• Space-indexed dynamic programming runs DP directly on this system
$s_{d+1} = s_d + f(s_d, u_d)\,\Delta t(s_d, u_d)$
Space-Indexed Dynamic Programming
Divide trajectory into discrete space planes: d = 1, d = 2, d = 3, d = 4, d = 5
Proceeding backwards, learn policies for
d = D, D-1, …, 2, 1
Learned policies, in order: $\pi_5$, $\pi_4$, $\pi_3$, $\pi_2$, $\pi_1$
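The backward pass over spatial indices can be sketched with a deliberately simplified stand-in for policy learning. Here each per-plane policy is a single scalar gain chosen by enumeration, whereas PSDP would train a classifier at each index. The dynamics, cost, candidate gains, and sample states are all assumptions, and indices run 0 … D-1 rather than 1 … D.

```python
def step(y, u):
    """Toy space-indexed dynamics (assumption): lateral offset at the next plane."""
    return y + u

def cost(y, u):
    """Toy cost (assumption): squared offset plus a small control penalty."""
    return y * y + 0.1 * u * u

def rollout_cost(y, d, gains, D):
    """Cost of following policies d, d+1, ..., D-1 from offset y at plane d."""
    total = 0.0
    for j in range(d, D):
        u = -gains[j] * y
        y = step(y, u)
        total += cost(y, u)
    return total

def space_indexed_dp(D, candidates, samples):
    """Learn one gain per plane, proceeding backwards from the last plane."""
    gains = [None] * D
    for d in range(D - 1, -1, -1):
        best = None
        for k in candidates:
            gains[d] = k  # try candidate k at plane d, later planes are fixed
            avg = sum(rollout_cost(y, d, gains, D) for y in samples) / len(samples)
            if best is None or avg < best[0]:
                best = (avg, k)
        gains[d] = best[1]
    return gains

gains = space_indexed_dp(D=3, candidates=[0.0, 0.5, 0.9, 1.0],
                         samples=[-1.0, -0.5, 0.5, 1.0])
```

Each policy is evaluated only on states sampled at its own plane, using the policies already learned for later planes to estimate the cost-to-go.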
Problems with Dynamic Programming
Problem #1: Policies from traditional dynamic
programming algorithms are time-indexed
Space-Indexed Dynamic Programming
Time-indexed DP: can execute policy learned for different location
Space-indexed DP: always executes policy based on current spatial index
Problems with Dynamic Programming
Problem #2: Uncertainty over future states makes it hard to
learn any good policy
Space-Indexed Dynamic Programming
Time-indexed DP: wide distribution over future states
Space-indexed DP: much tighter distribution over future states
[Figure: distribution over states at time t = 5 (left) and at space index d = 5 (right)]
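The difference in spread can be illustrated with a small Monte Carlo sketch. The toy car model (noisy forward speed, noisy lateral drift), the noise levels, and the choice of step 50 and plane x = 5 are assumptions for illustration, not the experimental setup from the talk.

```python
import random
import statistics

def simulate(rng, n_steps=200, dt=0.1):
    """Toy car (assumption): noisy forward speed and lateral drift; returns (x, y) per step."""
    x = y = 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        speed = max(0.1, rng.gauss(1.0, 0.5))  # always some forward progress
        x += speed * dt
        y += rng.gauss(0.0, 0.02)
        path.append((x, y))
    return path

def state_at_plane(path, x_plane):
    """State where the path first crosses the plane x = x_plane (linear interpolation)."""
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        if x0 <= x_plane < x1:
            w = (x_plane - x0) / (x1 - x0)
            return (x_plane, y0 + w * (y1 - y0))
    raise ValueError("path never reaches the plane")

def spread(states):
    """Total variance of a set of (x, y) states."""
    xs, ys = zip(*states)
    return statistics.pvariance(xs) + statistics.pvariance(ys)

rng = random.Random(0)
paths = [simulate(rng) for _ in range(500)]
spread_time = spread([p[50] for p in paths])                     # fixed time step
spread_space = spread([state_at_plane(p, 5.0) for p in paths])   # fixed plane
```

Under this toy model, indexing by plane removes all along-track variation, so only the lateral spread remains and `spread_space` comes out smaller than `spread_time`.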
Experiments
Experimental Domain
• Task: following race track trajectory in RC car with randomly placed obstacles
Experimental Setup
• Implemented space-indexed version of PSDP algorithm
  – Policy chooses steering angle using SVM classifier (constant velocity)
  – Used simple textbook model simulator of car dynamics to learn policy
• Evaluated three PSDP variants: time-indexed, time-indexed with re-indexing, and space-indexed
Time-Indexed PSDP
Time-Indexed PSDP w/ Re-indexing
Space-Indexed PSDP
Empirical Evaluation
Time-indexed PSDP: Cost: Infinite (no trajectory succeeds)
Time-indexed PSDP with Re-indexing: Cost: 59.74
Space-indexed PSDP: Cost: 49.32
Additional Experiments
• In the paper: additional experiments on the Stanford Grand Challenge Car using space-indexed DDP, and on a simulated helicopter domain using space-indexed PSDP
Related Work
• Reinforcement learning / dynamic programming: Bagnell et al., 2004; Jacobson and Mayne, 1970; Lagoudakis and Parr, 2003; Langford and Zadrozny, 2005
• Differential Dynamic Programming: Atkeson, 1994; Tassa et al., 2008
• Gain Scheduling, Model Predictive Control: Leith and Leithead, 2000; Garcia et al., 1989
Summary
• Trajectory following uses non-stationary policies, but traditional DP / RL algorithms suffer because they are time-indexed
• In this paper, we introduce the notions of a space-indexed dynamical system, and space-indexed dynamic programming
• Demonstrated usefulness of these methods on real-world control tasks.
Thank you!
Videos available online at http://cs.stanford.edu/~kolter/icml08videos