Page 1
580.691 Learning Theory
Reza Shadmehr
Optimal feedback control
stochastic feedback control with and without additive noise
Page 2

Starting at state $x^{(0)}$, the system takes a sequence of actions $u^{(0)}, u^{(1)}, \ldots, u^{(p-1)}$, producing states $x^{(1)}, x^{(2)}, \ldots, x^{(p)}$ and observations $y^{(1)}, y^{(2)}, \ldots, y^{(p)}$:

$$x^{(k+1)} = A x^{(k)} + B u^{(k)}, \qquad y^{(k)} = C x^{(k)}$$

Cost per step:

$$\alpha^{(k)} = u^{(k)T} L u^{(k)} + y^{(k)T} T^{(k)} y^{(k)}$$

Cost to minimize:

$$J = \sum_{k=0}^{p} \left( u^{(k)T} L u^{(k)} + y^{(k)T} T^{(k)} y^{(k)} \right)$$
Page 3

Start at the last time point $k=p$. The cost at $k=p$ is:

$$\alpha^{(p)} = y^{(p)T} T^{(p)} y^{(p)} + u^{(p)T} L u^{(p)} = x^{(p)T} C^T T^{(p)} C x^{(p)} + u^{(p)T} L u^{(p)}$$

To minimize this cost, set $u^{(p)} = 0$. Under this policy, the value of the state is:

$$V^{(p)} = x^{(p)T} C^T T^{(p)} C x^{(p)} = x^{(p)T} W^{(p)} x^{(p)}, \qquad W^{(p)} \equiv C^T T^{(p)} C$$
Page 4

At time point $p-1$, choose the action that minimizes the cost to go:

$$u^{(p-1)} = \arg\min_{u} \left[ y^{(p-1)T} T^{(p-1)} y^{(p-1)} + u^{(p-1)T} L u^{(p-1)} + V^{(p)} \right]$$

With $V^{(p)} = x^{(p)T} W^{(p)} x^{(p)}$ and $x^{(p)} = A x^{(p-1)} + B u^{(p-1)}$:

$$V^{(p-1)} = x^{(p-1)T} C^T T^{(p-1)} C x^{(p-1)} + u^{(p-1)T} L u^{(p-1)} + \left( A x^{(p-1)} + B u^{(p-1)} \right)^T W^{(p)} \left( A x^{(p-1)} + B u^{(p-1)} \right)$$

$$= x^{(p-1)T} \left( C^T T^{(p-1)} C + A^T W^{(p)} A \right) x^{(p-1)} + u^{(p-1)T} \left( L + B^T W^{(p)} B \right) u^{(p-1)} + 2 u^{(p-1)T} B^T W^{(p)} A x^{(p-1)}$$
Page 5

Setting the derivative with respect to $u^{(p-1)}$ to zero:

$$\frac{dV^{(p-1)}}{du^{(p-1)}} = 2 \left( L + B^T W^{(p)} B \right) u^{(p-1)} + 2 B^T W^{(p)} A x^{(p-1)} = 0$$

$$u^{(p-1)} = -\left( L + B^T W^{(p)} B \right)^{-1} B^T W^{(p)} A \, x^{(p-1)}$$

$$G^{(p-1)} \equiv \left( L + B^T W^{(p)} B \right)^{-1} B^T W^{(p)} A, \qquad u^{(p-1)} = -G^{(p-1)} x^{(p-1)}$$
Page 6

We will now show that if we choose the optimal $u$ at step $p-1$, the cost to go is once again a quadratic function of the state $x$. Substituting $u^{(p-1)} = -G^{(p-1)} x^{(p-1)}$ into

$$V^{(p-1)} = x^{(p-1)T} \left( C^T T^{(p-1)} C + A^T W^{(p)} A \right) x^{(p-1)} + u^{(p-1)T} \left( L + B^T W^{(p)} B \right) u^{(p-1)} + 2 u^{(p-1)T} B^T W^{(p)} A x^{(p-1)}$$

the middle term, $x^{(p-1)T} G^{(p-1)T} \left( L + B^T W^{(p)} B \right) G^{(p-1)} x^{(p-1)}$, can be simplified (using the definition of $G^{(p-1)}$) to:

$$x^{(p-1)T} A^T W^{(p)} B G^{(p-1)} x^{(p-1)}$$

Because the last term is a scalar, it can be written as:

$$-2 \, x^{(p-1)T} A^T W^{(p)} B G^{(p-1)} x^{(p-1)}$$

Adding the terms gives:

$$V^{(p-1)} = x^{(p-1)T} \left( C^T T^{(p-1)} C + A^T W^{(p)} A - A^T W^{(p)} B G^{(p-1)} \right) x^{(p-1)} = x^{(p-1)T} W^{(p-1)} x^{(p-1)}$$

$$W^{(p-1)} \equiv C^T T^{(p-1)} C + A^T W^{(p)} \left( A - B G^{(p-1)} \right)$$
Page 7

We just showed that for the last time step, the cost to go is a quadratic function of $x$:

$$V^{(p)} = x^{(p)T} W^{(p)} x^{(p)}$$

The optimal $u$ at time point $p-1$ minimizes the cost to go $J^{(p-1)}$:

$$G^{(p-1)} = \left( L + B^T W^{(p)} B \right)^{-1} B^T W^{(p)} A, \qquad u^{(p-1)} = -G^{(p-1)} x^{(p-1)}$$

If at time point $p-1$ we indeed carry out this optimal policy, then the cost to go at $p-1$ is again a quadratic function of $x$:

$$V^{(p-1)} = x^{(p-1)T} W^{(p-1)} x^{(p-1)}, \qquad W^{(p-1)} = C^T T^{(p-1)} C + A^T W^{(p)} \left( A - B G^{(p-1)} \right)$$

If we now repeat the process and find the optimal $u$ for time point $p-2$, it will be:

$$G^{(p-2)} = \left( L + B^T W^{(p-1)} B \right)^{-1} B^T W^{(p-1)} A, \qquad u^{(p-2)} = -G^{(p-2)} x^{(p-2)}$$

And if we apply the optimal $u$ at time points $p-2$ and $p-1$, then the cost to go at time point $p-2$ will be a quadratic function of $x$:

$$J^{(p-2)} = x^{(p-2)T} W^{(p-2)} x^{(p-2)}, \qquad W^{(p-2)} = C^T T^{(p-2)} C + A^T W^{(p-1)} \left( A - B G^{(p-2)} \right)$$

So in general, if for time points $t+1, \ldots, p$ we have calculated the optimal policy for $u$, the above gives us a recipe to compute the optimal policy for time point $t$.
Page 8

Summary: optimal feedback control. The system and the cost are:

$$x^{(k+1)} = A x^{(k)} + B u^{(k)}, \qquad y^{(k)} = C x^{(k)}$$

$$J^{(0)} = \sum_{k=0}^{p} \left( y^{(k)T} T^{(k)} y^{(k)} + u^{(k)T} L u^{(k)} \right)$$

Backward recursion, starting at the last time point:

$$W^{(p)} = C^T T^{(p)} C, \qquad V^{(p)} = x^{(p)T} W^{(p)} x^{(p)}$$

$$G^{(p-1)} = \left( L + B^T W^{(p)} B \right)^{-1} B^T W^{(p)} A, \qquad u^{(p-1)} = -G^{(p-1)} x^{(p-1)}$$

$$V^{(p-1)} = x^{(p-1)T} W^{(p-1)} x^{(p-1)}, \qquad W^{(p-1)} = C^T T^{(p-1)} C + A^T W^{(p)} \left( A - B G^{(p-1)} \right)$$

$$\vdots$$

$$G^{(0)} = \left( L + B^T W^{(1)} B \right)^{-1} B^T W^{(1)} A, \qquad u^{(0)} = -G^{(0)} x^{(0)}$$

$$J^{(0)} = x^{(0)T} W^{(0)} x^{(0)}, \qquad W^{(0)} = C^T T^{(0)} C + A^T W^{(1)} \left( A - B G^{(0)} \right)$$

(The slide also shows a timeline of the states $x^{(0)}, x^{(1)}, \ldots, x^{(p)}$, the actions $u^{(0)}, \ldots, u^{(p-1)}$, and the observations $y^{(1)}, \ldots, y^{(p)}$, with the cost to go accumulating along it.)

The procedure is to compute the matrices $W$ and $G$ from the last time point to the first time point.
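The backward recursion above can be sketched in a few lines of code. This is a minimal sketch, not the lecture's own code; the function name and the representation of the per-step output costs as a list `T_seq` are my assumptions.

```python
import numpy as np

def lqr_backward(A, B, C, T_seq, L, p):
    """Compute the feedback gains G^(k) and value matrices W^(k) by
    iterating from the last time point p back to the first."""
    W = [None] * (p + 1)
    G = [None] * p
    W[p] = C.T @ T_seq[p] @ C          # W^(p) = C' T^(p) C
    for k in range(p - 1, -1, -1):
        # G^(k) = (L + B' W^(k+1) B)^(-1) B' W^(k+1) A
        G[k] = np.linalg.solve(L + B.T @ W[k + 1] @ B, B.T @ W[k + 1] @ A)
        # W^(k) = C' T^(k) C + A' W^(k+1) (A - B G^(k))
        W[k] = C.T @ T_seq[k] @ C + A.T @ W[k + 1] @ (A - B @ G[k])
    return G, W
```

At run time the controller then simply applies $u^{(k)} = -G^{(k)} x^{(k)}$ at each step.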
Page 9

Modeling of an elbow movement. Continuous-time model of the elbow, with state $x = [x_1, x_2]^T$ (joint angle and angular velocity):

$$\dot{x}_1 = x_2, \qquad \dot{x}_2 = -\frac{k}{m} x_1 - \frac{b}{m} x_2 + \frac{u}{m}$$

$$\dot{x} = A_c x + B_c u, \qquad A_c = \begin{bmatrix} 0 & 1 \\ -k/m & -b/m \end{bmatrix}, \qquad B_c = \begin{bmatrix} 0 \\ 1/m \end{bmatrix}, \qquad y = \begin{bmatrix} 1 & 0 \end{bmatrix} x$$

with $k = 3\ \mathrm{N{\cdot}m/rad}$, $b = 0.45\ \mathrm{N{\cdot}m{\cdot}s/rad}$, and $m = 0.3\ \mathrm{kg{\cdot}m^2}$.

Discrete-time model of the elbow, with $\Delta t = 0.01\ \mathrm{sec}$:

$$x^{(k+1)} = A x^{(k)} + B u^{(k)}, \qquad A = I + A_c \Delta t, \qquad B = B_c \Delta t, \qquad C = \begin{bmatrix} 1 & 0 \end{bmatrix}$$
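The discretization on this slide can be written out directly. A sketch using the slide's parameter values; the variable names are my own.

```python
import numpy as np

# Elbow parameters from the slide.
k = 3.0      # stiffness, N·m/rad
b = 0.45     # viscosity, N·m·s/rad
m = 0.3      # inertia, kg·m²
dt = 0.01    # time step, sec

# Continuous-time model: state x = [angle, angular velocity].
Ac = np.array([[0.0,     1.0],
               [-k / m, -b / m]])
Bc = np.array([[0.0],
               [1.0 / m]])

# Euler discretization, as on the slide: A = I + Ac·Δt, B = Bc·Δt.
A = np.eye(2) + Ac * dt
B = Bc * dt
C = np.array([[1.0, 0.0]])   # we observe the joint angle
```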
Page 10

Goal: reach a target at 30 deg in 300 ms and hold it there for 100 ms. Three conditions are shown: an unperturbed movement, the arm held at the start for 200 ms, and a force pulse applied to the arm for 50 ms.

[Figure: for each condition, position (rad) and motor command traces over 0–0.4 s. Additional panels show the time courses of the cost weights: the position cost rises to about $2\times 10^6$ and the velocity cost to about $2\times 10^4$ toward the end of the movement.]
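The unperturbed reach can be simulated by combining the elbow model with the backward recursion. This is a sketch under stated assumptions: I work in coordinates relative to the target (so the goal state is zero, and the constant spring torque toward the start posture is ignored), the output cost is active only during the final 100 ms hold, and the weights (position cost $2\times 10^6$, velocity cost $2\times 10^4$, effort cost 1) are hand-picked to match the scale of the slide's cost plots, not the lecture's exact values.

```python
import numpy as np

# Discrete-time elbow model (slide parameters).
k, b, m, dt = 3.0, 0.45, 0.3, 0.01
A = np.eye(2) + np.array([[0.0, 1.0], [-k / m, -b / m]]) * dt
B = np.array([[0.0], [1.0 / m]]) * dt
C = np.eye(2)                    # penalize both position and velocity

# 30 deg in 300 ms plus a 100 ms hold -> p = 40 steps of 10 ms.
p = 40
L_eff = np.array([[1.0]])        # assumed effort cost
T_seq = [np.zeros((2, 2)) for _ in range(p + 1)]
for j in range(30, p + 1):       # cost active only during the hold
    T_seq[j] = np.diag([2e6, 2e4])

# Backward recursion for the time-varying gains G^(k).
W = C.T @ T_seq[p] @ C
G = [None] * p
for kk in range(p - 1, -1, -1):
    G[kk] = np.linalg.solve(L_eff + B.T @ W @ B, B.T @ W @ A)
    W = C.T @ T_seq[kk] @ C + A.T @ W @ (A - B @ G[kk])

# Forward simulation, starting 30 deg away from the target.
x = np.array([[-np.deg2rad(30.0)], [0.0]])
for kk in range(p):
    u = -G[kk] @ x               # optimal feedback command
    x = A @ x + B @ u
```

Because the output cost is zero until the hold period, the controller is free to choose any trajectory that arrives at the target by 300 ms at low effort.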
Page 11

Movement with a via point: we set the cost to be high at the times when we are supposed to be at the via points.

[Figure: position (0–0.8 rad), motor command, position cost $T$, and position gain $G$ over 0–0.6 s; the position cost peaks at the via-point times.]
Page 12

Stochastic optimal feedback control

Biological processes have noise. For example, neurons fire stochastically in response to a constant input, and muscles produce a stochastic force in response to constant stimulation. Here we will see how to solve the optimal control problem with additive Gaussian noise:

$$x^{(k+1)} = A x^{(k)} + B u^{(k)} + \varepsilon_x^{(k)}, \qquad \varepsilon_x^{(k)} \sim N(0, Q)$$

$$y^{(k)} = C x^{(k)} + \varepsilon_y^{(k)}, \qquad \varepsilon_y^{(k)} \sim N(0, R)$$

Cost to minimize:

$$J = E\left[ \sum_{k=0}^{p} \left( y^{(k)T} T^{(k)} y^{(k)} + u^{(k)T} L u^{(k)} \right) \right]$$
Because there is noise, we are no longer able to observe x directly. Rather, the best we can do is to estimate it. As we saw before, for a linear system with additive noise the best estimate of state is through the Kalman filter. So our goal is to determine the best command u for the current estimate of x so that we can minimize the global cost function.
Approach: as before, at the last time point p the cost is a quadratic function of x. We will find the optimal motor command for time point p-1 so that it minimizes the expected cost to go. If we perform the optimal motor command at p-1, then we will see that the cost to go at p-1 is again a quadratic function of x.
Page 13

Preliminaries: expected value of a squared random variable. In the following, we assume that $x$ is the random variable.

Scalar $x$:

$$\mathrm{var}(x) = E\left[ x^2 \right] - \left( E[x] \right)^2 \quad \Rightarrow \quad E\left[ x^2 \right] = \mathrm{var}(x) + \left( E[x] \right)^2$$

Vector $x$, with $v = x^T x$:

$$E[v] = E\left[ x^T x \right] = E\left[ \mathrm{tr}\left( x x^T \right) \right] = \mathrm{tr}\left( E\left[ x x^T \right] \right) = \mathrm{tr}\left( \mathrm{var}(x) + E[x] E[x]^T \right) = \mathrm{tr}\left( \mathrm{var}(x) \right) + E[x]^T E[x]$$

Here we used the fact that for a vector $r = [r_1, \ldots, r_n]^T$ the diagonal of $r r^T$ contains $r_1^2, \ldots, r_n^2$, so

$$r^T r = \sum_i r_i^2 = \mathrm{tr}\left( r r^T \right)$$

More generally, for a quadratic form $v = x^T A x$:

$$E[v] = \mathrm{tr}\left( A \, \mathrm{var}(x) \right) + E[x]^T A \, E[x]$$
x x x
![Page 14: 580.691 Learning Theory Reza Shadmehr Optimal feedback control stochastic feedback control with and without additive noise.](https://reader036.fdocuments.in/reader036/viewer/2022083005/56649f1b5503460f94c30310/html5/thumbnails/14.jpg)
( )
( ) ( ) ( ) ( ) ( ) ( ) ( )
( ) ( ) ( ) ( ) ( ) ( )
( ) ( )
( ) ( 1) ( 1) ( ) ( ) ( ) ( ) ( ) ( )
( ) ( )
0
2
, 2
p
p p T p p p T T T p py y
p T T p p T p T p py y y
p T p
p p p p T p p T p T p py y y
T p py
V T C T C
C T C T T C
W C T C
E V E W E T E T C
E T C
u
x y y x ε x ε
x x ε ε ε x
x x u x x ε ε ε x
ε x
( ) ( 1) ( 1) ( )
( )
( ) ( 1) ( 1) ( ) ( ) ( ) ( ) ( ) ( )
( ) ( ) ( ) ( )
cov , 0
, var var
T p p p T py x y x
py x
p p p p p p T p p py
p p T p p
E T C A B E T C
tr T C
E V tr W E W E tr T
tr W Q E W E
ε x u ε ε ε
ε ε
x x u x x x ε
x x
( )
( 1) ( 1) ( ) ( 1) ( 1) ( ) ( )
( 1) ( ) ( 1) ( 1) ( ) ( 1)
( 1) ( ) ( 1) ( ) ( )2
p
Tp p p p p p p
p T T p p p T T p p
p T T p p T p p
tr T R
A B W A B tr W Q tr T R
A W A B W B
A W B tr W Q tr T R
x u x u
x x u u
x u
Page 15

$$u^{(p-1)} = \arg\min_{u} E\left[ y^{(p-1)T} T^{(p-1)} y^{(p-1)} + u^{(p-1)T} L u^{(p-1)} + V^{(p)} \right]$$

Expanding, the expected cost to go is:

$$x^{(p-1)T} \left( C^T T^{(p-1)} C + A^T W^{(p)} A \right) x^{(p-1)} + u^{(p-1)T} \left( L + B^T W^{(p)} B \right) u^{(p-1)} + 2 x^{(p-1)T} A^T W^{(p)} B u^{(p-1)} + \mathrm{tr}\left( W^{(p)} Q \right) + \mathrm{tr}\left( T^{(p)} R \right) + E\left[ \varepsilon_y^{(p-1)T} T^{(p-1)} \varepsilon_y^{(p-1)} \right]$$

Setting the derivative with respect to $u^{(p-1)}$ to zero:

$$2 \left( L + B^T W^{(p)} B \right) u^{(p-1)} + 2 B^T W^{(p)} A x^{(p-1)} = 0$$

$$G^{(p-1)} = \left( L + B^T W^{(p)} B \right)^{-1} B^T W^{(p)} A, \qquad u^{(p-1)} = -G^{(p-1)} x^{(p-1)}$$
So we see that if our system has additive state or measurement noise, the optimal motor command remains the same as if the system had no noise at all. When we use the optimal policy at time point p-1, the cost to go at p-1 is, as before, a quadratic function of x. The matrix W at p-1 remains the same as when the system had no noise.
The problem is that we do not have x. The best that we can do is to estimate x via the Kalman filter. We do this in the next slide.
Page 16

On trial $p-1$, our best estimate of $x$ is the prior, and the motor command acts on it:

$$u^{(p-1)} = -G^{(p-1)} \hat{x}^{(p-1|p-2)}$$

We compute the prior for the current trial from the posterior of the last trial:

$$\hat{x}^{(p-1|p-2)} = A \hat{x}^{(p-2|p-2)} + B u^{(p-2)}$$

The posterior estimate, with Kalman gain $K^{(p-1)}$:

$$\hat{x}^{(p-1|p-1)} = \hat{x}^{(p-1|p-2)} + K^{(p-1)} \left( y^{(p-1)} - C \hat{x}^{(p-1|p-2)} \right)$$

Combining the two steps:

$$\hat{x}^{(p|p-1)} = A \hat{x}^{(p-1|p-2)} + B u^{(p-1)} + A K^{(p-1)} \left( y^{(p-1)} - C \hat{x}^{(p-1|p-2)} \right)$$

Writing $\hat{x}^{(k)}$ as our short-hand for the prior estimate $\hat{x}^{(k|k-1)}$:

$$\hat{x}^{(p)} = A \hat{x}^{(p-1)} + B u^{(p-1)} + A K^{(p-1)} \left( y^{(p-1)} - C \hat{x}^{(p-1)} \right), \qquad u^{(p-1)} = -G^{(p-1)} \hat{x}^{(p-1)}$$

Although the noise in the system does not affect the gain $G$, the estimate of $x$ is of course affected by the noise, because the Kalman gain is influenced by it.
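Putting the pieces together, the closed loop alternates a Kalman update with the noise-free gain $G$. The scalar sketch below is illustrative only; all numerical values (noise variances, cost weights, horizon) are my assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
A, B, C = 1.0, 1.0, 1.0
Q, R = 0.01, 0.04        # state and measurement noise variances (assumed)
L, T = 0.1, 1.0          # effort and output cost weights (assumed)
p = 50

# Backward pass for the gains: identical to the noise-free case.
W = C * T * C
G = np.zeros(p)
for k in range(p - 1, -1, -1):
    G[k] = (B * W * A) / (L + B * W * B)
    W = C * T * C + A * W * (A - B * G[k])

# Forward pass: true state x, prior estimate xhat, prior variance P.
x, xhat, P = 2.0, 2.0, 1.0
for k in range(p):
    u = -G[k] * xhat                         # act on the prior estimate
    y = C * x + rng.normal(0.0, np.sqrt(R))  # noisy observation
    K = P * C / (C * P * C + R)              # Kalman gain
    post = xhat + K * (y - C * xhat)         # posterior estimate
    x = A * x + B * u + rng.normal(0.0, np.sqrt(Q))
    xhat = A * post + B * u                  # prior for the next step
    P = A * (1.0 - K * C) * P * A + Q        # prior variance for next step
```

Note the separation: the backward pass never sees $Q$ or $R$, while the forward estimator never sees the costs.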
Page 17

Summary of stochastic optimal control for a linear system with additive Gaussian noise and quadratic cost:

$$x^{(k+1)} = A x^{(k)} + B u^{(k)} + \varepsilon_x^{(k)}, \qquad \varepsilon_x^{(k)} \sim N(0, Q)$$

$$y^{(k)} = C x^{(k)} + \varepsilon_y^{(k)}, \qquad \varepsilon_y^{(k)} \sim N(0, R)$$

$$J^{(0)} = E\left[ \sum_{k=0}^{p} \left( y^{(k)T} T^{(k)} y^{(k)} + u^{(k)T} L u^{(k)} \right) \right]$$

Cost to go at the end:

$$J^{(p)} = x^{(p)T} W^{(p)} x^{(p)} + w^{(p)}, \qquad W^{(p)} = C^T T^{(p)} C, \qquad w^{(p)} = \mathrm{tr}\left( T^{(p)} R \right)$$

Backward recursion:

$$G^{(p-1)} = \left( L + B^T W^{(p)} B \right)^{-1} B^T W^{(p)} A, \qquad u^{(p-1)} = -G^{(p-1)} \hat{x}^{(p-1)}$$

$$J^{(p-1)} = x^{(p-1)T} W^{(p-1)} x^{(p-1)} + w^{(p-1)}$$

$$W^{(p-1)} = C^T T^{(p-1)} C + A^T W^{(p)} \left( A - B G^{(p-1)} \right)$$

$$w^{(p-1)} = \mathrm{tr}\left( W^{(p)} Q \right) + \mathrm{tr}\left( T^{(p-1)} R \right) + w^{(p)}$$

Iterating back to $k=0$ gives the cost to go at the start, $J^{(0)}$.