MC2019 Part 2 Optimal control for stochastic systems. · Straight paths and bell -shaped velocity...

MC2019 Part 2Optimal control for stochastic systems.

Hirokazu Tanaka

Optimal control for stochastic systems.

In this lecture, we will learn:• Feedforward and feedback control• Trial-by-trial variability• Signal-dependent noise• Dynamic programming• Bellman’s optimality equation• Linear-quadratic-Gaussian (LQG) control• Optimal feedback control (OFC) model• Human psychophysics

Straight paths and bell-shaped velocity in reaching.

Morasso (1981) Exp Brain Res

12

3 4 5

6

joint angles

angular velocity

angular accel.

hand speed

T1->T4 T3->T5 T2->T5 T1->T5

Power laws for curved movements.

Lacquaniti et al. (1983) Acta Psychologica

23θ κ∝

13v κ

−∝

: angular velocity : h: curvature

and speedvθ

κ

or

curvature

angu

lar v

eloc

ity

Fitts’ law for movement duration in rapid pointing movements.

Fitts (1954) J Exp Psychol

22logf

Dt a bW

= +

“Main sequence” for saccadic movements

Bahill et al. (1975) Math Biosci

Optimality principle: how a unique trajectory is chosen.

http://en.wikipedia.org/wiki/Snell%27s_law

Snell’s law

Fermat’s principle of least time

1 1 1

2 2 2

sin 1/sin 1/

v nv n

θθ

= =

( ) ( ) ( )Q Q

0 P P

1T dsT s dt n s dsv s c

⋅ = = = ∫ ∫ ∫

( ) ( )Q

P

1 0T s n s dsc

δ δ⋅ = = ∫

Feedforward and feedback control.

Feedforward control as a function of time step:

1k k k+ = +x Ax Bu

( )k k k=u u

Feedback control as a function of state:

( )k k k=u u x

For a deterministic system, feedforward and feedback control are equivalent. On the other hand, for a stochastic system, feedforward and feedback control are NOT equivalent.

Signal-dependent noise: trial-by-trial variability.

van Beers et al. (2004) J Neurophysiol;Schmidt et al. (1979) Psychol Rev; Jones et al. (2002) J Neurophysiol

Variability in reaching endpoints Variability in force production

( ) ( )noise 0, 11 ,ξξ= +u u

noise noise TE , Cov = = u u u uuSignal dependent noise

Minimum-variance control predicts smooth trajectory.

Harris & Wolpert (1998) Nature

( )1 1n n n nξ+ = + +x Ax Bu

( )( ) ( )

( )

1 1 1

22 2 2 1 1

1

00

1

1

1 1

1n

nk

k

n n n n

n n n n n

n kk

ξ

ξ ξ

ξ

− − −

− −

−

=

− − −

− −

= + +

= + + + +

=

= + +∑

x Ax Bu

A x ABu Bu

A x A Bu

[ ] 0

1

0

1E n kn

nn

kk

− −−

=

= +∑x A x A Bu

[ ] ( )T1 T1

0

T 1Cov n k n kn k k

n

k

−

=

− − − −= ∑x A Bu u B A



[ ] ( )0

11

0E

nn

kn k

nk

f ft t T−

=

− −= + = ≤ ≤ +∑x A x A Bu x

[ ] ( )T1 T T1

11110

1Covf f

f f

Tn k n

t t T n

kt n t k

n kt

k+

− − −

=

−+ −

= =

= ∑ ∑ ∑x A Bu u B A

Minimize the variance of final position (quadratic with respect to u)

under constraints (T+1 constraints with respect to u)

This problem can be solved with quadratic programming (quadprogcommand).



Saccade velocity

Reaching and drawing

Matlab code for the minimum-variance model.


t1 = 224/1000; % time const of eye dynamics t2 = 13/1000; % another time const of eye dynamicstm = 10/1000;

dt = 1/1000; % simulation time step

tf = 50/1000; % movement durationtp = 20/1000; % post-movement durationK = tf/dt;L = tp/dt;

x0 = [0; 0; 0]; % initial statexf = [10; 0; 0]; % final state

Ac = [0 1 0; ...0 0 1; ...-1/(t1*t2*tm) -1/(t1*t2)-1/(t1*tm)-1/(t2*tm) -1/t1-1/t2-

1/tm];Bc = [0; 0; 1];A = expm(Ac*dt);B = inv(Ac)*(eye(3)-expm(Ac*dt))*Bc;

Q = zeros(K+L);

% calculation of Qfor ell=0:K+L-1

if ell<Kfor k=K:K+L-1

tmpQ = A^(k-ell-1)*B*B'*A'^(k-ell-1);Q(ell+1, ell+1) = Q(ell+1, ell+1) + tmpQ(1,1);

endelse

for k=ell+1:K+L-1tmpQ = A^(k-ell-1)*B*B'*A'^(k-ell-1);Q(ell+1, ell+1) = Q(ell, ell) + tmpQ(1,1);

endend

End

% calculation of CC = [];for p=1:L+1

Ctmp = [];for q=1:K+L

if K-1-(q-1)+(p-1)>0Ctmp = [Ctmp A^(K-1-(q-1)+(p-1))*B];

elseif K-1-(q-1)+(p-1)==0Ctmp = [Ctmp B];

elseCtmp = [Ctmp zeros(3,1)];

endendC = [C; Ctmp];

end

% calculation of dd = [];for ell=0:L

d = [d; xf-A^(K+ell)*x0];end

% solution by quadratic programmingu = quadprog(Q, [], [], [], C, d);

% forward solutionx = zeros(3, K+L);x(:,1) = x0;for k=1:K+L-1

x(:,k+1) = A*x(:,k) + B*u(k);end

figure(1);subplot(211); hold on; plot(x(1,:), 'k', 'linewidth', 2);set(gca, 'plotboxaspectratio', [1.5 1 1]);xlabel('time (ms)'); ylabel('eye position (deg)');subplot(212); hold on; plot(x(2,:), 'k', 'linewidth', 2)set(gca, 'plotboxaspectratio', [1.5 1 1]);xlabel('time (ms)'); ylabel('eye velocity (deg/s)');

Matlab code for the minimum-variance model.


Minimum-time control predicts Fitts’ law and main sequence.

Tanaka et al. (2006) J Neurophysiol

Faster, but less accurate

300ms 700ms

More precise, but slower

500ms

Unique movement time can be determined

[ ]

[ ]

MT ,

T

1 Var (t

( ) E ( )

{ }

)t tf p

f

t tf pf

f

p

t f

t f

f

V d

dt t

tt

S t

t

t

λ+

+

=

+

−

+

− ∫

∫ μ x x

u

x

minimization of movement time

final variance constraint

final position constraint

Minimum-time control predicts Fitts’ law and main sequence.

Tanaka et al. (2006) J Neurophysiol

Fitts’ law

Main sequence

Model Experiment

Optimal control: Bellman’s optimality equation.

( ) ( )1

0 1 1 00

1 1, , , ;2 2

NT T T

N k k k k k N N Nk

J−

−=

= + +∑u u u x x Q x u Ru x Q x

1k k k+ = +x Ax Bu

Error during movement

Control cost Endpoint error

Find control signals {u0, u1, …, uN-1} that minimize the cost function.

Deterministic (i.e., noiseless) dynamics:

Cost function:

Optimal control: Pontryagin’s minimum principle.

State x

Time stepNN-1N-2N-30 1 2 3

u0

u1 u2u3

uN-3

uN-2uN-1

x0

x1

x2 x3

xN-3 xN-2

xN-1

xN

( ) ( )1

0 1 1 00

1 1, , , ;2 2

NT T T

N k k k k k N N Nk

J−

−=


The control signals {u0, u1, …, uN-1} are optimized as a whole.


State x

Time stepNN-1N-2N-30 1 2 3

u0

u1 u2u3

x0

x1

x2 x3

n n+1 n+2

(cost at state xn at step n) = (cost of moving xn to xn+1) + (cost at state xn+1 at step 𝑛𝑛 + 1)

( )1 1n nV + +x( )n nV x


( ){ }

( )1 1

1

, , ,

1 12 2min

n n N

NT T T

n n k k k k k N N Nk n

V+ −

−

=

= + + ∑

u u ux x Q x u Ru x Q x

( ){ }

( )

( ){ }

( )1

1 1

1

1

,

,

,

1

,

,

1

1 12 2

1 1 1 2 2 2

min

min min

n N

nn N

n

NT T T

n n k k k k k N N Nk n

NT T T T Tn n n n n k k k k k N N N

nk

V

+ −

+ −

−

=

+

−

=

= + +

= + + + +

∑

∑

u u u

u u u

x x Q x u Ru x Q x

x Q x u Ru x Q x u Ru x Q x

( ) ( )1 112min

n

T Tn n n n n n nV + +

= + + ux Q x u Ru x

( )1 1n nV + +x

( ) ( ) ( )1 112min

n

T Tn n n n n n n n nV V + +

= + + ux x Q x u Ru x

Solvable example: Linear-Quadratic-Regulator control.

( ) ( )

( ) ( ) ( )

1 1 1

1

1 12 21 1 2 2

min

min

T T Tk k k k k k k k k

u

TT Tk k k k k k k k k k

u

V + + +

+

= + + = + + + +

x x Q x u Ru x S x

x Q x u Ru Ax Bu S Ax Bu

( )

( )1 1

1 1

102

T T T T T T Tk k k k k k k k k k

T Tk k k k

+ +

+ +

∂ = + + + ∂

= + +

u Ru u B ABu x A S Bu u B S Axu

R B S B u B S Ax

( ) 1

1 1T T

k k k k k k

−

+ += − + = −u R B S B B S Ax L x

Assume that the cost-to-go function has a quadratic form:

( )1 1 1 112

Tk k k kV + + + +=x x S x

Substituting this into the Bellman equation gives

Then, by minimizing with respect to u,

A feedback control law is obtained:


( ) 1

1 1T T

k k k k k k

−

+ += − + = −u R B S B B S Ax L x

( ) ( ) 11 11

1 12 2

TT T

k k k k k k k kk

V−− −

+ = = + +

x x S x x Q A S BR B A x

( ) 11 11

T Tk k k

−− −+= + +S Q A S BR B A

Substituting the control law

into the Bellman equation gives

Therefore a backward recursive equation for the matrix S is obtained.

0

S0

L0

1

S1

L1

2

S2

k-1

Sk-1

Lk-1

k

Sk

Lk

k+1

Sk+1

Lk-2

N-1

SN-1

LN-1

N

SN

LN-2


k k k= −u L x

( ) 1

1 1T T

k k k

−

+ += +L R B S B B S A

( ) 11 11

T Tk k k

−− −+= + +S Q A S BR B A

N N=S Q

Backward recursion equation for the matrix S:

Feedback control law:

where

with the terminal condition

Therefore, once the matrices {S0, …, SN} and {L0, …, LN-1} are obtained, the optimal feedback control signals {u0, …, uN-1} can be computed.

Deterministic and stochastic optimal control.

( ) ( )1

0 1 1 00

1 1, , , ;2 2

NT T T

N k k k k k N N Nk

J−

−=


( ) ( )1

0 1 10 ,

0 01 1ˆ, , , ; E,2 2

NT T T

N k k k k k N N Nk

J−

−=

Σ = + + ∑

w v

u u u x x Q x u Ru x Q x

1k k k

k k

+ = + =

x Ax Buy Cx

1k k k

k kk

k+ = + ++ =

wv

x Ax Buy Cx

Deterministic Optimal Control

Stochastic Optimal Control

Stochastic optimal control combines Kalman filter and feedback control.

ˆk k k= −u L x

( )1ˆ ˆ ˆk k k k k k+ = + + −x Ax Bu K y Cx

( ) ( )1

0 1 10 ,

0 01 1ˆ, , , ; E,2 2

NT T T

N k k k k k N N Nk

J−

−=

Σ = + + ∑

w v


1k k k

k kk

k+ = + ++ =

wv

x Ax Buy Cx

Stochastic Optimal Control

Optimal state estimation (Kalman Filter)

Feedback controller

Optimal feedback control as a model of motor control.

Todorov & Jordan (2002) Nature Neurosci

( ) ( )1

0 1 10 ,

0 01 1ˆ, , , ; , E2 2

NT T T

N k k k k k N N Nk

Jε

−

−=

Σ = + + ∑

ω


1 k

k

k k k k

k k

ε+ = + += +

x Ax Bu Cy Hx ω

u

Optimal feedback control as a model of motor control.

Todorov & Jordan (2002) Nature Neurosci; Todorov (2005) Neural Comput

Kalman filter:( )1ˆ ˆ ˆk k k k k k+ = + + −x Ax Bu K y Hx

( )( )

( ) ( )

1T T

T T T1 1 1

1ˆ T ˆ ˆ

1

;

ˆ ˆ;

k k k

k k k k k k

T Tk k k k k k

−

+

+

= +

= − + =

= + − − =

Σ Σ

Σ Σ Σ Σ Σ

Σ Σ Σ Σ

e

x x x

e ω

e e x e

e

K A H H

A K H A CL L C

K H A A BL A BL x

Ω

x

H

Feedback control:ˆk k k= −u L x

( )( )( )( ) ( )

1T T T1 1 1 1

T1

TT1 1

;

;

k

N

N

k k k k

k k k k N

k k k k k k

Q

−

+ + + +

+

+ +

= + + +

= + − =

= + − − =

e

e e e

x x x

x x x

x

L B S B R C S S C B S A

S Q A S A BL S

S A S BL A K H S A K H S 0

小脳内部順モデル予測

運動野フィードバック制御

頭頂葉カルマンフィルタ

身体・環境

体性感覚野感覚処理

大脳基底核評価関数の設定

predictionx̂

filterx̂

yfilterx̂filterˆ= −u Lx

u

Computational neuroanatomy for motor control.

Shadmehr & Krakauer (2008); Haar & Donchin (2019)

Matlab code: optimal feedback control model.


for iter = 1:MaxIter

% initialize covariancesSiE = S1;SiX = X1*X1';SiXE = zeros(szX,szX);

% forward pass - recompute Kalman filter for k = 1:N-1

% compute Kalman gaintemp = SiE + SiX + SiXE + SiXE';if size(D,2)==1,

DSiD = diag(diag(temp).*D.^2);else

DSiD = zeros(szY,szY);for i=1:szD

DSiD = DSiD + D(:,:,i)*temp*D(:,:,i)';end;

end;K(:,:,k) = A*SiE*H'*pinv(H*SiE*H'+D0*D0'+DSiD);

% compute new SiEnewE = E0*E0' + C0*C0' + (A-K(:,:,k)*H)*SiE*A';LSiL = L(:,:,k)*SiX*L(:,:,k)';if size(C,2)==1,

newE = newE + B*diag(diag(LSiL).*C.^2)*B';else

for i=1:szCnewE = newE + B*C(:,:,i)*LSiL*C(:,:,i)'*B';

end;end;

% update SiX, SiE, SiXESiX = E0*E0' + K(:,:,k)*H*SiE*A' + (A-B*L(:,:,k))*SiX*(A-B*L(:,:,k))' + ...

(A-B*L(:,:,k))*SiXE*H'*K(:,:,k)' + K(:,:,k)*H*SiXE'*(A-B*L(:,:,k))';SiE = newE;SiXE = (A-B*L(:,:,k))*SiXE*(A-K(:,:,k)*H)' - E0*E0';

end;

% first pass initializationif iter==1,

if Init==0, % open loopK = zeros(szX,szY,N-1);

elseif Init==2, % randomK = randn(szX,szY,N-1);

end;end;

% initialize optimal cost-to-go functionSx = Q(:,:,N);Se = zeros(szX,szX);Cost(iter) = 0;

% backward pass - recompute control policyfor k=N-1:-1:1

% update CostCost(iter) = Cost(iter) + trace(Sx*C0*C0') + ...

trace(Se*(K(:,:,k)*D0*D0'*K(:,:,k)' + E0*E0' + C0*C0'));

% Controllertemp = R + B'*Sx*B;BSxeB = B'*(Sx+Se)*B;if size(C,2)==1,

temp = temp + diag(diag(BSxeB).*C.^2);else

for i=1:size(C,3)temp = temp + C(:,:,i)'*BSxeB*C(:,:,i);

end;end;L(:,:,k) = pinv(temp)*B'*Sx*A;

% compute new SenewE = A'*Sx*B*L(:,:,k) + (A-K(:,:,k)*H)'*Se*(A-K(:,:,k)*H);

% update Sx and SeSx = Q(:,:,k) + A'*Sx*(A-B*L(:,:,k));KSeK = K(:,:,k)'*Se*K(:,:,k);if size(D,2)==1,

Sx = Sx + diag(diag(KSeK).*D.^2);else

for i=1:szDSx = Sx + D(:,:,i)'*KSeK*D(:,:,i);

end;end; Se = newE;

end;

% adjust costCost(iter) = Cost(iter) + X1'*Sx*X1 + trace((Se+Sx)*S1);

% progress barif ~rem(iter,10),

fprintf('.');end;

% check convergence of Costif (Niter>0 & iter>=Niter) | ...

(Niter==0 & iter>1 & abs(Cost(iter-1)-Cost(iter))<Eps) | ...(Niter==0 & iter>20 & sum(diff(dist(iter-10:iter))>0)>3),break;

end;end;

Matlab code: optimal feedback control model.


Infinite-horizon optimal feedback control.

Phillis (1985) IEEE Trans Auto Contr; Qian et al. (2013) Neural Comput

( ) ,d dd dt dd dt d

β γ ωξ

= +

+

+ + +

=

x Yu Gy Cx

x Bu xD

A F

( )T T T1 2 0

1lim E E limt

t tJ J J dt

t→∞ →∞

= + = + + ∫x Ux x Qx u Ru

Stochastic dynamics with signal- and state-dependent noises:

Infinite-horizon cost:

( ) ( ).

ˆ ,ˆˆˆd dt d dt= + + −

= −

x Ax Bu yu

xL

Cx

K

In infinite-horizon optimal feedback control, the Kalman gain (K) and the feedback gain (L) are time invariant.

Bimanual coordination explained by OFC model.

Dietrichsen (2007) Curr Biol

Two cursor condition One cursor condition

Model

Experiment

Motor adaptation as reoptimization.

Izawa et al. (2008) J Neurosci

Sigmoidal pathsModel Experiment

Field uncertaintyModel Experiment

Movement as a real-time decision making process.

Nashed et al. (2014) J Neurosci

Learning optimal control policy without knowing dynamics.

Jiang & Jiang (2014) Biol Cybern

ii

id dt dt η= + + ∑x Ax Bu B C u

( )0

T TJ dt∞

= +∫ x Qx u Ru

= −u Kx

10 T −= + − +A P PA Q PBR BP1 T−=K R B P

Algebraic Riccatti equation:

The Riccatti equation for the optimal feedback gain requires explicit knowledge of the forward model (the matrices A and B)!



10 T −= + + −A P PA Q PBR BP

( ) ( )0 Tk k k k k k= − + − + −A BK P P A BK Q K RK

11

Tk k

−+ =K R B P


Iterative solution to the Riccatti equation:

1) A − BK is Hurwitz.2) 𝑃𝑃∗ ≤ 𝑃𝑃𝑘𝑘+1 ≤ 𝑃𝑃𝑘𝑘3) 𝐾𝐾𝑘𝑘 → 𝐾𝐾∗ and 𝑃𝑃𝑘𝑘 → 𝑃𝑃∗



10 T −= + − +A P PA Q PBR BP

( ) ( )0 Tk k k k k k= − + − − +A BK P P A BK Q K RK

11

Tk k

−+ =K R B P


Iterative solution to the Riccatti equation:

1) A − BK is Hurwitz.2) 𝑃𝑃∗ ≤ 𝑃𝑃𝑘𝑘+1 ≤ 𝑃𝑃𝑘𝑘3) 𝐾𝐾𝑘𝑘 → 𝐾𝐾∗ and 𝑃𝑃𝑘𝑘 → 𝑃𝑃∗



10 T −= + + −A P PA Q PBR BP

Problem: In determining optimal policy, the system matrices need to be known:

How can optimal policy be derived without assuming the direct knowledge of the system matrices?

Solution: Note that

( )

10 T T

T T T

dt

d dt

− = + + − = + +

x A P PA Q PBR x

x Px x Q K RK x

BP

Therefore, the knowledge of 𝐀𝐀,𝐁𝐁 is replaced with the knowledge of 𝑑𝑑𝐱𝐱.



By integrating over an infinitesimal interval:

( )0 T T Td dt = + + x Px x Q K RK x

( ) ( ) ( ) ( )tT T t T

t

Tt t t tt dttδ

δ δ+

+ + − = − + ∫x Px x Px x Q K RK x

By integrating over an infinitesimal interval:

Therefore, without knowing the system dynamics, the matrices Pand K are determined.

Summary

• Feedforward and feedback control differ in stochastic dynamics.

• Signal-dependent control noise contributes to trial-by-trial motor variability.

• Movement accuracy under signal-dependent noise models human movements.

• Optimal feedback control (OFC) integrates state estimation and motor control in a single computational framework.

• A number of psychophysical experiments can be explained by the OFC model.

References

• Schmidt, R. A., Zelaznik, H., Hawkins, B., Frank, J. S., & Quinn Jr, J. T. (1979). Motor-output variability: a theory for the accuracy of rapid motor acts. Psychological review, 86(5), 415.

• van Beers, R. J., Haggard, P., & Wolpert, D. M. (2004). The role of execution noise in movement variability. Journal of Neurophysiology, 91(2), 1050-1063.

• Harris, C. M., & Wolpert, D. M. (1998). Signal-dependent noise determines motor planning. Nature, 394(6695), 780-784.

• Tanaka, H., Krakauer, J. W., & Qian, N. (2006). An optimization principle for determining movement duration. Journal of neurophysiology, 95(6), 3875-3886.

• Todorov, E., & Jordan, M. I. (2002). Optimal feedback control as a theory of motor coordination. Nature neuroscience, 5(11), 1226-1235.

• Todorov, E. (2005). Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system. Neural computation, 17(5), 1084-1108.

• Athans, M. (1967). The matrix minimum principle. Information and control, 11(5), 592-606.

• Phillis, Y. A. (1985). Controller design of systems with multiplicative noise. Automatic Control, IEEE Transactions on, 30(10), 1017-1019.

• Qian, N., Jiang, Y., Jiang, Z. P., & Mazzoni, P. (2013). Movement duration, fitts's law, and an infinite-horizon optimal feedback control model for biological motor systems. Neural computation, 25(3), 697-724.

• Diedrichsen, J. (2007). Optimal task-dependent changes of bimanual feedback control and adaptation. Current Biology, 17(19), 1675-1679.

• Izawa, J., Rane, T., Donchin, O., & Shadmehr, R. (2008). Motor adaptation as a process of reoptimization. The Journal of Neuroscience, 28(11), 2883-2891.

• Nashed, J. Y., Crevecoeur, F., & Scott, S. H. (2014). Rapid online selection between multiple motor plans. The Journal of Neuroscience, 34(5), 1769-1780.

• Jiang, Y., & Jiang, Z. P. (2014). Adaptive dynamic programming as a theory of sensorimotor control. Biological cybernetics, 108(4), 459-473.

Exercise

• Write a Matlab code of the minimum-variance model for a trajectory with given initial and final positions.

• Write a Matlab code of the optimal feedback control for a trajectory with given initial and final positions.

• Investigate how the minimum-variance model and the optimal feedback model differ.

MC2019 Part 2 Optimal control for stochastic systems. · Straight paths and bell -shaped velocity...

Documents

Transcript of MC2019 Part 2 Optimal control for stochastic systems. · Straight paths and bell -shaped velocity...