ol strategies Learning human contr HCS Model
Transcript of ol strategies Learning human contr HCS Model
Lear
ning
hum
an c
ontr
ol s
trat
egie
s
syst
em s
tate
hum
an c
ontr
ol
envi
ronm
ent
uk
()
z
1
–
z
1
–
z
1
–
…
z
1
–
z
1
–
…
uk
1
–
()
uk
n
u
–
1
+
()
xk
()
xk
1
–
()
xk
n
x
–
1
+
()
zk
()
uk
1
+
()
HCS Model
envi
ronm
ent
mod
el c
ontr
olsy
stem
sta
te
Human control strategy
•
Dynamic, reaction
skill, with reasonably well-defined I/O representation
• Control gains
• Control structure/approach
• Interaction with environment/surrounding
• No abstract or high-level reasoning
• Example: human driving
0 50 100 150 200 250 300-8000
-6000
-4000
-2000
0
2000
4000
0 50 100 150 200 250 300
App
lied
forc
e (N
)
Time (sec)Time (sec)
Stan Oliver
Learning human control strategy
• Difficult to model human control strategy
analytically
.
• Unknown individual controller properties
• Structure
• Order
• Human control strategy modeling complications
• Dynamic
• Stochastic — errors and inconsistencies
• Possibly discontinuous
• Possibly nonlinear
• Continuous learning approach: neural networks
HC
S m
odel
exa
mpl
e:
Ck
time
time
velocity lateral offset steering applied force
Gro
ucho
’s c
ontr
olC
k m
odel
con
trol
010
020
030
040
050
060
0405060708090
010
020
030
040
050
060
0
-4-2024
010
020
030
040
050
060
0-0
.2
-0.10
0.1
0.2
010
020
030
040
050
060
0-8
000
-600
0
-400
0
-200
00
2000
4000
010
020
030
040
050
060
0405060708090
010
020
030
040
050
060
0
-4-2024
010
020
030
040
050
060
0-0
.2
-0.10
0.1
0.2
010
020
030
040
050
060
0-8
000
-600
0
-400
0
-200
00
2000
4000
HC
S m
odel
exa
mpl
e #2
:
Ck
time
time
velocity lateral offset steering applied force
Har
po’s
con
trol
Ck
mod
el c
ontr
ol
010
020
030
040
050
060
070
0405060708090
010
020
030
040
050
060
070
0
-4-2024
010
020
030
040
050
060
070
0-0
.2
-0.10
0.1
0.2
010
020
030
040
050
060
070
0-8
000
-600
0
-400
0
-200
00
2000
4000
010
020
030
040
050
060
0405060708090
010
020
030
040
050
060
0
-4-2024
010
020
030
040
050
060
0-0
.2
-0.10
0.1
0.2
010
020
030
040
050
060
0-8
000
-600
0
-400
0
-200
00
2000
4000
Additional hidden units in
neural network
0 100 200 300 400 500 600 700-6000
-4000
-2000
0
2000
4000
0 100 200 300 400 500 600-8000
-6000
-4000
-2000
0
2000
4000
time
appl
ied
forc
eap
plie
d fo
rce
time
linear model
one additional hidden unit
Need for discontinuous learning
•
Cq
,
Ck
models are convergent (stable) strategies, but are not “similar.”
• In reduced dimensional plot, “switches” overlap “nonswitches.”
• Continuous neural network may have difficulty modeling discontinuous (e.g. switching) control strategies.
-3 -2 -1 0 1 2 3
0
1
2
3
4
-3 -2 -1 0 1 2 3
0
1
2
3
1st PC2n
d P
C1st PC
2nd
PC
Current command = brake Current command = gas
Hybrid continuous/discontinuous control
Continuous cascade network steering control
Signal-to-symbol
conversion
Road description
Stochastic action choice
Discontinuous HMM acceleration control
r
x
r
y
,{ }
λ
1
λ
2
λ
N
O
∗
P O
∗ λ
1
( )
P A
i
O
∗( )
P O
∗ λ
2
( )
P O
∗ λ
N
( ) δ
φ
v
ξ
v
η
ω δ φ, , , ,{ }
…
v
ξ
v
η
ω δ φ, , , ,{ }
r
x
r
y
,{
}
O
∗
O
∗
P A
i
( )
Actions
• Applied force > 0 (gas is currently active):
• : do nothing.
• : increase applied force.
• : decrease applied force, but keep > 0.
• : switch to braking
• Applied force < 0 (brake is currently active):
• : do nothing
• : increase braking force
• : decrease braking force
• : switch to accelerator
A
1
A
2
A
3
A
4
A
5
A
6
A
7
A
8
Statistical model and priors
• Single-state HMMs (encode arbitrary statistical distribution of inputs).
• Priors: frequency of occurrence for action in human data set
P O
∗
A
i
( )
P O
∗ λ
i
( )≡
A
i
P A
i
O
∗( )
P O
∗ λ
i
( )
P A
i
( )∝
HC
S m
odel
exa
mpl
e: h
ybri
d
time
time
velocity lateral offset steering applied force
Gro
ucho
’s c
ontr
olhy
brid
mod
el c
ontr
ol
010
020
030
040
050
060
0405060708090
010
020
030
040
050
060
0
-4-2024
010
020
030
040
050
060
0-0
.2
-0.10
0.1
0.2
010
020
030
040
050
060
0-8
000
-600
0
-400
0
-200
00
2000
4000
010
020
030
040
050
060
0405060708090
010
020
030
040
050
060
0
-4-2024
010
020
030
040
050
060
0-0
.2
-0.10
0.1
0.2
010
020
030
040
050
060
0-8
000
-600
0
-400
0
-200
00
2000
4000
HC
S m
odel
exa
mpl
e #2
: hyb
rid
time
time
velocity lateral offset steering applied force
Har
po’s
con
trol
hybr
id m
odel
con
trol
010
020
030
040
050
060
070
0405060708090
010
020
030
040
050
060
070
0
-4-2024
010
020
030
040
050
060
070
0-0
.2
-0.10
0.1
0.2
010
020
030
040
050
060
070
0-8
000
-600
0
-400
0
-200
00
2000
4000
010
020
030
040
050
060
070
0405060708090
010
020
030
040
050
060
070
0
-4-2024
010
020
030
040
050
060
070
0-0
.2
-0.10
0.1
0.2
010
020
030
040
050
060
070
0-8
000
-600
0
-400
0
-200
00
2000
4000
A c
lose
r lo
ok
200
210
220
230
240
-800
0
-600
0
-400
0
-200
00
2000
4000
PA
1
O
()
PA
2
O
()
PA
4
O
()
PA
5
O
()
PA
6
O
()
PA
8
O
()
t
sec
()
φ
N
()
switc
h (g
as to
bra
ke)
switc
h (b
rake
to g
as)
Sample Groucho turning maneuver
0 2 4 6 8 10 12
60
65
70
75
80
85
0 2 4 6 8 10 12
-4
-2
0
2
4
0 2 4 6 8 10 12-0.2
-0.1
0
0.1
0.2
0 2 4 6 8 10 12-8000
-6000
-4000
-2000
0
2000
4000
v
(mi/h
)
δ
(rad
)
φ
(N)
d
ξ
(m)
t (sec)t (sec)
t (sec) t (sec)
Human-to-model validation
• Quantify qualitative observations
• Hybrid model exhibits greater similarity than either of the purely NN-based modeling approaches.
Larry Moe Curly Harpo0
0.1
0.2
0.3
0.4
0.5
Hybrid model
Ck model
Cq model
sim
ilari
ty
Model-to-human classification
Table 1: Hybrid model-to-human matching
Larry Moe Groucho Harpo
Larry’s model
0.450
0.329 0.315 0.069
Moe’s model
0.126
0.555
0.338 0.217
Groucho’s model
0.152 0.377
0.457
0.206
Harpo’s model
0.013 0.134 0.127
0.578
Table 2:
Ck
model-to-human matching
Larry Moe Groucho Harpo
Larry’s model
0.161
0.166
0.118 0.157
Moe’s model
0.056
0.088
0.063 0.041
Groucho’s model
0.056 0.066
0.096
0.040
Harpo’s model
0.006 0.008
0.012
0.003
σ
σ