Kenji Doya doya@oist.jp
Neural Computation Unit Okinawa Institute of Science and Technology
MLSS 2012 in Kyoto
Brain and Reinforcement Learning

Location of Okinawa
[Map: flight times to Okinawa - Taipei 1.5 h, Tokyo 2.5 h, Seoul 2.5 h, Shanghai 2 h, Manila 2.5 h, Beijing 3 h]
Okinawa Institute of Science & Technology
! Apr. 2004: Initial research (President: Sydney Brenner)
! Nov. 2011: Graduate university (President: Jonathan Dorfan)
! Sept. 2012: Ph.D. course (20 students/year)
Our Research Interests
! How to build adaptive, autonomous systems → robot experiments
! How the brain realizes robust, flexible adaptation → neurobiology
Learning to Walk (Doya & Nakano, 1985)
! Action: cycle of 4 postures
! Reward: speed sensor output
! Problem: a long jump followed by a fall
! Need for long-term evaluation of action
Reinforcement Learning
! Learn an action policy s → a to maximize rewards
! Value function: expected future rewards
V(s(t)) = E[ r(t) + γr(t+1) + γ²r(t+2) + γ³r(t+3) + … ], 0 ≤ γ ≤ 1: discount factor
! Temporal difference (TD) error:
δ(t) = r(t) + γV(s(t+1)) - V(s(t))
[Diagram: agent-environment loop; the agent sends action a, the environment returns state s and reward r; γV(s(t+1)) feeds the TD error]
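A concrete check of these definitions, with my own toy numbers (not from the slides): the discounted return and the TD error for a short episode.

```python
# Toy numbers illustrating the definitions above: the discounted return
# and the TD error delta(t) = r(t) + gamma*V(s(t+1)) - V(s(t)).
gamma = 0.9
rewards = [0.0, 0.0, 1.0]            # r(t), r(t+1), r(t+2)

# Discounted return from time t: sum_k gamma^k * r(t+k)
G = sum(gamma ** k * r for k, r in enumerate(rewards))

# TD errors along the way, for some illustrative value estimates V(s(t))
V = [0.5, 0.6, 0.8, 0.0]             # V at each step; terminal value is 0
deltas = [rewards[t] + gamma * V[t + 1] - V[t] for t in range(3)]
print(G, deltas)   # G = 0.81; a positive delta marks a better-than-predicted step
```

A positive δ(t) says the outcome was better than the current value estimate predicted; TD methods use exactly this signal to update V.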
Example: Grid World
! Reward field
[Figure: reward field on a grid (color scale -2 to 2) and the value functions computed with γ = 0.9 and γ = 0.3]
Cart-Pole Swing-Up
! Reward: height of pole
! Punishment: collision
! Value in 4D state space
Learning to Stand Up (Morimoto & Doya, 2000)
! State: joint/head angles, angular velocity
! Action: torques to motors
! Reward: head height - tumble
Learning to Survive and Reproduce
! Catch battery packs → survival
! Copy 'genes' by IR ports → reproduction, evolution
Markov Decision Process (MDP)
! state s ∈ S; action a ∈ A; policy p(a|s); reward r(s,a); dynamics p(s'|s,a)
! Optimal policy: maximize cumulative reward
! finite horizon: E[ r(1) + r(2) + r(3) + ... + r(T) ]
! infinite horizon: E[ r(1) + γr(2) + γ²r(3) + … ], 0 ≤ γ ≤ 1: temporal discount factor
! average reward: E[ r(1) + r(2) + ... + r(T) ]/T, T → ∞
[Diagram: agent-environment loop, as in the earlier slide]
Solving MDPs: Dynamic Programming
! p(s'|s,a) and r(s,a) are known
! Solve the Bellman equation: V*(s) = max_a E[ r(s,a) + γV*(s') ]
! V(s): value function, the expected future reward from state s
! Apply the optimal policy: a = argmax_a E[ r(s,a) + γV*(s') ]
! Value iteration
! Policy iteration
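The Bellman equation above can be solved by value iteration; a minimal sketch on a hypothetical 1-D grid of my own (not the slides' 2-D example):

```python
import numpy as np

# Minimal value-iteration sketch for a known MDP: solve
# V(s) = max_a E[r(s,a) + gamma*V(s')] on a 5-state 1-D grid with a
# reward for sitting at the right end (an illustrative toy world).
n_states, gamma = 5, 0.9

def step(s, a):
    """Deterministic dynamics: a = 0 moves left, a = 1 moves right."""
    s2 = min(n_states - 1, s + 1) if a == 1 else max(0, s - 1)
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r

V = np.zeros(n_states)
for _ in range(500):                     # repeated Bellman backups
    V_new = np.array([max(r + gamma * V[s2]
                          for s2, r in (step(s, a) for a in (0, 1)))
                      for s in range(n_states)])
    done = np.max(np.abs(V_new - V)) < 1e-10
    V = V_new
    if done:
        break

# Greedy policy from the converged values: a = argmax_a [r + gamma*V(s')]
policy = [max((0, 1), key=lambda a: step(s, a)[1] + gamma * V[step(s, a)[0]])
          for s in range(n_states)]
print(V, policy)
```

Value iteration converges because the Bellman backup is a γ-contraction; policy iteration alternates full policy evaluation with greedy improvement instead.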
Reinforcement Learning
! p(s'|s,a) and r(s,a) are unknown
! Learn from actual experience {s, a, r, s, a, r, …}
! Monte Carlo
! SARSA
! Q-learning
! Actor-Critic
! Policy gradient
! Model-based: learn p(s'|s,a), r(s,a) and do DP
Actor-Critic and TD Learning
! Actor: parameterized policy P(a|s; w)
! Critic: learned value function
V(s(t)) = E[ r(t) + γr(t+1) + γ²r(t+2) + … ]
! stored in a table or a neural network
! Temporal difference (TD) error:
δ(t) = r(t) + γV(s(t+1)) - V(s(t))
! Updates
! Critic: ΔV(s(t)) = α δ(t)
! Actor: Δw = α δ(t) ∂P(a(t)|s(t); w)/∂w
… reinforce a(t) in proportion to δ(t)
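The updates above can be sketched on a toy one-state, two-action problem of my own (a bandit, not the slides' task); note this sketch uses the log-policy gradient (∂log P/∂w) variant of the actor update.

```python
import math
import random

# Toy actor-critic: one state, two actions with reward probabilities 0.8 / 0.2
# (my own setup). Actor: softmax policy over preferences w[a]; critic: scalar V.
random.seed(0)
p_reward = [0.8, 0.2]
w = [0.0, 0.0]        # actor parameters (action preferences)
V = 0.0               # critic's value estimate
alpha = 0.1           # learning rate for both actor and critic

for _ in range(5000):
    z = [math.exp(x) for x in w]
    p = [x / sum(z) for x in z]               # softmax policy P(a)
    a = 0 if random.random() < p[0] else 1
    r = 1.0 if random.random() < p_reward[a] else 0.0
    delta = r - V                             # TD error (single state, no successor)
    V += alpha * delta                        # critic update
    for b in (0, 1):                          # actor update (log-softmax gradient)
        grad = (1.0 - p[b]) if b == a else -p[b]
        w[b] += alpha * delta * grad

print(p, V)   # the policy should come to prefer the richer action 0
```

Actions followed by positive δ are made more probable, exactly the "reinforce a(t) by δ(t)" rule on the slide.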
SARSA and Q-Learning
! Action value function:
Q(s,a) = E[ r(t) + γr(t+1) + γ²r(t+2) + … | s(t)=s, a(t)=a ]
! Action selection
! ε-greedy: a = argmax_a Q(s,a) with probability 1-ε
! Boltzmann: P(a_i|s) = exp[βQ(s,a_i)] / Σ_j exp[βQ(s,a_j)]
! SARSA: on-policy update
ΔQ(s(t),a(t)) = α{ r(t) + γQ(s(t+1),a(t+1)) - Q(s(t),a(t)) }
! Q-learning: off-policy update
ΔQ(s(t),a(t)) = α{ r(t) + γ max_a' Q(s(t+1),a') - Q(s(t),a(t)) }
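The two updates differ only in their bootstrap target; a minimal sketch on one hypothetical transition (the states and values are made up for illustration):

```python
# One hypothetical transition (s, a, r, s', a') contrasting the SARSA and
# Q-learning targets above.
alpha, gamma = 0.5, 0.9
Q = {("s1", "L"): 0.0, ("s1", "R"): 0.0,
     ("s2", "L"): 1.0, ("s2", "R"): 3.0}

s, a, r, s2, a2 = "s1", "L", 1.0, "s2", "L"   # the behavior policy picked L next

# SARSA (on-policy): bootstrap from the action actually taken, a'
q_sarsa = Q[(s, a)] + alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])

# Q-learning (off-policy): bootstrap from the greedy action max_a' Q(s', a')
q_qlearn = Q[(s, a)] + alpha * (r + gamma * max(Q[(s2, b)] for b in ("L", "R")) - Q[(s, a)])

print(q_sarsa, q_qlearn)   # 0.95 vs. 1.85: Q-learning backs up the greedy value
```

Because Q-learning bootstraps from the greedy action regardless of what the behavior policy actually did, it can learn the optimal values while exploring.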
"Lose to Gain" Task
! N states, 2 actions
! if r2 >> r1, then it is better to take a2
[Diagram: chain of states s1..s4; action a1 yields the small rewards ±r1, action a2 the large rewards -r2/+r2]
Reinforcement Learning
! Predict reward: value function
V(s) = E[ r(t) + γr(t+1) + γ²r(t+2) + … | s(t)=s ]
Q(s,a) = E[ r(t) + γr(t+1) + γ²r(t+2) + … | s(t)=s, a(t)=a ]
! Select action
! greedy: a = argmax_a Q(s,a)
! Boltzmann: P(a|s) ∝ exp[ βQ(s,a) ]
! Update prediction: TD error
δ(t) = r(t) + γV(s(t+1)) - V(s(t))
ΔV(s(t)) = α δ(t); ΔQ(s(t),a(t)) = α δ(t)
How to implement these steps?
How to tune these parameters?
[Figure: three schematic circuits, each relating reward r, value V, and TD error δ]
Dopamine Neurons Code TD Error
δ(t) = r(t) + γV(s(t+1)) - V(s(t))
! unpredicted reward → activation
! predicted reward → no response at reward time
! omitted reward → depression
(Schultz et al. 1997)
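A toy TD simulation (my own sketch, not Schultz's data or analysis) reproduces this pattern: with learning, the TD error moves from reward time to cue time, and it turns negative when a predicted reward is omitted.

```python
# Toy TD simulation of the dopamine response pattern: a cue at step 0
# predicts a reward at step 3; delta(t) = r(t) + gamma*V(t+1) - V(t)
# is the putative dopamine signal.
gamma, alpha, n = 1.0, 0.1, 4
V = [0.0] * (n + 1)   # value of each within-trial time step

def run_trial(V, rewarded=True, learn=True):
    deltas = []
    for t in range(n):
        r = 1.0 if (t == n - 1 and rewarded) else 0.0
        delta = r + gamma * V[t + 1] - V[t]
        if learn and t > 0:   # pre-cue state keeps value 0: the cue itself is unpredicted
            V[t] += alpha * delta
        deltas.append(delta)
    return deltas

first = run_trial(V)                                  # naive: delta peaks at reward time
for _ in range(500):
    run_trial(V)                                      # training trials
after = run_trial(V, learn=False)                     # trained: delta has moved to the cue
omitted = run_trial(V, rewarded=False, learn=False)   # omission: dip at reward time
print(first, after, omitted)
```

Before learning the error sits at the reward; after learning it sits at the cue; an omitted reward produces a negative error exactly at the usual reward time, matching the depressions described in the excerpt below.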
(Excerpt: W. Schultz, Journal of Neurophysiology)

…together, the occurrence of reward, including its time, must be unpredicted to activate dopamine neurons.

Depression by omission of predicted reward

Dopamine neurons are depressed exactly at the time of the usual occurrence of reward when a fully predicted reward fails to occur, even in the absence of an immediately preceding stimulus (Fig. 2, bottom). This is observed when animals fail to obtain reward because of erroneous behavior, when liquid flow is stopped by the experimenter despite correct behavior, or when a valve opens audibly without delivering liquid (Hollerman and Schultz 1996; Ljungberg et al. 1991; Schultz et al. 1993). When reward delivery is delayed for 0.5 or 1.0 s, a depression of neuronal activity occurs at the regular time of the reward, and an activation follows the reward at the new time (Hollerman and Schultz 1996). Both responses occur only during a few repetitions until the new time of reward delivery becomes predicted again. By contrast, delivering reward earlier than habitual results in an activation at the new time of reward but fails to induce a depression at the habitual time. This suggests that unusually early reward delivery cancels the reward prediction for the habitual time. Thus dopamine neurons monitor both the occurrence and the time of reward. In the absence of stimuli immediately preceding the omitted reward, the depressions do not constitute a simple neuronal response but reflect an expectation process based on an internal clock tracking the precise time of predicted reward.

Activation by conditioned, reward-predicting stimuli

About 55-70% of dopamine neurons are activated by conditioned visual and auditory stimuli in the various classically or instrumentally conditioned tasks described earlier (Fig. 2, middle and bottom) (Hollerman and Schultz 1996; Ljungberg et al. 1991, 1992; Mirenowicz and Schultz 1994; Schultz 1986; Schultz and Romo 1990; P. Waelti, J. Mirenowicz, and W. Schultz, unpublished data). The first dopamine responses to conditioned light were reported by Miller et al. (1981) in rats treated with haloperidol, which increased the incidence and spontaneous activity of dopamine neurons but resulted in more sustained responses than in undrugged animals. Although responses occur close to behavioral reactions (Nishino et al. 1987), they are unrelated to arm and eye movements themselves, as they occur also ipsilateral to the moving arm and in trials without arm or eye movements (Schultz and Romo 1990). Conditioned stimuli are somewhat less effective than primary rewards in terms of response magnitude and fractions of neurons activated. Dopamine neurons respond only to the onset of conditioned stimuli and not to their offset, even if stimulus offset predicts the reward (Schultz and Romo 1990). Dopamine neurons do not distinguish between visual and auditory modalities of conditioned appetitive stimuli. However, they discriminate between appetitive and neutral or aversive stimuli as long as they are physically sufficiently dissimilar (Ljungberg et al. 1992; P. Waelti, J. Mirenowicz, and W. Schultz, unpublished data). Only 11% of dopamine neurons, most of them with appetitive responses, show the typical phasic activations also in response to conditioned aversive visual or auditory stimuli in active avoidance tasks in which animals release a key to avoid an air puff or a drop of hypertonic saline (Mirenowicz and Schultz 1996), although such avoidance may be viewed as "rewarding." These few activations are not sufficiently strong to induce an average population response. Thus the phasic responses of dopamine neurons preferentially report environmental stimuli with appetitive motivational value but without discriminating between different sensory modalities.

FIG. 2. Dopamine neurons report rewards according to an error in reward prediction. Top: drop of liquid occurs although no reward is predicted at this time. Occurrence of reward thus constitutes a positive error in the prediction of reward. Dopamine neuron is activated by the unpredicted occurrence of the liquid. Middle: conditioned stimulus predicts a reward, and the reward occurs according to the prediction, hence no error in the prediction of reward. Dopamine neuron fails to be activated by the predicted reward (right). It also shows an activation after the reward-predicting stimulus, which occurs irrespective of an error in the prediction of the later reward (left). Bottom: conditioned stimulus predicts a reward, but the reward fails to occur because of lack of reaction by the animal. Activity of the dopamine neuron is depressed exactly at the time when the reward would have occurred. Note the depression occurring >1 s after the conditioned stimulus without any intervening stimuli, revealing an internal process of reward expectation. Neuronal activity in the 3 graphs follows the equation: dopamine response (Reward) = reward occurred - reward predicted. CS, conditioned stimulus; R, primary reward. Reprinted from Schultz et al. (1997) with permission by American Association for the Advancement of Science.
Basal Ganglia for Reinforcement Learning? (Doya 2000, 2007)
! Cerebral cortex: state/action coding
! Striatum: reward prediction, V(s) and Q(s,a)
! Pallidum: action selection
! Dopamine neurons: TD signal δ
! Thalamus
[Diagram: cortico-basal ganglia loop with state and action channels]
Monkey Free Choice Task (Samejima et al., 2005)
! P(reward | Left) = QL; P(reward | Right) = QR
! Reward-probability settings (%, Left-Right): 90-50, 50-90, 50-10, 10-50, 50-50
! Dissociation of action and reward
[Figure: choice behavior as a function of the reward probabilities 10/50/90%]
Action Value Coding in Striatum (Samejima et al., 2005)
! QL neuron
! -QR neuron
[Figure: firing during the -1 to 0 sec epoch, sorted by QL and QR]
Forced and Free Choice Task (Makoto Ito)
! Trial: poking the Center hole (cue tone, 0.5-1 s), then a Left or Right choice (1-2 s); reward tone or no-reward tone; pellet delivered at the pellet dish
! Cue tones and reward probabilities (L, R):
! Left tone (900 Hz): fixed (50%, 0%)
! Right tone (6500 Hz): fixed (0%, 50%)
! Free-choice tone (white noise): varied (90%, 50%), (50%, 90%), (50%, 10%), (10%, 50%)
Time Course of Choice
[Figure: probability of choosing left (PL) across trials; block schedule 10-50, 10-50, 10-50, 50-10, 90-50, 50-90, 90-50, 50-10, 50-90, 50-10; P(r|a=L) and P(r|a=R) step among 0.9, 0.5, 0.1; forced trials (left or right for tone A) and tone-B trials marked]
Generalized Q-Learning Model (Ito & Doya, 2009)
! Action selection: P(a(t)=L) = exp QL(t) / (exp QL(t) + exp QR(t))
! Action value update, for i ∈ {L, R}:
Qi(t+1) = (1-α1)Qi(t) + α1κ1   if a(t)=i, r(t)=1
Qi(t+1) = (1-α1)Qi(t) - α1κ2   if a(t)=i, r(t)=0
Qi(t+1) = (1-α2)Qi(t)          if a(t)≠i
! Parameters: α1: learning rate; α2: forgetting rate; κ1: reward reinforcement; κ2: no-reward aversion
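A sketch of this model in code; the parameter values and reward schedule below are my own illustrative choices, not the fitted values from the paper.

```python
import math
import random

# Sketch of the generalized Q-learning model above on one free-choice block.
random.seed(1)
alpha1, alpha2 = 0.3, 0.05          # learning and forgetting rates
kappa1, kappa2 = 1.0, 0.5           # reward reinforcement, no-reward aversion
p_reward = {"L": 0.9, "R": 0.5}     # a (90%, 50%) block
Q = {"L": 0.0, "R": 0.0}
traceL, traceR = [], []

for _ in range(1000):
    # Softmax choice: P(L) = exp(QL) / (exp(QL) + exp(QR))
    pL = math.exp(Q["L"]) / (math.exp(Q["L"]) + math.exp(Q["R"]))
    a = "L" if random.random() < pL else "R"
    r = 1 if random.random() < p_reward[a] else 0
    for i in ("L", "R"):
        if i == a:   # chosen action: reinforced or penalized
            Q[i] = (1 - alpha1) * Q[i] + (alpha1 * kappa1 if r else -alpha1 * kappa2)
        else:        # unchosen action: value decays toward 0
            Q[i] = (1 - alpha2) * Q[i]
    traceL.append(Q["L"])
    traceR.append(Q["R"])

meanL = sum(traceL[500:]) / 500
meanR = sum(traceR[500:]) / 500
print(meanL, meanR)   # QL should settle above QR in the (90%, 50%) block
```

The forgetting term (α2) lets the value of an unchosen action decay, which is what distinguishes this model from standard Q-learning in the model comparison below.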
Model Fitting by Particle Filter
[Figure: trial-by-trial estimates of QL and QR and of the parameters α1 and α2 across blocks (90 50), (50 90), (50 10); trial outcomes marked as left/right × reward/no-reward]
Model Fitting
! Generalized Q-learning: α1 learning, α2 forgetting, κ1 reinforcement, κ2 aversion
! standard Q-learning: α2 = κ2 = 0
! forgetting Q-learning (F-Q): κ2 = 0
[Figure: normalized likelihood of candidate models (number of parameters in parentheses): 1st-4th order Markov models (4, 16, 64, 256); standard Q (const) (2); F-Q (const) (3); DF-Q (const) (4); local matching law (1); standard Q (variable) (2); F-Q (variable) (2); DF-Q (variable) (2); asterisks mark significant differences]
Neural Activity in the Striatum
! Dorsolateral (DL: 122 neurons)
! Dorsomedial (DM: 56)
! Ventral (NA: 59)
Information of Action and Reward
[Figure: information (bit/sec) about action (out of center hole) and reward (into choice hole) over time, aligned to poking C, tone, poking L/R, and pellet dish]
Action value coded by a DLS neuron
[Figure: firing rate during tone presentation (blue in left panel) across trials, compared with the action value for left estimated by FQ-learning; panels: Q, FSA, DLS]
State value coded by a VS neuron
[Figure: firing rate during tone presentation (blue in left panel) across trials, compared with the value estimated by FQ-learning; panels: Q, FSA, VS]
Hierarchy in Cortico-Striatal Network
! Dorsolateral striatum (motor): early action coding; what action to take?
! Dorsomedial striatum (frontal): action value; in what context?
! Ventral striatum (limbic): state value; whether worth doing?
(Voorn et al., 2004)
Specialization by Learning Algorithms (Doya, 1999)
! Cerebellum: supervised learning; input-output mapping trained by an error signal against a target (via the inferior olive, IO)
! Basal ganglia: reinforcement learning; input-output mapping trained by reward (via SN)
! Cerebral cortex: unsupervised learning of input-output representations
[Diagram: cortex, basal ganglia, and cerebellum connected through the thalamus]
Multiple Action Selection Schemes
! Model-free: a = argmax_a Q(s,a)
! Model-based: a = argmax_a [ r + V(f(s,a)) ], using a forward model f(s,a)
! Lookup table: a = g(s)
[Diagram: the three schemes as networks: (s,a) → Q; (s,a) → f → (s', V) for each candidate a_i; s → g → a]
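The three schemes can be put side by side in code; the chain world, values, and cached policy below are my own toy constructions (the model-based line mirrors the slide's argmax_a [r + V(f(s,a))] expression).

```python
# Side-by-side sketch of the three action selection schemes on a toy
# deterministic chain: states 0..3, actions -1/+1, reward on reaching state 3.
n_states, gamma = 4, 0.9

def f(s, a):
    """Forward model f(s,a): known deterministic dynamics."""
    return max(0, min(n_states - 1, s + a))

def reward(s):
    return 1.0 if s == n_states - 1 else 0.0

# Illustrative pre-learned values, decreasing with distance from the goal.
V = [gamma ** (n_states - 1 - s) for s in range(n_states)]
Q = {(s, a): reward(f(s, a)) + gamma * V[f(s, a)]
     for s in range(n_states) for a in (-1, +1)}

s = 1
# Model-free: a = argmax_a Q(s,a); needs only the cached action values.
a_mf = max((-1, +1), key=lambda a: Q[(s, a)])
# Model-based: a = argmax_a [r + V(f(s,a))]; one-step lookahead through f.
a_mb = max((-1, +1), key=lambda a: reward(f(s, a)) + V[f(s, a)])
# Lookup table (habit): a = g(s); a stored policy, no evaluation at all.
g = {0: +1, 1: +1, 2: +1, 3: +1}
a_tab = g[s]
print(a_mf, a_mb, a_tab)
```

The schemes trade storage for computation: the table is fastest but least flexible, the forward model is slowest but adapts immediately when the goal changes.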
‘Grid Sailing’ Task (Alan Fermin)
! Move a cursor to the goal
! 100 points for the shortest path; -5 points per excess step
! Key map: only 3 movement directions, so path planning is non-trivial
! Immediate or delayed start: 4 to 6 sec for planning; timeout in 6 sec
[Figure: grid with start, goal, and the 3-key mapping (keys 1, 2, 3)]
Task Conditions
[Poster] NEURAL MECHANISMS FOR MODEL-FREE AND MODEL-BASED STRATEGIES IN HUMANS PERFORMING A MULTI-STEP NAVIGATION TASK
Alan Fermin 1,2, Takehiko Yoshida 1,2, Makoto Ito 2, Junichiro Yoshimoto 1,2, Kenji Doya 1,2,3
1 Nara Institute of Science and Technology, 2 Okinawa Institute of Science and Technology, 3 ATR Computational Neuroscience Laboratories
INTRODUCTION
Humans learn to select actions from scratch, by trial-and-error, or by using knowledge from past experiences. As learning goes on, action selection may rely on predictive mechanisms, such as retrieval of motor memories from well established sensorimotor maps, or construction and use of internal models in newly encountered environments. Reinforcement Learning (RL), a computational theory of adaptive optimal control, suggests two action selection methods that resemble real human behavior: the Model-Free (MF) method uses action value functions to predict future rewards based on current states and available actions, and the Model-Based (MB) method uses a forward model to predict the future states reached by hypothetical actions. Recent computational models (Doya, 1999; Daw et al., 2005) have hypothesized that MB and MF are implemented in the human brain through cortico-basal ganglia loops, including the prefrontal cortex and dorsal striatum. To test whether and how humans utilize MB and MF strategies, we performed an fMRI experiment using a "grid sailing task".
METHODS
Subjects: 18 (2 female) healthy, non-musician, right-handed, age 25±3 yo. Training (3 blocks, 240 trials) and Test (2 blocks, 180 trials) sessions were separated by a night sleep interval. Subjects were randomly assigned to one of three groups with the same task conditions. Subjects had to move a cursor (triangle) from the start (red) to the target goal (blue) by sequentially pressing three keys. Instruction was to learn a single optimal action sequence (shortest pathway) for each of the KM-SG pairs, which were constant throughout the experiment. The start position color specified whether to start a response immediately or after a delay (6 sec).
[Task figures]
! Buttons & fingers: INDEX, MIDDLE, RING
! Key-maps: KM1, KM2, KM3; start-goal positions: SG1-SG5
! Training session and test session use different KM-SG combinations
! Test conditions: Cond 1: new key-map; Cond 2: learned key-maps; Cond 3: learned KM-SG
! Sequence of trial events: ITI 3~5 sec; DELAY 4~6 sec or IMMEDIATE start; up to 6 sec to reach the goal; reward feedback 2 sec
! Scoring: Time-over: 0; Fail: -5; Optimal: 100
! Training schedule: blocks TB1, TB2, 240 trials; immediate and delayed starts randomized in sets of 4 trials. Test schedule: TB1, TB2, 180 trials; sets of 9 trials.
[Table: group assignments (6 subjects each). Group 1: KM2, KM3 with SG2-SG5; Group 2: KM1, KM3 with SG1, SG3-SG5; Group 3: KM1-SG1, KM2-SG2, KM3-SG3; test block TB3 uses KM1, KM2, KM3]
Learning stages and computational strategies:
! MODEL-FREE (Q-LEARNING): trial-and-error learning of the key-map and action sequence; starts from a RANDOM policy (EXPLORATION); expected: SLOW LEARNING
! MODEL-BASED (PLANNING with learned KNOWLEDGE, i.e. EXPLOITATION): forward model for planning future actions, using Q(s,a), P(s',r'|s,a), and P(a|s); expected: BOOSTED LEARNING
! SEQUENCE MEMORY (Habit): sequential motor memory for fast action selection via a LEARNED POLICY; expected: HIGH PERFORMANCE
Task conditions:
! Condition 1: New Key-Map
! Condition 2: Known Key-Map, paired with new start-goals (KM1-SG2, KM1-SG3, KM2-SG1, KM2-SG3, KM3-SG1, KM3-SG2)
! Condition 3: Learned KM-SG (KM1-SG1, KM2-SG2, KM3-SG3)
[Figure: expected reward(t) curves over time for the three conditions]
RESULTS - TRAINING
[Figures: distribution of trial types (optimal / suboptimal / fail, %) for immediate vs. delay starts across blocks 1-3; reaction times (ms) for the action sequences A-2, A-1, A-1, A-3, A-3 across trials]
fMRI - DELAY PERIOD ACTIVITY
! Contrasts: Condition 1 > Condition 3; Condition 2 > Condition 3; Condition 3 > [Condition 1, Condition 2]
! REGION OF INTEREST ANALYSIS (left and right hemispheres): DLPFC (BA46), ventral PMC (BA44/45), lateral PMC, pre-SMA, SMA proper, superior parietal cortex, somatosensory cortex, anterior and posterior basal ganglia, anterior and posterior cerebellum
! Condition 2 (blue) vs. Condition 1 (red)
http://www.nc.irp.oist.jp/
We confirmed the learning related improvements and formation of habit sequences by significant improvements in all performance measures, with an increase in reward acquisition and a decrease in the number of moves to reach the target goal, as well as in the time to start and finish an action sequence. The stable performance was also observed in the short practice before the fMRI test, and after one night sleep.

A series of 3-way ANOVA showed a significant effect of Start Time, Task Conditions, and Test Blocks (p<0.001). A multicomparison analysis showed that Condition 1 had the lowest performance, Condition 2 was intermediate, and Condition 3 was the best.
Also, Condition 2 showed a significant increase in reward acquisition after the delay period, which might indicate a beneficial effect of the delay period on the implementation of a forward model mechanism for action selection. Performance in Condition 1 remained constant and high irrespective of the start time condition.
CONCLUSION: We have developed a "grid-sailing task" to test how humans utilize different strategies for the learning of action selection. Distinct performance profiles for each task condition were found, and each also required specific neural networks for solving the task. Our next step is to perform a reinforcement learning model-based analysis to seek whether subjects' behavior under each task condition fits well to specific action selection strategies (MF, MB) and the brain areas implementing them.
RESULTS - TEST
Task conditions differentially modulated the BOLD signal during the delay period. Condition 2 showed sustained and stronger activation in DLPFC, VPMC, LPMC, anterior BG and posterior CB. Condition 3 activated more motor related structures, and Condition 1 showed the lowest, but near to baseline activation patterns.
Effect of Pre-start Delay Time (Condition 1: new key-map; Condition 2: learned key-maps; Condition 3: learned KM-SG)
[Figures: (A, B) mean reward gain per test block for the three conditions, with significant differences (**) for Condition 1 vs. 3 and Condition 2 vs. 3; (C, D) % optimal goal reach over trials 1-10 and % reward score by trial type (error / suboptimal / optimal)]
Delay Period Activity
! Condition 2
! DLPFC
! PMC
! parietal
! anterior striatum
! lateral cerebellum
POMDP by Cortex-Basal Ganglia, Rao (2011)
! Belief state update by the cortex
Frontiers in Computational Neuroscience www.frontiersin.org November 2010 | Volume 4 | Article 146 | 3
Rao Decision making under uncertainty
where v denotes the vector of output firing rates, o denotes the input observation vector, f1 is a potentially non-linear function describing the feedforward transformation of the input, M is the matrix of recurrent synaptic weights, and g1 is a dendritic filtering function. The above differential equation can be rewritten in discrete form as:

vt(i) = f(ot(i)) + g(Σj M(i,j) vt-1(j))   (4)
where vt(i) is the ith component of the vector vt, f and g are functions derived from f1 and g1, and M(i,j) is the synaptic weight value in the ith row and jth column of M. To make the connection between Eq. (4) and Bayesian inference, note that the belief update Eq. (2) requires a product of two sources of information (current observation and feedback) whereas Eq. (4) involves a sum of observation- and feedback-related terms. This apparent divide can be bridged by performing belief updates in the log domain:

log bt(i) = log P(ot | st = i) + log Σj T(j, at-1, i) bt-1(j) + log k
This suggests that Eq. (4) could neurally implement Bayesian inference over time as follows: the log likelihood log P(ot | st = i) is computed by the feedforward term f(ot), while the feedback g(Σj M(i,j) vt-1(j)) conveys the log of the predicted distribution, i.e., log Σj T(j, at-1, i) bt-1(j), for the current time step. The latter is computed from the activities vt-1(j) from the previous time step and the recurrent weights M(i,j), which are defined for each action a. The divisive normalization in Eq. (2) reduces in the equation above to the log k term, which is subtractive and could therefore be implemented via inhibition.
A neural model as sketched above for approximate Bayesian inference but using a linear recurrent network was first explored in (Rao, 2004). Here we have followed the slightly different implementation in (Rao, 2005) that uses the non-linear network given by Eq. (3). As shown in (Rao, 2005), if one interprets Eq. (3) as the membrane potential dynamics in a stochastic integrate-and-fire neuron model,
Before proceeding to the model, we note that the space of beliefs is continuous (each component of the belief state vector is a probability between 0 and 1) and typically high-dimensional (number of dimensions is one less than the number of states). This makes the problem of finding optimal policies very difficult. In fact, finding exact solutions to general POMDP problems has been proved to be a computationally hard problem (e.g., the finite-horizon case is "PSPACE-hard"; Papadimitriou and Tsitsiklis, 1987). However, one can typically find approximate solutions, many of which work well in practice. Our model is most closely related to a popular class of approximation algorithms known as point-based POMDP solvers (Hauskrecht, 2000; Pineau et al., 2003; Spaan and Vlassis, 2005; Kurniawati et al., 2008). The idea is to discretize the belief space with a finite set of belief points and compute value for these belief points rather than the entire belief space. For learning value, our model relies on the temporal-difference (TD) framework (Sutton and Barto, 1981; Sutton, 1988; Sutton and Barto, 1998) in reinforcement learning theory, a framework that has also proved useful in understanding dopaminergic responses in the primate brain (Schultz et al., 1997).
Neural computation of belief
A prerequisite for a neural POMDP model is being able to compute the belief state bt in neural circuitry. Several models have been proposed for neural implementation of Bayesian inference (see Rao, 2007 for a review). We focus here on one potential implementation. Recall that the belief state is updated at each time step according to the following equation:

bt(i) = k P(ot | st = i) Σj T(j, at-1, i) bt-1(j)   (2)
where bt(i) is the ith component of the belief vector bt and represents the posterior probability of state i. Equation (2) combines information from the current observation (P(ot|st)) with feedback from the past time step (bt-1), suggesting a neural implementation based on a recurrent network, for example, a leaky integrator network:

dv/dt = -v + f1(o) + g1(Mv)   (3)
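The belief update of Eq. (2) can be sketched directly; the transition matrix and likelihood values below are illustrative stand-ins of my own, not those of the experiment.

```python
import numpy as np

# Sketch of the belief update in Eq. (2):
# b_t(i) proportional to P(o_t|s_t=i) * sum_j T(j,a,i) * b_{t-1}(j),
# here for a two-state problem with a persistent hidden state.
T = np.array([[1.0, 0.0],           # T[j, i]: the hidden state persists within a trial
              [0.0, 1.0]])
likelihood = np.array([0.6, 0.4])   # P(o_t | s_t = i) for one noisy observation

b = np.array([0.5, 0.5])            # uniform prior over the two hidden states
for _ in range(5):                  # five observations, each mildly favoring state 0
    b = likelihood * (T.T @ b)      # multiply the prediction by the likelihood
    b /= b.sum()                    # normalization: the factor k in Eq. (2)
print(b)                            # belief gradually concentrates on state 0
```

Each weak observation multiplies in a small amount of evidence; accumulating them over time is exactly the evidence integration the random-dot task below requires.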
FIGURE 1 | The POMDP model. (A) When the animal executes an action a in the state s', the environment ("World") generates a new state s according to the transition probability T(s',a,s). The animal receives an observation o of the new state according to P(o|s) and a reward r = R(s',a). (B) In order to solve the POMDP problem, the animal maintains a belief b_t, which is a probability distribution over states of the world. This belief is computed iteratively using Bayesian inference by the belief state estimator SE. An action for the current time step is provided by the learned policy π, which maps belief states to actions.
Frontiers in Computational Neuroscience www.frontiersin.org November 2010 | Volume 4 | Article 146 | 6
Rao Decision making under uncertainty
We model the task using a POMDP as follows: there are two underlying hidden states representing the two possible directions of coherent motion (leftward or rightward). In each trial, the experimenter chooses one of these hidden states (either leftward or rightward) and provides the animal with observations of this hidden state in the form of an image sequence of random dots at the chosen coherence. Note that the hidden state remains the same until the end of the trial. Using only the sequence of observed images seen so far, the animal must choose one of the following actions: sample one more time step (to reduce uncertainty), make a leftward eye movement (indicating choice of leftward motion), or make a rightward eye movement (indicating choice of rightward motion).
We use the notation S_L to represent the state corresponding to leftward motion and S_R to represent rightward motion. Thus, at any given time t, the state s_t can be either S_L or S_R (although within a trial, the state once selected remains unchanged). The animal receives noisy measurements or observations o_t of the hidden state based on P(o_t|s_t, c_t), where c_t is the current coherence value. We assume the coherence value is randomly chosen for each trial from a set {C_1, C_2, …, C_Q} of possible coherence values, and remains the same within a trial. At each time step t, the animal must choose from one of three actions {A_S, A_L, A_R} denoting sample, leftward eye movement, and rightward eye movement respectively.

The animal receives a reward for choosing the correct action, i.e., action A_L when the true state is S_L and action A_R when the true state is S_R. We model this reward as a positive number (e.g., between 10 and 30; here, 20). An incorrect choice produces a large penalty (e.g., between -100 and -400; here, -400), simulating the time-out used for errors in monkey experiments (Roitman and Shadlen, 2002). We assume the animal is motivated by hunger or thirst to make a decision as quickly as possible. This is modeled using a small negative reward (penalty of -1) for each time step spent sampling. We have experimented with a range of reward/punishment values and found that the results remain qualitatively the same as long as there is a large penalty for incorrect decisions, a moderate positive reward for correct decisions, and a small penalty for each time step spent sampling.
The transition probabilities P(s_t|s_{t-1}, a_{t-1}) for the task are as follows: the state remains unchanged (self-transitions have probability 1) as long as the sample action A_S is executed. Likewise, P(c_t|c_{t-1}, A_S) = 1. When the animal chooses A_L or A_R, a new trial begins, with a new state (S_L or S_R) and a new coherence C_k (from {C_1, C_2, …, C_Q}) chosen uniformly at random.
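The reward and transition structure just described can be summarized in code. Reward values follow the text (+20 correct, -400 error, -1 per sample); the particular coherence set is an assumed example standing in for {C_1, …, C_Q}.

```python
import random

# Sketch of the task structure described above. Reward values follow the
# text (+20 correct, -400 error, -1 per sample); the coherence set is an
# assumed example for {C_1, ..., C_Q}.
STATES = ["S_L", "S_R"]
ACTIONS = ["A_S", "A_L", "A_R"]
COHERENCES = [0.1, 0.3, 0.6]

def reward(state, action):
    if action == "A_S":
        return -1                              # cost of one more sample
    correct = (state == "S_L" and action == "A_L") or \
              (state == "S_R" and action == "A_R")
    return 20 if correct else -400

def step(state, coherence, action):
    """A_S keeps state and coherence fixed; A_L/A_R start a new trial."""
    if action == "A_S":
        return state, coherence
    return random.choice(STATES), random.choice(COHERENCES)

print(reward("S_L", "A_L"), reward("S_L", "A_R"), reward("S_L", "A_S"))  # → 20 -400 -1
```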
In the first set of experiments, we trained the model on 6000 trials of leftward or rightward motion. Inputs o_t were generated according to P(o_t|s_t, c_t) based on the current coherence value and state (direction). For these simulations, o_t was one of two values O_L and O_R corresponding to observing leftward and rightward motion respectively. The probability P(o = O_L|s = S_L, c = C_k) was fixed to a value between 0.5 and 1 based on the coherence value C_k, with 0.5 corresponding to 0% coherence and 1 corresponding to 100%. The probability P(o = O_R|s = S_R, c = C_k) was defined similarly.

The belief state b_t over the unknown direction of motion was computed using a slight variant of Eq. (2) using the current input o_t, the known coherence value c_t = C_k, and the known transition probabilities. […]

Interestingly, results from such an experiment have recently been published by Nomoto et al. (2010). We compare their results to the model's predictions in a section below.
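For a known coherence and a state that is fixed within a trial, the belief update reduces to iterated Bayes' rule over the two directions. The sketch below assumes a linear map from coherence c to P(O_L|S_L, c) = 0.5 + c/2, which is consistent with the endpoints stated in the text but is otherwise an assumption.

```python
# Sketch of the observation model and belief update for a known coherence.
# The linear map P(O_L | S_L, c) = 0.5 + c/2 is an assumption consistent
# with the endpoints stated in the text (0.5 at 0% coherence, 1 at 100%).
def p_obs(o, s, c):
    p_match = 0.5 + c / 2.0
    match = (o == "O_L") == (s == "S_L")
    return p_match if match else 1.0 - p_match

def update_belief(b_L, o, c):
    """Posterior P(S_L) after observation o; the state is fixed within a trial."""
    num = p_obs(o, "S_L", c) * b_L
    return num / (num + p_obs(o, "S_R", c) * (1.0 - b_L))

b, c = 0.5, 0.6                      # flat prior; 60% coherence ("Easy")
for o in ["O_L", "O_L", "O_R", "O_L", "O_L"]:
    b = update_belief(b, o, c)
print(round(b, 3))                   # belief in leftward motion after 5 samples
```

Mostly-leftward observations drive the belief toward S_L; a single O_R observation pulls it back by the same likelihood ratio.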
RESULTS

THE RANDOM DOTS TASK
We tested the neural POMDP model derived above in the well-known random dots motion discrimination task used to study decision making in primates (Shadlen and Newsome, 2001). We focus specifically on the reaction-time version of the task (Roitman and Shadlen, 2002) where the animal can choose to make a decision at any time. In this task, the stimulus consists of an image sequence showing a group of moving dots, a fixed fraction of which are randomly selected at each frame and moved in a fixed direction (for example, either left or right). The rest of the dots are moved in random directions. The fraction of dots moving in the same direction is called the motion strength or coherence of the stimulus.
The animal’s task is to decide the direction of motion of the coher-ently moving dots for a given input sequence. The animal learns the task by being rewarded if it makes an eye movement to a target on the left side of its fixation point if the motion is to the left, and to a target on the right if the motion is to the right. A wealth of data exists on the psychophysical performance of humans and monkeys on this task, as well as the neural responses observed in brain areas such as MT and LIP in monkeys performing this task (see Roitman and Shadlen, 2002; Shadlen and Newsome, 2001 and references therein).
EXAMPLE I: RANDOM DOTS WITH KNOWN COHERENCE
In the first set of experiments, we illustrate the model using a simplified version of the random dots task where the coherence value chosen at the beginning of the trial is known. This reduces the problem to that of deciding from noisy observations the underlying direction of coherent motion, given a fixed known coherence. We tackle the case of unknown coherence in a later section.
FIGURE 3 | Suggested mapping of elements of the model to components of the cortex-basal ganglia network. STN, subthalamic nucleus; GPe, globus pallidus, external segment; GPi, Globus pallidus, internal segment; SNc, substantia nigra pars compacta; SNr, substantia nigra pars reticulata; VTA, ventral tegmental area; DA, dopaminergic responses. The model suggests that cortical circuits compute belief states, which are provided as input to the striatum (and STN). Cortico-striatal synapses are assumed to maintain a compact learned representation of cortical belief space (“basis belief points”). The dopaminergic responses of SNc/VTA neurons are assumed to represent reward prediction (TD) errors based on striatal activations. The GPe-GPi/SNr network, in conjunction with the thalamus, is assumed to implement action selection based on the compact belief representation in the striatum/STN.
Finally, in the case of an error trial, the model predicts that the absence of reward (or presence of a negative reward/penalty as in the simulations) should cause a negative reward prediction error, and this error should be slightly larger for the higher coherence case due to its higher expected value (see Figure 13). This prediction is shown in Figure 17C, which compares average reward prediction (TD) error at the end of error trials for the 60% coherence case (left) and the 8% coherence case (right). The population dopamine responses for error trials are depicted by red traces in Figure 17B.

EXAMPLE III: DECISION MAKING UNDER A DEADLINE
Our final set of results illustrates how the model can be extended to learn time-varying policies for tasks with a deadline. Suppose a task has to be solved by time T (otherwise, a large penalty is incurred). We will examine this situation in the context of a random dots task where the animal has to make a decision by time T in each trial (in contrast, both the experimental data and simulations discussed above involved the dots task with no deadline).

Figure 18A shows the network used for learning the value function. Note the additional input node representing elapsed time t, measured from the start of the trial. The network includes basis neurons for elapsed time, each neuron preferring a particular time t_i*. The activation function is the same as before:

g_i(t) = exp( -(t - t_i*)^2 / (2σ_i^2) )

where σ_i^2 is a variance parameter. In the simulations, σ_i^2 was set to progressively larger values for larger t_i, loosely inspired by the fact that an animal's uncertainty about time increases with elapsed time (Leon and Shadlen, 2003). Specifically, both t_i* and σ_i^2 were set arbitrarily to 1.25i for i = 1, …, 65. The number of hidden units for direction, coherence, and time was 65. The deadline T was set to 20 time steps, with a large penalty (-2000) if a left/right decision was not reached before the deadline. Other parameters included: a reward of 20 for correct decisions, -400 for errors, -1 for each sampling action, learning rates of 2.5 × 10^-5, 4 × 10^-8, and 1 × 10^-5, and remaining parameters set to 1, 1.5, and 0.08. The model was trained on 6000 trials, with motion direction (Left/Right) and coherence (Easy/Hard) selected uniformly at random for each trial.

Figures 18B,C show the learned value function for the beginning (t = 1) and near the end of a trial (t = 19) before the deadline at t = 20. The shape of the value function remains approximately the same, but the overall value drops noticeably over time. Figure 18D illustrates this progressive drop in value over time for a slice through the value function at Belief(E) = 1.
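The Gaussian temporal basis g_i(t) can be sketched as follows. This is a toy reconstruction (not the paper's code), with centers and variances both set to 1.25i as stated in the text.

```python
import numpy as np

# Toy reconstruction of the Gaussian temporal basis
# g_i(t) = exp(-(t - t_i*)**2 / (2 * sigma_i**2)),
# with centers and variances both set to 1.25 * i as stated in the text.
i = np.arange(1, 66)           # 65 time-basis neurons
centers = 1.25 * i             # preferred times t_i*
variances = 1.25 * i           # sigma_i^2: broader tuning at later times

def time_basis(t):
    """Activations of all time-basis neurons at elapsed time t."""
    return np.exp(-(t - centers) ** 2 / (2 * variances))

g = time_basis(10.0)
print(int(np.argmax(g)) + 1)   # most active neuron: i = 8 (center 1.25 * 8 = 10)
```

Because the variance grows with the center, later-preferring neurons respond over a wider window, mirroring the increasing uncertainty about elapsed time.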
FIGURE 16 | Reward prediction error and dopamine responses in the random dots task. (A) The plot on the left shows the temporal evolution of reward prediction (TD) error in the model, averaged over trials with “Easy” motion coherence (coherence = 60%). The dotted line shows the average reaction time. The plot on the right shows the average firing rate of dopamine neurons in SNc in a monkey performing the random dots task at 50% motion coherence. The dashed line shows average time of saccade onset. The gray trace and line show the case where the amount of reward was reduced (not modeled here; see Nomoto et al., 2010). (B) (Left) Reward prediction (TD) error in the model averaged over trials with “Hard” motion coherence (coherence = 8%). (Right) Average firing rate of the same dopamine neurons as in (A) but for trials with 5% motion coherence. (Dopamine plots adapted from Nomoto et al., 2010).
FIGURE 17 | Reward prediction error at the end of a trial. (A) Average reward prediction (TD) error at the end of correct trials for the 60% coherence case (left) and the 8% coherence case (right). The vertical line denotes the time of reward delivery. Right after reward delivery, TD error is larger for the 8% coherence case due to its smaller expected value (see Figure 13). (B) Population dopamine responses of SNc neurons in the same monkey as in Figure 16 but after the monkey has made a decision and a feedback tone indicating reward or no reward is presented. (Plots adapted from Nomoto et al., 2010). The black and red traces show dopamine response for correct and error trials respectively. The gray traces show the case where the amount of reward was reduced (see Nomoto et al., 2010 for details). (C) Average reward prediction (TD) error at the end of error trials for the 60% coherence case (left) and the 8% coherence case (right). The absence of reward (or negative reward/penalty in the current model) causes a negative reward prediction error, with a slightly larger error for the higher coherence case due to its higher expected value (see Figure 13).
Embodied Evolution (Elfwing et al., 2011)
! Population: physical robots + virtual agents (15-25)
! Genes:
! weights for top layer NN: w1, w2, …, wn
! weights shaping rewards: v1, v2, …, vn
! meta-parameters: α, γ, λ, τ0, kτ
Figure 1. Two physical robots with six energy sources and the neural network controller. (a) The Cyber Rodent robots used in the experiments were equipped with infrared communication for the exchange of genotypes and cameras for visual detection of energy sources (blue), tail-lamps of other robots (green), and faces of other robots (red). (b) The control architecture consisted of a linear artificial neural network. The output of the network was the weighted sum (Σi wi xi) of the five network inputs (xi) and the five evolutionarily tuned neural network weights (wi). In each time step, if the output was less than or equal to zero then the foraging module was selected; otherwise the mating module was selected. The basic behaviors were learned by reinforcement learning with the aid of evolutionarily tuned additional reward signals and meta-parameters. The foraging module learned a foraging behavior for capturing energy sources. The mating module learned both a mating behavior for the exchange of genotypes, when a face of another robot was visible, and a waiting behavior, when no face was visible.
Temporal Discount Factor γ
! Large γ → reach for far reward
! Small γ → only to near reward
! V(t) = E[ r(t) + γr(t+1) + γ²r(t+2) + γ³r(t+3) + …]
! controls the 'character' of an agent
(Illustration: two 4-step reward sequences evaluated with large vs. small γ)
! 'no pain, no gain': -20, -20, -20, +100 → V = 18.7 with large γ
! 'better stay idle': the same sequence → V = -25.1 with small γ (Depression?)
! 'stay away from danger': +50 now, -100 later → V = -22.9 with large γ
! 'can't resist temptation': the same sequence → V = 47.3 with small γ (Impulsivity?)
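The four values on this slide follow from V = Σ_k γ^k r(t+k) with γ = 0.9 ("large") and γ = 0.3 ("small"), the same two values as the earlier grid-world example. The two reward sequences below are reconstructions chosen because they reproduce the slide's numbers exactly.

```python
# Reproducing the slide's four values from V = sum_k gamma^k r(t+k),
# with gamma = 0.9 ("large") and gamma = 0.3 ("small") as in the earlier
# grid-world example. The two reward sequences are reconstructions that
# reproduce the slide's numbers exactly.
def discounted_value(rewards, gamma):
    return sum(r * gamma ** k for k, r in enumerate(rewards))

no_pain_no_gain = [-20, -20, -20, +100]   # endure small losses for a big reward
temptation = [+50, 0, 0, -100]            # immediate gain, delayed big loss

for gamma in (0.9, 0.3):
    print(gamma,
          round(discounted_value(no_pain_no_gain, gamma), 1),
          round(discounted_value(temptation, gamma), 1))
# gamma = 0.9: V = 18.7 and -22.9 (endure the pain; avoid the danger)
# gamma = 0.3: V = -25.1 and 47.3 (stay idle; take the temptation)
```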
Neuromodulators for Metalearning (Doya, 2002)
! Metaparameter tuning is critical in RL
! How does the brain tune them?
! Dopamine: TD error δ
! Acetylcholine: learning rate α
! Noradrenaline: exploration β
! Serotonin: temporal discount γ
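Where each of these quantities enters a TD learner can be made concrete with a toy two-state chain. All numeric values below are examples; the neuromodulator mapping is the hypothesis above, not an established mechanism.

```python
import math

# Toy sketch of where each quantity in the Doya (2002) metalearning
# hypothesis enters a TD learner (all numeric values are examples):
#   delta: TD error (dopamine), alpha: learning rate (acetylcholine),
#   beta: exploration sharpness (noradrenaline), gamma: discounting (serotonin).
alpha, beta, gamma = 0.1, 2.0, 0.9

# Two-state chain s1 -> s2 -> end with rewards r(s1) = 0, r(s2) = 1.
V = {"s1": 0.0, "s2": 0.0}
for _ in range(500):
    delta1 = 0.0 + gamma * V["s2"] - V["s1"]   # TD error at s1 (dopamine)
    V["s1"] += alpha * delta1                  # learning rate (acetylcholine)
    delta2 = 1.0 + gamma * 0.0 - V["s2"]       # TD error at terminal state s2
    V["s2"] += alpha * delta2

# Softmax readout: beta (noradrenaline) sharpens the choice between
# entering the chain (value V[s1]) and a default option of value 0.
p_enter = math.exp(beta * V["s1"]) / (math.exp(beta * V["s1"]) + 1.0)
print(round(V["s1"], 2), round(V["s2"], 2), round(p_enter, 2))  # → 0.9 1.0 0.86
```

Raising γ increases V(s1) and hence the tendency to work toward the delayed reward; raising β makes the softmax choice more deterministic, which is the exploration-exploitation axis in the mapping above.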
Markov Decision Task (Tanaka et al., 2004)
! Stimulus and response
! State transition and reward functions
(Diagrams: trial timing with 0.5-2 s stimulus/response events and '+100 yen' feedback; two state-transition maps over states s1-s3 with actions a1/a2, yielding small rewards/penalties of +20/-20 at each step and, in one condition, large outcomes of +100/-100.)
Block-Design Analysis
! Different areas for immediate/future reward prediction
! SHORT vs. NO reward (p < 0.001 uncorrected): OFC, Insula, Striatum, Cerebellum
! LONG vs. SHORT (p < 0.0001 uncorrected): Cerebellum, Striatum, Dorsal raphe, DLPFC, VLPFC, IPC, PMd
Model-based Explanatory Variables
! Reward prediction V(t)
! Reward prediction error δ(t)
(Plots: simulated time courses of V(t) and δ(t) over trials 1-312 for γ = 0, 0.3, 0.6, 0.8, 0.9, 0.99.)
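Regressors like V(t) and δ(t) for a given γ can be generated by running TD learning over repeated presentations of a reward sequence and recording the TD errors. The sketch below uses an arbitrary example sequence and learning settings, not the actual task schedule.

```python
# Sketch of how model-based regressors V(t) and delta(t) can be generated
# for analysis: learn V by TD over repeated presentations of a reward
# sequence and record the TD errors. The reward sequence and learning
# settings here are arbitrary examples, not the actual task.
def td_regressors(rewards, gamma, alpha=0.1, n_trials=200):
    """Return V(t) and delta(t) (from the final trial) for one reward sequence."""
    n = len(rewards)
    V = [0.0] * (n + 1)                                    # V[n] = 0: end of trial
    for _ in range(n_trials):
        deltas = []
        for t in range(n):
            delta = rewards[t] + gamma * V[t + 1] - V[t]   # TD error
            V[t] += alpha * delta                          # value update
            deltas.append(delta)
    return V[:n], deltas

rewards = [20, -20, 20, 100]
for gamma in (0, 0.3, 0.6, 0.9, 0.99):
    V, _ = td_regressors(rewards, gamma)
    print(gamma, [round(v, 1) for v in V])
```

At γ = 0, V(t) simply tracks the immediate reward; as γ grows, V(t) increasingly reflects the discounted sum of future rewards, which is why different γ values yield distinct regressors for the fMRI analysis.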
Regression Analysis
! Reward prediction V(t): mPFC (x = -2 mm), Insula (x = -42 mm)
! Reward prediction error δ(t): Striatum (z = 2)
Tryptophan Depletion/Loading (Tanaka et al., 2007)
! Tryptophan: precursor of serotonin
! depletion/loading affect central serotonin levels (e.g. Bjork et al. 2001, Luciana et al. 2001)
! 100 g of amino acid mixture; experiments after 6 hours
! Day 1 (Tr-): no tryptophan (Depletion)
! Day 2 (Tr0): 2.3 g of tryptophan (Control)
! Day 3 (Tr+): 10.3 g of tryptophan (Loading)
Behavioral Result (Schweighofer et al., 2008)
! Extended sessions outside scanner
! Conditions: Depletion, Control, Loading
Modulation by Tryptophan Levels
Changes in Correlation Coefficient
! ROI (region of interest) analysis
! Tr- > Tr+: correlation with V at small γ (γ = 0.6; 28, 0, -4) in ventral putamen
! Tr- < Tr+: correlation with V at large γ (γ = 0.99; 16, 2, 28) in dorsal putamen
Microdialysis Experiment (Miyazaki et al., 2011, Eur J Neurosci)
Dopamine and Serotonin Responses
! Average in 30 minutes
! Serotonin (n=10)
! Dopamine (n=8)
Serotonin vs. Dopamine
! No significant positive or negative correlation
! Conditions: Immediate, Intermittent, Delayed
Effect of Serotonin Suppression (Miyazaki et al., 2012, J Neurosci)
! 5-HT1A agonist in DRN → wait errors in long-delayed reward condition
! (Plot: choice errors vs. wait errors)
Dorsal Raphe Neuron Recording (Miyazaki et al., 2011, J Neurosci)
! Putative 5-HT neurons:
! wider spikes
! low firing rate
! suppression by 5-HT1A agonist
Delayed Tone-Food-Tone-Water Task
! Tone → Food, Tone → Water
! 2 ~ 20 sec delays
(Raster plots and firing-rate histograms at the food and water sites: tone before food, tone before water.)
Error Trials in Extended Delay
! Sustained firing during waiting
! Firing drops before wait error
(Figure panels a-f: firing rates (spikes s-1) aligned to reward-site entry and to the end of the reward wait; rasters at tone, food, and water sites; average rates for success vs. error trials in the first and last 2 s of the reward wait; n = 23-43 neurons, with significant differences on error trials.)
Serotonin Experiments: Summary
! Microdialysis
! higher release for delayed reward
! no apparent opponency with dopamine
! lower release causes waiting errors
→ consistent with discounting hypothesis
! Serotonin neuron recording
! higher firing during waiting
! firing stops before giving up
→ more dynamic variable than a 'parameter'
! Question: regulation of serotonin neurons
! algorithm for regulation of patience
Collaborators
! ATR → Tamagawa U: Kazuyuki Samejima
! NAIST → Caltech → ATR → Osaka U: Saori Tanaka
! CREST → USC: Nicolas Schweighofer
! OIST: Makoto Ito, Kayoko Miyazaki, Katsuhiko Miyazaki, Takashi Nakano, Jun Yoshimoto, Eiji Uchibe, Stefan Elfwing
! Kyoto PUM: Minoru Kimura, Yasumasa Ueda
! Hiroshima U: Shigeto Yamawaki, Yasumasa Okamoto, Go Okada, Kazutaka Ueda, Shuji Asahi, Kazuhiro Shishida
! U Otago → OIST: Jeff Wickens