Reversible online control of habitual behavior by ... · Reversible online control of habitual...

6
Reversible online control of habitual behavior by optogenetic perturbation of medial prefrontal cortex Kyle S. Smith a,1 , Arti Virkud a , Karl Deisseroth b , and Ann M. Graybiel a,1 a McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139; and b Department of Bioengineering, Stanford University School of Medicine, Stanford, CA 94305 Contributed by Ann M. Graybiel, September 18, 2012 (sent for review August 20, 2012) Habits tend to form slowly but, once formed, can have great stability. We probed these temporal characteristics of habitual behaviors by intervening optogenetically in forebrain habit cir- cuits as rats performed well-ingrained habitual runs in a T-maze. We trained rats to perform a maze habit, conrmed the habitual behavior by devaluation tests, and then, during the maze runs (ca. 3 s), we disrupted population activity in a small region in the medial prefrontal cortex, the infralimbic cortex. In accordance with evidence that this region is necessary for the expression of habits, we found that this cortical disruption blocked habitual behavior. Notably, however, this blockade of habitual performance occurred on line, within an average of three trials (ca. 9 s of inhibition), and as soon as during the rst trial (<3 s). During subsequent weeks of training, the rats acquired a new behavioral pattern. When we again imposed the same cortical perturbation, the rats regained the suppressed maze-running that typied the original habit, and, simultaneously, the more recently acquired habit was blocked. These online changes occurred within an average of two trials (ca. 6 s of infralimbic inhibition). Measured changes in generalized performance ability and motivation to consume reward were un- affected. This immediate toggling between breaking old habits and returning to them demonstrates that even semiautomatic behaviors are under cortical control and that this control occurs online, second by second. These temporal characteristics dene a framework for uncovering cellular transitions between xed and ex- ible behaviors, and corresponding disturbances in pathologies. learning | limbic | time | reinforcement | neuroplasticity H abits are among the most stable and powerful behaviors that we have. Forming such strongly ingrained behaviors requires that a durable representation of the movement repertoire be acquired. Much evidence suggests that this process involves a gradual transition from exible and goal-directed behavior to a more xed, habitual behavioral strategy (17). How these properties of habits map onto neural circuitry has been the focus of classic lesion and chemical inactivation studies, which have identied regions of the striatum as essential for the expression of habits (24). In addition, such studies have established that a region in the medial prefrontal cortex also must be intact for habits to be expressed (79). This medial prefrontal region [called infralimbic (IL) cortex in rodents] is linked to emotion-related limbic circuitry and projects to sites that promote behavioral exibility at the expense of habits (e.g., prelimbic cortex and medial striatum) (3, 4, 7, 10). Based on this anatomy, the IL cortex is thought to be at an executive level in the control of habits and behavioral strategies (1, 8, 9, 11). This sketch of the circuitry for habits and skilled habit-like behaviors opens key questions about how habitual behaviors are controlled on a moment-by-moment basis. The slow emergence of habits favors a gradual biasing of these behaviors toward au- tomaticity, but the stability of habits favors their being performed without moment-to-moment biasing or executive cortical con- trol. We tested these dynamics of habit control by the prefrontal IL region by targeting precisely timed optogenetic inhibition to the IL cortex with light to drive virally introduced halorhodopsin (eNpHR3.0) (12). This tool allowed us to examine not only the online contribution of cortical activity to the control of habitual behavior, but also, because of its unique repeatability, to track the effects of such brief, seconds-long manipulations over weeks of subsequent performance. To our surprise, we found that de- spite the automaticity of habitual behavior, it is subject to online control by the prefrontal cortex. Results Overtrained T-Maze Running Is Habitual. We used a navigational T- maze task (Fig. 1A) similar to one developed to examine neural activity in the striatum during habit formation (1315). We tracked the learning curves of multiple sets of rat subjects (n = 10). In daily sessions of ca. 40 trials, the rats were required to initiate maze runs in response to a warning cue, run down the maze, and turn right or left, depending on an auditory instruction cue, to receive one of two rewards (chocolate milk or sucrose solution). For each rat, each reward type was assigned to one end arm, and entry into an incorrect arm resulted in no reward. Rats were trained to a criterion of statistically signicant performance accuracy (72.5% correct; criterion training stage 5; Fig. 1B) and then were overtrained for 10 or more additional sessions. Peak performance accuracy was high at 90% correct (Fig. 1B). We applied a reward-devaluation protocol as a quantitative test for habit formation after this extensive training (16). We devalued one of the two maze rewards in home-cage sessions and then tested in the maze experiments for habitual running to the end arm baited with the now-devalued reward, relative to runs to the normally valued reward in the other end arm. To induce devaluation, we paired one reward with a nauseogenic dose of lithium chloride in home-cage sessions (17, 18). In home-cage tests, we conrmed that this method produced an aversion to the devalued reward (Fig. 1C). After devaluation, the rats were placed in the maze to run the task again, but the end arms were not baited. This procedure allowed us to determine whether the animals would still run when instructed to the side with the now- devalued reward, suggesting that this running was habit-driven. The behavior of the control rats (n = 4) established that overtrained T-maze running was normally habitual. In the probe test after reward devaluation, these rats continued running to the devalued goal when so instructed, just as they had in the session before devaluation (Fig. 1D). Their runs to the nondevalued goal were unchanged and accurate as well (Fig. 1D). This insensitivity to reward devaluation conrmed that the training procedures produced ingrained, habitual maze runs. To test for the effects of IL cortical activity on this habitual behavior, we used the strategy of perturbing the IL cortex during the unrewarded probe session. We applied the intervention ex- clusively during the performance of the runs to ensure that it was Author contributions: K.S.S. and A.M.G. designed research; K.S.S. and A.V. performed research; K.D. contributed new reagents/analytic tools; K.S.S. and A.M.G. analyzed data; and K.S.S., K.D., and A.M.G. wrote the paper. The authors declare no conict of interest. 1 To whom correspondence may be addressed. E-mail: [email protected] or graybiel@mit. edu. This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. 1073/pnas.1216264109/-/DCSupplemental. 1893218937 | PNAS | November 13, 2012 | vol. 109 | no. 46 www.pnas.org/cgi/doi/10.1073/pnas.1216264109 Downloaded by guest on January 26, 2021

Transcript of Reversible online control of habitual behavior by ... · Reversible online control of habitual...

Page 1: Reversible online control of habitual behavior by ... · Reversible online control of habitual behavior by optogenetic perturbation of medial prefrontal cortex Kyle S. Smitha,1, Arti

Reversible online control of habitual behavior byoptogenetic perturbation of medial prefrontal cortexKyle S. Smitha,1, Arti Virkuda, Karl Deisserothb, and Ann M. Graybiela,1

aMcGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139;and bDepartment of Bioengineering, Stanford University School of Medicine, Stanford, CA 94305

Contributed by Ann M. Graybiel, September 18, 2012 (sent for review August 20, 2012)

Habits tend to form slowly but, once formed, can have greatstability. We probed these temporal characteristics of habitualbehaviors by intervening optogenetically in forebrain habit cir-cuits as rats performed well-ingrained habitual runs in a T-maze.We trained rats to perform a maze habit, confirmed the habitualbehavior by devaluation tests, and then, during the maze runs(ca. 3 s), we disrupted population activity in a small region in themedial prefrontal cortex, the infralimbic cortex. In accordance withevidence that this region is necessary for the expression of habits,we found that this cortical disruption blocked habitual behavior.Notably, however, this blockade of habitual performance occurredon line, within an average of three trials (ca. 9 s of inhibition), andas soon as during the first trial (<3 s). During subsequent weeks oftraining, the rats acquired a new behavioral pattern. When weagain imposed the same cortical perturbation, the rats regainedthe suppressed maze-running that typified the original habit, and,simultaneously, the more recently acquired habit was blocked.These online changes occurred within an average of two trials(ca. 6 s of infralimbic inhibition). Measured changes in generalizedperformance ability and motivation to consume reward were un-affected. This immediate toggling between breaking old habitsand returning to them demonstrates that even semiautomaticbehaviors are under cortical control and that this control occursonline, second by second. These temporal characteristics define aframework for uncovering cellular transitions between fixed and flex-ible behaviors, and corresponding disturbances in pathologies.

learning | limbic | time | reinforcement | neuroplasticity

Habits are among the most stable and powerful behaviors thatwe have. Forming such strongly ingrained behaviors requires

that a durable representation of the movement repertoire beacquired. Much evidence suggests that this process involvesa gradual transition from flexible and goal-directed behavior toa more fixed, habitual behavioral strategy (1–7). How theseproperties of habits map onto neural circuitry has been the focusof classic lesion and chemical inactivation studies, which haveidentified regions of the striatum as essential for the expressionof habits (2–4). In addition, such studies have established thata region in the medial prefrontal cortex also must be intact forhabits to be expressed (7–9). This medial prefrontal region [calledinfralimbic (IL) cortex in rodents] is linked to emotion-relatedlimbic circuitry and projects to sites that promote behavioralflexibility at the expense of habits (e.g., prelimbic cortex andmedial striatum) (3, 4, 7, 10). Based on this anatomy, the ILcortex is thought to be at an executive level in the control ofhabits and behavioral strategies (1, 8, 9, 11).This sketch of the circuitry for habits and skilled habit-like

behaviors opens key questions about how habitual behaviors arecontrolled on a moment-by-moment basis. The slow emergenceof habits favors a gradual biasing of these behaviors toward au-tomaticity, but the stability of habits favors their being performedwithout moment-to-moment biasing or executive cortical con-trol. We tested these dynamics of habit control by the prefrontalIL region by targeting precisely timed optogenetic inhibition tothe IL cortex with light to drive virally introduced halorhodopsin(eNpHR3.0) (12). This tool allowed us to examine not only the

online contribution of cortical activity to the control of habitualbehavior, but also, because of its unique repeatability, to trackthe effects of such brief, seconds-long manipulations over weeksof subsequent performance. To our surprise, we found that de-spite the automaticity of habitual behavior, it is subject to onlinecontrol by the prefrontal cortex.

ResultsOvertrained T-Maze Running Is Habitual. We used a navigational T-maze task (Fig. 1A) similar to one developed to examine neuralactivity in the striatum during habit formation (13–15). Wetracked the learning curves of multiple sets of rat subjects (n =10). In daily sessions of ca. 40 trials, the rats were required toinitiate maze runs in response to a warning cue, run down themaze, and turn right or left, depending on an auditory instructioncue, to receive one of two rewards (chocolate milk or sucrosesolution). For each rat, each reward type was assigned to one endarm, and entry into an incorrect arm resulted in no reward. Ratswere trained to a criterion of statistically significant performanceaccuracy (72.5% correct; criterion training stage 5; Fig. 1B) andthen were overtrained for 10 or more additional sessions. Peakperformance accuracy was high at ∼90% correct (Fig. 1B).We applied a reward-devaluation protocol as a quantitative

test for habit formation after this extensive training (16). Wedevalued one of the two maze rewards in home-cage sessions andthen tested in the maze experiments for habitual running to theend arm baited with the now-devalued reward, relative to runs tothe normally valued reward in the other end arm. To inducedevaluation, we paired one reward with a nauseogenic dose oflithium chloride in home-cage sessions (17, 18). In home-cagetests, we confirmed that this method produced an aversion to thedevalued reward (Fig. 1C). After devaluation, the rats wereplaced in the maze to run the task again, but the end arms werenot baited. This procedure allowed us to determine whether theanimals would still run when instructed to the side with the now-devalued reward, suggesting that this running was habit-driven.The behavior of the control rats (n = 4) established that

overtrained T-maze running was normally habitual. In the probetest after reward devaluation, these rats continued running to thedevalued goal when so instructed, just as they had in the sessionbefore devaluation (Fig. 1D). Their runs to the nondevalued goalwere unchanged and accurate as well (Fig. 1D). This insensitivityto reward devaluation confirmed that the training proceduresproduced ingrained, habitual maze runs.To test for the effects of IL cortical activity on this habitual

behavior, we used the strategy of perturbing the IL cortex duringthe unrewarded probe session. We applied the intervention ex-clusively during the performance of the runs to ensure that it was

Author contributions: K.S.S. and A.M.G. designed research; K.S.S. and A.V. performedresearch; K.D. contributed new reagents/analytic tools; K.S.S. and A.M.G. analyzed data;and K.S.S., K.D., and A.M.G. wrote the paper.

The authors declare no conflict of interest.1To whom correspondence may be addressed. E-mail: [email protected] or [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1216264109/-/DCSupplemental.

18932–18937 | PNAS | November 13, 2012 | vol. 109 | no. 46 www.pnas.org/cgi/doi/10.1073/pnas.1216264109

Dow

nloa

ded

by g

uest

on

Janu

ary

26, 2

021

Page 2: Reversible online control of habitual behavior by ... · Reversible online control of habitual behavior by optogenetic perturbation of medial prefrontal cortex Kyle S. Smitha,1, Arti

online. We did not include either the prerun periods, the periodsduring reward delivery, or intertrial intervals. In the experi-mental rats (n = 6), an AAV-5 viral vector encoding eNpHR3.0fused to EYFP, and targeted preferentially to glutamatergicpyramidal neurons through the calcium/calmodulin-dependentprotein kinase IIα promoter (AAV5-CaMKIIα-eNpHR3.0-EYFP) (12), was injected bilaterally into the IL cortex 1–3 mobefore behavioral training (Fig. 2A). Anterograde viral labelingwas later observed in the medial striatum but not in the senso-rimotor striatum, confirming a lack of a direct corticostriatalprojection from the IL cortex to the dorsolateral region of thestriatum known to also be essential for habit expression (Fig. 2B)(3, 10). Two of the control rats were given laser exposure fol-lowing injections of control virus lacking halorhodopsin (AAV5-CaMKIIα-EYFP). The other two were injected with hal-orhodopsin-containing virus, but the laser was inactive.

Optical Stimulation of Halorhodopsin-Expressing IL Neurons InhibitsSpike Activity. IL manipulation was accomplished by delivery of593.5-nm-wavelength laser light through optical fibers implantedbilaterally so as to extend to the dorsal edge of the targeted ILcortex (Fig. 2A). Partial inhibition of IL activity was verified intwo behaving rats fitted with a head stage carrying tetrodes andoptical fibers by applying equivalent illumination parameters tothose that would be used in the maze task (Fig. 2 C–G). Neu-ronal spike activity was suppressed by 56% on average duringillumination in 11 units, and the suppression was consistent over40+ trials. This protocol resulted in a more temperate, as well astemporally specific, suppression of spike activity than the fullblockade of activity produced, presumably, by lesions or druginfusions. In six units, we observed increased firing rates duringlight delivery, and these units were often recorded on the sametetrode as others showing inhibition (Fig. 2F). Thus, the opto-genetic inhibition targeted to pyramidal cells influenced local

microcircuitry (19, 20), yielding a time-locked disruption ofpopulation spike activity (inhibition:excitation ratio, 1.8:1).

IL Perturbation Blocks Habits Online.We began by disrupting the ILcortex online while the rats performed the postdevaluation probetest (Fig. 3). Light-on occurred immediately after gate-opening,and light-off occurred immediately after goal-reaching (ca. 3 s oflight per trial; <2 min total per day). This within-run treatmentproduced a dramatic effect: the rats acted as though they had notacquired the habit of running to the devalued goal. As shown inFig. 3D, the rats with IL perturbation sharply reduced their runsto the end arm that would have had the devalued reward—runscharacteristically made by the overtrained control rats (Fig. 3E).Instead, during the intervention, the rats had a propensity to runto the nondevalued end arm (Fig. 4 A and B). Such avoidancebehavior was seldom observed in the control group (Fig. 4B).The devaluation sensitivity induced by IL perturbation was

evident almost immediately (Fig. 4C). It took on average of threelight trials for rats to begin avoiding the devalued goal wheninstructed there. In one animal, the avoidance was present dur-ing the first trial. This onset of habit blockade corresponded toan average within-run illumination time of ca. 7–9 s, with themost immediate effect within 3 s in the single rat. The online ILintervention did not have a generalized effect on the perfor-mance ability of the rats, however, because they ran accuratelyand seemingly automatically with IL illumination when they wereinstructed to run to the nondevalued reward side (Fig. 3D).These experiments suggested that IL cortical activity was re-quired online, during the actual maze runs, in order for the ex-pression of running behavior as a habit.

Replacement Habit Forms with Postdevaluation Training. We nextanalyzed the behavior of the rats when they again received re-ward for correct performance in subsequent days of mazetraining (Figs. 4 and 5). In accordance with classic findings (3, 17,18, 21), all of the rats, including the control animals, avoided thedevalued end arm after being reexposed to the reward that hadbeen devalued (Figs. 4 A and B and 5 A and B), and they almostnever consumed it (Fig. 5C). Thus, the control rats requiredactual contact with the devalued reward on the maze to triggera loss of habitual behavior that rats with a disrupted IL cortexalready exhibited in the unrewarded probe session.Under IL perturbation, when the rats avoided the now-

devalued goal, they almost never stopped the task. Instead, theycontinued to run to the nondevalued end arm (Figs. 4B and 5B).These “wrong-way runs” increased in frequency over days, de-spite the fact that the rats had not been instructed to go to thatend arm and did not ever receive reward for these runs. The ratsshowed no overt anticipatory behavior such as licking or signs ofdistress at the lack of reward. In control rats, the high frequencyof wrong-way runs lasted for as long as we tested (>3 wk).Moreover, these runs also appeared immune to the modest lossof in-maze aversion to the devalued outcome that appeared tooccur during this time, as indicated by a small recovery of in-maze drinking of the devalued reward when the rats did run to it(Fig. 5C). This pattern of behavior suggested that the rats de-veloped a new habit in these postdevaluation days, namely ofalways running to the nondevalued end arm.

IL Perturbation Blocks Replacement Habit and Returns OriginalBehavior. The nearly immediate loss of habitual behavior dur-ing online perturbation of IL activity suggested that switching offIL cortex might be like flipping an off-switch for habitual be-havior, consistent with prior work (8, 9).To evaluate this possibility further, we extended the laser

experiments after the original probe test, gradually increasing thetime for up to a month. The IL cortex was first disrupted duringone to two rewarded sessions immediately after the initial probetest intervention [postprobe stage (PP)1 (first laser-on days afterprobe); Figs. 4 and 5]. This intervention produced no detectable

Devaluation

0

100

Last OTsession

Probesession

NS

111

A

D

NS

NS

Instructed to devalued goalInstructed to non-devalued goal

Acq OT Probe Post-probe

Devaluation

Gate

Warning cue

Instruction cue

0

100B

Per

cent

cor

rect

Per

cent

cor

rect

Training stage

Pre Post

C

0

40

Hom

e-ca

ge in

take

of

deva

lued

rew

ard

(ml)

Fig. 1. T-maze training and habitual behavior of control rats. (A) Protocolfor task training. (B) Performance across training stages for control rats. (C)Home-cage reward drinking before (Pre) and after (Post) devaluation. (D)Performance during the last session before devaluation (Left) and thenduring probe test after (Right), for control rats (solid, cued to devalued goal;dashed, cued to nondevalued goal). NS, not significantly different. *P < 0.05.Data are presented as means ± SEM throughout.

Smith et al. PNAS | November 13, 2012 | vol. 109 | no. 46 | 18933

NEU

ROSC

IENCE

Dow

nloa

ded

by g

uest

on

Janu

ary

26, 2

021

Page 3: Reversible online control of habitual behavior by ... · Reversible online control of habitual behavior by optogenetic perturbation of medial prefrontal cortex Kyle S. Smitha,1, Arti

effect on behavior; both groups of rats equally avoided thedevalued end arm on most trials. Nor did further IL perturbationintroduced at up to 6 d after the initial procedure change thebehavior of the rats (n = 2), which remained stable for up to 14 d[stages PP2–PP5 (laser-off days); Figs. 4–6]. This result suggestedthat once the habitual behavior had been blocked, and the IL-disrupted rats were outcome-guided in behavior like the controlrats, further IL disruption failed to affect maze runs.We obtained sharply different results when we extended the

time before imposing the further perturbation (Figs. 4–6). Whenwe again disrupted the IL cortex online 2–3 wk after the initialdisruption, on days 13 (n = 4), 15 (n = 1), or 21 (n = 1) (stagePP6), the effect was immediate and surprising: the rats nowreadily ran to the devalued side when so instructed (Figs. 4A and5A). They displayed the original behavior that they had hadbefore any IL manipulation and did so within fewer than twolaser trials on average (and on the first trial for three of the rats)(Fig. 4D). Moreover, they drank the devalued reward every timethey ran to it (Fig. 5C). Simultaneously, the rats nearly stoppedperforming wrong-way runs (Fig. 4B). Thus, their behavior be-came similar for runs to the devalued and nondevalued sides(Fig. 5 A–D).This change in behavior was abrupt. These rats, on the day

before the late IL perturbation, ran to the devalued reward whenso cued 6 times on average and drank it 2.17 times on average(∼0.7 mL), at most twice consecutively (Figs. 4D and 6 A and B).Despite having sampled the devalued reward on this prior day, aswell as on days before that (average = 1.88 drinks/day), these ratsdid not return to their original habitual behavior of running tothe devalued reward when cued to do so (Figs. 4D and 6 A and

B). By contrast, on the following day with IL illumination, therats ran to the devalued reward and drank it earlier in the ses-sion, quickly reached an equivalent number as on the prior day,surpassed it by the fifth such trial on average, and then keptgoing: they ran there and drank 14.5 times on average (∼4.4 mL),often in long repeated sequences of consecutive runs and drinks(Fig. 6B). Simultaneously, this late IL perturbation blocked thenew wrong-way running habit that had developed. Both effectsoccurred within seconds, and within few trials, as had the originalhabit blockade. Control rats rarely ran to the devalued reward onthe equivalent test trials (mean runs and drinks, 1.25 times) and,instead, continued to mainly avoid the devalued reward (Figs. 4–6, black lines).This apparent return of the original habitual behavior after the

late IL disruption remained during subsequent laser-off days(stages PP7–PP8, up to 4 test days; Figs. 4 and 5). When we thenadministered another laser session (stage PP9), this treatmentfully returned the original habitual behavior, which remained foras long as we tested (up to 20 d after the third silencing, stagePP10) (Figs. 4 and 5). Over the same length of time, control ratscontinued to avoid the devalued goal and to not consume it(Figs. 4 and 5).Day-by-day analysis of the running patterns suggested that

a tipping point for return of the original learned behavior oc-curred between 6 and 13 d after the initial probe test, followingwhich the effect of IL perturbation changed from blocking theinitial habit to seemingly promoting it (Fig. 6C). This reversal ofthe effect of IL perturbation corresponded to the time when thenew wrong-way runs had been repeated over a number of ses-sions. The coordinate effects on the original and new behavior of

L-on L-off

Firi

ng r

ate

(Hz)

Time (s) Time (s)

Time (s)

-1-3 1 30

2.540

1

Tria

l

TetrodesOpticalfibers

IL IL

A

C

HGF

ED

B

L-on L-off L-on L-off-3 sec Light +3 sec -3 sec Light +3 sec

P = 0.05

Tria

lTr

ial

0

Firi

ng r

ate

(Hz)

0.4

NS

First 10 trialsLast 10 trials

0

15

Firi

ng r

ate

(Hz)

Firi

ng r

ate

(Hz)

0

5

-3 3 -3 3

1

44

1

44

Fiber

Tetrode

Mid

line

1 mm 1 mm

25 µm

25 µm 25 µm

0.1 mm 50 µm

cc

Fig. 2. Optogenetic modulation of IL neuronalactivity. (A) Photomicrograph of optical fiber track(black, activated microglia stain) and virally infectedneuropil (brown, EYFP stain for eNpHR3.0-EYFP).Images show areas of weak (Upper) and dense(Lower) expression. (B) Photomicrograph of virallyinfected IL axons and terminals in the medial, butnot lateral, striatum (black, EYFP stain). The high-magnification image shows dense terminal field.(C) Arrangement of tetrodes and optical fibers in ILcortex. (D) Photomicrographs of virally infectedneurons in IL cortex and track of optical fiber im-plant (Left) and tetrode (Right). cc, corpus callosum.(E) Raster and histogram plots of spike activity foran IL unit (50-ms bins), recorded ±3 s around lightdelivery (yellow) during each of 40 trials. (F) Op-posite modulation by light of the activity of two ILunits recorded simultaneously with the same tet-rode. One is inhibited during light delivery (withsingle spike rebound at light offset), whereas theother shows excitation (with momentary inhibitionat light offset). (G) Average per-session spike ac-tivity of inhibited units (n = 11, from 2 rats) 3 sbefore light onset, during light delivery, and 3 safter light offset. (H) Average spiking during first 10illumination trials (blue) and last 10 trials (red). #P <0.01; +P < 0.001.

18934 | www.pnas.org/cgi/doi/10.1073/pnas.1216264109 Smith et al.

Dow

nloa

ded

by g

uest

on

Janu

ary

26, 2

021

Page 4: Reversible online control of habitual behavior by ... · Reversible online control of habitual behavior by optogenetic perturbation of medial prefrontal cortex Kyle S. Smitha,1, Arti

the rats suggested that IL perturbation might have turnedoff both the initial habit and the new habit and that blockadeof the second habit might, thus, have uncovered the originalhabitual behavior.

IL Perturbation Does Not Affect Generalized Motivation to Consumethe Devalued Reward or Taste-Aversion Memory. We tested the al-ternative possibility that the IL manipulation was changing a gen-eralized motivation to drink or the associated taste-aversionmemory irrespective of maze habits by examining home-cagedrinking of the devalued reward on test days following the late ILperturbation (around stage PP9). The cages were in the mazeroom to provide context similarity (Fig. 5E). On sequential testdays, light was delivered or not delivered while the reward wasfreely available to the rats in their cages (n = 6). This IL manipu-lation had no effect on drinking of the devalued reward (Fig. 5E,“Late”), which was also similar to home-cage drinking assessed atan equivalent time in control rats (Fig. 5E).We likewise tested theeffect of IL perturbation on in-cage drinking just after devaluationand again found no effect (Fig. 5E, “Early”). This lack of effectsharply contrasted to the major effect of IL intervention ondrinking after correct performance in the maze experimentproper, both at the first IL intervention and at the late inter-ventions (Figs. 5C and 6B). Thus, if IL perturbation was affectingappetitive motivation, it was doing so only in the maze.The in-cage test showed that both the control and IL-manipu-

lated rats drank more at the late time than they did right afterdevaluation, with control rats reaching 33% (ca. 10 mL) of pre-devaluation drinking. This partial recovery also contrasted sharplywith the consistently low levels of drinking in the maze during thesame time period: in the maze, the rats still rejected drinking over

half of the times that they ran to it (Figs. 5C and 6A). Thus, eventhough the incentive value of the reward at home was partly re-covering over time, it had remained weak in the maze normally.These disconnects between in-cage and in-maze drinking accordwith the strong context-dependency of habits.

DiscussionOur findings demonstrate that well-ingrained habits can becontrolled online by optogenetic manipulation of a specific sitein the prefrontal cortex, the IL cortex. Strikingly, this control waseffective even when exerted only during performance of thebehavior. Moreover, although optogenetic inhibition of the ILcortex could block expression of an ingrained habit, further ILinhibition could return this habitual behavior if enough time hadelapsed to allow the formation of a new habitual behavior. Theseeffects occurred within seconds of online performance time.These findings carry implications for the temporal dynamics

and online scope of behavioral control exerted by the neocortexover habitual behavior. First, despite the seemingly automaticbehavior typified by habits, their behavioral automaticity requiresongoing permissive or supervisory activity in the medial pre-frontal cortex. Second, this prefrontal control appears to favornew habits if there is a competition between old and new habits;IL perturbation can block an old habit abruptly but can alsoabruptly bring back an old habitual behavior by blocking thenewer one. Third, online manipulation of the medial prefron-tal cortex affects the expression of habitual behavior almostimmediately (within a performance cycle or two, totaling only

ILperturbation

Acq OT Probe Post-probe

Devaluation

1

Control groupIL-halo group

11

A

Gate

Warning cue

Instruction cue

Training stage

D E

0

100

Instructed to devalued goalInstructed to non-devalued goal

NSNS

Per

cent

cor

rect

0

100

Per

cent

cor

rect

Hom

e-ca

ge in

take

of

deva

lued

rew

ard

(ml)

0

40

Pre Post0

100

B C

Per

cent

cor

rect

Devaluation

Last OTsession

Probesession

Probe session

Instructed to devalued goal

Instructed to non- devalued goal

Hal

o

Hal

o

Con

trol

Con

trol

Fig. 3. Optogenetic perturbation of IL cortex blocks habits. (A) IL-illumi-nation protocol. (B) Equivalent performance across training for hal-orhodopsin-treated (red) and control (dashed gray) rats before devaluationand silencing. (C) Home-cage drinking pre- and postdevaluation for IL-hal-orhodopsin (IL-halo) rats. (D) Performance during last session before de-valuation (Left) and probe test after (Right), for rats with IL-halo (solid, cuedto devalued goal; dashed, cued to nondevalued goal). (E) Performanceduring the probe session for control and IL-halo rat groups. Interaction ofgoal value and rat group on probe: P = 0.001; main comparisons shown: *P <0.05; #P < 0.01, +P < 0.001. NS, not significantly different.

Per

cent

tria

ls in

stru

cted

to

dev

alue

d go

alW

rong

way

run

prob

e se

ssio

n

Run

and

drin

kst

age

PP

6

Devaluation Devaluation

Running to devlauedgoal and drinking

-5 -1 PP1 PP10PP6 PP6

Pro

be -5 -1 PP1 PP10P

robe StageStage

“Wrong Way” runningto non-devalued goal

100

0

IL-halo groupControl group

P

P

P

A B

PP

DC

P

1

0

1

0

Trial cued to devalued goalTrial cued to devalued goal20120115

Control group Control groupIL-halo group IL-halo group

1151

Fig. 4. IL perturbation reinstates original habit and simultaneously breaksreplacement habit. (A) Percentage of trials in which rats instructed to run tothe devalued goal went there correctly and also drank the reward (IL-halogroup, red; control group, black; see Fig. 5 for runs and drinks separately).Shown are five sessions leading up to reward devaluation, the unrewardedprobe session (only correct runs shown), and the postdevaluation stages [PP1initial light-on session(s) or first postprobe session for controls; PP2–PP5 andPP7–PP8: sessions without light; PP6: light-on session or equivalent for con-trol; PP9: light-on session(s); PP10: final session(s) without light]. *P < 0.05;#P < 0.01, +P < 0.001 between groups within stage; differences lacking sym-bols indicate not significantly different. (B) Percentage of wrong-way runs tonondevalued goal. (C) Trial-by-trial plot for the IL-halo group and controlgroup, in which rats ran the wrong way during probe session (i.e., habitblockade). For each trial, wrong-way run was scored “1,” and correct run wasscored “0,” and then scores were averaged (e.g., “0.5”: half of rats ran thewrong way and half ran correctly; “1”: all rats ran the wrong way). (D) Trialplot for IL-halo and control groups, in which rats ran to the devalued rewardwhen instructed and drank it, during stage PP6 (i.e., habit reinstatement).

Smith et al. PNAS | November 13, 2012 | vol. 109 | no. 46 | 18935

NEU

ROSC

IENCE

Dow

nloa

ded

by g

uest

on

Janu

ary

26, 2

021

Page 5: Reversible online control of habitual behavior by ... · Reversible online control of habitual behavior by optogenetic perturbation of medial prefrontal cortex Kyle S. Smitha,1, Arti

seconds of disrupted neuronal activity) when exerted only duringperformance, not during pretask anticipation and planning orduring postperformance reinforcement or consolidation times.These results support the findings of classic work on the habit

system suggesting that in rodents, the IL cortex is necessary forhabit expression (7–9) and point to the startling extent and po-tency of control exerted by this small cortical region. Ourexperiments also generated the unanticipated finding that ILperturbation can have the opposite effect when applied later tothe same animals, when they had acquired a new habit: it canresult in the expression of the apparently same habitual behaviorthat earlier IL perturbation originally blocked. Runs to thenondevalued goal, and home-cage drinking, were unaffected bythe intervention, ruling out generalized effects on performanceability, motivation to drink, or devaluation memory.One interpretation of these findings is that after devaluation,

the original habit lost access to behavior, but its representa-tion was maintained in the brain, and the late IL perturbationunmasked it. This conclusion is in good accordance with evi-dence, dating from the time of Pavlov (22), that when a habit isbroken, it is not forgotten; rather, a new one replaces it. By thisview, the IL cortex might serve, in part, as an online executivecontroller that favors newly acquired habits over old strategies(8, 11). Certainly, the IL cortex did not appear to act as a sim-ple bidirectional on/off switch for habitual behavior: when weperformed the second manipulation before the 1- to 2-wk period

after which the new habit was well developed, the perturbationdid not reinstate the original habit (nor did it block the emerg-ing new behavior). We take this result to suggest that the rein-statement was locked to the blockade of the second habit. Theview that the IL cortex supervises newly established habits overold strategies, even habits, is consistent with evidence that the ILcortex helps maintain current response tendencies when theycompete with prior ones (7, 11, 23).An alternative view is that the late perturbation could have

returned the rats to a state of value-driven behavior. There isusually a close coupling of reward-proximal stimuli and actionsto current value (7, 21, 24, 25); the fact that the rats drank thedevalued reward after this late manipulation of IL suggests thatthe reward might no longer have been aversive. However, thisinterpretation must face three issues. First, in-maze drinking wasconsistently low throughout testing in control rats and was low inthe IL-halorhodopsin rats up to the putative “reinstatement.”Even when these rats ran to the devalued reward, they drankit <50% of the time; they had the opportunity to drink more onthe maze but did not. Thus, the devalued reward was consis-tently aversive on the maze up to the point of IL-perturbation.However, then, during the late IL intervention, pursuit of thisdevalued reward jumped far above this level. Second, the IL-halorhodopsin rats had experience with the devalued reward insessions before IL perturbation on the few trials they drank it,and yet this experience failed to evoke a return of the original

Running todevalued goal

Running tonon-devalued goal

B C DDrinkingdevalued reward

Drinkingnon-devalued reward

AP

erce

nt tr

ials

-5 -1 PP1 PP10PP6

Pro

be

Stage

100

0

Stage StageStage

PP6-5 -1 PP1 PP10P

robe PP6 PP6-5 -1 PP1 PP10 -5 -1 PP1 PP10

Per

cent

drin

kw

hen

corr

ect

0

100

Hom

e-ca

ge in

take

of

deva

lued

rew

ard

(ml)

0

40

Pre

Post

Pre

Post

E IL-halogroup

Controlgroup

L-on

L-off

L-on

L-off

Early Late Late

NS

NS

IL-halo groupControl group

Fig. 5. Maze running and reward drinking during IL perturbation. (A andB) Percentage of correct performance on trials with instruction to devalued goal (A) orto nondevalued goal (B). *P < 0.05; #P < 0.01; +P < 0.001; differences lacking symbols indicate not significantly different. (C and D) Percentage of drinking ofdevalued reward (C) and nondevalued reward (D) when the trial was run correctly. The inverse is runs to the devalued goal without drinking. (E) Home-cagedrinking of devalued reward for IL-halo group in two light-on and light-off sessions after devaluation (“Early” test, conducted around PP1–PP2), and in light-onand light-off sessions after habit reinstatement (“Late” test, conducted around PP9). *P< 0.05 comparedwith predevaluation intake (“Pre,” dashed gray line);+P < 0.05 compared with intake just after devaluation (“Post”). (Right) Home-cage drinking of devalued reward for control group (around PP9).

A B C

Fig. 6. Timeline of late IL perturbation effects. (A) Behavior when cued to devalued goal for each rat (top to bottom) and each trial, showing control rats onPP6, and IL-halo rats the day before PP6 and on PP6. (B) Habit reinstatement on PP6 in IL-halo rats (red), compared with the prior day (green) and to controlrats on PP6 (black). Measures of running and drinking devalued goal: first trial of occurrence in the session (if ≥ one occurrence), number of repeats (twoconsecutive), percentage occurring in a repeat, and resulting intake volume. *P < 0.05 IL-halo rats on PP6 compared with the prior day (green) or control ratson PP6 (black). (C) Effects over postprobe days in 5-d blocks on performance during trials instructed to the devalued goal (solid, correct runs; dashed, wrong-way runs). Dots show individual data points for correct runs during light sessions, color-coded by order of light delivery after the initial probe session (e.g.,black, second time overall that the rat received IL light).

18936 | www.pnas.org/cgi/doi/10.1073/pnas.1216264109 Smith et al.

Dow

nloa

ded

by g

uest

on

Janu

ary

26, 2

021

Page 6: Reversible online control of habitual behavior by ... · Reversible online control of habitual behavior by optogenetic perturbation of medial prefrontal cortex Kyle S. Smitha,1, Arti

behavior. Only during the IL intervention, and only in the laterperiod of testing (PP6), did rats continue to pursue the devaluedgoal beyond a few samples. Third, the return of pursuing thedevalued goal was nearly immediate, as soon as the very firstlaser trial, suggesting that the runs were based on a stored valuerather than rats changing their performance within the sessionafter contact with the devalued reward.The maze behavior we analyzed here constituted a complex

habit, with components related to the two different rewards, onlyone of which was devalued. Our findings suggest, as a favoredinterpretation, that the return of the original behavior of runningto the devalued goal was a readjustment of a component of thelarger behavioral repertoire learned by the rats and may haveinfluenced the coupling of the components and subcomponentsof running and drinking as well.This evidence, collectively, places the temporal control of habit

expression in a paradoxical context: despite the apparent auto-matic performance of habits, classically considered as outcome-independent and noncognitive, the prefrontal cortex still monitorsongoing contingencies time step by time step, and does so with thecapacity to reverse the semiautomatic behavioral expression. Ourfindings further raise the possibility that scripts for alternate ha-bitual behaviors are somehow stored when not expressed and thatthey can be unmasked if IL activity is disturbed, suggesting co-ordination with other brain regions. Our experiments do not settlehow this coordination is accomplished or whether it is accom-plished exclusively online. Still, our anatomical evidence fromusing the viral vector to trace the connections from the IL-injectionsites confirmed that this region did not have detectable directconnections with the sensorimotor striatum but, rather, withregions (10) that should give the IL cortex direct access to circuitsinvolved in flexibility and reinforcement as well as addiction (e.g.,via projections to prelimbic neocortex and to the medial andventral striatum) (3, 4, 6, 7, 15, 23, 25, 26), and also to habit-pro-moting circuits through the central amygdala (27). Each could beimportant for the IL habit-toggling function.These observations raise the question of whether the IL cortex,

or its human brain homolog, could similarly control addictions

or states in which behavioral flexibility and behavioral fixity areout of balance, as seen in major neurologic and neuropsychiatricdisorders (5, 6, 23, 26, 28–30). Evoking fast, robust, and enduringbehavioral change by targeting the IL cortex or its homologcould be of substantial value in treating disease states in a rangeof clinical settings, in addition to serving as a powerful approachto studying the real-time making and breaking of normal ha-bitual behaviors.

Materials and MethodsRats were trained on a T-maze task requiring them to respond to auditoryinstruction cues by turning into maze end arms to receive reward (∼0.3 mLchocolate milk or sucrose, each paired with a distinct cue). Training pro-ceeded over daily sessions through task acquisition (72.5% accuracy) andthrough 10+ additional overtraining sessions. For reward devaluation, ratsreceived three pairings of free home-cage intake with lithium chloride in-jection and were later returned to the task for an unrewarded probe sessionand rewarded sessions. Task events were controlled by computer software(MED-PC; Med Associates). Behavior was monitored by in-maze photobeamsand an overhead video camera. For optogenetic manipulation, injectionsof AAV5-eNpHR3.0-CaMKIIα-EYFP or AAV5-CaMKIIα-EYFP were made bi-laterally into the IL cortex, and bilateral dual-ferrule optical fibers wereimplanted to terminate at the dorsal aspect of the IL cortex. To perturbneurons during maze runs, yellow light (2.5–5 mW) was delivered from thewarning cue to goal arrival (ca. 3 s). In tests to analyze spiking dynamics,neuronal activity was recorded from 12 to 24 tetrodes while light was de-livered (3-s-on/10-s-off pulses). Immunostaining and Nissl-staining proce-dures were used to label tetrode tracks, fiber optic cannulae tracks, and YFP-positive neurons. ANOVA and neuronal spike distribution statistics wereused to assess behavioral and neuronal activity changes, with significance setat P < 0.05. Also see SI Materials and Methods.

ACKNOWLEDGMENTS. We thank Christine Keller-McGandy, Alex McWhin-nie, Dr. Daniel J. Gibson, Henry F. Hall, Dr. Dan Hu, Dr. Yasuo Kubota, andDordaneh Sugano for their technical help. This work was supported by Na-tional Institutes of Health (NIH) Grants R01 MH060379 (to A.M.G.) and F32MH085454 (to K.S.S.); by the Stanley H. and Sheila G. Sydney Fund (A.M.G.);funding from Mr. R. Pourian and Julia Madadi (A.M.G.); and grants from theNIH, Defense Advanced Research Projects Agency, and the Gatsby Founda-tion (to K.D.).

1. Daw ND, Niv Y, Dayan P (2005) Actions, policies, values, and the basal ganglia. RecentBreakthroughs in Basal Ganglia Research, ed Bezard E (Nova Science Publishers,Hauppauge, NY), pp 91–106.

2. Packard MG (2009) Exhumed from thought: Basal ganglia and response learning inthe plus-maze. Behav Brain Res 199(1):24–31.

3. Yin HH, Knowlton BJ (2006) The role of the basal ganglia in habit formation. Nat RevNeurosci 7(6):464–476.

4. Balleine BW, O’Doherty JP (2010) Human and rodent homologies in action control:Corticostriatal determinants of goal-directed and habitual action. Neuro-psychopharmacology 35(1):48–69.

5. Graybiel AM (2008) Habits, rituals, and the evaluative brain. Annu Rev Neurosci 31:359–387.

6. Everitt BJ, Robbins TW (2005) Neural systems of reinforcement for drug addiction:From actions to habits to compulsion. Nat Neurosci 8(11):1481–1489.

7. Killcross S, Coutureau E (2003) Coordination of actions and habits in the medialprefrontal cortex of rats. Cereb Cortex 13(4):400–408.

8. Coutureau E, Killcross S (2003) Inactivation of the infralimbic prefrontal cortex re-instates goal-directed responding in overtrained rats. Behav Brain Res 146(1-2):167–174.

9. Hitchcott PK, Quinn JJ, Taylor JR (2007) Bidirectional modulation of goal-directedactions by prefrontal cortical dopamine. Cereb Cortex 17(12):2820–2827.

10. Hurley KM, Herbert H, Moga MM, Saper CB (1991) Efferent projections of the in-fralimbic cortex of the rat. J Comp Neurol 308(2):249–276.

11. Rich EL, Shapiro M (2009) Rat prefrontal cortical neurons selectively code strategyswitches. J Neurosci 29(22):7208–7219.

12. Gradinaru V, et al. (2010) Molecular and cellular approaches for diversifying andextending optogenetics. Cell 141(1):154–165.

13. Jog MS, Kubota Y, Connolly CI, Hillegaart V, Graybiel AM (1999) Building neuralrepresentations of habits. Science 286(5445):1745–1749.

14. Barnes TD, Kubota Y, Hu D, Jin DZ, Graybiel AM (2005) Activity of striatal neuronsreflects dynamic encoding and recoding of procedural memories. Nature 437(7062):1158–1161.

15. Thorn CA, Atallah H, Howe M, Graybiel AM (2010) Differential dynamics of activitychanges in dorsolateral and dorsomedial striatal loops during learning. Neuron 66(5):781–795.

16. Dickinson A (1985) Actions and habits: The development of behavioral autonomy.

Philos Trans R Soc Lond B Biol Sci 308:67–78.17. Adams CD (1982) Variations in the sensitivity of instrumental responding to reinforcer

devaluation. Q J Exp Psychol B 34:77–98.18. Holland PC, Straub JJ (1979) Differential effects of two ways of devaluing the un-

conditioned stimulus after Pavlovian appetitive conditioning. J Exp Psychol Anim

Behav Process 5(1):65–78.19. Anikeeva P, et al. (2012) Optetrode: A multichannel readout for optogenetic control

in freely moving mice. Nat Neurosci 15(1):163–170.20. Han X, et al. (2009) Millisecond-timescale optical control of neural dynamics in the

nonhuman primate brain. Neuron 62(2):191–198.21. Holland PC, Wheeler DS (2009) Representation-mediated food aversions. Conditioned

Taste Aversion: Behavioral and Neural Processes, eds Reilly S, Schachtman T (Oxford

Univ Press, Oxford), pp 196–225.22. Pavlov I (1927) Conditioned Reflexes (Dover Publications, Mineola, NY), 448 pp.23. Peters J, Kalivas PW, Quirk GJ (2009) Extinction circuits for fear and addiction overlap

in prefrontal cortex. Learn Mem 16(5):279–288.24. Balleine BW, Garner C, Gonzalez F, Dickinson A (1995) Motivational control of

heterogeneous instrumental chains. J Exp Psychol Anim Behav Process 21:

203–217.25. Smith KS, Berridge KC, Aldridge JW (2011) Disentangling pleasure from incentive

salience and learning signals in brain reward circuitry. Proc Natl Acad Sci USA 108(27):

E255–E264.26. Pascoli V, Turiault M, Lüscher C (2012) Reversal of cocaine-evoked synaptic potenti-

ation resets drug-induced adaptive behaviour. Nature 481(7379):71–75.27. Lingawi NW, Balleine BW (2012) Amygdala central nucleus interacts with dorsolateral

striatum to regulate the acquisition of habits. J Neurosci 32(3):1073–1081.28. Hyman SE, Malenka RC, Nestler EJ (2006) Neural mechanisms of addiction: The role of

reward-related learning and memory. Annu Rev Neurosci 29:565–598.29. Gillan CM, et al. (2011) Disruption in the balance between goal-directed behavior and

habit learning in obsessive-compulsive disorder. Am J Psychiatry 168(7):718–726.30. Leckman JF, Riddle MA (2000) Tourette’s syndrome: When habit-forming systems

form habits of their own? Neuron 28(2):349–354.

Smith et al. PNAS | November 13, 2012 | vol. 109 | no. 46 | 18937

NEU

ROSC

IENCE

Dow

nloa

ded

by g

uest

on

Janu

ary

26, 2

021