A causal link between prediction errors, dopamine neurons and learning

Elizabeth E Steinberg et al. Nat Neurosci. 2013 Jul.

Abstract

Situations in which rewards are unexpectedly obtained or withheld represent opportunities for new learning. Often, this learning includes identifying cues that predict reward availability. Unexpected rewards strongly activate midbrain dopamine neurons. This phasic signal is proposed to support learning about antecedent cues by signaling discrepancies between actual and expected outcomes, termed a reward prediction error. However, it is unknown whether dopamine neuron prediction error signaling and cue-reward learning are causally linked. To test this hypothesis, we manipulated dopamine neuron activity in rats in two behavioral procedures, associative blocking and extinction, that illustrate the essential function of prediction errors in learning. We observed that optogenetic activation of dopamine neurons concurrent with reward delivery, mimicking a prediction error, was sufficient to cause long-lasting increases in cue-elicited reward-seeking behavior. Our findings establish a causal role for temporally precise dopamine neuron signaling in cue-reward learning, bridging a critical gap between experimental evidence and influential theoretical frameworks.
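
The prediction-error idea in the abstract can be made concrete with the Rescorla-Wagner learning rule (Rescorla and Wagner, 1972). In that rule the error is δ = λ - ΣV: the obtained outcome λ minus the summed prediction V of the cues present. Each present cue's strength then changes by α·δ. This is a minimal, illustrative sketch under stated assumptions. The learning rate α = 0.2, the reward magnitudes of 1.0 (delivered) and 0.0 (withheld), and the function name rw_trial are hypothetical choices, not parameters or code from the paper.

```python
ALPHA = 0.2  # hypothetical learning rate (not from the paper)


def rw_trial(V, cues, reward, alpha=ALPHA):
    """Run one Rescorla-Wagner trial and return the prediction error.

    V:      dict mapping cue name to associative strength (learned prediction)
    cues:   names of the cues present on this trial
    reward: obtained outcome (lambda); 1.0 = reward delivered, 0.0 = withheld
    """
    expected = sum(V[c] for c in cues)  # summed prediction of the present cues
    delta = reward - expected           # actual minus expected outcome
    for c in cues:
        V[c] += alpha * delta           # only cues present on the trial are updated
    return delta


if __name__ == "__main__":
    V = {"cue": 0.0}
    acquisition = [rw_trial(V, ["cue"], 1.0) for _ in range(30)]
    omission = rw_trial(V, ["cue"], 0.0)  # predicted reward unexpectedly withheld
    print(f"first RPE {acquisition[0]:+.2f}, last RPE {acquisition[-1]:+.3f}, "
          f"omission RPE {omission:+.2f}")
```

Run as a script, the error starts at +1 for an unexpected reward. It shrinks toward zero as the cue comes to predict reward, and it turns negative when the predicted reward is withheld.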

Figures

Fig. 1. Behavioral demonstration of the blocking effect

(a) Experimental design of the blocking task. (b) During reinforced trials, sucrose delivery was contingent upon reward port entry during the 30 s cue. After each entry, sucrose was delivered for 3 s, followed by a 2 s timeout. Up to 6 sucrose rewards could be earned per trial, depending on the rats’ behavior. (c) Performance across all single-cue and compound training sessions. Inset, mean performance did not differ between groups over the last four days of single-cue training; controls showed reduced responding during compound training (***p<0.001). (d) Performance during the visual cue test. The blocking group exhibited reduced responding to the cue at test relative to controls (main effect of group, p=0.003; group × trial interaction, p=0.286). (e) Visual cue test performance for the first trial and the average of all three trials. The blocking group showed reduced cue responding on the 3-trial measure (**p=0.003) but did not differ from controls on the first trial (p=0.095). For all figures, values depicted are means and error bars represent SEM.
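
Blocking (Fig. 1) falls out of the same prediction-error rule. Once a pretrained cue A already predicts sucrose, compound trials generate almost no prediction error, so the added visual cue acquires little strength. This is a hedged sketch, not the paper's analysis. The cue name A, trial counts and α are hypothetical. The control group is simplified to compound training with a novel partner cue, which may not match the paper's exact control design.

```python
ALPHA = 0.2  # hypothetical learning rate


def rw_trial(V, cues, reward, alpha=ALPHA):
    delta = reward - sum(V[c] for c in cues)  # prediction error for this trial
    for c in cues:
        V[c] += alpha * delta
    return delta


def visual_cue_strength(pretrain_A, n_single=40, n_compound=40):
    """Associative strength of the visual cue after compound training.

    pretrain_A=True models the blocking group; False is a simplified control
    (assumption) in which the partner cue A is novel at compound training.
    """
    V = {"A": 0.0, "visual": 0.0}
    if pretrain_A:
        for _ in range(n_single):               # phase 1: cue A -> sucrose
            rw_trial(V, ["A"], 1.0)
    for _ in range(n_compound):                 # phase 2: A + visual -> sucrose
        rw_trial(V, ["A", "visual"], 1.0)
    return V["visual"]


if __name__ == "__main__":
    print(f"blocking: {visual_cue_strength(True):.3f}")
    print(f"control:  {visual_cue_strength(False):.3f}")
```

The printed strengths are model quantities for illustration. They are not predictions of port-entry rates.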

Fig. 2. Dopamine neuron stimulation drives new learning

(a) Example histology from a Th::Cre+ rat injected with a Cre-dependent ChR2-containing virus. The vertical track indicates optical fiber placement above the VTA. Scale bar, 1 mm. (b) Experimental design for the blocking task with optogenetics. All groups received identical behavioral training according to the “blocking” group design in Fig. 1a. (c) Optical stimulation (1 s train, 5 ms pulses, 20 Hz, 473 nm) was synchronized with sucrose delivery in the Paired (Cre+ and Cre−) groups but not in the Unpaired (Cre+) group. (d) Performance across all single-cue and compound training sessions. Inset, no group differences over the last four days of single-cue training or during compound training. (e) Performance during the visual cue test. On the first test trial, the PairedCre+ group exhibited increased responding to the cue relative to both control groups (**p<0.005). (f) Visual cue test performance for the first trial and the average of all three trials. The PairedCre+ group exhibited increased cue responding relative to controls on the 1-trial measure (PairedCre+ vs. UnpairedCre+, **p=0.005; PairedCre+ vs. PairedCre−, *p=0.025; PairedCre− vs. UnpairedCre+, p=0.26); for the 3-trial average there was a trend toward a main effect of group (p=0.055).
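
The abstract describes stimulation as mimicking a prediction error. One way to read Fig. 2 in the same framework is to add an extra term to δ on compound trials where light coincides with sucrose delivery. The following are all hypothetical modeling choices, not the paper's analysis:

- the size of that term (STIM_BOOST);
- stimulation occurring only during compound training;
- stimulation having no effect on cue learning in Cre− rats or when unpaired with reward.

```python
ALPHA = 0.2       # hypothetical learning rate
STIM_BOOST = 0.5  # hypothetical extra prediction error evoked by paired stimulation
GROUPS = ("PairedCre+", "PairedCre-", "UnpairedCre+")


def rw_trial(V, cues, reward, boost=0.0, alpha=ALPHA):
    delta = reward + boost - sum(V[c] for c in cues)  # boost mimics an added RPE
    for c in cues:
        V[c] += alpha * delta
    return delta


def visual_cue_strength(group, n_single=40, n_compound=40):
    """Visual-cue strength after blocking-design training for one group."""
    if group not in GROUPS:
        raise ValueError(f"unknown group: {group}")
    # Assumption: only PairedCre+ rats get opsin-driven stimulation at reward time.
    boost = STIM_BOOST if group == "PairedCre+" else 0.0
    V = {"A": 0.0, "visual": 0.0}
    for _ in range(n_single):                  # phase 1: pretrained cue A -> sucrose
        rw_trial(V, ["A"], 1.0)
    for _ in range(n_compound):                # phase 2: A + visual -> sucrose (+ stimulation)
        rw_trial(V, ["A", "visual"], 1.0, boost=boost)
    return V["visual"]


if __name__ == "__main__":
    for g in GROUPS:
        print(f"{g}: visual cue strength {visual_cue_strength(g):.3f}")
```

In this toy model the paired error term restores learning about the otherwise blocked visual cue. This is qualitatively the direction reported for the PairedCre+ group.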

Fig. 3. Dopamine neuron stimulation attenuates behavioral decrements associated with a downshift in reward value

(a) Experimental design for the reward downshift test. Optical stimulation (3 s train, 5 ms pulses, 20 Hz, 473 nm) was either paired with the water “reward” (PairedCre+ and PairedCre− groups) or explicitly unpaired (UnpairedCre+). (b) Percent time in port during the cue across training sessions. Inset, no difference in average performance during the last two training sessions. (c) Percent time in port during the cue for the downshift test. Data are displayed for single trials (left) and as a session average (right). PairedCre+ rats spent more time in the port than controls (PairedCre+ vs. UnpairedCre+, ***p<0.001; PairedCre+ vs. PairedCre−, ***p<0.001; PairedCre− vs. UnpairedCre+, p=0.691). (d) Percent time in port during the cue for downshift recall. Data are displayed for single trials (left) and as a session average (right). There were no group differences during this phase (two-way repeated-measures ANOVA, main effect of group, p=0.835). (e) Latency to enter the reward port after cue onset. Inset, no group differences during the last two training sessions. (f) As in (c), but for latency. PairedCre+ rats responded to the cue faster than controls during the downshift test (PairedCre+ vs. UnpairedCre+, ***p<0.001; PairedCre+ vs. PairedCre−, ***p<0.001; PairedCre− vs. UnpairedCre+, p=0.375). (g) As in (d), but for latency. PairedCre+ rats responded to the cue faster than controls during downshift recall (PairedCre+ vs. UnpairedCre+, *p=0.024; PairedCre+ vs. PairedCre−, *p=0.025; PairedCre− vs. UnpairedCre+, p=0.706).
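
The stimulation parameters in the legends imply simple pulse-train arithmetic. At 20 Hz the period is 50 ms, so each 5 ms pulse is followed by 45 ms of darkness (a 10% duty cycle). That gives 20 pulses in the 1 s train used with sucrose and 60 pulses in the 3 s train used here. The sketch assumes a square-wave train whose first pulse starts at train onset and that the train length is a whole number of periods. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PulseTrain:
    """Square-wave optical pulse train (hypothetical helper for illustration)."""
    train_s: float   # total train duration (s)
    pulse_s: float   # width of each light pulse (s)
    freq_hz: float   # pulse repetition rate (Hz)

    @property
    def period_s(self) -> float:
        return 1.0 / self.freq_hz

    @property
    def n_pulses(self) -> int:
        # Assumes the train spans a whole number of periods.
        return round(self.train_s * self.freq_hz)

    @property
    def duty_cycle(self) -> float:
        return self.pulse_s / self.period_s  # fraction of time the light is on

    def onsets_s(self) -> list[float]:
        # First pulse at train onset, then one pulse per period.
        return [k * self.period_s for k in range(self.n_pulses)]


if __name__ == "__main__":
    for label, train in [("1 s train", PulseTrain(1.0, 0.005, 20)),
                         ("3 s train", PulseTrain(3.0, 0.005, 20))]:
        print(f"{label}: {train.n_pulses} pulses, period {train.period_s * 1e3:.0f} ms, "
              f"duty cycle {train.duty_cycle:.0%}, "
              f"total light {train.n_pulses * train.pulse_s:.2f} s")
```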

Fig. 4. Dopamine neuron stimulation attenuates behavioral decrements associated with reward omission

(a) Experimental design for the extinction test. Note that the same subjects from the downshift experiment were used for this procedure, with Cre+ groups shuffled between experiments (see Methods). Optical stimulation (3 s train, 5 ms pulses, 20 Hz, 473 nm) was delivered at the time of expected reward for Paired groups and during the intertrial interval (ITI) for UnpairedCre+ rats. (b) Percent time in port during the cue across training sessions. Inset, no difference in average performance during the last two training sessions. (c) Percent time in port during the cue for the extinction test. Data are displayed for single trials (left) and as a session average (right). PairedCre+ rats spent more time in the port than controls (PairedCre+ vs. UnpairedCre+, ***p<0.001; PairedCre+ vs. PairedCre−, ***p<0.001; PairedCre− vs. UnpairedCre+, p=0.920). (d) Percent time in port during the cue for extinction recall. Data are displayed for single trials (left) and as a session average (right). PairedCre+ rats spent more time in the port than controls (PairedCre+ vs. UnpairedCre+, ***p<0.001; PairedCre+ vs. PairedCre−, ***p<0.001; PairedCre− vs. UnpairedCre+, p=0.984). (e) Latency to enter the reward port after cue onset. Inset, no group differences during the last two training sessions. (f) As in (c), but for latency. PairedCre+ rats responded to the cue faster than controls during the extinction test (PairedCre+ vs. UnpairedCre+, *p=0.038; PairedCre+ vs. PairedCre−, *p=0.04; PairedCre− vs. UnpairedCre+, p=0.727). (g) As in (d), but for latency. PairedCre+ rats responded to the cue faster than controls during extinction recall (PairedCre+ vs. UnpairedCre+, ***p<0.001; PairedCre+ vs. PairedCre−, ***p<0.001; PairedCre− vs. UnpairedCre+, p=0.211).
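
The extinction and downshift results (Figs. 3 and 4) can be sketched with the same trial-level rule. When reward is omitted or replaced by a less valued outcome, δ turns negative and the cue's strength declines. A stimulation term delivered at the time of expected reward partly offsets that negative error.

This is a qualitative sketch, not a fit to the data. Several values are hypothetical:

- STIM_BOOST, α and the trial counts;
- the value 0.3 used for water.

The model also makes simplifying assumptions. UnpairedCre+ stimulation during the ITI is assumed not to affect cue learning. The model does not attempt to capture latencies or the recall sessions.

```python
ALPHA = 0.2       # hypothetical learning rate
STIM_BOOST = 0.5  # hypothetical error term added by stimulation at expected reward time


def cue_strength_after_test(test_reward, paired_stim, n_train=40, n_test=10, alpha=ALPHA):
    """Train a cue with reward 1.0, then run test trials with `test_reward`.

    test_reward = 0.0 models extinction (reward omission); a value between
    0 and 1 models a downshift to a less valued outcome such as water.
    """
    V = 0.0
    for _ in range(n_train):
        V += alpha * (1.0 - V)                  # acquisition: positive errors
    boost = STIM_BOOST if paired_stim else 0.0  # unpaired stimulation assumed inert
    for _ in range(n_test):
        V += alpha * (test_reward + boost - V)  # negative error partly offset by stimulation
    return V


if __name__ == "__main__":
    # 0.3 is a hypothetical relative value for water after sucrose training.
    for name, test_reward in [("extinction", 0.0), ("downshift", 0.3)]:
        paired = cue_strength_after_test(test_reward, paired_stim=True)
        control = cue_strength_after_test(test_reward, paired_stim=False)
        print(f"{name}: paired {paired:.2f} vs. control {control:.2f}")
```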
