A causal link between prediction errors, dopamine neurons and learning

Elizabeth E Steinberg et al. Nat Neurosci. 2013 Jul.

Abstract

Situations in which rewards are unexpectedly obtained or withheld represent opportunities for new learning. Often, this learning includes identifying cues that predict reward availability. Unexpected rewards strongly activate midbrain dopamine neurons. This phasic signal is proposed to support learning about antecedent cues by signaling discrepancies between actual and expected outcomes, termed a reward prediction error. However, it is unknown whether dopamine neuron prediction error signaling and cue-reward learning are causally linked. To test this hypothesis, we manipulated dopamine neuron activity in rats in two behavioral procedures, associative blocking and extinction, that illustrate the essential function of prediction errors in learning. We observed that optogenetic activation of dopamine neurons concurrent with reward delivery, mimicking a prediction error, was sufficient to cause long-lasting increases in cue-elicited reward-seeking behavior. Our findings establish a causal role for temporally precise dopamine neuron signaling in cue-reward learning, bridging a critical gap between experimental evidence and influential theoretical frameworks.
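The logic of blocking described above can be illustrated with a minimal simulation of the Rescorla-Wagner learning rule, in which each cue's value changes in proportion to the prediction error (actual minus expected reward). This is a sketch and not the authors' analysis: the learning rate, trial counts, cue names `A` and `X`, and the `boost` term (a hypothetical constant added to the prediction error to stand in for dopamine neuron stimulation at reward delivery) are all assumptions, and the control condition is simplified to "no pretraining on A".

```python
def rescorla_wagner(trials, alpha=0.2, reward=1.0, boost=0.0, v=None):
    """Apply Rescorla-Wagner updates over a sequence of rewarded trials.

    trials: list of tuples naming the cues presented together on each trial.
    boost:  hypothetical extra prediction error (assumption, not from the
            paper) standing in for stimulation paired with reward.
    v:      optional starting cue values (dict of cue name -> value).
    """
    v = dict(v or {})
    for cues in trials:
        expected = sum(v.get(c, 0.0) for c in cues)
        delta = reward - expected + boost  # prediction error
        for c in cues:
            v[c] = v.get(c, 0.0) + alpha * delta
    return v


# Single-cue phase: cue A alone comes to predict the reward.
v_a = rescorla_wagner([("A",)] * 50)

# Compound phase: A and X are presented together with the same reward.
blocked = rescorla_wagner([("A", "X")] * 20, v=v_a)
control = rescorla_wagner([("A", "X")] * 20)  # simplified control
unblocked = rescorla_wagner([("A", "X")] * 20, v=v_a, boost=0.5)

for name, result in [("blocked", blocked), ("control", control),
                     ("unblocked", unblocked)]:
    print(f"{name:10s} value of X = {result['X']:.3f}")
```

In this toy model, X acquires almost no value when A already predicts the reward, because the prediction error is near zero. Adding an artificial error term during compound training lets X acquire value, which parallels the qualitative pattern reported for the PairedCre+ group.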

Figures

Fig. 1. Behavioral demonstration of the blocking effect

(a) Experimental design of the blocking task. (b) During reinforced trials, sucrose delivery was contingent upon reward port entry during the 30 s cue. After entry, sucrose was delivered for 3 s followed by a 2 s timeout. Up to 6 sucrose rewards could be earned per trial, depending on the rats’ behavior. (c) Performance across all single-cue and compound training sessions. Inset, mean performance among groups over the last four days of single-cue training did not differ; controls showed reduced behavior during compound training (***p<0.001). (d) Performance during visual cue test. The blocking group exhibited reduced responding to the cue at test relative to controls (main effect of group, p=0.003; group × trial interaction, p=0.286). (e) Visual cue test performance for the first trial and the average of all three trials. The blocking group showed reduced cue responding for the 3-trial measure (**p=0.003) but did not differ from controls on the first trial (p=0.095). For all figures, values depicted are means and error bars represent SEM.

Fig. 2. Dopamine neuron stimulation drives new learning

(a) Example histology from a Th::Cre+ rat injected with a Cre-dependent ChR2-containing virus. Vertical track indicates optical fiber placement above VTA. Scale bar = 1 mm. (b) Experimental design for blocking task with optogenetics. All groups received identical behavioral training according to the “blocking” group design in Fig. 1a. (c) Optical stimulation (1 s train, 5 ms pulse, 20 Hz, 473 nm) was synchronized with sucrose delivery in Paired (Cre+ and Cre–), but not Unpaired (Cre+), groups. (d) Performance across all single-cue and compound training sessions. Inset, no group differences over last four days of single-cue training or during compound training. (e) Performance during visual cue test. The PairedCre+ group exhibited increased responding to the cue relative to both control groups at test on the first trial (**p<0.005). (f) Visual cue test performance for the first trial and all three trials averaged. The PairedCre+ group exhibited increased cue responding relative to controls for the 1-trial measure (PairedCre+ vs. UnpairedCre+, **p=0.005, PairedCre+ vs. PairedCre–, *p=0.025, PairedCre– vs. UnpairedCre+, p=0.26); there was a trend for a group effect for the 3-trial average (main effect of group, p=0.055).

Fig. 3. Dopamine neuron stimulation attenuates behavioral decrements associated with a downshift in reward value

(a) Experimental design for reward downshift test. Optical stimulation (3 s train, 5 ms pulse, 20 Hz, 473 nm) was either paired with the water “reward” (PairedCre+ and Cre– groups) or explicitly unpaired (UnpairedCre+). (b) Percent time in port during the cue across training sessions. Inset, no difference in average performance during the last two training sessions. (c) Percent time in port during the cue for the downshift test. Data are displayed for single trials (left) and as a session average (right). PairedCre+ rats exhibited increased time in port compared to controls (PairedCre+ vs. UnpairedCre+, ***p<0.001, PairedCre+ vs. PairedCre–, ***p<0.001, PairedCre– vs. UnpairedCre+, p=0.691). (d) Percent time in port during the cue for downshift recall. Data are displayed for single trials (left) and as a session average (right). There were no group differences during this phase (2-way RM ANOVA, main effect of group, p=0.835). (e) Latency to enter the reward port after cue onset. Inset, no group differences during last two training sessions. (f) As in c, but for latency. PairedCre+ rats responded faster to the cue compared to controls during the downshift test (PairedCre+ vs. UnpairedCre+, ***p<0.001, PairedCre+ vs. PairedCre–, ***p<0.001, PairedCre– vs. UnpairedCre+, p=0.375). (g) As in d, but for latency. PairedCre+ rats responded faster to the cue compared to controls during downshift recall (PairedCre+ vs. UnpairedCre+, *p=0.024, PairedCre+ vs. PairedCre–, *p=0.025, PairedCre– vs. UnpairedCre+, p=0.706).

Fig. 4. Dopamine neuron stimulation attenuates behavioral decrements associated with reward omission

(a) Experimental design for extinction test. Note that the same subjects from the downshift experiment were used for this procedure, with Cre+ groups shuffled between experiments (see Methods). Optical stimulation (3 s train, 5 ms pulse, 20 Hz, 473 nm) was delivered at the time of expected reward for Paired groups and during the ITI for UnpairedCre+ rats. (b) Percent time in port during the cue across training sessions. Inset, no difference in average performance during the last two training sessions. (c) Percent time in port during the cue for the extinction test. Data are displayed for single trials (left) and as a session average (right). PairedCre+ rats exhibited increased time in port compared to controls (PairedCre+ vs. UnpairedCre+, ***p<0.001, PairedCre+ vs. PairedCre–, ***p<0.001, PairedCre– vs. UnpairedCre+, p=0.920). (d) Percent time in port during the cue for extinction recall. Data are displayed for single trials (left) and as a session average (right). PairedCre+ rats exhibited increased time in port compared to controls (PairedCre+ vs. UnpairedCre+, ***p<0.001, PairedCre+ vs. PairedCre–, ***p<0.001, PairedCre– vs. UnpairedCre+, p=0.984). (e) Latency to enter the reward port after cue onset. Inset, no group differences during last two training sessions. (f) As in c, but for latency. PairedCre+ rats responded faster to the cue compared to controls during the extinction test (PairedCre+ vs. UnpairedCre+, *p=0.038, PairedCre+ vs. PairedCre–, *p=0.04, PairedCre– vs. UnpairedCre+, p=0.727). (g) As in d, but for latency. PairedCre+ rats responded faster to the cue compared to controls during extinction recall (PairedCre+ vs. UnpairedCre+, ***p<0.001, PairedCre+ vs. PairedCre–, ***p<0.001, PairedCre– vs. UnpairedCre+, p=0.211).
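The stimulation parameters quoted in the captions for Figs. 2 to 4 (train duration, pulse width, pulse frequency) imply a pulse count and light-on time that can be worked out directly. The sketch below does that arithmetic under two assumptions not stated in the captions: pulses are evenly spaced, and the first pulse starts at train onset. The function name `pulse_train` and its return fields are illustrative only.

```python
def pulse_train(train_s, freq_hz, pulse_ms):
    """Summarize a rectangular light-pulse train.

    Assumes evenly spaced pulses with the first pulse at train onset
    (an assumption; the captions give only duration, width and frequency).
    """
    period_ms = 1000.0 / freq_hz              # time between pulse onsets
    n_pulses = int(round(train_s * freq_hz))  # pulses in the train
    return {
        "n_pulses": n_pulses,
        "onsets_ms": [i * period_ms for i in range(n_pulses)],
        "duty_cycle": pulse_ms / period_ms,   # fraction of time light is on
        "light_on_ms": n_pulses * pulse_ms,   # total light-on time
    }


# Fig. 2: 1 s train, 5 ms pulses at 20 Hz.
blocking = pulse_train(train_s=1, freq_hz=20, pulse_ms=5)
# Figs. 3 and 4: 3 s train, 5 ms pulses at 20 Hz.
downshift = pulse_train(train_s=3, freq_hz=20, pulse_ms=5)

print(blocking["n_pulses"], blocking["light_on_ms"])
print(downshift["n_pulses"], downshift["light_on_ms"])
```

Under these assumptions the 1 s train contains 20 pulses and the 3 s train contains 60, with light on for 10% of each train.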

Comment in

Dopamine signals mimic reward prediction errors.
Schoenbaum G, Esber GR, Iordanova MD. Nat Neurosci. 2013 Jul;16(7):777-9. doi: 10.1038/nn.3448. PMID: 23799468. Free PMC article.

Cited by

Individual differences in decision-making shape how mesolimbic dopamine regulates choice confidence and change-of-mind.
Kocharian A, Redish AD, Rothwell PE. bioRxiv [Preprint]. 2024 Sep 16:2024.09.16.613237. doi: 10.1101/2024.09.16.613237. PMID: 39345599. Free PMC article. Preprint.
Ventral tegmental area dopamine neural activity switches simultaneously with rule representations in the prefrontal cortex and hippocampus.
Ding M, Tomsick PL, Young RA, Jadhav SP. bioRxiv [Preprint]. 2024 Sep 10:2024.09.09.611811. doi: 10.1101/2024.09.09.611811. PMID: 39314328. Free PMC article. Preprint.
Daily Social Isolation Maps Onto Distinctive Features of Anhedonic Behavior: A Combined Ecological and Computational Investigation.
Gigli V, Castellano P, Ghezzi V, Ang YS, Schettino M, Pizzagalli DA, Ottaviani C. Biol Psychiatry Glob Open Sci. 2024 Jul 31;4(6):100369. doi: 10.1016/j.bpsgos.2024.100369. eCollection 2024 Nov. PMID: 39282653. Free PMC article.
The association between liking, learning and creativity in music.
Zioga I, Harrison PMC, Pearce M, Bhattacharya J, Di Bernardi Luft C. Sci Rep. 2024 Aug 16;14(1):19048. doi: 10.1038/s41598-024-70027-z. PMID: 39152203. Free PMC article.
Dopamine reuptake and inhibitory mechanisms in human dopamine transporter.
Li Y, Wang X, Meng Y, Hu T, Zhao J, Li R, Bai Q, Yuan P, Han J, Hao K, Wei Y, Qiu Y, Li N, Zhao Y. Nature. 2024 Aug;632(8025):686-694. doi: 10.1038/s41586-024-07796-0. Epub 2024 Aug 7. PMID: 39112701.
