Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans - PubMed
Controlled Clinical Trial
Nature. 2006 Aug 31;442(7106):1042-5.
doi: 10.1038/nature05051. Epub 2006 Aug 23.
- PMID: 16929307
- PMCID: PMC2636869
- DOI: 10.1038/nature05051
Mathias Pessiglione et al. Nature. 2006.
Abstract
Theories of instrumental learning are centred on understanding how success and failure are used to improve future decisions. These theories highlight a central role for reward prediction errors in updating the values associated with available actions. In animals, substantial evidence indicates that the neurotransmitter dopamine might have a key function in this type of learning, through its ability to modulate cortico-striatal synaptic efficacy. However, no direct evidence links dopamine, striatal activity and behavioural choice in humans. Here we show that, during instrumental learning, the magnitude of reward prediction error expressed in the striatum is modulated by the administration of drugs enhancing (3,4-dihydroxy-L-phenylalanine; L-DOPA) or reducing (haloperidol) dopaminergic function. Accordingly, subjects treated with L-DOPA have a greater propensity to choose the most rewarding action relative to subjects treated with haloperidol. Furthermore, incorporating the magnitude of the prediction errors into a standard action-value learning algorithm accurately reproduced subjects' behavioural choices under the different drug conditions. We conclude that dopamine-dependent modulation of striatal activity can account for how the human brain uses reward prediction errors to improve future decisions.
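The "standard action-value learning algorithm" mentioned in the abstract can be illustrated with a minimal Q-learning sketch. It assumes a delta-rule update, where the prediction error is the obtained reinforcement minus the current action value, and a softmax choice rule between two stimuli. The function names, the learning rate `alpha` and the softmax `temperature` are illustrative assumptions, not parameters or values reported in the study.

```python
import math


def softmax_choice_prob(q_a: float, q_b: float, temperature: float) -> float:
    """Probability of choosing option A over option B under a softmax rule."""
    # Shift both values by the larger one so exp() cannot overflow.
    m = max(q_a, q_b)
    e_a = math.exp((q_a - m) / temperature)
    e_b = math.exp((q_b - m) / temperature)
    return e_a / (e_a + e_b)


def q_update(q: float, reinforcement: float, alpha: float) -> tuple[float, float]:
    """One action-value update; returns (new value, prediction error)."""
    delta = reinforcement - q  # prediction error: obtained minus expected
    return q + alpha * delta, delta


# Hypothetical example: an action valued at 0 is followed by a win coded as +1.
q_new, delta = q_update(0.0, 1.0, alpha=0.3)
# After the update, the rewarded option is preferred over an untried option valued at 0.
p_a = softmax_choice_prob(q_new, 0.0, temperature=0.3)
```

A positive prediction error raises the chosen action's value, which in turn raises its softmax choice probability on later trials. This is the route by which the size of the prediction error could shape behavioural choice.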
Figures
Figure 1
Experimental task and behavioural results. a, Experimental task. Subjects selected either the upper or lower of two abstract visual stimuli presented on a display screen, and subsequently observed the outcome. In this example, the chosen stimulus is associated with a probability of 0.8 of winning £1 and a probability of 0.2 of winning nothing. Durations of the successive screens are given in milliseconds. b, Behavioural results. Left: observed behavioural choices for initial placebo (grey), superimposed over the results from the subsequent drug groups: L-DOPA (green) and haloperidol (red). The learning curves depict, trial by trial, the proportion of subjects that chose the ‘correct’ stimulus (associated with a probability of 0.8 of winning £1) in the gain condition (circles, upper graph), and the ‘incorrect’ stimulus (associated with a probability of 0.8 of losing £1) in the loss condition (squares, lower graph). Right: modelled behavioural choices for L-DOPA (green) and haloperidol (red) groups. The learning curves represent the probabilities predicted by the computational model. Circles and squares representing observed choices have been left for the purpose of comparison. All parameters of the model were the same for the different drug conditions, except the reinforcement magnitude R, which was estimated from striatal BOLD response.
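In the modelled curves of Figure 1b, only the reinforcement magnitude R differs between drug groups. The following is a minimal Monte Carlo sketch of that idea for the gain condition. It assumes a delta-rule update with softmax choice, with a win coded as R and no win as 0. The learning rate, temperature, trial count, number of simulated subjects and the two R values compared (1.0 and 0.5) are hypothetical choices, not the study's fitted estimates.

```python
import math
import random


def simulate_gain_condition(reward_magnitude: float, n_subjects: int = 2000,
                            n_trials: int = 30, alpha: float = 0.3,
                            temperature: float = 0.3, seed: int = 0) -> list[float]:
    """Proportion of simulated subjects choosing the 'correct' stimulus on each trial.

    The correct stimulus wins with probability 0.8 and the incorrect one with 0.2.
    A win is coded as `reward_magnitude` (standing in for R) and no win as 0.
    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    p_win = {"correct": 0.8, "incorrect": 0.2}
    correct_counts = [0] * n_trials
    for _ in range(n_subjects):
        q = {"correct": 0.0, "incorrect": 0.0}
        for t in range(n_trials):
            # Softmax (logistic for two options) probability of the correct stimulus.
            diff = (q["correct"] - q["incorrect"]) / temperature
            p_correct = 1.0 / (1.0 + math.exp(-diff))
            choice = "correct" if rng.random() < p_correct else "incorrect"
            if choice == "correct":
                correct_counts[t] += 1
            outcome = reward_magnitude if rng.random() < p_win[choice] else 0.0
            # Only the chosen stimulus' value is updated by its prediction error.
            q[choice] += alpha * (outcome - q[choice])
    return [c / n_subjects for c in correct_counts]


# A larger R (as with an amplified striatal prediction error) versus a smaller one.
high_r = simulate_gain_condition(reward_magnitude=1.0)
low_r = simulate_gain_condition(reward_magnitude=0.5)
print(sum(high_r[-10:]) / 10, sum(low_r[-10:]) / 10)
```

With everything else held fixed, a larger R widens the gap between the two learned values, so the simulated learning curve rises higher. This is qualitatively the direction described for L-DOPA relative to haloperidol.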
Figure 2
Statistical parametric maps of prediction error and stimulus-related activity. Coronal slices (bottom) were taken at local maxima of interest indicated by red arrows on the axial projection planes (top). Areas shown in grey on axial planes and in orange or yellow on coronal slices showed a significant effect after family-wise error correction for multiple comparisons (P < 0.05). a, Brain activity correlated with prediction errors derived from the computational model. Reward prediction errors (positive correlation) were found by conjunction of gain and loss conditions (left panels), whereas punishment prediction errors (negative correlation) were found in the loss condition alone (right panel). From left to right, MNI (Montreal Neurological Institute) coordinates are given for the maxima found in the left posterior putamen, left ventral striatum and right anterior insula. b, Statistical parametric maps resulting from main contrasts between stimulus conditions. Go and NoGo refer to stimulus positions requiring, or not requiring, a button press to obtain the optimal outcome. Gain, neutral and loss correspond to the different pairs of stimuli. As above, the maxima shown are located in the left posterior putamen, left ventral striatum and right anterior insula, from left to right.
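Figure 2a relies on prediction errors "derived from the computational model". A minimal sketch of how such a trial-by-trial regressor can be built from observed choices and outcomes assumes a simple delta-rule update with a hypothetical learning rate `alpha` and action values initialised at zero. The helper names and the synthetic signals are illustrative only. A real fMRI analysis would convolve this regressor with a haemodynamic response function and fit it within a general linear model, and this sketch omits that step.

```python
import math


def prediction_error_regressor(choices, outcomes, alpha=0.5, n_options=2):
    """Model-derived prediction error on each trial, given observed choices and outcomes."""
    q = [0.0] * n_options  # action values start at zero (assumption)
    deltas = []
    for chosen, reinforcement in zip(choices, outcomes):
        delta = reinforcement - q[chosen]
        deltas.append(delta)
        q[chosen] += alpha * delta
    return deltas


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical sequence: option 0 chosen three times, outcomes win, nothing, win.
deltas = prediction_error_regressor([0, 0, 0], [1.0, 0.0, 1.0])
# Synthetic signals: one tracks the prediction error (positive correlation),
# the other tracks its negative (negative correlation).
reward_like = [2.0 * d + 0.1 for d in deltas]
punishment_like = [-d for d in deltas]
```

The sign of the correlation is what separates the two effects in Figure 2a. Reward prediction errors correspond to activity rising with the regressor, while punishment prediction errors correspond to activity rising as the regressor falls.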
Figure 3
Time course of brain responses reflecting prediction errors. Time courses were averaged across trials throughout the entire learning sessions. Error bars are inter-subject s.e.m. a, Overlaid positive (grey circles) and negative (black squares) reward prediction errors in the striatum for both L-DOPA-treated and haloperidol-treated groups, and in both gain and loss trials. b, Overlaid positive (black squares) and negative (grey circles) punishment prediction errors in the right anterior insula, during the loss trials.
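Curves like those in Figure 3 can be produced by sorting outcome-locked epochs according to the sign of the model prediction error, averaging within each subject, and then computing the group mean with an inter-subject s.e.m. The sketch below assumes a hypothetical data layout: each trial is a list of signal samples time-locked to the outcome, paired with that trial's prediction error. The function names are illustrative.

```python
import math


def split_average_by_sign(epochs, deltas):
    """Average epochs separately for positive and negative prediction errors.

    epochs: per-trial time courses of equal length; deltas: matching prediction errors.
    Returns (mean_positive, mean_negative); trials with a zero prediction error are skipped.
    An empty group yields an empty list.
    """
    def mean_course(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos = [e for e, d in zip(epochs, deltas) if d > 0]
    neg = [e for e, d in zip(epochs, deltas) if d < 0]
    return mean_course(pos), mean_course(neg)


def across_subject_mean_sem(subject_courses):
    """Group mean and inter-subject s.e.m. at each time point."""
    n = len(subject_courses)
    means, sems = [], []
    for col in zip(*subject_courses):
        m = sum(col) / n
        var = sum((x - m) ** 2 for x in col) / (n - 1)  # sample variance
        means.append(m)
        sems.append(math.sqrt(var / n))
    return means, sems


# Hypothetical single-subject data: three trials, three samples each.
pos_mean, neg_mean = split_average_by_sign(
    [[0.0, 1.0, 2.0], [0.0, -1.0, -2.0], [0.0, 3.0, 4.0]], [0.5, -0.2, 1.0])
# Hypothetical group: two subjects' averaged time courses.
group_mean, group_sem = across_subject_mean_sem([[1.0, 2.0], [3.0, 4.0]])
```

Averaging within subjects before computing the s.e.m. makes the error bars reflect variability between subjects rather than between trials, matching the "inter-subject s.e.m." stated in the caption.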
Similar articles
- L-DOPA reduces model-free control of behavior by attenuating the transfer of value to action.
  Kroemer NB, Lee Y, Pooseh S, Eppinger B, Goschke T, Smolka MN. Neuroimage. 2019 Feb 1;186:113-125. doi: 10.1016/j.neuroimage.2018.10.075. Epub 2018 Oct 28. PMID: 30381245.
- Dopamine Modulates Adaptive Prediction Error Coding in the Human Midbrain and Striatum.
  Diederen KM, Ziauddeen H, Vestergaard MD, Spencer T, Schultz W, Fletcher PC. J Neurosci. 2017 Feb 15;37(7):1708-1720. doi: 10.1523/JNEUROSCI.1979-16.2016. PMID: 28202786. Free PMC article. Clinical Trial.
- Pharmacological modulation of subliminal learning in Parkinson's and Tourette's syndromes.
  Palminteri S, Lebreton M, Worbe Y, Grabli D, Hartmann A, Pessiglione M. Proc Natl Acad Sci U S A. 2009 Nov 10;106(45):19179-84. doi: 10.1073/pnas.0904035106. Epub 2009 Oct 22. PMID: 19850878. Free PMC article.
- A computational substrate for incentive salience.
  McClure SM, Daw ND, Montague PR. Trends Neurosci. 2003 Aug;26(8):423-8. doi: 10.1016/s0166-2236(03)00177-2. PMID: 12900173. Review.
- Predictive reward signal of dopamine neurons.
  Schultz W. J Neurophysiol. 1998 Jul;80(1):1-27. doi: 10.1152/jn.1998.80.1.1. PMID: 9658025. Review.
Cited by
- Novelty is not surprise: Human exploratory and adaptive behavior in sequential decision-making.
  Xu HA, Modirshanechi A, Lehmann MP, Gerstner W, Herzog MH. PLoS Comput Biol. 2021 Jun 3;17(6):e1009070. doi: 10.1371/journal.pcbi.1009070. eCollection 2021 Jun. PMID: 34081705. Free PMC article.
- Examining belief and confidence in schizophrenia.
  Joyce DW, Averbeck BB, Frith CD, Shergill SS. Psychol Med. 2013 Nov;43(11):2327-38. doi: 10.1017/S0033291713000263. Epub 2013 Mar 22. PMID: 23521846. Free PMC article.
- Identifying predictors, moderators, and mediators of antidepressant response in major depressive disorder: neuroimaging approaches.
  Phillips ML, Chase HW, Sheline YI, Etkin A, Almeida JR, Deckersbach T, Trivedi MH. Am J Psychiatry. 2015 Feb 1;172(2):124-38. doi: 10.1176/appi.ajp.2014.14010076. PMID: 25640931. Free PMC article. Review.
- Anatomical dissociation of intracerebral signals for reward and punishment prediction errors in humans.
  Gueguen MCM, Lopez-Persem A, Billeke P, Lachaux JP, Rheims S, Kahane P, Minotti L, David O, Pessiglione M, Bastin J. Nat Commun. 2021 Jun 7;12(1):3344. doi: 10.1038/s41467-021-23704-w. PMID: 34099678. Free PMC article.
- Striatal miR-183-5p inhibits methamphetamine-induced locomotion by regulating glucocorticoid receptor signaling.
  Song SH, Jang WJ, Jang EY, Kim OH, Kim H, Son T, Choi DY, Lee S, Jeong CH. Front Pharmacol. 2022 Sep 26;13:997701. doi: 10.3389/fphar.2022.997701. eCollection 2022. PMID: 36225577. Free PMC article.