Dopamine-mediated learning and switching in cortico-striatal circuit explain behavioral changes in reinforcement learning
Simon Hong et al. Front Behav Neurosci. 2011.
Abstract
The basal ganglia are thought to play a crucial role in reinforcement learning. Central to the learning mechanism are dopamine (DA) D1 and D2 receptors located in the cortico-striatal synapses. However, it is still unclear how this DA-mediated synaptic plasticity is deployed and coordinated during reward-contingent behavioral changes. Here we propose a computational model of reinforcement learning that uses different thresholds for D1- and D2-mediated synaptic plasticity, which are antagonized by DA-independent synaptic plasticity. A phasic increase in DA release caused by a larger-than-expected reward induces long-term potentiation (LTP) in the direct pathway, whereas a phasic decrease in DA release caused by a smaller-than-expected reward induces a cessation of long-term depression (LTD), leading to LTP in the indirect pathway. This learning mechanism can explain the robust behavioral adaptation observed in a location-reward-value-association task where the animal makes shorter-latency saccades to reward locations. The changes in saccade latency become quicker as the monkey becomes more experienced. This behavior can be explained by a switching mechanism which activates the cortico-striatal circuit selectively. Our model also shows how D1- or D2-receptor blocking experiments selectively affect either reward or no-reward trials. The proposed mechanisms also explain the behavioral changes in Parkinson's disease.
Keywords: LTD; LTP; latency; model; motivation; reaction time; reward; saccade.
Figures
Figure 1
Reinforcement learning experiments and the learning circuit involved. (A) Sequence of events in the one-direction-rewarded saccade task (1DR). The monkey first fixated at the central spot (the dotted circle indicates the eye position). As the fixation point disappeared, a target appeared randomly on the right or left and the monkey was required to make a saccade to it immediately. Correct saccades in one direction were followed by a tone and juice reward; saccades in the other direction were followed by a tone alone. The rewarded direction was fixed in a block of 24 trials, and was changed in the following block. (B) Distribution of saccade latencies in reward trials (in red) and in no-reward trials (in blue). (C) Illustration of D1 and D2 antagonist experiments. D1 or D2 antagonist was administered in the caudate to examine the behavioral consequence in the 1DR task. Black, red, and purple connections indicate excitatory, inhibitory, and dopaminergic modulatory connections, respectively. (D) Hypothesized circuit involving D1- and D2-mediated plasticity. D1- and D2-mediated plasticity in the direct and indirect pathways is assumed to contribute to eye movements. The purple arrows indicate dopaminergic modulatory connections. The lines with rectangle ends indicate inhibitory connections. Arrow ends indicate excitatory connections. Figures (A) and (B) are from Hong and Hikosaka (2008). Abbreviations: CD, caudate nucleus; D1, D2, dopamine D1 and D2 receptors; SC, superior colliculus; SNc/SNr, substantia nigra pars compacta/reticulata; GPe, globus pallidus external segment; FEF, frontal eye field; SEF, supplementary eye field; DLPF, dorsolateral prefrontal cortex; LIP, lateral intraparietal area; STN, subthalamic nucleus.
Figure 2
Dopamine-mediated learning mechanisms in the striatum. DA-dependent LTP and LTD are labeled in red; DA-independent LTP and LTD are in blue. (A) Null state, where free saccades occur without a task. Indirect pathway MSNs, which express D2 receptors, show weak DA-dependent LTD and DA-independent LTP. Direct pathway MSNs, which express D1 receptors, show DA-independent LTD. (B) Hypothesized D1 and D2 thresholds in relation to the levels of DA during big-reward and no (or small) reward trials. (C) In big-reward trials, the increased level of DA causes DA-dependent LTP in the direct pathway and enhances DA-dependent LTD in the indirect pathway. (D) In no-reward trials, the decreased level of DA causes an attenuation of DA-dependent LTD in the indirect pathway. The changes in DA level are assumed to coincide with the activation of the cortical input and the activation of the connected MSN to enable the DA-dependent LTP and LTD (see the eligibility traces in Eqs 6–8). In both cases (C and D), DA-independent LTD in the direct pathway and DA-independent LTP in the indirect pathway remain unchanged, but their effects become relatively weak in big-reward trials (C) and relatively strong in no-reward trials (D). The red arrows indicate the amplitudes (large: 2 arrows, small: 1 arrow) and directions (up, down) of the change of neural activity compared to the null state shown in (A). Note that even though the DA levels in (A) and (D) are both under the threshold, LTD in (D) happens more vigorously because of enabled eligibility (Eqs 6–8). The thickness of the connections indicates the resulting output in each module. Black and open circles indicate inhibitory and excitatory neurons, respectively. In (C) and (D), the GPb–LHb–SNc circuit is omitted for clarity. LTP/LTD, long-term potentiation/depression; GPb, border region of globus pallidus; LHb, lateral habenula.
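The threshold scheme in this caption can be illustrated with a minimal Python sketch. It is not the model's Eqs 6–8, which are not reproduced on this page; the function name, the rectified-linear form, and every numeric value (thresholds, DA levels, the size of the DA-independent term) are assumptions chosen only so that the signs match panels (C) and (D).

```python
def weight_changes(da, eligibility=1.0, d1_threshold=0.6, d2_threshold=0.2,
                   da_independent=0.1):
    """Return (direct, indirect) cortico-striatal weight changes for one trial.

    All parameters are hypothetical illustration values, not fitted ones.
    """
    # DA-dependent terms are active only above each receptor's threshold.
    d1_ltp = max(0.0, da - d1_threshold)   # D1-mediated LTP, direct pathway
    d2_ltd = max(0.0, da - d2_threshold)   # D2-mediated LTD, indirect pathway
    # DA-independent terms oppose them: LTD in direct, LTP in indirect.
    dw_direct = eligibility * (d1_ltp - da_independent)
    dw_indirect = eligibility * (da_independent - d2_ltd)
    return dw_direct, dw_indirect


big_reward = weight_changes(da=0.9)    # phasic DA increase
no_reward = weight_changes(da=0.05)    # phasic DA decrease ("pause")
print("big reward:", big_reward)
print("no reward: ", no_reward)
```

With these assumed values, a big reward gives net potentiation in the direct pathway and net depression in the indirect pathway, and a no-reward trial gives the opposite signs, because only the DA-independent terms remain once DA falls below both thresholds.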
Figure 3
Experience-dependent emergence of a switching mechanism that allows rapid changes of saccade latency in response to the change in reward location: before (A–C) and after (D–F) sufficient experience of the 1DR task. We hypothesize the presence of “reward-category neurons” (RWD), a key driver of the switching, that have excitatory connections to FEF neurons and direct pathway MSNs in the CD in the same hemisphere. They would become active before target onset selectively when a reward is expected on the contralateral side (see Figure 4), an assumption based on experimental observations of neuronal activity in the FEF, CD, SNr, and SC. Before sufficient experience of the 1DR task (A–C), the saccade latency changes gradually in both the small-to-big-reward transition [red in (B,C)] and the big-to-small-reward transition [blue in (B,C)], in both experimental observation (B) and computer simulation (C). The saccade latency data in (B) are from monkeys C, D, and T. After sufficient experience of the 1DR task (D–F), the saccade latency changes quickly as shown in experiments (E) and computer simulation (F). This is mainly due to the additional excitatory input from the reward-category neurons. Note, however, that the decrease in saccade latency in the small-to-big-reward transition [red in (E,F)] is quicker than the increase in saccade latency in the big-to-small-reward transition [blue in (E,F)]. This asymmetry is due to the asymmetric learning algorithm operated by two parallel circuits in the basal ganglia illustrated in Figure 2. Panel (E) is from Matsumoto and Hikosaka (2007).
Figure 4
Simulated neural components of the model performing reward and no-reward trials of the 1DR task. In reward trials (A) the reward-category unit (REW category) ramps up its activity shortly after the presentation of the fixation point. The activity shuts off in response to the burst activity of the DA unit (DA) signaling the reward value of the target. The FEF unit combines the tonic reward-category activity and the phasic target signal. In the BG, both the direct pathway MSN unit (D1) and the indirect pathway MSN unit (D2) receive an input from the FEF. The direct pathway MSN unit (D1), in addition, receives an input directly from the reward-category unit and therefore shows larger ramping activity than the indirect pathway MSN unit (D2). The activity of the direct pathway MSN unit (D1) is further enhanced by DA-dependent LTP, which is triggered by the DA burst and mediated by D1 receptors. This results in a stronger disinhibition of the SC by the SNr, leading to a stronger activity in the SC. In contrast, the activity of the indirect pathway MSN unit (D2) is further depressed by DA-dependent LTD, which is triggered by the DA burst and mediated by D2 receptors. This results in the suppression of the excitatory input from the STN to the SNr, further enhancing the SC activity. The combined effects from the direct and indirect pathways lead to a shorter latency saccade (see the arrowhead on top, indicating the time of saccade initiation). In no-reward trials (B) the activity of the reward-category unit is much weaker, thus lowering the activity of the FEF unit and the direct pathway MSN unit (D1). The activity of the direct pathway MSN unit (D1) is further depressed by DA-independent LTD. In contrast, the activity of the indirect pathway MSN unit (D2) increases because DA-dependent LTD is attenuated due to the “pause” of DA activity (DA) and thus is dominated by DA-independent LTP. The combined effects from the direct and indirect pathways lead to a weaker activation of the SC unit and hence a longer latency saccade. The scale of all the ordinate axes is from 0 to 1.
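The signal flow described in this caption (direct pathway MSNs inhibit the SNr, indirect pathway MSNs act on the SNr through the GPe and STN, and the SNr inhibits the SC) can be traced with a toy steady-state rate sketch in Python. This is only an illustration under stated assumptions: the rectified-linear units, the constants, and the latency mapping are hypothetical and are not taken from the paper's simulation.

```python
def relu(x):
    """Rectify a rate so it cannot go negative."""
    return max(0.0, x)


def sc_activity(d1, d2, target=1.0, snr_baseline=0.5):
    """Toy steady-state SC activity given D1 and D2 MSN unit activities."""
    gpe = relu(1.0 - d2)                  # indirect pathway MSNs inhibit GPe
    stn = relu(1.0 - gpe)                 # GPe inhibits STN
    snr = relu(snr_baseline + stn - d1)   # STN excites SNr, D1 MSNs inhibit it
    return relu(target - snr)             # SNr inhibits SC


def saccade_latency(sc, base_ms=100.0, gain_ms=100.0):
    """Hypothetical monotone map: stronger SC activity gives shorter latency."""
    return base_ms + gain_ms / (sc + 0.5)


reward_sc = sc_activity(d1=0.8, d2=0.2)      # strong D1 unit, weak D2 unit
no_reward_sc = sc_activity(d1=0.2, d2=0.8)   # weak D1 unit, strong D2 unit
print(saccade_latency(reward_sc), saccade_latency(no_reward_sc))
```

In this sketch a strong D1 unit and a weak D2 unit both lower SNr output, so the SC is disinhibited and the mapped latency is shorter, which is the qualitative pattern the caption describes for reward versus no-reward trials.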
Figure 5
Influence of D1 antagonist on saccadic latency. (A) Trial-by-trial changes in the latency of contralateral saccades, before (black) and after (red) injection of a D1 antagonist into the CD. Data are from Nakamura and Hikosaka (2006). (B) Simulated trial-by-trial changes in saccade latency. (C) After D1 antagonist injection, average saccade latency increased in big-reward trials, but not in small-reward trials. The experimental data were replicated by computer simulation. (D) Hypothesized mechanism of the effect of D1 antagonist in big-reward trials. The D1 antagonist effectively elevates the D1 threshold and therefore induces a smaller-than-usual LTP in the direct pathway MSNs, whereas the indirect pathway MSNs are unaffected. The attenuated LTP leads to a weaker activation of the SC and therefore a longer latency saccade. (E) Hypothesized mechanism of the effect of D1 antagonist in small-reward trials. The DA level remains below the D1 threshold similarly to the control condition (Figure 2D) and therefore the saccade latency is not changed.
Figure 6
Influence of D2 antagonist on saccadic latency. (A) Trial-by-trial changes in the latency of contralateral saccades, before (black) and after (blue) injection of a D2 antagonist into the CD. Data are from Nakamura and Hikosaka (2006). (B) Simulated trial-by-trial changes in saccade latency. (C) After D2 antagonist injection, average saccade latency increased in small-reward trials, but not in big-reward trials. The experimental data were replicated by computer simulation. (D) Hypothesized mechanism of the effect of D2 antagonist in big-reward trials. The phasic increase in the DA level exceeds both the D1 and D2 thresholds, although the D2 antagonist elevates the D2 threshold, and therefore the saccade latency remains largely unchanged. (E) Hypothesized mechanism of the effect of D2 antagonist in small-reward trials. The elevated D2 threshold eliminates the DA-dependent LTD in the indirect pathway MSNs and therefore potentiates the SNr-induced inhibition of the SC, leading to a longer latency saccade.
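The antagonist account in Figures 5 and 6, in which a receptor blocker effectively raises that receptor's threshold, can be checked with a short Python sketch. The saturating drive function, the DA levels, the thresholds, and the size of the threshold shift are all hypothetical values chosen for illustration; they are not the paper's parameters.

```python
def da_dependent_drive(da, threshold, width=0.4):
    """Saturating drive in [0, 1]; zero when DA is at or below the threshold."""
    return min(1.0, max(0.0, (da - threshold) / width))


BIG_DA, SMALL_DA = 0.9, 0.25   # hypothetical phasic DA levels
D1_THR, D2_THR = 0.6, 0.2      # hypothetical control thresholds
SHIFT = 0.2                    # hypothetical threshold elevation by a blocker


def plasticity_table():
    """Map condition -> trial type -> (D1-mediated LTP, D2-mediated LTD)."""
    conditions = {
        "control": (D1_THR, D2_THR),
        "D1 antagonist": (D1_THR + SHIFT, D2_THR),
        "D2 antagonist": (D1_THR, D2_THR + SHIFT),
    }
    table = {}
    for name, (d1_thr, d2_thr) in conditions.items():
        table[name] = {
            trial: (da_dependent_drive(da, d1_thr),
                    da_dependent_drive(da, d2_thr))
            for trial, da in (("big", BIG_DA), ("small", SMALL_DA))
        }
    return table


for condition, trials in plasticity_table().items():
    print(condition, trials)
```

Under these assumptions the D1 blocker reduces direct-pathway LTP only in big-reward trials, while the D2 blocker removes indirect-pathway LTD only in small-reward trials, because the big-reward DA level still saturates the D2-mediated drive after the threshold shift.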
Figure 7
Simulation of disrupted plasticity in Parkinson's disease (PD). (A) Disrupted plasticity in MSNs (green and purple dots) in a rat PD model. When input stimulation was followed by excitation of an MSN repeatedly, the response of the MSN to the input changed gradually, in the directions opposite to control subjects. Data are from Shen et al. (2008), adapted with permission. (B) Hypothesized changes in the DA level and D1/D2 thresholds in PD and PD with L-DOPA. See text for details. (C) Simulated plasticity in the direct pathway MSNs (D1) and the indirect pathway MSNs (D2) in PD subjects performing the 1DR task. The simulation shows disrupted plasticity in PD, which is similar to that shown in the rat PD model (A). Note that the magnitude of the plasticity is larger for no-reward trials than for reward trials. (D) Simulated saccade latency in PD subjects with no treatment (PD) and PD subjects with L-DOPA. See text for further explanations.
Similar articles
- Maladaptive striatal plasticity and abnormal reward-learning in cervical dystonia. Gilbertson T, Humphries M, Steele JD. Eur J Neurosci. 2019 Oct;50(7):3191-3204. doi: 10.1111/ejn.14414. Epub 2019 May 14. PMID: 30955204. Free PMC article.
- A Dual Role Hypothesis of the Cortico-Basal-Ganglia Pathways: Opponency and Temporal Difference Through Dopamine and Adenosine. Morita K, Kawaguchi Y. Front Neural Circuits. 2019 Jan 7;12:111. doi: 10.3389/fncir.2018.00111. eCollection 2018. PMID: 30687019. Free PMC article.
- Opposing patterns of abnormal D1 and D2 receptor dependent cortico-striatal plasticity explain increased risk taking in patients with DYT1 dystonia. Gilbertson T, Arkadir D, Steele JD. PLoS One. 2020 May 4;15(5):e0226790. doi: 10.1371/journal.pone.0226790. eCollection 2020. PMID: 32365120. Free PMC article.
- Dopamine D1-like receptors and reward-related incentive learning. Beninger RJ, Miller R. Neurosci Biobehav Rev. 1998 Mar;22(2):335-45. doi: 10.1016/s0149-7634(97)00019-5. PMID: 9579323. Review.
- Striatal action-learning based on dopamine concentration. Morris G, Schmidt R, Bergman H. Exp Brain Res. 2010 Jan;200(3-4):307-17. doi: 10.1007/s00221-009-2060-6. Epub 2009 Nov 11. PMID: 19904530. Review.
Cited by
- The neurobiology and neural circuitry of cognitive changes in Parkinson's disease revealed by functional neuroimaging. Ray NJ, Strafella AP. Mov Disord. 2012 Oct;27(12):1484-92. doi: 10.1002/mds.25173. Epub 2012 Oct 4. PMID: 23038645. Free PMC article. Review.
- A plastic corticostriatal circuit model of adaptation in perceptual decision making. Hsiao PY, Lo CC. Front Comput Neurosci. 2013 Dec 10;7:178. doi: 10.3389/fncom.2013.00178. eCollection 2013. PMID: 24339814. Free PMC article.
- Parallel basal ganglia circuits for voluntary and automatic behaviour to reach rewards. Kim HF, Hikosaka O. Brain. 2015 Jul;138(Pt 7):1776-800. doi: 10.1093/brain/awv134. Epub 2015 May 16. PMID: 25981958. Free PMC article. Review.
- A hypothesis for basal ganglia-dependent reinforcement learning in the songbird. Fee MS, Goldberg JH. Neuroscience. 2011 Dec 15;198:152-70. doi: 10.1016/j.neuroscience.2011.09.069. Epub 2011 Oct 13. PMID: 22015923. Free PMC article. Review.
- Phasic dopamine release induced by positive feedback predicts individual differences in reversal learning. Klanker M, Sandberg T, Joosten R, Willuhn I, Feenstra M, Denys D. Neurobiol Learn Mem. 2015 Nov;125:135-45. doi: 10.1016/j.nlm.2015.08.011. Epub 2015 Sep 5. PMID: 26343836. Free PMC article.