Dopamine-mediated learning and switching in cortico-striatal circuit explain behavioral changes in reinforcement learning

Simon Hong et al. Front Behav Neurosci. 2011.

Abstract

The basal ganglia are thought to play a crucial role in reinforcement learning. Central to the learning mechanism are dopamine (DA) D1 and D2 receptors located at cortico-striatal synapses. However, it is still unclear how this DA-mediated synaptic plasticity is deployed and coordinated during reward-contingent behavioral changes. Here we propose a computational model of reinforcement learning that uses different thresholds for D1- and D2-mediated synaptic plasticity, which are antagonized by DA-independent synaptic plasticity. A phasic increase in DA release caused by a larger-than-expected reward induces long-term potentiation (LTP) in the direct pathway, whereas a phasic decrease in DA release caused by a smaller-than-expected reward induces a cessation of long-term depression (LTD), leading to LTP in the indirect pathway. This learning mechanism can explain the robust behavioral adaptation observed in a location-reward-value-association task, in which the animal makes shorter-latency saccades to rewarded locations. The changes in saccade latency become quicker as the monkey becomes more experienced. This behavior can be explained by a switching mechanism that selectively activates the cortico-striatal circuit. Our model also shows how D1- or D2-receptor blocking experiments selectively affect either reward or no-reward trials. The proposed mechanisms also explain the behavioral changes in Parkinson's disease.

Keywords: LTD; LTP; latency; model; motivation; reaction time; reward; saccade.

Figures

Figure 1

Reinforcement learning experiments and the learning circuit involved. (A) Sequence of events in the one-direction-rewarded saccade task (1DR). The monkey first fixated on the central spot (the dotted circle indicates the eye position). As the fixation point disappeared, a target appeared randomly on the right or left, and the monkey was required to make a saccade to it immediately. Correct saccades in one direction were followed by a tone and a juice reward; saccades in the other direction were followed by a tone alone. The rewarded direction was fixed within a block of 24 trials and was changed in the following block. (B) Distribution of saccade latencies in reward trials (red) and no-reward trials (blue). (C) Illustration of the D1 and D2 antagonist experiments. A D1 or D2 antagonist was administered into the caudate to examine the behavioral consequences in the 1DR task. Black, red, and purple connections indicate excitatory, inhibitory, and dopaminergic modulatory connections, respectively. (D) Hypothesized circuit involving D1- and D2-mediated plasticity. D1- and D2-mediated plasticity in the direct and indirect pathways is assumed to contribute to eye movements. Purple arrows indicate dopaminergic modulatory connections; lines with rectangular ends indicate inhibitory connections; arrow ends indicate excitatory connections. Panels (A) and (B) are from Hong and Hikosaka (2008). Abbreviations: CD, caudate nucleus; D1, D2, D1 and D2 receptors; SC, superior colliculus; SNc/SNr, substantia nigra pars compacta/reticulata; GPe, globus pallidus external segment; FEF, frontal eye field; SEF, supplementary eye field; DLPF, dorsolateral prefrontal cortex; LIP, lateral intraparietal area; STN, subthalamic nucleus.
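
The block structure of the 1DR task in (A) can be sketched as a trial generator. This is a minimal Python sketch under stated assumptions: the class `Trial`, the function `make_1dr_schedule`, and its parameters are hypothetical names; each target side is drawn independently with equal probability; and, since there are only two directions, "changed in the following block" is read as swapping the rewarded side at every 24-trial boundary. The actual experiments may have used a different randomization scheme.

```python
import random
from dataclasses import dataclass

BLOCK_LENGTH = 24  # trials per block, as stated in the caption
DIRECTIONS = ("left", "right")


@dataclass
class Trial:
    block: int
    target: str     # direction of the target ("left" or "right")
    rewarded: bool  # True: tone + juice; False: tone alone


def make_1dr_schedule(n_blocks: int, first_rewarded: str = "left", seed: int = 0) -> list[Trial]:
    """Build a hypothetical 1DR trial list; the rewarded side alternates between blocks."""
    rng = random.Random(seed)
    rewarded_dir = first_rewarded
    trials = []
    for block in range(n_blocks):
        for _ in range(BLOCK_LENGTH):
            # Assumption: each target side is drawn independently with p = 0.5.
            target = rng.choice(DIRECTIONS)
            trials.append(Trial(block, target, target == rewarded_dir))
        # With two directions, changing the rewarded direction means swapping sides.
        rewarded_dir = "right" if rewarded_dir == "left" else "left"
    return trials
```

For example, `make_1dr_schedule(4)` yields 96 trials in which left-target saccades are rewarded in blocks 0 and 2 and right-target saccades in blocks 1 and 3.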

Figure 2

Dopamine-mediated learning mechanisms in the striatum. DA-dependent LTP and LTD are labeled in red; DA-independent LTP and LTD are labeled in blue. (A) Null state, in which free saccades occur without a task. Indirect pathway MSNs, which express D2 receptors, show weak DA-dependent LTD and DA-independent LTP. Direct pathway MSNs, which express D1 receptors, show DA-independent LTD. (B) Hypothesized D1 and D2 thresholds in relation to the DA levels during big-reward and no- (or small-) reward trials. (C) In big-reward trials, the increased DA level causes DA-dependent LTP in the direct pathway and enhances DA-dependent LTD in the indirect pathway. (D) In no-reward trials, the decreased DA level attenuates DA-dependent LTD in the indirect pathway. The changes in DA level are assumed to coincide with activation of the cortical input and of the connected MSN, which enables DA-dependent LTP and LTD (see the eligibility traces in Eqs 6–8). In both cases (C and D), DA-independent LTD in the direct pathway and DA-independent LTP in the indirect pathway remain unchanged, but their effects become relatively weak in big-reward trials (C) and relatively strong in no-reward trials (D). The red arrows indicate the amplitude (large: two arrows; small: one arrow) and direction (up, down) of the change in neural activity relative to the null state shown in (A). Note that even though the DA level is below the threshold in both (A) and (D), LTD in (D) occurs more vigorously because eligibility is enabled (Eqs 6–8). The thickness of the connections indicates the resulting output of each module. Black and open circles indicate inhibitory and excitatory neurons, respectively. In (C) and (D), the GPb–LHb–SNc circuit is omitted for clarity. LTP/LTD, long-term potentiation/depression; MSN, medium spiny neuron; GPb, border region of globus pallidus; LHb, lateral habenula.
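
Eqs 6–8 are not reproduced in this excerpt, so the following is only a hypothetical Python sketch of the qualitative rule in (A)–(D), not the authors' formulation. It assumes a high D1 threshold and a low D2 threshold, an eligibility term equal to the product of cortical input and MSN activity, and opposing DA-independent terms (LTD in the direct pathway, LTP in the indirect pathway) that are also gated by eligibility. All names (`PlasticityParams`, `weight_updates`) and all numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlasticityParams:
    # All values are illustrative assumptions, not parameters from the paper.
    theta_d1: float = 0.7      # high D1 threshold: DA above it drives LTP in direct-pathway MSNs
    theta_d2: float = 0.3      # low D2 threshold: DA above it drives LTD in indirect-pathway MSNs
    gain_d1: float = 1.0       # strength of DA-dependent LTP (direct pathway)
    gain_d2: float = 1.0       # strength of DA-dependent LTD (indirect pathway)
    indep_ltd_d1: float = 0.1  # DA-independent LTD (direct pathway)
    indep_ltp_d2: float = 0.1  # DA-independent LTP (indirect pathway)
    lr: float = 0.05           # learning rate


def eligibility(cortical_input: float, msn_activity: float) -> float:
    """Pre/post coincidence: plasticity needs both cortical input and MSN activity."""
    return cortical_input * msn_activity


def weight_updates(da, cortical_input, d1_activity, d2_activity, p=PlasticityParams()):
    """Return (dw_direct, dw_indirect) for one trial under the hypothetical rule."""
    e1 = eligibility(cortical_input, d1_activity)
    e2 = eligibility(cortical_input, d2_activity)
    # Direct pathway: DA-dependent LTP above the D1 threshold vs. DA-independent LTD.
    dw_d1 = p.lr * e1 * (p.gain_d1 * max(da - p.theta_d1, 0.0) - p.indep_ltd_d1)
    # Indirect pathway: DA-dependent LTD above the D2 threshold vs. DA-independent LTP.
    dw_d2 = p.lr * e2 * (-p.gain_d2 * max(da - p.theta_d2, 0.0) + p.indep_ltp_d2)
    return dw_d1, dw_d2


BIG_REWARD_DA, NO_REWARD_DA = 1.0, 0.35  # assumed DA levels for a burst and a dip
```

With these numbers, a big-reward DA level yields potentiation of direct-pathway and depression of indirect-pathway synapses, while the no-reward level yields the reverse, as in (C) and (D). Gating the DA-independent terms by eligibility as well is one possible reading of the note that plasticity in (D) is stronger than in the null state (A) despite a sub-threshold DA level.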

Figure 3

Experience-dependent emergence of a switching mechanism that allows rapid changes in saccade latency in response to a change in reward location: before (A–C) and after (D–F) sufficient experience of the 1DR task. We hypothesize the presence of “reward-category neurons” (RWD), a key driver of the switching, which have excitatory connections to FEF neurons and to direct pathway MSNs in the CD of the same hemisphere. They would become active before target onset selectively when a reward is expected on the contralateral side (see Figure 4), an assumption based on experimental observations of neuronal activity in the FEF, CD, SNr, and SC. Before sufficient experience of the 1DR task (A–C), the saccade latency changes gradually in both the small-to-big-reward transition [red in (B,C)] and the big-to-small-reward transition [blue in (B,C)], as shown by both the experimental observations (B) and the computer simulation (C). The saccade latency data in (B) are from monkeys C, D, and T. After sufficient experience of the 1DR task (D–F), the saccade latency changes quickly, as shown in the experiments (E) and the computer simulation (F). This is mainly due to the additional excitatory input from the reward-category neurons. Note, however, that the decrease in saccade latency in the small-to-big-reward transition [red in (E,F)] is quicker than the increase in saccade latency in the big-to-small-reward transition [blue in (E,F)]. This asymmetry is due to the asymmetric learning algorithm implemented by the two parallel circuits in the basal ganglia illustrated in Figure 2. Panel (E) is from Matsumoto and Hikosaka (2007).
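
Both the asymmetry and the experience-dependent speed-up can be illustrated with a toy trial-by-trial simulation in Python. This is a hypothetical sketch, not the paper's model: it assumes a threshold rule with a D1 threshold of 0.7 and a D2 threshold of 0.3, fixed DA levels for reward (1.0) and no reward (0.35), eligibility fixed at 1, a made-up linear latency readout in milliseconds, and a reward-category input that, in an experienced animal, still reflects the old contingency on the first trial after a switch and the new one afterwards. The names `update`, `latency`, and `run_block` are illustrative.

```python
def clip(w: float) -> float:
    return min(max(w, 0.0), 1.0)


def update(w1: float, w2: float, rewarded: bool, lr: float = 0.1) -> tuple[float, float]:
    """One trial of the hypothetical threshold rule (eligibility fixed at 1)."""
    da = 1.0 if rewarded else 0.35          # assumed DA burst / dip levels
    dw1 = lr * (max(da - 0.7, 0.0) - 0.1)   # direct: DA-dependent LTP vs DA-independent LTD
    dw2 = lr * (-max(da - 0.3, 0.0) + 0.1)  # indirect: DA-dependent LTD vs DA-independent LTP
    return clip(w1 + dw1), clip(w2 + dw2)


def latency(w1: float, w2: float, rwd: float) -> float:
    """Toy readout (ms): stronger direct, weaker indirect, and RWD input shorten latency."""
    return 250.0 - 100.0 * w1 + 100.0 * w2 - 40.0 * rwd


def run_block(rewarded: bool, n: int, w1: float, w2: float, experienced: bool):
    """Latencies for n saccades to one location after its reward contingency switches."""
    lats = []
    for t in range(n):
        # Assumption: an experienced animal's reward-category input follows the new
        # contingency from the second trial on; a naive animal has no such input.
        expected = rewarded if t > 0 else not rewarded
        rwd = 1.0 if (experienced and expected) else 0.0
        lats.append(latency(w1, w2, rwd))
        w1, w2 = update(w1, w2, rewarded)
    return lats, w1, w2
```

In this toy version the asymmetry arises because the reward-driven changes (D1 LTP plus D2 LTD) are larger per trial than the no-reward changes (DA-independent D1 LTD plus D2 LTP), so latency falls faster after a small-to-big switch than it rises after a big-to-small switch.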

Figure 4

Simulated neural components of the model performing reward and no-reward trials of the 1DR task. In reward trials (A), the reward-category unit (REW category) ramps up its activity shortly after the presentation of the fixation point. Its activity shuts off in response to the burst activity of the DA unit (DA) signaling the reward value of the target. The FEF unit combines the tonic reward-category activity and the phasic target signal. In the BG, both the direct pathway MSN unit (D1) and the indirect pathway MSN unit (D2) receive input from the FEF. The direct pathway MSN unit (D1), in addition, receives input directly from the reward-category unit and therefore shows larger ramping activity than the indirect pathway MSN unit (D2). The activity of the direct pathway MSN unit (D1) is further enhanced by DA-dependent LTP, which is triggered by the DA burst and mediated by D1 receptors. This results in stronger disinhibition of the SC by the SNr, leading to stronger activity in the SC. In contrast, the activity of the indirect pathway MSN unit (D2) is further depressed by DA-dependent LTD, which is triggered by the DA burst and mediated by D2 receptors. This results in suppression of the excitatory input from the STN to the SNr, further enhancing the SC activity. The combined effects of the direct and indirect pathways lead to a shorter-latency saccade (the arrowhead at the top indicates the time of saccade initiation). In no-reward trials (B), the activity of the reward-category unit is much weaker, lowering the activity of the FEF unit and the direct pathway MSN unit (D1). The activity of the direct pathway MSN unit (D1) is further depressed by DA-independent LTD. In contrast, the activity of the indirect pathway MSN unit (D2) increases because DA-dependent LTD is attenuated by the “pause” in DA activity (DA), so that its plasticity is dominated by DA-independent LTP. The combined effects of the direct and indirect pathways lead to weaker activation of the SC unit and hence a longer-latency saccade. The scale of all ordinate axes is from 0 to 1.
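
The signal flow in (A) and (B) can be caricatured as a static rate model feeding a leaky SC integrator whose threshold crossing marks saccade onset. This Python sketch is hypothetical and is not the paper's network: the function `saccade_latency`, the rectified-linear units, the tonic GPe and SNr baselines of 1, the SNr-to-SC weight, the threshold, and the time constant are all assumptions chosen so that both trial types reach threshold. The time axis is arbitrary and not fitted to monkey data.

```python
def relu(x: float) -> float:
    return max(x, 0.0)


def saccade_latency(rwd, w_d1, w_d2, snr_weight=0.4, threshold=0.3,
                    tau=100.0, dt=1.0, t_max=1000.0):
    """Time after target onset for the toy SC unit to reach threshold, or None."""
    target = 1.0
    fef = 0.5 * rwd + target              # FEF: tonic reward-category + phasic target signal
    d1 = relu(w_d1 * (fef + rwd))         # direct MSN: FEF input plus direct RWD input
    d2 = relu(w_d2 * fef)                 # indirect MSN: FEF input only
    gpe = relu(1.0 - d2)                  # D2 MSNs inhibit the tonically active GPe
    stn = relu(1.0 - gpe)                 # GPe inhibits STN, so STN follows D2 activity
    snr = relu(1.0 - d1 + stn)            # D1 MSNs inhibit SNr; STN excites it
    drive = relu(fef - snr_weight * snr)  # SC: cortical excitation minus SNr inhibition
    sc, t = 0.0, 0.0
    while t < t_max:
        sc += dt / tau * (drive - sc)     # leaky integration toward the drive
        t += dt
        if sc >= threshold:
            return t
    return None                           # drive never reaches threshold
```

In this caricature, raising `w_d1` (as DA-dependent LTP would) or lowering `w_d2` (as DA-dependent LTD would) shortens the simulated latency, or leaves it unchanged once the SNr output is fully suppressed; the reverse changes lengthen it.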

Figure 5

Influence of D1 antagonist on saccadic latency. (A) Trial-by-trial changes in the latency of contralateral saccades before (black) and after (red) injection of a D1 antagonist into the CD. Data are from Nakamura and Hikosaka (, p. 60). (B) Simulated trial-by-trial changes in saccade latency. (C) After D1 antagonist injection, average saccade latency increased in big-reward trials but not in small-reward trials; the computer simulation replicated the experimental data. (D) Hypothesized mechanism of the effect of the D1 antagonist in big-reward trials. The D1 antagonist effectively elevates the D1 threshold and therefore induces smaller-than-usual LTP in the direct pathway MSNs, whereas the indirect pathway MSNs are unaffected. The attenuated LTP leads to weaker activation of the SC and therefore a longer-latency saccade. (E) Hypothesized mechanism of the effect of the D1 antagonist in small-reward trials. The DA level remains below the D1 threshold, as in the control condition (Figure 2D), and therefore the saccade latency is unchanged.
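
The threshold argument in (D) and (E) can be checked with a toy per-trial plasticity rule in Python. This is a hypothetical sketch: the function `plasticity`, the thresholds (D1 at 0.7, D2 at 0.3), the DA levels, and the effective D1 threshold of 0.85 under the antagonist are all illustrative assumptions, and eligibility is fixed at 1. The antagonist is modeled, as in (D), purely as a raised D1 threshold.

```python
def plasticity(da, theta_d1=0.7, theta_d2=0.3, indep_d1=0.1, indep_d2=0.1, lr=0.1):
    """(dw_direct, dw_indirect) for one trial, eligibility fixed at 1 (toy rule)."""
    dw_d1 = lr * (max(da - theta_d1, 0.0) - indep_d1)   # D1 LTP vs DA-independent LTD
    dw_d2 = lr * (-max(da - theta_d2, 0.0) + indep_d2)  # D2 LTD vs DA-independent LTP
    return dw_d1, dw_d2


BIG_DA, SMALL_DA = 1.0, 0.35   # assumed DA levels in big- and small-reward trials
THETA_D1_BLOCKED = 0.85        # assumed effective D1 threshold under the antagonist

for label, da in (("big", BIG_DA), ("small", SMALL_DA)):
    control = plasticity(da)
    drug = plasticity(da, theta_d1=THETA_D1_BLOCKED)
    print(f"{label}-reward: control dw_D1={control[0]:+.3f}, antagonist dw_D1={drug[0]:+.3f}")
```

Under these assumptions, the antagonist shrinks direct-pathway LTP only in big-reward trials; in small-reward trials DA is already below both the normal and the elevated D1 threshold, so the update is identical, and the indirect pathway is untouched in both cases.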

Figure 6

Influence of D2 antagonist on saccadic latency. (A) Trial-by-trial changes in the latency of contralateral saccades before (black) and after (blue) injection of a D2 antagonist into the CD. Data are from Nakamura and Hikosaka (2006). (B) Simulated trial-by-trial changes in saccade latency. (C) After D2 antagonist injection, average saccade latency increased in small-reward trials but not in big-reward trials; the computer simulation replicated the experimental data. (D) Hypothesized mechanism of the effect of the D2 antagonist in big-reward trials. The phasic increase in the DA level exceeds both the D1 and D2 thresholds even though the D2 antagonist elevates the D2 threshold; therefore, the saccade latency remains largely unchanged. (E) Hypothesized mechanism of the effect of the D2 antagonist in small-reward trials. The elevated D2 threshold eliminates DA-dependent LTD in the indirect pathway MSNs and therefore potentiates the SNr-induced inhibition of the SC, leading to a longer-latency saccade.
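
The latency pattern in (C) follows from the mechanisms in (D) and (E) under a simple threshold rule, as this hypothetical Python sketch shows. The function `final_latency`, the thresholds (D1 at 0.7, D2 at 0.3), the DA levels, the initial weights, the linear latency readout, and the effective D2 threshold of 0.4 under the antagonist are all illustrative assumptions, with eligibility fixed at 1.

```python
def final_latency(da, n_trials=10, theta_d1=0.7, theta_d2=0.3, w1=0.5, w2=0.5, lr=0.1):
    """Toy latency (ms) after n_trials at a fixed DA level; eligibility fixed at 1."""
    for _ in range(n_trials):
        dw1 = lr * (max(da - theta_d1, 0.0) - 0.1)   # direct: D1 LTP vs DA-independent LTD
        dw2 = lr * (-max(da - theta_d2, 0.0) + 0.1)  # indirect: D2 LTD vs DA-independent LTP
        w1 = min(max(w1 + dw1, 0.0), 1.0)
        w2 = min(max(w2 + dw2, 0.0), 1.0)
    return 250.0 - 100.0 * w1 + 100.0 * w2  # hypothetical linear readout


BIG_DA, SMALL_DA = 1.0, 0.35  # assumed DA levels in big- and small-reward trials
THETA_D2_BLOCKED = 0.4        # assumed: antagonist lifts the D2 threshold above SMALL_DA

for label, da in (("big", BIG_DA), ("small", SMALL_DA)):
    control = final_latency(da)
    drug = final_latency(da, theta_d2=THETA_D2_BLOCKED)
    print(f"{label}-reward: control {control:.1f} ms, D2 antagonist {drug:.1f} ms")
```

In big-reward trials the DA burst still exceeds the raised D2 threshold, so indirect-pathway LTD persists and latency barely moves; in small-reward trials the residual DA-dependent LTD disappears, DA-independent LTP grows, and latency lengthens.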

Figure 7

Simulation of disrupted plasticity in Parkinson's disease (PD). (A) Disrupted plasticity in MSNs (green and purple dots) in a rat PD model. When input stimulation was repeatedly followed by excitation of an MSN, the response of the MSN to the input changed gradually, in the direction opposite to that in control subjects. Data are from Shen et al. (2008), adapted with permission. (B) Hypothesized changes in the DA level and D1/D2 thresholds in PD and in PD with L-DOPA. See text for details. (C) Simulated plasticity in the direct pathway MSNs (D1) and the indirect pathway MSNs (D2) in PD subjects performing the 1DR task. The simulation shows disrupted plasticity in PD, similar to that in the rat PD model (A). Note that the magnitude of the plasticity is larger in no-reward trials than in reward trials. (D) Simulated saccade latency in untreated PD subjects (PD) and in PD subjects treated with L-DOPA. See text for further explanations.

References

1. Apicella P., Scarnati E., Ljungberg T., Schultz W. (1992). Neuronal activity in monkey striatum related to the expectation of predictable environmental events. J. Neurophysiol. 68, 945–960.
2. Behrman A. L., Cauraugh J. H., Light K. E. (2000). Practice as an intervention to improve speeded motor performance and motor learning in Parkinson's disease. J. Neurol. Sci. 174, 127–136.
3. Breitenstein C., Korsukewitz C., Floel A., Kretzschmar T., Diederich K., Knecht S. (2006). Tonic dopaminergic stimulation impairs associative learning in healthy subjects. Neuropsychopharmacology 31, 2552–2564. doi: 10.1038/sj.npp.1301167
4. Bromberg-Martin E. S., Matsumoto M., Hong S., Hikosaka O. (2010). A pallidus-habenula-dopamine pathway signals inferred stimulus values. J. Neurophysiol. 104, 1068–1076.
5. Brown J. W., Bullock D., Grossberg S. (2004). How laminar frontal cortex and basal ganglia circuits interact to control planned and reactive saccades. Neural Netw. 17, 471–510. doi: 10.1016/j.neunet.2003.08.006
