Tripartite mechanism of extinction suggested by dopamine neuron activity and temporal difference model

Wei-Xing Pan et al. J Neurosci. 2008.

Abstract

Extinction of behavior enables adaptation to a changing world and is crucial for recovery from disorders such as phobias and drug addiction. However, the brain mechanisms underlying behavioral extinction remain poorly understood. Midbrain dopamine (DA) neurons appear to play a central role in most acquisition processes of appetitive conditioning. Here, we show that the responses of putative DA neurons to conditioned reward predicting cues also dynamically encode two classical features of extinction: decrement in amplitude of previously learned excitatory responses and rebound of responding on subsequent retesting (spontaneous recovery). Crucially, this encoding involves development of inhibitory responses in the DA neurons, reflecting new, extinction-specific learning in the brain. We explored the implications of this finding by adding such inhibitory inputs to a standard temporal difference model of DA cell activity. We found that combining extinction-triggered plasticity of these inputs with a time-dependent spontaneous decay of weights, equivalent to a forgetting process as described in classical behavioral extinction literature, enabled the model to simulate several classical features of extinction. A key requirement to achieving spontaneous recovery was differential rates of spontaneous decay for weights representing original conditioning and for subsequent extinction learning. A testable prediction of the model is thus that differential decay properties exist within the wider circuits regulating DA cell activity. These findings are consistent with the hypothesis that extinction processes at both cellular and behavioral levels involve a dynamic interaction between new (inhibitory) learning, forgetting, and unlearning.

Figures

Figure 1.

Characteristics of recorded neurons. Scatter plot shows results of an analysis of waveform shape using measures described by Roesch et al. (2007). Filled circles show presumed dopamine cells, on the basis of inhibitory response to dopamine agonist and slow firing rate. Circles with superimposed crosses indicate presumed dopamine cells that did not show trial-to-trial prediction error signaling (see supplemental material, available at www.jneurosci.org). Open circles show results for a sample of neurons that were not accepted as dopaminergic for the purposes of this study on the basis of either no response or an excitatory response to dopamine agonist. Horizontal axis, waveform amplitude ratio index; vertical axis, waveform trough duration. Histograms show the firing rates of example cells from across the distribution before and after systemic injection of apomorphine (Apo) or quinpirole (Quin), with inset average waveforms. The atlas section (Paxinos and Watson, 1997) shows the recovered position of the indicated cell.

Figure 2.

Loss of conditioned excitation and development of new inhibition in DA neurons during extinction training. a, b, Solenoid extinction. a, Robust response of cell to sound of solenoid click associated with reward delivery in the unsignaled reward paradigm. The dot raster shows trials in time order (first trial in session at top). The star and droplet symbols indicate the time of solenoid activation delivering fluid reward. The histogram shows the averaged activity across all trials. Blue shading demarcates the duration of the excitatory peak. b, Response of same cell as in a during extinction in a subsequent session of the solenoid only paradigm. Solenoid click (star symbol) was now not associated with fluid delivery. Blue shading shows the duration of the preextinction excitatory peak, from the histogram in a. c–e, Cue extinction. c, Dashed arrow and gray speaker/star/droplet symbol indicate previous exposure of the animal to conditioning with the signaled reward paradigm the previous day, before this cell was encountered. The raster and histogram show the first recording of this cell, which was during the cues-only paradigm, i.e., an extinction session as far as the animal was concerned. Blue shading indicates duration of excitatory peak during conditioning (from the histogram in d). d, The same cell as in c, showing retraining with the signaled reward on a subsequent session. e, Extinction with cues only, after the retraining in d.

Figure 3.

Quantitative analysis of DA neuron responses during extinction. a, b, Response to cue tones. a, Graph shows data for all cells (n = 15) during conditioning, when cues predicted rewards, and extinction, when cues were not associated with reward. As indicated on the inset example PSTH, for this analysis the immediate postcue period was divided into two periods, epoch 1 from 0 to 125 ms and epoch 2 from 125 to 250 ms after cue onset (time 0 on the inset PSTH). For each epoch, change in firing rate from baseline (horizontal dotted line in the inset example PSTH) after the cue is expressed as a modulation index (see Materials and Methods); values >1 suggest excitation and <1 suggest inhibition of firing compared with baseline. In the main plot, lines connect data from the two epochs from each cell. The same cells occur in both conditioned and extinction columns, but these are not connected across columns for clarity. b, Histogram shows mean + SEM of the data in a, with conditioning epochs in white and extinction epochs in black. **p < 0.01, post hoc Dunn's multiple-comparison test; ###p < 0.001, one-sample t test comparing mean to theoretical mean = 1. c, d, Response to solenoid. Graphs show data for responses to solenoid clicks (not preceded by cues) that either delivered (conditioned) or did not deliver (extinction) fluid reward. Details are as for a and b except that epochs were 50–175 and 175–300 ms after click to allow for longer latency of solenoid responses. ns, Not significant, post hoc Dunn's multiple-comparison test; ##p < 0.01, one-sample t test comparing mean to theoretical mean = 1.

Figure 4.

Spontaneous recovery of neural responses. a, Single neuron. Dot rasters show action potential occurrences on all trials in three separate sessions of extinction training (Ex1, Ex2, Ex3) of a single neuron. Ex1 and Ex2 were on the same day, separated by several hours; Ex3 was performed the next day. The black triangle indicates the onset of the tone cue. Traces above the rasters show the average firing rate across subblocks of 25 trials within each session, 25 ms bins, smoothed with a three-bin moving average. Calibration: 0.5 s, 20 Hz. Reduction of the response amplitude at the end of each session (white arrowheads) indicates extinction of neural responsiveness, and spontaneous recovery is reflected in the return of the response at the beginning of the following session (black arrowheads). b, Quantitative analysis of spontaneous recovery in cells tested with three sequential blocks (Ex1, Ex2, Ex3) of extinction (n = 5). Histogram bars show the mean + SEM recovery index (see Results) for the first and last subblocks [Ex1(1), Ex1(4), etc.] of each session. A dotted vertical line divides sessions recorded on the same day, i.e., separated by a relatively short interval, whereas a dashed vertical line indicates that Ex3 was recorded the next day. *p < 0.05, Wilcoxon signed rank test; #p < 0.05, ##p < 0.01, one-sample t test. n = 6 for all groups except Ex3(4) (n = 5).
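The firing-rate traces described in this caption (25 ms bins, three-bin moving average) could be computed along the following lines. This is a minimal sketch, not the authors' analysis code: the function names `psth` and `moving_average3` are hypothetical, spike times are assumed to be in seconds relative to cue onset, and the handling of the first and last bins (averaging over the bins that exist) is an assumption, since the caption does not specify it.

```python
def psth(spike_times_per_trial, t_start, t_stop, bin_s=0.025):
    """Average firing rate (Hz) per bin across trials, using 25 ms bins by default."""
    n_bins = int(round((t_stop - t_start) / bin_s))
    counts = [0] * n_bins
    for trial in spike_times_per_trial:
        for t in trial:
            idx = int((t - t_start) // bin_s)
            if 0 <= idx < n_bins:
                counts[idx] += 1
    n_trials = len(spike_times_per_trial)
    # Spikes per bin -> spikes per second, averaged over trials.
    return [c / (n_trials * bin_s) for c in counts]


def moving_average3(values):
    """Three-bin moving average; edge bins average only the bins that exist (assumption)."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - 1): i + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical example: two trials, spike times in seconds after cue onset.
example_trials = [[0.01, 0.03], [0.03]]
rates = psth(example_trials, t_start=0.0, t_stop=0.1)
smoothed_rates = moving_average3(rates)
```

For the subblock traces in the figure, the same two functions would be applied separately to each group of 25 consecutive trials.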

Figure 5.

Behavioral extinction. a, Parallel extinction of behavioral and neural responding during solenoid extinction. Rasters show example of changes in neural activity and behavior for one DA cell recording. Left rasters show spikes of the cell on each trial of unsignaled reward (top) and solenoid extinction (bottom), whereas rasters on right show all licks made at the spout over the same trials. Solenoid click time is indicated by the star symbol; a droplet symbol indicates that fluid was also delivered or, if crossed out, that fluid was not delivered. Trials are in time order with the first at the top. The curved arrow indicates the last lick made during the session. The inset graph shows mean data across all cells tested (n = 5). Filled and open circles show cell firing rate and licking behavior, respectively, expressed as an extinction index (see Results), in the last 10 trials of the unsignaled reward (UR) session and for five sequential blocks of 10 trials of the solenoid extinction session (Ex1–Ex5). Error bars show SEM. b, Time course of extinction of behavioral responses to cue tones, recorded in a separate series of animals. Points show the mean (±SEM; n = 6 rats) number of trials on which any lick occurred for successive 25-trial blocks over three successive cues-only training sessions. c, Effect on behavioral extinction of injection of GABA antagonist into VTA. The top graph shows mean ± SEM number of trials with licking responses across consecutive blocks of 10 trials during extinction training. Open squares, Extinction training with intracerebral bicuculline (Bicuc) infusion; filled circles, vehicle (saline). The bottom graph shows statistical analysis. The vertical axis shows the conditioned response index (arcsine transformed, in degrees, to normalize the variance). Lines show the linear regression of transformed data. The error bar on last bicuculline point shows within-subject SEM; the floating error bar indicates ±SEM across rats. *p < 0.05, ANOVA on the linear component comparing the two data sets overall.

Figure 6.

TD model. a, Schematic diagram. Separate positive (excitatory, w+; blue) and negative (inhibitory, w−; red) weights both contribute to generation of predictions (P) after an external sensory stimulus (Sl) represented by state vector xl. TD, Temporal difference of predictions; r, reward; δ, prediction error, postulated to be represented in the firing of DA neurons. The prediction error feeds back to influence both positive and negative weights. b, Schematic diagram demonstrating weight changes in the model during learning and extinction. Slopes have been exaggerated to emphasize important features. The lines illustrate how, during different phases, changes in positive (blue) and negative (red) weights result in net weight changes (black) that underpin conditioning, extinction, and spontaneous recovery. Positive weights are strengthened (learning) by positive prediction error signals (+δ) that occur when cues are paired with rewards (C → R), and weakened (unlearning) by negative prediction errors (−δ) generated during extinction training, when cues do not predict rewards (C → X). Rates of both changes are determined by the parameter α. Conversely, negative weights are strengthened by negative prediction error signals during extinction, at a rate set by parameter β. When there is no prediction error signal, both weights undergo decay (forgetting) as a function of time (Δt), at rates determined by parameters ψ+ and ψ− for positive and negative weights, respectively.
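The weight dynamics described in this caption can be illustrated with a deliberately simplified sketch. It is not the authors' model: it collapses the TD model to a single cue with one prediction per trial, assumes the decay parameters act multiplicatively per time step, assumes negative weights are unchanged by positive prediction errors, and clips the positive weight at zero. The names `Weights`, `run_trial`, `pause`, and `simulate`, the pause length of 1000 time steps, and the specific choice ψ− = 0.9999 are all hypothetical; α, β, and ψ+ are taken from within the ranges quoted in the Figure 7 caption.

```python
from dataclasses import dataclass


@dataclass
class Weights:
    pos: float = 0.0  # excitatory weight (w+ in the caption)
    neg: float = 0.0  # inhibitory weight (w- in the caption)

    def prediction(self) -> float:
        # Net prediction is excitation minus inhibition.
        return self.pos - self.neg


def run_trial(w: Weights, reward: float, alpha: float, beta: float) -> float:
    """One cue presentation; returns the prediction error delta."""
    delta = reward - w.prediction()
    # Positive weight: learning (+delta) and unlearning (-delta), both at rate alpha.
    w.pos = max(0.0, w.pos + alpha * delta)
    # Negative weight: strengthened only by negative prediction errors, at rate beta
    # (leaving it untouched on positive errors is an assumption of this sketch).
    if delta < 0:
        w.neg += beta * -delta
    return delta


def pause(w: Weights, steps: int, psi_pos: float, psi_neg: float) -> None:
    """Forgetting during a cue-free interval (multiplicative decay is an assumption)."""
    w.pos *= psi_pos ** steps
    w.neg *= psi_neg ** steps


def simulate() -> dict:
    alpha, beta = 0.005, 0.2               # beta / alpha = 40
    psi_pos, psi_neg = 0.999999, 0.9999    # inhibitory weight decays faster
    w = Weights()
    for _ in range(300):                   # conditioning, C -> R
        run_trial(w, 1.0, alpha, beta)
    after_conditioning = w.prediction()
    for _ in range(100):                   # extinction, C -> X
        run_trial(w, 0.0, alpha, beta)
    after_extinction = w.prediction()
    pause(w, 1000, psi_pos, psi_neg)       # hypothetical pause length
    after_pause = w.prediction()
    return {
        "after_conditioning": after_conditioning,
        "after_extinction": after_extinction,
        "after_pause": after_pause,
    }


result = simulate()
print(result)
```

In this sketch the net prediction falls to near zero during extinction mainly because the inhibitory weight grows, and it rebounds after the pause only because ψ− is smaller than ψ+, which mirrors the differential-decay requirement stated in the abstract.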

Figure 7.

Prediction-error output of the TD model during learning and extinction. a, Exploration of parameter space. 3D plots show prediction-error responses to cues and rewards for each time step during learning (300 trials), extinction (100 trials), and tests for spontaneous recovery (100 trials each), for different ratios of decay (ψ−/ψ+), and learning (β/α) parameters. For the y-axis, values of ψ− range from 0.999999 to 0.9998 with ψ+ = 0.999999, for ratios (top to bottom) of 1, 0.999951, 0.999901, and 0.999801. For the x-axis, values of β range from 0.1 to 0.4, with α = 0.005, for ratios (left to right) of 20, 40, and 80. The outlined plot is shown enlarged below. Note that the profile of cue and reward responses during cue–reward association learning (C → R, green arrow) is the same as in the standard TD model (Pan et al., 2005). During extinction training (cues without rewards, C → X, black arrow), the cue response disappears, but after a pause (equivalent in duration to 100 trials) during which no cues are delivered, the cue response briefly reappears for initial trials when the cue is once again presented (spontaneous recovery, red arrow) but rapidly falls away. Retesting after a further pause reveals a further spontaneous recovery (blue arrow). b, Cue responses across the parameter space. Plots show the prediction error across trials of learning, extinction, and tests for spontaneous recovery, at the time step of cue delivery, for each of the 3D plots in a. Horizontal dashed red lines indicate the level of prediction error output obtained on the last trial of the extinction training. A prediction error signal above this level on the first trial of subsequent reexposure to cues indicates spontaneous recovery.
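As a worked check of the parameter values quoted in this caption, the sketch below rebuilds the grid from the stated ratios. The fixed values α = 0.005 and ψ+ = 0.999999 and the two lists of ratios come from the caption; the variable names, and the reading that four decay ratios by three learning ratios give twelve plots, are assumptions.

```python
from itertools import product

ALPHA = 0.005        # learning rate for positive weights (from the caption)
PSI_POS = 0.999999   # decay parameter for positive weights (from the caption)

LEARNING_RATIOS = [20, 40, 80]                    # beta / alpha, left to right
DECAY_RATIOS = [1, 0.999951, 0.999901, 0.999801]  # psi- / psi+, top to bottom

# One entry per plot: rows follow the decay ratio, columns the learning ratio.
grid = [
    {
        "decay_ratio": decay_ratio,
        "learning_ratio": learning_ratio,
        "psi_neg": decay_ratio * PSI_POS,
        "beta": learning_ratio * ALPHA,
    }
    for decay_ratio, learning_ratio in product(DECAY_RATIOS, LEARNING_RATIOS)
]

for cell in grid:
    print(cell)
```

The recovered values are consistent with the caption: β spans 0.1 to 0.4, and ψ− spans 0.999999 down to approximately 0.9998.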

Figure 8.

Patterns of model output during extinction match behavioral features of extinction learning. a, Prediction error responses of the model (δ) to cue signals within and across three extinction sessions, averaged from the data shown in the enlarged 3D surface in Figure 6a and formatted to compare with the behavioral and neuronal data shown in Figure 2, b and c. b, A longer delay between extinction sessions 1 and 2 (twice as long as in a) results in enhanced spontaneous recovery at the beginning of the second session. c, Speeded relearning after extinction. Points show height of the prediction error response to cue over sequential trials. Top plot shows growth in prediction-error response to cue during initial learning of cue–reward association (C → R). The middle plot shows loss of response during extinction (C → X), and faster rate of growth in prediction error response when cue–reward association is reinstigated (C → R). The circle indicates the first C → R trial, which occurs immediately after extinction, so there is no spontaneous recovery. The bottom plot shows speeded relearning after a postextinction pause, during which no stimuli are delivered. Here, relearning is superimposed on the spontaneous recovery of cue response that occurs on the first trial (circle).
