Habenula lesions reveal that multiple mechanisms underlie dopamine prediction errors

Author manuscript; available in PMC: 2016 Sep 23.

SUMMARY

Dopamine (DA) neurons are thought to facilitate learning by signaling reward prediction errors (RPEs), the discrepancy between actual and expected reward. However, how RPEs are calculated remains unknown. It has been hypothesized that DA neurons receive RPE signals from the lateral habenula. Here, we tested how lesions of the habenular complex affect the response of optogenetically-identified DA neurons in mice. We found that lesions impaired specific aspects of RPE signaling in DA neurons. The inhibitory responses caused by reward omission were greatly diminished while inhibitory responses to aversive stimuli, such as air puff-predictive cues or air puff, remained unimpaired. Furthermore, we found that after habenula lesions, DA neurons’ ability to signal graded levels of positive RPEs became unreliable, yet significant excitatory responses still remained. These results demonstrate that the habenula plays a critical role in DA RPE signaling but suggest that it is not the exclusive source of RPE signals.

INTRODUCTION

The ability to predict future outcomes based on sensory inputs is critical for making proper decisions. Psychological studies of animal learning have shown that temporal contiguity between two events (e.g. a sensory cue and reward) is not sufficient for establishing an association between them. Instead, it has been postulated that the efficiency of learning depends on how surprising the outcome is (Kamin, 1969; Rescorla and Wagner, 1972; Steinberg et al., 2013). According to Rescorla and Wagner (1972), learning (Δν) is proportional to the discrepancy between the value of obtained reward (R) and the predicted value of reward (ν), or reward prediction error (RPE) (R – ν):

$$\Delta\nu \propto (R - \nu)$$

DA neurons in the midbrain signal RPEs (Bayer and Glimcher, 2005; Cohen et al., 2012; Schultz et al., 1997). DA neurons are excited by unpredicted reward. When a sensory cue predicts a reward, DA neurons respond to the reward-predictive cue and their response to the predicted reward is greatly reduced. Furthermore, when a predicted reward is omitted, DA neurons decrease their firing (‘dip’) from baseline. The mechanisms by which DA neurons generate RPE signals, however, remain largely unknown.

Recent studies have found that neurons in the lateral habenula (LHb) also encode RPEs, but in the opposite direction compared to DA neurons (Matsumoto and Hikosaka, 2007, 2008), and that stimulation of LHb causes transient inhibition of DA neurons (Christoph et al., 1986; Ji and Shepard, 2007; Matsumoto and Hikosaka, 2007). Furthermore, stimulation of LHb neurons is sufficient to cause aversive learning (Matsumoto and Hikosaka, 2011; Stamatakis and Stuber, 2012). These findings raised the possibility that RPE signals are already calculated in LHb and simply relayed to DA neurons. Although intriguing, this hypothesis has not been tested experimentally. Moreover, at the behavioral level, it is unclear whether and how the habenula regulates learning that depends on positive or negative RPEs.

To address these questions, we performed bilateral lesions of the habenula and examined the effects on the firing patterns of DA neurons as well as behavioral performance in a classical conditioning task. We found that habenula lesions significantly impaired DA neurons’ normal inhibitory response to reward omission. By contrast, inhibitory responses to air puff-predictive cues or air puffs remained intact in lesioned animals. Furthermore, DA neurons encoded RPE signals during reward-predictive cues and reward in lesioned animals, although these responses were less reliable. At the behavioral level, we observed a phenotype that is consistent with a relative reduction of negative, over positive, RPEs. Taken together, our results support the idea that multiple inputs play a role in the generation of RPE-related activities of DA neurons.

RESULTS

Behavioral paradigm and habenula lesions

We trained mice in a classical conditioning paradigm in which odor cues (conditioned stimuli, CS) predicted either appetitive or aversive outcomes (unconditioned stimuli, US) with different probabilities (Figures 1A and 1B). In reward trials, the size of the water reward was constant and the probability of water delivery was varied (90, 50, 10%) for different odor cues. In aversive trials, air puffs were delivered to the face of the mouse with 90% probability. Each behavioral trial began with an odor cue (CS; 1 s) followed by a 1 s delay and an outcome (US). In addition to the cued trials, we interleaved reward alone (free reward) or air puff alone (free air puff) in some trials without an odor cue. Mice began to lick during the delay between the reward-predictive cue and reward. Lick frequencies were higher in trials with higher reward probabilities, indicating that the animals learned the association between odor cues and the expected value of the upcoming reward (anticipatory licking for 90% reward > 50% reward > 10% reward; P < 0.001 for both, Wilcoxon signed rank test, n = 12 mice; Figure 1C).

Figure 1. Odor-outcome association task.

(A) Experimental set-up.

(B) Task design.

(C) Frequency of licking during pre-operative period (mean ± s.e.m., bins of two days).

(D) Frequency of licking during post-operative period.

(E) Histogram of anticipatory lick frequency. n, number of trials.

(F) Ratio between anticipatory lick frequencies in trials with 50% and 90% probabilities of reward for early and late days after operation. *, P < 0.05; n.s., P > 0.05 (Wilcoxon rank-sum test). One control animal did not perform enough sessions during the 11–15 day period and was therefore omitted from the analysis.

See Figure S1 for details of lesion.

To examine the role of habenula in the activity of DA neurons and behavior, we lesioned the habenular complex bilaterally in a set of animals (n = 5 mice), after initial conditioning training. To make the lesions of the habenula as complete as possible, we chose to perform electrolytic lesions (See Discussion and Experimental Procedures). For comparison, another set of animals (n = 7 mice) underwent sham-lesions or no operations. Lesions covered a large portion of the habenula with occasional, small lesions in the medial part of the hippocampus and the paraventricular thalamic nucleus (PVT) (Figure S1, Table S1).

Elevated and less discriminable anticipatory licking in lesion animals

During the training period, there was no difference between lesion and control groups in anticipatory licking (P = 0.095, 3-way analysis of variance [ANOVA]). During the postoperative period, control animals’ lick rates were stable across days (Figure 1D). However, lesion animals’ anticipatory licking in both 50% and 90% reward trials gradually increased over 15 days (r = 0.89, P < 0.001, 50% trials; r = 0.77, P < 0.001, 90% trials; Pearson correlation; Figure 1D), resulting in significantly higher lick rates than control mice (P < 0.0001, 3-way ANOVA). Furthermore, the distribution of anticipatory licks in 50% reward trials became more similar to 90% reward trials in the lesion group (Figure 1E,F). These results show that the lick frequency became less sensitive to fine differences in the probability of upcoming reward.

Overall firing patterns of dopamine neurons in control and lesion animals

We recorded the spiking activity of neurons in the VTA (170 neurons in 7 control animals and 276 neurons in 5 lesion animals, Figure S1B; control: 14 ± 7 days, lesion: 15 ± 8 days, mean ± s.d.). To identify DA neurons while recording, we tagged DA neurons with a light-gated cation channel, channelrhodopsin-2 (ChR2) (Experimental Procedures). DA neurons were identified based on their responses to light delivered through an optical fiber placed near the tip of the electrodes. In addition to reliable spiking responses to light, we also verified that the shape of light-evoked spikes was almost identical to that of spontaneous spikes (correlation coefficient > 0.9). Based on these criteria (see Experimental Procedures for detail), we obtained 45 and 44 DA neurons in control and lesion animals, respectively (control: 6.4 ± 5.3, lesion: 8.8 ± 4.9 neurons per animal; mean ± s.d.; Figures 2A–2F). These neurons showed short-latency spikes in response to light (3.4 ± 1.1 ms, mean ± s.d.), and had little jitter in the latency of the first spike (Figure 2F), indicating that they were directly activated by light.
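The two identification criteria stated above (a significant SALT P-value and a waveform correlation above 0.9) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names, the array layout (one row per spike, one column per sample) and the use of mean waveforms are assumptions, and the SALT P-value is taken as a precomputed input rather than implemented here.

```python
import numpy as np

def waveform_correlation(spontaneous, evoked):
    """Pearson correlation between mean spontaneous and mean light-evoked waveforms.

    Both inputs are arrays of shape (n_spikes, n_samples), assumed to be
    aligned to the spike peak (hypothetical layout for this sketch).
    """
    mean_spont = np.asarray(spontaneous, dtype=float).mean(axis=0)
    mean_evoked = np.asarray(evoked, dtype=float).mean(axis=0)
    return float(np.corrcoef(mean_spont, mean_evoked)[0, 1])

def is_identified(salt_p, waveform_r, p_threshold=0.05, r_threshold=0.9):
    """Apply both criteria: SALT P < 0.05 and waveform correlation > 0.9."""
    return salt_p < p_threshold and waveform_r > r_threshold
```

A unit would then be labeled as identified only if `is_identified(salt_p, waveform_correlation(spont, evoked))` is true.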

Figure 2. Optogenetic identification of dopamine neurons and the overall firing patterns of dopamine neurons.

(A) Voltage trace of an example DA neuron. Cyan bars: laser stimulation. One spontaneous (left) and one light-triggered (right) spike are shown below.

(B) Raster plot of the same neuron to 10Hz (left) and 50Hz (right) stimulations.

(C) Isolation of this neuron (arrow) from the noise cluster. The energy of the spike waveform is defined as the integral of the squared voltage values (∫v²dt).

(D) Histogram of p-values testing whether light-activation induced significant changes in spike timing (n = 446 units). The P-values were derived from SALT (Stimulus-Associated spike Latency Test; see Experimental Procedures) (Kvitsiani et al., 2013). Neurons with P-values < 0.05 and waveform correlations > 0.9 were considered identified (grey).

(E) Probability of a spike as a function of stimulation frequency for each DA neuron (grey) and the mean across DA neurons (blue).

(F) Histogram of mean (left) and S.D. (right) spike latency to light stimulation.

(G) Average firing rates of all identified DA neurons recorded in control (left) and lesioned (right) animals. US+, trials in which outcome was delivered; US−, trials in which outcome was omitted. Grey area indicates the time of odor stimulation. Dashed line indicates expected US onset time. n, number of neurons from 7 control and 5 lesion animals. The same sets of neurons are used for (H) and (I). See also Figure S2.

(H) Boxplot of baseline firing rate of DA neurons. The edges of the boxes are the 25th and 75th percentiles, and the whiskers extend to the most extreme data points not considered outliers. Points are drawn as outliers if they are larger than Q3 + 1.5 × (Q3 − Q1) or smaller than Q1 − 1.5 × (Q3 − Q1), where Q1 and Q3 are the 25th and 75th percentiles. Outliers were plotted as individual data points. *, P < 0.05 (Wilcoxon rank-sum test).

(I) Baseline firing rate of individual DA neurons plotted as a function of days of recording. The firing rate of neurons in lesion group is not significantly correlated with recording days (r = 0.24, P = 0.11; Pearson correlation). Black, control; red, lesion.

Identified DA neurons in the control group showed characteristic firing patterns consistent with RPE signals (Figure 2G, left). First, they were activated by reward-predictive CSs in a value-dependent manner. Second, they responded strongly to unexpected reward, and their reward responses were reduced when an odor cue predicted the reward. Third, when an odor predicted reward, omission of that reward caused a transient decrease (“dip”) in activity below baseline.

In lesion animals, the baseline firing rate of DA neurons was elevated (control: 5.60 ± 2.26 spikes/s; lesion: 6.64 ± 3.20 spikes/s; mean ± s.d.; P < 0.05; Wilcoxon rank sum test; Figure 2H), but did not correlate with days of recording (Figure 2I). DA neurons largely maintained their phasic response patterns (Figure 2G, right), even in the two animals with almost complete lesions (100% medial habenula and >85% lateral habenula lesion; Figure S2). However, the fraction of neurons responding to particular events and the magnitude of these responses were altered. We will quantify these results in greater detail in the following sections.

Inhibitory responses during reward omission were greatly reduced in lesion animals

In control animals, reward omission caused a significant dip in most DA neurons (see Figure 2G for average responses and Figure 3A for example neuron). The response of individual DA neurons was visualized using the receiver-operating characteristic (ROC) analysis (Green and Swets, 1966). For each neuron, firing rate change from baseline was quantified using the area under the ROC curve (auROC) in a sliding window. Values greater than 0.5 indicate increases in firing rate and values less than 0.5 indicate decreases (shown by yellow and blue, respectively, in Figure 3B). In control animals, many individual DA neurons showed a transient decrease in firing rate around the time when a reward was expected. In lesion animals, DA neurons’ dip during reward omission became sporadic in timing and less prominent over the population (Figure 3B).
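The sliding-window auROC measure described above can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the authors' code: spike times are assumed to be given per trial in seconds relative to odor onset, the baseline window and step size are hypothetical parameters, and the 200 ms bin follows the Figure 3B legend. The auROC is computed as the probability that a response-window rate exceeds a baseline rate, with ties counted as one half.

```python
import numpy as np

def auroc(baseline, response):
    """Area under the ROC curve: P(response > baseline) + 0.5 * P(tie)."""
    baseline = np.asarray(baseline, dtype=float)
    response = np.asarray(response, dtype=float)
    diff = response[:, None] - baseline[None, :]  # all pairwise differences
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)

def rate_in_window(trials, start, stop):
    """Firing rate (spikes/s) of each trial within [start, stop)."""
    counts = [np.sum((np.asarray(t, dtype=float) >= start) &
                     (np.asarray(t, dtype=float) < stop)) for t in trials]
    return np.array(counts, dtype=float) / (stop - start)

def sliding_auroc(trials, baseline_window, t_start, t_stop,
                  bin_width=0.2, step=0.1):
    """auROC of each sliding bin against the baseline-window firing rate.

    Values above 0.5 indicate an increase from baseline, values below 0.5
    a decrease. `step` and `baseline_window` are assumptions of this sketch.
    """
    base = rate_in_window(trials, *baseline_window)
    starts = np.arange(t_start, t_stop - bin_width + 1e-9, step)
    values = np.array([auroc(base, rate_in_window(trials, s, s + bin_width))
                       for s in starts])
    return starts, values
```

For a neuron that pauses during reward omission, the values in bins around the expected reward time would fall below 0.5.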

Figure 3. Inhibitory responses during reward omission were diminished in lesion animals.

(A) Example neuron’s responses during reward omission.

(B) Temporal profiles of all DA neurons during omission of 90% reward. Colors indicate an increase (yellow) or decrease (blue) from baseline, as quantified using a sliding-window auROC analysis (time bin: 200ms; against baseline). Each row represents one neuron.

(C) Percentage of neurons that showed a significant response to reward omission (versus baseline, P < 0.05, Wilcoxon signed-rank test). ***, P < 0.001; *, P < 0.05 (Binomial test). Analysis window is 2–3s after odor onset, same for D and E. Black, control; red, lesion.

(D) Response magnitude during reward omission as measured by firing rate changes from baseline (mean ± s.e.m.). Blue asterisks indicate a significant difference between control and lesion groups (***, P < 0.001; no labeling, P > 0.05; Wilcoxon rank-sum test). Black and red asterisks indicate a significant difference between responses to two probabilities of reward within the control or lesion group (***, P < 0.001; n.s., not significant; Wilcoxon signed-rank test).

(E) Boxplot of the auROC values (omission of 90% reward versus omission of 50% reward). ***, P < 0.001 (Wilcoxon rank-sum test).

See also Figure S3.

In control animals, 86.7% of the DA neurons showed a significant dip in activity during omission of 90% reward (P < 0.05, Wilcoxon rank sum test), whereas only 47.7% did so in lesion animals. The magnitude of the dip, as quantified by the decrease in firing rate from baseline during omission of 90% reward, was also reduced in lesion animals (control: −2.6 ± 0.2 spikes/s; lesion: −1.5 ± 0.2 spikes/s; mean ± s.e.m.; P < 0.0001, Wilcoxon rank sum test; Figure 3D).

Furthermore, we found that the firing rate difference between the omission of 50% reward versus that of 90% reward was smaller in lesion animals (firing rate difference: control: P < 0.001, lesion: P > 0.05, Wilcoxon signed rank test; Figure 3D; auROC, 90% versus 50% reward omission: control, 0.30 ± 0.13; lesion, 0.46 ± 0.18; mean ± s.d.; P < 0.0001, Wilcoxon rank sum test; Figure 3E; see Experimental Procedures). We also noticed that DA neurons began decreasing their firing rate before the expected timing of reward delivery. This is probably due to uncertainty in reward timing because the timing at which an odor reaches the olfactory epithelium may vary depending on the timing of inhalation onset. We found that this “pre-reward dip” was affected by lesions in a similar manner as the dip after the time of expected reward onset (Figure S3).

We examined whether unintended lesion of PVT contributed to the lesion effect; we did not find significant correlations of the PVT lesion size and reward omission response (r = −0.19, P = 0.21; Pearson correlation). Together, these results demonstrate that the inhibitory responses in DA neurons during reward omission were impaired by lesioning the habenular complex.

Inhibitory responses to air puff remain unimpaired

LHb neurons are excited by aversive outcomes and cues that predict them, in addition to the omission of predicted reward (Matsumoto and Hikosaka, 2008). If habenula plays a general role in generating the inhibitory responses of DA neurons, DA neurons’ inhibition by air puff-predictive cues and air puff should also be reduced.

Many identified DA neurons showed biphasic responses to air puff, consisting typically of a brief excitation followed by inhibition (Figures 4A, B). In both excitatory and inhibitory phases, an unpredicted (“free”) air puff caused significantly stronger responses than did a predicted air puff (Figure S4). We first quantified the net response of individual neurons using a 0–400 ms time window to cover the entire air puff response period. Contrary to the above prediction, a larger fraction of neurons in the lesion group were significantly inhibited by a free air puff than in the control group (control: 18%, lesion: 36%, P < 0.05, Wilcoxon signed rank test; the difference in the fractions between control and lesion animals: P < 0.05, Binomial test; Figure 4C). Similar trends were observed when the responses were quantified using firing rate changes (Figure 4D). We also examined the prediction error coding for air puff by quantifying the discriminability between predicted and unpredicted air puff using the auROC values; there was no significant difference between the control and lesion groups (Figure 4E).

Figure 4. Inhibitory responses to air puff-predictive cues and air puff were unimpaired.

(A) Average firing rates in air puff trials. Grey area, CS period. Black bars, time windows for data analysis in (C)–(E). n, number of neurons; n = 37 for control group’s response to free air puff; same set of neurons is plotted for (C)–(F).

(B) Temporal profiles of individual DA neurons in air puff trials using a sliding-window auROC analysis (time bin: 100ms; against baseline). The same neuron’s response to free air puff (unpredicted air puff) is shown on the right. White areas on the right are sessions without free air puff trials.

(C) Percentage of neurons that showed a significant response to air puff or air puff-predictive cues. Black, control; red, lesion. Empty bar, significant excitation (versus pre-odor baseline, P < 0.05, Wilcoxon signed-rank test); filled bar, significant inhibition (versus pre-odor baseline, P < 0.05, Wilcoxon signed-rank test). *, P < 0.05 (Binomial test)

(D) Comparison of air puff response amplitudes between control and lesion group, measured by firing rate changes from baseline. *, P < 0.05 (Wilcoxon rank-sum test).

(E) Boxplot of auROC (90% air puff versus free air puff).

See Figure S4, S5 for additional information.

We next analyzed the data by dividing the analysis window into early excitatory and late inhibitory epochs based on the average time course in control animals (0–120 ms and 120–400 ms from air puff onset for excitatory and inhibitory periods, respectively). During the early excitation epoch, we observed no significant difference in neurons’ responses to predicted or free air puffs (Figure S5A). In the late inhibitory response epoch, neurons in lesion animals were significantly more inhibited than those in control animals (Figure S5B). These results hold true even if we shifted the boundary between early and late analysis windows (e.g. 100 ms or 150 ms).

During the CS period, about half of identified DA neurons showed a decrease in firing in response to air puff-predictive cues compared with pre-odor baseline activity (Figures 4C, D). These inhibitory responses were also unimpaired by habenula lesions. In summary, these results show that habenula lesions preferentially impaired the response to reward omission while leaving other inhibitory responses either unaffected or even increased.

Positive RPEs were weakened but preserved

We next analyzed phasic excitation following positive RPE events including reward-predictive cues and reward (Figure 5). In control animals, almost all identified DA neurons showed a transient excitation in response to the 90% reward CS (95.6%, 43 of 45 DA neurons; spike counts in a 0–600 ms window after CS onset against baseline; P < 0.05, Wilcoxon rank sum test; Figure 5A) or to unexpected reward (100%, 45 of 45 DA neurons; spike counts in a 0–400 ms window after US onset against baseline, P < 0.05, Wilcoxon rank sum test; Figure 5B). By contrast, in lesion animals, fewer neurons responded significantly to 90% reward CS (70.5%, 31 of 44 DA neurons; Figure 5A) or to unexpected reward (84.1%, 37 of 44 DA neurons; Figure 5B). When the magnitude of the responses was compared, phasic firing to events with positive RPE signals was smaller in the lesion group (54% decrease for 90% reward predictive cue; 38% decrease for free or 10% reward; Figures 5C, D). These CS and US responses were also less reliable in distinguishing 90% versus 50% probability of reward in lesion animals (auROC, 90% versus 50% reward, P < 0.0001 for both CS and US responses when comparing control and lesion groups, Wilcoxon rank sum test; Figures 5E, F).

Figure 5. Phasic excitations to reward CS and US were weakened in lesion animals.

(A) Percent of neurons that showed a significant response (versus baseline, P < 0.05, Wilcoxon signed-rank test) during reward CS (0–600ms after CS onset). ***, P < 0.001; *, P < 0.05; n.s, not significant (Binomial test). Black, control (n = 45 neurons, 7 mice); red, lesion (n = 44 neurons, 5 mice). The same sets of neurons are plotted for (B)–(F).

(B) Percent of neurons that showed a significant response during reward (0–400ms after US onset). Free reward trials were not significantly different from 10%-predicted-reward trials, and these trials types were combined.

(C and D) The magnitude of response (mean ± s.e.m.) to reward CS (C) and reward US (D), subtracted by baseline before trial starts. Filled symbols indicate a significant deviation from zero (P < 0.05, Wilcoxon signed-rank test). Blue asterisks indicate a significant difference between control group and lesion group (***, P < 0.001; **, P < 0.01; no labeling, P > 0.05, Wilcoxon rank-sum test). Black and red asterisks indicate a significant difference within control or lesion group to different probabilities of reward (***, P < 0.001; **, P < 0.01; *, P < 0.05; Wilcoxon signed-rank test).

(E and F) Boxplot of the auROC values (90% versus 50% reward trials) during reward CS (E) and reward US (F). Asterisks indicate a significant difference between control group and lesion group (***, P < 0.001; n.s., P > 0.05, Wilcoxon rank-sum test).

Although DA neurons in the lesion group signaled positive RPE less reliably, these responses were still modulated by reward expectation (P < 0.001, Wilcoxon signed rank test; Figures 5C, D), exhibiting the hallmark of RPE-related activity. This is true even in the animals with almost complete lesions of the habenula (Figures S3B–C). These results suggest that the habenula boosts the positive RPE-related responses of DA neurons, but may not be required for these responses.

Analysis of putative GABA and other unidentified VTA neurons

VTA contains a large number of GABA neurons, which are directly innervated by excitatory projections from the LHb (Brinschwitz et al., 2010; Omelchenko et al., 2009). The firing patterns of VTA neurons in a similar task can be classified into three distinct clusters using an unsupervised method (Cohen et al., 2012). DA neurons and GABA neurons corresponded to two of the three types identified with this method. Although we did not directly identify GABA neurons in the present study, we analyzed the putative GABA neurons by classifying neurons into three response types. First, we clustered all recorded VTA neurons in control and lesion animals into three clusters based on their firing patterns in 90%-reward trials (Figure 6A). All identified DA neurons in control animals and most (37/44) identified DA neurons in lesion animals fell into the first cluster, consistent with our previous study (Cohen et al., 2012) (Figures 6A, B). Note that cluster 1 neurons in control group included a small fraction of neurons that were activated by air puff more strongly than any optogenetically-identified DA neurons; this population, which is likely to be non-dopaminergic, was largely absent in the lesion group (see Experimental Procedures for details). Neurons in the second cluster showed sustained excitation after CS onset whose magnitude monotonically increased with reward probabilities, similar to optogenetically-identified GABA neurons (Cohen et al., 2012). In addition, there was a third cluster of neurons that were inhibited during the delay.

Figure 6. Comparison of all VTA neurons’ responses in control and lesion group.

(A) Clustering of response profiles. All recorded neurons are clustered into three groups based on their response profiles in 90% reward trials. The same neurons’ responses in 90% air puff trials are shown in the right panel. Clusters are separated by red lines. From top to bottom, cluster 1, 2 and 3 in (B) respectively.

(B) Average firing patterns of three neuronal clusters in (A). Colors are as in Fig 2G. Only US+ trials are shown. n, number of neurons. The same sets of neurons are plotted in (C)–(F).

(C) Baseline firing rates (mean ± s.e.m.). ***, P < 0.001 (Wilcoxon rank-sum test). cl1, cluster 1; cl2, cluster 2; cl3, cluster 3.

(D) Response to air puff (0–200ms after air puff onset) (mean ± s.e.m., baseline subtracted). ***, P < 0.001; **, P < 0.01; n.s., not significant (Wilcoxon rank-sum test).

(E) Delay period activity (auROC against baseline, 1–2s after odor onset) in 90% reward trials. ***, P < 0.001; n.s., not significant (Wilcoxon rank-sum test).

(F) Delay period activity (1–2s after odor onset) in 90% reward trials versus 50% reward trials as quantified using the auROC against each other. ***, P < 0.001 (Wilcoxon rank-sum test).

Consistent with the habenula’s disynaptic inhibitory connections to DA neurons and direct excitatory connections to VTA GABA neurons, habenula lesions caused a slight increase in the baseline firing rates of neurons in cluster 1 and a decrease in cluster 2 (Figure 6C). In addition to changes in baseline, habenula lesions also altered the task-related responses of neurons in cluster 2 (putative GABA neurons). First, whereas air puff caused phasic excitation in cluster 2 neurons of the control group, these responses were dramatically decreased in the lesion group (Figure 6B, D). In addition, the robustness of sustained excitation during the delay in 90%-reward trials was greatly reduced in lesion animals (P < 0.0001, Wilcoxon rank sum test; Figure 6E). The sustained excitations in 90% and 50% reward trials were also less discriminable (P = 0.0017, Wilcoxon rank sum test; Figure 6F).

Behavioral phenotype is consistent with a relative decrease of negative over positive RPE

The above results demonstrate that habenula lesions impaired aspects of RPE-related responses of DA neurons. In control animals, lick frequency in 50%-reward-probability trials reached an asymptote at an intermediate level. This can be parsimoniously explained by RPE: in 50%-reward-probability trials, animals receive positive RPEs when reward was delivered and negative RPEs when reward was omitted. As a result, the predicted value of reward reaches equilibrium. We hypothesized that the elevated lick frequency in 50%-reward trials in lesion animals can be explained by unbalanced RPEs.

To test this idea, we implemented a simple reinforcement learning model (Rescorla and Wagner, 1972). In this model, animals learn to predict the value of upcoming rewards associated with different odor cues. The value of each odor was updated based on the magnitude of RPE multiplied by a learning rate parameter (α). To dissociate the effect of positive versus negative RPEs in learning, two learning rate parameters (αP and αN) were assigned separately for each (Figure 7A). The third parameter, R, corresponds to the value of the outcome.
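The update rule described above can be sketched as a short simulation. This is a minimal illustration under assumptions that go beyond the text: omitted rewards are scored as an outcome value of 0, the learning rate is selected by the sign of the RPE on each trial, and the function names, seed and starting value are hypothetical. It is not the fitting procedure used in the study.

```python
import random

def update_value(v, rewarded, alpha_p, alpha_n, r):
    """One Rescorla-Wagner step with separate learning rates by RPE sign."""
    outcome = r if rewarded else 0.0  # assumption: omission has value 0
    rpe = outcome - v                 # reward prediction error
    alpha = alpha_p if rpe > 0 else alpha_n
    return v + alpha * rpe

def simulate(p_reward, alpha_p, alpha_n, r, n_trials, v0=0.0, seed=0):
    """Simulate the learned value of one odor cue across trials."""
    rng = random.Random(seed)
    v = v0
    history = []
    for _ in range(n_trials):
        rewarded = rng.random() < p_reward
        v = update_value(v, rewarded, alpha_p, alpha_n, r)
        history.append(v)
    return history
```

Calling `simulate(0.5, alpha_p, alpha_n, r, n_trials)` once with equal learning rates and once with `alpha_p` larger than `alpha_n` shows how weakening negative-RPE learning pushes the value of a 50% cue toward that of a high-probability cue.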

Figure 7. Simulation of anticipatory licking based on the Rescorla-Wagner model.

(A) Schematic of the learning algorithm. VN, associative strength between odor cue and reward at trial N, corresponds to anticipatory lick frequency measured in trial N; αP, learning rate from positive prediction error; αN, learning rate from negative prediction error. The value of VN is updated in a trial-by-trial manner depending on whether reward is delivered or not, as well as the current value of VN.

(B) The quality of model fitting, indicated by log likelihood of observing the experimental data, depending on the combination of αP and αN. Control group has the best fit when αP and αN have similar values (left); lesion group has the best fit above the diagonal line (i.e. αP > αN). Best fit parameters (αP and αN) are marked by asterisks and correspond to the conditions for data fitting in (C). Due to differences in the duration of experiments, 16 days of data is used for control data fitting and 27 days of data is used for lesion group. Using 16 days of the data to fit both groups yielded qualitatively similar results. Same values of R (6.8 licks/s) are used for control and lesion group.

(C) Example fitted data from (B) using best fitted parameters.

We first fit the model to the data in control animals during training and during the postoperative period. We found that the fitted value for αP (0.0020 ± 0.0002) was very close to the fitted value for αN (0.0022 ± 0.0003). The fitted value of R (6.8 ± 0.3 licks/s) reflected the asymptotic lick frequency in high reward probability trials. Next, assuming that the lesion group has the same value of R, we examined what combinations of αP and αN best matched the behavioral data in the lesion group. Given this model, we found that the probability of obtaining the observed behavioral data in the lesion group was highest when the ratio between αP and αN was 0.0055 to 0.0004, while similar values of αP and αN best predicted the control data (Figures 7B, C). Even when R was allowed to differ between the control and lesion groups, we obtained consistent results: optimal fitting was achieved only when αP > αN. In summary, the behavioral phenotype is consistent with a relative reduction of negative RPE-based learning over positive RPE-based learning.
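To see why unequal learning rates raise the asymptotic value in partially rewarded trials, the equilibrium of the model can be worked out directly. The derivation below is a sketch that assumes omitted rewards are scored as an outcome of 0 and that a cue is rewarded with probability p; V* denotes the equilibrium value and is notation introduced here, not in the original.

```latex
% Expected per-trial update for a cue rewarded with probability p
\mathbb{E}[\Delta V] = p\,\alpha_P\,(R - V) \;-\; (1 - p)\,\alpha_N\,V

% Setting the expected update to zero gives the equilibrium value
V^{*} = \frac{p\,\alpha_P}{p\,\alpha_P + (1 - p)\,\alpha_N}\,R

% For p = 0.5 this reduces to
V^{*}_{50\%} = \frac{\alpha_P}{\alpha_P + \alpha_N}\,R
```

Under these assumptions, the fitted control values (αP = 0.0020, αN = 0.0022) give an equilibrium of about 0.48 R in 50% trials, whereas the best-fitting lesion values (αP = 0.0055, αN = 0.0004) give about 0.93 R, close to the roughly 0.99 R obtained for 90% trials with the same lesion parameters. These are illustrative back-of-envelope figures rather than results reported in the study, but they match the direction of the observed behavior: licking in 50% trials approaches that in 90% trials.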

DISCUSSION

In the present study, we examined the role of the habenular complex (including both medial and lateral habenula) in RPE signaling of VTA DA neurons and in behavioral performance. Our results demonstrate that various aspects of RPE were impaired in habenula lesion animals. We found that the inhibitory responses of DA neurons upon reward omission were greatly diminished although inhibitory responses to aversive stimuli were relatively unimpaired. On the other hand, the effects on excitatory responses were much milder. These results suggest that the habenula contributes to generating DA responses, but only to particular events; other inputs are likely responsible for responses to other events.

Technical considerations

The present study was designed to examine the necessity of the habenular complex for reward prediction error signals in DA neurons. The habenula in mice is a longitudinal structure extending about 1 mm along the anterior-posterior axis and is located immediately adjacent to the third ventricle. In order to reliably inactivate the whole structure, we chose electrolytic lesions over other methods such as excitotoxic lesions, optogenetics, pharmacogenetics or local pharmacology. These alternative methods run the risk of partial inactivation. Further, given the high baseline firing rates of habenula neurons, transient inactivation might cause a great increase in the baseline firing rates of DA neurons, which would in turn make it difficult to isolate the effects on phasic responses. For our purpose, permanent lesions have certain advantages. First, inactivation is complete, and it is easier to confirm the extent of inactivation (lesions) by histology. This method can be applied to an elongated brain area, and it is expected to be less prone to causing large changes in baseline firing in DA neurons. On the other hand, lesions have disadvantages. For instance, long-term compensation may occur over time (discussed further below).

After lesions of the habenula, DA neurons were identified using an optogenetic tagging method. This type of unambiguous identification was necessary because lesions could have dramatically altered the firing patterns of DA neurons. Furthermore, spike waveform criteria, which are commonly used to identify DA neurons, have been shown to be unreliable under recording conditions similar to those of the present study (Cohen et al., 2012).

Although the habenula is relatively isolated from surrounding areas, it is possible that our habenula lesions damaged fibers of passage in the stria medullaris and fasciculus retroflexus. The stria medullaris mainly consists of afferent fibers to the habenula from forebrain areas such as the lateral hypothalamus, nucleus of the diagonal band, septal nucleus and entopeduncular nucleus (Klemm, 2004). The fasciculus retroflexus is a fiber tract linking the habenula and subcortical areas such as the VTA, rostromedial tegmental nucleus (RMTg), raphe nucleus and interpeduncular nucleus (Araki et al., 1988; Herkenham and Nauta, 1979). Although we cannot exclude the possibility that we lesioned a fraction of fibers that merely pass by the habenula, most of these fibers constitute inputs and outputs of the habenula. In addition, in some cases, we also damaged the dorsal part of the PVT. However, the size of PVT lesions was not correlated with the deficit in the DA “dip” during reward omission. Although we cannot exclude the possibility that these unintended lesions contributed to the effects of the lesions, in the following we provide a parsimonious explanation for our data based on reported neuronal activities and projections of neurons in the habenular complex (Hong et al., 2011; Jhou et al., 2009; Matsumoto and Hikosaka, 2007, 2009).

It had been proposed that LHb neurons send a relatively complete set of RPE signals to DA neurons (Hong et al., 2011; Matsumoto and Hikosaka, 2007). Our results partially support this idea, as habenula lesions affected phasic inhibitory responses during reward omission as well as excitatory responses to reward-predictive cues and reward. However, our results also suggest that multiple inputs underlie the generation of RPE-related activities of DA neurons. First, inhibitory responses to air puff and air puff-predictive cues remained intact even after lesions. Although we cannot completely rule out the possibility that the lack of effect was due to a compensatory mechanism after lesions, our results are consistent with the idea that inhibitions caused by reward omission involve different mechanisms than inhibitions caused by air puff or air puff-predictive cues. In other words, if the inhibitory responses to reward omission and to air puff were caused by the same mechanism, compensation should have occurred to the same degree for both responses. This was not the case. Second, in contrast to the results for reward omission, the effects on positive RPE were less evident. Specifically, in the lesion group, the responses to reward-predictive cues and rewards were still significantly modulated by expectation, while the responses to reward omission failed to distinguish high (90%) from medium (50%) reward probability. In addition, a higher percentage of DA neurons in habenula-lesioned animals still showed significant responses to positive RPE than to reward omission. Even in the animals with lesions covering more than 85% of the habenula, the phasic excitation remained largely intact (Figure S2). These results suggest that excitation of DA neurons requires inputs other than those from the habenula. This is consistent with a previous study showing that LHb neurons respond to reward-predictive cues or reward with longer latency than DA neurons (Matsumoto and Hikosaka, 2007). Thus, the habenula appears responsible mainly for the inhibitory responses of DA neurons to reward omission, but not to aversive stimuli, and only partially for responses to reward. Taken together, these results suggest that the RPE-related activities of DA neurons are generated by multiple inputs.

The habenula can influence firing patterns of DA neurons through multiple pathways. Neurons in the LHb send glutamatergic projections to the RMTg, which contains GABA neurons synapsing onto DA neurons (Hong et al., 2011; Jhou et al., 2009). It should also be noted that there are other pathways by which habenula neurons influence DA neurons. First, LHb neurons send direct projections to VTA GABA neurons as well as, to a lesser extent, to DA neurons (Jhou et al., 2009; Omelchenko et al., 2009). Interestingly, cluster 2 (putative VTA GABA) neurons drastically reduced their activity after habenula lesions. The decreased baseline firing of VTA GABA neurons may, in turn, increase the baseline activity of DA neurons. Furthermore, the habenula projects to the dorsal raphe nucleus (Ogawa et al., 2014; Pollak Dorocic et al., 2014; Sutherland, 1982; Weissbourd et al., 2014), which is a major source of monosynaptic inputs to VTA DA neurons (Watabe-Uchida et al., 2012). Lastly, a less studied pathway is the medial habenula’s projection to the interpeduncular nucleus (IPN) (Viswanath et al., 2013). Lesions of this pathway lead to an increase in DA levels (Nishikawa et al., 1986), which might contribute partly to the increase in baseline firing. Recording from the medial and lateral habenula in similar tasks will help elucidate their functions in reward processing. Furthermore, projection-specific manipulation of activity (Felix-Ortiz et al., 2013; Stamatakis and Stuber, 2012; Tye et al., 2011) will be a powerful means of uncovering the significance of each of these pathways from the habenula to DA neurons.

Neural mechanisms underlying prediction error-driven learning

Prediction error-based learning allows an animal to learn the proper values associated with different stimuli (Dayan and Abbott, 2001; Rescorla and Wagner, 1972). Optimal learning requires a correct balance between learning from positive and negative RPEs. After habenula lesions, licking in 50% and 90% reward trials was elevated, which could be explained by the effect of weakened negative RPEs compared to positive RPEs. This is consistent with the decrease in negative RPEs and the comparably mild change in positive RPEs in DA neurons. Alternatively, licking might be increased because of a change in motivation. For example, DA neurons’ phasic response to reward-predictive cues as well as tonic firing have been linked to motivation (Niv et al., 2007). However, these mechanisms cannot explain the behavioral change. First, cue-evoked phasic responses of DA neurons were reduced overall after lesions (Figure 5C). Second, although licking gradually increased over several days, the baseline firing did not increase over this period (Figure 2I).

The behavioral changes caused by habenula lesions may, however, be mediated by mechanisms other than DA neurons. First, it remains unclear whether a brief dip in DA firing during reward omission is sufficient to cause learning from negative RPEs, because previous studies often used relatively long inactivation of DA neurons compared to the natural dip (Danjo et al., 2014; Tan et al., 2012). Second, the habenula also has strong projections to the median as well as the dorsal raphe nuclei (Aghajanian and Wang, 1977; Kalén et al., 1989; Lecourtier and Kelly, 2007; Ogawa et al., 2014), which have been implicated in learning from negative events (Agetsuma et al., 2010; Amo et al., 2014; Cohen et al., 2015). Thus, impaired negative RPE-based learning in behavior could arise from pathways independent of DA neurons. Further studies are needed to elucidate the pathways through which the habenula controls RPE-based learning in mammals.

EXPERIMENTAL PROCEDURES

Animals

All procedures were carried out in accordance with NIH standards and approved by the Harvard University Institutional Animal Care and Use Committee (IACUC). We used 13 adult male mice, backcrossed with C57BL/6 mice, heterozygous for Cre recombinase under the control of the DAT gene (B6.SJL-Slc6a3tm1.1(cre)Bkmn/J, the Jackson Laboratory) (Bäckman et al., 2006). Five animals in the habenula lesion group were verified by histology. Seven animals were in the control group, including two with sham-lesion operations, one with only a small contralateral-side lesion of the medial habenula, and four without operations in the habenula. Animals were singly housed on a 12-h dark/12-h light cycle.

Surgery and viral injections

Mice were surgically implanted with a custom-made metal plate (a head plate). During the same surgery, 500–1000 nl adeno-associated virus (AAV), serotype 5, carrying an inverted ChR2 (H134R)-EYFP flanked by double loxP sites (Atasoy et al., 2008; Cohen et al., 2012) was injected into the VTA (from bregma: 3.1 mm posterior, 0.7 mm lateral, 4–4.2 mm ventral). The expression of this virus in DA neurons is highly selective and efficient, and ChR2 expression is uniform across DA neurons with different projection targets (Cohen et al., 2012; Lammel et al., 2015).

After 10 days of training on the conditioning task, mice were randomly assigned to the lesion or sham-lesion group. Electrolytic lesions were made bilaterally using a stainless steel electrode (15 kΩ, MicroProbes, MS301G). Each side of the brain was lesioned at two locations (from bregma: 1.6 mm/1.9 mm posterior, 1.15 mm lateral, 2.93 mm depth, with a 14 degree angle). A cathodal current of 150 μA was applied for 75 s at bregma −1.6 mm and for 90 s at bregma −1.9 mm. The head plate attached to the skull was used as the anode. For sham-lesion operations, no current was applied. In the same surgery, after lesions were made, a microdrive containing electrodes and an optical fiber was implanted in the VTA (from bregma: 3.1 mm posterior, 0.7 mm lateral, 3.8–4.0 mm ventral).

All surgery was performed under aseptic conditions with animals under either ketamine/medetomidine anesthesia (60/0.5 mg kg−1, intraperitoneal, respectively) or isoflurane inhalation anesthesia (1–2% at 0.5–1.0 L/min). Analgesics (ketoprofen, 1.3 mg kg−1 intraperitoneal, and buprenorphine, 0.1 mg kg−1 intraperitoneal) were administered postoperatively.

Behavioral task

After >1 week of recovery, mice were water-deprived. Body weight was maintained above 85% of full body weight. Animals were head-restrained using a head plate and habituated for ~15 min for 1–2 d before training on the task. Odors were delivered with a custom-made olfactometer (Uchida and Mainen, 2003). Odors were isoamyl acetate, eugenol, 1-hexanol, p-cymene, ethyl butyrate, 1-butanol, and carvone (1/10 dilution in paraffin oil). A set of odors was assigned randomly to each animal. Licks were detected by breaks of an infrared beam placed in front of the water tube. Behavioral signals were digitized and recorded at 1 kHz (PCI-6251, National Instruments).

During the training period, each odor predicted one of four outcomes: a drop of water (3.75 μl; valve open for 70 ms) with a probability of 100% or 50%, nothing, or an air puff delivered to the animal’s face. The strength of the air puff was enough to cause blinking and was shown to be aversive in a previous study (Cohen et al., 2015). Air puff trials were added after about 3–4 days of conditioning with water only. To measure responses when the actual outcome violated expectation, during the recording sessions we changed the reward probabilities to 90%, 50% and 10% and the air puff probability to 10%. Inter-trial intervals (ITIs) were drawn from an exponential distribution, resulting in a flat ITI hazard function. Data from control mice were obtained from 61 sessions (1–10 sessions per animal, 9±7 sessions); data from lesion mice were obtained from 77 sessions (9–19 sessions per animal, 15±5 sessions). Animals performed between 300 and 600 trials per day (441±82 trials).
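The statement that exponentially distributed ITIs give a flat hazard function can be checked numerically. The following is a minimal sketch, assuming a placeholder mean ITI of 5 s (the mean is not stated in this section) and hypothetical function names; it is not the task-control code used in the experiments.

```python
import math
import random


def draw_itis(n_trials, mean_iti_s, seed=0):
    """Draw inter-trial intervals (s) from an exponential distribution."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / mean_iti_s) for _ in range(n_trials)]


def empirical_hazard(itis, bin_s, n_bins):
    """Per-bin hazard: P(ITI ends in bin k | ITI lasted until the start of bin k)."""
    hazards = []
    for k in range(n_bins):
        start, stop = k * bin_s, (k + 1) * bin_s
        at_risk = sum(1 for t in itis if t >= start)
        ended = sum(1 for t in itis if start <= t < stop)
        hazards.append(ended / at_risk if at_risk else float("nan"))
    return hazards


MEAN_ITI_S = 5.0  # placeholder value, assumed for illustration only
itis = draw_itis(20000, MEAN_ITI_S, seed=1)
hazards = empirical_hazard(itis, bin_s=1.0, n_bins=5)
# For an exponential distribution the per-bin hazard is constant:
theoretical = 1.0 - math.exp(-1.0 / MEAN_ITI_S)
print([round(h, 3) for h in hazards], round(theoretical, 3))
```

Because the hazard does not grow with elapsed time, the animal cannot use the passage of time during the ITI to anticipate the next odor onset.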

Electrophysiology

Recordings were made using a custom-built 200-μm-fibreoptic-coupled screw-driven microdrive with eight implanted tetrodes. Tetrodes were glued to the fiber optic with epoxy. The ends of the tetrodes were 350–600 μm from the end of the fiber optic. Neural signals were amplified 200-fold with a filter between 0.1 and 9,000 Hz (RHA2116, Intan Technologies LLC), and digitized at 30 kHz (PCIe-6351, National Instruments). To extract the timing of spikes, signals were band-pass-filtered between 300 and 6,000 Hz. Spikes were sorted offline using MClust-3.5 software (David Redish). At the end of each session, the fiber and tetrodes were lowered by 40–80 μm to record new neurons the next day.

To identify DA neurons, we used ChR2 to observe stimulation-locked spikes (Cohen et al., 2012; Jennings and Stuber, 2014; Kvitsiani et al., 2013; Lima et al., 2009). The optical fiber was coupled with a diode-pumped solid-state laser with analogue amplitude modulation (Laserglow Technologies). Before and after each behavioral session, we delivered trains of 5 to 10 pulses of 473 nm light, each 5 ms long, at 1, 5, 10, 20 and 50 Hz, with power between 5 and 20 mW mm−2. Spike shape was measured using a broadband signal (0.1–9,000 Hz) to ensure that the spike waveform was not distorted.

To include a neuron in our data set, the neuron must have been well isolated (L-ratio < 0.05) (Schmitzer-Torbert et al., 2005) and recorded between two identified dopaminergic neurons or within 200 μm of an identified DA neuron to ensure that all neurons came from VTA. Recording sites were further verified histologically with electrolytic lesions using 15–20 s of 100 μA direct current.

Data analysis

To test whether the control and habenula-lesion groups differed in behavioral performance during the training period, we performed a three-way ANOVA. The three factors were lesion versus control (_F_(1, 338) = 2.81, P = 0.095), trial type (_F_(3, 338) = 6.19, P < 10−6) and days of training (_F_(9, 338) = 128.6, P < 10−6). The same analysis was applied to data obtained after the lesion or sham-lesion operation. Both lesion operation and trial type had a significant effect on licking, but days did not (lesion/sham-lesion: _F_(1, 598) = 40.65, P < 10−4; trial types: _F_(3, 598) = 107.44, P < 10−4; days: _F_(14, 598) = 0.47, P = 0.95).

To identify neurons as DA, we used a stimulus-associated spike latency test (SALT) algorithm (Kvitsiani et al., 2013) to determine whether light pulses significantly changed a neuron’s spike timing (Figure 2). For the SALT algorithm, we used a time window of 10 ms after laser onset and a significance value of P < 0.05. To ensure that spike sorting was not contaminated by light artifacts, all light-identified DA neurons had Pearson’s correlation coefficients greater than 0.9 between spontaneous and light-evoked spike waveforms, as described previously (Cohen et al., 2012). Together, these criteria allowed us to identify neurons expressing channelrhodopsin-2 unequivocally, as shown by the bimodal distribution of SALT P values (Figure 2D).
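The waveform criterion can be sketched as follows. This is a minimal illustration of the Pearson correlation check only; it does not implement SALT, and the function names and example waveforms are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length waveforms.

    Assumes neither waveform is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def passes_waveform_check(spontaneous, light_evoked, threshold=0.9):
    """True if the mean light-evoked waveform matches the spontaneous one."""
    return pearson_r(spontaneous, light_evoked) > threshold


# Hypothetical mean waveforms (arbitrary units), for illustration only.
spontaneous = [0.0, -1.0, -3.0, 1.5, 0.5, 0.0]
light_evoked = [0.1, -0.9, -2.8, 1.4, 0.6, 0.0]
print(passes_waveform_check(spontaneous, light_evoked))
```

A unit whose light-evoked waveform differs in shape from its spontaneous waveform would fail this check, which guards against counting light artifacts as evoked spikes.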

To measure firing rates, peristimulus time histograms (PSTHs) were constructed using 1-ms bins. Average PSTHs in figures were smoothed with a box filter of 100 ms (t ± 50 ms). Responses to specific behavioral events were calculated from the unsmoothed PSTHs. In all analyses, the baseline was calculated from the activity during the inter-trial interval immediately preceding odor onset, over a window of the same duration as the response time window. It should be noted that, because the baseline firing rates of DA neurons are elevated in the lesion group, the effects of lesions on task-related activities may be under- or over-estimated depending on the quantification method used. With respect to our observation that the inhibition caused by reward omission is reduced by lesions, using the change from baseline is a conservative approach. That is, with an increase in baseline firing, comparing absolute firing rates during this period without subtracting the baseline is problematic because inhibitory responses could be underestimated in lesioned animals simply due to the elevated baseline firing rates.
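The PSTH and baseline-subtraction steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: spike times are in milliseconds relative to odor onset, the function names and window boundaries are hypothetical, and the box filter simply shortens its window at the edges of the histogram.

```python
import math


def psth_1ms(spike_times_per_trial, t_start_ms, t_stop_ms):
    """Trial-averaged firing rate (spikes/s) in 1-ms bins."""
    n_bins = t_stop_ms - t_start_ms
    counts = [0] * n_bins
    for spikes in spike_times_per_trial:
        for t in spikes:
            k = int(math.floor(t - t_start_ms))
            if 0 <= k < n_bins:
                counts[k] += 1
    n_trials = len(spike_times_per_trial)
    return [1000.0 * c / n_trials for c in counts]


def box_smooth(rate, half_width=50):
    """Box filter over t +/- half_width bins (used for display only)."""
    smoothed = []
    for i in range(len(rate)):
        lo, hi = max(0, i - half_width), min(len(rate), i + half_width + 1)
        smoothed.append(sum(rate[lo:hi]) / (hi - lo))
    return smoothed


def baseline_subtracted_response(rate, t_start_ms, response_window, baseline_window):
    """Mean rate in the response window minus mean rate in the baseline window.

    Both windows are (start_ms, stop_ms) and should have the same duration.
    """
    def mean_rate(window):
        a, b = window[0] - t_start_ms, window[1] - t_start_ms
        return sum(rate[a:b]) / (b - a)

    return mean_rate(response_window) - mean_rate(baseline_window)


# Two hypothetical trials; times in ms relative to odor onset.
trials = [[-500.5, 100.5, 200.5], [150.5]]
rate = psth_1ms(trials, -1000, 1000)
print(baseline_subtracted_response(rate, -1000, (0, 500), (-600, -100)))
```

Quantifying responses from the unsmoothed histogram, as in the text, avoids spreading activity across window boundaries; smoothing is applied only to the displayed averages.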

Because the reduced inhibition during reward omission is the main conclusion of the present study, we present the results using the change from baseline throughout the manuscript. For the effects on inhibitory responses, our conclusions remain the same when we used the absolute firing rates. On the other hand, analysis based on the absolute firing rate underestimates the effects of lesions on excitatory responses (i.e. responses to reward-predictive cues or rewards). Nevertheless, there is still a trend for reward responses to be smaller in the lesion group (P = 0.05 for 50% reward, P = 0.13 for free reward, Wilcoxon rank sum test). Overall, these results are consistent with our conclusions that the dip during reward omission is more severely affected by lesions and that excitatory responses are partially compromised by lesions. To complement these analyses based on firing rates, we also quantified how well DA neurons’ responses discriminated the different probabilities of reward on a trial-by-trial basis. For this, we calculated the area under the receiver operating characteristic curve (auROC) for each neuron. The auROC value was calculated using spike counts obtained in the time windows defined above.
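The auROC for two sets of trial-wise spike counts can be computed directly from pairwise comparisons, which is equivalent to the area under the ROC curve. The sketch below is one standard way to do this and is offered as an illustration; the function name and the example counts are hypothetical.

```python
def auroc(counts_a, counts_b):
    """Probability that a random trial from b has a higher spike count than
    a random trial from a, counting ties as one half.

    0.5 means the two conditions are indistinguishable; 1.0 or 0.0 means
    they are perfectly separated.
    """
    wins = 0.0
    for a in counts_a:
        for b in counts_b:
            if b > a:
                wins += 1.0
            elif b == a:
                wins += 0.5
    return wins / (len(counts_a) * len(counts_b))


# Hypothetical spike counts in the response window for two trial types.
counts_50 = [2, 3, 1, 4, 2]
counts_90 = [4, 5, 3, 6, 4]
print(auroc(counts_50, counts_90))
```

Because the measure depends only on the rank order of spike counts across trials, it is insensitive to a uniform shift in firing rate, which is useful when baseline firing differs between groups.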

To cluster the response profiles of all recorded VTA neurons, we first obtained a normalized response time course for each neuron in trials with 90% probability of reward by calculating a sliding-window auROC, comparing the distribution of firing rates in a 100-ms window against the distribution of baseline firing rates (900 ms before odor onset). We then performed a principal component analysis on the response time courses of neurons from the control group (from CS onset to 1 s after reward delivery). The first four eigenvectors captured 87% of the variance, and the projections onto those four principal components were used for _K_-means clustering to obtain three response types (clusters). Lesion group neurons were first projected onto the same eigenvectors derived from the control group and were then clustered with the _K_-means method. Using this method, the phasic air puff response in cluster 1 in the control group was significantly stronger than that in optogenetically-identified DA neurons. This indicates that cluster 1 may contain non-DA neurons. We observed strong air puff responses in a small fraction of non-light-identified neurons in cluster 1 (only 19 out of 121 neurons had phasic air puff responses higher than 20 spikes/s, whereas the remaining 102 neurons had an average firing rate of 2.5 spikes/s). Thus the majority of neurons in cluster 1 still closely resemble optogenetically-identified DA neurons in their firing pattern. The decrease in air puff excitation in cluster 1 neurons after lesions was largely due to the loss of these non-light-identified neurons with air puff responses >20 spikes/s. The presence of non-canonical neurons in cluster 1 does not affect our main conclusions in the text.
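The clustering step can be illustrated with a small implementation of Lloyd's algorithm for _K_-means. This sketch assumes that each neuron has already been reduced to its projections onto the principal components; the PCA step is omitted, the initial centroids are supplied by the caller (the initialization used in the study is not stated), and all names and data are hypothetical.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return dists.index(min(dists))


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical 2-D projections for six neurons, three response types.
projections = [(0, 0), (0, 1), (10, 10), (10, 11), (-10, 5), (-10, 6)]
labels, centroids = kmeans(projections, [(0, 0), (10, 10), (-10, 5)])
print(labels)
```

Projecting lesion-group neurons onto eigenvectors derived from the control group, as described in the text, keeps both groups in the same coordinate system so that cluster identities are comparable.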

In simulations using a Rescorla-Wagner model, we used anticipatory licking as a behavioral readout of associative learning. This analysis used only trials in which rewards were anticipated with a probability of either 50% or 90%. We assumed no interaction between trials with different odor CSs. We also assumed a linear relationship between anticipatory licking and _V_N (the learned value of the odor CS in trial N). We updated the value of the odor CS for 50% and 90% reward probability separately, in a trial-by-trial manner, using the following formula:

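$$
\begin{aligned}
\mathrm{RPE} &= r - V_{N-1} \\
\text{If } \mathrm{RPE} > 0:\quad V_N &= V_{N-1} + \alpha_P \cdot \mathrm{RPE} \\
\text{If } \mathrm{RPE} < 0:\quad V_N &= V_{N-1} + \alpha_N \cdot \mathrm{RPE}
\end{aligned}
\tag{1}
$$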

RPE represents the reward prediction error signal from the previous (N−1) trial; r represents the reward received in trial N−1 (with a value of R or 0, depending on whether reward was delivered); _V_0 is the initial value of the odor CS. Under these assumptions, the value of water reward R was the same as the number of anticipatory licks when the animal had fully learned the association between the odor and 100% water. The reward history was randomly generated based on the probability of reward, with 100 trials for each trial type per session. We set _V_0 to 0 for the training data; for simulation of post-operative data, _V_0 = 0.5*R for 50% reward trials, and _V_0 = R for 90% reward trials (since the reward probability was 100% during training). We used the average lick rate for each session across animals to fit the average lick rate of a simulated session. To fit the parameters (R, αP and αN), we used maximum likelihood fitting (the “fminsearch” function in MATLAB) to find the set of parameters that best predicted the experimental data. The likelihood of observing the data given the model was calculated for each day on group-averaged data, assuming Gaussian noise with a standard deviation of 1, for both 50% and 90% reward trials. The likelihood was then summed across all days of data and used for parameter fitting. We ran the simulation 10 times to obtain the standard deviation of the fitted parameters.
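The simulation and likelihood described above can be sketched in a few lines. This is an illustrative reimplementation under stated assumptions, not the authors' MATLAB code: the function names are hypothetical, the log-likelihood (rather than the likelihood) is summed, and the parameter search itself (performed with fminsearch in the original analysis) is omitted.

```python
import math
import random


def simulate_values(p_reward, reward_value, alpha_p, alpha_n, v0, n_trials, rng):
    """Value of the odor CS at the start of each trial under Equation 1."""
    v = v0
    values = []
    for _ in range(n_trials):
        values.append(v)
        r = reward_value if rng.random() < p_reward else 0.0
        rpe = r - v
        # Separate learning rates for positive and negative prediction errors.
        alpha = alpha_p if rpe > 0 else alpha_n
        v += alpha * rpe
    return values


def gaussian_log_likelihood(observed, predicted, sd=1.0):
    """Log-likelihood of observed lick rates given model predictions,
    assuming independent Gaussian noise with standard deviation sd."""
    const = -0.5 * math.log(2.0 * math.pi * sd ** 2)
    return sum(const - (o - p) ** 2 / (2.0 * sd ** 2)
               for o, p in zip(observed, predicted))


# One simulated post-operative session of 50% reward trials, using the
# fitted lesion-group learning rates and R reported in the text.
rng = random.Random(0)
R = 6.8
values = simulate_values(0.5, R, 0.0055, 0.0004, v0=0.5 * R, n_trials=100, rng=rng)
session_mean = sum(values) / len(values)
print(round(session_mean, 2))
```

In a full fit, the session means from such simulations would be compared with the observed group-averaged lick rates through `gaussian_log_likelihood`, and the parameters R, αP and αN adjusted to maximize the summed value across days.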

To quantify the size of the lesions, we linearly transformed the standard atlas (Franklin and Paxinos, 2008) to best match the corresponding Nissl-stained histological images using anatomical landmarks such as the hippocampus. We labeled brain regions based on the best-matching transformed atlas. We marked the lesioned area by identifying the areas that were destroyed or that no longer displayed Nissl-stained neuronal somata. We calculated the area of intersection between lesioned tissue and different brain regions, including the medial and the lateral habenula, paraventricular thalamic nucleus, and hippocampus, using custom MATLAB code.
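The overlap calculation can be illustrated with binary masks. The sketch below assumes that the lesion outline and the atlas region have already been rasterized onto the same pixel grid; it is a hypothetical stand-in for the custom MATLAB code, and the atlas registration step is not shown.

```python
def lesion_fraction(lesion_mask, region_mask):
    """Fraction of a region's pixels that fall inside the lesion.

    Both masks are 2-D lists of 0/1 values with identical shape.
    """
    region_pixels = 0
    overlap_pixels = 0
    for lesion_row, region_row in zip(lesion_mask, region_mask):
        for in_lesion, in_region in zip(lesion_row, region_row):
            if in_region:
                region_pixels += 1
                if in_lesion:
                    overlap_pixels += 1
    return overlap_pixels / region_pixels if region_pixels else float("nan")


# Tiny hypothetical masks for one section.
lesion = [[1, 1, 0],
          [0, 1, 0]]
habenula = [[1, 0, 0],
            [1, 1, 1]]
print(lesion_fraction(lesion, habenula))
```

Summing overlap and region pixel counts across sections before dividing would give a per-animal lesion percentage for each structure.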

Immunohistochemistry

After recording, mice were given an overdose of ketamine/medetomidine, exsanguinated with saline, perfused with paraformaldehyde, and brains were cut in 100 μm coronal sections. VTA sections were immunostained with antibodies to tyrosine hydroxylase (TH)(AB152, Millipore, Billerica, MA, USA) and secondary antibodies labelled with Cy3 (Jackson Immunoresearch). Sections were further stained with 4′,6-diamidino-2-phenylindole (DAPI) to visualize nuclei. Recording sites were identified and verified to be amid EYFP staining and TH staining in VTA. Habenula sections were stained with Nissl as described before (Cury and Uchida, 2010).

Acknowledgments

We thank N. Eshel for his comments on the manuscript and other members of the Uchida lab for discussions; R. Komaki for assistance with behavioral training of animals; M.L. Andermann, B.P. Ölveczky and B.L. Sabatini for discussions and comments on the manuscript; E. Soucy and J. Greenwood for help with building the electrophysiology recording system; C. Dulac for sharing resources. This work was supported by the Sackler Scholar Programme in Psychobiology (J.T.) and NIH grants R01MH095953 (N.U.), and R01MH101207 (N.U.).

Footnotes

AUTHOR CONTRIBUTIONS

J.T. and N.U. designed the experiments. J.T. collected data and performed analysis. J.T. and N.U. wrote the manuscript.

References

  1. Agetsuma M, Aizawa H, Aoki T, Nakayama R, Takahoko M, Goto M, Sassa T, Amo R, Shiraki T, Kawakami K, et al. The habenula is crucial for experience-dependent modification of fear responses in zebrafish. Nat Neurosci. 2010;13:1354–1356. doi: 10.1038/nn.2654.
  2. Aghajanian GK, Wang RY. Habenular and other midbrain raphe afferents demonstrated by a modified retrograde tracing technique. Brain Res. 1977;122:229–242. doi: 10.1016/0006-8993(77)90291-8.
  3. Amo R, Fredes F, Kinoshita M, Aoki R, Aizawa H, Agetsuma M, Aoki T, Shiraki T, Kakinuma H, Matsuda M, et al. The habenulo-raphe serotonergic circuit encodes an aversive expectation value essential for adaptive active avoidance of danger. Neuron. 2014;84:1034–1048. doi: 10.1016/j.neuron.2014.10.035.
  4. Araki M, McGeer PL, Kimura H. The efferent projections of the rat lateral habenular nucleus revealed by the PHA-L anterograde tracing method. Brain Res. 1988;441:319–330. doi: 10.1016/0006-8993(88)91410-2.
  5. Atasoy D, Aponte Y, Su HH, Sternson SM. A FLEX Switch Targets Channelrhodopsin-2 to Multiple Cell Types for Imaging and Long-Range Circuit Mapping. J Neurosci. 2008;28:7025–7030. doi: 10.1523/JNEUROSCI.1954-08.2008.
  6. Bäckman CM, Malik N, Zhang Y, Shan L, Grinberg A, Hoffer BJ, Westphal H, Tomac AC. Characterization of a mouse strain expressing Cre recombinase from the 3' untranslated region of the dopamine transporter locus. Genesis. 2006;44:383–390. doi: 10.1002/dvg.20228.
  7. Bayer HM, Glimcher PW. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron. 2005;47:129–141. doi: 10.1016/j.neuron.2005.05.020.
  8. Brinschwitz K, Dittgen A, Madai VI, Lommel R, Geisler S, Veh RW. Glutamatergic axons from the lateral habenula mainly terminate on GABAergic neurons of the ventral midbrain. Neuroscience. 2010;168:463–476. doi: 10.1016/j.neuroscience.2010.03.050.
  9. Christoph GR, Leonzio RJ, Wilcox KS. Stimulation of the lateral habenula inhibits dopamine-containing neurons in the substantia nigra and ventral tegmental area of the rat. J Neurosci. 1986;6:613–619. doi: 10.1523/JNEUROSCI.06-03-00613.1986.
  10. Cohen JY, Haesler S, Vong L, Lowell BB, Uchida N. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature. 2012;482:85–88. doi: 10.1038/nature10754.
  11. Cohen JY, Amoroso MW, Uchida N. Serotonergic neurons signal reward and punishment on multiple timescales. eLife. 2015;4:e06346. doi: 10.7554/eLife.06346.
  12. Cury KM, Uchida N. Robust Odor Coding via Inhalation-Coupled Transient Activity in the Mammalian Olfactory Bulb. Neuron. 2010;68:570–585. doi: 10.1016/j.neuron.2010.09.040.
  13. Danjo T, Yoshimi K, Funabiki K, Yawata S, Nakanishi S. Aversive behavior induced by optogenetic inactivation of ventral tegmental area dopamine neurons is mediated by dopamine D2 receptors in the nucleus accumbens. Proc Natl Acad Sci. 2014;111:6455–6460. doi: 10.1073/pnas.1404323111.
  14. Dayan P, Abbott LF. Theoretical neuroscience. 2001.
  15. Felix-Ortiz AC, Beyeler A, Seo C, Leppla CA, Wildes CP, Tye KM. BLA to vHPC Inputs Modulate Anxiety-Related Behaviors. Neuron. 2013;79:658–664. doi: 10.1016/j.neuron.2013.06.016.
  16. Franklin KBJ, Paxinos G. The mouse brain in stereotaxic coordinates. 2008.
  17. Green DM, Swets JA. Signal Detection Theory and Psychophysics. New York: Wiley; 1966.
  18. Herkenham M, Nauta WJ. Efferent connections of the habenular nuclei in the rat. J Comp Neurol. 1979;187:19–47. doi: 10.1002/cne.901870103.
  19. Hong S, Jhou TC, Smith M, Saleem KS, Hikosaka O. Negative reward signals from the lateral habenula to dopamine neurons are mediated by rostromedial tegmental nucleus in primates. J Neurosci. 2011;31:11457–11471. doi: 10.1523/JNEUROSCI.1384-11.2011.
  20. Jennings JH, Stuber GD. Tools for resolving functional activity and connectivity within intact neural circuits. Curr Biol. 2014;24:R41–R50. doi: 10.1016/j.cub.2013.11.042.
  21. Jhou TC, Fields HL, Baxter MG, Saper CB, Holland PC. The rostromedial tegmental nucleus (RMTg), a GABAergic afferent to midbrain dopamine neurons, encodes aversive stimuli and inhibits motor responses. Neuron. 2009;61:786–800. doi: 10.1016/j.neuron.2009.02.001.
  22. Ji H, Shepard PD. Lateral habenula stimulation inhibits rat midbrain dopamine neurons through a GABA(A) receptor-mediated mechanism. J Neurosci. 2007;27:6923–6930. doi: 10.1523/JNEUROSCI.0958-07.2007.
  23. Kalén P, Strecker RE, Rosengren E, Björklund A. Regulation of striatal serotonin release by the lateral habenula-dorsal raphe pathway in the rat as demonstrated by in vivo microdialysis: role of excitatory amino acids and GABA. Brain Res. 1989;492:187–202. doi: 10.1016/0006-8993(89)90901-3.
  24. Kamin LJ. Predictability, surprise, attention, and conditioning. Punishm Aversive Behav. 1969:279–296.
  25. Klemm WR. Habenular and interpeduncularis nuclei: shared components in multiple-function networks. Med Sci Monit. 2004;10:RA261–RA273.
  26. Kvitsiani D, Ranade S, Hangya B, Taniguchi H, Huang JZ, Kepecs A. Distinct behavioural and network correlates of two interneuron types in prefrontal cortex. Nature. 2013;498:363–366. doi: 10.1038/nature12176.
  27. Lammel S, Steinberg EE, Földy C, Wall NR, Beier K, Luo L, Malenka RC. Diversity of Transgenic Mouse Models for Selective Targeting of Midbrain Dopamine Neurons. Neuron. 2015;85:429–438. doi: 10.1016/j.neuron.2014.12.036.
  28. Lecourtier L, Kelly PH. A conductor hidden in the orchestra? Role of the habenular complex in monoamine transmission and cognition. Neurosci Biobehav Rev. 2007;31:658–672. doi: 10.1016/j.neubiorev.2007.01.004.
  29. Lima SQ, Hromádka T, Znamenskiy P, Zador AM. PINP: A New Method of Tagging Neuronal Populations for Identification during In Vivo Electrophysiological Recording. PLoS ONE. 2009;4:e6099. doi: 10.1371/journal.pone.0006099.
  30. Matsumoto M, Hikosaka O. Lateral habenula as a source of negative reward signals in dopamine neurons. Nature. 2007;447:1111–1115. doi: 10.1038/nature05860.
  31. Matsumoto M, Hikosaka O. Representation of negative motivational value in the primate lateral habenula. Nat Neurosci. 2008;12:77–84. doi: 10.1038/nn.2233. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Matsumoto M, Hikosaka O. Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature. 2009;459:837–U4. doi: 10.1038/nature08028. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Matsumoto M, Hikosaka O. Electrical stimulation of the primate lateral habenula suppresses saccadic eye movement through a learning mechanism. PloS One. 2011;6:e26701. doi: 10.1371/journal.pone.0026701. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Nishikawa T, Fage D, Scatton B. Evidence for, and nature of, the tonic inhibitory influence of habenulointerpeduncular pathways upon cerebral dopaminergic transmission in the rat. Brain Res. 1986;373:324–336. doi: 10.1016/0006-8993(86)90347-1.
  35. Niv Y, Daw ND, Joel D, Dayan P. Tonic dopamine: opportunity costs and the control of response vigor. Psychopharmacology (Berl). 2007;191:507–520. doi: 10.1007/s00213-006-0502-4.
  36. Ogawa SK, Cohen JY, Hwang D, Uchida N, Watabe-Uchida M. Organization of Monosynaptic Inputs to the Serotonin and Dopamine Neuromodulatory Systems. Cell Rep. 2014;8:1105–1118. doi: 10.1016/j.celrep.2014.06.042.
  37. Omelchenko N, Bell R, Sesack SR. Lateral habenula projections to dopamine and GABA neurons in the rat ventral tegmental area. Eur J Neurosci. 2009;30:1239–1250. doi: 10.1111/j.1460-9568.2009.06924.x.
  38. Pollak Dorocic I, Fürth D, Xuan Y, Johansson Y, Pozzi L, Silberberg G, Carlén M, Meletis K. A Whole-Brain Atlas of Inputs to Serotonergic Neurons of the Dorsal and Median Raphe Nuclei. Neuron. 2014;83:663–678. doi: 10.1016/j.neuron.2014.07.002.
  39. Rescorla RA, Wagner AR. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. Classical Conditioning II: Current Research and Theory. 1972:64–99.
  40. Schmitzer-Torbert N, Jackson J, Henze D, Harris K, Redish AD. Quantitative measures of cluster quality for use in extracellular recordings. Neuroscience. 2005;131:1–11. doi: 10.1016/j.neuroscience.2004.09.066.
  41. Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science. 1997;275:1593–1599. doi: 10.1126/science.275.5306.1593.
  42. Stamatakis AM, Stuber GD. Activation of lateral habenula inputs to the ventral midbrain promotes behavioral avoidance. Nat Neurosci. 2012;15:1105–1107. doi: 10.1038/nn.3145.
  43. Steinberg EE, Keiflin R, Boivin JR, Witten IB, Deisseroth K, Janak PH. A causal link between prediction errors, dopamine neurons and learning. Nat Neurosci. 2013;16:966–973. doi: 10.1038/nn.3413.
  44. Stopper CM, Floresco SB. What's better for me? Fundamental role for lateral habenula in promoting subjective decision biases. Nat Neurosci. 2014;17:33–35. doi: 10.1038/nn.3587.
  45. Sutherland RJ. The dorsal diencephalic conduction system: a review of the anatomy and functions of the habenular complex. Neurosci Biobehav Rev. 1982;6:1–13. doi: 10.1016/0149-7634(82)90003-3.
  46. Tan KR, Yvon C, Turiault M, Mirzabekov JJ, Doehner J, Labouèbe G, Deisseroth K, Tye KM, Lüscher C. GABA Neurons of the VTA Drive Conditioned Place Aversion. Neuron. 2012;73:1173–1183. doi: 10.1016/j.neuron.2012.02.015.
  47. Thornton EW, Bradbury GE. Effort and stress influence the effect of lesion of the habenula complex in one-way active avoidance learning. Physiol Behav. 1989;45:929–935. doi: 10.1016/0031-9384(89)90217-5.
  48. Thornton EW, Davies C. A water-maze discrimination learning deficit in the rat following lesion of the habenula. Physiol Behav. 1991;49:819–822. doi: 10.1016/0031-9384(91)90324-h.
  49. Tsai HC, Zhang F, Adamantidis A, Stuber GD, Bonci A, de Lecea L, Deisseroth K. Phasic firing in dopaminergic neurons is sufficient for behavioral conditioning. Science. 2009;324:1080–1084. doi: 10.1126/science.1168878.
  50. Tye KM, Prakash R, Kim SY, Fenno LE, Grosenick L, Zarabi H, Thompson KR, Gradinaru V, Ramakrishnan C, Deisseroth K. Amygdala circuitry mediating reversible and bidirectional control of anxiety. Nature. 2011;471:358–362. doi: 10.1038/nature09820.
  51. Uchida N, Mainen ZF. Speed and accuracy of olfactory discrimination in the rat. Nat Neurosci. 2003;6:1224–1229. doi: 10.1038/nn1142.
  52. Viswanath H, Carter AQ, Baldwin PR, Molfese DL, Salas R. The medial habenula: still neglected. Front Hum Neurosci. 2013;7:931. doi: 10.3389/fnhum.2013.00931.
  53. Watabe-Uchida M, Zhu L, Ogawa SK, Vamanrao A, Uchida N. Whole-brain mapping of direct inputs to midbrain dopamine neurons. Neuron. 2012;74:858–873. doi: 10.1016/j.neuron.2012.03.017.
  54. Weissbourd B, Ren J, DeLoach KE, Guenthner CJ, Miyamichi K, Luo L. Presynaptic Partners of Dorsal Raphe Serotonergic and GABAergic Neurons. Neuron. 2014;83:645–662. doi: 10.1016/j.neuron.2014.06.024.
