Social and monetary reward learning engage overlapping neural substrates

Journal Article

¹California Institute of Technology, Computations and Neural Systems, MC 136-93, Pasadena and ²California Institute of Technology, Division of Humanities and Social Sciences, MC 228-77, Pasadena, CA 91125, USA

Received: 26 October 2010

Accepted: 17 January 2011

Alice Lin, Ralph Adolphs, Antonio Rangel, Social and monetary reward learning engage overlapping neural substrates, Social Cognitive and Affective Neuroscience, Volume 7, Issue 3, March 2012, Pages 274–281, https://doi.org/10.1093/scan/nsr006

Abstract

Learning to make choices that yield rewarding outcomes requires the computation of three distinct signals: stimulus values that are used to guide choices at the time of decision making, experienced utility signals that are used to evaluate the outcomes of those decisions and prediction errors that are used to update the values assigned to stimuli during reward learning. Here we investigated whether monetary and social rewards involve overlapping neural substrates during these computations. Subjects engaged in two probabilistic reward learning tasks that were identical except that rewards were either social (pictures of smiling or angry people) or monetary (gaining or losing money). We found substantial overlap between the two types of rewards for all components of the learning process: a common area of ventromedial prefrontal cortex (vmPFC) correlated with stimulus value at the time of choice, another common area of vmPFC correlated with reward magnitude and common areas in the striatum correlated with prediction errors. Taken together, the findings support the hypothesis that shared anatomical substrates are involved in the computation of both monetary and social rewards.

INTRODUCTION

The brain needs to compute several distinct signals in order for an organism to learn how to make sound decisions among alternatives. First, at the time of choice, values need to be assigned to the different stimuli associated with each choice option [which we refer to as stimulus values (SV)]; these are subsequently compared in order to choose the option with the highest value (Wallis, 2007; Rangel et al., 2008; Kable and Glimcher, 2009; Rushworth et al., 2009; Rangel and Hare, 2010). Stimulus value signals have been found in ventral and medial sectors of the prefrontal cortex (vmPFC) in several human fMRI (Kable and Glimcher, 2007; Plassmann et al., 2007; Tom et al., 2007; Hare et al., 2008, 2009; Chib et al., 2009; FitzGerald et al., 2009; Litt et al., 2009; Levy et al., 2010; Plassmann et al., 2010) and non-human primate electrophysiological studies (Wallis and Miller, 2003; Padoa-Schioppa and Assad, 2006, 2008; Kennerley et al., 2009; Kennerley and Wallis, 2009; Padoa-Schioppa, 2009) during choices involving non-social rewards, as well as during social decisions such as donations to charities (Hare et al., 2010).

Having made a choice, the brain needs to compute the reward value associated with the outcomes generated by the choice. These signals are often called reward magnitude or experienced utility (R). Several human fMRI studies have found that activity in medial regions of orbitofrontal cortex (OFC) correlates with behavioral measures of experienced utility for a wide variety of social and non-social reward modalities (Blood and Zatorre, 2001; Small et al., 2001, 2003; de Araujo et al., 2003; McClure et al., 2003; Kringelbach, 2005; Plassmann et al., 2008; Smith et al., 2010).

A third critical component is the combination of the previous two signals into a prediction-error signal (PE) that is used to update stimulus values (Schultz et al., 1997). The key involvement of the ventral striatum in this third component is borne out by a sizable and rapidly growing body of human fMRI studies of reinforcement learning that have used almost exclusively non-social rewards such as monetary payments (Delgado et al., 2000; Berns et al., 2001; Pagnoni et al., 2002; O'Doherty et al., 2003b, 2004; Pessiglione et al., 2006; Yacubian et al., 2006; Seymour et al., 2007; Hare et al., 2008).

Although the findings summarized above have been replicated across species, techniques and experimental designs, the vast majority of studies have used only non-social rewards such as juice, food or money, and only a handful have directly compared social and non-social rewards. This raises a fundamental question: do the same brain regions implement reward-learning computations for social and non-social rewards? Or might the areas that encode SV, PE and R be different for social rewards, analogous to the specialized perceptual processing of social stimuli (Kanwisher and Yovel, 2006)? While a few other studies have recently approached this issue (Izuma et al., 2008; Zink et al., 2008; Smith et al., 2010), no study to date has investigated the question using identical tasks in the same subjects, with a task that allows a comparison of the encoding of the three basic reward signals defined above. We undertook such an investigation here using model-based fMRI.

METHODS

Participants

Twenty-seven female participants from the Caltech community participated in the study (mean age = 22.4 years; range 18–28). Five were excluded from further analyses: four due to excessive head movement, one due to failure to understand task instructions. All participants were fully right-handed, had normal or corrected-to-normal vision, had no history of psychiatric or neurological disease and were not taking medications that might have interfered with BOLD-fMRI. All gave informed consent under a protocol approved by the Caltech IRB.

Task

Participants played two structurally identical versions of an instrumental learning task, one with monetary rewards, the second with social rewards (Figure 1A). A trial began with the display of two visually distinctive slot machines, each associated with one of three outcome distributions: mean-positive, -negative and -neutral (Figure 1B).

Fig. 1

Task and behavioral results. (A) Timeline of the monetary and social reward trials. Choice trials paired a neutral slot machine with a valenced slot machine. Trials were identical except for the nature of the outcomes: monetary trials had a gain/loss of +$1, $0 or −$1, whereas social trials revealed happy, neutral or angry faces accompanied by sound effects of similar emotional valence. The experiment also included no-choice trials (in which a pair of identical slot machines was shown: neutral, negative or positive) to help separate the learning and stimulus value signals. Specific slot machines were randomly assigned to specific reward outcomes at the start of the experiment for each subject, and were distinct between monetary and social condition blocks. (B) Distribution of outcomes for each slot machine. First row: negative machine. Second row: positive machine. Bottom row: neutral machine. The same distribution was used in the monetary and social conditions. The actual appearance of the slot machines was randomly paired with a reward outcome distribution and was distinct between monetary and social condition blocks. (C) Plot of group subject choices across trials (only the first 30 are shown). (D) Psychometric choice curve for monetary and social conditions. Bars denote standard errors computed across subjects.

All participants completed one social and one monetary block of 148 trials each; block order was randomized between participants. There were two types of trials in each block. In 100 choice trials the neutral slot machine was shown paired with either the positive or negative slot machine (50/50 probability with randomized order), and participants chose one by pressing a left or right button. We refer to these as free choice trials. In 48 non-choice trials two identical copies of one of the three slot machines were shown (1/3, 1/3, 1/3 probability with randomized order), and participants merely pressed either the left or right button in order to advance the trial. We refer to these as forced choice trials. Up to 2.5 s were allowed for choice in both cases, followed by a uniformly blank screen displayed for 1–5 s (flat distribution), followed by the reward outcome displayed for 1.5 s, followed by an intertrial interval of a uniformly blank screen displayed for 1–6 s (flat distribution). Note that participants were not told the reward probabilities associated with each slot machine and had to learn them by trial and error during the task.
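
The block structure described above can be sketched as a simple schedule generator. This is only an illustration under stated assumptions, not the stimulus code used in the study: the function name `make_block` and the dictionary keys are hypothetical, the 50/50 and 1/3 probabilities are implemented as exact counts (50 + 50 and 16 + 16 + 16), and free and forced trials are assumed to be interleaved at random within a block.

```python
import random

def make_block(seed=0):
    """Return a shuffled list of 148 trial dicts for one block (hypothetical layout)."""
    rng = random.Random(seed)
    trials = []
    # 100 free-choice trials: the neutral machine paired with a valenced machine.
    for valenced in ["positive", "negative"] * 50:
        trials.append({"type": "free", "machines": ("neutral", valenced)})
    # 48 forced trials: two identical copies of one of the three machines.
    for machine in ["positive", "neutral", "negative"] * 16:
        trials.append({"type": "forced", "machines": (machine, machine)})
    rng.shuffle(trials)  # assumption: trial types are interleaved
    for t in trials:
        t["max_choice_s"] = 2.5               # response window
        t["delay_s"] = rng.uniform(1.0, 5.0)  # blank screen, flat distribution
        t["outcome_s"] = 1.5                  # outcome display
        t["iti_s"] = rng.uniform(1.0, 6.0)    # intertrial interval, flat distribution
    return trials

block = make_block()
```

Drawing the two blank-screen durations from flat distributions, as in the text, jitters the interval between choice and outcome events from trial to trial.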

The forced trials provide an essential control for a potentially important confound. The concern is that positive and aversive social outcomes might induce ‘correct’ and ‘error’ feedback signals at outcome during the social trials. If so, the common locus of activity could reflect these feedback signals rather than the processing of a social reward. The forced trials control for this possibility because, when there is no free choice, there can be no feedback regarding the correctness of the choice.

Stimuli and rewards

The slot machines in both conditions were represented by cartoon images of actual slot machines that varied in color and pattern (Figure 1). In the social condition, reward outcomes were color photographs of unfamiliar faces from the NimStim collection (Tottenham et al., 2009) showing either an angry (negative outcome), neutral (neutral outcome) or happy (positive outcome) emotional expression, presented together with emotionally matched words played through headphones (normalized for volume and duration). Examples of positive words are excellent, bravo and fantastic. Examples of negative words are stupid, moron and wrong. Examples of neutral words are desk, paper and stapler. Extensive prior piloting had demonstrated the behavioral efficacy of these stimuli in reward learning.

In the monetary condition, the positive outcome was a gain of one dollar (an image of a dollar bill), the negative condition was a loss of one dollar (image of a dollar bill crossed out) and the neutral condition involved no change in monetary payoff (image of an empty rectangle). Subjects were paid out the sum of their earnings at the end of the experiment.

Computational model

We computed trial- and subject-specific values for each of the three variables described in the Introduction. The SV for every slot machine was calculated as the 10-trial moving average of the proportion of times that the machine was chosen when it was shown, a continuous value between 0 and 1. Consistent with this coding, R was assigned a value of 1 for positive outcomes, 0.5 for neutral outcomes and 0 for negative outcomes. PE at the time of outcome was calculated using a simple Rescorla–Wagner learning rule (Rescorla and Wagner, 1972) as the difference between the value of the reward outcome and the stimulus value of the machine selected on that trial: $PE_t = R_t - SV_t$.
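
As a minimal sketch of these three quantities, the snippet below computes SV, R and PE for a single machine. Several details are assumptions rather than statements from the text: the window is taken over the most recent presentations of that machine (up to 10), a value of 0.5 is used before any history exists, and the names `stimulus_value`, `prediction_error` and `REWARD_CODE` are hypothetical.

```python
# Outcome coding taken from the text: positive = 1, neutral = 0.5, negative = 0.
REWARD_CODE = {"positive": 1.0, "neutral": 0.5, "negative": 0.0}

def stimulus_value(history, window=10, prior=0.5):
    """Proportion of the last `window` presentations on which the machine was chosen.

    `history` holds 1 (chosen) or 0 (not chosen) for each earlier presentation.
    The `prior` returned for an empty history is an assumption of this sketch.
    """
    recent = list(history)[-window:]
    return sum(recent) / len(recent) if recent else prior

def prediction_error(outcome, sv):
    """PE_t = R_t - SV_t for the machine selected on this trial."""
    return REWARD_CODE[outcome] - sv

# Hypothetical record: the machine was chosen on 3 of its 4 earlier presentations.
shown = [1, 0, 1, 1]
sv = stimulus_value(shown)             # 0.75
pe = prediction_error("positive", sv)  # 1.0 - 0.75 = 0.25
```

Because SV is read off observed choices rather than fitted, the sketch has no learning-rate parameter.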

Note three things about the value normalizations. First, our approach deviates from the usual practice in neuroscience studies of reinforcement learning (Pessiglione et al., 2006, 2008; Seymour et al., 2007; Lohrenz et al., 2007; Hare et al., 2008; Wunderlich et al., 2009) in which it is customary to fit the values of the SV signal based on the predictions of the best fitting learning model. Here we depart from that practice because the revealed preference approach provides more accurate measures of the values computed at the time of choice (as shown in Figure 1D). Second, without loss of generality we normalize the reward outcome signals to 0 for negative outcomes and 1 for positive outcomes. Note that given the parametric nature of the general linear model specified below, this normalization does not affect the identification of areas that exhibit significant correlation with this variable. Third, we use the standard definition of prediction errors used in the literature.
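
The second point can be illustrated numerically: recoding the outcomes by a positive affine transformation leaves their Pearson correlation with any signal unchanged. The sketch below uses made-up numbers; the helper `pearson` and all data values are hypothetical and serve only to demonstrate the invariance.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical outcomes coded -1 / 0 / +1 ...
outcomes_pm1 = [1, -1, 0, 1, 0, -1, 1, 1]
# ... and the same outcomes recoded to 0 / 0.5 / 1, as in the text.
outcomes_01 = [0.5 * o + 0.5 for o in outcomes_pm1]
# Made-up response values standing in for a BOLD signal.
signal = [2.1, 0.3, 1.0, 1.8, 1.2, 0.1, 2.4, 1.9]

r_pm1 = pearson(outcomes_pm1, signal)
r_01 = pearson(outcomes_01, signal)
# The two coefficients agree up to floating-point rounding.
```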

Image acquisition

T2*-weighted gradient-echo echo-planar (EPI) images with BOLD contrast were collected on a Siemens 3T Trio. To optimize signal in the OFC, we acquired slices in an oblique orientation of 30° to the anterior commissure–posterior commissure line (Deichmann et al., 2003) and used an eight-channel phased array head coil. Each volume comprised 32 slices. Data was collected in four sessions (∼12 min each). The imaging parameters were as follows: TR = 2 s, TE = 30 ms, FOV = 192 mm, 32 slices with 3 mm thickness resulting in isotropic 3 mm voxels. Whole-brain high-resolution T1-weighted structural scans (1 × 1 × 1 mm) were co-registered with their mean T2*-weighted images and averaged together to permit anatomical localization of the functional activations at the group level.

fMRI pre-processing

The imaging data was analyzed using SPM5 (Wellcome Department of Imaging Neuroscience, Institute of Neurology, London, UK). Functional images were corrected for slice acquisition time within each volume, motion-corrected with realignment to the last volume, spatially normalized to the standard Montreal Neurological Institute EPI template and spatially smoothed using a Gaussian kernel with a full-width at half-maximum of 8 mm. Intensity normalization and high-pass temporal filtering (filter width = 128 s) were also applied to the data.
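
For readers who want the smoothing and filtering parameters in other units, the conversions can be sketched as follows. The relation between FWHM and the standard deviation of a Gaussian is standard; the function name `fwhm_to_sigma` is hypothetical, and reading the 128 s filter width as a cutoff period is an assumption.

```python
import math

def fwhm_to_sigma(fwhm):
    """Convert a Gaussian full-width at half-maximum to its standard deviation."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

sigma_mm = fwhm_to_sigma(8.0)  # about 3.40 mm for the 8 mm kernel
sigma_vox = sigma_mm / 3.0     # about 1.13 voxels at 3 mm isotropic resolution
cutoff_hz = 1.0 / 128.0        # about 0.0078 Hz, if 128 s is the cutoff period
```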

fMRI data analysis

The data analysis proceeded in three steps. First, we estimated a general linear model with AR(1). This model was designed to identify regions in which BOLD activity was parametrically related to SV, R and PE. The model included the following regressors:

(R1) An indicator function for the decision screen in free choice monetary trials.
(R2) An indicator function for the decision screen in free choice monetary trials multiplied by the SV of the two slot machines shown in that trial (summed SV).
(R3) An indicator function for the decision screen in free choice monetary trials multiplied by the reaction time for that trial.
(R4–R6) Analogous indicator functions for decision screen events in free choice social trials.
(R7) An indicator function for the decision screen in forced monetary trials.
(R8) An indicator function for the decision screen in forced monetary trials multiplied by the SV of the slot machine displayed.
(R9–R10) Analogous indicator functions for decision screen events in forced social trials.
(R11) A delta function for the time of response in the monetary condition.
(R12) A delta function for the time of response in the social condition.
(R13) An indicator function for the outcome screen in free monetary trials (both choice and non-choice).
(R14) An indicator function for the outcome screen in free monetary trials multiplied by the PE for the trial.
(R15) An indicator function for the outcome screen in free monetary trials multiplied by the R for the trial.
(R16–R18) Analogous indicator functions for outcome screen events in free social trials (both choice and non-choice).

We orthogonalized the modulators for the main regressors that had more than one modulator (e.g. R2 and R3). The model also included six head motion regressors, session constants and missed trials as regressors of no interest. The regressors of interest and missed trial regressor were convolved with a canonical HRF.
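
The two operations in this paragraph, orthogonalizing one modulator against another and convolving a regressor with an HRF, can be sketched in plain Python. This is an illustration only and does not reproduce SPM5's implementation. It assumes that the later modulator (reaction time, R3) is orthogonalized with respect to the earlier one (SV, R2), it uses a generic double-gamma response whose parameters are assumptions, and all data values and helper names are hypothetical.

```python
import math

def center(x):
    """Subtract the mean from a sequence."""
    m = sum(x) / len(x)
    return [v - m for v in x]

def orthogonalize(target, reference):
    """Remove from mean-centred `target` its projection onto mean-centred `reference`."""
    t, r = center(target), center(reference)
    beta = sum(a * b for a, b in zip(t, r)) / sum(b * b for b in r)
    return [a - beta * b for a, b in zip(t, r)]

def double_gamma_hrf(t, peak=6.0, undershoot=16.0, ratio=6.0):
    """Generic double-gamma response at time t in seconds (assumed parameters)."""
    if t <= 0:
        return 0.0
    g1 = t ** (peak - 1) * math.exp(-t) / math.gamma(peak)
    g2 = t ** (undershoot - 1) * math.exp(-t) / math.gamma(undershoot)
    return g1 - g2 / ratio

def convolve(stick, kernel):
    """Discrete convolution truncated to the length of `stick`."""
    out = [0.0] * len(stick)
    for i, s in enumerate(stick):
        for j, k in enumerate(kernel):
            if s != 0.0 and i + j < len(out):
                out[i + j] += s * k
    return out

TR = 2.0                                                # seconds, as in the text
kernel = [double_gamma_hrf(k * TR) for k in range(16)]  # sampled from 0 to 30 s

sv = [0.9, 0.4, 0.7, 0.2, 0.6]   # hypothetical summed SV per trial
rt = [0.8, 1.4, 0.9, 1.6, 1.1]   # hypothetical reaction times in seconds
rt_orth = orthogonalize(rt, sv)  # reaction time with shared variance removed

onsets = [0, 6, 12, 18, 24]      # hypothetical decision-screen onsets, in scans
stick = [0.0] * 40
for onset, value in zip(onsets, center(sv)):
    stick[onset] = value         # stick function scaled by mean-centred SV
sv_regressor = convolve(stick, kernel)
```

Under this ordering, any variance shared by the two modulators is attributed to SV, so the reaction-time regressor explains only what SV cannot.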

Second, we calculated the following first-level single-subject contrasts: (i) R2 vs baseline, (ii) R5 vs baseline, (iii) R14 vs baseline, (iv) R15 vs baseline, (v) R17 vs baseline and (vi) R18 vs baseline.

Third, we calculated second-level group contrasts using a one-sample _t_-test of the first-level contrast statistics.
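
At a single voxel, the second-level test reduces to a one-sample _t_-test on one contrast value per subject. The sketch below shows that computation with made-up numbers; the function name `one_sample_t` and the example values are hypothetical. With the 22 participants retained for analysis, the test would have 21 degrees of freedom.

```python
import math

def one_sample_t(values):
    """Return (t statistic, degrees of freedom) for a test of mean == 0."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # unbiased sample variance
    return mean / math.sqrt(var / n), n - 1

# Hypothetical first-level contrast values at one voxel, one per subject.
contrast_values = [0.42, 0.10, 0.35, -0.05, 0.28, 0.51, 0.19, 0.33]
t_stat, dof = one_sample_t(contrast_values)
```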

Finally, we also performed a conjunction analysis between the equivalent contrasts for the monetary and social conditions to identify areas involved in similar computations in both cases. The results are shown in Figure 2 and reported in Tables 1–3. For inference purposes we used an omnibus threshold of P < 0.001 uncorrected with an extent threshold of 15 voxels. However, given the strong priors from the previous literature about the role of the vmPFC in encoding stimulus value and reward outcome signals, as well as the role of the ventral striatum in encoding prediction errors, we also report activity in these two areas if it survives small volume correction (SVC) at P < 0.05. The mask for the SVC in vmPFC at choice was a sphere of 10-mm radius centered on the peak activation coordinates that correlated with stimulus values in Rolls et al. (2008). The mask for the vmPFC SVC at reward outcome was a sphere of 10-mm radius centered on the peak coordinates that correlated with the magnitude of reward outcome in O’Doherty et al. (2002). The mask for the SVC in ventral striatum was a sphere of 10-mm radius centered on the peak activation coordinates that correlated with prediction errors in Pessiglione et al. (2006). For display purposes only, activity in selected SPMs is reported at P < 0.005 uncorrected with an extent threshold of five voxels. Anatomical localizations were performed by overlaying the _t_-maps on a normalized structural image averaged across subjects, and with reference to an anatomical atlas (Duvernoy, 1999).
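
The spherical masks and the conjunction can be sketched as set operations on voxel coordinates. This illustration rests on several assumptions: the text does not state how the conjunction was computed, so it is taken here to mean that both maps exceed the threshold at a voxel; the cutoff Z > 3.09 is the one-tailed equivalent of P < 0.001; the sphere centre is a placeholder rather than a coordinate from the cited studies; and the sketch only builds the mask, it does not implement the small volume correction. The names `sphere_mask` and `conjunction` and all Z values are hypothetical.

```python
def sphere_mask(centre_mm, radius_mm=10.0, voxel_mm=3.0):
    """Return the voxel centres (in mm) lying within `radius_mm` of `centre_mm`."""
    cx, cy, cz = centre_mm
    steps = int(radius_mm // voxel_mm)
    mask = set()
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            for k in range(-steps, steps + 1):
                dx, dy, dz = i * voxel_mm, j * voxel_mm, k * voxel_mm
                if dx * dx + dy * dy + dz * dz <= radius_mm ** 2:
                    mask.add((cx + dx, cy + dy, cz + dz))
    return mask

def conjunction(z_a, z_b, z_thresh=3.09):
    """Voxels at which both Z maps exceed the threshold (assumed definition)."""
    return {v for v in z_a.keys() & z_b.keys() if min(z_a[v], z_b[v]) > z_thresh}

# Placeholder centre on a 3 mm grid; not a coordinate from the cited studies.
roi = sphere_mask((0.0, 30.0, -18.0))

# Made-up Z values at three voxels for the two conditions.
z_money = {(0.0, 30.0, -18.0): 4.5, (3.0, 30.0, -18.0): 3.4, (30.0, 0.0, 0.0): 5.0}
z_social = {(0.0, 30.0, -18.0): 3.2, (3.0, 30.0, -18.0): 2.1, (30.0, 0.0, 0.0): 4.0}

shared = conjunction(z_money, z_social)  # voxels significant in both maps
shared_in_roi = shared & roi             # restricted to the spherical mask
```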

Fig. 2

Basic neuroimaging results. (Top) Activation in the vmPFC correlated with SV at the time of free choice in both monetary and social conditions. (Middle) Activation in the vStr correlated with PE at the time of outcome in both monetary and social free choice conditions (although the conjunction did not survive our omnibus threshold). (Bottom) Activation in the vmPFC correlated with R in both monetary and social free choice conditions. For illustration purposes only, all images are thresholded at P < 0.005 uncorrected with an extent threshold of 15 voxels, except for the conjunction of PE, which is thresholded at P < 0.005 with an extent threshold of five voxels (see Tables 1–3 for details).

Table 1

Regions correlating with stimulus value at cue

Region	No. of voxels	_Z_-score	x	y	z
Areas correlating with SV in monetary choice trials (R2 vs baseline)
Medial orbitofrontal cortex	214	4.53†	0	27	−21
Frontal superior	52	4.19	−18	42	51
Mid cingulum	46	4.01	0	−30	45
Angular gyrus	61	3.91	−57	−66	30
Middle temporal gyrus	24	3.85	60	−15	−6
Areas correlating with SVs in social choice trials (R5 vs baseline)
Medial orbitofrontal cortex	40	3.16†	6	27	−15
Areas correlating with SVs in both monetary and social choice trials
Medial orbitofrontal cortex	37	3.16†	6	27	−15

Regions are significant at P < 0.001 uncorrected and 15 voxels extent threshold.