Social and monetary reward learning engage overlapping neural substrates

Journal Article

1California Institute of Technology, Computations and Neural Systems, MC 136-93 Pasadena and 2California Institute of Technology, Division of Humanities and Social Sciences, MC 228-77 Pasadena, CA 91125, USA

Received: 26 October 2010

Accepted: 17 January 2011

Alice Lin, Ralph Adolphs, Antonio Rangel, Social and monetary reward learning engage overlapping neural substrates, Social Cognitive and Affective Neuroscience, Volume 7, Issue 3, March 2012, Pages 274–281, https://doi.org/10.1093/scan/nsr006

Abstract

Learning to make choices that yield rewarding outcomes requires the computation of three distinct signals: stimulus values that are used to guide choices at the time of decision making, experienced utility signals that are used to evaluate the outcomes of those decisions, and prediction errors that are used to update the values assigned to stimuli during reward learning. Here we investigated whether monetary and social rewards involve overlapping neural substrates during these computations. Subjects engaged in two probabilistic reward learning tasks that were identical except that rewards were either social (pictures of smiling or angry people) or monetary (gaining or losing money). We found substantial overlap between the two types of rewards for all components of the learning process: a common area of ventromedial prefrontal cortex (vmPFC) correlated with stimulus value at the time of choice, another common area of vmPFC correlated with reward magnitude, and common areas in the striatum correlated with prediction errors. Taken together, the findings support the hypothesis that shared anatomical substrates are involved in the computation of both monetary and social rewards.

INTRODUCTION

The brain needs to compute several distinct signals in order for an organism to learn how to make sound decisions among alternatives. First, at the time of choice, values need to be assigned to the different stimuli associated with each choice option [which we refer to as stimulus values (SV)]; these are subsequently compared in order to choose the option with the highest value (Wallis, 2007; Rangel et al., 2008; Kable and Glimcher, 2009; Rushworth et al., 2009; Rangel and Hare, 2010). Stimulus value signals have been found in ventral and medial sectors of the prefrontal cortex (vmPFC) in several human fMRI (Kable and Glimcher, 2007; Plassmann et al., 2007; Tom et al., 2007; Hare et al., 2008, 2009; Chib et al., 2009; FitzGerald et al., 2009; Litt et al., 2009; Levy et al., 2010; Plassmann et al., 2010) and non-human primate electrophysiological studies (Wallis and Miller, 2003; Padoa-Schioppa and Assad, 2006, 2008; Kennerley et al., 2009; Kennerley and Wallis, 2009; Padoa-Schioppa, 2009) during choices involving non-social rewards, as well as during social decisions such as donations to charities (Hare et al., 2010).

Having made a choice, the brain needs to compute the reward value associated with the outcomes generated by the choice. These signals are often called reward magnitude or experienced utility (R). Several human fMRI studies have found that activity in medial regions of orbitofrontal cortex (OFC) correlates with behavioral measures of experienced utility for a wide variety of social and non-social reward modalities (Blood and Zatorre, 2001; Small et al., 2001, 2003; de Araujo et al., 2003; McClure et al., 2003; Kringelbach, 2005; Plassmann et al., 2008; Smith et al., 2010).

A third critical component is the combination of the previous two signals into a prediction-error signal (PE) that is used to update stimulus values (Schultz et al., 1997). The key involvement of the ventral striatum in this third component is borne out by a sizable and rapidly growing body of human fMRI studies of reinforcement learning that have used almost exclusively non-social rewards such as monetary payments (Delgado et al., 2000; Berns et al., 2001; Pagnoni et al., 2002; O'Doherty et al., 2003b, 2004; Pessiglione et al., 2006; Yacubian et al., 2006; Seymour et al., 2007; Hare et al., 2008).

Although the findings summarized above have been replicated across species, techniques and experimental designs, the vast majority of studies have used only non-social rewards such as juice, food or money, and only a handful have directly compared social and non-social rewards. This raises a fundamental question: do the same brain regions implement reward-learning computations for social and non-social rewards? Or might the areas that encode SV, PE and R be different for social rewards, analogous to the specialized perceptual processing of social stimuli (Kanwisher and Yovel, 2006)? While a few other studies have recently approached this issue (Izuma et al., 2008; Zink et al., 2008; Smith et al., 2010), no study to date has investigated the question using identical tasks in the same subjects, with a design that allows the encoding of the three basic reward signals defined above to be compared. We undertook such an investigation here using model-based fMRI.

METHODS

Participants

Twenty-seven female participants from the Caltech community participated in the study (mean age = 22.4 years; range 18–28). Five were excluded from further analyses: four due to excessive head movement, one due to failure to understand task instructions. All participants were fully right-handed, had normal or corrected-to-normal vision, had no history of psychiatric or neurological disease and were not taking medications that might have interfered with BOLD-fMRI. All gave informed consent under a protocol approved by the Caltech IRB.

Task

Participants played two structurally identical versions of an instrumental learning task, one with monetary rewards, the second with social rewards (Figure 1A). A trial began with the display of two visually distinctive slot machines, each associated with one of three outcome distributions: mean-positive, -negative and -neutral (Figure 1B).

Fig. 1. Task and behavioral results. (A) Timeline of the monetary and social reward trials. Choice trials paired a neutral slot machine with a valenced slot machine. Trials were identical except for the nature of the outcomes: monetary trials had a gain/loss of +$1, $0 or −$1, whereas social trials revealed happy, neutral or angry faces accompanied by sound effects of similar emotional valence. The experiment also included no-choice trials (in which a pair of identical slot machines was shown: neutral, negative or positive) to help separate the learning and stimulus value signals. Specific slot machines were randomly assigned to specific reward outcomes at the start of the experiment for each subject, and were distinct between monetary and social condition blocks. (B) Distribution of outcomes for each slot machine. First row: negative machine. Second row: positive machine. Bottom row: neutral machine. The same distribution was used in the monetary and social conditions. The appearance of each slot machine was randomly paired with a reward outcome distribution and was distinct between monetary and social condition blocks. (C) Plot of group subject choices across trials (only the first 30 are shown). (D) Psychometric choice curve for monetary and social conditions. Bars denote standard errors computed across subjects.

All participants completed one social and one monetary block of 148 trials each; block order was randomized between participants. There were two types of trials in each block. In 100 choice trials the neutral slot machine was shown paired with either the positive or negative slot machine (50/50 probability with randomized order), and participants chose one by pressing a left or right button. We refer to these as free choice trials. In 48 non-choice trials two identical copies of one of the three slot machines were shown (1/3, 1/3, 1/3 probability with randomized order), and participants merely pressed either the left or right button in order to advance the trial. We refer to these as forced choice trials. Up to 2.5 s were allowed for choice in both cases, followed by a uniformly blank screen displayed for 1–5 s (flat distribution), followed by the reward outcome displayed for 1.5 s, followed by an intertrial interval of a uniformly blank screen displayed for 1–6 s (flat distribution). Note that participants were not told the reward probabilities associated with each slot machine and had to learn them by trial and error during the task.
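
The block structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `make_block` and the dictionary keys are hypothetical, the 50/50 and 1/3 splits are treated as exact counts rather than per-trial probabilities, and free and forced trials are assumed to be interleaved at random, none of which the text specifies.

```python
import random

def make_block(seed=0):
    """Build one 148-trial block (hypothetical layout, see assumptions above)."""
    rng = random.Random(seed)
    trials = []
    # 100 free choice trials: the neutral machine paired with a valenced one.
    for i in range(100):
        valenced = "positive" if i < 50 else "negative"
        trials.append({"type": "free", "machines": ("neutral", valenced)})
    # 48 forced choice trials: two identical copies of one machine.
    for i in range(48):
        kind = ("positive", "negative", "neutral")[i % 3]
        trials.append({"type": "forced", "machines": (kind, kind)})
    rng.shuffle(trials)
    for trial in trials:
        trial["delay_s"] = rng.uniform(1.0, 5.0)  # blank screen before outcome
        trial["outcome_s"] = 1.5                  # outcome display
        trial["iti_s"] = rng.uniform(1.0, 6.0)    # intertrial interval
    return trials

block = make_block(seed=1)
```

The response window (up to 2.5 s) is not stored because its actual length depends on the participant's reaction time.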

The forced choice trials provide an essential control for a potentially important confound. The concern is that positive and aversive social outcomes might induce ‘correct’ and ‘error’ feedback signals in the brain at outcome during the social trials. If so, the common locus of activity could reflect these error feedback signals rather than the processing of a social reward. The forced choice trials control for this concern because, when there is no free choice, there can be no error feedback regarding the correctness of the choice.

Stimuli and rewards

The slot machines in both conditions were represented by cartoon images of actual slot machines that varied in color and pattern (Figure 1). In the social condition, reward outcomes were color photographs of unfamiliar faces from the NimStim collection (Tottenham et al., 2009) showing either an angry (negative outcome), neutral (neutral outcome) or happy (positive outcome) emotional expression, presented together with emotionally matched words played through headphones (normalized for volume and duration). Examples of positive words are excellent, bravo and fantastic. Examples of negative words are stupid, moron and wrong. Examples of neutral words are desk, paper and stapler. Extensive prior piloting had demonstrated the behavioral efficacy of these stimuli in reward learning.

In the monetary condition, the positive outcome was a gain of one dollar (an image of a dollar bill), the negative outcome was a loss of one dollar (an image of a crossed-out dollar bill) and the neutral outcome involved no change in monetary payoff (an image of an empty rectangle). Subjects were paid the sum of their earnings at the end of the experiment.

Computational model

We computed trial- and subject-specific values for each of the three variables described in the Introduction. The SV for every slot machine was calculated as the 10-trial moving average of the proportion of times that the machine was chosen when it was shown, a continuous value between 0 and 1. Consistent with this coding, R was assigned a value of 1 for positive outcomes, 0.5 for neutral outcomes and 0 for negative outcomes. PE at the time of outcome was calculated using a simple Rescorla–Wagner learning rule (Rescorla and Wagner, 1972) as the difference between the value of the reward outcome and the stimulus value of the machine selected on that trial: $PE_t = R_t - SV_t$.
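
The three variables can be illustrated with a short Python sketch. It is not the authors' code: the names `stimulus_values`, `prediction_error` and `REWARD_CODE` are hypothetical, and two details the text leaves open are filled with assumptions, namely that the moving average uses only presentations before the current one, and that a machine with no history starts at 0.5.

```python
from collections import deque

# Outcome coding from the text: positive = 1, neutral = 0.5, negative = 0.
REWARD_CODE = {"positive": 1.0, "neutral": 0.5, "negative": 0.0}

def stimulus_values(chosen_flags, window=10, prior=0.5):
    """SV before each presentation of one machine.

    chosen_flags: one boolean per presentation, True if the machine was chosen.
    Assumption: trailing window over earlier presentations; `prior` is used
    when there is no history yet.
    """
    history = deque(maxlen=window)
    values = []
    for chosen in chosen_flags:
        values.append(sum(history) / len(history) if history else prior)
        history.append(1.0 if chosen else 0.0)
    return values

def prediction_error(outcome, sv):
    """PE_t = R_t - SV_t for the machine selected on trial t."""
    return REWARD_CODE[outcome] - sv

sv_trace = stimulus_values([True, True, False])
pe = prediction_error("positive", 0.75)
```

Under this coding, PE ranges from −1 (a negative outcome from a machine with SV = 1) to +1 (a positive outcome from a machine with SV = 0).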

Note three things about the value normalizations. First, our approach deviates from the usual practice in neuroscience studies of reinforcement learning (Pessiglione et al., 2006, 2008; Seymour et al., 2007; Lohrenz et al., 2007; Hare et al., 2008; Wunderlich et al., 2009) in which it is customary to fit the values of the SV signal based on the predictions of the best fitting learning model. Here we depart from that practice because the revealed preference approach provides more accurate measures of the values computed at the time of choice (as shown in Figure 1D). Second, without loss of generality we normalize the reward outcome signals to 0 for negative outcomes and 1 for positive outcomes. Note that given the parametric nature of the general linear model specified below, this normalization does not affect the identification of areas that exhibit significant correlation with this variable. Third, we use the standard definition of prediction errors used in the literature.
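
The second point can be checked numerically: a positive affine rescaling of a parametric regressor leaves its correlation with the signal unchanged. The sketch below uses made-up numbers (the `bold` values are purely illustrative, not data from the study) and a hand-written `pearson` helper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Reward coded as in the text (0, 0.5, 1) and recoded in dollars (-1, 0, +1).
coded_01 = [1.0, 0.5, 0.0, 1.0, 0.0, 0.5]
coded_dollars = [2.0 * r - 1.0 for r in coded_01]
# Illustrative signal values only.
bold = [0.9, 0.4, 0.1, 1.2, -0.2, 0.6]

r_01 = pearson(coded_01, bold)
r_dollars = pearson(coded_dollars, bold)
```

Only the scale of the estimated regression coefficient changes between the two codings, which is why the choice of normalization does not alter which voxels reach significance when the model includes a constant term.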

Image acquisition

T2*-weighted gradient-echo echo-planar (EPI) images with BOLD contrast were collected on a Siemens 3T Trio. To optimize signal in the OFC, we acquired slices in an oblique orientation of 30° to the anterior commissure–posterior commissure line (Deichmann et al., 2003) and used an eight-channel phased array head coil. Each volume comprised 32 slices. Data were collected in four sessions (∼12 min each). The imaging parameters were as follows: TR = 2 s, TE = 30 ms, FOV = 192 mm, 32 slices with 3 mm thickness, resulting in isotropic 3 mm voxels. Whole-brain high-resolution T1-weighted structural scans (1 × 1 × 1 mm) were co-registered with their mean T2*-weighted images and averaged together to permit anatomical localization of the functional activations at the group level.

fMRI pre-processing

The imaging data was analyzed using SPM5 (Wellcome Department of Imaging Neuroscience, Institute of Neurology, London, UK). Functional images were corrected for slice acquisition time within each volume, motion-corrected with realignment to the last volume, spatially normalized to the standard Montreal Neurological Institute EPI template and spatially smoothed using a Gaussian kernel with a full-width at half-maximum of 8 mm. Intensity normalization and high-pass temporal filtering (filter width = 128 s) were also applied to the data.
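
For readers who want the smoothing and filtering parameters in other units, the conversions are simple arithmetic. The sketch below assumes that the 3 mm voxel size is retained after normalization, which the text does not state; the function name `fwhm_to_sigma` is illustrative.

```python
import math

def fwhm_to_sigma(fwhm_mm):
    """Standard deviation of a Gaussian with the given full-width at half-maximum."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

sigma_mm = fwhm_to_sigma(8.0)    # 8 mm FWHM smoothing kernel
sigma_vox = sigma_mm / 3.0       # in voxels, assuming 3 mm isotropic voxels

cutoff_hz = 1.0 / 128.0          # high-pass filter width of 128 s
cutoff_volumes = 128.0 / 2.0     # the same period in volumes at TR = 2 s
```

The kernel standard deviation comes out at roughly 3.4 mm, and the filter period corresponds to 64 volumes.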

fMRI data analysis

The data analysis proceeded in three steps. First, we estimated a general linear model with AR(1). This model was designed to identify regions in which BOLD activity was parametrically related to SV, R and PE. The model included the following regressors:

We orthogonalized the modulators for the main regressors that had more than one modulator (e.g. R2 and R3). The model also included six head motion regressors, session constants and missed trials as regressors of no interest. The regressors of interest and missed trial regressor were convolved with a canonical HRF.
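
Orthogonalizing one parametric modulator with respect to another can be sketched as a single Gram-Schmidt step on mean-centered vectors. The exact procedure and modulator order used in the study are not given in the text, so the example is generic: the helper names and the numeric values of `reward` and `pred_error` are hypothetical.

```python
def center(values):
    """Subtract the mean from every element."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def orthogonalize(later, earlier):
    """Remove from `later` the part explained by `earlier` (both mean-centered)."""
    a = center(earlier)
    b = center(later)
    coef = dot(a, b) / dot(a, a)
    return [bi - coef * ai for ai, bi in zip(a, b)]

# Hypothetical trial-wise modulators entered at the same event.
reward = [1.0, 0.0, 0.5, 1.0, 0.0]
pred_error = [0.4, -0.5, 0.1, 0.2, -0.3]
pe_orth = orthogonalize(pred_error, reward)
```

After this step the second modulator can only account for variance that the first one does not share, so the order in which modulators are entered affects how shared variance is attributed.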

Second, we calculated the following first-level single-subject contrasts: (i) R2 vs baseline, (ii) R5 vs baseline, (iii) R14 vs baseline, (iv) R15 vs baseline, (v) R17 vs baseline and (vi) R18 vs baseline.

Third, we calculated second-level group contrasts using a one-sample _t_-test of the first level contrast statistics.
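
At a single voxel, the group-level test reduces to a one-sample t-test on the subjects' first-level contrast values. A minimal sketch follows, with a hypothetical function name and toy inputs. With the 22 participants retained after exclusions (27 minus 5), the test would have 21 degrees of freedom.

```python
import math
import statistics

def one_sample_t(values):
    """Return (t statistic, degrees of freedom) for a test of the mean against zero."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return mean / (sd / math.sqrt(n)), n - 1

# Toy contrast values for three subjects, for illustration only.
t_stat, dof = one_sample_t([1.0, 2.0, 3.0])
```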

Finally, we also performed a conjunction analysis between the equivalent contrasts for the monetary and social conditions to identify areas involved in similar computations in both cases. The results are shown in Figure 2 and reported in Tables 1–3. For inference purposes we used an omnibus threshold of P < 0.001 uncorrected with an extent threshold of 15 voxels. However, given the strong priors from the previous literature about the role of the vmPFC in encoding stimulus value and reward outcome signals, as well as the role of the ventral striatum in encoding prediction errors, we also report activity in these two areas if it survives small volume correction (SVC) at P < 0.05. The mask for the SVC in vmPFC at choice was a sphere of 10-mm radius centered on the peak activation coordinates that correlated with stimulus values in Rolls et al. (2008). The mask for the vmPFC SVC at reward outcome was a sphere of 10-mm radius centered on the peak coordinates that correlated with the magnitude of reward outcome in O’Doherty et al. (2002). The mask for the SVC in ventral striatum was a sphere of 10-mm radius centered on the peak activation coordinates that correlated with prediction errors in Pessiglione et al. (2006). For display purposes only, activity in selected SPMs is reported at P < 0.005 uncorrected with an extent threshold of five voxels. Anatomical localizations were performed by overlaying the _t_-maps on a normalized structural image averaged across subjects, and with reference to an anatomical atlas (Duvernoy, 1999).
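
Two ingredients of this step can be illustrated in Python: a spherical small-volume mask and a conjunction of two statistical maps. Both are sketches under assumptions. The text does not say how the conjunction was computed, so a minimum-statistic rule (a voxel counts only if it exceeds the threshold in both maps) is assumed here, and the grid is assumed to have 3 mm isotropic voxels. Function names are hypothetical.

```python
import math

def sphere_offsets(radius_mm=10.0, voxel_mm=3.0):
    """Voxel index offsets whose centers lie within radius_mm of the peak voxel."""
    r = int(radius_mm // voxel_mm)
    return [
        (i, j, k)
        for i in range(-r, r + 1)
        for j in range(-r, r + 1)
        for k in range(-r, r + 1)
        if voxel_mm * math.sqrt(i * i + j * j + k * k) <= radius_mm
    ]

def conjunction(stat_a, stat_b, threshold):
    """Minimum-statistic conjunction: True where both maps reach the threshold."""
    return [min(a, b) >= threshold for a, b in zip(stat_a, stat_b)]

mask = sphere_offsets()
both = conjunction([3.5, 2.0], [3.2, 4.0], threshold=3.0)
```

Adding each offset to the index of a peak voxel gives the voxels inside the mask; restricting the multiple-comparison correction to those voxels is what makes the correction a small-volume one.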

Fig. 2. Basic neuroimaging results. (Top) Activation in the vmPFC correlated with SV at the time of free choice in both monetary and social conditions. (Middle) Activation in the vStr correlated with PE at the time of outcome in both monetary and social free choice conditions (although the conjunction did not survive our omnibus threshold). (Bottom) Activation in the vmPFC correlated with R in both monetary and social free choice conditions. For illustration purposes only, all images are thresholded at P < 0.005 uncorrected with an extent threshold of 15 voxels, except for the conjunction of PE, which is thresholded at P < 0.005 with an extent threshold of five voxels (see Tables 1–3 for details).

Table 1. Regions correlating with stimulus value at cue

| Region | No. of voxels | _Z_-score | x | y | z |
| --- | --- | --- | --- | --- | --- |
| Areas correlating with SV in monetary choice trials (R2 vs baseline) | | | | | |
| Medial orbitofrontal cortex | 214 | 4.53 | 0 | 27 | −21 |
| Frontal superior | 52 | 4.19 | −18 | 42 | 51 |
| Mid cingulum | 46 | 4.01 | 0 | −30 | 45 |
| Angular gyrus | 61 | 3.91 | −57 | −66 | 30 |
| Middle temporal gyrus | 24 | 3.85 | 60 | −15 | −6 |
| Areas correlating with SV in social choice trials (R5 vs baseline) | | | | | |
| Medial orbitofrontal cortex | 40 | 3.16 | 6 | 27 | −15 |
| Areas correlating with SV in both monetary and social choice trials | | | | | |
| Medial orbitofrontal cortex | 37 | 3.16 | 6 | 27 | −15 |

Regions are significant at P < 0.001 uncorrected and 15 voxels extent threshold.

†Survives P < 0.05 small volume correction. Coordinates reported in MNI space.

Table 2. Regions correlating with prediction error at outcome

| Region | No. of voxels | _Z_-score | x | y | z |
| --- | --- | --- | --- | --- | --- |
| Areas correlating with PE in monetary choice trials (R13 vs baseline) | | | | | |
| Putamen | 25 | 4.07 | −15 | 6 | −12 |
| Caudate | 22 | 3.75 | 9 | 9 | −3 |
| Precuneus | 15 | 3.49 | −18 | −51 | 33 |
| Areas correlating with PE in social choice trials (R16 vs baseline) | | | | | |
| (no clusters listed) | | | | | |
| Areas correlating with PE in both monetary and social choice trials | | | | | |
| (no clusters listed) | | | | | |

Regions are significant at P < 0.001 uncorrected and 15 voxels extent threshold.

†Survives P < 0.05 small volume correction. Coordinates reported in MNI space.

Table 3. Regions correlating with reward at outcome

| Region | No. of voxels | _Z_-score | x | y | z |
| --- | --- | --- | --- | --- | --- |
| Areas correlating with R in monetary choice trials (R14 vs baseline) | | | | | |
| Occipital | 124 | 4.74 | 21 | −75 | 15 |
| Insula | 125 | 4.68 | −33 | 3 | 12 |
| Inferior parietal | 116 | 4.43 | −51 | −36 | 27 |
| Occipital | 59 | 4.29 | −6 | 87 | 18 |
| Insula | 33 | 4.23 | 39 | −18 | 18 |
| Cingulum | 52 | 3.99 | −6 | 9 | 36 |
| Medial frontal gyrus | 86 | 3.96 | −15 | −6 | 57 |
| Inferior parietal | 78 | 3.95 | 51 | −33 | 30 |
| Medial orbitofrontal cortex | 136 | 3.88 | 6 | 33 | −12 |
| Superior frontal gyrus | 26 | 3.84 | −18 | 27 | 57 |
| Superior frontal gyrus | 20 | 3.66 | −30 | 36 | 33 |
| Rolandic operculum | 18 | 3.66 | 57 | 0 | 12 |
| Heschl gyrus | 21 | 3.63 | −39 | −24 | 3 |
| Inferior parietal | 21 | 3.61 | −36 | −27 | 24 |
| Calcarine | 15 | 3.42 | −18 | −72 | 9 |
| Areas correlating with R in social choice trials (R17 vs baseline) | | | | | |
| Medial orbitofrontal cortex | 29 | 4.16 | −6 | 36 | −15 |
| Areas correlating with R in both monetary and social choice trials | | | | | |
| Medial orbitofrontal cortex | 129 | 4.16 | −6 | 36 | −15 |

Regions are significant at P < 0.001 uncorrected and 15 voxels extent threshold.

†Survives P < 0.05 small volume correction. Coordinates reported in MNI space.

RESULTS

Behavioral results

Participants reliably learned to select the slot machine associated with the highest probability of a positively valenced outcome within a few choice trials for both social and non-social rewards (Figure 1C). The figure also reveals two additional patterns in the learning process. First, participants were somewhat slower at learning to discriminate between social rewards than between monetary rewards. For example, by the 10th exposure, the positive monetary machine was chosen with 92% frequency, whereas the positive social machine was chosen with 72% frequency (P < 0.001). Second, participants were slower in learning to avoid the negative slot machines than in learning to choose the positive ones. For example, by the 10th presentation the positive slot machines were chosen 85% of the time, whereas the negative ones were avoided only 68% of the time (P < 0.001). Neither difference was significant in the last third of the learning trials, which suggests that they are related to the speed of learning, and not to the ability to ultimately learn the value of the stimuli.

Figure 1D shows the psychometric choice curves for the social and monetary conditions based on their SV. Note several things about the curves. First, when the values of valenced and neutral slot machines were identical, participants exhibited no choice bias (0.5 on the _y_-axis corresponds to 0.0 on the _x_-axis). Second, the choice curves are not significantly different from each other (greatest difference at x = 0.25 had P = 0.32 with Bonferroni correction). Third, the choice curve is asymmetric: whereas participants chose the valenced slot machine over the neutral slot machine with probability close to one when its relative stimulus value was sufficiently positive (far right side of curve), subjects chose the neutral slot machine only 80% of the time even when it was the most favorable (far left side of curve).
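
A psychometric choice curve of this kind can be built by binning trials on relative stimulus value and computing the proportion of valenced choices in each bin. The sketch below is an assumption-laden illustration, not the authors' procedure: relative value is taken to be SV of the valenced machine minus SV of the neutral machine, the bin width of 0.25 is a guess, and the function name `choice_curve` is hypothetical.

```python
def choice_curve(rel_values, chose_valenced, bin_width=0.25):
    """Proportion of valenced-machine choices per relative-value bin.

    rel_values: assumed SV(valenced) - SV(neutral) on each free choice trial.
    chose_valenced: True when the valenced machine was chosen on that trial.
    """
    counts = {}
    for x, chose in zip(rel_values, chose_valenced):
        center = round(x / bin_width) * bin_width  # nearest bin center
        n, k = counts.get(center, (0, 0))
        counts[center] = (n + 1, k + int(chose))
    return {center: k / n for center, (n, k) in sorted(counts.items())}

# Toy trials for illustration only.
curve = choice_curve([0.1, 0.2, -0.3, 0.05], [True, False, False, False])
```

An unbiased chooser would produce a value near 0.5 in the bin centered on zero, which is the first property noted above.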

Neural correlates of stimulus values

We estimated a parametric general linear model of the BOLD signal to identify areas in which activation correlated with SV at the time of choice, and with PE and R at outcome, during free choice trials (see ‘Methods’ section for details). In the free choice monetary task, activation in the vmPFC correlated with SV of the slot machines. SV signals were additionally found in the mid-cingulum, the superior frontal gyrus and the angular gyrus (Table 1 and Figure 2). In the free choice social task, activation correlating with SV was also found in a similar region of vmPFC. A conjunction analysis showed that activation in a common area of vmPFC correlated with SV in both social and monetary conditions.

Neural correlates of prediction errors

In the free choice monetary task, PE correlated with activation in the caudate and putamen (Table 2 and Figure 2). In the free choice social task, PE did not exhibit any correlations at our omnibus threshold (P < 0.001 uncorrected, 15 voxels). However, for completeness we show areas of the striatum that correlate with PE in the social free choice condition at P < 0.005 uncorrected, as well as the resulting conjunction results using this lower threshold.

Neural correlates of reward magnitude

In the free choice monetary task, reward outcome correlated with activation in vmPFC, insula, occipital cortex, cingulate gyrus and superior frontal gyrus (Table 3 and Figure 2). In the free choice social task, reward outcome correlated with activation in vmPFC. A conjunction analysis revealed that activation in a common area of the vmPFC correlated with reward magnitude in the social and non-social conditions.

Ruling out a potential confound

A non-trivial potential confound is that the happy and angry faces might activate ‘correct’ and ‘error’ feedback signals in the brain regarding the adequacy of choice, and that the areas of co-activation might be due to the presence of these error signals, and not the computation of social rewards. In fact, these types of stimuli have previously been used for just that purpose (Cools et al., 2007). Fortunately, the forced choice trials provide a control that allows us to test whether the previous results are driven by this potential confound. Figure 3 shows the strength of the correlation between outcome reward signals and BOLD activity in the area of vmPFC identified by the conjunction of outcome rewards in both conditions. The correlation in the social and monetary trials is of similar magnitude and not statistically different (P = 0.91, two-sided paired _t_-test) even in the absence of error feedback. This implies that the signal in the vmPFC during social outcomes cannot be attributed to error feedback, and that the concern about the potential confound in this task was unfounded.

Fig. 3

ROI analysis of outcome reward signals in vmPFC during forced choice trials. Average beta plots for activity during reward outcome in forced choice trials. The functional mask of vmPFC is given by the area that exhibits correlation with reward outcomes in social and monetary free choice trials at P < 0.05 SVC. The _P_-values inside the bars are for _t_-tests vs zero.

DISCUSSION

A fundamental open question in behavioral and social neuroscience is whether common regions of the brain encode the value signals that are necessary to make sound decisions for both social and non-social rewards. Prior evidence suggested that there might be such an overlap. In the case of stimulus values, a recent paper found that the values of charities at the time of decision making were encoded in areas of the vmPFC that overlap with those that have been found for private rewards (Hare et al., 2010). In the case of experienced utility for social rewards, several studies found that activity in the OFC correlates with the perceived attractiveness of faces (Aharon et al., 2001; O’Doherty et al., 2003a; Cloutier et al., 2008; Smith et al., 2010). Finally, in the case of prediction errors, studies have found that activity in the ventral striatum correlates with prediction error-like signals in a task involving the receipt of anticipated social rewards (Spreckelmeyer et al., 2009) and in tasks involving social reputation and status (Izuma et al., 2008; Zink et al., 2008). The latter two studies, in particular, compared social and monetary rewards, as we did in the present study, and provided strong initial evidence for the idea that neural representations for these two types of rewards are at least partly overlapping. What has been missing to date is a study that compares social and non-social rewards across tasks whose basic structure and reward probabilities are matched for the two types of rewards, and in which the three basic computations associated with reward learning (SV, PE and R) are at work.

We addressed this open question by asking subjects to perform a simple probabilistic learning and decision-making task that was identical across conditions except that stimuli were associated with either monetary or social rewards. We found evidence for common signals in all cases: a common area of vmPFC correlated with SV, a common area of vmPFC correlated with R, and common areas of ventral striatum correlated with PE, albeit in the latter case only at a relatively lenient threshold of P < 0.005 uncorrected. Together with other recent findings (Izuma et al., 2008; Zink et al., 2008; Chib et al., 2009; Hare et al., 2010), our results provide increasing support for the hypothesis that overlapping areas of vmPFC and ventral striatum encode value signals for both types of rewards (Montague and Berns, 2002; Rangel, 2008).

Behaviorally, our subjects were slower to learn the value of social and negative stimuli. Since the types of reinforcement learning models that have been successfully used to account for the behavioral data do not predict such asymmetries (Rescorla and Wagner, 1972; Sutton and Barto, 1998; Montague and Berns, 2002; Niv and Montague, 2008), this raises an apparent puzzle. However, there are two potential explanations for this aspect of the findings. First, the reward magnitude of both types of stimuli might not have been perfectly matched in our population (so that, for example, subjects found the $1 outcome more rewarding than the positive social stimuli). Second, individuals stop selecting the negative slot machine after a while, which means that learning stops and subjects might not get sufficient negative reinforcement to learn the full extent of the negative outcomes associated with these machines.
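The second explanation can be illustrated with a minimal simulation sketch. It assumes a Rescorla-Wagner style value update combined with a softmax (logistic) choice between one slot machine and a neutral option fixed at zero. The learning rate, softmax temperature, outcome probability, number of trials and all function names are illustrative assumptions, not the task parameters or the model fitted in this study.

```python
import math
import random


def simulate(outcome, p_outcome=0.75, alpha=0.1, beta=3.0, n_trials=60, rng=None):
    """Simulate one learner choosing between a slot machine and a neutral option.

    outcome: payoff delivered by the machine with probability p_outcome (else 0).
    Returns (learned stimulus value, number of trials on which the machine was chosen).
    All parameter values are hypothetical.
    """
    rng = rng or random.Random()
    sv = 0.0       # stimulus value of the machine; the neutral option stays at 0
    n_chosen = 0
    for _ in range(n_trials):
        # Softmax (logistic) probability of choosing the machine over the neutral option.
        p_choose = 1.0 / (1.0 + math.exp(-beta * sv))
        if rng.random() < p_choose:
            n_chosen += 1
            reward = outcome if rng.random() < p_outcome else 0.0
            pe = reward - sv      # prediction error
            sv += alpha * pe      # the value is updated only when the machine is sampled
    return sv, n_chosen


def average_over_runs(outcome, n_runs=2000, seed=0):
    """Average the learned value and the number of samples over many simulated learners."""
    rng = random.Random(seed)
    results = [simulate(outcome, rng=rng) for _ in range(n_runs)]
    mean_sv = sum(sv for sv, _ in results) / n_runs
    mean_n = sum(n for _, n in results) / n_runs
    return mean_sv, mean_n


if __name__ == "__main__":
    for label, outcome in [("positive", 1.0), ("negative", -1.0)]:
        mean_sv, mean_n = average_over_runs(outcome)
        print(f"{label} machine: true value {0.75 * outcome:+.2f}, "
              f"learned value {mean_sv:+.2f}, chosen on {mean_n:.1f} of 60 trials")
```

In this sketch the negative machine is chosen less and less often as its value falls, so it generates fewer prediction errors from which to learn than the positive machine does.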

We emphasize that the existence of areas involved in the encoding of reward in both social and non-social situations does not mean that the full networks involved in processing the two types of rewards are identical. For example, it is known that areas involved in theory of mind computations are more likely to become active during social decisions than during choices among non-social rewards (Saxe and Kanwisher, 2003; Saxe, 2006; Krach et al., 2010).

It is important to highlight two limitations of our results. First, given the limited spatial resolution of fMRI, we cannot rule out the possibility that there might be neuronal subpopulations within the vmPFC and ventral striatum specialized in valuing certain types of rewards. Future studies using fMRI adaptation designs, or direct electrophysiological recordings within these regions, will have to address this issue before the existence of a common valuation currency can be definitively established.

Second, previous experiments suggest that males and females process some types of social rewards differently (Spreckelmeyer et al., 2009), which opens the possibility that there might be a gender difference in the extent to which common circuitry is used in the social and non-social domains to carry out basic reward computations. Unfortunately, we cannot resolve this issue with this data set since only females participated in the experiment.

Conflict of Interest

None declared.

This work is supported in part by grants from the Gordon and Betty Moore Foundation, an NSF IGERT training grant (to A.L.), and a grant from NIMH (to R.A.).

REFERENCES

Beautiful faces have variable reward value: fMRI and behavioral evidence, Neuron, 2001, vol. 32 (pg. 537-51)

Predictability modulates human brain response to reward, Journal of Neuroscience, 2001, vol. 21 (pg. 2793-8)

Intensely pleasurable responses to music correlate with activity in brain regions implicated in reward and emotion, Proceedings of the National Academy of Sciences USA, 2001, vol. 98 (pg. 11818-23)

Evidence for a common representation of decision values for dissimilar goods in human ventromedial prefrontal cortex, Journal of Neuroscience, 2009, vol. 29 (pg. 12315-20)

Are attractive people rewarding? Sex differences in the neural substrates of facial attractiveness, Journal of Cognitive Neuroscience, 2008, vol. 20 (pg. 941-51)

L-DOPA disrupts activity in the nucleus accumbens during reversal learning in Parkinson's disease, Neuropsychopharmacology, 2007, vol. 32 (pg. 180-9)

Taste-olfactory convergence, and the representation of the pleasantness of flavour, in the human brain, European Journal of Neuroscience, 2003, vol. 18 (pg. 2059-68)

Optimized EPI for fMRI studies of the orbitofrontal cortex, Neuroimage, 2003, vol. 19 (pg. 430-41)

Tracking the hemodynamic responses to reward and punishment in the striatum, Journal of Neurophysiology, 2000, vol. 84 (pg. 3072-7)

The Human Brain: Surface, Three-Dimensional Sectional Anatomy with MRI, and Blood Supply, 1999, Berlin: Springer

The role of human orbitofrontal cortex in value comparison for incommensurable objects, Journal of Neuroscience, 2009, vol. 29 (pg. 8388-95)

Self-control in decision-making involves modulation of the vmPFC valuation system, Science, 2009, vol. 324 (pg. 646-8)

Value computations in ventral medial prefrontal cortex during charitable decision making incorporate input from regions involved in social cognition, Journal of Neuroscience, 2010, vol. 30 (pg. 583-90)

Dissociating the role of the orbitofrontal cortex and the striatum in the computation of goal values and prediction errors, Journal of Neuroscience, 2008, vol. 28 (pg. 5623-30)

Processing of social and monetary rewards in the human striatum, Neuron, 2008, vol. 58 (pg. 284-94)

The neural correlates of subjective value during intertemporal choice, Nature Neuroscience, 2007, vol. 10 (pg. 1625-33)

The neurobiology of decision: consensus and controversy, Neuron, 2009, vol. 63 (pg. 733-45)

The fusiform face area: a cortical region specialized for the perception of faces, Philosophical Transactions of The Royal Society London B: Biological Science, 2006, vol. 361 (pg. 2109-28)

Neurons in the frontal lobe encode the value of multiple decision variables, Journal of Cognitive Neuroscience, 2009, vol. 21 (pg. 1162-78)

Evaluating choices by single neurons in the frontal lobe: outcome value encoded across multiple decision variables, European Journal of Neuroscience, 2009, vol. 29 (pg. 2061-73)

The rewarding nature of social interactions, Frontiers in Behavioural Neuroscience, 2010, vol. 4 (pg. 22)

The human orbitofrontal cortex: linking reward to hedonic experience, Nature Reviews Neuroscience, 2005, vol. 6 (pg. 691-702)

The neural representation of subjective value under risk and ambiguity, Journal of Neurophysiology, 2010, vol. 103 (pg. 1036-47)

Dissociating valuation and saliency signals during decision-making, Cerebral Cortex, 2011, vol. 21 (pg. 95-102)

Neural signature of fictive learning signals in a sequential investment task, Proceedings of the National Academy of Sciences USA, 2007, vol. 104 (pg. 9493-8)

Temporal prediction errors in a passive learning task activate human striatum, Neuron, 2003, vol. 38 (pg. 339-46)

Neural economics and the biological substrates of valuation, Neuron, 2002, vol. 36 (pg. 265-84)

Theoretical and empirical studies of learning, Neuroeconomics: Decision-Making and the Brain, 2008, New York: Elsevier

Temporal difference models and reward-related learning in the human brain, Neuron, 2003, vol. 38 (pg. 329-37)

Neural responses during anticipation of a primary taste reward, Neuron, 2002, vol. 33 (pg. 815-26)

Dissociable roles of ventral and dorsal striatum in instrumental conditioning, Science, 2004, vol. 304 (pg. 452-4)

Beauty in a smile: the role of medial orbitofrontal cortex in facial attractiveness, Neuropsychologia, 2003, vol. 41 (pg. 147-55)

Range-adapting representation of economic value in the orbitofrontal cortex, Journal of Neuroscience, 2009, vol. 29 (pg. 14004-14)

Neurons in the orbitofrontal cortex encode economic value, Nature, 2006, vol. 441 (pg. 223-6)

The representation of economic value in the orbitofrontal cortex is invariant for changes of menu, Nature Neuroscience, 2008, vol. 11 (pg. 95-102)

Activity in human ventral striatum locked to errors of reward prediction, Nature Neuroscience, 2002, vol. 5 (pg. 97-8)

Subliminal instrumental conditioning demonstrated in the human brain, Neuron, 2008, vol. 59 (pg. 561-7)

Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans, Nature, 2006, vol. 442 (pg. 1042-5)

Orbitofrontal cortex encodes willingness to pay in everyday economic transactions, Journal of Neuroscience, 2007, vol. 27 (pg. 9984-8)

Appetitive and aversive goal values are encoded in the medial orbitofrontal cortex at the time of decision making, Journal of Neuroscience, 2010, vol. 30 (pg. 10799-808)

Marketing actions can modulate neural representations of experienced pleasantness, Proceedings of the National Academy of Sciences USA, 2008, vol. 105 (pg. 1050-4)

The computation and comparison of value in goal-directed choice, Neuroeconomics: Decision Making and the Brain, 2008, New York: Elsevier

A framework for studying the neurobiology of value-based decision making, Nature Reviews Neuroscience, 2008, vol. 9 (pg. 545-56)

Neural computations associated with goal-directed choice, Current Opinion in Neurobiology, 2010, vol. 20 (pg. 262-70)

A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and non-reinforcement, Classical Conditioning II: Current Research and Theory, 1972, New York, NY: Appleton Century Crofts (pg. 406-12)

Expected value, reward outcome, and temporal difference error representations in a probabilistic decision task, Cerebral Cortex, 2008, vol. 18 (pg. 652-63)

General mechanisms for making decisions?, Current Opinion in Neurobiology, 2009, vol. 19 (pg. 75-83)

Uniquely human social cognition, Current Opinion in Neurobiology, 2006, vol. 16 (pg. 235-9)

People thinking about thinking people: the role of the temporo-parietal junction in “theory of mind”, Neuroimage, 2003, vol. 19 (pg. 1835-42)

A neural substrate of prediction and reward, Science, 1997, vol. 275 (pg. 1593-9)

Differential encoding of losses and gains in the human striatum, Journal of Neuroscience, 2007, vol. 27 (pg. 4826-31)

Dissociation of neural representation of intensity and affective valuation in human gustation, Neuron, 2003, vol. 39 (pg. 701-11)

Changes in brain activity related to eating chocolate: from pleasure to aversion, Brain, 2001, vol. 124 (pg. 1720-33)

Distinct value signals in anterior and posterior ventromedial prefrontal cortex, Journal of Neuroscience, 2010, vol. 30 (pg. 2490-5)

Anticipation of monetary and social reward differently activates mesolimbic brain structures in men and women, Social Cognitive and Affective Neuroscience, 2009, vol. 4 (pg. 158-65)

Reinforcement Learning: An Introduction, 1998, Cambridge: MIT Press

The neural basis of loss aversion in decision-making under risk, Science, 2007, vol. 315 (pg. 515-8)

The NimStim set of facial expressions: judgments from untrained research participants, Psychiatry Research, 2009, vol. 168 (pg. 242-9)

Orbitofrontal cortex and its contribution to decision-making, Annual Review of Neuroscience, 2007, vol. 30 (pg. 31-56)

Neuronal activity in primate dorsolateral and orbital prefrontal cortex during performance of a reward preference task, European Journal of Neuroscience, 2003, vol. 18 (pg. 2069-81)

Neural computations underlying action-based decision making in the human brain, Proceedings of the National Academy of Sciences USA, 2009, vol. 106 (pg. 17199-204)

Dissociable systems for gain- and loss-related value predictions and errors of prediction in the human brain, Journal of Neuroscience, 2006, vol. 26 (pg. 9530-7)

Know your place: neural processing of social hierarchy in humans, Neuron, 2008, vol. 58 (pg. 273-283)

© The Author(s) (2011). Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
