Reward encoding in the monkey anterior cingulate cortex

Author manuscript; available in PMC: 2007 Jul 24.

Published in final edited form as: Cereb Cortex. 2005 Oct 5;16(7):1040–1055. doi: 10.1093/cercor/bhj046

Abstract

The anterior cingulate cortex (ACC) is known to play a crucial role in the rapid adaptation of behavior based on immediate reward values. Whether the ACC is also involved in long-term adaptation to situations with uncertain outcomes is less certain. To study this issue, we placed macaque monkeys in a probabilistic context in which the appropriate strategy to maximize reward was to identify the stimulus with the highest reward value (the optimal stimulus). The only knowledge available was the theoretical average reward value associated with this stimulus, referred to as “the task value”. Remarkably, in each trial, ACC pre-reward activity correlated with the task value. Importantly, this neuronal activity was observed before the optimal stimulus had been discovered. We hypothesize that the received rewards and the task value, constructed a priori through learning, are used to guide behavior and identify the optimal stimulus. We tested this hypothesis by inactivating the ACC with muscimol. As predicted, this inactivation impaired the search for the optimal stimulus. We propose that the ACC participates in the long-term adaptation of voluntary reward-based behaviors by encoding general task values and received rewards.

Keywords: Action Potentials/physiology; Adaptation, Physiological/physiology; Animals; Cerebral Cortex/physiology; Decision Making/physiology; Gyrus Cinguli/physiology; Long-Term Potentiation/physiology; Macaca mulatta; Male; Neuronal Plasticity/physiology; Reward; Task Performance and Analysis; Volition/physiology

Introduction

Studies carried out in both humans and non-human primates suggest that the anterior cingulate cortex (ACC) plays a major role in diverse aspects of the control of voluntary reward-guided behaviors. This involvement is consistent with its status as a limbic structure (Porrino et al., 1981; Morecraft et al., 1993; Morecraft and Van Hoesen, 1998).

First, the ACC relates actions to their expected consequences (in terms of expected rewards) and guides decisions about the choice of actions (Bush et al., 2002; Walton et al., 2004; Rushworth et al., 2004; for review see Krawczyk, 2002). The expected reward is based on knowledge of the resources potentially available in the environment. However, the extent to which the ACC encodes the resources available in a given situation remains unknown.

Second, the ACC appears to be involved in processing actual outcomes (positive or negative) when decisions must be based on these outcomes. Neurophysiological studies in humans have shown a specific role of the ACC in the processing of monetary gains and losses (Gehring and Willoughby, 2002). Experimental studies in monkeys have also revealed that lesions or inactivation of the ACC significantly impair the reward-based selection of appropriate motor responses. One hypothesis is that the lesioned animal is unable to use the size of the reward as a cue to guide its choice of action (Shima and Tanji, 1998; Hadland et al., 2003). However, no study has yet assessed whether the ACC encodes the size of the rewards obtained.

Finally, electrophysiological studies in behaving monkeys have shown that the ACC is implicated in behavior directed toward distant rewards: it assesses reward proximity (Shidara and Richmond, 2002) and is involved in the performance and learning of reward-based sequences (Procyk et al., 2000; Procyk and Joseph, 2001). The role of the ACC in situations where subjects must choose among different reward objects, and in which the discrimination has to be made over a relatively long period of time, remains largely unknown.

Using a stimulus-selection task, referred to as the “choice task”, we investigated the contribution of the ACC to performance monitoring when an animal has to decide which of two novel target stimuli is associated with the larger average liquid reward (the optimal stimulus). Within one choice test, discovering the optimal stimulus requires several trials because the reward quantities associated with each stimulus are not constant. Selection of the optimal stimulus is rewarded with 1.2ml of liquid with a probability of 0.7, and with 0.4ml with a probability of 0.3. Selection of the non-optimal stimulus is reinforced by the same quantities, but with the opposite probabilities. These parameters are fixed; only the stimuli change from one test to the next. The appropriate strategy to maximize reward is to search for the optimal stimulus and to maintain this choice in subsequent trials.

The single-unit firing frequency suggests that ACC neurons provide an evaluation of the theoretical average reward of the optimal stimulus at the beginning of the choice tests. We refer to this theoretical value as the “task value”. A second population of neurons encodes the size of the reward actually obtained. The encoding of these two parameters is discussed in relation to the role of the ACC in the adaptation of voluntary reward-based behaviors.

Materials and Methods

Two male Rhesus monkeys (M1 and M2) were used in this experiment.

Each animal was seated in a primate chair within arm’s reach of a tangent touch-screen (Microtouch System) coupled to a TV monitor. An arm-projection window in the front panel of the chair allowed the monkey to touch the screen with one hand. A computer recorded the position and accuracy of each touch. It also controlled the presentation on the monitor of the visual stimuli (colored shapes) that served as targets (CORTEX software, NIMH Laboratory of Neuropsychology, Bethesda, Maryland). Two 2 × 2 cm white squares were located 10 cm below the targets; in each trial, the computer randomly selected one of them (50/50), which was illuminated and served as the starting position for initiating the trial. M1 and M2 worked with the left and right hand, respectively.

Behavioral paradigms

The animals were trained to perform a choice task and a NO-choice task.

Choice task

When the monkey touched the starting position, two visual stimuli (“A” and “B”) appeared simultaneously in two fixed virtual windows (6° × 6°) centred on the horizontal plane, 10 cm to the right and to the left of the screen centre (Fig. 1A). Each stimulus was a combination of colored shapes. After a 2–3 s delay period, the stimuli were briefly (100 ms) extinguished; this was the “GO” signal. The monkey then had to release the starting position and touch one of the two stimuli within 1000 ms of the onset of the GO signal. Although touching a target ended the visual presentation of both stimuli, the animal was required to keep touching the target position until the reward (a squirt of fruit juice) was delivered 1–1.5 s later (Fig. 1B). The reward was followed by a 3 s time-out, after which one starting position was re-illuminated, indicating the start of a new trial. If the monkey released the starting position before the GO signal, the trial was aborted and had to be repeated until it was completed successfully.

Fig. 1A–C.


A. Display monitor. Location of the two target positions. A 2 × 2 cm square located 10 cm below either one (randomly, 50/50) of the two targets served as starting position (SP) of the hand (in this figure, the left SP is represented). B. Trial events in the choice task. Grey areas correspond to the time of illumination of the starting position (SP) and of the target stimuli. E1 to E6: epochs for analysis. C. Location of task-related cells. Abbreviations: CC, corpus callosum; ArS, rostral extent of the superior branch of the arcuate sulcus; Ars, arcuate sulcus; end of SP, caudal extent of the Sulcus Principalis; SGm, medial superior gyrus; Cgd and Cgv, dorsal and ventral banks of cingulate sulcus; CgG, cingulate gyrus.

Touching stimulus “A” yielded 1.2ml of juice with probability P = 0.7 and 0.4ml with probability Q = 0.3. The reinforcement ratio for stimulus “B” was the opposite: touching stimulus “B” yielded 1.2ml with probability Q = 0.3 and 0.4ml with probability P = 0.7. These probabilities were implemented as follows: the animal performed successive blocks of 20 trials, within which the computer drew the trial types in random order without replacement. In 14 trials of each block, the choice of stimulus “A” was rewarded with 1.2ml and the choice of stimulus “B” with 0.4ml; in the other 6 trials, the stimulus-reward associations were reversed. Stimulus “A” was presented on the right side of the screen in 50% of trials.
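The block-randomized schedule just described can be sketched in a few lines. This is a minimal illustration, not the CORTEX implementation used in the experiment: the function names are hypothetical, and we assume that an aborted trial is repeated and does not consume a slot in the schedule.

```python
import random
from typing import List, Tuple

LARGE_ML = 1.2   # large reward (ml)
SMALL_ML = 0.4   # small reward (ml)
BLOCK_SIZE = 20  # trials per block
N_A_LARGE = 14   # trials per block in which "A" pays 1.2 ml and "B" pays 0.4 ml

# One trial's contingency: (reward if "A" is touched, reward if "B" is touched).
Contingency = Tuple[float, float]


def make_block(rng: random.Random) -> List[Contingency]:
    """One 20-trial block: 14 A-large/B-small trials and 6 reversed trials,
    shuffled so that trial types are drawn in random order without replacement."""
    block = [(LARGE_ML, SMALL_ML)] * N_A_LARGE
    block += [(SMALL_ML, LARGE_ML)] * (BLOCK_SIZE - N_A_LARGE)
    rng.shuffle(block)
    return block


def reward_schedule(n_trials: int, seed: int = 0) -> List[Contingency]:
    """Concatenate successive blocks until n_trials contingencies are available."""
    rng = random.Random(seed)
    schedule: List[Contingency] = []
    while len(schedule) < n_trials:
        schedule.extend(make_block(rng))
    return schedule[:n_trials]


schedule = reward_schedule(40)
n_a_large = sum(1 for reward_a, _ in schedule if reward_a == LARGE_ML)
print(n_a_large, "of", len(schedule))  # 28 of 40: exactly 14 per block, i.e. P = 0.7
```

Sampling without replacement within each block guarantees that the 0.7/0.3 ratio holds exactly every 20 trials, rather than only on average as it would with independent draws.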

We defined the search period as the series of consecutive trials during which the animal searched for the optimal stimulus by touching either of the two. We defined the repetition period as a series of 5 consecutive trials in which the same stimulus was selected, followed by selection of that stimulus in a) each of the next 5 trials, or b) 5 of the next 6 trials. The probability, by chance alone, of touching a given stimulus in 10 successive choices is 1/1024 (≈ 0.001), and the probability of satisfying case b, but not case a, by chance alone is 5/2048 (≈ 0.0024). Thus, the probability that a repetition period occurs by chance alone is 7/2048 (≈ 0.0034), i.e. less than 1%. We took this performance level to indicate that the animal had selected one particular stimulus and that a decision had been reached. The search and repetition periods associated with one novel pair of stimuli are referred to as a “choice test”. If no repetition period occurred within 50 trials, the test was aborted. When the repetition period ended, two new stimuli were selected and another test began.
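The chance probabilities for the repetition criterion can be checked by exhaustive enumeration, and the same rule can locate the start of the repetition period in a sequence of choices. The sketch below is our own illustration with hypothetical function names; like the text, it computes the probability for one given stimulus (the two stimuli give mutually exclusive events, so the probability for either stimulus is exactly double).

```python
from fractions import Fraction
from itertools import product
from typing import Optional


def meets_criterion(window: str, target: str) -> bool:
    """True if `window` opens with 5 choices of `target`, followed either by
    5 more (case a) or by 5 of the next 6 (case b)."""
    first5, following = window[:5], window[5:11]
    if first5 != target * 5:
        return False
    case_a = following[:5] == target * 5
    case_b = len(following) == 6 and following.count(target) == 5
    return case_a or case_b


def chance_probability(target: str = "A") -> Fraction:
    """Exact chance probability of the criterion for one given stimulus,
    enumerating all 2**11 equiprobable sequences of 11 choices."""
    hits = sum(meets_criterion("".join(seq), target) for seq in product("AB", repeat=11))
    return Fraction(hits, 2 ** 11)


def repetition_start(choices: str) -> Optional[int]:
    """0-based index of the first trial of the repetition period, or None.

    `choices` is the sequence of touched stimuli in one test, e.g. "BABAAAAAAAAAA".
    """
    for start in range(max(0, len(choices) - 9)):
        window = choices[start:start + 11]
        if any(meets_criterion(window, target) for target in "AB"):
            return start
    return None


p = chance_probability("A")
print(p, f"{float(p):.4f}")                 # 7/2048 0.0034
print(repetition_start("BABAAAAAAAAAA"))    # 3
```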

NO-choice task

This task was identical to the choice task, except that the two stimuli were identical, fixed and well learned, and the quantity of reward delivered at the end of the trial was predictable. Three kinds of trials were presented: 1) in trial 1.2ml, the stimuli (two blue rectangles) were associated with a reward of 1.2ml; 2) in trial 0.4ml, two green ellipses were associated with a reward of 0.4ml; 3) in trial 0ml, two red disks were associated with no reward. Touching one of the two disks was mandatory for the experiment to proceed to other (rewarded) trials. Trials 1.2ml and 0.4ml were intermixed and formed a separate block of trials. Because the animals were reluctant to work for no reward, particularly when such trials occurred too frequently, trials 0ml were interspersed among the choice trials, trials 1.2ml and trials 0.4ml at a ratio of 1.5 to 10. In general, after one or two refusals, the animal performed the trial.

Task values

In the choice task, systematically choosing stimulus “A” was the optimal strategy and yielded an average reward of 0.96 ml per trial (0.7 × 1.2 + 0.3 × 0.4 = 0.96). We refer to this average quantity as the task value. The task value is the maximum average reward per trial offered in the task and therefore differs from the actual rewards received in single trials. Systematically choosing stimulus “B” was the worst strategy and yielded an average reward of 0.64 ml per trial (0.3 × 1.2 + 0.7 × 0.4 = 0.64). The average reward of any intermediate strategy combining choices of “A” and “B” lay between 0.64 ml and 0.96 ml and was a linear function of the probability of choosing “A”.
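The task value and the linear dependence of the average reward on the choice probability can be checked numerically. This is a minimal sketch; `expected_reward` is a hypothetical helper, and it assumes that the animal's choice is independent of the trial's hidden reward contingency, as is the case for any strategy that cannot see the schedule in advance.

```python
import math

LARGE_ML, SMALL_ML = 1.2, 0.4
P, Q = 0.7, 0.3  # P: probability of the large reward after choosing "A"; Q: after choosing "B"


def expected_reward(p_choose_a: float) -> float:
    """Average reward per trial (ml) for a strategy that picks "A" with probability p_choose_a."""
    value_a = P * LARGE_ML + Q * SMALL_ML  # 0.96 ml: the task value
    value_b = Q * LARGE_ML + P * SMALL_ML  # 0.64 ml: always choosing "B"
    return p_choose_a * value_a + (1 - p_choose_a) * value_b


for p in (0.0, 0.5, 1.0):
    print(f"P(choose A) = {p:.1f}: {expected_reward(p):.2f} ml")
# P(choose A) = 0.0: 0.64 ml
# P(choose A) = 0.5: 0.80 ml
# P(choose A) = 1.0: 0.96 ml

# Linearity: every 0.1 increase in P(choose A) adds the same 0.032 ml.
slope = expected_reward(1.0) - expected_reward(0.0)
assert math.isclose(expected_reward(0.3), expected_reward(0.0) + 0.3 * slope)
```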

The task values in trials 1.2ml, 0.4ml and 0ml were respectively equal to 1.2ml, 0.4ml and 0ml. Thus, the task values in these trials are not different from the actual rewards obtained.

Surgical procedure

Surgical procedures were carried out according to the 1986 European Communities Council Directive. They were performed under aseptic conditions. The animals received general anaesthesia during the attachment of a head holder, and later, during the implantation of recording chambers.

The animal first received an intramuscular injection of the neuroleptic chlorpromazine (Largactil, 1 mg/kg, i.m.). Following premedication with atropine (1.25 mg, i.m.) and dexamethasone (4 mg, i.m.), the monkeys were prepared for surgery with ketamine hydrochloride (20 mg/kg, i.m.) and chlorpromazine (2 mg/kg, i.m.). Anaesthesia was maintained with halothane in N2O/O2 (70/30). Heart rate was monitored, and artificial respiration was adjusted to maintain end-tidal CO2 at 4.5–6%. Rectal temperature was maintained at 37°C.

A bar was attached to the skull with small stainless steel screws and then embedded in an acrylic assembly to permit subsequent head fixation. When behavioral training was completed, the animal was re-operated and, under stereotaxic guidance, a stainless steel recording chamber was implanted. The chamber was positioned to provide bilateral access to the anterior cingulate cortex (Fig. 1C).

Later, during a pause in the recording sessions, a scleral search coil constructed with Teflon-coated stainless steel wire was implanted around the conjunctiva according to the procedure described by Judge et al. (1980).

Data Analysis

Behavioral data

In trials of the choice and NO-choice tasks, the number and direction of saccades between stimulus onset and stimulus touch were studied. The identity (“A” or “B”) and location (R or L) of the touched stimulus, as well as the quantity of reward given to the monkey, were recorded. The duration of the search period in each test was computed. Hand reaction times (RTs) and movement times (MTs) from the starting position to the selected stimulus were computed for each trial.

In decision-making studies, the parameters of the so-called win-stay/lose-shift strategy are commonly used (in the choice task, “win” and “lose” would correspond to the delivery of the large and small rewards, respectively). In the tests, we measured the frequency with which the win-stay (here called large-keep) and lose-stay (here called small-keep) strategies were used (Fig. 2C). The large-keep strategy is the spontaneous strategy; the small-keep strategy is correct during repetition because the animal must keep selecting target “A” even though this target provides the small reward in 30% of trials. The dependency of choices in trial N+1 on choices made and rewards obtained in more distant previous trials (i.e. trials N−1, N−2, etc.) was not studied.

Fig. 2A–F. Behavioral data in the choice task.


A. Population histogram of search sizes (in number of trials) for a sample of tests performed by the two monkeys (M1: n=197; M2: n=189). B. Global strategy of keep and change during the search period. Histograms show the absolute number of cases in which a choice of “A” or “B” (current trial) was preceded by a choice of “A” that led to a large reward (AL), by a choice of “A” that led to a small reward (AS), etc. C. Large-keep and small-keep strategies during tests. Ordinate: percentage of trials using these strategies. The plots are aligned on the second trial of the tests. D. Performance of the model in the choice task measured at different stages, compared with the monkeys’ performance (on the right). E. Performance of the model measured for different reference values, compared with the monkeys’ performance (on the right). F. Movement times (MT) and reaction times (RT) measured for the two monkeys in the different tasks (control 1.2ml, choice task (C), control 0.4ml and control 0ml), and during the two periods of the choice task (search: SEA; repetition: REP).

We analysed the large-keep and small-keep strategies as follows. Suppose that, in a population of p choice tests, the animal received a large reward in trial N in n of these tests (n < p). The proportion of these n tests in which the same stimulus was chosen again in trial N+1 is the proportion of large-keep trials at that point and measures the strength of the corresponding strategy. In the other p − n tests, the animal received the small reward in trial N; the proportion of these tests in which the same stimulus was chosen in trial N+1 measures the strength of the small-keep strategy at that point. The proportions of large-keep and small-keep trials are independent because they bear on two different groups of tests. The sizes of the two groups vary from trial to trial, but their sum remains constant (p). We also examined whether the animal adopted spatial strategies, such as left-keep or right-keep.
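The large-keep/small-keep computation described above can be sketched as follows. The data layout (one list of (stimulus, reward) pairs per test) and the function name are our assumptions; unlike the text, which assumes that all p tests reach trial N+1, the sketch simply skips tests that are too short.

```python
from typing import Optional, Sequence, Tuple

LARGE_ML = 1.2
Trial = Tuple[str, float]  # (touched stimulus "A" or "B", reward obtained in ml)


def keep_proportions(tests: Sequence[Sequence[Trial]], n: int) -> Tuple[Optional[float], Optional[float]]:
    """Large-keep and small-keep proportions at trial n (0-based) across tests.

    Tests with fewer than n + 2 trials are skipped; a proportion is None when
    no test falls into the corresponding group.
    """
    large_total = large_keep = small_total = small_keep = 0
    for test in tests:
        if len(test) < n + 2:
            continue
        (choice_n, reward_n), (choice_next, _) = test[n], test[n + 1]
        kept = int(choice_next == choice_n)
        if reward_n == LARGE_ML:
            large_total += 1
            large_keep += kept
        else:
            small_total += 1
            small_keep += kept
    large = large_keep / large_total if large_total else None
    small = small_keep / small_total if small_total else None
    return large, small


tests = [
    [("A", 1.2), ("A", 0.4)],  # large reward, then keep
    [("B", 1.2), ("B", 1.2)],  # large reward, then keep
    [("B", 0.4), ("A", 1.2)],  # small reward, then shift
    [("A", 0.4), ("A", 1.2)],  # small reward, then keep
]
print(keep_proportions(tests, 0))  # (1.0, 0.5)
```

In this toy example, a large-keep proportion near 100% with a small-keep proportion near 50% corresponds to the second pattern described in the next paragraph.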

Proportions close to 100% for both large-keep and small-keep trials indicate that the same stimulus is chosen systematically, regardless of the reward obtained in each trial; the animal’s strategy is then “keep”. Proportions close to 100% for large-keep and 50% for small-keep indicate that the animal tends to choose the same stimulus after a large reward but has no fixed strategy (keep or shift) after a small reward.

Neuronal data

Extra-cellular neuronal activity was recorded while the animal performed the choice- and the NO-choice tasks. Neurons showing clear changes in firing rates in relation to one or more task-events were selected for on-line storage in digital form (resolution, 1ms).

Rasters and peri-event histograms were constructed for all recorded neurons (MATOFF Software, NIMH, LSN, USA). Average firing rates were computed trial-by-trial in 6 epochs: the epoch starting 500ms prior to stimulus presentation (anticipatory, epoch E1), the first part of the delay i.e. 0–1000ms after stimulus onset (visual, epoch E2), the latter part of the delay before the GO-signal (pre-movement, epoch E3), the epoch between the GO-signal and the target touch (movement, epoch E4), the time interval between target touch and reward (post-movement, epoch E5) and the epoch up to 2000ms after reward delivery (post-reward, epoch E6) (Fig. 1B).
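The epoch definitions above translate into simple time windows around the trial events. The sketch below is not the MATOFF analysis; the event names are hypothetical, windows are half-open, and we assume that E1 spans the 500 ms immediately preceding stimulus onset and that E6 runs from reward delivery to 2000 ms after it. The baseline (500 ms before onset of the starting position) is included because the next paragraph compares each epoch with it.

```python
from typing import Dict, Sequence, Tuple


def epoch_windows(ev: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Epoch boundaries (ms) for one trial, from hypothetical event keys:
    'sp_on' (starting position lit), 'stim_on', 'go', 'touch', 'reward'."""
    return {
        "baseline": (ev["sp_on"] - 500, ev["sp_on"]),
        "E1": (ev["stim_on"] - 500, ev["stim_on"]),   # anticipatory
        "E2": (ev["stim_on"], ev["stim_on"] + 1000),  # visual
        "E3": (ev["stim_on"] + 1000, ev["go"]),       # pre-movement
        "E4": (ev["go"], ev["touch"]),                # movement
        "E5": (ev["touch"], ev["reward"]),            # post-movement
        "E6": (ev["reward"], ev["reward"] + 2000),    # post-reward
    }


def epoch_rates(spikes_ms: Sequence[float], ev: Dict[str, float]) -> Dict[str, float]:
    """Average firing rate (spikes/s) in each epoch of one trial."""
    rates = {}
    for name, (start, end) in epoch_windows(ev).items():
        n_spikes = sum(start <= t < end for t in spikes_ms)
        rates[name] = 1000.0 * n_spikes / (end - start)
    return rates


events = {"sp_on": 0.0, "stim_on": 1500.0, "go": 4000.0, "touch": 4400.0, "reward": 5600.0}
spikes = [1200.0, 1600.0, 1700.0, 2600.0, 4100.0, 4200.0, 5000.0, 6000.0]
rates = epoch_rates(spikes, events)
print({k: round(v, 2) for k, v in rates.items()})
# {'baseline': 0.0, 'E1': 2.0, 'E2': 2.0, 'E3': 0.67, 'E4': 5.0, 'E5': 0.83, 'E6': 0.5}
```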

We evaluated whether the average activity in each epoch differed from baseline, i.e. from activity in the 500ms period preceding onset of the starting position (Wilcoxon paired test, p < 0.05). This was done separately for 4 conditions: the 3 conditions of the NO-choice task and the repetition period of the choice task. If at least one of these comparisons was significant, the epoch was declared task-related and kept for further analysis.

The effects of the expected reward (1.2, 0.4 and 0ml) in the NO-choice trials were studied with a repeated-measures ANCOVA (p < 0.05), with RTs and MTs as covariates to remove any effect of these parameters on neuronal activity. In each epoch, activities in trials 1.2ml, 0.4ml and 0ml were also compared with a post hoc Fisher LSD test (p < 0.05) (Table 1). The influence of spatial parameters (position of the starting point and position of the selected stimulus) in the choice trials was studied with an ANCOVA (p < 0.05) (Table 2).

Table 1.

Classification of pre- and post-reward task-related epochs according to their average activity in trials 1.2ml, 0.4ml and 0ml of the NO-choice task

Line	Activity pattern	Pre-reward epochs	Post-reward epochs
1	1.2 = 0.4 = 0	81 (17%)	19 (24%)
2	1.2 ≠ 0.4 ≠ 0	54	5
3	1.2 ≠ (0.4 = 0)	61	8
4	Sub-total 1.2 ≠ 0.4	115 (25%)	13 (16%)
5	(1.2 = 0.4) > 0	207	34
6	(1.2 = 0.4) < 0	64	14
7	Sub-total (1.2 = 0.4) ≠ 0	271 (58%)	48 (60%)
	Total	467 (100%)	80 (100%)

Table 2.

Proportion (%) of epochs exhibiting spatially selective activity (right/left differences) in the choice task.

Epochs	Origin effect (starting position)	End-point effect (selected stimulus)
Anticipatory (E1)	13.5% (N=44)	13.5% (N=38)
Visual (E2)	7% (N=66)	18% (N=97)
Pre-movement (E3)	12% (N=60)	29% (N=98)
Movement (E4)	9% (N=75)	38% (N=133)
Post-movement (E5)	4.5% (N=47)	34% (N=98)
Sub-total (pre-reward)	9% (N=292)	29% (N=464)
Post-reward (E6)	2.5% (N=42)	8% (N=83)

To compare activity across epochs and neurons, and to reduce potential biases from cells with high firing rates, we used one of two normalization procedures that converted activity in trials 1.2ml and 0.4ml to standard values.

Let $X_i$ be the average activity measured in one epoch in trial 0ml, trial 0.4ml, trial 1.2ml, or the choice trials ($i$ = 1 to 4, respectively).

If the activity in trial 1.2ml equals the activity in trial 0.4ml (= $Y$), we normalized $X_i$ by:

$$X_i' = 100 \times \frac{X_i}{Y}$$

With this transformation, the activity in the epoch in trials 1.2ml and 0.4ml both became equal to 100 (cf. Fig. 5D–F).

If the activity in trial 1.2ml (= $Y$) differs from the activity in trial 0.4ml (= $Z$), we normalized $X_i$ by:

$$X_i' = 100 \times \frac{X_i - Z}{Y - Z}$$

Fig. 5A–F. Population data. Normalized pre-reward activity in trials 0ml, 0.4ml and 1.2ml and in the choice trials.


Abscissa: task value. Ordinate: normalized average activity of epochs (mean ± SD). There are no error bars for trials 1.2ml and 0.4ml, whose normalized values are fixed by construction in each epoch (see Methods). Data are from 4 groups of epochs defined in Table 1 (lines 2, 3, 5 and 6), each plotted with a distinct symbol.

A: 115 epochs in 57 cells; group 1.2 ≠ 0.4 ≠ 0 (line 2 in Table 1) and group 1.2 ≠ (0.4 = 0) (line 3 in Table 1). The average activity in the choice trials is 68.6 and 67.9, respectively. B and C detail the epochs pooled in A, with their average values and numbers.

D: 271 epochs in 133 cells; group (1.2 = 0.4) > 0 (line 5 in Table 1) and group (1.2 = 0.4) < 0 (line 6 in Table 1). The average activity in the choice trials is 95.1 and 104.3, respectively. E and F detail the epochs pooled in D, with their average values and numbers.

With this transformation, the activity in trial 0.4ml and in trial 1.2ml became equal to 0 and 100 respectively, independently of whether activity in trial 1.2ml was initially larger or smaller than in trial 0.4ml (cf Fig. 5A–C).
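The two normalization rules described above can be sketched as follows. The formulas are our reading of the text: the second rule is the affine map that sends trial 0.4ml to 0 and trial 1.2ml to 100, and for the first rule we assume a simple rescaling by the common value Y. Which rule applies is assumed to come from the post hoc classification of Table 1; the function names are hypothetical.

```python
def normalize_equal(x: float, y: float) -> float:
    """Rule used when trials 1.2ml and 0.4ml share the same activity y:
    both map to 100 (as in Fig. 5D-F)."""
    return 100.0 * x / y


def normalize_different(x: float, y: float, z: float) -> float:
    """Rule used when activity differs between trial 1.2ml (y) and trial 0.4ml (z):
    z maps to 0 and y to 100, whichever of the two is larger (as in Fig. 5A-C)."""
    return 100.0 * (x - z) / (y - z)


print(normalize_different(17.0, 20.0, 10.0))  # 70.0  (activity increases with reward)
print(normalize_different(8.0, 5.0, 15.0))    # 70.0  (activity decreases with reward)
print(normalize_equal(12.0, 10.0))            # 120.0
```

The second example shows why the affine rule is convenient: a cell whose firing decreases with reward still lands on the same 0 to 100 scale, so choice-trial values near 70 can be pooled across cells of opposite sign.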

We compared the timing of activity changes within a choice test with that of performance changes (cf. Fig. 6). To identify epochs showing activity changes, we first examined the neuronal activity curve, which plotted the activity of the epoch (ordinate) against the successive trials of the test (abscissa). We calculated the average activity (M) and the standard deviation (SD) over the last 5 trials of the repetition period. A change during the test was noted if i) activity in the first trial of the test lay outside the limits M ± 1.96 × SD, or ii) the average activity of the first two or of the first three trials differed from M (Mann-Whitney U test, p < 0.05).
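Criterion i) above can be sketched with the standard library alone. This is an illustration under stated assumptions: the activity list is assumed to end with the last trial of the repetition period, and we use the sample SD (n − 1), since the text does not say which SD was used. Criterion ii) could be added with `scipy.stats.mannwhitneyu`; it is omitted here to keep the sketch dependency-free. The function name is hypothetical.

```python
import statistics
from typing import Sequence


def first_trial_change(activity: Sequence[float]) -> bool:
    """Criterion i): is activity in the first trial outside M ± 1.96 × SD, where
    M and SD are computed over the last 5 trials of the test?"""
    reference = list(activity[-5:])
    m = statistics.mean(reference)
    sd = statistics.stdev(reference)  # sample SD; an assumption
    return abs(activity[0] - m) > 1.96 * sd


activity = [30.0, 26.0, 23.0, 21.0, 20.0, 21.0, 19.0, 20.0, 20.0]
print(first_trial_change(activity))  # True: 30 lies far outside 20 ± 1.39
```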

Fig. 6A–D. Correlation analysis between performance and activity.


A. Activity curve and performance curve (data from Cell 2, Test 2 in Fig. 3). B. Example of correlation analysis between performance and activity (data from A). Peak correlation (r = 0.7), significant at p < 0.05, at lag +1. The positive lag indicates that activity leads performance. C. Population data. Intervals 11 to 15. Distribution of significant peak correlation coefficients in the two monkeys. D. Population data. Distribution of lags for the peak correlations. Positive lags are more numerous than negative lags (χ² = 17.42, p < 10⁻⁴, McNemar test).

When there was an activity change during a test, we then examined the “performance” curve of the test, which plotted, for each successive trial (abscissa), the touch of target “A”, arbitrarily scored 1, or of target “B”, scored 0 (ordinate). The activity curve and the performance curve were compared by cross-correlation analysis (Mitz et al., 1991). When the cross-correlation was statistically significant at a given time lag, this lag, positive or negative, indicated a statistically significant time shift between the two curves, i.e. an advance of one curve over the other. When significant correlations were observed at several lags, we arbitrarily selected the lag corresponding to the largest correlation.
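A plain lagged Pearson correlation illustrates the sign convention, in which a positive lag means that activity leads performance. This is not a reimplementation of the Mitz et al. (1991) procedure: significance testing is omitted, lags where one series is constant are skipped, and we assume that “largest correlation” means largest in absolute value. All names are hypothetical. For spatially selective cells, trials on the non-preferred side would first be removed from both curves, as the text specifies.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlations(activity: Sequence[float], performance: Sequence[float],
                        max_lag: int = 3) -> Dict[int, float]:
    """r between activity in trial t and performance in trial t + lag."""
    n = len(activity)
    out: Dict[int, float] = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, p = activity[: n - lag], performance[lag:]
        else:
            a, p = activity[-lag:], performance[: n + lag]
        # r is undefined for a constant series, so such lags are skipped
        if len(a) >= 3 and len(set(a)) > 1 and len(set(p)) > 1:
            out[lag] = pearson(a, p)
    return out


def peak_lag(correlations: Dict[int, float]) -> Tuple[int, float]:
    """Lag with the largest absolute correlation, and that correlation."""
    lag = max(correlations, key=lambda k: abs(correlations[k]))
    return lag, correlations[lag]


performance = [0, 1, 0, 0, 1, 1, 1, 1, 1, 1]          # "A" = 1, "B" = 0
activity = [20, 10, 10, 20, 20, 20, 20, 20, 20, 20]    # tracks the next trial's choice
lag, r = peak_lag(lagged_correlations(activity, performance))
print(lag, round(r, 3))  # 1 1.0
```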

The correlation analyses were adapted to spatially selective cells by excluding trials in which the cell’s non-preferred side was chosen.

All statistical analyses were performed with Statistica®.

Muscimol injections

The effects of saline and muscimol injections in the ACC on the choice task were studied after the completion of the electrophysiological recordings. The sessions were “saline” or “muscimol” and were alternated daily, a muscimol session being preceded and/or followed by a saline session.

The unilateral injections were made in the ACC contralateral to the working arm. Each unilateral session started with 3 injections at 3 different sites. Each bilateral session started with 5 injections at 5 different sites (3 sites on one side and 2 on the other). The injection sites were fixed for each monkey and were chosen to cover as much as possible of the region shown in Fig. 1C.

Each session started with a period of re-training (13–30 min) followed by a pause (30–40 min) during which the injections (saline or muscimol) were made. The tip of a microsyringe (Hamilton Company, Ø = 0.2 mm) was advanced by a motorized microdrive (Trent Wells) through a guide tube held in place by an XY micro-positioner. At each site, 2 μl of muscimol solution (5 mg/ml) or of isotonic saline was injected by pressure at a rate of 1 μl/min. The cannula was withdrawn 1 minute after each injection. The animals were tested immediately after the last injection.
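The injection parameters above imply the following doses per session. This is simple arithmetic on the stated values (using 5 mg/ml = 5 µg/µl); the variable names are ours, and the per-site time ignores the time needed to move the syringe between sites.

```python
CONCENTRATION_UG_PER_UL = 5.0   # 5 mg/ml muscimol = 5 µg/µl
VOLUME_PER_SITE_UL = 2.0        # µl injected at each site
RATE_UL_PER_MIN = 1.0           # injection rate
WAIT_AFTER_INJECTION_MIN = 1.0  # cannula left in place before withdrawal

dose_per_site_ug = CONCENTRATION_UG_PER_UL * VOLUME_PER_SITE_UL                    # 10 µg
minutes_per_site = VOLUME_PER_SITE_UL / RATE_UL_PER_MIN + WAIT_AFTER_INJECTION_MIN  # 3 min

for label, n_sites in (("unilateral", 3), ("bilateral", 5)):
    total_ug = n_sites * dose_per_site_ug
    total_ul = n_sites * VOLUME_PER_SITE_UL
    print(f"{label}: {total_ug:.0f} ug in {total_ul:.0f} ul")
# unilateral: 30 ug in 6 ul
# bilateral: 50 ug in 10 ul
```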

Two groups of tests (i.e. saline and muscimol tests) were considered. In each group, we measured the proportions of correct choices and the frequency with which the large-keep and small-keep strategies were used.

Histology

Recording and injection sites were reconstructed by localizing, on coronal sections stained with Cresyl violet, electrolytic micro-lesions made after the experiments at crucial points within the explored regions.

Results

Behavioral results in the choice task

Determination of the probabilities P/Q

We varied the reward probabilities P/Q (P > Q, P + Q = 1) of the choice task in three monkeys during a pilot study that lasted approximately 6 weeks. Increasing the difference between P and Q made the identification of the optimal stimulus theoretically easier. The P/Q values tested were 0.85/0.15, 0.7/0.3 and 0.6/0.4. The sizes of the large and small rewards were kept constant (1.2ml and 0.4ml). The corresponding task values were 1.08ml, 0.96ml and 0.88ml, respectively. Performance with each of these pairs of probabilities was studied in 36 tests on average.
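Why a larger P − Q should make the optimal stimulus easier to identify can be made explicit with the reward sizes kept at 1.2 ml and 0.4 ml. Writing $V_{\text{opt}}$ and $V_{\text{non}}$ (our notation) for the average rewards of the optimal and non-optimal stimuli, and using P + Q = 1:

```latex
\begin{aligned}
V_{\text{opt}} &= 1.2\,P + 0.4\,Q = 0.4 + 0.8\,P \\
V_{\text{non}} &= 1.2\,Q + 0.4\,P = 0.4 + 0.8\,Q \\
V_{\text{opt}} - V_{\text{non}} &= 0.8\,(P - Q)
\end{aligned}
```

For P/Q = 0.85/0.15, 0.7/0.3 and 0.6/0.4, $V_{\text{opt}}$ gives the task values of 1.08, 0.96 and 0.88 ml, while the difference in average reward between the two stimuli shrinks from 0.56 to 0.32 to 0.16 ml.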

The animals found the optimal stimulus in 100% of the tests. For the three P/Q ratios, the average number of trials needed to discover the optimal stimulus was 21 (± 16.4, SD), 36 (± 17.1) and 52 (± 20.4), respectively (data pooled from the three monkeys). We further tested the animals with probabilities P and Q close to 0.5; in these cases, performance dropped to chance level.

Following the pilot study, we selected P/Q = 0.7/0.3 (i.e. task value = 0.96ml) for the electrophysiological recording sessions. Two monkeys were then retrained for an additional 3 months in the choice task using this probability ratio. During this training period, the average duration of the search period decreased considerably from 36 trials to 5–9 trials. At the end of this 3-month period, we commenced extracellular electrophysiological recordings.

Strategy in the choice task during the electrophysiological recordings

When possible, we presented multiple pairs of novel target stimuli to the monkey while recording from the same neuron. This was possible for 79 cells in M1 and 28 cells in M2.

We analysed the performance and strategies in 200 tests for each monkey. M1 and M2 found the optimal stimulus in 98% and 94.5% of these tests, respectively. In 2% of tests in M1 and in 5.5% of tests in M2, the non-optimal stimulus was chosen. The search period lasted on the average 6.4 (± 5.6, SD) and 8 (± 6.9, SD) trials, respectively. Fig. 2A shows the distribution of search lengths. In 50% of tests, duration of the search period was less than 5 trials in M1 and less than 8 trials in M2.

Overall, choices were partly driven by the size of the reward obtained in the preceding trial (Fig. 2B, 2C). The diagrams in Fig. 2B represent the target choices (“A” or “B”) in the trials of the search period as a function of the choice (“A” or “B”) and of the reward (Large or Small) obtained in the preceding trial (AL for choice of target “A” followed by a large reward, BS for choice of target “B” followed by a small reward, etc.). These diagrams show that a large reward (AL or BL) usually led to repetition of the same choice in the following trial, whereas a small reward did not bias the following choice. Fig. 2C shows that the large-keep strategy was present from the beginning of the tests (trials 2, 3 and 4). The small-keep strategy was observed in approximately half of the early trials; it then increased in parallel with the systematic choice of the preferred stimulus during repetition, while remaining close to 70% in the late trials.

Using these data as a reference, we modelled the animals’ performance and pattern of choices in terms of global strategy and reaction to reward size (see Appendix). In short, the model showed that, although the large-keep/small-keep strategies are important for optimal performance in the choice task (Fig. 2D), comparing a stimulus value with the task value is the key component for reproducing performance comparable to that of the monkeys (Fig. 2D–E). Moreover, this comparison must be made continuously during the task to avoid perseveration on incorrect choices.

At the beginning of the search period, the animals had a preference for the position nearest to their acting arm (i.e. left arm/left position in M1, right arm/right position in M2), regardless of the identity of the stimulus. This was observed in approximately 70% of the early trials (not illustrated). This spatial bias disappeared during the repetition period. The animal then touched the right or the left position with equal probability, according to the location of the preferred stimulus.

Oculomotor and motor behavior

Oculomotor and motor activities were analysed in a restricted sample of tests (60 in M1 and 34 in M2) performed at different times during the electrophysiological sessions. The number of saccades per trial was stable across tasks and across periods of the choice tests (search/repetition) in both monkeys. During the search period, M1 and M2 fixated targets “A” and “B” equally; during the repetition period, M1 fixated target “A” three times longer than target “B”, and M2 five times longer.

In the NO-choice trials, reaction times (RTs) were modulated by the task value (M1: F(2,3281) = 56.94, p < 10⁻⁶; M2: F(2,1952) = 31.34, p < 10⁻⁶, ANOVA). In the choice trials, RTs did not differ statistically between the search and repetition periods in either monkey (M1: t = 0.4645, ns; M2: t = 0.202, ns, unpaired t-test).

Neurons were recorded from area 24c in the dorsal bank of the anterior cingulate sulcus, in a region anterior to the level of the genu of the arcuate sulcus (Fig. 1C). We recorded 383 and 124 neurons in M1 and M2, respectively. Of 372 task-related neurons, 9 responded only to free rewards, and 31 others discharged in relation to errors (premature release of the starting position or break of ocular fixation); these cells are not analysed in the present report. 195 task-related cells were recorded in both the choice and NO-choice tasks and could be fully analysed (Table 1). In these neurons, 547 epochs were task-related (467 pre-reward and 80 post-reward; 2.8 epochs per neuron).

Activity of single ACC neurons is modulated by the task value in the NO-choice task

In the NO-choice task, activity was modulated by the task value in 67% (313/467) of pre-reward task-related epochs, in a total of 82% (160/195) of neurons (ANCOVA with RTs and MTs as covariates). The modulation was observed in particular in the visual (76%), pre-movement (70%) and post-movement (74%) epochs. Activity was also modulated by the received reward in 61 post-reward epochs, i.e. in 31% (61/195) of neurons. RTs and MTs had no detectable effect on ACC activity: a simple ANOVA (i.e. without RTs and MTs as covariates) revealed similar modulations by task value. Fig. 3 shows examples of a motor-related activity (cell 1) and of a visual-related activity (cell 2) that are modulated by the task value. Fig. 4A shows a post-reward activity modulated by the quantity of reward obtained.

Fig. 3. Activity of two ACC cells during performance of the NO-choice task and two tests in the choice task.

Each raster line displays cell activity recorded during one trial. The time scale is indicated below the last raster on the left, and activity scales are indicated for each cell. Rasters and histograms are aligned on the target touch (Cell 1) and on the onset of the target stimuli (Cell 2). At the right of each raster line, activity in epoch E4 (between the GO signal and the touch, Cell 1) and in epoch E2 (1 second after onset of the stimuli, Cell 2) is represented along the abscissa on the normalized 0–100 scale. In the choice trials, the horizontal solid line “R” in the middle of the rasters indicates the beginning of the repetition period. “A” or “B” and “0.4ml” or “1.2ml” at the right of each raster line indicate the identity of the touched stimulus and the reward obtained. (The sequence of “A” and “B” is used to construct the performance curve, with “A”=1 and “B”=0; cf. Fig. 6A.)

In cell 1, the activity in the choice task (around 70) was statistically different from the activity in trial 1.2ml (Test1: Z=2.21, p<0.027; Test2: Z=1.92, p<0.05, Mann-Whitney U test) and in trial 0.4ml (Test1: Z=−3.46, p<0.0005; Test2: Z=−3.25, p<0.001). In cell 2, the activity in the repetition period of the choice task (around 70) was statistically different from the activity in trial 1.2ml (Test1: Z=3.26, p<0.001; Test2: Z=3.04, p<0.002) and in trial 0.4ml (Test1: Z=−3.25, p<0.001; Test2: Z=−3.25, p<0.001). In both cells, similar patterns of activity were observed in the repetition periods of the two tests (Test1 vs Test2 in Cell 1: F(1,19)=1.42, p<0.249; Test1 vs Test2 in Cell 2: F(1,18)=0.10, p<0.757, ANOVA). In cell 1, no statistical change of discharge was observed between the first 3 trials and the last 5 trials (Test 1: Z=1.93, ns; Test 2: Z=1.93, ns).

Fig. 4. Activity of an ACC cell responding to reward delivery in the NO-choice trials 1.2ml and 0.4ml and population data.

A: Neuronal discharges are aligned on reward delivery. The activity of the cell is modulated by the reward amount in the NO-choice trials 1.2ml and 0.4ml (F(1,66)=32.89, p<10−6, ANOVA). B and C: Population data. Normalized post-reward activity in trials 0.4ml, 1.2ml and choice trials. B: 13 epochs in 13 cells; group 1.2 ≠ 0.4 ≠ 0. The average activity in the choice trials when the current reward obtained is 1.2 ml and 0.4 ml is 103.4 and 18.6, respectively. C: 48 epochs in 48 cells; group (1.2 = 0.4) ≠ 0. In B and C, the post-reward activity in the search and repetition periods of choice tests is pooled. The triangle represents trials in which the reward obtained was 0.4 ml (i.e. the NO-choice trials 0.4ml and the choice trials in which 0.4 ml was obtained). The circle represents trials in which the reward obtained was 1.2 ml (i.e. the NO-choice trials 1.2ml and the choice trials in which 1.2 ml was obtained). The average activity in the choice trials when the current reward obtained is 1.2 ml and 0.4 ml is 96 and 98.4, respectively.

To compare the levels of activity in trials 1.2ml, 0.4ml and 0ml precisely, we applied a post-hoc Fisher test to the epochs studied with the ANCOVA. The results are shown in Table 1. Compared with the ANCOVA, this test revealed a larger number of epochs modulated by the task value (83% of pre-reward and 76% of post-reward epochs).

Activity of single ACC neurons during the repetition period of choice tests

Since pre-reward activity in the NO-choice task can be modulated by the task value, what becomes of this activity in the choice task when the task value is a theoretical value based on probabilities? We examined the activity of the pre-reward epochs listed in Table 1 (except line 1) during the repetition. This activity was normalized as described in the Methods section.

We first considered the epochs exhibiting different activity in trials 0.4ml and 1.2ml (lines 2 and 3 in Table 1; 115 epochs in 57 cells in the two monkeys). An important step was to assess whether the activity of these epochs during the repetition differed from the activity in trial 0.4ml and in trial 1.2ml. The repetition activity differed significantly from that in trial 0.4ml in 115 epochs and from that in trial 1.2ml in 111 epochs (p<0.05, Mann-Whitney U test).

Fig. 3 displays the activity in the choice trials of the two cells already studied in the NO-choice trials and shows two important features. First, in each cell, the normalized activity in successive trials fluctuates around 70 on the 0–100 scale. Second, similar patterns of activity and average discharge rates occurred in the two tests, so the particular visual stimuli presented did not matter. Thus, in the following statistical analyses, only the first test used to study each cell is considered.

The population data illustrated in Fig. 5A show that the average activity in the choice trials is also close to 70. This is a key value: on the normalized scale, it corresponds to the weighting of the activities associated with the large and small rewards in trials 1.2ml and 0.4ml (100 and 0) by their probabilities of occurrence (0.7 and 0.3) when the optimal stimulus is selected (0.7 × 100 + 0.3 × 0 = 70). The task value (0.96 ml) corresponds to the same weighting of the two reward quantities (0.7 × 1.2 + 0.3 × 0.4 = 0.96 ml). These results reveal that pre-reward activity in the choice task is modulated by the task value. Note that oculomotor behaviour does not influence the encoding of the task value by ACC neurons. In E1, the monkey looked at the starting position; in E2 he fixated the central fixation point; in E4 he looked at the chosen target (which was randomly located in the left or right position); in E3 and E5 he was free to make saccadic eye movements. Nevertheless, in all these epochs and whatever the oculomotor behaviour, the neuronal activity encodes the task value (Fig. 5).
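The weighting behind the value 70 can be checked numerically. The short Python sketch below assumes, consistently with the text, that the normalized scale maps mean activity in trial 0.4ml to 0 and in trial 1.2ml to 100 by a linear transformation; the exact normalization procedure is given in the Methods, and the function names here are illustrative rather than taken from the authors' analysis. Because such a mapping is linear, normalizing the task value itself also gives 70.

```python
# Reward sizes (ml) and probability of the large reward when the optimal stimulus (A) is chosen
LARGE_ML = 1.2
SMALL_ML = 0.4
P_LARGE_OPTIMAL = 0.7


def expected_value(p_large, large, small):
    """Probability-weighted average of the two outcomes."""
    return p_large * large + (1.0 - p_large) * small


def normalize(x, low, high):
    """Assumed linear mapping of `low` to 0 and `high` to 100."""
    return 100.0 * (x - low) / (high - low)


# Task value in ml: 0.7 * 1.2 + 0.3 * 0.4
task_value_ml = expected_value(P_LARGE_OPTIMAL, LARGE_ML, SMALL_ML)

# Expected normalized activity: activities in trials 1.2ml and 0.4ml are 100 and 0
expected_activity = expected_value(P_LARGE_OPTIMAL, 100.0, 0.0)

# Normalizing the task value gives the same number, because the mapping is linear
normalized_task_value = normalize(task_value_ml, SMALL_ML, LARGE_ML)

print(f"task value = {task_value_ml:.2f} ml")
print(f"expected normalized activity = {expected_activity:.1f}")
print(f"normalized task value = {normalized_task_value:.1f}")
```

The equality of the last two numbers is a property of any linear rescaling, which is why an activity level of about 70 can be read directly as a signature of the 0.96 ml task value.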

We then considered the epochs in which activity was statistically the same in trials 1.2ml and 0.4ml (lines 5 and 6 in Table 1; 271 epochs in 133 cells in the two monkeys). Fig. 5D–E show that when activity is the same in trials 1.2ml and 0.4ml, it is also the same, on average, in the choice trials. Activity in these epochs is not modulated by the task value; instead, its level distinguishes rewarded from non-rewarded trials.

Concerning the post-reward epochs, the results are different. In the group of epochs exhibiting different activity in trials 0.4ml and 1.2ml in the NO-choice task (lines 2 and 3 in Table 1; 13 epochs in 13 cells in the two monkeys), the population data show that the average activity in both the search and repetition periods of the choice task is close to 0 when the animal received 0.4 ml and close to 100 when he received 1.2 ml. In the group of epochs in which activity is statistically identical in trials 1.2ml and 0.4ml but different from trial 0ml in the NO-choice task (lines 5 and 6 in Table 1; 48 epochs in 48 cells in the two monkeys), the average activity in the choice task is close to 100 whether the animal received 0.4 ml or 1.2 ml. These results reveal that post-reward activity encodes the current quantity of reward in both the search and repetition periods of the choice task.

Pre-reward ACC activity is stable during the choice tests

We tested whether the pre-reward activity observed during the repetition was already observed during the search period. For this purpose, we searched for changes in activity in all epochs in Table 1, except line 1. In 88% (340/386) of epochs, there was no statistical change in activity over the course of the test (e.g. Fig. 3, Cell 1).

In the remaining 12% of epochs (46/386), changes in activity were observed. In these epochs, neuronal activity was at baseline level during the first trial(s) and then evolved towards the level of activity specific to the repetition period (Fig. 3, Cell 2). We analysed the temporal relationships between the activity and performance curves of these epochs using cross-correlation (Fig. 6A–D). The performance change significantly preceded the activity change in 2% (9/386) of epochs, and there was no difference or no relation between activity and performance in 5% (18/386). In 5% (19/386) of epochs, the activity change preceded the performance change.
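The idea of the cross-correlation analysis can be illustrated with a minimal Python sketch. It assumes that both curves are sampled once per trial, that the performance curve codes choices as “A”=1 and “B”=0 (as in the legend of Fig. 3), and that the lag is taken where the Pearson correlation between the shifted curves peaks. The function names, the peak criterion and the absence of any significance test are simplifications introduced here; this is not the procedure used to produce Fig. 6.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns None if either series is constant."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def best_lag(activity, performance, max_lag):
    """Lag (in trials) at which activity and performance correlate best.

    A positive lag k pairs activity in trial t with performance in trial t + k,
    i.e. the activity change precedes the performance change.
    """
    n = len(activity)
    if len(performance) != n:
        raise ValueError("curves must have the same number of trials")
    best_k, best_r = None, None
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            x, y = activity[: n - k], performance[k:]
        else:
            x, y = activity[-k:], performance[: n + k]
        if len(x) < 3:  # too few overlapping trials
            continue
        r = pearson(x, y)
        if r is not None and (best_r is None or r > best_r):
            best_k, best_r = k, r
    return best_k, best_r


# Toy data: normalized activity rises two trials before the first choice of A
performance = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
activity = [0, 0, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70]
lag, r = best_lag(activity, performance, max_lag=4)
print(f"lag = {lag} trials, r = {r:.2f}")
```

In this toy case the peak falls at a positive lag, the situation reported above for 19 epochs in which the activity change preceded the performance change.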

Spatial parameters and ACC activity during the repetition period of the choice test

We examined the effect of the starting position (origin effect) and of the position of the preferred stimulus (end-point effect) on ACC activity during the repetition period of choice tests. During this period, the direction of the arm-movement varied from trial to trial with the random assignment (Left or Right, 50/50) of the starting position and location of the preferred stimulus. The influence of spatial parameters was not studied in the NO-choice trials because the movements of the animals in these trials were biased towards their preferred side, i.e. the target nearest to the acting arm.

The results (Table 2) show that ACC neurons have access to the spatial parameters of stimuli and movements. Spatial selectivity was observed in 46.2% (145/317) of neurons, in at least one epoch. The starting position influenced activity in 9% of pre-reward epochs, and the position of the preferred stimulus in 29%. Covariation of the two factors was observed in only a small number (22) of epochs. The position of the optimal stimulus determined the level of activity in 19% of the visual epochs and in 38% of the arm-movement-related epochs. Fig. 7 illustrates the activity of two spatially selective ACC cells.

Fig. 7. Spatial selectivity of ACC neurons.

Cell 1 and cell 2 were recorded in the left and right ACC of M1, during the repetition periods of 2 tests. The positions of the letters “A” and “B” in the insets indicate the positions of the optimal and non-optimal stimuli. Cell 1 is more active when the optimal stimulus is located on the right (Left vs Right, epoch E2: F(1,22)=867.9, p<10−6). Cell 2 is more active when arm movements are directed towards the right position (Left vs Right, epoch E4: F(1,31)=239.65, p<10−6).

Behavioral deficits after muscimol injections in the ACC

The electrophysiological results show that the ACC encodes two parameters in the choice task (the task value and the size of obtained reward), which may be important for the identification of the optimal stimulus. We tested whether deactivation of this structure impaired task performance.

We injected muscimol (a GABA agonist) and saline (control) solutions at the sites where task-related neurons had been recorded. Bilateral injections were made only in the second monkey, M2, because the uncooperativeness of M1 prevented us from studying the effects of bilateral injections in this animal. Data were obtained from 4 unilateral saline (29 tests) and 4 unilateral muscimol (52 tests) sessions with M1, and 4 unilateral saline (31 tests), 4 unilateral muscimol (35 tests), and 2 bilateral muscimol (25 tests) sessions with M2.

In both monkeys, saline and muscimol injections induced no impairment in gross behavior, either during the testing sessions or later in the home cage. They worked daily for approximately the same amount of time (1 hour), their motivation was apparently unaffected, and we did not detect any loss of appetite. In the saline sessions, performance in the choice and NO-choice tasks was normal and identical to that observed earlier during neuronal recordings.

Muscimol injections impaired performance in the choice task. Immediately after the injections, the monkeys found the optimal stimulus of choice tests only at chance level (Fig. 8), whereas performance in the NO-choice task remained normal. The deficit was most evident at the beginning of the testing period (period 1, Fig. 8). The animals then slowly recovered, and after 60 minutes their performance was similar to that observed after saline injections. The proportion of successful searches was not statistically different after unilateral and bilateral injections (69% and 54% of choice tests, respectively; ns, p>0.24; data pooled from both monkeys for the unilateral injections).

Fig. 8. Evolution of the performance in the choice task after unilateral (UNI) and bilateral (BI) muscimol injections in the ACC.

Abscissa: Period 1, 0–15 min after the last muscimol injection; Period 2, 15–30 min; Period 3, 30–45 min; Period 4, 45–60 min. Ordinate: percentage of tests in which the optimal stimulus was discovered. The saline results in the 4 periods are pooled. Data are from 70 saline, 87 UNI-muscimol and 25 BI-muscimol choice tests. The data show a deficit in the muscimol sessions during the first 45 min. Differences at p<0.001 (**) and at p<0.0001 (***). During periods 1 and 2 following unilateral and bilateral muscimol injections, performance is at chance level (50%) (Period 1: UNI: ns, p>0.897; BI: ns, p>0.636; Period 2: UNI: ns, p>0.667; BI: ns, p>0.636).

The poor performance of the animals resulted from a defective strategy. After muscimol injections, the animals were still able to select one particular stimulus (A or B) during 10 successive trials in a majority of choice tests (101/112 = 90%), but tended to keep selecting the stimulus chosen in the first trial, regardless of the quantity of reward obtained. This is illustrated in Fig. 9, which shows the frequent use of both the small-keep and the large-keep strategies at the beginning of tests. The strategy of the animal was “keep”, whatever the size of the reward obtained (large or small) and whatever the position of the selected stimulus. This abnormal “keep” strategy led to frequent choices of the non-optimal stimulus and to a shorter search period (M1: 1.8 trials with muscimol vs 3.7 trials with saline, p<2×10−3; M2: 3.8 trials with muscimol vs 8.7 trials with saline, p<6×10−6, Wilcoxon paired test). Spatial perseveration, which would have prevented choosing the same stimulus in successive trials, was not observed.
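The “large-keep” and “small-keep” measures used here and in Fig. 9 can be computed from the sequence of choices and rewards in a test. The Python sketch below assumes that a transition counts as a keep when the stimulus chosen in trial N+1 is the one chosen in trial N, and that it is classified by the reward size obtained in trial N; the function name and data layout are illustrative, not the authors' analysis code.

```python
def keep_frequencies(choices, rewards, large_ml=1.2):
    """Return (large-keep, small-keep) frequencies for one choice test.

    choices: stimulus chosen in each trial, e.g. "A" or "B"
    rewards: reward obtained in each trial (ml); values equal to large_ml count as large
    A frequency is None when no trial of that kind occurred.
    """
    if len(choices) != len(rewards):
        raise ValueError("one reward per choice is required")
    after_large, after_small = [], []
    # Pair trial N (choice and reward) with the choice made in trial N+1
    for prev_choice, next_choice, reward in zip(choices, choices[1:], rewards):
        kept = prev_choice == next_choice
        if reward == large_ml:
            after_large.append(kept)
        else:
            after_small.append(kept)

    def freq(flags):
        return sum(flags) / len(flags) if flags else None

    return freq(after_large), freq(after_small)


large_keep, small_keep = keep_frequencies(
    ["A", "A", "B", "B", "A"], [1.2, 0.4, 1.2, 1.2, 0.4]
)
print(large_keep, small_keep)
```

Because the measure depends only on stimulus identity, a “keep” animal scores high on both frequencies even though the stimulus positions, and hence the arm movements, change from trial to trial.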

Fig. 9. A–B. Usage frequency of the “large-keep” and “small-keep” strategies in the choice trials, in the saline and muscimol sessions.

A. In the saline sessions, the usage frequencies of the small-keep and large-keep strategies in both monkeys are similar to those observed during the electrophysiological recordings (Fig. 2C). B. In the muscimol sessions, the usage frequency of the “small-keep” strategy at the beginning of tests is higher than in the saline sessions (comparison of the first 4 trials in M1: χ2=7.6, p<0.006; of the first 8 trials in M2: χ2=7.7, p<0.005). The usage frequency of the “large-keep” strategy is the same in the muscimol and saline sessions in M1 (χ2=0.01, ns at p<0.05) and in M2 (χ2=0.02, ns). These data indicate that, in the muscimol sessions, the “large-keep” or “small-keep” strategies prevail in all trials: the strategy of the animal is “keep”. (Dotted lines: confidence limits (p<0.05) of the null hypothesis, i.e. no strategy, random choices.)

Oculomotor activities were analysed during the repetition period of choice tests in the saline and muscimol sessions. The search periods were not analysed because they were rarely observed during muscimol sessions. The number of saccades per trial, as well as the duration of fixation on the preferred target (i.e. the chosen one), remained stable across the saline sessions and the four muscimol periods for both monkeys [number of saccades in M1: F(4,859)=2.37, p<0.0506, ns; number of saccades in M2: F(4,623)=1.51, p<0.1982, ns; time spent on the preferred target in M1: F(4,858)=2.11, p<0.0781, ns; time spent on the preferred target in M2: F(4,623)=0.56, p<0.6944, ns, ANOVA]. Furthermore, during the saline sessions and the four muscimol periods, M1 consistently fixated target “A” two times longer than target “B”, and M2 four times longer. Consequently, the poor performance observed in the muscimol sessions cannot be attributed to oculomotor deficits.

In contrast, the animals systematically chose the optimal stimulus when this stimulus had been identified prior to the injections (for instance, during the retraining period). Thus, in the first period following transient deactivation of the dorsal ACC (i.e. 0–15 min after the last muscimol injection), the animals had not lost the ability to recall and choose the correct solution in known situations. Both animals selected the optimal stimulus (i.e. stimulus A) in 100% of trials in saline sessions (132/132), as well as in the first period following muscimol injections (96/96).

In a few muscimol tests (11/112 = 10%), animals were unable to select one particular stimulus during 10 successive trials, even after 50 trials. This situation never occurred in the saline sessions, and was observed mostly in M2 after bilateral muscimol injections. Clearly, it was not the consequence of a spatial deficit (e.g., a perseveration towards the preferred side). The preferred location (left for M1 and right for M2) was selected in 71% of trials in these tests and in 64% of trials in the saline tests (ns, p > 0.05) (data pooled from the two monkeys).

In the unilateral saline and muscimol sessions, the animals worked with the arm contralateral to the injections. The time to initiate a trial (time between onset and touch of the starting position) was statistically the same in all sessions for both monkeys. During the first 15 minutes following the muscimol injections (period 1 in Fig. 8), movements to the selected target had shorter RTs and MTs than in the saline sessions (period 1, muscimol vs saline: MTs: t=2.28, p<0.02; RTs: t=5.56, p<0.01, unpaired t-test; data pooled from the two monkeys). RTs and MTs returned to normal after period 1.

Discussion

During the choice task, monkeys adopted an optimal strategy, identifying over successive trials which of two paired stimuli was associated with the optimal average reward (the task value) and maintaining this choice in subsequent trials. Our data reveal that, during the search and repetition periods of the task, ACC activity related to the onset of the stimuli and to the execution of movements correlated with the task value. They also reveal that neurons encoded the size of the rewards obtained. The importance of these activities for the successful adaptation of behavior is suggested by the detrimental effect of ACC inactivation on the search for the optimal stimulus.

The ACC encodes the task-value

We confirm that most ACC activities (pre- and post-reward) discriminate rewarded from non-rewarded situations (Matsumoto et al., 2003). This discrimination is fundamental to the regulation of behavior. Moreover, while our data corroborate findings that some ACC neurons respond to reward delivery (Koyama et al., 2001; Akkal et al., 2002; Shidara and Richmond, 2002), we demonstrate for the first time that the quantity of reward received in each trial is encoded by ACC activity. This was observed in 16% (13/80, Table 1) of the ACC neurons responding to reward delivery. The ability to discriminate between the small and the large reward is a key stage in the discrimination of the two stimuli in the choice task.

Importantly, neuronal activity is modulated by the task value in 25% of pre-reward epochs (in 29% (57/195) of ACC neurons), regardless of whether the value is computed from probabilistic or certain rewards. In the choice task, the task value defines the optimal stimulus and may be used as a template in identifying the solution. This hypothesis is supported by the fact that the task value is encoded at the onset of the search periods. The average value of the non-optimal stimulus (target “B”) is ignored.

One limitation of this study is that we tested the same task value (0.96ml, associated with the probability ratio 0.7/0.3) in the two monkeys in the choice tests. In our view, this task value optimized 4 factors: difficulty of the task (designed to be relatively difficult), duration of the search period, high performance level, and motivation to search for the optimal stimulus. Further investigations using tasks different from the choice task would be necessary to confirm the ability of single ACC neurons to encode different task values based on probabilistic contexts.

The possibility that modulation of pre-reward activity is related not to task values but to other factors, such as the visual attributes (shape, color) of the stimuli, can be discounted. For instance, in the choice tests, the level of activity in response to different pairs of stimuli reached similar values (i.e. 70 or 100 on the normalized scale), regardless of the visual characteristics of the competing stimuli. An oculomotor interpretation of the changes in firing rate is also unlikely, given that oculomotor activity is similar in the choice and NO-choice tasks and that neuronal activity revealed no temporal correlation with saccade execution (Procyk et al., 2000; Amiez et al., in press). The kinematic characteristics of arm movements (i.e. MTs and RTs) did not correlate with activity.

The ACC and the behavioral space

Our data showed that ACC neurons have access to the spatial parameters of stimuli and movements. They suggest that the ACC constructs sensory-motor mappings of items of the environment indexed by their reward value and spatial position. These data are supported by those obtained by Procyk et al. (2000). Hoshi et al. (2005) have also shown that neuronal activity within the ACC is dependent on the spatial location of visual stimuli, on the direction of the movement and on the arm used in a target-reach movement task in which the target location and the arm to be used were instructed by two successive cues.

Our data provide no evidence of a topographical organization of spatial parameters in the ACC. Neurons selective for one position (R or L) were recorded in close proximity to neurons selective for the other position. Moreover, our unilateral deactivations of the ACC with muscimol induced no significant neglect of, or preference for, either side. These data are consistent with the human literature, which has reported no asymmetry in spatial behaviors after unilateral focal lesions of the ACC (Turken and Swick, 1999).

Relations to other structures

The modulation of neuronal activity by the task value (tested with several fixed rewards) has already been described in structures interconnected with the ACC, including prefrontal cortex (Watanabe, 1996; Leon and Shadlen, 1999; Hikosaka and Watanabe, 2000; Watanabe et al., 2002; Wallis and Miller, 2003), parietal cortex (Platt and Glimcher, 1999; Sugrue et al., 2004), posterior cingulate cortex (McCoy et al., 2003), striatum (Tremblay et al., 1998; Hassani et al., 2001; Itoh et al., 2003) and mesencephalic dopaminergic neurons (Fiorillo et al., 2003). Whether and how all these structures encode the task value in probabilistic conditions such as the choice task remains to be determined. Recent work suggests that parietal cortex and dopaminergic neurons contribute to the computation and use of task values in probabilistic contexts (Fiorillo et al., 2003; Sugrue et al., 2004).

In orbitofrontal cortex (OFC), neuronal activity also depends on the expected reward value (Wallis and Miller, 2003). We recorded neuronal activity from the OFC in M1 and M2 during performance of the choice task (Amiez and Joseph, in preparation). During the search period, the activity of OFC neurons remains close to that observed in trial 0.4ml; during the repetition periods, the activity is not proportional to the task value, but is equal to that observed in trial 1.2ml. These results suggest that the response obtained in trial 1.2ml is the response to the preferred targets (Tremblay and Schultz, 1999).

How is the ACC information integrated at the system level? To be effective the task value must be used as the reference value during successive trials. One possibility is that the task value is compared to stimulus-specific reward information updated in the dorsolateral prefrontal cortex. Recent work suggested that reward-based response selection might involve a particular bond between ACC activity representing reward-response associations and lateral prefrontal cortex activity (Matsumoto et al., 2003). Our model (see Appendix) shows that a decision-test using comparison of a stimulus value with the task value is a key component to generate optimal performance in the choice task and that this comparison has to be continuously updated to avoid perseveration on incorrect choices.

Deactivation of the ACC and discovery of the optimal stimulus

We show that after ACC deactivation with muscimol, the animal does not cease to work for reward, but selects the optimal stimulus at chance level. There is a tendency to re-select, in successive trials, the stimulus selected in the first trial, regardless of the reward obtained in that trial.

Deactivation of the ACC does not induce a general impairment in all aspects of stimulus selection. When the animals were presented with a pair of stimuli whose optimal stimulus had already been discovered during the retraining period prior to the injections, the optimal selection was maintained. The animal thus repeats the choices made in previous tests. This result is consistent with the fact that, within a choice test, the animal also repeats the choice made in the first trial. Furthermore, when presented with trial 0ml, the animals were reluctant to work. Learned behavioral reactions remain unaffected by ACC deactivation.

Hadland et al. (2003) have also shown that bilateral ACC lesions do not interfere with the performance of learned visual discriminations. This study further claimed that the lesions failed to impair learning of new discriminations, and concluded that the ACC does not establish the relationships between stimulus and reward, but rather those between response and reward. Shima and Tanji (1998) reported that inactivation of the rostral cingulate motor area (CMAr) induced an inability to switch motor responses when selection was instructed by decrements in reward level, but not when the switch was instructed by an auditory cue. In summary, both studies indicated that the ACC is concerned with action-reward associations more than with stimulus-reward associations.

In apparent contrast, we show that dorsal ACC deactivation does interfere with visual discrimination learning in the choice task. The poor performance cannot be explained by a defective response selection or by a motor deficit. The motor responses were not erratic, but instead varied according to the position of the stimulus selected in the first trial.

These two sets of results suggest that the ACC is only modestly involved in learning simple reward-based visual discriminations that merely require distinguishing a rewarded stimulus from a non-rewarded one (e.g. the paradigm of Hadland et al.). In contrast, the ACC is strongly implicated in stimulus-reward association tasks when competing stimuli are rewarded according to a probabilistic rule and when the discrimination has to be made over a relatively long period of time (e.g. the choice task). We propose that the deficit observed after muscimol injections results from an inability to use the outcomes of action and/or the task value.

Role of the ACC

In line with the conflict-monitoring hypothesis (Botvinick et al., 2001), one could suggest that, in the choice task, conflicts between stimuli or between responses are more likely to occur during the search period than during the repetition period. Under this hypothesis, the ACC is a conflict detector, and its activity should differ between the two periods. Our data show that the activity is nearly the same in both periods. This indicates that a role for the ACC in conflict monitoring does not account for the present data.

The concepts of executive attention or attention to action (Posner and DiGirolamo 1998; Passingham 1996) are too general to characterize accurately the role of the ACC in the choice task. Our data fit with recent propositions about the role of the ACC in monitoring actions in relation to outcomes (Rushworth et al., 2004). This function might be seen as a particular dimension of attention to action.

A key function of the human and non-human primate ACC would be to construct representations of the task value in given situations. It is likely that this process relies on the integrity of a specific network composed of the ACC, the lateral PFC, the ventral striatum, the orbitofrontal cortex and the dopaminergic system (Schultz 2000). The task value is contextual and might be based on a cost-benefit analysis, i.e. both on the value of the expected outcome and on the cost of performing the corresponding action (Rushworth et al., 2004). We propose that these representations, embedded in the sensory-motor activity, are used to guide behavioral choices.

To be effective, representations of the task value must be updated when environmental reward contingencies change. We propose an iterative, circular process. First, the online processing of negative and positive outcomes modifies task values; updating task values by integrating outcomes is fundamental to rapid adaptations of behavior. Second, outcomes are evaluated in relation to existing task values constructed over successive trials; here, the relative encoding of outcomes is important for appropriate reactions to particular contexts and expectancies. Alteration of this function would lead to aberrant decision-making and abnormal reward seeking, such as those observed in drug addiction.

Acknowledgments

We would like to thank H. Kennedy and S. Mackey for helpful comments on the manuscript.

Appendix

Using data obtained in monkeys trained in the choice task as a reference, we tried to model the behavior of the animal in terms of global performance and pattern of choices.

Following the analysis of behavior and of ACC neuronal activity during the choice task, we hypothesized that two main parameters should be important to solve the task: 1) a rapid trial-to-trial adaptation to reward size, and 2) an evaluation of stimulus values relative to the optimal reward, i.e. to the task value. We thus introduced both parameters in the model and compared its output with the performance and strategies of the animals.

Programming

The algorithms used to model the monkeys’ behavior were developed in MATLAB. Random values (X) uniformly distributed in the interval (0,1) were generated with the function rand(). They were used to generate reward sizes and choices in each trial.

The reward size associated with each choice is given according to: If A is chosen and X ≥ 0.3 then reward size is 1.2; else reward size is 0.4 (70% large rewards). If B is chosen and X ≥ 0.7 then reward size is 1.2; else reward size is 0.4 (30% large rewards).
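The reward rule above can be transcribed into Python, with random.random() standing in for MATLAB's rand(). This is a sketch, not the authors' MATLAB code: the function name, the threshold table and the seed are illustrative, and random.random() draws from [0, 1) rather than (0, 1), which makes no practical difference here.

```python
import random

LARGE_ML, SMALL_ML = 1.2, 0.4
# A large reward is given when X >= threshold: P(large) = 0.7 for A, 0.3 for B
THRESHOLD = {"A": 0.3, "B": 0.7}


def reward_size(choice, rng=random):
    """Reward (ml) for choosing stimulus 'A' or 'B' in one trial."""
    x = rng.random()  # uniform draw, standing in for MATLAB rand()
    return LARGE_ML if x >= THRESHOLD[choice] else SMALL_ML


rng = random.Random(0)
n = 100_000
for stim in ("A", "B"):
    rewards = [reward_size(stim, rng) for _ in range(n)]
    p_large = sum(r == LARGE_ML for r in rewards) / n
    mean_ml = sum(rewards) / n
    print(f"{stim}: P(large) = {p_large:.3f}, mean reward = {mean_ml:.3f} ml")
```

With many simulated trials, the mean reward for A should approach the task value of 0.96 ml, and that for B should approach 0.3 × 1.2 + 0.7 × 0.4 = 0.64 ml.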

Two main parameters were used to evaluate performance: the percentage of repetitions of target B, and the mean number of trials in the search period. As in the behavioral evaluation of the monkeys, a repetition was counted when 5 identical successive choices were followed by 5 choices among which at most one differed from the others. For instance, series such as A-A-A-A-A-A-A-A-A-A and A-A-A-A-A-A-A-B-A-A were identified as repetitions of target A.
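The repetition criterion can be written as a small Python function. It is a sketch under stated assumptions: the function name and default arguments are hypothetical, the first window that satisfies the criterion is taken as the start of the repetition, and the number of trials preceding it is treated as the length of the search period.

```python
def find_repetition(choices, run=5, window=5, max_deviations=1):
    """Locate the first repetition period in a sequence of choices.

    A repetition starts at trial i when choices[i:i+run] are all identical and
    at most `max_deviations` of the next `window` choices differ from them.
    Returns (i, stimulus), where i is taken as the number of search trials,
    or None if no repetition is found.
    """
    for i in range(len(choices) - run - window + 1):
        block = choices[i:i + run]
        stim = block[0]
        if any(c != stim for c in block):
            continue
        following = choices[i + run:i + run + window]
        if sum(c != stim for c in following) <= max_deviations:
            return i, stim
    return None


# The two example series from the text, plus a test with a 3-trial search period
print(find_repetition("AAAAAAAAAA"))
print(find_repetition("AAAAAAABAA"))
print(find_repetition("ABAB" + "B" * 10))
```

Over many simulated tests, the percentage of repetitions of B and the mean search length can then be obtained by applying this function to each test and averaging.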

1. Initial testing

The initial limited version of the model is based on the assumption that the search behavior in the choice task follows a strategy of adaptation to the size of reward received: the large-keep/small-keep strategy. In this version, the average values of rewards obtained with stimulus A and B are not taken into account in the successive trials.

The first choice (A or B) is made at random (pA = pB = 0.5). Subsequent choices are then determined by the large-keep/small-keep strategy, which follows a biased random rule similar to that observed in the animals’ behavior (Fig. 2C). At this stage, the bias is used as a parameter to test the influence of this strategy on overall performance. For instance, if an 80% large-keep/50% small-keep strategy is used, choice N+1 repeats choice N with a probability of 0.8 if the reward in trial N was large (1.2 ml) and with a probability of 0.5 if it was small (0.4 ml). We ran the model on 5 sessions (5 × 5000 trials) for different large-keep/small-keep settings.
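The initial model can be sketched in Python as follows (the original simulations were written in MATLAB). The function name, the seeded random generator and the treatment of a run as one continuous sequence of trials are assumptions made for illustration; how the 5000 trials of a session were divided into choice tests is not specified here.

```python
import random


def simulate_keep_strategy(n_trials, p_large_keep, p_small_keep, rng=random):
    """Hypothetical sketch of the initial model: random first choice, then
    large-keep/small-keep.

    After a large reward the previous choice is repeated with probability
    p_large_keep; after a small reward, with probability p_small_keep.
    Returns the lists of choices and rewards.
    """
    threshold = {"A": 0.3, "B": 0.7}  # large reward (1.2 ml) when X >= threshold
    choices, rewards = [], []
    choice = "A" if rng.random() < 0.5 else "B"  # pA = pB = 0.5
    for _ in range(n_trials):
        reward = 1.2 if rng.random() >= threshold[choice] else 0.4
        choices.append(choice)
        rewards.append(reward)
        p_keep = p_large_keep if reward == 1.2 else p_small_keep
        if rng.random() >= p_keep:  # switch with probability 1 - p_keep
            choice = "B" if choice == "A" else "A"
    return choices, rewards


choices, rewards = simulate_keep_strategy(20, 0.8, 0.5, random.Random(42))
print("".join(choices))
```

Setting p_small_keep to 1.0 reproduces the situation discussed below: the model then never leaves a stimulus after a small reward, so repetitions of B become frequent.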

The results show that both the large-keep and the small-keep tendencies influence performance. Increasing the frequency of large-keep (i.e. increasing the tendency to repeat a choice after a large reward) improves performance at all levels (Fig. 2D, left diagram). However, in this configuration the model makes proportionally more incorrect choices (i.e. choosing B during repetition) than the monkeys did (13.9% in the model vs. 2% and 5% in the monkeys).

The behavioral data also show that the monkeys' small-keep tendency increases over successive trials of a test (Fig. 2C). Increasing the frequency of small-keep (i.e. increasing the tendency to repeat a choice after a small reward) shortens searches on average, but it also increases the number of repetitions in which B is chosen (with an 80% large-keep/100% small-keep strategy, 52.1% of repetitions were on B on average; not illustrated). This follows logically: because B yields small rewards more often than A, a stronger small-keep tendency makes the model stay longer on B. At this stage, the model performs worse overall than the monkeys.
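The bias toward B can be made explicit with a short calculation, offered here as an illustration rather than as part of the original analysis. Treating choices as a two-state chain, the probability of staying on a stimulus is P(large) × large-keep + P(small) × small-keep, and the long-run share of trials on A follows from the two switching probabilities. The function names are hypothetical.

```python
P_LARGE = {"A": 0.7, "B": 0.3}  # probability of a large (1.2 ml) reward


def stay_probability(stim: str, large_keep: float, small_keep: float) -> float:
    """P(repeat stim) = P(large) * large_keep + P(small) * small_keep."""
    p = P_LARGE[stim]
    return p * large_keep + (1.0 - p) * small_keep


def long_run_fraction_a(large_keep: float, small_keep: float) -> float:
    """Stationary share of trials on A for the two-state choice chain."""
    leave_a = 1.0 - stay_probability("A", large_keep, small_keep)
    leave_b = 1.0 - stay_probability("B", large_keep, small_keep)
    if leave_a + leave_b == 0.0:
        raise ValueError("model never switches; share depends on the first choice")
    return leave_b / (leave_a + leave_b)


if __name__ == "__main__":
    for lk, sk in [(0.8, 0.5), (0.8, 1.0)]:
        print(f"{lk:.0%}/{sk:.0%}: stay A = {stay_probability('A', lk, sk):.2f}, "
              f"stay B = {stay_probability('B', lk, sk):.2f}, "
              f"share of A = {long_run_fraction_a(lk, sk):.2f}")
```

Under these assumptions, the 80/50 rule keeps A slightly "stickier" than B (0.71 vs. 0.59) and leaves about 59% of trials on A, whereas the 80/100 rule reverses the order (0.86 vs. 0.94) and leaves only 30% of trials on A.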

2. Including the task value

To improve the model, we added a decision-test module. This test compares the stimulus values with the optimal value, i.e. the task value (0.96 ml, as defined in the main text).

Initial choices are made using an 80/50 large-keep/small-keep strategy, as observed in monkeys (see Fig. 2C). After each reward, the value of the stimulus just chosen is updated as the average of all rewards obtained with that stimulus since the beginning of the test. Each stimulus value is then tested separately for falling within the range TaskValue ± Limit. If one of the stimulus values is within these limits, that stimulus is selected to enter the repetition phase. From that point on, the large-keep/small-keep strategy is abandoned and choices are made according to the evaluation of the stimuli relative to the task value. The Limit was varied to determine how the precision of the evaluation affects overall performance. With the task value of 0.96 ml, a Limit of 0.15 corresponds to the range 0.81–1.11 ml.
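The value update and the decision test can be sketched as below. This is an illustrative Python version; `ValueTracker` and `within_limits` are hypothetical names. One consequence worth noting: a single large reward gives a value of 1.2 ml, which lies 0.24 ml from the task value and therefore fails the test with Limit = 0.15, so entry into repetition requires several rewards.

```python
from typing import Dict, Optional

TASK_VALUE = 0.96  # ml, task value from the main text
LIMIT = 0.15       # tolerance around the task value (varied in the model)


class ValueTracker:
    """Running average of all rewards obtained with each stimulus since test start."""

    def __init__(self) -> None:
        self.total: Dict[str, float] = {}
        self.count: Dict[str, int] = {}

    def update(self, stim: str, reward: float) -> float:
        """Add one reward for `stim` and return its updated value."""
        self.total[stim] = self.total.get(stim, 0.0) + reward
        self.count[stim] = self.count.get(stim, 0) + 1
        return self.total[stim] / self.count[stim]

    def value(self, stim: str) -> Optional[float]:
        """Current value of `stim`, or None if it has never been chosen."""
        n = self.count.get(stim, 0)
        return self.total[stim] / n if n else None


def within_limits(value: Optional[float], task_value: float = TASK_VALUE,
                  limit: float = LIMIT) -> bool:
    """Decision test: is the stimulus value inside TaskValue +/- Limit?"""
    return value is not None and abs(value - task_value) <= limit


if __name__ == "__main__":
    tracker = ValueTracker()
    for r in (1.2, 0.4, 1.2):
        tracker.update("A", r)
    print(tracker.value("A"), within_limits(tracker.value("A")))
```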

At this stage, the model did not allow a return from repetition if the value of the chosen stimulus later fell outside the limits. Results show that although searches were short enough to mimic the monkeys' behavior, a large number of repetitions were made on the incorrect target (B) (Fig. 2D, model 'no return'). We therefore modified the test to evaluate continuously, even during repetition, whether the value of the chosen stimulus (A or B) stayed within the limits around the task value. If it did not, the alternative choice was made. The data (Fig. 2D, model 'return') show that under this condition both the duration of searches and the proportion of repetitions on B reach values similar to those obtained with monkeys (e.g. for Limit = 0.15, repetitions with B = 3.3%, mean search length = 4.14 trials). At this stage, the model reproduces the monkeys' performance in the choice task well.
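The 'no return' and 'return' variants can be combined into one session-level sketch. This is an illustrative Python reconstruction from the description above, not the authors' MATLAB code. The function name `run_model` and the `allow_return` flag are hypothetical, and two details are assumptions made for this sketch: the just-chosen stimulus is tested first when checking values during search, and after a return the model makes one forced choice of the alternative stimulus and then resumes the 80/50 large-keep/small-keep search.

```python
import random
from typing import Dict, List, Optional

LARGE, SMALL = 1.2, 0.4           # reward sizes (ml)
THRESHOLD = {"A": 0.3, "B": 0.7}  # a large reward is given when X >= threshold


def other(stim: str) -> str:
    return "B" if stim == "A" else "A"


def run_model(n_trials: int, task_value: float = 0.96, limit: float = 0.15,
              allow_return: bool = True, large_keep: float = 0.8,
              small_keep: float = 0.5, seed: int = 0) -> List[str]:
    """Simulate one session of the decision-test model; return the choices."""
    rng = random.Random(seed)
    total: Dict[str, float] = {"A": 0.0, "B": 0.0}
    count: Dict[str, int] = {"A": 0, "B": 0}

    def in_range(stim: str) -> bool:
        # Running average of all rewards with `stim`, tested against task_value +/- limit.
        return count[stim] > 0 and abs(total[stim] / count[stim] - task_value) <= limit

    committed: Optional[str] = None              # stimulus held during repetition
    choice = "A" if rng.random() < 0.5 else "B"  # first choice: pA = pB = 0.5
    choices: List[str] = []
    for _ in range(n_trials):
        choices.append(choice)
        reward = LARGE if rng.random() >= THRESHOLD[choice] else SMALL
        total[choice] += reward
        count[choice] += 1

        if committed is None:
            # Search phase: enter repetition once a stimulus value is in range.
            for stim in (choice, other(choice)):
                if in_range(stim):
                    committed = stim
                    break
        elif allow_return and not in_range(committed):
            # 'Return' variant: leave repetition and make the alternative choice.
            committed = None
            choice = other(choice)
            continue

        if committed is not None:
            choice = committed
        else:
            # Large-keep/small-keep rule during search.
            p_keep = large_keep if reward == LARGE else small_keep
            choice = choice if rng.random() < p_keep else other(choice)
    return choices


if __name__ == "__main__":
    for flag in (False, True):
        c = run_model(5000, allow_return=flag, seed=1)
        print(f"allow_return={flag}: share of A = {c.count('A') / len(c):.2f}")
```

With `allow_return=False`, an early run of large rewards on B can lock the sketch onto B for the rest of the session, the failure mode described for the 'no return' model; with `allow_return=True`, continued evaluation lets B's running average drift back toward 0.64 ml and out of range.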

Finally, we evaluated the influence of the task value (0.96) by replacing it with other values: the maximum reward (1.2), the average reward under random choice (0.8), and the average reward when B is chosen (0.64) (Fig. 2E). Results are presented in Fig. 1B. They show that the best performance is obtained with the real task value (0.96).
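A simple way to see why the real task value works best is to apply the decision test to the long-run mean rewards of the two stimuli (0.96 ml for A, 0.64 ml for B) for each candidate task value, with Limit = 0.15. This is an expected-value illustration under that assumption, not a reanalysis of the simulations: running averages fluctuate around these means, especially early in a search. The names below are hypothetical.

```python
from typing import Dict, List

EXPECTED: Dict[str, float] = {"A": 0.96, "B": 0.64}  # long-run mean reward (ml)
CANDIDATES: Dict[str, float] = {
    "maximum reward": 1.2,
    "task value": 0.96,
    "random-choice average": 0.8,
    "average when B is chosen": 0.64,
}


def stimuli_in_range(task_value: float, limit: float = 0.15) -> List[str]:
    """Stimuli whose long-run mean passes |value - task_value| <= limit."""
    return [stim for stim, v in EXPECTED.items() if abs(v - task_value) <= limit]


if __name__ == "__main__":
    for name, tv in CANDIDATES.items():
        print(f"{name} ({tv}): {stimuli_in_range(tv)}")
```

Under this simplification, only the real task value singles out A. With 1.2 or 0.8, neither stimulus passes in the long run, so any entry into repetition depends on transient fluctuations of the running averages; with 0.64, the test favors B.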

To summarize, the model shows that a decision test comparing each stimulus value with the task value is a key component for good performance in the choice task. This comparison must be made continuously to avoid perseveration on incorrect choices. The encoding of the task value throughout the repetition period in the monkeys' ACC supports the hypothesis that this is indeed the strategy used to solve the task.

References

Amiez C, Joseph JP, Procyk E. Anterior cingulate error-related activity is modulated by predicted reward. doi: 10.1111/j.1460-9568.2005.04170.x. in press.
Akkal D, Bioulac B, Audin J, Burbaud P. Comparison of neuronal activity in the rostral supplementary and cingulate motor areas during a task with cognitive and motor demands. Eur J Neurosci. 2002;15:887–904. doi: 10.1046/j.1460-9568.2002.01920.x.
Botvinick MM, Braver TS, Barch DM, Carter CS, Cohen JD. Conflict monitoring and cognitive control. Psychol Rev. 2001;108:624–652. doi: 10.1037/0033-295x.108.3.624.
Bush G, Vogt BA, Holmes J, Dale AM, Greve D, Jenike MA, Rosen BR. Dorsal anterior cingulate cortex: a role in reward-based decision making. Proc Natl Acad Sci U S A. 2002;99:523–528. doi: 10.1073/pnas.012470999.
Fiorillo CD, Tobler PN, Schultz W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science. 2003;299(5614):1898–1902. doi: 10.1126/science.1077349.
Gehring WJ, Willoughby AR. The medial frontal cortex and the rapid processing of monetary gains and losses. Science. 2002;295:2279–2282. doi: 10.1126/science.1066893.
Hadland KA, Rushworth MF, Gaffan D, Passingham RE. The anterior cingulate and reward-guided selection of actions. J Neurophysiol. 2003;89:1161–1164. doi: 10.1152/jn.00634.2002.
Hassani OK, Cromwell HC, Schultz W. Influence of expectation of different rewards on behavior-related neuronal activity in the striatum. J Neurophysiol. 2001;85(6):2477–2489. doi: 10.1152/jn.2001.85.6.2477.
Hikosaka K, Watanabe M. Delay activity of orbital and lateral prefrontal neurons of the monkey varying with different rewards. Cereb Cortex. 2000;10:263–271. doi: 10.1093/cercor/10.3.263.
Hoshi E, Sawamura H, Tanji J. Neurons in rostral cingulate motor area monitor multiple phases of visuomotor behavior with modest parametric selectivity. J Neurophysiol. 2005 doi: 10.1152/jn.01201.2004. in press.
Itoh H, Nakahara H, Hikosaka O, Kawagoe R, Takikawa Y, Aihara Y. Correlation of primate caudate neural activity and saccade parameters in reward-oriented behavior. J Neurophysiol. 2003;89:1774–1783. doi: 10.1152/jn.00630.2002.
Judge SJ, Richmond BJ, Chu FC. Implantation of magnetic search coils for measurement of eye position: an improved method. Vision Res. 1980;20:535–538. doi: 10.1016/0042-6989(80)90128-5.
Koyama T, Kato K, Tanaka YZ, Mikami A. Anterior cingulate activity during pain-avoidance and reward tasks in monkeys. Neurosci Res. 2001;39(4):421–430. doi: 10.1016/s0168-0102(01)00197-3.
Krawczyk DC. Contributions of the prefrontal cortex to the neural basis of human decision making. Neurosci and Biobehav Rev. 2002;26:631–664. doi: 10.1016/s0149-7634(02)00021-0.
Leon MI, Shadlen MN. Effect of expected reward magnitude on the response of neurons in the dorsolateral prefrontal cortex of the macaque. Neuron. 1999;24:415–425. doi: 10.1016/s0896-6273(00)80854-5.
Matsumoto M, Suzuki W, Tanaka K. Neuronal correlates of goal-based motor selection in the prefrontal cortex. Science. 2003;301:229–232. doi: 10.1126/science.1084204.
McCoy AN, Crowley JC, Haghighian G, Dean HL, Platt ML. Saccade reward signals in posterior cingulate cortex. Neuron. 2003;40:1031–1040. doi: 10.1016/s0896-6273(03)00719-0.
McCoy AN, Platt ML. Expectations and outcomes: decision-making in the primate brain. J Comp Physiol A. 2003;191:201–211. doi: 10.1007/s00359-004-0565-9.
Mitz AR, Godschalk M, Wise SP. Learning-dependent neuronal activity in the premotor cortex: activity during the acquisition of conditional motor associations. J Neurosci. 1991;11:1855–1872. doi: 10.1523/JNEUROSCI.11-06-01855.1991.
Morecraft RJ, Van Hoesen GW. Convergence of limbic input to the cingulate motor cortex in the rhesus monkey. Brain Res Bull. 1998;45:209–232. doi: 10.1016/s0361-9230(97)00344-4.
Morecraft RJ, Geula C, Mesulam MM. Architecture of connectivity within a cingulo-fronto-parietal neurocognitive network for directed attention. Arch Neurol. 1993;50:279–284. doi: 10.1001/archneur.1993.00540030045013.
Passingham RE. Attention to action. Philos Trans R Soc Lond B Biol Sci. 1996;351:1473–1480. doi: 10.1098/rstb.1996.0132.
Platt ML, Glimcher PW. Neural correlates of decision variables in parietal cortex. Nature. 1999;400:233–238. doi: 10.1038/22268.
Porrino LJ, Crane AM, Goldman-Rakic PS. Direct and indirect pathways from the amygdala to the frontal lobe in rhesus monkeys. J Comp Neurol. 1981;198:121–136. doi: 10.1002/cne.901980111.
Posner MI, Petersen SE, Fox PT, Raichle ME. Localization of cognitive operations in the human brain. Science. 1988;240:1627–1631. doi: 10.1126/science.3289116.
Procyk E, Tanaka YL, Joseph JP. Anterior cingulate activity during routine and non-routine sequential behaviors in macaques. Nat Neurosci. 2000;3:502–508. doi: 10.1038/74880.
Procyk E, Joseph JP. Characterization of serial order encoding in the monkey anterior cingulate sulcus. Eur J Neurosci. 2001;14:1041–1046. doi: 10.1046/j.0953-816x.2001.01738.x.
Rushworth MF, Walton ME, Kennerley SW, Bannerman DM. Action sets and decisions in the medial frontal cortex. Trends Cogn Sci. 2004;8(9):410–417. doi: 10.1016/j.tics.2004.07.009.
Schultz W. Multiple reward signals in the brain. Nat Rev Neurosci. 2000;1:199–207. doi: 10.1038/35044563.
Sugrue LP, Corrado GS, Newsome WT. Matching behavior and the representation of value in the parietal cortex. Science. 2004;304(5678):1782–1787. doi: 10.1126/science.1094765.
Shidara M, Richmond BJ. Anterior cingulate: single neuronal signals related to degree of reward expectancy. Science. 2002;296:1709–1711. doi: 10.1126/science.1069504.
Shima K, Tanji J. Role for cingulate motor area cells in voluntary movement selection based on reward. Science. 1998;282:1335–1338. doi: 10.1126/science.282.5392.1335.
Tremblay L, Schultz W. Relative reward preference in primate orbitofrontal cortex. Nature. 1999;398:704–708. doi: 10.1038/19525.
Tremblay L, Hollerman JR, Schultz W. Modifications of reward expectation-related neuronal activity during learning in primate striatum. J Neurophysiol. 1998;80:964–977. doi: 10.1152/jn.1998.80.2.964.
Turken AU, Swick D. Response selection in the human anterior cingulate cortex. Nat Neurosci. 1999;2(10):920–924. doi: 10.1038/13224.
Van Veen V, Carter CS. The anterior cingulate as a conflict monitor: fMRI and ERP studies. Physiol Behav. 2002;77:477–482. doi: 10.1016/s0031-9384(02)00930-7.
Wallis JD, Miller EK. Neuronal activity in primate dorsolateral and orbital prefrontal cortex during performance of a reward preference task. Eur J Neurosci. 2003;18(7):2069–2081. doi: 10.1046/j.1460-9568.2003.02922.x.
Walton ME, Devlin JT, Rushworth M. Interactions between decision making and performance monitoring within prefrontal cortex. Nat Neurosci. 2004;7(11):1259–1265. doi: 10.1038/nn1339.
Watanabe M. Reward expectancy in primate prefrontal neurons. Nature. 1996;382:629–632. doi: 10.1038/382629a0.
Watanabe M, Hikosaka K, Sakagami M, Shirakawa S. Coding and monitoring of motivational context in the primate prefrontal cortex. J Neurosci. 2002;22:2391–2400. doi: 10.1523/JNEUROSCI.22-06-02391.2002.