Model-based and model-free Pavlovian reward learning: revaluation, revision, and revelation
Review
Peter Dayan et al. Cogn Affect Behav Neurosci. 2014 Jun.
Abstract
Evidence supports at least two methods for learning about reward and punishment and making predictions for guiding actions. One method, called model-free, progressively acquires cached estimates of the long-run values of circumstances and actions from retrospective experience. The other method, called model-based, uses representations of the environment, expectations, and prospective calculations to make cognitive predictions of future value. Extensive attention has been paid to both methods in computational analyses of instrumental learning. By contrast, although a full computational analysis has been lacking, Pavlovian learning and prediction have typically been presumed to be solely model-free. Here, we revise that presumption and review compelling evidence from Pavlovian revaluation experiments showing that Pavlovian predictions can involve their own form of model-based evaluation. In model-based Pavlovian evaluation, prevailing states of the body and brain influence value computations, and thereby produce powerful incentive motivations that can sometimes be quite new. We consider the consequences of this revised Pavlovian view for the computational landscape of prediction, response, and choice. We also revisit differences between Pavlovian and instrumental learning in the control of incentive motivation.
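The contrast between cached and prospective evaluation can be illustrated with a minimal sketch. It is not the authors' model: the delta-rule update, the learning rate `alpha`, the utility table `OUTCOME_UTILITY`, and the state names are all assumptions chosen for illustration. The cached (model-free) value reflects only past experience, whereas the model-based value is recomputed from the predicted outcome and the current bodily state.

```python
# Hypothetical utilities of a salty taste in two physiological states.
# The numbers are illustrative assumptions, not data from the review.
OUTCOME_UTILITY = {"salt": {"normal": -1.0, "salt_appetite": 1.0}}


def cached_value(experienced_rewards, alpha=0.2):
    """Model-free: delta-rule update of a cached cue value from past outcomes."""
    value = 0.0
    for reward in experienced_rewards:
        value += alpha * (reward - value)
    return value


def model_based_value(cs, state, cs_to_outcome):
    """Model-based: predict the outcome identity, then evaluate it in the current state."""
    outcome = cs_to_outcome[cs]
    return OUTCOME_UTILITY[outcome][state]


# Training: the cue is paired 20 times with salt while the animal is in the normal state.
training = [OUTCOME_UTILITY["salt"]["normal"]] * 20
v_cached = cached_value(training)

# Test: the state changes to salt appetite; the cue is shown without the outcome.
v_model = model_based_value("lever", "salt_appetite", {"lever": "salt"})

print(v_cached < 0, v_model > 0)
```

In this sketch the cached value stays negative after the state change because nothing new has been experienced, while the model-based value is positive on the first test, which is the pattern that revaluation experiments are designed to detect.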
Figures
Figure 1
A summary comparison of computational approaches to reward learning. Columns distinguish the two chief approaches in the computational literature: model-based versus model-free. Rows show the potential application of those approaches to instrumental versus Pavlovian forms of reward learning (or, equivalently, to punishment or threat learning). We suggest that the Pavlovian model-based cell (colored at lower left) has hitherto been comparatively neglected, as computational approaches have tended to treat Pavlovian learning as purely model-free. However, evidence indicates that model-based Pavlovian learning occurs and is used for mesolimbic-mediated instant transformations of motivational value. By contrast, instrumental model-based systems that model the value of an outcome based on memory of its hedonic experience may need to retaste or re-experience the outcome after revaluation in order to update the model (see text for discussion and alternatives). Each cell contains a) a brief description of its characteristic computation, b) an example of behavioral or neural demonstrations in the experimental literature, and c) a distinguishing feature by which it can be recognized in behavioral or neural experimental findings. Citations: 1) (Dickinson & Balleine, 2010); 2) (Daw et al., 2005); 3) (Robinson & Berridge, 2013); 4) (Schultz et al., 1997).
Figure 2
Instant transformation of CS incentive salience observed in the Dead Sea salt study (Robinson & Berridge, 2013). Initial aversive Pavlovian training of the CS+ with a disgusting UCS taste produces gradually learned repulsion: the CS+ value becomes progressively more negative over successive pairings with the NaCl UCS (learned Pavlovian values). After training, sudden hormone injections induce a novel state of salt appetite. The CS value is transformed instantly to positive on the very first re-encounter in the new appetite state (CS+ presented alone in the crucial test, without the salty UCS being retasted). Behaviorally, rats approach and nibble the CS+ lever that was previously associated with the disgusting NaCl taste as avidly as a different CS previously associated with a pleasant sucrose UCS. Neurobiologically, mesolimbic brain activations were observed during the combination of CS+ re-encounter and novel appetite state in dopamine-related structures: ventral tegmentum, nucleus accumbens, prefrontal cortex, etc. The quantitative transformation depicted is based on the computational model of incentive salience of Zhang et al. (2009). Figure modified from (Robinson & Berridge, 2013) and (Zhang et al., 2009).
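The sign reversal described in this caption can be sketched with a state-dependent gain on a learned reward value. The form below is an assumption about how such a model might be written, not a reproduction of the figure: a physiological gain `kappa` modulates a remembered reward `r` either additively, as `r + log(kappa)`, or multiplicatively, as `kappa * r`, and the result is added to a discounted future value. The function name, parameter names, and numbers are hypothetical.

```python
import math


def incentive_value(r, kappa, gamma=0.9, next_value=0.0, additive=True):
    """Cue-triggered value with a state-dependent gain kappa > 0 (illustrative form).

    additive=True  : modulated reward is r + log(kappa)
    additive=False : modulated reward is kappa * r
    """
    r_tilde = r + math.log(kappa) if additive else kappa * r
    return r_tilde + gamma * next_value


learned_r = -1.0                      # learned aversive value of the salty UCS (assumed)

baseline = incentive_value(learned_r, kappa=1.0)             # normal state
appetite = incentive_value(learned_r, kappa=math.exp(2.0))   # salt appetite (assumed gain)
scaled = incentive_value(learned_r, kappa=5.0, additive=False)

print(baseline < 0, appetite > 0, scaled < 0)
```

Under these assumptions only the additive form can turn a negative learned value positive; a positive multiplicative gain rescales the value but cannot change its sign.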
References
- Anson JE, Bender L, Melvin KB. Sources of reinforcement in the establishment of self-punitive behavior. Journal of Comparative and Physiological Psychology. 1969;67(3):376–380.
- Balleine BW. Asymmetrical interactions between thirst and hunger in Pavlovian-instrumental transfer. Quarterly Journal of Experimental Psychology B, Comparative & Physiological Psychology. 1994;47(2):211–231.
- Balleine BW. Neural bases of food-seeking: affect, arousal and reward in corticostriatolimbic circuits. Physiology and Behavior. 2005;86(5):717–730.
- Balleine BW, Dickinson A. Instrumental performance following reinforcer devaluation depends upon incentive learning. The Quarterly Journal of Experimental Psychology. 1991;43(3):279–296.
- Balleine BW, Garner C, Gonzalez F, Dickinson A. Motivational control of heterogeneous instrumental chains. Journal of Experimental Psychology: Animal Behavior Processes. 1995;21(3):203.