Learning behaviour in a two alternative decision task in primate follows a linear single free parameter probability matching law
Abstract
It is now widely believed that decisions are guided by a small number of internal subjective variables that determine choice preference, and that learning manifests as a change in the state of these variables. Finding the neural correlates of these variables is difficult, in particular because their state cannot be directly measured or controlled by the experimenter; rather, it reflects the history of the subject’s actions and reward experience. We therefore seek to construct a behavioral model that captures the dynamics of learning and decision making, so that the internal variables of the model can serve as a proxy for the subjective variables. We use reinforcement learning theory to find the behavioral model that best captures the learning dynamics of monkeys in a two-armed bandit reward schedule. We consider two families of learning algorithms: value function estimation and direct policy optimization. In the former, the values of the alternative ...
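To make the distinction between the two families concrete, here is a minimal sketch of one learner from each family on a simulated two-armed bandit. The abstract is truncated, so none of this should be read as the authors' models. The delta-rule value update with a softmax choice rule, the REINFORCE-style update of a logistic choice probability, the parameter names `alpha`, `beta` and `eta`, the helper function names, and the reward probabilities used below are all illustrative assumptions.

```python
import math
import random


def sigmoid(x):
    """Logistic function; for two arms a softmax over values reduces to this."""
    return 1.0 / (1.0 + math.exp(-x))


def run_value_learner(reward_probs, n_trials, alpha=0.1, beta=3.0, seed=0):
    """Hypothetical value-function learner: delta-rule value estimates plus softmax choice.

    Returns the final value estimates q and the list of chosen arms (0 or 1).
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]  # estimated value of each arm (assumed to start at zero)
    choices = []
    for _ in range(n_trials):
        # Choice depends only on the difference of estimated values.
        p1 = sigmoid(beta * (q[1] - q[0]))
        arm = 1 if rng.random() < p1 else 0
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        # Only the chosen arm's estimate moves toward the obtained reward.
        q[arm] += alpha * (reward - q[arm])
        choices.append(arm)
    return q, choices


def run_policy_learner(reward_probs, n_trials, eta=0.1, seed=0):
    """Hypothetical direct policy learner: REINFORCE on P(arm 1) = sigmoid(theta).

    No values are stored; the choice probability itself is the learned variable.
    Returns the final P(arm 1) and the list of chosen arms.
    """
    rng = random.Random(seed)
    theta = 0.0  # single policy parameter; P(arm 1) starts at 0.5
    choices = []
    for _ in range(n_trials):
        p1 = sigmoid(theta)
        arm = 1 if rng.random() < p1 else 0
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        # d/dtheta log pi(arm) equals (arm - p1) for this logistic policy.
        theta += eta * reward * (arm - p1)
        choices.append(arm)
    return sigmoid(theta), choices
```

For example, `run_value_learner((0.3, 0.7), 2000)` and `run_policy_learner((0.3, 0.7), 2000)` both come to prefer the richer arm, but through different internal variables: the first keeps a value per alternative, while the second keeps only a choice propensity. That difference in internal state is what makes the two families candidate proxies for different subjective variables.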