Doina Precup | McGill University

Papers by Doina Precup

Research paper thumbnail of Quantifying the determinants of outbreak detection performance through simulation and machine learning

•We developed a model for quantifying determinants of outbreak detection performance.
•We used Bayesian networks to model relations between outbreak and algorithm characteristics and detection performance.
•We used the model to predict detection performance for different outbreak scenarios.
•The model can provide a quantitative evaluation of new methods and data in biosurveillance systems (a simplified prediction sketch follows below).
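Since the page only gives the abstract bullets, the following Python sketch is a purely illustrative stand-in for the paper's Bayesian network: it estimates detection probability for an outbreak scenario directly from simulated surveillance runs. The variable names (`magnitude`, `algorithm`, `detected`) and the table-based estimator are assumptions, not taken from the paper.

```python
from collections import defaultdict

def fit_detection_table(records):
    """Estimate P(detected | outbreak characteristics, algorithm) from
    simulated surveillance runs. Illustrative stand-in for the paper's
    Bayesian network; variable names and structure are assumptions.

    records: iterable of dicts with keys 'magnitude', 'algorithm',
             and 'detected' (bool).
    """
    counts = defaultdict(lambda: [0, 0])   # (magnitude, algorithm) -> [hits, runs]
    for r in records:
        key = (r["magnitude"], r["algorithm"])
        counts[key][0] += int(r["detected"])
        counts[key][1] += 1
    return {key: hits / runs for key, (hits, runs) in counts.items()}

# Example: predict detection performance for a new outbreak scenario.
runs = [
    {"magnitude": "large", "algorithm": "cusum", "detected": True},
    {"magnitude": "large", "algorithm": "cusum", "detected": True},
    {"magnitude": "small", "algorithm": "cusum", "detected": False},
    {"magnitude": "small", "algorithm": "cusum", "detected": True},
]
table = fit_detection_table(runs)
print(table[("small", "cusum")])  # -> 0.5
```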

Research paper thumbnail of Sparse Distributed Memories for On-Line Value-Based Reinforcement Learning

In this paper, we advocate the use of Sparse Distributed Memories (SDMs) for on-line, value-based reinforcement learning (RL). SDMs provide a linear, local function approximation scheme, designed to work when a very large, high-dimensional input (address) space has to be mapped into a much smaller physical memory. We present an implementation of the SDM architecture for on-line, value-based RL in continuous state spaces. An important contribution of this paper is an algorithm for dynamic on-line allocation and adjustment of memory resources for SDMs, which eliminates the need for choosing the memory size and structure a priori. In our experiments, this algorithm provides very good performance while efficiently managing the memory resources.
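The abstract gives no pseudocode, so the sketch below only illustrates the general idea of an SDM-style local approximator with dynamic allocation. The class name, the triangular kernel, and the thresholds (`radius`, `min_active`) are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

class SparseDistributedMemory:
    """Minimal sketch of an SDM-style local value approximator.

    Locations are prototype states; a query activates every location
    within `radius`, and the value estimate is the similarity-weighted
    average of the activated locations' values. If fewer than
    `min_active` locations respond, a new location is allocated at the
    query point (dynamic allocation).
    """

    def __init__(self, radius=0.2, min_active=3, alpha=0.1):
        self.radius = radius
        self.min_active = min_active
        self.alpha = alpha            # learning rate for value updates
        self.locations = []           # prototype states (np arrays)
        self.values = []              # one scalar value per location

    def _activations(self, state):
        acts = []
        for i, loc in enumerate(self.locations):
            d = np.linalg.norm(state - loc)
            if d <= self.radius:
                acts.append((i, 1.0 - d / self.radius))  # triangular kernel
        return acts

    def predict(self, state):
        acts = self._activations(np.asarray(state, dtype=float))
        if not acts:
            return 0.0
        total = sum(w for _, w in acts)
        return sum(self.values[i] * w for i, w in acts) / total

    def update(self, state, target):
        state = np.asarray(state, dtype=float)
        acts = self._activations(state)
        if len(acts) < self.min_active:          # allocate a new location here
            self.locations.append(state.copy())
            self.values.append(target)
            acts = self._activations(state)
        total = sum(w for _, w in acts)
        error = target - self.predict(state)
        for i, w in acts:                         # local, weighted update
            self.values[i] += self.alpha * (w / total) * error
```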

Research paper thumbnail of Automatic basis function construction for approximate dynamic programming and reinforcement learning

Research paper thumbnail of Redagent: winner of TAC SCM 2003

Research paper thumbnail of RedAgent-2003: An Autonomous Market-Based Supply-Chain Management Agent

Research paper thumbnail of Characterizing Markov Decision Processes

Problem characteristics often have a significant influence on the difficulty of solving optimization problems. In this paper, we propose attributes for characterizing Markov Decision Processes (MDPs), and discuss how they affect the performance of reinforcement learning algorithms that use function approximation. The attributes measure mainly the amount of randomness in the environment. Their values can be calculated from the MDP model or estimated on-line. We show empirically that two of the proposed attributes have a statistically significant effect on the quality of learning. We discuss how measurements of the proposed MDP attributes can be used to facilitate the design of reinforcement learning systems.
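As one illustration of an "amount of randomness" attribute computed from a tabular MDP model, the sketch below takes the mean entropy of the next-state distribution over all state-action pairs. The specific statistic and its name are assumptions, not necessarily the attributes proposed in the paper.

```python
import numpy as np

def mean_transition_entropy(P):
    """Average entropy (in bits) of the next-state distribution over
    all state-action pairs.

    P has shape (S, A, S), with P[s, a, s'] the probability of moving
    to s' after taking action a in state s. Higher values indicate a
    more stochastic environment. Illustrative attribute only.
    """
    P = np.asarray(P, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        logP = np.where(P > 0, np.log2(P), 0.0)
    entropies = -(P * logP).sum(axis=-1)      # entropy per (s, a) pair
    return float(entropies.mean())

# Example: a 2-state, 1-action MDP where every transition is a coin flip.
P = np.array([[[0.5, 0.5]],
              [[0.5, 0.5]]])
print(mean_transition_entropy(P))  # -> 1.0 bit
```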

Research paper thumbnail of Metrics for Finite Markov Decision Processes

Research paper thumbnail of Learning in non-stationary Partially Observable Markov Decision Processes

Research paper thumbnail of Active Learning in Partially Observable Markov Decision Processes

Research paper thumbnail of A formal framework for robot learning and control under model uncertainty

Research paper thumbnail of Eligibility Traces for Off-Policy Policy Evaluation

Research paper thumbnail of Temporal Abstraction in Reinforcement Learning

Decision making usually involves choosing among different courses of action over a broad range of time scales. For instance, a person planning a trip to a distant location makes high-level decisions regarding what means of transportation to use, but also chooses low-level actions, such ...

Research paper thumbnail of Classification Using Phi-Machines and Constructive Function Approximation

Research paper thumbnail of Learning Options in Reinforcement Learning

Temporally extended actions (e.g., macro actions) have proven very useful for speeding up learning, ensuring robustness and building prior knowledge into AI systems. The options framework (Precup, 2000; Sutton, Precup & Singh, 1999) provides a natural way of incorporating such actions into reinforcement learning systems, but leaves open the issue of how good options might be identified. In this paper, we empirically explore a simple approach to creating options. The underlying assumption is that the agent will be asked to perform different goal-achievement tasks in an environment that is otherwise the same over time. Our approach is based on the intuition that states that are frequently visited on system trajectories could prove to be useful subgoals (e.g., McGovern & Barto, 2001; Iba, 1989). We propose a greedy algorithm for identifying subgoals based on state visitation counts. We present empirical studies of this approach in two gridworld navigation tasks. One of the environments we explored contains bottleneck states, and the algorithm indeed finds these states, as expected. The second environment is an empty gridworld with no obstacles. Although the environment does not contain any obvious subgoals, our approach still finds useful options, which essentially allow the agent to explore the environment more quickly.
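A minimal sketch of the count-based idea, assuming trajectories are available as lists of states. The function name and the greedy "top-k most visited" criterion are simplifications; the paper's exact greedy rule may differ.

```python
from collections import Counter

def find_subgoals(trajectories, num_subgoals=2, exclude_terminal=True):
    """Greedy subgoal identification from state visitation counts.

    trajectories: list of state sequences (one per episode).
    Returns the `num_subgoals` most frequently visited states, which
    can then serve as option termination (subgoal) states.
    """
    counts = Counter()
    for traj in trajectories:
        states = traj[:-1] if exclude_terminal else traj
        counts.update(states)
    return [state for state, _ in counts.most_common(num_subgoals)]

# Example: two gridworld trajectories that share a "doorway" state (2, 3).
trajectories = [
    [(0, 0), (1, 0), (2, 0), (2, 3), (4, 4)],
    [(0, 4), (1, 4), (2, 4), (2, 3), (4, 4)],
]
print(find_subgoals(trajectories, num_subgoals=1))  # -> [(2, 3)]
```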

Research paper thumbnail of Using Options for Knowledge Transfer in Reinforcement Learning

... Our model can also be viewed as a partially observable Markov decision problem (POMDP), with a special structure that we describe. ... Although solving POMDPs is very difficult, we incorporate ideas from POMDP theory in our algorithm. ...

Research paper thumbnail of Intra-Option Learning about Temporally Abstract Actions

Research paper thumbnail of Theoretical Results on Reinforcement Learning with Temporally Abstract Options

We present new theoretical results on planning within the framework of temporally abstract reinforcement learning (Precup & Sutton, 1997; Sutton, 1995). Temporal abstraction is a key step in any decision making system that involves planning and prediction. In temporally abstract reinforcement learning, the agent is allowed to choose among "options", whole courses of action that may be temporally extended, stochastic, and contingent on previous events. Examples of options include closed-loop policies such as picking up an object, as well as primitive actions such as joint torques. Knowledge about the consequences of options is represented by special structures called multi-time models. In this paper we focus on the theory of planning with multi-time models. We define new Bellman equations that are satisfied for sets of multi-time models. As a consequence, multi-time models can be used interchangeably with models of primitive actions in a variety of well-known planning methods including value iteration, policy improvement and policy iteration.
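For reference, one standard way to write the Bellman optimality equation over a set of options (multi-time models), following the options literature; the notation below is illustrative rather than copied from the paper.

```latex
% Bellman optimality equation over a set of options \mathcal{O}
% (standard options/multi-time-model form; notation is illustrative).
% r(s,o): expected cumulative discounted reward while o runs from s.
% p(s'\mid s,o): discounted probability that o terminates in s'
%                (the discount factor is folded into the model).
\begin{equation}
  V^{*}_{\mathcal{O}}(s) \;=\;
  \max_{o \in \mathcal{O}_s}
  \Bigl[\, r(s,o) \;+\; \sum_{s'} p(s' \mid s, o)\, V^{*}_{\mathcal{O}}(s') \Bigr]
\end{equation}
```

Because the option model plays the same role as a one-step action model here, such equations can be plugged into value iteration or policy iteration unchanged, which is the interchangeability property the abstract describes.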

Research paper thumbnail of Improved Switching among Temporally Abstract Actions

Research paper thumbnail of Between MDPs and Semi-MDPs: Learning, Planning, and Representing Knowledge at Multiple Temporal Scales

Artificial Intelligence, 1998

Research paper thumbnail of Planning with Closed-Loop Macro Actions
