Doina Precup | McGill University
Papers by Doina Precup
We formalize the notion of a pseudo-ensemble, a (possibly infinite) collection of child models spawned from a parent model by perturbing it according to some noise process. For example, dropout (Hinton et al., 2012) in a deep neural network trains a pseudo-ensemble of child subnetworks generated by randomly masking nodes in the parent network. We present a novel regularizer based on making the behavior of a pseudo-ensemble robust with respect to the noise process generating it. In the fully-supervised setting, our regularizer matches the performance of dropout. Unlike dropout, however, our regularizer extends naturally to the semi-supervised setting, where it produces state-of-the-art results. We provide a case study in which we transform the Recursive Neural Tensor Network of Socher et al. (2013) into a pseudo-ensemble, which significantly improves its performance on a real-world sentiment analysis benchmark.
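The pseudo-ensemble idea above can be illustrated with a minimal sketch. It assumes a tiny one-hidden-layer network, dropout-style masking as the noise process, and a mean squared deviation between child and parent outputs as the robustness measure. The function names and the choice of penalty are hypothetical illustrations, not the regularizer defined in the paper.

```python
import math
import random


def parent_forward(x, w_hidden, w_out, mask=None):
    """One-hidden-layer network; a mask over hidden units spawns a child model."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    if mask is not None:
        hidden = [h * m for h, m in zip(hidden, mask)]
    return sum(wo * h for wo, h in zip(w_out, hidden))


def sample_mask(n_hidden, keep_prob, rng):
    """Dropout-style noise: keep a unit with probability keep_prob, rescaled by 1/keep_prob."""
    return [(1.0 / keep_prob) if rng.random() < keep_prob else 0.0
            for _ in range(n_hidden)]


def pseudo_ensemble_penalty(x, w_hidden, w_out, keep_prob=0.5, n_children=100, seed=0):
    """Average squared gap between sampled child outputs and the parent output.

    Assumption: this simple output-variance measure stands in for a
    robustness-to-noise penalty; it is not the paper's exact regularizer.
    """
    rng = random.Random(seed)
    parent = parent_forward(x, w_hidden, w_out)
    total = 0.0
    for _ in range(n_children):
        mask = sample_mask(len(w_hidden), keep_prob, rng)
        child = parent_forward(x, w_hidden, w_out, mask)
        total += (child - parent) ** 2
    return total / n_children
```

Because the penalty needs only inputs and not labels, a quantity of this kind can be computed on unlabeled examples, which is the property that makes the semi-supervised extension possible.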
Proceedings of the AAAI Conference on Artificial Intelligence
Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time. The options framework describes such behaviours as consisting of a subset of states in which they can initiate, an internal policy and a stochastic termination condition. However, much of the subsequent work on option discovery has ignored the initiation set, because of difficulty in learning it from data. We provide a generalization of initiation sets suitable for general function approximation, by defining an interest function associated with an option. We derive a gradient-based learning algorithm for interest functions, leading to a new interest-option-critic architecture. We investigate how interest functions can be leveraged to learn interpretable and reusable temporal abstractions. We demonstrate the efficacy of the proposed approach through quantitative and qualitative results, in both discrete and continuous environments.
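One way to picture how an interest function generalizes an initiation set is sketched below. It assumes that non-negative interest values reweight a policy over options and that the result is renormalized. This reweighting scheme and the function name are assumptions made for illustration; the abstract itself does not specify the mechanism.

```python
def interest_weighted_policy(option_probs, interests):
    """Reweight a policy over options by per-option interest in the current state.

    option_probs: probabilities the base policy assigns to each option.
    interests:    non-negative interest of each option in this state
                  (hypothetical values; a 0/1 interest acts like an initiation set).
    """
    weighted = [p * i for p, i in zip(option_probs, interests)]
    total = sum(weighted)
    if total == 0.0:
        raise ValueError("no option has positive interest in this state")
    return [w / total for w in weighted]
```

With binary interests this reduces to restricting the choice to options whose initiation set contains the state, while fractional values give a differentiable relaxation of that restriction.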
Lecture Notes in Computer Science, 2014
We consider the problem of learning in dynamical systems with hidden state. This problem is challenging because the state is not completely visible to an outside observer. We explore a candidate algorithm, which we call the Merge-Split algorithm, for learning deterministic automata with observations. It is based on the work of Gavaldà et al. (2006), which approximates a given Hidden Markov Model (HMM) with a learned Probabilistic Deterministic Finite Automaton (PDFA).
Planning and learning at multiple levels of temporal abstraction is a key problem for artificial intelligence. In this paper we summarize an approach to this problem based on the mathematical framework of Markov decision processes and reinforcement learning. Conventional model-based reinforcement learning uses primitive actions that last one time step and that can be modeled independently of the learning agent.
International Conference on Machine Learning, 2000
Eligibility traces have been shown to speed reinforcement learning, to make it more robust to hidden states, and to provide a link between Monte Carlo and temporal-difference methods. Here we generalize eligibility traces to off-policy learning, in which one learns about a policy different from the policy that generates the data. Off-policy methods can greatly multiply learning,
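A rough sketch of how eligibility traces can be combined with off-policy correction follows. It assumes a tabular value function and one common textbook form in which the trace is scaled by the per-step importance sampling ratio `rho` (target-policy probability divided by behaviour-policy probability). This is an illustrative variant with hypothetical names, not necessarily the algorithm proposed in the paper.

```python
def off_policy_td_lambda_episode(values, episode, gamma=0.9, lam=0.8, alpha=0.1):
    """Run one episode of tabular TD(lambda) with importance-weighted traces.

    values:  dict mapping state -> estimated value (updated in place).
    episode: list of (state, rho, reward, next_state) tuples, with
             next_state set to None when the episode terminates.
    """
    traces = {s: 0.0 for s in values}
    for state, rho, reward, next_state in episode:
        next_value = 0.0 if next_state is None else values[next_state]
        td_error = reward + gamma * next_value - values[state]
        # Decay all traces, mark the visited state, then apply the
        # importance ratio so transitions unlikely under the target
        # policy contribute less (rho = 0 cuts the trace entirely).
        for s in traces:
            traces[s] *= gamma * lam
        traces[state] += 1.0
        for s in traces:
            traces[s] *= rho
        for s in values:
            values[s] += alpha * td_error * traces[s]
    return values
```

Setting every `rho` to 1 recovers ordinary on-policy TD(lambda) with accumulating traces.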
International Conference on Machine Learning, 2001
We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms the basis for popular reinforcement learning methods such as Q-learning, which has been known to diverge with linear function approximation, and because it is critical to the practical utility of multi-scale, multi-goal, learning frameworks such
International Conference on Machine Learning, 1998
Several researchers have proposed modeling temporally abstract actions in reinforcement learning by the combination of a policy and a termination condition, which we refer to as an option. Value functions over options and models of options can be learned using methods designed for semi-Markov decision processes (SMDPs). However, all these methods require an option to be executed to termination. In this
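The SMDP-style learning mentioned above, which updates only once an option has run to termination, can be sketched as follows. The sketch assumes tabular option values stored in nested dictionaries and the standard SMDP Q-learning target; the function name and data layout are hypothetical.

```python
def smdp_q_update(q, state, option, rewards, next_state, gamma=0.9, alpha=0.5):
    """Apply one SMDP Q-learning update after an option has terminated.

    q:          dict mapping state -> dict mapping option -> value.
    rewards:    rewards collected on each step while the option ran.
    next_state: state where the option terminated (a state absent from q
                is treated as terminal with value 0).
    """
    k = len(rewards)
    # Discounted return accumulated over the k steps the option executed.
    discounted_return = sum((gamma ** i) * r for i, r in enumerate(rewards))
    bootstrap = max(q[next_state].values()) if next_state in q else 0.0
    target = discounted_return + (gamma ** k) * bootstrap
    q[state][option] += alpha * (target - q[state][option])
    return q[state][option]
```

Note that nothing is learned until the whole reward sequence is available, which is the limitation of SMDP methods that the abstract points out.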
Neural Information Processing Systems, 1997
Planning and learning at multiple levels of temporal abstraction is a key problem for artificial intelligence. In this paper we summarize an approach to this problem based on the mathematical framework of Markov decision processes and reinforcement learning. Current model-based reinforcement learning is based on one-step models that cannot represent common-sense higher-level actions, such as
Medical image computing and computer-assisted intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention, 2013
In this paper, we present a fully automated hierarchical probabilistic framework for segmenting brain tumours from multispectral human brain magnetic resonance images (MRIs) using multiwindow Gabor filters and an adapted Markov Random Field (MRF) framework. In the first stage, a customised Gabor decomposition is developed, based on the combined-space characteristics of the two classes (tumour and non-tumour) in multispectral brain MRIs in order to optimally separate tumour (including edema) from healthy brain tissues. A Bayesian framework then provides a coarse probabilistic texture-based segmentation of tumours (including edema) whose boundaries are then refined at the voxel level through a modified MRF framework that carefully separates the edema from the main tumour. This customised MRF is not only built on the voxel intensities and class labels as in traditional MRFs, but also models the intensity differences between neighbouring voxels in the likelihood model, along with employ...
Lecture Notes in Computer Science, 2014
2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014
Multiagent based Supply Chain Management, 2006
Lecture Notes in Computer Science, 2014
We transfer a notion of quantitative bisimilarity for labelled Markov processes [1] to Markov decision processes with continuous state spaces. This notion takes the form of a pseudometric on the system states, cast in terms of the equivalence of a family of functional expressions evaluated on those states and interpreted as a real-valued modal logic. Our proof amounts to a slight modification of previous techniques [2,3] used to prove equivalence with a fixed-point pseudometric on the state-space of a labelled Markov process and making heavy use of the Kantorovich probability metric. Indeed, we again demonstrate equivalence with a fixed-point pseudometric defined on Markov decision processes [4]; what is novel is that we recast this proof in terms of integral probability metrics [5] defined through the family of functional expressions, shifting emphasis back to properties of such families. The hope is that a judicious choice of family might lead to something more computationally tractable than bisimilarity whilst maintaining its pleasing theoretical guarantees. Moreover, we use a trick from descriptive set theory to extend our results to MDPs with bounded measurable reward functions, dropping a previous continuity constraint on rewards and Markov kernels.
2014 IEEE International Conference on Image Processing (ICIP), 2014
Lecture Notes in Computer Science, 2009
We propose a new approach for estimating the difference between two partially observable dynamical systems. We assume that one can interact with the systems by performing actions and receiving observations. The key idea is to define a Markov Decision Process (MDP) based on the systems to be compared, in such a way that the optimal value of the MDP initial state can be interpreted as a divergence (or dissimilarity) between the systems. This dissimilarity can then be estimated by reinforcement learning methods. Moreover, the optimal policy will contain information about the actions which most distinguish the systems. Empirical results show that this approach is useful in detecting both big and small differences, as well as in comparing systems with different internal structure.
Proceedings of the 12th ACM international conference adjunct papers on Ubiquitous computing - Ubicomp '10, 2010
In this abstract, we propose a novel approach to modeling time-series for the purpose of comparing segments of data in order to classify activities based on accelerometer sensor data. Our approach consists of producing an ensemble of simple classifiers that can be built and can classify new data efficiently. We present empirical results from an implementation of our algorithm running
Lecture Notes in Computer Science, 2013
In this paper, we present a fully automated hierarchical probabilistic framework for segmenting brain tumours from multispectral human brain magnetic resonance images (MRIs) using multiwindow Gabor filters and an adapted Markov Random Field (MRF) framework. In the first stage, a customised Gabor decomposition is developed, based on the combined-space characteristics of the two classes (tumour and non-tumour) in multispectral brain MRIs in order to optimally separate tumour (including edema) from healthy brain tissues. A Bayesian framework then provides a coarse probabilistic texture-based segmentation of tumours (including edema) whose boundaries are then refined at the voxel level through a modified MRF framework that carefully separates the edema from the main tumour. This customised MRF is not only built on the voxel intensities and class labels as in traditional MRFs, but also models the intensity differences between neighbouring voxels in the likelihood model, along with employing a prior based on local tissue class transition probabilities. The second inference stage is shown to resolve local inhomogeneities and impose a smoothing constraint, while also maintaining the appropriate boundaries as supported by the local intensity difference observations. The method was trained and tested on the publicly available MICCAI 2012 Brain Tumour Segmentation Challenge (BRATS) Database [1] on both synthetic and clinical volumes (low grade and high grade tumours). Our method performs well compared to state-of-the-art techniques, outperforming the results of the top methods in cases of clinical high grade and low grade tumour core segmentation by 40% and 45% respectively.
Lecture Notes in Computer Science, 2013
Lecture Notes in Computer Science, 2006
Software packages providing a whole set of data mining and machine learning algorithms are attractive because they allow experimentation with many kinds of algorithms in an easy setup. However, these packages are often based on main-memory data structures, limiting the amount of data they can handle. In this paper we use a relational database as secondary storage in order to eliminate this limitation. Unlike existing approaches, which often focus on optimizing a single algorithm to work with a database backend, we propose a general approach, which provides a database interface for several algorithms at once. We have taken a popular machine learning software package, Weka, and added a relational storage manager as back-tier to the system. The extension is transparent to the algorithms implemented in Weka, since it is hidden behind Weka's standard main-memory data structure interface. Furthermore, some general mining tasks are transferred into the database system to speed up execution. We tested the extended system, referred to as WekaDB, and our results show that it achieves a much higher scalability than Weka, while providing the same output and maintaining good computation time.