Tomi Silander - Academia.edu
Papers by Tomi Silander
arXiv (Cornell University), Jun 20, 2012
The BDeu marginal likelihood score is a popular model selection criterion for choosing a Bayesian network structure based on sample data. This non-informative scoring criterion assigns the same score to network structures that encode the same independence statements. However, before applying the BDeu score, one must determine a single parameter, the equivalent sample size α. Unfortunately, no generally accepted rule for determining α has been suggested. This is disturbing, since in this paper we show through a series of concrete experiments that the solution of the network structure optimization problem is highly sensitive to the chosen α value. Based on these results, we explain how and why this phenomenon happens, and discuss ideas for solving the problem.
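For reference, the score in question has the standard BDeu form (this formula is standard background rather than a claim from the abstract; here n is the number of variables, r_i the number of states of variable i, q_i the number of parent configurations of variable i in graph G, and N_ijk the data counts):

$$
\text{BDeu}(G; D) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha/q_i)}{\Gamma(\alpha/q_i + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma\!\left(\alpha/(q_i r_i) + N_{ijk}\right)}{\Gamma\!\left(\alpha/(q_i r_i)\right)}, \qquad N_{ij} = \sum_{k=1}^{r_i} N_{ijk}.
$$

Every occurrence of the equivalent sample size α enters through the Dirichlet hyperparameters α/(q_i r_i), which is why the selected structure can shift as α varies.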
Mobile robots need to navigate in crowded environments to provide services to humans. Traditional approaches to crowd-aware navigation decouple people motion prediction from robot motion planning, leading to undesired robot behaviours. Recent deep learning-based methods integrate crowd forecasting into the planner, assuming precise tracking of the agents in the scene; to do this, they require expensive LiDAR sensors and tracking algorithms that are complex and brittle. In this paper we propose a two-step approach: we first learn a robot navigation policy based on privileged information about exact pedestrian locations, available in simulation. A second learning step distills the knowledge acquired by the first network into an adaptation network that uses only narrow field-of-view image data from the robot camera. While the navigation policy is trained in simulation without any expert supervision, such as trajectories computed by a planner, it exhibits state-of-the-art performance on a broad range of dense crowd simulations and real-world experiments. Video results are available at https://europe.naverlabs.com/research/dipcan.
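A minimal sketch of this two-step scheme, with hypothetical PrivilegedPolicy and AdaptationNet modules and dummy data standing in for simulation rollouts (an illustration of privileged-information distillation in general, not the authors' implementation):

```python
# Hypothetical module names; dummy data stands in for simulation rollouts.
import torch
import torch.nn as nn

class PrivilegedPolicy(nn.Module):
    """Step 1: policy trained on exact pedestrian positions (simulation only)."""
    def __init__(self, n_peds=8, act_dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_peds * 2 + 3, 128), nn.ReLU(),
                                 nn.Linear(128, act_dim))

    def forward(self, ped_xy, robot_state):
        return self.net(torch.cat([ped_xy.flatten(1), robot_state], dim=1))

class AdaptationNet(nn.Module):
    """Step 2: image-only student distilled from the privileged policy."""
    def __init__(self, act_dim=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 5, 2), nn.ReLU(),
                                 nn.Conv2d(16, 32, 5, 2), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(32, act_dim))

    def forward(self, image):
        return self.enc(image)

teacher, student = PrivilegedPolicy(), AdaptationNet()
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

# One distillation step: the student imitates the teacher's action
# from the camera image alone.
ped_xy, state = torch.randn(4, 8, 2), torch.randn(4, 3)
image = torch.randn(4, 3, 64, 64)
loss = nn.functional.mse_loss(student(image), teacher(ped_xy, state).detach())
opt.zero_grad(); loss.backward(); opt.step()
```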
arXiv (Cornell University), Mar 29, 2017
The trade-off between the cost of acquiring and processing data, and uncertainty due to a lack of data, is fundamental in machine learning. A basic instance of this trade-off is the problem of deciding when to make noisy and costly observations of a discrete-time Gaussian random walk, so as to minimise the posterior variance plus observation costs. We present the first proof that a simple policy, which observes when the posterior variance exceeds a threshold, is optimal for this problem. The proof generalises to a wide range of cost functions other than the posterior variance. This result implies that optimal policies for linear-quadratic-Gaussian control with costly observations have a threshold structure. It also implies that the restless bandit problem of observing multiple such time series has a well-defined Whittle index. We discuss computation of that index, give closed-form formulae for it, and compare the performance of the associated index policy with heuristic policies. The proof is based on a new verification theorem that establishes threshold structure for Markov decision processes, and on the relation between binary sequences known as mechanical words and the dynamics of discontinuous nonlinear maps, which frequently arise in physics, control and biology.
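The threshold policy is easy to simulate, because the posterior-variance dynamics of a Kalman filter are deterministic given the observation schedule. A small sketch under assumed process noise q, observation noise r and per-observation cost c (illustrative values, not the paper's):

```python
def run_threshold_policy(theta, q=1.0, r=1.0, c=2.0, T=1000):
    """Average cost of the 'observe when variance exceeds theta' policy.
    q: process noise added per step, r: observation noise variance,
    c: cost per observation; the running cost is the posterior variance."""
    v, total = 0.0, 0.0
    for _ in range(T):
        v += q                       # prediction step inflates the variance
        if v > theta:                # the threshold rule the paper analyses
            v = v * r / (v + r)      # Kalman measurement update
            total += c
        total += v                   # accumulate posterior-variance cost
    return total / T

# crude scan for a good threshold under these (assumed) parameters
best = min((run_threshold_policy(0.5 + 0.1 * i), 0.5 + 0.1 * i)
           for i in range(40))
print(f"average cost {best[0]:.3f} at threshold {best[1]:.2f}")
```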
Transportation Research Record, 2016
Curb space is a valuable asset for urban areas. The space is a finite resource with competing needs from various modes, land uses, and customers. In this context, when the curb space is used for parking, it is important that the space be used as efficiently as possible. There is no unanimous conclusion on whether a demarcated or undemarcated curbside configuration accommodates more vehicles; most information on this subject is conflicting and anecdotal. This paper presents the results of an in-depth analysis with modeling and field data collection to determine whether a specific configuration is beneficial from a utilization standpoint. It also reviews the state of the practice on demarcating on-street parking spaces, along with the results of a survey of local jurisdictions' policies and practices and the logic behind the decision-making process. The authors conclude that factors other than efficiency might drive the decision to demarcate (or not).
Training agents to operate in one environment often yields overfitted models that are unable to generalize to changes in that environment. However, due to the numerous variations that can occur in the real world, an agent is often required to be robust in order to be useful. This has not been the case for agents trained with reinforcement learning (RL) algorithms. In this paper, we investigate the overfitting of RL agents to their training environments in visual navigation tasks. Our experiments show that deep RL agents can overfit even when trained on multiple environments simultaneously. We propose a regularization method which combines RL with supervised learning by adding a term to the RL objective that encourages the invariance of the policy to variations in the observations that ought not to affect the action taken. The results of this method, called Invariance Regularization, show an improvement in the generalization of policies to environments not seen during training.
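A minimal sketch of such an invariance term, assuming a discrete-action policy and additive pixel noise as a stand-in for the task-irrelevant variations (the paper's choice of variations and divergence may differ):

```python
import torch
import torch.nn.functional as F

def invariance_term(policy, obs, augment, lam=0.1):
    """KL penalty between the action distributions for an observation and a
    task-irrelevant variation of it; added to the usual RL objective."""
    log_p = F.log_softmax(policy(obs), dim=-1)
    log_q = F.log_softmax(policy(augment(obs)), dim=-1)
    return lam * F.kl_div(log_q, log_p.exp(), reduction="batchmean")

policy = torch.nn.Sequential(torch.nn.Flatten(),
                             torch.nn.Linear(3 * 32 * 32, 4))
obs = torch.rand(8, 3, 32, 32)
augment = lambda o: (o + 0.05 * torch.randn_like(o)).clamp(0, 1)
loss = invariance_term(policy, obs, augment)   # add to the RL loss
loss.backward()
```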
Scientific Reports
Quadruped robots require robust and general locomotion skills to exploit their mobility potential in complex and challenging environments. In this work, we present an implementation of a robust end-to-end learning-based controller on the Solo12 quadruped. Our method is based on deep reinforcement learning of joint impedance references. The resulting control policies follow a commanded velocity reference while being energy-efficient and easy to deploy. We detail the learning procedure and the method for transfer to the real robot, and present extensive experimental results of the learned locomotion on various grounds, indoors and outdoors. These results show that the Solo12 robot is a suitable open-source platform for research combining learning and control, thanks to the ease of transferring and deploying learned controllers.
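For concreteness, a joint impedance controller of the kind referenced here turns the policy's position references into torques with a PD law around them; a minimal sketch (gains and values are illustrative, not those used on Solo12):

```python
import numpy as np

def impedance_torques(q_ref, q, dq, kp=3.0, kd=0.2):
    """PD law around the policy's joint position references q_ref."""
    return kp * (q_ref - q) - kd * dq

q, dq = np.zeros(12), np.zeros(12)   # Solo12 has 12 actuated joints
q_ref = 0.1 * np.ones(12)            # e.g. the output of the policy network
print(impedance_torques(q_ref, q, dq)[:3])
```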
Le Centre pour la Communication Scientifique Directe - HAL - Inria, Oct 29, 2021
Predefined gait patterns for quadruped locomotion can hardly be optimal in all situations with regard to stability, cost of transport and velocity tracking error. Hence, in this work, we tackle the challenge of adapting a predefined trotting gait, implemented in the model-based controller of Solo, to optimize both energy consumption and velocity tracking. To this end, we propose a model-free reinforcement learning method for adapting the timings of the contact/swing phases of each foot. The learned agent augments a control pipeline that was previously developed for the Solo robot. We also propose to use a self-attention mechanism over the history of states in order to extract useful information for adapting the gait. Through a comprehensive set of experiments, we demonstrate how, compared to the nominal gait, our method significantly reduces energy consumption, better tracks the desired velocity, and makes it possible to reach higher speeds. A video of the method is available at https://youtu.be/ykbDUyASXs4.
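A minimal sketch of self-attention over a window of past states, as a feature extractor for the gait-adaptation agent (all dimensions and module names are illustrative assumptions):

```python
import torch
import torch.nn as nn

class GaitHistoryEncoder(nn.Module):
    """Self-attention over a window of past robot states."""
    def __init__(self, state_dim=32, embed_dim=64):
        super().__init__()
        self.proj = nn.Linear(state_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=4,
                                          batch_first=True)

    def forward(self, states):           # states: (batch, history, state_dim)
        h = self.proj(states)
        out, _ = self.attn(h, h, h)      # self-attention: query = key = value
        return out.mean(dim=1)           # pooled summary fed to the agent

enc = GaitHistoryEncoder()
features = enc(torch.randn(8, 20, 32))   # -> (8, 64)
```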
Journal of Behavioral and Experimental Economics, 2020
We were interested in the factors that influence deceptive behavior, especially whether strong incentives can make people overcome their aversion to lying. We ran an online experiment in which participants made choices between deceptive and non-deceptive actions in hypothetical scenarios of reporting their income to tax authorities. The participants could lie in their tax report in order to receive a larger tax refund or to pay less additional tax. In some scenarios they could face a tax penalty if their report was found to be in error. While a large number of participants never deceived when it was risky and potentially costly, the rate of deception almost doubled in conditions with no detection risk or penalty. However, not all participants responded equally to the absence of risk and detection penalty. We were able to identify three types of behavior that were related to participants' general risk and lie attitudes, but not to their numeracy skills or risk literacy. These three types of participants also differed in their deception rate and in their sensitivity to changes in incentives across conditions varying either the risk of getting caught or the expected advantage gained by deceiving.
A Bayesian (belief) network is a representation of a probability distribution over a set of random variables. One of the main advantages of this model family is that it offers a theoretically solid machine learning framework for constructing accurate domain models from sample data efficiently and reliably. As the parameters of a Bayesian network have a precise semantic interpretation, the learned models can be used for data mining purposes, i.e., for examining regularities found in the data. In addition to this type of direct examination of the model, we suggest that the learned Bayesian networks can also be used for indirect data mining purposes through a visualization scheme for producing 2D or 3D representations of high-dimensional problem domains. Our visualization scheme is based on the predictive distributions produced by the Bayesian network model, which means that the resulting visualizations can also be used as a post-processing tool for visual inspection of ...
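A rough sketch of the underlying idea, with the learned Bayesian network stubbed out by a fixed stand-in model and PCA as one possible 2D projection (the paper's actual scheme is only loosely paraphrased here):

```python
# A learned Bayesian network is stubbed out by a fixed random linear model;
# only the visualization mechanics are illustrated.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
cases = rng.integers(0, 3, size=(100, 6))      # 100 cases, 6 discrete variables
W = rng.standard_normal((6, 4))                # stand-in model parameters

def predictive_distribution(case):
    """Stand-in for the network's predictive distribution P(class | case)."""
    logits = case @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()

P = np.stack([predictive_distribution(c) for c in cases])
coords = PCA(n_components=2).fit_transform(P)  # 2D layout of the cases
print(coords[:3])
```

Cases with similar predictive distributions land near each other in the projection, which is what makes the layout useful for visual inspection.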
An adaptive questionnaire, named EDUFORM, is based on Bayesian statistical techniques that both optimize the number of propositions presented to each respondent and create an individual learner profile. The preliminary results show that, even when reducing the number of propositions, we can still moderately control the error ratio. The respondents' profiling information is in most cases obtained after one third of the propositions.
Frontline Learning Research, 2014
AMIA Annual Symposium Proceedings, Nov 3, 2012
We introduce an automated, pathological-class-level annotation system for medical volumetric brain images. While much of the earlier work has mainly focused on annotating regions of interest in medical images, our system does not require annotated region-level training data, nor does it assume perfect segmentation results for the regions of interest; the time and effort needed for acquiring training data are hence significantly reduced. This capability of handling high-dimensional noisy data, however, poses additional technical challenges, since statistical estimation of models for such data is prone to over-fitting. We propose a framework that combines a regularized logistic regression method and a kernel-based discriminative method to address these problems. Regularized methods provide a flexible selection mechanism that is well suited for high-dimensional noisy data. Our experiments show promising results in classifying computed tomography images of traumatic brain injury patients into pathological classes.
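One plausible reading of this combination, sketched with scikit-learn on synthetic data: an L1-regularised logistic regression selects features for a kernel classifier (this pipeline is an assumption for illustration, not the paper's exact framework):

```python
# Synthetic data stands in for CT-derived features.
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((120, 500))            # high-dimensional, noisy
y = (X[:, :5].sum(axis=1) > 0).astype(int)     # few informative features

clf = make_pipeline(
    SelectFromModel(LogisticRegression(penalty="l1", C=0.1,
                                       solver="liblinear")),
    SVC(kernel="rbf", C=1.0),
)
print(cross_val_score(clf, X, y, cv=5).mean())
```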
International Conference on Automated Planning and Scheduling, 2014
Site-based or topic-specific search engines work with mixed success because of the general difficulty of the information retrieval task and the lack of good link information to allow authorities to be identified. We advocate an open source approach to the problem due to its scope and need for software components. We have adopted a topic-based search engine because it represents the next generation of capability. This paper outlines our scalable system for site-based or topic-specific search, and demonstrates the developing system on a small 250,000-document collection of EU and UN web pages.
Case retrieval is an important problem in several commercially significant application areas, such as industrial configuration and manufacturing problems. In this paper we extend Bayesian probability theory based approaches to case-based reasoning, focusing on the case matching task, an essential part of any case retrieval system. Traditional approaches to the case matching problem typically rely on some distance measure, e.g., the Euclidean or Hamming distance, although there is no a priori guarantee that such measures really reflect the useful similarities and dissimilarities between the cases. One of the main advantages of the Bayesian framework for solving this problem is that it forces one to explicitly recognize all the assumptions made about the problem domain, which helps in analyzing the performance of the resulting system. As an example of an implementation of the Bayesian case matching approach in practice, we demonstrate how to construct a case retrieval system based on a set of independence assumptions between the domain variables. In the experimental part of the paper, the Bayesian case matching metric is evaluated empirically in a case-retrieval task using public-domain discrete real-world databases. The results suggest that case retrieval systems based on the Bayesian case matching score perform much better than those based on the standard Hamming distance similarity metric.
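A toy contrast between Hamming matching and an independence-based probabilistic match score of the kind the abstract describes (the noise model below is an illustrative simplification, not the paper's metric):

```python
import numpy as np

rng = np.random.default_rng(1)
r = np.array([2, 2, 4, 4, 8, 8])               # number of values per attribute
cases = np.stack([rng.integers(0, ri, 50) for ri in r], axis=1)
query = cases[7].copy()
query[2] = (query[2] + 1) % r[2]               # perturb one attribute

def hamming_rank(cases, query):
    return np.argsort((cases != query).sum(axis=1))

def bayes_rank(cases, query, eps=0.1):
    """log P(query | case) under attribute independence and a toy noise
    model: a stored value is reproduced with probability 1 - eps, otherwise
    replaced by any other value uniformly at random."""
    match = cases == query
    logp = np.where(match, np.log(1 - eps), np.log(eps / (r - 1)))
    return np.argsort(-logp.sum(axis=1))

print(hamming_rank(cases, query)[:5])
print(bayes_rank(cases, query)[:5])
```

Under this model, a mismatch on a many-valued attribute is penalised less than one on a binary attribute, which is exactly the kind of distinction a plain Hamming count cannot make.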