Gradient-based relational reinforcement-learning of temporally extended policies
Related papers
Using abstractions for decision-theoretic planning with time constraints
Proceedings of the National Conference on …, 1995
Recently Markov decision processes and optimal control policies have been applied to the problem of decision-theoretic planning. However, the classical methods for generating optimal policies are highly intractable, requiring explicit enumeration of large state spaces. We explore a method for generating abstractions that allow approximately optimal policies to be constructed; computational gains are achieved through reduction of the state space. Abstractions are generated by identifying propositions that are "relevant" either through their direct impact on utility, or their influence on actions. This information is gleaned from the representation of utilities and actions. We prove bounds on the loss in value due to abstraction and describe some preliminary experimental results.
Exploiting structure in policy construction
International Joint Conference on …, 1995
Markov decision processes (MDPs) have recently been applied to the problem of modeling decision-theoretic planning. While traditional methods for solving MDPs are often practical for small state spaces, their effectiveness for large AI planning problems is questionable. We present an algorithm, called structured policy iteration (SPI), that constructs optimal policies without explicit enumeration of the state space. The algorithm retains the fundamental computational steps of the commonly used modified policy iteration algorithm, but exploits the variable and propositional independencies reflected in a temporal Bayesian network representation of MDPs. The principles behind SPI can be applied to any structured representation of stochastic actions, and the algorithm itself can be used in conjunction with recent approximation methods.
On representing planning domains under uncertainty
Planning is an important activity in military coalitions and the support of an automated planning tool could help military planners by reducing the cognitive burden of their work. Current AI planning paradigms use two different types of formalism to represent the planning problem. Each of these formalisms entails different inference algorithms and representation of results.
Decision-theoretic planning: Structural assumptions and computational leverage
Journal of Artificial Intelligence Research, 1999
Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives adopted in these fields often differ in substantial ways, many planning problems of interest to researchers in these fields can be modeled as Markov decision processes (MDPs) and analyzed using the techniques of decision theory. This paper presents an overview and synthesis of MDP-related methods showing how they provide a unifying framework for modeling many classes of planning problems studied in AI. It also describes structural properties of MDPs that, when exhibited by particular classes of problems, can be exploited in the construction of optimal or approximately optimal policies or plans. Planning problems commonly possess structure in the reward and value functions used to describe performance criteria, in the functions used to describe state transitions and observations, and in the relationships among features used to describe states, actions, rewards, and observations. Specialized representations, and algorithms employing these representations, can achieve computational leverage by exploiting these various forms of structure. Certain AI techniques, in particular those based on the use of structured, intensional representations, can be viewed in this way. This paper surveys several types of representations for both classical and decision-theoretic planning problems, and planning algorithms that exploit these representations in a number of different ways to ease the computational burden of constructing policies or plans. It focuses primarily on abstraction, aggregation and decomposition techniques based on AI-style representations.
REPLICA: Relational Policies Learning in Planning
REPLICA is a relational instance-based learning module for solving STRIPS planning problems described in PDDL. REPLICA learns a reduced policy represented by a set of pairs <meta-state,action>. The meta-state represents the current planning state and the goal; the action represents the operator to execute in that meta-state. Both are described in terms of predicate logic. The next action the policy executes is the action associated with the closest meta-state in that set. First, we extract an initial policy composed of a set of tuples (meta-state, action) from a set of solution plans. Second, we reduce this policy to obtain a subset of these tuples that generalizes the complete set. This learning process is done using relational nearest prototype classification. Finally, we use this policy for ordering the actions of the relaxed plans in a lookahead strategy for heuristic and forward search planning.
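The "closest meta-state" lookup described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not REPLICA's actual implementation: meta-states are stored as sets of ground atoms written as strings, the distance is a plain Jaccard distance rather than the relational distance the system uses, and the names `jaccard_distance`, `meta_distance` and `next_action` are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Atoms = FrozenSet[str]
MetaState = Tuple[Atoms, Atoms]          # (state atoms, goal atoms)
Policy = List[Tuple[MetaState, str]]     # (meta-state, action) pairs


def jaccard_distance(a: Atoms, b: Atoms) -> float:
    """Assumed stand-in distance: 1 - |a & b| / |a | b| (0.0 for two empty sets)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def meta_distance(m1: MetaState, m2: MetaState) -> float:
    """Sum of the distances between the state parts and between the goal parts."""
    return jaccard_distance(m1[0], m2[0]) + jaccard_distance(m1[1], m2[1])


def next_action(policy: Policy, query: MetaState) -> str:
    """Return the action paired with the stored meta-state closest to the query."""
    _, action = min(policy, key=lambda pair: meta_distance(pair[0], query))
    return action


# Hypothetical blocks-world prototypes, written by hand for this example.
policy: Policy = [
    ((frozenset({"on(a,b)", "clear(a)"}), frozenset({"on(b,a)"})), "unstack(a,b)"),
    ((frozenset({"holding(a)", "clear(b)"}), frozenset({"on(a,b)"})), "stack(a,b)"),
]
query: MetaState = (
    frozenset({"holding(a)", "clear(b)", "ontable(b)"}),
    frozenset({"on(a,b)"}),
)
print(next_action(policy, query))
```

Because the atoms here are ground strings, this sketch does not generalize across object names; a relational distance would match atoms up to variable renaming.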
Relational Reinforcement Learning for Planning with Exogenous Effects
Journal of machine learning research, 2017
Probabilistic planners have improved recently to the point that they can solve difficult tasks with complex and expressive models. In contrast, learners cannot yet tackle the expressive models that planners do, which forces complex models to be mostly handcrafted. We propose a new learning approach that can learn relational probabilistic models with both action effects and exogenous effects. The proposed learning approach combines a multivalued variant of inductive logic programming for the generation of candidate models, with an optimization method to select the best set of planning operators to model a problem. We also show how to combine this learner with reinforcement learning algorithms to solve complete problems. Finally, experimental validation is provided that shows improvements over previous work in both simulation and a robotic task. The robotic task involves a dynamic scenario with several agents where a manipulator robot has to clear the tableware on a table. We show that the exogenous effects learned by our approach allowed the robot to clear the table in a more efficient way.
Planning under time constraints in stochastic domains
Artificial Intelligence, 1995
We provide a method, based on the theory of Markov decision processes, for efficient planning in stochastic domains. Goals are encoded as reward functions, expressing the desirability of each world state; the planner must find a policy (mapping from states to actions) that maximizes future rewards. Standard goals of achievement, as well as goals of maintenance and prioritized combinations of goals, can be specified in this way. An optimal policy can be found using existing methods, but these methods require time at best polynomial in the number of states in the domain, where the number of states is exponential in the number of propositions (or state variables). By using information about the starting state, the reward function, and the transition probabilities of the domain, we restrict the planner's attention to a set of world states that are likely to be encountered in satisfying the goal. Using this restricted set of states, the planner can generate more or less complete plans depending on the time it has available.
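The idea of restricting the planner's attention to likely states can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: the restricted set is built by a bounded forward search that ignores outcomes below a probability threshold, states outside the set are treated as having zero value, and the function names, the `min_prob` and `max_depth` parameters, and the toy domain are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# transitions[state][action] -> list of (next_state, probability)
Transitions = Dict[str, Dict[str, List[Tuple[str, float]]]]


def likely_states(transitions: Transitions, start: str,
                  min_prob: float = 0.1, max_depth: int = 3) -> Set[str]:
    """Collect states reachable from `start` via outcomes with probability >= min_prob."""
    envelope = {start}
    frontier = [start]
    for _ in range(max_depth):
        next_frontier = []
        for s in frontier:
            for outcomes in transitions.get(s, {}).values():
                for s2, p in outcomes:
                    if p >= min_prob and s2 not in envelope:
                        envelope.add(s2)
                        next_frontier.append(s2)
        frontier = next_frontier
    return envelope


def restricted_value_iteration(transitions: Transitions, reward: Dict[str, float],
                               envelope: Set[str], gamma: float = 0.9,
                               sweeps: int = 100):
    """Value iteration over the envelope only; outside states count as value 0.0."""
    value = {s: 0.0 for s in envelope}
    policy: Dict[str, str] = {}
    for _ in range(sweeps):
        for s in envelope:
            best_action, best_q = None, 0.0
            for action, outcomes in transitions.get(s, {}).items():
                q = sum(p * value.get(s2, 0.0) for s2, p in outcomes)
                if best_action is None or q > best_q:
                    best_action, best_q = action, q
            # State-based reward plus discounted value of the best action.
            value[s] = reward.get(s, 0.0) + gamma * best_q
            if best_action is not None:
                policy[s] = best_action
    return value, policy


# Hypothetical toy domain: "trap" is unlikely and is left out of the envelope.
transitions: Transitions = {
    "s0": {"go": [("s1", 0.95), ("trap", 0.05)], "stay": [("s0", 1.0)]},
    "s1": {"finish": [("goal", 1.0)]},
    "trap": {"wander": [("far", 1.0)]},
}
reward = {"goal": 1.0}
envelope = likely_states(transitions, "s0")
value, policy = restricted_value_iteration(transitions, reward, envelope)
print(sorted(envelope), policy["s0"])
```

Raising `max_depth` or lowering `min_prob` enlarges the restricted set, which mirrors the trade-off in the abstract between plan completeness and available planning time.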
Planning in hybrid relational MDPs
Machine Learning, 2017
We study planning in relational Markov decision processes involving discrete and continuous states and actions, and an unknown number of objects. This combination of hybrid relational domains has so far not received a lot of attention. While both relational and hybrid approaches have been studied separately, planning in such domains is still challenging and often requires restrictive assumptions and approximations. We propose HYPE: a sample-based planner for hybrid relational domains that combines model-based approaches with state abstraction. HYPE samples episodes and uses the previous episodes as well as the model to approximate the Q-function. In addition, abstraction is performed for each sampled episode, which removes the complexity of symbolic approaches for hybrid relational domains. In our empirical evaluations, we show that HYPE is a general and widely applicable planner in domains ranging from strictly discrete to strictly continuous to hybrid ones, and that it handles intricacies such as unknown objects and relational models. Moreover, empirical results show that abstraction provides significant improvements.
Learning Generalized Policies from Planning Examples Using Concept Languages
Applied Intelligence, 2000
In this paper we are concerned with the problem of learning how to solve planning problems in one domain given a number of solved instances. This problem is formulated as the problem of inferring a function that operates over all instances in the domain and maps states and goals into actions. We call such functions generalized policies and the question that we address is how to learn suitable representations of generalized policies from data. This question has been addressed recently by Roni Khardon (Technical Report TR-09-97, Harvard, 1997). Khardon represents generalized policies using an ordered list of existentially quantified rules that are inferred from a training set using a version of Rivest's learning algorithm (Machine Learning, vol. 2, no. 3, pp. 229-246, 1987). Here, we follow Khardon's approach but represent generalized policies in a different way using a concept language. We show through a number of experiments in the blocks-world that the concept language yields a better policy using a smaller set of examples and no background knowledge.
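A generalized policy in the ordered-rule-list form attributed to Khardon above can be sketched as a list of rules tried in order, where the first rule that finds a matching object decides the action. The sketch below is only an illustration under assumptions: the two blocks-world rules are written by hand rather than learned, atoms are plain tuples, and the names `put_down_held_block`, `unstack_misplaced_block` and `generalized_policy` are hypothetical. It does not show the concept-language representation that the paper itself proposes.

```python
from typing import Callable, FrozenSet, List, Optional, Tuple

Atom = Tuple[str, ...]                      # e.g. ("on", "a", "b")
State = FrozenSet[Atom]
Rule = Callable[[State, State], Optional[str]]  # returns an action, or None if no match


def put_down_held_block(state: State, goal: State) -> Optional[str]:
    """If there exists a block x with holding(x), put x down."""
    for atom in sorted(state):
        if atom[0] == "holding":
            return f"putdown({atom[1]})"
    return None


def unstack_misplaced_block(state: State, goal: State) -> Optional[str]:
    """If there exist x, y with on(x,y) not required by the goal and x clear, unstack x."""
    for atom in sorted(state):
        if atom[0] == "on" and atom not in goal and ("clear", atom[1]) in state:
            return f"unstack({atom[1]},{atom[2]})"
    return None


def generalized_policy(rules: List[Rule], state: State, goal: State) -> Optional[str]:
    """Apply the rules in order; the first rule that matches chooses the action."""
    for rule in rules:
        action = rule(state, goal)
        if action is not None:
            return action
    return None


rules: List[Rule] = [put_down_held_block, unstack_misplaced_block]
state: State = frozenset({("on", "a", "b"), ("clear", "a"), ("ontable", "b")})
goal: State = frozenset({("on", "b", "a")})
print(generalized_policy(rules, state, goal))
```

Because each rule quantifies over blocks instead of naming them, the same rule list applies to any instance of the domain, which is what makes the policy "generalized".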