Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction (original) (raw)

Modular Deep Reinforcement Learning for Continuous Motion Planning With Temporal Logic

IEEE Robotics and Automation Letters

This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDP) with unknown transition probabilities over continuous state and action spaces. Linear temporal logic (LTL) is used to specify high-level tasks over infinite horizon, which can be converted into a limit deterministic generalized Büchi automaton (LDGBA) with several accepting sets. The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA and the MDP by incorporating a synchronous tracking-frontier function to record unvisited accepting sets of the automaton, and to facilitate the satisfaction of the accepting conditions. The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states and can overcome the issues of sparse rewards. Rigorous analysis shows that any RL method that optimizes the expected discounted return is guaranteed to find an optimal policy whose traces maximize the satisfaction probability. A modular deep deterministic policy gradient (DDPG) is then developed to generate such policies over continuous state and action spaces. The performance of our framework is evaluated via an array of OpenAI gym environments.

Optimal Probabilistic Motion Planning with Potential Infeasible LTL Constraints

2020

This paper studies optimal motion planning subject to motion and environment uncertainties. By modeling the system as a probabilistic labeled Markov decision process (PL-MDP), the control objective is to synthesize a finite-memory policy, under which the agent satisfies high-level complex tasks expressed as linear temporal logic (LTL) with desired satisfaction probability. In particular, the cost optimization of the trajectory that satisfies infinite-horizon tasks is considered, and the trade-off between reducing the expected mean cost and maximizing the probability of task satisfaction is analyzed. Instead of using traditional Rabin automata, the LTL formulas are converted to limit-deterministic Büchi automata (LDBA) with a more straightforward accepting condition and a more compact graph structure. The novelty of this work lies in the consideration of the cases that LTL specifications can be potentially infeasible and the development of a relaxed product MDP between PL-MDP and LDB...

This Time the Robot Settles for a Cost: A Quantitative Approach to Temporal Logic Planning with Partial Satisfaction

Proceedings of the AAAI Conference on Artificial Intelligence

The specification of complex motion goals through temporal logics is increasingly favored in robotics to narrow the gap between task and motion planning. A major limiting factor of such logics, however, is their Boolean satisfaction condition. To relax this limitation, we introduce a method for quantifying the satisfaction of co-safe linear temporal logic specifications, and propose a planner that uses this method to synthesize robot trajectories with the optimal satisfaction value. The method assigns costs to violations of specifications from user-defined proposition costs. These violation costs define a distance to satisfaction and can be computed algorithmically using a weighted automaton. The planner utilizes this automaton and an abstraction of the robotic system to construct a product graph that captures all possible robot trajectories and their distances to satisfaction. Then, a plan with the minimum distance to satisfaction is generated by employing this graph as the high-le...

Receding Horizon Control-Based Motion Planning With Partially Infeasible LTL Constraints

IEEE Control Systems Letters

This work considers online optimal motion planning of an autonomous agent subject to linear temporal logic (LTL) constraints. The environment is dynamic in the sense of containing mobile obstacles and time-varying areas of interest (i.e., time-varying reward and workspace properties) to be visited by the agent. Since user-specified tasks may not be fully realized (i.e., partially infeasible), this work considers hard and soft LTL constraints, where hard constraints enforce safety requirements (e.g. avoid obstacles) while soft constraints represent tasks that can be relaxed to not strictly follow user specifications. The motion planning of the agent is to generate policies, in decreasing order of priority, to 1) guarantee the satisfaction of safety constraints; 2) mostly satisfy fulfill constraints (i.e., minimize the violation cost if desired tasks are partially infeasible); and 3) optimize the objective of rewards collection (i.e., visiting dynamic areas of more interests). To achieve these objectives, a relaxed product automaton, which allows the agent to not strictly follow the desired LTL constraints, is constructed. A utility function is developed to quantify the differences between the revised and the desired motion plan, and the accumulated rewards are designed to bias the motion plan towards those areas of more interests. Receding horizon control is synthesized with an LTL formula to maximize the accumulated utilities over a finite horizon, while ensuring that safety constraints are fully satisfied and soft constraints are mostly satisfied. Simulation and experiment results are provided to demonstrate the effectiveness of the developed motion strategy.

Temporal logic motion planning for dynamic robots

Automatica, 2009

In this paper, we address the temporal logic motion planning problem for mobile robots that are modeled by second order dynamics. Temporal logic specifications can capture the usual control specifications such as reachability and invariance as well as more complex specifications like sequencing and obstacle avoidance. Our approach consists of three basic steps. First, we design a control law that enables the dynamic model to track a simpler kinematic model with a globally bounded error. Second, we built a robust temporal logic specification that takes into account the tracking errors of the first step. Finally, we solve the new robust temporal logic path planning problem for the kinematic model using automata theory and simple local vector fields. The resulting continuous time trajectory is provably guaranteed to satisfy the initial user specification.

Hybrid Controllers for Path Planning: A Temporal Logic Approach

Proceedings of the 44th IEEE Conference on Decision and Control, 2005

Robot motion planning algorithms have focused on low-level reachability goals taking into account robot kinematics, or on high level task planning while ignoring low-level dynamics. In this paper, we present an integrated approach to the design of closed-loop hybrid controllers that guarantee by construction that the resulting continuous robot trajectories satisfy sophisticated specifications expressed in the so-called Linear Temporal Logic. In addition, our framework ensures that the temporal logic specification is satisfied even in the presence of an adversary that may instantaneously reposition the robot within the environment a finite number of times. This is achieved by obtaining a Büchi automaton realization of the temporal logic specification, which supervises a finite family of continuous feedback controllers, ensuring consistency between the discrete plan and the continuous execution.

DT*: Temporal Logic Path Planning in a Dynamic Environment

ArXiv, 2021

Path planning for a robot is one of the major problems in the area of robotics. When a robot is given a task in the form of a Linear Temporal Logic (LTL) specification such that the task needs to be carried out repetitively, we want the robot to follow the shortest cyclic path so that the number of times the robot completes the mission within a given duration gets maximized. In this paper, we address the LTL path planning problem in a dynamic environment where the newly arrived dynamic obstacles may invalidate some of the available paths at any arbitrary point in time. We present DT*, an SMT-based receding horizon planning strategy that solves an optimization problem repetitively based on the current status of the workspace to lead the robot to follow the best available path in the current situation. We implement our algorithm using the Z3 SMT solver and evaluate it extensively on an LTL specification capturing a pick-and-drop application in a warehouse environment. We compare our S...

Active Perception and Control from Temporal Logic Specifications

IEEE Control Systems Letters, 2019

Next-generation autonomous systems must execute complex tasks in uncertain environments. Active perception, where an autonomous agent selects actions to increase knowledge about the environment, has gained traction in recent years for motion planning under uncertainty. One prominent approach is planning in the belief space. However, most belief-space planning starts with a known reward function, which can be difficult to specify for complex tasks. On the other hand, symbolic control methods automatically synthesize controllers to achieve logical specifications, but often do not deal well with uncertainty. In this work, we propose a framework for scalable task and motion planning in uncertain environments that combines the best of belief-space planning and symbolic control. Specifically, we provide a counterexample-guided-inductive-synthesis algorithm for probabilistic temporal logic over reals (PRTL) specifications in the belief space. Our method automatically generates actions that improve confidence in a belief when necessary, thus using active perception to satisfy PRTL specifications.

Optimal control of MDPs with temporal logic constraints

52nd IEEE Conference on Decision and Control, 2013

In this paper, we focus on formal synthesis of control policies for finite Markov decision processes with non-negative real-valued costs. We develop an algorithm to automatically generate a policy that guarantees the satisfaction of a correctness specification expressed as a formula of Linear Temporal Logic, while at the same time minimizing the expected average cost between two consecutive satisfactions of a desired property. The existing solutions to this problem are sub-optimal. By leveraging ideas from automata-based model checking and game theory, we provide an optimal solution. We demonstrate the approach on an illustrative example. . s ∈S P(s, α, s ) = 1. With a slight abuse of notation, A(s) denotes the set of all actions enabled in a state s. We assume A(s) = ∅ for every s ∈ S.

Technical Report: A Receding Horizon Algorithm for Informative Path Planning with Temporal Logic Constraints

2013

This technical report is an extended version of the paper 'A Receding Horizon Algorithm for Informative Path Planning with Temporal Logic Constraints' accepted to the 2013 IEEE International Conference on Robotics and Automation (ICRA). This paper considers the problem of finding the most informative path for a sensing robot under temporal logic constraints, a richer set of constraints than have previously been considered in information gathering. An algorithm for informative path planning is presented that leverages tools from information theory and formal control synthesis, and is proven to give a path that satisfies the given temporal logic constraints. The algorithm uses a receding horizon approach in order to provide a reactive, on-line solution while mitigating computational complexity. Statistics compiled from multiple simulation studies indicate that this algorithm performs better than a baseline exhaustive search approach.