Mapping Navigation Instructions to Continuous Control Actions with Position-Visitation Prediction
Related papers
Learning to Map Natural Language Instructions to Physical Quadcopter Control using Simulated Flight
2019
We propose a joint simulation and real-world learning framework for mapping navigation instructions and raw first-person observations to continuous control. Our model estimates the need for environment exploration, predicts the likelihood of visiting environment positions during execution, and controls the agent to both explore and visit high-likelihood positions. We introduce Supervised Reinforcement Asynchronous Learning (SuReAL). Learning uses both simulation and real environments without requiring autonomous flight in the physical environment during training, and combines supervised learning for predicting positions to visit and reinforcement learning for continuous control. We evaluate our approach on a natural language instruction-following task with a physical quadcopter, and demonstrate effective execution and exploration behavior.
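The visitation-prediction idea can be illustrated with a minimal sketch, which is not the authors' implementation: given a predicted grid of visit probabilities over environment positions, a controller steers toward the most likely position. The grid size, the proportional-control gains, and the toy distribution below are illustrative assumptions, and the proportional controller stands in for the learned reinforcement-learning policy described in the abstract.

```python
import numpy as np

def goal_from_visitation(probs: np.ndarray, cell_size: float = 0.5) -> np.ndarray:
    """Pick the highest-probability grid cell and return its (x, y) position in meters.

    `probs` is an HxW grid of predicted visitation probabilities (illustrative stand-in
    for the distribution the model predicts from instructions and observations).
    """
    iy, ix = np.unravel_index(np.argmax(probs), probs.shape)
    return np.array([ix * cell_size, iy * cell_size])

def velocity_command(pose_xy: np.ndarray, goal_xy: np.ndarray,
                     gain: float = 0.8, v_max: float = 1.0) -> np.ndarray:
    """Simple proportional controller toward the goal (assumption: stands in for the RL policy)."""
    error = goal_xy - pose_xy
    cmd = gain * error
    speed = np.linalg.norm(cmd)
    return cmd if speed <= v_max else cmd * (v_max / speed)

# Toy usage: a fake 8x8 visitation distribution peaked at one cell.
probs = np.zeros((8, 8))
probs[5, 2] = 1.0
goal = goal_from_visitation(probs)
print(velocity_command(np.array([0.0, 0.0]), goal))
```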
Deep Learning for Drone Navigation
2020
Deep drone racing and navigation are emerging applications of deep learning, with potential uses in competitions and in automating a wide range of tasks performed by drones. In this paper, we apply Deep Deterministic Policy Gradient (DDPG) to train a neural network that directs a simulated quadcopter towards a target, reproducing a simplified drone race environment. The model explored in the paper is not vision-based; it assumes the position and velocity of the drone relative to the target are known at all times, and these variables are passed as inputs to the model. Based entirely on these variables, the neural network controls the quadcopter's rotor angular speeds, which in turn determine the flight path taken by the drone. DDPG training requires engineering an effective reward function, which is essential for the model to converge; several reward functions were tested and are presented in the paper. The results show that DDPG is a suitable method for training a deep drone racing neural network: after training, the drone was able to reach the target from a range of initial distances and regardless of the initial direction vector from the drone to the target. Some minor problems remain and may be the subject of future work: the trajectories chosen by the neural network are generally not optimal, and the drone tends to diverge from the target once it gets closer than a few feet.
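The abstract's emphasis on reward-function engineering can be illustrated with a hedged sketch. The shaping terms below (reward for reduced distance to the target, a small per-step penalty, and a terminal bonus) are a common choice for goal-reaching tasks and are only an assumption about what such a reward might look like, not the functions actually tested in the paper.

```python
import numpy as np

def shaped_reward(prev_pos: np.ndarray, pos: np.ndarray, target: np.ndarray,
                  goal_radius: float = 0.3, step_penalty: float = 0.01,
                  goal_bonus: float = 10.0) -> float:
    """Illustrative goal-reaching reward for a DDPG agent (assumed shaping, not the paper's).

    Rewards progress toward the target (reduction in distance), charges a small
    per-step penalty to discourage wandering, and pays a bonus on arrival.
    """
    prev_dist = np.linalg.norm(target - prev_pos)
    dist = np.linalg.norm(target - pos)
    reward = (prev_dist - dist) - step_penalty
    if dist < goal_radius:
        reward += goal_bonus
    return float(reward)

# Toy usage: the drone moves 0.4 m closer to a target initially 5 m away.
prev_pos = np.array([0.0, 0.0, 1.0])
pos = np.array([0.4, 0.0, 1.0])
target = np.array([5.0, 0.0, 1.0])
print(shaped_reward(prev_pos, pos, target))
```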