Hierarchical Reinforcement Learning In Communication-Mediated Multiagent Coordination

Hierarchical reinforcement learning for communicating agents

2004

Abstract This paper proposes hierarchical reinforcement learning (RL) methods for communication in multiagent coordination problems modelled as Markov Decision Processes (MDPs). To bridge the gap between the MDP view and the methods used to specify communication protocols in multiagent systems (using logical conditions and propositional message structure), we utilise interaction frames as powerful policy abstractions that can be combined with case-based reasoning techniques.
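
As a rough illustration of how interaction frames could act as policy abstractions retrieved in a case-based-reasoning manner, the sketch below stores frames as (context features, prescribed messages) pairs and reuses the closest match. The class name, similarity measure, and feature encoding are assumptions for illustration, not the paper's formulation.

```python
# Minimal sketch (assumed structure): an "interaction frame" pairs a feature
# encoding of the coordination context with the communicative acts it
# prescribes; case-based retrieval reuses the best-matching frame.
from dataclasses import dataclass
from typing import List, Tuple
import math

@dataclass
class InteractionFrame:
    context: Tuple[float, ...]   # logical conditions encoded as features
    messages: List[str]          # prescribed communicative acts

def similarity(a, b):
    # cosine similarity between context feature vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def retrieve_frame(case_base: List[InteractionFrame], context):
    # case-based reasoning step: reuse the policy of the most similar frame
    return max(case_base, key=lambda f: similarity(f.context, context))

# usage
base = [InteractionFrame((1.0, 0.0), ["request(task)"]),
        InteractionFrame((0.0, 1.0), ["inform(done)"])]
print(retrieve_frame(base, (0.9, 0.2)).messages)  # ['request(task)']
```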

Reinforcement Learning of Communication in a Multi-agent Context

2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, 2011

In this paper, we present a reinforcement learning approach for multi-agent communication in order to learn what to communicate, when, and to whom. This method is based on introspective agents that can reason about their own actions and data so as to construct appropriate communicative acts. We propose an extension of classical reinforcement learning algorithms for multi-agent communication. We show how communicative acts and memory can help resolve non-Markovian and asynchrony issues in multi-agent systems (MAS).
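
A minimal sketch of the general idea, assuming a tabular setting: the action set mixes domain actions with communicative acts, and the Q-function is indexed by the agent's observation together with a memory of the last message received, which is one way to soften non-Markovian effects. The environment, action names, and the `CommQAgent` class are illustrative assumptions, not the authors' code.

```python
import random
from collections import defaultdict

ACTIONS = ["move_left", "move_right", "tell_position"]  # domain + communicative acts

class CommQAgent:
    def __init__(self, alpha=0.1, gamma=0.9, eps=0.1):
        self.Q = defaultdict(float)
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.memory = None            # last message received from teammates

    def state_key(self, obs):
        return (obs, self.memory)     # observation augmented with memory

    def act(self, obs):
        s = self.state_key(obs)
        if random.random() < self.eps:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.Q[(s, a)])

    def update(self, obs, action, reward, next_obs):
        s, s2 = self.state_key(obs), self.state_key(next_obs)
        best_next = max(self.Q[(s2, a)] for a in ACTIONS)
        td = reward + self.gamma * best_next - self.Q[(s, action)]
        self.Q[(s, action)] += self.alpha * td

    def receive(self, message):
        self.memory = message         # introspective memory of communication

# usage with a toy observation
agent = CommQAgent()
agent.receive("pos=(2,3)")
print(agent.act("cell_5"))
```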

Planning, learning and coordination in multiagent decision processes

Proceedings of the 6th conference on Theoretical …, 1996

There has been a growing interest in AI in the design of multiagent systems, especially in multiagent cooperative planning. In this paper, we investigate the extent to which methods from single-agent planning and learning can be applied in multiagent settings. We survey a number of different techniques from decision-theoretic planning and reinforcement learning and describe a number of interesting issues that arise with regard to coordinating the policies of individual agents. To this end, we describe multiagent Markov decision processes as a general model in which to frame this discussion. These are special n-person cooperative games in which agents share the same utility function. We discuss coordination mechanisms based on imposed conventions (or social laws) as well as learning methods for coordination. Our focus is on the decomposition of sequential decision processes so that coordination can be learned (or imposed) locally, at the level of individual states. We also discuss the use of structured problem representations and their role in the generalization of learned conventions and in approximation.
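
For concreteness, a multiagent MDP in the sense used here can be written down as an ordinary MDP whose action space is the cross product of the agents' individual action sets and whose reward (utility) is shared by all agents. The field names in this sketch are assumptions for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple

JointAction = Tuple[str, ...]          # one action per agent

@dataclass
class MultiagentMDP:
    states: List[str]
    agent_actions: List[List[str]]     # per-agent action sets
    transition: Callable[[str, JointAction], Dict[str, float]]  # P(s' | s, a)
    reward: Callable[[str, JointAction], float]                 # shared utility
    gamma: float = 0.95

    def joint_actions(self) -> List[JointAction]:
        # joint action space = cross product of individual action sets
        return list(product(*self.agent_actions))

# usage: two agents, shared reward only for the coordinated joint action ('a', 'x')
mdp = MultiagentMDP(
    states=["s0", "s1"],
    agent_actions=[["a", "b"], ["x", "y"]],
    transition=lambda s, ja: {"s1": 1.0},
    reward=lambda s, ja: 1.0 if ja == ("a", "x") else 0.0,
)
print(mdp.joint_actions())
```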

High level coordination of agents based on multiagent Markov decision processes with roles

Information & Software Technology, 2002

We present an approach for coordinating the actions of a team of real-world autonomous agents at a high level. The method extends the framework of multiagent Markov decision processes with the notion of roles, a flexible and natural way to give each member of the team a clear description of its task. A role in our framework defines …

Hierarchical Representation Learning for Markov Decision Processes

2021

In this paper we present a novel method for learning hierarchical representations of Markov decision processes. Our method works by partitioning the state space into subsets and defines subtasks for performing transitions between the partitions. At the high level, we use model-based planning to decide which subtask to pursue next from a given partition. We formulate the problem of partitioning the state space as an optimization problem that can be solved using gradient descent given a set of sampled trajectories, making our method suitable for high-dimensional problems with large state spaces. We empirically validate the method by showing that it can successfully learn useful hierarchical representations in domains with high-dimensional states. Once learned, the hierarchical representation can be used to solve different tasks in the given domain, thus generalizing knowledge across tasks.
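
A rough sketch of the partitioning step under stated assumptions: a simple parametric model maps high-dimensional states to a soft assignment over K partitions, and its parameters could be tuned by gradient descent on sampled trajectories (the paper's actual objective and model are not reproduced here). A subtask then corresponds to moving from one partition to another.

```python
import numpy as np

def soft_partition(states, W):
    # states: (N, d) array of state features, W: (d, K) parameter matrix
    logits = states @ W
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    expl = np.exp(logits)
    return expl / expl.sum(axis=1, keepdims=True)  # (N, K) soft assignments

def hard_partition(states, W):
    # discrete partition index used by the high-level planner
    return soft_partition(states, W).argmax(axis=1)

# usage: assign 5 random 8-dimensional states to K = 3 partitions
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))
print(hard_partition(rng.normal(size=(5, 8)), W))
```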

Case-Based Multiagent Reinforcement Learning: Cases as Heuristics for Selection of Actions

Proceedings of the 2010 …, 2010

This work presents a new approach that allows the use of cases in a case base as heuristics to speed up Multiagent Reinforcement Learning algorithms, combining Case-Based Reasoning (CBR) and Multiagent Reinforcement Learning (MRL) techniques. This approach, called Case-Based Heuristically Accelerated Multiagent Reinforcement Learning (CB-HAMRL), builds upon an emerging technique, Heuristically Accelerated Reinforcement Learning (HARL), in which RL methods are accelerated by making use of heuristic information. CB-HAMRL is a subset of MRL that makes use of a heuristic function H derived from a case base, in a Case-Based Reasoning manner. An algorithm that incorporates CBR techniques into Heuristically Accelerated Minimax-Q is also proposed, and a set of empirical evaluations was conducted in a simulator for Littman's robot soccer domain, comparing three solutions to this problem: MRL, HAMRL and CB-HAMRL. Experimental results show that using CB-HAMRL, the agents learn faster than with RL or HAMRL methods.
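
The core of the heuristic acceleration can be sketched as follows: the stored case most similar to the current state suggests an action, the heuristic H(s, a) boosts that action, and the agent acts greedily with respect to Q(s, a) + ξ·H(s, a). The distance measure, constants, and toy case base below are assumptions, not the paper's exact settings.

```python
def heuristic_from_case(case_base, state, actions, boost=1.0):
    # nearest case by a simple state distance; its suggested action gets the boost
    nearest = min(case_base, key=lambda c: abs(c["state"] - state))
    return {a: (boost if a == nearest["action"] else 0.0) for a in actions}

def select_action(Q, state, actions, case_base, xi=1.0):
    # greedy choice over Q-values biased by the case-derived heuristic
    H = heuristic_from_case(case_base, state, actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0) + xi * H[a])

# usage: with equal Q-values, the case base breaks the tie in favour of "kick"
cases = [{"state": 2, "action": "kick"}, {"state": 9, "action": "pass"}]
Q = {(3, "kick"): 0.1, (3, "pass"): 0.1, (3, "hold"): 0.1}
print(select_action(Q, 3, ["kick", "pass", "hold"], cases))   # 'kick'
```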

Hierarchical multi-agent reinforcement learning

Autonomous Agents and Multi-Agent Systems, 2006

In this paper we investigate the use of hierarchical reinforcement learning to speed up the acquisition of cooperative multi-agent tasks. We extend the MAXQ framework to the multi-agent case. Each agent uses the same MAXQ hierarchy to decompose a task into sub-tasks. Learning is decentralized, with each agent learning three interrelated skills: how to perform subtasks, which order to do them in, and how to coordinate with other agents. Coordination skills among agents are learned by using joint actions at the highest level(s) of the hierarchy. The Q nodes at the highest level(s) of the hierarchy are configured to represent the joint task-action space among multiple agents. In this approach, each agent only knows what other agents are doing at the level of sub-tasks, and is unaware of lower-level (primitive) actions. This hierarchical approach allows agents to learn coordination faster by sharing information at the level of sub-tasks, rather than attempting to learn coordination over primitive joint state-action values. We apply this hierarchical multi-agent reinforcement learning algorithm to a complex AGV scheduling task and compare its performance and speed with other learning approaches, including flat multi-agent learning, single-agent MAXQ, and selfish multiple agents using MAXQ (where each agent acts independently without communicating with the other agents), as well as several well-known AGV heuristics such as "first come first served", "highest queue first" and "nearest station first". We also compare the trade-offs in learning speed vs. performance of modeling joint action values at multiple levels of the MAXQ hierarchy.
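
A highly simplified sketch of the key structural idea (assumed class and variable names, not the authors' implementation): root-level Q-values condition on the joint choice of subtasks across agents, while lower-level Q-values range only over an agent's own primitive actions, so coordination is learned at the subtask level.

```python
from collections import defaultdict

class HierarchicalAgent:
    def __init__(self, subtasks, primitives, alpha=0.1, gamma=0.95):
        self.subtasks, self.primitives = subtasks, primitives
        self.Q_root = defaultdict(float)   # key: (state, own_subtask, others_subtasks)
        self.Q_sub = defaultdict(float)    # key: (subtask, state, primitive_action)
        self.alpha, self.gamma = alpha, gamma

    def choose_subtask(self, state, others_subtasks):
        # coordination level: greedy over joint subtask values;
        # others_subtasks is a tuple of the teammates' current subtasks
        return max(self.subtasks,
                   key=lambda t: self.Q_root[(state, t, others_subtasks)])

    def choose_primitive(self, subtask, state):
        # execution level: other agents' primitive actions are not observed
        return max(self.primitives,
                   key=lambda a: self.Q_sub[(subtask, state, a)])

# usage: two subtasks, two primitives, one teammate currently doing "deliver"
agent = HierarchicalAgent(["fetch", "deliver"], ["forward", "turn"])
print(agent.choose_subtask("dock_1", ("deliver",)))
```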

Learning When and What to Ask: a Hierarchical Reinforcement Learning Framework

ArXiv, 2021

Reliable AI agents should be mindful of the limits of their knowledge and consult humans when they sense that they do not have sufficient knowledge to make sound decisions. We formulate a hierarchical reinforcement learning framework for learning to decide when to request additional information from humans and what type of information would be helpful to request. Our framework extends partially observable Markov decision processes (POMDPs) by allowing an agent to interact with an assistant to leverage the assistant's knowledge in accomplishing tasks. Results on a simulated human-assisted navigation problem demonstrate the effectiveness of our framework: aided with an interaction policy learned by our method, a navigation policy achieves up to a 7× improvement in task success rate compared to performing tasks on its own. The interaction policy is also efficient: on average, only a quarter of all actions taken during a task execution are requests for information. We analyze benefits and challenges...
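
The two-level structure described in the abstract might be sketched as follows: an interaction policy decides whether to act or to issue one of a small set of request types to the assistant, and the task (navigation) policy then acts on the possibly augmented observation. The confidence threshold, request types, and stub policies are illustrative assumptions, not the paper's learned components.

```python
REQUEST_TYPES = ["ask_landmark", "ask_direction"]

def interaction_policy(belief_confidence, threshold=0.6):
    # "when to ask": request help only when the agent lacks sufficient knowledge
    if belief_confidence < threshold:
        # "what to ask": pick a request type (here, a fixed preference)
        return REQUEST_TYPES[0]
    return "act"

def step(agent_obs, belief_confidence, task_policy, assistant):
    decision = interaction_policy(belief_confidence)
    if decision != "act":
        # augment the observation with the assistant's answer
        agent_obs = agent_obs + (assistant(decision),)
    return task_policy(agent_obs)

# usage with stub task policy and assistant
print(step(("room_3",), 0.4,
           lambda obs: f"navigate given {obs}",
           lambda query: "go left"))
```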

Computing effective communication policies in multiagent systems

Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems - AAMAS '07, 2007

Communication is a key tool for facilitating multiagent coordination in cooperative and uncertain domains. We focus on a class of multiagent problems modeled as Decentralized Markov Decision Processes with Communication (DEC-MDP-COM) with local observability. The planning problem for computing the optimal communication strategy in such domains is often formulated under the assumption that the optimal domain-level policy is known. Computing the optimal communication policy is NP-complete; there is a need, then, for heuristic solutions that trade off performance against efficiency. We present a decision-theoretic approach for computing communication policies in stochastic environments which uses a branching-future representation and evaluates only those decisions that an agent is likely to encounter. The communication strategy computed off-line is used in the more probable scenarios that the agent will face in the future. Our approach also allows agents to compute communication policies at run-time in the unlikely event that the agents face scenarios discarded while computing the off-line policy.
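
A stripped-down sketch of the underlying decision-theoretic test, under assumed names: the agent communicates only when the expected gain of the synchronized joint policy over staying silent, averaged over a set of sampled likely futures rather than the full branching tree, exceeds the communication cost. The scenario set and value functions below are toy assumptions.

```python
def should_communicate(scenarios, value_with_comm, value_without_comm, comm_cost):
    """scenarios: list of (probability, scenario) pairs covering likely futures."""
    gain = sum(p * (value_with_comm(s) - value_without_comm(s))
               for p, s in scenarios)
    return gain > comm_cost

# usage with toy value functions over two likely future scenarios
scenarios = [(0.7, "both_at_goal"), (0.3, "agent2_delayed")]
v_comm = lambda s: 10.0 if s == "agent2_delayed" else 8.0
v_silent = lambda s: 4.0 if s == "agent2_delayed" else 8.0
print(should_communicate(scenarios, v_comm, v_silent, comm_cost=1.0))  # True
```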