Multiagent reinforcement learning with the partly high-dimensional state space
Related papers
A Modular Approach to Multi-agent Reinforcement Learning
Several attempts have been reported to let multiple monolithic reinforcement-learning agents synthesize coordinated decision policies needed to accomplish their common goal effectively. Most of these straightforward reinforcement-learning approaches, however, scale poorly to more complex multi-agent learning problems, because the state space for each learning agent grows exponentially in the number of its partner agents engaged in the joint task. To remedy the exponentially large state space in multi-agent reinforcement learning, we previously proposed a modular approach and demonstrated its effectiveness through the application to a modified version of the pursuit problem. In this paper, the effectiveness of the proposed idea is further demonstrated using several variants of the pursuit problem. Just as in the previous case, our modular Q-learning hunters can successfully capture a randomly-evading prey agent, by synthesizing and taking advantage of effective coordinated behavior.
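A minimal sketch of the modular idea described above, assuming each hunter keeps one small Q-table per observed agent (prey or partner) and a mediator picks the action with the greatest summed Q-value; the class names and the merging rule are illustrative, not taken from the paper.

```python
import random
from collections import defaultdict

class QModule:
    """One small Q-table that sees only a low-dimensional slice of the state
    (here: this hunter's position relative to a single other agent)."""
    def __init__(self, n_actions, alpha=0.1, gamma=0.9):
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.alpha, self.gamma = alpha, gamma

    def values(self, s):
        return self.q[s]

    def update(self, s, a, r, s_next):
        best_next = max(self.q[s_next])
        self.q[s][a] += self.alpha * (r + self.gamma * best_next - self.q[s][a])

class ModularHunter:
    """Mediator that merges module outputs with a 'greatest mass' rule:
    pick the action whose Q-values summed over all modules is largest."""
    def __init__(self, n_modules, n_actions, epsilon=0.1):
        self.modules = [QModule(n_actions) for _ in range(n_modules)]
        self.n_actions, self.epsilon = n_actions, epsilon

    def act(self, partial_states):
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        totals = [sum(m.values(s)[a] for m, s in zip(self.modules, partial_states))
                  for a in range(self.n_actions)]
        return max(range(self.n_actions), key=totals.__getitem__)

    def learn(self, partial_states, a, r, next_partial_states):
        # Every module is trained on the same global reward but only its own
        # partial state, which keeps each table exponentially smaller than the
        # joint state space.
        for m, s, s2 in zip(self.modules, partial_states, next_partial_states):
            m.update(s, a, r, s2)
```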
Q-error as a Selection Mechanism in Modular Reinforcement-Learning Systems
This paper introduces a novel multi-modular method for reinforcement learning. A multi-modular system is one that partitions the learning task among a set of experts (modules), where each expert is incapable of solving the entire task by itself. There are many advantages to splitting up large tasks in this way, but existing methods face difficulties when choosing which module(s) should contribute to the agent's actions at any particular moment. We introduce a novel selection mechanism where every module, besides calculating a set of action values, also estimates its own error for the current input. The selection mechanism combines each module's estimate of long-term reward and self-error to produce a score by which the next module is chosen. As a result, the modules can use their resources effectively and efficiently divide up the task. The system is shown to learn complex tasks even when the individual modules use only linear function approximators.
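A rough sketch of such a self-error-aware module, assuming linear function approximators for both the action values and the module's own expected TD error; the scoring rule that combines the two is an assumption, not the paper's formula.

```python
import numpy as np

class ErrorAwareModule:
    """Linear Q-function plus a linear estimator of this module's own TD-error
    magnitude for the current input."""
    def __init__(self, n_features, n_actions, alpha=0.01, gamma=0.95):
        self.w_q = np.zeros((n_actions, n_features))   # action-value weights
        self.w_err = np.zeros(n_features)               # self-error weights
        self.alpha, self.gamma = alpha, gamma

    def q_values(self, x):
        return self.w_q @ x

    def predicted_error(self, x):
        return float(self.w_err @ x)

    def update(self, x, a, r, x_next):
        td_error = r + self.gamma * np.max(self.q_values(x_next)) - self.q_values(x)[a]
        self.w_q[a] += self.alpha * td_error * x
        # Regress the magnitude of the TD error onto the features so the module
        # can later report how unreliable it expects to be on similar inputs.
        self.w_err += self.alpha * (abs(td_error) - self.predicted_error(x)) * x
        return td_error

def select_module(modules, x, error_weight=1.0):
    """Score each module by expected return minus its own predicted error
    (the exact combination rule here is an illustrative assumption)."""
    scores = [np.max(m.q_values(x)) - error_weight * m.predicted_error(x)
              for m in modules]
    return int(np.argmax(scores))
```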
Graph Exploration for Effective Multi-agent Q-Learning
arXiv (Cornell University), 2023
This paper proposes an exploration technique for multi-agent reinforcement learning (MARL) with graph-based communication among agents. We assume the individual rewards received by the agents are independent of the actions of the other agents, while their policies are coupled. In the proposed framework, neighbouring agents collaborate to estimate the uncertainty about the state-action space in order to execute more efficient explorative behaviour. Unlike existing works, the proposed algorithm does not require counting mechanisms and can be applied to continuous-state environments without requiring complex conversion techniques. Moreover, the proposed scheme allows agents to communicate in a fully decentralized manner with minimal information exchange; in continuous-state scenarios, each agent needs to exchange only a single parameter vector. The performance of the algorithm is verified with theoretical results for discrete-state scenarios and with experiments for continuous ones.
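An illustrative sketch of count-free, uncertainty-driven exploration with neighbour mixing; the elliptical bonus and the averaging of feature-covariance statistics are assumptions standing in for the paper's estimator (which exchanges only a single parameter vector).

```python
import numpy as np

class GraphExplorerAgent:
    """Each agent keeps a feature-covariance matrix as a count-free uncertainty
    proxy; the exploration bonus is larger for state-action features that the
    agent (and its graph neighbours) have visited rarely."""
    def __init__(self, n_features, beta=1.0, lam=1e-2):
        self.A = lam * np.eye(n_features)   # accumulated outer products of features
        self.beta = beta                     # bonus scale

    def bonus(self, phi):
        # Elliptical (LinUCB-style) bonus: beta * sqrt(phi^T A^{-1} phi).
        return self.beta * float(np.sqrt(phi @ np.linalg.solve(self.A, phi)))

    def observe(self, phi):
        self.A += np.outer(phi, phi)

    def exchange(self, neighbours, mix=0.5):
        """Decentralised mixing with graph neighbours: blend the local
        uncertainty statistics with the neighbours' average."""
        if not neighbours:
            return
        mean_A = sum(n.A for n in neighbours) / len(neighbours)
        self.A = (1 - mix) * self.A + mix * mean_A
```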
Reducing the complexity of multiagent reinforcement learning
2007
It is known that the complexity of reinforcement learning algorithms, such as Q-learning, may be exponential in the number of environment states. It was shown, however, that the learning complexity for goal-directed problems may be substantially reduced by initializing the Q-values with a "good" approximating function. In the multiagent case, such a good approximation exists for a large class of problems, namely, goal-directed stochastic games.
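A small sketch of the initialization idea, assuming a grid world where a discounted Manhattan-distance heuristic plays the role of the "good" approximating function; the heuristic and the grid setting are illustrative stand-ins, not the paper's construction.

```python
import numpy as np

def manhattan_heuristic(state, goal, gamma=0.95, goal_reward=1.0):
    """Optimistic shortest-path estimate of the return from `state`, used only
    to seed the table."""
    d = abs(state[0] - goal[0]) + abs(state[1] - goal[1])
    return goal_reward * (gamma ** d)

def init_q_table(grid_shape, n_actions, goal):
    """Build a Q-table whose entries start at the heuristic value instead of
    zero; standard Q-learning then refines it, which is where the reported
    complexity reduction comes from."""
    h, w = grid_shape
    q = np.zeros((h, w, n_actions))
    for i in range(h):
        for j in range(w):
            q[i, j, :] = manhattan_heuristic((i, j), goal)
    return q
```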
Factorized Q-learning for large-scale multi-agent systems
Proceedings of the First International Conference on Distributed Artificial Intelligence, 2019
Deep Q-learning has achieved significant success in single-agent decision making tasks. However, it is challenging to extend Q-learning to large-scale multi-agent scenarios, due to the explosion of the action space resulting from the complex dynamics between the environment and the agents. In this paper, we propose to make the computation of multi-agent Q-learning tractable by treating the Q-function (w.r.t. state and joint action) as a high-order, high-dimensional tensor and then approximating it with factorized pairwise interactions. Furthermore, we utilize a composite deep neural network architecture for computing the factorized Q-function, share the model parameters among all the agents within the same group, and estimate the agents' optimal joint actions through a coordinate-descent-type algorithm. All these simplifications greatly reduce the model complexity and accelerate the learning process. Extensive experiments on two different multi-agent problems demonstrate the performance gain of our proposed approach in comparison with strong baselines, particularly when there are a large number of agents.
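A toy sketch of the pairwise factorization and the coordinate-descent joint-action search; in the paper both sets of terms come from a shared deep network, whereas here they are plain lookup arrays for clarity.

```python
def factorized_q(indiv, pair, actions):
    """Q(s, a_1..a_N) approximated by per-agent terms plus pairwise terms.
    indiv[i][a]      : value of agent i taking action a (given the state)
    pair[i][j][a][b] : interaction of agents i < j taking actions a and b
    """
    n = len(actions)
    total = sum(indiv[i][actions[i]] for i in range(n))
    total += sum(pair[i][j][actions[i]][actions[j]]
                 for i in range(n) for j in range(i + 1, n))
    return total

def coordinate_descent_joint_action(indiv, pair, n_actions, sweeps=5):
    """Greedy joint-action search: repeatedly re-optimise one agent's action
    while holding the others fixed, a coordinate-descent-style scheme."""
    n = len(indiv)
    actions = [0] * n
    for _ in range(sweeps):
        for i in range(n):
            actions[i] = max(
                range(n_actions),
                key=lambda a: factorized_q(indiv, pair,
                                           actions[:i] + [a] + actions[i + 1:]))
    return actions
```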
A selection-mutation model for q-learning in multi-agent systems
2003
Although well understood in the single-agent framework, the use of traditional reinforcement learning (RL) algorithms in multi-agent systems (MAS) is not always justified. The feedback an agent experiences in a MAS is usually influenced by the other agents present in the system. Multi-agent environments are therefore non-stationary, and the convergence and optimality guarantees of RL algorithms are lost.
Q-learning by the nth step state and multi-agent negotiation in unknown environment
Tehnicki Vjesnik, 2012
This work presents a new procedure for Q-learning in which the agent's decision regarding the next step is based not on the optimal action at that moment but on the usefulness of a future state. Near-agent communication has been implemented so that the agents signal their future actions to each other, which contributes to a better choice of actions for each agent. The new method is named Q-learning by the nth step and multi-agent negotiation. The results of testing this algorithm are compared with the basic Q-learning algorithm, including graphical comparisons, and the advantages of the new algorithm are listed. An average collision decrease of 40 % is obtained during the learning procedure.
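An illustrative negotiation pass in the spirit of the abstract, assuming agents announce the cell they intend to occupy next and later agents avoid already-claimed cells; the function names and the blocking rule are assumptions, not the paper's protocol.

```python
def negotiate_actions(agents, q_tables, intended_next_cell):
    """agents:             dict agent_id -> current state
       q_tables:           dict agent_id -> {state: {action: Q-value}}
       intended_next_cell: callable (agent_id, state, action) -> cell
    Each agent in turn claims the cell reached by its best non-conflicting
    action, so two agents never plan to enter the same cell."""
    claimed = set()
    chosen = {}
    for agent_id, state in agents.items():
        q = q_tables[agent_id][state]
        for action in sorted(q, key=q.get, reverse=True):
            target = intended_next_cell(agent_id, state, action)
            if target not in claimed:
                claimed.add(target)
                chosen[agent_id] = action
                break
        else:
            # No conflict-free cell: fall back to the greedy action.
            chosen[agent_id] = max(q, key=q.get)
    return chosen
```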
A REINFORCEMENT LEARNING APPROACH FOR MULTIAGENT NAVIGATION
This paper presents a Q-Learning-based multiagent system oriented to providing navigation skills to simulation agents in virtual environments. We focus on learning local navigation behaviours from the interactions with other agents and the environment. We adopt an environment-independent state space representation to provide the required scalability of such systems. In this way, we evaluate whether the learned action-value functions can be transferred to other agents to increase the size of the group without losing behavioural quality. We explain the learning process and the results of the collective behaviours obtained in a well-known experiment in multiagent navigation: evacuation through a door.
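A sketch of what an environment-independent local state and a direct policy transfer might look like; the particular features (direction to the exit, capped crowd count) are assumptions, not the paper's representation.

```python
import copy

def egocentric_state(agent_pos, goal_pos, neighbour_positions, radius=2):
    """Environment-independent local state: coarse direction to the exit plus a
    capped count of nearby agents, so the same table fits any map or group size."""
    dx = goal_pos[0] - agent_pos[0]
    dy = goal_pos[1] - agent_pos[1]
    direction = (int(dx > 0) - int(dx < 0), int(dy > 0) - int(dy < 0))
    crowd = sum(1 for p in neighbour_positions
                if abs(p[0] - agent_pos[0]) + abs(p[1] - agent_pos[1]) <= radius)
    return direction + (min(crowd, 3),)

def transfer_policy(learned_q_table, n_new_agents):
    """Give every new agent a copy of the learned action-value function; no
    retraining is needed because the state encoding does not depend on the map."""
    return [copy.deepcopy(learned_q_table) for _ in range(n_new_agents)]
```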
Modular Q-learning based multi-agent cooperation for robot soccer
Robotics and Autonomous Systems, 2001
In a multi-agent system, action selection is important for the cooperation and coordination among agents. Because the environment is dynamic and complex, modular Q-learning, a reinforcement learning scheme, is employed to assign a proper action to an agent in the multi-agent system. The architecture of modular Q-learning consists of learning modules and a mediator module. The mediator module selects a proper action for the agent based on the Q-value obtained from each learning module. To obtain better performance, along with the Q-values, the mediator module also considers state information in the action selection process. A uni-vector field is used for robot navigation. In the robot soccer environment, the effectiveness and applicability of modular Q-learning and the uni-vector field method are verified by real experiments using five micro-robots.
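A compact sketch of the mediator step, assuming the state information enters as a per-module relevance weight; the weighting scheme is an assumption rather than the paper's exact rule.

```python
import random

def mediate_action(module_q_values, relevance, n_actions, epsilon=0.05):
    """Mediator step: each learning module reports Q-values for its own view of
    the state; the mediator also uses state information, modelled here as a
    non-negative relevance weight per module.
    module_q_values: list of length-n_actions lists, one per module
    relevance:       list of weights, one per module
    """
    if random.random() < epsilon:
        return random.randrange(n_actions)
    scores = [sum(w * qs[a] for qs, w in zip(module_q_values, relevance))
              for a in range(n_actions)]
    return max(range(n_actions), key=scores.__getitem__)

# Example: two modules (ball-chasing, goal-keeping); the keeper module is made
# more relevant when the ball is near our own goal.
action = mediate_action([[0.2, 0.8, 0.1], [0.5, 0.1, 0.9]],
                        relevance=[0.3, 0.7], n_actions=3)
```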