A reinforcement learning algorithm for building collaboration in multi-agent systems
Related papers
Building Collaboration in Multi-agent Systems Using Reinforcement Learning
Lecture Notes in Computer Science, 2018
This paper presents a proof-of-concept study demonstrating the viability of building collaboration among multiple agents through a standard Q-learning algorithm embedded in particle swarm optimisation. Collaboration is formulated to be achieved among the agents via competition, where the agents are expected to balance their actions in such a way that none of them drifts away from the team and none intrudes on a fellow neighbour's territory, either. Particles are devised with Q-learning for self-training, learning how to act as members of a swarm and how to produce collaborative/collective behaviours. The experimental results support the proposed idea, suggesting that a substantive collaboration can be built via the proposed learning algorithm.
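A minimal sketch of the kind of per-particle Q-learning the abstract describes, assuming a discretized state (the particle's distance band to the swarm centroid and to its nearest neighbour) and a reward that penalizes both drifting away and crowding; all names and thresholds are illustrative, not taken from the paper.

```python
import random
from collections import defaultdict

# Illustrative sketch: each particle carries its own Q-table and learns
# to stay near the swarm without crowding its neighbours.
ACTIONS = ["toward_centroid", "away_from_centroid", "stay"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

q_table = defaultdict(float)  # (state, action) -> Q value

def choose_action(state):
    if random.random() < EPSILON:
        return random.choice(ACTIONS)          # explore
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

def reward(dist_to_centroid, dist_to_nearest):
    # Assumed reward shape: penalize drifting away from the team and
    # invading a neighbour's territory; reward the balanced middle ground.
    if dist_to_centroid > 5.0:
        return -1.0
    if dist_to_nearest < 1.0:
        return -1.0
    return 1.0

def q_update(state, action, r, next_state):
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += ALPHA * (r + GAMMA * best_next - q_table[(state, action)])
```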
Cooperative Multi-Agent Systems Using Distributed Reinforcement Learning Techniques
Procedia Computer Science, 2018
In this paper, the fully cooperative multi-agent system is studied, in which all of the agents share the same common goal. The main difficulty in such systems is the coordination problem: how to ensure that the individual decisions of the agents lead to jointly optimal decisions for the group? Firstly, a multi-agent reinforcement learning algorithm combining traditional Q-learning with observation-based teammate modeling techniques, called TM Q-learning, is presented and evaluated. Several new cooperative action selection strategies are then suggested to improve multi-agent coordination and accelerate learning, especially in unknown and dynamic environments. The effectiveness of combining TM Q-learning with the new proposals is demonstrated using the hunting game.
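A rough sketch of the observation-based teammate-modeling idea, assuming the agent keeps empirical action counts per state for a single teammate and best-responds against that empirical model; the joint Q-table layout and the uniform prior are assumptions, not the paper's exact formulation.

```python
from collections import defaultdict

ACTIONS = range(4)
counts = defaultdict(lambda: defaultdict(int))  # state -> teammate action -> count
q = defaultdict(float)                          # (state, my_action, teammate_action) -> Q

def observe(state, teammate_action):
    # Update the empirical model of the teammate from observation only.
    counts[state][teammate_action] += 1

def teammate_model(state):
    c = counts[state]
    total = sum(c.values())
    if total == 0:                               # nothing observed yet: uniform prior
        return {a: 1.0 / len(ACTIONS) for a in ACTIONS}
    return {a: c[a] / total for a in ACTIONS}

def best_response(state):
    # Pick the action maximizing expected Q against the teammate model.
    probs = teammate_model(state)
    return max(ACTIONS, key=lambda my_a: sum(p * q[(state, my_a, ta)]
                                             for ta, p in probs.items()))
```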
Cooperative Q-learning: the knowledge sharing issue
Advanced Robotics, 2001
A group of cooperative and homogeneous Q-learning agents can cooperate to learn faster and gain more knowledge. To do so, each learner agent must be able to evaluate the expertness and the intelligence level of the other agents, and to assess the knowledge and the information it gets from them. In addition, the learner needs a suitable method to properly combine its own knowledge with what it gains from the other agents according to their relative expertness. In this paper, some expertness measuring criteria are introduced. Also, a new cooperative learning method called weighted strategy sharing (WSS) is introduced. In WSS, each agent assigns a weight to each teammate's knowledge based on that teammate's expertness and utilizes it accordingly. WSS and the expertness criteria are tested on two simulated hunter-prey and object-pushing systems.
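The knowledge-combination step the abstract describes can be sketched as an expertness-weighted average of Q-tables; the normalization below is one plausible choice, not necessarily the paper's exact weighting.

```python
import numpy as np

def wss_combine(q_tables, expertness):
    """Expertness-weighted average of teammates' Q-tables (WSS-style sketch).

    q_tables:   list of (n_states, n_actions) arrays, one per agent.
    expertness: list of non-negative scalars, one per agent.
    """
    w = np.asarray(expertness, dtype=float)
    w = w / w.sum()                              # relative expertness as weights
    return sum(wi * qi for wi, qi in zip(w, q_tables))

# Usage: each learner replaces its own table with the combined one,
# e.g. q_tables[i] = wss_combine(q_tables, expertness)
```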
2012
This paper presents the design and implementation of a new reinforcement learning (RL) based algorithm. The proposed algorithm, CQ(λ) (collaborative Q(λ)), allows several learning agents to acquire knowledge from each other. Acquiring knowledge learnt by one agent via collaboration with another agent enables acceleration of the entire learning system; therefore, learning can be utilized more efficiently. By developing collaborative learning algorithms, a learning task can be solved significantly faster than if performed by a single agent only, namely the number of learning episodes needed to solve a task is reduced. The proposed algorithm proved to accelerate learning in a robotic navigation problem. The CQ(λ) algorithm was applied to autonomous mobile robot navigation, where several robot agents serve as learning processes. Robots learned to navigate an 11 x 11 world containing obstacles and boundaries, choosing the optimum path to reach a target. Simulated experiments based on 50 le...
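A sketch of the collaborative step such a scheme implies: each robot runs its own Q(λ) learner, and the agents periodically pool their tables. The elementwise max below is one plausible pooling rule under that reading, not a verified reproduction of the paper's update.

```python
import numpy as np

def collaborative_update(q_tables):
    """Hypothetical pooling step between individual Q(lambda) phases.

    q_tables: list of (n_states, n_actions) arrays, one per robot.
    """
    best = np.maximum.reduce(q_tables)  # best value any robot has learned
    for q in q_tables:
        q[:] = best                     # every robot adopts the pooled knowledge

# Called, e.g., every K episodes so that knowledge acquired by one robot
# propagates to the others and fewer episodes are needed overall.
```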
Learning Cooperative Behaviours in Multiagent Reinforcement Learning
We investigated the coordination among agents in a goal-finding task in a partially observable environment. In our problem formulation, the task was to locate a goal in a 2D space. However, no information related to the goal was given to the agents unless they had formed a swarm. Furthermore, the goal had to be located by a swarm of agents, not a single agent. In this study, cooperative behaviours among agents were learned using our proposed context-dependent multiagent SARSA algorithm (CDM-SARSA). In essence, instead of tracking the actions of all the agents in the Q-table, i.e., $Q(s, \mathbf{a})$, CDM-SARSA tracked only the action $a_i$ of agent $i$ and the context $c$ resulting from the actions of all the agents, i.e., $Q_i(s, a_i, c)$. This approach reduced the size of the state space considerably. Tracking all the agents' actions was impractical, since the state space increased exponentially with every new agent added to the system. In our opinion, tracking the context abstracts away unnecessary details, and this approach is a logical solution for multiagent reinforcement learning tasks. The proposed approach for learning cooperative behaviours was illustrated using different numbers of agents and different grid sizes. The empirical results confirmed that the proposed CDM-SARSA could learn cooperative behaviours successfully.
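Since the abstract gives the learned function as $Q_i(s, a_i, c)$, the corresponding on-policy update can be written directly; the context encoding (e.g., whether the swarm has formed) and the constants below are assumptions.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.95
q = defaultdict(float)  # (state, my_action, context) -> value

def cdm_sarsa_update(s, a, c, r, s2, a2, c2):
    """SARSA update of Q_i(s, a_i, c), with c summarizing the teammates.

    The context c (e.g., 'swarm formed' vs. 'not formed') replaces the
    full joint action, keeping the table size independent of the number
    of agents.
    """
    td_target = r + GAMMA * q[(s2, a2, c2)]
    q[(s, a, c)] += ALPHA * (td_target - q[(s, a, c)])
```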
Learning to cooperate in multi-agent systems by combining Q-learning and evolutionary strategy
Int. J. Lateral Comput, 2005
Coordination games can represent interactions between multiple agents in many real-life situations. Thus single-stage coordination games provide a stylized, abstracted environment for testing algorithms that allow artificial agents to learn to cooperate in such settings. Individual reinforcement learners often fail to learn coordinated behavior. Using an evolutionary approach to strategy selection can produce optimal joint behavior but may require significant computational effort. Our goal in this paper is to improve convergence to optimal behavior with reduced computational effort by combining learning and evolutionary techniques. In particular, we show that letting agents learn in between generations of an evolutionary algorithm allows them to more consistently learn effective cooperative behavior, even in difficult, stochastic environments. Our combined mechanism is a novel improvisation involving selecting actual rather than inherited behaviors.
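One way to read the combination is a Lamarckian-style loop: Q-learning inside each generation, with selection acting on the post-learning (actual) behaviour. The sketch below makes that loop concrete; the fitness, selection, and mutation details are assumptions, not the paper's.

```python
import random

def evolve(population, generations, learn, evaluate, mutate):
    """Hybrid evolutionary / Q-learning loop (illustrative sketch).

    population: list of strategy encodings (e.g., Q-tables).
    learn:      runs Q-learning episodes on an individual, returns it updated.
    evaluate:   fitness of an individual's *learned* behaviour.
    mutate:     returns a perturbed copy of an individual.
    """
    for _ in range(generations):
        learned = [learn(ind) for ind in population]   # learning between generations
        learned.sort(key=evaluate, reverse=True)
        elite = learned[: len(learned) // 2]
        # Selection acts on actual (post-learning) rather than inherited behaviour.
        population = elite + [mutate(random.choice(elite)) for _ in elite]
    return population
```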
Expertness based cooperative Q-learning
IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics), 2002
By using other agents' experiences and knowledge, a learning agent may learn faster, make fewer mistakes, and create rules for unseen situations. These benefits are gained if the learning agent can extract proper rules out of the other agents' knowledge for its own requirements. One possible way to do this is to have the learner assign expertness values (intelligence level values) to the other agents and use their knowledge accordingly. In this paper, some criteria to measure the expertness of reinforcement learning agents are introduced. Also, a new cooperative learning method, called weighted strategy sharing (WSS), is presented. In this method, each agent measures the expertness of its teammates, assigns a weight to their knowledge, and learns from them accordingly. The presented methods are tested on two hunter-prey systems. We consider the case where the agents all learn from each other and compare it with the case where agents cooperate only with the more expert ones. Also, the effect of communication noise, as a source of uncertainty, on the cooperative learning method is studied. Moreover, the Q-table of one of the cooperative agents is changed randomly and its effects on the presented methods are examined.
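For concreteness, a few reward-history-based expertness measures in the spirit of those the paper introduces (the exact criteria there may differ):

```python
def expertness_normal(rewards):
    """Algebraic sum of received rewards."""
    return sum(rewards)

def expertness_absolute(rewards):
    """Sum of reward magnitudes: successes and mistakes both count as experience."""
    return sum(abs(r) for r in rewards)

def expertness_positive(rewards):
    """Successes only."""
    return sum(r for r in rewards if r > 0)
```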
Reinforcement learning based on policy gradient methods is considered suitable for autonomous distributed multiagent learning. This paper shows that reinforcement learning techniques can be applied to the problem of finding a minimum value of continuous functions by Particle Swarm Optimization (PSO). In PSO, common values are given to all particles for the weight parameters of the two attractive forces toward each particle's best solution and the group's best solution. However, the PSO results depend on the values of these fixed weight parameters, which are given uniformly to all the particles. We propose a new model that applies the policy gradient method to PSO to adjust these parameters. In our model, particles may have distinct values for the weight parameters. Allowing particles to change their weight parameters corresponds to giving them personalities. Moreover, it enables us to use repulsive forces toward each particle's best solution and the group's best solution as well as attractive forces. The result...
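A sketch of the per-particle parameterization the abstract describes: each particle carries its own cognitive/social coefficients, which may be negative (repulsive), and a Gaussian policy-gradient step nudges them. The REINFORCE-style update and all constants are assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_step(x, v, pbest, gbest, c1, c2, inertia=0.7):
    """One PSO velocity/position update with *per-particle* weights.

    x, v, pbest: (n_particles, dim) arrays; gbest: (dim,);
    c1, c2: (n_particles, 1) coefficients, possibly negative (repulsion).
    """
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = inertia * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v

def reinforce_mean_update(mu, sampled, reward, lr=0.01, sigma=0.1):
    """REINFORCE step on the mean of a Gaussian policy over one weight
    parameter: grad log N(sampled; mu, sigma) = (sampled - mu) / sigma**2."""
    return mu + lr * reward * (sampled - mu) / sigma**2
```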
Multi-robot learning with particle swarm optimization
Proceedings of the fifth international joint …, 2006
We apply an adapted version of Particle Swarm Optimization to distributed unsupervised robotic learning in groups of robots with only local information. The performance of the learning technique for a simple task is compared across robot groups of various sizes, with the maximum group size allowing each robot to individually contain and manage a single PSO particle. Different PSO neighborhoods based on limitations of real robotic communication are tested in this scenario, and the effect of varying communication power is explored. The algorithms are then applied to a group learning scenario to explore their susceptibility to the credit assignment problem. Results are discussed and future work is proposed.
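A sketch of the communication-limited neighbourhood the abstract mentions, assuming each robot manages exactly one PSO particle and only hears robots within range; the range test and fitness handling are illustrative assumptions.

```python
import numpy as np

def neighbourhood_best(robot_xy, candidates, fitness, i, comm_range):
    """Best candidate solution among the robots that robot i can hear.

    robot_xy:   (n_robots, 2) physical positions.
    candidates: (n_robots, dim) each robot's current particle position.
    fitness:    (n_robots,) evaluated performance of each candidate.
    """
    d = np.linalg.norm(robot_xy - robot_xy[i], axis=1)
    idx = np.flatnonzero(d <= comm_range)   # in-range robots, including i itself
    return candidates[idx[np.argmax(fitness[idx])]]
```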