Multiagent Cooperative Reinforcement Learning by Expert Agents (MCRLEA)
Related papers
International Journal of Agent Technologies and Systems, 2017
This article presents a novel approach to cooperative decision-making through joint action learning for a retail shop application. The approach models three retailer stores in a retail marketplace. Retailers can help each other and profit from shared cooperation knowledge while learning their own strategies, each of which reflects only that retailer's aims and benefit. The vendors are the learning agents that employ cooperative learning to train in this environment. Under reasonable assumptions about each vendor's stock policy, restock period, and customer arrival process, the problem is formulated as a Markov model. The proposed algorithms learn dynamic consumer behaviour. The article illustrates the results of cooperative reinforcement learning by joint action learning for three shop agents over a one-year sales period, and compares two approaches: multi-agent Q-learning and joint action learning.
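To make the joint-action idea concrete, here is a minimal Python sketch of a joint-action Q-learning update for a simplified two-retailer setting; the state/action sizes, learning rate, and the maximization over the joint action space are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Illustrative sizes: states = inventory levels, actions = pricing/restock choices.
N_STATES, N_ACTIONS, N_AGENTS = 10, 3, 2
ALPHA, GAMMA = 0.1, 0.9

# Joint-action learner: each agent's Q-table is indexed by the state and
# by the *joint* action (its own action and the other agent's action).
Q = np.zeros((N_AGENTS, N_STATES, N_ACTIONS, N_ACTIONS))

def update(agent, s, a_self, a_other, r, s_next):
    """One Q-learning step over the joint action space (assumed form).

    This sketch maximizes over the full joint action at s_next; refinements
    in the joint-action-learning literature instead average over a model of
    the other agent's action frequencies.
    """
    best_next = Q[agent, s_next].max()
    td_error = r + GAMMA * best_next - Q[agent, s, a_self, a_other]
    Q[agent, s, a_self, a_other] += ALPHA * td_error
```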
2017
This paper proposes a novel approach, Expertise-based Multi-agent Cooperative Reinforcement Learning Algorithms (EMCRLA), for dynamic decision-making in a retail application, and evaluates its performance against plain cooperative reinforcement learning algorithms. Several cooperation schemes for multi-agent cooperative reinforcement learning are proposed: EQ learning, the EGroup scheme, the EDynamic scheme, and the EGoal-driven scheme. Implementation results demonstrate that the proposed cooperation schemes can accelerate convergence of the group of agents toward good action policies. The approach is developed for three retailer stores in a retail marketplace. Retailers can help each other and profit from shared cooperation knowledge while learning their own strategies, each of which reflects that retailer's aims and benefit. The vendors are ...
Multi-agent Cooperation Models by Reinforcement Learning (MCMRL)
International Journal of Computer Applications, 2017
This paper proposes a novel approach to multi-agent cooperation methods by reinforcement learning (MCMRL). Cooperation methods for reinforcement learning based on a multi-agent scheme are proposed and implemented: a group method, a dynamic method, and a goal-oriented method. Implementation results demonstrate that the suggested cooperation methods can accelerate convergence of the group of agents toward the best action strategies. The approach is developed for dynamic product availability across three retailer shops in the market. Retailers can cooperate with each other and benefit from cooperative information while following their own policies, which accurately represent their goals and interests. The retailers are the learning agents in the problem and apply reinforcement learning to learn cooperatively in this situation. Under reasonable assumptions about each dealer's inventory strategy, refill period, and customer arrival process, the problem becomes a Markov decision process model, which makes it amenable to learning algorithms.
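As an illustration of the Markov decision process formulation described above, the following sketch models a single retailer's refill period with an assumed Poisson arrival process and tabular Q-learning; all prices, costs, and capacities are invented for the example, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
MAX_STOCK, MAX_ORDER = 20, 10          # illustrative capacity limits
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
PRICE, UNIT_COST, HOLD_COST = 5.0, 3.0, 0.1

# State = current stock level, action = order quantity for the refill period.
Q = np.zeros((MAX_STOCK + 1, MAX_ORDER + 1))

def step(stock, order):
    """One refill period: the order arrives, then Poisson demand is served."""
    stock = min(stock + order, MAX_STOCK)
    demand = rng.poisson(4)             # assumed customer arrival process
    sold = min(stock, demand)
    stock -= sold
    reward = PRICE * sold - UNIT_COST * order - HOLD_COST * stock
    return stock, reward

stock = 5
for t in range(10_000):
    order = rng.integers(MAX_ORDER + 1) if rng.random() < EPS else Q[stock].argmax()
    nxt, r = step(stock, order)
    Q[stock, order] += ALPHA * (r + GAMMA * Q[nxt].max() - Q[stock, order])
    stock = nxt
```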
A Framework for Improved Cooperative Learning Algorithms with Expertness (ICLAE)
Advanced Computing and Communication Technologies, 2017
A policy framework for dynamic product availability across three retailer shops in the market is investigated. Retailers can cooperate with each other and benefit from cooperative information while following their own policies, which accurately represent their goals and interests. The retailers are the learning agents in the system and use reinforcement learning to learn cooperatively from the situation; cooperation in learning (CL) is realized in a multi-agent environment. A framework for Improved Cooperative Learning Algorithms with Expertness (ICLAE) is proposed, in which the expertness measuring criteria used in earlier work are further enhanced and improved. Four methods for measuring the agents' expertness are used, viz. Normal, Absolute, Positive, and Negative. The novelty of this approach lies in implementing the RL algorithms with expertness measuring criteria by means of Q-learning, Q(λ) learning, Sarsa learning, and Sarsa(λ) learning algorithms. This chapter presents the implementation results and a performance comparison of these algorithms.
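The four expertness criteria can be sketched as simple functionals of an agent's reward history. The definitions below follow the commonly cited forms of the Normal, Absolute, Positive, and Negative measures; the paper's exact formulas may differ.

```python
def expertness(rewards):
    """Four expertness measures over an agent's reward history.

    These follow the commonly cited criteria; the enhanced forms used in
    the ICLAE framework may differ in detail.
    """
    return {
        "normal":   sum(rewards),                       # algebraic sum of rewards
        "absolute": sum(abs(r) for r in rewards),       # total magnitude of experience
        "positive": sum(r for r in rewards if r > 0),   # successes only
        "negative": sum(-r for r in rewards if r < 0),  # failures only, as a magnitude
    }

def mixing_weights(scores):
    """Illustrative use: turn expertness scores into Q-table mixing weights."""
    total = sum(scores) or 1.0
    return [s / total for s in scores]
```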
Enhancement in Decision Making with Improved Performance by Multiagent Learning Algorithms
In some applications the output of the system is a sequence of actions. There is no single best action in any intermediate state; an action is good only if it is part of a good policy. What matters is not an individual action but the policy, the sequence of correct actions that reaches the goal. In such cases a machine learning program should be able to assess the goodness of policies and learn from past good action sequences in order to generate a policy. A multi-agent environment is one in which there is more than one agent, the agents interact with one another, and there are restrictions on the environment such that agents may not at any given time know everything about the world that other agents know. Two features of multi-agent learning establish its study as a field separate from ordinary machine learning. Parallelism, scalability, simpler construction, and cost-effectiveness are the main characteristics of multi-agent systems. This paper presents a multi-agent learning model and implements two multi-agent learning algorithms: Strategy Sharing and Joint Rewards. In the Strategy Sharing algorithm, a simple average of the agents' Q-tables is taken: each Q-learning agent learns from all of its teammates by averaging their Q-tables, as sketched below. The Joint Rewards algorithm combines Q-learning with the idea of joint rewards. The paper presents results and a performance comparison of the two algorithms.
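A minimal sketch of the two algorithms, assuming tabular Q-learning agents; the blend weight beta in the joint-reward target is an illustrative parameter, not taken from the paper.

```python
import numpy as np

def share_strategies(q_tables):
    """Strategy Sharing: every agent adopts the element-wise average of all
    teammates' Q-tables (simple unweighted averaging, as described above)."""
    avg = np.mean(q_tables, axis=0)
    return [avg.copy() for _ in q_tables]

def joint_reward_update(Q, s, a, own_r, team_rs, s_next,
                        alpha=0.1, gamma=0.9, beta=0.5):
    """Joint Rewards: the TD target blends the agent's own reward with its
    teammates' rewards; beta is an assumed blend weight."""
    r = own_r + beta * sum(team_rs)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```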
A two-layered multi-agent reinforcement learning model and algorithm
Journal of Network and Computer Applications, 2007
Multi-agent reinforcement learning technologies are mainly investigated from two perspectives: concurrency and game theory. The former chiefly applies to cooperative multi-agent systems, while the latter usually applies to coordinated multi-agent systems. However, both raise problems such as credit assignment and multiple Nash equilibria. In this paper, we propose a new multi-agent reinforcement learning model and algorithm, LMRL, from a layered perspective. The LMRL model is composed of an off-line training layer, which employs single-agent reinforcement learning to acquire stationary strategy knowledge, and an online interaction layer, which employs multi-agent reinforcement learning together with strategy knowledge that can be revised dynamically while interacting with the environment. An agent with LMRL can improve its generalization capability, adaptability, and coordination ability. Experiments show that LMRL can perform better than both single-agent reinforcement learning and Nash-Q.
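A structural sketch of the two-layer idea, inferred from the abstract: the off-line layer learns stationary strategy knowledge with single-agent Q-learning, and the on-line layer starts from that knowledge and keeps revising it during interaction. Class and method names are assumptions, not the authors' API.

```python
import numpy as np

class LMRLAgent:
    """Hypothetical sketch of the layered model described in the abstract."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9):
        self.Q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma

    def offline_train(self, transitions):
        """Off-line layer: learn stationary strategy knowledge from
        (s, a, r, s_next) tuples generated by a stationary simulator."""
        for s, a, r, s_next in transitions:
            self.Q[s, a] += self.alpha * (
                r + self.gamma * self.Q[s_next].max() - self.Q[s, a])

    def online_step(self, s, a, r, s_next):
        """On-line layer: the same update applied to live multi-agent
        experience, so the pre-trained knowledge is revised dynamically."""
        self.Q[s, a] += self.alpha * (
            r + self.gamma * self.Q[s_next].max() - self.Q[s, a])
        return self.Q[s_next].argmax()
```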
Cooperative Multi-Agent Reinforcement Learning for Inventory Management
arXiv (Cornell University), 2023
With Reinforcement Learning (RL) for inventory management (IM) being a nascent field of research, approaches tend to be limited to simple, linear environments, with implementations that are minor modifications of off-the-shelf RL algorithms. Scaling these simplistic environments to a real-world supply chain raises several challenges: minimizing the computational requirements of the environment, specifying agent configurations that are representative of the dynamics at real-world stores and warehouses, and specifying a reward framework that encourages desirable behavior across the whole supply chain. In this work, we present a system with a custom GPU-parallelized environment consisting of one warehouse and multiple stores, a novel architecture for agent-environment dynamics incorporating enhanced state and action spaces, and a shared reward specification that seeks to optimize for a large retailer's supply chain needs. Each vertex in the supply chain graph is an independent agent that, based on its own inventory, is able to place replenishment orders to the vertex upstream. The warehouse agent, besides placing orders with the supplier, has the special property of being able to constrain replenishment to the downstream stores, which leads it to learn an additional allocation sub-policy. We achieve a system that outperforms standard inventory control policies, such as a base-stock policy, and other RL-based specifications for a single product, and lay out future directions for extending the work to multiple products.
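For reference, the base-stock baseline the authors compare against is a standard order-up-to rule; a minimal sketch follows (the target level S = 15 is an arbitrary example, not a value from the paper).

```python
def base_stock_order(inventory_position, base_stock_level):
    """Classic base-stock (order-up-to) policy: order enough to restore the
    inventory position to a fixed target level S, and nothing otherwise."""
    return max(0, base_stock_level - inventory_position)

# Illustrative use for one store with an assumed target level S = 15:
orders = [base_stock_order(ip, 15) for ip in (3, 9, 15, 18)]
# -> [12, 6, 0, 0]
```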
A Collaborative Decision-making Approach for Supply Chain Based on a Multi-agent System
Springer Series in Advanced Manufacturing, 2010
To improve the supply chain's performance under demand uncertainty and exceptions, various levels of collaboration techniques based on information sharing have been set up in real supply chains (VMI, CRP, CPFR, …). The main principle of these methods is that retailers do not need to place orders, because wholesalers use centralized information to decide when to replenish them. Although these techniques could be extended to a whole supply chain, current implementations only work between two business partners. With these techniques, companies electronically exchange a series of written comments and supporting data, including past sales trends, scheduled promotions, and forecasts. This allows participants to coordinate joint forecasting by focusing on differences between forecasts. But if the supply chain consists of autonomous enterprises, sharing information becomes a critical obstacle, since each independent actor is typically unwilling to share its own strategic data (such as inventory levels) with the other nodes. That is why researchers have proposed different methods and information systems to let the members of the supply chain collaborate without sharing all of their confidential data and information. In this chapter we analyze some existing approaches and works and describe an agent-based distributed architecture for the decision-making process. The agents in this architecture use a set of negotiation protocols (such as the Firm Heuristic, the Recursive Heuristic, and the CPFR Negotiation Protocol) to collectively make decisions in a short time. The architecture has been validated on an industrial case study.
A Multi-agent System for Electronic Commerce including Adaptive Strategic Behaviours
1999
This work is primarily based on the use of software agents for automated negotiation. We present a test-bed for agents in an electronic marketplace, through which we simulated different scenarios to evaluate different agents' negotiation behaviours. The system follows a multi-party, multi-issue negotiation approach. We tested the system by comparing the performance of agents that use multiple tactics with that of agents that include learning capabilities based on a specific kind of reinforcement learning technique. First experiments showed that the adaptive agents tend to win deals over their competitors as their experience increases.
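The abstract does not spell out which reinforcement learning technique the adaptive agents use; one plausible reading, sketched below purely as an illustration, treats tactic selection as a softmax bandit whose values are reinforced by deal outcomes. All names and parameters here are hypothetical.

```python
import math, random

class AdaptiveNegotiator:
    """Hypothetical sketch: pick among negotiation tactics via a softmax
    over learned values, reinforcing tactics that win profitable deals."""

    def __init__(self, tactics, alpha=0.2, tau=0.5):
        self.values = {t: 0.0 for t in tactics}   # estimated value per tactic
        self.alpha, self.tau = alpha, tau          # step size, temperature

    def choose(self):
        weights = [math.exp(v / self.tau) for v in self.values.values()]
        return random.choices(list(self.values), weights=weights)[0]

    def learn(self, tactic, payoff):
        # payoff > 0 for a won deal (scaled by utility), 0 or negative otherwise
        self.values[tactic] += self.alpha * (payoff - self.values[tactic])
```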
Multi-Agent Learning in both Cooperative and Competitive Environments
Advances in Artificial Intelligence, EPIA 2013 16th Portuguese Conference on Artificial Intelligence, pp.400-411, Azores, Portugal. ISBN: 978-989-95489-1-6., 2013
Our intent is to present a mechanism suitable for agents that, immersed in an environment that is simultaneously cooperative and competitive, have to learn their own best behaviour not only from an individual point of view but also from a global perspective of the system. We consider the proposed learning mechanism to be a multi-agent learning mechanism not only because multiple agents learn concurrently in the same environment but also because it allows them to understand how to improve their own performance without damaging the performance of the other agents. We tested our learning mechanism on the Disruption Management in Airline Operations Control Center application domain, and the results show that it gives the agents good performance in both cooperative and competitive situations in the environment.