Grid management support by means of collaborative learning agents
Related papers
Collaborative DFA Learning applied to Grid Administration
Caries Res, 2009
This paper proposes a distributed learning mechanism that learns patterns from distributed datasets. The complex and dynamic settings of grid environments require more sophisticated supporting systems. Contemporary tools lack the ability to relate events to one another and to draw inferences from them. We developed an information system, based on collaborative agents, that supports system administrators in monitoring the grid. While observing log files, the agents learn traffic patterns in their own local domain of the grid. The agents represent their knowledge in the form of deterministic finite automata (DFA), and share their models to provide global or multi-domain overviews. We discuss our collaborative learning mechanism and show the results of our experiments with data from two grid sites. Our system generated job-traffic overviews that gave new insights into the performance of the grid environment.
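The abstract does not say which DFA learning algorithm the agents use, so the following is only a minimal sketch of the general idea, assuming the simplest possible model: a prefix-tree acceptor built directly from observed event sequences. The class name, the `merge` helper and the job-state names are hypothetical and are not taken from the paper. Each "agent" builds a model from its own log, and the models are then combined into one multi-domain overview.

```python
class PrefixTreeDFA:
    """Prefix-tree acceptor: a simple DFA built directly from observed sequences."""

    def __init__(self):
        self.transitions = {0: {}}  # state -> {symbol: next_state}; state 0 is the start
        self.accepting = set()      # states in which an observed sequence ended

    def add_sequence(self, events):
        """Extend the automaton so that it accepts this event sequence."""
        state = 0
        for symbol in events:
            nxt = self.transitions[state].get(symbol)
            if nxt is None:
                nxt = len(self.transitions)  # allocate a fresh state
                self.transitions[nxt] = {}
                self.transitions[state][symbol] = nxt
            state = nxt
        self.accepting.add(state)

    def accepts(self, events):
        """Return True if the sequence ends in an accepting state."""
        state = 0
        for symbol in events:
            if symbol not in self.transitions[state]:
                return False
            state = self.transitions[state][symbol]
        return state in self.accepting

    def sequences(self, state=0, prefix=()):
        """Yield every accepted sequence (the tree is finite and acyclic)."""
        if state in self.accepting:
            yield prefix
        for symbol, nxt in self.transitions[state].items():
            yield from self.sequences(nxt, prefix + (symbol,))


def merge(models):
    """Combine local models into one overview by replaying their sequences."""
    merged = PrefixTreeDFA()
    for model in models:
        for seq in model.sequences():
            merged.add_sequence(seq)
    return merged


# Hypothetical job-state sequences, as two site-local agents might read them from logs.
site_a = PrefixTreeDFA()
site_a.add_sequence(["submitted", "running", "done"])
site_b = PrefixTreeDFA()
site_b.add_sequence(["submitted", "running", "failed"])

overview = merge([site_a, site_b])
```

A real system would also generalise the model, for example by merging equivalent states, rather than only memorising the sequences it has seen. That step is omitted here.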
Grid information services using software agents
2002
Computational Grids allow large-scale, pervasive and consistent sharing of geographically dispersed resources. Their inherent nature incorporates issues including the discovery of resources located in different administrative domains, predicting the performance of those resources and monitoring their behaviour. The Monitoring and Discovery Service (MDS), one of the pillars provided by the Globus toolkit, can be used to offer Grid information services to an existing agent-based resource advertisement and discovery system. This paper ...
Command and control for grid infrastructures
2008
The centralised management of distributed computing infrastructures presents a number of considerable challenges, not least of which is the effective monitoring of physical resources and middleware components to provide an accurate operational picture for use by administrative or management staff. The detection and presentation of real-time information pertaining to the performance and availability of computing resources is a difficult yet critical activity. This thesis presents an architecture intended to enhance the service monitoring experience of a Grid operations team. We have designed and implemented an extensible agent-based architecture capable of detecting and aggregating status information using low-level sensors, functionality tests and existing information systems. To date it has been successfully deployed across eighteen Grid-Ireland sites. Managing the availability of the monitored services is an associated and essential task in ensuring the availability of a grid infr...
The MonALISA (Monitoring Agents in A Large Integrated Services Architecture) system provides a distributed service architecture which is used to collect and process monitoring information. While its initial target field of application is networks and Grid systems supporting data processing and analysis for global high energy and nuclear physics collaborations, MonALISA is broadly applicable to many fields of "data intensive" science, and to the monitoring and management of major research and education networks. MonALISA is based on a scalable Dynamic Distributed Services Architecture, and is implemented in Java using JINI and WSDL technologies. The scalability of the system derives from the use of a multithreaded engine to host a variety of loosely coupled, self-describing dynamic services, and from the ability of each service to register itself and then to be discovered and used by any other services or clients that require such information. The framework integrates many existing monitoring tools and procedures to collect parameters describing computational nodes, applications and network performance. Specialized mobile agents are used in the MonALISA framework to perform global optimization tasks or to help improve the operation of large distributed systems by performing supervising tasks for different applications or real-time parameters. MonALISA is currently running around the clock, monitoring several Grids and distributed applications on around 160 sites.
Multi-agent support for Internet-scale Grid management
Proceedings of the AISB’02 Symposium on AI and Grid Computing, 2002
Internet-scale computational grids are emerging from various research projects. Most notable are the US National Technology Grid and the European Data Grid projects. One specific problem in realizing wide-area distributed computing environments as proposed in these projects is effective management of the vast amount of resources that are made available within the grid environment. This paper proposes an agent-based approach to resource management in grid environments, and describes an agent infrastructure that ...
An Agent-Based Approach to Grid Service Monitoring (© 2007 SWPS)
2011
The centralised management of distributed computing infrastructures presents a number of considerable challenges, not least of which is the effective monitoring of physical resources and middleware components to provide an accurate operational picture for use by administrative or management staff. The detection and presentation of real-time information pertaining to the performance and availability of computing resources is a difficult yet critical activity. This architecture is intended to enhance the service monitoring experience of a Grid operations team. We have designed and implemented an extensible agent-based architecture capable of detecting and aggregating status information using low-level sensors, functionality tests and existing information systems. To date it has been successfully deployed across eighteen Grid-Ireland sites.
Towards an agent framework for grid computing
This paper presents an agent-oriented approach for grid computing. As opposed to existing approaches, agent technology promises a more flexible approach, easier installation and management of the agent framework, and better ability to autonomously recover from failures. The semantically rich, ontological description of the grid applications, services and resources opens the possibility for better monitoring and resource management, and better user interfaces, both for customers and service providers.
Proffering Task oriented Grid Resource Discovery based on Learning Automata
International Journal of Computer Applications, 2011
The main challenge of existing resource discovery services is the lack of support for task-oriented queries. This paper puts forward a design of a task-oriented grid resource discovery service based on learning automata, which enables users to dynamically discover the grid resources that are suitable for their task. The core of this service is a learning-automata-based grid resource classifier, which periodically accesses the Metacomputing Directory Service and dynamically classifies the grid resources into task-oriented categories according to the real-time state of the grid computing environment. Users can invoke this service and pass their task type as a parameter to discover the currently most suitable grid resources. The grid resource allocation manager can also interact with this service to improve its practicability and efficiency.
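The abstract does not state which learning-automaton scheme the classifier uses. As an illustration only, the sketch below assumes a standard linear reward-inaction update, in which a rewarded action gains probability and an unrewarded one leaves the probabilities unchanged. The class name, the category labels and the learning rate are hypothetical.

```python
import random


class LearningAutomaton:
    """Variable-structure learning automaton with a linear reward-inaction update."""

    def __init__(self, actions, learning_rate=0.1):
        self.actions = list(actions)
        self.rate = learning_rate
        # Start with a uniform probability over all actions (here: task categories).
        self.probs = [1.0 / len(self.actions)] * len(self.actions)

    def choose(self, rng=random):
        """Sample an action according to the current probabilities."""
        return rng.choices(self.actions, weights=self.probs, k=1)[0]

    def update(self, action, rewarded):
        """Reinforce a rewarded action; do nothing when it was not rewarded."""
        if not rewarded:
            return
        chosen = self.actions.index(action)
        for i, p in enumerate(self.probs):
            if i == chosen:
                self.probs[i] = p + self.rate * (1.0 - p)
            else:
                self.probs[i] = p * (1.0 - self.rate)


# One automaton per resource; the categories are hypothetical task types.
automaton = LearningAutomaton(["compute-intensive", "data-intensive"], learning_rate=0.2)
automaton.update("compute-intensive", rewarded=True)   # probabilities move towards 0.6 / 0.4
automaton.update("data-intensive", rewarded=False)     # no change under reward-inaction
```

The update keeps the probabilities summing to one, so the most probable category can be read off at any time as the resource's current classification.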
Multiagent Framework based Interactive Job Management for Grid
International Journal of Computer Applications, 2012
A Grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities. Combined with intelligent cooperative agents, it becomes well suited to many types of services and can autonomously adapt to users' computation needs as well as to dynamically changing computing resource environments. Grid technologies need to be extended to include graphical, interactive sessions, known as Interactive Grids. Interactive Grids permit end users to access and control a remote resource. The aim of this paper is to address the effectiveness of Grid computing through interactive agent-based job management. An agent-based interactive job management system is developed to incorporate the concept of agents in the grid.
Automated agents for management and control of the ALICE Computing Grid
Journal of Physics: Conference Series, 2010
A complex software environment such as the ALICE Computing Grid infrastructure requires permanent control and management for the large set of services involved. Automating control procedures reduces the human interaction with the various components of the system and yields better availability of the overall system. In this paper we will present how we used the MonALISA framework to gather, store and display the relevant metrics in the entire system from central and remote site services. We will also show the automatic local and global procedures that are triggered by the monitored values. Decision-taking agents are used to restart remote services, alert the operators in case of problems that cannot be automatically solved, submit production jobs, replicate and analyze raw data, resource load-balance and other control mechanisms that optimize the overall work flow and simplify day-to-day operations. Synthetic graphical views for all operational parameters, correlations, state of services and applications as well as the full history of all monitoring metrics are available for the entire system that now encompasses 85 sites all over the world, more than 14,000 CPU cores and 10 PB of storage.