Designing a New Class of Distributed Systems

A distributed workflow system with autonomous components

2005

This paper describes the architecture of a distributed workflow management system in a dynamic environment. The system features autonomous agent components that can adapt to both structural changes in business processes and changes in system parameters, such as the number of available resources. The adaptation can be a permanent adjustment reflected in all incoming work cases, or it can apply only to a particular instance of a work case.

BUILDING RESILIENT DISTRIBUTED SYSTEMS: FAULT-TOLERANT DESIGN PATTERNS FOR STATEFUL WORKFLOWS

IAEME PUBLICATION, 2024

Resilient design patterns play a crucial role in developing fault-oblivious stateful workflow systems in distributed computing. This article explores advanced techniques and strategies for building resilient distributed systems that can gracefully handle failures and maintain operational continuity. It delves into fault tolerance strategies, such as data redundancy, checkpointing, and transactional consistency, to ensure system reliability and data integrity. The article discusses the benefits of microservices architecture in achieving fault isolation and minimizing the impact of failures. It highlights the importance of self-healing mechanisms, including automated fault detection and correction, to ensure continuous operation. Scalability and load balancing techniques, such as dynamic resource adjustment and workload distribution, are explored to accommodate fluctuating demands and optimize system performance. The article also examines error handling and recovery mechanisms, including automated rollbacks and distributed consensus protocols, to maintain data consistency and coordinate recovery actions across nodes. Additionally, it emphasizes the significance of proactive system health monitoring and rapid fault identification and resolution in minimizing downtime and ensuring a smooth user experience. The article concludes by discussing emerging trends and open research problems, and by providing recommendations for building resilient distributed systems that can withstand the challenges of modern computing environments.

A Fully Distributed Architecture for Large Scale Workflow Enactment

International Journal of Cooperative Information Systems, 2003

Standard workflow management systems are usually designed as client-server systems. The central server is responsible for the coordination of the workflow execution and, in some cases, may manage the activities database. This centralized control architecture may represent a single point of failure, which compromises the availability of the system. We propose a fully distributed and configurable architecture for workflow management systems. It is based on the idea that the activities of a case (an instance of the process) migrate from host to host, executing the workflow tasks, following a process plan. This core architecture is improved with the addition of other distributed components so that other requirements for Workflow Management Systems, besides scalability, are also addressed. The components of the architecture were tested in different distributed and centralized configurations. The ability to configure the location of components and the use of dynamic allocati...