Towards Scalable Web Documents

Adaptive replicated Web documents

2000

Caching and replication techniques can improve Web latency while reducing network traffic and balancing load among servers. However, no single strategy is optimal for replicating all documents. Depending on its access pattern, each document should use the policy that suits it best. This paper presents an architecture for adaptive replicated documents. Each adaptive document monitors its access pattern and uses it to determine which strategy it should follow. When a change is detected in its access pattern, it re-evaluates its strategy to adapt to the new conditions. Adaptation comes at an acceptable cost considering the benefits of per-document replication strategies.
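The adaptation loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name `AdaptiveDocument`, the strategy names, and the thresholds are all our own assumptions.

```python
from collections import deque

# Hypothetical sketch: a document keeps a sliding window of recent accesses
# and re-evaluates its replication strategy whenever the pattern shifts.
class AdaptiveDocument:
    def __init__(self, window=100):
        self.accesses = deque(maxlen=window)  # recent (client_region, is_write) events
        self.strategy = "no-replication"

    def record_access(self, client_region, is_write=False):
        self.accesses.append((client_region, is_write))
        new_strategy = self.choose_strategy()
        if new_strategy != self.strategy:
            self.strategy = new_strategy  # adapt to the new access pattern

    def choose_strategy(self):
        if not self.accesses:
            return "no-replication"
        writes = sum(1 for _, w in self.accesses if w)
        regions = {r for r, _ in self.accesses}
        if writes / len(self.accesses) > 0.5:
            return "no-replication"    # mostly updated: keep a single master copy
        if len(regions) > 3:
            return "push-replication"  # read-mostly with a worldwide audience
        return "client-caching"        # read-mostly with a localized audience
```

In this toy version the policy switch is driven only by write ratio and geographic spread; the paper's monitor would feed a richer trace into the strategy evaluation.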

Differentiated strategies for replicating Web documents

Computer Communications, 2001

Replicating Web documents reduces user-perceived delays and wide-area network traffic. Numerous caching and replication protocols have been proposed to manage such replication, while keeping the document copies consistent. We claim however that Web documents have very diverse characteristics, and that no single caching or replication policy can efficiently manage all documents. Instead, we propose that each document is replicated with its most-suited policy. We collected traces on our university's Web server and conducted simulations to determine the performance such configurations would produce, as opposed to configurations using the same policy for all documents. The results show a significant performance improvement with respect to end-user delays, wide-area network traffic and document consistency.
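The per-document policy selection described above can be illustrated with a toy trace replay. The policy names and cost weights below are stand-ins of our own, not the paper's simulator; they only show the shape of "simulate each policy, keep the cheapest".

```python
# Illustrative sketch: replay a per-document access trace under each
# candidate policy and keep the one with the lowest aggregate cost.
# Costs are arbitrary weights standing in for delay/traffic/staleness.

def simulate(policy, trace):
    """Return a scalar cost for running `policy` over `trace` ('read'/'write' events)."""
    cost = 0.0
    for event in trace:
        if policy == "no-cache":
            cost += 1.0                                  # every read goes to the origin
        elif policy == "ttl-cache":
            cost += 0.2 if event == "read" else 1.5      # cheap reads, staleness on writes
        elif policy == "push-invalidate":
            cost += 0.1 if event == "read" else 2.0      # cheapest reads, costly updates
    return cost

def best_policy(trace, policies=("no-cache", "ttl-cache", "push-invalidate")):
    return min(policies, key=lambda p: simulate(p, trace))
```

A read-heavy document ends up with push-based consistency, while a write-heavy one falls back to serving from the origin, which is the paper's core observation that no single policy fits all documents.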

A framework for consistent, replicated Web objects

Proceedings of the 18th International Conference on Distributed Computing Systems, 1998

Despite the extensive use of caching techniques, the Web is overloaded. While the caching techniques currently used help some, it would be better to use different caching and replication strategies for different Web pages, depending on their characteristics. We propose a framework in which such strategies can be devised independently per Web document. A Web document is constructed as a worldwide, scalable distributed Web object. Depending on the coherence requirements for that document, the most appropriate caching or replication strategy can subsequently be implemented and encapsulated by the Web object. Coherence requirements are formulated from two different perspectives: that of the Web object, and that of clients using the Web object. We have developed a prototype in Java to demonstrate the feasibility of implementing different strategies for different Web objects.

Globule: a platform for self-replicating Web documents

Protocols for Multimedia Systems, 2001

Replicating Web documents at a worldwide scale can help reduce user-perceived latency and wide-area network traffic. This paper presents the design of Globule, a platform that automates all aspects of such replication: server-to-server peering negotiation, creation and destruction of replicas, selection of the most appropriate replication strategies on a per-document basis, consistency management and transparent redirection of clients to replicas. Globule is initially directed to support standard Web documents. However, it can also be applied to stream-oriented documents. To facilitate the transition from a non-replicated server to a replicated one, we designed Globule as a module for the Apache Web server. Therefore, converting Web documents should require no more than compiling a new module into Apache and editing a configuration file.

A distributed-object infrastructure for corporate Websites

Proceedings DOA'00. International Symposium on Distributed Objects and Applications, 2000

A corporate website is the virtual representation of a corporation or organization on the Internet. Corporate websites face numerous problems due to their large size and complexity, and the non-scalability of the underlying Web infrastructure. Current solutions to these problems generally rely on traditional scaling techniques such as caching and replication. These are usually too restrictive, however, taking a one-size-fits-all approach and applying the same solution to every document. We propose Globe as a foundation upon which to build scalable corporate websites, and introduce GlobeDoc, a website model based on Globe distributed shared objects. This paper describes GlobeDoc, highlighting the design and technical details of the infrastructure.

Document replication and distribution in extensible geographically distributed web servers

Journal of Parallel and Distributed Computing, 2003

A geographically distributed web server (GDWS) system, consisting of multiple server nodes interconnected by a MAN or a WAN, can achieve better efficiency in handling the ever-increasing web requests than centralized web servers because of the proximity of server nodes to clients. It is also more scalable, since the throughput will not be limited by the bandwidth available for connecting to a central server. The key research issue in the design of a GDWS is how to replicate and distribute the documents of a website among the server nodes. This paper proposes a density-based replication scheme and applies it to our proposed Extensible GDWS (EGDWS) architecture. Its document distribution scheme supports partial replication, targeting only hot objects among the documents. To distribute the replicas generated via the density-based replication scheme, we propose four different document distribution algorithms: Greedy-cost, Maximal-density, Greedy-penalty, and Proximity-aware. A proximity-based routing mechanism is designed to incorporate these algorithms for achieving better web server performance in a WAN environment. Simulation results show that our document replication and distribution algorithms achieve better response times and load balancing than existing dynamic schemes. To further reduce users' response times, we propose two document grouping algorithms that can cut down on the request redirection overheads.
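The greedy flavor of placement the abstract mentions can be sketched as below. This is our own simplification in the spirit of the Greedy-cost algorithm, not the paper's exact formulation: demand is a flat per-server request rate, and cost is 0 for local and 1 per remote request.

```python
# Minimal greedy placement sketch: repeatedly add the replica that yields
# the largest cost reduction until the replica budget is exhausted.

def greedy_place(docs, servers, demand, budget):
    """
    docs: list of document ids; servers: list of server ids.
    demand[(doc, server)]: request rate for `doc` seen near `server`.
    Each doc starts on servers[0]; a replica of a doc on server s serves
    s's local demand at cost 0, while remote demand costs 1 per request.
    Returns {doc: set of servers holding a replica}.
    """
    placement = {d: {servers[0]} for d in docs}

    def saving(d, s):
        # cost saved by adding a replica of d on s: its local demand goes local
        return 0 if s in placement[d] else demand.get((d, s), 0)

    for _ in range(budget):
        d, s = max(((d, s) for d in docs for s in servers),
                   key=lambda ds: saving(*ds))
        if saving(d, s) == 0:
            break  # no placement improves the cost any further
        placement[d].add(s)
    return placement
```

The paper's other variants (Maximal-density, Greedy-penalty, Proximity-aware) would replace the `saving` criterion with density, penalty, or proximity measures.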

Distributed replication and caching: a mechanism for architecting responsive Web services

This paper focuses on mechanisms that support the provision of an adequate service level by relying on a Web system architecture with multi-node clustering. Replication of Web documents can improve both the performance and the reliability of a Web service. Server selection algorithms allow Web clients to select one of the replicated servers which is "close" to them, thereby minimizing the response time of the Web service.
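One simple instance of such a server selection rule is latency probing. This sketch is ours, not the paper's specific algorithm: the probe function is a stand-in for a real ping or HTTP round trip.

```python
import random

# Hedged sketch: probe each replica and redirect the client to the one
# with the lowest measured round-trip time ("closest" in network terms).

def probe_rtt(server):
    # Stand-in for a real ping/HTTP probe; returns latency in milliseconds.
    return random.uniform(10, 200)

def select_server(replicas, probe=probe_rtt):
    rtts = {s: probe(s) for s in replicas}
    return min(rtts, key=rtts.get)  # replica with the lowest RTT for this client
```

Real deployments typically combine such proximity estimates with server load to avoid herding all clients onto one nearby replica.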

Automated delivery of Web documents through a caching infrastructure

Proceedings of the Euromicro Conference (EURMIC-03), 2003

The dramatic growth of the Internet and of Web traffic calls for scalable solutions to accessing Web documents. To this purpose, various caching schemes have been proposed and caching has been widely deployed. Since most Web documents change very rarely, the issue of consistency, i.e. how to assure access to the most recent version of a Web document, has received little attention. However, as the number of frequently changing documents and the number of users accessing these documents increases, it becomes mandatory to propose scalable techniques that assure consistency. We look at one class of techniques that achieve consistency by performing automated delivery of Web documents. Among all schemes imaginable, automated delivery guarantees the lowest access latency for the clients. We compare pull- and push-based schemes for automated delivery and evaluate their performance analytically and via trace-driven simulation. We show that for both pull- and push-based schemes, the use of a caching infrastructure is important to achieve scalability. For most documents in the Web, a pull distribution with a caching infrastructure can efficiently implement automated delivery. However, when servers update their documents randomly and cannot ensure a minimum time-to-live interval during which documents remain unchanged, pull generates many requests to the origin server. For this case, we consider push-based schemes that use a caching infrastructure, and we present a simple algorithm to determine which documents should be pushed given a limited available bandwidth.
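The two ingredients of that comparison can be sketched numerically. Both functions below are our own illustrations, not the paper's algorithms: the first shows why a TTL floor bounds origin load under pull, and the second is a simple greedy push selection under a bandwidth budget.

```python
# With a guaranteed minimum time-to-live, a pulling cache contacts the
# origin at most once per TTL window; without it, misses pile up at the origin.

def requests_to_origin_pull(access_times, ttl):
    """Count origin fetches by a pulling cache, given sorted client access times."""
    fetches, valid_until = 0, float("-inf")
    for t in access_times:
        if t >= valid_until:        # cached copy expired: revalidate at origin
            fetches += 1
            valid_until = t + ttl
    return fetches

def choose_push_set(docs, bandwidth):
    """docs: {name: (request_rate, size)}. Greedily push the hottest
    documents per byte until the push bandwidth budget is spent."""
    pushed, used = [], 0
    for name, (rate, size) in sorted(docs.items(),
                                     key=lambda kv: kv[1][0] / kv[1][1],
                                     reverse=True):
        if used + size <= bandwidth:
            pushed.append(name)
            used += size
    return pushed
```

With six accesses in six time units and a TTL of three, only two origin fetches occur, regardless of how many clients share the cache; that bounded load is the scalability argument for the caching infrastructure.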

A distributed Web document database and its supporting environment

Proceedings of the IEEE International Symposium on Computers and Communications

In this paper, we propose a new Web documentation database as a supporting environment for the Multimedia Micro-University project. The design of this database facilitates a Web documentation development paradigm that we proposed earlier. From a script description to its implementation and testing records, the database and its interface allow the user to design Web documents as virtual courses to be used in a Web-savvy virtual library. The database supports object reuse and sharing, as well as referential integrity and concurrency. To allow real-time course demonstration, we also propose a simple course distribution mechanism, which allows the pre-broadcast of course materials. The system is implemented as a three-tier architecture which runs under MS Windows and other platforms.