Differentiated strategies for replicating Web documents (Computer Communications)
Related papers
Differentiated strategies for replicating Web documents
Computer Communications, 2001
Replicating Web documents reduces user-perceived delays and wide-area network traffic. Numerous caching and replication protocols have been proposed to manage such replication while keeping the document copies consistent. We claim, however, that Web documents have very diverse characteristics, and that no single caching or replication policy can efficiently manage all documents. Instead, we propose that each document be replicated with the policy that suits it best. We collected traces on our university's Web server and conducted simulations to determine the performance such configurations would produce, as opposed to configurations using the same policy for all documents. The results show a significant performance improvement with respect to end-user delays, wide-area network traffic, and document consistency.
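To make the idea concrete, here is a minimal sketch, assuming a single-proxy model with illustrative policy names, TTL, and cost weights (none of which come from the paper), of how a per-document policy could be chosen by replaying the document's access trace under each candidate policy:

```python
def simulate(policy, trace, ttl=60.0):
    """trace: time-ordered list of (time, op), op in {'read', 'write'}.
    Returns (wan_messages, stale_reads) for the given policy."""
    wan = stale = 0
    cached_at = None                 # when the proxy last fetched the doc
    last_write = float("-inf")
    for t, op in trace:
        if op == "write":
            last_write = t
            if policy == "invalidation" and cached_at is not None:
                wan += 1             # server sends an invalidation message
                cached_at = None
        else:                        # a client read arriving at the proxy
            fresh = cached_at is not None and (
                policy == "invalidation" or
                (policy == "ttl" and t - cached_at <= ttl))
            if policy == "no-cache" or not fresh:
                wan += 1             # fetch (or refetch) from the origin
                cached_at = t
            elif cached_at < last_write:
                stale += 1           # served an outdated cached copy
    return wan, stale

def best_policy(trace, w_wan=1.0, w_stale=10.0):
    def cost(policy):
        wan, stale = simulate(policy, trace)
        return w_wan * wan + w_stale * stale
    return min(("no-cache", "ttl", "invalidation"), key=cost)
```

Replaying the same trace under each policy makes the trade-off explicit: a write-heavy document may be cheapest with no caching at all, while a read-mostly document benefits from invalidation-based replication.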
Adaptive replicated Web documents
2000
Caching and replication techniques can reduce Web latency while also reducing network traffic and balancing load among servers. However, no single strategy is optimal for replicating all documents. Depending on its access pattern, each document should use the policy that suits it best. This paper presents an architecture for adaptive replicated documents. Each adaptive document monitors its access pattern and uses it to determine which strategy it should follow. When a change is detected in its access pattern, it re-evaluates its strategy to adapt to the new conditions. Adaptation comes at an acceptable cost considering the benefits of per-document replication strategies.
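The monitor-and-adapt loop the abstract describes could look roughly like the following sketch; the window size, drift threshold, and strategy-selection rule are assumptions for illustration, not the paper's design:

```python
from collections import deque

class AdaptiveDocument:
    """Sketch of a self-adapting document (assumed API, not the paper's):
    it tracks its recent read/write mix and re-selects a replication
    strategy when the mix drifts past a threshold."""

    def __init__(self, window=200, drift=0.15):
        self.ops = deque(maxlen=window)   # sliding window of recent ops
        self.drift = drift                # ratio change that triggers adaptation
        self.base_ratio = None            # write ratio at last evaluation
        self.strategy = "pull-ttl"        # conservative default

    def _pick(self, write_ratio):
        # Illustrative rule: read-mostly documents are pushed to replicas,
        # write-heavy ones stay at the origin.
        if write_ratio < 0.01:
            return "push-replicas"
        if write_ratio < 0.20:
            return "invalidation"
        return "origin-only"

    def record(self, op):                 # op is "read" or "write"
        self.ops.append(op)
        if len(self.ops) < self.ops.maxlen:
            return                        # wait until the window fills
        ratio = self.ops.count("write") / len(self.ops)
        if self.base_ratio is None or abs(ratio - self.base_ratio) > self.drift:
            self.base_ratio = ratio
            self.strategy = self._pick(ratio)
```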
Application-level document caching in the internet
1995
With the increasing demand for document transfer services such as the World Wide Web comes a need for better resource management to reduce the latency of documents in these systems. To address this need, we analyze the potential for document caching at the application level in document transfer services. We have collected traces of actual executions of Mosaic, reflecting over half a million user requests for WWW documents.
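The kind of trace-driven, application-level caching analysis described here boils down to replaying requests through a simulated cache. A minimal version, assuming an LRU policy and a byte-based capacity (the paper evaluates several variations), is:

```python
from collections import OrderedDict

def lru_hit_rate(trace, capacity_bytes):
    """Replays a request trace [(url, size_bytes), ...] through an LRU
    cache and returns the hit rate."""
    cache, used, hits = OrderedDict(), 0, 0
    for url, size in trace:
        if url in cache:
            hits += 1
            cache.move_to_end(url)       # mark as most recently used
            continue
        while used + size > capacity_bytes and cache:
            _, old_size = cache.popitem(last=False)  # evict least recent
            used -= old_size
        if size <= capacity_bytes:
            cache[url] = size
            used += size
    return hits / len(trace) if trace else 0.0
```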
Globule: a platform for self-replicating Web documents
Protocols for Multimedia Systems, 2001
Replicating Web documents at a worldwide scale can help reduce user-perceived latency and wide-area network traffic. This paper presents the design of Globule, a platform that automates all aspects of such replication: server-to-server peering negotiation, creation and destruction of replicas, selection of the most appropriate replication strategies on a per-document basis, consistency management, and transparent redirection of clients to replicas. Globule initially targets standard Web documents, but it can also be applied to stream-oriented documents. To facilitate the transition from a non-replicated server to a replicated one, we designed Globule as a module for the Apache Web server. Therefore, converting Web documents should require no more than compiling a new module into Apache and editing a configuration file.
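Of the components listed, transparent client redirection is the easiest to sketch. The toy function below is not Globule's actual API: it returns an HTTP 302 to the replica "closest" to the client, where Globule bases this choice on network measurements rather than the placeholder coordinate distance used here:

```python
def redirect(client_pos, replicas):
    """Toy stand-in for transparent client redirection: answer with an
    HTTP 302 pointing at the nearest replica.

    replicas: list of {"pos": (x, y), "url": str}"""
    def dist2(replica):
        (cx, cy), (rx, ry) = client_pos, replica["pos"]
        return (cx - rx) ** 2 + (cy - ry) ** 2
    best = min(replicas, key=dist2)
    return 302, {"Location": best["url"]}
```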
A New Environment for Web Caching and Replication Study
2000
Web caching and replication have received considerable attention in recent years due to their effectiveness in reducing client response time and network traffic. In this paper we describe a tool for improving the study of Web caching and replication. An important difference from other tools is that it builds on two tools widely used in caching and computer-network research: Proxycizer and NS [NS], respectively. Proxycizer2ns is an approach to combining these two tools automatically, improving large-scale Web caching and replication studies by taking the best of both: on the one hand, the ability to simulate an infrastructure that allows large-scale caching and replication analysis; on the other hand, verification of the impact this simulated infrastructure has on the network, which other tools rarely present in such a didactic and complete way.
Automated delivery of Web documents through a caching infrastructure
Proceedings of EUROMICRO-03, 2003
The dramatic growth of the Internet and of Web traffic calls for scalable solutions to accessing Web documents. To this purpose, various caching schemes have been proposed and caching has been widely deployed. Since most Web documents change very rarely, the issue of consistency, i.e. how to assure access to the most recent version of a Web document, has received little attention. However, as the number of frequently changing documents and the number of users accessing these documents increase, it becomes mandatory to propose scalable techniques that assure consistency. We look at one class of techniques that achieve consistency by performing automated delivery of Web documents. Among all schemes imaginable, automated delivery guarantees the lowest access latency for the clients. We compare pull- and push-based schemes for automated delivery and evaluate their performance analytically and via trace-driven simulation. We show that for both pull- and push-based schemes, the use of a caching infrastructure is important to achieve scalability. For most documents in the Web, a pull distribution with a caching infrastructure can efficiently implement automated delivery. However, when servers update their documents randomly and cannot ensure a minimum time-to-live interval during which documents remain unchanged, pull generates many requests to the origin server. For this case, we consider push-based schemes that use a caching infrastructure, and we present a simple algorithm to determine which documents should be pushed given a limited available bandwidth.
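The closing sentence poses a selection problem under a bandwidth budget. One plausible reading, not necessarily the authors' exact algorithm, is a knapsack-style greedy ranking of documents by pull requests saved per byte of push traffic:

```python
def choose_push_set(docs, bandwidth_budget):
    """Greedy sketch: push the documents that save the most pull
    requests per byte of push traffic, until the budget is spent.

    docs: list of {"name": str,
                   "pull_reqs": float,    # origin requests/day under pull
                   "push_bytes": float}   # push traffic/day if pushed"""
    ranked = sorted(docs,
                    key=lambda d: d["pull_reqs"] / d["push_bytes"],
                    reverse=True)
    pushed, spent = [], 0.0
    for d in ranked:
        if spent + d["push_bytes"] <= bandwidth_budget:
            pushed.append(d["name"])
            spent += d["push_bytes"]
    return pushed
```

Documents that change rarely but are pulled often float to the top of the ranking, which matches the intuition that push pays off exactly where TTL-based pull is wasteful.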
Clustering Web Content for Efficient Replication
2002
Recently there has been an increasing deployment of content distribution networks (CDNs) that offer hosting services to Web content providers. In this paper, we first compare the uncooperative pulling of Web contents used by commercial CDNs with cooperative pushing. Our results show that the latter can achieve comparable user-perceived performance with only 4-5% of the replication and update traffic of the former scheme. Therefore we explore how to efficiently push content to CDN nodes. Using trace-driven simulation, we show that replicating content in units of URLs can yield a 60-70% reduction in clients' latency compared to replicating in units of Web sites. However, it is very expensive to perform such a fine-grained replication.
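The clustering in the title is the middle ground between the two granularities: group URLs with similar access patterns and replicate per cluster. A sketch of that idea, using greedy cosine-similarity grouping as an illustrative stand-in for the paper's clustering schemes, follows:

```python
import math

def cluster_urls(access, threshold=0.9):
    """Groups URLs whose client-access vectors are similar, so replicas
    can be managed per cluster instead of per URL or per site.

    access: {url: [requests_from_client_region_0, _1, ...]}"""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0
    clusters = []   # each cluster: {"rep": representative vector, "urls": [...]}
    for url, vec in access.items():
        for c in clusters:
            if cos(vec, c["rep"]) >= threshold:
                c["urls"].append(url)
                break
        else:
            clusters.append({"rep": vec, "urls": [url]})
    return clusters
```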
Proxy-Based Web Cache Consistency for Enhancing Network Execution
With the exponential growth of the Internet, the World Wide Web (WWW) has become the most widely used tool for accessing and disseminating commercial, educational, and news information on the Internet. The success of the WWW is largely due to its capability of providing quick and easy access to a large variety of information from sites all over the world. The retrieval latency of the WWW is, however, sensitive to overload conditions of the network, which frustrates Web users. Caching copies of data at client sites (including Web proxies) is regarded as a good technique for reducing both network traffic and server load, thus improving the retrieval latency of Web documents. The use of proxy caches in the World Wide Web benefits the end user, the network administrator, and the server administrator, since it reduces the amount of redundant traffic that circulates through the network. In addition, end users get quicker access to documents that are cached. However, the use of proxies introduces additional issues that need to be addressed. The existing consistency protocols used in the Web are proving insufficient to meet the growing needs of the Internet population. For example, many of the messages sent over the network are due to caches guessing when their copy is inconsistent. Many decisions must be made when exploring World Wide Web coherency, such as whether to provide consistency at the proxy level (client pull) or to allow the server to handle it (server push). What tradeoffs are inherent in each of these decisions? The suitability of any method strongly depends upon the conditions of the network (e.g., the document types that are frequently requested, or the state of the network load) and the resources available (e.g., disk space and the type of cache available). One goal of this dissertation is to study the characteristics of document retrieval and modification to determine their effect on proposed consistency mechanisms. A set of effective consistency policies is identified from this investigation. The main objective of the dissertation is to use these findings to design and implement a consistency algorithm that provides improved performance over current mechanisms. Ideally, we want an algorithm that provides strong consistency. However, we do not want to further degrade the network or place undue burden on the server to gain this advantage. We propose a system based on the notion of soft state and on server push, in which the proxy has some influence on what state information the server maintains. Our results show an average saving of 20% in control messages, achieved by limiting how much polling occurs with the current Web cache consistency mechanism, Adaptive Client Polling.
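For reference, the baseline mechanism named at the end, Adaptive Client Polling, belongs to the family of adaptive-TTL schemes: a proxy polls a document less often the longer it has gone unmodified. A minimal sketch, with commonly used (not the dissertation's) parameter values:

```python
def adaptive_ttl(now, last_modified, factor=0.2, floor=30.0, cap=86400.0):
    """Adaptive-TTL estimate: the longer a document has been stable,
    the longer the proxy waits before polling again (all times in
    seconds)."""
    age = max(0.0, now - last_modified)
    return min(cap, max(floor, factor * age))

def needs_poll(now, last_validated, last_modified):
    """The proxy revalidates only once the adaptive TTL has expired,
    which is how polling-based schemes bound their control messages."""
    return now - last_validated > adaptive_ttl(now, last_modified)
```

Limiting such polls is exactly where the claimed 20% savings in control messages comes from: every suppressed poll is one fewer message on the network.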
Document replication and distribution in extensible geographically distributed web servers
Journal of Parallel and Distributed Computing, 2003
A geographically distributed web server (GDWS) system, consisting of multiple server nodes interconnected by a MAN or a WAN, can achieve better efficiency in handling the ever-increasing web requests than centralized web servers because of the proximity of server nodes to clients. It is also more scalable, since throughput is not limited by the bandwidth available to a central server. The key research issue in the design of a GDWS is how to replicate and distribute the documents of a website among the server nodes. This paper proposes a density-based replication scheme and applies it to our proposed Extensible GDWS (EGDWS) architecture. Its document distribution scheme supports partial replication, targeting only the hot objects among the documents. To distribute the replicas generated via the density-based replication scheme, we propose four different document distribution algorithms: Greedy-cost, Maximal-density, Greedy-penalty, and Proximity-aware. A proximity-based routing mechanism is designed to incorporate these algorithms to achieve better web server performance in a WAN environment. Simulation results show that our document replication and distribution algorithms achieve better response times and load balancing than existing dynamic schemes. To further reduce users' response time, we propose two document grouping algorithms that can cut down on request redirection overheads.
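As an illustration of what a "Greedy-cost" style distribution could look like (the paper's exact formulation may differ), the sketch below places up to k replicas of each hot object, each time adding the node that most reduces total access cost:

```python
def greedy_cost_placement(objects, nodes, dist, k):
    """Greedy replica placement sketch.

    objects: {obj: {node: request_rate_from_clients_near_node}}
    dist[a][b]: network cost between nodes a and b
    k: maximum replicas per object"""
    placement = {}
    for obj, demand in objects.items():
        chosen = []
        for _ in range(min(k, len(nodes))):
            def total_cost(candidate):
                sites = chosen + [candidate]
                # Each client region fetches from its cheapest replica.
                return sum(rate * min(dist[src][s] for s in sites)
                           for src, rate in demand.items())
            best = min((n for n in nodes if n not in chosen), key=total_cost)
            chosen.append(best)
        placement[obj] = chosen
    return placement
```

A production version would also stop adding replicas once the marginal cost reduction falls below the storage and update cost of one more copy.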
Efficient and adaptive Web replication using content clustering
IEEE Journal on Selected Areas in Communications, 2003
Recently there has been an increasing deployment of content distribution networks (CDNs) that offer hosting services to Web content providers. In this paper, we first compare the uncooperative pulling of Web contents used by commercial CDNs with cooperative pushing. Our results show that the latter can achieve comparable user-perceived performance with only 4-5% of the replication and update traffic of the former scheme. Therefore we explore how to efficiently push content to CDN nodes. Using trace-driven simulation, we show that replicating content in units of URLs can yield a 60-70% reduction in clients' latency compared to replicating in units of Web sites. However, it is very expensive to perform such a fine-grained replication.