Stéphane Gançarski - Academia.edu

Papers by Stéphane Gançarski

Research paper thumbnail of TransPeer

Proceedings of the 2010 ACM Symposium on Applied Computing - SAC '10, 2010

Research paper thumbnail of Routage décentralisé de transactions avec gestion des pannes dans un réseau à large échelle

Ingénierie des Systèmes d' …, Jan 1, 2010

Large-scale systems such as grids provide access to massive storage and processing resources. Although they were primarily intended for scientific computing, grids are now considered a viable solution for hosting the data of large applications. To this end, data is replicated over the grid in order to ensure its availability and to allow fast transaction execution through parallelism. However, guaranteeing both consistency and fast data access on such architectures raises problems at several levels. In particular, centralized control is ruled out because of its poor availability and the congestion it causes at large scale. Moreover, the dynamic behavior of nodes, which can join or leave the system at any time and frequently, can compromise the mutual consistency of the replicas. In this article, we propose a new solution for the decentralized management of transaction routing in a large-scale network. We adapt a routing solution designed for a cluster of machines and improve it so that routing becomes fully decentralized and the metadata is hosted in a catalog distributed over a large-scale network. We then propose a failure management mechanism, well suited to our context, that takes node dynamicity and/or node failures into account. Finally, the implementation of our routing model and its experimental evaluation show the feasibility of the approach; the effectiveness of the failure management is measured through simulation. (This work was partially funded by the Respire project, ANR-05-MMSA-0011.)
The results reveal linear scalability, and a transaction routing time fast enough to make our solution applicable and well suited to applications with heavy transactional workloads, such as online booking systems. The results also show that handling dynamicity increases the system's throughput while minimizing communication costs.

Research paper thumbnail of DTR: Distributed Transaction Routing in a Large Scale Network, High Performance Computing for Computational Science-VECPAR 2008: 8th …

Grid systems provide access to huge storage and computing resources at large scale. While they have been mainly dedicated to scientific computing for years, grids are now considered a viable solution for hosting data-intensive applications. To this end, databases are replicated over the grid in order to achieve high availability and fast transaction processing thanks to parallelism. However, achieving both fast and consistent data access on such architectures is challenging at many levels. In particular, centralized control is ruled out because of its vulnerability and lack of efficiency at large scale. In this article, we propose a novel solution for the distributed control of transaction routing in a large-scale network. We extend a cluster-oriented routing solution with a fully distributed approach that uses a large-scale distributed directory to handle routing metadata. Moreover, we demonstrate the feasibility of our implementation through experimentation: results show linear scale-up, and transaction routing time is fast enough to make our solution suitable for update-intensive applications such as worldwide online booking.

Research paper thumbnail of Failure-tolerant transaction routing at large scale

Advances in Databases …, Jan 1, 2010

Emerging Web2.0 applications such as virtual worlds or social networking websites differ strongly from usual OLTP applications. First, transactions are encapsulated in an API, so it is possible to know which data a transaction will access before processing it. Second, simultaneous transactions are very often commutative, since they access distinct data. Anticipating that the workload of such applications will quickly reach thousands of transactions per second, we envision a novel solution that would allow these applications to scale up without the need to buy expensive resources at a data center. To this end, databases are replicated over a P2P infrastructure to achieve high availability and fast transaction processing thanks to parallelism. However, achieving both fast and consistent data access on such architectures is challenging at many levels. In particular, centralized control is ruled out because of its vulnerability and lack of efficiency at large scale. Moreover, the dynamic behavior of nodes, which can join and leave the system at any time and frequently, can compromise mutual consistency. In this article, we propose a failure-tolerant solution for the distributed control of transaction routing in a large-scale network. We rely on a fully distributed approach that uses a DHT to handle routing metadata, with a suitable failure management mechanism that handles node dynamicity and node failures. Moreover, we demonstrate the feasibility of our transaction routing implementation through experimentation and the effectiveness of our failure management approach through simulation.
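The abstract's core mechanism is routing metadata stored in a DHT that survives node churn. The paper does not give code; as a minimal sketch of the underlying idea (all names are illustrative), consistent hashing can locate the node responsible for a metadata key, and a failed node's keys fall through to its ring successor:

```python
import hashlib
from bisect import bisect_right

def ring_hash(key: str) -> int:
    """Map a key onto a fixed integer ring (here: 32-bit)."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2 ** 32)

class MetadataDirectory:
    """Toy DHT-style directory: each node owns the arc of the hash ring
    ending at its position, so a lookup survives churn by simply falling
    through to the next live node on the ring."""

    def __init__(self, nodes):
        self.ring = sorted((ring_hash(n), n) for n in nodes)

    def responsible_node(self, key: str) -> str:
        h = ring_hash(key)
        idx = bisect_right([pos for pos, _ in self.ring], h)
        return self.ring[idx % len(self.ring)][1]

    def remove_node(self, node: str):
        """A failed node's keys transparently move to its successor."""
        self.ring = [(pos, n) for pos, n in self.ring if n != node]

directory = MetadataDirectory(["node-A", "node-B", "node-C"])
owner = directory.responsible_node("account:42")
directory.remove_node(owner)              # simulate a node failure
fallback = directory.responsible_node("account:42")
assert fallback != owner                  # routing falls through to a live node
```

A real DHT would also replicate each key on several successors so the metadata itself survives the failure, not just the lookup.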

Research paper thumbnail of Dtr: Distributed transaction routing in a large scale network

High Performance Computing for …, Jan 1, 2008

Grid systems provide access to huge storage and computing resources at large scale. While they have been mainly dedicated to scientific computing for years, grids are now considered a viable solution for hosting data-intensive applications. To this end, databases are replicated over the grid in order to achieve high availability and fast transaction processing thanks to parallelism. However, achieving both fast and consistent data access on such architectures is challenging at many levels. In particular, centralized control is ruled out because of its vulnerability and lack of efficiency at large scale. In this article, we propose a novel solution for the distributed control of transaction routing in a large-scale network. We extend a cluster-oriented routing solution with a fully distributed approach that uses a large-scale distributed directory to handle routing metadata. Moreover, we demonstrate the feasibility of our implementation through experimentation: results show linear scale-up, and transaction routing time is fast enough to make our solution suitable for update-intensive applications such as worldwide online booking.

Research paper thumbnail of TransPeer: adaptive distributed transaction monitoring for Web2.0 applications

… of the 2010 ACM Symposium on Applied …, Jan 1, 2010

In emerging Web2.0 applications such as virtual worlds or social networking websites, the number of users is very large (tens of thousands), hence the amount of data to manage is huge and dependability is a crucial issue. The large scale precludes centralized approaches or a locking/two-phase-commit approach. Moreover, Web2.0 applications are mostly interactive, which means that the response time must always be less than a few seconds. To address these problems, we present a novel solution, TransPeer, that allows applications to scale up without the need to buy expensive resources at a data center. To this end, databases are replicated over a P2P system in order to achieve high availability and fast transaction processing thanks to parallelism. A distributed shared dictionary, implemented on top of a DHT, contains metadata used for routing transactions efficiently. Both metadata and data are accessed in an optimistic way: there is no locking on metadata, and transactions are executed on nodes tentatively. We demonstrate the feasibility of our approach through experimentation.
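The key phrase here is "no locking on metadata": updates succeed only if nobody else changed the entry in the meantime. A minimal single-process sketch of that optimistic, version-checked update (class and key names are invented for illustration, not TransPeer's actual API) could look like:

```python
class OptimisticDictionary:
    """Toy model of lock-free metadata access: readers never block, and a
    writer succeeds only if the entry's version has not moved since it
    read the entry (a compare-and-swap, simulated in-process)."""

    def __init__(self):
        self.store = {}              # key -> (version, value)

    def read(self, key):
        return self.store.get(key, (0, None))

    def try_update(self, key, expected_version, new_value) -> bool:
        version, _ = self.store.get(key, (0, None))
        if version != expected_version:
            return False             # a concurrent writer won: caller retries
        self.store[key] = (version + 1, new_value)
        return True

meta = OptimisticDictionary()
v, _ = meta.read("route:account:42")
assert meta.try_update("route:account:42", v, "node-B")       # first writer wins
assert not meta.try_update("route:account:42", v, "node-C")   # stale version loses
```

On a losing update the caller re-reads and retries, which is cheap precisely when conflicts are rare, matching the abstract's assumption that most concurrent transactions touch distinct data.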

Research paper thumbnail of Mobilité et Bases de Données-Etats de l'Art et Perspectives-1ère Partie

TSI: Revue Technique et Science Informatiques, 2003

Research paper thumbnail of On Object and Database Versioning in Distributed Environment

The paper presents how multiversion databases may be efficiently distributed for the most common multiversion applications. First, the versioning model is described, which allows the management of both object versions and configurations. This model, together with the data structures it uses, makes it possible to manage efficiently as many versions as needed, and appears to be well suited for being …

Research paper thumbnail of Optimistic path-based concurrency control over XML documents

We present a new approach to concurrency control over XML documents. Unlike most other approaches, we use an optimistic scheme, since we believe it is better suited to Web applications. The originality of our solution lies in the fact that we use path expressions associated with operations to detect conflicts between transactions. This makes our approach scalable, since conflict detection, except in a few cases, depends neither on the database size nor on the amount of modified fragments. In this paper, we describe and motivate our concurrency control architecture, describe the conflict detection algorithm that is the core of our proposal, and present first experimental results.
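The scalability claim rests on comparing path expressions rather than inspecting the document. As a rough sketch of the idea (a deliberate simplification of the paper's algorithm, using plain prefix matching on path steps), two operations conflict when one touches a node inside the subtree the other touches and at least one of them writes:

```python
def paths_overlap(p1: str, p2: str) -> bool:
    """Two simple XPath-like paths overlap when one is a step-by-step
    prefix of the other, i.e. the operations touch nested subtrees.
    A '*' step is treated as matching any step."""
    a, b = p1.rstrip("/").split("/"), p2.rstrip("/").split("/")
    shorter = min(len(a), len(b))
    return all(x == y or x == "*" or y == "*"
               for x, y in zip(a[:shorter], b[:shorter]))

def conflicts(t1_ops, t2_ops) -> bool:
    """Validate transaction t1 against concurrent t2: any read/write or
    write/write pair on overlapping paths is a conflict."""
    for kind1, path1 in t1_ops:
        for kind2, path2 in t2_ops:
            if "write" in (kind1, kind2) and paths_overlap(path1, path2):
                return True
    return False

t1 = [("write", "/catalog/book[1]/price")]
t2 = [("read", "/catalog/book[1]")]
t3 = [("read", "/catalog/book[2]")]
assert conflicts(t1, t2)        # write inside a read subtree
assert not conflicts(t1, t3)    # disjoint subtrees: the transactions commute
```

Note that the check is quadratic in the number of operations per transaction but independent of document size, which is what makes the scheme attractive for large databases.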

Research paper thumbnail of Fine-grained Refresh Strategies for Managing Replication in Database Clusters

Relaxing replica freshness has been exploited in database clusters to optimize load balancing. In this paper, we propose to support both routing-dependent and routing-independent refresh strategies in a database cluster with multi-master lazy replication. First, we propose a model for capturing refresh strategies. Second, we describe the support of this model in a middleware architecture for freshness-aware …

Research paper thumbnail of Coherence-Oriented Crawling and Navigation Using Patterns for Web Archives

Lecture Notes in Computer Science, 2011

In this paper, we address the issue of improving the coherence of web archives under limited resources (e.g. bandwidth, storage space, etc.). Coherence measures how well a collection of archived page versions reflects the real state (or snapshot) of a set of related web pages at different points in time. An ideal approach to preserving the coherence of archives is to prevent page contents from changing during the crawl of a complete collection. However, this is infeasible in practice because web sites are autonomous and dynamic. We propose two solutions: a priori and a posteriori. As an a priori solution, our idea is to crawl sites during off-peak hours (i.e. the periods of time when very few changes are expected on the pages), based on patterns. A pattern models the behavior of the importance of page changes over a period of time. As an a posteriori solution, based on the same patterns, we introduce a novel navigation approach that enables users to browse the most coherent page versions at a given query time.
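If a pattern is taken to be a profile of expected change activity per hour (an assumption for illustration; the paper's pattern model is richer), the a priori step reduces to choosing the crawl window with the least expected activity:

```python
# A "pattern" here is assumed to be 24 hourly weights giving the expected
# importance of changes on a site at each hour of the day.
def best_crawl_window(pattern, window_hours):
    """Pick the start hour of the contiguous window (wrapping past
    midnight) with the least expected change activity."""
    total = lambda start: sum(pattern[(start + h) % 24] for h in range(window_hours))
    return min(range(24), key=total)

# Toy pattern: change activity peaks during the day, quiet at night.
activity = [1, 1, 0, 0, 0, 1, 3, 6, 8, 9, 9, 8,
            7, 8, 9, 9, 8, 7, 6, 5, 4, 3, 2, 1]
start = best_crawl_window(activity, window_hours=3)
print(start)  # start hour at which a 3-hour crawl disturbs coherence least
```

Crawling the whole collection inside such a window minimizes the chance that pages change mid-crawl, which is exactly what the coherence measure penalizes.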

Research paper thumbnail of Database Versions to Represent Bitemporal Databases

Lecture Notes in Computer Science, 1999

We present a new approach to implement an object bitemporal database where both valid-time and transaction-time are represented. It is based on the DataBase Version model, which allows an efficient management of object versions. This facilitates the manipulation of past events and allows a straightforward representation of branching evolution in valid-time. Keywords: bitemporal database, valid-time, transaction time, versions.

Research paper thumbnail of Parallel Processing with Autonomous Databases in a Cluster System

Lecture Notes in Computer Science, 2002

We consider the use of a cluster system for Application Service Providers (ASP). In the ASP context, hosted applications and databases can be update-intensive and must remain autonomous. In this paper, we propose a new solution for parallel processing with autonomous databases, using a replicated database organization. The main idea is to allow the system administrator to control the trade-off between database consistency and application performance. Application requirements are captured through execution rules stored in a shared directory. They are used at run time to allocate cluster nodes to user requests in a way that optimizes load balancing while satisfying application consistency requirements. We also propose a new preventive replication method and a transaction load balancing architecture which can trade off consistency for performance using execution rules. Finally, we discuss the ongoing implementation at LIP6 using a Linux cluster running Oracle 8i.

Research paper thumbnail of Integrity Constraints in Multiversion Databases

British National Conference on Databases, 1996

Research paper thumbnail of Improving the Quality of Web Archives through the Importance of Changes

Lecture Notes in Computer Science, 2011

Due to the growing importance of the Web, several archiving institutions (national libraries, the Internet Archive, etc.) are harvesting sites to preserve (a part of) the Web for future generations. A major issue encountered by archivists is preserving the quality of web archives. One way of assessing the quality of an archive is to quantify its completeness and the coherence of its page versions. Due to the large number of pages to be captured and the limitations of resources (storage space, bandwidth, etc.), it is impossible to have a complete archive (containing all the versions of all the pages). It is also impossible to ensure the coherence of all captured versions, because pages change very frequently during the crawl of a site. Nonetheless, it is possible to maximize the quality of archives by adjusting the web crawler's strategy. Our idea is (i) to improve the completeness of the archive by downloading the most important versions and (ii) to keep the most important versions as coherent as possible. Moreover, we introduce a pattern model which describes the behavior of the importance of page changes over time. Based on patterns, we propose a crawl strategy that improves both the completeness and the coherence of web archives. Experiments based on real patterns show the usefulness and effectiveness of our approach.

Research paper thumbnail of Vi-DIFF: Understanding Web Pages Changes

Lecture Notes in Computer Science, 2010

Nowadays, many applications are interested in detecting and discovering changes on the web to help users understand page updates and, more generally, web dynamics. Web archiving is one of the fields where detecting changes on web pages is important: archiving institutions collect and preserve different web site versions for future generations. A major problem encountered by archiving systems is understanding what happened between two versions of a web page. In this paper, we address this requirement by proposing a new change detection approach that computes the semantic differences between two versions of HTML web pages. Our approach, called Vi-DIFF, detects changes on the visual representation of web pages. It detects two types of changes: content and structural changes. Content changes include modifications to text, hyperlinks and images. In contrast, structural changes alter the visual appearance of the page and the structure of its blocks. Our Vi-DIFF solution can serve various applications such as crawl optimization, archive maintenance, web change browsing, etc. Experiments on Vi-DIFF were conducted and the results are promising.
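To make the content/structural distinction concrete, suppose each visual block of a page version is summarized as a record of its text, links and images (this representation is an assumption for illustration, not Vi-DIFF's actual data model). Content changes within a matched block then reduce to a field-by-field comparison:

```python
def content_changes(old_block: dict, new_block: dict) -> list:
    """Compare the same visual block across two page versions and list
    its content changes, in the spirit of Vi-DIFF's content-change
    category (text, hyperlinks, images)."""
    changes = []
    for field in ("text", "links", "images"):
        before, after = old_block.get(field), new_block.get(field)
        if before != after:
            changes.append((field, before, after))
    return changes

v1 = {"text": "Sale ends Friday", "links": ["/sale"], "images": ["banner.png"]}
v2 = {"text": "Sale ended",       "links": ["/sale"], "images": ["banner2.png"]}
for field, before, after in content_changes(v1, v2):
    print(f"{field}: {before!r} -> {after!r}")
```

Structural changes (blocks appearing, disappearing or moving) would instead require matching blocks between the two versions before any field comparison, which is the harder part of the problem.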

Research paper thumbnail of Using visual pages analysis for optimizing web archiving

Proceedings of the 1st International Workshop on Data Semantics - DataSem '10, 2010

Due to the growing importance of the World Wide Web, archiving it has become crucial for preserving useful sources of information. To maintain a web archive up to date, crawlers harvest the web by iteratively downloading new versions of documents. However, it is ...

Research paper thumbnail of Managing entity versions within their contexts: A formal approach

Lecture Notes in Computer Science, 1994

For years, people have worked on integrating version management into design applications, using file managers first, and then other kinds of repositories (DBMSs, configuration management systems). However, multiversion systems may not be easily used if they are not able to ...

Research paper thumbnail of Integrity Constraint Checking in Distributed Nested Transactions over a Database Cluster

This paper presents a solution to check referential integrity constraints and conjunctive global constraints in a relational multidatabase system. It also presents the experimental results obtained by implementing this solution over a PC cluster with the Oracle9i DBMS.
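When the participating DBMSs cannot enforce a constraint that spans databases, the check has to be done above them. As a minimal sketch of what a middleware-level referential check amounts to (table and column names are invented for illustration, and the paper's actual mechanism runs inside distributed nested transactions):

```python
def referential_violations(child_rows, fk_column, parent_keys):
    """Return the child rows whose foreign-key value has no matching
    parent key -- the cross-database check a middleware layer performs
    when no single DBMS sees both tables. NULL foreign keys are allowed,
    as in SQL referential integrity."""
    parents = set(parent_keys)
    return [row for row in child_rows
            if row[fk_column] is not None and row[fk_column] not in parents]

# Child table in one database, parent keys fetched from another.
orders = [{"id": 1, "customer_id": 10},
          {"id": 2, "customer_id": 99}]   # 99 has no parent row
customers = [10, 11]
bad = referential_violations(orders, "customer_id", customers)
print(bad)  # → [{'id': 2, 'customer_id': 99}]
```

The interesting engineering question, which the paper addresses, is when to run such a check so that it is both correct under concurrent transactions and cheap enough on a cluster.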

Research paper thumbnail of Posters of the 2003 CoopIS (Cooperative Information Systems) International Conference-Trading Freshness for Performance in a Cluster of Replicated Databases

Research paper thumbnail of TransPeer

Proceedings of the 2010 ACM Symposium on Applied Computing - SAC '10, 2010

Research paper thumbnail of Routage décentralisé de transactions avec gestion des pannes dans un réseau à large échelle

Ingénierie des Systèmes d' …, Jan 1, 2010

Les systèmes à large échelle tels que les grilles fournissent l'accès à des ressources massives d... more Les systèmes à large échelle tels que les grilles fournissent l'accès à des ressources massives de stockage et de traitement. Bien qu'elles furent principalement destinées au calcul scientifique, les grilles sont maintenant considérées comme une solution viable pour héberger les données des grandes applications. Pour cela, les données sont répliquées sur la grille dans l'optique d'assurer leur disponibilité et de permettre une rapide exécution des transactions grâce au parallélisme. Cependant, garantir à la fois la cohérence et la rapidité d'accès aux données sur de telles architectures pose des problèmes à plusieurs niveaux. En particulier, le contrôle centralisé est prohibé à cause de sa faible disponibilité et de la congestion que cela engendre à grande échelle. En outre, le comportement dynamique des noeuds, qui peuvent rejoindre ou quitter le système à tout instant et de manière fréquent peut compromettre la cohérence mutuelle des copies. Dans cet article, nous proposons une nouvelle solution pour la gestion décentralisée du routage des transactions dans un réseau large échelle. Nous transposons une solution de routage dédiée à un cluster de machines et l'améliorons afin que le routage * "Ce travail a été financé partiellement par le projet Respire (ANR-05-MMSA-0011)" devienne entièrement décentralisé et que les méta-données soient hébergées dans un catalogue distribué sur un réseau à large échelle. Ensuite, nous proposons un mécanisme de gestion des pannes, très adapté à notre contexte, prenant en compte la dynamicité et/ou les pannes des noeuds. Finalement, l'implémentation de notre modèle de routage et son évaluation expérimentale montrent la faisabilité de l'approche ; l'efficacité de la gestion des pannes est mesurée à travers une simulation. 
Les résultats révèlent une scalabilité linaire, et un temps de routage des transactions suffisamment rapide pour rendre notre solution applicable et adaptée aux applications à forte charge transactionnelle comme les systèmes de réservation en ligne. Les résultats montrent également que la gestion de la dynamicité augmente les performances du système en termes de débit tout en minimisant les coûts de communication.

Research paper thumbnail of DTR: Distributed Transaction Routing in a Large Scale Network, High Performance Computing for Computational Science-VECPAR 2008: 8th …

Grid systems provide access to huge storage and computing resources at large scale. While they ha... more Grid systems provide access to huge storage and computing resources at large scale. While they have been mainly dedicated to scientific computing for years, grids are now considered as a viable solution for hosting data-intensive applications. To this end, databases are replicated over the grid in order to achieve high availability and fast transaction processing thanks to parallelism. However, achieving both fast and consistent data access on such architectures is challenging at many points. In particular, centralized control is prohibited because of its vulnerability and lack of efficiency at large scale. In this article, we propose a novel solution for the distributed control of transaction routing in a large scale network. We leverage a cluster-oriented routing solution with a fully distributed approach that uses a large scale distributed directory to handle routing metadata. Moreover, we demonstrate the feasibility of our implementation through experimentation: results expose linear scale-up, and transaction routing time is fast enough to make our solution eligible for update intensive applications such as world wide online booking.

Research paper thumbnail of Failure-tolerant transaction routing at large scale

Advances in Databases …, Jan 1, 2010

Emerging Web2.0 applications such as virtual worlds or social networking websites strongly differ... more Emerging Web2.0 applications such as virtual worlds or social networking websites strongly differ from usual OLTP applications. First, the transactions are encapsulated in an API such that it is possible to know which data a transaction will access, before processing it. Second, the simultaneous transactions are very often commutative since they access distinct data. Anticipating that the workload of such applications will quickly reach thousands of transactions per seconds, we envision a novel solution that would allow these applications to scale-up without the need to buy expensive resources at a data center. To this end, databases are replicated over a P2P infrastructure for achieving high availability and fast transaction processing thanks to parallelism. However, achieving both fast and consistent data access on such architectures is challenging at many points. In particular, centralized control is prohibited because of its vulnerability and lack of efficiency at large scale. Moreover dynamic behavior of nodes, which can join and leave the system at anytime and frequently, can compromise mutual consistency. In this article, we propose a failure-tolerant solution for the distributed control of transaction routing in a large scale network. We leverage a fully distributed approach relying on a DHT to handle routing metadata, with a suitable failure management mechanism that handles nodes dynamicity and nodes failures. Moreover, we demonstrate the feasibility of our transaction routing implementation through experimentation and the effectiveness of our failure management approach through simulation.

Research paper thumbnail of Dtr: Distributed transaction routing in a large scale network

High Performance Computing for …, Jan 1, 2008

Grid systems provide access to huge storage and computing resources at large scale. While they ha... more Grid systems provide access to huge storage and computing resources at large scale. While they have been mainly dedicated to scientific computing for years, grids are now considered as a viable solution for hosting data-intensive applications. To this end, databases are replicated over the grid in order to achieve high availability and fast transaction processing thanks to parallelism. However, achieving both fast and consistent data access on such architectures is challenging at many points. In particular, centralized control is prohibited because of its vulnerability and lack of efficiency at large scale. In this article, we propose a novel solution for the distributed control of transaction routing in a large scale network. We leverage a cluster-oriented routing solution with a fully distributed approach that uses a large scale distributed directory to handle routing metadata. Moreover, we demonstrate the feasibility of our implementation through experimentation: results expose linear scale-up, and transaction routing time is fast enough to make our solution eligible for update intensive applications such as world wide online booking.

Research paper thumbnail of TransPeer: adaptive distributed transaction monitoring for Web2. 0 applications

… of the 2010 ACM Symposium on Applied …, Jan 1, 2010

In emerging Web2.0 applications such as virtual worlds or social networking websites, the number ... more In emerging Web2.0 applications such as virtual worlds or social networking websites, the number of users is very important (tens of thousands), hence the amount of data to manage is huge and dependability is a crucial issue. The large scale prevents from using centralized approaches or locking/two-phase-commit approach. Moreover, Web2.0 applications are mostly interactive, which means that the response time must always be less than few seconds. To face these problems, we present a novel solution, TransPeer, that allows applications to scale-up without the need to buy expensive resources at a data center. To this end, databases are replicated over a P2P system in order to achieve high availability and fast transaction processing thanks to parallelism. A distributed shared dictionary, implemented on top of a DHT, contains metadata used for routing transactions efficiently. Both metadata and data are accessed in an optimistic way: there is no locking on metadata and transactions are executed on nodes in a tentative way. We demonstrate the feasibility of our approaches through experimentation.

Research paper thumbnail of Mobilité et Bases de Données-Etats de l'Art et Perspectives-1ère Partie

TSI: Revue Technique et Science Informatiques, 2003

Research paper thumbnail of On Object and Database Versioning in Distributed Environment

The paper presents how multiversion databases may be efficiently distributed in the case of most ... more The paper presents how multiversion databases may be efficiently distributed in the case of most common multiversion applications. First, the versioning model is described which allows the management of both object versions and configurations. This model, together with the data structures it uses, allows to manage efficiently as many versions as needed, and appears to be well suited for being

Research paper thumbnail of Optimistic path-based concurrency control over XML documents

We present a new approach for concurrency control over XML documents. Unlike most of other approa... more We present a new approach for concurrency control over XML documents. Unlike most of other approaches, we use an optimistic scheme, since we believe that it is better suited for Web applications. The originality of our solution resides in the fact that we use path expressions associated with operations to detect conflicts between transactions. This makes our approach scalable since conflict detection except in few cases does not depend on the database size nor on the amount of modified fragments. In this paper, we describe and motivate our concurrency mechanism architecture, we describe the conflict detection algorithm which is the core of our proposal and exhibit first experimental results.

Research paper thumbnail of Fine-grained Refresh Strategies for Managing Replication in Database Clusters

Relaxing replica freshness has been exploited in database clusters to optimize load balanc- ing. ... more Relaxing replica freshness has been exploited in database clusters to optimize load balanc- ing. In this paper, we propose to support both routing-dependant and routing-independent refresh strategies in a database cluster with multi-master lazy replication. First, we pro- pose a model for capturing refresh strategies. Second, we describe the support of this model in a middleware architecture for freshness- aware

Research paper thumbnail of Coherence-Oriented Crawling and Navigation Using Patterns for Web Archives

Lecture Notes in Computer Science, 2011

We point out, in this paper, the issue of improving the coherence of web archives under limited r... more We point out, in this paper, the issue of improving the coherence of web archives under limited resources (e.g. bandwidth, storage space, etc.). Coherence measures how much a collection of archived pages versions reflects the real state (or the snapshot) of a set of related web pages at different points in time. An ideal approach to preserve the coherence of archives is to prevent pages content from changing during the crawl of a complete collection. However, this is practically infeasible because web sites are autonomous and dynamic. We propose two solutions: a priori and a posteriori. As a priori solution, our idea is to crawl sites during the off-peak hours (i.e. the periods of time where very little changes is expected on the pages) based on patterns. A pattern models the behavior of the importance of pages changes during a period of time. As an a posteriori solution, based on the same patterns, we introduce a novel navigation approach that enables users to browse the most coherent page versions at a given query time.

Research paper thumbnail of Database Versions to Represent Bitemporal Databases

Lecture Notes in Computer Science, 1999

. We present a new approach to implement an object bitemporaldatabase where both valid-time and t... more . We present a new approach to implement an object bitemporaldatabase where both valid-time and transaction-time are represented.It is based on the DataBase Version model, which allows an efficientmanagement of object versions. This facilitates the manipulation of pastevents and allows a straightforward representation of branching evolutionin valid-time.Keywords : bitemporal database, valid-time, transaction time, versions.IntroductionIn many applications time must be considered ...

Research paper thumbnail of Parallel Processing with Autonomous Databases in a Cluster System

Lecture Notes in Computer Science, 2002

We consider the use of a cluster system for Application Service Provider (ASP). In the ASP context, hosted applications and databases can be update-intensive and must remain autonomous. In this paper, we propose a new solution for parallel processing with autonomous databases, using a replicated database organization. The main idea is to allow the system administrator to control the tradeoff between database consistency and application performance. Application requirements are captured through execution rules stored in a shared directory. They are used (at run time) to allocate cluster nodes to user requests in a way that optimizes load balancing while satisfying application consistency requirements. We also propose a new preventive replication method and a transaction load balancing architecture which can trade off consistency for performance using execution rules. Finally, we discuss the ongoing implementation at LIP6 using a Linux cluster running Oracle 8i.

Research paper thumbnail of Integrity Constraints in Multiversion Databases

British National Conference on Databases, 1996

Research paper thumbnail of Improving the Quality of Web Archives through the Importance of Changes

Lecture Notes in Computer Science, 2011

Due to the growing importance of the Web, several archiving institutes (national libraries, Internet Archive, etc.) are harvesting sites to preserve (a part of) the Web for future generations. A major issue encountered by archivists is to preserve the quality of web archives. One way of assessing the quality of an archive is to quantify its completeness and the coherence of its page versions. Due to the large number of pages to be captured and the limitations of resources (storage space, bandwidth, etc.), it is impossible to have a complete archive (containing all the versions of all the pages). It is also impossible to ensure the coherence of all captured versions because pages change very frequently during the crawl of a site. Nonetheless, it is possible to maximize the quality of archives by adjusting the web crawler's strategy. Our idea for that is (i) to improve the completeness of the archive by downloading the most important versions and (ii) to keep the most important versions as coherent as possible. Moreover, we introduce a pattern model which describes the behavior of the importance of page changes over time. Based on patterns, we propose a crawl strategy to improve both the completeness and the coherence of web archives. Experiments based on real patterns show the usefulness and the effectiveness of our approach.

Research paper thumbnail of Vi-DIFF: Understanding Web Pages Changes

Lecture Notes in Computer Science, 2010

Nowadays, many applications are interested in detecting and discovering changes on the web to help users understand page updates and, more generally, the web dynamics. Web archiving is one of these fields where detecting changes on web pages is important. Archiving institutes are collecting and preserving different web site versions for future generations. A major problem encountered by archiving systems is to understand what happened between two versions of a web page. In this paper, we address this requirement by proposing a new change detection approach that computes the semantic differences between two versions of HTML web pages. Our approach, called Vi-DIFF, detects changes on the visual representation of web pages. It detects two types of changes: content and structural changes. Content changes include modifications on text, hyperlinks and images. In contrast, structural changes alter the visual appearance of the page and the structure of its blocks. Our Vi-DIFF solution can serve for various applications such as crawl optimization, archive maintenance, web changes browsing, etc. Experiments on Vi-DIFF were conducted and the results are promising.

Research paper thumbnail of Using visual pages analysis for optimizing web archiving

Proceedings of the 1st International Workshop on Data Semantics - DataSem '10, 2010

Due to the growing importance of the World Wide Web, archiving it has become crucial for preserving a useful source of information. To maintain a web archive up-to-date, crawlers harvest the web by iteratively downloading new versions of documents. However, it is ...

Research paper thumbnail of Managing entity versions within their contexts: A formal approach

Lecture Notes in Computer Science, 1994

For years, people have worked on integrating version management into design applications, using file managers first, and then other kinds of repositories (DBMS, configuration management systems). However, multiversion systems may not be easily used if they are not able to ...

Research paper thumbnail of Integrity Constraint Checking in Distributed Nested Transactions over a Database Cluster

This paper presents a solution to check referential integrity constraints and conjunctive global constraints in a relational multidatabase system. It also presents the experimental results obtained by implementing this solution over a PC cluster with the Oracle9i DBMS.

Research paper thumbnail of Trading Freshness for Performance in a Cluster of Replicated Databases

Posters of the 2003 CoopIS (Cooperative Information Systems) International Conference