Renaud Delbru - Academia.edu

Papers by Renaud Delbru

Describing web resources with machine-understandable metadata is one of the foundations of the Semantic Web. RDF is the language for describing and exchanging Semantic Web knowledge. As such data becomes more and more common, techniques for manipulating and exploring this information become necessary.

Considering that thousands if not millions of linked datasets will be published soon, we motivate in this paper the need for an efficient and effective way to rank interlinked datasets based on formal descriptions of their characteristics. We propose DING (from Dataset RankING) as a new approach to rank linked datasets using information provided by the voiD vocabulary. DING is a domain-independent link analysis that measures the popularity of datasets by considering the cardinality and types of the relationships. We also propose a methodology to automatically assign weights to link types. We evaluate the proposed ranking algorithm against other well-known ones, such as PageRank or HITS, using synthetic voiD descriptions. Early results show that DING performs better than the standard Web ranking algorithms.
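
As a rough illustration of weighted link analysis over a graph of datasets, the sketch below runs a PageRank-style iteration in which each link carries a weight. It is not the DING algorithm itself: the function name `rank_datasets`, the damping factor, and the `(source, target, weight)` input format are all assumptions made for the example, and link weights are assumed to be strictly positive.

```python
from collections import defaultdict


def rank_datasets(links, damping=0.85, iterations=50):
    """Score datasets from weighted links given as (source, target, weight).

    Hypothetical helper: a generic weighted PageRank, not DING.
    Weights are assumed to be strictly positive.
    """
    links = list(links)
    nodes = set()
    out_weight = defaultdict(float)  # total outgoing weight per dataset
    for src, dst, weight in links:
        nodes.update((src, dst))
        out_weight[src] += weight

    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Score held by datasets without outgoing links is spread uniformly.
        dangling = sum(scores[node] for node in nodes if out_weight[node] == 0)
        new_scores = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each dataset passes its score along its links, in proportion to
        # the weight of each link.
        for src, dst, weight in links:
            new_scores[dst] += damping * scores[src] * weight / out_weight[src]
        scores = new_scores
    return scores
```

In this sketch a link type with a larger weight transfers a larger share of the source dataset's score, which is the general idea behind weighting link types.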

Semantic Web Information Management, 2010

... This architecture can be seen as a distributed ABox reasoning with a shared persistent TBox. ... We observe that on a snapshot of the index containing 6 million documents, the original size of the corpus was 18 GB whereas the total size after inference was 46 GB, thus a ratio ...

The task of entity retrieval becomes increasingly prevalent as more and more (semi-)structured information about objects is available on the Web in the form of documents embedding metadata (RDF, RDFa, Microformats, and others). However, research and development in that direction is dependent on (1) the availability of a representative corpus of entities that are found on the Web, and (2) the availability of an entity-oriented search infrastructure for experimenting with new retrieval models. In this paper, we introduce the Sindice-2011 data collection which is derived from data collected by the Sindice semantic search engine. The data collection (available at http://data.sindice.com/trec2011/) is especially designed for supporting research in the domain of web entity retrieval. We describe how the corpus is organised, discuss statistics of the data collection, and introduce a search infrastructure to foster research and development.

The Semantic Web is driven by the idea of moving from a Web of documents, designed for human consumption, to a Web of data in order to "create a universal medium for the exchange of data where data can be shared and processed by automated tools as well as by people".

Although most developers are object-oriented, programming RDF is triple-oriented. Bridging this gap, by developing a truly object-oriented API that uses domain terminology, is not straightforward, because of the dynamic and semi-structured nature of RDF and the open-world semantics of RDF Schema. We present ActiveRDF, our object-oriented library for accessing RDF data. ActiveRDF is completely dynamic, offers full manipulation and querying of RDF data, does not rely on a schema and can be used against different data-stores. In addition, the integration with the popular Rails framework enables very easy development of Semantic Web applications.
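
To make the gap between triples and objects concrete, here is a minimal sketch of a dynamic object view over a list of triples. It is written in Python for illustration and does not reproduce the ActiveRDF API (ActiveRDF itself is a separate library); the `Resource` class, the `local_name` helper, and the example URIs in the usage note are hypothetical.

```python
def local_name(uri):
    """Return the part of a URI after the last '#' or '/'."""
    return uri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


class Resource:
    """Object view over the triples that share one subject URI.

    Hypothetical class for illustration; attribute names are resolved at
    access time, so no schema is required.
    """

    def __init__(self, uri, triples):
        self.uri = uri
        self._triples = triples  # list of (subject, predicate, object)

    def __getattr__(self, name):
        # Invoked only when normal attribute lookup fails: interpret the
        # name as the local part of a predicate URI.
        if name.startswith("_"):
            raise AttributeError(name)
        values = [
            obj
            for subj, pred, obj in self._triples
            if subj == self.uri and local_name(pred) == name
        ]
        if not values:
            raise AttributeError(name)
        return values
```

With triples such as `("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", "Alice")`, the expression `Resource("http://example.org/alice", triples).name` returns the list of matching objects. Values come back as lists because, under open-world semantics, a property may have any number of values.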

Semwiki, 2006

Semantic Wikis allow users to semantically annotate their Wiki content. The particular annotations can differ in expressive power, simplicity, and meaning. We present an elaborate conceptual model for semantic annotations, introduce a unique and rich Wiki syntax for these annotations, and discuss how to best formally represent the augmented Wiki content. We improve existing navigation techniques to automatically construct faceted browsing for semi-structured data. By utilising the Wiki annotations we provide greatly enhanced information retrieval. Further, we report on our ongoing development of these techniques in our prototype SemperWiki. Recently several Semantic Wikis have been developed, such as Platypus [22], WikSAR [2], Semantic MediaWiki [23] and IkeWiki. These Wikis answer these questions in a rather limited way: (a) they allow only simple annotations of the current Wiki page; (b) they do not formally separate the page and the concept that it describes; and (c) they do not fully exploit the semantic annotations for improved navigation.

Web Semantics Science Services and Agents on the World Wide Web, 2012

More and more (semi-)structured information is becoming available on the web in the form of documents embedding metadata (e.g., RDF, RDFa, Microformats and others). There are already hundreds of millions of such documents accessible and their number is growing rapidly. This calls for large scale systems providing effective means of searching and retrieving this semi-structured information with the ultimate goal of making it exploitable by humans and machines alike. This article examines the shift from the traditional web document model to a web data object (entity) model and studies the challenges faced in implementing a scalable and high performance system for searching semi-structured data objects over a large heterogeneous and decentralised infrastructure. Towards this goal, we define an entity retrieval model, develop novel methodologies for supporting this model and show how to achieve a high-performance entity retrieval system. We introduce an indexing methodology for semi-structured data which offers a good compromise between query expressiveness, query processing and index maintenance compared to other approaches. We address high performance by optimising the index data structure using appropriate compression techniques. Finally, we demonstrate that the resulting system can index billions of data objects and provides keyword-based as well as more advanced search interfaces for retrieving relevant data objects in sub-second time. This work has been part of the Sindice search engine project at the Digital Enterprise Research Institute (DERI), NUI Galway. The Sindice system currently maintains more than 200 million pages downloaded from the web and is being used actively by many researchers within and outside of DERI.

Lecture Notes in Computer Science, 2010

Now motivated also by the partial support of major search engines, hundreds of millions of documents are being published on the web embedding semi-structured data in RDF, RDFa and Microformats. This scenario calls for novel information search systems which provide effective means of retrieving relevant semi-structured information. In this paper, we present an "entity retrieval system" designed to provide entity search capabilities over datasets as large as the entire Web of Data. Our system supports full-text search, semi-structural queries and top-k query results while exhibiting a concise index and efficient incremental updates. We advocate the use of a node indexing scheme and show that it offers a good compromise between query expressiveness, query processing time and update complexity in comparison to three other indexing techniques. We then demonstrate how such a system can effectively answer queries over 10 billion triples on a single commodity machine. Footnotes from the paper: (5) Virtuoso: http://virtuoso.openlinksw.com/; (6) a triple is a statement (s, p, o) consisting of a subject, a predicate, and an object, and asserts that a subject has a property with some value; (7) a quad is a statement with a fourth element c, called the "context", which names the RDF graph, generally to keep the provenance of the RDF data; (8) SPARQL: http://www.w3.org/TR/rdf-sparql-query/
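
The triple and quad definitions above, together with the idea of entity-oriented search, can be sketched as follows. This is a deliberately simple in-memory inverted index that groups statements by subject; it is not the node indexing scheme advocated in the paper. The names `Quad`, `build_entity_index`, and `search` are assumptions made for the example.

```python
from collections import defaultdict, namedtuple

# A quad is a triple (s, p, o) plus a context c naming the source graph.
Quad = namedtuple("Quad", "s p o c")


def build_entity_index(quads):
    """Map (predicate, token) and (None, token) keys to subject entities.

    Hypothetical helper: the (None, token) key supports plain keyword
    search, the (predicate, token) key supports attribute-scoped search.
    """
    index = defaultdict(set)
    for quad in quads:
        for token in str(quad.o).lower().split():
            index[(quad.p, token)].add(quad.s)
            index[(None, token)].add(quad.s)
    return index


def search(index, token, predicate=None):
    """Return the entities whose values contain the token."""
    return index.get((predicate, token.lower()), set())
```

For instance, `search(index, "galway")` returns every entity with the keyword in any value, while passing `predicate=` restricts the match to one attribute, which is a simple form of semi-structural query.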

Lecture Notes in Computer Science, 2012

Lecture Notes in Computer Science, 2008

There has recently been interest in ranking resources and generating corresponding expressive descriptions from the Semantic Web. This paper proposes an approach for automatically generating snippets from RDF documents and assisting users in better understanding the content of RDF documents returned by Semantic Web search engines. A heuristic method for discovering topics, based on the occurrences of RDF nodes and the URIs of original RDF documents, is presented and evaluated in this paper. In order to make the snippets more understandable, two strategies are proposed and used for ranking the topic-related statements and the query-related statements respectively. Finally, the conclusion is drawn based on the discussion about the performance of our topic discovery and the whole snippet generation approaches on a test dataset provided by Sindice.

Proceedings of the 17th International Database Engineering & Applications Symposium on - IDEAS '13, 2013

In many applications, it is convenient to substitute a large data graph with a smaller homomorphic graph. This paper investigates approaches for summarising massive data graphs. In general, massive data graphs are processed using a shared-nothing infrastructure such as MapReduce. However, accurate graph summarisation algorithms are suboptimal for this kind of environment as they require multiple iterations over the data graph. We investigate approximate graph summarisation algorithms that are efficient to compute in a shared-nothing infrastructure. We define a quality assessment model of a summary with regard to a gold standard summary. We evaluate over several datasets the trade-offs between efficiency and precision of the algorithms. With regard to an application, experiments highlight the need to trade off the precision and volume of a graph summary against the complexity of a summarisation technique.
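
One simple way to picture an approximate summary is to group nodes by the set of predicates on their outgoing edges and keep one summary edge per group pair. The sketch below does this in two passes over the edge list, with no fixed-point iteration. It is only one possible approximation and is not claimed to be any of the algorithms evaluated in the paper; the function name `summarise` and the `(subject, predicate, object)` input format are assumptions.

```python
from collections import defaultdict


def summarise(edges):
    """Summarise a graph given as (subject, predicate, object) edges.

    Hypothetical helper. Returns (block, summary) where block maps each
    node to its group (the frozenset of its outgoing predicates) and
    summary is the set of edges between groups.
    """
    edges = list(edges)
    out_preds = defaultdict(set)
    # Pass 1: collect the outgoing predicates of every node.
    for subj, pred, obj in edges:
        out_preds[subj].add(pred)
        out_preds.setdefault(obj, set())  # sink nodes get an empty set
    block = {node: frozenset(preds) for node, preds in out_preds.items()}
    # Pass 2: map every edge onto the groups of its endpoints.
    summary = {(block[subj], pred, block[obj]) for subj, pred, obj in edges}
    return block, summary
```

Because every original edge is mapped to an edge between the groups of its endpoints, the node-to-group mapping is a homomorphism from the data graph onto the summary.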

Lecture Notes in Computer Science, 2011

In large web search engines the performance of Information Retrieval systems is a key issue. Block-based compression methods are often used to improve the search performance, but current self-indexing techniques are not adapted to such data structures and provide suboptimal performance. In this paper, we present SkipBlock, a self-indexing model for block-based inverted lists. Based on a cost model, we show that it is possible to achieve significant improvements in both search performance and the structure's storage space.
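
For readers unfamiliar with self-indexing over block-based lists, the sketch below shows the general idea with a single level of skip entries: a sorted posting list is split into fixed-size blocks, and a small array holding the last document id of each block lets a lookup jump straight to the one block that may contain the target. This is a generic illustration, not the SkipBlock model or its cost model; the function names and the default block size are assumptions.

```python
from bisect import bisect_left


def build_blocks(doc_ids, block_size=4):
    """Split a sorted list of document ids into blocks plus skip entries.

    Hypothetical helper. skips[i] is the largest id stored in blocks[i].
    """
    blocks = [
        doc_ids[i:i + block_size] for i in range(0, len(doc_ids), block_size)
    ]
    skips = [block[-1] for block in blocks]
    return blocks, skips


def contains(blocks, skips, target):
    """Check membership by inspecting at most one block."""
    # Index of the first block whose last id is >= target.
    i = bisect_left(skips, target)
    return i < len(blocks) and target in blocks[i]
```

In a real index each block would be stored compressed, so the benefit of the skip array is that only one block has to be decompressed per lookup.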

Semantic Web Information Management, 2009

Lecture Notes in Computer Science, 2013

Linked Data promises that a large portion of Web Data will be usable as one big interlinked RDF database against which structured queries can be answered. In this lecture we will show how reasoning, using RDF Schema (RDFS) and the Web Ontology Language (OWL), can help to obtain more complete answers for such queries over Linked Data. We first look at the extent to which RDFS and OWL features are being adopted on the Web. We then introduce two high-level architectures for query answering over Linked Data and outline how these can be enriched by (lightweight) RDFS and OWL reasoning, enumerating the main challenges faced and discussing reasoning methods that make practical and theoretical trade-offs to address these challenges. In the end, we also ask whether or not RDFS and OWL are enough and discuss numeric reasoning methods that are beyond the scope of these standards but that are often important when integrating Linked Data from several heterogeneous sources.
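
A small example helps show why reasoning yields more complete answers. The sketch below forward-chains two standard RDFS entailment patterns, subclass transitivity (rdfs11) and type inheritance (rdfs9), until no new triple appears. It is a naive, illustrative implementation that covers only these two patterns and is not the method presented in the lecture; the function name `rdfs_closure` and the prefixed-name strings are assumptions.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Return the triples plus everything inferred by rdfs9 and rdfs11.

    Hypothetical helper using naive fixed-point iteration.
    """
    inferred = set(triples)
    changed = True
    while changed:
        new = set()
        for s, p, o in inferred:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in inferred:
                # rdfs11: (s subClassOf o), (o subClassOf o2)
                #         => (s subClassOf o2)
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                # rdfs9: (s subClassOf o), (s2 type s) => (s2 type o)
                if p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))
        changed = not new <= inferred
        inferred |= new
    return inferred
```

Given that `ex:alice` is an `ex:Student` and `ex:Student` is a subclass of `ex:Person`, a query for all instances of `ex:Person` finds `ex:alice` only over the closure, not over the original triples.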

Lecture Notes in Computer Science, 2011

The Sindice Semantic Web index provides search capabilities over 260 million documents. Reasoning over web data makes explicit what would otherwise be implicit knowledge: it adds value to the information and enables Sindice to ultimately be more competitive in terms of precision and recall. However, due to the scale and heterogeneity of web data, a reasoning engine for the Sindice system must (1) scale out through parallelisation over a cluster of machines; and (2) cope with unexpected data usage. In this paper, we report our experiences and lessons learned in building a large scale reasoning engine for Sindice. The reasoning approach has been deployed, used and improved since 2008 within Sindice and has enabled Sindice to reason over billions of triples.

2012 23rd International Workshop on Database and Expert Systems Applications, 2012

One of the reasons for the slow adoption of SPARQL is the complexity of query formulation due to data diversity. The principal barrier users face when trying to formulate a query is that they generally have no information about the underlying structure and vocabulary of the data.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2008

Increasing amounts of RDF data are available on the Web for consumption by Semantic Web browsers and indexing by Semantic Web search engines. Current Semantic Web publishing practices, however, do not directly support efficient discovery and high-performance retrieval by clients and search engines. We propose an extension to the Sitemaps protocol which provides a simple and effective solution: data publishers create Semantic Sitemaps to announce and describe their data so that clients can choose the most appropriate access method. We show how this protocol enables an extended notion of authoritative information across different access methods.
