Sindice.com: Weaving the Open Linked Data
Related papers
Sindice.com: a document-oriented lookup index for open linked data
International Journal of Metadata, Semantics and Ontologies, 2008
Data discovery on the Semantic Web requires crawling and indexing of statements, in addition to the 'linked-data' approach of de-referencing resource URIs. Existing Semantic Web search engines are focused on database-like functionality, compromising on index size, query performance and live updates. We present Sindice, a lookup index over Semantic Web resources. Our index allows applications to automatically locate documents containing information about a given resource. In addition, we allow resource retrieval through inverse-functional properties, offer a full-text search and index SPARQL endpoints. Finally, we extend the sitemap protocol to efficiently index large datasets with minimal impact on data providers.
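The lookup idea in the Sindice abstract (resource URI in, list of documents out, plus lookup by an inverse-functional property value) can be sketched in a few lines of Python. This is a minimal in-memory illustration, not Sindice's implementation. The class `LookupIndex`, its methods, the `http(s)` prefix test for recognizing URIs, the choice of `foaf:mbox` as the only inverse-functional property, and the example documents are all assumptions made for the sketch. Full-text search, SPARQL endpoint indexing and the sitemap extension are not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings

FOAF = "http://xmlns.com/foaf/0.1/"
# Assumption for this sketch: treat foaf:mbox as the only inverse-functional property.
INVERSE_FUNCTIONAL = {FOAF + "mbox"}


class LookupIndex:
    """Hypothetical document-oriented lookup index (illustration only, not Sindice's code)."""

    def __init__(self) -> None:
        # resource URI -> URLs of documents that mention it
        self.by_resource: Dict[str, Set[str]] = defaultdict(set)
        # (inverse-functional property, value) -> URLs of documents asserting it
        self.by_ifp: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add_document(self, doc_url: str, triples: Iterable[Triple]) -> None:
        for s, p, o in triples:
            self.by_resource[s].add(doc_url)
            # Crude heuristic: objects starting with http(s):// are resource URIs.
            if o.startswith(("http://", "https://")):
                self.by_resource[o].add(doc_url)
            if p in INVERSE_FUNCTIONAL:
                self.by_ifp[(p, o)].add(doc_url)

    def lookup(self, uri: str) -> List[str]:
        """Documents that mention the given resource URI."""
        return sorted(self.by_resource.get(uri, set()))

    def lookup_ifp(self, prop: str, value: str) -> List[str]:
        """Documents that identify a resource via an inverse-functional property value."""
        return sorted(self.by_ifp.get((prop, value), set()))


idx = LookupIndex()
idx.add_document("http://a.example/card.rdf", [
    ("http://a.example/#me", FOAF + "mbox", "mailto:alice@example.org"),
    ("http://a.example/#me", FOAF + "knows", "http://b.example/#me"),
])
idx.add_document("http://b.example/foaf.rdf", [
    ("http://b.example/#me", FOAF + "name", "Bob"),
])
print(idx.lookup("http://b.example/#me"))
# ['http://a.example/card.rdf', 'http://b.example/foaf.rdf']
print(idx.lookup_ifp(FOAF + "mbox", "mailto:alice@example.org"))
# ['http://a.example/card.rdf']
```

Storing only document pointers, rather than the triples themselves, keeps such an index small. The consuming application then fetches the listed documents itself.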
SpEnD: Linked Data SPARQL Endpoints Discovery Using Search Engines
Linked data endpoints are online query gateways to semantically annotated linked data sources. These sources are queried with SPARQL, the standard query language. Although a linked data endpoint (i.e. a SPARQL endpoint) is a basic Web service, it provides a platform for federated online querying and data linking methods. For linked data consumers, SPARQL endpoint availability and discovery are crucial for live querying and semantic information retrieval. Current studies show that the availability of linked datasets is very low, while the locations of linked data endpoints change frequently. There are linked data repositories that collect and list the available linked data endpoints or resources. It is observed that around half of the endpoints listed in existing repositories are not accessible (temporarily or permanently offline). These endpoint URLs are shared through repository websites such as Datahub.io; however, they are weakly maintained and revised only by their publishers. In this study, a novel metacrawling method is proposed for discovering and monitoring linked data sources on the Web. We implemented the method in a prototype system named SPARQL Endpoints Discovery (SpEnD). SpEnD starts with a "search keyword" discovery process for finding relevant keywords for the linked data domain and specifically SPARQL endpoints. Then, the collected search keywords are used to find linked data sources via popular search engines (Google, Bing, Yahoo, Yandex). Using this method, most of the SPARQL endpoints currently listed in existing endpoint repositories, as well as a significant number of new SPARQL endpoints, have been discovered. We analyze our findings in detail in comparison to the Datahub collection.
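The final comparison step of a SpEnD-style pipeline, matching endpoint URLs returned by search engines against a repository listing, might look roughly like the following. The search-engine querying itself is not reproduced, and `normalize`, `looks_like_endpoint` (a path-ending-in-`sparql` heuristic), `compare_with_repository` and `probe_url` are hypothetical helpers, not SpEnD's code. `probe_url` only builds a liveness-check request: the SPARQL 1.1 Protocol allows a query to be sent via HTTP GET in the `query` parameter, and a cheap `ASK` query is assumed here as the probe.

```python
from typing import Iterable, List, Set, Tuple
from urllib.parse import urlencode, urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Canonicalize a URL so trivially different spellings compare equal."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    # Lowercase scheme and host, drop trailing slash and fragment, keep query.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def looks_like_endpoint(url: str) -> bool:
    # Hypothetical heuristic: many endpoints live under a path ending in "sparql".
    return urlsplit(url).path.lower().rstrip("/").endswith("sparql")


def compare_with_repository(
    candidates: Iterable[str], listed: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split plausible endpoints into (already listed, newly discovered)."""
    found: Set[str] = {normalize(u) for u in candidates if looks_like_endpoint(u)}
    known: Set[str] = {normalize(u) for u in listed}
    return sorted(found & known), sorted(found - known)


def probe_url(endpoint: str) -> str:
    # SPARQL 1.1 Protocol: a query may be sent via HTTP GET in the "query" parameter.
    # Assumes the endpoint URL has no query string of its own.
    return endpoint + "?" + urlencode({"query": "ASK { ?s ?p ?o }"})


hits = [
    "http://DBpedia.org/sparql/",
    "https://example.org/data/sparql",
    "https://example.org/about.html",
]
listed = ["http://dbpedia.org/sparql"]
already, new = compare_with_repository(hits, listed)
print(already)  # ['http://dbpedia.org/sparql']
print(new)      # ['https://example.org/data/sparql']
print(probe_url(new[0]))
```

A monitoring loop would then issue the probe requests periodically and record which endpoints answer, since the abstract notes that endpoint locations change frequently.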
Finding data, knowledge, and answers on the semantic Web
Proc. 20th Int. FLAIRS Conf., AAAI Press, 2007
Web search engines like Google have made us all smarter by providing ready access to the world's knowledge whenever we need to look up a fact, learn about a topic or evaluate opinions. The W3C's Semantic Web effort aims to make such knowledge more accessible to computer programs by publishing it in machine understandable form. As the volume of Semantic Web data grows, software agents will need their own search engines to help them find the relevant and trustworthy knowledge they need to perform their tasks. We will discuss the general issues underlying the indexing and retrieval of RDF-based information and describe Swoogle, a crawler based search engine whose index contains information on over two million RDF documents, and Tripleshop, which uses Swoogle to automatically build datasets appropriate for responding to user supplied queries. We will illustrate their use in ELVIS (Ecosystem Location Visualization and Information System), a distributed platform for constructing end-to-end use cases that demonstrate the semantic web's utility for integrating scientific data.
Linked Data in the Semantic Web
Sir Tim Berners-Lee created quite a stir with the introduction of Linked Data. Instead of having hyperlinks link to static documents online, Berners-Lee proposes that data be linked together semantically online, a concept he calls Linked Data. This paper explores the many facets of Linked Data. It begins with an overview of the principles and standards of Linked Data, including concepts such as RDF, OWL, and SPARQL. To give the audience a better understanding of how Linked Data can function, it illustrates current projects such as DBpedia, BabelNet, and MeLOD. Finally, it discusses how libraries are impacted by Linked Data and some initiatives being explored, such as BIBFRAME.
RKBExplorer.com: A Knowledge Driven Infrastructure for Linked Data Providers
Lecture Notes in Computer Science, 2008
RKB Explorer is a Semantic Web application that is able to present unified views of a significant number of heterogeneous data sources. We have developed an underlying information infrastructure which is mediated by ontologies and consists of many independent triplestores, each publicly available through both SPARQL endpoints and resolvable URIs. To realise this synergy of disparate information sources, we have deployed tools to identify co-referent URIs, and devised an architecture to allow the information to be represented and used. This paper provides a brief overview of the system including the underlying infrastructure, and a number of associated tools for both knowledge acquisition and publishing.
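One common way to hold co-reference judgements of the kind RKB Explorer's tools produce is a union-find structure that merges URIs into bundles with a single canonical representative. The abstract does not say how RKB Explorer stores them, so this is a swapped-in technique. The class `CorefBundles`, its `same_as`/`canonical`/`bundle` methods, the rule that the lexicographically smallest URI is canonical, and the example URIs are hypothetical.

```python
from typing import Dict, List


class CorefBundles:
    """Hypothetical co-reference store: union-find over URIs judged to denote the same thing."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def _find(self, uri: str) -> str:
        self.parent.setdefault(uri, uri)
        root = uri
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every URI on the walk directly at the root.
        while self.parent[uri] != root:
            next_uri = self.parent[uri]
            self.parent[uri] = root
            uri = next_uri
        return root

    def same_as(self, a: str, b: str) -> None:
        """Record that a and b are co-referent, merging their bundles."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            # The smaller root wins, so each bundle's root is its smallest URI.
            lo, hi = sorted((ra, rb))
            self.parent[hi] = lo

    def canonical(self, uri: str) -> str:
        return self._find(uri)

    def bundle(self, uri: str) -> List[str]:
        """All known URIs co-referent with uri, sorted."""
        root = self._find(uri)
        return sorted(u for u in list(self.parent) if self._find(u) == root)


crs = CorefBundles()
crs.same_as("http://dblp.example/pers/smith", "http://acm.example/author/42")
crs.same_as("http://acm.example/author/42", "http://uni.example/staff/jsmith")
print(crs.canonical("http://uni.example/staff/jsmith"))
# http://acm.example/author/42
print(crs.bundle("http://dblp.example/pers/smith"))
```

Choosing the smallest URI as the representative makes the canonical choice independent of the order in which equivalences are asserted.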
An ontology of resources for linked data
2009
The primary goal of the Semantic Web is to use URIs as a universal space to name anything, expanding from using URIs for webpages to URIs for “real objects and imaginary concepts,” as phrased by Berners-Lee.
The Semantic Web Journal as Linked Data
The Semantic Web journal implements an open and transparent review process which creates a unique bibliographic dataset. In addition to traditional publication data such as author names and paper titles, each paper in this dataset is also accompanied by a fully timestamped history of its successive decision statuses, assigned editors, solicited and voluntary reviewers, full-text reviews, comments, and in many cases also the authors' response letters. This dataset is a rich and valuable resource for a variety of studies, such as understanding the collaboration networks of scholars and exploring trending topics in the field of the Semantic Web. The dataset is now publicly available online as Linked Data. In this short paper, we report on its availability and novelty, as well as some design considerations.
Using Search Paradigms and Architecture Information Components to Consume Linked Data
The success of the Linked Open Data Initiative has increased the amount of information available on the Web. However, the Web content published under this initiative cannot be consumed by users who are unfamiliar with Semantic Web technologies (RDF, SPARQL, ontologies, etc.), because they need to understand the structure and provenance of the data and the way in which it is queried, which can be complex for non-technical users. This paper describes the development process of a Web application that applies components borrowed from Information Architecture, together with search paradigms, to the task of consuming Linked Data by non-technical users. These data are available via a SPARQL endpoint (a SPARQL protocol service that enables users to query a knowledge base stored as RDF triples, known as an RDF triple database or triplestore), which can be queried over HTTP from a Web browser. This proposal allows full-text search over a bibliographic dataset and faceted search, based on representative concepts of ...
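Behind an interface like the one described, a keyword search with an optional facet eventually becomes a SPARQL query sent to the endpoint over HTTP. The sketch below assumes the bibliographic records use Dublin Core terms (`dcterms:title`, `dcterms:subject`); the helper names, the facet URI and the endpoint URL are hypothetical. Matching is a naive case-insensitive `CONTAINS` filter, which is simple but scans every title; many triplestores provide their own full-text extensions, which a production system would more likely use.

```python
from typing import Optional
from urllib.parse import urlencode


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a double-quoted SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_search_query(keyword: str, subject_uri: Optional[str] = None, limit: int = 20) -> str:
    """Hypothetical keyword search over titles, optionally restricted to one subject facet."""
    # Assumes subject_uri is a trusted, well-formed IRI (it is not escaped).
    facet = f"  ?doc dcterms:subject <{subject_uri}> .\n" if subject_uri else ""
    needle = escape_literal(keyword.lower())
    return (
        "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
        "SELECT ?doc ?title WHERE {\n"
        "  ?doc dcterms:title ?title .\n"
        + facet
        + f'  FILTER(CONTAINS(LCASE(STR(?title)), "{needle}"))\n'
        + "}\n"
        + f"LIMIT {int(limit)}"
    )


def request_url(endpoint: str, query: str) -> str:
    """GET request URL per the SPARQL 1.1 Protocol ("query" parameter, URL-encoded)."""
    # Assumes the endpoint URL carries no query string of its own.
    return endpoint + "?" + urlencode({"query": query})


q = build_search_query("Linked Data", "http://example.org/topic/semantic-web")
url = request_url("https://example.org/sparql", q)
print(q)
print(url)
```

A browser client would typically also send an `Accept: application/sparql-results+json` header so that the endpoint returns results as JSON.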
Searching and browsing linked data with SWSE: the semantic web search engine
2011
In this paper, we discuss the architecture and implementation of the Semantic Web Search Engine (SWSE). Following traditional search engine architecture, SWSE consists of crawling, data enhancing, indexing and a user interface for search, browsing and retrieval of information; unlike traditional search engines, SWSE operates over RDF Web data (loosely also known as Linked Data), which implies unique challenges for the system design, architecture, algorithms, implementation and user interface.
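Of SWSE's stages (crawling, data enhancing, indexing, user interface), indexing is the easiest to illustrate in isolation: group triples by subject into entity records and build a keyword index over their literal values. This is a toy sketch, not SWSE's implementation. The regular expression is a deliberately simplified N-Triples parser (no blank nodes or escaped characters), and `build_entity_index`, `search` and the `example.org` URIs are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Simplified N-Triples line: IRI subject, IRI predicate, IRI or literal object.
# Literals may carry a language tag or datatype, which are matched but discarded.
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)"(?:@[\w-]+|\^\^<[^>]*>)?)\s*\.\s*$'
)


def build_entity_index(
    lines: List[str],
) -> Tuple[Dict[str, List[Tuple[str, str]]], Dict[str, Set[str]]]:
    """Return (subject -> [(predicate, object)], token -> subjects with that token)."""
    entities: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    keywords: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        m = TRIPLE.match(line.strip())
        if not m:
            continue  # skip blank lines, comments and unsupported syntax
        subj, pred, obj_iri, obj_lit = m.groups()
        obj = obj_iri if obj_iri is not None else obj_lit
        entities[subj].append((pred, obj))
        if obj_lit is not None:
            for token in re.findall(r"\w+", obj_lit.lower()):
                keywords[token].add(subj)
    return entities, keywords


def search(keywords: Dict[str, Set[str]], query: str) -> List[str]:
    """Entities whose literals contain every query token (conjunctive match)."""
    tokens = re.findall(r"\w+", query.lower())
    if not tokens:
        return []
    hits = set(keywords.get(tokens[0], set()))
    for t in tokens[1:]:
        hits &= keywords.get(t, set())
    return sorted(hits)


RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
data = [
    f'<http://example.org/swse> <{RDFS_LABEL}> "Semantic Web Search Engine"@en .',
    '<http://example.org/swse> <http://example.org/vocab/uses> <http://example.org/rdf> .',
    f'<http://example.org/swoogle> <{RDFS_LABEL}> "Swoogle search engine" .',
]
entities, keywords = build_entity_index(data)
print(search(keywords, "search engine"))
# ['http://example.org/swoogle', 'http://example.org/swse']
print(entities["http://example.org/swse"])
```

Grouping by subject gives the entity-centric view that search and browsing present to the user, while the token index answers keyword queries without scanning every triple.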