Improving Open Information Extraction for Semantic Web Tasks

From hyperlinks to Semantic Web properties using Open Knowledge Extraction

Semantic Web, 2016

Open information extraction approaches are useful but insufficient alone for populating the Web with machine-readable information, as their results are not directly linkable to, and immediately reusable from, other Linked Data sources. This work proposes a novel paradigm, named Open Knowledge Extraction, and its implementation (Legalo) that performs unsupervised, open-domain, and abstractive knowledge extraction from text for producing directly usable machine-readable information. The implemented method is based on the hypothesis that hyperlinks (either created by humans or knowledge extraction tools) provide a pragmatic trace of semantic relations between two entities, and that such semantic relations, their subjects and objects, can be revealed by processing their linguistic traces (i.e. the sentences that embed the hyperlinks) and formalised as Semantic Web triples and ontology axioms. Experimental evaluations conducted on validated text extracted from Wikipedia pages, with the help of crowdsourcing, confirm this hypothesis, showing high performance. A demo is available at http://wit.istc.cnr.it/stlab-tools/legalo.
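
The idea of turning the sentence around a hyperlink into a triple can be illustrated with a minimal Python sketch. It is not Legalo's method: it simply takes the words between two entity mentions, drops a few function words, and joins the rest into a camelCase property label. The function names, the stop list, and the `http://example.org/` namespace are assumptions made for illustration only, and the sketch assumes the subject mention precedes the object mention in the sentence.

```python
import re

# Illustrative stop list (an assumption, not taken from the paper).
STOP_WORDS = {"a", "an", "the", "is", "was", "are", "were"}


def property_label(sentence: str, subject: str, obj: str) -> str:
    """Build a camelCase property label from the words between two mentions."""
    start = sentence.index(subject) + len(subject)
    end = sentence.index(obj, start)  # raises ValueError if obj does not follow subject
    words = re.findall(r"[A-Za-z]+", sentence[start:end])
    kept = [w.lower() for w in words if w.lower() not in STOP_WORDS]
    if not kept:
        return "relatedTo"  # fallback label, also an assumption
    return kept[0] + "".join(w.capitalize() for w in kept[1:])


def to_ntriple(sentence: str, subject: str, obj: str,
               base: str = "http://example.org/") -> str:
    """Serialise the guessed relation as one N-Triples line."""
    def uri(name: str) -> str:
        return f"<{base}{name.replace(' ', '_')}>"

    prop = property_label(sentence, subject, obj)
    return f"{uri(subject)} <{base}{prop}> {uri(obj)} ."


triple = to_ntriple("Paris is the capital of France.", "Paris", "France")
print(triple)
```

A real system would also need to link the mentions to existing Linked Data URIs and align the generated property with existing vocabularies, which this sketch does not attempt.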

OWL for the Masses: From Structured OWL to Unstructured Technically-Neutral Natural Language

Informatics, 2009. BCI'09. …, 2009

The Web Ontology Language (OWL) is one of the fundamental building blocks for realizing the Semantic Web vision, since it can represent what is probably the most difficult component of that vision: knowledge about the concepts of a domain of discourse, i.e. ontologies. Unfortunately, its expressive power goes hand in hand with a rather verbose syntax that is difficult for non-technical users to understand, which leads to difficulties in validating and verifying the represented knowledge. A translation tool from OWL to some form of natural language could significantly assist users in such tasks. Such a tool requires correctly interpreting the ontology constructs, representing highly nested ontologies, and forming logical sentences. This paper presents an innovative approach to directly translating an RDF/XML-based structured ontology into error-free, concise natural language text akin to English.

Towards Semantic Web Information Extraction

International Symposium on Wearable Computers, 2003

The approach towards Semantic Web Information Extraction (IE) presented here is implemented in KIM - a platform for semantic indexing, annotation, and retrieval. It combines IE based on the mature text engineering platform (GATE) with Semantic Web-compliant knowledge representation and management. The cornerstone is automatic generation of named-entity (NE) annotations with class and instance references to a semantic repository. Simplistic

Open Information Extraction from the Web

2007

Traditionally, Information Extraction (IE) has focused on satisfying precise, narrow, pre-specified requests from small homogeneous corpora (e.g., extract the location and time of seminars from a set of announcements). Shifting to a new domain requires the user to name the target relations and to manually create new extraction rules or hand-tag new training examples. This manual labor scales linearly with the number of target relations. This paper introduces Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input. The paper also introduces TEXTRUNNER, a fully implemented, highly scalable OIE system where the tuples are assigned a probability and indexed to support efficient extraction and exploration via user queries. We report on experiments over a 9,000,000 Web page corpus that compare TEXTRUNNER with KNOWITALL, a state-of-the-art Web IE system. TEXTRUNNER achieves an error reduction of 33% on a comparable set of extractions. Furthermore, in the amount of time it takes KNOWITALL to perform extraction for a handful of pre-specified relations, TEXTRUNNER extracts a far broader set of facts reflecting orders of magnitude more relations, discovered on the fly. We report statistics on TEXTRUNNER's 11,000,000 highest probability tuples, and show that they contain over 1,000,000 concrete facts and over 6,500,000 more abstract assertions.
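
The single-pass, relation-independent style of extraction described above can be sketched in a few lines of Python. This is a toy illustration, not TEXTRUNNER: the regular expression (capitalized argument, short lowercase relation phrase, capitalized argument), the relative-frequency score used as a crude stand-in for a tuple probability, and the argument index are all assumptions chosen to keep the example small.

```python
import re
from collections import Counter, defaultdict

# Capitalized argument, 1-4 lowercase relation words, capitalized argument.
# This pattern is an illustrative assumption, not the extractor from the paper.
PATTERN = re.compile(
    r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)*) "
    r"((?:[a-z]+ ){1,4})"
    r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)"
)


def extract_tuples(corpus):
    """Make one pass over the corpus and count (arg1, relation, arg2) tuples."""
    counts = Counter()
    for sentence in corpus:
        for arg1, rel, arg2 in PATTERN.findall(sentence):
            counts[(arg1, rel.strip(), arg2)] += 1
    return counts


def score(counts):
    """Relative frequency of each tuple, a crude proxy for a probability."""
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def build_index(counts):
    """Map each lowercased argument to the tuples it appears in."""
    index = defaultdict(set)
    for arg1, rel, arg2 in counts:
        index[arg1.lower()].add((arg1, rel, arg2))
        index[arg2.lower()].add((arg1, rel, arg2))
    return index


corpus = [
    "Einstein was born in Ulm.",
    "Einstein was born in Ulm in 1879.",
    "Google acquired Motorola.",
    "Marie Curie discovered Polonium.",
]
counts = extract_tuples(corpus)
scores = score(counts)
index = build_index(counts)
print(sorted(index["ulm"]))
```

No relation names are specified in advance: "was born in", "acquired", and "discovered" are all discovered from the text, and a tuple seen in more sentences receives a higher score.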

Ontology Enrichment by Extracting Hidden Assertional Knowledge from Text

In this position paper we present a new approach for discovering some special classes of assertional knowledge in text by using large RDF repositories, resulting in the extraction of new non-taxonomic ontological relations. We also use inductive reasoning alongside our approach to improve its performance. We then present a case study that applies our approach to sample data and illustrates its soundness. Moreover, in our view the current LOD cloud is not a suitable base for our proposal in all informational domains. We therefore outline some directions, based on prior work, for enriching Linked Data datasets by using web mining. The result of such enrichment can be reused for further relation extraction and ontology enrichment from unstructured free-text documents. https://sites.google.com/site/ijcsis/vol-11-no-5-may-2013

OWL as a Target for Information Extraction Systems

2008

Current information extraction systems can do a good job of discovering entities, relations and events in natural language text. The traditional output of such systems is XML, with the ACE Pilot Format (APF) schema as a common target. We are developing a system that will take the output of an information extraction system as APF documents and directly populate a knowledge base with the information extracted. We report on an initial OWL ontology that covers the APF schema, a simple program to convert a set of APF documents to RDF data, and a demonstration system built with Exhibit to view the results.
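
A conversion of this kind might look roughly like the Python sketch below. The XML is a heavily simplified, APF-like fragment written for this example and should not be read as the actual APF schema. The `http://example.org/ace#` namespace and the way entity types and relation types are mapped to classes and properties are likewise assumptions, not the ontology reported in the paper.

```python
import xml.etree.ElementTree as ET

# Simplified APF-like input (an assumption for illustration only).
APF_LIKE = """
<document DOCID="doc1">
  <entity ID="E1" TYPE="PER"/>
  <entity ID="E2" TYPE="GPE"/>
  <relation ID="R1" TYPE="PHYS">
    <relation_argument REFID="E1" ROLE="Arg-1"/>
    <relation_argument REFID="E2" ROLE="Arg-2"/>
  </relation>
</document>
"""

BASE = "http://example.org/ace#"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def apf_to_ntriples(xml_text):
    """Turn entities into rdf:type triples and relations into property triples."""
    root = ET.fromstring(xml_text.strip())
    doc = root.get("DOCID")
    triples = []
    for entity in root.iter("entity"):
        subject = f"<{BASE}{doc}-{entity.get('ID')}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{BASE}{entity.get('TYPE')}> .")
    for relation in root.iter("relation"):
        # Collect the two arguments by their role.
        args = {a.get("ROLE"): a.get("REFID")
                for a in relation.iter("relation_argument")}
        triples.append(
            f"<{BASE}{doc}-{args['Arg-1']}> "
            f"<{BASE}{relation.get('TYPE')}> "
            f"<{BASE}{doc}-{args['Arg-2']}> ."
        )
    return triples


triples = apf_to_ntriples(APF_LIKE)
for line in triples:
    print(line)
```

Entity identifiers are prefixed with the document ID so that entities from different documents do not collide. Merging coreferent entities across documents is left out of this sketch.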

Populating the Semantic Web by Macro-reading Internet Text

Lecture Notes in Computer Science, 2009

A key question regarding the future of the semantic web is "how will we acquire structured information to populate the semantic web on a vast scale?" One approach is to enter this information manually. A second approach is to take advantage of pre-existing databases, and to develop common ontologies, publishing standards, and reward systems to make this data widely accessible. We consider here a third approach: developing software that automatically extracts structured information from unstructured text present on the web. We also describe preliminary results demonstrating that machine learning algorithms can learn to extract tens of thousands of facts to populate a diverse ontology, with imperfect but reasonably good accuracy.

Integration of Information Extraction with an Ontology

This paper describes the integration of an ontology with an information extraction (IE) tool. Our main goal is to extract knowledge from text to populate the ontology, and so alleviate the problem of ontology maintenance. The IE tool extracts information using partial parsing and machine learning techniques. Our domain of study is "KMi Planet", a Web-based news server that helps to communicate relevant information between members in our institute. Currently our system finds instances of classes or subclasses, although in the future we expect to create new classes and subclasses from new concepts appearing in text.

Open Information Extraction: A Review of Baseline Techniques, Approaches, and Applications

arXiv (Cornell University), 2023

With the abundance of text data available online and offline, there is a crucial need to extract the relations between phrases and to summarize the main content of each document in a few words. For this purpose, there have been many recent studies in Open Information Extraction (OIE). OIE improves upon relation extraction techniques by analyzing relations across different domains and avoids the need to hand-label pre-specified relations in sentences. This paper surveys recent OIE approaches and their applications to Knowledge Graphs (KG), text summarization, and Question Answering (QA). Moreover, the paper describes the basic OIE methods in relation extraction. It briefly discusses the main approaches and the pros and cons of each method. Finally, it gives an overview of challenges, open issues, and future work opportunities for OIE, relation extraction, and OIE applications.

Identifying Motifs for Evaluating Open Knowledge Extraction on the Web

Knowledge-Based Systems, 2016

Open Knowledge Extraction (OKE) is the process of extracting knowledge from text and representing it in a formalized machine-readable format, by means of unsupervised, open-domain and abstractive techniques. Despite the growing presence of tools for reusing NLP results as linked data (LD), there is still a lack of established practices and benchmarks for the evaluation of OKE results tailored to LD. In this paper, we propose to address this issue by constructing RDF graph banks, based on the definition of logical patterns called OKE Motifs. We demonstrate the usage and extraction techniques of motifs using a broad-coverage OKE tool for the Semantic Web called FRED. Finally, we use identified motifs as empirical data for assessing the quality of OKE results, and show how they can be extended through a use case represented by an application within the Semantic Sentiment Analysis domain.