ARISTA: knowledge engineering with scientific texts
Related papers
ARISTA: knowledge engineering with scientific texts
The paper presents results of experiments in knowledge engineering with scientific texts by the application of the ARISTA method. ARISTA stands for Automatic Representation Independent Syllogistic Text Analysis. This method uses natural language text itself as a knowledge base, in contrast with the methods followed by the prevailing approach, which rely on the translation of texts into some knowledge representation formalism. The experiments demonstrate the feasibility of deductive question answering and explanation generation directly from texts, involving mainly causal reasoning. Illustrative examples of the operation of a prototype based on the ARISTA method and implemented in Prolog are presented.

Keywords: knowledge engineering, knowledge representation, natural language processing

"Reasoning is discourse in which, given some premises, something different from the given necessarily follows from the premises." (Aristotle, Topics, 4th century BC)

Traditionally, knowledge engineering involves the elicitation of knowledge from domain experts by a knowledge engineer and the 'manual', or rather 'mental', formalization of this knowledge using a knowledge representation formalism. The result of such an activity is a knowledge base that can be processed by an inference engine to generate answers and explanations in response to user questions. Experts are, however, often unavailable, and therefore an alternative is to use texts as a supplementary source of knowledge. The automatic processing of such texts may provide a method for an at least partial solution of the knowledge acquisition bottleneck problem of expert systems. A new discipline is therefore emerging, which may be called 'Knowledge Engineering with Texts' (KET). The work reported in this paper addresses KET with scientific texts. An important advantage of KET with scientific texts such as textbooks and journal articles is that these texts are widely available.
In particular, the recent appearance of electronic editions of scientific texts facilitates their automatic processing by computer even more. Some initial experiments toward the development of KET with scientific texts are presented here, applying the Automatic Representation Independent Syllogistic Text Analysis (ARISTA) method, which differs from the methods followed by the prevailing approach. The development of systems with the ability to read scientific texts and assimilate or acquire the knowledge contained in them may help solve the knowledge engineering problems posed by the knowledge acquisition bottleneck in the creation of scientific expert systems.

The prevailing approach in the natural language processing literature uses methods that rely on the translation of texts into some knowledge representation formalism. As an illustrative example of the prevailing approach, a system called GREKA, which uses attribute grammars for the representation of the text content [1, 2], will be briefly reviewed. GREKA has been applied to the acquisition of causal and other forms of knowledge from texts. The method followed in GREKA involves:

• the analysis of scientific texts by considering types of sentences that express causal and other relations between processes as well as relations between parts of objects
• the location of the prerequisite background and domain knowledge necessary for answering questions
• the translation of the texts delivering the knowledge into an attribute grammar, with its syntactic part modelling the structural knowledge and its 'semantic' part modelling the rest of the knowledge
• the generation of answers and their explanations from the grammar, in terms of the relations of the processes and entities involved, building on the early question-answering work done by the author [3].
The translation performed by the system is based on linguistic knowledge, which consists of the following parts:

• rules that recognize the syntactic structures encountered in the body of the scientific texts used
• semantic knowledge necessary for analysing these syntactic structures
• lexical knowledge
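The rule-based recognition of causal sentences described above can be sketched roughly as follows. This is a minimal illustration, not the ARISTA or GREKA implementation: the cue phrases and the function name are invented for the example, and the real systems use far richer syntactic, semantic and lexical knowledge.

```python
import re

# Hypothetical cue patterns standing in for syntactic rules that
# recognize causal sentence types; the actual rule sets of ARISTA
# and GREKA are not given in the text.
CAUSAL_PATTERNS = [
    re.compile(r"^(?P<cause>.+?) causes (?P<effect>.+?)\.?$", re.IGNORECASE),
    re.compile(r"^(?P<effect>.+?) is caused by (?P<cause>.+?)\.?$", re.IGNORECASE),
    re.compile(r"^(?P<cause>.+?) leads to (?P<effect>.+?)\.?$", re.IGNORECASE),
]

def extract_causal(sentence: str):
    """Return a (cause, effect) pair if the sentence matches a causal cue."""
    for pattern in CAUSAL_PATTERNS:
        match = pattern.match(sentence.strip())
        if match:
            return match.group("cause").strip(), match.group("effect").strip()
    return None

print(extract_causal("Inflammation causes swelling."))
# ('Inflammation', 'swelling')
```

Sentences matching no cue yield no pair, which mirrors why such systems also need semantic and lexical knowledge beyond surface patterns.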
ARISTA causal knowledge discovery from texts
Discovery Science, 2002
A method is proposed in the present paper for supporting the discovery of causal knowledge by finding causal sentences in a text and chaining them through the operation of our system, called ACkdT, which relies on searching for sentences containing appropriate natural language phrases. The system consists of two main subsystems. The first subsystem performs the extraction of knowledge from individual sentences, which is similar to traditional information extraction from texts, while the second subsystem is based on a causal reasoning process that generates new knowledge by combining knowledge extracted by the first subsystem. In order to speed up the whole knowledge acquisition process, a search algorithm is applied to a table of combinations of keywords characterizing the sentences of the text. Our knowledge discovery method is based on the use of our knowledge-representation-independent method ARISTA, which accomplishes causal reasoning "on the fly" directly from text. The application of the method is demonstrated with two examples. The first example concerns pneumonology and is drawn from a textbook; the second concerns cell apoptosis and is compiled from a collection of MEDLINE paper abstracts related to the recent proposal of a mathematical model of apoptosis.
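The second subsystem's chaining step can be illustrated with a small sketch: given (cause, effect) pairs extracted from individual sentences, a depth-first search combines them into a causal chain. The function name, search strategy and example pairs are assumptions for illustration; the abstract does not specify ACkdT's actual algorithm.

```python
def chain_causes(pairs, start, goal):
    """Depth-first search for a causal chain from start to goal,
    combining (cause, effect) pairs extracted from separate sentences."""
    def dfs(node, path, seen):
        if node == goal:
            return path
        for cause, effect in pairs:
            if cause == node and effect not in seen:
                result = dfs(effect, path + [effect], seen | {effect})
                if result:
                    return result
        return None
    return dfs(start, [start], {start})

# Toy pairs, loosely inspired by the pneumonology example.
pairs = [("smoking", "inflammation"),
         ("inflammation", "fibrosis"),
         ("fibrosis", "reduced lung capacity")]
print(chain_causes(pairs, "smoking", "reduced lung capacity"))
# ['smoking', 'inflammation', 'fibrosis', 'reduced lung capacity']
```

The keyword table mentioned in the abstract would serve to narrow the set of candidate pairs before a search like this runs.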
Ontology or meta-model for retrieving scientific reasoning in documents: The Arkeotek project
Electronic publishing and large databases make it possible to store scientific data together with the texts that report scientific studies about these data. The SCD model, inspired by the logicist program, proposes structuring documents according to the role of each paragraph in the overall argumentation. The Arkeotek project promotes the use of the SCD format for scientific writing in archaeology. To improve information retrieval in the collection of all SCD documents, the Arkeotek project experiments with semantic annotation using a domain ontology. This ontology is rather a domain meta-model used to enrich SCD-structured documents. This paper presents the SCD format and the domain model. It illustrates how the target of document annotation influences the selection and definition of concepts in the domain model. It discusses the status of the model with respect to a core ontology like the CIDOC CRM.
International Journal of Database Management Systems, 2015
Ali Akbar Dehkhoda, the prominent lexicographer, describes a person who has difficulty in grasping knowledge as someone who "cannot understand something without knowing all its details." If the knowledge required by somebody is in a language other than the person's mother tongue, access to this knowledge will surely meet special difficulties resulting from the person's lack of mastery of the second language. Any project that can monitor knowledge sources written in English and convert them into the user's language by employing a simple, understandable model can serve as a knowledge-based project with a world view oriented toward text simplification. This article creates a knowledge system, investigates some algorithms for analyzing the contents of complex texts, and presents solutions for turning such texts into simple and understandable ones. Texts are automatically analyzed and their ambiguous points are identified by software, but it is the author or the human agent who makes decisions concerning the removal of the ambiguities or the correction of the texts.
Acquisition and Reuse of Reasoning Knowledge from Textual Cases for Automated Analysis
Lecture Notes in Computer Science, 2014
Analysis is essential for solving complex problems such as diagnosing a patient, investigating an accident or predicting the outcome of a legal case. It is a non-trivial process even for human experts. To assist experts in this process we propose a CBR-based approach for automated problem analysis. In this approach a new problem is analysed by reusing reasoning knowledge from the analysis of a similar problem. To avoid the laborious process of manual case acquisition, the reasoning knowledge is extracted automatically from text and captured in a graph-based representation, which we dub the Text Reasoning Graph (TRG), consisting of causal, entailment and paraphrase relations. The reuse procedure involves adaptation of a similar past analysis to a new problem by finding paths in the TRG that connect the evidence in the new problem to the conclusions of the past analysis. The objective is to generate the best explanation of how the new evidence connects to the conclusion. For evaluation, we built a system for analysing aircraft accidents based on a collection of aviation investigation reports. The evaluation results show that our reuse method increases the precision of the retrieved conclusions.
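The path-finding step in the TRG can be sketched as a breadth-first search over typed edges. The graph contents, relation labels and function name below are invented for illustration; the paper's actual TRG is built automatically from text and its edges carry richer information.

```python
from collections import deque

# Toy TRG with edges typed as causal or entailment (illustrative only;
# the real graph also contains paraphrase relations and is much larger).
trg = {
    "engine flameout": [("loss of thrust", "causal")],
    "loss of thrust": [("inability to maintain altitude", "entailment")],
    "inability to maintain altitude": [("forced landing", "causal")],
}

def explain(trg, evidence, conclusion):
    """Breadth-first search for the shortest path linking new evidence
    to a past analysis conclusion, as an explanation sketch."""
    queue = deque([(evidence, [evidence])])
    visited = {evidence}
    while queue:
        node, path = queue.popleft()
        if node == conclusion:
            return path
        for neighbour, relation in trg.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, path + [f"--{relation}--> {neighbour}"]))
    return None

print(" ".join(explain(trg, "engine flameout", "forced landing")))
```

Breadth-first search returns a shortest connecting path, which is one plausible reading of "best explanation"; the paper's actual ranking criterion may differ.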
DEDUCTIVE QUESTION ANSWERING WITH RHETORIC ANALYSIS FROM BIOMEDICAL TEXTS
Deductive question answering from biomedical texts and analysis of rhetoric relations in the AROMA system is presented. The development of the AROMA system aims at the creation of an intelligent tool for answering questions related to biomedical models using information extracted from natural language texts. The system operation includes three main functions, namely question answering, text mining and simulation. The question answering function generates model-based answers and their explanations. The operation of AROMA allows the exploitation of rhetoric relations between a "basic" text that proposes a model of a biomedical system and parts of the abstracts of papers that present experimental findings supporting the model. The AROMA system consists of three subsystems. The first subsystem extracts knowledge, including rhetoric relations, from biomedical texts. The second subsystem answers questions with causal knowledge extracted by the first subsystem and generates explanations using rhetoric relation knowledge in addition to other knowledge. The third subsystem simulates the time-dependent behavior of a model, from which textual descriptions of the waveforms are generated automatically.
System Modeling by Computer using Biomedical Texts
In the present paper we describe a new Artificial Intelligence method of system modeling that utilises causal knowledge extracted from different texts. The equations describing the system model are solved with a Prolog program which receives data such as values for its parameters from the text analysis subsystem. The knowledge extraction from the texts is based on the use of our knowledge representation independent method ARISTA that accomplishes causal reasoning directly from text. Our final aim is to be able to model biomedical systems by integrating partial knowledge extracted from a number of different texts and give the user a facility for questioning these models. The model based question answering we are aiming at may support both biomedical researchers and medical practitioners.
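The modelling step described above, where the equation solver receives parameter values from the text analysis subsystem, can be sketched minimally as follows. Both the first-order decay model and the regular-expression parameter reader are assumptions for illustration; the paper does not specify its model equations, and its solver is written in Prolog rather than Python.

```python
import re

def parameter_from_text(text, name):
    """Read a numeric parameter value, e.g. 'k = 0.5', from a text fragment."""
    match = re.search(rf"{name}\s*=\s*([0-9.]+)", text)
    return float(match.group(1)) if match else None

def simulate(x0, k, dt=0.1, steps=10):
    """Euler integration of the illustrative model dx/dt = -k*x."""
    x = x0
    for _ in range(steps):
        x += dt * (-k * x)
    return x

text = "The decay rate constant was measured as k = 0.5 per hour."
k = parameter_from_text(text, "k")
print(round(simulate(1.0, k), 4))
```

Integrating partial knowledge from several texts, as the paper aims to do, would amount to filling different parameters of the same model from different sources before solving.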
Transformation of natural language into logical formulas
Proceedings of the 9th conference on Computational linguistics -, 1982
This paper presents an attempt at the elaboration of a full parsing system for the Polish natural language, which is being worked out in the Institute of Informatics of Warsaw University. Our system was adapted to the parsing of a corpus of real medical texts which concern a subdomain of medicine. We made use of the experience of such famous authors as (6), (7), (8), (9), (10), (11), (12), (13), (14).
An Architecture for Scientific Document Retrieval Using Textual and Math Entailment Modules
We present an architecture for scientific document retrieval. An existing system for textual and math-aware retrieval, Math Indexer and Searcher (MIaS), is designed for extension by modules for textual and math-aware entailment. The goal is to increase the quality of retrieval (precision and recall) by handling natural language variations of expressing semantically the same content in texts and/or formulae. The entailment modules are designed to use several ordered layers of processing on the lexical, syntactic and semantic levels, using natural language processing tools adapted for handling tree structures like mathematical formulae. If these tools are not able to decide on the entailment, generic knowledge databases are used, deploying distributional semantics methods and tools. It is shown that the sole use of distributional semantics for semantic textual entailment decisions at the sentence level is surprisingly good. Finally, further research plans to deploy the results in digital mathematical libraries are outlined.