Mining associations in text in the presence of background knowledge

Exploiting background information in knowledge discovery from text

1997

This paper describes the FACT system for knowledge discovery from text. It discovers associations (patterns of co-occurrence) amongst keywords labeling the items in a collection of textual documents. In addition, when background knowledge is available about the keywords labeling the documents, FACT is able to use this information in its discovery process. FACT takes a query-centered view of knowledge discovery, in which a discovery request is viewed as a query over the implicit set of possible results supported by a collection of documents, and where background knowledge is used to specify constraints on the desired results of this query process. Execution of a knowledge-discovery query is structured so that these background-knowledge constraints can be exploited in the search for possible results. Finally, rather than requiring a user to specify an explicit query expression in the knowledge-discovery query language, FACT presents the user with a simple-to-use graphical interface to the query language, with the language providing a well-defined semantics for the discovery actions performed by a user through the interface.
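To make the idea of constraint-driven search concrete, here is a minimal sketch in Python of a keyword-association "query" whose background-knowledge constraint is applied before any counting is done. It is not FACT's query language or algorithm: the toy collection `DOCS`, the keyword-type map `KEYWORD_TYPE`, the function `constrained_rules`, and its thresholds are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy collection: each document is labeled with a set of keywords.
DOCS = [
    {"iran", "usa", "oil"},
    {"iran", "usa", "sanctions"},
    {"iran", "oil"},
    {"usa", "japan", "trade"},
    {"japan", "trade"},
]

# Hypothetical background knowledge: the type of each keyword.
KEYWORD_TYPE = {
    "iran": "country", "usa": "country", "japan": "country",
    "oil": "topic", "sanctions": "topic", "trade": "topic",
}

def support(itemset, docs):
    """Number of documents whose keyword labels contain every item in itemset."""
    return sum(1 for d in docs if itemset <= d)

def constrained_rules(docs, lhs_type, rhs_type, max_lhs=2, min_support=2, min_conf=0.6):
    """Find rules LHS => rhs where LHS keywords have lhs_type and rhs has rhs_type.

    The type constraint is applied before counting, so only candidates
    that could satisfy the query are ever generated.
    """
    vocab = sorted(set().union(*docs))
    lhs_items = [k for k in vocab if KEYWORD_TYPE.get(k) == lhs_type]
    rhs_items = [k for k in vocab if KEYWORD_TYPE.get(k) == rhs_type]
    rules = []
    for size in range(1, max_lhs + 1):
        for lhs in combinations(lhs_items, size):
            lhs_set = frozenset(lhs)
            lhs_sup = support(lhs_set, docs)
            if lhs_sup < min_support:
                continue  # prune: no rule with this LHS can reach min_support
            for rhs in rhs_items:
                sup = support(lhs_set | {rhs}, docs)
                conf = sup / lhs_sup
                if sup >= min_support and conf >= min_conf:
                    rules.append((tuple(sorted(lhs_set)), rhs, sup, conf))
    return rules
```

Calling `constrained_rules(DOCS, "country", "topic")` asks for "which countries are associated with which topics" and yields `iran => oil` (support 2, confidence 2/3) and `japan => trade` (support 2, confidence 1.0). Keywords of the wrong type are never enumerated, which is the point of pushing the constraint into the search.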

Textmining: Generating association rules from textual data

Text mining is an emerging research area whose goal is to discover additional information from hidden patterns in large, unstructured textual collections. Given a collection of text documents, most text mining approaches perform knowledge-discovery operations on labels associated with each document; these labels are usually keywords produced by non-trivial keyword-labeling processes. In this paper, we are especially interested in extracting associations from unstructured databases, in particular full text. The aim of this paper is twofold. First, to propose a conceptual approach, based on formal concept analysis [GANT99], to discover knowledge, formally represented by association rules, from a large textual corpus. Second, to introduce an algorithm that uses an associated taxonomy to derive additional, implicit association rules from the already discovered ones.
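One simple way to derive implicit rules from a taxonomy can be sketched as follows. This is not the paper's algorithm, only an illustration under one stated assumption: a document labeled with a keyword is treated as also labeled with all of that keyword's ancestors. Under that assumption, a discovered rule A => B implies A => C for any ancestor C of B, and since support(A ∪ {C}) ≥ support(A ∪ {B}) while support(A) is unchanged, the original support and confidence are lower bounds for the derived rule. The taxonomy `TAXONOMY` (assumed acyclic) and the helper names are hypothetical.

```python
# Hypothetical taxonomy: child keyword -> parent concept (assumed acyclic).
TAXONOMY = {"wheat": "grain", "corn": "grain", "grain": "commodity", "crude": "energy"}

def ancestors(item, taxonomy):
    """Return the chain of ancestors of item, nearest first."""
    chain = []
    while item in taxonomy:
        item = taxonomy[item]
        chain.append(item)
    return chain

def derive_generalized_rules(rules, taxonomy):
    """From rules (lhs, rhs, support, confidence), derive lhs => ancestor(rhs).

    If every document labeled rhs is implicitly labeled with rhs's ancestors,
    the derived rule's support and confidence are at least those of each
    original rule it comes from, so the maxima are valid lower bounds.
    """
    derived = {}
    for lhs, rhs, sup, conf in rules:
        for anc in ancestors(rhs, taxonomy):
            if anc in lhs:
                continue  # skip trivial rules such as {grain} => grain
            key = (lhs, anc)
            old_sup, old_conf = derived.get(key, (0, 0.0))
            derived[key] = (max(old_sup, sup), max(old_conf, conf))
    return derived
```

For instance, from the hypothetical rules `{usa} => wheat` (support 4, confidence 0.5) and `{usa} => corn` (support 3, confidence 0.6), the sketch derives `{usa} => grain` and `{usa} => commodity`, each with lower bounds of support 4 and confidence 0.6, without rescanning the documents.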

Knowledge Discovery in Text Mining using Association Rule Extraction

International Journal of Computer Applications, 2016

The Internet and information technology provide a platform on which a huge amount of information is available. However, searching for the exact information needed to gain some knowledge is time-consuming and can lead to confusion. Retrieving knowledge manually from a collection of web documents and databases may cause the user to lose track. Text mining helps users find accurate information, discover knowledge, and identify features in text documents. There is therefore a need for a text mining approach that clearly guides the user about which information is important and which is not, how to deal with the important information, how to generate knowledge, and so on. Knowledge discovery is a growing research field. For a user, reading a whole collection of documents to gain some knowledge is time-consuming and less effective, and there has been significant progress in research on knowledge discovery from document collections. We propose a method for knowledge discovery in text mining using association rule extraction. With this approach, users can find accurate and important knowledge in a collection of web documents, which reduces the time needed to read all of those documents.
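The basic pipeline behind keyword-based association rule extraction (turn each document into a keyword set, count co-occurrences, keep rules above support and confidence thresholds) can be sketched in a few lines of Python. This is a generic illustration, not the method proposed in the paper; the stop-word list, the function names, and the restriction to single-keyword rules `a => b` are assumptions made to keep it short.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; a real system would use a fuller list.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is", "for", "on"}

def keywords(text):
    """Lowercase, split into alphabetic tokens, and drop stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}

def pair_rules(texts, min_support=2, min_conf=0.5):
    """Extract single-keyword rules a => b from keyword co-occurrence.

    Support is the number of documents containing both keywords;
    confidence is support divided by the number of documents containing a.
    """
    docs = [keywords(t) for t in texts]
    item_count = Counter(w for d in docs for w in d)
    pair_count = Counter(p for d in docs for p in combinations(sorted(d), 2))
    rules = []
    for (a, b), sup in pair_count.items():
        if sup < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):  # each pair yields two candidate rules
            conf = sup / item_count[lhs]
            if conf >= min_conf:
                rules.append((lhs, rhs, sup, round(conf, 2)))
    return sorted(rules)
```

On the three hypothetical snippets "Text mining of web documents", "Mining association rules in text", and "Association rules for web documents", the sketch finds the pairs (association, rules), (documents, web), and (mining, text), each in both directions with support 2 and confidence 1.0.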

A Text Mining Technique Using Association Rules Extraction

International Journal of …, 2007

Abstract: This paper describes a text mining technique for automatically extracting association rules from collections of textual documents. The technique, called Extracting Association Rules from Text (EART), depends on keyword features to discover association rules amongst ...

Maximal Association Rules: A Tool for Mining Associations in Text

Journal of Intelligent Information Systems, 2005

We describe a new tool for mining association rules, which is of special value in text mining. The new tool, called maximal associations, is geared toward discovering associations that are frequently lost when using regular association rules. Intuitively, a maximal association rule $X \overset{\max}{\Longrightarrow} Y$ says that whenever X is the only item of its type in a transaction, then Y also appears, with some confidence. Maximal associations allow the discovery of associations pertaining to items that most often do not appear alone, but rather together with closely related items, and hence associations relevant only to these items tend to obtain low confidence. We provide a formal description of maximal association rules and efficient algorithms for discovering all such associations. We present the results of applying maximal association rules to two text corpora.
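The contrast between regular and maximal confidence can be illustrated with a small Python sketch that follows the intuitive description above: X is "maximal" in a transaction when the transaction's items of X's type are exactly X. The paper's formal definitions may differ in details; the categories `CATEGORY`, the toy transactions `DOCS`, and all function names here are hypothetical.

```python
# Hypothetical keyword categories (the "types" mentioned in the abstract).
CATEGORY = {
    "iran": "country", "iraq": "country", "usa": "country",
    "oil": "topic", "war": "topic", "nuclear": "topic",
}

# Hypothetical transactions: "iran" usually co-occurs with "iraq".
DOCS = [
    {"iran", "iraq", "war"},
    {"iran", "iraq", "war"},
    {"iran", "iraq", "war"},
    {"iran", "nuclear"},
    {"iran", "nuclear"},
    {"iran", "iraq", "oil"},
]

def is_maximal(x, t, category):
    """True if x is exactly the set of t's items belonging to x's category."""
    cats = {category[i] for i in x}
    if len(cats) != 1:
        raise ValueError("x must lie within a single category")
    (cat,) = cats
    return {i for i in t if category.get(i) == cat} == set(x)

def confidence(x, y, docs):
    """Regular confidence of x => y."""
    with_x = [t for t in docs if set(x) <= t]
    return sum(set(y) <= t for t in with_x) / len(with_x) if with_x else 0.0

def max_confidence(x, y, docs, category):
    """Confidence of x =max=> y: among transactions where x is maximal, the share containing y."""
    with_x = [t for t in docs if is_maximal(x, t, category)]
    return sum(set(y) <= t for t in with_x) / len(with_x) if with_x else 0.0
```

Here `confidence({"iran"}, {"nuclear"}, DOCS)` is only 1/3, because "iran" mostly appears together with "iraq", while `max_confidence({"iran"}, {"nuclear"}, DOCS, CATEGORY)` is 1.0: every transaction in which "iran" is the only country also mentions "nuclear". This is the kind of association the abstract says regular rules tend to lose.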

A System for Knowledge Discovery in Big Dynamical Text Collections

2012

The software system Cordiet-FCA is presented; it is designed for knowledge discovery in big dynamic data collections, including natural-language texts. Cordiet-FCA allows one to compose ontology-controlled queries and outputs concept lattices, implication bases, association rules, and other useful concept-based artifacts. Efficient algorithms for data preprocessing, text processing, and visualization of results are discussed. Examples of applying the system to problems such as medical diagnostics and criminal investigations are considered.
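For readers unfamiliar with concept lattices, the following Python sketch enumerates all formal concepts of a tiny context naively: every concept intent is either the full attribute set or an intersection of object intents, so closing the object intents under intersection yields all intents, and each extent follows by derivation. This brute-force approach is only practical for small contexts and is not the algorithm Cordiet-FCA uses; the context `CONTEXT` and the function names are hypothetical.

```python
# Hypothetical formal context: objects (e.g. documents) -> attributes (e.g. terms).
CONTEXT = {
    "d1": {"fever", "cough"},
    "d2": {"fever", "rash"},
    "d3": {"fever", "cough", "rash"},
}

def extent(attrs, context):
    """Objects having every attribute in attrs (the derivation operator B')."""
    return frozenset(o for o, a in context.items() if attrs <= a)

def concepts(context):
    """All formal concepts (extent, intent) of the context.

    Intents are exactly the full attribute set plus all intersections of
    object intents, so we close the object intents under intersection.
    """
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for attrs in context.values():
        # Intersect the new object intent with every intent found so far;
        # intersecting with all_attrs adds the object intent itself.
        intents |= {frozenset(attrs) & i for i in intents}
    return [(extent(i, context), i) for i in intents]
```

For this context the sketch returns four concepts, for example ({d1, d3}, {fever, cough}) and ({d1, d2, d3}, {fever}); ordering them by extent inclusion gives the concept lattice from which implications and association rules can then be read off.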

Knowledge Discovery in Textual Databases (KDT)

1995

The information age is characterized by a rapid growth in the amount of information available in electronic media. Traditional data handling methods are not adequate to cope with this information flood. Knowledge Discovery in Databases (KDD) is a new paradigm that focuses on computerized exploration of large amounts of data and on discovery of relevant and interesting patterns within them. While most work on KDD is concerned with structured databases, it is clear that this paradigm is required for handling the huge amount of information that is available only in unstructured textual form. To apply traditional KDD to texts, it is necessary to impose some structure on the data that is rich enough to allow for interesting KDD operations. On the other hand, we have to consider the severe limitations of current text processing technology and define rather simple structures that can be extracted from texts fairly automatically and at a reasonable cost. We propose using a text categorization paradigm to annotate text articles with meaningful concepts that are organized in a hierarchical structure. We suggest that this relatively simple annotation is rich enough to provide the basis for a KDD framework, enabling data summarization, exploration of interesting patterns, and trend analysis. This research combines the KDD and text categorization paradigms and suggests advances to the state of the art in both areas.
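How hierarchical concept annotations can support summarization and trend analysis is sketched below in Python. Each document's concepts are rolled up to their ancestors, the documents under a chosen node are summarized as the share carrying each child concept, and two subsets (for example, two time periods) are compared. The hierarchy `PARENT`, the function names, and the choice of a plain difference of proportions as the trend measure are all assumptions for illustration, not the paper's method.

```python
from collections import Counter

# Hypothetical concept hierarchy: concept -> parent concept.
PARENT = {
    "iran": "middle_east", "iraq": "middle_east", "japan": "asia",
    "middle_east": "countries", "asia": "countries",
}

def with_ancestors(concepts):
    """Extend a document's concept annotations with every ancestor concept."""
    out = set(concepts)
    for c in concepts:
        while c in PARENT:
            c = PARENT[c]
            out.add(c)
    return out

def distribution(docs, node):
    """Among documents annotated with node, the share carrying each child of node."""
    children = [c for c, p in PARENT.items() if p == node]
    relevant = [d for d in map(with_ancestors, docs) if node in d]
    counts = Counter(c for d in relevant for c in children if c in d)
    return {c: (counts[c] / len(relevant) if relevant else 0.0) for c in children}

def shift(before, after, node):
    """Change in each child's share between two document subsets (a simple trend signal)."""
    d1, d2 = distribution(before, node), distribution(after, node)
    return {c: d2[c] - d1[c] for c in d1}
```

With a hypothetical first period annotated `[{"iran"}, {"iraq"}, {"japan"}, {"japan"}]` and a second period `[{"iran"}, {"iran"}, {"iraq"}, {"japan"}]`, the share of "middle_east" under "countries" rises from 0.5 to 0.75, so `shift` reports +0.25 for "middle_east" and -0.25 for "asia".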

USING DATA MINING METHODS KNOWLEDGE DISCOVERY FOR TEXT MINING

Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopt term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based (or phrase-based) approaches should perform better than term-based ones, but many experiments do not support this hypothesis. The proposed work presents an innovative and effective pattern discovery technique, which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information.
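The general idea of pattern deploying, mapping discovered patterns back onto term weights so that they can be used to score new documents, can be illustrated with a simplified Python sketch. Here each pattern's support is shared equally among its terms, each training document's contributions are normalized to sum to 1, and a new document is scored by summing the weights of its terms. The exact weighting formulas of the proposed technique may differ, and pattern evolving is not shown; `deploy`, `score`, and the sample patterns are hypothetical.

```python
from collections import defaultdict

def deploy(doc_patterns):
    """Deploy discovered patterns into term weights (simplified sketch).

    doc_patterns: one dict per training document, mapping a pattern
    (a tuple of terms) to its support within that document.
    """
    weights = defaultdict(float)
    for patterns in doc_patterns:
        local = defaultdict(float)
        for pattern, sup in patterns.items():
            for term in pattern:
                local[term] += sup / len(pattern)  # share the pattern's support among its terms
        total = sum(local.values())
        if total == 0:
            continue  # nothing to deploy from this document
        for term, w in local.items():
            weights[term] += w / total  # each document contributes a total weight of 1
    return dict(weights)

def score(doc_terms, weights):
    """Relevance score of a new document: the sum of deployed weights of its distinct terms."""
    return sum(weights.get(t, 0.0) for t in set(doc_terms))
```

For example, deploying `[{("carbon", "emission"): 2, ("carbon",): 2}, {("emission", "trading"): 1}]` gives weights carbon 0.75, emission 0.75, and trading 0.5, and a new document containing "carbon", "trading", and "tax" scores 1.25. Because weights come from whole patterns rather than isolated term counts, terms that occur inside frequent patterns are boosted, which is the motivation for deploying patterns instead of using them directly.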