TRUMIT: a tool to support large-scale mining of text association rules

Maximal Association Rules: A Tool for Mining Associations in Text

Journal of Intelligent Information Systems, 2005

We describe a new tool for mining association rules, which is of special value in text mining. The new tool, called maximal associations, is geared toward discovering associations that are frequently lost when using regular association rules. Intuitively, a maximal association rule X max⇒ Y says that whenever X is the only item of its type in a transaction, then Y also appears, with some confidence. Maximal associations allow the discovery of associations pertaining to items that most often do not appear alone, but rather together with closely related items, so that associations relevant only to these items tend to obtain low confidence under regular rules. We provide a formal description of maximal association rules and efficient algorithms for discovering all such associations. We present the results of applying maximal association rules to two text corpora.
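The intuition in this abstract can be illustrated with a small Python sketch. It is a simplified reading of the idea, not the paper's formal definition or its algorithms: the function names, the `countries` category, and the toy transactions are all hypothetical, and confidence is taken here simply as the fraction of transactions in which X is the only item of its type that also contain Y.

```python
from typing import List, Set


def regular_confidence(transactions: List[Set[str]], x: Set[str], y: Set[str]) -> float:
    """Confidence of the regular rule X => Y: P(Y in t | X in t)."""
    with_x = [t for t in transactions if x <= t]
    if not with_x:
        return 0.0
    return sum(1 for t in with_x if y <= t) / len(with_x)


def maximal_confidence(
    transactions: List[Set[str]], x: Set[str], y: Set[str], category: Set[str]
) -> float:
    """Simplified confidence of X max=> Y (assumption, see lead-in).

    Only transactions whose items from `category` are exactly X are counted,
    i.e. X is the only item of its type in the transaction.
    """
    alone = [t for t in transactions if t & category == x]
    if not alone:
        return 0.0
    return sum(1 for t in alone if y <= t) / len(alone)


# Hypothetical toy data: each transaction is the keyword set of one document.
countries = {"canada", "usa", "france"}
transactions = [
    {"canada", "usa", "trade"},
    {"canada", "usa", "trade"},
    {"canada", "fishing"},
    {"canada", "fishing"},
    {"canada", "usa", "france", "trade"},
]

regular = regular_confidence(transactions, {"canada"}, {"fishing"})
maximal = maximal_confidence(transactions, {"canada"}, {"fishing"}, countries)
```

In this toy data "canada" mostly co-occurs with "usa", so the regular rule canada ⇒ fishing has low confidence, while the maximal rule, which only looks at documents where "canada" is the sole country, has high confidence.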

Textmining: Generating association rules from textual data

Textmining is an emerging research area whose goal is to discover additional information from hidden patterns in large unstructured text collections. Hence, given a collection of text documents, most approaches to text mining perform knowledge-discovery operations on labels associated with each document, which are usually keywords that represent the result of non-trivial keyword-labeling processes. In this paper, we are especially interested in the extraction of associations from unstructured databases, in particular full text. The aim of this paper is twofold. First, to propose a conceptual approach, based on formal concept analysis [GANT99], for discovering knowledge, formally represented by association rules, from a large textual corpus. Second, to introduce an algorithm that uses an associated taxonomy to derive additional, implicit association rules from the already discovered ones.

A Text Mining Technique Using Association Rules Extraction

International Journal of …, 2007

This paper describes a text mining technique for automatically extracting association rules from collections of textual documents. The technique is called Extracting Association Rules from Text (EART). It depends on keyword features to discover association rules amongst ...

A Survey of Association Rule Mining in Text applications

In data mining, association rule mining is a prominent research field that aims to discover frequent patterns in data repositories, whether real-world or synthetic datasets. Association rule mining is constrained in that every rule must satisfy a set of constraints such as minimum support and confidence. The objective of this survey is to discuss the basic techniques of association rule mining and the concepts of text mining. In addition, transactions of text documents are available in different data warehouses. In particular, this analysis covers some text-based medical applications. This work specifically integrates one association rule mining algorithm, Apriori, into text mining in order to find interesting patterns that can be easily understood through visualization techniques.
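To make the minimum support and confidence constraints concrete, here is a minimal Apriori-style sketch in Python. It is not taken from the survey: the function names, thresholds, and the toy keyword sets in `docs` are assumptions chosen for illustration, with each document treated as one transaction of keywords.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], int]:
    """Return every frequent itemset mapped to its absolute support count."""
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent: Dict[FrozenSet[str], int] = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(
    frequent: Dict[FrozenSet[str], int], min_confidence: float
) -> List[Tuple[FrozenSet[str], FrozenSet[str], float]]:
    """Derive rules lhs => rhs with confidence = count(lhs | rhs) / count(lhs)."""
    out = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs_items in combinations(itemset, r):
                lhs = frozenset(lhs_items)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    out.append((lhs, itemset - lhs, confidence))
    return out


# Hypothetical keyword sets, one per document.
docs = [
    {"cancer", "therapy", "drug"},
    {"cancer", "therapy"},
    {"cancer", "drug"},
    {"therapy", "drug"},
    {"cancer", "therapy", "drug"},
]

frequent_itemsets = apriori(docs, min_support=0.5)
found_rules = rules(frequent_itemsets, min_confidence=0.7)
```

Storing absolute counts rather than fractions keeps the confidence computation a ratio of two integers, which avoids small floating-point discrepancies when comparing against the threshold.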

Mining texts by association rules discovery in a technical corpus

2004

The search results clustering problem is defined as the automatic, on-line grouping of similar documents in a search results list returned from a search engine. In this paper we present Lingo, a novel algorithm for clustering search results, which emphasizes cluster description quality. We describe the methods used in the algorithm: algebraic transformations of the term-document matrix and frequent phrase extraction using suffix arrays. Finally, we discuss results acquired from an empirical evaluation of the algorithm.

A survey on the use of association rules mining techniques in textual social media

Artificial Intelligence Review

The incursion of social media in our lives has been much accentuated in the last decade. This has led to a multiplication of data mining tools aimed at obtaining knowledge from these data sources. One of the greatest challenges in this area is to be able to obtain this knowledge without the need for training processes, which requires structured information and pre-labelled datasets. This is where unsupervised data mining techniques come in. These techniques can obtain value from these unstructured and unlabelled data, providing very interesting solutions to enhance the decision-making process. In this paper, we first address the problem of social media mining, as well as the need for unsupervised techniques, in particular association rules, for its treatment. We follow with a broad overview of the applications of association rules in the domain of social media mining, specifically, their application to the problems of mining textual entities, such as tweets. We also focus on the str...

Mining the Text using Association Rule Mining Technique

2020

As the amount of text available in electronic form continues to increase at an alarming rate, tools to manage these textual resources effectively will become critical. An information retrieval system tries to save users access time by classifying and clustering documents, because users spend a lot of time finding documents or information in texts. Text mining is therefore a popular and necessary way to address this problem. The largest amount of work in text mining has been in the areas of categorization, classification and clustering of documents. Text mining has many methods for finding useful information. Among these methods, association rule mining is well suited to finding the most frequent words that occur in a document collection. Association rule analysis is the task of discovering association rules that occur frequently in a given set of texts. Our proposed system was developed by applying the preprocessing steps of text mining system and...

Knowledge Discovery in Text Mining using Association Rule Extraction

International Journal of Computer Applications, 2016

The Internet and information technology provide a platform where a huge amount of information is available for use. But searching for the exact information needed is time consuming and leads to confusion. Retrieving knowledge manually from a collection of web documents and databases may cause the user to lose track. Text mining helps the user find accurate information, discover knowledge, and identify features in text documents. Thus there is a need to develop a text mining approach that clearly guides the user about which information is important and which is not, how to deal with important information, how to generate knowledge, etc. Knowledge discovery is a growing field of research. For a user, reading a collection of documents to obtain some knowledge is time consuming and less effective. There has been significant progress in research on knowledge discovery from collections of documents. We propose a method for knowledge discovery in text mining using association rule extraction. Using this approach, users are able to find accurate and important knowledge from a collection of web documents, which reduces the time needed to read all those documents.

Object-Oriented Data Structure for Text Association Rule Mining

2007

Mining association rules is being actively studied for transaction databases, but its extension to text applications is relatively novel. Most previous studies implement an Apriori-like approach, which requires multiple passes over the database to find all frequent itemsets. However, for some types of databases, such as bibliographic databases where the data are very sparse, the Apriori algorithm becomes very costly. In this paper, we propose a new algorithm called Object-Oriented Association Rule Mining (OOARM), which uses a special object-oriented data structure that holds all relevant information. The algorithm can find all frequent itemsets from a single scan through the database. Performance studies show that it is faster than the Apriori algorithm. All rules that involve a certain itemset can be generated in real time. That, in turn, opens new opportunities to organize and explore relationships in large text data sets.
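The abstract does not describe the OOARM data structure itself, so the following Python sketch only illustrates the general single-scan idea with a different, well-known technique: a vertical index that maps each item to the set of record ids containing it. The index is built in one pass over the data, after which the support of any itemset is obtained by intersecting id sets without rescanning. All names and the toy records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_index(records: List[Set[str]]) -> Dict[str, Set[int]]:
    """One pass over the records: map each item to the ids of records containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for record_id, items in enumerate(records):
        for item in items:
            index[item].add(record_id)
    return index


def support_count(index: Dict[str, Set[int]], itemset: Iterable[str]) -> int:
    """Number of records containing every item, computed from the index alone."""
    id_sets = [index.get(item, set()) for item in itemset]
    if not id_sets:
        return 0
    return len(set.intersection(*id_sets))


# Hypothetical sparse, bibliographic-style records (keywords per entry).
records = [
    {"mining", "text"},
    {"mining", "rules"},
    {"mining", "text", "rules"},
]

index = build_index(records)
pair_support = support_count(index, {"mining", "text"})
```

Because every query is answered from the index, rules involving a given itemset can be evaluated on demand, which is the kind of interactive use the abstract points to.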

Developing Extracting Association Rules System from Textual Documents

A new algorithm is proposed for generating association rules based on concepts; it uses a hash table data structure for the mining process. A mathematical formula for a weighting scheme, named the fuzzy weighting schema, is presented for labeling the documents automatically. The experiments are applied to a collection of scientific documents selected from MEDLINE on breast cancer treatments and side effects. The performance of the proposed system is compared with the previous Apriori-concept system in terms of execution time and evaluation of the extracted association rules. The results show that the number of association rules extracted by the proposed system is always lower than that of the Apriori-concept system. Moreover, the execution time of the proposed system is much better than that of the Apriori-concept system in all cases. https://sites.google.com/site/ijcsis/