Text mining with conceptual graphs
Related papers
Text mining at detail level using conceptual graphs
… Structures: Integration and …, 2002
Text mining is defined as knowledge discovery in large text collections. It detects interesting patterns such as clusters, associations, deviations, similarities, and differences in sets of texts. Current text mining methods use simplistic representations of text contents, such as keyword vectors, which imply serious limitations on the kind and meaningfulness of possible discoveries. We show how to do some typical mining tasks using conceptual graphs as a formal but meaningful representation of texts. Our methods involve qualitative and quantitative comparison of conceptual graphs, conceptual clustering, building a conceptual hierarchy, and application of data mining techniques to this hierarchy in order to detect interesting associations and deviations. Our experiments show that, despite widespread misbelief, detailed meaningful mining with conceptual graphs is computationally affordable. * Work done under partial support of CONACyT, CGEPI-IPN, and SNI, Mexico. … patterns, i.e., those distinguishing not only entities (topics) but also actions, attributes and their relations.
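The qualitative and quantitative comparison of conceptual graphs mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a toy graph encoding (a set of concept labels plus a set of `(source, relation, target)` triples) and uses a Dice coefficient for both scores; the `ConceptualGraph` class, the `compare` function, and the choice of Dice are illustrative assumptions, not the measure defined in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptualGraph:
    """Toy conceptual graph (hypothetical encoding, not the paper's)."""
    concepts: frozenset   # concept labels, e.g. "cat"
    relations: frozenset  # (source, relation, target) triples


def dice(a, b):
    """Dice coefficient of two sets; defined as 0.0 when both are empty."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def compare(g1, g2):
    """Return the shared part (qualitative) and overlap scores (quantitative)."""
    overlap = ConceptualGraph(g1.concepts & g2.concepts,
                              g1.relations & g2.relations)
    scores = {
        "conceptual": dice(g1.concepts, g2.concepts),
        "relational": dice(g1.relations, g2.relations),
    }
    return overlap, scores


# "The cat chases the mouse" vs. "The dog chases the cat"
g1 = ConceptualGraph(
    frozenset({"cat", "chase", "mouse"}),
    frozenset({("chase", "agent", "cat"), ("chase", "patient", "mouse")}),
)
g2 = ConceptualGraph(
    frozenset({"dog", "chase", "cat"}),
    frozenset({("chase", "agent", "dog"), ("chase", "patient", "cat")}),
)
overlap, scores = compare(g1, g2)
print(sorted(overlap.concepts), scores)
```

In this example the two sentences share two of three concepts, yet no relation triple matches because "cat" plays a different role in each. A keyword vector would miss that distinction, which is the kind of limitation the abstract attributes to simplistic representations.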
Concept Based Mining Model for Text Clustering
2013
The common techniques in text mining are based on the statistical analysis of a term, either a word or a phrase. Statistical analysis of term frequency captures the importance of the term within a document only. Two terms can have the same frequency in their documents, but one term contributes more to the meaning of its sentences than the other. A new concept-based mining model that analyzes terms at the sentence, document, and corpus levels is introduced. The concept-based mining model can effectively discriminate between terms that are unimportant with respect to sentence semantics and terms that hold the concepts representing the sentence meaning. The proposed model consists of sentence-based concept analysis, document-based concept analysis, corpus-based concept analysis, and a concept-based similarity measure for calculating the similarity between documents.
An Efficient Concept-Based Mining Model for Enhancing Text Clustering
The common techniques in text mining are based on the statistical analysis of a term, either a word or a phrase. Text is represented by the words it mentions, and thematic similarity is based on the proportion of words that texts have in common. The complex is constructed using groups of co-occurring words (term associations) identified using traditional data mining methods. Disjoint subsections of the complex (connected components) represent general concepts within the documents' concept space. A new concept-based mining model, composed of four components, is proposed to improve text clustering quality. By exploiting the semantic structure of the sentences in documents, a better text clustering result is achieved.
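The idea of grouping co-occurring words and reading the disjoint groups as general concepts can be sketched as follows. This is a simplified illustration under stated assumptions: term associations are taken to be word pairs that co-occur in at least `min_support` documents, and the "complex" is reduced to a plain graph whose connected components are found with union-find. The function names and the `min_support` parameter are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from itertools import combinations


def term_associations(documents, min_support=2):
    """Return word pairs that co-occur in at least `min_support` documents."""
    counts = defaultdict(int)
    for doc in documents:
        words = sorted(set(doc.lower().split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return [pair for pair, n in counts.items() if n >= min_support]


def connected_components(pairs):
    """Group words linked by associations, using a simple union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for word in parent:
        groups[find(word)].add(word)
    return sorted(sorted(g) for g in groups.values())


docs = [
    "graph clustering algorithm",
    "graph clustering method",
    "protein folding simulation",
    "protein folding energy",
]
components = connected_components(term_associations(docs))
print(components)  # [['clustering', 'graph'], ['folding', 'protein']]
```

Each component is a candidate concept. Words that appear in only one document never reach the support threshold, so they are left out of every component.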
Enhancing text clustering using concept-based mining model
2006
Most text mining techniques are based on word and/or phrase analysis of the text. The statistical analysis of a term (word or phrase) frequency captures the importance of the term within a document. However, to achieve a more accurate analysis, the underlying mining technique should indicate terms that capture the semantics of the text, from which the importance of a term in a sentence and in the document can be derived. A new concept-based mining model is introduced that relies on the analysis of both the sentence and the document, rather than the traditional analysis of the document dataset only. The proposed mining model consists of a concept-based analysis of terms and a concept-based similarity measure. The term which contributes to the sentence semantics is analyzed with respect to its importance at the sentence and document levels. The model can efficiently find significant matching terms, either words or phrases, of the documents according to the semantics of the text. The similarity between documents relies on a new concept-based similarity measure which is applied to the matching terms between documents. Experiments using the proposed concept-based term analysis and similarity measure in text clustering are conducted. Experimental results demonstrate that the newly developed concept-based mining model enhances the clustering quality of sets of documents substantially.
Concept Mining using Conceptual Ontological Graph (COG)
Concept mining (CM) is the area of exploring and finding links, associations, relationships, and patterns among huge collections of information. In this paper, we propose a concept-based text representation, with an emphasis on using the proposed representation in different applications such as information retrieval, text summarization, and question answering. This work presents a new paradigm for concept mining by extracting the concept-based information from raw text. At the text representation level, we introduce a sentence-based conceptual ontological representation that builds concept-based representations for the whole document. A new concept-based similarity measure is proposed to measure the similarity of texts based on their meaning. The proposed approach is domain independent and could be applied to general domain applications. The proposed approach has been applied to the domain of information retrieval; preliminary results are promising and support continuing in this research direction.
A Consistent Web Documents Based Text Clustering Using Concept Based Mining Model
2012
Text mining is a growing, innovative field that endeavors to extract significant information from natural language text. It may be loosely characterized as the process of examining texts to extract information that is useful for particular purposes. In this case, the mining model can capture terms that identify the concepts of the sentence or document, which tends to detect the subject of the document. In existing work, the concept-based mining model is used only for clustering ordinary text documents: it clusters the text parts of the documents and efficiently discovers significant matching concepts among documents, according to the semantics of the sentences. The downside is that the existing work cannot be applied to web document clustering, and its text classification of the documents is unreliable. To make the text clustering more consistent, in our work, we plan to present a Conceptual Rule Mining On Text clusters to evaluate the mo...
Efficient Conceptual Rule Mining on Text Clusters in Web Documents
International Journal of Computer Applications, 2012
Text mining is a modern computational approach that attempts to discover new, previously unknown information by applying techniques from natural language processing and data mining. Clustering, one of the conventional data mining techniques, is an unsupervised learning paradigm in which clustering techniques attempt to identify intrinsic groupings of the text documents, so that a set of clusters is formed in which clusters exhibit high intra-cluster similarity and low inter-cluster similarity. Most current document clustering methods are based on the Vector Space Model (VSM), which is a widely used data representation for text classification and clustering. Moreover, weighting these features accurately also affects the result of the clustering algorithm substantially. The previous work applied conceptual text clustering to web documents containing various markup language formats associated with the documents (term extraction mode). In this work, we present conceptual rule mining, in which rules are generated for the sentence meaning and related sentences in the document. Weights are assigned to the sentences that contribute more to the topic of the document. Conditional probability is evaluated for the sentence weights. A probability ratio is identified for the sentence similarity, from which the unique sentence meanings contributing to the document topic are listed. Experiments are conducted with web documents extracted from research repositories to evaluate the efficiency of the proposed efficient conceptual rule mining on text clusters in web documents, which is compared with an existing Model for Concept Based Clustering and Classification in terms of Topic related rules, Weights of the influential sentence, and Topic Sensitivity.
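The clustering objective described above (high intra-cluster similarity, low inter-cluster similarity over a Vector Space Model) can be made concrete with a short sketch. It assumes raw term-frequency vectors and cosine similarity, which is one common VSM setup; the abstract does not specify the weighting scheme, and the helper names below are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Represent a document as a sparse term-frequency vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_intra_similarity(cluster):
    """Average pairwise similarity inside one cluster (0.0 for singletons)."""
    pairs = list(combinations(cluster, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def mean_inter_similarity(c1, c2):
    """Average similarity between documents of two non-empty clusters."""
    return sum(cosine(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


cluster_a = [tf_vector(t) for t in ("text mining methods", "text mining models")]
cluster_b = [tf_vector(t) for t in ("protein folding energy",
                                    "protein folding simulation")]

intra = mean_intra_similarity(cluster_a)
inter = mean_inter_similarity(cluster_a, cluster_b)
print(f"intra={intra:.3f} inter={inter:.3f}")
```

A good clustering in this sense keeps the first number high and the second low. Replacing the raw counts in `tf_vector` with another weighting would change both values, which is why the abstract stresses accurate feature weighting.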
Graph-based hierarchical conceptual clustering
The Journal of Machine Learning …, 2002
Hierarchical conceptual clustering has proven to be a useful, although under-explored, data mining technique. A graph-based representation of structural information combined with a substructure discovery technique has been shown to be successful in knowledge discovery. The SUBDUE substructure discovery system provides one such combination of approaches. This work presents SUBDUE and the development of its clustering functionalities. Several examples are used to illustrate the validity of the approach both in structured and unstructured domains, as well as to compare SUBDUE to the Cobweb clustering algorithm. We also develop a new metric for comparing structurally-defined clusterings. Results show that SUBDUE successfully discovers hierarchical clusterings in both structured and unstructured data.
A ConceptLink Graph for Text Structure Mining
Australasian Computer Science Conference, 2009
Most text mining methods are based on representing documents using a vector space model, commonly known as a bag-of-words model, where each document is modeled as a linear vector representing the occurrence of independent words in the text corpus. It is well known that using this vector-based representation, important information, such as semantic relationship among concepts, is
Mining Conceptual Graphs for Knowledge Acquisition
2008
This work addresses the use of computational linguistic analysis techniques for learning conceptual graphs from unstructured texts. A technique including both content mining and interpretation, as well as clustering and data cleaning, is introduced. Our proposal exploits sentence structure in order to generate concept hypotheses, rank them according to plausibility, and select the most credible ones. It enables the knowledge acquisition task to be performed without supervision, minimizing the possibility of failing to retrieve information contained in the document, in order to extract non-taxonomic relations.