Concept Mining using Conceptual Ontological Graph (COG)
Related papers
Concept and Term Based Similarity Measure for Text Classification and Clustering
The exploitation of syntactic structures and semantic background knowledge has always been an appealing subject in data mining, text retrieval and information management. The usefulness of this kind of information has been shown most prominently in highly specialized tasks, such as text categorization. So far, however, syntactic and semantic information have been used only in isolation. This paper proposes a new principled approach, the concept and term based similarity measure, which incorporates linguistic and semantic structures by using syntactic dependencies and semantic background knowledge. The method represents the meaning of texts in a high-dimensional space of concepts derived from WordNet. A number of case studies are included to demonstrate various aspects of the framework.
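The abstract does not give the measure's formula. A minimal sketch of the general idea, assuming words are mapped to concepts through a lookup table and that term-level and concept-level cosine similarities are mixed linearly: `CONCEPTS` is a hypothetical stand-in for WordNet synsets, and `alpha` is an assumed weight, not a parameter from the paper.

```python
import math
from collections import Counter

# Hypothetical word-to-concept table standing in for WordNet synsets.
# A real system would query WordNet and disambiguate word senses.
CONCEPTS = {
    "car": "motor_vehicle", "automobile": "motor_vehicle",
    "buy": "purchase", "purchase": "purchase",
    "cheap": "inexpensive", "inexpensive": "inexpensive",
}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sum(v * v for v in a.values())
    nb = sum(v * v for v in b.values())
    return dot / math.sqrt(na * nb) if na and nb else 0.0


def concept_vector(tokens):
    # Words missing from the table fall back to their surface form.
    return Counter(CONCEPTS.get(t, t) for t in tokens)


def term_concept_similarity(doc_a: str, doc_b: str, alpha: float = 0.5) -> float:
    """Mix term-space and concept-space cosine; alpha is an assumed weight."""
    a, b = doc_a.lower().split(), doc_b.lower().split()
    term_sim = cosine(Counter(a), Counter(b))
    concept_sim = cosine(concept_vector(a), concept_vector(b))
    return alpha * term_sim + (1 - alpha) * concept_sim


print(term_concept_similarity("buy cheap car", "purchase inexpensive automobile"))
```

The two example phrases share no words, so their term cosine is 0, but every word maps to a shared concept, so the concept cosine is 1 and the mixed score is 0.5. This is the gap a concept space is meant to close.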
Information Retrieval on Text using Concept Similarity
Retrieving relevant information from the Internet is a huge task because of the sheer amount of information available. Identifying the individual concepts that match a query is time-consuming. Documents were previously retrieved with keyword-based methods, which cannot capture the relationships between associated keywords: if the same concept is described by different keywords, inaccurate or irrelevant results are returned. Concept-based retrieval methods address this problem. They exploit the semantic relationships among concepts to find relevant documents, and they can eliminate irrelevant documents by detecting conceptual mismatches. The main challenge identified is ambiguity, which arises because several different words can express the same concept. Semantic analysis can reveal the conceptual relationships among the words in a document. This paper explores the potential of concept-based information access through semantic analysis, using the lexical database WordNet. The mechanism is applied to selected text documents, extracting the synonyms, hyponyms, and hypernyms of each word from WordNet. A ranking is then calculated from the frequency of each word in the input documents, and a hierarchy model is generated according to that ranking.
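A minimal sketch of the frequency ranking and hierarchy step, assuming synonyms are first collapsed to one canonical word and each word is then grouped under its hypernym. `SYNONYMS` and `HYPERNYMS` are hypothetical tables standing in for WordNet lookups (hyponyms are omitted for brevity), and the grouping rule is an assumption rather than the paper's exact hierarchy model.

```python
from collections import Counter, defaultdict

# Hypothetical lookup tables standing in for WordNet queries.
SYNONYMS = {"film": "movie", "picture": "movie"}        # variant -> canonical word
HYPERNYMS = {"movie": "entertainment", "concert": "entertainment",
             "novel": "literature"}                      # word -> broader concept


def rank_and_group(text: str):
    """Rank canonical words by frequency, then group them under their hypernyms."""
    tokens = [SYNONYMS.get(w, w) for w in text.lower().split()]
    # Sort by descending frequency, breaking ties alphabetically.
    ranked = sorted(Counter(tokens).items(), key=lambda kv: (-kv[1], kv[0]))
    hierarchy = defaultdict(list)
    for word, count in ranked:  # ranked order is preserved inside each group
        hierarchy[HYPERNYMS.get(word, "unclassified")].append((word, count))
    return ranked, dict(hierarchy)


ranked, hierarchy = rank_and_group("film movie picture concert novel concert")
print(ranked)     # [('movie', 3), ('concert', 2), ('novel', 1)]
print(hierarchy)  # {'entertainment': [('movie', 3), ('concert', 2)], 'literature': [('novel', 1)]}
```

Collapsing "film", "movie", and "picture" before counting is what lets the three keywords reinforce one concept instead of splitting its frequency three ways.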
Concept Based Mining Model for Text Clustering
2013
The common techniques in text mining are based on statistical analysis of a term, either a word or a phrase. Statistical analysis of term frequency captures the importance of a term within a document only. Two terms can have the same frequency in their documents, yet one may contribute more to the meaning of its sentences than the other. A new concept-based mining model is introduced that analyzes terms at the sentence, document, and corpus levels. The model can effectively discriminate between terms that are unimportant with respect to sentence semantics and terms that hold the concepts representing the sentence meaning. It consists of sentence-based concept analysis, document-based concept analysis, corpus-based concept analysis, and a concept-based similarity measure for calculating the similarity between documents.
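To make the abstract's point concrete, here is a minimal sketch that computes purely statistical term counts at the three levels it names: term frequency within a document, the share of a document's sentences containing the term, and inverse document frequency across the corpus. This is a hypothetical baseline for illustration; the paper's concept-based analysis goes beyond these surface counts, and the field names are assumptions.

```python
import math
from collections import Counter


def term_statistics(corpus):
    """corpus: list of documents, each a list of sentence strings."""
    df = Counter()
    per_doc = []
    for doc in corpus:
        sentences = [s.lower().split() for s in doc]
        tf = Counter(w for s in sentences for w in s)          # document level
        sf = Counter(w for s in sentences for w in set(s))     # sentences containing w
        per_doc.append((tf, sf, len(sentences)))
        df.update(tf.keys())                                   # corpus level
    n_docs = len(corpus)
    return [
        {w: {"tf": tf[w],
             "sentence_share": sf[w] / n_sent,
             "idf": math.log(n_docs / df[w])}
         for w in tf}
        for tf, sf, n_sent in per_doc
    ]


stats = term_statistics([["the cat sat", "the cat ran"], ["a dog sat"]])
print(stats[0]["cat"] == stats[0]["the"])  # True
```

In this toy corpus "the" and "cat" receive identical statistics at every level, which is exactly the limitation the abstract describes: frequency alone cannot tell which term carries the sentence meaning.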
Enhancing text clustering using concept-based mining model
2006
Most text mining techniques are based on word and/or phrase analysis of the text. Statistical analysis of term (word or phrase) frequency captures the importance of a term within a document. However, for a more accurate analysis, the underlying mining technique should identify terms that capture the semantics of the text, from which the importance of a term in a sentence and in the document can be derived. A new concept-based mining model is introduced that relies on analysis of both the sentence and the document, rather than the traditional analysis of the document dataset alone. The proposed model consists of a concept-based analysis of terms and a concept-based similarity measure. Each term that contributes to the sentence semantics is analyzed with respect to its importance at the sentence and document levels. The model can efficiently find significant matching terms, either words or phrases, between documents according to the semantics of the text. Document similarity is computed with a new concept-based similarity measure applied to the matching terms between documents. Experiments applying the proposed concept-based term analysis and similarity measure to text clustering are conducted. The experimental results demonstrate that the new concept-based mining model substantially enhances the clustering quality of document sets.
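The abstract says similarity is computed over matching terms, which may be words or phrases. A minimal sketch of that idea, assuming terms are unigrams and bigrams weighted by raw counts and scored with a weighted Jaccard ratio. The weighted Jaccard is a swapped-in measure for illustration, not the paper's concept-based similarity, and `max_n` is an assumed parameter.

```python
from collections import Counter


def terms(text: str, max_n: int = 2) -> Counter:
    """Count all word n-grams (n = 1..max_n) as candidate terms."""
    words = text.lower().split()
    return Counter(" ".join(words[i:i + n])
                   for n in range(1, max_n + 1)
                   for i in range(len(words) - n + 1))


def matching_term_similarity(doc_a: str, doc_b: str):
    """Weighted Jaccard over terms; returns (score, sorted matching terms)."""
    a, b = terms(doc_a), terms(doc_b)
    matches = a.keys() & b.keys()
    num = sum(min(a[t], b[t]) for t in matches)
    den = sum(max(a[t], b[t]) for t in a.keys() | b.keys())
    return (num / den if den else 0.0), sorted(matches)


score, matches = matching_term_similarity("text mining model", "text mining method")
print(matches)  # ['mining', 'text', 'text mining']
```

Including bigrams means the shared phrase "text mining" counts as its own match, so documents that share phrases score higher than documents that merely share scattered words.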
Text mining with conceptual graphs
2001 IEEE International Conference on Systems, Man and Cybernetics. e-Systems and e-Man for Cybernetics in Cyberspace (Cat.No.01CH37236), 2001
A method is presented for conceptual clustering of a collection of texts represented as conceptual graphs. It uses an incremental strategy to construct the cluster hierarchy and incorporates several characteristics that are attractive for text mining purposes. For instance, it considers the structural information of the graphs, uses domain knowledge to detect clusters with generalized descriptions, and uses a user-defined similarity measure between graphs.
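The method accepts a user-defined similarity measure between graphs. As one hypothetical example of such a measure, the sketch below represents a conceptual graph as a set of (concept, relation, concept) triples and averages a Dice overlap of concepts with a Dice overlap of whole triples, so that shared structure counts as well as shared vocabulary. The triple encoding and the weight `w_concepts` are assumptions, not the paper's definitions.

```python
def dice(x: set, y: set) -> float:
    """Dice coefficient; two empty sets are treated as identical."""
    return 2 * len(x & y) / (len(x) + len(y)) if (x or y) else 1.0


def graph_similarity(g1: set, g2: set, w_concepts: float = 0.5) -> float:
    """g1, g2: sets of (concept, relation, concept) triples."""
    c1 = {c for s, _, o in g1 for c in (s, o)}
    c2 = {c for s, _, o in g2 for c in (s, o)}
    concept_sim = dice(c1, c2)        # shared concepts
    relation_sim = dice(g1, g2)       # shared concept-relation-concept structure
    return w_concepts * concept_sim + (1 - w_concepts) * relation_sim


g1 = {("cat", "agent", "eat"), ("eat", "object", "fish")}
g2 = {("cat", "agent", "eat"), ("eat", "object", "mouse")}
print(round(graph_similarity(g1, g2), 4))  # 0.5833
```

Here the graphs share two of three concepts each (Dice 2/3) and one of two triples each (Dice 1/2), giving an average of 7/12.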
Text Summarization Based on Conceptual Data Classification
2006
In this article, we present an original approach to text summarization using conceptual data classification. We show how a given text can be summarized without losing meaningful knowledge and without using any semantic or grammatical concepts. Conceptual data classification is used to extract the most interacting sentences from the main text and to discard the other, less meaningful sentences in order to generate the summary. The approach is tested on Arabic and English texts of different sizes and topics, and the results obtained are satisfactory. The system may be incorporated into the indexers of Internet search engines to find keywords and other pertinent information in newly deployed Web pages, which would then be stored in databases for quick search.
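The paper's conceptual data classification is not detailed in the abstract. As a plainly swapped-in illustration of "extracting the most interacting sentences", this sketch scores each sentence by how many words it shares with every other sentence and keeps the top `k` in their original order. Like the described approach, it uses no grammatical analysis, but it is a simple overlap heuristic, not the authors' method.

```python
def summarize(sentences, k=2):
    """Keep the k sentences with the most word overlap with the rest of the text."""
    bags = [set(s.lower().split()) for s in sentences]
    scores = [
        sum(len(bag & other) for j, other in enumerate(bags) if j != i)
        for i, bag in enumerate(bags)
    ]
    # Highest score first; ties go to the earlier sentence.
    top = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:k]
    return [sentences[i] for i in sorted(top)]  # restore original order


text = ["graphs model concepts", "concepts link graphs",
        "the weather is nice", "graphs and concepts matter"]
print(summarize(text, k=2))  # ['graphs model concepts', 'concepts link graphs']
```

The off-topic sentence shares no words with the others, scores 0, and is the first to be dropped.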
A ConceptLink Graph for Text Structure Mining
Australasian Computer Science Conference, 2009
Most text mining methods are based on representing documents using a vector space model, commonly known as a bag-of-words model, where each document is modeled as a linear vector representing the occurrence of independent words in the text corpus. It is well known that using this vector-based representation, important information, such as semantic relationships among concepts, is ...
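A minimal sketch of the bag-of-words vector space model described above, using raw counts over a sorted vocabulary (the tokenization by whitespace is a simplifying assumption). It shows two kinds of information the representation discards: synonyms land on different axes, and word order disappears entirely.

```python
def build_vsm(docs):
    """Return (vocabulary, count vectors) for a bag-of-words model."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0] * len(vocab)
        for w in toks:
            vec[index[w]] += 1  # each word is an independent dimension
        vectors.append(vec)
    return vocab, vectors


vocab, vecs = build_vsm(["the car is fast", "the automobile is fast"])
print(vocab)  # ['automobile', 'car', 'fast', 'is', 'the']
print(vecs)   # [[0, 1, 1, 1, 1], [1, 0, 1, 1, 1]]

_, same = build_vsm(["dog bites man", "man bites dog"])
print(same[0] == same[1])  # True: word order is lost
```

"car" and "automobile" occupy separate dimensions, so the model sees no relationship between them; "dog bites man" and "man bites dog" map to the same vector.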
2013
Semantic similarity is a way of analyzing the synonymy that exists between word pairs. This measure is necessary to detect the degree of relationship within word pairs. To compute the semantic similarity of a word pair, clustering and classification augmented with semantic similarity (CCASS) was developed. CCASS is a novel method that uses the page counts and text snippets returned by a search engine. Several similarity measures are defined using the page counts of word pairs. Lexical pattern clustering is applied to the text snippets obtained from the search engine. These are fed to a support vector machine (SVM), which computes the semantic similarity between word pairs. Based on the value obtained from the SVM, the Simple KMeans clustering algorithm is used to form clusters. New word pairs can be classified once their semantic similarity measure has been computed. If it does match with the existing clusters, a new cluster may be ...
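The abstract does not list its page-count measures. A common family for this task is the WebJaccard, WebOverlap, WebDice, and WebPMI scores; whether CCASS uses exactly these is an assumption. This sketch computes them from hit counts `h_p`, `h_q` and the conjunctive-query count `h_pq`; the noise threshold `c = 5` and index size `n = 1e10` are assumed defaults, and the example counts are toy values, not real search results.

```python
import math


def page_count_measures(h_p, h_q, h_pq, n=1e10, c=5):
    """Page-count similarity scores for a word pair (P, Q).

    h_p, h_q: hit counts for P and Q alone; h_pq: hits for "P AND Q".
    Co-occurrence counts at or below the threshold c are treated as noise.
    """
    if h_pq <= c:
        return {"jaccard": 0.0, "overlap": 0.0, "dice": 0.0, "pmi": 0.0}
    return {
        "jaccard": h_pq / (h_p + h_q - h_pq),
        "overlap": h_pq / min(h_p, h_q),
        "dice": 2 * h_pq / (h_p + h_q),
        # Pointwise mutual information with probabilities estimated as count / n.
        "pmi": math.log2((h_pq / n) / ((h_p / n) * (h_q / n))),
    }


print(page_count_measures(100, 300, 50))
```

In a pipeline like the one described, these scores would form part of the feature vector passed to the SVM alongside the lexical-pattern features.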
Exploring an ontology via text similarity: an experimental study
In this paper we consider the problem of retrieving the concepts of an ontology that are most relevant to a given textual query. In our setting the concepts are associated with textual fragments, such as labels, descriptions, and links to other relevant concepts. The main task to be solved is the definition of a similarity measure between the single text of the query and the set of texts associated with an ontology concept. We experimentally study this problem on a particular scenario with a socio-pedagogic domain ontology and Italian language texts. We investigate how the basic cosine similarity measure on bag-of-words text representations can be improved in three distinct ways: (i) taking into account the context of the ontology nodes, (ii) using a linear combination of various measures, and (iii) exploiting semantic resources. The experimental evaluation confirms that the presented methods improve upon the baseline. Besides discussing some issues to consider in applying these methods, we point out some directions for further improvement.
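A minimal sketch of improvements (i) and (ii), assuming a toy ontology in which each concept has a label, a description, and links. The "context" of a node is taken to be the labels of its linked concepts, and the final score is a linear combination of plain and context-enriched bag-of-words cosine. `ONTOLOGY` and the weight `w` are hypothetical; the paper's actual context definition and weights are not given in the abstract.

```python
import math
from collections import Counter

# Hypothetical toy ontology: label, description, and links to related concepts.
ONTOLOGY = {
    "tutoring": {"label": "tutoring", "description": "individual support for students",
                 "links": ["school"]},
    "school": {"label": "school", "description": "educational institution", "links": []},
}


def bow(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sum(v * v for v in a.values())
    nb = sum(v * v for v in b.values())
    return dot / math.sqrt(na * nb) if na and nb else 0.0


def concept_text(name: str, with_context: bool) -> str:
    node = ONTOLOGY[name]
    text = f"{node['label']} {node['description']}"
    if with_context:  # (i) append the labels of linked concepts
        text += " " + " ".join(ONTOLOGY[n]["label"] for n in node["links"])
    return text


def score(query: str, name: str, w: float = 0.5) -> float:
    """(ii) Linear combination of plain and context-enriched cosine; w is assumed."""
    q = bow(query)
    plain = cosine(q, bow(concept_text(name, False)))
    enriched = cosine(q, bow(concept_text(name, True)))
    return w * plain + (1 - w) * enriched


for name in ONTOLOGY:
    print(name, round(score("school support", name), 3))
```

For the query "school support", the "tutoring" node matches only "support" in its own text, but its link to "school" adds a second match once context is included, raising its enriched cosine from about 0.32 to about 0.58.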