Document classification utilising ontologies and relations between documents
Related papers
Impact of an ontology for automatic text classification
Annals of Library and Information Studies, 2015
Ontologies have been widely used in applications such as email filtering and electronic news classification. They can also be used to classify digital documents in a library. The main purpose of using ontologies for classification is to improve accuracy. Documents may be difficult to understand because of vague terms in the text. However, since ontologies represent the semantic relationships between terms, they can help identify the subject of a document correctly. This study attempted to improve the classification accuracy of an automatic text classification system by using an ontology. Classification results from the automatic system with and without the ontology were compared to evaluate the impact of the ontology on automatic classification. Results showed that the ontology-based system correctly classified 32.76% more documents and 25% more subjects than the system without the ontology.
Ontology-driven Conceptual Document Classification
Document classification based on the lexical-semantic network WordNet is presented. Two types of document classification in Serbian have been experimented with: classification based on chosen concepts from Serbian WordNet (SWN) and classification based on proper names. Conceptual document classification criteria are constructed from hierarchies rooted in a set of chosen concepts (first case) or in hierarchies rooted in some of the proper names' hypernyms (second case). A classifier of the first type is trained and then tested on an indexed and already classified Ebart corpus of Serbian newspapers (476,917 articles). Precision, recall and F-measure show that this type of classification is promising, although incomplete, mainly because SWN itself is incomplete. For proper names-based classification, the paper presents a proper names ontology based on the SWN. A distance-based similarity measure is defined, based on Euclidean and Manhattan distances. Classification of a subset of the Contemporary Serbian Language Corpus is presented.
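The abstract above does not spell out its similarity measure, so the following is only a minimal sketch of how Euclidean and Manhattan distances can be turned into a similarity score. The feature vectors and the 1 / (1 + d) mapping are assumptions made for illustration, not the paper's definition.

```python
import math


def euclidean(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block (L1) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def similarity(a, b, distance=euclidean):
    """Map a distance to (0, 1]; identical vectors score 1.0.

    The 1 / (1 + d) mapping is an illustrative assumption.
    """
    return 1.0 / (1.0 + distance(a, b))
```

With this mapping, a larger distance always yields a smaller similarity, so either distance can be swapped in without changing the ranking logic of a nearest-class decision.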
Altering document term vectors for classification: ontologies as expectations of co-occurrence
2007
In this paper we extend the state of the art in utilizing background knowledge for supervised classification by exploiting the semantic relationships between terms explicated in ontologies. Preliminary evaluations indicate that the new approach generally improves precision and recall, more so for hard-to-classify cases, and reveal patterns indicating the usefulness of such background knowledge.
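One simple way to picture altering a term vector with ontology knowledge is to add weight to terms that the ontology relates to the terms actually observed. The sketch below assumes a toy ontology (`ONTOLOGY`), a one-hop expansion, and a fixed `weight`; all three are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy ontology: each term maps to directly related terms.
ONTOLOGY = {
    "car": ["vehicle"],
    "truck": ["vehicle"],
    "vehicle": ["transport"],
}


def expand_term_vector(tokens, ontology, weight=0.5):
    """Return term weights with one-hop ontology neighbours added.

    Each observed term contributes `weight * count` to every term the
    ontology relates it to. Only terms present in the document trigger
    an expansion, so neighbours of neighbours are not added.
    """
    counts = Counter(tokens)
    expanded = {term: float(count) for term, count in counts.items()}
    for term, count in counts.items():
        for related in ontology.get(term, []):
            expanded[related] = expanded.get(related, 0.0) + weight * count
    return expanded
```

For the tokens `["car", "car", "truck"]` the shared concept `vehicle` receives weight from both observed terms, which lets two documents that use different vocabulary still overlap in the expanded vector.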
Automatic Document Classification Using a Domain Ontology
Automatic classification has become an important research area due to the rapid increase of digital information. Manual classification of documents is difficult because of vocabulary ambiguities in classification schemes and in the language of the text at hand. Our study attempted to address this problem. We developed a computer program that automatically classifies a given text document based on a well-developed ontology, so the user receives classification options immediately after feeding the document to the new system. The new ontology is a domain ontology based on the Dewey Decimal Classification scheme and the Sears List. Data on classification accuracy were obtained for both manual and automatic methods. Moreover, the relationship between the vagueness of language in documents and the inaccuracy of classification was determined for manual classification and for manual classification with automatic aid. The research revealed that results were more accurate when the newly developed automatic system was used. It also revealed that vagueness in documents affects manual classification more than manual classification with the automatic aid.
Towards Ontology-Based web text Document Classification
Journal of Engineering Science and Military Technologies
Data on the web is generally stored in structured, semi-structured and unstructured formats; according to surveys, most of an organization's information is stored in unstructured textual form. Categorizing this huge number of unstructured web text documents has therefore become one of the most important tasks when dealing with the web. Categorization (classification) of web text documents aims to assign one or more class labels (categories) to unlabeled documents; the assignment depends mainly on the contents of the document itself, with the help of one or more machine learning techniques. Different learning algorithms have been applied to the content of text documents for classification. In this paper, experiments use a subset of the Reuters-21578 dataset to highlight the weaknesses and limitations of traditional techniques for feature generation and dimensionality reduction, reporting classification accuracy and F-measure for different classification algorithms.
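Accuracy and F-measure, the two metrics named above, can be computed from predicted and true labels as in the following minimal sketch. The label names in the usage note are illustrative only and the functions are not taken from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label matches the true label."""
    if not y_true:
        return 0.0
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, with true labels `["earn", "earn", "acq", "earn"]` and predictions `["earn", "acq", "earn", "earn"]`, accuracy is 0.5, while precision, recall and F1 for the class `"earn"` are each 2/3.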
2013
Automatic classification of documents has become an important research area due to the exponential growth of digital content and because manual or semi-automatic organization is not effective. On the one hand, manual and semi-automatic classification is painstaking and labor-intensive. On the other hand, misclassifications due to vagueness in the documents and the classification schemes are inevitable with these two methods. Hence, the current study sought to shed light on these issues. This research proposes an automated system that can completely classify a given text document by minimizing vocabulary ambiguities. One of our previous studies developed a semi-automatic system for document classification, and here we propose to extend it further to obtain a fully automatic document classification system.
Semantic Ontology-Based Approach to Enhance Text Classification
2021
Text classification is the process of assigning a collection of predefined classes to free text. It has been one of the most researched areas in machine learning, with applications such as sentiment analysis, topic labeling, language detection and spam filtering. The efficiency of text classification improves when some relation or pattern in the data is given or known, which an ontology can provide. It further helps in reducing the size of the dataset. An ontology is a collection of data items that helps in storing and representing data in a way that preserves the patterns in it and the semantic relationships between the items. We have attempted to verify the improvement provided by the use of ontology in classification algorithms. The code prepared in this research and the method developed are fairly generic and could be extended to any ontology-based text classification system. In this paper, we present an enhanced architecture that can use ontology to provide an effective tex...
Ontology Evaluation through Text Classification
2009
We present a new method to evaluate a search ontology, which relies on mapping ontology instances to textual documents. On the basis of this mapping, we evaluate the adequacy of ontology relations by measuring their classification potential over the textual documents. This data-driven method provides concrete feedback to ontology maintainers and a quantitative estimation of the functional adequacy of the ontology relations towards search experience improvement. We specifically evaluate whether an ontology relation can help a semantic search engine support exploratory search. We test this ontology evaluation method on an ontology in the Movies domain that has been acquired semi-automatically from the integration of multiple semi-structured and textual data sources (e.g., IMDb and Wikipedia). We automatically construct a domain corpus from a set of movie instances by crawling the Web for movie reviews (both professional and user reviews). The 1-1 relation between textual documents (reviews) and movie instances in the ontology enables us to translate ontology relations into text classes. We verify that the text classifiers induced by key ontology relations (genre, keywords, actors) achieve high performance, and we exploit the properties of the learned text classifiers to provide concrete feedback on the ontology. The proposed ontology evaluation method is general and relies on the possibility of automatically aligning textual documents to ontology instances.
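The step of translating ontology relations into text classes can be pictured as follows: because each review is aligned with one movie instance, the values of a relation such as genre become class labels for that review. This is a minimal sketch under assumed data structures; the dictionaries, identifiers and the function name `relation_to_labels` are hypothetical and are not the paper's implementation.

```python
def relation_to_labels(instances, reviews, relation):
    """Build (text, label) training pairs from one ontology relation.

    instances: instance id -> {relation name: list of values}
    reviews:   instance id -> review text (one review per instance)
    Each value of the relation yields one labeled example, so a movie
    with two genres produces two pairs for the same review text.
    """
    labeled = []
    for instance_id, text in reviews.items():
        values = instances.get(instance_id, {}).get(relation, [])
        for value in values:
            labeled.append((text, value))
    return labeled


# Hypothetical toy data for illustration only.
INSTANCES = {
    "m1": {"genre": ["comedy"], "actors": ["A. Actor"]},
    "m2": {"genre": ["drama", "thriller"]},
}
REVIEWS = {
    "m1": "A light and funny film.",
    "m2": "Tense and moving throughout.",
}
```

The resulting pairs can be fed to any text classifier; how well that classifier performs then serves as a signal of how consistently the relation is reflected in the text.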