A Survey on Text Classification using Machine Learning Algorithms
Related papers
A Study on Document Classification using Machine Learning Techniques
2014
With the explosion of information fuelled by the growth of the World Wide Web, it is no longer feasible for a human observer to understand all the incoming data or even to classify it into categories. With this growth of information and the simultaneous growth of available computing power, automatic classification of data, particularly textual data, is gaining increasing importance. Text classification is the task of automatically sorting a set of documents into categories from a predefined set, and it is one of the important research issues in the field of text mining. This paper provides a review of generic text classification process, phases of that process and methods being
Document Classification using Various Classification Algorithms: A Survey
Text classification is used to group documents of similar types. It can be performed under supervision, i.e. as a supervised learning technique, in which documents are sorted automatically into classes drawn from a predefined set. The main issue is that large-scale information lacks organization, which makes it difficult to manage. Text classification is identified as one of the key methods for recognizing such digital information. It has various applications, such as information retrieval, natural language processing, automatic indexing, text filtering, image processing, etc. Text classification is also used to process big data and to predict class labels for newly added data, and it is used in academia and industry to classify unstructured data. There are various text classification approaches, such as decision tree, SVM, Naïve Bayes etc. In this survey paper, we have analysed these techniques. Each has its own set of advantages, which makes them suitable for almost all classification jobs. In this paper we have also analysed evaluation parameters such as F-measure, G-measure and accuracy used in various research works.
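The evaluation parameters named in this abstract can be illustrated with a short sketch. It is not taken from the paper: the function name and the toy labels are hypothetical, and G-measure is assumed here to mean the geometric mean of precision and recall (some works instead use the geometric mean of sensitivity and specificity).

```python
import math


def binary_metrics(y_true, y_pred, positive):
    """Compute accuracy, precision, recall, F-measure and G-measure for one class.

    Assumption: G-measure = sqrt(precision * recall).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    g_measure = math.sqrt(precision * recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "g_measure": g_measure,
    }


# Toy labels, invented for illustration only.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]
metrics = binary_metrics(y_true, y_pred, positive="spam")
print(metrics)
```

Accuracy counts every correct decision equally, whereas F-measure and G-measure depend only on precision and recall for the chosen class, so they are less flattering when that class is rare.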
A Review of Machine Learning Algorithms for Text-Documents Classification
Journal of Advances in Information Technology, 2010
With the increasing availability of electronic documents and the rapid growth of the World Wide Web, automatic categorization of documents has become the key method for organizing information and for knowledge discovery. Proper classification of e-documents, online news, blogs, e-mails and digital libraries needs text mining, machine learning and natural language processing techniques to extract meaningful knowledge. The aim of this paper is to highlight the important techniques and methodologies employed in text document classification, while also raising awareness of some of the interesting challenges that remain to be solved, focusing mainly on text representation and machine learning techniques. This paper provides a review of the theory and methods of document classification and text mining, focusing on the existing literature.
Comparative Study for Text Document Classification Using Different Machine Learning Algorithms
2019
Classification is a supervised learning method: the goal is to find the labels of unknown objects. In the real world, a tedious amount of manual work is required to label unknown documents. The system is first trained on labeled documents using a supervised machine learning algorithm, and the trained model is then applied to predict the labels of unknown documents. The framework of text document classification consists of: input text document, pre-processing, feature extraction and classification. Four common classification methods are analysed for text document classification: Naive Bayes, Decision Tree, Support Vector Machine and K-nearest neighbors. The main focus of this paper is to present a comparative study of different existing classification methods for text document classification. The experiment applied the different classification methods to the Enron Email Dataset and measured classification accuracy, true positive, true negative, false po...
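The four-stage framework in this abstract (input document, pre-processing, feature extraction, classification) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the stop-word list, the toy training documents and all function and class names are hypothetical, the features are plain bag-of-words counts, and the classifier is a multinomial Naive Bayes with add-one smoothing.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "for", "at", "in", "on", "your"}


def preprocess(text):
    """Pre-processing: lowercase, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def extract_features(tokens):
    """Feature extraction: bag-of-words term counts."""
    return Counter(tokens)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            feats = extract_features(preprocess(doc))
            self.word_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, doc):
        feats = extract_features(preprocess(doc))
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of log likelihoods of the observed words.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word, count in feats.items():
                if word not in self.vocab:
                    continue  # ignore words never seen during training
                p = (self.word_counts[label][word] + 1) / (total_words + len(self.vocab))
                score += count * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration only.
train_docs = [
    "Meeting scheduled for Monday at noon",
    "Please review the quarterly budget report",
    "Win a free prize now",
    "Claim your free money today",
]
train_labels = ["work", "work", "spam", "spam"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("free prize money"))
print(model.predict("budget meeting on Monday"))
```

Swapping in a decision tree, SVM or k-nearest neighbors classifier would change only the final stage; the pre-processing and feature extraction steps stay the same, which is what makes a side-by-side comparison of classifiers possible.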
Text Classification Using Machine Learning Techniques: A Comparative Study
Text mining is drawing enormous attention in this era because a huge amount of text data is being generated, and managing this data well is essential to get the maximum benefit from it. Text classification is an essential sub-part of text mining in which related text data is assigned to a particular predefined category. In our study, we discuss different classifier techniques that have been popular in recent years. A comparison between different classifiers such as SVM, Naïve Bayes, Neural Networks etc. is presented in tabular form in this paper.
Document Classification : A Review
2018
As most information is stored as text on the web, text document classification is considered to have high commercial value. Text classification means classifying documents according to predefined categories. The complexity of natural languages and the very high dimensionality of the feature space of documents make this classification problem difficult. In this paper we give an introduction to text classification, describe the process of text classification, give an overview of the classifiers, and compare some existing classifiers on the basis of a few criteria such as time principle, merits and demerits.
Text Classification using Data Mining and Machine Learning Techniques: A Brief Review
Text classification has become more significant in managing and organizing text data due to the tremendous growth of online information. It classifies documents into a fixed number of predefined categories. The rule-based approach and the machine learning approach are the two ways of performing text classification. In the rule-based approach, documents are classified according to manually defined rules. In the machine learning approach, the classification rules or classifier are derived automatically from example documents. The machine learning approach offers higher recall and faster processing. This paper presents an investigation of text classification using different machine learning techniques.
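The contrast between the two approaches in this abstract can be made concrete with a small sketch. It is illustrative only and not drawn from the paper: the keyword rules, the example documents and the function names are hypothetical, and the "learning" step is a deliberately crude one that keeps the most frequent words unique to each category.

```python
from collections import Counter

# Rule-based approach: keyword rules written by hand (hypothetical examples).
MANUAL_RULES = {
    "sports": {"match", "goal", "team"},
    "finance": {"stock", "market", "bank"},
}


def rule_based_classify(text, rules=MANUAL_RULES, default="unknown"):
    """Return the first category whose keyword set overlaps the document."""
    tokens = set(text.lower().split())
    for category, keywords in rules.items():
        if tokens & keywords:
            return category
    return default


def learn_rules(examples, top_k=5):
    """Machine learning approach (crude): derive keyword rules from labeled examples.

    For each category, keep the top_k most frequent words that never occur
    in any other category.
    """
    per_category = {}
    for text, category in examples:
        per_category.setdefault(category, Counter()).update(text.lower().split())

    learned = {}
    for category, counts in per_category.items():
        other_words = set()
        for other, other_counts in per_category.items():
            if other != category:
                other_words.update(other_counts)
        distinctive = [w for w, _ in counts.most_common() if w not in other_words]
        learned[category] = set(distinctive[:top_k])
    return learned


# Toy labeled examples, invented for illustration only.
examples = [
    ("the team scored late", "sports"),
    ("the team won the match", "sports"),
    ("the bank raised interest rates", "finance"),
    ("the stock market fell sharply", "finance"),
]
learned_rules = learn_rules(examples)

print(rule_based_classify("stock prices rose"))                    # hand-written rules
print(rule_based_classify("our team played well", learned_rules))  # learned rules
```

The classifier function is identical in both cases; only the origin of the rules differs. Hand-written rules need an expert to maintain them, while learned rules can be regenerated whenever new labeled examples arrive.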
Empirical Studies On Machine Learning Based Text Classification Algorithms
Advanced Computing: An International Journal, 2011
Automatic classification of text documents has become an important research issue nowadays. Proper classification of text documents requires information retrieval, machine learning and natural language processing (NLP) techniques. Our aim is to focus on important approaches to automatic text classification based on machine learning techniques, viz. supervised, unsupervised and semi-supervised. In this paper we present a review of various text classification approaches under the machine learning paradigm. We expect our research efforts to provide useful insights into the relationships among various text classification techniques as well as to shed light on future research trends in this domain.
Role of machine learning in text classification – An extensive review
International Journal of Advance Research, Ideas and Innovations in Technology, 2021
Cyberspace has elevated business insights and created a virtual space to store all forms of information online. Due to the rapid development of the online world, the usage of digital documents has increased because it is convenient for users to share, update or keep track of records in one place without losing data. However, maintaining massive data does not suit optimal decision-making and is extremely expensive in terms of storage, processing, and collection. There is a high likelihood that human annotators make errors while classifying data because of distraction, monotony, fatigue, and failure to meet the requirements. Once text classification uses machine learning approaches, the process executes with fewer mistakes and more accuracy. The main goal of this review paper is to highlight and explain the role of different machine learning methodologies in text classification. It also describes the challenges faced by other machine learning techniques and text representation. Furthermore, this review paper provides an extensive survey of how various machine learning techniques, such as Neural Networks, Naive Bayes, Logistic Regression, Random Forest, Decision Trees, and Support Vector Machine (SVM), are implemented in text classification.
Text Classification Algorithms: A Survey
In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine learning approaches have achieved surpassing results in natural language processing. The success of these learning algorithms relies on their capacity to understand complex models and non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification is a challenge for researchers. In this paper, a brief overview of text classification algorithms is discussed. This overview covers different text feature extractions, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods. Finally, the limitations of each technique and their application in real-world problems are discussed.