Language-Based Text Categorization: A Survey
2021, Digital Techniques for Heritage Presentation and Preservation
Language-based text classification has attracted sustained interest, particularly in business, tourism, hospitality, international relations, sports, and social media. This chapter surveys techniques for language-based text classification that have been developed for these and other application areas. Classification is performed on both multilingual and monolingual text documents: monolingual documents are classified by the subject of their contents, whereas multilingual documents are classified by language. The main techniques are statistical methods (such as regression models, k-nearest neighbors (KNN), decision trees, and Bayesian methods) and machine learning methods (such as neural networks, support vector machines, and deep learning classifiers). The chapter compares the classification performance of these techniques.
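As an illustration of classifying documents by language with one of the Bayesian methods mentioned above, the following is a minimal sketch only, not a method taken from the chapter. It assumes a multinomial naive Bayes model over character trigrams with add-one smoothing; the class name `NaiveBayesLanguageId`, the helper `char_ngrams`, and the toy training sentences are all hypothetical and chosen purely for demonstration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLanguageId:
    """Hypothetical multinomial naive Bayes language identifier (illustrative only)."""

    def fit(self, samples):
        # samples: iterable of (text, language_label) pairs
        self.counts = defaultdict(Counter)   # per-language trigram counts
        self.doc_counts = Counter()          # per-language document counts
        for text, label in samples:
            self.counts[label].update(char_ngrams(text))
            self.doc_counts[label] += 1
        self.vocab = {g for c in self.counts.values() for g in c}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, grams in self.counts.items():
            total = sum(grams.values())
            # Log prior from the share of training documents in this language.
            score = math.log(self.doc_counts[label] / total_docs)
            for g in char_ngrams(text):
                # Add-one smoothing; the extra +1 reserves mass for unseen trigrams.
                score += math.log((grams[g] + 1) / (total + len(self.vocab) + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (assumed for illustration; far too small for real use).
TRAINING = [
    ("the quick brown fox jumps over the lazy dog", "en"),
    ("this is a short sentence written in english", "en"),
    ("el rápido zorro marrón salta sobre el perro perezoso", "es"),
    ("esta es una oración corta escrita en español", "es"),
    ("der schnelle braune fuchs springt über den faulen hund", "de"),
    ("dies ist ein kurzer satz auf deutsch geschrieben", "de"),
]

model = NaiveBayesLanguageId().fit(TRAINING)
print(model.predict("the dog is lazy"))
```

Character n-grams are used here because they need no tokenizer or dictionary, which keeps the sketch short; a realistic system would train on much larger corpora and would be evaluated against the other classifier families named above.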