Improving sentiment analysis performance on morphologically rich languages: Language and domain independent approach
Related papers
Hybrid Sentiment Analysis Framework for a Morphologically Rich Language
This paper presents the process of building a Sentiment Analysis Framework for Serbian (SAFOS). We created a hybrid method that uses a sentiment lexicon and Serbian WordNet (SWN) synsets annotated with sentiment polarity scores for feature selection. Because stemming in morphologically rich languages (MRLs) can strip or distort the sentiment meaning of words, we instead expanded the sentiment lexicon, as well as the lexicon generated from SWN, with the morphological forms of emotional terms and phrases, using the Serbian Morphological Electronic Dictionaries. We propose a new feature reduction method for document-level sentiment polarity classification with maximum entropy modeling. It maps a large number of related feature candidates (sentiment words, phrases, and their inflectional forms) to a few concepts, which are then used as features. Testing was performed with 10-fold cross-validation and on test sets containing news and movie reviews. In all experiments, sentiment feature mapping for feature set reduction achieved better results than the basic feature set. For both test sets, the best classification accuracy was achieved by the combination of unigram and bigram features reduced by sentiment feature mapping (78.3% for movie reviews and 79.2% for the news test set). In 10-fold cross-validation, the best average accuracy of 95.6% was obtained using unigram features reduced by the mapping procedure.
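The mapping idea described in this abstract can be illustrated with a minimal Python sketch. It is not the SAFOS implementation: the lexicon `CONCEPT_OF`, the concept tokens `POS` and `NEG`, and the function names are hypothetical, and the English entries are stand-ins for the inflected Serbian forms that the paper draws from morphological dictionaries. The sketch only shows how many surface forms can collapse into a few concept features before unigram and bigram extraction.

```python
import re

# Hypothetical toy lexicon: surface form -> concept token.
# Illustrative English stand-ins; the paper uses Serbian inflectional forms.
CONCEPT_OF = {
    "good": "POS", "better": "POS", "best": "POS",
    "bad": "NEG", "worse": "NEG", "worst": "NEG",
}


def map_sentiment_features(text):
    """Lowercase and tokenize, replacing lexicon words with their concept token."""
    tokens = re.findall(r"\w+", text.lower())
    return [CONCEPT_OF.get(tok, tok) for tok in tokens]


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text):
    """Unigram plus bigram features computed after concept mapping."""
    tokens = map_sentiment_features(text)
    return tokens + ngrams(tokens, 2)


if __name__ == "__main__":
    print(extract_features("The best film, not the worst"))
```

Under these assumptions, "good", "better", and "best" all yield the single feature `POS`, so the feature space shrinks while the polarity signal is kept. The resulting feature lists could then be passed to any classifier, such as a maximum entropy model.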
Sentiment Classification of Documents in Serbian: The Effects of Morphological Normalization
Proceedings of the 24th Telecommunications forum (TELFOR 2016), 2016
Sentiment classification of texts written in Serbian is still an under-researched topic. One open issue is how different forms of morphological normalization affect the performance of different sentiment classifiers, and which normalization procedure is optimal for this task. In this paper we assess and compare the impact of lemmatizers and stemmers for Serbian on classifiers trained and evaluated on the Serbian Movie Review Dataset.
2017
Sentiment analysis and opinion mining have become emerging topics of research in recent years, but most of the work is focused on data in the English language. Comprehensive research and analysis that considers multiple languages, machine translation techniques, and different classifiers is essential. This paper presents a comparative analysis of different approaches for multilingual sentiment analysis. These approaches are divided into two groups: one classifies text without language translation, and the other translates the testing data to a target language, such as English, before classification. The presented research and results are useful for understanding whether machine translation should be used for multilingual sentiment analysis or whether building language-specific sentiment classification systems is a better approach. The effect...
The Impact of NLP on Turkish Sentiment Analysis
2014
Sentiment analysis on English texts is a highly popular and well-studied topic. On the other hand, the research in this field for morphologically rich languages is still in its infancy. Turkish is an agglutinative language with a very rich morphological structure. For the first time in the literature, this paper investigates and reports the impact of the natural language preprocessing layers on the sentiment analysis of Turkish social media texts. The experiments show that the sentiment analysis performance may be improved by nearly 5 percentage points yielding a success ratio of 78.83% on the used data set.
Domain-neutral, Linguistically-motivated Sentiment Analysis: a performance evaluation
Within the field of sentiment analysis, the assertion has become commonplace that successful results depend to a large extent on developing systems specifically designed for a particular subject domain. In this paper we challenge this view by evaluating a domain-independent sentiment analysis system against a multiple-domain opinion corpus. The results show that high performance can be achieved by relying entirely on high-quality, manually acquired linguistic knowledge.