Multilingual Sentiment Analysis: State of the Art and Independent Comparison of Techniques

Erratum to: Multilingual Sentiment Analysis: State of the Art and Independent Comparison of Techniques

Cognitive Computation, 2016

With the advent of the internet, people actively express their opinions about products, services, events, political parties, etc., in social media, blogs, and website comments. The amount of research work on sentiment analysis is growing explosively. However, the majority of research efforts are devoted to English language data, while a great share of information is available in other languages. We present a state-of-the-art review on multilingual sentiment analysis. More importantly, we compare our own implementation of existing state-of-the-art approaches on common data. Precision observed in our experiments is typically lower than that reported by the original authors, which we attribute to lack of detail in the original presentation of those approaches. Thus, we compare the existing works by what they really offer to the reader, including whether they allow for accurate implementation and for reliable reproduction of the reported results.

Multilingual Sentiment Analysis

2020

Sentiment analysis has empowered researchers and analysts to extract people's opinions regarding various products, services, events and other entities. This has been made possible by an astronomical rise in the amount of text data available on the Internet, not only in English but also in many regional languages around the world, along with recent advances in machine learning and deep learning. Deep learning models have been observed to produce state-of-the-art prediction results without the need for domain expertise or handcrafted feature engineering, unlike traditional machine learning algorithms. In this chapter, we focus on sentiment analysis of English and of various low-resource languages that have limited sentiment analysis resources, such as annotated datasets, word embeddings and sentiment lexicons. Techniques to refine word embeddings for sentiment analysis and improve word embedding coverage in low resour...

A Literature Survey on Multilingual Sentiment Analysis

Sentiment analysis, which often goes by the name opinion mining, is a prominent field in which a great deal of research is under way owing to its many applications, such as social media monitoring and product reviews. With the widespread use of social media, multilingual statements have become common, as users tend to write in whatever way is most comfortable for them. Such statements arise when more than one language is used to make a single statement. Because they lack a clear grammatical structure, it is very difficult to determine their sentiment correctly. We present some techniques that can be used to analyse these multilingual statements correctly.

A comparative study of machine translation for multilingual sentence-level sentiment analysis

Information Sciences, 2019

Sentiment analysis has become a key tool for several social media applications, including analysis of users' opinions about products and services, support for political campaigns, and even tracking of market trends. Existing sentiment analysis methods explore different techniques, usually relying on lexical resources or learning approaches. Despite the significant interest in this theme and the amount of research effort in the field, almost all existing methods are designed to work only with English content. Most current strategies for other languages consist of adapting existing lexical resources, without presenting proper validation or basic baseline comparisons. In this work, we take a different step in this field. We focus on evaluating existing language-specific sentiment analysis efforts against a simple yet effective baseline approach. To do so, we evaluated sixteen methods for sentence-level sentiment analysis proposed for English, comparing them with three language-specific methods. Based on fourteen human-labeled language-specific datasets, we provide an extensive quantitative analysis of existing multi-language approaches. Our primary results suggest that simply translating the input text from a specific language to English and then using one of the best existing methods developed for English can be better than the existing language-specific efforts evaluated. We also rank methods according to their prediction performance and identify the methods that achieved the best results using machine translation across different languages. As a final contribution to the research community, we release our code, datasets, and the iFeel 3.0 system, a web framework for multilingual sentence-level sentiment analysis. We hope our system sets a new baseline for future sentence-level methods developed for a wide set of languages.
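The translate-then-classify baseline described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code or the iFeel system: the word-by-word `toy_translate` function, the `TOY_TRANSLATIONS` table, and the `TOY_LEXICON` scores are all hypothetical stand-ins for a real machine translation system and a real English sentiment method.

```python
from typing import Callable, Dict

# Hypothetical toy resources, for illustration only.
TOY_TRANSLATIONS: Dict[str, Dict[str, str]] = {
    "pt": {"eu": "i", "amo": "love", "odeio": "hate",
           "este": "this", "filme": "movie"},
}
TOY_LEXICON: Dict[str, int] = {"love": 1, "great": 1, "hate": -1, "terrible": -1}


def toy_translate(text: str, source_lang: str) -> str:
    """Word-by-word stand-in for a real machine translation system."""
    table = TOY_TRANSLATIONS.get(source_lang, {})
    # Unknown words are passed through unchanged.
    return " ".join(table.get(tok, tok) for tok in text.lower().split())


def english_lexicon_classifier(text: str) -> str:
    """Stand-in for an English-only sentence-level sentiment method."""
    score = sum(TOY_LEXICON.get(tok, 0) for tok in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def classify_via_translation(
    text: str,
    source_lang: str,
    translate: Callable[[str, str], str] = toy_translate,
    classify: Callable[[str], str] = english_lexicon_classifier,
) -> str:
    """Translate the input to English (if needed), then apply an English method."""
    english = text if source_lang == "en" else translate(text, source_lang)
    return classify(english)


if __name__ == "__main__":
    print(classify_via_translation("Eu amo este filme", "pt"))
    print(classify_via_translation("Eu odeio este filme", "pt"))
```

Because the translator and classifier are passed in as plain callables, any translation service or English sentiment method could be swapped in without changing the pipeline itself.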

Sentiment Analysis with a Multilingual Pipeline

Lecture Notes in Computer Science, 2011

Sentiment analysis refers to retrieving an author's sentiment from a text. We analyze the differences that occur in sentiment scoring across languages. We present our experiments for the Dutch and English languages based on forum, blog, news and social media texts available on the Web, focusing on differences in language use and on the effect of a language's grammar on sentiment analysis. We propose a multilingual pipeline for evaluating how an author's sentiment is conveyed in different languages. We succeed in correctly classifying positive and negative texts with an accuracy of approximately 71% for English and 79% for Dutch. The evaluation of the results shows, however, that the use of common expressions, emoticons, slang, irony, sarcasm, cynicism, acronyms and different ways of negation in English prevents the underlying sentiment scores from being directly comparable.

The Challenges of Multi-dimensional Sentiment Analysis Across Languages

2016

This paper outlines a pilot study on multi-dimensional and multilingual sentiment analysis of social media content. We use parallel corpora of movie subtitles as a proxy for colloquial language in social media channels and a multilingual emotion lexicon for fine-grained sentiment analyses. Parallel data sets make it possible to study the preservation of sentiments and emotions in translation and our assessment reveals that the lexical approach shows great inter-language agreement. However, our manual evaluation also suggests that the use of purely lexical methods is limited and further studies are necessary to pinpoint the cross-lingual differences and to develop better sentiment classifiers.
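One simple way to quantify inter-language agreement of a lexical approach on parallel data is sketched below. The abstract does not state which agreement measure was used, so the exact-match fraction here is an assumption, and the tiny `TOY_LEXICON` and sentence pairs are hypothetical examples rather than the study's resources.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical miniature multilingual emotion lexicon, for illustration only.
TOY_LEXICON: Dict[str, Dict[str, str]] = {
    "en": {"happy": "joy", "afraid": "fear"},
    "de": {"glücklich": "joy", "angst": "fear"},
}


def emotions(sentence: str, lang: str) -> FrozenSet[str]:
    """Return the set of emotions whose lexicon entries occur in the sentence."""
    lexicon = TOY_LEXICON[lang]
    return frozenset(lexicon[tok] for tok in sentence.lower().split() if tok in lexicon)


def agreement(pairs: List[Tuple[str, str]], lang_a: str, lang_b: str) -> float:
    """Fraction of aligned sentence pairs that receive identical emotion sets."""
    if not pairs:
        raise ValueError("at least one sentence pair is required")
    matches = sum(emotions(a, lang_a) == emotions(b, lang_b) for a, b in pairs)
    return matches / len(pairs)


if __name__ == "__main__":
    parallel = [
        ("I am happy", "Ich bin glücklich"),
        ("I am afraid", "Ich habe Angst"),
        ("I am happy", "Ich habe Angst"),  # deliberately mismatched pair
    ]
    print(agreement(parallel, "en", "de"))
```

A purely lexical score like this only checks whether matching lexicon entries appear on both sides, which is consistent with the abstract's caution that lexical methods alone are limited.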

Multilingual Sentiment Analysis: A Systematic Literature Review

Pertanika Journal of Science and Technology

With the explosive growth of social media, the online community can freely express opinions without disclosing their identities. People with hidden agendas can easily post fake opinions to discredit target products, services, politicians, or organizations. With such big data, monitoring opinions and distilling their sentiments remain a formidable task because of the proliferation of diverse sites carrying a large volume of opinions expressed in multiple languages. Therefore, this paper aims to provide a systematic literature review on multilingual sentiment analysis, which summarises the common languages supported in multilingual sentiment analysis, pre-processing techniques, existing sentiment analysis approaches, and evaluation models that have been used for multilingual sentiment analysis. The findings of the systematic literature review revealed that most of the models supported two languages, and English is seen as the most used language in sentiment analysis studi...

A Survey of Cross-lingual Sentiment Analysis: Methodologies, Models and Evaluations

Data Science and Engineering

Cross-lingual sentiment analysis (CLSA) leverages one or several source languages to help low-resource languages perform sentiment analysis. The lack of annotated corpora in many non-English languages can thereby be alleviated. With the development of economic globalization, CLSA has attracted much attention in the field of sentiment analysis, and the last decade has seen a surge of research in this area. Numerous methods, datasets and evaluation metrics have been proposed in the literature, raising the need for a comprehensive and updated survey. This paper fills the gap by reviewing the state-of-the-art CLSA approaches from 2004 to the present. This paper teases out the research context of cross-lingual sentiment analysis and elaborates the following methods in detail: (1) The early main methods of CLSA, including those based on Machine Translation and its improved variants, parallel corpora or bilingual sentiment lexicon; (2) CLSA based on cross-lingua...

Sentiment Analysis: Comparative Analysis Of Multilingual Sentiment And Opinion Classification Techniques

2017

Sentiment analysis and opinion mining have become emerging topics of research in recent years, but most of the work is focused on data in the English language. Comprehensive research and analysis that considers multiple languages, machine translation techniques, and different classifiers is essential. This paper presents a comparative analysis of different approaches for multilingual sentiment analysis. These approaches are divided into two parts: one classifies text without language translation, and the other translates the testing data to a target language, such as English, before classification. The presented research and results are useful for understanding whether machine translation should be used for multilingual sentiment analysis or whether building language-specific sentiment classification systems is a better approach. The effect...

A Review on Multi-Lingual Sentiment Analysis by Machine Learning Methods

Journal of Engineering Science and Technology Review, 2020

The arrival of e-commerce and the multitude of information presented by the web have established the internet as a principal destination for consumers looking for truthful opinions and multiple viewpoints on a product, news item, topic, or market trend. Thus, it is desirable to make this search easier by using systems that sift through the mass of data and summarize the available opinions so that the seeker can understand them easily. This task, known as sentiment analysis, is currently a prominent area of research. Sentiment analysis can be useful for businesses, data analysts and data scientists, as well as customers. Even though many methods are designed to perform this task on English data, there is a lack of systems that can analyze data in other languages. This paper attempts to provide a detailed study of the sentiment analysis methods applied to languages other than English. The tools used, the pros and cons, and the efficiency of all methods are covered. The associated challenges are also discussed. The paper covers methods that analyze translated data as well as methods that analyze available data in the target language.