Sentiment Analysis with a Multilingual Pipeline
Related papers
Experiments in Cross-Lingual Sentiment Analysis in Discussion Forums
Lecture Notes in Computer Science, 2012
One of the objectives of sentiment analysis is to classify the polarity of conveyed opinions from the perspective of textual evidence. Most of the work in the field has been applied to the English language, and only a few experiments have explored other languages. In this paper, we present a supervised classification of posts in French online forums where sentiment analysis is based on shallow linguistic features such as POS tagging, chunking and common negation forms. Furthermore, we incorporate word semantic orientation extracted from the English lexical resource SentiWordNet as an additional feature. Since SentiWordNet is an English resource, lexical entries in the studied French corpus must be translated into English. For this purpose, we propose a number of French-to-English translation experiments such as machine translation and WordNet synset translation using EuroWordNet. The results show that WordNet synset translation did not significantly improve the classification performance with respect to the bag-of-words baseline, owing to its limited coverage. Machine translation did not significantly improve the results either, owing to its insufficient quality. Proposals for improving the classification performance are given at the end of the article.
Multilingual Sentiment Analysis: State of the Art and Independent Comparison of Techniques
Cognitive Computation, 2016
With the advent of the Internet, people actively express their opinions about products, services, events, political parties, etc., in social media, blogs, and website comments. The amount of research work on sentiment analysis is growing explosively. However, the majority of research efforts are devoted to English-language data, while a great share of information is available in other languages. We present a state-of-the-art review on multilingual sentiment analysis. More importantly, we compare our own implementation of existing approaches on common data. Precision observed in our experiments is typically lower than that reported by the original authors, which we attribute to the lack of detail in the original presentation of those approaches. Thus, we compare the existing works by what they really offer to the reader, including whether they allow for accurate implementation and for reliable reproduction of the reported results.
A machine learning approach to sentiment analysis in multilingual Web texts
Information Retrieval, 2009
Sentiment analysis, also called opinion mining, is a form of information extraction from text of growing research and commercial interest. In this paper we present our machine learning experiments with regard to sentiment analysis in blog, review and forum texts found on the World Wide Web and written in English, Dutch and French. We train from a set of example sentences or statements that are manually annotated as positive, negative or neutral with regard to a certain entity. We are interested in the feelings that people express with regard to certain consumption products. We learn and evaluate several classification models that can be configured in a cascaded pipeline. We have to deal with several problems, namely the noisy character of the input texts, the attribution of the sentiment to a particular entity and the small size of the training set. We succeed in identifying positive, negative and neutral feelings towards the entity under consideration with ca. 83% accuracy for English texts based on unigram features augmented with linguistic features. The accuracy results for the Dutch and French texts are ca. 70% and 68% respectively, owing to the larger variety of linguistic expressions that more often diverge from standard language and thus demand more training patterns. In addition, our experiments give us insights into the portability of the learned models across domains and languages. A substantial part of the article investigates the role of active learning techniques for reducing the number of examples to be manually annotated.
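The cascaded pipeline mentioned in this abstract can be illustrated with a minimal sketch. It assumes a two-stage ordering (neutral filter first, then positive versus negative), which the abstract does not specify, and it replaces the trained classifiers with hypothetical toy word lists so that the example stays self-contained.

```python
from typing import List

# Hypothetical toy lexicons standing in for trained classifiers (assumption).
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def tokens(text: str) -> List[str]:
    """Lowercase the text and strip simple punctuation from each word."""
    return [w.strip(".,!?").lower() for w in text.split()]


def is_subjective(text: str) -> bool:
    """Stage 1: separate neutral statements from opinionated ones."""
    return any(w in POSITIVE or w in NEGATIVE for w in tokens(text))


def polarity(text: str) -> str:
    """Stage 2: decide positive vs. negative for opinionated statements."""
    words = tokens(text)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return "positive" if pos >= neg else "negative"


def cascade(text: str) -> str:
    """Run the stages in order; later stages only see what earlier ones pass on."""
    if not is_subjective(text):
        return "neutral"
    return polarity(text)
```

In a cascade of this shape, each stage can be a different model, and a cheap first stage keeps most neutral sentences away from the more expensive polarity stage.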
MuSES: a Multilingual Sentiment Elicitation System for Social Media Data
IEEE Intelligent Systems, 2000
We develop MuSES, a multilingual sentiment identification system, which implements three different sentiment identification algorithms. Our first algorithm augments previous compositional semantic rules by adding social media-specific rules. In the second algorithm, we define a scoring function that measures the degree of a sentiment, instead of simply classifying a sentiment into binary polarities. All such scores are calculated based on a large volume of customer reviews. Due to the special characteristics of social media texts, we propose a third algorithm, which takes emoticons, negation word position, and domain-specific words into account. In addition, we propose a label-free process to transfer multilingual sentiment knowledge between different languages. We conduct our experiments on user comments from Facebook, tweets from Twitter, and multilingual product reviews from Amazon.
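As a rough illustration of a score that reflects the degree of sentiment while taking emoticons and negation word position into account, here is a minimal sketch. It is not the MuSES algorithm: the lexicon, the emoticon table, the negation list and the two-token negation window are all hypothetical values chosen for the example.

```python
# All tables and the window size below are illustrative assumptions.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0}
EMOTICONS = {":)": 1.0, ":(": -1.0}
NEGATIONS = {"not", "never", "no"}
NEGATION_WINDOW = 2  # a negation flips sentiment words at most 2 tokens later


def score(text: str) -> float:
    """Return a signed sentiment degree instead of a binary label."""
    total = 0.0
    last_negation = None  # index of the most recent negation word, if any
    for i, tok in enumerate(text.lower().split()):
        if tok in NEGATIONS:
            last_negation = i
            continue
        if tok in EMOTICONS:
            # Emoticons contribute directly and are not affected by negation.
            total += EMOTICONS[tok]
            continue
        word = tok.strip(".,!?")
        if word in LEXICON:
            value = LEXICON[word]
            # Flip the value only if a negation occurred close enough before it.
            if last_negation is not None and i - last_negation <= NEGATION_WINDOW:
                value = -value
            total += value
    return total
```

For example, `score("not good")` flips the value of "good", whereas in "never mind , the screen is great" the negation is too far from "great" to affect it.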
A comparative study of machine translation for multilingual sentence-level sentiment analysis
Information Sciences, 2019
Sentiment analysis has become a key tool for several social media applications, including analysis of users' opinions about products and services, support for politics during campaigns and even for market trending. Multiple existing sentiment analysis methods explore different techniques, usually relying on lexical resources or learning approaches. Despite the significant interest in this theme and the amount of research effort in the field, almost all existing methods are designed to work only with English content. Most current strategies in many languages consist of adapting existing lexical resources, without presenting proper validations and basic baseline comparisons. In this work, we take a different step into this field. We focus on evaluating existing efforts proposed to do language-specific sentiment analysis with a simple yet effective baseline approach. To do so, we evaluated sixteen methods for sentence-level sentiment analysis proposed for English, comparing them with three language-specific methods. Based on fourteen human-labeled language-specific datasets, we provide an extensive quantitative analysis of existing multi-language approaches. Our primary results suggest that simply translating the input text in a specific language to English and then using one of the best existing methods developed for English can be better than the existing language-specific efforts evaluated. We also rank methods according to their prediction performance and identify the methods that achieved the best results using machine translation across different languages. As a final contribution to the research community, we release our code, datasets, and the iFeel 3.0 system, a web framework for multilingual sentence-level sentiment analysis. We hope our system sets up a new baseline for future sentence-level methods developed in a wide set of languages.
2017
Sentiment analysis and opinion mining have become emerging topics of research in recent years, but most of the work is focused on data in the English language. Comprehensive research and analysis that considers multiple languages, machine translation techniques, and different classifiers is essential. This paper presents a comparative analysis of different approaches for multilingual sentiment analysis. These approaches are divided into two parts: one using classification of text without language translation and the second using the translation of testing data to a target language, such as English, before classification. The presented research and results are useful for understanding whether machine translation should be used for multilingual sentiment analysis or whether building language-specific sentiment classification systems is a better approach. The effect...
Multilingual Sentiment Analysis
2020
Sentiment analysis has empowered researchers and analysts to extract opinions of people regarding various products, services, events and other entities. This has been made possible by an astronomical rise in the amount of text data being made available on the Internet, not only in English but also in many regional languages around the world, along with the recent advancements in the field of machine learning and deep learning. It has been observed that deep learning models produce state-of-the-art prediction results without the need for domain expertise or handcrafted feature engineering, unlike traditional machine learning-based algorithms. In this chapter, we wish to focus on sentiment analysis of various low-resource languages having limited sentiment analysis resources such as annotated datasets, word embeddings and sentiment lexicons, along with English. Techniques to refine word embeddings for sentiment analysis and improve word embedding coverage in low resour...
Towards Cross-Language Sentiment Analysis through Universal Star Ratings
Advances in Intelligent Systems and Computing, 2013
The abundance of sentiment-carrying user-generated content renders automated cross-language information monitoring tools crucial for today's businesses. In order to facilitate cross-language sentiment analysis, we propose to compare the sentiment conveyed by unstructured text across languages through universal star ratings for intended sentiment. We demonstrate that the way natural language reveals people's intended sentiment differs across languages. The results of our experiments with respect to modeling this relation for both Dutch and English by means of a monotone increasing step function mainly suggest that language-specific sentiment scores can separate universal classes of intended sentiment from one another to a limited extent.
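The idea of mapping language-specific sentiment scores to universal star ratings through a monotone increasing step function can be sketched as follows. The cut-points are hypothetical and only illustrate that the same raw score may land in different star classes depending on the language; they are not values from the paper.

```python
from bisect import bisect_right

# Hypothetical, sorted cut-points on a sentiment score in [-1, 1] (assumption).
# Each language gets its own thresholds; four cut-points yield five star classes.
THRESHOLDS = {
    "en": [-0.6, -0.2, 0.2, 0.6],
    "nl": [-0.5, -0.1, 0.3, 0.7],
}


def stars(score: float, lang: str) -> int:
    """Map a language-specific score to a 1-5 star rating.

    bisect_right counts how many cut-points are <= score, so the result
    never decreases as the score increases (a monotone step function).
    """
    return 1 + bisect_right(THRESHOLDS[lang], score)
```

With these illustrative thresholds, a score of 0.25 maps to 4 stars for English but 3 stars for Dutch, which is the kind of cross-language difference the step function is meant to absorb.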
Development of Multilingual Social Media Data Corpus for Sentiment Classification
2019
The purpose of this study is to manually annotate a corpus for Bahasa Indonesia and Bahasa Melayu. Corpora for both languages have been built by many researchers before, but this research focuses only on words that are shared by both vocabularies but have very different meanings. The data were obtained from social media, so informal words were found. As many as 2100 words were identified for each language and then randomly sampled, so that 300 words with the same form but different meanings were used. The objective of this study was to confirm that this condition can influence sentiment polarity results. At the end of this paper, we show how the conditions of the two languages influence sentiment polarity. From the manual annotation, an annotation agreement test was carried out by three Bahasa Indonesia annotators and three Bahasa Melayu annotators. The annotation found that 63 of the 300 words have different polarity. The agreement scores among annotators for each language show that there is good agreement among the annotators during the annotation process.
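The abstract does not say which agreement statistic was used. One common choice for a fixed number of annotators, such as the three per language here, is Fleiss' kappa, so the sketch below assumes that measure. The function name and the input layout (one list of labels per annotated word) are illustrative.

```python
from collections import Counter
from typing import List


def fleiss_kappa(ratings: List[List[str]]) -> float:
    """Fleiss' kappa for items that were each labelled by the same number of raters.

    ratings[i] holds the labels given to item i, e.g. ["pos", "pos", "neg"].
    Raises ZeroDivisionError if every rating uses one single category.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Expected agreement by chance, from the overall category proportions.
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p ** 2 for p in proportions)

    return (p_observed - p_expected) / (1 - p_expected)
```

A value of 1 indicates perfect agreement, and values near 0 indicate agreement no better than chance.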