Building a fine-grained subjectivity lexicon from a web corpus

A bootstrapping method for building subjectivity lexicons for languages with scarce resources

Proceedings of the Sixth International …, 2008

This paper introduces a method for creating a subjectivity lexicon for languages with scarce resources. The method is able to build a subjectivity lexicon by using a small seed set of subjective words, an online dictionary, and a small raw corpus, coupled with a bootstrapping process that ranks new candidate words based on a similarity measure. Experiments performed with a rule-based sentence level subjectivity classifier show an 18% absolute improvement in F-measure as compared to previously proposed semi-supervised methods.
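The bootstrapping loop described above can be sketched roughly as follows. This is not the paper's implementation, and several parts are assumptions made for illustration:

- the toy dictionary,
- the window-based co-occurrence cosine similarity, which stands in for the paper's similarity measure,
- the parameters `top_k` and `threshold`.

Each round collects candidates from the dictionary entries of words already in the lexicon. It ranks them by their best similarity to the current lexicon and adds only the top-ranked ones.

```python
from collections import Counter
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            context = vectors.setdefault(word, Counter())
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    context[tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bootstrap(seeds, dictionary, sentences, iterations=2, top_k=1, threshold=0.1):
    """Grow a lexicon from `seeds`, adding the best-ranked dictionary candidates each round."""
    vectors = cooccurrence_vectors(sentences)
    lexicon = set(seeds)
    for _ in range(iterations):
        # Candidates are words listed in the dictionary entries of current lexicon words.
        candidates = {c for w in lexicon for c in dictionary.get(w, []) if c not in lexicon}
        scored = []
        for cand in candidates:
            if cand not in vectors:
                continue  # no corpus evidence, so the candidate cannot be ranked
            score = max((cosine(vectors[cand], vectors[w]) for w in lexicon if w in vectors),
                        default=0.0)
            scored.append((score, cand))
        # Highest similarity first; ties broken alphabetically for reproducibility.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        lexicon.update(cand for score, cand in scored[:top_k] if score >= threshold)
    return lexicon


# Hypothetical toy resources.
DICTIONARY = {"good": ["great", "fine", "table"], "great": ["excellent"]}
CORPUS = [
    "the movie was good and fun",
    "the movie was great and fun",
    "the film was excellent and fun",
    "the table is wooden and brown",
]
print(sorted(bootstrap({"good"}, DICTIONARY, CORPUS)))  # ['excellent', 'good', 'great']
```

Adding only a few candidates per round keeps noisy words such as "table" from entering early and then pulling in further off-topic candidates.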

Subjectivity Identification Through Lexical Rules

SN Computer Science, 2022

We can divide a discourse into utterances carrying two types of information: subjective and objective. Objective utterances are factual in nature and have a truth value that can be validated against a fact. Subjective utterances, on the other hand, convey the emotional state or opinion of a speaker more than factual information. Languages use different devices to express subjectivity. In this paper, we present a comparative analysis of the linguistic devices that languages use to express subjectivity. To that end, we analyze subjective constructions in five Indian languages, formalize lexical rules for subjective constructions, and implement a lexical rule-based FST for subjectivity identification. We evaluate the FST on test data from the entertainment, lifestyle, politics, sports, and technology domains. Our system achieves 91% accuracy in the politics domain and about 84% accuracy on average.
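Regular expressions describe finite-state languages, so a regex over word-class tags is one lightweight way to prototype lexical rules before compiling them into a transducer. The sketch below is a finite-state acceptor built with Python's `re`, not a transducer. The word classes (`INT`, `ADJ`, `OPV`, `PRO1`) and both rules are invented English examples and do not come from the paper.

```python
import re

# Hypothetical English word classes; the paper's rules cover five Indian languages.
WORD_CLASS = {
    "very": "INT", "really": "INT", "so": "INT",
    "good": "ADJ", "bad": "ADJ", "beautiful": "ADJ",
    "think": "OPV", "believe": "OPV", "feel": "OPV",
    "i": "PRO1",
}

# Hypothetical lexical rules, written as regular expressions over the tag sequence.
RULES = {
    "intensifier+adjective": re.compile(r"\bINT ADJ\b"),
    "first-person+opinion-verb": re.compile(r"\bPRO1 OPV\b"),
}


def tag(sentence):
    """Replace each word by its class, or 'O' if the word is not in the lexicon."""
    return " ".join(WORD_CLASS.get(w, "O") for w in re.findall(r"\w+", sentence.lower()))


def matched_rules(sentence):
    """Names of the rules that fire on the sentence; a non-empty result means subjective."""
    tags = tag(sentence)
    return [name for name, rule in RULES.items() if rule.search(tags)]


print(matched_rules("The film was really good."))  # ['intensifier+adjective']
print(matched_rules("The match starts at 5 pm."))  # []
```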

Learning Subjective Language

Computational Linguistics, 2004

Subjectivity in natural language refers to aspects of language used to express opinions, evaluations, and speculations. There are numerous NLP applications for which subjectivity analysis is relevant, including information extraction and text categorization. The goal of this work is learning subjective language from corpora. The first part of the paper explores annotating subjectivity at different levels (expression, sentence, document) and producing annotated corpora. In the second part of the paper, clues of subjectivity are generated and tested, including low-frequency words, collocations, and adjectives and verbs identified using distributional similarity. The third part of the paper examines the features working together in concert. The features, generated from different datasets using different procedures, exhibit consistency in performance in that they all do better and worse on the same datasets. In addition, we show that the density of subjectivity clues in the surrounding context strongly affects how likely it is that a word is subjective, and give the results of an annotation study assessing the subjectivity of sentences with high-density features. Finally, the clues are used to perform opinion-piece recognition (a type of text categorization and genre detection), to demonstrate the utility of the knowledge acquired in this paper.
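A small helper can illustrate what "density of subjectivity clues in the surrounding context" might mean. The window size, the normalisation by neighbour count and the toy clue set are all assumptions and do not reproduce the paper's exact definition.

```python
def clue_density(tokens, clues, window=5):
    """For each position, the fraction of the other tokens within `window` positions that are clues."""
    densities = []
    for i in range(len(tokens)):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        neighbours = [tokens[j] for j in range(lo, hi) if j != i]
        hits = sum(1 for t in neighbours if t in clues)
        densities.append(hits / len(neighbours) if neighbours else 0.0)
    return densities


tokens = ["this", "is", "a", "truly", "awful", "and", "absurd", "plan"]
clues = {"truly", "awful", "absurd"}  # hypothetical clue set
print(clue_density(tokens, clues, window=1))
```

A word whose density exceeds some chosen threshold could then be treated as more likely to be used subjectively.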

Computing with Subjectivity Lexicons

2020

In this paper, we introduce a new set of lexicons for expressing subjectivity in text documents written in Brazilian Portuguese. Besides targeting a language other than English, and in contrast to other available subjectivity lexicons, these lexicons represent different subjectivity dimensions (other than sentiment) and are more compact in number of terms. This compactness was designed intentionally to leverage the power of word embedding techniques: with the words mapped to an embedding space and appropriate distance measures, we can easily capture words that are semantically related to the ones in the lexicons. Thus, we do not need to build comprehensive vocabularies and can focus on the most representative words for each lexicon dimension. We showcase the use of these lexicons in three highly non-trivial tasks: (1) Automated Essay Scoring in the Presence of Biased Ratings, (2) Subjectivity Bias in Brazilian Presidential Elections and (3) Fake News Classification Based on Text Subjectivity. All these...
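The idea of pairing a compact lexicon with an embedding space can be sketched as a nearest-neighbour expansion. Everything in the sketch below is an assumption made for illustration, not the authors' procedure:

- the three-dimensional vectors,
- the cosine similarity,
- the `threshold` value.

Real embeddings would come from a trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def expand_lexicon(lexicon, embeddings, threshold=0.8):
    """Return vocabulary words (with their best score) whose vector is close to some lexicon term."""
    expanded = {}
    for word, vec in embeddings.items():
        if word in lexicon:
            continue
        best = max((cosine(vec, embeddings[t]) for t in lexicon if t in embeddings), default=0.0)
        if best >= threshold:
            expanded[word] = best
    return expanded


# Hypothetical toy embeddings for a few Portuguese words.
EMBEDDINGS = {
    "ótimo": [0.9, 0.1, 0.0],
    "excelente": [0.85, 0.15, 0.05],
    "péssimo": [-0.9, 0.1, 0.0],
    "mesa": [0.0, 0.1, 0.95],
}
print(sorted(expand_lexicon({"ótimo"}, EMBEDDINGS)))  # ['excelente']
```

In real embedding spaces, antonyms often sit close together because they occur in similar contexts. An expansion like this may therefore need a polarity check on top of the distance threshold.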

Linguistic Features for Subjectivity Classification

2012 International Conference on Asian Language Processing, 2012

Opinions are subjective expressions that describe people's viewpoints, perspectives or feelings about entities, events and their properties. Detecting subjective expressions is the task of identifying whether a given text is subjective (i.e., it expresses an opinion) or objective (i.e., it reports a fact). This task is considered the first problem in opinion mining and sentiment analysis and is very important for both; it now attracts many researchers because of its wide applicability. Improvements in subjectivity classification positively impact the performance of a sentiment analysis system. Features play the most important role in accurately detecting subjective sentences. In this paper, we enrich the feature set using syntactic information of the text. Based on our observations of opinion evidence in texts, we propose syntax-based patterns for extracting rich linguistic features. Combining these new features with conventional features from previous studies, we obtain a high accuracy (about 92.1%) in detecting subjective sentences on the Movie Review data.
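One way to picture the combination of syntax-based patterns with conventional features is a feature extractor over POS-tagged tokens. The Penn Treebank tag patterns below are hypothetical stand-ins because the paper's own patterns are not given here. The input is assumed to be tagged already.

```python
from collections import Counter

# Hypothetical POS patterns (Penn Treebank tags), not the paper's actual patterns.
PATTERNS = [("RB", "JJ"), ("JJ", "NN"), ("PRP", "VBP", "JJ")]


def extract_features(tagged):
    """tagged: list of (word, pos) pairs. Returns a Counter of feature counts."""
    # Conventional bag-of-words features.
    feats = Counter(f"uni={w.lower()}" for w, _ in tagged)
    tags = [p for _, p in tagged]
    # Pattern features: record the words that fill each matching tag sequence.
    for pat in PATTERNS:
        n = len(pat)
        for i in range(len(tags) - n + 1):
            if tuple(tags[i:i + n]) == pat:
                words = "_".join(w.lower() for w, _ in tagged[i:i + n])
                feats[f"pat={'-'.join(pat)}:{words}"] += 1
    return feats


sentence = [("The", "DT"), ("plot", "NN"), ("is", "VBZ"), ("really", "RB"), ("dull", "JJ")]
print(extract_features(sentence)["pat=RB-JJ:really_dull"])  # 1
```

The resulting counts could be fed to any standard classifier alongside other conventional features.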

Building Subjectivity Lexicon(s) from Scratch for Essay Data

Lecture Notes in Computer Science, 2012

While there are a number of subjectivity lexicons available for research purposes, none can be used commercially. We describe the process of constructing subjectivity lexicon(s) for recognizing sentiment polarity in essays written by test-takers, to be used within a commercial essay-scoring system. We discuss ways of expanding a manually built seed lexicon using dictionary-based information and distributional in-domain and out-of-domain information, as well as using Amazon Mechanical Turk to help "clean up" the expansions. We show the feasibility of constructing a family of subjectivity lexicons from scratch using a combination of methods that attains performance competitive with state-of-the-art research-only lexicons. Furthermore, this is, to our knowledge, the first use of a paraphrase generation system for expanding a subjectivity lexicon.
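The crowdsourced clean-up step could be approximated by a majority vote over annotator labels for each candidate word. The label set (`pos`, `neg`, `neutral`), `min_votes` and `min_agreement` are assumptions for illustration. The paper's actual Mechanical Turk protocol is not described here.

```python
from collections import Counter


def clean_expansions(votes, min_votes=3, min_agreement=0.6):
    """votes: candidate word -> list of labels ('pos', 'neg' or 'neutral').

    Keep a word only if it has enough votes and its majority label is a
    polarity (not 'neutral') chosen by a large enough share of annotators.
    """
    kept = {}
    for word, labels in votes.items():
        if len(labels) < min_votes:
            continue  # too few judgements to trust
        label, count = Counter(labels).most_common(1)[0]
        if label != "neutral" and count / len(labels) >= min_agreement:
            kept[word] = label
    return kept


# Hypothetical crowd judgements.
VOTES = {
    "superb": ["pos", "pos", "pos", "neutral", "pos"],
    "table": ["neutral"] * 5,
    "dreadful": ["neg", "neg", "neg", "pos", "neg"],
    "ok": ["pos", "neutral"],
}
print(clean_expansions(VOTES))  # {'superb': 'pos', 'dreadful': 'neg'}
```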

Subjectivity Analysis of Arabic-English Wikipedia

2020

This paper discusses the subjectivity analysis of objective collaborative content, represented by comparable Arabic and English Wikipedia articles. Because Arabic is an under-resourced language, the creation of a gold standard for subjectivity analysis of Wikipedia is a major novel contribution of this study. The English Wikipedia sentences were annotated through crowdsourcing, and eight Arabic speakers volunteered to annotate sentences from the parallel Arabic Wikipedia articles. The resulting gold standard has a percentage agreement of 80.69% for the English subset and 94.59% for the Arabic subset. Naive Bayes and Logistic Regression classifiers with bag-of-n-grams representations (unigrams, bigrams and trigrams) were used for subjectivity analysis of the corpus, with 3-fold cross-validation. The Naive Bayes classifier outperformed Logistic Regression on both the Arabic and English datasets, with F1 scores of 53.99% and 57.23% for the Arabic and English subsets, respectively.
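Percentage agreement is the share of annotator judgements that coincide. With more than two annotators, one common variant averages pairwise agreement per item. The sketch below uses that variant, which is an assumption; the abstract does not say exactly how its figures were computed.

```python
from itertools import combinations


def percentage_agreement(annotations):
    """annotations: one list of labels per item (one label per annotator).

    Returns the mean pairwise agreement over items, as a percentage.
    """
    per_item = []
    for labels in annotations:
        pairs = list(combinations(labels, 2))
        if pairs:
            per_item.append(sum(a == b for a, b in pairs) / len(pairs))
    return 100 * sum(per_item) / len(per_item) if per_item else 0.0


# Hypothetical judgements from three annotators on three sentences.
ITEMS = [["subj", "subj", "subj"], ["subj", "obj", "subj"], ["obj", "obj", "obj"]]
print(round(percentage_agreement(ITEMS), 2))  # 77.78
```

Unlike chance-corrected measures such as Cohen's or Fleiss' kappa, raw percentage agreement can look high when one label dominates the data.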

Improving Subjectivity Detection using Unsupervised Subjectivity Word Sense Disambiguation

Summary: In this work, we present a method for sentence-level subjectivity detection based on subjectivity word sense disambiguation. To do so, we extend a word sense disambiguation method based on sense clustering to determine when the words in a sentence are being used subjectively or objectively. Our approach uses semantic resources annotated with polarity and emotion values to determine when a word sense can be considered subjective or objective. We present an experimental study of sentence-level subjectivity detection that uses the MPQA corpus and the Movie Review Dataset, together with the semantic resources SentiWordNet, Micro-WNOp and WordNet-Affect. The results show that our proposal contributes significantly to subjectivity detection. Keywords: subjectivity detection, word sense disambiguation, sentiment analysis. Abstract: In this work, we present a sentence-level subjectivity detection method that relies on Subjectivity Word Sense Disambiguation (SWSD). We use an unsupervised, sense-clustering-based method for SWSD. In our method, semantic resources tagged with emotions and sentiment polarities are used to perform subjectivity detection through Word Sense Disambiguation sub-tasks. Through an experimental study, we empirically validated the proposed method on two subjectivity collections, the MPQA Corpus and the Movie Review Dataset, using three widely used opinion-mining resources: SentiWordNet, WordNet-Affect and Micro-WNOp. The results show that our proposal performs significantly better than our baseline.
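The core intuition is that a word's subjectivity depends on the sense in which it is used. This can be shown once disambiguation has already happened. In the sketch below, WSD is assumed to have been done upstream. The sense identifiers and the (positive, negative) scores, written in the style of SentiWordNet, are invented for illustration and are not real SentiWordNet values.

```python
# Hypothetical sense inventory: sense id -> (positive score, negative score).
SENSE_SCORES = {
    "cold#temperature": (0.0, 0.0),   # "a cold morning": objective sense
    "cold#unfriendly": (0.0, 0.625),  # "a cold reply": subjective sense
    "weather#noun": (0.0, 0.0),
    "reply#noun": (0.0, 0.0),
}


def sense_is_subjective(sense, threshold=0.5):
    """Treat a sense as subjective when its combined polarity mass reaches the threshold."""
    pos, neg = SENSE_SCORES.get(sense, (0.0, 0.0))
    return pos + neg >= threshold


def sentence_is_subjective(disambiguated_senses, threshold=0.5):
    """A sentence is flagged subjective if any of its disambiguated senses is subjective."""
    return any(sense_is_subjective(s, threshold) for s in disambiguated_senses)


print(sentence_is_subjective(["weather#noun", "cold#temperature"]))  # False
print(sentence_is_subjective(["reply#noun", "cold#unfriendly"]))     # True
```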

RA-SR: Using a ranking algorithm to automatically building resources for subjectivity analysis over annotated corpora

2013

In this paper, we propose a method that uses corpora in which phrases are annotated as Positive, Negative, Objective or Neutral to build new sentiment resources, namely word dictionaries with their associated polarities. The method builds inventories of sentiment words from senti-semantic evidence obtained by exploring text annotated with sentiment polarity information. In this process, a graph-based algorithm is used to obtain auto-balanced values that characterize sentiment polarities well for use in sentiment analysis tasks. To assess the effectiveness of the obtained resource, we performed sentiment classification, achieving results above 80% on objective instances.
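As a rough picture of graph-based polarity ranking, the sketch below runs a random walk with restart (personalized PageRank) twice over a word graph. One walk restarts at positive seeds and one at negative seeds, and a word's polarity is the difference between its two scores. The sketch rests on several assumptions:

- the walk is a swapped-in, generic technique and is not necessarily RA-SR's ranking algorithm,
- the toy graph and seed words are invented,
- `damping` and `iterations` are illustrative values.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Random walk with restart to `seeds` over an adjacency-list graph.

    `graph` maps every node to a list of neighbours (all of which must also be
    keys of `graph`); `seeds` must be a non-empty collection of graph nodes.
    """
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: (1 - damping) * restart[n] for n in nodes}
        for n in nodes:
            # Dangling nodes send their mass back to the seeds so the total stays 1.
            targets = graph[n] if graph[n] else list(seeds)
            share = damping * rank[n] / len(targets)
            for m in targets:
                new_rank[m] += share
        rank = new_rank
    return rank


def polarity_scores(graph, positive_seeds, negative_seeds, **kwargs):
    """Positive minus negative proximity: > 0 leans positive, < 0 leans negative."""
    pos = personalized_pagerank(graph, positive_seeds, **kwargs)
    neg = personalized_pagerank(graph, negative_seeds, **kwargs)
    return {n: pos[n] - neg[n] for n in graph}


# Hypothetical word graph (e.g., words linked when they co-occur in annotated phrases).
GRAPH = {
    "good": ["excellent", "movie"],
    "excellent": ["good"],
    "bad": ["awful", "movie"],
    "awful": ["bad"],
    "movie": ["good", "bad"],
}
scores = polarity_scores(GRAPH, {"good"}, {"bad"})
print(scores["excellent"] > 0, scores["awful"] < 0)  # True True
```

Subtracting the two walks gives a score that is balanced between polarities by construction. Here the neutral hub "movie" ends up near zero because it is equally close to both seeds.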

Subjectivity Detection through Socio-Linguistic Features

2011

Social media platforms have opened new dimensions within the information retrieval domain, leading to a novel concept known as Social Information Retrieval. We argue that the concept of Social Information Retrieval can be extended by augmenting the huge amount of content on the traditional Web with the ever-growing, rich Social Web content, increasing the information richness of today's search engines. This paper proposes a subjectivity detection framework that can lead towards a proposed emotion-aware search engine interface. Our method differs from previous subjectivity analysis approaches in that it is the first to take into account the social features of social media platforms for the subjectivity classification task. In experimental evaluations, the proposed method achieves an accuracy of 86.21%, which is a promising outcome for large-scale application of our subjectivity analysis technique.
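Combining social and linguistic signals can be illustrated with a feature extractor for a single post. The abstract does not list its features, so everything below is a hypothetical example of the kind of input such a classifier might use:

- the `Post` fields (`likes`, `shares`),
- the emoticon pattern,
- the first-person word list,
- the feature names.

```python
import re
from dataclasses import dataclass


@dataclass
class Post:
    text: str
    likes: int = 0   # hypothetical social signal
    shares: int = 0  # hypothetical social signal


# Simple emoticons such as :) ;-( :D :P
EMOTICON = re.compile(r"[:;]-?[()DP]")


def socio_linguistic_features(post, first_person=("i", "me", "my")):
    """Count simple linguistic cues in the text and copy the post's social signals."""
    tokens = re.findall(r"[#@]?\w+", post.text.lower())
    return {
        "exclamations": post.text.count("!"),
        "emoticons": len(EMOTICON.findall(post.text)),
        "hashtags": sum(t.startswith("#") for t in tokens),
        "mentions": sum(t.startswith("@") for t in tokens),
        "first_person": sum(t in first_person for t in tokens),
        "likes": post.likes,
        "shares": post.shares,
    }


post = Post("I love this phone!!! :) #happy @acme", likes=12, shares=3)
print(socio_linguistic_features(post))
```

These feature dictionaries could then be vectorised and passed to any standard classifier.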