Identifying subjective statements in news titles using a personal sense annotation framework

Recognizing subjectivity: a case study in manual tagging

Natural Language Engineering, 1999

In this paper, we describe a case study of a sentence-level categorization in which tagging instructions are developed and used by four judges to classify clauses from the Wall Street Journal as either subjective or objective. Agreement among the four judges is analyzed, and, based on that analysis, each clause is given a final classification. To provide empirical support for the classifications, correlations are assessed in the data between the subjective category and a basic semantic class posited by .
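
The agreement analysis described above can be illustrated with a small sketch. The abstract does not say which agreement statistic the study uses; Fleiss' kappa is assumed here only because it is a common choice for more than two judges. The function name `fleiss_kappa` and the toy ratings are hypothetical and are not data from the paper.

```python
from collections import Counter

def fleiss_kappa(ratings, categories=("subjective", "objective")):
    """Fleiss' kappa for items that were each labeled by the same number of judges.

    ratings: list of per-item label lists, e.g. [["subjective", "objective", ...], ...]
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    counts = [Counter(r) for r in ratings]

    # Observed agreement: fraction of agreeing judge pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each category.
    shares = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(s * s for s in shares)

    if p_expected == 1.0:  # all labels identical; kappa is undefined, report 1.0
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)

# Hypothetical ratings: four clauses, four judges each.
S, O = "subjective", "objective"
toy_ratings = [
    [S, S, S, S],
    [O, O, O, O],
    [S, S, O, O],
    [S, S, S, O],
]
kappa = fleiss_kappa(toy_ratings)
```

Values near 1 indicate agreement well above chance, and values near 0 indicate agreement no better than chance.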

Learning Subjective Language

Computational Linguistics, 2004

Subjectivity in natural language refers to aspects of language used to express opinions, evaluations, and speculations. There are numerous NLP applications for which subjectivity analysis is relevant, including information extraction and text categorization. The goal of this work is learning subjective language from corpora. The first part of the paper explores annotating subjectivity at different levels (expression, sentence, document) and producing annotated corpora. In the second part of the paper, clues of subjectivity are generated and tested, including low-frequency words, collocations, and adjectives and verbs identified using distributional similarity. The third part of the paper examines the features working together in concert. The features, generated from different datasets using different procedures, exhibit consistency in performance in that they all do better and worse on the same datasets. In addition, we show that the density of subjectivity clues in the surrounding context strongly affects how likely it is that a word is subjective, and give the results of an annotation study assessing the subjectivity of sentences with high-density features. Finally, the clues are used to perform opinion-piece recognition (a type of text categorization and genre detection), to demonstrate the utility of the knowledge acquired in this paper.
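
The notion of clue density in the surrounding context can be made concrete with a minimal sketch. It assumes that density means the number of known clue words within a fixed token window around a target word; the abstract does not give the exact definition, and the function name `clue_density`, the window size, and the tiny clue set are all hypothetical.

```python
def clue_density(tokens, index, clues, window=3):
    """Count subjectivity clues within `window` tokens on either side of tokens[index].

    The target token itself is not counted.
    """
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    return sum(
        1 for j in range(lo, hi) if j != index and tokens[j].lower() in clues
    )

# Hypothetical clue set and sentence, for illustration only.
toy_clues = {"outrageous", "frankly", "absurd"}
toy_tokens = "the senator made an outrageous and frankly absurd claim yesterday".split()

density_near_claim = clue_density(toy_tokens, toy_tokens.index("claim"), toy_clues)
density_near_senator = clue_density(toy_tokens, toy_tokens.index("senator"), toy_clues)
```

Under this reading, a word such as "claim" that sits among several clues would be treated as more likely to be subjective than the same word in a clue-free context.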

A Corpus for Sentence-level Subjectivity Detection on English News Articles

arXiv (Cornell University), 2023

We present a novel corpus for subjectivity detection at the sentence level. We develop new annotation guidelines for the task, which are not limited to language-specific cues, and apply them to produce a new corpus in English. The corpus consists of 411 subjective and 638 objective sentences extracted from ongoing coverage of political affairs from online news outlets. This new resource paves the way for the development of models for subjectivity detection in English and across other languages, without relying on language-specific tools like lexicons or machine translation. We evaluate state-of-the-art multilingual transformer-based models on the task, both in mono- and cross-lingual settings, the latter with a similar existing corpus in Italian language. We observe that enriching our corpus with resources in other languages improves the results on the task.

EmotiBlog: a fine-grained annotation schema for labelling subjectivity in the new-textual genres born with the Web 2.0

Abstract: The exponential growth of subjective information in the Web 2.0 setting has created the need for Natural Language Processing tools that can analyze and process these data for concrete applications. Such tools must be trained on corpora annotated with this kind of information at a very fine-grained level, so that they can capture linguistic phenomena that carry an emotional load. This article describes EmotiBlog, a fine-grained model for annotating subjectivity. We present its creation process and show that it improves machine learning systems. To this end, we use several corpora covering texts of different genres: a collection of newspaper articles in reported speech, the collection of news titles annotated with polarity and emotion from SemEval 2007 (Task 14), and ISEAR, a corpus of real-life expressions of emotion. We also show that other resources can be integrated with EmotiBlog. The results show that, thanks to its structure and annotation parameters, the proposed model, EmotiBlog, offers considerable advantages for training systems that work on opinion mining and emotion detection.

Finding the sources and targets of subjective expressions

Proceedings of …, 2008

As many popular text genres such as blogs or news contain opinions by multiple sources and about multiple targets, finding the sources and targets of subjective expressions becomes an important sub-task for automatic opinion analysis systems. We argue that while automatic semantic role labeling systems (ASRL) have an important contribution to make, they cannot solve the problem for all cases. Based on the experience of manually annotating opinions, sources, and targets in various genres, we present linguistic phenomena that require knowledge beyond that of ASRL systems. In particular, we address issues relating to the attribution of opinions to sources; sources and targets that are realized as zero-forms; and inferred opinions. We also discuss in some depth that for arguing attitudes we need to be able to recover propositions and not only argued-about entities. A recurrent theme of the discussion is that close attention to specific discourse contexts is needed to identify sources and targets correctly.

A Hybrid System for Subjectivity Analysis

We suggested several hybrid systems for sentence-level subjectivity analysis based on three supervised machine learning algorithms, namely, Hidden Markov Model, Fuzzy Control System, and Adaptive Neuro-Fuzzy Inference System. The feature extraction algorithm in our experiment computes a feature vector from the statistical frequencies of textual terms in a training dataset, using no lexical knowledge other than tokenization. Because these methods do not rely on morphological, syntactic, or lexical analysis for classification, they may also be employed in other languages.

Subjectivity word sense disambiguation

Proceedings of the 2009 …, 2009

This paper investigates a new task, subjectivity word sense disambiguation (SWSD), which is to automatically determine which word instances in a corpus are being used with subjective senses, and which are being used with objective senses. We provide empirical evidence that SWSD is more feasible than full word sense disambiguation, and that it can be exploited to improve the performance of contextual subjectivity and sentiment analysis systems.

Personal sense in subjective language research in the blogosphere

Blogs are a very important part of the digital world; indeed, they can be viewed as a digital representation of the whole world. People share pictures and videos, describe their daily life, ask questions and, of course, give opinions. The blogosphere presents a unique opportunity to obtain huge statistics about what people like, feel, and need, that is, about their 'private states'. The vast and ever-growing number of 'bloggers', and thus of information, demands an automated way of analyzing blog texts.

Comparison of Sentence Subjectivity Classification Methods in Indonesian News

The increasing use of the Internet has increased the number and types of content available online, particularly text, which has become very large and is spread across many sources of information such as blogs, news sites, forums, social networks, and micro-blogging. This contributes to information overload for users. Information overload can be addressed, among other ways, by text classification. It is therefore necessary to find a system that can automatically identify opinions, attitudes, and sentiments in a text. This study compared methods for classifying subjective and objective sentences in Indonesian news. The methods compared were a rule-based classifier, a Naïve Bayes Classifier (NBC), and a Support Vector Machine (SVM) classifier. The results, evaluated on 1050 sentences manually labeled as subjective or objective, show accuracies of 80.4%, 74% and 71% for the rule-based classifier, SVM classifier, and NBC, respectively. Keywords: classification, rule based classifier, naïve bayes classifier, support vector machine classifier

Word sense and subjectivity

… of the 21st International Conference on …, 2006

Subjectivity and meaning are both important properties of language. This paper explores their interaction, and brings empirical evidence in support of the hypotheses that (1) subjectivity is a property that can be associated with word senses, and (2) word sense disambiguation can directly benefit from subjectivity annotations.

Subjectivity Identification Through Lexical Rules

SN Computer Science, 2022

We can divide a discourse into utterances carrying two types of information: subjective and objective. Objective utterances are factual in nature and have a truth value that can be validated against a fact. On the other hand, subjective utterances contain the emotional state or opinion of a speaker more than factual information. Languages use different devices to demonstrate subjectivity. In this paper, we present a comparative analysis of various linguistic devices that languages use to demonstrate subjectivity. For that, we analyze subjective constructions in five Indian languages, formalize lexical rules for subjective constructions and implement a Lexical Rule-based FST for subjectivity identification. We evaluate the FST on test data from entertainment, lifestyle, politics, sports, and technology domains. Our system achieves 91% accuracy in the politics domain and ~84% accuracy on average.

Building Subjectivity Lexicon(s) from Scratch for Essay Data

Lecture Notes in Computer Science, 2012

While there are a number of subjectivity lexicons available for research purposes, none can be used commercially. We describe the process of constructing subjectivity lexicon(s) for recognizing sentiment polarity in essays written by test-takers, to be used within a commercial essay-scoring system. We discuss ways of expanding a manually-built seed lexicon using dictionary-based, distributional in-domain and out-of-domain information, as well as using Amazon Mechanical Turk to help "clean up" the expansions. We show the feasibility of constructing a family of subjectivity lexicons from scratch using a combination of methods to attain competitive performance with state-of-the-art research-only lexicons. Furthermore, this is the first use, to our knowledge, of a paraphrase generation system for expanding a subjectivity lexicon.

Adjectives as indicators of subjectivity in documents

Proceedings of the American Society for Information Science and Technology, 2004

The goal of this research is to automatically predict human judgments of document qualities such as subjectivity, verbosity and depth. In this paper, we explore the behavior of adjectives as indicators of subjectivity in documents. Specifically, we test whether a subset of automatically derived subjective adjectives (Wiebe, 2000b), selected a priori, behaves differently than other adjectives. 3,200 documents were ranked by 100 subjects as being high or low in nine document qualities (Tang, Ng, Strzalkowski, & Kantor, 2003). We report a statistically significant correlation between the occurrence of adjectives in documents and human judgments of subjectivity. More importantly, we find that the subset of subjective adjectives is more strongly correlated with subjectivity than adjectives in general. These results can be used to identify document qualities for use in information retrieval and question-answering systems.

Linguistic Features for Subjectivity Classification

2012 International Conference on Asian Language Processing, 2012

Opinions are subjective expressions that describe people's viewpoints, perspectives or feelings about entities, events and their properties. Detecting subjective expressions is the task of identifying whether a given text is subjective (i.e. an opinion) or objective (i.e. a reported fact). This task is considered the first problem to solve, and it is very important for opinion mining and sentiment analysis, which now attract many researchers because of their wide applicability. Improvements in subjectivity classification will positively impact the performance of a sentiment analysis system. Features play the most important role in accurately identifying subjective sentences. In this paper, we enrich the features by using syntactic information from the text. Based on our observations of opinion evidence in texts, we propose syntax-based patterns that are used to extract rich linguistic features. Combining these new features with conventional features from previous studies, we obtain a high accuracy (about 92.1%) for detecting subjective sentences on the Movie review data.

Computing with Subjectivity Lexicons

2020

In this paper, we introduce a new set of lexicons for expressing subjectivity in text documents written in Brazilian Portuguese. Besides the non-English idiom, in contrast to other subjectivity lexicons available, these lexicons represent different subjectivity dimensions (other than sentiment) and are more compact in number of terms. This last feature was designed intentionally to leverage the power of word embedding techniques, i.e., with the words mapped to an embedding space and the appropriate distance measures, we can easily capture semantically related words to the ones in the lexicons. Thus, we do not need to build comprehensive vocabularies and can focus on the most representative words for each lexicon dimension. We showcase the use of these lexicons in three highly non-trivial tasks: (1) Automated Essay Scoring in the Presence of Biased Ratings, (2) Subjectivity Bias in Brazilian Presidential Elections and (3) Fake News Classification Based on Text Subjectivity. All these...
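
The idea of pairing a compact lexicon with an embedding space can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes that "semantically related" means cosine similarity above a fixed threshold, and the function names, the threshold, and the two-dimensional toy vectors (English words, for readability) are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def expand_lexicon(seeds, embeddings, threshold=0.8):
    """Return non-seed words whose similarity to at least one seed meets the threshold."""
    seed_vectors = [embeddings[s] for s in seeds if s in embeddings]
    return {
        word
        for word, vec in embeddings.items()
        if word not in seeds
        and any(cosine(vec, sv) >= threshold for sv in seed_vectors)
    }

# Hypothetical 2-D embeddings; real embeddings would have hundreds of dimensions.
toy_embeddings = {
    "terrible": (1.0, 0.1),
    "awful": (0.9, 0.2),
    "dreadful": (0.8, 0.1),
    "table": (0.0, 1.0),
}
related = expand_lexicon({"terrible"}, toy_embeddings)
```

With this kind of lookup, a short seed list can stand in for a much larger vocabulary, which matches the compactness argument made in the abstract.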

Improving Subjectivity Detection using Unsupervised Subjectivity Word Sense Disambiguation

Summary (translated from Spanish): This work presents a method for sentence-level subjectivity detection based on subjectivity word sense disambiguation. To that end, a word sense disambiguation method based on sense clustering is extended to determine when the words in a sentence are being used subjectively or objectively. Our proposal uses semantic resources annotated with polarity and emotion values to determine when a word sense can be considered subjective or objective. We present an experimental study on subjectivity detection in sentences, which considers the MPQA corpus and Movie Review Dataset collections as well as the semantic resources SentiWordNet, Micro-WNOp and WordNet-Affect. The results show that our proposal contributes significantly to subjectivity detection. Keywords: subjectivity detection, word sense disambiguation, sentiment analysis Abstract: In this work, we present a sentence-level subjectivity detection method, which relies on Subjectivity Word Sense Disambiguation (SWSD). We use an unsupervised sense clustering-based method for SWSD. In our method, semantic resources tagged with emotions and sentiment polarities are used to apply subjectivity detection, intervening Word Sense Disambiguation sub-tasks. Through an experimental study, we empirically validated the proposed method over two subjectivity collections, MPQA Corpus and Movie Review Dataset, using three widely popular opinion-mining resources SentiWordNet, WordNet-Affect and Micro-WNOp. The results show that our proposal performs significantly better than our proposed baseline.

Sentiment analysis in the news

2010

Recent years have brought a significant growth in the volume of research in sentiment analysis, mostly on highly subjective text types (movie or product reviews). The main difference these texts have with news articles is that their target is clearly defined and unique across the text. Following different annotation efforts and the analysis of the issues encountered, we realised that news opinion mining is different from that of other text types. We identified three subtasks that need to be addressed: definition of the target; separation of the good and bad news content from the good and bad sentiment expressed on the target; and analysis of clearly marked opinion that is expressed explicitly, not needing interpretation or the use of world knowledge. Furthermore, we distinguish three different possible views on newspaper articles (author, reader and text), which have to be addressed differently at the time of analysing sentiment. Given these definitions, we present work on mining opinions about entities in English language news, in which (a) we test the relative suitability of various sentiment dictionaries and (b) we attempt to separate positive or negative opinion from good or bad news. In the experiments described here, we tested whether or not subject domain-defining vocabulary should be ignored. Results showed that this idea is more appropriate in the context of news opinion mining and that the approaches taking this into consideration produce a better performance.