Identifying subjective statements in news titles using a personal sense annotation framework
Related papers
Recognizing subjectivity: a case study in manual tagging
Natural Language Engineering, 1999
In this paper, we describe a case study of sentence-level categorization in which tagging instructions are developed and used by four judges to classify clauses from the Wall Street Journal as either subjective or objective. Agreement among the four judges is analyzed and, based on that analysis, each clause is given a final classification. To provide empirical support for the classifications, correlations are assessed in the data between the subjective category and a basic semantic class posited by .
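The abstract does not say which agreement statistic was used. A common choice for two annotators is Cohen's kappa, which corrects raw agreement for the agreement expected by chance; with four judges it can be computed for each of the six judge pairs. The sketch below assumes labels are plain strings such as "S" (subjective) and "O" (objective), and the function names are illustrative, not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of labels."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    # Observed agreement: fraction of items both judges labeled the same.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each judge's label proportions, summed over labels.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if expected == 1.0:
        # Both judges used a single identical label throughout; kappa is undefined.
        return 1.0
    return (observed - expected) / (1 - expected)


def pairwise_kappas(judgments):
    """Kappa for every pair of judges; `judgments` maps judge name -> label list."""
    return {
        (j1, j2): cohen_kappa(judgments[j1], judgments[j2])
        for j1, j2 in combinations(sorted(judgments), 2)
    }
```

For example, `cohen_kappa(["S", "S", "O", "O"], ["S", "O", "O", "O"])` gives 0.5: observed agreement is 0.75, chance agreement is 0.5.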
Computational Linguistics, 2004
Subjectivity in natural language refers to aspects of language used to express opinions, evaluations, and speculations. There are numerous NLP applications for which subjectivity analysis is relevant, including information extraction and text categorization. The goal of this work is learning subjective language from corpora. The first part of the paper explores annotating subjectivity at different levels (expression, sentence, document) and producing annotated corpora. In the second part of the paper, clues of subjectivity are generated and tested, including low-frequency words, collocations, and adjectives and verbs identified using distributional similarity. The third part of the paper examines the features working together. The features, generated from different datasets using different procedures, perform consistently, in that they all do better and worse on the same datasets. In addition, we show that the density of subjectivity clues in the surrounding context strongly affects how likely it is that a word is subjective, and give the results of an annotation study assessing the subjectivity of sentences with high-density features. Finally, the clues are used to perform opinion-piece recognition (a type of text categorization and genre detection), to demonstrate the utility of the knowledge acquired in this paper.
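To make "density of subjectivity clues in the surrounding context" concrete, the sketch below assumes a hypothetical clue set and one simple definition: for each word, the fraction of the other words within a fixed window on either side that are clues. The paper's actual density measure and window size may differ.

```python
def clue_density(tokens, clues, window=5):
    """For each token, the fraction of the other tokens within `window`
    positions on either side that appear in the (hypothetical) clue set."""
    densities = []
    for i in range(len(tokens)):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        # Context excludes the token itself.
        context = [t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i]
        hits = sum(t.lower() in clues for t in context)
        densities.append(hits / len(context) if context else 0.0)
    return densities


# Illustrative clue set, not taken from the paper.
example = clue_density(["this", "is", "truly", "awful", "news"], {"truly", "awful"}, window=1)
# example == [0.0, 0.5, 0.5, 0.5, 1.0]
```

A word whose neighbors score high would, under the paper's finding, be more likely to be used subjectively.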
A Corpus for Sentence-level Subjectivity Detection on English News Articles
arXiv (Cornell University), 2023
We present a novel corpus for subjectivity detection at the sentence level. We develop new annotation guidelines for the task, which are not limited to language-specific cues, and apply them to produce a new corpus in English. The corpus consists of 411 subjective and 638 objective sentences extracted from ongoing coverage of political affairs from online news outlets. This new resource paves the way for the development of models for subjectivity detection in English and across other languages, without relying on language-specific tools like lexicons or machine translation. We evaluate state-of-the-art multilingual transformer-based models on the task in both monolingual and cross-lingual settings, the latter using a similar existing corpus in Italian. We observe that enriching our corpus with resources in other languages improves the results on the task.
Abstract: The exponential growth of subjective information in the context of Web 2.0 has created the need for Natural Language Processing tools capable of analyzing and processing these data for specific applications. These tools must be trained on corpora annotated with this type of information at a very fine-grained level in order to capture the linguistic phenomena that carry emotional content. This article describes EmotiBlog, a fine-grained model for annotating subjectivity. We present the creation process and show that it improves machine learning systems. To this end, we use several corpora containing texts of different genres: a collection of news articles in reported speech, the collection of news titles annotated with polarity and emotion from SemEval 2007 (Task 14), and ISEAR, a corpus of real expressions of emotion. We also show that other resources can be integrated with EmotiBlog. The results show that, thanks to its structure and annotation parameters, the proposed model, EmotiBlog, offers considerable advantages for training opinion mining and emotion detection systems.
Finding the sources and targets of subjective expressions
Proceedings of …, 2008
As many popular text genres such as blogs or news contain opinions by multiple sources and about multiple targets, finding the sources and targets of subjective expressions becomes an important sub-task for automatic opinion analysis systems. We argue that while automatic semantic role labeling (ASRL) systems have an important contribution to make, they cannot solve the problem in all cases. Based on our experience of manually annotating opinions, sources, and targets in various genres, we present linguistic phenomena that require knowledge beyond that of ASRL systems. In particular, we address issues relating to the attribution of opinions to sources, sources and targets that are realized as zero forms, and inferred opinions. We also discuss in some depth why, for arguing attitudes, we need to be able to recover propositions and not only the entities argued about. A recurrent theme of the discussion is that close attention to specific discourse contexts is needed to identify sources and targets correctly.
A Hybrid System for Subjectivity Analysis
We proposed several hybrid systems for sentence-level subjectivity analysis based on three supervised machine learning algorithms: Hidden Markov Model, Fuzzy Control System, and Adaptive Neuro-Fuzzy Inference System. The proposed feature extraction algorithm computes a feature vector from statistical term frequencies in the training dataset and uses no lexical knowledge other than tokenization. Because the methods do not rely on morphological, syntactic, or lexical analysis, they may also be applied to other languages.
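A minimal sketch of tokenization-only term-frequency features follows. It assumes a plain bag-of-words representation over the training vocabulary; the abstract does not specify the exact feature vector or how it is fed to the HMM, fuzzy, and ANFIS models, so the function names here are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Split on non-word characters only: no stemming, POS tags, or lexicons.
    Python's \\w is Unicode-aware, so this works for many scripts."""
    return re.findall(r"\w+", text.lower())


def build_vocabulary(train_sentences):
    """Sorted list of every term seen in the training sentences."""
    return sorted({tok for s in train_sentences for tok in tokenize(s)})


def tf_vector(sentence, vocab):
    """Relative frequency of each vocabulary term in `sentence`."""
    counts = Counter(tokenize(sentence))
    total = sum(counts.values())
    return [counts[t] / total if total else 0.0 for t in vocab]


vocab = build_vocabulary(["The film was great", "The film was released"])
# vocab == ["film", "great", "released", "the", "was"]
features = tf_vector("great great film", vocab)
# features is approximately [0.333, 0.667, 0.0, 0.0, 0.0]
```

Because nothing beyond the tokenizer is language-specific, the same code can be run unchanged on another language's training data, which is the portability argument the abstract makes.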
Subjectivity word sense disambiguation
Proceedings of the 2009 …, 2009
This paper investigates a new task, subjectivity word sense disambiguation (SWSD), which is to automatically determine which word instances in a corpus are being used with subjective senses, and which are being used with objective senses. We provide empirical evidence that SWSD is more feasible than full word sense disambiguation, and that it can be exploited to improve the performance of contextual subjectivity and sentiment analysis systems.
Personal sense in subjective language research in the blogosphere
Blogs are a very important part of the digital world; indeed, they can be viewed as a digital representation of the whole world. People share pictures and videos, describe their daily lives, ask questions and, of course, give opinions. The blogosphere presents a unique opportunity to obtain large-scale statistics about what people like, feel, and need, that is, about their 'private states'. The vast and ever-growing volume of bloggers, and thus of information, demands an automated way of analyzing blog texts.
Comparison of Sentence Subjectivity Classification Methods in Indonesian News
The growing use of the Internet has greatly increased the amount and variety of online content, particularly text, which is spread across many sources such as blogs, news sites, forums, social networks, and micro-blogging services. This leads to information overload for users. Information overload can be mitigated by, among other things, text classification, so systems are needed that can automatically identify opinions, attitudes, and sentiments in text. This study compared methods for classifying subjective and objective sentences in Indonesian news: a rule-based classifier, a Naïve Bayes Classifier (NBC), and a Support Vector Machine (SVM) classifier. On 1,050 sentences manually labeled as subjective or objective, the rule-based classifier, SVM classifier, and NBC achieved accuracies of 80.4%, 74%, and 71%, respectively. Keywords: classification, rule-based classifier, naïve Bayes classifier, support vector machine classifier
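For readers unfamiliar with the Naïve Bayes baseline, the sketch below shows a generic multinomial Naïve Bayes classifier with add-one (Laplace) smoothing for the subjective/objective decision. It is a textbook version, not the study's implementation; the tokenizer, class name, and example sentences are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing, scored in log space."""

    def fit(self, sentences, labels):
        self.classes = sorted(set(labels))
        self.log_priors, self.counts, self.totals = {}, {}, {}
        vocab = set()
        for c in self.classes:
            docs = [s for s, y in zip(sentences, labels) if y == c]
            self.log_priors[c] = math.log(len(docs) / len(sentences))
            cnt = Counter(tok for d in docs for tok in tokenize(d))
            self.counts[c], self.totals[c] = cnt, sum(cnt.values())
            vocab |= set(cnt)
        self.vocab_size = len(vocab)
        return self

    def predict(self, sentence):
        def log_score(c):
            s = self.log_priors[c]
            for tok in tokenize(sentence):
                # Counter returns 0 for unseen tokens; +1 avoids log(0).
                s += math.log((self.counts[c][tok] + 1) / (self.totals[c] + self.vocab_size))
            return s

        return max(self.classes, key=log_score)


train = [
    ("I think this policy is terrible", "subjective"),
    ("what a wonderful and shocking decision", "subjective"),
    ("The minister met officials on Monday", "objective"),
    ("The bill was passed by parliament", "objective"),
]
model = NaiveBayes().fit([s for s, _ in train], [y for _, y in train])
```

With this toy data, `model.predict("a terrible decision")` returns "subjective" and `model.predict("The minister passed the bill")` returns "objective". Working in log space avoids floating-point underflow on long sentences.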