A linguistic approach for determining the topics of Spanish Twitter messages

LyS at TASS 2013: Analysing Spanish tweets by means of dependency parsing, semantic-oriented lexicons and psychometric word-properties

Proc. of Workshop on Sentiment Analysis at SEPLN (TASS 2013)

This article describes the approach developed by our group to address three tasks on Spanish tweets proposed at the Workshop on Sentiment Analysis at SEPLN (TASS 2013): global sentiment analysis, topic identification and political tendency classification. As a preliminary step, we carry out ad hoc preprocessing to normalise the tweets. We then apply part-of-speech tagging and dependency parsing to the tweets to obtain their syntactic structure. Our proposal also uses psychological resources to exploit the psychometric properties of human language. The experimental results confirm the robustness of the proposal, which performed well overall and was the best-performing approach in the topic classification task.
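The abstract mentions an ad hoc normalisation step without detailing it. The following is a minimal Python sketch of what tweet normalisation commonly involves, assuming simple regex rules; the function `normalise_tweet` and every rule in it (the placeholder tokens `URL` and `@USER`, hashtag stripping, collapsing repeated letters) are hypothetical illustrations, not the LyS system's actual preprocessing.

```python
import re

# Hypothetical normalisation rules; not the paper's actual preprocessing.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
# A letter (not a digit or underscore) repeated three or more times in a row.
ELONGATION_RE = re.compile(r"([^\W\d_])\1{2,}")


def normalise_tweet(text: str) -> str:
    """Return a normalised copy of a tweet using illustrative rules only."""
    text = URL_RE.sub("URL", text)            # replace links with a placeholder
    text = MENTION_RE.sub("@USER", text)      # anonymise user mentions
    text = HASHTAG_RE.sub(r"\1", text)        # keep the hashtag word, drop '#'
    text = ELONGATION_RE.sub(r"\1", text)     # "Buenooo" -> "Bueno"
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace


print(normalise_tweet("@pepe Buenooo día!!! #Madrid http://t.co/x"))
# -> @USER Bueno día!!! Madrid URL
```

Collapsing an elongation to a single letter is deliberately crude; a fuller normaliser would more likely check candidate spellings against a lexicon before choosing one.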

Techniques for Sentiment Analysis and Topic Detection of Spanish Tweets: Preliminary Report

Abstract: Sentiment analysis and topic detection are relatively new problems at the intersection of natural language processing (NLP) and data mining. Sentiment analysis attempts to determine whether a text is positive, negative, or neither, while topic detection attempts to identify the subject of the text. A significant amount of effort has been invested in building effective solutions for these problems, mostly for English texts.

TLA: Twitter Linguistic Analysis

2021

Linguistics has been instrumental in developing a deeper understanding of human nature. Words are indispensable for conveying the thoughts, emotions, and purpose of any human interaction, and critically analyzing them can elucidate the social and psychological behavior and characteristics of humans as social animals. Social media has become a platform for large-scale human interaction and thus gives us the opportunity to collect that data and use it for study. However, collecting, labeling, and analyzing this data iteratively makes the procedure cumbersome. To make this process easier and more structured, we introduce TLA (Twitter Linguistic Analysis). In this paper, we describe TLA, provide a basic understanding of the framework, and discuss the process of collecting, labeling, and analyzing data from Twitter for a range of languages while providing detailed labeled datasets for all the languages and the models are trained on thes...

Graph-based Techniques for Topic Classification of Tweets in Spanish

International Journal of Interactive Multimedia and Artificial Intelligence, 2014

Topic classification of texts is one of the most interesting challenges in Natural Language Processing (NLP). Topic classifiers commonly use a bag-of-words approach, in which the classifier uses (and is trained with) selected terms from the input texts. In this work we present techniques based on graph similarity to classify short texts by topic. In our classifier we build graphs from the input texts, and then use properties of these graphs to classify them. We have tested the resulting algorithm by classifying Twitter messages in Spanish among a predefined set of topics, achieving more than 70% accuracy.
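The abstract does not say which graph properties the classifier relies on. As a minimal sketch under our own assumptions, each text can be turned into a word co-occurrence graph (words as nodes, an edge between words appearing within a small window), each topic into the union of its training graphs, and a new tweet assigned to the topic whose graph contains the largest share of its edges. The functions `text_graph`, `build_topic_graphs` and `classify`, the window size, the containment score and the toy training data are all hypothetical (Python 3.9+ for the built-in generic type hints).

```python
Edge = tuple[str, str]


def text_graph(text: str, window: int = 2) -> set[Edge]:
    """Undirected co-occurrence graph: link words at most `window` positions apart."""
    words = text.lower().split()
    edges: set[Edge] = set()
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if word != other:
                a, b = sorted((word, other))  # canonical order for undirected edges
                edges.add((a, b))
    return edges


def build_topic_graphs(training: list[tuple[str, str]]) -> dict[str, set[Edge]]:
    """Merge the graphs of all training texts that share a topic label."""
    graphs: dict[str, set[Edge]] = {}
    for text, topic in training:
        graphs.setdefault(topic, set()).update(text_graph(text))
    return graphs


def classify(text: str, topic_graphs: dict[str, set[Edge]]) -> str:
    """Pick the topic whose graph contains the largest share of the text's edges."""
    edges = text_graph(text)

    def containment(topic: str) -> float:
        return len(edges & topic_graphs[topic]) / len(edges) if edges else 0.0

    return max(topic_graphs, key=containment)


# Hypothetical toy training data.
TRAINING = [
    ("el gol del partido de fútbol", "deportes"),
    ("el gobierno aprueba la ley", "política"),
    ("el partido gana la liga de fútbol", "deportes"),
    ("el congreso vota la ley del gobierno", "política"),
]
graphs = build_topic_graphs(TRAINING)
print(classify("gol de fútbol en el partido", graphs))  # deportes
```

Unlike a bag-of-words model, this score only rewards words that appear near each other in both the tweet and the topic's training texts, which is one way graph structure can carry information that isolated terms do not.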

Sentiment analysis and topic detection of Spanish tweets: A comparative study of NLP techniques

A great deal of effort is being invested in building effective solutions for sentiment analysis and topic detection, but mainly for English texts. Using a corpus of Spanish tweets, we present a comparative analysis of several approaches and classification techniques for these problems. Keywords: Sentiment analysis, topic detection.
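The abstract does not list the classifiers it compares. One standard bag-of-words baseline for topic detection is multinomial Naive Bayes with Laplace smoothing; the following is a from-scratch Python sketch under that assumption, in which the class name `NaiveBayesTopicClassifier`, the whitespace tokenisation and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTopicClassifier:
    """Multinomial Naive Bayes over bag-of-words features, Laplace smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesTopicClassifier":
        self.doc_counts = Counter(labels)  # documents per topic (for priors)
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        words = [w for w in text.lower().split() if w in self.vocab]  # skip unseen words

        def log_posterior(label: str) -> float:
            score = math.log(self.doc_counts[label] / n_docs)  # log prior
            for w in words:  # add smoothed log likelihood of each known word
                score += math.log((self.word_counts[label][w] + 1) / (self.totals[label] + v))
            return score

        return max(self.doc_counts, key=log_posterior)


# Hypothetical toy data.
clf = NaiveBayesTopicClassifier().fit(
    ["gol partido fútbol", "liga fútbol equipo", "gobierno ley congreso", "elecciones gobierno votos"],
    ["deportes", "deportes", "política", "política"],
)
print(clf.predict("el equipo marcó un gol"))  # deportes
```

Working in log space avoids underflow when many small word probabilities are multiplied, which matters less for tweets than for long documents but costs nothing.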

LyS at TASS 2014: A Prototype for Extracting and Analysing Aspects from Spanish tweets

Abstract: This paper describes our participation in the third edition of the workshop on sentiment analysis of Spanish tweets, TASS 2014. This year's evaluation campaign includes four challenges: (1) global sentiment analysis, (2) topic classification, (3) aspect extraction and (4) aspect-based sentiment analysis. Tasks 1 and 2 are addressed with a machine learning approach, in which several linguistic resources and other information extracted from the training corpus are used to train a supervised classifier. For task 3, we develop a naive approach that collects a set of representations to identify the predefined aspects requested by the organisers. Finally, task 4 uses heuristics to identify the scope of each aspect and then classifies its sentiment with a supervised classifier. The experimental results are promising and will serve as the starting point for developing more complex techniques. Keywords: Sentiment analysis, Topic classification, Aspect extraction, Aspect-based sentiment analysis.
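The abstract describes task 3 as matching a collected list of surface representations against each tweet, and task 4 as using heuristics to delimit each aspect's scope before classifying its sentiment. A minimal Python sketch of those two steps follows, assuming whitespace tokens, longest-match lookup and a fixed token window as the scope heuristic. The aspect names and surface forms, the window size and the functions `find_aspects` and `aspect_scope` are hypothetical, and the supervised sentiment classifier applied to each scope is not shown.

```python
# Hypothetical aspect lexicon: each predefined aspect maps to surface forms.
ASPECT_FORMS = {
    "Madrid": ["madrid", "real madrid", "merengues"],
    "Barça": ["barça", "barcelona", "culés"],
}


def find_aspects(tokens: list[str], aspect_forms: dict[str, list[str]]) -> list[tuple[str, int, int]]:
    """Return (aspect, start, end) token spans, preferring the longest matching form."""
    forms = sorted(
        ((form.split(), aspect) for aspect, fs in aspect_forms.items() for form in fs),
        key=lambda pair: -len(pair[0]),
    )
    spans, i = [], 0
    while i < len(tokens):
        for form, aspect in forms:
            if tokens[i : i + len(form)] == form:
                spans.append((aspect, i, i + len(form)))
                i += len(form)
                break
        else:  # no form starts at position i
            i += 1
    return spans


def aspect_scope(tokens: list[str], span: tuple[str, int, int], window: int = 3) -> list[str]:
    """Naive scope heuristic: the mention plus `window` tokens on each side."""
    _, start, end = span
    return tokens[max(0, start - window) : end + window]


tokens = "el real madrid jugó fatal pero el barça estuvo genial".split()
for span in find_aspects(tokens, ASPECT_FORMS):
    print(span[0], aspect_scope(tokens, span, window=2))
```

A fixed window can spill into a neighbouring aspect's context; stopping the scope at the next mention or at a clause boundary would be a natural refinement.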

An Insight Into Twitter: A Corpus Based

Revista de Lingüística y Lenguas Aplicadas, 2012

The aim of this paper is to study the use of Spanish and English on the microblogging social network Twitter from a contrastive point of view. A quantitative research methodology is applied, first, to identify common characteristics of language, organization and content in the medium and, second, to find possible differences in the use of each language. To carry out the experiment, two corpora were built from Twitter data: one in Spanish totalling 4,027,746 words and a comparable one in English totalling 4,655,992 words. The results show that a number of very general discourse and organizational features are common to both corpora, and also that some particular characteristics differentiate the use of English and Spanish in the medium.
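Because the two corpora differ in size (4,027,746 vs. 4,655,992 words), raw counts of a feature are not directly comparable. A common way to compare them, sketched below in Python as an assumption rather than the paper's stated method, is to normalise frequencies per 10,000 words and test the difference with Dunning's log-likelihood (G²) statistic. Only the corpus sizes come from the abstract; the function names are hypothetical.

```python
import math

SPANISH_WORDS = 4_027_746  # corpus sizes reported in the abstract
ENGLISH_WORDS = 4_655_992


def per_10k(count: int, corpus_size: int) -> float:
    """Frequency normalised to occurrences per 10,000 words."""
    return count / corpus_size * 10_000


def log_likelihood(a: int, b: int, c: int = SPANISH_WORDS, d: int = ENGLISH_WORDS) -> float:
    """Dunning's G2 for a feature seen `a` times in a corpus of `c` words
    and `b` times in a corpus of `d` words."""
    expected_a = c * (a + b) / (c + d)
    expected_b = d * (a + b) / (c + d)
    g2 = 0.0
    for observed, expected in ((a, expected_a), (b, expected_b)):
        if observed > 0:  # the 0 * log(0) term is taken as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2
```

With one degree of freedom, a G² above 3.84 corresponds to p < 0.05, so calling `log_likelihood` with a feature's counts in each corpus indicates whether its usage differs beyond what the size difference alone would predict.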

Analysis of Tweet Data

2019

Sentiment Analysis (SA) or Opinion Mining (OM) is the computational study of people’s opinions, attitudes and emotions toward an entity. The entity can represent individuals, events or topics that are covered by reviews. Some problems in the sentiment classification of text remain unsolved and have challenged many researchers. With the explosive growth of social media (e.g., review sites, forum discussions, blogs, microblogs, Twitter, comments, and postings on social network sites) on the Web, individuals and organizations increasingly use the content in these media for decision making. The central problem in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level: whether the opinion expressed in a document, a sentence or an entity feature/aspect is positive, neutral or negative. This study gives an overview of the different sentiment analysis approaches. The study reviewed ex...

Lexicon-Based Sentiment Analysis of Twitter Messages in Spanish

Procesamiento Del Lenguaje Natural, 2013

Lexicon-based approaches to sentiment analysis differ from the more common machine-learning approaches in that they rely exclusively on resources that store the polarity of lexical units. These units can then be identified in texts and assigned a polarity label, from which a calculation yields an overall score for the analysed text. Such systems have shown performance similar to that of statistical systems, with the advantage of not requiring a training dataset. However, they may not be optimal when the texts to be analysed are extremely short, such as those generated on some social networks like Twitter. In this work we carry out such a performance evaluation with Sentitext, a sentiment analysis system for Spanish. Keywords: lexicon-based sentiment analysis, text analytics, short texts, Twitter, performance evaluation.
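The abstract summarises the lexicon-based procedure: look up the polarity of each lexical unit, then combine the values into a global score for the text. A minimal Python sketch of that idea follows; the toy lexicon and its values, the naive negation rule and the zero threshold for the three-way label are all hypothetical assumptions, not Sentitext's actual resources or scoring.

```python
# Toy polarity lexicon with hypothetical values (not Sentitext's dictionary).
LEXICON = {"bueno": 2, "excelente": 3, "feliz": 2, "malo": -2, "horrible": -3}
NEGATORS = {"no", "nunca"}
PUNCTUATION = ".,;:!?¡¿\"'"


def lexicon_score(text: str, lexicon: dict[str, int] = LEXICON) -> int:
    """Sum the polarity of known words; a negator flips the next polar word."""
    score, negate = 0, False
    for raw in text.lower().split():
        token = raw.strip(PUNCTUATION)
        if token in NEGATORS:
            negate = True
        elif token in lexicon:
            score += -lexicon[token] if negate else lexicon[token]
            negate = False
    return score


def polarity_label(score: int, threshold: int = 0) -> str:
    """Map a global score to positive / negative / neutral."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


print(polarity_label(lexicon_score("No es bueno, es horrible.")))  # negative
```

The sketch also shows why very short texts are difficult for such systems: a tweet containing no lexicon entries scores 0 and falls back to neutral, whatever its actual tone.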