Emily Öhman | University of Helsinki

Papers by Emily Öhman

Affect as a proxy for literary mood

Journal of Data Mining & Digital Humanities

We propose to use affect as a proxy for mood in literary texts. In this study, we explore the differences in computationally detecting tone versus detecting mood. Methodologically, we utilize affective word embeddings to look at the affective distribution in different text segments. We also present a simple yet efficient and effective method of enhancing emotion lexicons to take both semantic shift and the domain of the text into account, producing real-world congruent results closely matching both contemporary and modern qualitative analyses.

Strategic sentiments and emotions in post-Second World War party manifestos in Finland

Journal of Computational Social Science

We contribute to the growing number of studies on emotions and politics by investigating how political parties strategically use sentiments and emotions in party manifestos. We use computational methods in examining changes of sentiments and emotions in Finnish party manifestos from 1945 to 2019. We use sentiment and emotion lexicons first translated from English into Finnish and then modified for the purposes of our study. We analyze how the use of emotions and sentiments differs between government and opposition parties depending on their left/right ideology and the specific type of party manifesto. In addition to traditional sentiment and emotion analysis, we use emotion intensity analysis. Our results indicate that in Finland, government and opposition parties do not differ substantially from each other in their use of emotional language. From a historical perspective, the individual emotions used in party manifestos have persisted, but changes have taken place in the intensity ...

Hate speech, Censorship, and Freedom of Speech: The Changing Policies of Reddit

Journal of Data Mining & Digital Humanities

This paper examines the shift in focus on content policies and user attitudes on the social media platform Reddit. We do this by focusing on comments from general Reddit users from five posts made by admins (moderators) on updates to Reddit Content Policy. All five concern the nature of what kind of content is allowed to be posted on Reddit, and which measures will be taken against content that violates these policies. We use topic modeling to probe how the general discourse for Redditors has changed around limitations on content, and later, limitations on hate speech, or speech that incites violence against a particular group. We show that there is a clear shift in both the contents and the user attitudes that can be linked to contemporary societal upheaval as well as newly passed laws and regulations, and contribute to the wider discussion on hate speech moderation.

Towards the Inevitable Demise of Everybody?: A multifactorial analysis of -one/-body/-man variation in indefinite pronouns in historical American English

Language Variation and Change, 2019

The Language of Emotions: Building and Applying Computational Methods for Emotion Detection for English and Beyond

Helsingin yliopisto, Mar 5, 2021

LT@Helsinki at SemEval-2020 Task 12: Multilingual or language-specific BERT?

This paper presents the different models submitted by the LT@Helsinki team for the SemEval 2020 Shared Task 12. Our team participated in sub-tasks A and C, titled offensive language identification and offense target identification, respectively. In both cases we used the so-called Bidirectional Encoder Representations from Transformers (BERT), a model pre-trained by Google and fine-tuned by us on the OLID and SOLID datasets. The results show that offensive tweet classification is one of several language-based tasks where BERT can achieve state-of-the-art results.

Language Change Database: A new online resource

ICAME Journal, 2016

We introduce the Language Change Database (LCD), which provides access to the results of previous corpus-based research dealing with change in the English language. The LCD will be published on an open-access linked data platform that will allow users to enter information about their own publications into the database and to conduct searches based on linguistic and extralinguistic parameters. Both metadata and numerical data from the original publications will be available for download, enabling systematic reviews, meta-analyses, replication studies and statistical modelling of language change. The LCD will be of interest to scholars, teachers and students of English.

Teaching Computational Methods to Humanities Students

This paper discusses the academic and societal implications of teaching computational methods to humanities students from the perspective of digital humanities. Pedagogical choices are backed up by both pedagogical theory and concrete examples from actual courses and course feedback. The aim of this paper is to introduce clear best-practice recommendations for developing digital humanities teaching with an emphasis on methods teaching in order to increase the number of students who understand such methods and can apply them to their own projects.

XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection

We introduce XED, a multilingual fine-grained emotion dataset. The dataset consists of human-annotated Finnish (25k) and English sentences (30k), as well as projected annotations for 30 additional languages, providing new resources for many low-resource languages. We use Plutchik’s core emotions to annotate the dataset with the addition of neutral to create a multilabel multiclass dataset. The dataset is carefully evaluated using language-specific BERT models and SVMs to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.

Sentimentator: Gamifying Fine-Grained Sentiment Annotation

We introduce Sentimentator, a publicly available gamified web-based annotation platform for fine-grained sentiment annotation at the sentence level. Sentimentator is unique in that it moves beyond binary classification. We use a ten-dimensional model which allows for the annotation of 51 unique sentiments and emotions. The platform is gamified with a complex scoring system designed to reward users for high-quality annotations. Sentimentator introduces several unique features that have previously not been available, or at best very limited, for sentiment annotation. In particular, it provides streamlined multi-dimensional annotation optimized for sentence-level annotation of movie subtitles. Because the platform is publicly available, it will benefit anyone and everyone interested in fine-grained sentiment analysis and emotion detection, as well as annotation of other datasets.

Emotion Preservation in Translation: Evaluating Datasets for Annotation Projection

This paper is a pilot study that aims to explore the viability of annotation projection from one language to another as well as to evaluate the multilingual data set we have created for emotion analysis. We study different language pairs based on parallel corpora for sentiment and emotion annotations and explore annotator agreement. We show that the data source is a possible one for reliable L1 data to be used in annotation projection from high-resource languages, such as English, into low-resource languages and that this is a reliable way of creating data sets for fine-grained sentiment analysis and emotion detection.

SELF & FEIL: Emotion and Intensity Lexicons for Finnish

ArXiv, 2021

This paper introduces a Sentiment and Emotion Lexicon for Finnish (SELF) and a Finnish Emotion Intensity Lexicon (FEIL). We describe the lexicon creation process and evaluate the lexicon using some commonly available tools. The lexicon uses annotations projected from the NRC Emotion Lexicon with carefully edited translations. To our knowledge, this is the first comprehensive sentiment and emotion lexicon for Finnish.

Emotion Annotation: Rethinking Emotion Categorization

One of the biggest hurdles for the utilization of machine learning in interdisciplinary projects is the need for annotated training data, which is costly to create. Emotion annotation is a notoriously difficult task, and the current annotation schemes, which are based on psychological theories of human interaction, are not always the most conducive to the creation of reliable emotion annotations, nor are they optimal for annotating emotions in the modality of text. This paper discusses the theory, history, and challenges of emotion annotation, and proposes improvements for emotion annotation tasks based on both theory and case studies. These improvements focus on rethinking the categorization of emotions and the overlap and disjointedness of emotion categories.

European intercultural workplace: Sweden

We present our thanks to the Leonardo da Vinci II Program for granting project funding, as well as Göteborg University, the interdisciplinary research center SSKKII and the Department of Linguistics, where the Swedish project team has been working in fruitful cooperation with department colleagues. In particular, we would like to thank the Project "Communication and Interaction in Multicultural Health Care". Our warmest thanks to all who have provided us with valuable information for the national report. Regarding the analysis of immigration to Sweden we would particularly like to thank

The Challenges of Multi-dimensional Sentiment Analysis Across Languages

This paper outlines a pilot study on multi-dimensional and multilingual sentiment analysis of social media content. We use parallel corpora of movie subtitles as a proxy for colloquial language in social media channels and a multilingual emotion lexicon for fine-grained sentiment analyses. Parallel data sets make it possible to study the preservation of sentiments and emotions in translation, and our assessment reveals that the lexical approach shows great inter-language agreement. However, our manual evaluation also suggests that the use of purely lexical methods is limited and further studies are necessary to pinpoint the cross-lingual differences and to develop better sentiment classifiers.

Challenges in Annotation: Annotator Experiences from a Crowdsourced Emotion Annotation Task

With the prevalence of machine learning in natural language processing and other fields, an increasing number of crowd-sourced data sets are created and published. However, very little has been written about the annotation process from the point of view of the annotators. This pilot study aims to help fill the gap and provide insights into how to maximize the quality of the annotation output of crowd-sourced annotations with a focus on fine-grained sentence-level sentiment and emotion annotation from the annotators' point of view.

European intercultural workplace: Sverige

We would like to thank the Leonardo da Vinci II Programme for the project funding granted, Göteborg University, the interdisciplinary research center Kollegium SSKKII and the Department of Linguistics, with which we in the Swedish project team have had a good collaboration. We would especially like to thank the project "Communication and Interaction in Multicultural Health Care".

European intercultural workplace: Sweden

Gothenburg: The European Intercultural Workplace Project, Jun 1, 2007

European workplaces are experiencing major transformation. Economic and political changes in Europe over the past decades have resulted in a vast increase in the cultural diversity of those living, working and being educated within its borders. The expansion of the EU coupled with labor shortages in many parts of the continent have brought about a steady increase in mobility both within and from outside the EEA. This trend is likely to continue and expand, as workplaces grow into microcosms of a culturally diverse society.

Creating a Dataset for Multilingual Fine-grained Emotion-detection Using Gamification-based Annotation

Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

This paper introduces a gamified framework for fine-grained sentiment analysis and emotion detection. We present a flexible tool, Sentimentator, that can be used for efficient annotation based on crowdsourcing and a self-perpetuating gold standard. We also present a novel dataset with multi-dimensional annotations of emotions and sentiments in movie subtitles that enables research on sentiment preservation across languages and the creation of robust multilingual emotion detection tools. The tools and datasets are public and open-source and can easily be extended and applied for various purposes.
