Investigating Gender Bias in BERT

Evaluating the Underlying Gender Bias in Contextualized Word Embeddings

Proceedings of the First Workshop on Gender Bias in Natural Language Processing

Gender bias strongly affects natural language processing applications. Word embeddings have been shown both to retain and to amplify gender biases present in current data sources. Recently, contextualized word embeddings have enhanced previous word embedding techniques by computing word vector representations dependent on the sentence they appear in. In this paper, we study the impact of this conceptual change in the word embedding computation in relation to gender bias. Our analysis includes different measures previously applied in the literature to standard word embeddings. Our findings suggest that contextualized word embeddings are less biased than standard ones, even when the latter are debiased.
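As a concrete illustration of applying a similarity-based bias measure to contextualized vectors (a minimal sketch, not the paper's protocol; the model name, sentences, and similarity comparison are assumptions), one can extract a word's in-context BERT vector and compare its similarity to gendered words:

```python
# Minimal sketch: in-context BERT vectors and a he/she similarity gap.
# Model, sentences, and target words are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_vector(sentence: str, word: str) -> torch.Tensor:
    """Mean of the last-layer hidden states over the word's subword tokens."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]          # (seq_len, dim)
    word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(word_ids) + 1):           # locate subword span
        if ids[i:i + len(word_ids)] == word_ids:
            return hidden[i:i + len(word_ids)].mean(dim=0)
    raise ValueError(f"{word!r} not found in {sentence!r}")

cos = torch.nn.functional.cosine_similarity
nurse = word_vector("The nurse prepared the medication.", "nurse")
he = word_vector("He walked into the room.", "he")
she = word_vector("She walked into the room.", "she")
print(cos(nurse, she, dim=0) - cos(nurse, he, dim=0))       # positive => closer to "she"
```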

Unmasking Contextual Stereotypes: Measuring and Mitigating BERT's Gender Bias

ArXiv, 2020

Contextualized word embeddings have been replacing standard embeddings as the representational knowledge source of choice in NLP systems. Since a variety of biases have previously been found in standard word embeddings, it is crucial to assess biases encoded in their replacements as well. Focusing on BERT (Devlin et al., 2018), we measure gender bias by studying associations between gender-denoting target words and names of professions in English and German, comparing the findings with real-world workforce statistics. We mitigate bias by fine-tuning BERT on the GAP corpus (Webster et al., 2018), after applying Counterfactual Data Substitution (CDS) (Maudslay et al., 2019). We show that our method of measuring bias is appropriate for languages such as English, but not for languages with a rich morphology and gender-marking, such as German. Our results highlight the importance of investigating bias and mitigation techniques cross-linguistically, especially in view of the current empha...
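A minimal sketch of this style of measurement (not the authors' exact templates or bias score; the model, template, and profession list are illustrative assumptions), using a masked-token query in HuggingFace transformers:

```python
# Minimal sketch: compare BERT's masked-token probabilities for gendered
# pronouns in a profession template. Template and model are assumptions.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

def pronoun_scores(profession: str) -> dict:
    template = f"[MASK] is a {profession}."
    results = fill(template, targets=["he", "she"])
    return {r["token_str"]: r["score"] for r in results}

for job in ["nurse", "engineer", "teacher"]:
    print(job, pronoun_scores(job))
```

Such template probabilities can then be set against external workforce statistics to judge whether the model's associations track or exaggerate real-world imbalances.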

Identifying and Reducing Gender Bias in Word-Level Language Models

ArXiv, 2019

Many text corpora exhibit socially problematic biases, which can be propagated or amplified in the models trained on such data. For example, "doctor" co-occurs more frequently with male pronouns than with female pronouns. In this study we (i) propose a metric to measure gender bias; (ii) measure bias in a text corpus and in the text generated from a recurrent neural network language model trained on that corpus; (iii) propose a regularization loss term for the language model that minimizes the projection of encoder-trained embeddings onto an embedding subspace that encodes gender; and (iv) evaluate the efficacy of our proposed method for reducing gender bias. We find this regularization method to be effective in reducing gender bias up to an optimal weight assigned to the loss term, beyond which the model becomes unstable as the perplexity increases. We replicate this study on three training corpora (Penn Treebank, WikiText-2, and CNN/Daily Mail), reaching similar conclusions.
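A minimal PyTorch sketch of the kind of regularizer described above (a simplification: the gender subspace is reduced to a single direction from two seed words, and the function name and hyperparameter are illustrative, not the paper's code):

```python
# Minimal sketch: penalize projection of embeddings onto a gender direction.
# Seed words, ids, and the scaling hyperparameter are illustrative assumptions.
import torch

def gender_projection_penalty(weight: torch.Tensor,
                              he_id: int, she_id: int,
                              neutral_ids: torch.Tensor) -> torch.Tensor:
    """Mean squared projection of gender-neutral word embeddings onto a
    he-she gender direction; added to the LM loss with some weight."""
    g = weight[he_id] - weight[she_id]
    g = g / g.norm()                      # unit gender direction
    proj = weight[neutral_ids] @ g        # signed projection per neutral word
    return (proj ** 2).mean()

# Usage inside a training loop (lambda_bias is a hypothetical hyperparameter):
#   loss = lm_loss + lambda_bias * gender_projection_penalty(
#       model.embedding.weight, he_id, she_id, neutral_ids)
```

The trade-off reported above corresponds to increasing lambda_bias: projections shrink, but past some point perplexity degrades.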

Measuring Gender Bias in Word Embeddings across Domains and Discovering New Gender Bias Word Categories

Proceedings of the First Workshop on Gender Bias in Natural Language Processing

Prior work has shown that word embeddings capture human stereotypes, including gender bias. However, there is a lack of studies testing the presence of specific gender bias categories in word embeddings across diverse domains. This paper aims to fill this gap by applying the WEAT bias detection method to four sets of word embeddings trained on corpora from four different domains: news, social networking, biomedical and a gender-balanced corpus extracted from Wikipedia (GAP). We find that some domains are definitely more prone to gender bias than others, and that the categories of gender bias present also vary for each set of word embeddings. We detect some gender bias in GAP. We also propose a simple but novel method for discovering new bias categories by clustering word embeddings. We validate this method through WEAT's hypothesis testing mechanism and find it useful for expanding the relatively small set of well-known gender bias word categories commonly used in the literature.
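For reference, a minimal NumPy sketch of the WEAT effect size used in such tests (target sets X, Y and attribute sets A, B are matrices of word vectors, e.g. career vs. family terms and male vs. female terms; all inputs are illustrative assumptions):

```python
# Minimal sketch of the WEAT effect size (Caliskan et al.-style association test).
import numpy as np

def _cos(u: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Cosine similarity of vector u to each row of matrix V."""
    return V @ u / (np.linalg.norm(V, axis=1) * np.linalg.norm(u))

def association(w: np.ndarray, A: np.ndarray, B: np.ndarray) -> float:
    """s(w, A, B): mean similarity to attribute set A minus mean to set B."""
    return _cos(w, A).mean() - _cos(w, B).mean()

def weat_effect_size(X, Y, A, B) -> float:
    sx = np.array([association(x, A, B) for x in X])
    sy = np.array([association(y, A, B) for y in Y])
    return (sx.mean() - sy.mean()) / np.concatenate([sx, sy]).std(ddof=1)
```

The accompanying permutation test (shuffling words between X and Y) supplies the hypothesis-testing mechanism mentioned above.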

Impact of Gender Debiased Word Embeddings in Language Modeling

ArXiv, 2021

Gender, race, and social biases have recently been detected as evident examples of unfairness in applications of Natural Language Processing. A key path towards fairness is to understand, analyse and interpret our data and algorithms. Recent studies have shown that the human-generated data used in training is an apparent source of bias. In addition, current algorithms have also been proven to amplify biases from data. To further address these concerns, in this paper, we study how a state-of-the-art recurrent neural language model behaves when trained on data which under-represents females, using pre-trained standard and debiased word embeddings. Results show that language models inherit higher bias when trained on unbalanced data when using pre-trained embeddings, in comparison with using embeddings trained within the task. Moreover, results show that, on the same data, language models inherit lower bias when using debiased pre-trained embeddings, compared to using standar...
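To make "debiased embeddings" concrete, here is a minimal sketch of one standard debiasing step, the neutralize operation from hard debiasing (Bolukbasi et al., 2016); it is a common choice rather than necessarily the exact procedure used in this paper:

```python
# Minimal sketch of the "neutralize" step: remove a word vector's component
# along a gender direction, then re-normalize. Inputs are illustrative.
import numpy as np

def neutralize(w: np.ndarray, gender_direction: np.ndarray) -> np.ndarray:
    g = gender_direction / np.linalg.norm(gender_direction)
    v = w - (w @ g) * g              # drop the projection onto the gender axis
    return v / np.linalg.norm(v)

# e.g., doctor_debiased = neutralize(doctor_vec, he_vec - she_vec)
```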

Word Embeddings via Causal Inference: Gender Bias Reducing and Semantic Information Preserving

Proceedings of the AAAI Conference on Artificial Intelligence

With widening deployments of natural language processing (NLP) in daily life, inherited social biases from NLP models have become more severe and problematic. Previous studies have shown that word embeddings trained on human-generated corpora have strong gender biases that can produce discriminative results in downstream tasks. Previous debiasing methods focus mainly on modeling bias and only implicitly consider semantic information while completely overlooking the complex underlying causal structure among bias and semantic components. To address these issues, we propose a novel methodology that leverages a causal inference framework to effectively remove gender bias. The proposed method allows us to construct and analyze the complex causal mechanisms facilitating gender information flow while retaining oracle semantic information within word embeddings. Our comprehensive experiments show that the proposed method achieves state-of-the-art results in gender-debiasing tasks. In additi...

Mitigating Gender Bias in Natural Language Processing: Literature Review

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019

As Natural Language Processing (NLP) and Machine Learning (ML) tools rise in popularity, it becomes increasingly vital to recognize the role they play in shaping societal biases and stereotypes. Although NLP models have shown success in modeling various applications, they propagate and may even amplify gender bias found in text corpora. While the study of bias in artificial intelligence is not new, methods to mitigate gender bias in NLP are relatively nascent. In this paper, we review contemporary studies on recognizing and mitigating gender bias in NLP. We discuss gender bias based on four forms of representation bias and analyze methods recognizing gender bias. Furthermore, we discuss the advantages and drawbacks of existing gender debiasing methods. Finally, we discuss future studies for recognizing and mitigating gender bias in NLP.

Examples of representation bias in the context of gender, by task:

Machine Translation: Translating "He is a nurse. She is a doctor." to Hungarian and back to English results in "She is a nurse. He is a doctor." (Douglas, 2017)
Caption Generation: An image captioning model incorrectly predicts the agent to be male because there is a computer nearby (Burns et al., 2018).
Speech Recognition: Automatic speech detection works better with male voices than female voices (Tatman, 2017).
Sentiment Analysis: Sentiment analysis systems rank sentences containing female noun phrases as indicative of anger more often than sentences containing male noun phrases (Park et al., 2018).
Language Model: "He is doctor" has a higher conditional likelihood than "She is doctor" (Lu et al., 2018).
Word Embedding: Analogies such as "man : woman :: computer programmer : homemaker" are automatically generated by models trained on biased word embeddings (Bolukbasi et al., 2016).

Gender Biased Algorithms: Word Embedding Models' impact on Gender Equality

2019

Word embedding is a popular technique used in machine learning and natural language processing tasks to represent text data as vectors. Recent events show that algorithms can be sexist, privileging men over women. In this analysis, I will apply science and technology theories to word embedding models (WEMs) and analyse how society, politics, and culture affect word embeddings, and how the models, in turn, affect society, politics, and culture. I will examine what impact the WEMs have on gender equality from the social constructivism and technological determinism perspectives.

Multi-Dimensional Gender Bias Classification

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Machine learning models are trained to find patterns in data. NLP models can inadvertently learn socially undesirable patterns when training on gender biased text. In this work, we propose a novel, general framework that decomposes gender bias in text along several pragmatic and semantic dimensions: bias from the gender of the person being spoken about, bias from the gender of the person being spoken to, and bias from the gender of the speaker. Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information. In addition, we collect a new, crowdsourced evaluation benchmark. Distinguishing between gender bias along multiple dimensions enables us to train better and more fine-grained gender bias classifiers. We show our classifiers are valuable for a variety of applications, like controlling for gender bias in generative models, detecting gender bias in arbitrary text, and classifying text as offensive based on its genderedness.

Gender Gaps Correlate with Gender Bias in Social Media Word Embeddings

2020

Gender status, gender roles, and gender values vary widely across cultures. Anthropology has provided qualitative accounts of economic, cultural, and biological factors that impact social groups, and international organizations have gathered indices and surveys to help quantify gender inequalities in states. Concurrently, machine learning research has recently characterized pervasive gender biases in AI language models, rooted in biases in their textual training data. While these machine biases produce sub-optimal inferences, they may help us characterize and predict statistical gender gaps and gender values in the culture(s) that produced the training text, thereby helping us understand cultural context through big data. This paper presents an approach to (1) construct word embeddings (i.e., vector-based lexical semantics) from a region’s social media, (2) quantify gender bias in word embeddings, and (3) correlate biases with survey responses and statistical gender gaps in educa...
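A minimal sketch of steps (2) and (3) under simplifying assumptions (bias measured as the mean projection of profession words onto a he-she axis in each region's gensim embeddings; the function names and profession list are illustrative, not the paper's pipeline):

```python
# Minimal sketch: per-region embedding bias scores correlated with an
# external gender-gap statistic. All names and inputs are illustrative.
import numpy as np
from scipy.stats import pearsonr

def region_bias_score(kv, professions) -> float:
    """Mean projection of profession words onto a he-she axis in one
    region's gensim KeyedVectors."""
    g = kv["he"] - kv["she"]
    g = g / np.linalg.norm(g)
    return float(np.mean([kv[w] @ g for w in professions if w in kv]))

def bias_gap_correlation(region_kvs, professions, gender_gap_by_region):
    """region_kvs: dict region -> KeyedVectors trained on that region's
    social media; gender_gap_by_region: dict region -> survey/statistical index."""
    regions = sorted(set(region_kvs) & set(gender_gap_by_region))
    bias = [region_bias_score(region_kvs[r], professions) for r in regions]
    gaps = [gender_gap_by_region[r] for r in regions]
    return pearsonr(bias, gaps)
```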