An Analysis of Social Biases Present in BERT Variants Across Multiple Languages

Compared to Us, They Are …: An Exploration of Social Biases in English and Italian Language Models Using Prompting and Sentiment Analysis

International Multiconference Information Society 2023 - Data Mining and Data Warehouses, 2023

Social biases are biases toward specific social groups, often accompanied by discriminatory behavior. They are reflected and perpetuated through language and language models. In this study, we consider two language models (RoBERTa, in English; and UmBERTo, in Italian) and investigate and compare the presence of social biases in each one. Masking techniques are used to obtain each model's top ten predictions for pre-defined masked prompts, and sentiment analysis is performed on the resulting sentences to detect the presence of biases. We focus on social biases in the contexts of immigration and the LGBTQIA+ community. Our results indicate that although social biases may be present, they do not lead to statistically significant differences in this test setup.
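
The workflow this abstract describes (fill a masked prompt, keep the top ten completions, score each completion with a sentiment classifier) can be sketched with off-the-shelf HuggingFace pipelines. This is a minimal illustration rather than the paper's actual code; the model names, the prompt, and the use of the default sentiment classifier are all assumptions.

```python
from transformers import pipeline

# Fill-mask model for the English setting (model name illustrative)
fill_mask = pipeline("fill-mask", model="roberta-base")
# Default English sentiment classifier (assumption; the paper's classifier may differ)
sentiment = pipeline("sentiment-analysis")

prompt = "Compared to us, immigrants are <mask>."  # illustrative masked prompt

# Top-ten completions, each scored for sentiment
for pred in fill_mask(prompt, top_k=10):
    sentence = pred["sequence"]
    label = sentiment(sentence)[0]
    print(f"{sentence!r:55s} {label['label']:9s} {label['score']:.3f}")
```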

Identifying and Reducing Gender Bias in Word-Level Language Models

ArXiv, 2019

Many text corpora exhibit socially problematic biases, which can be propagated or amplified in the models trained on such data. For example, doctor co-occurs more frequently with male pronouns than with female pronouns. In this study we (i) propose a metric to measure gender bias; (ii) measure bias in a text corpus and in the text generated by a recurrent neural network language model trained on that corpus; (iii) propose a regularization loss term for the language model that minimizes the projection of encoder-trained embeddings onto an embedding subspace that encodes gender; and (iv) evaluate the efficacy of our proposed method in reducing gender bias. We find this regularization method to be effective in reducing gender bias up to an optimal weight assigned to the loss term, beyond which the model becomes unstable as the perplexity increases. We replicate this study on three training corpora (Penn Treebank, WikiText-2, and CNN/Daily Mail), resulting in similar conclusions.
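
As a rough sketch of the kind of loss term described here (minimizing the projection of embeddings onto a gender subspace), the following PyTorch snippet estimates a gender direction from definitional pairs and penalizes the squared projection of all word vectors onto it. This is an interpretation under stated assumptions, not the authors' implementation: the use of a single direction, the pair list, and the loss weight are all simplifications.

```python
import torch

def gender_direction(emb_weight, pairs):
    """Estimate a gender direction from definitional word pairs.

    emb_weight: (vocab_size, dim) embedding matrix; pairs: list of
    (male_id, female_id) vocabulary indices, e.g. for ("he", "she").
    A single leading direction stands in for the full gender subspace here.
    """
    diffs = torch.stack([emb_weight[m] - emb_weight[f] for m, f in pairs])
    _, _, vh = torch.linalg.svd(diffs)   # first right-singular vector
    return vh[0]

def bias_regularizer(emb_weight, g_dir, weight=0.5):
    # Squared projection of every word vector onto the gender direction;
    # added to the LM loss so training shrinks that projection. The loss
    # weight is the knob the abstract mentions tuning (value illustrative).
    proj = emb_weight @ g_dir
    return weight * proj.pow(2).mean()

# Usage sketch during training (names hypothetical):
# total_loss = lm_loss + bias_regularizer(model.embedding.weight, g_dir)
```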

Investigating Gender Bias in BERT

Cogn. Comput., 2021

Contextual language models (CLMs) have pushed NLP benchmarks to new heights. It has become the norm to utilize CLM-provided word embeddings in downstream tasks such as text classification. However, unless addressed, CLMs are prone to learning intrinsic gender bias from the dataset. As a result, the predictions of downstream NLP models can vary noticeably when gender words are varied, such as replacing "he" with "she", or even when gender-neutral words are varied. In this paper, we focus our analysis on a popular CLM, i.e., BERT. We analyse the gender bias it induces in five downstream tasks related to emotion and sentiment intensity prediction. For each task, we train a simple regressor utilizing BERT's word embeddings. We then evaluate the gender bias in these regressors using an equity evaluation corpus. Ideally, and by design, the models should discard gender-informative features from the input. However, the results show a significant dependence of the system's predict...
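
The evaluation step described here (probing a trained regressor with sentence pairs that differ only in gender words, in the style of the Equity Evaluation Corpus) might look roughly like the sketch below. The regressor's predict interface and the pair format are assumptions made for illustration, not the paper's actual setup.

```python
import numpy as np

def gender_gap(regressor, paired_sentences):
    """Average prediction gap on sentence pairs that differ only in gender words.

    regressor: any model exposing predict(list_of_texts) -> array of intensity
    scores (interface assumed); paired_sentences: list of
    (male_variant, female_variant) strings.
    """
    male = regressor.predict([m for m, _ in paired_sentences])
    female = regressor.predict([f for _, f in paired_sentences])
    gaps = np.asarray(male) - np.asarray(female)
    # An unbiased system would produce gaps centred on zero.
    return gaps.mean(), np.abs(gaps).mean()
```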

Biases in Predicting the Human Language Model

Proceedings of ACL, 2014

We consider the prediction of three human behavioral measures – lexical decision, word naming, and picture naming – through the lens of domain bias in language modeling. Contrasting the predictive ability of statistics derived from six different corpora, we find intuitive results showing that, e.g., a British corpus overpredicts the speed with which an American will react to the words ward and duke, and that the Google n-grams corpus overpredicts familiarity with technology terms. This study aims to provoke increased consideration of the human language model by NLP practitioners: biases are not limited to differences between corpora (i.e., "train" vs. "test"); they can also exist between a corpus and the intended users of the resultant technology.
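
One concrete way to run the kind of comparison this abstract describes is to fit, per corpus, a simple regression of a behavioral measure (e.g., lexical-decision reaction time) on log corpus frequency and compare goodness of fit. The sketch below is a simplification for illustration; the variable names and the R² criterion are assumptions, not the authors' exact evaluation.

```python
import numpy as np

def corpus_fit(log_freqs, reaction_times):
    """Fit RT ~ a + b * log(frequency) and return R^2.

    log_freqs and reaction_times are per-word arrays aligned to the same
    word list; a higher R^2 means the corpus's frequencies predict the
    behavioral data better (a simplification of the paper's evaluation).
    """
    x = np.asarray(log_freqs, dtype=float)
    y = np.asarray(reaction_times, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid.var() / y.var()

# Compare corpora by fitting the same behavioral data with each corpus's counts
# (inputs hypothetical):
# r2_bnc    = corpus_fit(log_freqs_bnc, lexical_decision_rts)
# r2_ngrams = corpus_fit(log_freqs_google, lexical_decision_rts)
```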

He is very intelligent, she is very beautiful? On Mitigating Social Biases in Language Modelling and Generation

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 2021

Social biases with respect to demographics (e.g., gender, age, race) in datasets are often encoded in the large pre-trained language models trained on them. Prior works have largely focused on mitigating biases in context-free representations, with a recent shift to contextual ones. While this is useful for several word- and sentence-level classification tasks, mitigating biases in only the representations may not suffice to use these models for language generation tasks, such as auto-completion, summarization, or dialogue generation. In this paper, we propose an approach to mitigate social biases in BERT, a large pre-trained contextual language model, and show its effectiveness in fill-in-the-blank sentence completion and summarization tasks. In addition to mitigating biases in BERT, which in general acts as an encoder, we propose lexical co-occurrence-based bias penalization in the decoder units of generation frameworks, and show bias mitigation in summarization. Finally, our approach results in better debiasing of BERT-based representations compared to post-training bias mitigation, illustrating the efficacy of our approach not just for mitigating biases in representations, but also for generating text with reduced biases.
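
A decoder-side penalization of the kind named here could, in one plausible reading, down-weight next-token logits for words that co-occur disproportionately with a demographic term already present in the context. The penalty table, the scaling factor, and the function signature below are illustrative assumptions, not the paper's mechanism.

```python
import torch

def penalize_biased_logits(logits, context_ids, cooc_penalty, demographic_ids, alpha=2.0):
    """Down-weight tokens that co-occur disproportionately with a demographic
    term present in the generated context (one reading of co-occurrence-based
    decoder penalization; the penalty table and alpha are illustrative).

    logits: (vocab_size,) next-token scores from the decoder;
    context_ids: token ids of the prefix generated so far;
    cooc_penalty: dict mapping demographic token id -> (vocab_size,) tensor of
    co-occurrence-based penalties estimated from the training corpus.
    """
    for tok in context_ids:
        if tok in demographic_ids:
            logits = logits - alpha * cooc_penalty[tok]
    return logits
```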

Unmasking Contextual Stereotypes: Measuring and Mitigating BERT's Gender Bias

ArXiv, 2020

Contextualized word embeddings have been replacing standard embeddings as the representational knowledge source of choice in NLP systems. Since a variety of biases have previously been found in standard word embeddings, it is crucial to assess biases encoded in their replacements as well. Focusing on BERT (Devlin et al., 2018), we measure gender bias by studying associations between gender-denoting target words and names of professions in English and German, comparing the findings with real-world workforce statistics. We mitigate bias by fine-tuning BERT on the GAP corpus (Webster et al., 2018), after applying Counterfactual Data Substitution (CDS) (Maudslay et al., 2019). We show that our method of measuring bias is appropriate for languages such as English, but not for languages with a rich morphology and gender-marking, such as German. Our results highlight the importance of investigating bias and mitigation techniques cross-linguistically, especially in view of the current empha...
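
The association measurement described (how strongly a model links gender-denoting words with profession names) can be approximated with masked-token probabilities from an off-the-shelf BERT, as in the sketch below. The template, the model name, and the choice to compare P([MASK]=he) vs. P([MASK]=she) are assumptions for illustration; the paper's protocol and statistics differ in detail.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")   # model choice illustrative
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def target_probability(target, profession, template="{t} is a {p}."):
    # Probability BERT assigns to the gendered target word when it is masked
    # in a sentence mentioning the profession (template is an assumption).
    text = template.format(t=tok.mask_token, p=profession)
    ids = tok(text, return_tensors="pt")
    mask_pos = (ids["input_ids"][0] == tok.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**ids).logits[0, mask_pos]
    probs = logits.softmax(dim=-1)
    return probs[tok.convert_tokens_to_ids(target)].item()

for prof in ["nurse", "doctor"]:
    print(prof, target_probability("he", prof), target_probability("she", prof))
```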

An Information-Theoretic Approach and Dataset for Probing Gender Stereotypes in Multilingual Masked Language Models

Findings of the Association for Computational Linguistics: NAACL 2022

Warning: This work deals with statements of a stereotypical nature that may be upsetting. Bias research in NLP is a rapidly growing and developing field. Similar to CrowS-Pairs (Nangia et al., 2020), we assess gender bias in masked language models (MLMs) by studying pairs of sentences that are identical except that the individuals referred to have different genders. Most bias research focuses on, and is often specific to, English. Using a novel methodology for creating sentence pairs that is applicable across languages, we create, based on CrowS-Pairs, a multilingual dataset for English, Finnish, German, Indonesian and Thai. Additionally, we propose S_JSD, a new bias measure based on Jensen-Shannon divergence, which we argue retains more information from the model output probabilities than other previously proposed bias measures for MLMs. Using multilingual MLMs, we find that S_JSD diagnoses the same systematic biased behavior for non-English languages that previous studies have found for monolingual English pre-trained MLMs. S_JSD outperforms the CrowS-Pairs measure, which struggles to find such biases for smaller non-English datasets.
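
At its core, a Jensen-Shannon-divergence comparison of MLM outputs for a minimally differing sentence pair might look like the sketch below: mask the same shared word in both variants and measure the divergence between the two predicted distributions. This is a sketch in the spirit of the measure, under assumed details (a single masked position, a particular multilingual model), not the paper's exact S_JSD aggregation.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "bert-base-multilingual-cased"   # multilingual MLM (choice illustrative)
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)
model.eval()

def jsd(p, q, eps=1e-12):
    # Jensen-Shannon divergence between two probability vectors.
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * ((a + eps).log() - (b + eps).log())).sum()
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def pair_divergence(sent_a, sent_b, shared_word):
    # Mask the same shared word in both variants of a CrowS-Pairs-style pair
    # and compare the model's output distributions at that position.
    dists = []
    for sent in (sent_a, sent_b):
        ids = tok(sent.replace(shared_word, tok.mask_token), return_tensors="pt")
        pos = (ids["input_ids"][0] == tok.mask_token_id).nonzero().item()
        with torch.no_grad():
            logits = model(**ids).logits[0, pos]
        dists.append(logits.softmax(dim=-1))
    return jsd(dists[0], dists[1]).item()

print(pair_divergence("He is a talented cook.", "She is a talented cook.", "talented"))
```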

AutoBiasTest: Controllable Sentence Generation for Automated and Open-Ended Social Bias Testing in Language Models

ArXiv, 2023

Social bias in Pretrained Language Models (PLMs) affects text generation and other downstream NLP tasks. Existing bias testing methods rely predominantly on manual templates or on expensive crowd-sourced data. We propose a novel AutoBiasTest method that automatically generates sentences for testing bias in PLMs, hence providing a flexible and low-cost alternative. Our approach uses another PLM for generation and controls the generation of sentences by conditioning on social group and attribute terms. We show that the generated sentences are natural and similar to human-produced content in terms of word length and diversity. We illustrate that larger models used for generation produce estimates of social bias with lower variance. We find that our bias scores are well correlated with manual templates, but AutoBiasTest highlights biases not captured by these templates due to more diverse and realistic test sentences. By automating large-scale test sentence generation, we enable better estimation of underlying bias distributions.
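
Conditioning a generator PLM on a social-group term and an attribute term, as this abstract describes, could be sketched as follows with a HuggingFace text-generation pipeline. The prompt wording, the GPT-2 generator, and the sampling settings are illustrative assumptions, not the AutoBiasTest prompts or models.

```python
from transformers import pipeline, set_seed

set_seed(0)
generator = pipeline("text-generation", model="gpt2")  # generator PLM (illustrative)

def generate_test_sentences(group_term, attribute_term, n=5):
    # Condition generation on a social-group term and an attribute term,
    # in the spirit of AutoBiasTest; prompt wording is an assumption.
    prompt = f"The {group_term} person was {attribute_term} because"
    outputs = generator(prompt, max_new_tokens=25, num_return_sequences=n,
                        do_sample=True,
                        pad_token_id=generator.tokenizer.eos_token_id)
    return [o["generated_text"] for o in outputs]

for sentence in generate_test_sentences("elderly", "forgetful"):
    print(sentence)
```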

Inference Functions in Large Language Models: A Comprehensive Framework for Bias Mitigation

Journal of Electrical Systems, 2024

Large Language Models (LLMs) have become a cornerstone of natural language processing tasks across industries. However, these models often perpetuate the biases present in the training data, resulting in harmful societal impacts. Bias in LLMs can manifest in various forms, such as gender stereotypes, racial prejudice, and cultural misrepresentation. This paper introduces Inference Functions, a post-processing mechanism designed to dynamically detect and mitigate bias in real-time during the inference stage. Unlike traditional bias mitigation techniques, which require pre-processing data or retraining models, inference functions offer a scalable and efficient solution by intervening after the model generates outputs. We explore the design, application, and trade-offs of inference functions for bias mitigation, backed by experiments and case studies. We also discuss the ethical implications and potential for compliance with emerging AI regulations.
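
The abstract does not spell out how an inference function intervenes, so the sketch below shows only one minimal reading: a post-processing hook applied to generated text at inference time that neutralizes flagged stereotype patterns without retraining. The pattern and replacement rules are invented for illustration and are not the paper's rule set.

```python
import re

def inference_function(generated_text, flagged_rules):
    """Minimal post-processing hook in the spirit of the abstract: scan the
    model's output at inference time and neutralize flagged stereotype
    patterns without retraining. Rules are illustrative assumptions."""
    for pattern, replacement in flagged_rules:
        generated_text = re.sub(pattern, replacement, generated_text,
                                flags=re.IGNORECASE)
    return generated_text

# Hypothetical rule: soften sweeping group generalizations.
rules = [(r"\b(all|most) (women|men|immigrants) are\b", "some people are")]
print(inference_function("Most immigrants are lazy.", rules))
```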

Can Existing Methods Debias Languages Other than English? First Attempt to Analyze and Mitigate Japanese Word Embeddings

2020

It is known that word embeddings exhibit biases inherited from the corpus, and that those biases reflect social stereotypes. Recently, many studies have been conducted to analyze and mitigate biases in word embeddings. Unsupervised Bias Enumeration (UBE) (Swinger et al., 2019) is one approach to analyzing biases in English, and Hard Debias (Bolukbasi et al., 2016) is a common technique for mitigating gender bias. These methods have focused on English or, to a smaller extent, on other Indo-European languages. However, it is not clear whether they can be generalized to other languages. In this paper, we apply these analysis and mitigation methods, UBE and Hard Debias, to Japanese word embeddings and examine whether they can be used for Japanese. We experimentally show that UBE and Hard Debias cannot be sufficiently adapted to Japanese embeddings.
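
For reference, the neutralize step of Hard Debias (the mitigation technique the paper applies to Japanese embeddings) removes each gender-neutral word's component along a gender direction and re-normalizes, roughly as below. The gender direction, the neutral-word list, and the omission of the equalize step are simplifications for illustration.

```python
import numpy as np

def hard_debias_neutralize(vectors, gender_dir, neutral_ids):
    """Neutralize step of Hard Debias (Bolukbasi et al., 2016): remove the
    component along the gender direction from gender-neutral words and
    re-normalize. vectors: (vocab, dim) array; the equalize step for
    definitional pairs is omitted in this sketch."""
    g = gender_dir / np.linalg.norm(gender_dir)
    out = vectors.copy()
    for i in neutral_ids:
        v = out[i] - (out[i] @ g) * g       # drop the gender component
        out[i] = v / np.linalg.norm(v)
    return out
```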