The Negation Bias in Large Language Models: Investigating bias reflected in linguistic markers

Abstract

Large Language Models trained on large-scale uncontrolled corpora often encode stereotypes and biases, which can surface as harmful text generation or biased associations. However, do they also pick up subtler linguistic patterns that can reinforce and communicate biases and stereotypes, as humans do? We aim to bridge theoretical insights from social science with bias research in NLP by designing controlled, theoretically motivated LLM experiments to elicit this type of bias. Our case study is negation bias, the human tendency to use negation when describing situations that challenge common stereotypes. We construct an evaluation dataset containing negated and affirmed stereotypical and anti-stereotypical sentences and evaluate eight language models using perplexity as a proxy for model surprisal. We find that the autoregressive decoder models in our experiment exhibit this bias, while we do not find evidence for it among the stacked encoder models.
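To make the evaluation setup concrete, the sketch below shows one common way to score a sentence by perplexity with an autoregressive language model. The model choice (gpt2) and the example sentence pair are illustrative assumptions, not the paper's actual models or stimuli; a lower perplexity indicates the model is less surprised by the sentence.

```python
# Minimal sketch: scoring sentences by perplexity with an autoregressive LM.
# Model choice (gpt2) and the example pair are illustrative assumptions,
# not the paper's actual models or evaluation data.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(sentence: str) -> float:
    """Return exp(mean token negative log-likelihood) for one sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids, the model returns the mean
        # cross-entropy loss over next-token predictions.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()

# Hypothetical affirmed vs. negated sentence pair:
pairs = [
    ("The boy is strong.", "The boy is not strong."),
]
for affirmed, negated in pairs:
    print(f"{perplexity(affirmed):8.2f}  {affirmed}")
    print(f"{perplexity(negated):8.2f}  {negated}")
```

Comparing perplexities across the negated/affirmed and stereotypical/anti-stereotypical conditions is what allows the bias to be read off as a surprisal asymmetry.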
