Attribution Bias in Large Language Models
Summary of arXiv:2604.05224v1
As large language models (LLMs) are increasingly used in search and information retrieval, accurately attributing content to its original authors becomes essential. A recent study introduces AttriBench, a dataset designed to benchmark quote attribution in LLMs and to measure demographic bias in that task.
Introduction to AttriBench
According to its authors, AttriBench is the first quote attribution dataset balanced by both fame and demographics, which allows a controlled investigation of how demographic factors affect attribution. Because fame is balanced across groups, differences in attribution accuracy cannot simply be explained by some authors being better known than others. The dataset is intended to help researchers evaluate how well LLMs attribute quotes to their rightful authors, with both well-known and lesser-known voices represented.
Study Findings
The researchers evaluated 11 widely used LLMs across several prompt settings. They found that quote attribution remains a significant challenge, even for state-of-the-art models. The study highlights several key points:
- Systematic Disparities: Attribution accuracy differs notably by race, gender, and intersectional identity. Quotes from some groups are misattributed more often than quotes from others.
- Suppression Phenomenon: The study identifies a distinct failure mode termed “suppression,” in which a model provides no attribution at all even though it has the necessary authorship information. This is particularly concerning because it obscures the contributions of the affected authors.
- Widespread and Uneven Distribution: Suppression is widespread and is not randomly distributed; it disproportionately affects specific demographic groups, which is further evidence of systematic bias in LLM outputs.
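The two quantities discussed above, attribution accuracy and suppression rate per demographic group, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's evaluation code: the `Record` fields, the group labels, and especially the `knows_author` flag (standing in for some separate check that the model has the authorship information) are hypothetical, and the records are toy values.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    group: str                # hypothetical demographic label of the quote's author
    gold_author: str          # the true author of the quote
    predicted: Optional[str]  # model's attribution; None means no attribution given
    knows_author: bool        # assumption: a separate probe showed the model has this information


def per_group_rates(records):
    """Return accuracy and suppression rate for each group."""
    counts = defaultdict(lambda: {"n": 0, "correct": 0, "suppressed": 0})
    for r in records:
        c = counts[r.group]
        c["n"] += 1
        if r.predicted is not None and r.predicted == r.gold_author:
            c["correct"] += 1
        # Suppression (as assumed here): no attribution despite known authorship.
        if r.predicted is None and r.knows_author:
            c["suppressed"] += 1
    return {
        g: {
            "accuracy": c["correct"] / c["n"],
            "suppression_rate": c["suppressed"] / c["n"],
        }
        for g, c in counts.items()
    }


# Toy records for illustration only; these are not results from the study.
toy = [
    Record("group_a", "Author A", "Author A", True),
    Record("group_a", "Author B", None, True),
    Record("group_b", "Author C", "Author X", False),
    Record("group_b", "Author D", None, False),
]
for group, rates in per_group_rates(toy).items():
    print(group, rates)
```

Comparing these per-group rates side by side is what makes an uneven distribution visible; a missing answer with `knows_author` set to false is counted as neither correct nor suppressed.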
Significance of the Findings
The results matter for understanding representational fairness in LLMs. They call into question the adequacy of standard accuracy metrics: a single aggregate score can hide differences between groups, and it does not distinguish a wrong attribution from a withheld one. By highlighting these disparities, the researchers aim to drive the development of more equitable AI systems that represent all demographic groups fairly.
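To make the limitation of aggregate accuracy concrete, the sketch below compares two hypothetical models that have the same overall accuracy but very different gaps between groups. The function names, group labels, and counts are all invented for illustration and do not come from the study.

```python
def overall_accuracy(per_group):
    """per_group maps a group label to (n_correct, n_total)."""
    correct = sum(c for c, _ in per_group.values())
    total = sum(t for _, t in per_group.values())
    return correct / total


def max_accuracy_gap(per_group):
    """Largest difference in accuracy between any two groups."""
    accuracies = [c / t for c, t in per_group.values()]
    return max(accuracies) - min(accuracies)


# Invented counts: both models answer 120 of 200 quotes correctly overall.
model_x = {"group_a": (60, 100), "group_b": (60, 100)}
model_y = {"group_a": (90, 100), "group_b": (30, 100)}

for name, counts in [("model_x", model_x), ("model_y", model_y)]:
    print(name, overall_accuracy(counts), max_accuracy_gap(counts))
```

Both toy models score 60% overall, yet the second is correct three times as often for one group as for the other. Reporting a gap statistic alongside overall accuracy is one simple way to surface that difference.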
Conclusion
As reliance on LLMs grows, how well they attribute content matters more. AttriBench is a step toward fairness in AI and gives future research a tool for measuring attribution bias. Addressing the challenges of quote attribution and systematic bias is essential if AI technologies are to serve a diverse user base.
As we advance in the field of artificial intelligence, ongoing scrutiny of how these models operate will be vital in shaping a fairer digital landscape.
