Who Benefits from RAG? The Role of Exposure, Utility and Attribution Bias
In recent years, large language models (LLMs) have attracted significant attention for their ability to generate human-like text. One of the most promising enhancements to these models is Retrieval-Augmented Generation (RAG), which has been shown to improve accuracy by grounding responses in relevant external documents. This raises a pressing question: is that improvement distributed equitably across different groups? This article examines the fairness implications of RAG, focusing on what we term “query group fairness.”
Understanding Query Group Fairness
Query group fairness concerns whether the accuracy improvements that RAG brings to an LLM vary systematically across queries associated with different demographic or fairness categories. Despite the technology's promise, it is crucial to examine whether some groups benefit more than others from RAG. Our research investigates three factors that influence this fairness: exposure, utility, and attribution bias.
Key Factors Impacting Fairness
- Group Exposure: This factor considers how well each group is represented in the retrieved documents. A higher proportion of documents from a particular group increases the likelihood that queries associated with that group receive accurate responses, so an imbalance in exposure can give some groups an unfair advantage over others.
- Group Utility: This aspect measures how much the documents from each group contribute to improving the accuracy of the generated responses. If a specific group’s documents are more useful, then queries associated with that group are likely to see greater improvements in accuracy.
- Group Attribution: This refers to the extent to which the LLM depends on documents from each group when formulating answers. A higher reliance on documents from a particular group can lead to biases in the generated output, affecting the overall fairness of the system.
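To make the three factors concrete, here is a minimal Python sketch that computes them for the documents retrieved for a single query. The operationalizations are assumptions for illustration, not the definitions used in the study: exposure is taken as a group's share of the retrieved documents (with no rank discounting), utility as the average accuracy gain a group's documents yield over the LLM-only answer, and attribution as the group's share of per-document scores produced by some attribution method. The function name group_metrics and all numbers are hypothetical.

```python
from collections import defaultdict


def group_metrics(docs):
    """Compute per-group exposure, utility, and attribution for one query.

    docs: list of (group, utility, attribution) tuples, one per retrieved
    document. These definitions are illustrative assumptions:
      exposure    = fraction of retrieved documents from the group
      utility     = mean accuracy gain of the group's documents
      attribution = group's share of the total attribution mass
    """
    counts = defaultdict(int)
    utility_sum = defaultdict(float)
    attribution_sum = defaultdict(float)
    for group, utility, attribution in docs:
        counts[group] += 1
        utility_sum[group] += utility
        attribution_sum[group] += attribution

    n_docs = len(docs)
    total_attribution = sum(attribution_sum.values())
    return {
        group: {
            "exposure": counts[group] / n_docs,
            "utility": utility_sum[group] / counts[group],
            # Guard against a zero total when no document received attribution.
            "attribution": (attribution_sum[group] / total_attribution
                            if total_attribution else 0.0),
        }
        for group in counts
    }


# Hypothetical retrieved documents: (group, accuracy gain, attribution score).
retrieved = [
    ("A", 0.20, 0.5),
    ("A", 0.10, 0.3),
    ("A", 0.00, 0.1),
    ("B", 0.30, 0.1),
]
metrics = group_metrics(retrieved)
for group, values in metrics.items():
    print(group, {name: round(v, 3) for name, v in values.items()})
```

In this toy example, group B's single document has the highest utility, yet B receives only a quarter of the exposure and a tenth of the attribution, which is the kind of mismatch the three factors are meant to reveal. A rank-weighted exposure would be a natural refinement if document position matters.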
Research Findings
Our experiments used three datasets from the TREC 2022 Fair Ranking Track and covered two tasks: article generation and title generation. The results show that RAG systems do suffer from query group fairness issues: compared with LLM-only systems, they exhibit amplified disparities in average accuracy across groups.
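One simple way to quantify the disparity described above is the gap between the best- and worst-served groups' average accuracy, computed separately for an LLM-only system and a RAG system. The sketch below assumes that max-minus-min measure (the study may use a different one), and its per-query accuracies are invented for illustration. It also reports each group's accuracy gain from RAG, the quantity whose systematic variation defines query group fairness. The helper names are hypothetical.

```python
from statistics import mean


def group_average_accuracy(results):
    """Map each group to the mean accuracy of its queries.

    results: list of (group, accuracy) pairs, one per query.
    """
    by_group = {}
    for group, accuracy in results:
        by_group.setdefault(group, []).append(accuracy)
    return {group: mean(accs) for group, accs in by_group.items()}


def disparity(results):
    """Assumed disparity measure: best-served minus worst-served group average."""
    averages = group_average_accuracy(results).values()
    return max(averages) - min(averages)


def group_gains(baseline, enhanced):
    """Per-group change in average accuracy from baseline to enhanced system."""
    base = group_average_accuracy(baseline)
    new = group_average_accuracy(enhanced)
    return {group: new[group] - base[group] for group in base}


# Hypothetical per-query accuracies for the same queries under two systems.
llm_only = [("A", 0.40), ("A", 0.50), ("B", 0.35), ("B", 0.45)]
rag = [("A", 0.70), ("A", 0.80), ("B", 0.45), ("B", 0.55)]

print(f"LLM-only disparity: {disparity(llm_only):.2f}")
print(f"RAG disparity:      {disparity(rag):.2f}")
print({group: round(gain, 2) for group, gain in group_gains(llm_only, rag).items()})
```

With these made-up numbers, RAG raises both groups' accuracy but helps group A three times as much as group B, so the gap grows from 0.05 to 0.25: the overall gain comes with an amplified disparity.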
We further found that the interplay between group utility, exposure, and attribution correlates strongly with the accuracy of queries from the corresponding groups. These findings underscore the need to address all three factors to achieve fairer outcomes in RAG systems.
Conclusion and Future Directions
The implications of our research extend beyond academic interest; they raise critical questions about the ethical deployment of AI technologies. As RAG systems become more prevalent, understanding and mitigating fairness disparities will be essential for fostering equitable AI solutions.
The data and code for this research are publicly available on GitHub, inviting further exploration and discussion within the AI community. As these technologies continue to mature, we share a collective responsibility to ensure they serve all segments of society fairly and justly.
