Fragile Preferences: A Deep Dive Into Order Effects in Large Language Models
arXiv:2506.14092v3
Large language models (LLMs) are increasingly built into decision-support systems, including in high-stakes domains such as hiring and university admissions. These systems often have to choose between competing alternatives, and those choices can significantly affect individuals’ lives. Previous studies have identified position biases in LLM-driven comparisons, but a systematic analysis linking these biases to underlying preference structures has been lacking. This article presents a comprehensive study of position biases across multiple LLMs and two distinct domains: resume comparison and color selection.
Understanding Position Biases
We find strong and consistent order effects in LLMs. The direction of the position bias depends on the quality of the options being compared, and we also observe a separate bias tied to names:
- Quality-Dependent Shift: When all options presented are of high quality, models tend to favor the first option. Conversely, if the quality of the options is lower, LLMs exhibit a preference for later options.
- Name Bias: Certain names receive preferential treatment even when demographic signals are controlled for. This bias has not been documented previously.
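One way to quantify an order effect like the quality-dependent shift is to present every pair in both orders and count how often the first-listed option wins. The sketch below is illustrative only and is not the measurement procedure from the study: `choose` is a hypothetical callable standing in for an LLM call, assumed to return 0 when the model picks the first option and 1 when it picks the second.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical interface (an assumption, not a real API):
# returns the position the model picked, 0 = first option, 1 = second option.
Chooser = Callable[[str, str], int]


def first_position_rate(
    pairs: Iterable[Tuple[str, str]],
    choose: Chooser,
    trials: int = 1,
) -> float:
    """Fraction of comparisons won by the first-listed option.

    Each pair is shown in both orders, so a chooser that always picks
    the same item regardless of position scores exactly 0.5.
    """
    first_wins = 0
    total = 0
    for a, b in pairs:
        # Present the pair as (a, b) and then as (b, a).
        for first, second in ((a, b), (b, a)):
            for _ in range(trials):
                if choose(first, second) == 0:
                    first_wins += 1
                total += 1
    if total == 0:
        raise ValueError("no comparisons were run")
    return first_wins / total
```

Running this separately on a pool of high-quality pairs and a pool of lower-quality pairs would show the shift described above as a rate above 0.5 in the first pool and below 0.5 in the second.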
Framework for Analyzing Preferences
To better understand the implications of these biases, we propose an extension of the rational choice framework. This framework classifies pairwise preferences as:
- Robust: Preferences that remain consistent across various contexts.
- Fragile: Preferences that are easily influenced by superficial factors, such as the order in which options are presented.
- Indifferent: Cases where the model shows no clear preference between the two options.
Using this classification, we find that order effects can lead LLMs to select strictly inferior options, a distinct failure mode not typically observed in human decision-making. This raises questions about the reliability of LLMs in high-stakes decision-making scenarios.
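The three categories can be made concrete with a small classifier over selection rates. This is a minimal sketch under stated assumptions, not the formal definition used in the study: it assumes we have estimated how often option A is chosen when it is listed first and when it is listed second, and the `margin` threshold of 0.1 is an arbitrary illustrative value.

```python
def classify_preference(
    p_a_when_first: float,
    p_a_when_second: float,
    margin: float = 0.1,  # illustrative threshold, an assumption
) -> str:
    """Label a pairwise preference as robust, fragile, or indifferent.

    p_a_when_first:  rate at which A is chosen when A is listed first.
    p_a_when_second: rate at which A is chosen when A is listed second.
    """
    for p in (p_a_when_first, p_a_when_second):
        if not 0.0 <= p <= 1.0:
            raise ValueError("selection rates must lie in [0, 1]")

    # Averaging over both orders cancels a pure position effect.
    order_averaged = (p_a_when_first + p_a_when_second) / 2
    if abs(order_averaged - 0.5) <= margin:
        return "indifferent"

    # The same option wins the majority in both presentation orders.
    a_wins_both = p_a_when_first > 0.5 and p_a_when_second > 0.5
    b_wins_both = p_a_when_first < 0.5 and p_a_when_second < 0.5
    if a_wins_both or b_wins_both:
        return "robust"

    # An overall preference exists, but the winner changes with order.
    return "fragile"
```

For example, `classify_preference(0.9, 0.8)` returns `"robust"`, `classify_preference(0.95, 0.45)` returns `"fragile"`, and `classify_preference(0.9, 0.1)` returns `"indifferent"`, since in the last case position alone decides the outcome.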
Mitigation Strategies
To address these pitfalls, we propose several targeted mitigation strategies. Among them is a novel use of the temperature parameter, which can help recover underlying preferences when they are distorted by order effects. The goal is for model choices to reflect genuine preferences rather than the order in which options are presented.
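The exact temperature-based procedure is not spelled out in this summary. As a rough illustration of how sampling at a non-zero temperature could be combined with order balancing, the sketch below draws several choices in each presentation order and takes a majority vote. Everything here is an assumption made for illustration: `sample_choice` is a hypothetical stand-in for a model call that returns 0 or 1 for the chosen position, and the default temperature and sample count are arbitrary.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical interface (an assumption, not a real API):
# (first, second, temperature) -> 0 if the first option is picked, else 1.
Sampler = Callable[[str, str, float], int]


def order_balanced_vote(
    a: str,
    b: str,
    sample_choice: Sampler,
    temperature: float = 1.0,    # illustrative default
    samples_per_order: int = 10,  # illustrative default
) -> Optional[str]:
    """Majority vote over samples drawn in both presentation orders.

    Returns the option with more votes, or None on a tie.
    """
    votes: Counter = Counter()
    for first, second in ((a, b), (b, a)):
        for _ in range(samples_per_order):
            position = sample_choice(first, second, temperature)
            votes[first if position == 0 else second] += 1
    if votes[a] == votes[b]:
        return None
    return a if votes[a] > votes[b] else b
```

A sampler that always picks the first position produces a tie and returns `None`, which flags the pair as decided by order alone, while a sampler with any consistent lean toward one option returns that option.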
Conclusion
As LLMs continue to play a critical role in high-stakes decision-making, it is essential to understand and address the biases that can influence their outputs. Our study sheds light on the complexities of order effects and their implications for LLM performance. Future research should focus on refining mitigation strategies and exploring the broader impacts of these biases on decision-support systems.
