Inclusion-of-Thoughts: Mitigating Preference Instability via Purifying the Decision Space
Summary: arXiv:2604.04944v1 (cross-listed)
Abstract: Multiple-choice questions (MCQs) are widely used to evaluate large language models (LLMs). However, LLMs remain vulnerable to the presence of plausible distractors. This often diverts attention toward irrelevant choices, resulting in unstable oscillation between correct and incorrect answers.
Introduction
The use of large language models (LLMs) has surged across applications in recent years, making their evaluation increasingly critical. Among the methods used for this purpose, multiple-choice questions (MCQs) stand out for their structured format and ease of analysis. A significant challenge, however, comes from plausible distractors within these MCQs, which can divert a model’s attention toward irrelevant choices. The result is erratic decision-making: the LLM may vacillate between correct and incorrect options.
Proposed Solution: Inclusion-of-Thoughts (IoT)
To address this challenge, we introduce Inclusion-of-Thoughts (IoT), a progressive self-filtering strategy designed to improve LLM decision-making by mitigating the cognitive load that distractors impose. The core idea is to reconstruct each MCQ so that only the plausible options are presented to the model.
Key Features of IoT
- Self-Filtering Mechanism: IoT operates by filtering out irrelevant options, allowing the model to focus on the most plausible answers.
- Comparative Judgments: By presenting only the filtered options in a controlled setting, IoT supports better comparative judgments and makes the model’s internal reasoning more stable.
- Transparency and Interpretability: The filtering process is explicitly documented, which improves the transparency and interpretability of the model’s decision-making.
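The features above can be illustrated with a minimal sketch. It is not the authors' implementation: the source does not specify how options are filtered, so this version assumes the model is asked repeatedly, its pick is set aside each round, and the final question shows only the shortlisted picks. The `ask` callable, the `keep` parameter, the trace format, and the `toy_ask` stub with its fixed scores are all hypothetical stand-ins for a real LLM call.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model interface: given a question and lettered options,
# return the key of the option the model prefers.
AskFn = Callable[[str, Dict[str, str]], str]


def inclusion_of_thoughts(
    question: str,
    options: Dict[str, str],
    ask: AskFn,
    keep: int = 2,  # assumed shortlist size; not taken from the source
) -> Tuple[str, List[dict]]:
    """Progressively self-filter an MCQ, then answer over the survivors."""
    remaining = dict(options)  # copy so the caller's dict is not mutated
    shortlisted: Dict[str, str] = {}
    trace: List[dict] = []

    # Filtering rounds: record the model's pick, remove it, and ask again
    # so the next most plausible option can surface.
    while len(shortlisted) < keep and remaining:
        pick = ask(question, remaining)
        if pick not in remaining:
            raise ValueError(f"model returned unknown option {pick!r}")
        trace.append({"stage": "filter", "shown": sorted(remaining), "picked": pick})
        shortlisted[pick] = remaining.pop(pick)

    # Final round: the reconstructed MCQ shows only the shortlisted options.
    final = ask(question, shortlisted)
    if final not in shortlisted:
        raise ValueError(f"model returned unknown option {final!r}")
    trace.append({"stage": "final", "shown": sorted(shortlisted), "picked": final})
    return final, trace


# Toy stand-in for an LLM: fixed plausibility scores per option key.
TOY_SCORES = {"A": 0.10, "B": 0.40, "C": 0.35, "D": 0.15}


def toy_ask(question: str, options: Dict[str, str]) -> str:
    """Pick the shown option with the highest toy score."""
    return max(options, key=lambda key: TOY_SCORES[key])


answer, trace = inclusion_of_thoughts(
    "What is 2 + 2?",
    {"A": "3", "B": "4", "C": "5", "D": "22"},
    toy_ask,
)
print(answer)
for step in trace:
    print(step)
```

The returned trace is one way to make the filtering steps inspectable: each entry records which options were shown and which one was picked, so a reader can see how the shortlist was formed before the final answer.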
Empirical Evaluation
We conducted extensive empirical evaluations of IoT across several domains, including arithmetic, commonsense reasoning, and educational benchmarks. The results show substantial improvements in chain-of-thought performance with minimal computational overhead. The findings indicate that IoT helps LLMs arrive at correct answers more reliably by reducing the influence of distractors.
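The source does not say how accuracy or answer stability was measured. As a hedged illustration only, the sketch below assumes each question is sampled several times and scores two things: the accuracy of the majority answer, and a hypothetical "flip rate" (the fraction of consecutive samples in which the answer changes). The function names and the metric itself are assumptions, not the paper's protocol.

```python
from collections import Counter
from typing import Dict, List, Sequence


def majority_answer(samples: Sequence[str]) -> str:
    """Return the most frequent answer among repeated samples."""
    return Counter(samples).most_common(1)[0][0]


def flip_rate(samples: Sequence[str]) -> float:
    """Fraction of consecutive sample pairs where the answer changes."""
    if len(samples) < 2:
        return 0.0
    flips = sum(a != b for a, b in zip(samples, samples[1:]))
    return flips / (len(samples) - 1)


def summarize(runs: List[Sequence[str]], gold: Sequence[str]) -> Dict[str, float]:
    """Aggregate majority-vote accuracy and mean flip rate over questions."""
    if len(runs) != len(gold):
        raise ValueError("runs and gold must have the same length")
    if not runs:
        raise ValueError("need at least one question")
    correct = sum(majority_answer(r) == g for r, g in zip(runs, gold))
    return {
        "accuracy": correct / len(gold),
        "mean_flip_rate": sum(flip_rate(r) for r in runs) / len(runs),
    }


# Made-up samples for two questions, purely to show the call shape.
example = summarize(
    runs=[["A", "B", "A", "A"], ["C", "C", "C", "C"]],
    gold=["A", "D"],
)
print(example)
```

Running the same summary on answers collected with and without filtering would give one simple way to compare the two settings under these assumptions.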
Conclusion
The Inclusion-of-Thoughts strategy targets a known weakness of large language models on multiple-choice questions. By reducing the cognitive load associated with plausible distractors, IoT improves the stability of decision-making and makes the model’s choices easier to interpret. Methods like IoT may help LLMs operate more reliably in diverse and complex settings.
