Teaching AI to Handle Exceptions: Supervised Fine-Tuning with Human-Aligned Judgment
arXiv:2503.02976v3
Abstract
Large language models (LLMs), initially developed for generative AI, are now evolving into agentic AI systems, which make decisions in complex, real-world contexts. Unfortunately, while their generative capabilities are well-documented, their decision-making processes remain poorly understood. This is particularly evident when testing targeted decision-making: for instance, how models handle exceptions, a critical and challenging aspect of decision-making made relevant by the inherent incompleteness of contracts.
Key Findings
In our research, we demonstrate that LLMs, even those that excel at reasoning, deviate significantly from human judgments. This deviation occurs because these models adhere strictly to policies, even when such adherence is impractical, suboptimal, or counterproductive. Our study evaluates three approaches to tuning AI agents to handle exceptions:
- Ethical Framework Prompting: This method guides the model with ethical considerations in the prompt, but it fails to align the model's decisions with human judgments.
- Chain-of-Thought Reasoning: While this approach offers slight improvements, it does not substantially enhance the model’s decision-making capabilities regarding exceptions.
- Supervised Fine-Tuning: This method, particularly when combined with human explanations, shows markedly better results in aligning AI decision-making with human judgment.
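To make the third approach concrete, the sketch below shows one way a fine-tuning record might be assembled with and without a human explanation. It is an illustration under stated assumptions, not the study's actual data format: the function names (`build_sft_record`, `to_jsonl`), the prompt/completion layout, and the example scenario are all hypothetical.

```python
import json
from typing import Optional


def build_sft_record(policy: str, scenario: str, decision: str,
                     explanation: Optional[str] = None) -> dict:
    """Return one prompt/completion pair for supervised fine-tuning.

    With `explanation` set, the target text carries how the decision was
    reached; without it, the target carries only which decision was made.
    The field names and layout are assumptions made for this sketch.
    """
    prompt = (
        f"Policy: {policy}\n"
        f"Situation: {scenario}\n"
        "What should the agent do?"
    )
    completion = f"Decision: {decision}"
    if explanation:
        completion += f"\nReasoning: {explanation}"
    return {"prompt": prompt, "completion": completion}


def to_jsonl(records: list) -> str:
    """Serialize records one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Hypothetical scenario, invented for illustration only.
example = dict(
    policy="Do not spend more than $10.00 on the item.",
    scenario="The only item available costs $10.01.",
    decision="buy",
    explanation="The one-cent overage is trivial next to the cost of returning empty-handed.",
)

labels_only = build_sft_record(example["policy"], example["scenario"], example["decision"])
with_explanation = build_sft_record(**example)

print(to_jsonl([labels_only, with_explanation]))
```

The two records differ only in the completion text, which isolates the variable of interest: whether the training target includes the human's reasoning or just the final decision.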
Supervised Fine-Tuning Results
Our experiments reveal that supervised fine-tuning enabled models to generalize human-like decision-making to novel scenarios. This demonstrates the potential for transfer learning of human-aligned decision-making across various contexts. The implications of these findings are significant, particularly as they suggest that aligning LLMs with human judgment necessitates explicit training on how decisions are made, rather than merely focusing on which decisions are made.
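One simple way to quantify generalization to novel scenarios is to compare model decisions with human decisions on scenarios held out from training. The sketch below computes a plain agreement rate. It is an assumed metric chosen for illustration, not necessarily the measure used in the study, and the decision labels are made up.

```python
from typing import Sequence


def agreement_rate(model_decisions: Sequence[str],
                   human_decisions: Sequence[str]) -> float:
    """Fraction of scenarios where the model's decision matches the human one."""
    if len(model_decisions) != len(human_decisions):
        raise ValueError("decision lists must have the same length")
    if not human_decisions:
        raise ValueError("need at least one scenario")
    matches = sum(m == h for m, h in zip(model_decisions, human_decisions))
    return matches / len(human_decisions)


# Made-up labels for four held-out scenarios (illustration only).
human = ["buy", "buy", "decline", "buy"]
model = ["decline", "buy", "decline", "decline"]

rate = agreement_rate(model, human)
print(f"agreement on held-out scenarios: {rate:.2f}")
```

Running the same comparison before and after fine-tuning, on scenarios the model never saw during training, gives a rough indication of whether the learned behavior transfers.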
Conclusion
The results of this study highlight a critical area in the development of agentic AI systems: the need to address LLMs’ shortcomings in handling exceptions. As AI continues to evolve and integrate into more complex and dynamic environments, ensuring that these systems can effectively align with human judgment will be paramount. By focusing on supervised fine-tuning with explanations, we can guide the development of AI models that not only understand policy but also adapt to the nuanced realities of human decision-making.
Future Directions
The ongoing evolution of AI necessitates further research into effective training methodologies. Future studies should explore:
- The long-term effects of supervised fine-tuning on AI decision-making.
- Additional frameworks and strategies for improving exception handling in LLMs.
- Cross-disciplinary approaches that integrate insights from cognitive science and ethics to enhance AI alignment with human values.
Addressing these challenges would bring us closer to AI systems that not only meet operational standards but also make decisions consistent with human judgment and ethical reasoning.
