SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution
arXiv:2604.18982v1
Abstract: Social intelligence, the ability to navigate complex interpersonal interactions, presents a fundamental challenge for language agents. Training such agents via reinforcement learning requires solving the credit assignment problem: determining how individual utterances contribute to multi-turn dialogue outcomes. Existing approaches directly employ language models to distribute episode-level rewards, yielding attributions that are retrospective and lack theoretical grounding. We propose SAVOIR (ShApley Value fOr SocIal RL), a novel principled framework grounded in cooperative game theory. Our approach combines two complementary principles: expected utility shifts evaluation from retrospective attribution to prospective valuation, capturing an utterance’s strategic potential for enabling favorable future trajectories; Shapley values ensure fair credit distribution with axiomatic guarantees of efficiency, symmetry, and marginality. Experiments on the SOTOPIA benchmark demonstrate that SAVOIR achieves new state-of-the-art performance across all evaluation settings, with our 7B model matching or exceeding proprietary models including GPT-4o and Claude-3.5-Sonnet. Notably, even large reasoning models consistently underperform, suggesting social intelligence requires qualitatively different capabilities than analytical reasoning.
Introduction
The development of social intelligence in AI has become increasingly important as demand for sophisticated conversational agents rises. Language models often struggle with the intricacies of human interaction, particularly in dialogue settings that require an understanding of context and social dynamics. The SAVOIR framework addresses this gap by using principles from cooperative game theory to decide how much credit each utterance deserves when social agents are trained with reinforcement learning.
The Challenge of Social Intelligence
Social intelligence encompasses the skills needed to navigate interpersonal communication effectively. Training social agents with reinforcement learning runs into the credit assignment problem: a multi-turn dialogue yields an episode-level reward, and the learner must determine how much each individual utterance contributed to that outcome. Existing approaches directly prompt a language model to distribute the episode-level reward across utterances, which produces attributions that are retrospective and lack theoretical grounding. Without a reliable way to assess individual contributions, agents may fail to learn effectively from their interactions.
Introducing SAVOIR
SAVOIR (ShApley Value fOr SocIal RL) grounds credit assignment in cooperative game theory. It combines two complementary principles:
- Expected Utility: Evaluation shifts from retrospective attribution to prospective valuation. Rather than judging an utterance after the fact, SAVOIR values it by its strategic potential to enable favorable future trajectories.
- Shapley Values: Credit is distributed among utterances with axiomatic guarantees of efficiency (the credits add up exactly to the total value being distributed), symmetry (utterances that make identical marginal contributions receive equal credit), and marginality (an utterance's credit depends only on its own marginal contributions).
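To make the two principles concrete, here is a minimal Python sketch; it is not the paper's implementation. It assumes (our assumption, not a detail from the source) that the value of a coalition of utterances is the expected reward of dialogue continuations that keep only those utterances, estimated by averaging sampled rollouts. The names `shapley_values`, `expected_utility`, and `toy_simulate` are hypothetical, and `toy_simulate` is a toy stand-in for rolling out a real dialogue with a partner model and scoring it.

```python
import itertools
import math
import random
from typing import Callable, FrozenSet, List

Coalition = FrozenSet[int]


def shapley_values(n: int, value: Callable[[Coalition], float]) -> List[float]:
    """Exact Shapley values for utterances 0..n-1 under coalition value `value`.

    phi_i = sum over S not containing i of |S|!(n-|S|-1)!/n! * (v(S | {i}) - v(S)).
    Enumerates every subset, so cost grows exponentially with n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = |S| ranges over 0..n-1
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def expected_utility(
    simulate: Callable[[Coalition, random.Random], float],
    num_rollouts: int = 64,
    seed: int = 0,
) -> Callable[[Coalition], float]:
    """Hypothetical prospective value: mean reward over `num_rollouts` sampled
    continuations of a dialogue that keeps only the utterances in a coalition.

    Reseeding on every call (common random numbers) makes the estimate a fixed
    function of the coalition, so Shapley efficiency holds for the estimates
    up to floating-point rounding.
    """
    def value(coalition: Coalition) -> float:
        rng = random.Random(seed)
        return sum(simulate(coalition, rng) for _ in range(num_rollouts)) / num_rollouts
    return value


def toy_simulate(coalition: Coalition, rng: random.Random) -> float:
    """Hypothetical toy rollout. Utterance 1 adds a small direct gain, while
    utterances 0 (building rapport) and 2 (making an offer) only pay off
    together; Gaussian noise mimics stochastic rollouts."""
    reward = 0.2 if 1 in coalition else 0.0
    if {0, 2} <= coalition:
        reward += 0.6
    return reward + rng.gauss(0.0, 0.05)


phi = shapley_values(3, expected_utility(toy_simulate))
print([round(p, 2) for p in phi])  # → [0.3, 0.2, 0.3]
```

In this toy, utterances 0 and 2 receive equal credit (symmetry) even though neither helps on its own, and the three credits sum to the value of the full dialogue minus the value of the empty one (efficiency). Because every coalition reuses the same seed and consumes the same noise draws, the noise cancels in the marginal differences. Exact enumeration visits every subset of the other utterances, so its cost grows exponentially with dialogue length; sampling random permutations is a common approximation for longer dialogues.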
Experimental Results
To validate SAVOIR, the authors ran experiments on the SOTOPIA benchmark. SAVOIR achieved new state-of-the-art performance across all evaluation settings. Notably, the 7B model trained with SAVOIR matched or exceeded proprietary models including GPT-4o and Claude-3.5-Sonnet.
Implications for Future Research
The experiments also show that even large reasoning models consistently underperform, which suggests that social intelligence requires qualitatively different capabilities from analytical reasoning. This points to a need for methods that target social reasoning directly rather than relying on analytical capability alone. SAVOIR offers one promising avenue for future work on conversational agents that can understand and navigate the complexities of human interaction.
Conclusion
As AI continues to evolve, frameworks like SAVOIR mark real progress in equipping language agents with the social intelligence needed for effective communication. By tackling credit assignment with Shapley-based attribution grounded in cooperative game theory, SAVOIR sets a new state of the art for training social agents on SOTOPIA and opens the door to further advances in the field.
