Asymmetric Actor-Critic for Multi-turn LLM Agents
Large language models (LLMs) have made significant strides in their reasoning and conversational abilities. However, ensuring reliable behavior in multi-turn interactions remains a formidable challenge. In many real-world applications, agents must succeed in one-shot settings where retries are not an option. Existing approaches either rely on reflection or post-hoc evaluation, which require additional attempts, or assume fully trainable models and therefore cannot leverage proprietary LLMs.
In response to these challenges, we propose an asymmetric actor-critic framework for reliable conversational agents. A powerful proprietary LLM serves as the actor, while a smaller open-source critic provides runtime supervision: it monitors the actor's actions and intervenes within the same interaction trajectory, so that errors can be corrected during the conversation rather than through a separate retry.
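The runtime supervision described above can be pictured as a per-turn loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `actor` and `critic` are hypothetical callables standing in for the proprietary LLM and the open-source critic, and we assume the critic returns an approve/reject verdict with textual feedback that is passed back to the actor before any action is committed. The revision budget `max_revisions` and the toy order-cancellation scenario are likewise illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    approve: bool
    feedback: str = ""


# Hypothetical interfaces (assumptions, not an existing API):
# the actor maps (history, optional critic feedback) -> proposed action,
# the critic maps (history, proposed action) -> Verdict.
Actor = Callable[[List[str], Optional[str]], str]
Critic = Callable[[List[str], str], Verdict]


def supervised_step(history: List[str], actor: Actor, critic: Critic,
                    max_revisions: int = 2) -> str:
    """Return the action to commit for the current turn.

    The critic reviews each proposal before it is executed. On rejection,
    its feedback goes back to the actor, which revises within the same
    trajectory instead of restarting the episode. After max_revisions
    rejections, the latest revision is returned without a further check.
    """
    feedback: Optional[str] = None
    action = actor(history, feedback)
    for _ in range(max_revisions):
        verdict = critic(history, action)
        if verdict.approve:
            break
        feedback = verdict.feedback
        action = actor(history, feedback)
    return action


# Toy stand-ins: the actor skips confirmation unless the critic objects.
def toy_actor(history: List[str], feedback: Optional[str]) -> str:
    return "ask_user_to_confirm" if feedback else "cancel_order"


def toy_critic(history: List[str], action: str) -> Verdict:
    if action == "cancel_order" and "user: yes, confirmed" not in history:
        return Verdict(False, "Confirm with the user before cancelling.")
    return Verdict(True)


history = ["user: please cancel my order #123"]
print(supervised_step(history, toy_actor, toy_critic))  # ask_user_to_confirm
```

Because the critic's feedback is consumed within the same turn, the conversation stays on a single trajectory and no episode is restarted, which is what the one-shot setting demands. When the revision budget runs out, this sketch simply commits the latest revision; a real system might instead escalate or fall back to a safe action.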
Key Features of the Asymmetric Actor-Critic Framework
- Asymmetric Design: The framework exploits a generation-verification asymmetry: high-quality generation requires large models, whereas effective oversight can often be achieved by smaller ones.
- Fixed Actor Supervision: Unlike training-based actor-critic methods, which use the critic to update the actor, our framework supervises a fixed actor operating in open-ended conversational environments.
- Data Generation Pipeline: We introduce a data generation pipeline that produces supervision signals for fine-tuning the critic without modifying the actor.
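One way to picture the data generation pipeline is as a conversion from logged actor trajectories into supervised examples for the critic. The sketch below is hypothetical: the source does not specify how steps are labeled, so `LoggedStep.ok` and `LoggedStep.feedback` are assumed to come from some external labeling procedure (for instance a stronger judge model or the final task outcome), and the prompt wording and JSON Lines layout are illustrative choices rather than the authors' format.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LoggedStep:
    history: List[str]  # conversation up to this turn
    action: str         # action proposed by the fixed actor
    ok: bool            # ASSUMED step label, e.g. from a judge or task outcome
    feedback: str = ""  # ASSUMED corrective hint, used when ok is False


def to_critic_example(step: LoggedStep) -> Dict[str, str]:
    """Turn one logged actor step into a (prompt, completion) pair."""
    prompt = (
        "Conversation:\n" + "\n".join(step.history)
        + f"\n\nProposed action: {step.action}\n"
        + "Reply APPROVE, or REJECT followed by feedback."
    )
    completion = "APPROVE" if step.ok else f"REJECT: {step.feedback}"
    return {"prompt": prompt, "completion": completion}


def build_critic_dataset(steps: List[LoggedStep]) -> str:
    """Serialize critic training examples as JSON Lines.

    The actor is only queried when the trajectories are logged; it is
    never updated. Only the critic is trained on these examples.
    """
    return "\n".join(json.dumps(to_critic_example(s)) for s in steps)


steps = [
    LoggedStep(["user: cancel order #123"], "cancel_order", ok=False,
               feedback="Confirm with the user before cancelling."),
    LoggedStep(["user: cancel order #123", "user: yes, confirmed"],
               "cancel_order", ok=True),
]
print(build_critic_dataset(steps))
```

The point this illustrates is that the fixed actor contributes only its logged outputs; all training happens on the critic side.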
Experimental Validation
We evaluated the framework on two benchmarks, τ-bench and UserBench. Compared with strong single-agent baselines, it significantly improves reliability and task success rates. Notably, lightweight open-source critics rival or even surpass larger proprietary models in the critic role, and fine-tuning the critic yields further gains over several state-of-the-art methods.
Conclusion
By pairing a large proprietary LLM as the actor with a smaller open-source critic, the proposed asymmetric actor-critic framework improves the reliability of conversational agents in multi-turn interactions without requiring retries. More broadly, it shows a practical way to combine models of different sizes in real-world applications.
Future Directions
Moving forward, we aim to further strengthen the critic's capabilities and to study how the framework behaves in diverse conversational contexts. We also plan to assess how well the approach scales across domains and languages, with the goal of making it applicable to a wider range of real-world scenarios.
