Team of Thoughts: Efficient Test-time Scaling of Agentic Systems through Orchestrated Tool Calling
Summary: arXiv:2602.16485v2
Multi-Agent Systems (MAS) have advanced considerably, but most existing systems use homogeneous model configurations, which prevents them from exploiting the differing capabilities and expertise of differently post-trained models. To address this, we introduce “Team-of-Thoughts,” a heterogeneous MAS framework that treats diverse models as specialized tools within an orchestrator-driven paradigm.
Key Innovations
Team-of-Thoughts introduces two components:
- Orchestrator Calibration: This component identifies the models with the strongest coordination and synthesis capabilities, so that the orchestrator role goes to a model suited to it.
- Agent Self-Assessment: This protocol has tool agents evaluate their own domain-specific strengths, producing profiles that guide agent selection at inference time.
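The summary does not say how either component is computed, so the following is only a minimal sketch under stated assumptions. It assumes that orchestrator calibration reduces to comparing candidates by accuracy on a small calibration set, and that a self-assessment profile is the mean of an agent's self-reported ratings per domain. The function names `pick_orchestrator` and `build_profile`, the model names, and the data shapes are all hypothetical.

```python
from typing import Dict, List


def pick_orchestrator(calibration_results: Dict[str, List[bool]]) -> str:
    """Return the candidate with the highest calibration accuracy.

    Assumption: each candidate was trialled as orchestrator on the same
    non-empty set of tasks, and each bool records whether its final
    synthesized answer was correct.
    """
    def accuracy(outcomes: List[bool]) -> float:
        return sum(outcomes) / len(outcomes)

    # Iterating in sorted order makes ties resolve to the first name.
    return max(sorted(calibration_results),
               key=lambda name: accuracy(calibration_results[name]))


def build_profile(self_ratings: Dict[str, List[float]]) -> Dict[str, float]:
    """Average an agent's self-reported ratings (0 to 1) per domain.

    Assumption: the agent rates itself on several probe questions per
    domain, and every domain has at least one rating.
    """
    return {domain: sum(r) / len(r) for domain, r in self_ratings.items()}


# Hypothetical calibration outcomes for three candidate orchestrators.
results = {
    "model_a": [True, False, True, True],
    "model_b": [True, True, True, True],
    "model_c": [False, True, False, True],
}
orchestrator = pick_orchestrator(results)

# Hypothetical self-ratings for one tool agent.
profile = build_profile({"algebra": [0.8, 1.0], "python": [0.5, 0.5]})
```

With this data, `model_b` is chosen because it is the only candidate correct on every calibration task. A real protocol would likely use richer signals than a single accuracy number.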
Dynamic Inference Mechanism
During inference, the orchestrator dynamically activates the most compatible agents based on their self-assessment profiles. Activating agents whose strengths complement one another maximizes capability coverage, which lets a diverse set of specialized models handle a wider range of tasks.
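One way to picture coverage-maximizing activation is a greedy selection over self-assessment profiles. The sketch below is an illustration, not the paper's algorithm: it assumes a profile maps domain names to scores in [0, 1], that a task comes with a list of required domains, and that coverage is the sum over those domains of the best score among the selected agents. The names `coverage` and `select_agents`, the budget `k`, and the example profiles are hypothetical.

```python
from typing import Dict, List

Profile = Dict[str, float]  # domain -> self-assessed strength in [0, 1]


def coverage(selected: List[str], profiles: Dict[str, Profile],
             domains: List[str]) -> float:
    """Sum, over required domains, of the best score among selected agents."""
    return sum(
        max((profiles[a].get(d, 0.0) for a in selected), default=0.0)
        for d in domains
    )


def select_agents(profiles: Dict[str, Profile], domains: List[str],
                  k: int) -> List[str]:
    """Greedily add the agent that raises coverage most, up to k agents."""
    selected: List[str] = []
    candidates = sorted(profiles)  # sorted for deterministic tie-breaking
    for _ in range(min(k, len(candidates))):
        best = max(
            (a for a in candidates if a not in selected),
            key=lambda a: coverage(selected + [a], profiles, domains),
        )
        gain = (coverage(selected + [best], profiles, domains)
                - coverage(selected, profiles, domains))
        if gain <= 0:
            break  # no remaining agent improves coverage
        selected.append(best)
    return selected


# Hypothetical self-assessment profiles for three tool agents.
profiles = {
    "model_a": {"algebra": 0.9, "geometry": 0.4, "python": 0.3},
    "model_b": {"algebra": 0.85, "geometry": 0.5, "python": 0.2},
    "model_c": {"algebra": 0.2, "geometry": 0.3, "python": 0.95},
}
chosen = select_agents(profiles, ["algebra", "python"], k=2)
print(chosen)  # ['model_a', 'model_c']
```

Note that `model_b` is the second-strongest agent on algebra yet is skipped, because it adds nothing once `model_a` is active. This is the sense in which selection by coverage differs from simply picking the top-scoring models.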
Performance Benchmarks
To validate Team-of-Thoughts, we evaluated it on five mathematical reasoning and code generation benchmarks. Results on two of them:
- AIME24: Team-of-Thoughts achieved 96.00% accuracy on this mathematical reasoning benchmark.
- LiveCodeBench: The framework attained 77.91% accuracy on this code generation benchmark.
Homogeneous role-play baselines recorded 80.00% on AIME24 and 65.93% on LiveCodeBench, so Team-of-Thoughts improves on them by 16.00 and 11.98 percentage points, respectively.
Conclusion
The Team-of-Thoughts framework combines the strengths of diverse models under an orchestrator-driven approach, and on the benchmarks reported here it is more accurate than homogeneous role-play baselines. As demand for capable AI systems grows, heterogeneous frameworks of this kind offer one route to advancing agentic systems.
