Mind DeepResearch Technical Report
arXiv:2604.14518v2
Abstract: We present Mind DeepResearch (MindDR), an efficient multi-agent deep research framework that achieves leading performance with only ~30B-parameter models through a meticulously designed data synthesis and multi-stage training pipeline. The core innovation of MindDR lies in a collaborative three-agent architecture and a four-stage agent-specialized training pipeline comprising SFT cold-start, Search-RL, Report-RL and preference alignment.
Key Innovations of MindDR
The central features of Mind DeepResearch include:
- Three-Agent Architecture: MindDR uses a collaborative framework of three specialized agents:
  - Planning Agent: plans the overall research strategy.
  - DeepSearch Agent: gathers and analyzes information.
  - Report Agent: compiles the findings into a coherent report.
- Four-Stage Training Pipeline: MindDR trains its agents in four stages:
  - SFT Cold-Start: supervised fine-tuning that bootstraps the model's basic capabilities.
  - Search-RL: reinforcement learning that optimizes search strategies.
  - Report-RL: reinforcement learning that improves report generation.
  - Preference Alignment: aligns the model's outputs with user preferences and feedback.
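The division of labour among the three agents can be sketched as a simple sequential pipeline. This is a minimal illustration under stated assumptions and not MindDR's implementation. The functions `plan`, `deep_search`, `write_report`, and `run_pipeline` are hypothetical. The search tool is injected as a plain callable, and the LLM calls each agent would make are replaced by trivial stubs. The summary does not say whether the agents iterate or exchange messages beyond this linear flow.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical type for the injected search tool: query -> list of text snippets.
SearchFn = Callable[[str], list[str]]


@dataclass
class Evidence:
    sub_question: str
    snippets: list[str] = field(default_factory=list)


def plan(query: str) -> list[str]:
    """Planning Agent stub: a real system would prompt an LLM to decompose
    the query; here we simply split a compound query on ';'."""
    return [part.strip() for part in query.split(";") if part.strip()]


def deep_search(sub_question: str, search: SearchFn) -> Evidence:
    """DeepSearch Agent stub: collects evidence for one sub-question via the tool."""
    return Evidence(sub_question, search(sub_question))


def write_report(query: str, evidence: list[Evidence]) -> str:
    """Report Agent stub: turns the collected evidence into a short report."""
    lines = [f"Report: {query}"]
    for ev in evidence:
        lines.append(f"- {ev.sub_question}: {len(ev.snippets)} source(s)")
    return "\n".join(lines)


def run_pipeline(query: str, search: SearchFn) -> str:
    # Linear flow (assumed): plan -> search each sub-question -> write the report.
    sub_questions = plan(query)
    evidence = [deep_search(q, search) for q in sub_questions]
    return write_report(query, evidence)


if __name__ == "__main__":
    fake_search = lambda q: [f"doc about {q}"]
    print(run_pipeline("EV range; charging network", fake_search))
```

With the fake search tool, the query is split into two sub-questions and each one is searched. The printed report lists one source for "EV range" and one for "charging network". Separating the agents behind plain function boundaries makes it easy to train or swap each one independently.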
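The four training stages can be sketched as an ordered schedule that is run stage by stage. Several details here are assumptions and do not come from the summary. First, each stage is assumed to initialize from the previous stage's checkpoint. Second, the agent each stage targets is only our reading of "agent-specialized". Finally, `Stage`, `PIPELINE`, `run_training`, and the injected `train` callable are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Stage:
    name: str
    purpose: str
    # Assumed mapping from stage to the agent it mainly specializes;
    # the summary only says the pipeline is "agent-specialized".
    target_agent: str


PIPELINE = (
    Stage("SFT cold-start", "supervised fine-tuning to bootstrap capabilities", "all agents"),
    Stage("Search-RL", "RL that optimizes search strategies", "DeepSearch"),
    Stage("Report-RL", "RL that improves report generation", "Report"),
    Stage("Preference alignment", "align outputs with user preferences", "unspecified"),
)

# Hypothetical trainer signature: (input checkpoint, stage) -> output checkpoint.
TrainFn = Callable[[str, Stage], str]


def run_training(base_checkpoint: str, train: TrainFn) -> list[str]:
    """Run the stages in order; each stage starts from the previous stage's
    output checkpoint (an assumption about how the stages are chained)."""
    checkpoints = []
    current = base_checkpoint
    for stage in PIPELINE:
        current = train(current, stage)
        checkpoints.append(current)
    return checkpoints


if __name__ == "__main__":
    fake_train = lambda ckpt, stage: f"{ckpt}+{stage.name}"
    for ckpt in run_training("base", fake_train):
        print(ckpt)
```

Keeping the schedule as data rather than hard-coded calls makes the stage order explicit. It also makes it easy to rerun a single stage from a saved checkpoint.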
Performance Metrics
MindDR achieves competitive results despite its ~30B-parameter scale. Its scores on public benchmarks are:
- BrowseComp-ZH: 45.7%
- BrowseComp: 42.8%
- WideSearch: 46.5%
- xbench-DS: 75.0%
- DeepResearch Bench: 52.5%
These results indicate that MindDR outperforms open-source agent systems of comparable scale and rivals considerably larger models, which highlights its efficiency.
Real-World Deployment
MindDR has been deployed as an online product at Li Auto, demonstrating its applicability in real-world scenarios.
MindDR Bench: A Curated Benchmark
To further validate MindDR, we introduce MindDR Bench, a curated benchmark of 500 real-world Chinese queries drawn from user interactions with our internal product. Instead of relying on the single RACE metric, it evaluates responses with a multi-dimensional rubric system.
On MindDR Bench, MindDR achieves a state-of-the-art score of 51.8, indicating strong performance on complex real-world queries.
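A multi-dimensional rubric scores several dimensions separately and then combines them. The sketch below shows one common way to do this: compute a weighted average for each report, then take the mean over all queries. It is an illustration only. The dimension names, the weights, the 0-100 scale, and the functions `score_report` and `benchmark_score` are all hypothetical. The summary does not describe how MindDR Bench actually aggregates its rubric.

```python
from statistics import mean

# Hypothetical rubric: these dimension names and weights are placeholders,
# not the MindDR Bench rubric, which the summary does not specify.
WEIGHTS = {
    "comprehensiveness": 0.30,
    "depth": 0.30,
    "accuracy": 0.25,
    "readability": 0.15,
}


def score_report(dim_scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores (each on a 0-100 scale)."""
    missing = WEIGHTS.keys() - dim_scores.keys()
    if missing:
        raise ValueError(f"missing rubric dimensions: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())
    return sum(w * dim_scores[d] for d, w in WEIGHTS.items()) / total_weight


def benchmark_score(per_query_scores: list[dict[str, float]]) -> float:
    """Benchmark-level score: mean of per-query rubric scores."""
    return mean(score_report(s) for s in per_query_scores)


if __name__ == "__main__":
    example = {"comprehensiveness": 60, "depth": 50, "accuracy": 40, "readability": 80}
    print(round(score_report(example), 2))
```

For the example report, the weighted score is 0.30·60 + 0.30·50 + 0.25·40 + 0.15·80 = 55. A rubric like this exposes per-dimension strengths and weaknesses that a single aggregate number would hide.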
Conclusion
Mind DeepResearch represents a significant advance in multi-agent deep research frameworks. Its three-agent architecture and four-stage agent-specialized training pipeline let a ~30B-parameter system achieve strong benchmark results, setting a new reference point for future research.
