HiMA-Ecom: Enabling Joint Training of Hierarchical Multi-Agent E-commerce Assistants
Large language models (LLMs) have made it feasible to build hierarchical multi-agent assistants for vertical domains such as e-commerce. However, the lack of realistic benchmarks for training and evaluating these systems remains a significant obstacle. To address this gap, researchers have introduced HiMA-Ecom, a benchmark that supports the joint training of specialized agents in e-commerce scenarios.
Overview of HiMA-Ecom
HiMA-Ecom is a hierarchical multi-agent benchmark designed for e-commerce applications. It comprises a dataset of 22.8K instances, which includes:
- Agent-specific supervised fine-tuning samples
- Memory and system-level input-output pairs
- Data for joint multi-agent reinforcement learning
Introducing HiMA-R1
Alongside the HiMA-Ecom benchmark, the researchers propose a joint training method called HiMA-R1. It uses Variance-Reduction Group Relative Policy Optimization (VR-GRPO) to handle the complexity of the joint action space. The key features of HiMA-R1 include:
- Initial Trajectory-based Monte Carlo Sampling: This technique mitigates the joint action space, which grows exponentially with the number of agents, and makes training more efficient.
- Informative Agent Group Selection: The method selects which groups of agents to update based on reward variance, improving training efficiency.
- Adaptive Memory Evolution Mechanism: This mechanism repurposes GRPO rewards as cost-free supervisory signals, reducing repetitive reasoning and accelerating convergence.
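The variance-based selection idea can be illustrated with a small sketch. It is not the authors' implementation. It assumes the commonly used GRPO advantage (each reward normalized by its group's mean and standard deviation) and assumes that "informative" groups are simply the top-k groups by reward variance; the source does not state the exact selection rule. The agent names, reward values, and function names are hypothetical.

```python
from statistics import mean, pstdev, pvariance


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by its group's mean and standard deviation.

    A group whose rewards are all equal yields all-zero advantages,
    so it contributes no learning signal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def select_informative_groups(group_rewards, top_k):
    """Return the names of the top_k groups with the highest reward variance.

    Assumption: top-k ranking by variance stands in for the paper's
    selection rule, which the source text does not specify.
    """
    ranked = sorted(
        group_rewards,
        key=lambda name: pvariance(group_rewards[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical rewards for four sampled trajectories per agent group.
group_rewards = {
    "planner": [1.0, 0.0, 1.0, 0.0],
    "search_agent": [1.0, 1.0, 1.0, 1.0],  # zero variance: uninformative
    "recommender": [0.5, 0.4, 0.6, 0.5],
}

selected = select_informative_groups(group_rewards, top_k=2)
advantages = {
    name: group_relative_advantages(group_rewards[name]) for name in selected
}
```

In this toy setup the zero-variance group is skipped, which reflects the intuition behind the selection step: a group whose sampled rewards are identical gives zero advantages and therefore no gradient signal, so updating it would spend compute without changing the policy.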
Experimental Results
Experiments on the HiMA-Ecom benchmark show that HiMA-R1, built on smaller open-source models with 3B and 7B parameters, achieves performance comparable to that of larger language models such as DeepSeek-R1. It also outperforms DeepSeek-V3 by an average margin of 6%.
Conclusion
HiMA-Ecom and the accompanying HiMA-R1 training method advance the study of hierarchical multi-agent systems. Together they provide a realistic benchmark and a joint training strategy for building more efficient and capable AI assistants for e-commerce, using models far smaller than the large language models they are compared against.
