TRACE: Capability-Targeted Agentic Training
Summary: arXiv:2604.05336v1
Abstract: Large Language Models (LLMs) deployed in agentic environments must exercise multiple capabilities across different task instances. A capability is defined as performing one or more actions in a trajectory that are essential for successfully solving a subset of tasks in the environment. Many existing approaches either rely on synthetic training data that does not target the model’s actual capability deficits in the target environment or train directly on the target environment, requiring the model to implicitly learn the capabilities across tasks.
Introduction to TRACE
We introduce TRACE (Turning Recurrent Agent failures into Capability-targeted training Environments), an end-to-end system for environment-specific agent self-improvement. TRACE contrasts successful and failed trajectories to automatically identify the capabilities the agent lacks. It then synthesizes a targeted training environment for each identified capability, which rewards the model when it exercises that capability during training.
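To make the contrast step concrete, here is a minimal sketch. It assumes each trajectory has already been annotated with the capabilities it exercised; the `Trajectory` type, the capability labels, the `capability_gaps` helper, and the `min_gap` threshold are all hypothetical illustrations, not TRACE's actual analysis procedure, which the text does not specify.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    """One agent rollout. `capabilities` holds hypothetical labels for what it exercised."""
    task_id: str
    succeeded: bool
    capabilities: frozenset


def capability_gaps(trajectories, min_gap=0.25):
    """Rank capabilities by how much more often they appear in successes than in failures."""
    successes = [t for t in trajectories if t.succeeded]
    failures = [t for t in trajectories if not t.succeeded]
    if not successes or not failures:
        return []  # nothing to contrast against

    success_counts = Counter(c for t in successes for c in t.capabilities)
    failure_counts = Counter(c for t in failures for c in t.capabilities)

    gaps = {}
    for cap in set(success_counts) | set(failure_counts):
        success_rate = success_counts[cap] / len(successes)
        failure_rate = failure_counts[cap] / len(failures)
        gap = success_rate - failure_rate
        if gap >= min_gap:  # ASSUMPTION: an arbitrary illustrative threshold
            gaps[cap] = gap
    # Largest gap first; ties broken alphabetically so the output is deterministic.
    return sorted(gaps.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data: "verify_identity" shows up in every success and in no failure.
rollouts = [
    Trajectory("t1", True, frozenset({"verify_identity", "lookup_order"})),
    Trajectory("t2", True, frozenset({"verify_identity", "issue_refund"})),
    Trajectory("t3", False, frozenset({"lookup_order"})),
    Trajectory("t4", False, frozenset({"lookup_order", "issue_refund"})),
]
print(capability_gaps(rollouts))  # [('verify_identity', 1.0)]
```

A capability that appears in most successful trajectories but few failed ones is a plausible candidate for targeted training. A real system would also need a way to produce the capability labels in the first place.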
Methodology
TRACE improves an LLM agent in its target environment in four stages:
- Trajectory Analysis: The system analyzes both successful and failed trajectories to pinpoint specific capabilities that need improvement.
- Targeted Environment Synthesis: For each identified capability deficit, TRACE synthesizes a training environment that focuses on that capability.
- Reinforcement Learning Training: A LoRA (Low-Rank Adaptation) adapter is trained with reinforcement learning on each synthetic environment, yielding one adapter per targeted capability.
- Inference Routing: At inference time, TRACE routes each task to the relevant LoRA adapter.
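The last two stages can be sketched as a capability-targeted reward and an adapter router. This is an illustration under stated assumptions, not TRACE's implementation: the `Capability` record, the all-or-nothing reward, the keyword-overlap routing rule, and the adapter paths are hypothetical, since the text does not say how the reward is computed or how routing decisions are made.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Hypothetical description of one capability and its trained adapter."""
    name: str
    required_actions: frozenset  # actions that count as exercising the capability
    keywords: frozenset          # words used by the toy router below
    adapter_path: str            # placeholder location of a LoRA adapter


def capability_reward(actions, capability):
    """Return 1.0 if the trajectory performs every required action, else 0.0."""
    return 1.0 if capability.required_actions <= set(actions) else 0.0


def route(task_description, capabilities, default=None):
    """Pick the capability whose keywords overlap most with the task text.

    ASSUMPTION: keyword overlap is a stand-in for whatever routing TRACE uses.
    Returns `default` (for example, the base model) when nothing matches.
    """
    words = set(task_description.lower().split())
    best, best_overlap = default, 0
    for cap in capabilities:
        overlap = len(cap.keywords & words)
        if overlap > best_overlap:
            best, best_overlap = cap, overlap
    return best


CAPS = [
    Capability("verify_identity", frozenset({"ask_user_id", "check_user_id"}),
               frozenset({"account", "identity", "login"}), "adapters/verify_identity"),
    Capability("issue_refund", frozenset({"lookup_order", "refund_payment"}),
               frozenset({"refund", "order", "charge"}), "adapters/issue_refund"),
]

chosen = route("Customer wants a refund for a duplicate charge", CAPS)
print(chosen.name)  # issue_refund
print(capability_reward(["lookup_order", "refund_payment"], chosen))  # 1.0
```

Keeping the reward tied to a single capability means each adapter is trained on one narrow signal. The router then only has to decide which of those narrow adapters applies to an incoming task.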
Empirical Results
Empirical evaluations show that TRACE generalizes across environments. Relative to the base agent, it gains:
- +14.1 points on $\tau^2$-bench (customer service tasks).
- +7 perfect scores on ToolSandbox (tool use).
TRACE also outperforms the strongest baseline by +7.4 points on $\tau^2$-bench and +4 perfect scores on ToolSandbox.
Efficiency and Scalability
TRACE also scales more efficiently than baseline methods. Given the same number of rollouts, it outperforms GRPO and GEPA by +9.2 and +7.4 points on $\tau^2$-bench, respectively. This suggests that capability-targeted training makes better use of a fixed rollout budget.
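As a rough reading aid, the reported margins can be combined to estimate how far each baseline sits above the base agent on $\tau^2$-bench. The sketch below assumes that all figures were measured against the same base agent under the same evaluation setup, which the text does not state explicitly, so the derived values are an inference and not reported results.

```python
# Gains reported in the text, in points on tau^2-bench.
trace_gain_over_base = 14.1
trace_margin_over = {"GRPO": 9.2, "GEPA": 7.4}

# ASSUMPTION: every number shares the same base agent and evaluation setup,
# so a baseline's gain over base is TRACE's gain minus TRACE's margin over it.
implied_gain_over_base = {
    name: round(trace_gain_over_base - margin, 1)
    for name, margin in trace_margin_over.items()
}
print(implied_gain_over_base)  # {'GRPO': 4.9, 'GEPA': 6.7}
```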
Conclusion
TRACE identifies the specific capabilities an agent lacks and trains on an environment tailored to each one. In the reported experiments, this improves both task performance and rollout efficiency over the baselines on $\tau^2$-bench and ToolSandbox.
