Learning to Retrieve from Agent Trajectories
arXiv:2604.04949v1 (cross-listed)
Abstract
Information retrieval (IR) systems have traditionally been designed and trained for human users, with learning-to-rank methods relying heavily on large-scale human interaction logs such as clicks and dwell time. With the rapid emergence of large language model (LLM) powered search agents, however, retrieval is increasingly consumed by agents rather than human beings, and is embedded as a core component within multi-turn reasoning and action loops. In this setting, retrieval models trained under human-centric assumptions exhibit a fundamental mismatch with the way agents issue queries and consume results.
Introduction
In this work, we argue that retrieval models for agentic search should be trained directly from agent interaction data. We introduce learning to retrieve from agent trajectories as a new training paradigm, where supervision is derived from multi-step agent interactions. This shift in focus is essential for aligning retrieval systems with the unique interaction patterns exhibited by search agents.
Methodology
Through a systematic analysis of search agent trajectories, we identify key behavioral signals that reveal document utility. These signals include:
- Browsing actions: Which retrieved documents an agent chooses to open and read.
- Unbrowsed rejections: Retrieved documents that an agent passes over without opening them.
- Post-browse reasoning traces: The reasoning an agent produces after reading a document, which indicates how useful the document was.
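As an illustration of how these three signals could become training data, here is a minimal sketch. It is not the paper's procedure: the `SearchStep` record, the `utility` scores, and the rule that every unbrowsed document is a negative are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SearchStep:
    """One search turn in an agent trajectory (hypothetical schema)."""
    query: str
    retrieved: list[str]  # doc ids returned by the retriever, in rank order
    browsed: set[str] = field(default_factory=set)  # doc ids the agent opened
    # Assumed utility in [0, 1], derived from the post-browse reasoning trace.
    utility: dict[str, float] = field(default_factory=dict)


def mine_examples(trajectory: list[SearchStep]) -> list[tuple[str, str, int, float]]:
    """Convert a trajectory into (query, doc_id, label, weight) examples."""
    examples = []
    for step in trajectory:
        for doc_id in step.retrieved:
            if doc_id in step.browsed:
                # Browsed document: positive, weighted by its assumed utility
                # (1.0 when no post-browse judgment is available).
                weight = step.utility.get(doc_id, 1.0)
                examples.append((step.query, doc_id, 1, weight))
            else:
                # Retrieved but never opened: treated as a negative here.
                examples.append((step.query, doc_id, 0, 1.0))
    return examples


trajectory = [
    SearchStep(
        query="who founded the company",
        retrieved=["d1", "d2", "d3"],
        browsed={"d2"},
        utility={"d2": 0.8},
    )
]
examples = mine_examples(trajectory)
```

Keeping the weight next to the label lets a later training step decide how strongly each positive should count, rather than fixing that choice at mining time.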
Proposed Framework: LRAT
Guided by these insights, we propose LRAT, a simple yet effective framework that mines high-quality retrieval supervision from agent trajectories. LRAT incorporates relevance intensity through weighted optimization, so that training examples with stronger evidence of utility contribute more to the objective.
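One common way to realize weighted optimization is to scale a contrastive loss by a per-example weight. The sketch below uses a weighted InfoNCE loss for that purpose. The source does not specify LRAT's objective, so the loss form, the `weight` argument, and the `temperature` default are assumptions.

```python
import math


def weighted_infonce(pos_score: float, neg_scores: list[float],
                     weight: float, temperature: float = 1.0) -> float:
    """Weighted InfoNCE loss for one query with one positive and its negatives.

    `weight` stands in for relevance intensity (an assumed quantity here).
    """
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability of the positive, scaled by the weight.
    return weight * (log_z - logits[0])


# Example call with made-up similarity scores.
loss = weighted_infonce(2.0, [0.5, -1.0], weight=0.8, temperature=0.5)
```

With this form, a weight of zero removes an example from the objective entirely, and a weight of one recovers the standard unweighted loss.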
Results
Extensive experiments on both in-domain and out-of-domain deep research benchmarks demonstrate that retrievers trained with LRAT consistently improve:
- Evidence recall: The fraction of relevant evidence documents that the retriever returns.
- End-to-end task success: The effectiveness of completing tasks using the retrieved information.
- Execution efficiency: The speed and resource usage of the retrieval process.
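To make the first two metrics concrete, the following sketch computes recall at a cutoff and a simple success rate. The function names, the cutoff `k`, and the boolean success flags are illustrative assumptions. The benchmarks' actual evaluation protocols are not described here.

```python
def evidence_recall(retrieved: list[str], gold: set[str], k: int) -> float:
    """Fraction of gold evidence documents found in the top-k retrieved ids."""
    if not gold:
        return 0.0  # no gold evidence: define recall as 0 to avoid dividing by zero
    hits = gold.intersection(retrieved[:k])
    return len(hits) / len(gold)


def task_success_rate(outcomes: list[bool]) -> float:
    """Share of tasks the agent completed correctly (True means success)."""
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)


# Made-up example: one of three gold documents appears in the top 3.
recall_at_3 = evidence_recall(["a", "b", "c", "d"], {"b", "d", "z"}, k=3)
success = task_success_rate([True, False, True, True])
```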
Conclusion
Our results highlight agent trajectories as a practical and scalable supervision source, pointing to a promising direction for retrieval in the era of agentic search. As retrieval is increasingly consumed by agents rather than people, adapting retrieval models to the way agents issue queries and use results will be important for future work in the field.
