Search, Do not Guess: Teaching Small Language Models to Be Effective Search Agents
arXiv:2604.04651v1
Abstract: Agents equipped with search tools have emerged as effective solutions for knowledge-intensive tasks. While Large Language Models (LLMs) exhibit strong reasoning capabilities, their high computational cost limits practical deployment for search agents. Consequently, recent work has focused on distilling agentic behaviors from LLMs into Small Language Models (SLMs).
Through comprehensive evaluation on complex multi-hop reasoning tasks, we find that despite possessing less parametric knowledge, SLMs invoke search tools less frequently and are more prone to hallucinations. To address this issue, we propose \policy, a lightweight fine-tuning approach that explicitly trains SLMs to reliably retrieve and generate answers grounded in retrieved evidence. Compared to agent distillation from LLMs, our approach improves performance by 17.3 points on Bamboogle and 15.3 points on HotpotQA, achieving LLM-level results across benchmarks. Our further analysis reveals that adaptive search strategies in SLMs often degrade performance, highlighting the necessity of consistent search behavior for reliable reasoning.
Introduction
Agents equipped with search tools have become an effective way to handle knowledge-intensive tasks. LLMs reason well, but their high computational cost limits practical deployment as search agents. SLMs are cheaper to run, which is why recent work has focused on distilling agentic behavior from LLMs into them.
Challenges with Small Language Models
Evaluation on complex multi-hop reasoning tasks shows that SLMs have three related weaknesses:
- Frequent Hallucinations: SLMs are more prone than LLMs to generating answers that are not supported by evidence.
- Infrequent Tool Invocation: Despite possessing less parametric knowledge, SLMs invoke search tools less often, so they fall back on internal knowledge that may be missing, outdated, or inaccurate.
- Performance Variability: Adaptive search strategies, in which the model itself decides whether to search, often degrade SLM performance.
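The second and third points can be checked on logged agent runs. The sketch below is a minimal illustration, not the paper's evaluation code: the `Trajectory` record, the `"search"` tool name, and both helper functions are hypothetical names introduced here. It computes how often an agent searched at all, and how accuracy differs between runs that searched and runs that did not.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Trajectory:
    """One logged agent run (hypothetical record format)."""
    question: str
    tool_calls: List[str] = field(default_factory=list)  # names of tools invoked
    correct: bool = False                                 # was the final answer right?


def search_rate(trajectories: List[Trajectory]) -> float:
    """Fraction of runs that invoked the search tool at least once."""
    if not trajectories:
        return 0.0
    searched = sum(1 for t in trajectories if "search" in t.tool_calls)
    return searched / len(trajectories)


def accuracy_by_search(trajectories: List[Trajectory]) -> Dict[bool, Optional[float]]:
    """Accuracy split by whether the run searched; None for an empty group."""
    groups: Dict[bool, List[bool]] = {True: [], False: []}
    for t in trajectories:
        groups["search" in t.tool_calls].append(t.correct)
    return {
        searched: (sum(outcomes) / len(outcomes) if outcomes else None)
        for searched, outcomes in groups.items()
    }
```

A low `search_rate` combined with lower accuracy in the no-search group would be consistent with the behavior described above, although real logs would also need to account for question difficulty.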
The Proposed Solution: \policy
To address these issues, the authors propose \policy, a lightweight fine-tuning approach. It aims to:
- Explicitly train SLMs to retrieve relevant evidence before answering and to ground their answers in that evidence.
- Improve the consistency of search behavior, leading to more reliable reasoning outcomes.
- Bridge the performance gap between SLMs and LLMs, allowing SLMs to achieve results on par with their larger counterparts.
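The difference between consistent and adaptive search behavior can be sketched at inference time. The code below is a toy illustration under stated assumptions, not the paper's method (which is a fine-tuning procedure): `toy_search`, `toy_generate`, the two-passage `CORPUS`, and the `is_confident` callback are all hypothetical stand-ins for a real retriever, a real SLM, and a real confidence signal.

```python
from typing import Callable, List

# Tiny stand-in corpus (assumption: a real agent would query a search engine).
CORPUS = [
    "Paris is the capital of France.",
    "Canberra is the capital of Australia.",
]


def toy_search(query: str, k: int = 1) -> List[str]:
    """Rank passages by how many query terms they contain; return the top k matches."""
    terms = {w.strip("?.,!").lower() for w in query.split()}
    terms = {w for w in terms if len(w) > 3}  # drop short function words
    scored = [(sum(t in p.lower() for t in terms), p) for p in CORPUS]
    scored = [(s, p) for s, p in scored if s > 0]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored[:k]]


def toy_generate(question: str, evidence: List[str]) -> str:
    """Stand-in for the SLM: answer from evidence, otherwise flag an ungrounded guess."""
    return evidence[0] if evidence else "[ungrounded guess]"


def always_search_answer(question: str,
                         search: Callable[[str], List[str]],
                         generate: Callable[[str, List[str]], str]) -> str:
    """Consistent behavior: retrieve unconditionally, then answer from the evidence."""
    evidence = search(question)
    return generate(question, evidence)


def adaptive_answer(question: str,
                    search: Callable[[str], List[str]],
                    generate: Callable[[str, List[str]], str],
                    is_confident: Callable[[str], bool]) -> str:
    """Adaptive behavior: skip retrieval whenever the model believes it already knows."""
    evidence = [] if is_confident(question) else search(question)
    return generate(question, evidence)
```

In this sketch an overconfident `is_confident` callback sends the adaptive agent straight to an ungrounded answer, while the always-search variant cannot skip retrieval. That is the failure mode the consistency goal is meant to rule out.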
Results and Implications
Compared to agent distillation from LLMs, \policy improves performance on both reported benchmarks:
- Bamboogle: An improvement of 17.3 points.
- HotpotQA: An improvement of 15.3 points.
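These results suggest that fine-tuned SLMs can serve as search agents that reach LLM-level performance at lower computational cost. Further analysis in the paper finds that adaptive search strategies often degrade SLM performance, which is why consistent search behavior matters for reliable reasoning.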
Conclusion
The findings show that SLMs can handle knowledge-intensive tasks when they search consistently and ground their answers in retrieved evidence. The \policy approach offers one way to balance model capability against computational cost in search agents.
