PRAISE: Efficient Prefix-Based Rollout in Agentic Search

PRAISE: Prefix-Based Rollout Reuse in Agentic Search Training

Summary: arXiv:2604.03675v1

Abstract: In agentic search, large language models (LLMs) are trained to perform multi-turn retrieval and reasoning for complex tasks such as multi-hop question answering (QA). However, current search-based Reinforcement Learning (RL) methods suffer from two core limitations: expensive long-horizon rollouts are under-utilized during training, and supervision is typically available only at the final answer, resulting in severe reward sparsity.

We present Prefix-based Rollout reuse for Agentic search with Intermediate Step rEwards (PRAISE), a framework for improving both data efficiency and credit assignment in agentic search training. Given a complete search trajectory, PRAISE extracts prefix states at different search turns, elicits intermediate answers from them, and uses these prefixes both to construct additional training trajectories and to derive step-level rewards from performance differences across prefixes.

Our method uses a single shared model for both search policy learning and prefix answer evaluation, enabling joint optimization without extra human annotations or a separate reward model. Experiments on multi-hop QA benchmarks show that PRAISE consistently improves performance over strong baselines.

Introduction

Agentic search trains large language models (LLMs) to perform multi-turn retrieval and reasoning on complex tasks such as multi-hop question answering (QA). Existing reinforcement learning (RL) methods for this setting are capable but costly to train. In particular, their dependence on expensive long-horizon rollouts is a barrier to efficient training.

Challenges in Current Methods

The abstract identifies two core limitations of current search-based reinforcement learning approaches:

Under-utilization of Long-Horizon Rollouts: Long-horizon rollouts are expensive to generate, yet training does not fully exploit them, so data efficiency is poor.
Severe Reward Sparsity: Supervision is typically available only at the final answer, so intermediate search turns receive no direct reward, which makes credit assignment difficult.

The PRAISE Framework

PRAISE (Prefix-based Rollout reuse for Agentic search with Intermediate Step rEwards) addresses both limitations by reusing prefixes of a complete search trajectory. Its key components are:

Extraction of Prefix States: Given a complete search trajectory, PRAISE extracts prefix states at different search turns.
Intermediate Answers: The model is asked to produce an answer from each prefix, and the prefixes are also used to construct additional training trajectories.
Step-Level Rewards: Performance differences across prefixes yield step-level rewards, which improve credit assignment during training.
Joint Optimization: A single shared model handles both search policy learning and prefix answer evaluation, so the two are optimized jointly without extra human annotations or a separate reward model.
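
The components above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the summary only says that step-level rewards come from "performance differences across prefixes", so the adjacent-prefix difference used here is an assumption, and the names Turn, extract_prefixes, step_rewards, toy_answer, and toy_score are hypothetical. In PRAISE the answering and scoring would be done by the shared model; here they are replaced by toy stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Turn:
    """One search turn: the query issued and the evidence retrieved."""
    query: str
    evidence: str


def extract_prefixes(trajectory: Sequence[Turn]) -> List[List[Turn]]:
    """Return every prefix of the trajectory, from length 0 up to the full length."""
    return [list(trajectory[:k]) for k in range(len(trajectory) + 1)]


def step_rewards(
    trajectory: Sequence[Turn],
    answer_from_prefix: Callable[[Sequence[Turn]], str],
    score: Callable[[str], float],
) -> List[float]:
    """Assumed rule: reward for turn k is the score gain from adding turn k to the prefix."""
    prefix_scores = [score(answer_from_prefix(p)) for p in extract_prefixes(trajectory)]
    return [later - earlier for earlier, later in zip(prefix_scores, prefix_scores[1:])]


# Toy stand-ins (hypothetical): the "model" answers correctly only once
# evidence mentioning "Paris" is part of the prefix.
def toy_answer(prefix: Sequence[Turn]) -> str:
    return "Paris" if any("Paris" in t.evidence for t in prefix) else "unknown"


def toy_score(answer: str) -> float:
    return 1.0 if answer == "Paris" else 0.0


trajectory = [
    Turn("Who directed film X?", "Film X was directed by Jane Doe."),
    Turn("Where was Jane Doe born?", "Jane Doe was born in Paris."),
    Turn("Which country is Paris in?", "Paris is the capital of France."),
]

rewards = step_rewards(trajectory, toy_answer, toy_score)
print(rewards)  # [0.0, 1.0, 0.0]
```

Under this assumed rule, the second turn receives all the credit because it is the one that makes the answer recoverable, whereas an outcome-only reward would not distinguish it from the other two turns. The per-turn rewards also sum to the final prefix score minus the empty-prefix score, so the total signal stays consistent with the final-answer supervision.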

Experimental Results

In experiments on multi-hop QA benchmarks, PRAISE consistently improves performance over strong baselines. These results support the claim that prefix-based rollout reuse improves both data efficiency and credit assignment during training.

Conclusion

PRAISE addresses two limitations of current agentic search training: under-utilized rollouts and sparse rewards. By reusing trajectory prefixes to build additional training data and to derive step-level rewards, it improves data efficiency and credit assignment, and it may help large language models handle complex multi-turn tasks.
