PRAISE: Efficient Prefix-Based Rollout in Agentic Search

PRAISE: Prefix-Based Rollout Reuse in Agentic Search Training

Source: arXiv:2604.03675v1

Abstract: In agentic search, large language models (LLMs) are trained to perform multi-turn retrieval and reasoning for complex tasks such as multi-hop question answering (QA). However, current search-based Reinforcement Learning (RL) methods suffer from two core limitations: expensive long-horizon rollouts are under-utilized during training, and supervision is typically available only at the final answer, resulting in severe reward sparsity.

We present Prefix-based Rollout reuse for Agentic search with Intermediate Step rEwards (PRAISE), a framework for improving both data efficiency and credit assignment in agentic search training. Given a complete search trajectory, PRAISE extracts prefix states at different search turns, elicits intermediate answers from them, and uses these prefixes both to construct additional training trajectories and to derive step-level rewards from performance differences across prefixes.

Our method uses a single shared model for both search policy learning and prefix answer evaluation, enabling joint optimization without extra human annotations or a separate reward model. Experiments on multi-hop QA benchmarks show that PRAISE consistently improves performance over strong baselines.
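The prefix extraction described in the abstract can be pictured with a minimal Python sketch. It is not the authors' implementation: the `Turn` record, the `extract_prefixes` and `build_answer_prompt` helpers, the prompt wording, and the choice to include the empty prefix (no search yet) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One search turn: the query issued and the passages retrieved (hypothetical record)."""
    query: str
    passages: List[str]


def extract_prefixes(turns: List[Turn]) -> List[List[Turn]]:
    """Return turns[:0], turns[:1], ..., turns[:len(turns)].

    Including the empty prefix is an assumption of this sketch.
    """
    return [turns[:k] for k in range(len(turns) + 1)]


def build_answer_prompt(question: str, prefix: List[Turn]) -> str:
    """Format a prefix as a prompt that asks for an intermediate answer."""
    lines = [f"Question: {question}"]
    for i, turn in enumerate(prefix, start=1):
        lines.append(f"Search {i}: {turn.query}")
        lines.extend(f"  Passage: {p}" for p in turn.passages)
    lines.append("Intermediate answer:")
    return "\n".join(lines)


# A two-turn trajectory for a two-hop question.
question = "Where was the director of Inception born?"
trajectory = [
    Turn("director of Inception", ["Inception was directed by Christopher Nolan."]),
    Turn("Christopher Nolan birthplace", ["Christopher Nolan was born in London."]),
]

prefixes = extract_prefixes(trajectory)
prompts = [build_answer_prompt(question, p) for p in prefixes]
```

Each prompt would be sent to the same model that produced the trajectory, so one expensive rollout yields several answer attempts instead of one.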

Introduction

Agentic search trains large language models (LLMs) to interleave retrieval and reasoning over multiple turns so that they can handle complex tasks such as multi-hop question answering. Existing RL-based methods for this setting face significant challenges. In particular, their reliance on long-horizon rollouts makes training expensive.

Challenges in Current Methods

The primary limitations identified in current search-based reinforcement learning approaches include:

  • Under-utilization of Long-Horizon Rollouts: Long-horizon rollouts are expensive to generate, yet each one is under-utilized during training, so much of that cost is not converted into training signal.
  • Severe Reward Sparsity: Supervision is typically available only at the final answer, so intermediate search turns receive no direct feedback and credit assignment becomes difficult.

The PRAISE Framework

PRAISE addresses both challenges through prefix-based rollout reuse. Given a complete search trajectory, the framework works as follows:

  • Extraction of Prefix States: PRAISE extracts prefix states at different search turns of the trajectory.
  • Intermediate Answers: The model is prompted to produce an intermediate answer from each prefix, and the prefixes are used to construct additional training trajectories.
  • Step-Level Rewards: Performance differences across prefixes yield step-level rewards, which improve credit assignment.
  • Joint Optimization: A single shared model handles both search policy learning and prefix answer evaluation, so the two are optimized jointly without extra human annotations or a separate reward model.
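One way to read "performance differences across prefixes" is sketched below. The source does not give the exact reward formula, so this is an assumption: each prefix answer is scored by exact match against the gold answer, and the reward for a search turn is the score after that turn minus the score before it. The helper names `exact_match` and `step_rewards` are hypothetical.

```python
from typing import List


def exact_match(prediction: str, gold: str) -> float:
    """Score an intermediate answer; exact match is an assumed metric."""
    return 1.0 if prediction.strip().lower() == gold.strip().lower() else 0.0


def step_rewards(prefix_scores: List[float]) -> List[float]:
    """Reward for turn k is score(prefix k) - score(prefix k-1).

    prefix_scores[0] is the score of the answer given before any search,
    so a trajectory with T turns has T + 1 scores and T step rewards.
    """
    return [after - before for before, after in zip(prefix_scores, prefix_scores[1:])]


# Intermediate answers elicited after 0, 1, 2, and 3 search turns.
gold = "London"
intermediate_answers = ["Paris", "Los Angeles", "London", "london "]

scores = [exact_match(a, gold) for a in intermediate_answers]
rewards = step_rewards(scores)
# scores  == [0.0, 0.0, 1.0, 1.0]
# rewards == [0.0, 1.0, 0.0]
```

Under this assumed form the step rewards telescope: their sum equals the final score minus the initial score, so the credit is redistributed across turns rather than added on top of the outcome signal. Here the second turn, which surfaced the decisive evidence, receives all of it.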

Experimental Results

In experiments on multi-hop QA benchmarks, PRAISE consistently improved performance over strong baselines. These results support the claim that the framework improves both data efficiency and credit assignment.

Conclusion

PRAISE addresses two limitations of current agentic search training: under-utilized rollouts and sparse rewards. By reusing prefixes for additional training trajectories and step-level rewards, the framework improves data efficiency and credit assignment, and it may help large language models handle complex multi-turn tasks.


Lazarus Omolua
https://richlyai.com/blog