AgentHER: Boost LLM Performance with Trajectory Relabeling

Date:

AgentHER: Revolutionizing LLM Agent Trajectory Relabeling

In a groundbreaking development in the field of artificial intelligence, researchers have introduced a novel framework called AgentHER, which adapts the Hindsight Experience Replay (HER) principle to improve the performance of Large Language Model (LLM) agents. The research, documented in arXiv:2603.21357v3, highlights the significant shortcomings of current LLM agents, particularly in real-world task execution. For instance, it has been reported that GPT-4o successfully completes less than 15% of WebArena navigation tasks and achieves a pass rate of below 55% on ToolBench (Zhou et al., 2024; Qin et al., 2024).

AgentHER aims to address the critical issue of discarding failed trajectories, which represent a considerable source of valuable experience. By recovering and relabeling these trajectories, AgentHER transforms them into useful training data, thereby enhancing the effectiveness of LLM agents.

The Four-Stage Pipeline of AgentHER

AgentHER employs a systematic four-stage pipeline designed to maximize the utility of failed trajectories:

  • Failure Classification: This initial stage involves identifying and categorizing the nature of the failure in the agent’s trajectory.
  • Outcome Extraction: Here, the framework extracts potential alternative goals that could have been achieved, which are viable substitutes for the original objectives.
  • LLM-Guided Prompt Relabeling: Utilizing a confidence gating mechanism, this stage employs LLMs to relabel the failed trajectories based on the insights gained from the previous stages.
  • Data Packaging: The final stage involves converting the relabeled trajectories into high-quality training datasets suitable for various training methodologies, including Supervised Fine-Tuning (SFT), Data Programming Optimization (DPO), and ShareGPT.

Performance Improvements and Data Efficiency

AgentHER has demonstrated impressive results across multiple model families, including GPT-4o, Qwen2.5-72B/7B, and LLaMA-3.1-8B. In experiments conducted on both WebArena and ToolBench, the framework has shown improvements of 7.1 to 11.7 percentage points over traditional success-only SFT methods. Perhaps even more striking is its ability to achieve these gains with only 50% of successful demonstrations, effectively doubling data efficiency.

The performance enhancements are consistent across a wide range of model sizes, from 1.5 billion to 72 billion parameters, with observed improvements ranging from 5.8 to 9.2 percentage points. Additionally, the benefits compound significantly with iterative redeployment, yielding an extra 2.1 percentage points in performance over subsequent rounds of training.

Validation Through Human Evaluation

To validate the effectiveness of the relabeling process, a human evaluation was conducted, confirming an impressive relabeling precision of 97.7% under a multi-judge verification system. This high level of accuracy underscores the reliability of AgentHER in transforming failed trajectories into valuable training assets.

In conclusion, AgentHER represents a significant advancement in LLM training methodologies, offering a robust solution to the frequent challenges faced by agents in real-world scenarios. By leveraging the insights gained from failed trajectories, AgentHER not only enhances model performance but also promotes a more efficient use of available training data.

Related AI Insights

Lazarus Omolua
Lazarus Omoluahttps://richlyai.com/blog
My mission is to make sure that people in Africa are not left behind in the global AI revolution. RichlyAI exists to give everyone — students, founders, creators, and businesses — the tools to compete globally.

Subscribe

Popular

More like this
Related

How Business Ops Teams Boost Productivity with Codex

Discover how business operations teams use Codex to streamline documentation, enhance collaboration, and improve decision-making with AI-powered automation...

OpenAI Partners with Malta to Offer ChatGPT Plus Nationwide

OpenAI and Malta team up to provide free ChatGPT Plus access and AI training to all citizens, promoting digital literacy and responsible AI use.

Critical Linux Kernel Flaw Risks SSH Host Key Theft

A critical Linux kernel flaw risks stolen SSH host keys. Learn how to protect your systems and stay secure until patches are widely available.

Top External Hard Drives 2026: Expert Reviews & Buying Guide

Discover the best external hard drives of 2026 with expert reviews. Find top picks for speed, durability, and security to suit all storage needs.