AgentHER: Boost LLM Performance with Trajectory Relabeling

AgentHER: Revolutionizing LLM Agent Trajectory Relabeling

In a groundbreaking development in the field of artificial intelligence, researchers have introduced a novel framework called AgentHER, which adapts the Hindsight Experience Replay (HER) principle to improve the performance of Large Language Model (LLM) agents. The research, documented in arXiv:2603.21357v3, highlights the significant shortcomings of current LLM agents, particularly in real-world task execution. For instance, it has been reported that GPT-4o successfully completes less than 15% of WebArena navigation tasks and achieves a pass rate of below 55% on ToolBench (Zhou et al., 2024; Qin et al., 2024).

AgentHER aims to address the critical issue of discarding failed trajectories, which represent a considerable source of valuable experience. By recovering and relabeling these trajectories, AgentHER transforms them into useful training data, thereby enhancing the effectiveness of LLM agents.

The Four-Stage Pipeline of AgentHER

AgentHER employs a systematic four-stage pipeline designed to maximize the utility of failed trajectories:

Failure Classification: This initial stage involves identifying and categorizing the nature of the failure in the agent’s trajectory.
Outcome Extraction: Here, the framework extracts potential alternative goals that could have been achieved, which are viable substitutes for the original objectives.
LLM-Guided Prompt Relabeling: Utilizing a confidence gating mechanism, this stage employs LLMs to relabel the failed trajectories based on the insights gained from the previous stages.
Data Packaging: The final stage involves converting the relabeled trajectories into high-quality training datasets suitable for various training methodologies, including Supervised Fine-Tuning (SFT), Data Programming Optimization (DPO), and ShareGPT.

Performance Improvements and Data Efficiency

AgentHER has demonstrated impressive results across multiple model families, including GPT-4o, Qwen2.5-72B/7B, and LLaMA-3.1-8B. In experiments conducted on both WebArena and ToolBench, the framework has shown improvements of 7.1 to 11.7 percentage points over traditional success-only SFT methods. Perhaps even more striking is its ability to achieve these gains with only 50% of successful demonstrations, effectively doubling data efficiency.

The performance enhancements are consistent across a wide range of model sizes, from 1.5 billion to 72 billion parameters, with observed improvements ranging from 5.8 to 9.2 percentage points. Additionally, the benefits compound significantly with iterative redeployment, yielding an extra 2.1 percentage points in performance over subsequent rounds of training.

Validation Through Human Evaluation

To validate the effectiveness of the relabeling process, a human evaluation was conducted, confirming an impressive relabeling precision of 97.7% under a multi-judge verification system. This high level of accuracy underscores the reliability of AgentHER in transforming failed trajectories into valuable training assets.

In conclusion, AgentHER represents a significant advancement in LLM training methodologies, offering a robust solution to the frequent challenges faced by agents in real-world scenarios. By leveraging the insights gained from failed trajectories, AgentHER not only enhances model performance but also promotes a more efficient use of available training data.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!

Company

AgentHER: Boost LLM Performance with Trajectory Relabeling

AgentHER: Revolutionizing LLM Agent Trajectory Relabeling

The Four-Stage Pipeline of AgentHER

Performance Improvements and Data Efficiency

Validation Through Human Evaluation

Related AI Insights

Subscribe

More like thisRelated

About us

Company

The latest

Subscribe

RichlyAI Blog
AI Guide, Tutorials, Industrial Insights, & more!

More like this
Related