APEX-EM: Advanced Online Learning for Autonomous Agents

APEX-EM: Non-Parametric Online Learning for Autonomous Agents via Structured Procedural-Episodic Experience Replay

Large language model (LLM) agents lack persistent procedural memory: they often re-derive solutions from scratch, even for tasks that are structurally identical to ones they have already solved. To address this limitation, we introduce APEX-EM, a non-parametric online learning framework that accumulates, retrieves, and reuses structured procedural plans without modifying model weights.

Key Features of APEX-EM

APEX-EM is built on three components:

  • Structured Experience Representation: This feature encodes the complete procedural-episodic trace of each execution. It includes planning steps, artifacts, iteration history with error analysis, and quality scores, which provide a comprehensive overview of the agent’s decision-making process.
  • Plan-Retrieve-Generate-Iterate-Ingest (PRGII) Workflow: This innovative workflow integrates Task Verifiers that supply multi-dimensional reward signals, enabling agents to learn from both successes and failures effectively.
  • Dual-outcome Experience Memory: APEX-EM employs a hybrid retrieval mechanism that combines semantic search, structural signature matching, and plan Directed Acyclic Graph (DAG) traversal. This allows for cross-domain transfer between tasks that may not share lexical overlap but possess analogous operational structures.
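
To make the memory and retrieval ideas above concrete, here is a minimal sketch in Python. It is not the authors' implementation: every name (`Experience`, `hybrid_score`, `retrieve`), the weights, and the scoring functions are assumptions chosen for illustration. In particular, a simple edge-overlap measure stands in for plan DAG traversal, and hand-written vectors stand in for real embeddings.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Experience:
    """One stored execution trace. All field names are hypothetical."""
    task: str
    embedding: list          # semantic vector for the task description
    signature: frozenset     # structural signature, e.g. operation types
    plan_edges: frozenset    # plan DAG stored as (step, next_step) pairs
    succeeded: bool          # dual-outcome flag: successes and failures are both kept
    quality: float = 0.0
    error_notes: list = field(default_factory=list)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Set overlap in [0, 1]; used for signatures and for plan edges."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_score(query_emb, query_sig, query_edges, exp, weights=(0.4, 0.4, 0.2)):
    """Weighted mix of semantic, structural, and plan-graph similarity.

    The weights are arbitrary illustration values, not taken from the paper.
    """
    w_sem, w_struct, w_dag = weights
    return (w_sem * cosine(query_emb, exp.embedding)
            + w_struct * jaccard(query_sig, exp.signature)
            + w_dag * jaccard(query_edges, exp.plan_edges))


def retrieve(query_emb, query_sig, query_edges, memory, k=2):
    """Return the k highest-scoring experiences, best first."""
    ranked = sorted(
        memory,
        key=lambda e: hybrid_score(query_emb, query_sig, query_edges, e),
        reverse=True,
    )
    return ranked[:k]
```

Under these assumptions, a stored experience whose embedding has no overlap with the query can still outrank a semantically close one when its structural signature and plan edges match. That is the kind of cross-domain transfer described above.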

Evaluation and Results

APEX-EM was evaluated on three benchmarks: BigCodeBench, KGQAGen-10k, and Humanity’s Last Exam. The evaluations used Claude Sonnet 4.5 and Opus 4.5 as the underlying models.

  • KGQAGen-10k: APEX-EM achieved 89.6% accuracy, up from 41.3% without memory (a gain of +48.3 percentage points), and surpassed the oracle-retrieval upper bound of 84.9% by 4.7 percentage points.
  • BigCodeBench: The framework reached 83.3% success rate (SR), a substantial increase from the baseline of 53.9%, translating to a +29.4 percentage point gain. This performance also exceeded the +11.0 percentage point gain reported by MemRL under comparable conditions.
  • Humanity’s Last Exam: With entity graph retrieval, the score improved from 25.2% to 48.0%, a gain of +22.8 percentage points.
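
The percentage-point figures can be checked directly from the reported scores. The short Python snippet below recomputes them. The variable and function names are illustrative; the numbers are the ones quoted in the list. Note that for KGQAGen-10k the gain over the no-memory baseline and the margin over the oracle-retrieval bound are two separate quantities.

```python
# Reported scores as (baseline %, with APEX-EM %), copied from the results above.
results = {
    "KGQAGen-10k": (41.3, 89.6),
    "BigCodeBench": (53.9, 83.3),
    "Humanity's Last Exam": (25.2, 48.0),
}


def gain_pp(baseline, improved):
    """Absolute difference in percentage points, rounded to one decimal."""
    return round(improved - baseline, 1)


gains = {name: gain_pp(base, mem) for name, (base, mem) in results.items()}

# Margin of the KGQAGen-10k score over the oracle-retrieval upper bound (84.9%).
oracle_margin = gain_pp(84.9, 89.6)

for name, gain in gains.items():
    print(f"{name}: +{gain} pp")
print(f"KGQAGen-10k over oracle bound: +{oracle_margin} pp")
```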

Component Value and Task Dependency

Ablation studies indicate that the value of components within APEX-EM is task-dependent. For instance, while rich judge feedback is crucial for structured queries, it proves negligible for code generation tasks. Additionally, binary-signal iteration can partially compensate for weaker feedback, highlighting the framework’s adaptability in various scenarios.
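
The difference between rich judge feedback and a binary signal can be illustrated with a small generate-and-verify loop. The Python sketch below is an assumption-laden toy, not the paper's PRGII implementation: `Verdict`, `iterate`, and the fallback hint text are hypothetical names, and the `generate` and `verify` callables stand in for the LLM and a Task Verifier.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    """Verifier output. feedback=None models a binary pass/fail signal."""
    passed: bool
    feedback: Optional[str] = None


def iterate(
    generate: Callable[[Optional[str]], str],
    verify: Callable[[str], Verdict],
    max_iters: int = 3,
) -> List[Tuple[int, str, Verdict]]:
    """Generate, verify, and retry until success or the attempt budget runs out.

    Returns the full attempt history so that failures as well as the final
    result could later be ingested into memory.
    """
    history = []
    hint = None  # the first attempt has no feedback to condition on
    for attempt in range(1, max_iters + 1):
        candidate = generate(hint)
        verdict = verify(candidate)
        history.append((attempt, candidate, verdict))
        if verdict.passed:
            break
        # Rich feedback is passed through; a binary signal yields only a generic hint.
        hint = verdict.feedback or "previous attempt failed"
    return history
```

With a verifier that returns only `passed`, the generator learns nothing beyond the fact that it failed, so progress depends on retrying. With a verifier that fills in `feedback`, each retry can be conditioned on the specific error. This mirrors the trade-off described in the ablation.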

Conclusion

APEX-EM represents a significant step forward in the development of autonomous agents, providing a robust framework for learning and memory that enhances performance across diverse tasks. By leveraging structured procedural plans and innovative retrieval mechanisms, APEX-EM not only improves accuracy but also paves the way for more intelligent and adaptable AI systems.


Lazarus Omolua