APEX-EM: Non-Parametric Online Learning for Autonomous Agents via Structured Procedural-Episodic Experience Replay
Autonomous agents built on large language models (LLMs) lack persistent procedural memory: they re-derive solutions from scratch even when a task is structurally identical to one they have already solved. APEX-EM is a non-parametric online learning framework that addresses this limitation by accumulating, retrieving, and reusing structured procedural plans without altering model weights.
Key Features of APEX-EM
APEX-EM rests on three components:
- Structured Experience Representation: Encodes the complete procedural-episodic trace of each execution, including planning steps, artifacts, iteration history with error analysis, and quality scores.
- Plan-Retrieve-Generate-Iterate-Ingest (PRGII) Workflow: A five-stage loop in which Task Verifiers supply multi-dimensional reward signals, so that agents learn from both successes and failures.
- Dual-outcome Experience Memory: A memory of both successful and failed executions, queried through a hybrid retrieval mechanism that combines semantic search, structural signature matching, and traversal of the plan directed acyclic graph (DAG). This enables cross-domain transfer between tasks that share no lexical overlap but have analogous operational structure.
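The experience record and hybrid retrieval described above can be sketched roughly as follows. This is a minimal illustration, not the APEX-EM implementation: the `Experience` field names, the equal weights, and the use of cosine similarity and Jaccard overlap as stand-ins for semantic search and structural signature matching are all assumptions made for this example, and plan DAG traversal is omitted.

```python
from dataclasses import dataclass, field
import math


@dataclass
class Experience:
    """Hypothetical record of one execution trace (field names are illustrative)."""
    task: str
    plan_steps: list[str]          # ordered planning steps
    signature: frozenset[str]      # structural signature, e.g. operation types used
    succeeded: bool                # dual-outcome: failed runs are stored as well
    quality: float                 # verifier quality score, assumed to lie in [0, 1]
    errors: list[str] = field(default_factory=list)       # error analysis notes
    embedding: list[float] = field(default_factory=list)  # semantic vector of the task


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a, b):
    """Set overlap used here as a stand-in for structural signature matching."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_score(query_emb, query_sig, exp, w_sem=0.5, w_struct=0.5):
    """Weighted sum of semantic and structural similarity (weights are assumed)."""
    return (w_sem * cosine(query_emb, exp.embedding)
            + w_struct * jaccard(query_sig, exp.signature))


def retrieve(query_emb, query_sig, memory, k=1):
    """Return the k stored experiences with the highest hybrid score."""
    ranked = sorted(memory,
                    key=lambda e: hybrid_score(query_emb, query_sig, e),
                    reverse=True)
    return ranked[:k]


memory = [
    # Lexically unrelated to the query, but the same operational structure.
    Experience(task="build a monthly sales report",
               plan_steps=["filter rows", "join tables", "aggregate totals"],
               signature=frozenset({"filter", "join", "aggregate"}),
               succeeded=True, quality=0.9, embedding=[0.0, 1.0]),
    # Lexically closer to the query, but a different structure; a stored failure.
    Experience(task="summarize patient visits as a web page",
               plan_steps=["parse notes", "render page"],
               signature=frozenset({"parse", "render"}),
               succeeded=False, quality=0.2,
               errors=["wrong output format"], embedding=[0.6, 0.8]),
]

# Query: "count patient visits per clinic" (toy embedding and signature).
query_emb = [1.0, 0.0]
query_sig = frozenset({"filter", "join", "aggregate"})
top = retrieve(query_emb, query_sig, memory, k=1)
print(top[0].task)
```

In this toy setup the structurally matching experience outranks the lexically closer one, which is the kind of cross-domain transfer the hybrid retrieval is meant to allow. A retrieval based on the embedding alone would have ranked the two in the opposite order.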
Evaluation and Results
APEX-EM was evaluated on three benchmarks: BigCodeBench, KGQAGen-10k, and Humanity’s Last Exam. The evaluations used Claude Sonnet 4.5 and Opus 4.5 as the underlying models.
- KGQAGen-10k: APEX-EM achieved 89.6% accuracy, a gain of +48.3 percentage points over the 41.3% recorded without memory. This also surpasses the oracle-retrieval upper bound of 84.9%.
- BigCodeBench: The framework reached an 83.3% success rate (SR), up from a baseline of 53.9%, a gain of +29.4 percentage points. This exceeds the +11.0 percentage point gain reported by MemRL under comparable conditions.
- Humanity’s Last Exam: With entity graph retrieval, performance rose from 25.2% to 48.0%, a gain of +22.8 percentage points.
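The percentage-point gains follow directly from the reported figures, and a short check makes the arithmetic explicit. The dictionary layout and variable names below are only for illustration. Note that the +48.3 figure on KGQAGen-10k is the gain over the no-memory baseline; the margin over the 84.9% oracle-retrieval bound is smaller.

```python
# (without memory, with APEX-EM) scores in percent, as reported above
results = {
    "KGQAGen-10k": (41.3, 89.6),
    "BigCodeBench": (53.9, 83.3),
    "Humanity's Last Exam": (25.2, 48.0),
}

# Gain in percentage points = score with memory - score without memory
gains = {name: round(after - before, 1)
         for name, (before, after) in results.items()}

# Margin of APEX-EM over the oracle-retrieval upper bound on KGQAGen-10k
oracle_margin = round(89.6 - 84.9, 1)

for name, gain in gains.items():
    print(f"{name}: +{gain} pp")
print(f"KGQAGen-10k margin over oracle bound: +{oracle_margin} pp")
```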
Component Value and Task Dependency
Ablation studies indicate that the value of each APEX-EM component depends on the task. Rich judge feedback is crucial for structured queries but negligible for code generation. In addition, binary-signal iteration can partially compensate for weaker feedback.
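The difference between rich feedback and a binary signal during iteration can be illustrated with a toy loop. This is a sketch under stated assumptions, not the paper's verifier or generator: the `Verdict` type, the number-guessing task, and both verifier functions are hypothetical stand-ins chosen so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Optional

TARGET = 6  # toy "correct answer" in the range 0..7


@dataclass
class Verdict:
    passed: bool
    feedback: Optional[str] = None  # None models a binary pass/fail signal


def rich_verify(guess):
    """Hypothetical rich verifier: says in which direction the answer lies."""
    if guess == TARGET:
        return Verdict(True)
    return Verdict(False, "higher" if guess < TARGET else "lower")


def binary_verify(guess):
    """Hypothetical binary verifier: reports only pass or fail."""
    return Verdict(guess == TARGET)


def generate(feedback, history):
    """Propose the next candidate from the feedback and earlier attempts."""
    lo, hi, tried = 0, 7, set()
    for cand, verdict in history:
        tried.add(cand)
        if verdict.feedback == "higher":
            lo = max(lo, cand + 1)
        elif verdict.feedback == "lower":
            hi = min(hi, cand - 1)
    if feedback is not None:
        return (lo + hi) // 2  # rich feedback allows a directed jump
    # Binary signal only: fall back to the first candidate not yet tried.
    return next(c for c in range(lo, hi + 1) if c not in tried)


def iterate(verify, max_iters=8):
    """Generate, verify, and retry until the verifier passes or attempts run out."""
    history, feedback = [], None
    for _ in range(max_iters):
        candidate = generate(feedback, history)
        verdict = verify(candidate)
        history.append((candidate, verdict))
        if verdict.passed:
            break
        feedback = verdict.feedback
    return history


rich_history = iterate(rich_verify)
binary_history = iterate(binary_verify)
print(len(rich_history), len(binary_history))
```

In this toy both loops eventually pass, but the binary-signal loop needs more attempts. That mirrors, in a very simplified way, the idea that iteration on a binary signal can partially make up for the absence of richer feedback.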
Conclusion
APEX-EM gives autonomous agents a persistent procedural memory that improves performance on all three benchmarks evaluated, without altering model weights. Its gains come from reusing structured procedural plans through hybrid retrieval, and the contribution of each component varies with the task.
