Stateless Decision Memory for Enterprise AI Agents
arXiv:2604.20158v1
Abstract
Enterprise deployment of long-horizon decision agents in regulated domains such as underwriting, claims adjudication, and tax examination is increasingly dominated by retrieval-augmented pipelines, despite a decade of progress on sophisticated stateful memory architectures. We argue that this reflects a hidden requirement: regulated deployment depends on four system properties (deterministic replay, auditable rationale, multi-tenant isolation, and statelessness for horizontal scale), and stateful architectures violate them by their very nature.
Introduction to Deterministic Projection Memory (DPM)
To satisfy these properties, we propose Deterministic Projection Memory (DPM): an append-only event log combined with a task-conditioned projection computed at decision time. In experiments across ten regulated decision-making cases at three memory budgets, DPM matches summarization-based memory when the budget is generous and substantially outperforms it when the budget is constrained.
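The two components described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Event`, `EventLog`, `project`, `task_kinds`, and `budget_chars` are hypothetical, and a plain filter stands in for the projection, whose actual construction the text does not specify. The source states that DPM makes one LLM call at decision time; that call is left out here.

```python
from dataclasses import dataclass
import hashlib
import json


@dataclass(frozen=True)
class Event:
    """One immutable log entry (hypothetical schema)."""
    seq: int
    kind: str
    payload: str


class EventLog:
    """Append-only log: events can be added and read, never edited or removed."""

    def __init__(self):
        self._events = []

    def append(self, kind, payload):
        event = Event(seq=len(self._events), kind=kind, payload=payload)
        self._events.append(event)
        return event

    def events(self):
        # Return a tuple so callers cannot mutate the log through the view.
        return tuple(self._events)

    def digest(self):
        # Content hash of the whole log, usable as a replay fingerprint.
        blob = json.dumps([[e.seq, e.kind, e.payload] for e in self._events])
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def project(log, task_kinds, budget_chars):
    """Assumed stand-in for the task-conditioned projection.

    Keeps events whose kind is relevant to the task, walking from newest to
    oldest until the character budget would be exceeded, then returns the
    kept events in log order. Same log and arguments give the same output.
    """
    relevant = [e for e in log.events() if e.kind in task_kinds]
    chosen = []
    used = 0
    for event in reversed(relevant):
        if used + len(event.payload) > budget_chars:
            break
        chosen.append(event)
        used += len(event.payload)
    chosen.reverse()
    return "\n".join(f"[{e.seq}] {e.kind}: {e.payload}" for e in chosen)
```

Because the projection is recomputed from the log for each decision, nothing has to persist between decisions except the log itself, which is the sense in which this sketch is stateless.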
Performance Metrics and Results
At a 20x compression ratio, DPM improves factual precision by +0.52 (Cohen’s h = 1.17, p = 0.0014) and reasoning coherence by +0.53 (h = 1.13, p = 0.0034), based on paired permutation tests with n = 10. DPM is also 7-15 times faster under constrained budgets, because it makes one LLM call at decision time instead of N.
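For readers unfamiliar with the two statistics, the sketch below shows how they are commonly computed. Cohen's h uses its standard definition, h = 2·arcsin(√p1) − 2·arcsin(√p2). The permutation test is an exact two-sided sign-flip test on paired differences; the source does not say whether its test was exact or sampled, or one-sided or two-sided, so that choice is an assumption, as are the function names.

```python
import math
from itertools import product


def cohens_h(p1, p2):
    """Cohen's h effect size for the difference between two proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def paired_permutation_p(a, b):
    """Exact two-sided paired permutation test (sign-flip form).

    Under the null hypothesis each paired difference is equally likely to
    have either sign, so every sign pattern is enumerated and the share of
    patterns at least as extreme as the observed one is returned.
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            hits += 1
    return hits / total
```

With n = 10 paired cases the exact version enumerates 2^10 = 1024 sign patterns, which is cheap enough that no sampling is needed.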
Determinism Study
A determinism study with 10 replays per case at temperature zero shows that both DPM and summarization architectures inherit residual API-level nondeterminism. The asymmetry between them is structural: DPM exposes one nondeterministic call, while summarization exposes N compounding calls, so DPM leaves a smaller surface to audit.
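A back-of-the-envelope calculation shows why the number of nondeterministic calls matters. The sketch assumes that each call reproduces its earlier output independently with the same probability; both that independence and the numbers in the usage note are illustrative assumptions, not measurements from the study.

```python
def replay_match_probability(per_call_match, num_calls):
    """Chance that every call in a chain reproduces on replay.

    Assumes calls match independently with probability `per_call_match`,
    and that a single mismatch anywhere breaks the replay.
    """
    if not 0.0 <= per_call_match <= 1.0:
        raise ValueError("per_call_match must be between 0 and 1")
    if num_calls < 0:
        raise ValueError("num_calls must be non-negative")
    return per_call_match ** num_calls
```

Under these assumptions, a hypothetical per-call match rate of 0.99 gives a replay match probability of 0.99 for one call, but roughly 0.6 for a chain of 50 calls. In a real summarization chain each output also feeds the next call, so mismatches can propagate rather than merely accumulate.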
Audit Surface Analysis
The audit surface follows the same one-versus-N pattern: on LongHorizon-Bench, DPM logs two LLM calls per decision, while summarization systems log 83-97. This gap is what makes DPM decisions simpler to audit in regulated environments.
Conclusion
We conclude with TAMS, a practitioner heuristic for architecture selection in enterprise AI systems, and a failure analysis of stateful memory under enterprise operating conditions. The analysis identifies statelessness as a load-bearing property, which explains why enterprises prefer less sophisticated but replayable retrieval pipelines. DPM demonstrates that statelessness is attainable without the decision-quality penalty typically associated with retrieval-based methods.
