Bottlenecked Transformers: Boost Reasoning with KV Cache

Bottlenecked Transformers: Periodic KV Cache Consolidation for Generalised Reasoning

Summary: arXiv:2505.16950v4

Abstract: Transformer LLMs have been shown to exhibit strong reasoning ability that scales with inference-time compute, most prominently through token-space “thinking” chains of thought. A growing line of work pushes extra computation into the model’s latent space, which we term Auxiliary Latent-Space Computation (ALSC).

Existing ALSC methods largely fall into three buckets:

Token-mediated latent rollouts
Residual/activation steering
Memory (KV) compression

An underexplored alternative is memory consolidation and reconsolidation: two processes in the brain that, respectively, stabilise newly formed memory traces and, upon recall, transiently render established traces plastic so that they can integrate new contextual information before restabilising.

In Transformer LLMs, this can be seen as analogous to performing in-place rewrites of new KV segments, and rewrites of recalled past segments. In this work, we give a theoretical justification as to why memory (re)consolidation via KV cache rewrites is beneficial for improved reasoning. We do this through the lens of Information Bottleneck (IB) theory, which posits that model generalisation emerges from an optimal balance between input information compression and retention of predictive information in latent representations.
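For reference, the trade-off described above is usually written as the classical IB Lagrangian. The block below is that standard textbook form, not a formula quoted from the paper. The symbols are assumptions for illustration: X is the input, Y the prediction target, Z the latent representation (here one could read Z as the KV cache entries), and β sets how much predictive information is retained per unit of compression.

```latex
\min_{p(z \mid x)} \; \mathcal{L}_{\mathrm{IB}}
  = \underbrace{I(X; Z)}_{\text{compression of the input}}
  \; - \; \beta \, \underbrace{I(Z; Y)}_{\text{retained predictive information}},
  \qquad \beta > 0
```

A small β favours aggressive compression of Z, while a large β favours keeping everything in Z that helps predict Y.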

We then introduce the Bottlenecked Transformer, which augments a backbone LLM with a Cache Processor, an auxiliary Transformer that performs periodic, non-causal, in-place KV rewrites at newline-delimited reasoning step boundaries. The Processor consolidates recently written KV entries and reconsolidates a small, top-k attention-selected set of prior entries.
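The rewrite schedule described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the per-position `attn_mass` scores, and the toy `mean_mix` processor are all hypothetical stand-ins, and a real Cache Processor would be a trained auxiliary Transformer operating on key/value tensors.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def is_step_boundary(token_text: str) -> bool:
    """Treat any token containing a newline as the end of a reasoning step."""
    return "\n" in token_text


def select_rewrite_indices(attn_mass: Sequence[float], step_start: int, k: int) -> List[int]:
    """Pick the cache positions to rewrite at a step boundary.

    attn_mass[i] is a hypothetical attention score for cache position i.
    Positions >= step_start were written during the current step and are
    all selected (consolidation). Among earlier positions, only the k with
    the highest score are selected (reconsolidation).
    """
    prior_by_mass = sorted(range(step_start), key=lambda i: attn_mass[i], reverse=True)
    recalled = sorted(prior_by_mass[:k])
    recent = list(range(step_start, len(attn_mass)))
    return recalled + recent


def rewrite_in_place(
    cache: List[Vector],
    indices: Sequence[int],
    processor: Callable[[List[Vector]], List[Vector]],
) -> None:
    """Overwrite the selected cache entries with the processor's output."""
    updated = processor([cache[i] for i in indices])
    if len(updated) != len(indices):
        raise ValueError("processor must return one entry per selected position")
    for i, vec in zip(indices, updated):
        cache[i] = vec


def mean_mix(entries: List[Vector]) -> List[Vector]:
    """Toy stand-in for the Cache Processor (assumption, not the real model).

    Each output blends an entry with the mean of all selected entries, so
    every position sees every other one, i.e. the update is non-causal.
    """
    dim = len(entries[0])
    mean = [sum(e[d] for e in entries) / len(entries) for d in range(dim)]
    return [[0.5 * e[d] + 0.5 * mean[d] for d in range(dim)] for e in entries]


# Usage: six cached entries, the current step began at position 4.
cache = [[0.0], [2.0], [4.0], [6.0], [8.0], [10.0]]
attn_mass = [0.1, 0.5, 0.05, 0.3, 0.2, 0.9]
if is_step_boundary("so x = 2\n"):
    idx = select_rewrite_indices(attn_mass, step_start=4, k=2)  # [1, 3, 4, 5]
    rewrite_in_place(cache, idx, mean_mix)
```

Positions 0 and 2 are left untouched in this example because they fall outside both the current step and the top-k recalled set, which is what keeps the rewrite cheap relative to reprocessing the whole cache.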

We evaluate the Bottlenecked Transformer on math reasoning benchmarks. It shows consistent gains over vanilla Transformers and pause-token-augmented baselines, of up to +6.6 percentage points (pp) on selected tasks and backbones.

Key Insights

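The Bottlenecked Transformer pairs a backbone LLM with a Cache Processor that rewrites KV cache entries in place, an alternative to token-mediated rollouts, activation steering, and KV compression.
Rewrites happen periodically at newline-delimited reasoning step boundaries, covering the newly written entries plus a small top-k attention-selected set of earlier ones.
Information Bottleneck theory supplies the justification: generalisation is framed as a balance between compressing input information and retaining predictive information in latent representations.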

Conclusion

The development of the Bottlenecked Transformer marks a significant step forward in the design of Transformer architectures, particularly in enhancing their reasoning capabilities. By leveraging principles from neuroscience and information theory, this approach opens new avenues for research and application in the field of artificial intelligence.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
