LLM Reasoning Is Latent, Not the Chain of Thought
arXiv:2604.15726v1
This position paper argues that large language model (LLM) reasoning should be studied as the formation of latent-state trajectories rather than as a faithful surface chain of thought (CoT). The distinction matters because claims about faithfulness, interpretability, reasoning benchmarks, and inference-time intervention all depend on what the field takes the primary object of reasoning to be.
Key Arguments
The authors separate three factors that are often conflated: surface traces, latent states, and serial computation. They then formalize three competing hypotheses:
- H1: Reasoning is primarily mediated by latent-state trajectories.
- H2: Reasoning is primarily mediated by explicit surface chain-of-thought.
- H0: Most apparent reasoning gains can be attributed to generic serial computation rather than a privileged representational object.
Empirical Evidence
The authors reorganize recent empirical, mechanistic, and survey work within this framework. They conclude that the existing evidence most strongly supports H1, but as a default working hypothesis rather than as a verdict that holds across all tasks. In other words, latent-state dynamics are the best-supported starting point, and H2 and H0 are not ruled out for particular tasks.
Recommendations for Future Research
Based on this analysis, the authors make two recommendations for future research:
- The field should prioritize the study of latent-state dynamics as the default focus when investigating LLM reasoning.
- Reasoning evaluations should be designed to explicitly disentangle surface traces, latent states, and serial computation, allowing for a clearer understanding of each component’s impact on reasoning performance.
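One way to picture the second recommendation is as a small factorial comparison in which each condition isolates one of the three factors. The sketch below is an illustration only and is not taken from the paper: the four conditions, the names `ConditionScores`, `attribute_gain`, and `favored_factor`, and the `min_effect` threshold are all assumptions made for this example, and the mapping from each contrast to H0, H1, or H2 is a deliberate simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionScores:
    """Accuracy (0..1) of one model on one task under four hypothetical conditions."""
    direct: float          # answer immediately, no extra tokens
    surface_cot: float     # model writes a readable chain of thought
    matched_filler: float  # same token budget as CoT, but contentless filler tokens
    latent_ablated: float  # CoT text unchanged, intermediate hidden states perturbed


def attribute_gain(s: ConditionScores) -> dict[str, float]:
    """Compute three illustrative contrasts (they are not an additive decomposition)."""
    return {
        # Gain reproduced by extra serial compute alone (points toward H0).
        "serial_compute": s.matched_filler - s.direct,
        # Gain that needs the trace content beyond matched compute (points toward H2).
        "surface_trace": s.surface_cot - s.matched_filler,
        # Accuracy lost when latent states are perturbed under the same trace (points toward H1).
        "latent_state": s.surface_cot - s.latent_ablated,
    }


def favored_factor(s: ConditionScores, min_effect: float = 0.02) -> str:
    """Name the largest contrast, or 'inconclusive' if none reaches min_effect."""
    effects = attribute_gain(s)
    name, size = max(effects.items(), key=lambda kv: kv[1])
    return name if size >= min_effect else "inconclusive"


if __name__ == "__main__":
    # Synthetic numbers chosen only to exercise the code; they are not results.
    example = ConditionScores(direct=0.40, surface_cot=0.70,
                              matched_filler=0.45, latent_ablated=0.42)
    print(attribute_gain(example))
    print(favored_factor(example))
```

The point of the design is that no single accuracy number can separate the hypotheses: a CoT gain over direct answering is compatible with all three, so the evaluation needs a matched-compute control and a latent intervention alongside the usual CoT condition.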
Conclusion
The paper challenges the common practice of treating the written chain of thought as the reasoning itself. By shifting attention to latent-state trajectory formation, the authors invite researchers to study what happens inside the model during reasoning, which could improve interpretability and make inference-time interventions more effective. If that reorientation holds up, it may also make language models more reliable and transparent.
