Introspective Diffusion Language Models
arXiv:2604.11035v1
Abstract: Diffusion language models (DLMs) promise parallel generation, yet still lag behind autoregressive (AR) models in quality. We trace this gap to a failure of introspective consistency: AR models agree with their own generations, while DLMs often do not. We define the introspective acceptance rate, which measures whether a model accepts its previously generated tokens. This reveals why AR training has a structural advantage: causal masking and logit shifting implicitly enforce introspective consistency.
Motivated by this observation, we introduce the Introspective Diffusion Language Model (I-DLM), a paradigm that retains diffusion-style parallel decoding while inheriting the introspective consistency of AR training. I-DLM uses a novel introspective strided decoding (ISD) algorithm, which enables the model to verify previously generated tokens while advancing new ones in the same forward pass.
From a systems standpoint, we build the I-DLM inference engine on AR-inherited optimizations and further customize it with a stationary-batch scheduler. To the best of our knowledge, I-DLM is the first DLM to match the quality of its same-scale AR counterpart while outperforming prior DLMs in both model quality and practical serving efficiency across 15 benchmarks.
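The introspective acceptance rate is described above only in words. As a rough illustration, the sketch below counts how often a model, when re-reading its own output, would pick the same token again. The function name, the input layout, and the rule that "accepted" means "is the argmax" are all assumptions made for this example; the paper's exact definition may differ.

```python
from typing import Sequence


def introspective_acceptance_rate(
    generated: Sequence[int],
    rescored_probs: Sequence[Sequence[float]],
) -> float:
    """Fraction of generated tokens the model would pick again.

    generated[i] is the token id emitted at position i.
    rescored_probs[i] is the model's distribution over the vocabulary at
    position i when it re-reads its own output (hypothetical input format).
    Assumption: a token counts as accepted when it is the argmax.
    """
    if len(generated) != len(rescored_probs):
        raise ValueError("one distribution per generated token is required")
    if not generated:
        return 0.0
    accepted = 0
    for token, probs in zip(generated, rescored_probs):
        # Index of the highest-probability vocabulary entry.
        best = max(range(len(probs)), key=probs.__getitem__)
        if best == token:
            accepted += 1
    return accepted / len(generated)


# Toy data: four generated tokens over a three-word vocabulary.
generated = [2, 0, 1, 1]
rescored = [
    [0.1, 0.2, 0.7],  # argmax 2, generated 2: accepted
    [0.6, 0.3, 0.1],  # argmax 0, generated 0: accepted
    [0.5, 0.4, 0.1],  # argmax 0, generated 1: rejected
    [0.2, 0.5, 0.3],  # argmax 1, generated 1: accepted
]
rate = introspective_acceptance_rate(generated, rescored)
print(rate)  # 0.75
```

Under this reading, a rate near 1.0 means the model largely agrees with what it wrote, which is the property the abstract attributes to AR models.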
Key Features of I-DLM
- Introspective Consistency: Prior DLMs often disagree with their own generations. I-DLM inherits the introspective consistency of AR training, so it tends to accept the tokens it has already generated.
- Introspective Strided Decoding (ISD): This algorithm lets the model verify previously generated tokens while advancing new ones in the same forward pass.
- Performance Benchmarks: I-DLM achieves a score of 69.6 on AIME-24 and 45.7 on LiveCodeBench-v6, surpassing LLaDA-2.1-mini (16B) by more than 26 and 15 points, respectively.
- High Throughput: Built for high-concurrency serving, I-DLM delivers roughly three times the throughput of prior state-of-the-art DLMs.
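The idea of verifying earlier tokens while advancing new ones can be pictured with a generic draft-then-verify loop in the style of speculative decoding. This is not the paper's ISD algorithm, whose details are not given here: `propose`, `verify`, and the toy models are hypothetical stand-ins, and the verification runs token by token for readability, whereas ISD is described as doing both jobs in a single forward pass.

```python
from typing import Callable, List


def strided_verify_decode(
    propose: Callable[[List[int], int], List[int]],
    verify: Callable[[List[int]], int],
    prompt: List[int],
    max_new_tokens: int,
    stride: int = 4,
) -> List[int]:
    """Draft up to `stride` tokens at once, then keep only what is verified."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    out = list(prompt)
    target_len = len(prompt) + max_new_tokens
    while len(out) < target_len:
        draft = propose(out, min(stride, target_len - len(out)))
        for tok in draft:
            checked = verify(out)  # the token the verifier would emit here
            out.append(checked)    # always keep the verified token
            if checked != tok:
                break              # the rest of the draft is stale; redraft
    return out


def toy_verify(prefix: List[int]) -> int:
    """Hypothetical reference model: count upward modulo 10."""
    return (prefix[-1] + 1) % 10


def toy_propose(prefix: List[int], k: int) -> List[int]:
    """Hypothetical parallel drafter: counts upward but guesses 0 instead of 5."""
    guesses = [(prefix[-1] + j) % 10 for j in range(1, k + 1)]
    return [0 if g == 5 else g for g in guesses]


result = strided_verify_decode(toy_propose, toy_verify, [1], max_new_tokens=6)
print(result)  # [1, 2, 3, 4, 5, 6, 7]
```

In this toy run the first draft is accepted up to the wrong guess at the fourth position, which the verifier corrects before a new draft starts. The output always matches what the verifier alone would produce; the draft only affects how many positions are settled per round.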
Comparative Analysis
Compared with autoregressive models, DLMs have lagged in generation quality. The authors report that I-DLM closes this gap, matching its same-scale AR counterpart in quality while serving more efficiently than prior DLMs. The introspective acceptance rate, which measures whether a model accepts its own previously generated tokens, explains where the AR training paradigm gets its structural advantage: causal masking and logit shifting implicitly enforce introspective consistency.
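For readers unfamiliar with the two AR mechanisms named in the abstract, the following minimal sketch shows them in plain Python. It illustrates standard causal masking and next-token target shifting only; the helper names are made up for this example, and nothing here is specific to I-DLM.

```python
from typing import List, Tuple


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def shifted_pairs(tokens: List[int]) -> List[Tuple[int, int]]:
    """Pair each input position i with its training target, token i + 1."""
    return [(i, tokens[i + 1]) for i in range(len(tokens) - 1)]


tokens = [11, 22, 33, 44]
mask = causal_mask(len(tokens))
pairs = shifted_pairs(tokens)
print(mask[1])  # [True, True, False, False]
print(pairs)    # [(0, 22), (1, 33), (2, 44)]
```

Because position i sees only tokens up to i and is trained to predict token i + 1, an AR model re-reading its own output scores each token under the same conditioning it used to generate it.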
Furthermore, I-DLM’s design reflects an adaptation to the increasing demands for high-performance language generation systems, making it a timely contribution to the field. Its novel approaches promise a future where diffusion models can compete on equal footing with autoregressive counterparts while offering unique advantages in parallelization and efficiency.
Conclusion
The introduction of the Introspective Diffusion Language Model (I-DLM) is a pivotal development in the realm of natural language processing. By integrating the strengths of both diffusion and autoregressive models, I-DLM not only enhances generative quality but also meets the growing needs for efficient language model deployment in practical applications. The advancements presented in this research pave the way for further innovations in language modeling and artificial intelligence.
