Where does output diversity collapse in post-training?
arXiv:2604.16027v1
Abstract: Post-trained language models produce less varied outputs than their base counterparts. This output diversity collapse undermines inference-time scaling methods that rely on varied samples, and risks homogenizing model outputs on creative and value-laden tasks.
Introduction
Language models have improved substantially in recent years. However, post-trained models exhibit output diversity collapse: they produce less varied outputs than their base counterparts. This loss of diversity is a problem for tasks that require creativity and nuanced responses, such as storytelling or ethical reasoning.
Understanding Output Diversity Collapse
Prior research has linked this collapse to specific post-training methods. However, a comprehensive analysis has yet to disentangle the influence of training data composition and generation format from the model weights themselves. This study investigates three distinct post-training lineages of the Olmo 3 model: Think (chain-of-thought distillation), Instruct (broad multi-source data), and RL-Zero.
Methodology
We traced output diversity across 15 tasks using four text diversity metrics. The collapse is not uniform across lineages: it varies significantly with the lineage and the composition of its training data.
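To make the idea of a text diversity metric concrete, here is a minimal sketch of distinct-n, a common lexical diversity measure. This summary does not name the four metrics used in the study, so distinct-n is only an illustrative assumption, and the function name `distinct_n` and the whitespace tokenization are hypothetical choices made for the example.

```python
def distinct_n(samples, n=2):
    """Fraction of unique n-grams among all n-grams in a set of samples.

    Illustrative only: the study's actual metrics are not specified here.
    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    total = 0
    unique = set()
    for text in samples:
        tokens = text.lower().split()
        # Collect every contiguous run of n tokens in this sample.
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    # With no n-grams at all, report zero diversity instead of dividing by zero.
    return len(unique) / total if total else 0.0


# Identical samples share all their bigrams; unrelated samples share none.
collapsed = distinct_n(["the cat sat", "the cat sat"], n=2)
varied = distinct_n(["the cat sat", "a dog ran"], n=2)
```

A lower score for a post-trained model than for its base model, on samples drawn from the same prompt, would be one sign of the collapse described above.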
Key Findings
- The Think lineage experienced a significant loss of semantic diversity during supervised fine-tuning.
- The impact of Direct Preference Optimization (DPO) was more pronounced in the Instruct lineage than in the Think lineage.
- Suppressing chain-of-thought reasoning at inference in Think models resulted in a drop in accuracy for complex tasks, yet did not affect answer-level diversity.
- Diversity collapse appears to be embedded within the model weights, primarily influenced by the training data rather than the generation format.
Diversity Loss Components
We decomposed the loss of diversity into two components: a quality-control component, which involves the removal of incorrect outputs, and a residual component reflecting genuine narrowing among correct outputs. Our analysis showed that:
- The balance between these components is task-dependent.
- Think models maintained a higher level of correct-answer diversity than Instruct models, despite an overall greater collapse in the Think lineage.
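The two-component decomposition can be sketched as follows. This summary does not give the study's exact formula, so the code shows only one plausible reading, under stated assumptions: diversity is measured as the number of distinct outputs, the residual component is the diversity lost among correct outputs, and the quality-control component is whatever remains of the total loss. The names `distinct_count` and `decompose_diversity_loss` are hypothetical.

```python
def distinct_count(outputs):
    """Number of distinct outputs (an assumed, deliberately simple diversity measure)."""
    return len(set(outputs))


def decompose_diversity_loss(base, post, diversity=distinct_count):
    """Split diversity loss into quality-control and residual parts.

    base, post: lists of (output, is_correct) pairs sampled from the base
    and post-trained models. This is an illustrative decomposition, not
    necessarily the one used in the study.
    """
    base_all = [out for out, _ in base]
    post_all = [out for out, _ in post]
    base_correct = [out for out, ok in base if ok]
    post_correct = [out for out, ok in post if ok]

    # Total loss: change in diversity over all outputs.
    total = diversity(base_all) - diversity(post_all)
    # Residual: narrowing that remains when only correct outputs are compared.
    residual = diversity(base_correct) - diversity(post_correct)
    # Quality control: the part of the loss tied to incorrect outputs.
    quality_control = total - residual
    return {"total": total, "quality_control": quality_control, "residual": residual}


# Toy data: answers to "2 + 2", with correctness labels.
base = [("4", True), ("four", True), ("5", False), ("3", False), ("4", True)]
post = [("4", True), ("4", True), ("4", True), ("5", False), ("4", True)]
result = decompose_diversity_loss(base, post)
```

In this toy case the total loss of two distinct outputs splits evenly: one comes from dropping an incorrect answer, and one from the correct answers narrowing from two phrasings to one.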
Conclusion
The results indicate that diversity collapse is determined primarily during training, specifically by the composition of the training data. Inference-time adjustments alone therefore cannot address it; preserving output diversity in post-trained language models requires a more holistic approach to model training and data curation.
As AI continues to evolve, understanding and mitigating output diversity collapse will be crucial for enhancing the effectiveness of language models in creative and complex tasks.
