OSCAR: Orchestrated Self-verification and Cross-path Refinement
arXiv:2604.01624v2
Abstract
Diffusion language models (DLMs) expose their denoising trajectories, offering a natural handle for inference-time control; accordingly, an ideal hallucination mitigation framework should intervene during generation using this model-native signal rather than relying on an externally trained hallucination classifier.
To this end, we formulate commitment uncertainty localization: given a denoising trajectory, identify token positions whose cross-chain entropy exceeds an unsupervised threshold before factually unreliable commitments propagate into self-consistent but incorrect outputs.
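Here is one way to write this target out. It assumes that cross-chain entropy is computed from the empirical distribution of the tokens the N chains commit at each position, which the summary does not spell out. The notation $x_i^{(n)}$ (token committed at position $i$ by chain $n$), $\mathcal{V}$ (vocabulary), and $\tau$ (threshold) is introduced here for illustration:

```latex
p_i(v) = \frac{1}{N} \sum_{n=1}^{N} \mathbb{1}\!\left[x_i^{(n)} = v\right], \qquad
H_i = -\sum_{v \in \mathcal{V}} p_i(v) \ln p_i(v), \qquad
\mathcal{U} = \{\, i : H_i > \tau \,\}
```

For example, suppose $N = 4$ chains commit (Paris, Paris, Warsaw, Paris) at position $i$. Then $p_i = (3/4, 1/4)$ and $H_i = 0.75 \ln(4/3) + 0.25 \ln 4 \approx 0.562$ nats. $H_i = 0$ when all chains agree, and it is at most $\ln N$ when every chain commits a different token.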
Introduction to OSCAR
We introduce OSCAR, a training-free inference-time framework that operationalizes commitment uncertainty localization. OSCAR runs N parallel denoising chains with randomized reveal orders, computes cross-chain Shannon entropy to detect high-uncertainty positions, and then performs targeted remasking conditioned on retrieved evidence.
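Below is a minimal sketch of the parallel-chain stage. It assumes a denoiser that commits one position per step, whereas real DLM samplers typically reveal several positions per step. The names `run_chain`, `run_chains`, `toy_predict`, `CANDIDATES`, and the `MASK = None` placeholder are all hypothetical. The toy predictor stands in for an actual diffusion language model and ignores context.

```python
import random
from typing import Callable, List, Optional

MASK: Optional[str] = None  # hypothetical placeholder for a DLM's [MASK] token
Predictor = Callable[[List[Optional[str]], int, random.Random], str]


def run_chain(length: int, predict: Predictor, rng: random.Random) -> List[Optional[str]]:
    """Denoise one fully masked sequence, committing one position at a time."""
    seq: List[Optional[str]] = [MASK] * length
    order = list(range(length))
    rng.shuffle(order)  # randomized reveal order for this chain
    for pos in order:
        seq[pos] = predict(seq, pos, rng)  # commit a token given the partial sequence
    return seq


def run_chains(n_chains: int, length: int, predict: Predictor, seed: int = 0) -> List[List[Optional[str]]]:
    """Run N chains, each with its own RNG and therefore its own reveal order."""
    return [run_chain(length, predict, random.Random(seed + k)) for k in range(n_chains)]


# Hypothetical toy "model": the last position is ambiguous, the rest are confident.
CANDIDATES = [["Curie"], ["was"], ["born"], ["in"], ["Paris", "Warsaw"]]


def toy_predict(seq: List[Optional[str]], pos: int, rng: random.Random) -> str:
    return rng.choice(CANDIDATES[pos])  # ignores context for simplicity


chains = run_chains(n_chains=8, length=len(CANDIDATES), predict=toy_predict)
for chain in chains:
    print(" ".join(chain))
```

Giving each chain its own seeded `random.Random` keeps the chains independent while keeping the run reproducible.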
Methodology
For principled comparison of localization methods, the framework is accompanied by a suite of trajectory-level assessments, including a cross-chain divergence-at-hallucination (CDH) metric. The method itself involves the following key steps:
- Parallel Denoising Chains: OSCAR runs N denoising chains on the same input in parallel, so that disagreement among them can serve as an uncertainty signal.
- Randomized Reveal Orders: Each chain unmasks token positions in a different random order, so that cross-chain agreement is not simply an artifact of a shared decoding order.
- Cross-chain Shannon Entropy: At each token position, OSCAR measures the Shannon entropy of the tokens committed across chains; positions whose entropy exceeds a threshold are flagged as high-uncertainty.
- Targeted Remasking: Flagged positions are remasked and denoised again, conditioned on retrieved evidence, to correct potential hallucinations.
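The localization and correction steps can be sketched as follows. Several choices here are hypothetical and not taken from the summary: the majority-vote consensus used as the sequence to repair, the mean-plus-one-standard-deviation rule standing in for the unsupervised threshold, and the `toy_refill` predictor standing in for evidence-conditioned re-denoising. Entropy is measured in nats.

```python
import math
import statistics
from collections import Counter
from typing import List, Optional, Sequence

MASK: Optional[str] = None  # hypothetical placeholder for a DLM's [MASK] token


def cross_chain_entropy(chains: Sequence[Sequence[str]]) -> List[float]:
    """Per-position Shannon entropy (nats) of the tokens committed across chains."""
    n = len(chains)
    entropies = []
    for column in zip(*chains):  # tokens every chain committed at one position
        counts = Counter(column)
        entropies.append(sum((c / n) * math.log(n / c) for c in counts.values()))
    return entropies


def unsupervised_threshold(entropies: Sequence[float], k: float = 1.0) -> float:
    """Hypothetical label-free rule: mean + k * population std of the entropies."""
    return statistics.mean(entropies) + k * statistics.pstdev(entropies)


def flag_positions(entropies: Sequence[float], tau: float) -> List[int]:
    """Positions whose cross-chain entropy exceeds the threshold."""
    return [i for i, h in enumerate(entropies) if h > tau]


def consensus(chains: Sequence[Sequence[str]]) -> List[str]:
    """Majority-vote token per position, used here as the sequence to repair."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*chains)]


def remask_and_refill(base, flagged, evidence, refill):
    """Remask only the flagged positions, then refill them conditioned on evidence."""
    seq: List[Optional[str]] = list(base)
    for i in flagged:
        seq[i] = MASK
    for i in flagged:
        seq[i] = refill(seq, i, evidence)
    return seq


def toy_refill(seq, pos, evidence):
    # Hypothetical evidence-conditioned predictor: pick the candidate the evidence mentions.
    candidates = ["Paris", "Warsaw"]
    return next((c for c in candidates if c in evidence), candidates[0])


chains = [
    ["Curie", "was", "born", "in", "Paris"],
    ["Curie", "was", "born", "in", "Warsaw"],
    ["Curie", "was", "born", "in", "Paris"],
    ["Curie", "was", "born", "in", "Paris"],
]
evidence = "Marie Curie was born in Warsaw in 1867."

entropies = cross_chain_entropy(chains)
flagged = flag_positions(entropies, unsupervised_threshold(entropies))
repaired = remask_and_refill(consensus(chains), flagged, evidence, toy_refill)
print(flagged, repaired)
```

In this toy run the majority of chains agree on the wrong token ("Paris"). Only the last position has nonzero entropy, so only that position is remasked and refilled from the evidence. The rest of the sequence is left untouched. This illustrates the case the formulation targets: a self-consistent but incorrect commitment that a simple majority vote would keep.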
Results
Ablation studies indicate that localization and correction contribute complementary gains, and the results are robust across N ∈ {4, 8, 16}. OSCAR was evaluated on the following datasets:
- TriviaQA
- HotpotQA
- RAGTruth
- CommonsenseQA
The underlying models were LLaDA-8B and Dream-7B. Across these settings, OSCAR improves generation quality: its uncertainty-guided remasking reduces hallucinated content, improves factual accuracy, and integrates retrieved evidence more effectively.
Conclusion
The authors' analysis indicates that OSCAR's native entropy-based uncertainty signal outperforms specialized trained hallucination detectors. This points to an inherent capacity of diffusion language models to identify factual uncertainty, one that the sequential token-commitment structure of autoregressive models does not provide. It also suggests that DLMs may be better positioned to manage factual accuracy during text generation.
