Discover Decoding by Perturbation, a training-free method that mitigates hallucinations in multimodal large language models through dynamic textual perturbation...
Explore evidence collapse in multimodal reasoning models, its risks, and mitigation strategies to improve vision-language model reliability and safety.
Discover the TAB framework, which uses vision-language models for zero-shot 3D visual grounding with multi-view geometry and dynamic 3D reconstruction.