Discover First Logit Boosting, a training-free method that reduces object hallucination in large vision-language models, improving accuracy and reliability.
Discover how controllable modality alignment bridges the modality gap in vision-language models, enhancing cross-modal tasks like captioning and clustering...
Discover HIVE, a novel hierarchical pre-training method that enhances vision encoders with large language models for superior multimodal AI performance.