CWCD: Category-Wise Contrastive Decoding for Structured Medical Report Generation
arXiv:2604.10410v1
Abstract
Interpreting chest X-rays is inherently challenging due to the overlap between anatomical structures and the subtle presentation of many clinically significant pathologies. This complexity makes accurate diagnosis time-consuming, even for experienced radiologists. Recent advancements in radiology-focused foundation models, including LLaVA-Rad and Maira-2, have positioned multi-modal large language models (MLLMs) at the forefront of automated radiology report generation (RRG). However, despite these strides, the current generation of foundation models employs a single forward pass for report generation. This approach reduces the attention given to visual tokens and increases reliance on language priors as the generation process continues, which can lead to the introduction of spurious pathology co-occurrences in the final reports.
Introduction of CWCD
To address these limitations, we introduce Category-Wise Contrastive Decoding (CWCD), a novel and modular framework for structured radiology report generation (SRRG). Our approach uses category-specific parameterization and generates category-wise reports by contrasting normal X-rays with masked X-rays, guided by category-specific visual prompts.
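The contrast between the normal and the masked X-ray can be illustrated at the level of next-token scores. The sketch below follows the general form used by visual contrastive decoding methods and is not confirmed to be the exact CWCD formulation. It assumes the masked image hides the region relevant to the current category, and the names `contrastive_scores`, `alpha`, and `beta` are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def contrastive_scores(logits_full, logits_masked, alpha=1.0, beta=0.1):
    """Contrast next-token scores from the full image against the masked image.

    alpha (contrast strength) and beta (plausibility cutoff) are hypothetical
    hyperparameters, not values taken from the paper.
    """
    lp_full = log_softmax(logits_full)
    lp_masked = log_softmax(logits_masked)
    # Keep only tokens that the full-image pass itself finds plausible.
    cutoff = max(lp_full) + math.log(beta)
    scores = []
    for f, m in zip(lp_full, lp_masked):
        if f < cutoff:
            scores.append(float("-inf"))
        else:
            # Boost tokens whose probability drops when the region is masked.
            scores.append((1 + alpha) * f - alpha * m)
    return scores

# Toy vocabulary: "effusion" depends on the image region, "normal" is favored
# by the language prior regardless of what the image shows.
vocab = ["normal", "effusion", "the"]
full = [2.0, 1.8, -3.0]     # logits conditioned on the unmasked X-ray
masked = [2.0, 0.2, -3.0]   # logits conditioned on the masked X-ray
scores = contrastive_scores(full, masked)
best = vocab[scores.index(max(scores))]
```

In this toy case, greedy decoding on the unmasked logits alone would pick "normal", while the contrasted scores pick "effusion", because its probability falls sharply once the region is hidden. The plausibility cutoff keeps an implausible token such as "the" from being promoted by the subtraction.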
Methodology
The CWCD framework is designed to refine the report generation process by focusing attention on relevant visual information while maintaining structural integrity in the output reports. The key components of our methodology include:
- Category-Specific Parameterization: Each category of pathology is addressed with tailored parameters to optimize report generation.
- Contrastive Decoding: Normal X-rays are contrasted with masked versions during generation to highlight the features that matter for each category and to reduce reliance on language priors.
- Visual Prompts: Category-specific prompts guide the model in understanding which features are most relevant for generating accurate reports.
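The three components above can be pictured as a loop over report categories. This is a minimal sketch under stated assumptions: it treats a visual prompt as a bounding box over the relevant anatomy, represents category-specific parameters by an opaque `adapter_id`, and uses a stub in place of the MLLM. The section names, `CategoryConfig`, `mask_region`, and `generate_structured_report` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]          # grayscale pixels, row-major
Box = Tuple[int, int, int, int]    # (top, left, bottom, right), end-exclusive

@dataclass
class CategoryConfig:
    name: str          # report section heading
    region: Box        # hypothetical visual prompt: a box over the anatomy
    adapter_id: str    # hypothetical handle for category-specific parameters

def mask_region(image: Image, box: Box, fill: float = 0.0) -> Image:
    """Return a copy of the image with the boxed region blanked out."""
    top, left, bottom, right = box
    return [
        [fill if top <= r < bottom and left <= c < right else px
         for c, px in enumerate(row)]
        for r, row in enumerate(image)
    ]

def generate_structured_report(
    image: Image,
    categories: List[CategoryConfig],
    generate_section: Callable[[CategoryConfig, Image, Image], str],
) -> Dict[str, str]:
    """Build one report section per category from the original and masked image."""
    report = {}
    for cfg in categories:
        masked = mask_region(image, cfg.region)
        report[cfg.name] = generate_section(cfg, image, masked)
    return report

def stub_generator(cfg: CategoryConfig, image: Image, masked: Image) -> str:
    # Stand-in for the MLLM: reports how much signal the mask removed.
    removed = sum(map(sum, image)) - sum(map(sum, masked))
    return f"[{cfg.adapter_id}] masked intensity: {removed:.1f}"

image = [[1.0] * 4 for _ in range(4)]
categories = [
    CategoryConfig("Lungs", (0, 0, 2, 4), "lungs-adapter"),
    CategoryConfig("Cardiac", (2, 1, 4, 3), "cardiac-adapter"),
]
report = generate_structured_report(image, categories, stub_generator)
```

Passing the generator in as a callable keeps the loop independent of any particular model, which matches the modular framing of the framework. A real generator would decode each section while contrasting the two image views.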
Experimental Results
Our experimental evaluations demonstrate that CWCD consistently outperforms baseline methods across various clinical efficacy and natural language generation metrics. The improvements noted include:
- Enhanced accuracy in pathology identification.
- Reduction in the occurrence of spurious co-occurrences in generated reports.
- Higher satisfaction ratings from radiologists reviewing generated reports.
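Clinical efficacy is commonly reported as precision, recall, and F1 over pathology labels extracted from generated and reference reports. The sketch below assumes those labels are already available as one set per report. The `spurious_pairs` count is a hypothetical proxy for spurious co-occurrence, not a metric taken from the paper, and the example labels are illustrative only.

```python
from itertools import combinations

def micro_f1(predicted, reference):
    """Micro-averaged F1 over per-report sets of pathology labels."""
    tp = sum(len(p & r) for p, r in zip(predicted, reference))
    fp = sum(len(p - r) for p, r in zip(predicted, reference))
    fn = sum(len(r - p) for p, r in zip(predicted, reference))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def spurious_pairs(predicted, reference):
    """Count label pairs that co-occur in a generated report but not in its reference."""
    count = 0
    for p, r in zip(predicted, reference):
        ref_pairs = set(combinations(sorted(r), 2))
        count += sum(
            1 for pair in combinations(sorted(p), 2) if pair not in ref_pairs
        )
    return count

# Illustrative labels only; not data from the paper.
reference = [{"effusion"}, {"cardiomegaly", "edema"}]
predicted = [{"effusion", "atelectasis"}, {"cardiomegaly"}]
f1 = micro_f1(predicted, reference)
n_spurious = spurious_pairs(predicted, reference)
```

Here the first generated report adds "atelectasis" alongside "effusion", so it contributes one false positive and one spurious pair, while the second misses "edema" and contributes one false negative.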
Ablation Studies
We conducted an ablation study to isolate the contribution of each architectural component of the CWCD framework. The findings indicate that both the category-specific parameterization and the contrastive decoding components significantly improve overall performance.
Conclusion
CWCD is a step forward for automated radiology report generation. By addressing the reliance of existing single-pass models on language priors and strengthening the focus on visual information, our framework improves diagnostic accuracy and can streamline the reporting process for radiologists. As demand for efficient and precise medical reporting grows, CWCD offers a promising way to support patient care and diagnostic workflows.
