DenseSwinV2: Channel Attentive Dual Branch CNN Transformer Learning for Cassava Leaf Disease Classification
Summary: arXiv:2603.25935v1 (cross-listed)
Abstract: This work presents Hybrid Dense SwinV2, a two-branch framework that combines densely connected convolutional features with hierarchical representations from a customized Swin Transformer V2 (SwinV2) for cassava disease classification. The DenseNet branch captures high-resolution local features, preserving fine structural cues while also allowing effective gradient flow. In parallel, the customized SwinV2 branch models global contextual dependencies through shifted-window self-attention, capturing the long-range interactions that are critical for distinguishing visually similar lesions.
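As a rough illustration of the shifted-window idea, the sketch below partitions a toy 4x4 grid of token ids into non-overlapping 2x2 windows, then repeats the partition after a cyclic shift of half a window. Tokens that sat in four different windows in the regular layout end up sharing one window after the shift, which is how Swin-style models pass information across window borders in alternating layers. The function names and grid sizes are illustrative assumptions and are not taken from the paper's customized SwinV2.

```python
def cyclic_shift(grid, shift):
    """Roll a 2D grid up and to the left by `shift` cells, wrapping around."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
            for r in range(h)]


def window_partition(grid, window):
    """Split a 2D grid into non-overlapping window x window blocks.

    Returns the windows in row-major order, each flattened to a list of tokens.
    """
    h, w = len(grid), len(grid[0])
    assert h % window == 0 and w % window == 0, "grid must tile evenly"
    windows = []
    for r0 in range(0, h, window):
        for c0 in range(0, w, window):
            windows.append([grid[r][c]
                            for r in range(r0, r0 + window)
                            for c in range(c0, c0 + window)])
    return windows


# Toy 4x4 "feature map" whose entries are token ids 0..15.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]

regular = window_partition(grid, window=2)
shifted = window_partition(cyclic_shift(grid, shift=1), window=2)

print(regular[0])  # [0, 1, 4, 5]
print(shifted[0])  # [5, 6, 9, 10]
```

Tokens 5, 6, 9, and 10 belong to four different regular windows but to the same shifted window. A full implementation would also mask attention between tokens that only became neighbours through the wrap-around; that step is omitted here.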
Moreover, a channel-squeeze attention module is applied to the CNN and Transformer streams independently to emphasize discriminative, disease-related responses and suppress redundant or background-driven activations. Finally, the attention-refined channels from the dense local branch and the SwinV2 global branch are fused into a single refined representation.
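The abstract does not spell out the internals of the channel-squeeze module or the fusion step, so the following is only a minimal sketch under stated assumptions: a squeeze-and-excitation style gate (global average pooling, a small bottleneck, a sigmoid) applied to each stream separately, followed by fusion through concatenation of the pooled channel descriptors. The toy feature maps, channel counts, and random weights are all hypothetical stand-ins for the two branch outputs.

```python
import math
import random


def global_avg_pool(fmap):
    """Squeeze: reduce a C x H x W feature map to one mean value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def linear(vec, weights):
    """Bias-free dense layer; `weights` has one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def channel_gate(fmap, w_down, w_up):
    """Excite: bottleneck + sigmoid yields one weight in (0, 1) per channel."""
    squeezed = global_avg_pool(fmap)
    hidden = [max(0.0, h) for h in linear(squeezed, w_down)]  # ReLU
    return [1.0 / (1.0 + math.exp(-z)) for z in linear(hidden, w_up)]


def reweight(fmap, gates):
    """Scale every spatial position of channel c by gates[c]."""
    return [[[g * x for x in row] for row in ch] for ch, g in zip(fmap, gates)]


def random_matrix(rows, cols, rng):
    """Random weights standing in for learned parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)

# Hypothetical toy outputs of the two branches (channels x height x width).
cnn_feat = [[[rng.random() for _ in range(4)] for _ in range(4)] for _ in range(6)]
swin_feat = [[[rng.random() for _ in range(4)] for _ in range(4)] for _ in range(8)]

# Each stream gets its own gate parameters (bottleneck of 2 hidden units).
cnn_gates = channel_gate(cnn_feat, random_matrix(2, 6, rng), random_matrix(6, 2, rng))
swin_gates = channel_gate(swin_feat, random_matrix(2, 8, rng), random_matrix(8, 2, rng))

# Fusion by concatenating pooled, reweighted descriptors (an assumption).
fused = (global_avg_pool(reweight(cnn_feat, cnn_gates))
         + global_avg_pool(reweight(swin_feat, swin_gates)))

print(len(fused))  # 14 = 6 CNN channels + 8 SwinV2 channels
```

Channels whose gate is close to 0 are suppressed and those close to 1 pass through almost unchanged, which is the sense in which such a module can emphasize disease-related responses over background activations. In a trained model the weights would be learned rather than random.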
Key Features of Dense SwinV2
- Hybrid Architecture: Combines DenseNet and SwinV2 for enhanced feature extraction.
- Attention Mechanism: Utilizes channel-squeeze modules to focus on relevant disease features.
- High Classification Accuracy: Achieved 98.02% accuracy and an F1 score of 97.81% on a public dataset.
- Robustness: Designed to handle real-world challenges such as occlusion and complex backgrounds.
Dataset and Performance
The proposed Dense SwinV2 was trained and validated on a public cassava leaf disease dataset of 31,000 images spanning five classes: four diseases (brown streak, mosaic, green mottle, and bacterial blight) and normal leaves. The size of this dataset allowed for comprehensive training and validation of the model.
In comparative analyses, the Dense SwinV2 framework outperformed well-established convolutional and transformer models. Its reported 98.02% accuracy and 97.81% F1 score suggest that the model remains reliable under the variability present in real-world images, not only on standard benchmarks.
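To make the two reported metrics concrete, here is a small sketch of how accuracy and a class-averaged F1 score can be computed for a five-class problem. The source does not say which F1 averaging was used, so macro averaging is an assumption here, and the labels below are made up for illustration; they are not results from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


CLASSES = ["brown streak", "mosaic", "green mottle", "bacterial blight", "normal"]

# Made-up labels for illustration only: one green mottle leaf is
# misclassified as mosaic.
y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [0, 0, 1, 1, 2, 1, 3, 3, 4, 4]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, labels=range(len(CLASSES)))
print(f"accuracy={acc:.4f}  macro-F1={f1:.4f}")
```

A single error lowers the F1 of two classes at once (precision for mosaic, recall for green mottle), which is one reason an averaged F1 can sit slightly below accuracy.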
Conclusion
The Hybrid Dense SwinV2 framework advances cassava leaf disease classification. By integrating local and global features through its dual-branch architecture, the model offers a strong tool for agricultural diagnostics. The results point to its potential to improve field-level diagnosis and to cope with noise and complex image backgrounds.
As precision agriculture continues to evolve, approaches like Dense SwinV2 can contribute to robust solutions for crop disease management, helping farmers optimize yield and sustain food security.
