SkinGPT-X: A Self-Evolving Collaborative Multi-Agent System for Transparent and Trustworthy Dermatological Diagnosis
Summary: arXiv:2603.26122v1 (cross-listed)
Abstract: Recent advancements in Large Language Models (LLMs) have significantly improved dermatological diagnosis. However, monolithic LLMs often encounter challenges when faced with fine-grained, large-scale multi-class diagnostic tasks or rare skin disease diagnoses due to the scarcity of training data. Furthermore, these models frequently lack the interpretability and traceability necessary for effective clinical reasoning. Although multi-agent systems promise enhanced transparency and explainability in diagnostics, current frameworks mainly focus on Visual Question Answering and conversational tasks, relying heavily on static knowledge bases that limit adaptability in complex clinical settings.
In response to these challenges, we introduce SkinGPT-X, a multimodal collaborative multi-agent system designed for dermatological diagnosis, which incorporates a self-evolving dermatological memory mechanism. By simulating the diagnostic workflow of dermatologists and facilitating continuous memory evolution, SkinGPT-X aims to provide transparent and trustworthy diagnostics essential for managing complex and rare dermatological cases.
Key Features of SkinGPT-X
- Self-Evolving Memory: SkinGPT-X's dermatological memory evolves continuously, so the system can learn from new data instead of relying on a static knowledge base.
- Multimodal Collaboration: The system integrates various modalities, including images and textual data, to enhance diagnostic accuracy.
- Transparent Diagnostics: SkinGPT-X is designed to provide clear reasoning for its diagnostic decisions, improving trust and understanding among clinicians.
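To make the idea of an evolving memory concrete, here is a minimal sketch. It is not the SkinGPT-X implementation, whose internals this summary does not describe. It assumes a hypothetical case store of (feature vector, diagnosis) pairs, retrieval by cosine similarity, and growth by appending newly confirmed cases; the names `CaseMemory`, `add_case`, and `retrieve` are illustrative only.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class CaseMemory:
    """Hypothetical memory: a growing list of (features, diagnosis) pairs."""
    cases: list = field(default_factory=list)

    def add_case(self, features, diagnosis):
        # "Evolving" here simply means appending a newly confirmed case.
        self.cases.append((list(features), diagnosis))

    def retrieve(self, features, k=3):
        # Return the k most similar stored cases as (similarity, diagnosis).
        scored = [(cosine(features, f), d) for f, d in self.cases]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Toy two-dimensional features; real image or text embeddings are assumed
# to come from some upstream encoder not shown here.
memory = CaseMemory()
memory.add_case([1.0, 0.0], "psoriasis")
memory.add_case([0.0, 1.0], "eczema")

query = [0.9, 0.1]
top = memory.retrieve(query, k=1)
print(top[0][1])

# After a clinician confirms the diagnosis, the case is written back.
memory.add_case(query, top[0][1])
```

Returning the retrieved cases alongside a prediction is one simple way such a system could make its reasoning traceable, since each suggestion points back to concrete prior cases.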
Validation and Performance
To validate the robustness of SkinGPT-X, we conducted a comprehensive three-tier comparative experiment:
- Benchmarking Against State-of-the-Art Models: We compared SkinGPT-X with four leading LLMs across four public datasets. SkinGPT-X improved accuracy by 9.6% on the DDI31 dataset and weighted F1 score by 13% on Dermnet compared to the best-performing model.
- Fine-Grained Classification Evaluation: We constructed a large-scale multi-class dataset comprising 498 distinct dermatological categories to assess SkinGPT-X’s fine-grained classification capabilities.
- Rare Skin Disease Benchmark: We curated a unique dataset focused on rare skin diseases, featuring 564 clinical samples across eight different rare dermatological conditions. SkinGPT-X achieved a +9.8% accuracy improvement, +7.1% in weighted F1 score, and a +10% boost in Cohen’s Kappa compared to existing models.
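For readers less familiar with the three metrics reported above (accuracy, weighted F1 score, and Cohen's Kappa), the following sketch shows how each is computed from true and predicted labels. The labels are toy data invented for illustration, not results from the paper, and the function names are our own.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)


def cohens_kappa(y_true, y_pred):
    """Agreement between truth and prediction, corrected for chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: sum over classes of the product of marginal rates.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # undefined when both sides use a single class
    return (p_observed - p_expected) / (1 - p_expected)


# Toy labels for three classes (illustrative only).
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]

print(round(accuracy(y_true, y_pred), 3))      # 0.667
print(round(weighted_f1(y_true, y_pred), 3))   # 0.656
print(round(cohens_kappa(y_true, y_pred), 3))  # 0.5
```

Weighted F1 and Cohen's Kappa are useful complements to accuracy on datasets with many or imbalanced classes, because the first accounts for per-class precision and recall and the second discounts agreement expected by chance.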
Conclusion
SkinGPT-X represents a significant advancement in the field of dermatological diagnostics. By combining a self-evolving memory mechanism with a collaborative multi-agent framework, it addresses the limitations of existing monolithic LLMs. The results from our comparative experiments underscore its potential to enhance the accuracy and transparency of dermatological diagnoses, particularly for complex and rare conditions. As healthcare continues to embrace AI-driven solutions, SkinGPT-X is a step toward more trustworthy and effective clinical decision-making.
