Large-Scale Universal Defect Generation: Foundation Models and Datasets
arXiv:2604.08915v1 (cross-list)
Abstract: Existing defect/anomaly generation methods often rely on few-shot learning, which overfits to specific defect categories due to the lack of large-scale paired defect editing data. This issue is aggravated by substantial variations in defect scale and morphology, resulting in limited generalization and degraded realism and category consistency. We address these challenges by introducing UDG, a large-scale dataset of 300K normal-abnormal-mask-caption quadruplets spanning diverse domains, and by presenting UniDG, a universal defect generation foundation model that supports both reference-based defect generation and text instruction-based defect editing without per-category fine-tuning.
Introduction
Defect generation has long been held back by its reliance on few-shot learning. Few-shot methods tend to overfit to the specific defect categories they are trained on, which limits their value for training anomaly detection systems. The root cause is the lack of large-scale, paired defect editing data, and the problem is made worse by the wide variation in defect scale and morphology.
Introducing UDG and UniDG
To tackle these challenges, the authors introduce UDG, a dataset of 300,000 normal-abnormal-mask-caption quadruplets. Each quadruplet pairs a normal image with a defective counterpart, a mask of the defect region, and a text caption. Because it spans a variety of domains, the dataset supplies the large-scale paired data that defect generation models have lacked.
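To make the quadruplet structure concrete, here is a minimal Python sketch of how one sample might be held in memory. The class name `DefectQuadruplet`, its field names, and the assumption that the normal and abnormal images are pixel-aligned are illustrative choices, not UDG's published schema or file format.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class DefectQuadruplet:
    """Hypothetical in-memory container for one UDG-style sample.

    Not UDG's actual schema: field names and the alignment check are assumptions.
    """

    normal: np.ndarray    # H x W x 3 image without the defect
    abnormal: np.ndarray  # H x W x 3 image of the same scene with the defect
    mask: np.ndarray      # H x W array, nonzero where the defect is
    caption: str          # e.g. "a scratch on the metal surface"

    def __post_init__(self) -> None:
        # Assumed: the normal/abnormal pair is pixel-aligned, so shapes must match.
        if self.normal.shape != self.abnormal.shape:
            raise ValueError("normal and abnormal images must have the same shape")
        # The mask must cover the same height and width as the images.
        if self.mask.shape != self.normal.shape[:2]:
            raise ValueError("mask must match the image height and width")
        self.mask = self.mask.astype(bool)

    def defect_fraction(self) -> float:
        """Fraction of pixels covered by the defect, a simple proxy for defect scale."""
        return float(self.mask.mean())
```

A helper like `defect_fraction` is one cheap way to inspect how widely defect scale varies across a dataset, which is one of the difficulties the abstract highlights.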
Alongside UDG, the team has introduced UniDG, a universal defect generation foundation model that allows for:
- Reference-based defect generation
- Text instruction-based defect editing
Importantly, UniDG needs no per-category fine-tuning: unlike the few-shot approaches described above, a single model handles defect categories without being adapted to each one.
Innovative Features of UniDG
UniDG combines three techniques:
- Defect-Context Editing: Adaptive defect cropping and a structured diptych (two-panel) input format improve the quality of generated defects.
- Multimodal Attention: The model integrates reference and target conditions using MM-DiT multimodal attention, allowing for more coherent and contextually relevant defect generation.
- Two-Stage Training Strategy: Training proceeds in two stages, Diversity-SFT followed by Consistency-RFT. The first stage increases the diversity of generated defects; the second improves their realism and consistency with the reference.
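The summary does not spell out how the adaptive crop or the diptych are constructed. The sketch below assumes one plausible reading: crop a square window around the defect whose side grows with the defect's size (with a floor so tiny defects keep some context), then place a reference crop and a target crop side by side in one canvas. The function names, the `context_ratio` and `min_size` parameters, the panel order, and the nearest-neighbor resize are all hypothetical and are not taken from UniDG.

```python
import numpy as np


def adaptive_crop_box(mask, context_ratio=1.0, min_size=32):
    """Return (top, left, bottom, right) of a square crop around the defect.

    Hypothetical rule: side = larger bounding-box side * (1 + context_ratio),
    at least min_size, at most the image size, shifted to stay inside the image.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask contains no defect pixels")
    top, bottom = int(ys.min()), int(ys.max()) + 1
    left, right = int(xs.min()), int(xs.max()) + 1
    side = max(bottom - top, right - left)
    # Scale the window with the defect so large and small defects both get context.
    side = max(int(round(side * (1 + context_ratio))), min_size)
    h, w = mask.shape
    side = min(side, h, w)  # the window cannot exceed the image
    cy, cx = (top + bottom) // 2, (left + right) // 2
    # Center on the defect, then clamp the window back inside the image bounds.
    t = min(max(cy - side // 2, 0), h - side)
    l = min(max(cx - side // 2, 0), w - side)
    return t, l, t + side, l + side


def resize_nearest(img, size):
    """Nearest-neighbor resize of an H x W (x C) array to size x size."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]


def make_diptych(reference, target, size=64):
    """Hypothetical layout: reference crop on the left, target crop on the right."""
    return np.concatenate(
        [resize_nearest(reference, size), resize_nearest(target, size)], axis=1
    )
```

In this sketch, a 4 x 10 pixel defect in a 100 x 100 image gets a 32 x 32 window (the `min_size` floor), while an 80 x 80 defect is capped at the full image. Scaling the crop with the defect is one simple way to cope with the variation in defect scale that the abstract identifies as a core difficulty.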
Performance Evaluation
The authors evaluated UniDG through extensive experiments on MVTec-AD and VisA. According to their results, UniDG significantly outperforms prior few-shot anomaly generation methods as well as existing image insertion and editing baselines. Reported gains include:
- Higher synthesis quality
- Better downstream single-class and multi-class anomaly detection
- More accurate defect localization
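The summary names the downstream tasks but not the metrics. Results on MVTec-AD and VisA are commonly reported as image-level AUROC for detection and pixel-level AUROC for localization, although papers often add others such as AP or PRO. Assuming that convention, here is a minimal sketch of both, computed with the Mann-Whitney rank formulation of AUROC. The function names are illustrative, not from the UniDG code.

```python
import numpy as np


def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: anomaly scores (higher = more anomalous).
    labels: 1 for defective, 0 for normal.
    Tied scores receive their average rank, so ties count as half-correct.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    n_pos, n_neg = labels.sum(), (~labels).sum()
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one normal and one defective sample")
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores), dtype=float)
    i = 0
    while i < len(scores):
        # Extend j over the run of scores tied with sorted_scores[i].
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    # U counts (defective, normal) pairs where the defective sample scores higher.
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return float(u / (n_pos * n_neg))


def image_auroc(image_scores, image_labels):
    """Detection: one anomaly score and one label per test image."""
    return auroc(image_scores, image_labels)


def pixel_auroc(anomaly_maps, gt_masks):
    """Localization: every pixel of every test image is treated as one sample."""
    return auroc(np.ravel(anomaly_maps), np.ravel(gt_masks))
```

The explicit loop favors readability; for full-resolution anomaly maps with millions of pixels, a vectorized routine such as scikit-learn's `roc_auc_score` is the practical choice.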
Conclusion
In conclusion, UDG and UniDG target the data bottleneck behind few-shot defect generation with two pieces: a large-scale paired dataset spanning diverse domains, and a single foundation model that handles both reference-based generation and text instruction-based editing without per-category fine-tuning. Together they point toward more robust defect synthesis for anomaly detection across domains. The code for UniDG is available on GitHub.
