StableSketcher: AI Diffusion Model for Pixel Sketches

StableSketcher: Enhancing Diffusion Model for Pixel-based Sketch Generation via Visual Question Answering Feedback

arXiv:2510.20093v2

Abstract

Recent advances in diffusion models have significantly improved the quality of generated images; however, synthesizing pixel-based human-drawn sketches, a representative example of abstract expression, remains challenging. To address this, we propose StableSketcher, a framework that enables diffusion models to generate hand-drawn sketches with high prompt fidelity.

Key Features of StableSketcher

StableSketcher combines two components, and experiments measure their combined effect on sketch generation:

Variational Autoencoder Fine-Tuning: We fine-tune the variational autoencoder to optimize latent decoding, enhancing its ability to capture the unique characteristics of sketches.
Reinforcement Learning Integration: A new reward function for reinforcement learning, based on visual question answering, is integrated to improve text-image alignment and semantic consistency.
Enhanced Stylization: Extensive experiments reveal that StableSketcher generates sketches with improved stylistic fidelity, achieving better alignment with prompts compared to the existing Stable Diffusion baseline.
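The visual question answering reward described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the reward is the fraction of question-answer pairs that a VQA model answers as expected for a generated sketch. The names `vqa_reward`, `VQAModel`, and `toy_vqa` are hypothetical, and the toy model merely looks answers up in a dictionary so the example runs without any model weights.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical type: any callable mapping (image, question) to an answer string.
# In practice this would wrap a VQA model; the source does not name one.
VQAModel = Callable[[object, str], str]


def vqa_reward(vqa_model: VQAModel, image: object,
               qa_pairs: Sequence[Tuple[str, str]]) -> float:
    """Return the fraction of questions the VQA model answers as expected.

    Assumption: the reward is a plain accuracy in [0, 1]; the paper's
    actual reward may weight or score answers differently.
    """
    if not qa_pairs:
        return 0.0
    correct = sum(
        vqa_model(image, question).strip().lower() == expected.strip().lower()
        for question, expected in qa_pairs
    )
    return correct / len(qa_pairs)


# Toy stand-in for a VQA model: the "image" is a dict of question -> answer.
def toy_vqa(image: object, question: str) -> str:
    return image.get(question, "unknown")


sketch = {"Is there a cat?": "yes", "How many wheels?": "0"}
pairs = [("Is there a cat?", "Yes"), ("How many wheels?", "2")]
reward = vqa_reward(toy_vqa, sketch, pairs)  # one of two answers matches
```

In a reinforcement learning loop, a scalar like `reward` would be computed for each generated sketch and used to update the diffusion model; the update rule itself is not specified in this summary and is left out here.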

Introduction of SketchDUO

To further support the development and evaluation of sketch generation, we introduce SketchDUO, which, to the best of our knowledge, is the first dataset comprising instance-level sketches paired with captions and question-answer pairs. Existing datasets rely primarily on image-label pairs; by adding captions and question-answer pairs, SketchDUO provides a richer basis for training and evaluating sketch generation models.
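To make the pairing concrete, one dataset entry could be modeled as follows. This is only a sketch of the structure described above (a sketch image, a caption, and question-answer pairs); the class name `SketchRecord`, its field names, and the example file path are assumptions for illustration and do not reflect the released SketchDUO format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SketchRecord:
    """One hypothetical SketchDUO-style entry; field names are assumptions."""
    sketch_path: str                      # path to the pixel sketch image
    caption: str                          # text description of the sketch
    qa_pairs: List[Tuple[str, str]] = field(default_factory=list)


record = SketchRecord(
    sketch_path="sketches/example_0001.png",  # made-up path
    caption="a cat sitting on a chair",
    qa_pairs=[("Is there a cat?", "yes"), ("Is there a chair?", "yes")],
)

# The caption can serve as the generation prompt, and the QA pairs as
# checks on whether a generated sketch matches that prompt.
prompt = record.caption
questions = [q for q, _ in record.qa_pairs]
```

Compared with an image-label pair, which would carry only something like `"cat"`, a record of this shape holds both a full prompt and a set of verifiable questions about the sketch's content.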

Experimental Results

In our experiments, StableSketcher generates sketches with better stylistic fidelity and closer prompt alignment than the Stable Diffusion baseline. The visual question answering feedback mechanism encourages sketches whose content is consistent with the prompt, so the generated sketches keep a hand-drawn style while staying close to the intended meaning of the text.

Conclusion

In conclusion, StableSketcher advances sketch generation with diffusion models. Combining variational autoencoder fine-tuning with reinforcement learning based on visual question answering feedback improves both the quality and the prompt fidelity of generated sketches. We will make our code and dataset publicly available upon acceptance to support further research in this area.

Project Page

For more information, please visit our project page: StableSketcher Project Page.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
