StableSketcher: AI Diffusion Model for Pixel Sketches


StableSketcher: Enhancing Diffusion Model for Pixel-based Sketch Generation via Visual Question Answering Feedback

Source: arXiv:2510.20093v2

Abstract

Recent advances in diffusion models have substantially improved the quality of generated images; however, synthesizing pixel-based human-drawn sketches, a representative example of abstract expression, remains challenging. To address this, we propose StableSketcher, a novel framework that enables diffusion models to generate hand-drawn sketches with high prompt fidelity.

Key Features of StableSketcher

Within the StableSketcher framework, several components work together to improve sketch generation:

  • Variational Autoencoder Fine-Tuning: We fine-tune the variational autoencoder to optimize latent decoding, enhancing its ability to capture the unique characteristics of sketches.
  • Reinforcement Learning Integration: A new reward function for reinforcement learning, based on visual question answering, is integrated to improve text-image alignment and semantic consistency.
  • Enhanced Stylization: Extensive experiments show that StableSketcher generates sketches with better stylistic fidelity and closer prompt alignment than the Stable Diffusion baseline.
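The VQA-based reward described above can be illustrated with a minimal Python sketch. It assumes the reward is the fraction of reference question-answer pairs that a VQA model answers correctly on the generated sketch; the paper's exact scoring rule may differ (for example, it could use answer probabilities instead of exact matches). The names `vqa_reward` and `normalize_answer` are hypothetical, and the toy model only stands in for a real VQA network.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical type: a VQA model maps (image, question) to a short answer string.
VQAModel = Callable[[object, str], str]


def normalize_answer(text: str) -> str:
    """Lowercase and strip whitespace and trailing punctuation for lenient matching."""
    return text.strip().lower().rstrip(".!?")


def vqa_reward(
    image: object,
    qa_pairs: Sequence[Tuple[str, str]],
    vqa_model: VQAModel,
) -> float:
    """Return the fraction of reference questions the VQA model answers correctly.

    Assumption: exact match after normalization; not confirmed as the paper's rule.
    """
    if not qa_pairs:
        return 0.0
    correct = 0
    for question, expected in qa_pairs:
        predicted = vqa_model(image, question)
        if normalize_answer(predicted) == normalize_answer(expected):
            correct += 1
    return correct / len(qa_pairs)


if __name__ == "__main__":
    # Toy stand-in for a real VQA model: looks answers up in a dict.
    canned = {"What animal is drawn?": "Cat.", "How many legs are visible?": "three"}

    def toy_vqa(image: object, question: str) -> str:
        return canned.get(question, "unknown")

    pairs = [("What animal is drawn?", "cat"), ("How many legs are visible?", "four")]
    print(vqa_reward(image=None, qa_pairs=pairs, vqa_model=toy_vqa))  # 0.5
```

A score in [0, 1] like this rewards sketches whose content a VQA model can recognize, which is one way to tie text-image alignment to a scalar reinforcement learning signal.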

Introduction of SketchDUO

To further support the development and evaluation of sketch generation, we introduce SketchDUO, which, to the best of our knowledge, is the first dataset of instance-level sketches paired with both captions and question-answer pairs. Existing sketch datasets mostly provide only image-label pairs; SketchDUO adds richer textual supervision for training and evaluating sketch generation models.
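To make the difference from image-label datasets concrete, here is a hypothetical Python record for one SketchDUO-style sample. The field names (`sketch_path`, `caption`, `qa_pairs`), the example values, and the `validate` check are illustrative assumptions, not the dataset's published schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SketchSample:
    """One hypothetical SketchDUO-style record: an instance-level sketch with text annotations."""

    sketch_path: str  # hypothetical field: path to the pixel sketch image
    caption: str  # free-form description of the sketched instance
    qa_pairs: List[Tuple[str, str]] = field(default_factory=list)  # (question, answer)

    def validate(self) -> None:
        """Raise ValueError if the record lacks the text needed for training or VQA scoring."""
        if not self.caption.strip():
            raise ValueError(f"{self.sketch_path}: empty caption")
        for question, answer in self.qa_pairs:
            if not question.strip() or not answer.strip():
                raise ValueError(f"{self.sketch_path}: incomplete QA pair {question!r}")


if __name__ == "__main__":
    # Illustrative values only; not taken from the actual dataset.
    sample = SketchSample(
        sketch_path="sketches/cat_0001.png",
        caption="a cat sitting with its tail curled around its paws",
        qa_pairs=[("What animal is drawn?", "cat"), ("Is the tail visible?", "yes")],
    )
    sample.validate()
    print(len(sample.qa_pairs))  # 2
```

An image-label dataset would keep only the path and a class name such as "cat"; the caption and QA pairs are what make caption-conditioned training and question-based evaluation possible.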

Experimental Results

Extensive experiments show a marked improvement in the quality of generated sketches, with better stylistic fidelity and closer alignment to the given prompts. Because the reward comes from visual question answering feedback, the framework encourages sketches that keep a hand-drawn style while matching the intended semantic content of the prompt.
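One common way to turn per-sample rewards such as VQA scores into a training signal, used in DDPO-style reinforcement learning for diffusion models, is to normalize rewards among samples generated from the same prompt. The sketch below assumes that scheme; this summary does not say which RL algorithm StableSketcher uses, and `per_prompt_advantages` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def per_prompt_advantages(
    prompts: Sequence[str], rewards: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """Normalize rewards within each prompt group: (r - mean) / (std + eps).

    Samples of the same prompt are compared with each other, so a sketch gets a
    positive advantage only if it scored higher than its siblings.
    """
    if len(prompts) != len(rewards):
        raise ValueError("prompts and rewards must have the same length")
    groups: Dict[str, List[float]] = defaultdict(list)
    for prompt, reward in zip(prompts, rewards):
        groups[prompt].append(reward)
    # Per-prompt mean and population standard deviation of the rewards.
    stats = {p: (mean(rs), pstdev(rs)) for p, rs in groups.items()}
    return [(r - stats[p][0]) / (stats[p][1] + eps) for p, r in zip(prompts, rewards)]


if __name__ == "__main__":
    prompts = ["a cat", "a cat", "a house", "a house"]
    rewards = [1.0, 0.5, 0.0, 0.0]  # toy VQA scores, not results from the paper
    print([round(a, 3) for a in per_prompt_advantages(prompts, rewards)])
    # [1.0, -1.0, 0.0, 0.0]
```

Each advantage would then weight the log-likelihood of the corresponding denoising trajectory in a policy-gradient update, which is omitted here. Normalizing per prompt keeps easy prompts (where every sample scores well) from dominating the update.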

Conclusion

In conclusion, StableSketcher represents a significant advance in sketch generation with diffusion models. Combining variational autoencoder fine-tuning with reinforcement learning driven by visual question answering feedback proved effective in improving both the quality and the prompt fidelity of generated sketches. We plan to release our code and dataset publicly upon acceptance to support further research in this area.

Project Page

For more information, please visit our project page: StableSketcher Project Page.



Lazarus Omolua (https://richlyai.com/blog)
My mission is to make sure that people in Africa are not left behind in the global AI revolution. RichlyAI exists to give everyone — students, founders, creators, and businesses — the tools to compete globally.

