DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models
arXiv:2306.14685v5
Abstract: We demonstrate that pre-trained text-to-image diffusion models, despite being trained on raster images, possess a remarkable capacity to guide vector sketch synthesis. In this paper, we introduce DiffSketcher, a novel algorithm for generating vectorized free-hand sketches directly from natural language prompts. Our method optimizes a set of Bézier curves via an extended Score Distillation Sampling (SDS) loss, successfully bridging a raster-level diffusion prior with a parametric vector generator. To further accelerate the generation process, we propose a stroke initialization strategy driven by the diffusion model’s intrinsic attention maps. Results show that DiffSketcher produces sketches across varying levels of abstraction while maintaining the structural integrity and essential visual details of the subject. Experiments confirm that our approach yields superior perceptual quality and controllability over existing methods. The code and demo are available at DiffSketcher Project.
Introduction
Diffusion models have become a leading approach to text-to-image generation. Although they are trained on raster images, recent work shows that they can also guide the synthesis of vector graphics. This article summarizes DiffSketcher, an algorithm that uses this capability to generate vectorized free-hand sketches from text prompts.
Methodology
DiffSketcher generates vector sketches by pairing a pre-trained text-to-image diffusion model with an optimization over stroke parameters. The approach has three main components:
- Bézier Curve Optimization: DiffSketcher optimizes a set of Bézier curves with an extended Score Distillation Sampling (SDS) loss, which bridges the raster-level diffusion prior and the parametric vector generator.
- Stroke Initialization Strategy: To speed up generation, strokes are initialized using the diffusion model’s intrinsic attention maps.
- Abstraction Control: The method produces sketches at varying levels of abstraction while preserving the subject’s structural integrity and essential visual details.
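To make the first two components more concrete, here is a minimal sketch of attention-guided stroke initialization followed by Bézier curve evaluation. It is an illustration under stated assumptions, not the authors' implementation: the function names (`sample_init_points`, `init_strokes`, `cubic_bezier`), the use of cubic curves with four control points, the `jitter` parameter, and the toy attention map are all hypothetical. The SDS loss and the rasterization step are omitted because they require a diffusion model and a differentiable renderer.

```python
import numpy as np

def sample_init_points(attn, n_points, rng):
    """Draw (x, y) pixel coordinates with probability proportional to attention.

    Assumes attn is a non-negative 2D array with at least n_points
    non-zero entries (sampling is done without replacement).
    """
    h, w = attn.shape
    probs = attn.ravel().astype(np.float64)
    probs = probs / probs.sum()
    idx = rng.choice(h * w, size=n_points, replace=False, p=probs)
    ys, xs = np.unravel_index(idx, (h, w))
    return np.stack([xs, ys], axis=1).astype(np.float64)

def init_strokes(attn, n_strokes, rng, jitter=2.0):
    """Return control points of shape (n_strokes, 4, 2).

    The first control point of each stroke sits on an attention-sampled
    pixel; the remaining three follow a small random walk from it.
    """
    starts = sample_init_points(attn, n_strokes, rng)
    offsets = rng.normal(scale=jitter, size=(n_strokes, 4, 2))
    offsets[:, 0] = 0.0  # keep the first control point on the sampled pixel
    return starts[:, None, :] + np.cumsum(offsets, axis=1)

def cubic_bezier(ctrl, n_samples=16):
    """Evaluate cubic Bézier curves for control points of shape (..., 4, 2)."""
    t = np.linspace(0.0, 1.0, n_samples)[:, None]  # shape (n_samples, 1)
    p0, p1, p2, p3 = (ctrl[..., i, None, :] for i in range(4))
    return ((1 - t) ** 3 * p0
            + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2
            + t ** 3 * p3)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy attention map: all mass in a 3x3 region (rows 2-4, columns 5-7).
    attn = np.zeros((8, 8))
    attn[2:5, 5:8] = 1.0
    ctrl = init_strokes(attn, n_strokes=5, rng=rng)
    curves = cubic_bezier(ctrl)
    print(ctrl.shape, curves.shape)
```

In a full pipeline, the control points returned by `init_strokes` would be the parameters updated by the optimizer, and the number of strokes is one plausible knob for the level of abstraction; both points are assumptions of this sketch rather than details confirmed by the text above.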
Results
According to the authors’ experiments, DiffSketcher preserves essential visual details while giving users control over the generated sketches. Key findings include:
- Superior perceptual quality compared to existing vector sketch synthesis methods.
- Structural integrity maintained across different abstraction levels.
- Greater controllability over the sketch generation process than existing methods.
Conclusion
DiffSketcher shows that a diffusion model pre-trained on raster images can guide text-driven vector sketch synthesis. This makes the approach relevant to artists, designers, and developers who want to generate editable vector sketches from text prompts. The code and demo are available through the DiffSketcher Project website.
