VFIG: Advanced SVG Vectorization with Vision-Language AI

VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models

Source: arXiv:2603.24575v1 (cross-listed)

Abstract

Scalable Vector Graphics (SVG) is an essential format for technical illustration and digital design, offering resolution independence and flexible semantic editability. In practice, however, original vector source files are frequently lost or inaccessible, leaving only “flat” rasterized versions (e.g., PNG or JPEG) that are difficult to modify or scale. Manually reconstructing these figures is prohibitively labor-intensive and requires specialized expertise to recover the original geometric intent.

Introduction to VFIG

To bridge this gap, we propose VFIG, a family of Vision-Language Models trained for complex, high-fidelity figure-to-SVG conversion. The goal is to automate the reconstruction of vector graphics from raster images, replacing the manual redrawing that digital design workflows otherwise require.

Challenges in Current Practices

While the task of converting raster images back to SVG is inherently data-driven, existing datasets are typically small-scale and lack the complexity of professional diagrams. This limitation poses challenges for the effective training of models capable of understanding and recreating intricate vector graphics.

Introducing VFIG-DATA

To address these challenges, we introduce VFIG-DATA, a large-scale dataset of 66,000 high-quality figure-SVG pairs. The dataset is curated from a diverse mix of real-world paper figures and procedurally generated diagrams, and serves as training data for Vision-Language Models.

Training Methodology

Recognizing that SVGs are composed of recurring primitives and hierarchical local structures, we use a coarse-to-fine training curriculum. Training begins with supervised fine-tuning (SFT) to learn atomic primitives and then moves to reinforcement learning (RL) refinement. The RL phase is designed to optimize:

Global diagram fidelity
Layout consistency
Topological edge cases
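
To make "recurring primitives and hierarchical local structures" concrete, the following minimal sketch counts the basic shape elements in an SVG and measures how deeply they are nested. It is an illustration only and not part of VFIG: the `PRIMITIVE_TAGS` set, the `summarize_svg` helper, and the example figure are all assumptions chosen for this post.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: these standard SVG elements are what we treat as "primitives".
PRIMITIVE_TAGS = {"rect", "circle", "ellipse", "line",
                  "polyline", "polygon", "path", "text"}


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://www.w3.org/2000/svg}rect'."""
    return tag.rsplit("}", 1)[-1]


def summarize_svg(svg_text: str) -> tuple[Counter, int]:
    """Return (primitive counts, maximum nesting depth) for an SVG string."""
    root = ET.fromstring(svg_text)
    counts: Counter = Counter()
    max_depth = 0
    stack = [(root, 0)]  # depth-first walk; the root <svg> is depth 0
    while stack:
        node, depth = stack.pop()
        max_depth = max(max_depth, depth)
        name = local_name(node.tag)
        if name in PRIMITIVE_TAGS:
            counts[name] += 1
        stack.extend((child, depth + 1) for child in node)
    return counts, max_depth


# Hypothetical two-box diagram: two grouped (rect + text) units and a connector.
EXAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="120" height="60">
  <g id="box-a">
    <rect x="5" y="10" width="40" height="30"/>
    <text x="10" y="30">A</text>
  </g>
  <g id="box-b">
    <rect x="75" y="10" width="40" height="30"/>
    <text x="80" y="30">B</text>
  </g>
  <line x1="45" y1="25" x2="75" y2="25"/>
</svg>"""

counts, depth = summarize_svg(EXAMPLE_SVG)
print(dict(counts), depth)
```

Even this small diagram repeats the same rect-plus-text unit inside separate groups, which is the kind of regularity a curriculum that starts from atomic primitives could exploit.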

Evaluation with VFIG-BENCH

Finally, we introduce VFIG-BENCH, a comprehensive evaluation suite featuring novel metrics designed to measure the structural integrity of complex figures. This benchmarking tool allows for a standardized assessment of performance across different models.
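
This summary does not specify the VFIG-BENCH metrics, so the sketch below is not one of them. It is a toy stand-in, under our own assumptions, for what a structure-aware score might look like: an F1 over element-tag counts between a predicted and a reference SVG, with unparseable predictions scoring zero. The names `tag_counts` and `element_f1` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_counts(svg_text: str) -> Counter:
    """Count every element in the SVG (root included) by namespace-free tag."""
    root = ET.fromstring(svg_text)
    return Counter(el.tag.rsplit("}", 1)[-1] for el in root.iter())


def element_f1(pred_svg: str, ref_svg: str) -> float:
    """Toy structural score: F1 over the multiset of element tags."""
    try:
        pred = tag_counts(pred_svg)
    except ET.ParseError:
        return 0.0  # a prediction that is not well-formed XML gets no credit
    ref = tag_counts(ref_svg)
    overlap = sum((pred & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference and a prediction that misses one rectangle.
REF = '<svg xmlns="http://www.w3.org/2000/svg"><rect/><rect/><line/></svg>'
PRED = '<svg xmlns="http://www.w3.org/2000/svg"><rect/><line/></svg>'

score = element_f1(PRED, REF)
print(round(score, 3))
```

A count-based score like this ignores geometry and layout entirely, which is one reason a real benchmark would need richer measures such as rendered-image comparison or a VLM judge.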

Performance and Results

VFIG achieves state-of-the-art performance among open-source models and performs on par with GPT-5.2, reaching a VLM-Judge score of 0.829 on VFIG-BENCH.

Conclusion

VFIG combines a large-scale dataset (VFIG-DATA), a coarse-to-fine SFT-then-RL training curriculum, and a dedicated benchmark (VFIG-BENCH) to tackle figure-to-SVG reconstruction. By offering a scalable way to recover editable vector figures from raster images, it supports digital design and technical illustration workflows and opens avenues for further research.
