VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models
arXiv:2603.24575v1
Abstract
Scalable Vector Graphics (SVG) are an essential format for technical illustration and digital design, offering precise resolution independence and flexible semantic editability. In practice, however, original vector source files are frequently lost or inaccessible, leaving only “flat” rasterized versions (e.g., PNG or JPEG) that are difficult to modify or scale. Manually reconstructing these figures is a prohibitively labor-intensive process, requiring specialized expertise to recover the original geometric intent.
Introduction to VFIG
To bridge this gap, we propose VFIG, a family of Vision-Language Models trained for complex, high-fidelity figure-to-SVG conversion. The goal is to automate the reconstruction of vector graphics from raster images, saving time and effort in digital design workflows.
Challenges in Current Practices
Converting raster images back to SVG is an inherently data-driven task, yet existing datasets are typically small-scale and lack the complexity of professional diagrams. This limits how well models can be trained to understand and recreate intricate vector graphics.
Introducing VFIG-DATA
To address these challenges, we introduce VFIG-DATA, a large-scale dataset of 66,000 high-quality figure-SVG pairs. The dataset is curated from a diverse mix of real-world paper figures and procedurally generated diagrams, providing a foundation for training Vision-Language Models on this task.
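To make "procedurally generated diagrams" concrete, here is a minimal sketch of how a synthetic SVG could be produced. It is not the VFIG-DATA pipeline, whose details are not given here. The function name `make_flowchart_svg`, the layout constants, and the color palette are all hypothetical choices made for illustration.

```python
import random
from xml.sax.saxutils import escape


def make_flowchart_svg(num_boxes: int, seed: int = 0) -> str:
    """Return an SVG string for a left-to-right chain of labeled boxes.

    Hypothetical generator for illustration only; it is not the
    procedure used to build VFIG-DATA.
    """
    if num_boxes < 1:
        raise ValueError("num_boxes must be at least 1")

    rng = random.Random(seed)  # seeded so the same inputs give the same SVG
    box_w, box_h, gap, margin = 100, 40, 50, 20  # assumed layout constants
    colors = ["#cfe8ff", "#ffe4b5", "#d5f5d5"]  # assumed palette

    width = 2 * margin + num_boxes * box_w + (num_boxes - 1) * gap
    height = 2 * margin + box_h
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
        f'height="{height}" viewBox="0 0 {width} {height}">'
    ]

    for i in range(num_boxes):
        x = margin + i * (box_w + gap)
        y = margin
        mid_y = y + box_h // 2
        fill = rng.choice(colors)
        label = escape(f"Step {i + 1}")

        # Each node is a group holding one rect and one text primitive.
        parts.append(f'<g id="node{i}">')
        parts.append(
            f'<rect x="{x}" y="{y}" width="{box_w}" height="{box_h}" '
            f'fill="{fill}" stroke="black"/>'
        )
        parts.append(
            f'<text x="{x + box_w // 2}" y="{mid_y}" text-anchor="middle" '
            f'dominant-baseline="middle" font-size="12">{label}</text>'
        )
        parts.append("</g>")

        # Connect this box to the next one with a plain line.
        if i < num_boxes - 1:
            x1 = x + box_w
            parts.append(
                f'<line x1="{x1}" y1="{mid_y}" x2="{x1 + gap}" y2="{mid_y}" '
                f'stroke="black"/>'
            )

    parts.append("</svg>")
    return "\n".join(parts)
```

To form a figure-SVG pair, the generated SVG would also have to be rasterized to an image with an external renderer, which this sketch leaves out.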
Training Methodology
Recognizing that SVGs are composed of recurring primitives and hierarchical local structures, we develop a coarse-to-fine training curriculum. It begins with supervised fine-tuning (SFT) to learn atomic primitives, then transitions to reinforcement learning (RL) refinement. The RL phase targets three things:
- Global diagram fidelity
- Layout consistency
- Correct handling of topological edge cases
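The text does not say how the three RL objectives are combined. One common option is a weighted average of per-objective scores, sketched below. The class `RewardWeights`, the function `composite_reward`, the default weights, and the assumption that each score lies in [0, 1] are all hypothetical and are not taken from VFIG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights; the values are placeholders, not VFIG's."""
    fidelity: float = 0.5
    layout: float = 0.3
    topology: float = 0.2


def composite_reward(
    fidelity: float,
    layout: float,
    topology: float,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Combine three scores in [0, 1] into a single reward in [0, 1].

    Assumes each score has already been computed elsewhere, for example
    by comparing a rendered prediction against the input figure.
    """
    scores = {"fidelity": fidelity, "layout": layout, "topology": topology}
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score must be in [0, 1], got {value}")

    total_weight = weights.fidelity + weights.layout + weights.topology
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    weighted_sum = (
        weights.fidelity * fidelity
        + weights.layout * layout
        + weights.topology * topology
    )
    # Normalize so the result stays in [0, 1] even if weights do not sum to 1.
    return weighted_sum / total_weight
```

Because the function divides by the sum of the weights, the weights can be rescaled without changing the range of the reward.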
Evaluation with VFIG-BENCH
Finally, we introduce VFIG-BENCH, a comprehensive evaluation suite featuring novel metrics designed to measure the structural integrity of complex figures. The benchmark enables standardized comparison of performance across models.
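The metrics in VFIG-BENCH are not specified here. As a toy example of what a structure-aware comparison can look like, the sketch below scores the overlap between the primitive counts of a predicted SVG and a reference SVG. The functions `primitive_counts` and `primitive_overlap` are hypothetical and are not part of VFIG-BENCH.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def primitive_counts(svg_text: str) -> Counter:
    """Count elements by tag name (rect, line, g, ...), ignoring namespaces."""
    root = ET.fromstring(svg_text)
    # Namespaced tags look like "{http://www.w3.org/2000/svg}rect".
    return Counter(el.tag.rsplit("}", 1)[-1] for el in root.iter())


def primitive_overlap(pred_svg: str, ref_svg: str) -> float:
    """Multiset Jaccard overlap of element tags, in [0, 1].

    Toy measure for illustration: it ignores geometry, styling, and
    nesting, so it is far weaker than a real structural metric.
    """
    pred = primitive_counts(pred_svg)
    ref = primitive_counts(ref_svg)
    shared = sum((pred & ref).values())  # element-wise minimum of counts
    union = sum((pred | ref).values())   # element-wise maximum of counts
    return shared / union if union else 1.0
```

Malformed output raises `xml.etree.ElementTree.ParseError` in this sketch. A real evaluation would need to decide how to score predictions that do not parse.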
Performance and Results
VFIG achieves state-of-the-art performance among open-source models and performs on par with GPT-5.2. It reaches a VLM-Judge score of 0.829 on VFIG-BENCH, demonstrating its effectiveness at converting rasterized figures back into high-fidelity SVG.
Conclusion
VFIG combines a large-scale dataset (VFIG-DATA), a coarse-to-fine curriculum of SFT followed by RL, and a dedicated benchmark (VFIG-BENCH) into a scalable approach to figure reconstruction. It is relevant both to digital design workflows and to further research on vector graphics and technical illustration with Vision-Language Models.
