OmniDiagram: Advancing Unified Diagram Code Generation via Visual Interrogation Reward
arXiv:2604.05514v1
Abstract
The paradigm of programmable diagram generation is evolving rapidly, playing a crucial role in structured visualization. However, most existing studies are confined to a narrow range of task formulations and language support, constraining their applicability to diverse diagram types. In this work, we propose OmniDiagram, a unified framework that incorporates diverse diagram code languages and task definitions.
Introduction
Generating diagrams programmatically is increasingly important in fields ranging from education to data science. Despite recent advances, current methods remain limited because they focus on specific diagram types and code languages. OmniDiagram aims to overcome these limitations by covering a wider range of diagram types, code languages, and task formulations.
Visual Interrogation Verifies All (Viva)
To align code logic with visual fidelity in Reinforcement Learning (RL), we introduce a visual feedback strategy named Visual Interrogation Verifies All (Viva). Unlike traditional methods that rely on brittle syntax-based rules or pixel-level matching, Viva rewards the visual structure of rendered diagrams through a generative approach.
- Active Visual Inquiries: Viva actively generates targeted visual inquiries to scrutinize the visual fidelity of diagrams.
- Fine-Grained Feedback: It provides detailed feedback that facilitates optimization, enhancing the overall quality of generated diagrams.
- Self-Evolving Training: This mechanism supports a self-evolving training process, diminishing the dependency on manually annotated ground truth code.
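The idea of rewarding a rendered diagram through visual inquiries can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not describe how Viva generates inquiries or computes its reward, so the `Inquiry` type, the `interrogation_reward` function, the yes/no question format, and the "fraction answered as expected" score are all assumptions made for illustration. A real system would presumably use a vision-language model as the judge; here a hypothetical `stub_judge` stands in for it and treats the rendered image as plain bytes of text.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Inquiry:
    """One yes/no question about the rendered diagram (hypothetical format)."""
    question: str
    expected: bool


# Hypothetical judge signature: (rendered_image_bytes, question) -> bool.
Judge = Callable[[bytes, str], bool]


def interrogation_reward(image: bytes, inquiries: List[Inquiry], judge: Judge) -> float:
    """Return the fraction of inquiries whose judged answer matches the expected one.

    Assumed scoring rule for illustration; an empty inquiry list yields 0.0.
    """
    if not inquiries:
        return 0.0
    correct = sum(judge(image, q.question) == q.expected for q in inquiries)
    return correct / len(inquiries)


def stub_judge(image: bytes, question: str) -> bool:
    """Stand-in for a vision-language model.

    Treats the 'image' as text and answers True when the label quoted
    in the question appears in it.
    """
    label = question.split("'")[1]
    return label.encode() in image


# Toy example: the "rendered diagram" is just a byte string here.
rendered = b"Start -> Validate -> End"
inquiries = [
    Inquiry("Is there a node labeled 'Start'?", True),
    Inquiry("Is there a node labeled 'Validate'?", True),
    Inquiry("Is there a node labeled 'Retry'?", False),
    Inquiry("Is there a node labeled 'Archive'?", True),
]
reward = interrogation_reward(rendered, inquiries, stub_judge)
print(reward)  # 0.75: three of the four inquiries are answered as expected
```

Because the score is computed per inquiry rather than per image, a partially correct diagram still earns partial credit, which is one plausible way to obtain the fine-grained feedback described above without comparing against ground truth code.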
M3²Diagram Dataset
As part of our research, we also constructed M3²Diagram, the first large-scale diagram code generation dataset containing over 196,000 high-quality instances. This dataset serves as a critical resource for training and evaluating the capabilities of OmniDiagram.
Experimental Results
Our experimental results highlight the effectiveness of OmniDiagram. By combining Supervised Fine-Tuning (SFT) with our Viva-based RL approach, OmniDiagram sets a new state-of-the-art (SOTA) across diagram code generation benchmarks. Key findings from our experiments include:
- Performance Improvement: A significant increase in the quality of generated diagrams compared to previous models.
- Generalization: Enhanced ability to adapt to different diagram types and structures.
- User Flexibility: Increased support for various user-defined task formulations and coding languages.
Conclusion
OmniDiagram represents a significant advancement in the field of programmable diagram generation. By integrating diverse diagram code languages and leveraging innovative visual feedback mechanisms, it not only enhances the quality and applicability of generated diagrams but also sets a new benchmark for future research in this area. The introduction of the M3²Diagram dataset further solidifies the foundation for ongoing exploration and development in diagrammatic visualization.
