OmniDiagram: Unified Diagram Code Generation with Visual Feedback

OmniDiagram: Advancing Unified Diagram Code Generation via Visual Interrogation Reward

Source: arXiv:2604.05514v1

Abstract

The paradigm of programmable diagram generation is evolving rapidly, playing a crucial role in structured visualization. However, most existing studies are confined to a narrow range of task formulations and language support, constraining their applicability to diverse diagram types. In this work, we propose OmniDiagram, a unified framework that incorporates diverse diagram code languages and task definitions.

Introduction

Generating diagrams programmatically is increasingly important in fields ranging from education to data science. Existing methods, however, typically target a narrow set of diagram types and code languages, which limits where they can be applied. OmniDiagram aims to address this by supporting a wider range of diagram code languages and task definitions within a single framework.

Visual Interrogation Verifies All (Viva)

To align code logic with visual fidelity during Reinforcement Learning (RL), we introduce a visual feedback strategy named Visual Interrogation Verifies All (Viva). Rather than relying on brittle syntax-based rules or pixel-level matching, Viva takes a generative approach and rewards the visual structure of the rendered diagram.

  • Active Visual Inquiries: Viva actively generates targeted visual inquiries to scrutinize the visual fidelity of diagrams.
  • Fine-Grained Feedback: It provides detailed feedback that facilitates optimization, enhancing the overall quality of generated diagrams.
  • Self-Evolving Training: This mechanism supports a self-evolving training process, diminishing the dependency on manually annotated ground truth code.
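The inquiry-based reward described above can be illustrated with a minimal sketch. The scoring rule here (fraction of yes/no inquiries answered as expected) is an assumption chosen for illustration, not the reward defined in the paper, and the names `Inquiry`, `viva_style_reward`, and `toy_judge` are hypothetical. In practice the judge would be a vision-language model looking at the rendered image; here it is a toy stand-in that checks membership in a set of facts.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass(frozen=True)
class Inquiry:
    """One yes/no question about a rendered diagram (hypothetical structure)."""
    question: str
    expected: bool


def viva_style_reward(
    rendered: Any,
    inquiries: Sequence[Inquiry],
    judge: Callable[[Any, str], bool],
) -> float:
    """Return the fraction of inquiries whose answer matches the expectation.

    `judge` stands in for a visual question-answering model. Both the judge
    interface and this scoring rule are assumptions, not the paper's method.
    """
    if not inquiries:
        return 0.0
    passed = sum(judge(rendered, q.question) == q.expected for q in inquiries)
    return passed / len(inquiries)


def toy_judge(rendered: Any, question: str) -> bool:
    """Toy judge: the 'rendering' is a set of facts, and a question holds if present."""
    return question in rendered


# A fake rendering that contains node A and edge A->B but is missing edge B->C.
rendered_facts = {"has node A", "has edge A->B"}
inquiries = [
    Inquiry("has node A", True),
    Inquiry("has edge A->B", True),
    Inquiry("has node Z", False),
    Inquiry("has edge B->C", True),
]
reward = viva_style_reward(rendered_facts, inquiries, toy_judge)
```

Because the score depends only on the rendered output and the inquiries, a reward of this shape needs no reference code for the diagram, which is consistent with the reduced dependence on annotated ground truth noted above. It also returns a graded value rather than a pass/fail signal, so a partially correct diagram still receives partial credit.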

M3²Diagram Dataset

As part of our research, we also constructed M3²Diagram, the first large-scale diagram code generation dataset containing over 196,000 high-quality instances. This dataset serves as a critical resource for training and evaluating the capabilities of OmniDiagram.

Experimental Results

Combining Supervised Fine-Tuning (SFT) with our Viva-based RL approach, OmniDiagram sets a new state of the art (SOTA) across diagram code generation benchmarks. Key findings from our experiments include:

  • Performance Improvement: A significant increase in the quality of generated diagrams compared to previous models.
  • Generalization: Enhanced ability to adapt to different diagram types and structures.
  • User Flexibility: Increased support for various user-defined task formulations and coding languages.

Conclusion

OmniDiagram unifies diverse diagram code languages and task definitions in a single framework and uses Viva's visual feedback to reduce the dependency on manually annotated ground truth code during training. Together with the M3²Diagram dataset, it provides a foundation for further research on programmable diagram generation.


Lazarus Omolua