Reliable Multimodal Circuit-to-Verilog Code Generation

From Mirage to Grounding: Towards Reliable Multimodal Circuit-to-Verilog Code Generation

Multimodal large language models (MLLMs) are increasingly used to translate visual artifacts into functional code, from converting UI mockups into HTML to generating Python scripts from scientific plots. Circuit diagrams pose a harder challenge. They act as a visual domain-specific language for hardware design, encoding timing, topology, and bit-level semantics. These details are easy to overlook, yet they are safety-critical once the design is fabricated into silicon.

Translating circuit diagrams into register-transfer-level (RTL) code is therefore a demanding reliability test for vision-to-code generation. A recent study identified a failure mode it calls “Mirage”: when the circuit diagram was replaced with a blank image, the Pass@k scores of certain MLLMs remained unchanged or even improved. This suggests that the models bypass the visual input entirely and instead rely on the semantics of identifiers in the module header to retrieve canonical RTL templates. This behavior undermines both the models’ reliability and their trustworthiness in practical applications.
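For readers unfamiliar with the metric, the sketch below computes Pass@k with the widely used unbiased estimator (n samples per problem, c of them correct). The article does not say which estimator the study used, so treat this as an assumption; the function name is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples passes.

    n: samples generated for a problem, c: samples that passed, k: budget.
    Assumption: this is the common unbiased estimator, not necessarily
    the exact procedure used in the study.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples exist, so any k-subset contains a pass.
        return 1.0
    # 1 - P(all k drawn samples are failures)
    return 1.0 - comb(n - c, k) / comb(n, k)
```

A Mirage check then amounts to computing this score twice for the same problems, once with the real diagram and once with a blank image, and comparing the two numbers.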

Key Findings and Methodology

To quantify the Mirage phenomenon, the researchers built a benchmark called C2VEVAL and used it to evaluate eight MLLMs under a paired Normal/Anony protocol. In Anony mode, all identifiers in both the circuit diagram and the module header are anonymized. Scores in Anony mode dropped sharply across all models, indicating that the high accuracy observed in Normal mode can be misleading and is largely a product of the Mirage effect.
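To make the Anony idea concrete, here is a minimal sketch that renames identifiers in a Verilog module header. It covers only the text side (the study also anonymizes labels inside the diagram), and the keyword list, the `id_N` naming scheme, and the function name are assumptions made for illustration rather than details from C2VEVAL.

```python
import re
from typing import Dict, Tuple

# Small, non-exhaustive keyword set; enough for simple module headers.
VERILOG_KEYWORDS = {
    "module", "endmodule", "input", "output", "inout",
    "wire", "reg", "logic", "signed", "unsigned", "parameter",
}

# An identifier that is not preceded by a word character or a quote,
# so the base letters of sized literals such as 8'hFF are left alone.
IDENTIFIER = re.compile(r"(?<!['\w$])[A-Za-z_][A-Za-z0-9_$]*")


def anonymize_header(header: str) -> Tuple[str, Dict[str, str]]:
    """Replace every non-keyword identifier with a neutral name."""
    mapping: Dict[str, str] = {}

    def rename(match: "re.Match[str]") -> str:
        name = match.group(0)
        if name in VERILOG_KEYWORDS:
            return name
        if name not in mapping:
            mapping[name] = f"id_{len(mapping)}"
        return mapping[name]

    return IDENTIFIER.sub(rename, header), mapping


anonymized, mapping = anonymize_header(
    "module full_adder(input a, input b, input cin, output sum, output cout);"
)
print(anonymized)
```

Once names like `full_adder` or `cout` are gone, a model can no longer guess the circuit from the header alone, which is the point of the paired comparison. The sketch ignores comments and macros, so a real pipeline would need a proper Verilog parser.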

Introducing VeriGround

In response to these findings, the researchers proposed VeriGround, a model designed to address the issues exposed by the Mirage phenomenon. Its training combines several strategies, including:

  • Identifier Anonymization: This technique trains the model to generate code without relying on the semantics of identifier names.
  • Refusal Augmentation: This approach teaches the model to decline a request when it cannot confidently produce accurate code.
  • D-ORPO (Decision-Focused ORPO) Preference Alignment: This method up-weights the pivotal generate-or-refuse tokens, so that alignment concentrates on the decision between producing code and refusing.
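The article gives no formula for D-ORPO, so the following is only one plausible reading of "up-weighting pivotal tokens": a standard ORPO-style odds-ratio term in which the per-token log-probabilities are averaged with a larger weight on positions marked as pivotal. The weighting scheme, the `pivot_weight` parameter, and all function names are assumptions for illustration.

```python
import math
from typing import Sequence


def weighted_avg_logprob(token_logps: Sequence[float],
                         pivotal_mask: Sequence[bool],
                         pivot_weight: float = 2.0) -> float:
    """Average token log-probs, counting pivotal positions more heavily."""
    weights = [pivot_weight if pivotal else 1.0 for pivotal in pivotal_mask]
    total = sum(w * lp for w, lp in zip(weights, token_logps))
    return total / sum(weights)


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) with p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def odds_ratio_loss(chosen_logps: Sequence[float],
                    chosen_mask: Sequence[bool],
                    rejected_logps: Sequence[float],
                    rejected_mask: Sequence[bool],
                    pivot_weight: float = 2.0) -> float:
    """ORPO-style term: -log(sigmoid(log-odds(chosen) - log-odds(rejected)))."""
    diff = (
        log_odds(weighted_avg_logprob(chosen_logps, chosen_mask, pivot_weight))
        - log_odds(weighted_avg_logprob(rejected_logps, rejected_mask, pivot_weight))
    )
    # Numerically stable softplus(-diff), which equals -log(sigmoid(diff)).
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
```

In this reading, the first token of a response (where the model either starts emitting a module or starts a refusal) would be marked pivotal, so a wrong decision at that position moves the loss more than an error deeper in the code. In full ORPO this term is added to the usual supervised loss with a scaling coefficient.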

With 4 billion parameters, VeriGround achieves a Functional Pass@1 of 46.11% in Normal mode and 42.51% in Anony mode, with False Refusal Rates of 1.20% and 0.00%, respectively. It also refuses more than 92% of the time when presented with blank images, indicating that it distinguishes meaningful visual input from empty input.
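The three numbers above can be read as simple ratios over evaluation outcomes. The sketch below shows one way to compute them; the metric definitions (a false refusal is a refusal on a valid diagram, and a refusal on a valid diagram counts as a failed attempt) and the `Outcome` record are assumptions, since the article does not spell them out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Outcome:
    blank_image: bool  # True if the diagram was replaced by a blank image
    refused: bool      # True if the model declined to generate code
    passed: bool       # True if the generated RTL passed functional tests


def percent(hits: int, total: int) -> float:
    return 100.0 * hits / total if total else float("nan")


def summarize(outcomes: Iterable[Outcome]) -> Dict[str, float]:
    outcomes = list(outcomes)
    valid = [o for o in outcomes if not o.blank_image]
    blank = [o for o in outcomes if o.blank_image]
    return {
        # One attempt per valid diagram; a refusal counts as a failure.
        "functional_pass_at_1": percent(
            sum(o.passed and not o.refused for o in valid), len(valid)),
        # Refusals on inputs the model should have answered.
        "false_refusal_rate": percent(sum(o.refused for o in valid), len(valid)),
        # Refusals on inputs that carry no circuit at all.
        "blank_refusal_rate": percent(sum(o.refused for o in blank), len(blank)),
    }
```

Reporting the false refusal rate next to the blank-image refusal rate matters because a model could trivially score well on the latter by refusing everything.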

Conclusion

VeriGround is competitive with larger models such as GPT-5.4 in Normal mode and significantly outperforms all baselines in Anony mode. The study exposes a hidden failure mode in AI-assisted code generation and points toward more reliable systems that ground their output in the visual input, which should increase trust in MLLMs for critical hardware-design applications.

Lazarus Omolua