Evaluating LLM-Generated Code for Construction Safety

Is Vibe Coding the Future? An Empirical Assessment of LLM Generated Codes for Construction Safety

Source: arXiv:2604.12311v1 (cross-listed)

Abstract

The emergence of vibe coding, a paradigm in which non-technical users instruct Large Language Models (LLMs) to generate executable code through natural language, presents both significant opportunities and severe risks for the construction industry. Vibe coding lets construction personnel such as safety managers, foremen, and workers build their own tools and software, but the probabilistic nature of LLMs introduces the threat of silent failures: generated code that runs without error yet executes flawed mathematical safety logic.
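To make the idea of a silent failure concrete, here is a minimal sketch. The clearance formula, the function names, and all numbers are illustrative assumptions and not taken from the study or from any safety standard; the point is only that the flawed version runs cleanly and returns a plausible number.

```python
# Illustrative only: the formula terms and values below are assumptions,
# not engineering guidance and not data from the study.

def required_clearance_m(lanyard_m, deceleration_m, worker_height_m, margin_m):
    """Reference version: sums every term of the assumed clearance formula."""
    return lanyard_m + deceleration_m + worker_height_m + margin_m


def required_clearance_buggy_m(lanyard_m, deceleration_m, worker_height_m, margin_m):
    """Silent-failure version: runs without error but drops the deceleration term."""
    return lanyard_m + worker_height_m + margin_m


inputs = dict(lanyard_m=1.8, deceleration_m=1.1, worker_height_m=1.8, margin_m=0.9)
correct = required_clearance_m(**inputs)
buggy = required_clearance_buggy_m(**inputs)

# Both calls succeed; only a comparison against a trusted reference exposes the gap.
print(f"reference: {correct:.1f} m, buggy: {buggy:.1f} m")
```

Nothing in the buggy function raises an exception or produces an obviously absurd value, which is why execution success alone says little about safety logic.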

Research Overview

This study empirically evaluates the reliability, software architecture, and domain-specific safety fidelity of 450 vibe-coded Python scripts generated by three frontier models: Claude 3.5 Haiku, GPT-4o-Mini, and Gemini 2.5 Flash. The research uses a persona-driven prompt dataset (n=150), a count consistent with each prompt being submitted to all three models, and a bifurcated evaluation pipeline comprising isolated dynamic sandboxing and an LLM-as-a-Judge.
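The summary does not describe how the sandboxing stage was built. As a rough sketch of the dynamic-execution half of such a pipeline, the hypothetical helper below (`run_in_subprocess` is an assumed name) runs a candidate script in a separate Python interpreter with a timeout and records whether it exited cleanly. This gives process isolation only; a real sandbox would also need filesystem, network, and resource limits.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_subprocess(source: str, timeout_s: float = 5.0) -> dict:
    """Run a script in a separate interpreter and report whether it executed cleanly."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate.py"
        script.write_text(source, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, "-I", str(script)],  # -I: isolated mode
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return {"executed": False, "reason": "timeout", "stdout": ""}
    ok = proc.returncode == 0
    return {
        "executed": ok,
        "reason": "ok" if ok else "runtime_error",
        "stdout": proc.stdout,
    }


if __name__ == "__main__":
    print(run_in_subprocess("print(2 + 2)"))
    print(run_in_subprocess("print(1 / 0)"))
```

A check like this only answers "did it run?". Judging whether the output is correct is a separate step, which the study assigns to an LLM-as-a-Judge.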

Key Findings

The research quantifies the severe limits of zero-shot vibe-coded scripts for construction safety. The findings reveal a highly significant relationship between user persona and data hallucination: less formal prompts drastically increase the AI's propensity to invent missing safety variables. Furthermore, although the models demonstrated high foundational execution viability (~85%), this syntactic reliability masked logic deficits and a severe lack of defensive programming.
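Defensive programming here means, at minimum, refusing to guess. The sketch below is one assumed illustration: the field names and the utilization calculation are hypothetical, not drawn from the study. It raises an error when an input is missing or invalid instead of substituting an invented default.

```python
import math

REQUIRED_FIELDS = ("load_kg", "rated_capacity_kg")  # hypothetical input names


def load_utilization(inputs: dict) -> float:
    """Return load / rated capacity, refusing to guess at missing or invalid inputs."""
    missing = [name for name in REQUIRED_FIELDS if name not in inputs]
    if missing:
        # Fail loudly rather than inventing a value for a safety variable.
        raise ValueError(f"missing required inputs: {', '.join(missing)}")

    load = float(inputs["load_kg"])
    capacity = float(inputs["rated_capacity_kg"])
    if not (math.isfinite(load) and math.isfinite(capacity)):
        raise ValueError("inputs must be finite numbers")
    if load < 0 or capacity <= 0:
        raise ValueError("load must be >= 0 and rated capacity must be > 0")
    return load / capacity
```

For example, `load_utilization({"load_kg": 500})` raises a `ValueError` naming `rated_capacity_kg`, whereas a script that silently assumed a capacity would return a confident but unfounded number.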

Silent Failure Rate

Among successfully executed scripts, the study identified an alarming overall Silent Failure Rate of ~45%, with GPT-4o-Mini generating mathematically inaccurate outputs in ~56% of its functional code. These results highlight critical deficiencies in the current generation of LLMs when applied to safety engineering in construction.
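Note that the denominator of this rate is the set of scripts that executed, not all generated scripts. A small sketch of that bookkeeping follows; the record format and the toy data are hypothetical and do not reproduce the study's results.

```python
def evaluation_rates(results):
    """Compute rates from (executed, output_correct) pairs, one pair per script."""
    total = len(results)
    executed = [r for r in results if r[0]]
    # A silent failure is a script that ran but produced an incorrect result.
    silent_failures = [r for r in executed if not r[1]]
    return {
        "execution_rate": len(executed) / total if total else 0.0,
        "silent_failure_rate": len(silent_failures) / len(executed) if executed else 0.0,
    }


# Hypothetical toy data: four scripts ran, two of those were wrong, one crashed.
toy = [(True, True), (True, False), (True, True), (True, False), (False, False)]
rates = evaluation_rates(toy)
print(rates)
```

The crashed script lowers the execution rate but is excluded from the silent failure rate, since a crash is a visible failure.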

Implications for the Construction Industry

The findings demonstrate that current LLMs lack the deterministic rigor required for standalone safety engineering. As the construction industry increasingly adopts these technologies, safeguards are needed to mitigate the risks posed by unreliable AI-generated code.

Recommendations

  • Adoption of deterministic AI wrappers to enhance the reliability of generated code.
  • Implementation of strict governance protocols for cyber-physical deployments.
  • Training personnel to understand the limitations of LLM-generated code.
  • Conducting regular audits of AI-generated tools and software to ensure safety compliance.
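The summary does not define what a deterministic wrapper looks like. One possible reading, sketched below under that assumption, is a gate that accepts a generated function only if it reproduces a set of hand-verified reference cases. The name `gate_generated_function` and the example functions are hypothetical.

```python
import math


def gate_generated_function(candidate, reference_cases, rel_tol=1e-9):
    """Return candidate only if it reproduces every hand-verified reference case."""
    for kwargs, expected in reference_cases:
        actual = candidate(**kwargs)
        if not math.isclose(actual, expected, rel_tol=rel_tol):
            raise ValueError(f"rejected: {kwargs} gave {actual}, expected {expected}")
    return candidate


# Hypothetical reference case, verified by hand: 500 kg on a 1000 kg rating is 0.5.
CASES = [({"load_kg": 500.0, "capacity_kg": 1000.0}, 0.5)]


def generated_ok(load_kg, capacity_kg):
    return load_kg / capacity_kg


def generated_inverted(load_kg, capacity_kg):
    # Runs fine, but the ratio is upside down: a silent failure.
    return capacity_kg / load_kg


approved = gate_generated_function(generated_ok, CASES)
```

Calling `gate_generated_function(generated_inverted, CASES)` raises a `ValueError`, so the flawed function never reaches deployment. The gate is only as strong as its reference cases, which is where domain experts and regular audits come in.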

Conclusion

In conclusion, while vibe coding offers exciting possibilities for the construction industry, its current application raises significant safety concerns. It is essential to address these challenges through improved AI techniques, governance, and ongoing evaluation to ensure that the benefits of AI can be harnessed without compromising safety.

Lazarus Omolua (https://richlyai.com/blog)
My mission is to make sure that people in Africa are not left behind in the global AI revolution. RichlyAI exists to give everyone — students, founders, creators, and businesses — the tools to compete globally.
