MitUNet: Hybrid Transformer U-Net for Accurate Wall Segmentation

Enhancing Floor Plan Recognition: A Hybrid Mix-Transformer and U-Net Approach for Precise Wall Segmentation

arXiv:2512.02413v3

Abstract

Automatic 3D reconstruction of indoor spaces from 2D floor plans necessitates high-precision semantic segmentation of structural elements, particularly walls. However, existing methods often struggle with detecting thin structures and maintaining geometric precision. To address this, we introduce MitUNet, a hybrid neural network designed to bridge the gap between global semantic context and fine-grained structural details.

Introduction

Converting 2D floor plans into accurate 3D models is of growing interest in architecture, real estate, and virtual reality. The conversion depends on high-quality semantic segmentation of walls and other structural elements. Traditional segmentation methods are weakest on thin structures: a wall that is missed or broken in the mask carries over as an error in the reconstructed model.

MitUNet Architecture

Our proposed architecture, MitUNet, combines a Mix-Transformer encoder with a U-Net decoder and adds spatial and channel attention blocks. Each component plays a distinct role:

Mix-Transformer Encoder: Captures global semantic context through attention mechanisms, allowing for better understanding of the overall layout.
U-Net Decoder: Focuses on fine-grained details and structural accuracy, essential for precise wall segmentation.
Spatial and Channel Attention Blocks: Enhance feature extraction by allowing the model to focus on relevant areas and channels, improving segmentation performance.

Optimization and Performance

To train MitUNet, we employed the Tversky loss function, which weights false positives and false negatives separately and so allows the balance between precision and recall to be tuned. This is particularly important in segmentation tasks where the accurate recovery of boundaries is critical.
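
The Tversky loss can be sketched in a few lines of plain Python. This is an illustrative version operating on flat lists, not the training code used for MitUNet: the function name, the default `alpha` and `beta` values, and the smoothing term `eps` are assumptions, since the post does not state which weights were used.

```python
def tversky_loss(probs, targets, alpha=0.5, beta=0.5, eps=1e-7):
    """Tversky loss for a flattened binary mask.

    probs   -- predicted wall probabilities in [0, 1]
    targets -- ground-truth labels (0 or 1), same length as probs
    alpha   -- weight on false positives (assumed default)
    beta    -- weight on false negatives (assumed default)
    """
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")

    # Soft counts: probabilities are used directly instead of thresholded labels.
    tp = sum(p * t for p, t in zip(probs, targets))
    fp = sum(p * (1 - t) for p, t in zip(probs, targets))
    fn = sum((1 - p) * t for p, t in zip(probs, targets))

    # Tversky index: TP / (TP + alpha * FP + beta * FN)
    index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - index
```

With `alpha = beta = 0.5` the index reduces to the Dice coefficient, and with `alpha = beta = 1` to the Jaccard index (IoU). Raising `beta` above `alpha` penalises missed wall pixels more than spurious ones, which pushes the model toward higher recall. Raising `alpha` does the opposite.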

Our experiments on the CubiCasa5k dataset, along with a dedicated regional dataset, demonstrated the effectiveness of MitUNet in generating structurally correct masks with high boundary accuracy. The results indicate a significant improvement over standard models in terms of both segmentation quality and computational efficiency.
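
The post does not name the metrics behind these results. As a point of reference only, the sketch below computes three measures commonly used to compare binary segmentation masks: intersection over union (IoU), precision, and recall. The function name and the convention of returning 0.0 for an undefined ratio are assumptions for illustration.

```python
def mask_metrics(predicted, target):
    """Return (iou, precision, recall) for two flat binary masks of 0/1 values."""
    if len(predicted) != len(target):
        raise ValueError("masks must have the same length")

    tp = sum(1 for p, t in zip(predicted, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(predicted, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(predicted, target) if p == 0 and t == 1)

    def ratio(numerator, denominator):
        # Assumed convention: report 0.0 when the ratio is undefined.
        return numerator / denominator if denominator else 0.0

    iou = ratio(tp, tp + fp + fn)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return iou, precision, recall
```

For thin structures such as walls, a shift of one or two pixels can change IoU noticeably, which is one reason boundary quality is often reported alongside region overlap.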

Results and Implications

MitUNet’s ability to accurately segment walls and other structural elements lays a robust foundation for automated 3D reconstruction pipelines. This advancement not only enhances the quality of 3D models but also accelerates the workflow for professionals in various industries, including architecture and interior design.

Availability

To ensure reproducibility and facilitate future research, we have made the source code and the regional dataset publicly available. Researchers and developers can access these resources at:

Conclusion

MitUNet combines global semantic context with fine-grained structural detail to address two key challenges in automatic 3D reconstruction from floor plans: detecting thin structures and preserving geometric precision. This makes it a useful step toward more accurate and efficient modeling pipelines.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
