Enhancing Floor Plan Recognition: A Hybrid Mix-Transformer and U-Net Approach for Precise Wall Segmentation
arXiv:2512.02413v3
Abstract
Automatic 3D reconstruction of indoor spaces from 2D floor plans necessitates high-precision semantic segmentation of structural elements, particularly walls. However, existing methods often struggle with detecting thin structures and maintaining geometric precision. To address this, we introduce MitUNet, a hybrid neural network designed to bridge the gap between global semantic context and fine-grained structural details.
Introduction
The task of converting 2D floor plans into accurate 3D models has gained significant attention in recent years, particularly in fields such as architecture, real estate, and virtual reality. High-quality semantic segmentation of walls and other structural elements is essential for achieving this objective. Traditional segmentation methods have shown limitations, especially regarding thin structures, which can result in inaccuracies in the reconstructed models.
MitUNet Architecture
Our proposed architecture, MitUNet, combines a Mix-Transformer encoder with a U-Net decoder and adds spatial and channel attention blocks. Each component plays a distinct role:
- Mix-Transformer Encoder: Captures global semantic context through attention mechanisms, allowing for better understanding of the overall layout.
- U-Net Decoder: Focuses on fine-grained details and structural accuracy, essential for precise wall segmentation.
- Spatial and Channel Attention Blocks: Enhance feature extraction by allowing the model to focus on relevant areas and channels, improving segmentation performance.
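To make the attention component concrete, the following is a minimal NumPy sketch of a combined spatial and channel gate. It is an illustration under stated assumptions, not the MitUNet implementation: the text above does not specify the block's internal design, so this sketch uses a squeeze-and-excitation style channel gate and a 1x1-projection spatial gate, summed together. All function names, weight shapes, and the reduction ratio are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_gate(feat, w1, w2):
    """Channel attention: reweight each channel using a global summary.

    feat -- feature map of shape (C, H, W)
    w1   -- (C_reduced, C) squeeze weights (hypothetical)
    w2   -- (C, C_reduced) excitation weights (hypothetical)
    """
    pooled = feat.mean(axis=(1, 2))          # (C,) global average pool
    hidden = np.maximum(w1 @ pooled, 0.0)    # ReLU bottleneck
    scale = sigmoid(w2 @ hidden)             # (C,) gates in (0, 1)
    return feat * scale[:, None, None]


def spatial_gate(feat, w):
    """Spatial attention: reweight each pixel using a 1x1 projection.

    w -- (C,) weights projecting all channels to one score map (hypothetical)
    """
    score = np.tensordot(w, feat, axes=([0], [0]))  # (H, W)
    return feat * sigmoid(score)[None, :, :]


def attention_block(feat, w1, w2, w):
    """Sum of both gated maps (one common choice, assumed here)."""
    return channel_gate(feat, w1, w2) + spatial_gate(feat, w)


# Random features and weights stand in for learned values.
rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // 2, C)) * 0.1
w2 = rng.standard_normal((C, C // 2)) * 0.1
w = rng.standard_normal(C) * 0.1
out = attention_block(feat, w1, w2, w)
```

The output keeps the input shape, so a block like this could sit between decoder stages without changing the surrounding layers. In a trained network the weights would be learned so that channels and pixels carrying wall evidence receive gates closer to 1.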
Optimization and Performance
To optimize MitUNet, we used the Tversky loss function, which weights false positives and false negatives separately and so lets training be tuned toward the desired balance between precision and recall. This matters in segmentation tasks where accurate boundary recovery is critical.
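As a rough illustration of how the Tversky loss trades precision against recall, here is a small pure-Python sketch for a flattened binary mask. The Tversky index is TP / (TP + alpha * FP + beta * FN), and the loss is one minus that index. The `alpha` and `beta` values below are illustrative assumptions, not the settings used for MitUNet, and the function name is hypothetical.

```python
def tversky_loss(probs, targets, alpha=0.5, beta=0.5, eps=1e-7):
    """Soft Tversky loss for a flattened binary mask.

    probs   -- predicted wall probabilities in [0, 1]
    targets -- ground-truth labels, 1 for wall and 0 for background
    alpha   -- weight on false positives (illustrative default)
    beta    -- weight on false negatives (illustrative default)
    """
    tp = sum(p * g for p, g in zip(probs, targets))
    fp = sum(p * (1 - g) for p, g in zip(probs, targets))
    fn = sum((1 - p) * g for p, g in zip(probs, targets))
    index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - index


# Toy example: a thin wall two pixels wide in a row of eight.
targets = [0, 0, 0, 1, 1, 0, 0, 0]
missed = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]  # one wall pixel missed
extra = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]   # one extra pixel predicted

# Assumed recall-biased setting: false negatives cost more than false positives.
recall_biased = dict(alpha=0.3, beta=0.7)
loss_missed = tversky_loss(missed, targets, **recall_biased)
loss_extra = tversky_loss(extra, targets, **recall_biased)
```

With `beta` larger than `alpha`, missing a wall pixel is penalized more than predicting one extra pixel, which is the behavior one would want when thin structures tend to be dropped. Setting `alpha = beta = 0.5` reduces the expression to the Dice loss.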
Our experiments on the CubiCasa5k dataset, along with a dedicated regional dataset, demonstrated the effectiveness of MitUNet in generating structurally correct masks with high boundary accuracy. The results indicate a significant improvement over standard models in terms of both segmentation quality and computational efficiency.
Results and Implications
MitUNet’s ability to accurately segment walls and other structural elements lays a robust foundation for automated 3D reconstruction pipelines. This advancement not only enhances the quality of 3D models but also accelerates the workflow for professionals in various industries, including architecture and interior design.
Availability
To ensure reproducibility and facilitate future research, we have made the source code and the regional dataset publicly available. Researchers and developers can access these resources at:
Conclusion
MitUNet advances floor plan recognition and segmentation by combining global context with fine-grained detail. In doing so, it addresses two key challenges in automatic 3D reconstruction, detecting thin structures and maintaining geometric precision, and supports more accurate and efficient modeling.
