GeoGuide: Hierarchical Geometric Guidance for Open-Vocabulary 3D Semantic Segmentation
Published on: arXiv:2603.26260v1
Type: Cross
Abstract
Open-vocabulary 3D semantic segmentation aims to segment arbitrary categories beyond those in the original training set. Most existing methods rely heavily on transferring knowledge from 2D open-vocabulary models, which can lead to two limitations: aligning 3D features to a 2D representation space can restrict the learning of intrinsic 3D geometric properties, and errors in 2D predictions often propagate into the 3D output. To overcome these challenges, we introduce GeoGuide, a framework that leverages pretrained 3D models and enforces hierarchical geometry-semantic consistency in open-vocabulary 3D segmentation.
Key Innovations
- Uncertainty-based Superpoint Distillation Module: This module fuses geometric and semantic features to estimate per-point uncertainty. It adaptively weights 2D features within superpoints, effectively suppressing noise while retaining critical discriminative information, thereby enhancing local semantic consistency.
- Instance-level Mask Reconstruction Module: Using geometric priors, this module reconstructs complete instance masks and thereby enforces semantic consistency within each instance, so that the points of one instance are segmented coherently.
- Inter-Instance Relation Consistency Module: This module aligns geometric and semantic similarity matrices to calibrate consistency across instances of the same category, which helps mitigate semantic drift that can arise from varying viewpoints.
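The summary does not give the exact formulations, so the following is only a minimal NumPy sketch of the two ideas that are described concretely above: weighting 2D features within a superpoint by per-point uncertainty, and aligning geometric and semantic similarity matrices across instances. Everything specific here is an assumption made for illustration: the function names, the `exp(-uncertainty)` weighting, the use of cosine similarity, and the mean-squared-difference loss. The per-point uncertainty is taken as a given input rather than estimated from fused geometric and semantic features.

```python
import numpy as np


def pool_superpoints(feats_2d, uncertainty, superpoint_ids):
    """Uncertainty-weighted average of per-point 2D features in each superpoint.

    feats_2d:       (N, D) 2D-derived features attached to the N points
    uncertainty:    (N,) per-point uncertainty; higher means less reliable
    superpoint_ids: (N,) integer superpoint index in [0, S)
    Returns an (S, D) array of pooled superpoint features.
    """
    num_superpoints = int(superpoint_ids.max()) + 1
    # Assumed weighting: exp(-uncertainty). The shift by the minimum only
    # improves numerical stability; it cancels in the normalisation below.
    w = np.exp(-(uncertainty - uncertainty.min()))

    # Accumulate weights and weighted features per superpoint.
    weight_sum = np.zeros(num_superpoints)
    np.add.at(weight_sum, superpoint_ids, w)
    pooled = np.zeros((num_superpoints, feats_2d.shape[1]))
    np.add.at(pooled, superpoint_ids, w[:, None] * feats_2d)

    # Normalise so that the weights inside each superpoint sum to one.
    return pooled / np.maximum(weight_sum, 1e-12)[:, None]


def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x, shape (K, K)."""
    x = x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
    return x @ x.T


def relation_consistency_loss(geo_feats, sem_feats):
    """Mean squared difference between geometric and semantic similarity matrices.

    geo_feats: (K, Dg) one geometric descriptor per instance
    sem_feats: (K, Ds) one semantic descriptor per instance
    """
    s_geo = cosine_similarity_matrix(geo_feats)
    s_sem = cosine_similarity_matrix(sem_feats)
    return float(np.mean((s_geo - s_sem) ** 2))


# Toy usage: two points share superpoint 0, one point forms superpoint 1.
feats = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
unc = np.array([0.0, 0.0, 5.0])
ids = np.array([0, 0, 1])
pooled = pool_superpoints(feats, unc, ids)
```

In this sketch a point with high uncertainty contributes almost nothing to its superpoint's pooled feature, which is one simple way to suppress noisy 2D predictions while keeping reliable ones. The loss is zero when the geometric and semantic descriptors induce identical pairwise similarities among instances.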
Experimental Validation
GeoGuide was evaluated through extensive experiments on ScanNet v2, Matterport3D, and nuScenes. The reported results show that it outperforms existing state-of-the-art methods on these datasets.
Conclusion
GeoGuide advances open-vocabulary 3D semantic segmentation by applying geometric guidance hierarchically, at the superpoint, instance, and inter-instance levels. In doing so, the framework addresses the limitations of methods that depend on 2D knowledge transfer and improves the accuracy and reliability of 3D segmentation. Approaches of this kind point toward more robust and adaptable systems for understanding complex 3D environments.
