Geometry-Aware Cross Modal Alignment for Light Field-LiDAR Semantic Segmentation
Summary: arXiv:2510.06687v4
Semantic segmentation is fundamental to scene understanding, especially in autonomous driving, yet it remains difficult in complex scenarios such as occlusion. Light field and LiDAR sensors provide complementary visual and spatial cues that are essential for robust perception. However, integrating these modalities effectively is often hindered by limited viewpoint diversity and inherent discrepancies between them.
Introduction
To tackle these challenges, the authors introduce a multimodal semantic segmentation dataset that combines light field data with point cloud data. The dataset serves as the basis for a segmentation approach that uses both modalities.
Proposed Methodology
The proposed solution, the Multi-modal Light Field Point-cloud Fusion Segmentation Network (Mlpfseg), segments camera images and LiDAR point clouds simultaneously. It has two main components:
- Feature Completion Module: Addresses the density mismatch between point clouds and image pixels by performing differential reconstruction of point-cloud feature maps, which improves the fusion of light field and LiDAR features.
- Depth Perception Module: Improves the segmentation of occluded objects by reinforcing attention scores, which makes the network more aware of occlusion and better able to segment partially obscured objects.
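The density mismatch mentioned above can be made concrete with a small sketch. The summary does not describe how the Feature Completion Module's differential reconstruction works, so the code below is not the paper's method: it substitutes a plain neighbourhood-mean fill, and the function names (scatter_points, complete_features, coverage), the scalar features, and the 4x4 grid are all hypothetical choices made for illustration. It only shows why a projected point cloud leaves most pixels empty and what "completing" a feature map means.

```python
def scatter_points(points, height, width):
    """Place per-point features on an image grid; pixels with no point stay None."""
    grid = [[None] * width for _ in range(height)]
    for row, col, feat in points:
        if 0 <= row < height and 0 <= col < width:
            grid[row][col] = feat
    return grid


def complete_features(grid, radius=1):
    """Fill each empty pixel with the mean of the valid pixels within `radius`.

    Assumption: this simple averaging stands in for the paper's
    differential reconstruction, which the summary does not specify.
    """
    height, width = len(grid), len(grid[0])
    dense = [row[:] for row in grid]
    for r in range(height):
        for c in range(width):
            if grid[r][c] is not None:
                continue  # keep measured features untouched
            neighbours = [
                grid[rr][cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
                if grid[rr][cc] is not None
            ]
            if neighbours:
                dense[r][c] = sum(neighbours) / len(neighbours)
    return dense


def coverage(grid):
    """Fraction of pixels that carry a feature."""
    cells = [value for row in grid for value in row]
    return sum(value is not None for value in cells) / len(cells)


# Three LiDAR returns, assumed already projected to (row, col), one scalar feature each.
points = [(0, 0, 1.0), (1, 2, 3.0), (3, 3, 5.0)]
sparse = scatter_points(points, height=4, width=4)
dense = complete_features(sparse, radius=1)
print(f"coverage before: {coverage(sparse):.2f}, after: {coverage(dense):.2f}")
```

In this toy case only 3 of 16 pixels receive a LiDAR feature before completion, which is the kind of gap that makes pixel-wise fusion with image features difficult. A learned module would replace the fixed averaging with trainable weights.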
Results
The reported results show Mlpfseg outperforming both single-modality baselines:
- Image-only segmentation: outperformed by 1.71 mean Intersection over Union (mIoU).
- Point cloud-only segmentation: outperformed by 2.38 mIoU.
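For readers unfamiliar with the metric, mIoU averages, over classes, the ratio of correctly labelled elements (intersection) to elements labelled as that class in either the prediction or the ground truth (union). The following minimal sketch computes it for flat label lists; the function name mean_iou and the toy labels are illustrative assumptions, and the paper's exact evaluation protocol (for example, which classes are ignored) is not given in this summary.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union over classes that occur in pred or target."""
    ious = []
    for cls in range(num_classes):
        intersection = sum(p == cls and t == cls for p, t in zip(pred, target))
        union = sum(p == cls or t == cls for p, t in zip(pred, target))
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy labels for six pixels (or points) and three classes.
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. The same function applies to image pixels and LiDAR points alike, since both reduce to per-element class labels.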
Conclusion
By integrating light field data and LiDAR point clouds, the Mlpfseg network addresses the challenges of occlusion and modality discrepancies and makes scene understanding for autonomous driving more robust. Future work will focus on further refining these techniques and exploring additional applications in autonomous systems.
