LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows
arXiv:2604.05182v1
The Large Sparse Reconstruction Model (LSRM) investigates how scaling transformer context windows affects feed-forward 3D reconstruction. Object-centric feed-forward methods have advanced considerably and now deliver robust, high-quality reconstructions, yet they still lag behind dense-view optimization in recovering fine-grained texture and appearance.
The paper explores how expanding the context window, by substantially increasing the number of active object and image tokens, can narrow this gap and enable high-fidelity 3D object reconstruction and inverse rendering.
Key Contributions of LSRM
To scale effectively, LSRM builds native sparse attention into its architecture and pairs it with three contributions:
- Efficient Coarse-to-Fine Pipeline: LSRM concentrates computation on informative regions by predicting sparse high-resolution residuals.
- 3D-Aware Spatial Routing Mechanism: This mechanism establishes accurate 2D-3D correspondences by utilizing explicit geometric distances instead of relying on standard attention scores.
- Custom Block-Aware Sequence Parallelism: By leveraging an All-gather-KV protocol, LSRM balances dynamic, sparse workloads across GPUs, enhancing computational efficiency.
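The idea behind the spatial routing item above, choosing which image tokens an object token attends to by geometric distance rather than by attention score, can be illustrated with a small sketch. This is not the authors' implementation: the pinhole camera model, the regular patch grid, and the names `project_point`, `patch_centers`, and `route_by_distance` are all assumptions made for illustration.

```python
import math

def project_point(point, focal, cx, cy):
    """Pinhole projection of a camera-space 3D point (x, y, z) to pixel coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal * x / z + cx, focal * y / z + cy)

def patch_centers(width, height, patch):
    """Pixel centers of a regular grid of square image patches, in row-major order."""
    return [
        (col * patch + patch / 2, row * patch + patch / 2)
        for row in range(height // patch)
        for col in range(width // patch)
    ]

def route_by_distance(points, centers, focal, cx, cy, k):
    """For each 3D point, return the indices of the k patches nearest its projection."""
    routes = []
    for p in points:
        u, v = project_point(p, focal, cx, cy)
        # Rank patches by 2D Euclidean distance; no attention scores are involved.
        ranked = sorted(
            range(len(centers)),
            key=lambda i: math.hypot(centers[i][0] - u, centers[i][1] - v),
        )
        routes.append(ranked[:k])
    return routes

# Hypothetical setup: a 64x64 image split into a 4x4 grid of 16-pixel patches.
centers = patch_centers(width=64, height=64, patch=16)
points = [(0.25, 0.5, 2.0), (-0.6875, -0.75, 1.0)]
routes = route_by_distance(points, centers, focal=32.0, cx=32.0, cy=32.0, k=2)
print(routes)  # [[10, 9], [0, 1]]
```

In this toy setting each object token is routed to a fixed number of image patches, so the cost of the 2D-3D lookup grows with `k` rather than with the total number of image tokens. A real system would presumably route across many views and operate on batched tensors.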
As a result of these innovations, LSRM is capable of managing 20 times more object tokens and more than double the image tokens compared to previous state-of-the-art (SOTA) methods.
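Why sparse workloads need explicit balancing across GPUs can be seen in a toy example. The summary does not describe how LSRM schedules blocks, so the sketch below substitutes a generic greedy longest-processing-time heuristic; the function `balance_blocks` and the cost model (one unit of work per key/value block that a query block attends to) are assumptions, not the paper's protocol.

```python
import heapq

def balance_blocks(block_costs, num_gpus):
    """Greedy longest-processing-time assignment of query blocks to GPUs."""
    heap = [(0, gpu) for gpu in range(num_gpus)]  # (accumulated cost, gpu id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_gpus)]
    # Place the most expensive blocks first, each on the currently least-loaded GPU.
    for block in sorted(range(len(block_costs)), key=lambda b: -block_costs[b]):
        load, gpu = heapq.heappop(heap)
        assignment[gpu].append(block)
        heapq.heappush(heap, (load + block_costs[block], gpu))
    loads = [sum(block_costs[b] for b in blocks) for blocks in assignment]
    return assignment, loads

# Hypothetical cost per query block: the number of KV blocks it attends to.
block_costs = [9, 1, 1, 1, 8, 2, 2, 4]
assignment, loads = balance_blocks(block_costs, num_gpus=2)

# Naive alternative: split the sequence into two contiguous halves.
contiguous = [sum(block_costs[:4]), sum(block_costs[4:])]
print(loads, contiguous)  # [14, 14] [12, 16]
```

With a contiguous split, the slower GPU carries 16 units of work while the other carries 12; the cost-aware assignment gives both 14. Since the slowest device sets the step time, the imbalance from uneven sparsity patterns translates directly into idle hardware.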
Performance Evaluation
Extensive evaluations conducted on standard novel-view synthesis benchmarks reveal substantial performance gains over current SOTA approaches. The results demonstrate:
- A 2.5 dB increase in Peak Signal-to-Noise Ratio (PSNR), indicating improved reconstruction quality.
- A 40% reduction in Learned Perceptual Image Patch Similarity (LPIPS), showcasing enhanced perceptual similarity to reference images.
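To put the PSNR figure in perspective, recall the standard definition PSNR = 10 log10(MAX^2 / MSE). Under that definition a fixed gain in dB corresponds to a fixed ratio of mean squared errors, regardless of the baseline. The short calculation below works this out for 2.5 dB; the baseline MSE value is hypothetical and is used only to check the arithmetic.

```python
import math

def psnr(mse, max_value=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_value ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Ratio new_mse / old_mse implied by a PSNR gain of gain_db decibels."""
    return 10.0 ** (-gain_db / 10.0)

ratio = mse_ratio_for_gain(2.5)      # about 0.562
baseline_mse = 1e-3                  # hypothetical baseline (30 dB at max_value=1)
improved_mse = baseline_mse * ratio
gain = psnr(improved_mse) - psnr(baseline_mse)
print(round(ratio, 3), round(gain, 2))  # 0.562 2.5
```

A 2.5 dB gain therefore implies roughly 44% lower mean squared error. LPIPS is a learned perceptual distance where lower is better, so its 40% reduction is reported directly as a relative change.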
Furthermore, when LSRM is extended to inverse rendering tasks, both qualitative and quantitative assessments on widely-used benchmarks highlight consistent improvements in texture and geometry details. The model achieves LPIPS scores that match or surpass those of SOTA dense-view optimization methods.
Future Directions
The authors plan to release the code and model on their project page, which would let the community reproduce the results and build on them.
In conclusion, LSRM indicates that scaling the context window with native sparse attention can narrow the gap between feed-forward object-centric reconstruction and dense-view optimization.
