Panoptic Vision-Language Feature Fields
Haoran Chen, Kenneth Blomqvist, Francesco Milano and Roland Siegwart. "Panoptic Vision-Language Feature Fields." IEEE Robotics and Automation Letters. 2024
In this paper, we propose an open-vocabulary panoptic system based on neural fields for scene understanding. Our method implicitly reconstructs the scene geometry from 2D images and simultaneously obtains panoptic information from 2D proposals computed by off-the-shelf 2D networks.
Abstract
“Recently, methods have been proposed for 3D open-vocabulary semantic segmentation. Such methods are able to segment scenes into arbitrary classes based on text descriptions provided during runtime. In this paper, we propose to the best of our knowledge the first algorithm for open-vocabulary panoptic segmentation in 3D scenes. Our algorithm, Panoptic Vision-Language Feature Fields (PVLFF), learns a semantic feature field of the scene by distilling vision-language features from a pretrained 2D model, and jointly fits an instance feature field through contrastive learning using 2D instance segments on input frames. Despite not being trained on the target classes, our method achieves panoptic segmentation performance similar to the state-of-the-art closed-set 3D systems on the HyperSim, ScanNet and Replica dataset and additionally outperforms current 3D open-vocabulary systems in terms of semantic segmentation. We ablate the components of our method to demonstrate the effectiveness of our model architecture. Our code will be available at https://github.com/ethz-asl/pvlff.”
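To illustrate what segmenting "based on text descriptions provided during runtime" can look like, here is a minimal sketch, not the PVLFF implementation. It assumes each 3D point already has a feature vector living in the same space as the text embeddings of the class prompts, and it assigns every point the class with the highest cosine similarity. The names `cosine` and `label_points` and the toy 3-dimensional embeddings are hypothetical; real vision-language features are much higher-dimensional and come from a pretrained model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def label_points(point_features, text_embeddings):
    """Assign each point the class whose text embedding is most similar."""
    labels = []
    for feat in point_features:
        scores = {name: cosine(feat, emb) for name, emb in text_embeddings.items()}
        labels.append(max(scores, key=scores.get))
    return labels


# Toy embeddings standing in for real vision-language features (hypothetical values).
text_embeddings = {"chair": [1.0, 0.0, 0.0], "table": [0.0, 1.0, 0.0]}
point_features = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]
labels = label_points(point_features, text_embeddings)
```

Because the class list is only a dictionary of prompts, it can be changed at query time without retraining the feature field, which is the property that "open-vocabulary" refers to.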
Main Contributions
- A hierarchical instance feature field that enables obtaining 3D instance segments from 2D proposals using contrastive learning;
- To the best of our knowledge, the first zero-shot open-vocabulary panoptic segmentation system.
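The contrastive idea behind the instance feature field can be sketched as follows. This is a generic supervised contrastive loss written in plain Python for readability; it is an assumption about the general form of such a loss, not the exact objective used in the paper, and it ignores the hierarchical aspect. The function name `contrastive_loss`, the `temperature` value, and the toy features are hypothetical. Features of pixels that fall in the same 2D instance segment are pulled together, and features from different segments are pushed apart.

```python
import math


def contrastive_loss(features, segment_ids, temperature=0.1):
    """Generic supervised contrastive loss over the sampled pixels of one frame.

    features:    one feature vector per pixel (ideally unit length)
    segment_ids: 2D instance segment id of each pixel
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    total, count = 0.0, 0
    n = len(features)
    for i in range(n):
        pos, denom = 0.0, 0.0
        for j in range(n):
            if i == j:
                continue
            w = math.exp(dot(features[i], features[j]) / temperature)
            denom += w
            if segment_ids[i] == segment_ids[j]:
                pos += w
        if pos > 0.0:  # skip anchors that have no positive in this batch
            total += -math.log(pos / denom)
            count += 1
    return total / count if count else 0.0


segment_ids = [0, 0, 1, 1]
# Features that agree with the 2D segments versus features that contradict them.
aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
misaligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
loss_aligned = contrastive_loss(aligned, segment_ids)
loss_misaligned = contrastive_loss(misaligned, segment_ids)
```

In this toy case the loss is close to zero when the features cluster by segment and large when they do not. Segment ids are only compared within a single frame here, since 2D proposals from different frames are not assumed to share consistent ids.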
Overview of PVLFF
Some visual results
More visualization results are available on our website, and further details can be found in the paper.