Hi. First of all, thank you for open-sourcing such great work!
From what I understand of the paper, the method involves two steps:
- Generate multiple 2D segmentation maps with Mask2Former, then match instances across views using NeRF-RCNN's 3D bounding-box predictions.
- Use these matched 2D segmentation maps to train an Instance-NeRF model with an appearance loss (i.e., the photometric loss from NeRF) and a pixel-wise classification loss.
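To check my understanding of the second step, here is a minimal sketch of the training objective as I read it. It assumes a plain MSE photometric loss on ray colors and a softmax cross-entropy over per-ray instance logits against the matched 2D instance IDs. The function names and the `seg_weight` balancing term are my own placeholders, not taken from the Instance-NeRF code.

```python
import numpy as np

def photometric_loss(pred_rgb, gt_rgb):
    """Mean squared error between rendered and ground-truth ray colors, shape (N, 3)."""
    return float(np.mean((pred_rgb - gt_rgb) ** 2))

def pixel_classification_loss(logits, labels):
    """Mean cross-entropy between per-ray instance logits (N, K) and integer labels (N,)."""
    # Numerically stable log-softmax over the instance dimension.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Select the log-probability of the labeled instance for each ray.
    picked = log_probs[np.arange(labels.shape[0]), labels]
    return float(-picked.mean())

def total_loss(pred_rgb, gt_rgb, logits, labels, seg_weight=1.0):
    """Hypothetical combined objective: photometric loss + weighted classification loss."""
    return photometric_loss(pred_rgb, gt_rgb) + seg_weight * pixel_classification_loss(logits, labels)
```

If this is right, the matching step matters because `labels` must refer to the same instance ID in every view; otherwise the classification loss would push the same 3D point toward different classes depending on the camera.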
My question is whether it is possible to train an Instance-NeRF without the first step, assuming the segmentation maps are precise. I ask because I want to train an Instance-NeRF on my custom scene, for which I don't have 3D bounding-box annotations.
Also, could you tell me what the training time is for one scene?
Please correct me if I am wrong or have misunderstood anything. Thank you in advance!