add high resolution monocular priors script #76
pablovela5620 wants to merge 1 commit into autonomousvision:master from
Conversation
Hi, thanks for working on this. How to generate high-resolution monocular depth and normals is still an open research problem. I don't think our patch-merge approach in MonoSDF would generalise to a wide range of scenarios. As you showed above, the merged depth is not good on the white wall, and the merged normal map looks different from the low-res normal map. Here is a method, https://github.com/compphoto/BoostingMonocularDepth, that can generate high-resolution depth maps with very good results, but unfortunately normal input is not supported.
Understood, I'm not sure whether this is worth merging then. It could be helpful for folks to have the generation code available to try, but I'll leave it to your discretion. If you deem it worth merging, I'll remove the visualization code and clean things up a bit. Otherwise you're more than welcome to close this PR!
I added an option to generate high resolution monocular priors, similar to the MonoSDF script, with a few modifications. I made this a draft because I found some issues I wanted to ask questions about. I also temporarily added some visualization scripts to help debug. This is what most outputs currently look like.
You can see that the high resolution cues look much better vs. the interpolated ones (especially around edges), but there are some problems that @niujinshuchong may be able to help answer.
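For context, the patch-merge idea referred to here could be sketched roughly as below. This is a minimal illustration, not the MonoSDF implementation: `predict_depth` is a placeholder for the real monocular network, the Hann-window blending is an assumption, and there is no per-patch scale/shift alignment — which is exactly why merged patches can lack coherency, since monocular depth is only defined up to scale and shift per prediction.

```python
import numpy as np

def predict_depth(patch):
    # Placeholder for a monocular depth network run on a 384x384 patch.
    # Returns a dummy gradient so the sketch is runnable end to end.
    h, w = patch.shape[:2]
    return np.linspace(0.0, 1.0, h * w).reshape(h, w)

def merge_patches(image, patch=384, stride=192):
    """Predict depth per overlapping patch and blend with a feathered window."""
    H, W = image.shape[:2]
    depth = np.zeros((H, W))
    weight = np.zeros((H, W))
    # Hann window: highest weight at the patch centre, so overlapping
    # patches fade into each other instead of leaving hard seams.
    win = np.hanning(patch)[:, None] * np.hanning(patch)[None, :] + 1e-6
    ys = list(range(0, H - patch + 1, stride))
    xs = list(range(0, W - patch + 1, stride))
    # Make sure the bottom/right borders are covered by a final patch.
    if ys[-1] != H - patch:
        ys.append(H - patch)
    if xs[-1] != W - patch:
        xs.append(W - patch)
    for y in ys:
        for x in xs:
            pred = predict_depth(image[y:y + patch, x:x + patch])
            depth[y:y + patch, x:x + patch] += pred * win
            weight[y:y + patch, x:x + patch] += win
    return depth / weight

merged = merge_patches(np.zeros((768, 768, 3)))
print(merged.shape)  # (768, 768)
```

A real pipeline would additionally align each patch's scale and shift to a global low-res prediction before blending; without that step, textureless regions like the white wall have nothing to anchor the alignment.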
Here is a link to the custom dataset I generated using nerfstudio's ns-process-data
https://drive.google.com/file/d/1EPVHdDuV3vCEaF2852FeJR9yi-kUblZS/view?usp=sharing
To generate the low resolution cues use
To generate the high resolution cues use
To use the visualizer, first
then from the sdfstudio directory you can use
assuming that is where the dataset is.
`--crop-mult 2` in the process_nerfstudio_to_sdfstudio.py. This is mostly because I noticed that a 1920x1080 image takes an EXTREMELY long time (a few hours) compared to the original 384x384, due to the large number of patches created. 768x768 seems like a good middle ground (a few minutes) vs. the original 384x384 (a few seconds).

I think this is probably due to the large white wall, which makes it difficult for patches to be correctly merged together with some sort of coherency, since the depth/normal values don't align well. Is this something you've come across @niujinshuchong?
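The runtime gap makes sense if you count patch positions: with a fixed patch size and stride, the number of patches grows roughly quadratically with resolution. A quick sketch (patch size 384 and stride 192 are assumptions here, not necessarily what the script uses):

```python
import math

def n_patches(size, patch=384, stride=192):
    """Patch positions over an image, counting an extra border patch per axis."""
    h, w = size
    ny = math.ceil((h - patch) / stride) + 1 if h > patch else 1
    nx = math.ceil((w - patch) / stride) + 1 if w > patch else 1
    return ny * nx

for size in [(384, 384), (768, 768), (1080, 1920)]:
    print(size, n_patches(size))
# (384, 384) 1
# (768, 768) 9
# (1080, 1920) 45
```

So each 1920x1080 frame needs on the order of 45 network forward passes instead of 1, before any merging cost, which matches the hours-vs-seconds difference observed.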
One solution I could come up with is to simply throw these clearly bad frames away using some sort of heuristic that takes previous frames into account, but that seems very brittle.
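One way such a heuristic might look, sketched under stated assumptions: instead of previous frames, this version flags a frame when its merged high-res depth disagrees with the (upsampled) low-res prior; `frame_looks_bad`, the correlation measure, and the 0.8 threshold are all hypothetical choices for illustration, and a temporal variant would compare consecutive frames the same way.

```python
import numpy as np

def frame_looks_bad(merged_depth, lowres_depth, thresh=0.8):
    """Flag a frame whose merged high-res depth disagrees with the low-res prior.

    Both maps are standardised before comparing, because monocular depth is
    only defined up to scale and shift. `thresh` is a hypothetical cut-off.
    """
    H, W = merged_depth.shape
    h, w = lowres_depth.shape
    # Nearest-neighbour upsample of the low-res map to the merged resolution.
    up = lowres_depth[(np.arange(H) * h // H)[:, None],
                      (np.arange(W) * w // W)[None, :]]
    a = (merged_depth - merged_depth.mean()) / (merged_depth.std() + 1e-8)
    b = (up - up.mean()) / (up.std() + 1e-8)
    corr = float((a * b).mean())  # Pearson correlation of the two maps
    return corr < thresh
```

A frame where the merge broke down (e.g. patches fighting over the white wall) would correlate poorly with the low-res prior and get dropped, at the cost of also dropping frames where the high-res map is a genuine improvement — which is why this stays brittle.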