Thank you for sharing this code. Great work! I have reached the final step in the pipeline, where the matched segmentation masks are used to train the instance NeRF field. However, running main_nerf_mask.py fails with the error below at this point: https://github.com/zymk9/torch-ngp/blob/6be6af198f1092e8d75574727a030ae15e199fe8/nerf/utils.py#L1312. I have followed all the previous steps in the README.
0% 0/161 [00:00<?, ?it/s]../aten/src/ATen/native/cuda/Loss.cu:176: nll_loss_forward_no_reduce_cuda_kernel: block: [3,0,0], thread: [192,0,0] Assertion `cur_target >= 0 && cur_target < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:176: nll_loss_forward_no_reduce_cuda_kernel: block: [3,0,0], thread: [193,0,0] Assertion `cur_target >= 0 && cur_target < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:176: nll_loss_forward_no_reduce_cuda_kernel: block: [3,0,0], thread: [194,0,0] Assertion `cur_target >= 0 && cur_target < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:176: nll_loss_forward_no_reduce_cuda_kernel: block: [3,0,0], thread: [195,0,0] Assertion `cur_target >= 0 && cur_target < n_classes` failed.
.....
.....
.....
../aten/src/ATen/native/cuda/Loss.cu:176: nll_loss_forward_no_reduce_cuda_kernel: block: [1,0,0], thread: [829,0,0] Assertion `cur_target >= 0 && cur_target < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:176: nll_loss_forward_no_reduce_cuda_kernel: block: [1,0,0], thread: [830,0,0] Assertion `cur_target >= 0 && cur_target < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:176: nll_loss_forward_no_reduce_cuda_kernel: block: [1,0,0], thread: [831,0,0] Assertion `cur_target >= 0 && cur_target < n_classes` failed.
Traceback (most recent call last):
File "main_nerf_mask.py", line 220, in <module>
trainer.train(train_loader, valid_loader, max_epoch)
File "/scratch/vv15/dec_5_instance_nerf/instance_nerf/nerf/utils.py", line 716, in train
self.train_one_epoch(train_loader)
File "/scratch/vv15/dec_5_instance_nerf/instance_nerf/nerf/utils.py", line 932, in train_one_epoch
preds, truths, loss = self.train_step(data)
File "/scratch/vv15/dec_5_instance_nerf/instance_nerf/nerf/utils.py", line 1320, in train_step
loss = self.criterion(pred_masks_labeled, gt_masks_labeled) # [B*N], loss fn with reduction='none'
File "/home/vv15/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/vv15/.local/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1174, in forward
return F.cross_entropy(input, target, weight=self.weight,
File "/home/vv15/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 3029, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
When I set the loss to 0 (that is, comment out the cross-entropy lines), the NeRF training progresses without issue.
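For reference, the assertion in the log requires every target label to lie in [0, n_classes). Below is a minimal sketch of a CPU-side check for that condition. The helper name out_of_range_labels and the sample values are hypothetical and not part of the repository; I am also assuming the prediction tensor has shape [B*N, n_classes].

```python
from collections import Counter


def out_of_range_labels(labels, n_classes):
    """Count target labels that fall outside the valid range [0, n_classes)."""
    bad = Counter(int(v) for v in labels if not 0 <= int(v) < n_classes)
    return dict(bad)


# Hypothetical label values, for illustration only.
sample_labels = [0, 1, 1, 2, 3, -1]
print(out_of_range_labels(sample_labels, n_classes=2))

# With the tensors from the traceback, the call would look roughly like this
# (assumes pred_masks_labeled is [B*N, n_classes]):
#   out_of_range_labels(gt_masks_labeled.flatten().tolist(),
#                       pred_masks_labeled.shape[1])
```

A non-empty result would mean the ground-truth masks contain labels that the classifier head has no output for, which is the condition the CUDA kernel asserts on. Setting the environment variable CUDA_LAUNCH_BLOCKING=1 should also make the error surface at the exact failing call.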
Please let me know if you have any solution for the CUDA error. Thanks in advance.
Before this error, I also ran into the issue that num_instances was not found in the transforms.json file that came with the NeRF data at this point in the code: https://github.com/zymk9/torch-ngp/blob/6be6af198f1092e8d75574727a030ae15e199fe8/nerf/provider.py#L429. Therefore, I temporarily removed all references to num_instances, assuming that it would take a default value of 2 from the constructors. I am not sure if this could be the cause here.
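Instead of removing the references, a fallback along the following lines might be safer. This is only a sketch under assumptions: the helper resolve_num_instances is hypothetical, and I am assuming that mask labels are contiguous integers starting at 0, so that the class count can be inferred as the largest label plus one. I have not confirmed that this matches how the repository counts instances.

```python
import json


def resolve_num_instances(transforms, mask_labels=None):
    """Return num_instances from the transforms data, or infer it from mask labels.

    Assumption: labels are contiguous integers starting at 0, so the number of
    classes is max(label) + 1. Raises instead of silently using a default.
    """
    if "num_instances" in transforms:
        return int(transforms["num_instances"])
    if mask_labels:
        return int(max(mask_labels)) + 1
    raise KeyError("num_instances is missing and no mask labels were given")


# Hypothetical transforms.json content without the num_instances key.
transforms = json.loads('{"frames": []}')
print(resolve_num_instances(transforms, mask_labels=[0, 1, 3]))
```

If the inferred value is larger than 2, the default from the constructors would be too small for the masks, which would be consistent with the assertion failure above.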