Self-Supervised Monocular Depth and Motion Learning in Dynamic Scenes: Semantic Prior to Rescue
References
Bangunharcana, A., Cho, J.W., Lee, S., Kweon, I.S., Kim, K.S., & Kim, S. (2021). Correlate-and-excite: Real-time stereo matching via guided cost volume excitation. In: IROS.
Behley, J., Garbade, M., Milioto, A., Quenzel, J., Behnke, S., Stachniss, C., & Gall, J. (2019). SemanticKITTI: A dataset for semantic scene understanding of LiDAR sequences. In ICCV.
Bian, J. W., Li, Z., Wang, N., Zhan, H., Shen, C., Cheng, M. M., & Reid, I. (2019). Unsupervised scale-consistent depth and ego-motion learning from monocular video. In NeurIPS.
Bian, J. W., Zhan, H., Wang, N., Li, Z., Zhang, L., Shen, C., Cheng, M. M., & Reid, I. (2021). Unsupervised scale-consistent depth learning from video. International Journal of Computer Vision (IJCV).
Cao, Z., Kar, A., Hane, C., & Malik, J. (2019). Learning independent object motion from unlabelled stereoscopic videos. In CVPR.
Casser, V., Pirk, S., Mahjourian, R., & Angelova, A. (2019). Depth prediction without the sensors: Leveraging structure for unsupervised learning from monocular videos. In AAAI.
Casser, V., Pirk, S., Mahjourian, R., & Angelova, A. (2019). Unsupervised monocular depth and ego-motion learning with structure and semantics. In CVPR workshops.
Chen, P. Y., Liu, A. H., Liu, Y. C., & Wang, Y.C.F. (2019). Towards scene understanding: Unsupervised monocular depth estimation with semantic-aware representation. In CVPR.
Chen, Y., Schmid, C., & Sminchisescu, C. (2019). Self-supervised learning with geometric constraints in monocular video: Connecting flow, depth, and camera. In ICCV.
Cheng, B., Saggu, I. S., Shah, R., Bansal, G., & Bharadia, D. (2020). S³Net: Semantic-aware self-supervised depth estimation with monocular videos and synthetic data. In ECCV.
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2016). The Cityscapes dataset for semantic urban scene understanding. In CVPR.
Dai, Q., Patil, V., Hecker, S., Dai, D., Van Gool, L., & Schindler, K. (2020). Self-supervised object motion and depth estimation from video. In CVPR workshops.
Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., Van Der Smagt, P., Cremers, D., & Brox, T. (2015). FlowNet: Learning optical flow with convolutional networks. In ICCV.
Eigen, D., Puhrsch, C., & Fergus, R. (2014). Depth map prediction from a single image using a multi-scale deep network. In NIPS.
Garg, R., BG, V. K., Carneiro, G., & Reid, I. (2016). Unsupervised CNN for single view depth estimation: Geometry to the rescue. In ECCV.
Geiger, A., Lauer, M., Wojek, C., Stiller, C., & Urtasun, R. (2014). 3D traffic scene understanding from movable platforms. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI).
Geiger, A., Lenz, P., & Urtasun, R. (2012). Are we ready for autonomous driving? The KITTI vision benchmark suite. In CVPR.
Godard, C., Mac Aodha, O., & Brostow, G.J. (2017). Unsupervised monocular depth estimation with left-right consistency. In CVPR.
Godard, C., Mac Aodha, O., Firman, M., & Brostow, G.J. (2019). Digging into self-supervised monocular depth estimation. In ICCV.
Gordon, A., Li, H., Jonschkowski, R., & Angelova, A. (2019). Depth from videos in the wild: Unsupervised monocular depth learning from unknown cameras. In ICCV.
Guizilini, V., Ambrus, R., Pillai, S., Raventos, A., & Gaidon, A. (2020). 3D packing for self-supervised monocular depth estimation. In CVPR.
Guizilini, V., Hou, R., Li, J., Ambrus, R., & Gaidon, A. (2020). Semantically-guided representation learning for self-supervised monocular depth. In ICLR.
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In ICCV.
Hur, J., & Roth, S. (2020). Self-supervised monocular scene flow estimation. In CVPR.
Jaderberg, M., Simonyan, K., & Zisserman, A., et al. (2015). Spatial transformer networks. In NIPS.
Janai, J., Guney, F., Ranjan, A., Black, M., & Geiger, A. (2018). Unsupervised learning of multi-frame optical flow with occlusions. In ECCV.
Kingma, D.P., & Ba, J. (2015). Adam: A method for stochastic optimization. In ICLR.
Klingner, M., Termöhlen, J. A., Mikolajczyk, J., & Fingscheidt, T. (2020). Self-supervised monocular depth estimation: Solving the dynamic object problem by semantic guidance. In ECCV.
Lee, S., Im, S., Lin, S., & Kweon, I.S. (2019). Learning residual flow as dynamic motion from stereo videos. In IROS.
Lee, S., Im, S., Lin, S., & Kweon, I. S. (2021). Learning monocular depth in dynamic scenes via instance-aware projection consistency. In AAAI.
Lee, S., Kim, J., Oh, T. H., Jeong, Y., Yoo, D., Lin, S., & Kweon, I. S. (2019). Visuomotor understanding for representation learning of driving scenes. In BMVC.
Lee, S., Rameau, F., Pan, F., & Kweon, I. S. (2021). Attentive and contrastive learning for joint depth and motion field estimation. In ICCV.
Li, H., Gordon, A., Zhao, H., Casser, V., & Angelova, A. (2020). Unsupervised monocular depth learning in dynamic scenes. In CoRL.
Liu, P., Lyu, M., King, I., & Xu, J. (2019). SelFlow: Self-supervised learning of optical flow. In CVPR.
Liu, S., Qi, L., Qin, H., Shi, J., & Jia, J. (2018). Path aggregation network for instance segmentation. In CVPR.
Luo, C., Yang, Z., Wang, P., Wang, Y., Xu, W., Nevatia, R., & Yuille, A. (2019). Every pixel counts++: Joint learning of geometry and motion with 3d holistic understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI).
Lv, Z., Kim, K., Troccoli, A., Sun, D., Rehg, J. M., & Kautz, J. (2018). Learning rigidity in dynamic scenes with a moving camera for 3D motion field estimation. In ECCV.
Mahjourian, R., Wicke, M., & Angelova, A. (2018). Unsupervised learning of depth and ego-motion from monocular video using 3D geometric constraints. In CVPR.
Mayer, N., Ilg, E., Hausser, P., Fischer, P., Cremers, D., Dosovitskiy, A., & Brox, T. (2016). A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In CVPR.
Meister, S., Hur, J., & Roth, S. (2018). UnFlow: Unsupervised learning of optical flow with a bidirectional census loss. In AAAI.
Ošep, A., Mehner, W., Mathias, M., & Leibe, B. (2017). Combined image- and world-space tracking in traffic scenes. In ICRA.
Ošep, A., Mehner, W., Voigtlaender, P., & Leibe, B. (2018). Track, then decide: Category-agnostic vision-based multi-object tracking. In ICRA.
Otsu, N. (1979). A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9(1), 62–66.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., & Antiga, L., et al. (2019). PyTorch: An imperative style, high-performance deep learning library. In NeurIPS.
Pillai, S., Ambruş, R., & Gaidon, A. (2019). SuperDepth: Self-supervised, super-resolved monocular depth estimation. In ICRA.
Ranjan, A., Jampani, V., Kim, K., Sun, D., Wulff, J., & Black, M. J. (2019). Competitive collaboration: Joint unsupervised learning of depth, camera motion, optical flow and motion segmentation. In CVPR.
Shashua, A., Gdalyahu, Y., & Hayun, G. (2004). Pedestrian detection for driving assistance systems: Single-frame classification and system level performance. In IEEE intelligent vehicles symposium.
Shin, K., Kwon, Y. P., & Tomizuka, M. (2019). RoarNet: A robust 3D object detection based on region approximation refinement. In IEEE Intelligent Vehicles Symposium (IV).
Sun, D., Yang, X., Liu, M. Y., & Kautz, J. (2018). PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In CVPR.
Voigtlaender, P., Krause, M., Osep, A., Luiten, J., Sekar, B. B. G., Geiger, A., & Leibe, B. (2019). MOTS: Multi-object tracking and segmentation. In CVPR.
Wang, C., Buenaposada, J. M., Zhu, R., & Lucey, S. (2018). Learning depth from monocular videos using direct methods. In CVPR.
Wang, Y., Wang, P., Yang, Z., Luo, C., Yang, Y., & Xu, W. (2019). UnOS: Unified unsupervised optical-flow and stereo-depth estimation by watching videos. In CVPR.
Wang, Y., Yang, Y., Yang, Z., Zhao, L., Wang, P., & Xu, W. (2018). Occlusion aware unsupervised learning of optical flow. In CVPR.
Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing (TIP).
Yang, Z., Wang, P., Wang, Y., Xu, W., & Nevatia, R. (2018). LEGO: Learning edge with geometry all at once by watching videos. In CVPR.
Yin, Z., & Shi, J. (2018). GeoNet: Unsupervised learning of dense depth, optical flow and camera pose. In CVPR.
Zhang, C., Benz, P., Argaw, D. M., Lee, S., Kim, J., Rameau, F., Bazin, J. C., & Kweon, I. S. (2021). ResNet or DenseNet? Introducing dense shortcuts to ResNet. In WACV.
Zhang, C., Rameau, F., Lee, S., Kim, J., Benz, P., Argaw, D. M., Bazin, J. C., & Kweon, I. S. (2019). Revisiting residual networks with nonlinear shortcuts. In BMVC.
Zhou, T., Brown, M., Snavely, N., & Lowe, D. G. (2017). Unsupervised learning of depth and ego-motion from video. In CVPR.