UST-SU: a U-shaped video prediction network based on partial autoregression (original) (raw)

References

  1. Kalbkhani H, Shayesteh MG, Haghighat N (2017) Adaptive LSTAR model for long-range variable bit rate video traffic prediction. IEEE Trans Multimed 19(5):999–1014. https://doi.org/10.1109/TMM.2016.2639379
    Article Google Scholar
  2. Su Z, Liu T, Hao X, Hu X (2023) Spatial-temporal graph convolutional networks for traffic flow prediction considering multiple traffic parameters. J Supercomput 79(16):18293–18312
    Article Google Scholar
  3. Shi X, Chen Z, Wang H, Yeung D.-Y, Wong W.-K, Woo W.-c (2015) Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In Advances in Neural Information Processing Systems 28
  4. Requena-Mesa C, Benson V, Reichstein M, Runge J, Denzler J (2021). Earthnet2021: a large-scale dataset and challenge for earth surface forecasting as a guided video prediction task. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 1132–1142
  5. Li D, Yao T, Duan L-Y, Mei T, Rui Y (2019) Unified spatio-temporal attention networks for action recognition in videos. IEEE Trans Multimed 21(2):416–428. https://doi.org/10.1109/TMM.2018.2862341
    Article Google Scholar
  6. Kocabas M, Athanasiou N, Black MJ (2020) Vibe: video inference for human body pose and shape estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 5253–5263
  7. Liao X, Yuan J, Cai Z, Lai Jh (2023) An attention-based bidirectional GRU network for temporal action proposals generation. J Supercomput 79(8):8322–8339
    Article Google Scholar
  8. Sun P, Kretzschmar H, Dotiwalla X, Chouard A, Patnaik V, Tsui P, Guo J, Zhou Y, Chai Y, Caine B, et al (2020) Scalability in perception for autonomous driving: Waymo open dataset. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 2446–2454
  9. Kumar VR, Yogamani S, Rashed H, Sitsu G, Witt C, Leang I, Milz S, Mäder P (2021) Omnidet: surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot Autom Lett 6(2):2830–2837
    Article Google Scholar
  10. Wu H, Yao Z, Wang J, Long M (2021) Motionrnn: a flexible model for video prediction with spacetime-varying motions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 15435–15444
  11. Wang Y, Gao Z, Long M, Wang J, Philip SY (2018) Predrnn++: towards a resolution of the deep-in-time dilemma in spatiotemporal predictive learning. In: International Conference on Machine Learning, pp 5123–5132
  12. Wang Y, Wu H, Zhang J, Gao Z, Wang J, Philip SY, Long M (2022) Predrnn: a recurrent neural network for spatiotemporal predictive learning. IEEE Trans Pattern Anal Mach Intell 45(2):2208–2225
    Article Google Scholar
  13. Lin Z, Li M, Zheng Z, Cheng Y, Yuan C (2020) Self-attention ConvLSTM for spatiotemporal prediction. Proc AAAI Conf Artif Intell 34:11531–11538
    Google Scholar
  14. Wang Y, Jiang L, Yang M-H, Li L-J, Long M, Fei-Fei L (2019) Eidetic 3D LSTM: a model for video prediction and beyond. In: International Conference on Learning Representations
  15. Chang Z, Zhang X, Wang S, Ma S, Ye Y, Xinguang X, Gao W (2021) Mau: a motion-aware unit for video prediction and beyond. Adv Neural Inf Process Syst 34:26950–26962
    Google Scholar
  16. Chang Z, Zhang X, Wang S, Ma S, Gao W (2022) STAM: a spatiotemporal attention based memory for video prediction. IEEE Trans Multimed 25:2354
    Article Google Scholar
  17. Lee S, Kim HG, Choi DH, Kim H-I, Ro YM (2021) Video prediction recalling long-term motion context via memory alignment learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 3054–3063
  18. Wang Y, Long M, Wang J, Gao Z, Yu PS (2017) PredRNN: recurrent neural networks for predictive learning using spatiotemporal LSTMS. In: Advances in Neural Information Processing Systems 30
  19. Ye X, Bilodeau G-A (2022) VPTR: efficient transformers for video prediction. In: 2022 26th International Conference on Pattern Recognition (ICPR), pp 3492–3499. IEEE
  20. Gao Z, Tan C, Wu L, Li SZ (2022) SimVP: simpler yet better video prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 3170–3180
  21. Wang Y, Zhang J, Zhu H, Long M, Wang J, Yu PS (2019) Memory in memory: a predictive neural network for learning higher-order non-stationarity from spatiotemporal dynamics. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 9154–9162
  22. Gao H, Xu H, Cai Q.-Z, Wang R, Yu F, Darrell T (2019) Disentangling propagation and generation for video prediction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 9006–9015
  23. Mathai M (2024) Deep learning-based video prediction. Doctorate dissertation, Santa Clara University. https://scholarcommons.scu.edu/eng_phd_theses/53
  24. Chang Z, Zhang X, Wang S, Ma S, Gao W (2022) STRPM: a spatiotemporal residual predictive model for high-resolution video prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 13946–13955
  25. Sun M, Wang W, Zhu X, Liu J (2023) Moso: decomposing motion, scene and object for video prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 18727–18737
  26. Yu W, Lu,Y, Easterbrook S, Fidler S (2019) Crevnet: conditionally reversible video prediction. arXiv preprint arXiv:1910.11577
  27. Gupta A, Tian S, Zhang Y, Wu J, Martín-Martín R, Fei-Fei L (2022) Maskvit: masked visual pre-training for video prediction. arXiv preprint arXiv:2206.11894
  28. Akan AK, Erdem E, Erdem A, Güney F (2021) SLAMP: stochastic latent appearance and motion prediction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 14728–14737
  29. Denton E, Fergus R (2018) Stochastic video generation with a learned prior. In: International Conference on Machine Learning, pp 1174–1183. PMLR
  30. Babaeizadeh M, Finn C, Erhan D, Campbell R.H, Levine S (2017) Stochastic variational video prediction. arXiv preprint arXiv:1710.11252
  31. Zhang Z, Hu J, Cheng W, Paudel D, Yang J (2024) Extdm: distribution extrapolation diffusion model for video prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 19310–19320
  32. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III 18, pp 234–241. Springer
  33. Zhou Z, Rahman Siddiquee M.M, Tajbakhsh N, Liang J (2018) Unet++: a nested u-net architecture for medical image segmentation. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 20, 2018, Proceedings 4, pp 3–11. Springer
  34. Huang H, Lin L, Tong R, Hu H, Zhang Q, Iwamoto Y, Han X, Chen Y-W, Wu J (2020). Unet 3+: A full-scale connected Unet for medical image segmentation. In: ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 1055–1059. IEEE
  35. Alom MZ, Yakopcic C, Hasan M, Taha TM, Asari VK (2019) Recurrent residual u-net for medical image segmentation. J Med Imaging 6(1):014006–014006
    Article Google Scholar
  36. Oktay O, Schlemper J, Folgoc L.L, Lee M, Heinrich M, Misawa K, Mori K, McDonagh S, Hammerla NY, Kainz B et al (2018) Attention u-net: learning where to look for the pancreas. arXiv preprint arXiv:1804.03999
  37. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 770–778
  38. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
  39. Bengio S, Vinyals O, Jaitly N, Shazeer N (2015) Scheduled sampling for sequence prediction with recurrent neural networks. In: Advances in Neural Information Processing Systems 28
  40. Zhang J, Zheng Y, Qi D (2017) Deep spatio-temporal residual networks for citywide crowd flows prediction. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 31
  41. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612
    Article Google Scholar
  42. Unterthiner T, Van Steenkiste S, Kurach K, Marinier R, Michalski M, Gelly S (2018). Towards accurate generative models of video: a new metric & challenges. arXiv preprint arXiv:1812.01717
  43. Srivastava N, Mansimov E, Salakhudinov R (2015) Unsupervised learning of video representations using LSTMS. In: International Conference on Machine Learning, pp 843–852
  44. Zhang J, Zheng Y, Qi D (2017) Deep spatio-temporal residual networks for citywide crowd flows prediction. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 31
  45. Schuldt C, Laptev I, Caputo B (2004) Recognizing human actions: a local SVM approach. In: Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004, vol 3, pp 32–36. IEEE
  46. Ionescu C, Papava D, Olaru V, Sminchisescu C (2013) Human3. 6m: large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Trans Pattern Anal Mach Intell 36(7):1325–1339
  47. Geiger A, Lenz P, Stiller C, Urtasun R (2013) Vision meets robotics: the kitti dataset. Int J Robot Res 32(11):1231–1237
    Article Google Scholar
  48. Villegas R, Yang J, Hong S, Lin X, Lee H (2017) Decomposing motion and content for natural video sequence prediction. arXiv preprint arXiv:1706.08033
  49. Tang S, Li C, Zhang P, Tang R (2023) Swinlstm: improving spatiotemporal prediction accuracy using swin transformer and LSTM. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 13470–13479
  50. Tan C, Gao Z, Wu L, Xu Y, Xia J, Li S, Li S.Z (2023) Temporal attention unit: towards efficient spatiotemporal predictive learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 18770–18782
  51. Yao Z, Wang Y, Wu H, Wang J, Long M (2023) Modernn: harnessing spatiotemporal mode collapse in unsupervised predictive learning. IEEE Trans Pattern Anal Mach Intell 45(11):13281–13296
    Google Scholar
  52. Tang Y, Dong P, Tang Z, Chu X, Liang J (2024) VMRNN: integrating vision mamba and LSTM for efficient and accurate spatiotemporal forecasting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 5663–5673
  53. Rusch TK, Chamberlain BP, Mahoney MW, Bronstein MM, Mishra S (2022) Gradient gating for deep multi-rate learning on graphs. arXiv preprint arXiv:2210.00513

Download references