Action Recognition Using Multiple Pooling Strategies of CNN Features

References

  1. Yamato J, Ohya J, Ishii K (1992) Recognizing human actions in time-sequential images using hidden Markov model. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 379–385
  2. Laptev I (2005) On space-time interest points. Int J Comput Vis 64:107–123
  3. Niebles J, Wang H, Fei-Fei L (2008) Unsupervised learning of human action categories using spatial temporal words. Int J Comput Vis 79:299–318
  4. Efros A, Berg A, Mori G, Malik J (2003) Recognizing action at a distance. In: Proceedings of IEEE conference on computer vision, pp 726–733
  5. Wang H, Klaser A, Schmid C, Liu CL (2013) Dense trajectories and motion boundary descriptors for action recognition. Int J Comput Vis 103:60–79
  6. Yeffet L, Wolf L (2009) Local trinary patterns for human action recognition. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 492–497
  7. Ojala T, Pietikainen M, Maenpaa T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987
  8. Kliper-Gross O, Gurovich Y, Hassner T (2012) Motion interchange patterns for action recognition in unconstrained videos. In: European conference on computer vision, pp 256–269
  9. Tao D, Guo Y, Li Y, Gao X (2018) Tensor rank preserving discriminant analysis for facial recognition. IEEE Trans Image Process 27(1):325–334
  10. Tao D, Cheng J, Song M, Lin X (2016) Manifold ranking-based matrix factorization for saliency detection. IEEE Trans Neural Netw Learn Syst 27(6):1122–1134
  11. Wang H, Klaser A, Schmid C, Liu CL (2011) Action recognition by dense trajectories. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 3169–3176
  12. Wang H, Schmid C (2013) Action recognition with improved trajectories. In: Proceedings of IEEE conference on computer vision, pp 3551–3558
  13. Jain M, Jegou H, Bouthemy P (2013) Better exploiting motion for better action recognition. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 2555–2562
  14. Ramana Murthy OV, Goecke R (2013) Ordered trajectories for large scale human action recognition. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 412–419
  15. Jiang YG, Dai Q, Liu W, Xue XY, Ngo CW (2015) Human action recognition in unconstrained videos by explicit motion modeling. IEEE Trans Image Process 24(11):3781–3795
  16. Seo JJ, Son J, Kim H, Neve WD, Ro YM (2015) Efficient and effective human action recognition in video through motion boundary description with a compact set of trajectories. In: Proceedings of IEEE conference on automatic face and gesture recognition, pp 1–6
  17. Yang X, Liu W, Tao D, Cheng J (2017) Canonical correlation analysis networks for two-view image recognition. Inf Sci 385:338–352
  18. Yu Z, Yu J, Fan J, Tao D (2017) Multi-modal factorized bilinear pooling with co-attention learning for visual question answering. In: Proceedings of IEEE international conference on computer vision, pp 1839–1848
  19. Hong C, Jun Y, Wan J, Tao D, Wang M (2015) Multimodal deep autoencoder for human pose recovery. IEEE Trans Image Process 24(12):5659–5670
  20. Jun Y, Hong C, Rui Y, Tao D (2018) Multitask autoencoder model for recovering human poses. IEEE Trans Ind Electron 65(6):5060–5068
  21. Ji S, Yang M, Yu K (2013) 3D convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231
  22. Gkioxari G, Girshick R, Malik J (2015) Contextual action recognition with R*CNN. In: Proceedings of IEEE conference on computer vision, pp 1080–1088
  23. Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. Adv Neural Inf Process Syst 1(4):568–576
  24. Veeriah V, Zhuang N, Qi GJ (2015) Differential recurrent neural networks for action recognition. In: Proceedings of IEEE conference on computer vision, pp 4041–4049
  25. Sun L, Jia K, Yeung D-Y, Shi BE (2015) Human action recognition using factorized spatio-temporal convolutional networks. In: Proceedings of IEEE conference on computer vision, pp 4597–4605
  26. Wang LM, Qiao Y, Tang XO (2015) Action recognition with trajectory-pooled deep-convolutional descriptors. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 4305–4314
  27. Jegou H, Perronnin F, Douze M, Sanchez J (2012) Aggregating local image descriptors into compact codes. IEEE Trans Pattern Anal Mach Intell 34(9):1704–1716
  28. Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 2169–2178
  29. Dollar P, Rabaud V, Cottrell G, Belongie S (2006) Behavior recognition via sparse spatio-temporal features. In: Joint IEEE international workshop on visual surveillance and performance evaluation of tracking and surveillance, pp 65–72
  30. Sivic J, Zisserman A (2003) Video Google: a text retrieval approach to object matching in videos. In: Proceedings of IEEE conference on computer vision, pp 1470–1477
  31. Sanchez J, Perronnin F, Mensink T, Verbeek J (2013) Image classification with the fisher vector: theory and practice. Int J Comput Vis 105:222–245
  32. Jegou H, Douze M, Schmid C, Perez P (2010) Aggregating local descriptors into a compact image representation. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 3304–3311
  33. Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Li FF (2014) Large-scale video classification with convolutional neural networks. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 1725–1732
  34. Gong Y, Wang L, Guo R, Lazebnik S (2014) Multi-scale orderless pooling of deep convolutional activation features. In: European conference on computer vision, pp 392–407
  35. He KM, Zhang XY, Ren SQ, Sun J (2014) Spatial pyramid pooling in deep convolutional networks for visual recognition. In: European conference on computer vision, pp 346–361
  36. Yoo D, Park S, Lee JY, Kweon IS (2015) Multi-scale pyramid pooling for deep convolutional representation. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 71–80
  37. Ryoo MS, Rothrock B, Matthies L (2015) Pooled motion features for first-person videos. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 896–904
  38. Choi J, Jeon WJ, Lee SC (2008) Spatio-temporal pyramid matching for sports videos. In: ACM international conference on multimedia information retrieval, pp 291–297
  39. Zhang XJ, Zhang H, Cao XC (2012) Action recognition based on spatial-temporal pyramid sparse coding. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 1455–1458
  40. Liu J, Luo J, Shah M (2009) Recognizing realistic actions from videos in the wild. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 1996–2003
  41. Reddy KK, Shah M (2013) Recognizing 50 human action categories of web videos. Mach Vis Appl 24:971–981
  42. Soomro K, Zamir AR, Shah M (2012) UCF101: a dataset of 101 human action classes from videos in the wild. arXiv:1212.0402
  43. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick RB, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. CoRR arXiv:1408.5093
  44. Jones S, Ling S (2014) A multigraph representation for improved unsupervised/semi-supervised learning of human actions. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 23–28
  45. Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15:1373–1396
  46. Ciptadi A, Goodwin MS, Rehg JM (2014) Movement pattern histogram for action recognition and retrieval. In: European conference on computer vision, pp 695–710
