Action Recognition Using Multiple Pooling Strategies of CNN Features
References
Yamato J, Ohya J, Ishii K (1992) Recognizing human actions in time-sequential images using hidden Markov model. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 379–385
Laptev I (2005) On space-time interest points. Int J Comput Vis 64:107–123
Niebles J, Wang H, Fei-Fei L (2008) Unsupervised learning of human action categories using spatial temporal words. Int J Comput Vis 79:299–318
Efros A, Berg A, Mori G, Malik J (2003) Recognizing action at a distance. In: Proceedings of IEEE conference on computer vision, pp 726–733
Wang H, Klaser A, Schmid C (2013) Dense trajectories and motion boundary descriptors for action recognition. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 60–79
Yeffet L, Wolf L (2009) Local trinary patterns for human action recognition. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 492–497
Ojala T, Pietikainen M, Maenpaa T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987
Kliper-Gross O, Gurovich Y, Hassner T (2012) Motion interchange patterns for action recognition in unconstrained videos. In: European conference on computer vision, pp 256–269
Tao D, Guo Y, Li Y, Gao X (2018) Tensor rank preserving discriminant analysis for facial recognition. IEEE Trans Image Process 27(1):325–334
Tao D, Cheng J, Song M, Lin X (2016) Manifold ranking-based matrix factorization for saliency detection. IEEE Trans Neural Netw Learn Syst 27(6):1122–1134
Wang H, Klaser A, Schmid C, Liu CL (2011) Action recognition by dense trajectories. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 3169–3176
Wang H, Schmid C (2013) Action recognition with improved trajectories. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 3551–3558
Jain M, Jegou H, Bouthemy P (2013) Better exploiting motion for better action recognition. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 2555–2562
Ramana Murthy OV, Goecke R (2013) Ordered trajectories for large scale human action recognition. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 412–419
Jiang YG, Dai Q, Liu W, Xue XY, Ngo CW (2015) Human action recognition in unconstrained videos by explicit motion modeling. IEEE Trans Image Process 24(11):3781–3795
Seo JJ, Son J, Kim H, Neve WD, Ro YM (2015) Efficient and effective human action recognition in video through motion boundary description with a compact set of trajectories. In: Proceedings of IEEE conference on automatic face and gesture recognition, pp 1–6
Yang X, Liu W, Tao D, Cheng J (2017) Canonical correlation analysis networks for two-view image recognition. Inf Sci 385:338–352
Yu Z, Yu J, Fan J, Tao D (2017) Multi-modal factorized bilinear pooling with co-attention learning for visual question answering. In: Proceedings of IEEE international conference on computer vision, pp 1839–1848
Hong C, Jun Y, Wan J, Tao D, Wang M (2015) Multimodal deep autoencoder for human pose recovery. IEEE Trans Image Process 24(12):5659–5670
Jun Y, Hong C, Rui Y, Tao D (2018) Multitask autoencoder model for recovering human poses. IEEE Trans Ind Electron 65(6):5060–5068
Ji S, Yang M, Yu K (2013) 3D convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231
Gkioxari G, Girshick R, Malik J (2015) Contextual action recognition with R*CNN. In: Vision, pp 1080–1088
Simonyan K, Zisserman A (2013) Two-stream convolutional networks for action recognition in videos. Adv Neural Inf Process Syst 1(4):568–576
Veeriah V, Zhuang N, Qi GJ (2015) Differential recurrent neural networks for action recognition. In: Proceedings of IEEE conference on computer vision, pp 4041–4049
Sun L, Jia K, Yeung D-Y, Shi BE (2015) Human action recognition using factorized spatio-temporal convolutional networks. In: Proceedings of IEEE conference on computer vision, pp 4597–4605
Wang LM, Qiao Y, Tang XO (2015) Action recognition with trajectory-pooled deep-convolutional descriptors. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 4305–4314
Jegou H, Perronnin F, Douze M, Sanchez J (2012) Aggregating local image descriptors into compact codes. IEEE Trans Pattern Anal Mach Intell 34(9):1704–1716
Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 2169–2178
Dollar P, Rabaud V, Cottrell G, Belongie S (2006) Behavior recognition via sparse spatio-temporal features. In: Joint IEEE international workshop on visual surveillance and performance evaluation of tracking and surveillance, pp 65–72
Sivic J, Zisserman A (2003) Video Google: a text retrieval approach to object matching in videos. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 1470–1477
Sanchez J, Perronnin F, Mensink T, Verbeek J (2013) Image classification with the Fisher vector: theory and practice. Int J Comput Vis 105:222–245
Jegou H, Douze M, Schmid C, Perez P (2010) Aggregating local descriptors into a compact image representation. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 3304–3311
Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Li FF (2014) Large-scale video classification with convolutional neural networks. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 1725–1732
Gong Y, Wang L, Guo R, Lazebnik S (2014) Multi-scale orderless pooling of deep convolutional activation features. In: European conference on computer vision, pp 392–407
He KM, Zhang XY, Ren SQ, Sun J (2014) Spatial pyramid pooling in deep convolutional networks for visual recognition. In: European conference on computer vision, pp 1904–1916
Yoo D, Park S, Lee JY, Kweon IS (2015) Multi-scale pyramid pooling for deep convolutional representation. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 71–80
Ryoo MS, Rothrock B, Matthies L (2015) Pooled motion features for first-person videos. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 896–904
Choi J, Jeon WJ, Lee SC (1992) Spatio-temporal pyramid matching for sports videos. In: ACM international conference on multimedia information retrieval, pp 379–385
Zhang XJ, Zhang H, Cao XC (2012) Action recognition based on spatial-temporal pyramid sparse coding. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 1455–1458
Liu J, Luo J, Shah M (2009) Recognizing realistic actions from videos in the wild. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 1996–2003
Reddy KK, Shah M (2013) Recognizing 50 human action categories of web videos. Mach Vis Appl 24:971–981
Soomro K, Zamir AR, Shah M (2012) UCF101: a dataset of 101 human action classes from videos in the wild. arXiv:1212.0402
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick RB, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. CoRR arXiv:1408.5093
Jones S, Ling S (2014) A multigraph representation for improved unsupervised/semi-supervised learning of human actions. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 23–28
Belkin M, Niyogi P (2002) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15:1373–1396
Ciptadi A, Goodwin MS, Rehg JM (2014) Movement pattern histogram for action recognition and retrieval. In: European conference on computer vision, pp 695–710