Transformer Tracker Based on Multi-level Residual Perception Structure
References
Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.: Fully-Convolutional Siamese Networks for Object Tracking, pp. 850–865. Springer (2016)
Li, B., Yan, J., Wu, W., Zhu, Z., Hu, X.: High Performance Visual Tracking with Siamese Region Proposal Network, pp. 8971–8980 (2018)
Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., Yan, J.: SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks, pp. 4282–4291 (2019)
Xu, Y., Wang, Z., Li, Z., Yuan, Y., Yu, G.: SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines, pp. 12549–12556 (2020)
Chen, X., Yan, B., Zhu, J., Wang, D., Yang, X., Lu, H.: Transformer Tracking, pp. 8126–8135 (2021)
Yan, B., Peng, H., Fu, J., Wang, D., Lu, H.: Learning Spatio-temporal Transformer for Visual Tracking, pp. 10448–10457 (2021)
Chen, B., et al.: Backbone is All You Need: A Simplified Architecture for Visual Object Tracking, pp. 375–392. Springer (2022)
Ye, B., Chang, H., Ma, B., Shan, S., Chen, X.: Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework, pp. 341–357. Springer (2022)
Vaswani, A., et al.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017)
Yuan, L., et al.: Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet, pp. 558–567 (2021)
Yue, X., et al.: Vision Transformer with Progressive Sampling, pp. 387–396 (2021)
Hatamizadeh, A., et al.: FasterViT: Fast Vision Transformers with Hierarchical Attention (2023)
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159 (2020)
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-End Object Detection with Transformers, pp. 213–229. Springer (2020)
Lv, W., et al.: DETRs beat YOLOs on real-time object detection (2023)
Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J.M., Luo, P.: SegFormer: simple and efficient design for semantic segmentation with transformers. Adv. Neural. Inf. Process. Syst. 34, 12077–12090 (2021)
Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Lin, L., Fan, H., Zhang, Z., Xu, Y., Ling, H.: SwinTrack: a simple and strong baseline for transformer tracking. Adv. Neural. Inf. Process. Syst. 35, 16743–16754 (2022)
Liu, Z., et al.: Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows, pp. 10012–10022 (2021)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017)
Guo, D., Wang, J., Cui, Y., Wang, Z., Chen, S.: SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking, pp. 6269–6277 (2020)
Chen, Z., et al.: SiamBAN: target-aware tracking with siamese box adaptive network. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learning for Image Recognition, pp. 770–778 (2016)
Xing, D., Evangeliou, N., Tsoukalas, A., Tzes, A.: Siamese Transformer Pyramid Networks for Real-Time UAV Tracking, pp. 2139–2148 (2022)
Law, H., Deng, J.: CornerNet: Detecting Objects as Paired Keypoints, pp. 734–750 (2018)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal Loss for Dense Object Detection, pp. 2980–2988 (2017)
Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., Savarese, S.: Generalized Intersection over Union: A Metric and a Loss for Bounding Box Regression, pp. 658–666 (2019)
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: A Large-Scale Hierarchical Image Database, pp. 248–255. IEEE (2009)
Huang, L., Zhao, X., Huang, K.: GOT-10k: a large high-diversity benchmark for generic object tracking in the wild. IEEE Trans. Pattern Anal. Mach. Intell. 43(5), 1562–1577 (2019)
Muller, M., Bibi, A., Giancola, S., Alsubaihi, S., Ghanem, B.: TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild, pp. 300–317 (2018)
Fan, H., et al.: LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking, pp. 5374–5383 (2019)
Lin, T.-Y., et al.: Microsoft COCO: Common Objects in Context, pp. 740–755. Springer (2014)
Mueller, M., Smith, N., Ghanem, B.: A Benchmark and Simulator for UAV Tracking, pp. 445–461. Springer (2016)
Danelljan, M., Bhat, G., Khan, F.S., Felsberg, M.: ATOM: Accurate Tracking by Overlap Maximization, pp. 4660–4669 (2019)
Bhat, G., Danelljan, M., Gool, L.V., Timofte, R.: Learning Discriminative Model Prediction for Tracking, pp. 6182–6191 (2019)
Wang, N., Zhou, W., Wang, J., Li, H.: Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking, pp. 1571–1580 (2021)
Cao, Z., Fu, C., Ye, J., Li, B., Li, Y.: SiamAPN++: Siamese Attentional Aggregation Network for Real-Time UAV Tracking, pp. 3086–3092. IEEE (2021)
Zheng, G., Fu, C., Ye, J., Li, B., Lu, G., Pan, J.: Scale-aware siamese object tracking for vision-based UAM approaching. IEEE Trans. Indust. Inf. (2022)
Wang, Q., Zhang, L., Bertinetto, L., Hu, W., Torr, P.H.: Fast Online Object Tracking and Segmentation: A Unifying Approach, pp. 1328–1338 (2019)