Cross-modal local fine-grained feature localization and alignment for text-to-image person re-identification

References

Aggarwal, S., Radhakrishnan, V.B., Chakraborty, A.: Text-based person search via attribute-aided matching. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2617–2625 (2020)
Bai, Y., Cao, M., Gao, D., et al.: RaSa: Relation and sensitivity aware representation learning for text-based person search. In: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 555–563 (2023)
Chen, Y., Huang, R., Chang, H., et al.: Cross-modal knowledge adaptation for language-based person search. IEEE Trans. Image Process. 30, 4057–4069 (2021)
Chen, Y., Zhang, G., Lu, Y., et al.: TIPCB: A simple but effective part-based convolutional baseline for text-based person search. Neurocomputing 494, 171–181 (2022)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT, 4171–4186 (2019)
Ding, Z., Ding, C., Shao, Z., et al.: Semantically self-aligned network for text-to-image part-aware person re-identification. arXiv preprint arXiv:2107.12666 (2021)
Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021)
Farooq, A., Awais, M., Kittler, J., et al.: AXM-Net: Implicit cross-modal feature alignment for person re-identification. In: Proceedings of the AAAI Conference on Artificial Intelligence, 4477–4485 (2022)
He, S., Luo, H., Jiang, W., et al.: VGSG: Vision-guided semantic-group network for text-based person search. IEEE Trans. Image Process. 33, 163–176 (2024)
Ji, Z., Hu, J., Liu, D., et al.: Asymmetric cross-scale alignment for text-based person search. IEEE Trans. Multimedia 25, 7699–7709 (2022)
Jiang, D., Ye, M.: Cross-modal implicit relation reasoning and aligning for text-to-image person retrieval. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2787–2797 (2023)
Jing, Y., Si, C., Wang, J., et al.: Pose-guided multi-granularity attention network for text-based person search. In: Proceedings of the AAAI Conference on Artificial Intelligence, 11189–11196 (2020)
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations (2015)
Lee, K.H., Chen, X., Hua, G., et al.: Stacked cross attention for image-text matching. In: Proceedings of the European Conference on Computer Vision (ECCV), 201–216 (2018)
Li, J., Li, D., Xiong, C., et al.: BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In: International Conference on Machine Learning, 12888–12900 (2022)
Li, S., Xiao, T., Li, H., et al.: Person search with natural language description. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1970–1979 (2017)
Li, S., Xu, X., Yang, Y., et al.: DCEL: Deep cross-modal evidential learning for text-based person retrieval. In: Proceedings of the 31st ACM International Conference on Multimedia, 6292–6300 (2023)
Lin, D., Peng, Y., Meng, J., et al.: Cross-modal adaptive dual association for text-to-image person retrieval. IEEE Trans. Multimedia 26, 6609–6620 (2024)
Liu, J., Zha, Z.J., Hong, R., et al.: Deep adversarial graph attention convolution network for text-based person search. In: Proceedings of the 27th ACM International Conference on Multimedia, 665–673 (2019)
Luo, Y., Yang, Y.: Large language model and domain-specific model collaboration for smart education. Frontiers of Information Technology & Electronic Engineering 25(3), 333–341 (2024)
Luo, Y., Liu, P., Guan, T., et al.: Significance-aware information bottleneck for domain adaptive semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 6778–6787 (2019a)
Luo, Y., Zheng, L., Guan, T., et al.: Taking a closer look at domain shift: Category-level adversaries for semantics consistent domain adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2507–2516 (2019b)
Luo, Y., Liu, P., Zheng, L., et al.: Category-level adversarial adaptation for semantic segmentation using purified features. IEEE Trans. Pattern Anal. Mach. Intell. 44(8), 3940–3956 (2021)
Luo, Y., Liu, P., Yang, Y.: Kill two birds with one stone: Domain generalization for semantic segmentation via network pruning. Int. J. Comput. Vision 133(1), 335–352 (2025)
Ma, Y., Sun, X., Ji, J., et al.: BEAT: Bi-directional one-to-many embedding alignment for text-based person retrieval. In: Proceedings of the 31st ACM International Conference on Multimedia, 4157–4168 (2023)
Niu, K., Huang, Y., Ouyang, W., et al.: Improving description-based person re-identification by multi-granularity image-text alignments. IEEE Trans. Image Process. 29, 5542–5556 (2020)
Qin, Y., Chen, Y., Peng, D., et al.: Noisy-correspondence learning for text-to-image person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 27197–27206 (2024)
Sarafianos, N., Xu, X., Kakadiaris, I.A.: Adversarial representation learning for text-to-image matching. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 5814–5824 (2019)
Shao, Z., Zhang, X., Fang, M., et al.: Learning granularity-unified representations for text-to-image person re-identification. In: Proceedings of the 30th ACM International Conference on Multimedia, 5566–5574 (2022)
Shao, Z., Zhang, X., Ding, C., et al.: Unified pre-training with pseudo texts for text-to-image person re-identification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 11174–11184 (2023)
Shen, F., Shu, X., Du, X., et al.: Pedestrian-specific bipartite-aware similarity learning for text-based person retrieval. In: Proceedings of the 31st ACM International Conference on Multimedia, 8922–8931 (2023)
Suo, W., Sun, M., Niu, K., et al.: A simple and robust correlation filtering method for text-based person search. In: European Conference on Computer Vision, Springer, 726–742 (2022)
Tan, W., Ding, C., Jiang, J., et al.: Harnessing the power of MLLMs for transferable text-to-image person ReID. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 17127–17137 (2024)
Wang, C., Luo, Z., Lin, Y., et al.: Text-based person search via multi-granularity embedding learning. In: Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 1068–1074 (2021)
Wang, Z., Fang, Z., Wang, J., et al.: ViTAA: Visual-textual attributes alignment in person search by natural language. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XII, Springer, 402–420 (2020)
Wang, Z., Zhu, A., Xue, J., et al.: CAIBC: Capturing all-round information beyond color for text-based person retrieval. In: Proceedings of the 30th ACM International Conference on Multimedia, 5314–5322 (2022a)
Wang, Z., Zhu, A., Xue, J., et al.: Look before you leap: Improving text-based person retrieval by learning a consistent cross-modal common manifold. In: Proceedings of the 30th ACM International Conference on Multimedia, 1984–1992 (2022b)
Wu, Y., Yan, Z., Han, X., et al.: LapsCore: Language-guided person search via color reasoning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 1624–1633 (2021)
Yan, S., Dong, N., Liu, J., et al.: Learning comprehensive representations with richer self for text-to-image person re-identification. In: Proceedings of the 31st ACM International Conference on Multimedia, 6202–6211 (2023a)
Yan, S., Dong, N., Zhang, L., et al.: CLIP-driven fine-grained text-image person re-identification. IEEE Trans. Image Process. 32, 6032–6046 (2023b)
Yang, S., Zhou, Y., Zheng, Z., et al.: Towards unified text-based person retrieval: A large-scale multi-attribute and language search benchmark. In: Proceedings of the 31st ACM International Conference on Multimedia, 4492–4501 (2023)
Zhang, Y., Lu, H.: Deep cross-modal projection learning for image-text matching. In: Proceedings of the European Conference on Computer Vision (ECCV), 686–701 (2018)
Zheng, K., Liu, W., Liu, J., et al.: Hierarchical Gumbel attention network for text-based person search. In: Proceedings of the 28th ACM International Conference on Multimedia, 3441–3449 (2020a)
Zheng, Z., Zheng, L., Garrett, M., et al.: Dual-path convolutional image-text embeddings with instance loss. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 16(2), 1–23 (2020b)
Zhu, A., Wang, Z., Li, Y., et al.: DSSL: Deep surroundings-person separation learning for text-based person retrieval. In: Proceedings of the 29th ACM International Conference on Multimedia, 209–217 (2021)
Zuo, J., Zhou, H., Nie, Y., et al.: UFineBench: Towards text-based person retrieval with ultra-fine granularity. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 22010–22019 (2024)
