Infrared small target detection using multi-scale attention and dilated separable convolution
Abstract
Infrared small target detection is a critical task in computer vision that aims to identify and localize targets occupying only a few pixels in infrared imagery. The task is difficult because the targets are extremely small, their sizes and shapes vary across scenes, backgrounds are complex, and targets may be occluded. In this study, we introduce an enhanced U-Net model featuring three novel modules: the Residual Coordinate Attention Block (RCA-Block), which integrates coordinate attention with residual structures to enhance feature representation; the Dynamic Context-aware Multi-scale Fusion Module (DCMFM), which dynamically fuses multi-scale features based on target characteristics; and the Multi-dilation Depthwise Separable Convolutional Module (MDSCM), which employs multi-dilation depthwise separable convolutions to capture spatial features across varying receptive fields. Experiments demonstrate that the proposed model combines multi-scale feature fusion with spatial information enhancement, achieving superior accuracy in small target detection and robust noise resistance.
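As a rough illustration of why multi-dilation depthwise separable convolutions can widen the receptive field at low cost, the sketch below works through the standard arithmetic for dilated and depthwise separable convolutions. The kernel size, channel counts, and dilation rates (1, 2, 4) are assumptions chosen for illustration, not values reported for MDSCM, and the function names are hypothetical.

```python
# Illustrative arithmetic only: kernel size, channel counts, and dilation
# rates are assumptions, not values taken from the article.

def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def standard_conv_params(c_in: int, c_out: int, kernel_size: int) -> int:
    """Weight count of a standard 2D convolution (bias omitted)."""
    return c_in * c_out * kernel_size * kernel_size


def depthwise_separable_params(c_in: int, c_out: int, kernel_size: int) -> int:
    """One k x k depthwise filter per input channel plus a 1 x 1 pointwise mix."""
    depthwise = c_in * kernel_size * kernel_size
    pointwise = c_in * c_out
    return depthwise + pointwise


def multi_dilation_summary(c_in, c_out, kernel_size, dilations):
    """Per-branch extent and total weights for parallel dilated branches."""
    extents = [effective_kernel_size(kernel_size, d) for d in dilations]
    branches = len(dilations)
    separable = branches * depthwise_separable_params(c_in, c_out, kernel_size)
    standard = branches * standard_conv_params(c_in, c_out, kernel_size)
    return extents, separable, standard


if __name__ == "__main__":
    extents, separable, standard = multi_dilation_summary(64, 64, 3, (1, 2, 4))
    print("per-branch extent:", extents)
    print("separable weights:", separable)
    print("standard weights:", standard)
```

Under these assumed settings, the three branches cover 3, 5, and 9 pixels per axis with the same 3 x 3 kernel, and the separable form needs roughly one eighth of the weights of three standard convolutions. Dilation changes only the sampling stride of the kernel, so the weight count per branch does not depend on the dilation rate.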
Funding
This work is supported by the National Natural Science Foundation of China under Grant No. 62476126.
Author information
Authors and Affiliations
- College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016, Jiangsu, China
Wenjuan Tang & Qun Dai
- HIWING Technology Academy, China Aerospace Science and Industry Corporation Limited (CASIC), Beijing, 100074, China
Wenjuan Tang
- College of Computer Science and Technology, Jilin University, Qianjin Street, Changchun, 130012, Jilin, China
Yuning Zhu
- School of Artificial Intelligence, Jilin University, Qianjin Street, Changchun, 130012, Jilin, China
Rui Ma
Authors
- Wenjuan Tang
- Qun Dai
- Yuning Zhu
- Rui Ma
Corresponding author
Correspondence to Qun Dai.
Ethics declarations
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Tang, W., Dai, Q., Zhu, Y. et al. Infrared small target detection using multi-scale attention and dilated separable convolution. Appl Intell 56, 137 (2026). https://doi.org/10.1007/s10489-026-07181-6
- Received: 10 June 2025
- Accepted: 25 February 2026
- Published: 09 March 2026
- Version of record: 09 March 2026
- DOI: https://doi.org/10.1007/s10489-026-07181-6