High-strength synergic-calibration attention system in YOLO for underwater object detection application

References

  1. Yu, H., Li, X., Feng, Y., Han, S.: Multiple attentional path aggregation network for marine object detection. Appl. Intell. 53(2), 2434–2451 (2023)
  2. Xu, S., Zhang, M., Song, W., Mei, H., He, Q., Liotta, A.: A systematic review and analysis of deep learning-based underwater object detection. Neurocomputing (2023)
  3. Fu, C., Liu, R., Fan, X., Chen, P., Fu, H., Yuan, W., Zhu, M., Luo, Z.: Rethinking general underwater object detection: datasets, challenges, and solutions. Neurocomputing 517, 243–256 (2023)
  4. Lin, W.-H., Zhong, J.-X., Liu, S., Li, T., Li, G.: Roimix: proposal-fusion among multiple images for underwater object detection. In: ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 2588–2592 (2020)
  5. Liu, H., Song, P., Ding, R.: Towards domain generalization in underwater object detection. In: 2020 IEEE International Conference on Image Processing (ICIP). IEEE, pp. 1971–1975 (2020)
  6. Liu, C., Wang, Z., Wang, S., Tang, T., Tao, Y., Yang, C., Li, H., Liu, X., Fan, X.: A new dataset, poisson gan and aquanet for underwater object grabbing. IEEE Trans. Circuits Syst. Video Technol. 32(5), 2831–2844 (2021)
  7. Fan, B., Chen, W., Cong, Y., Tian, J.: Dual refinement underwater object detection network. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XX 16. Springer, pp. 275–291 (2020)
  8. Xu, F., Wang, H., Peng, J., Fu, X.: Scale-aware feature pyramid architecture for marine object detection. Neural Comput. Appl. 33, 3637–3653 (2021)
  9. Xu, F., Wang, H., Sun, X., Fu, X.: Refined marine object detector with attention-based spatial pyramid pooling networks and bidirectional feature fusion strategy. Neural Comput. Appl. 34(17), 14881–14894 (2022)
  10. Chen, L., Liu, Z., Tong, L., Jiang, Z., Wang, S., Dong, J., Zhou, H.: Underwater object detection using invert multi-class adaboost with deep learning. In: 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, pp. 1–8 (2020)
  11. Chen, L., Zhou, F., Wang, S., Dong, J., Li, N., Ma, H., Wang, X., Zhou, H.: Swipenet: object detection in noisy underwater scenes. Pattern Recognit. 132, 108926 (2022)
  12. Song, P., Li, P., Dai, L., Wang, T., Chen, Z.: Boosting r-cnn: reweighting r-cnn samples by rpn’s error for underwater object detection. Neurocomputing 530, 150–164 (2023)
  13. Liu, B., Sun, J., Zhu, B., Li, T., Sun, F.: Madformer: multi-attention-driven image super-resolution method based on transformer. Multim. Syst. 30(2), 78 (2024)
  14. Xu, S., Wang, J., He, N., Hu, X., Sun, F.: Underwater image enhancement method based on a cross attention mechanism. Multim. Syst. 30(1), 26 (2024)
  15. Wei, X., Yu, L., Tian, S., Feng, P., Ning, X.: Underwater target detection with an attention mechanism and improved scale. Multim. Tools Appl. 80(25), 33747–33761 (2021)
  16. Liang, X., Song, P.: Excavating roi attention for underwater object detection. In: 2022 IEEE International Conference on Image Processing (ICIP). IEEE, pp. 2651–2655 (2022)
  17. Sun, Y., Wang, X., Zheng, Y., Yao, L., Qi, S., Tang, L., Yi, H., Dong, K.: Underwater object detection with swin transformer. In: 2022 4th International Conference on Data Intelligence and Security (ICDIS). IEEE, pp. 422–427 (2022)
  18. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
  19. Gao, Z., Xie, J., Wang, Q., Li, P.: Global second-order pooling convolutional networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3024–3033 (2019)
  20. Yang, Z., Zhu, L., Wu, Y., Yang, Y.: Gated channel transformation for visual recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11794–11803 (2020)
  21. Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: Cbam: convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19 (2018)
  22. Hou, Q., Zhou, D., Feng, J.: Coordinate attention for efficient mobile network design. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13713–13722 (2021)
  23. Li, G., Fang, Q., Zha, L., Gao, X., Zheng, N.: Ham: hybrid attention module in deep convolutional neural networks for image classification. Pattern Recognit. 129, 108785 (2022)
  24. Li, X., Wang, W., Hu, X., Yang, J.: Selective kernel networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 510–519 (2019)
  25. Zhang, H., Wu, C., Zhang, Z., Zhu, Y., Lin, H., Zhang, Z., Sun, Y., He, T., Mueller, J., Manmatha, R., et al.: Resnest: split-attention networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2736–2746 (2022)
  26. Zhang, H., Zu, K., Lu, J., Zou, Y., Meng, D.: Epsanet: an efficient pyramid squeeze attention block on convolutional neural network. In: Proceedings of the Asian Conference on Computer Vision, pp. 1161–1177 (2022)
  27. Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803 (2018)
  28. Li, X., Hu, X., Yang, J.: Spatial group-wise enhance: Improving semantic feature learning in convolutional networks. arXiv:1905.09646 (2019)
  29. Liu, J.-J., Hou, Q., Cheng, M.-M., Wang, C., Feng, J.: Improving convolutional networks with self-calibrated convolutions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10096–10105 (2020)
  30. Guo, M.-H., Lu, C.-Z., Liu, Z.-N., Cheng, M.-M., Hu, S.-M.: Visual attention network. Comput. Vis. Media 9(4), 733–752 (2023)
  31. Wang, Y., Li, Y., Wang, G., Liu, X.: Multi-scale attention network for single image super-resolution. arXiv:2209.14145 (2022)
  32. Rao, Y., Zhao, W., Tang, Y., Zhou, J., Lim, S.N., Lu, J.: Hornet: efficient high-order spatial interactions with recursive gated convolutions. Adv. Neural Inf. Process. Syst. 35, 10353–10366 (2022)
  33. Guo, M.-H., Lu, C.-Z., Hou, Q., Liu, Z., Cheng, M.-M., Hu, S.-M.: Segnext: rethinking convolutional attention design for semantic segmentation. Adv. Neural Inf. Process. Syst. 35, 1140–1156 (2022)
  34. Lee, H., Kim, H.-E., Nam, H.: Srm: a style-based recalibration module for convolutional neural networks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1854–1862 (2019)
  35. Wang, Q., Wu, B., Zhu, P., Li, P., Zuo, W., Hu, Q.: Eca-net: efficient channel attention for deep convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11534–11542 (2020)
  36. Qin, Z., Zhang, P., Wu, F., Li, X.: Fcanet: frequency channel attention networks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 783–792 (2021)
  37. Park, J., Woo, S., Lee, J.-Y., Kweon, I.S.: Bam: Bottleneck attention module. arXiv:1807.06514 (2018)
  38. Zhang, Q.-L., Yang, Y.-B.: Sa-net: shuffle attention for deep convolutional neural networks. In: ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 2235–2239 (2021)
  39. Misra, D., Nalamada, T., Arasanipalai, A.U., Hou, Q.: Rotate to attend: convolutional triplet attention module. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3139–3148 (2021)
  40. Chen, X., Yuan, M., Yang, Q., Yao, H., Wang, H.: Underwater-ycc: underwater target detection optimization algorithm based on yolov7. J. Mar. Sci. Eng. 11(5), 995 (2023)
  41. Yi, W., Wang, B.: Research on underwater small target detection algorithm based on improved yolov7. IEEE Access (2023)
  42. Liu, K., Peng, L., Tang, S.: Underwater object detection using tc-yolo with attention mechanisms. Sensors 23(5), 2567 (2023)
  43. Fan, Q., Huang, H., Guan, J., He, R.: Rethinking local perception in lightweight vision transformer. arXiv:2303.17803 (2023)
  44. Cai, H., Li, J., Hu, M., Gan, C., Han, S.: Efficientvit: lightweight multi-scale attention for high-resolution dense prediction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 17302–17313 (2023)
  45. Ouyang, D., He, S., Zhang, G., Luo, M., Guo, H., Zhan, J., Huang, Z.: Efficient multi-scale attention module with cross-spatial learning. In: ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 1–5 (2023)
  46. Li, Z., Sun, Y., Zhang, L., Tang, J.: Ctnet: context-based tandem network for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 44(12), 9904–9917 (2021)
  47. Li, Z., Tang, J., Mei, T.: Deep collaborative embedding for social image understanding. IEEE Trans. Pattern Anal. Mach. Intell. 41(9), 2070–2083 (2018)
  48. Tang, W., Li, L., Liu, X., Jin, L., Tang, J., Li, Z.: Context disentangling and prototype inheriting for robust visual grounding. IEEE Trans. Pattern Anal. Mach. Intell. (2023)
  49. Redmon, J., Farhadi, A.: Yolov3: an incremental improvement. arXiv:1804.02767 (2018)
  50. Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: Yolov4: optimal speed and accuracy of object detection. arXiv:2004.10934 (2020)
  51. Yolov5. https://github.com/ultralytics/yolov5 (2021)
  52. Ge, Z., Liu, S., Wang, F., Li, Z., Sun, J.: Yolox: exceeding yolo series in 2021. arXiv:2107.08430 (2021)
  53. Li, C., Li, L., Jiang, H., Weng, K., Geng, Y., Li, L., Ke, Z., Li, Q., Cheng, M., Nie, W., et al.: Yolov6: a single-stage object detection framework for industrial applications. arXiv:2209.02976 (2022)
  54. Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7464–7475 (2023)
  55. Yolov8. https://github.com/ultralytics/ultralytics (2023)
  56. Wang, A., Chen, H., Liu, L., Chen, K., Lin, Z., Han, J., Ding, G.: Yolov10: real-time end-to-end object detection. arXiv:2405.14458 (2024)
  57. China computer federation-china multimedia conference-2019. http://mm.ccf.org.cn/chinamm/2019
  58. Brackish dataset. https://www.kaggle.com/datasets/aalborguniversity/brackish-dataset
  59. Underwater robot picking contest. http://www.cnurpc.org/
  60. Liu, C., Li, H., Wang, S., Zhu, M., Wang, D., Fan, X., Wang, Z.: A dataset and benchmark of underwater object detection for robot picking. In: 2021 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). IEEE, pp. 1–6 (2021)
  61. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-cam: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 618–626 (2017)
