Textual–visual interaction for enhanced single image deraining using adapter-tuned VLMs

References

  1. Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J.M., Luo, P.: Segformer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural. Inf. Process. Syst. 34, 12077–12090 (2021)
  2. Lin, X., Sun, S., Huang, W., Sheng, B., Li, P., Feng, D.D.: Eapt: efficient attention pyramid transformer for image processing. IEEE Trans. Multimedia 25, 50–61 (2021)
  3. Wang, F., Jiang, M., Qian, C., Yang, S., Li, C., Zhang, H., Wang, X., Tang, X.: Residual attention network for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3156–3164 (2017)
  4. Lee, J., Kang, H.: Pipformers: Patch based inpainting with vision transformers for generalize paintings. Comput. Anim. Virt. Worlds 35(3), 2270 (2024)
  5. Wang, S., Li, L., Wang, J., Peng, T., Li, Z.: Highlight mask-guided adaptive residual network for single image highlight detection and removal. Comput. Anim. Virt. Worlds 35(3), 2271 (2024)
  6. Zhang, F., You, S., Li, Y., Fu, Y.: Learning rain location prior for nighttime deraining. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 13148–13157 (2023)
  7. Hu, X., Fu, C.-W., Zhu, L., Heng, P.-A.: Depth-attentional features for single-image rain removal. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8022–8031 (2019)
  8. Yi, Q., Li, J., Dai, Q., Fang, F., Zhang, G., Zeng, T.: Structure-preserving deraining with residue channel prior guidance. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4238–4247 (2021)
  9. Jiao, C., Meng, F., Li, T., Cao, Y.: R-prenet: Deraining network based on image background prior. Appl. Sci. 13(21), 11970 (2023)
  10. Yan, Q., Jiang, A., Chen, K., Peng, L., Yi, Q., Zhang, C.: Textual prompt guided image restoration. arXiv preprint arXiv:2312.06162 (2023)
  11. Wang, M., Meng, M., Liu, J., Wu, J.: Learning adequate alignment and interaction for cross-modal retrieval. Virt. Real. Intell. Hardw. 5(6), 509–522 (2023)
  12. Bai, Y., Wang, C., Xie, S., Dong, C., Yuan, C., Wang, Z.: Textir: A simple framework for text-based editable image restoration. arXiv preprint arXiv:2302.14736 (2023)
  13. Fu, X., Xiao, J., Zhu, Y., Liu, A., Wu, F., Zha, Z.-J.: Continual image deraining with hypergraph convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 45(8), 9534–9551 (2023)
  14. Fu, X., Huang, J., Zeng, D., Huang, Y., Ding, X., Paisley, J.: Removing rain from single images via a deep detail network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3855–3863 (2017)
  15. Tao, W., Yan, X., Wang, Y., Wei, M.: Mffdnet: Single image deraining via dual-channel mixed feature fusion. IEEE Trans. Instrum. Meas. 73, 1–13 (2024)
  16. Lin, K., Zhang, S., Luo, Y., Ling, J.: Unrolling a rain-guided detail recovery network for single-image deraining. Virt. Real. Intell. Hardw. 5(1), 11–23 (2023)
  17. Gao, H., Yang, J., Zhang, Y., Wang, N., Yang, J., Dang, D.: A novel single-stage network for accurate image restoration. Vis. Comput. 40(10), 7385–7398 (2024)
  18. Zhu, J., Zhang, Q., Fei, L., Cai, R., Xie, Y., Sheng, B., Yang, X.: Fffn: Frame-by-frame feedback fusion network for video super-resolution. IEEE Trans. Multimedia 25, 6821–6835 (2022)
  19. Zhou, Y., Chen, Z., Li, P., Song, H., Chen, C.P., Sheng, B.: Fsad-net: feedback spatial attention dehazing network. IEEE Trans. Neural Netw. Learn. Syst. 34(10), 7719–7733 (2022)
  20. Chen, X., Li, H., Li, M., Pan, J.: Learning a sparse transformer network for effective image deraining. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5896–5905 (2023)
  21. Xiao, J., Fu, X., Liu, A., Wu, F., Zha, Z.-J.: Image de-raining transformer. IEEE Trans. Pattern Anal. Mach. Intell. 45(11), 12978–12995 (2022)
  22. Wang, Z., Cun, X., Bao, J., Zhou, W., Liu, J., Li, H.: Uformer: A general u-shaped transformer for image restoration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 17683–17693 (2022)
  23. Shao, M., Bao, Z., Liu, W., Qiao, Y., Wan, Y.: Frequency domain-enhanced transformer for single image deraining. Vis. Comput. 40, 1–16 (2024)
  24. Dong, H., Song, T., Qi, X., Jin, J., Jin, G., Fan, L.: Exploring high-quality image deraining transformer via effective large kernel attention. Vis. Comput. 41, 1–17 (2024)
  25. Zhao, C., Cai, W., Hu, C., Yuan, Z.: Cycle contrastive adversarial learning with structural consistency for unsupervised high-quality image deraining transformer. Neural Netw. 178, 106428 (2024)
  26. Zhang, K., Luo, W., Yu, Y., Ren, W., Zhao, F., Li, C., Ma, L., Liu, W., Li, H.: Beyond monocular deraining: Parallel stereo deraining network via semantic prior. Int. J. Comput. Vision 130(7), 1754–1769 (2022)
  27. Jiang, K., Wang, Z., Chen, C., Wang, Z., Cui, L., Lin, C.-W.: Magic elf: Image deraining meets association learning and transformer. arXiv preprint arXiv:2207.10455 (2022)
  28. Jin, X., Shi, Y., Xia, B., Yang, W.: Llmra: Multi-modal large language model based restoration assistant. arXiv preprint arXiv:2401.11401 (2024)
  29. Luo, Z., Gustafsson, F.K., Zhao, Z., Sjölund, J., Schön, T.B.: Controlling vision-language models for universal image restoration. arXiv preprint arXiv:2310.01018 (2023)
  30. Abdal, R., Zhu, P., Femiani, J., Mitra, N., Wonka, P.: Clip2stylegan: Unsupervised extraction of stylegan edit directions. In: ACM SIGGRAPH 2022 Conference Proceedings, pp. 1–9 (2022)
  31. Tan, Z., Wu, Y., Liu, Q., Chu, Q., Lu, L., Ye, J., Yu, N.: Exploring the application of large-scale pre-trained models on adverse weather removal. IEEE Trans. Image Proc. 33, 1683–1698 (2024)
  32. Wang, J., Chan, K.C., Loy, C.C.: Exploring clip for assessing the look and feel of images. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, pp. 2555–2563 (2023)
  33. Liang, Z., Li, C., Zhou, S., Feng, R., Loy, C.C.: Iterative prompt learning for unsupervised backlit image enhancement. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8094–8103 (2023)
  34. Yu, B.X., Chang, J., Wang, H., Liu, L., Wang, S., Wang, Z., Lin, J., Xie, L., Li, H., Lin, Z.: Visual tuning. ACM Comput. Surv. 56(12), 1–38 (2024)
  35. Zhou, K., Yang, J., Loy, C.C., Liu, Z.: Conditional prompt learning for vision-language models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16816–16825 (2022)
  36. Khattak, M.U., Rasheed, H., Maaz, M., Khan, S., Khan, F.S.: Maple: Multi-modal prompt learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 19113–19122 (2023)
  37. Zhao, C., Wang, Y., Jiang, X., Shen, Y., Song, K., Li, D., Miao, D.: Learning domain invariant prompt for vision-language models. IEEE Trans. Image Process. 33, 1348–1360 (2024)
  38. Liu, L., Wang, N., Zhou, D., Liu, D., Yang, X., Gao, X., Liu, T.: Generalizable prompt learning via gradient constrained sharpness-aware minimization. IEEE Trans. Multimedia 27, 1100–1113 (2024)
  39. Yang, T., Zhu, Y., Xie, Y., Zhang, A., Chen, C., Li, M.: Aim: Adapting image models for efficient video action recognition. arXiv preprint arXiv:2302.03024 (2023)
  40. Pan, J., Lin, Z., Zhu, X., Shao, J., Li, H.: St-adapter: Parameter-efficient image-to-video transfer learning. Adv. Neural. Inf. Process. Syst. 35, 26462–26477 (2022)
  41. Gao, P., Geng, S., Zhang, R., Ma, T., Fang, R., Zhang, Y., Li, H., Qiao, Y.: Clip-adapter: Better vision-language models with feature adapters. Int. J. Comput. Vision 132(2), 581–595 (2024)
  42. Li, Y., Fan, Y., Xiang, X., Demandolx, D., Ranjan, R., Timofte, R., Van Gool, L.: Efficient and explicit modelling of image hierarchies for image restoration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 18278–18289 (2023)
  43. Yang, W., Tan, R.T., Feng, J., Liu, J., Guo, Z., Yan, S.: Deep joint rain detection and removal from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1357–1366 (2017)
  44. Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning, pp. 8748–8763 (2021). PMLR
  45. Kim, K., Laskin, M., Mordatch, I., Pathak, D.: How to adapt your large-scale vision-and-language model. OpenReview (2022). https://openreview.net/forum
  46. Zamir, S.W., Arora, A., Khan, S., Hayat, M., Khan, F.S., Yang, M.-H.: Restormer: Efficient transformer for high-resolution image restoration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5728–5739 (2022)
  47. Li, J., Li, D., Xiong, C., Hoi, S.: Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In: International Conference on Machine Learning, pp. 12888–12900 (2022). PMLR
  48. Zamir, S.W., Arora, A., Khan, S., Hayat, M., Khan, F.S., Yang, M.-H., Shao, L.: Multi-stage progressive image restoration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14821–14831 (2021)
  49. Li, S., Araujo, I.B., Ren, W., Wang, Z., Tokuda, E.K., Junior, R.H., Cesar-Junior, R., Zhang, J., Guo, X., Cao, X.: Single image deraining: A comprehensive benchmark analysis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3838–3847 (2019)
  50. Zhang, H., Sindagi, V., Patel, V.M.: Image de-raining using a conditional generative adversarial network. IEEE Trans. Circuits Syst. Video Technol. 30(11), 3943–3956 (2019)
  51. Zhang, H., Patel, V.M.: Density-aware single image de-raining using a multi-stream dense network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 695–704 (2018)
  52. Fu, X., Huang, J., Zeng, D., Huang, Y., Ding, X., Paisley, J.: Removing rain from single images via a deep detail network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3855–3863 (2017)
  53. Ren, D., Zuo, W., Hu, Q., Zhu, P., Meng, D.: Progressive image deraining networks: A better and simpler baseline. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3937–3946 (2019)
  54. Chen, L., Lu, X., Zhang, J., Chu, X., Chen, C.: Hinet: Half instance normalization network for image restoration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 182–192 (2021)
  55. Cui, Y., Tao, Y., Bing, Z., Ren, W., Gao, X., Cao, X., Huang, K., Knoll, A.: Selective frequency network for image restoration. In: The Eleventh International Conference on Learning Representations (2023)
  56. Zheng, D., Wu, X.-M., Yang, S., Zhang, J., Hu, J.-F., Zheng, W.-s.: Selective hourglass mapping for universal image restoration based on diffusion model. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2024)
  57. Chen, L., Chu, X., Zhang, X., Sun, J.: Simple baselines for image restoration. In: European Conference on Computer Vision, pp. 17–33 (2022). Springer
  58. Zhang, R., Fang, R., Zhang, W., Gao, P., Li, K., Dai, J., Qiao, Y., Li, H.: Tip-adapter: Training-free clip-adapter for better vision-language modeling. arXiv preprint arXiv:2111.03930 (2021)
  59. Zhou, K., Yang, J., Loy, C.C., Liu, Z.: Learning to prompt for vision-language models. Int. J. Comput. Vision 130(9), 2337–2348 (2022)
