Textual–visual interaction for enhanced single image deraining using adapter-tuned VLMs
References
Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J.M., Luo, P.: SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 34, 12077–12090 (2021)
Lin, X., Sun, S., Huang, W., Sheng, B., Li, P., Feng, D.D.: EAPT: Efficient attention pyramid transformer for image processing. IEEE Trans. Multimedia 25, 50–61 (2021)
Wang, F., Jiang, M., Qian, C., Yang, S., Li, C., Zhang, H., Wang, X., Tang, X.: Residual attention network for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3156–3164 (2017)
Lee, J., Kang, H.: Pipformers: Patch based inpainting with vision transformers for generalize paintings. Comput. Anim. Virt. Worlds 35(3), 2270 (2024)
Wang, S., Li, L., Wang, J., Peng, T., Li, Z.: Highlight mask-guided adaptive residual network for single image highlight detection and removal. Comput. Anim. Virt. Worlds 35(3), 2271 (2024)
Zhang, F., You, S., Li, Y., Fu, Y.: Learning rain location prior for nighttime deraining. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 13148–13157 (2023)
Hu, X., Fu, C.-W., Zhu, L., Heng, P.-A.: Depth-attentional features for single-image rain removal. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8022–8031 (2019)
Yi, Q., Li, J., Dai, Q., Fang, F., Zhang, G., Zeng, T.: Structure-preserving deraining with residue channel prior guidance. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4238–4247 (2021)
Jiao, C., Meng, F., Li, T., Cao, Y.: R-PReNet: Deraining network based on image background prior. Appl. Sci. 13(21), 11970 (2023)
Wang, M., Meng, M., Liu, J., Wu, J.: Learning adequate alignment and interaction for cross-modal retrieval. Virt. Real. Intell. Hardw. 5(6), 509–522 (2023)
Fu, X., Huang, J., Zeng, D., Huang, Y., Ding, X., Paisley, J.: Removing rain from single images via a deep detail network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3855–3863 (2017)
Tao, W., Yan, X., Wang, Y., Wei, M.: MFFDNet: Single image deraining via dual-channel mixed feature fusion. IEEE Trans. Instrum. Meas. 73, 1–13 (2024)
Lin, K., Zhang, S., Luo, Y., Ling, J.: Unrolling a rain-guided detail recovery network for single-image deraining. Virt. Real. Intell. Hardw. 5(1), 11–23 (2023)
Gao, H., Yang, J., Zhang, Y., Wang, N., Yang, J., Dang, D.: A novel single-stage network for accurate image restoration. Vis. Comput. 40(10), 7385–7398 (2024)
Zhu, J., Zhang, Q., Fei, L., Cai, R., Xie, Y., Sheng, B., Yang, X.: FFFN: Frame-by-frame feedback fusion network for video super-resolution. IEEE Trans. Multimedia 25, 6821–6835 (2022)
Chen, X., Li, H., Li, M., Pan, J.: Learning a sparse transformer network for effective image deraining. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5896–5905 (2023)
Wang, Z., Cun, X., Bao, J., Zhou, W., Liu, J., Li, H.: Uformer: A general u-shaped transformer for image restoration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 17683–17693 (2022)
Shao, M., Bao, Z., Liu, W., Qiao, Y., Wan, Y.: Frequency domain-enhanced transformer for single image deraining. Vis. Comput. 40, 1–16 (2024)
Dong, H., Song, T., Qi, X., Jin, J., Jin, G., Fan, L.: Exploring high-quality image deraining transformer via effective large kernel attention. Vis. Comput. 41, 1–17 (2024)
Zhao, C., Cai, W., Hu, C., Yuan, Z.: Cycle contrastive adversarial learning with structural consistency for unsupervised high-quality image deraining transformer. Neural Netw. 178, 106428 (2024)
Zhang, K., Luo, W., Yu, Y., Ren, W., Zhao, F., Li, C., Ma, L., Liu, W., Li, H.: Beyond monocular deraining: Parallel stereo deraining network via semantic prior. Int. J. Comput. Vision 130(7), 1754–1769 (2022)
Jiang, K., Wang, Z., Chen, C., Wang, Z., Cui, L., Lin, C.-W.: Magic elf: Image deraining meets association learning and transformer. arXiv preprint arXiv:2207.10455 (2022)
Jin, X., Shi, Y., Xia, B., Yang, W.: Llmra: Multi-modal large language model based restoration assistant. arXiv preprint arXiv:2401.11401 (2024)
Abdal, R., Zhu, P., Femiani, J., Mitra, N., Wonka, P.: Clip2stylegan: Unsupervised extraction of stylegan edit directions. In: ACM SIGGRAPH 2022 Conference Proceedings, pp. 1–9 (2022)
Tan, Z., Wu, Y., Liu, Q., Chu, Q., Lu, L., Ye, J., Yu, N.: Exploring the application of large-scale pre-trained models on adverse weather removal. IEEE Trans. Image Process. 33, 1683–1698 (2024)
Wang, J., Chan, K.C., Loy, C.C.: Exploring clip for assessing the look and feel of images. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, pp. 2555–2563 (2023)
Liang, Z., Li, C., Zhou, S., Feng, R., Loy, C.C.: Iterative prompt learning for unsupervised backlit image enhancement. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8094–8103 (2023)
Zhou, K., Yang, J., Loy, C.C., Liu, Z.: Conditional prompt learning for vision-language models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16816–16825 (2022)
Khattak, M.U., Rasheed, H., Maaz, M., Khan, S., Khan, F.S.: Maple: Multi-modal prompt learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 19113–19122 (2023)
Zhao, C., Wang, Y., Jiang, X., Shen, Y., Song, K., Li, D., Miao, D.: Learning domain invariant prompt for vision-language models. IEEE Trans. Image Process. 33, 1348–1360 (2024)
Liu, L., Wang, N., Zhou, D., Liu, D., Yang, X., Gao, X., Liu, T.: Generalizable prompt learning via gradient constrained sharpness-aware minimization. IEEE Trans. Multimedia 27, 1100–1113 (2024)
Yang, T., Zhu, Y., Xie, Y., Zhang, A., Chen, C., Li, M.: Aim: Adapting image models for efficient video action recognition. arXiv preprint arXiv:2302.03024 (2023)
Gao, P., Geng, S., Zhang, R., Ma, T., Fang, R., Zhang, Y., Li, H., Qiao, Y.: CLIP-Adapter: Better vision-language models with feature adapters. Int. J. Comput. Vision 132(2), 581–595 (2024)
Li, Y., Fan, Y., Xiang, X., Demandolx, D., Ranjan, R., Timofte, R., Van Gool, L.: Efficient and explicit modelling of image hierarchies for image restoration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 18278–18289 (2023)
Yang, W., Tan, R.T., Feng, J., Liu, J., Guo, Z., Yan, S.: Deep joint rain detection and removal from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1357–1366 (2017)
Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning, pp. 8748–8763 (2021). PMLR
Kim, K., Laskin, M., Mordatch, I., Pathak, D.: How to adapt your large-scale vision-and-language model. OpenReview, https://openreview.net/forum (2022)
Zamir, S.W., Arora, A., Khan, S., Hayat, M., Khan, F.S., Yang, M.-H.: Restormer: Efficient transformer for high-resolution image restoration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5728–5739 (2022)
Li, J., Li, D., Xiong, C., Hoi, S.: Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In: International Conference on Machine Learning, pp. 12888–12900 (2022). PMLR
Zamir, S.W., Arora, A., Khan, S., Hayat, M., Khan, F.S., Yang, M.-H., Shao, L.: Multi-stage progressive image restoration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14821–14831 (2021)
Li, S., Araujo, I.B., Ren, W., Wang, Z., Tokuda, E.K., Junior, R.H., Cesar-Junior, R., Zhang, J., Guo, X., Cao, X.: Single image deraining: A comprehensive benchmark analysis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3838–3847 (2019)
Zhang, H., Sindagi, V., Patel, V.M.: Image de-raining using a conditional generative adversarial network. IEEE Trans. Circuits Syst. Video Technol. 30(11), 3943–3956 (2019)
Zhang, H., Patel, V.M.: Density-aware single image de-raining using a multi-stream dense network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 695–704 (2018)
Ren, D., Zuo, W., Hu, Q., Zhu, P., Meng, D.: Progressive image deraining networks: A better and simpler baseline. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3937–3946 (2019)
Chen, L., Lu, X., Zhang, J., Chu, X., Chen, C.: Hinet: Half instance normalization network for image restoration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 182–192 (2021)
Cui, Y., Tao, Y., Bing, Z., Ren, W., Gao, X., Cao, X., Huang, K., Knoll, A.: Selective frequency network for image restoration. In: The Eleventh International Conference on Learning Representations (2023)
Zheng, D., Wu, X.-M., Yang, S., Zhang, J., Hu, J.-F., Zheng, W.-s.: Selective hourglass mapping for universal image restoration based on diffusion model. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2024)
Chen, L., Chu, X., Zhang, X., Sun, J.: Simple baselines for image restoration. In: European Conference on Computer Vision, pp. 17–33 (2022). Springer
Zhang, R., Fang, R., Zhang, W., Gao, P., Li, K., Dai, J., Qiao, Y., Li, H.: Tip-adapter: Training-free clip-adapter for better vision-language modeling. arXiv preprint arXiv:2111.03930 (2021)
Zhou, K., Yang, J., Loy, C.C., Liu, Z.: Learning to prompt for vision-language models. Int. J. Comput. Vision 130(9), 2337–2348 (2022)