Dsrn-svmamba: a dual-stream recursive network based on SVMamba for scene text recognition

References

  1. Long S, He X, Yao C (2021) Scene text detection and recognition: the deep learning era. Int J Comput Vision 129(1):161–184
  2. Zhou Q, Gao J, Yuan Y, Wang Q (2024) Rrtrn: a lightweight and effective backbone for scene text recognition. Expert Syst Appl 243:122769
  3. Shi B, Bai X, Yao C (2016) An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans Pattern Anal Mach Intell 39(11):2298–2304
  4. Lee C-Y, Osindero S (2016) Recursive recurrent nets with attention modeling for ocr in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2231–2239
  5. Graves A (2012) Connectionist temporal classification. In: Supervised sequence labelling with recurrent neural networks, pp. 61–93
  6. Fang S, Xie H, Wang Y, Mao Z, Zhang Y (2021) Read like humans: Autonomous, bidirectional and iterative language modeling for scene text recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7098–7107
  7. Chu X, Wang Y (2022) Itervm: iterative vision modeling module for scene text recognition. In: 2022 26th International Conference on Pattern Recognition (ICPR), pp. 1393–1399. IEEE
  8. Wang W, Xie E, Sun P, Wang W, Tian L, Shen C, Luo P (2019) Textsr: Content-aware text super-resolution guided by recognition. arXiv preprint arXiv:1909.07113
  9. Zheng T, Chen Z, Bai J, Xie H, Jiang Y-G (2023) Tps++: Attention-enhanced thin-plate spline for scene text recognition. arXiv preprint arXiv:2305.05322
  10. He C, Shen Y, Fang C, Xiao F, Tang L, Zhang Y, Zuo W, Guo Z, Li X (2025) Diffusion models in low-level vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence
  11. Fujitake M (2023) Diffusionstr: Diffusion model for scene text recognition. In: 2023 IEEE International Conference on Image Processing (ICIP), pp. 1585–1589. IEEE
  12. Ye X, Du Y, Tao Y, Chen Z (2024) Textssr: Diffusion-based data synthesis for scene text recognition. arXiv preprint arXiv:2412.01137
  13. Atienza R (2021) Vision transformer for fast and efficient scene text recognition. In: International Conference on Document Analysis and Recognition, pp. 319–334. Springer
  14. Vaswani A, et al. (2017) Attention is all you need. Advances in Neural Information Processing Systems 30
  15. Liu Y, Tian Y, Zhao Y, Yu H, Xie L, Wang Y, Ye Q, Jiao J, Liu Y (2024) Vmamba: visual state space model. Adv Neural Inf Process Syst 37:103031–103063
  16. Woo S, Park J, Lee J-Y, Kweon IS (2018) Cbam: Convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19
  17. Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078
  18. Kim J, Lee JK, Lee KM (2016) Deeply-recursive convolutional network for image super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1637–1645
  19. Tai Y, Yang J, Liu X (2017) Image super-resolution via deep recursive residual network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3147–3155
  20. He P, Huang W, Qiao Y, Loy C, Tang X (2016) Reading scene text in deep convolutional sequences. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30
  21. Hu W, Cai X, Hou J, Yi S, Lin Z (2020) Gtc: Guided training of ctc towards efficient and accurate scene text recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11005–11012
  22. Shi B, Wang X, Lyu P, Yao C, Bai X (2016) Robust scene text recognition with automatic rectification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4168–4176
  23. Baek J, Kim G, Lee J, Park S, Han D, Yun S, Oh SJ, Lee H (2019) What is wrong with scene text recognition model comparisons? dataset and model analysis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4715–4723
  24. Wang T, Zhu Y, Jin L, Luo C, Chen X, Wu Y, Wang Q, Cai M (2020) Decoupled attention network for text recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 12216–12224
  25. Zhang B, Xie H, Wang Y, Xu J, Zhang Y (2023) Linguistic more: Taking a further step toward efficient and accurate scene text recognition. arXiv preprint arXiv:2305.05140
  26. Lan T, Yin D (2022) A lightweight backbone used for scene text recognition. In: 2022 3rd International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE), pp. 261–265. IEEE
  27. Gu A, Dao T (2023) Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752
  28. Dosovitskiy A, et al. (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929
  29. He C, Zhang R, Xiao F, Fang C, Tang L, Zhang Y, Kong L, Fan D-P, Li K, Farsiu S (2025) Run: reversible unfolding network for concealed object segmentation. arXiv preprint arXiv:2501.18783
  30. He C, Zhang R, Xiao F, Fang C, Tang L, Zhang Y, Farsiu S (2025) Unfoldir: rethinking deep unfolding network in illumination degradation image restoration. arXiv preprint arXiv:2505.06683
  31. Yue Y, Li Z (2024) Medmamba: Vision mamba for medical image classification. arXiv preprint arXiv:2403.03849
  32. Zhu L, Liao B, Zhang Q, Wang X, Liu W, Wang X (2024) Vision mamba: efficient visual representation learning with bidirectional state space model. arXiv preprint arXiv:2401.09417
  33. Xia S, Kou J, Liu N, Yin T (2023) Scene text recognition based on two-stage attention and multi-branch feature fusion module. Appl Intell 53(11):14219–14232
  34. Schuster M, Paliwal KK (1997) Bidirectional recurrent neural networks. IEEE Trans Signal Process 45(11):2673–2681
  35. Qiao Z, Zhou Y, Yang D, Zhou Y, Wang W (2020) Seed: Semantics enhanced encoder-decoder framework for scene text recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13528–13537
  36. Luo C, Jin L, Sun Z (2019) Moran: a multi-object rectified attention network for scene text recognition. Pattern Recogn 90:109–118
  37. Loginov V (2021) Why you should try the real data for the scene text recognition. arXiv preprint arXiv:2107.13938
  38. Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2014) Synthetic data and artificial neural networks for natural scene text recognition. arXiv preprint arXiv:1406.2227
  39. Gupta A, Vedaldi A, Zisserman A (2016) Synthetic data for text localisation in natural images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2315–2324
  40. Mishra A, Alahari K, Jawahar C (2012) Scene text recognition using higher order language priors. In: BMVC-British Machine Vision Conference. BMVA
  41. Wang K, Babenko B, Belongie S (2011) End-to-end scene text recognition. In: 2011 International Conference on Computer Vision, pp. 1457–1464. IEEE
  42. Karatzas D, Shafait F, Uchida S, Iwamura M, Bigorda LG, Mestre SR, Mas J, Mota DF, Almazan JA, De Las Heras LP (2013) Icdar 2013 robust reading competition. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 1484–1493. IEEE
  43. Karatzas D, Gomez-Bigorda L, Nicolaou A, Ghosh S, Bagdanov A, Iwamura M, Matas J, Neumann L, Chandrasekhar VR, Lu S, et al. (2015) Icdar 2015 competition on robust reading. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1156–1160. IEEE
  44. Phan TQ, Shivakumara P, Tian S, Tan CL (2013) Recognizing text with perspective distortion in natural scenes. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 569–576
  45. Risnumawan A, Shivakumara P, Chan CS, Tan CL (2014) A robust arbitrary text detection system for natural scene images. Expert Syst Appl 41(18):8027–8048
  46. Zeiler MD (2012) Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701
  47. He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034
  48. Wang J, Hu X (2017) Gated recurrent convolution neural network for ocr. Advances in Neural Information Processing Systems 30
  49. Borisyuk F, Gordo A, Sivakumar V (2018) Rosetta: large scale system for text detection and recognition in images. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 71–79
  50. Yu D, Li X, Zhang C, Liu T, Han J, Liu J, Ding E (2020) Towards accurate scene text recognition with semantic reasoning networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12113–12122
  51. Wu X, Tang B, Zhao M, Wang J, Guo Y (2023) Str transformer: a cross-domain transformer for scene text recognition. Appl Intell 53(3):3444–3458
