Dsrn-svmamba: a dual-stream recursive network based on SVMamba for scene text recognition

References

Long S, He X, Yao C (2021) Scene text detection and recognition: the deep learning era. Int J Comput Vision 129(1):161–184
Zhou Q, Gao J, Yuan Y, Wang Q (2024) Rrtrn: a lightweight and effective backbone for scene text recognition. Expert Syst Appl 243:122769
Shi B, Bai X, Yao C (2016) An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans Pattern Anal Mach Intell 39(11):2298–2304
Lee C-Y, Osindero S (2016) Recursive recurrent nets with attention modeling for ocr in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2231–2239
Graves A (2012) Connectionist temporal classification. In: Supervised sequence labelling with recurrent neural networks, pp. 61–93
Fang S, Xie H, Wang Y, Mao Z, Zhang Y (2021) Read like humans: Autonomous, bidirectional and iterative language modeling for scene text recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7098–7107
Chu X, Wang Y (2022) Itervm: iterative vision modeling module for scene text recognition. In: 2022 26th International Conference on Pattern Recognition (ICPR), pp. 1393–1399. IEEE
Wang W, Xie E, Sun P, Wang W, Tian L, Shen C, Luo P (2019) Textsr: Content-aware text super-resolution guided by recognition. arXiv preprint arXiv:1909.07113
Zheng T, Chen Z, Bai J, Xie H, Jiang Y-G (2023) Tps++: Attention-enhanced thin-plate spline for scene text recognition. arXiv preprint arXiv:2305.05322
He C, Shen Y, Fang C, Xiao F, Tang L, Zhang Y, Zuo W, Guo Z, Li X (2025) Diffusion models in low-level vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence
Fujitake M (2023) Diffusionstr: Diffusion model for scene text recognition. In: 2023 IEEE International Conference on Image Processing (ICIP), pp. 1585–1589. IEEE
Ye X, Du Y, Tao Y, Chen Z (2024) Textssr: Diffusion-based data synthesis for scene text recognition. arXiv preprint arXiv:2412.01137
Atienza R (2021) Vision transformer for fast and efficient scene text recognition. In: International Conference on Document Analysis and Recognition, pp. 319–334. Springer
Vaswani A (2017) Attention is all you need. Advances in Neural Information Processing Systems
Liu Y, Tian Y, Zhao Y, Yu H, Xie L, Wang Y, Ye Q, Jiao J, Liu Y (2024) Vmamba: visual state space model. Adv Neural Inf Process Syst 37:103031–103063
Woo S, Park J, Lee J-Y, Kweon IS (2018) Cbam: Convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19
Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078
Kim J, Lee JK, Lee KM (2016) Deeply-recursive convolutional network for image super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1637–1645
Tai Y, Yang J, Liu X (2017) Image super-resolution via deep recursive residual network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3147–3155
He P, Huang W, Qiao Y, Loy C, Tang X (2016) Reading scene text in deep convolutional sequences. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30
Hu W, Cai X, Hou J, Yi S, Lin Z (2020) Gtc: Guided training of ctc towards efficient and accurate scene text recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11005–11012
Shi B, Wang X, Lyu P, Yao C, Bai X (2016) Robust scene text recognition with automatic rectification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4168–4176
Baek J, Kim G, Lee J, Park S, Han D, Yun S, Oh SJ, Lee H (2019) What is wrong with scene text recognition model comparisons? dataset and model analysis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4715–4723
Wang T, Zhu Y, Jin L, Luo C, Chen X, Wu Y, Wang Q, Cai M (2020) Decoupled attention network for text recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 12216–12224
Zhang B, Xie H, Wang Y, Xu J, Zhang Y (2023) Linguistic more: Taking a further step toward efficient and accurate scene text recognition. arXiv preprint arXiv:2305.05140
Lan T, Yin D (2022) A lightweight backbone used for scene text recognition. In: 2022 3rd International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE), pp. 261–265. IEEE
Gu A, Dao T (2023) Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752
Dosovitskiy A (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929
He C, Zhang R, Xiao F, Fang C, Tang L, Zhang Y, Kong L, Fan D-P, Li K, Farsiu S (2025) Run: reversible unfolding network for concealed object segmentation. arXiv preprint arXiv:2501.18783
He C, Zhang R, Xiao F, Fang C, Tang L, Zhang Y, Farsiu S (2025) Unfoldir: rethinking deep unfolding network in illumination degradation image restoration. arXiv preprint arXiv:2505.06683
Yue Y, Li Z (2024) Medmamba: Vision mamba for medical image classification. arXiv preprint arXiv:2403.03849
Zhu L, Liao B, Zhang Q, Wang X, Liu W, Wang X (2024) Vision mamba: efficient visual representation learning with bidirectional state space model. arXiv preprint arXiv:2401.09417
Xia S, Kou J, Liu N, Yin T (2023) Scene text recognition based on two-stage attention and multi-branch feature fusion module. Appl Intell 53(11):14219–14232
Schuster M, Paliwal KK (1997) Bidirectional recurrent neural networks. IEEE Trans Signal Process 45(11):2673–2681
Qiao Z, Zhou Y, Yang D, Zhou Y, Wang W (2020) Seed: Semantics enhanced encoder-decoder framework for scene text recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13528–13537
Luo C, Jin L, Sun Z (2019) Moran: a multi-object rectified attention network for scene text recognition. Pattern Recogn 90:109–118
Loginov V (2021) Why you should try the real data for the scene text recognition. arXiv preprint arXiv:2107.13938
Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2014) Synthetic data and artificial neural networks for natural scene text recognition. arXiv preprint arXiv:1406.2227
Gupta A, Vedaldi A, Zisserman A (2016) Synthetic data for text localisation in natural images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2315–2324
Mishra A, Alahari K, Jawahar C (2012) Scene text recognition using higher order language priors. In: BMVC-British Machine Vision Conference. BMVA
Wang K, Babenko B, Belongie S (2011) End-to-end scene text recognition. In: 2011 International Conference on Computer Vision, pp. 1457–1464. IEEE
Karatzas D, Shafait F, Uchida S, Iwamura M, Bigorda LG, Mestre SR, Mas J, Mota DF, Almazan JA, De Las Heras LP (2013) Icdar 2013 robust reading competition. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 1484–1493. IEEE
Karatzas D, Gomez-Bigorda L, Nicolaou A, Ghosh S, Bagdanov A, Iwamura M, Matas J, Neumann L, Chandrasekhar VR, Lu S, et al. (2015) Icdar 2015 competition on robust reading. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1156–1160. IEEE
Phan TQ, Shivakumara P, Tian S, Tan CL (2013) Recognizing text with perspective distortion in natural scenes. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 569–576
Risnumawan A, Shivakumara P, Chan CS, Tan CL (2014) A robust arbitrary text detection system for natural scene images. Expert Syst Appl 41(18):8027–8048
Zeiler MD (2012) Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701
He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034
Wang J, Hu X (2017) Gated recurrent convolution neural network for ocr. Advances in Neural Information Processing Systems 30
Borisyuk F, Gordo A, Sivakumar V (2018) Rosetta: large scale system for text detection and recognition in images. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 71–79
Yu D, Li X, Zhang C, Liu T, Han J, Liu J, Ding E (2020) Towards accurate scene text recognition with semantic reasoning networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12113–12122
Wu X, Tang B, Zhao M, Wang J, Guo Y (2023) Str transformer: a cross-domain transformer for scene text recognition. Appl Intell 53(3):3444–3458
