Survey on Natural Scene Text Recognition Methods of Deep Learning

doi:10.3778/j.issn.1673-9418.2306024

Abstract

Natural scene text recognition is of significant value in both academic research and practical applications, and it has become one of the research hotspots in computer vision. However, recognition is hampered by diverse text styles and complex backgrounds, which reduce both efficiency and accuracy. Traditional text recognition methods based on hand-crafted features have limited representational capacity and cannot effectively handle the complexity of natural scene text. In recent years, deep learning has brought significant progress to natural scene text recognition, and this paper systematically reviews the related research. First, natural scene text recognition methods are divided into segmentation-based and non-segmentation-based approaches according to whether individual characters must be segmented. The non-segmentation-based methods are further subdivided by their technical implementation, and the working principles of the most representative methods in each category are described. Next, commonly used datasets and evaluation metrics are introduced, the performance of the various methods on these datasets is compared, and the advantages and limitations of each category are discussed from multiple perspectives. Finally, the shortcomings and open difficulties of deep-learning-based natural scene text recognition are identified, and future development trends are discussed.

Key words: text recognition, deep learning, natural scene
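
The abstract distinguishes segmentation-based from non-segmentation-based recognizers and mentions evaluation metrics without naming them. As an illustration only, the Python sketch below shows greedy CTC-style decoding, one common way a non-segmentation-based recognizer turns per-frame predictions into a string without locating individual characters, followed by two metrics widely used in scene text recognition: word accuracy and 1 minus the mean normalized edit distance (1-N.E.D.). The blank index, the label-to-character mapping and all function names are assumptions made for this example, not details taken from the survey.

```python
from typing import List, Sequence

BLANK = 0  # assumption: label 0 is the CTC blank; labels 1..K index the alphabet


def ctc_greedy_decode(frame_labels: Sequence[int], alphabet: str) -> str:
    """Turn per-frame argmax labels into text without segmenting characters.

    Consecutive repeats are merged first, then blanks are dropped, so a blank
    between two identical labels keeps both characters (the "ll" in "hello").
    """
    chars: List[str] = []
    prev = None
    for label in frame_labels:
        if label != prev and label != BLANK:
            chars.append(alphabet[label - 1])  # label k maps to alphabet[k - 1]
        prev = label
    return "".join(chars)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with two rolling DP rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            )
        prev = cur
    return prev[-1]


def _check(preds: Sequence[str], gts: Sequence[str]) -> None:
    if len(preds) != len(gts) or not gts:
        raise ValueError("need equal-length, non-empty prediction and label lists")


def word_accuracy(preds: Sequence[str], gts: Sequence[str]) -> float:
    """Fraction of word images whose prediction matches the label exactly."""
    _check(preds, gts)
    return sum(p == g for p, g in zip(preds, gts)) / len(gts)


def one_minus_ned(preds: Sequence[str], gts: Sequence[str]) -> float:
    """1 - mean normalized edit distance, each pair normalized by the longer string."""
    _check(preds, gts)
    total = 0.0
    for p, g in zip(preds, gts):
        longest = max(len(p), len(g))
        total += edit_distance(p, g) / longest if longest else 0.0
    return 1.0 - total / len(gts)


if __name__ == "__main__":
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    decoded = ctc_greedy_decode([8, 8, 0, 5, 0, 12, 12, 0, 12, 15], alphabet)
    print(decoded)  # hello
    preds, gts = [decoded, "world", "tezt"], ["hello", "world", "text"]
    print(word_accuracy(preds, gts))  # 2 of 3 correct, about 0.667
    print(one_minus_ned(preds, gts))  # roughly 0.9167
```

Word accuracy gives no credit for a nearly correct word, while 1-N.E.D. rewards partial matches, which is why the latter is often preferred for long or character-rich text such as Chinese. English benchmarks are often scored case-insensitively on alphanumeric characters only, so predictions and labels would typically be normalized before either metric is applied.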

ZENG Fanzhi, FENG Wenjie, ZHOU Yan. Survey on Natural Scene Text Recognition Methods of Deep Learning[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(5): 1160-1181.
