Natural Scene Text Detection and End-to-End Recognition: Deep Learning Methods

doi:10.3778/j.issn.1673-9418.2209004

Abstract

Abstract: The rich text content in natural scene images is of great significance for scene understanding, but natural scene text is often characterized by extreme aspect ratios, variable font styles, and complex backgrounds and shapes. Traditional text detection and end-to-end recognition methods suffer from complex model design, low efficiency, limited applicability, and high cost. With the rapid development of deep learning in the image domain, natural scene text detection and end-to-end recognition methods have made breakthrough progress, and their performance and efficiency have improved significantly. This paper reviews recent research on natural scene text detection and end-to-end recognition. First, according to how text boxes are generated, natural scene text detection methods are divided mainly into two categories, candidate-box regression and pixel segmentation, and representative methods of each category are described in detail. Second, the technical development route of end-to-end recognition methods is summarized from the perspectives of end-to-end recognition speed and the decoupling of the detection and recognition tasks. Then, commonly used public text datasets are introduced, and the performance of representative methods is compared on these datasets. Finally, the main research directions of natural scene text detection and end-to-end recognition are discussed, and the challenges and future development trends are outlined.

Keywords: deep learning, natural scene, text detection, end-to-end recognition

ZHOU Yan, WEI Qinbin, LIAO Junwei, ZENG Fanzhi, FENG Wenjie, LIU Xiangyu, ZHOU Yuexia. Natural Scene Text Detection and End-to-End Recognition: Deep Learning Methods[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(3): 577-594.

周燕, 韦勤彬, 廖俊玮, 曾凡智, 冯文婕, 刘翔宇, 周月霞. 自然场景文本检测与端到端识别：深度学习方法[J]. 计算机科学与探索, 2023, 17(3): 577-594.
