Review of Differentiable Binarization Techniques for Text Detection in Natural Scenes

doi:10.3778/j.issn.1673-9418.2311105

Abstract

Natural scenes contain rich text that is important for understanding the real world, but the diversity and complexity of scene text make detection difficult. Deep learning has brought breakthroughs to natural scene text detection, and the differentiable binarization network DBNet has advanced research on real-time text detection. Many researchers have since built innovative and practical methods on differentiable binarization and achieved fruitful results. This paper analyzes in depth the text detection algorithms based on differentiable binarization proposed in recent years. First, the background, working principle, advantages, and disadvantages of the DBNet model are briefly introduced, and the algorithms are classified by technical difference into five categories: feature extraction, feature fusion, post-processing, overall architecture, and training strategy. The improvements in each category are illustrated with detailed diagrams, the mechanisms of each type of method are explained, and all methods are analyzed and summarized. Second, commonly used public datasets and evaluation metrics for text detection are introduced, the experimental results of different methods are summarized, and several practically significant application scenarios are listed. Finally, future development directions for natural scene text detection are discussed, and the remaining challenges and open problems are summarized.

Key words: text detection, deep learning, computer vision, differentiable binarization
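
For readers unfamiliar with the differentiable binarization step named in the abstract, the following is a minimal sketch rather than the DBNet implementation. It assumes the approximate step function B = 1 / (1 + exp(-k(P - T))), where P is a pixel's text probability, T is its learned threshold, and k is an amplifying factor (DBNet reports k = 50; it is treated here as an adjustable assumption). The function names are hypothetical.

```python
import math


def differentiable_binarization(p, t, k=50.0):
    """Soft, differentiable stand-in for thresholding.

    p: text probability of a pixel, in [0, 1]
    t: threshold for that pixel, in [0, 1]
    k: amplifying factor (assumed default of 50)
    """
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


def hard_binarization(p, t):
    """Standard step function; its gradient is zero almost everywhere."""
    return 1.0 if p >= t else 0.0


# Toy probability values compared against a single threshold of 0.5.
probabilities = [0.9, 0.6, 0.4, 0.1]
threshold = 0.5
soft = [differentiable_binarization(p, threshold) for p in probabilities]
hard = [hard_binarization(p, threshold) for p in probabilities]
```

Because the soft version is smooth, the threshold can be learned jointly with the probability map during training, while a hard step can still be applied at inference time.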


LIAN Zhe, YIN Yanjun, ZHI Min, XU Qiaozhi. Review of Differentiable Binarization Techniques for Text Detection in Natural Scenes[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(9): 2239-2260.
