Journal of Frontiers of Computer Science and Technology, 2023, Vol. 17, Issue (3): 577-594. DOI: 10.3778/j.issn.1673-9418.2209004

Frontiers·Surveys

Natural Scene Text Detection and End-to-End Recognition: Deep Learning Methods

ZHOU Yan, WEI Qinbin, LIAO Junwei, ZENG Fanzhi, FENG Wenjie, LIU Xiangyu, ZHOU Yuexia   

  1. Department of Computer Science, Foshan University, Foshan, Guangdong 528000, China
  Online: 2023-03-01  Published: 2023-03-01


Abstract: The rich text content in natural scene images is of great significance for scene understanding. However, natural scene text is often characterized by extreme aspect ratios, variable font styles, and complex backgrounds and shapes. Traditional text detection and end-to-end recognition methods suffer from complex model design, low efficiency, poor applicability and high cost. Deep learning has developed rapidly in the image domain, and with it natural scene text detection and end-to-end recognition methods have made breakthrough progress, with significant gains in both performance and efficiency. This paper reviews recent research on text detection and end-to-end recognition in natural scenes. First, natural scene text detection methods are divided into two main categories according to how text boxes are generated: candidate-box regression and pixel segmentation. Representative methods of each category are described in detail. Second, the development of end-to-end recognition methods is summarized from two perspectives: end-to-end recognition speed and the decoupling of the detection and recognition tasks. Third, commonly used public text datasets are introduced, and the performance of representative methods on these datasets is compared. Finally, the paper discusses the main research directions in natural scene text detection and end-to-end recognition, and outlines the challenges these fields face and their future development trends.

Key words: deep learning, natural scene, text detection, end-to-end recognition
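The abstract classifies detectors by how they generate text boxes: they either regress candidate boxes or segment pixels. The toy sketch below contrasts the two routes. It is not any method covered by the survey. All function names are hypothetical, and several simplifications are assumptions made for illustration. `decode_regression` uses a simple additive corner-offset encoding. `boxes_from_segmentation` thresholds a text-probability map (the 0.5 threshold is arbitrary) and takes axis-aligned boxes around 4-connected components. Real detectors typically output rotated boxes or polygons. `iou` is the intersection-over-union overlap that benchmarks commonly use to match predictions to ground truth.

```python
from collections import deque
from typing import List, Tuple

# Hypothetical box type: (x_min, y_min, x_max, y_max), axis-aligned only.
Box = Tuple[float, float, float, float]


def decode_regression(anchor: Box, offsets: Box) -> Box:
    """Regression route (toy): shift an anchor box by predicted offsets.

    Assumption: offsets are added directly to the corners; real detectors
    use a variety of (often scale-normalized) encodings.
    """
    x1, y1, x2, y2 = anchor
    dx1, dy1, dx2, dy2 = offsets
    return (x1 + dx1, y1 + dy1, x2 + dx2, y2 + dy2)


def boxes_from_segmentation(prob_map: List[List[float]],
                            threshold: float = 0.5) -> List[Box]:
    """Segmentation route (toy): threshold a text-probability map, group text
    pixels into 4-connected components, and return each component's
    axis-aligned bounding box in row-major discovery order."""
    h = len(prob_map)
    w = len(prob_map[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for r in range(h):
        for c in range(w):
            if seen[r][c] or prob_map[r][c] < threshold:
                continue
            # Breadth-first search over connected text pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            r_min = r_max = r
            c_min = c_max = c
            while queue:
                y, x = queue.popleft()
                r_min, r_max = min(r_min, y), max(r_max, y)
                c_min, c_max = min(c_min, x), max(c_max, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and prob_map[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Pixel indices are inclusive, so the right/bottom edge is +1.
            boxes.append((c_min, r_min, c_max + 1, r_max + 1))
    return boxes


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    prob = [
        [0.9, 0.8, 0.1, 0.0, 0.0],
        [0.7, 0.9, 0.2, 0.0, 0.6],
        [0.0, 0.1, 0.0, 0.0, 0.7],
    ]
    print(boxes_from_segmentation(prob))  # [(0, 0, 2, 2), (4, 1, 5, 3)]
```

The regression route predicts box geometry directly, so each box costs a fixed decoding step. The segmentation route derives boxes from per-pixel predictions, which is why it can follow irregular text shapes once the grouping step is replaced by contour or polygon extraction.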
