Journal of Frontiers of Computer Science and Technology, 2024, Vol. 18, Issue (9): 2239-2260. DOI: 10.3778/j.issn.1673-9418.2311105

• Frontiers · Review •

Review of Differentiable Binarization Techniques for Text Detection in Natural Scenes

LIAN Zhe, YIN Yanjun, ZHI Min, XU Qiaozhi

  1. School of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China
  • Online: 2024-09-01 Published: 2024-09-01

Abstract: Natural scenes contain rich text that is important for understanding the real world, but the diversity and complexity of scene text make detection difficult. Deep learning has brought breakthroughs to natural scene text detection, and the differentiable binarization network DBNet in particular has advanced research on real-time text detection. Many researchers have since built innovative and practical methods on differentiable binarization and achieved fruitful results. This paper analyzes and summarizes recent text detection algorithms based on differentiable binarization. First, the background, working principle, advantages, and disadvantages of the DBNet model are briefly introduced. Algorithms based on differentiable binarization are then classified by technical difference into five categories: feature extraction, feature fusion, post-processing, overall architecture, and training strategy. The improvements in each category are illustrated with detailed diagrams, the mechanisms of each type of method are explained in detail, and all methods are analyzed and summarized. Second, commonly used public datasets and text detection evaluation metrics are introduced, the experimental results of different methods are compiled, and several practically significant application scenarios are listed. Finally, future directions for natural scene text detection are discussed, and the remaining challenges and open problems are outlined.

Key words: text detection, deep learning, computer vision, differentiable binarization