Journal of Frontiers of Computer Science and Technology, 2018, Vol. 12, Issue (8): 1323-1330. DOI: 10.3778/j.issn.1673-9418.1705085

• Artificial Intelligence and Pattern Recognition •

Two-Stage Scene Text Sequence Recognition Method Based on Text Line Local Extremum Region

DONG Yindi1+, ZHAO Xiaoyi2

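  1. College of Information Engineering, Chongqing City Management College, Chongqing 401331, China
  2. Training Department, Logistical Engineering University of PLA, Chongqing 401331, China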
  • Online: 2018-08-01  Published: 2018-08-09

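Abstract: To improve both the computational efficiency and the accuracy of scene text recognition, a two-stage scene text sequence recognition method based on text-line local extremal regions is proposed. First, a character probability is computed from features for each constructed extremal region, and the regions with locally maximal probability are selected as the output of the first stage and the input of the second stage. Second, an efficient clustering algorithm groups the extremal-region characters into text lines, and the labels of the character regions, together with an OCR classifier, are used for font synthesis; once the text line of every character in its context is known, the most probable character sequence can be selected quickly. Finally, simulation tests on multi-oriented and horizontal text from the USTB-SV1K database verify the advantages of the algorithm in both computational efficiency and accuracy.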
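The two stages described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. The `Region` record, the IoU-based overlap test used to pick local maxima, the near-horizontal `same_line` grouping rule, and the bigram table used for Viterbi decoding of the most probable sequence are all hypothetical stand-ins chosen for illustration. The grouping rule also ignores the multi-oriented case that the paper targets.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical extremal-region (ER) candidate: a box plus classifier output."""
    x: float
    y: float
    w: float
    h: float
    prob: float                                  # assumed P(region is a character)
    labels: dict = field(default_factory=dict)   # assumed P(label | region), non-empty


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def stage1_local_maxima(regions, overlap=0.5):
    """Stage 1: keep a region only if no overlapping region has a higher probability.

    The IoU threshold is an assumption standing in for the ER neighbourhood.
    """
    return [
        r for r in regions
        if not any(o is not r and iou(r, o) > overlap and o.prob > r.prob
                   for o in regions)
    ]


def same_line(a, b):
    """Hypothetical rule for near-horizontal text: similar centre, height, small gap."""
    hmin, hmax = min(a.h, b.h), max(a.h, b.h)
    dy = abs((a.y + a.h / 2) - (b.y + b.h / 2))
    gap = max(a.x, b.x) - min(a.x + a.w, b.x + b.w)
    return dy < 0.5 * hmin and hmax < 2.0 * hmin and gap < hmax


def cluster_lines(regions):
    """Stage 2a: single-linkage grouping with union-find; each line sorted left to right."""
    parent = list(range(len(regions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(regions)):
        for j in range(i + 1, len(regions)):
            if same_line(regions[i], regions[j]):
                parent[find(i)] = find(j)
    groups = {}
    for i, r in enumerate(regions):
        groups.setdefault(find(i), []).append(r)
    return [sorted(g, key=lambda r: r.x) for g in groups.values()]


def best_sequence(line, bigram, floor=1e-6):
    """Stage 2b: Viterbi over labels, scoring log P(label|region) + log P(label|previous).

    The bigram table is an assumed language model; unseen pairs get `floor`.
    """
    best = {c: (math.log(max(p, floor)), c) for c, p in line[0].labels.items()}
    for r in line[1:]:
        step = {}
        for c, p in r.labels.items():
            score, path = max(
                (s + math.log(bigram.get((prev, c), floor)), path)
                for prev, (s, path) in best.items()
            )
            step[c] = (score + math.log(max(p, floor)), path + c)
        best = step
    return max(best.values())[1]


# Toy usage with made-up regions and a made-up bigram table.
regions = [
    Region(0, 0, 8, 10, 0.9, {"C": 0.6, "G": 0.4}),
    Region(1, 0, 8, 10, 0.4, {"O": 1.0}),        # overlaps the first, lower probability
    Region(12, 0, 8, 10, 0.8, {"A": 0.5, "R": 0.5}),
    Region(24, 0, 8, 10, 0.7, {"T": 0.9, "I": 0.1}),
    Region(0, 100, 8, 10, 0.8, {"X": 1.0}),      # a second, separate text line
]
bigram = {("C", "A"): 0.5, ("A", "T"): 0.5}
kept = stage1_local_maxima(regions)
lines = cluster_lines(kept)
print([best_sequence(line, bigram) for line in lines])  # ['CAT', 'X']
```

Because each text line is fixed before decoding, the sequence search in this sketch runs only over the labels of one line at a time. That is one way to read the abstract's claim that knowing each character's text line makes selecting the most probable sequence fast.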

Key words: text line, local extremum, two-stage, scene text, sequence recognition
