Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (4): 888-897. DOI: 10.3778/j.issn.1673-9418.2010094

• Artificial Intelligence •

Application of Attention Mechanism and Composite Convolution in Handwriting Recognition

ZHUO Tiantian, SANG Qingbing+

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received: 2020-10-29 Revised: 2021-01-07 Online: 2022-04-01 Published: 2021-02-05
  • Corresponding author: + E-mail: sangqb@163.com
  • About the authors: ZHUO Tiantian, born in 1995 in Suqian, Jiangsu, M.S. candidate. His research interest is optical character recognition.
    SANG Qingbing, born in 1973 in Mingguang, Anhui, associate professor. His research interests include pattern recognition and image quality assessment.
  • Supported by:
    Natural Science Foundation of Jiangsu Province (BK20171142)

Abstract:

One approach to offline handwriting recognition is to segment a text image into single characters, recognize each one, and concatenate the results into a string. However, adjacent handwritten characters often touch, which makes segmentation hard to carry out. The convolutional recurrent neural network (CRNN) takes a whole text image as input and addresses the difficulty of aligning the image with its label, but because offline handwriting styles differ greatly between writers, the features the network extracts are not expressive enough. To address this, an enhanced convolutional block attention module and composite convolution are proposed and added to the mainstream CRNN+CTC framework for offline text recognition. The enhanced convolutional block attention module increases the contribution weight of the input feature map and applies channel attention and spatial attention in parallel. This enriches the semantic information of the refined feature map while preventing the channel attention module from interfering with the weights of the spatial attention module, so the network focuses on useful features in the image rather than on useless dragged-stroke features. The composite convolution, embedded in the deep layers of the network, convolves with multiple kernels, which fuses features at different scales and improves the generalization of the network. On the IAM dataset, which carries semantic information, the CRNN+CTC framework with the enhanced convolutional block attention module and composite convolution achieves an accuracy of 85.7748% and a character error rate of 8.6%; on the RIMES dataset, it achieves an accuracy of 92.8728% and a character error rate of 3.9%. These results improve on current mainstream offline text recognition algorithms.

Key words: offline English handwritten word recognition, enhanced convolutional block attention module, composite convolution, convolutional recurrent neural network (CRNN)
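
As an illustration of two ideas the abstract relies on, the following minimal Python sketch shows how a CTC output can be turned into a string without per-character segmentation, and how a character error rate can be computed. It is not the authors' implementation: the blank index `0`, the toy alphabet, and the function names are assumptions made for the example, greedy decoding is only one of several CTC decoding strategies, and the corpus-level ratio used here is one common definition of character error rate that may differ from the one used in the paper.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_labels, alphabet, blank=BLANK):
    """Collapse runs of repeated per-frame labels, then drop blanks."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(alphabet[label] for label in collapsed if label != blank)


def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete a reference character
                current[j - 1] + 1,      # insert a hypothesis character
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def character_error_rate(references, hypotheses):
    """Total edit distance divided by total reference length."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


# Hypothetical alphabet: index 0 is the blank, the rest are characters.
alphabet = ["", "h", "e", "l", "o"]
# Best label per time step, e.g. the argmax of the recurrent layer's output.
frames = [1, 1, 0, 2, 0, 3, 3, 0, 3, 4, 4]
word = ctc_greedy_decode(frames, alphabet)  # "hello"
```

The blank between the two runs of label `3` is what lets the decoder keep the double "l": repeated labels are merged only when no blank separates them, which is how CTC avoids explicit alignment between image columns and characters.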

CLC Number: