Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (4): 888-897. DOI: 10.3778/j.issn.1673-9418.2010094
Received: 2020-10-29
Revised: 2021-01-07
Online: 2022-04-01
Published: 2021-02-05
ZHUO Tiantian, SANG Qingbing+()
Corresponding author: + E-mail: sangqb@163.com
About author: ZHUO Tiantian, born in 1995 in Suqian, Jiangsu, is an M.S. candidate. His research interest is optical character recognition.
Supported by:
Abstract:
One approach to offline handwriting recognition is to split the image into individual characters, recognize each, and join the results into a string; but because handwritten characters often touch, such segmentation is hard to implement reliably. The convolutional recurrent neural network (CRNN) takes a whole text-line image as input and solves the label-alignment problem, yet because offline handwriting styles differ severely from writer to writer, the features it extracts are not expressive enough. This paper therefore proposes an enhanced convolutional block attention module (CBAM+) and a composite convolution, and embeds them in the mainstream CRNN+CTC framework for offline text recognition. CBAM+ increases the contribution weight of the input feature map and applies channel attention and spatial attention in parallel, which enriches the semantic information of the refined feature maps while preventing the channel-attention weights from interfering with the spatial-attention module, so the network focuses on useful image features rather than useless dragged-stroke artifacts. The composite convolution embedded in the deep layers of the network convolves with kernels of several sizes, fusing features at different scales and improving generalization. On the semantically rich IAM dataset, the CRNN+CTC framework with CBAM+ and composite convolution reaches 85.774 8% accuracy with a character error rate (CER) of 8.6%; on RIMES it reaches 92.872 8% accuracy with a CER of 3.9%, a further improvement over current mainstream offline text recognition algorithms.
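The parallel channel/spatial attention described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the MLP weights are random placeholders (a trained module learns them), the spatial branch's k×k convolution is stood in for by pooled maps, and the exact combination rule (both branches reweight the original input and the results are summed) is our reading of the abstract.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cbam_plus(x, reduction=2, seed=0):
    """Channel and spatial attention applied in PARALLEL to the same input,
    so the channel branch cannot distort the spatial branch's statistics.
    Weights are random placeholders; a trained CBAM+ learns them."""
    rng = np.random.default_rng(seed)
    c, h, w = x.shape
    # Channel branch: shared two-layer MLP over avg- and max-pooled vectors.
    w1 = 0.1 * rng.standard_normal((c // reduction, c))
    w2 = 0.1 * rng.standard_normal((c, c // reduction))
    avg_vec, max_vec = x.mean(axis=(1, 2)), x.max(axis=(1, 2))
    ch_att = sigmoid(w2 @ np.maximum(w1 @ avg_vec, 0)
                     + w2 @ np.maximum(w1 @ max_vec, 0))      # shape (c,)
    # Spatial branch: in the real module a k x k conv mixes the stacked
    # avg/max maps; here a plain average stands in for that conv.
    sp_att = sigmoid(0.5 * (x.mean(axis=0) + x.max(axis=0)))  # shape (h, w)
    # Parallel fusion (our reading of the abstract): both attention maps
    # reweight the ORIGINAL feature map, and the results are summed.
    return x * ch_att[:, None, None] + x * sp_att[None, :, :]

feat = np.random.default_rng(1).standard_normal((4, 8, 8))
out = cbam_plus(feat)
print(out.shape)  # (4, 8, 8)
```

Because the spatial branch reads the raw input rather than a channel-reweighted map, the two attentions are computed independently, which is the interference-avoidance property the abstract claims.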
CLC Number:
ZHUO Tiantian, SANG Qingbing. Application of Attention Mechanism and Composite Convolution in Handwriting Recognition[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(4): 888-897.
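The CTC component of the CRNN+CTC framework is what resolves the label-alignment problem: at inference time, standard best-path decoding collapses repeated labels and drops blanks. A minimal sketch (label 0 plays the blank; the mapping from labels to actual characters is omitted):

```python
def ctc_greedy_decode(path, blank=0):
    """CTC best-path decoding: collapse consecutive repeats, drop blanks.
    Many frame-wise paths map to the same label string, which is how CTC
    sidesteps per-character alignment between image columns and labels."""
    decoded, prev = [], None
    for label in path:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded

# Frame-wise argmax labels over 8 time steps (0 is the blank):
print(ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0]))  # [1, 1, 2]
```

Note that a blank between two identical labels (1, 0, 1 above) is what allows doubled characters to survive the repeat-collapsing step.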
Data | Label |
---|---|
opposed | |
passage | |
folk | |
Table 1 Examples of dataset
Layer | Kernel size | Stride | Feature maps |
---|---|---|---|
Conv1 | [3,3] | [1,1] | 64 |
Conv2 | [3,3] | [1,1] | 64 |
CBAM+ | / | / | 64 |
Pool1 | [2,2] | [2,2] | 64 |
Conv3 | [3,3] | [1,1] | 128 |
Conv4 | [3,3] | [1,1] | 128 |
CBAM+ | / | / | 128 |
Pool2 | [2,2] | [2,2] | 128 |
Conv5 | [3,3] | [1,1] | 256 |
Conv6 | [3,3] | [1,1] | 256 |
CBAM+ | / | / | 256 |
Pool3 | [2,1] | [2,1] | 256 |
Conv7-1 | [3,3] | [1,1] | 512 |
Conv7-2 | [5,5] | [1,1] | 512 |
BN1 | / | / | / |
Conv8-1 | [3,3] | [1,1] | 512 |
Conv8-2 | [5,5] | [1,1] | 512 |
Pool4 | [2,1] | [2,1] | 512 |
BN2 | / | / | / |
Conv9 | [2,1] | [1,1] | 512 |
BLSTM | / | / | / |
CTC | / | / | / |
Table 2 CRNN+CTC framework based on CBAM+ and composite convolution proposed in this paper
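Table 2's Conv7-1/Conv7-2 and Conv8-1/Conv8-2 pairs realize the composite convolution: kernels of two sizes ([3,3] and [5,5]) over the same input fuse features at two receptive-field scales. A minimal single-channel NumPy sketch, assuming the two branches are fused by element-wise summation (the paper does not spell out the fusion rule here):

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel stride-1 'same' convolution (zero padding)."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def composite_conv(x, k3, k5):
    # Two kernels with different receptive fields see the same input;
    # summing the responses fuses coarse and fine scales in one layer,
    # leaving the feature-map count unchanged (512 in Table 2).
    return conv2d_same(x, k3) + conv2d_same(x, k5)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k3, k5 = rng.standard_normal((3, 3)), rng.standard_normal((5, 5))
y = composite_conv(x, k3, k5)
print(y.shape)  # (8, 8)
```

Summation (rather than concatenation) is consistent with Table 2, where Conv7-1 and Conv7-2 both output 512 maps and the following layer still consumes 512.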
Model | accuracy/% (IAM) | accuracy/% (RIMES) | CER/% (IAM) | CER/% (RIMES) |
---|---|---|---|---|
Full model | 85.774 8 | 92.872 8 | 8.6 | 3.9 |
Full model without CBAM+ | 85.478 1 | 91.356 4 | 8.9 | 4.1 |
Table 3 Performance comparison of models before and after removing CBAM+
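The CER figures reported in Tables 3 through 9 follow the standard definition of character error rate: the Levenshtein edit distance between the predicted and ground-truth strings, divided by the ground-truth length. A self-contained sketch:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance with a rolling 1-D DP row."""
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cur = d[j]
            d[j] = min(d[j] + 1,                             # deletion
                       d[j - 1] + 1,                         # insertion
                       prev + (ref[i - 1] != hyp[j - 1]))    # substitution
            prev = cur
    return d[-1]

def cer(ref, hyp):
    """Character error rate: edit distance over reference length."""
    return edit_distance(ref, hyp) / len(ref)

print(edit_distance("kitten", "sitting"))  # 3
print(cer("abcd", "abed"))  # 0.25
```

Note that CER can exceed 100% when the hypothesis is much longer than the reference, which is why it is reported alongside (not derived from) sequence accuracy.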
Model | accuracy/% (IAM) | accuracy/% (RIMES) | CER/% (IAM) | CER/% (RIMES) |
---|---|---|---|---|
With conventional CBAM, K=3 | 84.881 7 | 90.698 7 | 9.5 | 4.6 |
With conventional CBAM, K=7 | 85.627 4 | 91.055 8 | 9.3 | 4.3 |
With CBAM+, K=7, C=7 | 85.638 9 | 91.434 5 | 9.3 | 4.3 |
With CBAM+, K=7, C=3 | 85.684 6 | 91.554 3 | 9.1 | 4.2 |
Table 4 Impact of different attention mechanisms and parameters on model performance
Model | accuracy/% (IAM) | accuracy/% (RIMES) | CER/% (IAM) | CER/% (RIMES) |
---|---|---|---|---|
Full model | 85.774 8 | 92.872 8 | 8.6 | 3.9 |
Full model without composite convolution | 85.684 6 | 91.554 3 | 9.1 | 4.2 |
Table 5 Performance comparison of models before and after removing composite convolution
Model | accuracy/% (IAM) | accuracy/% (RIMES) | CER/% (IAM) | CER/% (RIMES) |
---|---|---|---|---|
Composite convolution, ksize1=3, ksize2=3 | 84.618 4 | 90.664 7 | 9.7 | 4.5 |
Composite convolution, ksize1=3, ksize2=5 | 85.478 1 | 91.356 4 | 8.9 | 4.1 |
Composite convolution, ksize1=5, ksize2=5 | 84.667 9 | 90.766 5 | 9.7 | 4.5 |
Table 6 Impact of convolution kernel size on model performance in composite convolution
Number of kernels | accuracy/% (IAM) | accuracy/% (RIMES) | CER/% (IAM) | CER/% (RIMES) |
---|---|---|---|---|
2 (ksize1=3, ksize2=5) | 85.478 1 | 91.356 4 | 8.9 | 4.1 |
3 (ksize1=3, ksize2=5, ksize3=7) | 84.951 9 | 91.071 8 | 9.4 | 4.4 |
Table 7 Impact of the number of convolution kernels on model performance in composite convolution
Number of convolutional layers | accuracy/% (IAM) | accuracy/% (RIMES) | CER/% (IAM) | CER/% (RIMES) |
---|---|---|---|---|
8 | 83.903 3 | 89.097 8 | 10.4 | 5.3 |
9 | 84.606 4 | 90.397 5 | 9.9 | 5.2 |
10 | 84.190 2 | 89.788 6 | 10.2 | 5.3 |
Table 8 Impact of the number of convolution layers on model performance
Method | Pre-processing | Lexicon | Pre-train | CER/% (IAM) | CER/% (RIMES) |
---|---|---|---|---|---|
Shi et al. [ | / | / | / | 9.90 | 5.20 |
Krishnan et al. [ | / | / | Synthetic | 6.34 | — |
Stuner et al. [ | / | 2.4×10⁶ | / | 4.77 | 2.67 |
Luo et al. [ | √ | / | / | 5.13 | 2.42 |
Xu et al. [ | √ | / | / | 6.07 | — |
Bluche et al. [ | / | / | CTC | 12.60 | — |
Sueiras et al. [ | √ | / | / | 8.80 | 4.80 |
Carbonell et al. [ | / | / | / | 15.60 | — |
Proposed | / | / | / | 8.60 | 3.90 |
Table 9 Accuracy comparison of current popular methods on IAM and RIMES datasets
[1] MORI S, NISHIDA H, YAMADA H. Optical character recognition[M]. New York: John Wiley & Sons, Inc., 1999.
[2] SUEN C Y, NADAL C, LEGAULT R, et al. Computer recognition of unconstrained handwritten numerals[J]. Proceedings of the IEEE, 1992, 80(7): 1162-1180.
[3] BUNKE H, WANG P S. Handbook of character recognition and document image analysis[M]. Singapore: World Scientific Publishing Company, 1997.
[4] LIU C L, SAKO H, FUJISAWA H. Performance evaluation of pattern classifiers for handwritten character recognition[J]. International Journal on Document Analysis and Recognition, 2002, 4(3): 191-204.
[5] MADHVANATH S, GOVINDARAJU V. Local reference lines for handwritten phrase recognition[J]. Pattern Recognition, 1999, 32(12): 2021-2028.
[6] CASEY R G, LECOLINET E. A survey of methods and strategies in character segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1996, 18(7): 690-706.
[7] EL-YACOUBI M A, GILLOUX M, SABOURIN R, et al. An HMM-based approach for off-line unconstrained handwritten word modeling and recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999, 21(8): 752-760.
[8] WEI X X. English handwriting recognition based on long and short memory recurrent neural network[D]. Guangzhou: South China University of Technology, 2014. (in Chinese)
[9] SHI B G, BAI X, YAO C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39(11): 2298-2304.
[10] SCHUSTER M, PALIWAL K K. Bidirectional recurrent neural networks[J]. IEEE Transactions on Signal Processing, 1997, 45(11): 2673-2681.
[11] GRAVES A, FERNÁNDEZ S, GOMEZ F J, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks[C]// Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, Jun 25-29, 2006. New York: ACM, 2006: 369-376.
[12] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[J]. arXiv:1807.06521, 2018.
[13] PARK J, WOO S, LEE J Y, et al. BAM: bottleneck attention module[J]. arXiv:1807.06514, 2018.
[14] HE K, MA H Y, FENG X, et al. English handwriting identification method using an improved VGG-16 model[J]. Journal of Tianjin University (Natural Science and Engineering Technology Edition), 2020, 53(9): 984-990. (in Chinese)
[15] MARTI U V, BUNKE H. The IAM-database: an English sentence database for offline handwriting recognition[J]. International Journal on Document Analysis and Recognition, 2002, 5(1): 39-46.
[16] IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]// Proceedings of the 32nd International Conference on Machine Learning, Lille, Jul 6-11, 2015: 448-456.
[17] KRISHNAN P, DUTTA K, JAWAHAR C V. Word spotting and recognition using deep embedding[C]// Proceedings of the 13th IAPR International Workshop on Document Analysis Systems, Vienna, Apr 24-27, 2018. Washington: IEEE Computer Society, 2018: 1-6.
[18] STUNER B, CHATELAIN C, PAQUET T. Handwriting recognition using cohort of LSTM and lexicon verification with extremely large lexicon[J]. arXiv:1612.07528, 2016.
[19] LUO C J, ZHU Y Z, JIN L W, et al. Learn to augment: joint data augmentation and network optimization for text recognition[J]. arXiv:2003.06606, 2020.
[20] XU K, BA J, KIROS R, et al. Show, attend and tell: neural image caption generation with visual attention[C]// Proceedings of the 32nd International Conference on Machine Learning, Lille, France, Jul 6-11, 2015: 2048-2057.
[21] BLUCHE T, LOURADOUR J, MESSINA R O. Scan, attend and read: end-to-end handwritten paragraph recognition with MDLSTM attention[C]// Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, Kyoto, Nov 9-15, 2017. Piscataway: IEEE, 2017: 1050-1055.
[22] SUEIRAS J, RUIZ V, SANCHEZ A, et al. Offline continuous handwriting recognition using sequence to sequence neural networks[J]. Neurocomputing, 2018, 289: 119-128.
[23] CARBONELL M, MAS J, VILLEGAS M, et al. End-to-end handwritten text detection and transcription in full pages[C]// Proceedings of the 2nd International Workshop on Machine Learning, Sydney, Sep 22-25, 2019. Piscataway: IEEE, 2019: 29-34.