Journal of Frontiers of Computer Science and Technology, 2022, Vol. 16, Issue (4): 888-897. DOI: 10.3778/j.issn.1673-9418.2010094
• Artificial Intelligence •
ZHUO Tiantian, SANG Qingbing+
Received: 2020-10-29
Revised: 2021-01-07
Online: 2022-04-01
Published: 2021-02-05
About author: ZHUO Tiantian, born in 1995 in Suqian, Jiangsu, M.S. candidate. His research interest is optical character recognition.
Corresponding author: SANG Qingbing, E-mail: sangqb@163.com
ZHUO Tiantian, SANG Qingbing. Application of Attention Mechanism and Composite Convolution in Handwriting Recognition[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(4): 888-897.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2010094
| Data | Label |
|---|---|
| (handwriting image) | opposed |
| (handwriting image) | passage |
| (handwriting image) | folk |

Table 1 Examples of dataset
| Type | Kernel size | Stride | Feature maps |
|---|---|---|---|
| Conv1 | [3,3] | [1,1] | 64 |
| Conv2 | [3,3] | [1,1] | 64 |
| CBAM+ | / | / | 64 |
| Pool1 | [2,2] | [2,2] | 64 |
| Conv3 | [3,3] | [1,1] | 128 |
| Conv4 | [3,3] | [1,1] | 128 |
| CBAM+ | / | / | 128 |
| Pool2 | [2,2] | [2,2] | 128 |
| Conv5 | [3,3] | [1,1] | 256 |
| Conv6 | [3,3] | [1,1] | 256 |
| CBAM+ | / | / | 256 |
| Pool3 | [2,1] | [2,1] | 256 |
| Conv7-1 | [3,3] | [1,1] | 512 |
| Conv7-2 | [5,5] | [1,1] | 512 |
| BN1 | / | / | / |
| Conv8-1 | [3,3] | [1,1] | 512 |
| Conv8-2 | [5,5] | [1,1] | 512 |
| Pool4 | [2,1] | [2,1] | 512 |
| BN2 | / | / | / |
| Conv9 | [2,1] | [1,1] | 512 |
| BLSTM | / | / | / |
| CTC | / | / | / |

Table 2 CRNN+CTC framework based on CBAM+ and composite convolution proposed in this paper
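The Conv7-1/Conv7-2 and Conv8-1/Conv8-2 pairs in Table 2 apply a 3×3 and a 5×5 kernel to the same input in parallel: this is the composite convolution. A minimal pure-Python sketch of the idea follows; the elementwise-sum fusion, zero padding, and function names are illustrative assumptions, not details confirmed by the table.

```python
def conv2d_same(img, kernel):
    """Naive 2D convolution with zero padding (output has the input's size)."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    if 0 <= y < h and 0 <= x < w:
                        s += img[y][x] * kernel[di][dj]
            out[i][j] = s
    return out

def composite_conv(img, k_small, k_large):
    """Composite convolution sketch: run two kernel sizes on the same input
    and fuse the responses elementwise (sum fusion is an assumption)."""
    a = conv2d_same(img, k_small)
    b = conv2d_same(img, k_large)
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
```

With identity kernels (a single 1 at the center), each branch reproduces the input, so the fused output is twice the input, which makes the parallel-branch structure easy to check by hand.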
| Model | Accuracy/% (IAM) | Accuracy/% (RIMES) | CER/% (IAM) | CER/% (RIMES) |
|---|---|---|---|---|
| Final model | 85.7748 | 92.8728 | 8.6 | 3.9 |
| Final model without CBAM+ | 85.4781 | 91.3564 | 8.9 | 4.1 |

Table 3 Performance comparison of models before and after deleting CBAM+
| Model | Accuracy/% (IAM) | Accuracy/% (RIMES) | CER/% (IAM) | CER/% (RIMES) |
|---|---|---|---|---|
| With conventional CBAM, K=3 | 84.8817 | 90.6987 | 9.5 | 4.6 |
| With conventional CBAM, K=7 | 85.6274 | 91.0558 | 9.3 | 4.3 |
| With CBAM+, K=7, C=7 | 85.6389 | 91.4345 | 9.3 | 4.3 |
| With CBAM+, K=7, C=3 | 85.6846 | 91.5543 | 9.1 | 4.2 |

Table 4 Impact of different attention mechanisms and parameters on model performance
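Table 4 tunes two CBAM+ parameters, K and C. K is the spatial kernel size that conventional CBAM already exposes; C is read here as the kernel size of a 1-D convolution over the pooled channel descriptor (in the style of ECA-Net). That interpretation, the untrained uniform weights, and the function name are all illustrative assumptions; a trained module would learn the 1-D weights.

```python
import math

def channel_attention(fmap, c=3):
    """Channel-attention sketch: global average pool per channel, a 1-D
    convolution of size c over the channel descriptor (assumed meaning of
    CBAM+'s C parameter), a sigmoid, then channel-wise rescaling."""
    # fmap: list of channels, each channel a 2-D list of floats
    desc = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    pad = c // 2
    weights = [1.0 / c] * c  # placeholder for the learned 1-D kernel
    n = len(desc)
    att = []
    for i in range(n):
        s = 0.0
        for k in range(c):
            j = i + k - pad
            if 0 <= j < n:  # zero padding at the channel-axis borders
                s += desc[j] * weights[k]
        att.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid gate per channel
    return [[[v * att[ci] for v in row] for row in ch]
            for ci, ch in enumerate(fmap)]
```

A smaller C (Table 4's best setting is C=3) makes each channel's gate depend only on its immediate neighbors in the channel descriptor, which keeps the added parameter count negligible.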
| Model | Accuracy/% (IAM) | Accuracy/% (RIMES) | CER/% (IAM) | CER/% (RIMES) |
|---|---|---|---|---|
| Final model | 85.7748 | 92.8728 | 8.6 | 3.9 |
| Final model without composite convolution | 85.6846 | 91.5543 | 9.1 | 4.2 |

Table 5 Performance comparison of models before and after deleting composite convolution
| Model | Accuracy/% (IAM) | Accuracy/% (RIMES) | CER/% (IAM) | CER/% (RIMES) |
|---|---|---|---|---|
| Composite convolution, ksize1=3, ksize2=3 | 84.6184 | 90.6647 | 9.7 | 4.5 |
| Composite convolution, ksize1=3, ksize2=5 | 85.4781 | 91.3564 | 8.9 | 4.1 |
| Composite convolution, ksize1=5, ksize2=5 | 84.6679 | 90.7665 | 9.7 | 4.5 |

Table 6 Impact of convolution kernel size on model performance in composite convolution
| Number of kernels | Accuracy/% (IAM) | Accuracy/% (RIMES) | CER/% (IAM) | CER/% (RIMES) |
|---|---|---|---|---|
| 2 (ksize1=3, ksize2=5) | 85.4781 | 91.3564 | 8.9 | 4.1 |
| 3 (ksize1=3, ksize2=5, ksize3=7) | 84.9519 | 91.0718 | 9.4 | 4.4 |

Table 7 Impact of the number of convolution kernels on model performance in composite convolution
| Number of convolution layers | Accuracy/% (IAM) | Accuracy/% (RIMES) | CER/% (IAM) | CER/% (RIMES) |
|---|---|---|---|---|
| 8 | 83.9033 | 89.0978 | 10.4 | 5.3 |
| 9 | 84.6064 | 90.3975 | 9.9 | 5.2 |
| 10 | 84.1902 | 89.7886 | 10.2 | 5.3 |

Table 8 Impact of the number of convolution layers on model performance
| Method | Pre-processing | Lexicon | Pre-train | CER/% (IAM) | CER/% (RIMES) |
|---|---|---|---|---|---|
| Shi et al.[9] | / | / | / | 9.90 | 5.20 |
| Krishnan et al.[17] | / | / | Synthetic | 6.34 | — |
| Stuner et al.[18] | / | 2.4×10⁶ | / | 4.77 | 2.67 |
| Luo et al.[19] | √ | / | / | 5.13 | 2.42 |
| Xu et al.[20] | √ | / | / | 6.07 | — |
| Bluche et al.[21] | / | / | CTC | 12.60 | — |
| Sueiras et al.[22] | √ | / | / | 8.80 | 4.80 |
| Carbonell et al.[23] | / | / | / | 15.60 | — |
| Proposed | / | / | / | 8.60 | 3.90 |

Table 9 Accuracy comparison of current popular methods on IAM and RIMES datasets
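The CER figures in Tables 3-9 are character error rates. A standard way to compute CER is the Levenshtein edit distance between the predicted string and the reference, divided by the reference length; the sketch below uses that common definition (the paper's exact normalization is not stated in the tables, so this is an assumption).

```python
def cer(reference, hypothesis):
    """Character error rate: Levenshtein edit distance (insertions,
    deletions, substitutions) divided by the reference length."""
    r, h = reference, hypothesis
    # dp[i][j] = minimum edits to turn r[:i] into h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i  # delete all of r[:i]
    for j in range(len(h) + 1):
        dp[0][j] = j  # insert all of h[:j]
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution/match
    return dp[len(r)][len(h)] / len(r)
```

For example, recognizing the Table 1 label "passage" as "pasage" is one deletion over seven reference characters, a CER of about 14.3%.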
[1] MORI S, NISHIDA H, YAMADA H. Optical character recognition[M]. New York: John Wiley & Sons, Inc., 1999.
[2] SUEN C Y, NADAL C, LEGAULT R, et al. Computer recognition of unconstrained handwritten numerals[J]. Proceedings of the IEEE, 1992, 80(7): 1162-1180.
[3] BUNKE H, WANG P S. Handbook of character recognition and document image analysis[M]. Singapore: World Scientific Publishing Company, 1997.
[4] LIU C L, SAKO H, FUJISAWA H. Performance evaluation of pattern classifiers for handwritten character recognition[J]. International Journal on Document Analysis and Recognition, 2002, 4(3): 191-204.
[5] MADHVANATH S, GOVINDARAJU V. Local reference lines for handwritten phrase recognition[J]. Pattern Recognition, 1999, 32(12): 2021-2028.
[6] CASEY R G, LECOLINET E. A survey of methods and strategies in character segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1996, 18(7): 690-706.
[7] EL-YACOUBI M A, GILLOUX M, SABOURIN R, et al. An HMM-based approach for off-line unconstrained handwritten word modeling and recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999, 21(8): 752-760.
[8] WEI X X. English handwriting recognition based on long and short memory recurrent neural network[D]. Guangzhou: South China University of Technology, 2014.
[9] SHI B G, BAI X, YAO C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39(11): 2298-2304.
[10] SCHUSTER M, PALIWAL K K. Bidirectional recurrent neural networks[J]. IEEE Transactions on Signal Processing, 1997, 45(11): 2673-2681.
[11] GRAVES A, FERNÁNDEZ S, GOMEZ F J, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks[C]// Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, Jun 25-29, 2006. New York: ACM, 2006: 369-376.
[12] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[J]. arXiv:1807.06521, 2018.
[13] PARK J, WOO S, LEE J Y, et al. BAM: bottleneck attention module[J]. arXiv:1807.06514, 2018.
[14] HE K, MA H Y, FENG X, et al. English handwriting identification method using an improved VGG-16 model[J]. Journal of Tianjin University (Natural Science and Engineering Technology Edition), 2020, 53(9): 984-990.
[15] MARTI U V, BUNKE H. The IAM-database: an English sentence database for offline handwriting recognition[J]. International Journal on Document Analysis and Recognition, 2002, 5(1): 39-46.
[16] IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]// Proceedings of the 32nd International Conference on Machine Learning, Lille, Jul 6-11, 2015: 448-456.
[17] KRISHNAN P, DUTTA K, JAWAHAR C V. Word spotting and recognition using deep embedding[C]// Proceedings of the 13th IAPR International Workshop on Document Analysis Systems, Vienna, Apr 24-27, 2018. Washington: IEEE Computer Society, 2018: 1-6.
[18] STUNER B, CHATELAIN C, PAQUET T. Handwriting recognition using cohort of LSTM and lexicon verification with extremely large lexicon[J]. arXiv:1612.07528, 2016.
[19] LUO C J, ZHU Y Z, JIN L W, et al. Learn to augment: joint data augmentation and network optimization for text recognition[J]. arXiv:2003.06606, 2020.
[20] XU K, BA J, KIROS R, et al. Show, attend and tell: neural image caption generation with visual attention[C]// Proceedings of the 32nd International Conference on Machine Learning, Lille, Jul 6-11, 2015: 2048-2057.
[21] BLUCHE T, LOURADOUR J, MESSINA R O. Scan, attend and read: end-to-end handwritten paragraph recognition with MDLSTM attention[C]// Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, Kyoto, Nov 9-15, 2017. Piscataway: IEEE, 2017: 1050-1055.
[22] SUEIRAS J, RUIZ V, SANCHEZ A, et al. Offline continuous handwriting recognition using sequence to sequence neural networks[J]. Neurocomputing, 2018, 289: 119-128.
[23] CARBONELL M, MAS J, VILLEGAS M, et al. End-to-end handwritten text detection and transcription in full pages[C]// Proceedings of the 2nd International Workshop on Machine Learning, Sydney, Sep 22-25, 2019. Piscataway: IEEE, 2019: 29-34.