Journal of Frontiers of Computer Science and Technology, 2022, Vol. 16, Issue (4): 888-897. DOI: 10.3778/j.issn.1673-9418.2010094
• Artificial Intelligence •
ZHUO Tiantian, SANG Qingbing+
Received: 2020-10-29
Revised: 2021-01-07
Online: 2022-04-01
Published: 2021-02-05
About author: ZHUO Tiantian, born in 1995 in Suqian, Jiangsu, M.S. candidate. His research interest is optical character recognition.
Corresponding author: SANG Qingbing, E-mail: sangqb@163.com
ZHUO Tiantian, SANG Qingbing. Application of Attention Mechanism and Composite Convolution in Handwriting Recognition[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(4): 888-897.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2010094
| Data | Label |
|---|---|
| (handwriting image) | opposed |
| (handwriting image) | passage |
| (handwriting image) | folk |

Table 1 Examples of dataset
| Type | Kernel size | Stride | Feature maps |
|---|---|---|---|
| Conv1 | [3,3] | [1,1] | 64 |
| Conv2 | [3,3] | [1,1] | 64 |
| CBAM+ | / | / | 64 |
| Pool1 | [2,2] | [2,2] | 64 |
| Conv3 | [3,3] | [1,1] | 128 |
| Conv4 | [3,3] | [1,1] | 128 |
| CBAM+ | / | / | 128 |
| Pool2 | [2,2] | [2,2] | 128 |
| Conv5 | [3,3] | [1,1] | 256 |
| Conv6 | [3,3] | [1,1] | 256 |
| CBAM+ | / | / | 256 |
| Pool3 | [2,1] | [2,1] | 256 |
| Conv7-1 | [3,3] | [1,1] | 512 |
| Conv7-2 | [5,5] | [1,1] | 512 |
| BN1 | / | / | / |
| Conv8-1 | [3,3] | [1,1] | 512 |
| Conv8-2 | [5,5] | [1,1] | 512 |
| Pool4 | [2,1] | [2,1] | 512 |
| BN2 | / | / | / |
| Conv9 | [2,1] | [1,1] | 512 |
| BLSTM | / | / | / |
| CTC | / | / | / |

Table 2 CRNN+CTC framework based on CBAM+ and composite convolution proposed in this paper
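The Conv7-1/Conv7-2 and Conv8-1/Conv8-2 pairs in Table 2 apply a 3×3 and a 5×5 kernel to the same input in parallel: this is the composite convolution. A minimal pure-Python sketch of the idea follows; the elementwise-sum fusion, zero padding, and function names are illustrative assumptions, not details confirmed by the table.

```python
def conv2d_same(img, kernel):
    """Naive 2D convolution with zero padding (output has the input's size)."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    if 0 <= y < h and 0 <= x < w:
                        s += img[y][x] * kernel[di][dj]
            out[i][j] = s
    return out

def composite_conv(img, k_small, k_large):
    """Composite convolution sketch: run two kernel sizes on the same input
    and fuse the responses elementwise (sum fusion is an assumption)."""
    a = conv2d_same(img, k_small)
    b = conv2d_same(img, k_large)
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
```

With identity kernels (a single 1 at the center), each branch reproduces the input, so the fused output is twice the input, which makes the parallel-branch structure easy to check by hand.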
| Model | Accuracy/% (IAM) | Accuracy/% (RIMES) | CER/% (IAM) | CER/% (RIMES) |
|---|---|---|---|---|
| Final model | 85.7748 | 92.8728 | 8.6 | 3.9 |
| Final model without CBAM+ | 85.4781 | 91.3564 | 8.9 | 4.1 |

Table 3 Performance comparison of models before and after deleting CBAM+
| Model | Accuracy/% (IAM) | Accuracy/% (RIMES) | CER/% (IAM) | CER/% (RIMES) |
|---|---|---|---|---|
| With conventional CBAM, K=3 | 84.8817 | 90.6987 | 9.5 | 4.6 |
| With conventional CBAM, K=7 | 85.6274 | 91.0558 | 9.3 | 4.3 |
| With CBAM+, K=7, C=7 | 85.6389 | 91.4345 | 9.3 | 4.3 |
| With CBAM+, K=7, C=3 | 85.6846 | 91.5543 | 9.1 | 4.2 |

Table 4 Impact of different attention mechanisms and parameters on model performance
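Table 4 tunes two CBAM+ parameters, K and C. K is the spatial kernel size that conventional CBAM already exposes; C is read here as the kernel size of a 1-D convolution over the pooled channel descriptor (in the style of ECA-Net). That interpretation, the untrained uniform weights, and the function name are all illustrative assumptions; a trained module would learn the 1-D weights.

```python
import math

def channel_attention(fmap, c=3):
    """Channel-attention sketch: global average pool per channel, a 1-D
    convolution of size c over the channel descriptor (assumed meaning of
    CBAM+'s C parameter), a sigmoid, then channel-wise rescaling."""
    # fmap: list of channels, each channel a 2-D list of floats
    desc = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    pad = c // 2
    weights = [1.0 / c] * c  # placeholder for the learned 1-D kernel
    n = len(desc)
    att = []
    for i in range(n):
        s = 0.0
        for k in range(c):
            j = i + k - pad
            if 0 <= j < n:  # zero padding at the channel-axis borders
                s += desc[j] * weights[k]
        att.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid gate per channel
    return [[[v * att[ci] for v in row] for row in ch]
            for ci, ch in enumerate(fmap)]
```

A smaller C (Table 4's best setting is C=3) makes each channel's gate depend only on its immediate neighbors in the channel descriptor, which keeps the added parameter count negligible.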
| Model | Accuracy/% (IAM) | Accuracy/% (RIMES) | CER/% (IAM) | CER/% (RIMES) |
|---|---|---|---|---|
| Final model | 85.7748 | 92.8728 | 8.6 | 3.9 |
| Final model without composite convolution | 85.6846 | 91.5543 | 9.1 | 4.2 |

Table 5 Performance comparison of models before and after deleting composite convolution
| Model | Accuracy/% (IAM) | Accuracy/% (RIMES) | CER/% (IAM) | CER/% (RIMES) |
|---|---|---|---|---|
| Composite convolution, ksize1=3, ksize2=3 | 84.6184 | 90.6647 | 9.7 | 4.5 |
| Composite convolution, ksize1=3, ksize2=5 | 85.4781 | 91.3564 | 8.9 | 4.1 |
| Composite convolution, ksize1=5, ksize2=5 | 84.6679 | 90.7665 | 9.7 | 4.5 |

Table 6 Impact of convolution kernel size on model performance in composite convolution
| Number of kernels | Accuracy/% (IAM) | Accuracy/% (RIMES) | CER/% (IAM) | CER/% (RIMES) |
|---|---|---|---|---|
| 2 (ksize1=3, ksize2=5) | 85.4781 | 91.3564 | 8.9 | 4.1 |
| 3 (ksize1=3, ksize2=5, ksize3=7) | 84.9519 | 91.0718 | 9.4 | 4.4 |

Table 7 Impact of the number of convolution kernels on model performance in composite convolution
| Number of convolution layers | Accuracy/% (IAM) | Accuracy/% (RIMES) | CER/% (IAM) | CER/% (RIMES) |
|---|---|---|---|---|
| 8 | 83.9033 | 89.0978 | 10.4 | 5.3 |
| 9 | 84.6064 | 90.3975 | 9.9 | 5.2 |
| 10 | 84.1902 | 89.7886 | 10.2 | 5.3 |

Table 8 Impact of the number of convolution layers on model performance
| Method | Pre-processing | Lexicon | Pre-train | CER/% (IAM) | CER/% (RIMES) |
|---|---|---|---|---|---|
| Shi et al.[9] | / | / | / | 9.90 | 5.20 |
| Krishnan et al.[17] | / | / | Synthetic | 6.34 | — |
| Stuner et al.[18] | / | 2.4×10⁶ | / | 4.77 | 2.67 |
| Luo et al.[19] | √ | / | / | 5.13 | 2.42 |
| Xu et al.[20] | √ | / | / | 6.07 | — |
| Bluche et al.[21] | / | / | CTC | 12.60 | — |
| Sueiras et al.[22] | √ | / | / | 8.80 | 4.80 |
| Carbonell et al.[23] | / | / | / | 15.60 | — |
| Proposed | / | / | / | 8.60 | 3.90 |

Table 9 Accuracy comparison of current popular methods on IAM and RIMES datasets
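The CER figures in Tables 3-9 are character error rates. A standard way to compute CER is the Levenshtein edit distance between the predicted string and the reference, divided by the reference length; the sketch below uses that common definition (the paper's exact normalization is not stated in the tables, so this is an assumption).

```python
def cer(reference, hypothesis):
    """Character error rate: Levenshtein edit distance (insertions,
    deletions, substitutions) divided by the reference length."""
    r, h = reference, hypothesis
    # dp[i][j] = minimum edits to turn r[:i] into h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i  # delete all of r[:i]
    for j in range(len(h) + 1):
        dp[0][j] = j  # insert all of h[:j]
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution/match
    return dp[len(r)][len(h)] / len(r)
```

For example, recognizing the Table 1 label "passage" as "pasage" is one deletion over seven reference characters, a CER of about 14.3%.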
[1] MORI S, NISHIDA H, YAMADA H. Optical character recognition[M]. New York: John Wiley & Sons, Inc., 1999.
[2] SUEN C Y, NADAL C, LEGAULT R, et al. Computer recognition of unconstrained handwritten numerals[J]. Proceedings of the IEEE, 1992, 80(7): 1162-1180.
[3] BUNKE H, WANG P S. Handbook of character recognition and document image analysis[M]. Singapore: World Scientific Publishing Company, 1997.
[4] LIU C L, SAKO H, FUJISAWA H. Performance evaluation of pattern classifiers for handwritten character recognition[J]. International Journal on Document Analysis and Recognition, 2002, 4(3): 191-204.
[5] MADHVANATH S, GOVINDARAJU V. Local reference lines for handwritten phrase recognition[J]. Pattern Recognition, 1999, 32(12): 2021-2028.
[6] CASEY R G, LECOLINET E. A survey of methods and strategies in character segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1996, 18(7): 690-706.
[7] EL-YACOUBI M A, GILLOUX M, SABOURIN R, et al. An HMM-based approach for off-line unconstrained handwritten word modeling and recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999, 21(8): 752-760.
[8] WEI X X. English handwriting recognition based on long and short memory recurrent neural network[D]. Guangzhou: South China University of Technology, 2014.
[9] SHI B G, BAI X, YAO C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39(11): 2298-2304.
[10] SCHUSTER M, PALIWAL K K. Bidirectional recurrent neural networks[J]. IEEE Transactions on Signal Processing, 1997, 45(11): 2673-2681.
[11] GRAVES A, FERNÁNDEZ S, GOMEZ F J, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks[C]// Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, Jun 25-29, 2006. New York: ACM, 2006: 369-376.
[12] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[J]. arXiv:1807.06521, 2018.
[13] PARK J, WOO S, LEE J Y, et al. BAM: bottleneck attention module[J]. arXiv:1807.06514, 2018.
[14] HE K, MA H Y, FENG X, et al. English handwriting identification method using an improved VGG-16 model[J]. Journal of Tianjin University (Natural Science and Engineering Technology Edition), 2020, 53(9): 984-990.
[15] MARTI U V, BUNKE H. The IAM-database: an English sentence database for offline handwriting recognition[J]. International Journal on Document Analysis and Recognition, 2002, 5(1): 39-46.
[16] IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]// Proceedings of the 32nd International Conference on Machine Learning, Lille, Jul 6-11, 2015: 448-456.
[17] KRISHNAN P, DUTTA K, JAWAHAR C V. Word spotting and recognition using deep embedding[C]// Proceedings of the 13th IAPR International Workshop on Document Analysis Systems, Vienna, Apr 24-27, 2018. Washington: IEEE Computer Society, 2018: 1-6.
[18] STUNER B, CHATELAIN C, PAQUET T. Handwriting recognition using cohort of LSTM and lexicon verification with extremely large lexicon[J]. arXiv:1612.07528, 2016.
[19] LUO C J, ZHU Y Z, JIN L W, et al. Learn to augment: joint data augmentation and network optimization for text recognition[J]. arXiv:2003.06606, 2020.
[20] XU K, BA J, KIROS R, et al. Show, attend and tell: neural image caption generation with visual attention[C]// Proceedings of the 32nd International Conference on Machine Learning, Lille, Jul 6-11, 2015: 2048-2057.
[21] BLUCHE T, LOURADOUR J, MESSINA R O. Scan, attend and read: end-to-end handwritten paragraph recognition with MDLSTM attention[C]// Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, Kyoto, Nov 9-15, 2017. Piscataway: IEEE, 2017: 1050-1055.
[22] SUEIRAS J, RUIZ V, SANCHEZ A, et al. Offline continuous handwriting recognition using sequence to sequence neural networks[J]. Neurocomputing, 2018, 289: 119-128.
[23] CARBONELL M, MAS J, VILLEGAS M, et al. End-to-end handwritten text detection and transcription in full pages[C]// Proceedings of the 2nd International Workshop on Machine Learning, Sydney, Sep 22-25, 2019. Piscataway: IEEE, 2019: 29-34.