Journal of Frontiers of Computer Science and Technology ›› 2023, Vol. 17 ›› Issue (11): 2689-2702. DOI: 10.3778/j.issn.1673-9418.2208032

• Graphics·Image •

HSKDLR: Lightweight Lip Reading Method Based on Homogeneous Self-Knowledge Distillation

MA Jinlin, LIU Yuhao, MA Ziping, GONG Yuanwen, ZHU Yanbin   

  1. School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
  2. Key Laboratory for Intelligent Processing of Computer Images and Graphics of National Ethnic Affairs Commission of the PRC, Yinchuan 750021, China
  3. School of Mathematics and Information Science, North Minzu University, Yinchuan 750021, China
  • Online: 2023-11-01  Published: 2023-11-01


Abstract: To address the low recognition accuracy and heavy computational cost of lip reading, this paper proposes a lightweight lip reading model named HSKDLR (homogeneous self-knowledge distillation for lip reading). Firstly, an S-SE (spatial SE) attention module is designed to attend to the spatial features of lip images; it is used to construct the i-Ghost Bottleneck (improved Ghost Bottleneck) module, which extracts both the channel and spatial features of lip images and thereby improves the accuracy of the lip reading model. Secondly, a lip reading model is built on the i-Ghost Bottleneck, which reduces the model's computation to a certain extent by optimizing the combination of bottleneck structures. Then, to further improve accuracy and reduce training time, a model optimization method called homogeneous self-knowledge distillation (HSKD) is proposed. Finally, HSKD is employed to train the lip reading model, and its recognition performance is verified. Experimental results show that HSKDLR achieves higher recognition accuracy and lower computational complexity than the compared methods: its accuracy on the LRW dataset reaches 87.3%, with floating-point operations as low as 2.564 GFLOPs and a parameter count as low as 3.8723×10^7. Moreover, HSKD can be applied to most lip reading models to effectively improve recognition accuracy and reduce training time.

Key words: lip reading, lightweight, knowledge distillation, self-knowledge, Ghost Bottleneck
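The abstract only summarizes the method at a high level; the sketch below illustrates, in hedged form, the two ideas it names. The SpatialSE module and hskd_loss function are hypothetical reconstructions, not the authors' code: SpatialSE shows one common way to build a spatial squeeze-and-excitation gate (a 1x1 convolution over the feature map followed by a sigmoid), and hskd_loss shows a generic self-knowledge-distillation objective in which the model's own softened predictions on a homogeneous view (for example, another sample or augmentation of the same class) act as the teacher, combined with the usual cross-entropy term. The actual HSKDLR architecture and training procedure are described in the paper itself.

# Hypothetical sketch of the two components named in the abstract.
# PyTorch is assumed; the names SpatialSE and hskd_loss are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialSE(nn.Module):
    """Spatial squeeze-and-excitation: gate each spatial location of the
    feature map with a weight produced by a 1x1 convolution + sigmoid."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Conv2d(channels, 1, kernel_size=1, bias=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width)
        attention = torch.sigmoid(self.gate(x))  # (batch, 1, H, W)
        return x * attention                     # reweight spatial positions


def hskd_loss(student_logits: torch.Tensor,
              teacher_logits: torch.Tensor,
              targets: torch.Tensor,
              temperature: float = 3.0,
              alpha: float = 0.5) -> torch.Tensor:
    """Generic self-knowledge-distillation loss: cross-entropy on the hard
    labels plus KL divergence to the model's own softened predictions
    (teacher_logits), e.g. obtained from a homogeneous sample."""
    ce = F.cross_entropy(student_logits, targets)
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits.detach() / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return (1.0 - alpha) * ce + alpha * kd


if __name__ == "__main__":
    # Smoke test with random tensors standing in for lip-image features.
    feats = torch.randn(2, 16, 22, 22)
    print(SpatialSE(16)(feats).shape)   # torch.Size([2, 16, 22, 22])
    logits_a = torch.randn(2, 500)      # LRW covers 500 word classes
    logits_b = torch.randn(2, 500)
    labels = torch.randint(0, 500, (2,))
    print(hskd_loss(logits_a, logits_b, labels))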
