Journal of Frontiers of Computer Science and Technology ›› 2023, Vol. 17 ›› Issue (10): 2499-2510. DOI: 10.3778/j.issn.1673-9418.2210070

• Artificial Intelligence·Pattern Recognition •

SMViT: Lightweight Siamese Masked Vision Transformer Model for Diagnosis of COVID-19

MA Ziping, TAN Lidao, MA Jinlin, CHEN Yong   

  1. School of Mathematics and Information Science, North Minzu University, Yinchuan 750021, China
  2. School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
  3. Department of Radiological Intervention, General Hospital of Ningxia Medical University, Yinchuan 750004, China
  • Online:2023-10-01 Published:2023-10-01

SMViT: A Lightweight Siamese Network Model for COVID-19 Diagnosis

马自萍,谭力刀,马金林,陈勇   

  1. 北方民族大学 数学与信息科学学院,银川 750021
  2. 北方民族大学 计算机科学与工程学院,银川 750021
  3. 宁夏医科大学总医院 放射介入科,银川 750004

Abstract: Deep-learning models for COVID-19 diagnosis suffer from low accuracy, poor generalization ability and large parameter counts. To address these problems, this paper proposes SMViT (siamese masked vision transformer), a lightweight siamese network for COVID-19 diagnosis built on ViT (vision transformer) and the siamese network. Firstly, a lightweight cyclic-substructure strategy is proposed, which builds the diagnosis network from multiple subnets with the same structure, thereby reducing the number of network parameters. Secondly, a masked self-supervised pre-training model based on ViT is proposed to enhance the model's ability to express latent features. Then, the siamese network SMViT is constructed to improve diagnostic accuracy and to mitigate the model's poor generalization on small samples. Finally, ablation experiments are used to verify and determine the model structure, and comparative experiments verify the model's diagnostic performance and lightweight capability. Experimental results show that, compared with the most competitive ViT-based diagnostic model, the accuracy, specificity, sensitivity and F1 score of SMViT on the X-ray dataset increase by 1.42%, 4.62%, 0.40% and 2.80% respectively, and on the CT image dataset by 2.16%, 2.17%, 2.05% and 2.06% respectively. SMViT also generalizes well on small datasets. Compared with ViT, SMViT has fewer parameters and higher diagnostic performance.
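
The abstract reports accuracy, specificity, sensitivity and F1 score. As a reminder of how these four quantities relate, the following is a minimal sketch that computes them from the counts of a binary confusion matrix. It assumes a binary positive/negative setting, which the abstract does not state; the names `ConfusionCounts` and `diagnostic_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # positive cases predicted positive
    fp: int  # negative cases predicted positive
    tn: int  # negative cases predicted negative
    fn: int  # positive cases predicted negative


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Compute accuracy, specificity, sensitivity and F1 from the counts."""
    total = c.tp + c.fp + c.tn + c.fn
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # recall on positive cases
    specificity = _ratio(c.tn, c.tn + c.fp)  # recall on negative cases
    precision = _ratio(c.tp, c.tp + c.fp)
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "specificity": specificity,
        "sensitivity": sensitivity,
        # F1 is the harmonic mean of precision and sensitivity.
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # Illustrative counts only, not results from the paper.
    example = ConfusionCounts(tp=90, fp=5, tn=95, fn=10)
    for name, value in diagnostic_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

Reporting specificity alongside sensitivity matters for a diagnostic model because a single accuracy figure can hide a high false-positive or false-negative rate when the classes are imbalanced.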

Key words: diagnosis of COVID-19, siamese network, vision transformer, self-supervised learning, lightweight model

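Abstract (translated from Chinese): To address the low accuracy, poor generalization ability and large parameter counts of deep-learning diagnostic models for COVID-19, a lightweight siamese network for COVID-19 diagnosis, SMViT, is proposed based on ViT and the siamese network. First, a lightweight cyclic-substructure strategy is proposed, which builds the diagnosis network from multiple sub-networks with the same structure, thereby reducing the network's parameter count. Second, a ViT masked self-supervised pre-training model is proposed to enhance the model's latent feature representation ability. Then, the siamese network SMViT for COVID-19 diagnosis is constructed, which effectively improves diagnostic accuracy and alleviates the model's poor generalization on small samples. Finally, ablation experiments are used to verify and determine the model structure, and comparative experiments verify the model's diagnostic performance and lightweight capability. Experimental results show that, compared with the most competitive ViT-architecture diagnostic model, the model's accuracy, specificity, sensitivity and F1 score on the X-ray dataset improve by 1.42%, 4.62%, 0.40% and 2.80% respectively, and on the CT image dataset by 2.16%, 2.17%, 2.05% and 2.06% respectively. When the sample size is small, the model shows strong generalization ability. Compared with ViT, SMViT has fewer parameters and higher diagnostic performance.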
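
The abstract states that the cyclic-substructure strategy reduces the parameter count by building the network from subnets with the same structure. One possible reading, sketched below, is that a single set of block weights is reused at every depth step; the abstract does not confirm this, so the sharing scheme is an assumption. The block layout is a standard ViT encoder block, and the sizes (dimension 768, MLP dimension 3072, depth 12) are the usual ViT-Base values chosen for illustration, not the paper's configuration. The function names are hypothetical.

```python
def encoder_block_params(dim: int, mlp_dim: int) -> int:
    """Count weights and biases in one standard ViT encoder block."""
    # Query, key, value and output projections: four dim x dim layers with bias.
    attention = 4 * (dim * dim + dim)
    # Two-layer MLP: dim -> mlp_dim -> dim, each layer with bias.
    mlp = (dim * mlp_dim + mlp_dim) + (mlp_dim * dim + dim)
    # Two LayerNorms, each with a scale and a shift vector.
    layer_norms = 2 * (2 * dim)
    return attention + mlp + layer_norms


def stack_params(dim: int, mlp_dim: int, depth: int, shared: bool) -> int:
    """Parameters of `depth` stacked blocks, with or without weight sharing."""
    block = encoder_block_params(dim, mlp_dim)
    # With sharing (assumed scheme), one block's weights serve every depth step.
    return block if shared else block * depth


if __name__ == "__main__":
    independent = stack_params(768, 3072, depth=12, shared=False)
    reused = stack_params(768, 3072, depth=12, shared=True)
    print(f"independent blocks: {independent:,} parameters")
    print(f"shared block:       {reused:,} parameters")
    print(f"reduction factor:   {independent / reused:.1f}x")
```

Under this assumption the encoder's parameter count no longer grows with depth, while the amount of computation per forward pass stays the same, since the shared block is still applied once per depth step.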

Key words (translated from Chinese): COVID-19 diagnosis, siamese network, ViT model, self-supervised learning, lightweight model