SSD Object Detection Algorithm with Attention and Cross-Scale Fusion

doi:10.3778/j.issn.1673-9418.2102001

Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (11): 2575-2586.DOI: 10.3778/j.issn.1673-9418.2102001

• Graphics and Image •

LI Qingyuan¹, DENG Zhaohong¹,²,³,⁺, LUO Xiaoqing¹, GU Xin⁴, WANG Shitong¹

1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
2. Key Laboratory of Computational Neuroscience and Brain-Like Intelligence, Ministry of Education, Fudan University, Shanghai 200433, China
3. Zhangjiang Laboratory, Shanghai 200120, China
4. Jiangsu North Huguang Photoelectric Co., Ltd., Wuxi, Jiangsu 214035, China

Received: 2021-02-01 Revised: 2021-03-18 Online: 2022-11-01 Published: 2021-03-25
About the authors: LI Qingyuan, born in 1997, M.S. candidate. His research interest is deep learning.
DENG Zhaohong, born in 1981, professor, senior member of CCF. His research interests include uncertainty artificial intelligence and its applications.
LUO Xiaoqing, born in 1980, associate professor. Her research interests include image fusion, pattern recognition, and image processing.
GU Xin, born in 1979, Ph.D., senior engineer. His research interests include pattern recognition and the research and application of artificial intelligence image processing technology.
WANG Shitong, born in 1964, professor, Ph.D. supervisor. His research interests include artificial intelligence and pattern recognition.
Supported by: National Natural Science Foundation of China (61772239); Municipal Major Science and Technology Project of Shanghai (2018SHZDZX01)

Corresponding author: ⁺ E-mail: dengzhaohong@jiangnan.edu.cn

Abstract:

To further improve the performance of the SSD (single shot multibox detector) algorithm and to address two weaknesses of its multi-scale prediction, namely unbalanced information across feature maps and poor recognition of small objects, this paper designs plug-and-play modules that fully integrate the information contained in feature maps of different scales and model the importance relationships within each feature map, thereby enhancing the representation ability of the feature maps. First, a novel feature fusion method is designed to resolve the information disparity that arises in cross-scale feature fusion. Second, following the idea of the pooling pyramid, a deep feature extraction module is designed to extract information from different receptive fields, improving the model's ability to detect objects of different sizes. Finally, a lightweight attention module is proposed to further optimize the feature maps: it highlights the information that is useful for the current task and establishes both global long-range relationships between pixels and importance relationships among channels. With these mechanisms, the architecture of the SSD model is modified, which effectively improves the detection accuracy and robustness of the SSD algorithm. Extensive experiments on the PASCAL VOC datasets verify the effectiveness of the proposed method. On the PASCAL VOC2007 test set, the proposed method improves mean average precision (mAP) by 2.9 percentage points over the SSD algorithm while retaining real-time detection capability.

Key words: object detection, feature fusion, attention mechanism, deep learning
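
The abstract describes the lightweight attention module only at a high level. As a rough illustration of the general idea, the sketch below implements a simplified global-context block in the spirit of GCNet [28] in plain Python: a softmax over all positions captures long-range relationships between pixels, and a small bottleneck transform reweights channels. This is not the authors' module; the function name, the weight arguments (`w_key`, `w_down`, `w_up`) and the omission of normalization layers are assumptions made for illustration.

```python
import math


def global_context_attention(x, w_key, w_down, w_up):
    """Simplified GCNet-style block (illustrative assumption, not the paper's code).

    x      : C x N feature map (C channels, N = H*W flattened positions)
    w_key  : length-C weights that score each position (a 1x1 conv to 1 channel)
    w_down : R x C bottleneck weights, w_up : C x R expansion weights
    Returns a C x N map: x plus one per-channel term shared by all positions.
    """
    C, N = len(x), len(x[0])
    # 1) Score every position, then softmax over positions (long-range weights).
    scores = [sum(w_key[c] * x[c][n] for c in range(C)) for n in range(N)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    # 2) Attention-weighted pooling: one global context value per channel.
    context = [sum(alpha[n] * x[c][n] for n in range(N)) for c in range(C)]
    # 3) Channel transform: bottleneck + ReLU models channel importance.
    hidden = [max(0.0, sum(w_down[r][c] * context[c] for c in range(C)))
              for r in range(len(w_down))]
    delta = [sum(w_up[c][r] * hidden[r] for r in range(len(hidden)))
             for c in range(C)]
    # 4) Add the same per-channel term to every spatial position.
    return [[x[c][n] + delta[c] for n in range(N)] for c in range(C)]


# Toy input: 2 channels, 4 positions. Zero key weights give uniform attention,
# so the context vector is just the per-channel mean.
x = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0]]
out = global_context_attention(x, w_key=[0.0, 0.0],
                               w_down=[[1.0, 1.0]], w_up=[[2.0], [-1.0]])
```

Because the context vector is computed once and shared by all positions, the cost grows linearly with the number of positions, which is one reason blocks of this kind are usually considered lightweight.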


CLC Number: TP391.41

LI Qingyuan, DENG Zhaohong, LUO Xiaoqing, GU Xin, WANG Shitong. SSD Object Detection Algorithm with Attention and Cross-Scale Fusion[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(11): 2575-2586.

Figures/Tables 14

Fig.1 Different types of detection methods

Fig.2 Overall framework of SSD

Fig.3 Overall framework of improved SSD

Fig.4 Feature fusion module

Fig.5 Feature extraction module

Fig.6 Non-local channel attention module

Fig.7 Pyramid feature generation layers

Table 1

Model	mAP	aero	bike	bird	boat	bottle	bus	car	cat	chair	cow	table	dog	horse	mbike	person	plant	sheep	sofa	train	tv
SSD	77.5	79.5	83.9	76.0	69.6	50.5	87.0	85.7	88.1	60.3	81.5	77.0	86.1	87.5	84.0	79.4	51.7	77.9	79.5	87.6	76.8
DSSD	78.6	81.9	84.9	80.5	68.4	53.9	85.6	86.2	88.9	61.1	83.5	78.7	86.7	88.7	86.7	79.7	51.7	78.0	80.9	87.2	79.4
ION	79.2	80.2	85.2	78.8	70.9	62.6	86.6	86.9	89.8	61.7	86.9	76.5	88.4	87.5	83.4	80.5	52.4	78.1	77.2	86.9	83.5
Ours	80.4	84.9	87.0	79.5	75.5	59.5	86.7	87.4	89.0	67.5	85.0	80.0	86.6	88.0	86.2	81.9	57.1	79.1	81.1	86.3	79.7
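
The mAP column in Table 1 is the mean of the 20 per-class AP values in the same row. The short Python check below recomputes it for the SSD and Ours rows and derives the 2.9-point gain quoted in the abstract; the variable and function names are illustrative, and the numbers are copied from the table.

```python
CLASSES = ["aero", "bike", "bird", "boat", "bottle", "bus", "car", "cat",
           "chair", "cow", "table", "dog", "horse", "mbike", "person",
           "plant", "sheep", "sofa", "train", "tv"]

# Per-class AP (%) copied from the SSD and Ours rows of Table 1.
SSD = [79.5, 83.9, 76.0, 69.6, 50.5, 87.0, 85.7, 88.1, 60.3, 81.5,
       77.0, 86.1, 87.5, 84.0, 79.4, 51.7, 77.9, 79.5, 87.6, 76.8]
OURS = [84.9, 87.0, 79.5, 75.5, 59.5, 86.7, 87.4, 89.0, 67.5, 85.0,
        80.0, 86.6, 88.0, 86.2, 81.9, 57.1, 79.1, 81.1, 86.3, 79.7]


def mean_ap(aps):
    """mAP is the unweighted mean of the per-class average precisions."""
    return sum(aps) / len(aps)


ssd_map = round(mean_ap(SSD), 1)      # 77.5, matches the table
ours_map = round(mean_ap(OURS), 1)    # 80.4, matches the table
gain = round(ours_map - ssd_map, 1)   # 2.9 percentage points

# Per-class difference, to see where the improvement is concentrated.
per_class_gain = {c: round(o - s, 1) for c, s, o in zip(CLASSES, SSD, OURS)}
best_class = max(per_class_gain, key=per_class_gain.get)  # "bottle" (+9.0)
```

The largest per-class gains fall on bottle, chair and boat, while bus and train drop slightly.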

Fig.8 Comparison of mAP of four detection algorithms on PASCAL VOC2007 test dataset

Table 2

Model	mAP	aero	bike	bird	boat	bottle	bus	car	cat	chair	cow	table	dog	horse	mbike	person	plant	sheep	sofa	train	tv
SSD512	76.7	88.8	84.8	77.0	61.0	56.3	82.6	82.4	92.6	58.4	80.7	61.4	90.4	87.2	86.9	85.0	53.1	81.2	65.9	86.4	72.0
Ours512	78.5	91.0	87.9	79.8	63.6	60.3	84.6	83.5	92.8	60.9	82.2	64.3	91.2	86.8	88.3	87.1	57.1	85.1	66.1	84.0	73.8

Table 3 Comparison of detection speed and accuracy on PASCAL VOC2007 test dataset

Algorithm	Network	Detection speed/(frame/s)	GPU	Number of anchor boxes	Input size	mAP/%
Faster R-CNN [3]	VGG-16	7.0	Titan X	6000	~600 × 1000	73.2
Faster R-CNN [3]	ResNet-101	2.4	K40	300	~600 × 1000	76.4
R-FCN [31]	ResNet-50	-	-	300	~600 × 1000	77.0
R-FCN [31]	ResNet-101	5.8	K40	300	~600 × 1000	79.5
YOLOv2 [6]	Darknet-19	81.0	Titan X	-	352 × 352	73.7
SSD300 [8]	VGG-16	92.0	2080Ti	8732	300 × 300	77.5
FSSD300 [18]	VGG-16	65.8	1080Ti	8732	300 × 300	78.8
RefineDet320 [4]	VGG-16	12.9	K80	6375	320 × 320	79.5
RSSD300 [32]	VGG-16	35.0	Titan X	8732	300 × 300	78.5
DSSD321 [17]	ResNet-101	9.5	Titan X	17080	321 × 321	78.6
ASSD300 [33]	VGG-16	11.8	K40	8732	300 × 300	80.0
SSD512 [8]	VGG-16	45.0	2080Ti	24564	512 × 512	79.5
DSSD513 [17]	ResNet-101	5.5	Titan X	43688	513 × 513	81.5
FSSD512 [18]	VGG-16	35.7	1080Ti	24564	512 × 512	80.9
RSSD512 [32]	VGG-16	16.6	Titan X	24564	512 × 512	80.8
ASSD512	VGG-16	3.4	K40	24564	512 × 512	81.6
RefineDet512 [4]	VGG-16	5.6	K80	16320	512 × 512	81.2
Ours300	VGG-16	44.8	2080Ti	8732	300 × 300	80.4
Ours512	VGG-16	22.5	2080Ti	24564	512 × 512	82.2

Fig.9 Comparison of detection results between SSD and ours

Table 4 Comparative results of ablation experiments

Method	Detection speed/(frame/s)	mAP/%
SSD	92.0	77.5
SSD*	77.0	78.1
SSD*+EM	69.3	78.5
SSD*+EM+FM	46.7	79.7
SSD*+EM+FM+NCA	44.8	80.4
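
The ablation rows in Table 4 can be read as a sequence of incremental steps. The Python sketch below computes, for each step, the mAP gained and the extra per-frame latency implied by the reported frame rate (latency in ms = 1000 / frames per second). The row labels (SSD*, EM, FM, NCA) are used exactly as they appear in the table, since this page does not expand them; the helper name `ablation_steps` is illustrative.

```python
# (method, frames per second, mAP %) copied from Table 4.
ROWS = [
    ("SSD", 92.0, 77.5),
    ("SSD*", 77.0, 78.1),
    ("SSD*+EM", 69.3, 78.5),
    ("SSD*+EM+FM", 46.7, 79.7),
    ("SSD*+EM+FM+NCA", 44.8, 80.4),
]


def ablation_steps(rows):
    """Return (method, mAP gain, added latency in ms) relative to the previous row."""
    steps = []
    for (_, prev_fps, prev_map), (name, fps, m_ap) in zip(rows, rows[1:]):
        map_gain = round(m_ap - prev_map, 1)
        added_ms = round(1000.0 / fps - 1000.0 / prev_fps, 2)
        steps.append((name, map_gain, added_ms))
    return steps


steps = ablation_steps(ROWS)
total_gain = round(sum(gain for _, gain, _ in steps), 1)   # 2.9 points overall
baseline_ms = round(1000.0 / ROWS[0][1], 1)                # about 10.9 ms per frame
final_ms = round(1000.0 / ROWS[-1][1], 1)                  # about 22.3 ms per frame

for name, gain, added_ms in steps:
    print(f"{name:<16} +{gain:.1f} mAP  +{added_ms:.2f} ms/frame")
```

Read this way, the FM step contributes the largest single mAP gain (1.2 points) and also the largest latency increase, while the final NCA step adds 0.7 points for under 1 ms per frame.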

Fig.10 Visualization of attention maps

References 34

[1]	GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, Jun 23-28, 2014. Washington: IEEE Computer Society, 2014: 580-587.
[2]	GIRSHICK R. Fast R-CNN[J]. arXiv:1504.08083, 2015.
[3]	REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[4]	RAJARAM R N, OHN-BAR E, TRIVEDI M M. RefineNet: iterative refinement for accurate object localization[C]// Proceedings of the 19th IEEE International Conference on Intelligent Transportation Systems, Rio de Janeiro, Nov 1-4, 2016. Piscataway: IEEE, 2016: 1528-1533.
[5]	REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 779-788.
[6]	REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 6517-6525.
[7]	REDMON J, FARHADI A. YOLOv3: an incremental improvement[J]. arXiv:1804.02767, 2018.
[8]	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]// LNCS 9905: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Oct 11-14, 2016. Cham: Springer, 2016: 21-37.
[9]	LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 2999-3007.
[10]	FELZENSZWALB P F, GIRSHICK R B, MCALLESTER D A, et al. Object detection with discriminatively trained part-based models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(9): 1627-1645.
[11]	LIN T Y, DOLLAR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 936-944.
[12]	LIU S, QI L, QIN H, et al. Path aggregation network for instance segmentation[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 8759-8768.
[13]	ZHAO H, SHI J, QI X, et al. Pyramid scene parsing network[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 6230-6239.
[14]	EVERINGHAM M, ESLAMI S, GOOL L, et al. The PASCAL visual object classes challenge: a retrospective[J]. International Journal of Computer Vision, 2014, 111(1): 98-136.
[15]	LOWE D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91-110.
[16]	DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]// Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, Jun 20-26, 2005. Washington: IEEE Computer Society, 2005: 886-893.
[17]	FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector[J]. arXiv:1701.06659, 2017.
[18]	LI Z, ZHOU F. FSSD: feature fusion single shot multibox detector[J]. arXiv:1712.00960, 2017.
[19]	SHEN Z, LIU Z, LI J, et al. DSOD: learning deeply supervised object detectors from scratch[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 1937-1945.
[20]	HUANG G, LIU Z, WEINBERGER K Q. Densely connected convolutional networks[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 2261-2269.
[21]	PANG J, CHEN K, SHI J, et al. Libra R-CNN: towards balanced learning for object detection[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 821-830.
[22]	BELL S, ZITNICK C L, BALA K, et al. Inside-Outside net: detecting objects in context with skip pooling and recurrent neural networks[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 2874-2883.
[23]	KONG T, YAO A, CHEN Y, et al. HyperNet: towards accurate region proposal generation and joint object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 845-853.
[24]	HARIHARAN B, ARBELÁEZ P P, GIRSHICK R B, et al. Hypercolumns for object segmentation and fine-grained localization[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, Jun 7-12, 2015. Washington: IEEE Computer Society, 2015: 447-456.
[25]	HU J, SHEN L, ALBANIE S, et al. Squeeze-and-Excitation networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(8): 2011-2023.
[26]	WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// LNCS 11211: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 3-19.
[27]	WANG X, GIRSHICK R B, GUPTA A, et al. Non-local neural networks[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 7794-7803.
[28]	CAO Y, XU J, LIN S, et al. GCNet: non-local networks meet squeeze-excitation networks and beyond[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Oct 27-28, 2019. Piscataway: IEEE, 2019: 1971-1980.
[29]	SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. arXiv:1409.1556, 2014.
[30]	ZHOU B, KHOSLA A, LAPEDRIZA À, et al. Object detectors emerge in deep scene CNNs[J]. arXiv:1412.6856, 2014.
[31]	DAI J, LI Y, HE K, et al. R-FCN: object detection via region-based fully convolutional networks[J]. arXiv:1605.06409, 2016.
[32]	JEONG J, PARK H, KWAK N. Enhancement of SSD by concatenating feature maps for object detection[J]. arXiv:1705.09587, 2017.
[33]	YI J, WU P, METAXAS D. ASSD: attentive single shot multibox detector[J]. Computer Vision and Image Understanding, 2019, 189: 102827.
[34]	SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[J]. International Journal of Computer Vision, 2020, 128(2): 336-359.
