Channel Pruning Method for Anchor-Free Detector

doi:10.3778/j.issn.1673-9418.2111102

Abstract

Abstract: To address the high parameter redundancy, heavy computational cost, and slow detection speed of the backbone networks in anchor-free object detectors, a channel pruning method guided by dual attention modules (CPDAM) is proposed to compress these detectors. First, pooling layers and group normalization are used to improve the performance of the channel attention and spatial attention submodules. Second, the improved submodules are fused through a channel grouping strategy and trained to generate a scale value for each channel, which indicates that channel's importance to the classification task. Third, a global scale value is computed from the per-channel scale values, and the backbone network is pruned according to the channel importance evaluated against this value. Finally, the anchor-free detectors before and after pruning are validated on the PASCAL VOC, ImageNet, and CIFAR-100 datasets. The results show that, at a cost of only 0.6 percentage points of mAP, the number of parameters of CenterNet-ResNet101 decreases from 6.995×10^7 to 2.238×10^7 and the FPS increases from 27 to 46.

Key words: anchor-free, object detector, attention module, channel pruning
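
The pruning step summarized in the abstract (per-channel scale values, one global value, then channel selection) can be illustrated with a minimal sketch. The abstract does not give the formula for the global scale value, so this sketch assumes it is a percentile over all per-channel scale values, in the spirit of global-threshold pruning. The names `global_threshold`, `channel_masks`, `prune_ratio`, and `min_keep` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def global_threshold(scales: Dict[str, List[float]], prune_ratio: float) -> float:
    """Return one network-wide threshold over all per-channel scale values.

    ASSUMPTION: the "global scale value" is modeled here as a percentile of
    all scale values; the paper's actual definition may differ.
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    flat = sorted(v for layer in scales.values() for v in layer)
    if not flat:
        raise ValueError("no scale values given")
    n_prune = int(len(flat) * prune_ratio)
    # Channels with a scale strictly below this value are pruned, so at most
    # n_prune channels are removed (fewer if several values tie).
    return flat[n_prune]


def channel_masks(
    scales: Dict[str, List[float]], prune_ratio: float, min_keep: int = 1
) -> Dict[str, List[bool]]:
    """Build a keep/prune mask per layer (True means the channel is kept)."""
    thr = global_threshold(scales, prune_ratio)
    masks: Dict[str, List[bool]] = {}
    for name, layer in scales.items():
        keep = [v >= thr for v in layer]
        if sum(keep) < min_keep:
            # Hypothetical safeguard: never remove a layer entirely; keep its
            # min_keep highest-scoring channels instead.
            order = sorted(range(len(layer)), key=lambda i: layer[i], reverse=True)
            top = set(order[:min_keep])
            keep = [i in top for i in range(len(layer))]
        masks[name] = keep
    return masks


# Made-up scale values for two layers of four channels each.
example = {
    "layer1": [0.9, 0.1, 0.5, 0.05],
    "layer2": [0.02, 0.03, 0.8, 0.7],
}
masks = channel_masks(example, prune_ratio=0.5)
```

In a real pipeline the masks would be used to slice the convolution weights of the backbone and of the layers that consume its outputs, followed by fine-tuning; that part is omitted here because it depends on the network definition.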

RAN Mengying, YANG Wenzhu, YIN Qunjie. Channel Pruning Method for Anchor-Free Detector[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(7): 1634-1643.
