Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (1): 217-230.DOI: 10.3778/j.issn.1673-9418.2007059
• Graphics and Image • Previous Articles Next Articles
LI Zhixin1,+(), CHEN Shengjia1, ZHOU Tao1, MA Huifang2
Received:
2020-07-03
Revised:
2020-09-09
Online:
2022-01-01
Published:
2020-09-25
About author:
LI Zhixin, born in 1971, Ph.D., professor, Ph.D. supervisor, senior member of CCF. His research interests include image understanding, machine learning, natural language processing and cross-media computing.Supported by:
通讯作者:
+ E-mail: lizx@gxnu.edu.cn作者简介:
李志欣(1971—),男,博士,教授,博士生导师,CCF高级会员,主要研究方向为图像理解、机器学习、自然语言处理、跨媒体计算。基金资助:
CLC Number:
LI Zhixin, CHEN Shengjia, ZHOU Tao, MA Huifang. Combining Cascaded Network and Adversarial Network for Object Detection[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(1): 217-230.
李志欣, 陈圣嘉, 周韬, 马慧芳. 协同级联网络和对抗网络的目标检测[J]. 计算机科学与探索, 2022, 16(1): 217-230.
Add to citation manager EndNote|Ris|BibTeX
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2007059
Method | Anchor | Pooling sizes | mAP/% |
---|---|---|---|
Faster R-CNN | (128,256,512) | 7×7 | 73.2 |
Faster R-CNN | (64,128,256,512) | 7×7 | 73.3 |
Cascaded network | (64,128,256,512) | 7×7 | 73.9 |
Cascaded network+RoIAligns | (64,128,256,512) | 3×11、11×3、7×7 | 74.5 |
Cascaded network+RoIAligns (Improved R-CNN) | (64,128,256,512) | 3×11、11×3、7×7、11×11 | 74.8 |
Improved R-CNN+FT | (64,128,256,512) | 3×11、11×3、7×7、11×11 | 75.2 |
Improved R-CNN+FT+ASDN (Collaborative R-CNN) | (64,128,256,512) | 3×11、11×3、7×7、11×11 | 77.5 |
Table 1 Results of ablation experiments on PASCAL VOC 2007 dataset
Method | Anchor | Pooling sizes | mAP/% |
---|---|---|---|
Faster R-CNN | (128,256,512) | 7×7 | 73.2 |
Faster R-CNN | (64,128,256,512) | 7×7 | 73.3 |
Cascaded network | (64,128,256,512) | 7×7 | 73.9 |
Cascaded network+RoIAligns | (64,128,256,512) | 3×11、11×3、7×7 | 74.5 |
Cascaded network+RoIAligns (Improved R-CNN) | (64,128,256,512) | 3×11、11×3、7×7、11×11 | 74.8 |
Improved R-CNN+FT | (64,128,256,512) | 3×11、11×3、7×7、11×11 | 75.2 |
Improved R-CNN+FT+ASDN (Collaborative R-CNN) | (64,128,256,512) | 3×11、11×3、7×7、11×11 | 77.5 |
Method | Backbone | Train data | Input resolution/pixel | mAP/% |
---|---|---|---|---|
Faster R-CNN[ | VGG16 | 07+12 | 600×1 000 | 73.2 |
A-Fast-RCNN[ | VGG16 | 07+12 | 600×1 000 | 71.4 |
NOC[ | VGG16 | 07+12 | 600×1 000 | 73.3 |
SSD[ | VGG16 | 07+12 | — | 75.1 |
RON[ | VGG16 | 07+12 | 384×384 | 77.6 |
ION[ | VGG16 | 07+12 | 600×1 000 | 75.6 |
SIN[ | VGG16 | 07+12 | 600×1 000 | 76.0 |
RGC[ | VGG16 | 07+12 | 600×1 000 | 76.1 |
SSD321[ | VGG16 | 07+12 | 321×321 | 77.1 |
YOLOv3[ | DarkNet | 07+12 | 320×320 | 78.6 |
CenterNet[ | ResNet101 | 07+12 | 384×384 | 78.7 |
Collaborative R-CNN | VGG16 | 07+12 | 600×1 000 | 77.5 |
Table 2 Experimental results of object detection on PASCAL VOC 2007 dataset
Method | Backbone | Train data | Input resolution/pixel | mAP/% |
---|---|---|---|---|
Faster R-CNN[ | VGG16 | 07+12 | 600×1 000 | 73.2 |
A-Fast-RCNN[ | VGG16 | 07+12 | 600×1 000 | 71.4 |
NOC[ | VGG16 | 07+12 | 600×1 000 | 73.3 |
SSD[ | VGG16 | 07+12 | — | 75.1 |
RON[ | VGG16 | 07+12 | 384×384 | 77.6 |
ION[ | VGG16 | 07+12 | 600×1 000 | 75.6 |
SIN[ | VGG16 | 07+12 | 600×1 000 | 76.0 |
RGC[ | VGG16 | 07+12 | 600×1 000 | 76.1 |
SSD321[ | VGG16 | 07+12 | 321×321 | 77.1 |
YOLOv3[ | DarkNet | 07+12 | 320×320 | 78.6 |
CenterNet[ | ResNet101 | 07+12 | 384×384 | 78.7 |
Collaborative R-CNN | VGG16 | 07+12 | 600×1 000 | 77.5 |
Object | AP | ||||
---|---|---|---|---|---|
Faster R-CNN[ | A-Fast-RCNN[ | SSD[ | RON[ | CollaborativeR-CNN | |
aero | 84.9 | 82.2 | 84.9 | 86.5 | 87.0 |
bike | 79.8 | 75.6 | 82.6 | 82.9 | 83.5 |
bird | 74.3 | 69.2 | 74.4 | 76.6 | 78.9 |
blt | 53.9 | 52.0 | 55.8 | 60.9 | 60.1 |
boat | 49.8 | 47.2 | 50.0 | 55.8 | 57.6 |
bus | 77.5 | 76.3 | 80.3 | 81.7 | 83.2 |
car | 75.9 | 71.2 | 78.9 | 80.2 | 80.5 |
cat | 88.5 | 88.5 | 88.8 | 91.1 | 90.2 |
chair | 45.6 | 46.8 | 53.7 | 57.3 | 51.6 |
cow | 77.1 | 74.0 | 76.8 | 81.1 | 82.4 |
dog | 55.3 | 58.1 | 59.4 | 60.4 | 61.6 |
hrs | 86.9 | 85.6 | 87.6 | 87.2 | 89.9 |
mbk | 81.7 | 80.3 | 83.7 | 84.8 | 89.8 |
per | 80.9 | 80.5 | 82.6 | 84.9 | 82.8 |
plant | 79.6 | 74.7 | 81.4 | 81.7 | 86.6 |
shp | 40.1 | 41.5 | 47.2 | 51.9 | 47.4 |
sofa | 72.6 | 70.4 | 75.5 | 79.1 | 74.2 |
table | 60.9 | 62.2 | 65.6 | 68.6 | 70.0 |
train | 81.2 | 77.4 | 84.3 | 84.1 | 86.6 |
tv | 61.5 | 67.0 | 68.1 | 70.3 | 69.9 |
mAP | 70.4 | 69.0 | 73.1 | 75.4 | 75.7 |
Table 3 Experimental results of object detection on PASCAL VOC 2012 dataset %
Object | AP | ||||
---|---|---|---|---|---|
Faster R-CNN[ | A-Fast-RCNN[ | SSD[ | RON[ | CollaborativeR-CNN | |
aero | 84.9 | 82.2 | 84.9 | 86.5 | 87.0 |
bike | 79.8 | 75.6 | 82.6 | 82.9 | 83.5 |
bird | 74.3 | 69.2 | 74.4 | 76.6 | 78.9 |
blt | 53.9 | 52.0 | 55.8 | 60.9 | 60.1 |
boat | 49.8 | 47.2 | 50.0 | 55.8 | 57.6 |
bus | 77.5 | 76.3 | 80.3 | 81.7 | 83.2 |
car | 75.9 | 71.2 | 78.9 | 80.2 | 80.5 |
cat | 88.5 | 88.5 | 88.8 | 91.1 | 90.2 |
chair | 45.6 | 46.8 | 53.7 | 57.3 | 51.6 |
cow | 77.1 | 74.0 | 76.8 | 81.1 | 82.4 |
dog | 55.3 | 58.1 | 59.4 | 60.4 | 61.6 |
hrs | 86.9 | 85.6 | 87.6 | 87.2 | 89.9 |
mbk | 81.7 | 80.3 | 83.7 | 84.8 | 89.8 |
per | 80.9 | 80.5 | 82.6 | 84.9 | 82.8 |
plant | 79.6 | 74.7 | 81.4 | 81.7 | 86.6 |
shp | 40.1 | 41.5 | 47.2 | 51.9 | 47.4 |
sofa | 72.6 | 70.4 | 75.5 | 79.1 | 74.2 |
table | 60.9 | 62.2 | 65.6 | 68.6 | 70.0 |
train | 81.2 | 77.4 | 84.3 | 84.1 | 86.6 |
tv | 61.5 | 67.0 | 68.1 | 70.3 | 69.9 |
mAP | 70.4 | 69.0 | 73.1 | 75.4 | 75.7 |
Type | Method | Test time/(ms/image) | mAP/% |
---|---|---|---|
One-stage | SSD | 46 | 73.1 |
RON | 67 | 75.4 | |
Two-stage | Faster R-CNN | 200 | 70.4 |
Collaborative R-CNN | 234 | 75.7 |
Table 4 Experimental results on PASCAL VOC 2012 dataset
Type | Method | Test time/(ms/image) | mAP/% |
---|---|---|---|
One-stage | SSD | 46 | 73.1 |
RON | 67 | 75.4 | |
Two-stage | Faster R-CNN | 200 | 70.4 |
Collaborative R-CNN | 234 | 75.7 |
[1] | GIRSHICK R B, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, Jun 23-28, 2014. Washington: IEEE Computer Society, 2014: 580-587. |
[2] | GIRSHICK R B. Fast R-CNN[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Dec 7-13, 2015. Washington: IEEE Computer Society, 2015: 1440-1448. |
[3] | REN S Q, HE K M, GIRSHICK R B, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]// Proceedings of the Annual Conference on Neural In-formation Processing Systems, Montreal, Dec 7-12, 2015. Red Hook: Curran Associates, 2015: 91-99. |
[4] |
RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3):211-252.
DOI URL |
[5] |
WEI S T, LI Z X, ZHANG C L. Combined constraint-based with metric-based in semi-supervised clustering ensemble[J]. International Journal of Machine Learning and Cybernetics, 2018, 9(7):1085-1100.
DOI URL |
[6] | WEI Y C, XIA W, LIN M, et al. HCP: a flexible CNN frame-work for multi-label image classification[J]. IEEE Transac-tions on Pattern Analysis and Machine Intelligence, 2015, 38(9):1901-1907. |
[7] |
ZHENG Y Z, LI Z X, ZHANG C L. A hybrid architecture based on CNN for cross-modal semantic instance annotation[J]. Multimedia Tools and Applications, 2018, 77(7):8695-8710.
DOI URL |
[8] | DAI J F, LI Y, HE K M, et al. R-FCN: object detection via region-based fully convolutional networks[C]// Proceedings of the Annual Conference on Neural Information Processing Systems, Barcelona, Dec 5-10, 2016. Red Hook: Curran Associates, 2016: 379-387. |
[9] | KONG T, YAO A B, CHEN Y R, et al. HyperNet: towards accurate region proposal generation and joint object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 845-853. |
[10] | SERMANET P, EIGEN D, ZHANG X, et al. OverFeat: integrated recognition, localization and detection using con-volutional networks[J]. arXiv:1312.6229, 2013. |
[11] | LIN T Y. DOLLÁR P, GIRSHICK R B, et al. Feature pyramid networks for object detection[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recogni-tion, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 936-944. |
[12] |
HE K M, ZHANG X Y, REN S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recogni-tion[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9):1904-1916.
DOI URL |
[13] | EVERINGHAM M, VAN GOOL L, WILLIAMS C I, et al. The PASCAL visual object classes (VOC) challenge[J]. Inter-national Journal of Computer Vision, 2010, 88(2):303-338. |
[14] |
UIJLINGS J R R, VAN DE SANDE K E A, GEVERS T, et al. Selective search for object recognition[J]. International Journal of Computer Vision, 2013, 104(2):154-171.
DOI URL |
[15] | LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]// LNCS 9905: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Oct 11-14, 2016. Cham: Springer, 2016: 21-37. |
[16] | REDMON J, DIVVALA S K, GIRSHICK R B, et al. You only look once: unified, real-time object detection[C]// Pro-ceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Wash-ington: IEEE Computer Society, 2016: 779-788. |
[17] | KONG T, SUN F C, YAO A B, et al. RON: reverse connec-tion with objectness prior networks for object detection[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 5244-5252. |
[18] | 刘云, 钱美伊, 李辉, 等. 深度学习的多尺度多人目标检测方法研究[J]. 计算机工程与应用, 2020, 56(6):172-179. |
LIU Y, QIAN M Y, LI H, et al. Research on multi-scale and multi-human detection method of deep learning[J]. Computer Engineering and Applications, 2020, 56(6):172-179. | |
[19] | 杨雅茹, 邓红霞, 王哲, 等. 浅层特征融合引导的深层网络行人检测[J]. 计算机工程与应用, 2020, 56(2):196-200. |
YANG Y R, DENG H X, WANG Z, et al. Deep network pedestrian detection guided by shallow feature fusion[J]. Computer Engineering and Applications, 2020, 56(2):196-200. | |
[20] | 陈幻杰, 王琦琦, 杨国威, 等. 多尺度卷积特征融合的SSD目标检测算法[J]. 计算机科学与探索, 2019, 13(6):1049-1061. |
CHEN H J, WANG Q Q, YANG G W, et al. SSD object detection algorithm with multi-scale convolution feature fusion[J]. Journal of Frontiers of Computer Science and Technology, 2019, 13(6):1049-1061. | |
[21] | LIU Y, WANG R P, SHAN S G, et al. Structure inference net: object detection using scene-level context and instance-level relationships[C]// Proceedings of the 2018 IEEE Con-ference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 6985-6994. |
[22] | HE C H, LAI S C, LAM K M, et al. Improving object detection with relation graph inference[C]// Proceedings of the 2019 International Conference on Acoustics Speech and Signal Processing, Brighton, May 12-17, 2019. Piscataway: IEEE, 2019: 2537-2541. |
[23] | REDMON J, FARHADI A. YOLOv3: an incremental improve- ment[J]. arXiv:1804.02767, 2018. |
[24] | ZHOU X Y, WANG D Q, KRÄHENBÜHL P. Objects as points[J]. arXiv:1904.07850, 2019. |
[25] | WANG X L, SHRIVASTAVA A, GUPTA A. A-Fast-RCNN: hard positive generation via adversary for object detection[C]// Proceedings of the 2017 IEEE Conference on Comp-uter Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 3039-3048. |
[26] | ZHOU T, LI Z X, ZHANG C L, et al. An improved convo-lutional neural network model with adversarial net for multi-label image classification[C]// LNCS 11013: Proceedings of the 15th Pacific Rim International Conference on Artificial Intelligence, Nanjing, Aug 28-31, 2018. Cham: Springer, 2018: 38-46. |
[27] | CHEN Y L, WANG Z C, PENG Y X, et al. Cascaded pyramid network for multi-person pose estimation[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 7103-7112. |
[28] | SZEGEDY C, IOFFE S, VANHOUCKE V, et al. Inception-v4, Inception-ResNet and the impact of residual connections on learning[J]. arXiv:1602.07261, 2016. |
[29] | SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. arXiv:1409.1556, 2014 |
[30] | KRIZHEVSKY A, SUTSKEVER I, HINTON G E. Image-Net classification with deep convolutional neural networks[C]// Proceedings of the Advances in Neural Information Processing Systems. Red Hook: Curran Associates, 2012: 1106-1114. |
[31] | OQUAB M, BOTTOU L, LAPTEV I, et al. Learning and transferring mid-level image representations using convolu-tional neural networks[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, Jun 23-28, 2014. Washington: IEEE Computer Society, 2014: 1717-1724. |
[32] | ZEILER M D, FERGUS R. Visualizing and understanding convolutional networks[C]// LNCS 8689: Proceedings of the 13th European Conference on Computer Vision, Zurich, Sep 6-12, 2014. Cham: Springer, 2014: 818-833. |
[33] | YOSINSKI J, CLUNE J, BENGIO Y, et al. How transfer-able are features in deep neural networks?[C]// Proceedings of the Annual Conference on Neural Information Processing Systems, Montreal, Dec 8-13, 2014. Red Hook: Curran Ass-ociates, 2014: 3320-3328. |
[34] | ZHANG C L, LUO J H, WEI X S, et al. In defense of fully connected layers in visual representation transfer[C]// LNCS 10736: Proceedings of the 18th Pacific-Rim Conference on Multimedia Advances in Multimedia Information Processing, Harbin, Sep 28-29, 2017. Cham: Springer, 2017: 807-817. |
[35] | HE K M, GKIOXARI G. DOLLÁR P, et al. Mask R-CNN[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 2961-2969. |
[36] | JIANG Y, ZHU X, WANG X, et al. R2CNN: rotational region CNN for orientation robust scene text detection[J]. arXiv:1706.09579, 2017. |
[37] | HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 2261-2269. |
[38] | REN S Q, HE K M, GIRSHICK R B, et al. Object detection networks on convolutional feature maps[J]. IEEE Transac-tions on Pattern Analysis and Machine Intelligence, 2016, 39(7):1476-1481. |
[39] | BELL S, ZITNICK C L, BALA K, et al. Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks[C]// Proceedings of the 2016 IEEE Confer-ence on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 2874-2883. |
[1] | AN Fengping, LI Xiaowei, CAO Xiang. Medical Image Classification Algorithm Based on Weight Initialization-Sliding Window CNN [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(8): 1885-1897. |
[2] | XIA Hongbin, XIAO Yifei, LIU Yuan. Long Text Generation Adversarial Network Model with Self-Attention Mechanism [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(7): 1603-1610. |
[3] | PENG Hao, LI Xiaoming. Multi-scale Selection Pyramid Networks for Small-Sample Target Detection Algorithms [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(7): 1649-1660. |
[4] | SUN Fangwei, LI Chengyang, XIE Yongqiang, LI Zhongbo, YANG Caidong, QI Jin. Review of Deep Learning Applied to Occluded Object Detection [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(6): 1243-1259. |
[5] | ZHAO Yunji, FAN Cunliang, ZHANG Xinliang. Object Tracking Algorithm with Fusion of Multi-feature and Channel Awareness [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(6): 1417-1428. |
[6] | SHEN Ruicai, ZHAI Junhai, HOU Yingzhen. Multi-discriminator Generative Adversarial Networks Based on Selective Ensemble Learning [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(6): 1429-1438. |
[7] | LIN Jiawei, WANG Shitong. Deep Adversarial-Reconstruction-Classification Networks for Unsupervised Domain Adaptation [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(5): 1107-1116. |
[8] | CHENG Weiyue, ZHANG Xueqin, LIN Kezheng, LI Ao. Deep Convolutional Neural Network Algorithm Fusing Global and Local Features [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(5): 1146-1154. |
[9] | TONG Gan, HUANG Libo. Review of Winograd Fast Convolution Technique Research [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(5): 959-971. |
[10] | ZHAO Pengfei, XIE Linbo, PENG Li. Deep Small Object Detection Algorithm Integrating Attention Mechanism [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(4): 927-937. |
[11] | FU Xuanyi, ZHANG Luanjing, LIANG Wenke, BI Fangming, FANG Weidong. Review on Development of Anchor Mechanism in Object Detection [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(4): 791-805. |
[12] | PEI Lishen, ZHAO Xuezhuan. Survey of Collective Activity Recognition Based on Deep Learning [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(4): 775-790. |
[13] | LU Zhongda, ZHANG Chunda, ZHANG Jiaqi, WANG Zifei, XU Junhua. Identification of Apple Leaf Disease Based on Dual Branch Network [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(4): 917-926. |
[14] | BAO Guangbin, LI Gangle, WANG Guoxiong. Bimodal Interactive Attention for Multimodal Sentiment Analysis [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(4): 909-916. |
[15] | JIANG Yi, XU Jiajie, LIU Xu, ZHU Junwu. Research on Edge-Guided Image Repair Algorithm [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(3): 669-682. |
Viewed | ||||||
Full text |
|
|||||
Abstract |
|
|||||
/D:/magtech/JO/Jwk3_kxyts/WEB-INF/classes/