[1] LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444.
[2] LIU T, ZHAO Y, WEI Y, et al. Concealed object detection for activate millimeter wave image[J]. IEEE Transactions on Industrial Electronics, 2019, 66(12): 9909-9917.
[3] LIU Z Y, WAN P P. Pedestrian re-identification feature extraction method based on attention mechanism[J]. Journal of Computer Applications, 2020, 40(3): 672-676.
[4] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 30th Annual Conference on Neural Information Processing Systems, Long Beach, Dec 4-9, 2017. Red Hook: Curran Associates, 2017: 5998-6008.
[5] LUO H L, PENG S, CHEN H K. Review on latest research progress of challenging problems in object detection[J]. Computer Engineering and Applications, 2021, 57(5): 36-46.
[6] YAN H, HUANG J, LI R A, et al. Research on video SAR moving target detection algorithm based on improved faster region-based CNN[J]. Journal of Electronics & Information Technology, 2021, 43(3): 615-622.
[7] DU L, WEI D, LI L, et al. SAR target detection network via semi-supervised learning[J]. Journal of Electronics & Information Technology, 2020, 42(1): 154-163.
[8] DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]//Proceedings of the 2005 IEEE Conference on Computer Vision and Pattern Recognition, San Diego, Jun 20-26, 2005. Washington: IEEE Computer Society, 2005: 886-893.
[9] LOWE D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91-110.
[10] FELZENSZWALB P F, GIRSHICK R B, MCALLESTER D A, et al. Object detection with discriminatively trained part-based models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(9): 1627-1645.
[11] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, Jun 23-28, 2014. Washington: IEEE Computer Society, 2014: 580-587.
[12] ZITNICK C L, DOLLAR P. Edge boxes: locating object proposals from edges[C]//LNCS 8693: Proceedings of the 13th European Conference on Computer Vision, Zurich, Sep 6-12, 2014. Cham: Springer, 2014: 391-405.
[13] RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252.
[14] LIN T, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]//LNCS 8693: Proceedings of the 13th European Conference on Computer Vision, Zurich, Sep 6-12, 2014. Cham: Springer, 2014: 740-755.
[15] KRIZHEVSKY A, SUTSKEVER I, HINTON G. ImageNet classification with deep convolutional neural networks[C]//Proceedings of the 25th Annual Conference on Neural Information Processing Systems, Lake Tahoe, Dec 3-6, 2012. Red Hook: Curran Associates, 2012: 1106-1114.
[16] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]//Proceedings of the 3rd International Conference on Learning Representations, San Diego, May 7-9, 2015.
[17] SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, Jun 7-13, 2015. Washington: IEEE Computer Society, 2015: 1-9.
[18] IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]//Proceedings of the 32nd International Conference on Machine Learning, Lille, Jul 6-11, 2015: 448-456.
[19] SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 2818-2826.
[20] SZEGEDY C, IOFFE S, VANHOUCKE V, et al. Inception-v4, Inception-ResNet and the impact of residual connections on learning[C]//Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, Feb 4-9, 2017. Menlo Park: AAAI, 2017: 4278-4284.
[21] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 770-778.
[22] HE K M, ZHANG X Y, REN S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916.
[23] GIRSHICK R. Fast R-CNN[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Dec 7-13, 2015. Washington: IEEE Computer Society, 2015: 1440-1448.
[24] REN S Q, HE K M, GIRSHICK R B, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]//Proceedings of the 28th Annual Conference on Neural Information Processing Systems, Montreal, Dec 7-12, 2015. Red Hook: Curran Associates, 2015: 91-99.
[25] DAI J F, LI Y, HE K M, et al. R-FCN: object detection via region-based fully convolutional networks[C]//Proceedings of the 29th Annual Conference on Neural Information Processing Systems, Barcelona, Dec 5-10, 2016. Red Hook: Curran Associates, 2016: 379-387.
[26] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 779-788.
[27] REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 6517-6525.
[28] REDMON J, FARHADI A. YOLOv3: an incremental improvement[J]. arXiv:1804.02767, 2018.
[29] BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[J]. arXiv:2004.10934, 2020.
[30] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]//LNCS 9905: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Oct 11-14, 2016. Cham: Springer, 2016: 21-37.
[31] XIAO Y Q, YANG H M. Research on application of object detection algorithm in traffic scene[J]. Computer Engineering and Applications, 2021, 57(6): 30-41.
[32] LIU Z Y, YUAN L, ZHU M C, et al. YOLOv3 traffic sign detection based on SPP and improved FPN[J]. Computer Engineering and Applications, 2021, 57(7): 164-170.
[33] GIBSON J J. The perception of the visual world[M]. Boston: Houghton Mifflin Harcourt, 1950.
[34] ZHU X Z, XIONG Y W, DAI J F, et al. Deep feature flow for video recognition[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 4141-4150.
[35] ZHU X Z, WANG Y J, DAI J F, et al. Flow-guided feature aggregation for video object detection[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 408-417.
[36] ZHU X Z, DAI J F, YUAN L, et al. Towards high performance video object detection[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 7210-7218.
[37] ZHU X Z, DAI J F, ZHU X C, et al. Towards high performance video object detection for mobiles[J]. arXiv:1804.05830, 2018.
[38] KANG K, OUYANG W L, LI H S, et al. Object detection from video tubelets with convolutional neural networks[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 817-825.
[39] CHEN Y H, CAO Y, WANG L W. Memory enhanced global-local aggregation for video object detection[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 10334-10343.
[40] HAN M F, WANG Y L, CHANG X J, et al. Mining inter-video proposal relations for video object detection[C]//LNCS 12366: Proceedings of the 16th European Conference on Computer Vision, Glasgow, Aug 23-28, 2020. Cham: Springer, 2020: 431-446.
[41] FEICHTENHOFER C, PINZ A, ZISSERMAN A. Detect to track and track to detect[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 3057-3065.
[42] WANG R J, LI X, LING C X. Pelee: a real-time object detection system on mobile devices[J]. arXiv:1804.06882, 2018.
[43] LIU M, ZHU M L, WHITE M, et al. Looking fast and slow: memory-guided mobile video object detection[J]. arXiv:1903.10172, 2019.
[44] HAN K, WANG Y H, CHEN H T, et al. A survey on visual transformer[J]. arXiv:2012.12556, 2020.
[45] BROWN T B, MANN B, RYDER N, et al. Language models are few-shot learners[J]. arXiv:2005.14165, 2020.
[46] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[J]. arXiv:1810.04805, 2018.
[47] BELTAGY I, LO K, COHAN A. SciBERT: a pretrained language model for scientific text[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, Nov 3-7, 2019. Stroudsburg: ACL, 2019: 3613-3618.
[48] LEE J, YOON W, KIM S, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining[J]. Bioinformatics, 2020, 36(4): 1234-1240.
[49] ZHAO Y Q, RAO Y, DONG S P, et al. Survey on deep learning object detection[J]. Journal of Image and Graphics, 2020, 25(4): 629-654.
[50] XU D G, WANG L, LI F. Review of typical object detection algorithms for deep learning[J]. Computer Engineering and Applications, 2021, 57(8): 10-25.
[51] ZOU Z X, SHI Z W, GUO Y H, et al. Object detection in 20 years: a survey[J]. arXiv:1905.05055, 2019.
[52] BILKHU M, WANG S Y, DOBHAL T. Attention is all you need for videos: self-attention based video summarization using universal transformers[J]. arXiv:1906.02792, 2019.
[53] KHAN S, NASEER M, HAYAT M, et al. Transformers in vision: a survey[J]. arXiv:2101.01169, 2021.
[54] TAY Y, DEHGHANI M, BAHRI D, et al. Efficient transformers: a survey[J]. arXiv:2009.06732, 2020.
[55] CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]//LNCS 12346: Proceedings of the 16th European Conference on Computer Vision, Glasgow, Aug 23-28, 2020. Cham: Springer, 2020: 213-229.
[56] BELLO I. LambdaNetworks: modeling long-range interactions without attention[J]. arXiv:2102.08602, 2021.
[57] ZHANG D, ZHANG H W, TANG J H, et al. Feature pyramid transformer[C]//LNCS 12373: Proceedings of the 16th European Conference on Computer Vision, Glasgow, Aug 23-28, 2020. Cham: Springer, 2020: 323-339.
[58] EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303-338.
[59] EVERINGHAM M, ESLAMI S M A, VAN GOOL L, et al. The PASCAL visual object classes challenge: a retrospective[J]. International Journal of Computer Vision, 2015, 111(1): 98-136.
[60] GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]//Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, Jun 16-21, 2012. Washington: IEEE Computer Society, 2012: 3354-3361.
[61] GEIGER A, LENZ P, STILLER C, et al. Vision meets robotics: the KITTI dataset[J]. International Journal of Robotics Research, 2013, 32(11): 1231-1237.
[62] BEHRENDT K, NOVAK L, BOTROS R. A deep learning approach to traffic lights: detection, tracking, and classification[C]//Proceedings of the 2017 IEEE International Conference on Robotics and Automation, Singapore, May 29-Jun 3, 2017. Piscataway: IEEE, 2017: 1370-1377.
[63] HAN W, KHORRAMI P, PAINE T L, et al. Seq-NMS for video object detection[J]. arXiv:1602.08465, 2016.
[64] BELHASSEN H, ZHANG H, FRESSE V, et al. Improving video object detection by Seq-Bbox matching[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 226-233.
[65] SABATER A, MONTESANO L, MURILLO A C. Robust and efficient post-processing for video object detection[C]//Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway: IEEE, 2020: 10536-10542.
[66] ZHANG Z, CHENG D, ZHU X, et al. Integrated object detection and tracking with tracklet-conditioned detection[J]. arXiv:1811.11167, 2018.
[67] LIU M, ZHU M. Mobile video object detection with temporally-aware feature maps[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 5686-5695.
[68] CHEN K, WANG J, YANG S, et al. Optimizing video object detection via a scale-time lattice[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 7814-7823.
[69] WANG S, ZHOU Y, YAN J, et al. Fully motion-aware network for video object detection[C]//LNCS 11217: Proceedings of the 2018 European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 542-557.
[70] WU H, CHEN Y, WANG N, et al. Sequence level semantics aggregation for video object detection[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 9217-9225.
[71] DENG H, HUA Y, SONG T, et al. Object guided external memory network for video object detection[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 6678-6687.
[72] DENG J, PAN Y, YAO T, et al. Relation distillation networks for video object detection[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 7023-7032.
[73] XIAO F, LEE Y J. Video object detection with an aligned spatial-temporal memory[C]//LNCS 11217: Proceedings of the 2018 European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 485-501.
[74] BERTASIUS G, TORRESANI L, SHI J. Object detection in video with spatiotemporal sampling networks[C]//LNCS 11217: Proceedings of the 2018 European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 331-346.
[75] BENDRE N, MARÍN H T, NAJAFIRAD P. Learning from few samples: a survey[J]. arXiv:2007.15484, 2020.
[76] WANG Y Q, YAO Q M, KWOK J T, et al. Generalizing from a few examples: a survey on few-shot learning[J]. ACM Computing Surveys, 2020, 53(3): 1-34.
[77] SUN Q R, LIU Y Y, CHUA T S, et al. Meta-transfer learning for few-shot learning[C]//Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 403-412.
[78] YU X D, ALOIMONOS Y. Attribute-based transfer learning for object categorization with zero/one training example[C]//LNCS 6315: Proceedings of the 11th European Conference on Computer Vision, Heraklion, Sep 5-11, 2010. Berlin, Heidelberg: Springer, 2010: 127-140.
[79] REN M Y, TRIANTAFILLOU E, RAVI S, et al. Meta-learning for semi-supervised few-shot classification[J]. arXiv:1803.00676, 2018.
[80] JAMAL M A, QI G J. Task agnostic meta-learning for few-shot learning[C]//Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 11719-11727.
[81] WANG Y X, RAMANAN D, HEBERT M. Meta-learning to detect rare objects[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 9924-9933.
[82] HAO F S, HE F X, CHENG J, et al. Collect and select: semantic alignment metric learning for few-shot learning[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 8459-8468.
[83] SCHWARTZ E, KARLINSKY L, SHTOK J, et al. RepMet: representative-based metric learning for classification and one-shot object detection[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 15-20, 2019. Piscataway: IEEE, 2019: 5197-5206.