[1] 张婷, 张兴忠, 王慧民, 等. 基于图神经网络的变电站场景三维目标检测[J]. 计算机工程与应用, 2023, 59(9): 329-336.
ZHANG T, ZHANG X Z, WANG H M, et al. 3D object detection in substation scene based on graph neural network[J]. Computer Engineering and Applications, 2023, 59(9): 329-336.
[2] 陆慧敏, 杨朔. 基于深度神经网络的自动驾驶场景三维目标检测算法[J]. 北京工业大学学报, 2022, 48(6): 589-597.
LU H M, YANG S. Three-dimensional object detection algorithm based on deep neural networks for automatic driving[J]. Journal of Beijing University of Technology, 2022, 48(6): 589-597.
[3] 黄磊, 杨媛, 杨成煜, 等. FS-YOLOv5:轻量化红外目标检测方法[J]. 计算机工程与应用, 2023, 59(9): 215-224.
HUANG L, YANG Y, YANG C Y, et al. FS-YOLOv5: lightweight infrared road target detection method[J]. Computer Engineering and Applications, 2023, 59(9): 215-224.
[4] 谭暑秋, 汤国放, 涂媛雅, 等. 教室监控下学生异常行为检测系统[J]. 计算机工程与应用, 2022, 58(7): 176-184.
TAN S Q, TANG G F, TU Y Y, et al. Classroom monitoring students' abnormal behavior detection system[J]. Computer Engineering and Applications, 2022, 58(7): 176-184.
[5] REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. [2023-11-22]. https://arxiv.org/abs/1804.02767.
[6] 甘海明, 薛月菊, 李诗梅, 等. 基于时空信息融合的母猪哺乳行为识别[J]. 农业机械学报, 2020, 51(S1): 357-363.
GAN H M, XUE Y J, LI S M, et al. Automatic sow nursing behaviour recognition based on spatio-temporal information fusion[J]. Transactions of the Chinese Society for Agricultural Machinery, 2020, 51(S1): 357-363.
[7] FEICHTENHOFER C, PINZ A, ZISSERMAN A. Convolutional two-stream network fusion for video action recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Washington: IEEE Computer Society, 2016: 1933-1941.
[8] 杨永闯, 王昊, 王新良. 基于改进SSD的食物浪费行为识别方法[J]. 计算机工程与设计, 2023, 44(8): 2523-2530.
YANG Y C, WANG H, WANG X L. Food waste behavior recognition method based on improved SSD[J]. Computer Engineering and Design, 2023, 44(8): 2523-2530.
[9] 胡学敏, 陈钦, 杨丽, 等. 基于深度时空卷积神经网络的人群异常行为检测和定位[J]. 计算机应用研究, 2020, 37(3): 891-895.
HU X M, CHEN Q, YANG L, et al. Abnormal crowd behavior detection and localization based on deep spatial-temporal convolutional neural network[J]. Application Research of Computers, 2020, 37(3): 891-895.
[10] JI S, XU W, YANG M, et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 35(1): 221-231.
[11] 吴丽君, 李斌斌, 陈志聪, 等. 3D多重注意力机制下的行为识别[J]. 福州大学学报(自然科学版), 2022, 50(1): 47-53.
WU L J, LI B B, CHEN Z C, et al. Action recognition under 3D multiple attention mechanism[J]. Journal of Fuzhou University (Natural Science Edition), 2022, 50(1): 47-53.
[12] HARA K, KATAOKA H, SATOH Y. Learning spatio-temporal features with 3D residual networks for action recognition[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Washington: IEEE Computer Society, 2017: 3154-3160.
[13] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 3-19.
[14] 杨乐, 黎亦凡, 陈曦, 等. 基于ST-SlowFast的电力生产环境违规行为检测[J]. 智慧电力, 2023, 51(6): 71-77.
YANG L, LI Y F, CHEN X, et al. Violation detection in power production scenarios based on ST-SlowFast[J]. Smart Power, 2023, 51(6): 71-77.
[15] FEICHTENHOFER C, FAN H, MALIK J, et al. SlowFast networks for video recognition[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 6202-6211.
[16] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: transformers for image recognition at scale[EB/OL]. [2023-11-22]. https://arxiv.org/abs/2010.11929.
[17] FAN H, XIONG B, MANGALAM K, et al. Multiscale vision transformers[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 6824-6835.
[18] ARNAB A, DEHGHANI M, HEIGOLD G, et al. ViViT: a video vision transformer[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 6836-6846.
[19] LIU Z, NING J, CAO Y, et al. Video swin transformer[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 3202-3211.
[20] LIN J, GAN C, HAN S. TSM: temporal shift module for efficient video understanding[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 7083-7093.
[21] HE K, ZHANG X, REN S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916.
[22] CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation[EB/OL]. [2023-11-22]. https://arxiv.org/abs/1706.05587.
[23] GE Z, LIU S, WANG F, et al. YOLOX: exceeding YOLO series in 2021[EB/OL]. [2023-11-22]. https://arxiv.org/abs/2107.08430.
[24] HE K, GKIOXARI G, DOLLAR P, et al. Mask R-CNN[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Washington: IEEE Computer Society, 2017: 2961-2969.
[25] WU C Y, FEICHTENHOFER C, FAN H, et al. Long-term feature banks for detailed video understanding[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 284-293.
[26] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Washington: IEEE Computer Society, 2018: 7132-7141.
[27] LIN T Y, DOLLAR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Washington: IEEE Computer Society, 2017: 2117-2125.
[28] GU C, SUN C, ROSS D A, et al. AVA: a video dataset of spatio-temporally localized atomic visual actions[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Washington: IEEE Computer Society, 2018: 6047-6056.
[29] SULTANI W, CHEN C, SHAH M. Real-world anomaly detection in surveillance videos[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Washington: IEEE Computer Society, 2018: 6479-6488.
[30] CARREIRA J, ZISSERMAN A. Quo vadis, action recognition? A new model and the kinetics dataset[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Washington: IEEE Computer Society, 2017: 6299-6308.
[31] SUN C, SHRIVASTAVA A, VONDRICK C, et al. Actor-centric relation network[C]//Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 318-334.
[32] FEICHTENHOFER C. X3D: expanding architectures for efficient video recognition[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 203-213.
[33] TRAN D, WANG H, TORRESANI L, et al. Video classification with channel-separated convolutional networks[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 5552-5561.
[34] LI Y, WU C Y, FAN H, et al. MViTv2: improved multiscale vision transformers for classification and detection[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 4804-4814.