[1] SHOU Z, WANG D G, CHANG S F. Temporal action localization in untrimmed videos via multi-stage CNNs[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 1049-1058.
[2] DUCHENNE O, LAPTEV I, SIVIC J, et al. Automatic annotation of human actions in video[C]//Proceedings of the 2009 IEEE International Conference on Computer Vision, Kyoto, Sep 29-Oct 2, 2009. Piscataway: IEEE, 2009: 1491-1498.
[3] FENG X Y, MEI W, HU D S. Aerial target detection based on improved Faster R-CNN[J]. Acta Optica Sinica, 2018, 38(6): 242-250.
[4] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 779-788.
[5] ESCORCIA V, HEILBRON F C, NIEBLES J C, et al. DAPs: deep action proposals for action understanding[C]//LNCS 9907: Proceedings of the European Conference on Computer Vision, Amsterdam, Oct 11-14, 2016. Berlin, Heidelberg: Springer, 2016: 768-784.
[6] BUCH S, ESCORCIA V, SHEN C, et al. SST: single-stream temporal action proposals[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 6373-6382.
[7] GUO D S, LI W, FANG X Z. Fully convolutional network for multiscale temporal action proposals[J]. IEEE Transactions on Multimedia, 2018, 20(12): 3428-3438.
[8] GAO J Y, YANG Z H, CHEN S, et al. TURN TAP: temporal unit regression network for temporal action proposals[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 24-27, 2017. Piscataway: IEEE, 2017: 3648-3656.
[9] TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Dec 13-16, 2015. Piscataway: IEEE, 2015: 4489-4497.
[10] CARREIRA J, ZISSERMAN A. Quo Vadis, action recognition? A new model and the Kinetics dataset[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 4724-4733.
[11] LI Q H, LI A H, WANG T, et al. Double-stream convolutional networks with sequential optical flow image for action recognition[J]. Acta Optica Sinica, 2018, 38(6): 226-232.
[12] SONG L F, WENG L G, WANG L F, et al. Multi-scale 3D convolution fusion two-stream networks for action recognition[J]. Journal of Computer-Aided Design & Computer Graphics, 2018, 30(11): 99-108.
[13] SCHUSTER M, PALIWAL K K. Bidirectional recurrent neural networks[J]. IEEE Transactions on Signal Processing, 1997, 45(11): 2673-2681.
[14] PASCANU R, MIKOLOV T, BENGIO Y. On the difficulty of training recurrent neural networks[J]. arXiv:1211.5063, 2012.
[15] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[16] CHO K, VAN MERRIENBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Oct 25-29, 2014. Stroudsburg: ACL, 2014: 1724-1734.
[17] HOSANG J, BENENSON R, SCHIELE B. A convnet for non-maximum suppression[C]//LNCS 9796: Proceedings of the 38th German Conference on Pattern Recognition, Hannover, Sep 12-15, 2016. Berlin, Heidelberg: Springer, 2016: 192-204.
[18] JIANG Y G, LIU J, ROSHAN ZAMIR A, et al. THUMOS challenge: action recognition with a large number of classes[EB/OL]. [2019-11-20]. http://crcv.ucf.edu/THUMOS14/.
[19] SOOMRO K, ZAMIR A R, SHAH M. UCF101: a dataset of 101 human action classes from videos in the wild[EB/OL]. [2019-11-20]. http://crcv.ucf.edu/data/UCF101.php.
[20] LI N N, GUO H W, ZHAO Y, et al. Active temporal action detection in untrimmed videos via deep reinforcement learning[J]. IEEE Access, 2018, 6: 59126-59140.