[1] Chen J Y, Song X M, Nie L Q, et al. Micro tells macro: predicting the popularity of micro-videos via a transductive model[C]//Proceedings of the 2016 ACM Conference on Multimedia Conference, Amsterdam, Oct 15-19, 2016. New York: ACM, 2016: 898-907.
[2] Liu M, Nie L Q, Wang X, et al. Online data organizer: micro-video categorization by structure-guided multimodal dictionary learning[J]. IEEE Transactions on Image Processing, 2019, 28(3): 1235-1247.
[3] Zhang J L, Nie L Q, Wang X, et al. Shorter-is-better: venue category estimation from micro-video[C]//Proceedings of the 2016 ACM Conference on Multimedia Conference, Amsterdam, Oct 15-19, 2016. New York: ACM, 2016: 1415-1424.
[4] Liu J, Wang G, Duan L Y, et al. Skeleton based human action recognition with global context-aware attention LSTM networks[J]. IEEE Transactions on Image Processing, 2018, 27(4): 1586-1599.
[5] Yu K, Yun F. Bilinear heterogeneous information machine for RGB-D action recognition[J]. International Journal of Computer Vision, 2017, 123(3): 350-371.
[6] Liu A A, Xu N, Nie W Z, et al. Benchmarking a multimodal and multiview and interactive dataset for human action recognition[J]. IEEE Transactions on Cybernetics, 2017, 47(7): 1781-1794.
[7] Simonyan K, Zisserman A. Two-stream convolutional networks for action recognition in videos[C]//Proceedings of the 2014 Annual Conference on Neural Information Processing Systems, Montreal, Dec 8-13, 2014. Red Hook: Curran Associates, 2014: 568-576.
[8] Karpathy A, Toderici G, Shetty S, et al. Large-scale video classification with convolutional neural networks[C]//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, Jun 23-28, 2014. Washington: IEEE Computer Society, 2014: 1725-1732.
[9] Ji S W, Xu W, Yang M, et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(1): 221-231.
[10] Wang X H, Gao L L, Song J K, et al. Beyond frame-level CNN: saliency-aware 3D CNN with LSTM for video action recognition[J]. IEEE Signal Processing Letters, 2017, 24(4): 510-514.
[11] Peng B, Lei J J, Fu H Z, et al. Unsupervised video action clustering via motion-scene interaction constraint[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(1): 131-144.
[12] Zhang H B, Lei Q, Chen D S, et al. Probability-based method for boosting human action recognition using scene context[J]. IET Computer Vision, 2016, 10(6): 528-536.
[13] Yang J, Shi Z K, Wu Z Y. Vision-based action recognition of construction workers using dense trajectories[J]. Advanced Engineering Informatics, 2016, 30(3): 327-336.
[14] Hou J Y, Wu X X, Sun Y C, et al. Content-attention representation by factorized action-scene network for action recognition[J]. IEEE Transactions on Multimedia, 2018, 20(6): 1537-1547.
[15] Dai J F, Qi H Z, Xiong Y W, et al. Deformable convolutional networks[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 764-773.
[16] Sigurdsson G A, Russakovsky O, Gupta A. What actions are needed for understanding human actions in videos?[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 2156-2165.