[1] LIU X Y, YAN M Y, BOHG J. MeteorNet: deep learning on dynamic 3D point cloud sequences[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 9246-9255.
[2] FAN H H, YU X, DING Y H, et al. Point spatio-temporal convolution on point cloud sequences[C]//Proceedings of the 2021 International Conference on Learning Representations, Vienna, May 4-8, 2021.
[3] WANG Y C, XIAO Y, XIONG F, et al. 3DV: 3D dynamic voxel for action recognition in depth video[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 511-520.
[4] FAN H H, YANG Y, KANKANHALLI M. Point spatio-temporal transformer networks for point cloud video modeling[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 45(2): 2181-2192.
[5] 田钰琪, 刘康, 张远辉. 基于毫米波雷达点云的人体动作识别[J]. 中国计量大学学报, 2023, 34(1): 66-73.
TIAN Y Q, LIU K, ZHANG Y H. Human activity recognition based on millimeter wave radar point cloud[J]. Journal of China University of Metrology, 2023, 34(1): 66-73.
[6] ZHONG J X, ZHOU K, HU Q Y, et al. No pain, big gain: classify dynamic point cloud sequences with static models by fitting feature-level space-time surfaces[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 8510-8520.
[7] 赵登阁, 智敏. 用于人体动作识别的多尺度时空图卷积算法[J]. 计算机科学与探索, 2023, 17(3): 719-732.
ZHAO D G, ZHI M. Spatial multiple-temporal graph convolutional neural network for human action recognition[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(3): 719-732.
[8] FAN H H, YANG Y, KANKANHALLI M. Point 4D transformer networks for spatio-temporal modeling in point cloud videos[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 14204-14213.
[9] HE K M, CHEN X L, XIE S N, et al. Masked autoencoders are scalable vision learners[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 16000-16009.
[10] FEICHTENHOFER C, FAN H Q, LI Y H, et al. Masked autoencoders as spatiotemporal learners[C]//Advances in Neural Information Processing Systems 35, New Orleans, Nov 28-Dec 9, 2022: 35946-35958.
[11] TONG Z, SONG Y B, WANG J, et al. VideoMAE: masked autoencoders are data-efficient learners for self-supervised video pre-training[C]//Advances in Neural Information Processing Systems 35, New Orleans, Nov 28-Dec 9, 2022: 10078-10093.
[12] YU X M, TANG L L, RAO Y M, et al. Point-BERT: pre-training 3D point cloud transformers with masked point modeling[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 19313-19322.
[13] PANG Y T, WANG W X, TAY F E H, et al. Masked autoencoders for point cloud self-supervised learning[C]//Proceedings of the 17th European Conference on Computer Vision. Cham: Springer, 2022: 604-621.
[14] HWANG S, YOON J, LEE Y, et al. EVEREST: efficient masked video autoencoder by removing redundant spatiotemporal tokens[EB/OL]. [2024-02-23]. https://arxiv.org/abs/ 2211.10636.
[15] SHEN Z Q, SHENG X X, FAN H H, et al. Masked spatio-temporal structure prediction for self-supervised learning on point cloud videos[C]//Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2023: 16580-16589.
[16] SHEN Z, SHENG X, WANG L, et al. PointCMP: contrastive mask prediction for self-supervised learning on point cloud videos[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 1212-1222.
[17] 邱云飞, 王宜帆. 双分支结构的多层级三维点云补全[J]. 计算机工程与应用, 2024, 60(9): 272-282.
QIU Y F, WANG Y F. Multi-level 3D point cloud completion with dual-branch structure[J]. Computer Engineering and Applications, 2024, 60(9): 272-282.
[18] 李海旺, 周恒可, 赵兴, 等. 机载LiDAR点云数据的建筑屋顶面提取算法[J]. 计算机工程与应用, 2024, 60(11): 233-241.
LI H W, ZHOU H K, ZHAO X, et al. Algorithm for extracting building roof surfaces from airborne LiDAR point cloud data[J]. Computer Engineering and Applications, 2024, 60(11): 233-241.
[19] LI P, CAO J, YUAN L, et al. Truncated attention-aware proposal networks with multi-scale dilation for temporal action detection[J]. Pattern Recognition, 2023, 142: 109684.
[20] LI P, CAO J, YE X. Prototype contrastive learning for point-supervised temporal action detection[J]. Expert Systems with Applications, 2023, 213: 118965.
[21] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: transformers for image recognition at scale[EB/OL]. [2024-02-23]. https://arxiv.org/abs/2010.11929.
[22] XIE Z D, ZHANG Z, CAO Y, et al. SimMIM: a simple framework for masked image modeling[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 9653-9663.
[23] BAO H B, DONG L, PIAO S H, et al. BEiT: BERT pre-training of image transformers[C]//Proceedings of the 10th International Conference on Learning Representations, Apr 25-29, 2022.
[24] CHEN M, RADFORD A, CHILD R, et al. Generative pretraining from pixels[C]//Proceedings of the 37th International Conference on Machine Learning, Jul 13-18, 2020: 1691-1703.
[25] QI C R, SU H, MO K, et al. PointNet: deep learning on point sets for 3D classification and segmentation[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Washington: IEEE Computer Society, 2017: 652-660.
[26] QI C R, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space[C]//Advances in Neural Information Processing Systems 30, Long Beach, Dec 4-9, 2017: 5099-5108.
[27] MATURANA D, SCHERER S. VoxNet: a 3D convolutional neural network for real-time object recognition[C]//Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway: IEEE, 2015: 922-928.
[28] ZHANG C, WAN H C, SHEN X Y, et al. PVT: point-voxel transformer for point cloud learning[J]. International Journal of Intelligent Systems, 2022, 37(12): 11985-12008.
[29] WANG Y, SUN Y B, LIU Z W, et al. Dynamic graph CNN for learning on point clouds[J]. ACM Transactions on Graphics, 2019, 38(5): 1-12.
[30] SHEN Y R, FENG C, YANG Y Q, et al. Mining point cloud local structures by kernel correlation and graph pooling[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Washington: IEEE Computer Society, 2018: 4548-4557.
[31] BEN-SHABAT Y, SHROUT O, GOULD S. 3DinAction: understanding human actions in 3D point clouds[EB/OL].[2024-02-23]. https://arxiv.org/abs/2303.06346.
[32] WANG H Y, YANG L, RONG X J, et al. Self-supervised 4D spatio-temporal feature learning via order prediction of sequential point cloud clips[C]//Proceedings of the 2021 IEEE/CVF Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2021: 3762-3771.
[33] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems 30, Long Beach, Dec 4-9, 2017: 5998-6008.
[34] QIAN R, DING S R, LIU X, et al. Static and dynamic concepts for self-supervised video representation learning[C]//Proceedings of the 17th European Conference on Computer Vision. Cham: Springer, 2022: 145-164.
[35] SIMONYAN K, ZISSERMAN A. Two-stream convolutional networks for action recognition in videos[C]//Advances in Neural Information Processing Systems 27, Montreal, Dec 8-13, 2014: 568-576.
[36] LI W, ZHANG Z, LIU Z. Action recognition based on a bag of 3D points[C]//Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington: IEEE Computer Society, 2010: 9-14.
[37] FAN H H, YU X, YANG Y, et al. Deep hierarchical representation of point cloud videos via spatio-temporal decomposition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 44(12): 9918-9930.
[38] WEN H, LIU Y Z, HUANG J W, et al. Point primitive transformer for long-term 4D point cloud video understanding[C]//Proceedings of the 17th European Conference on Computer Vision. Cham: Springer, 2022: 19-35.
[39] CORTES C, MOHRI M, ROSTAMIZADEH A. Algorithms for learning kernels based on centered alignment[J]. The Journal of Machine Learning Research, 2012, 13(1): 795-828.
[40] LI X, HUANG Q, WANG Z, et al. SequentialPointNet: a strong frame-level parallel point cloud sequence network for 3D action recognition[EB/OL]. [2024-02-23]. https://arxiv.org/abs/2111.08492. |