Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (12): 3235-3246. DOI: 10.3778/j.issn.1673-9418.2404045
• Graphics·Image •
HE Yundong, LI Ping, PING Chenhao
Online: 2024-12-01
Published: 2024-11-29
何允栋,李平,平晨昊
HE Yundong, LI Ping, PING Chenhao. Point Cloud Action Recognition Method Based on Masked Self-Supervised Learning[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(12): 3235-3246.
何允栋, 李平, 平晨昊. 基于掩码自监督学习的点云动作识别方法[J]. 计算机科学与探索, 2024, 18(12): 3235-3246.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2404045