Multimodal Trajectory Prediction with Low-Rank Approximation and Pyramid Features

doi:10.3778/j.issn.1673-9418.2410071

Abstract

Capturing high-dimensional social interactions and trend features is essential for accurately predicting the feasible future behaviors of agents. To manage this complexity, some previous studies reduce the dimensionality of the input variables through parametric curve fitting in order to capture more useful information, while others infer future trajectories recursively or synchronously. These approaches have limitations: a single smooth curve struggles to fit social dynamics, recursive strategies are prone to accumulating errors, and synchronous strategies ignore the constraints between future steps, which makes the predictions kinematically infeasible. To address these problems, a method combining singular value decomposition with a time-series feature pyramid network is proposed, which reduces dimensionality and extracts trend features so as to remove redundant information. The method replaces the traditional Euclidean space with a feature space based on singular value decomposition and models the multimodal predictions of different models in that space. Predictions from trend features at different depths are fused progressively from the bottom layer to the top layer, and the final prediction is produced by a global-to-local recursive trajectory generation method. This recursive method uses interpolation at different granularities, combining global information with the head and tail information of the region handled in each iteration, to continuously generate the positions of the intermediate steps in each region. Extensive experiments demonstrate that the proposed general trajectory prediction framework significantly improves the prediction accuracy and reliability of existing trajectory models on public benchmarks.

Key words: singular value decomposition, feature pyramid network, recursive trajectory generator, feature extraction, multimodal, trajectory prediction
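The global-to-local recursive generation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the plain linear interpolation, and the assumption that a single predicted endpoint is already available are all hypothetical choices made for illustration. In the proposed method a learned, multi-granularity interpolation would take the place of `interpolate`.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


def interpolate(head: Point, tail: Point, alpha: float) -> Point:
    """Hypothetical stand-in for a learned interpolator: plain linear blend."""
    return (head[0] + alpha * (tail[0] - head[0]),
            head[1] + alpha * (tail[1] - head[1]))


def fill_region(traj: List[Optional[Point]], lo: int, hi: int,
                refine: Callable[[Point, Point, float], Point]) -> None:
    """Fill the steps strictly between lo and hi, coarse to fine.

    Each call uses only the head (lo) and tail (hi) of its region,
    places the middle step, then recurses into the two halves.
    """
    if hi - lo < 2:
        return  # no intermediate step left in this region
    mid = (lo + hi) // 2
    alpha = (mid - lo) / (hi - lo)  # relative time of the middle step
    traj[mid] = refine(traj[lo], traj[hi], alpha)
    fill_region(traj, lo, mid, refine)
    fill_region(traj, mid, hi, refine)


def generate_trajectory(last_observed: Point, endpoint: Point, horizon: int,
                        refine: Callable[[Point, Point, float], Point] = interpolate
                        ) -> List[Point]:
    """Return `horizon` future positions ending at the assumed endpoint."""
    traj: List[Optional[Point]] = [None] * (horizon + 1)
    traj[0] = last_observed      # global head: last observed position
    traj[horizon] = endpoint     # global tail: assumed predicted endpoint
    fill_region(traj, 0, horizon, refine)
    return traj[1:]


future = generate_trajectory((0.0, 0.0), (4.0, 8.0), horizon=4)
print(future)
```

Because every intermediate position is derived from the head and tail of its enclosing region rather than from the previously generated step, an error in one step does not propagate forward along the whole horizon, which is the motivation the abstract gives for avoiding purely step-by-step recursion.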

LIU Guihong, ZHAI Zhuoyu, ZHANG Xiaoyan, LENG Qiangkui. Multimodal Trajectory Prediction with Low-Rank Approximation and Pyramid Features[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(12): 3380-3394.
