Journal of Frontiers of Computer Science and Technology ›› 2025, Vol. 19 ›› Issue (10): 2769-2781.DOI: 10.3778/j.issn.1673-9418.2501038

• Artificial Intelligence·Pattern Recognition •

Multivariate Feature Extraction and Channel Feature Reconstruction Cross-Modality Person Re-identification Method

WANG Mingjie, BI Yihan, WANG Rong, LI Chong   

  1. College of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China
  • Online:2025-10-01 Published:2025-09-30

Abstract: To address the difficulty of matching in visible-infrared person re-identification caused by the significant modality gap between visible-light and infrared images, a cross-modality person re-identification method based on multivariate feature extraction and channel feature reconstruction is proposed. Firstly, a dual-stream ResNeXt50 serves as the backbone network, extracting sub-network features through channel-grouped convolution. This mitigates the channel-number imbalance between the two modalities, enhances discriminative feature extraction, and reduces model complexity to prevent overfitting. Secondly, a multi-level feature reconstruction module is designed to reconstruct and fuse features from different stages along the channel dimension. A channel attention mechanism and adaptive weights are employed to emphasize key discriminative features, suppress redundant information, and improve the discriminative capacity of the model. Finally, a multivariate feature extraction module is constructed to extract cross-modality shared features through multiple parallel convolution branches. The EMA (efficient multi-scale attention) mechanism captures both fine details and global information in the image via feature grouping and cross-spatial learning, learning effective spatial and channel features and strengthening the network's ability to learn key pedestrian features in complex scenes. On the SYSU-MM01 dataset, the proposed method achieves rank-1 and mAP scores of 75.35% and 72.37% in all-search mode, and 83.57% and 86.03% in indoor-search mode.
On the RegDB dataset, the rank-1 and mAP scores are 93.21% and 87.09% in visible-to-infrared retrieval mode, and 91.63% and 86.00% in infrared-to-visible retrieval mode, demonstrating the effectiveness of the method.
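As a rough illustration of the channel-attention reconstruction and adaptive-weight fusion described in the abstract, the sketch below shows a squeeze-and-excitation-style channel reweighting followed by a learnable scalar-weighted fusion of two feature stages. This is not the authors' implementation; all function names, shapes, and the choice of NumPy are assumptions made for the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """SE-style channel attention (an assumed stand-in for the paper's
    channel attention): squeeze via global average pooling, excite via a
    two-layer bottleneck, then rescale each channel.
    feat: (C, H, W); w1: (C//r, C); w2: (C, C//r)."""
    squeezed = feat.mean(axis=(1, 2))        # (C,) global average pool
    hidden = np.maximum(w1 @ squeezed, 0.0)  # ReLU bottleneck
    gates = sigmoid(w2 @ hidden)             # per-channel gates in (0, 1)
    return feat * gates[:, None, None]

def fuse_stages(feat_a, feat_b, alpha):
    """Adaptive-weight fusion of two same-shaped feature stages;
    alpha is a learnable scalar squashed to (0, 1) via sigmoid."""
    a = sigmoid(alpha)
    return a * feat_a + (1.0 - a) * feat_b

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2                      # toy sizes, reduction ratio r
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
out = channel_attention(feat, w1, w2)
fused = fuse_stages(out, feat, alpha=0.0)    # alpha=0 -> equal weighting
print(out.shape, fused.shape)                # (8, 4, 4) (8, 4, 4)
```

In the full method these weights would be trained end-to-end and the fusion would span the backbone's multiple stages; the sketch only shows the reweight-then-fuse pattern on toy tensors.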

Key words: visible-infrared image, person re-identification, attention mechanism, feature extraction
