Journal of Frontiers of Computer Science and Technology

• Science Researches •

The Occluded Person Re-Identification Method Based on Transformer with Global-Local Fused Features

WANG Xu, HU Xiaoguang, FU Zheyu, ZHAO Lixin

  1. School of Investigation, People’s Public Security University of China, Beijing 100038, China
  2. School of Information Network Security, People’s Public Security University of China, Beijing 100038, China


Abstract: Person re-identification (ReID) uses artificial intelligence to address public safety applications such as station security checks and urban surveillance, identifying a specific individual from images captured by different devices. In practice, however, pedestrians are often deliberately occluded or occluded by complex scene environments, which greatly increases the difficulty of re-identification. In most existing occluded person re-identification methods, convolutional neural network (CNN) models focus on local features but struggle to capture global structural information, whereas Transformer models excel at modeling long-range feature dependencies but tend to overlook local detail. To address these challenges, this paper proposes an occluded person re-identification method based on global-local fused features, which exploits the complementary characteristics of CNN and Transformer feature learning to enrich local pedestrian features while strengthening their global representation ability. The proposed model consists of three parts. First, a CNN branch extracts local detail features while a Transformer branch extracts global feature information, and a cross-dimensional multi-scale pooling fusion module computes the correlation between the two branches' features to achieve global-local feature fusion. Second, a mask generation module guided by multi-level attention accurately highlights key features in pedestrian images, automatically aligns pedestrian feature information, and suppresses interference from occluded regions and background noise. Third, an image high- and low-frequency feature enhancement module strengthens the high- and low-frequency feature information of occluded pedestrians and highlights the effective information. Ablation studies and experimental results on relevant datasets demonstrate the effectiveness of the proposed method.

Key words: Global, Local, Cross-dimensional Multi-scale Pooling Fusion, Multi-level Attention, High and Low Frequency Features
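The abstract outlines a dual-branch design: a CNN branch for local detail, a Transformer branch for global context, and a cross-dimensional multi-scale pooling fusion module that correlates the two. The sketch below is a minimal, hypothetical PyTorch rendering of that idea only; the module names, layer sizes, and the exact fusion operation are illustrative assumptions rather than the authors' implementation, and the mask generation and high/low-frequency enhancement modules are omitted.

```python
# Illustrative sketch of a global-local dual-branch ReID backbone with a
# multi-scale pooling fusion step. All sizes and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalCNNBranch(nn.Module):
    """Small CNN stack standing in for the local-feature branch."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.Conv2d(128, dim, 3, stride=2, padding=1), nn.BatchNorm2d(dim), nn.ReLU(inplace=True),
        )

    def forward(self, x):          # (B, 3, H, W) -> (B, dim, H/8, W/8)
        return self.net(x)


class GlobalTransformerBranch(nn.Module):
    """Patch embedding + Transformer encoder standing in for the global branch."""
    def __init__(self, dim=256, patch=16, depth=4, heads=8):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):          # (B, 3, H, W) -> (B, N, dim) token sequence
        tokens = self.embed(x).flatten(2).transpose(1, 2)
        return self.encoder(tokens)


class MultiScalePoolingFusion(nn.Module):
    """Pools the CNN map at several scales and lets global tokens attend to them."""
    def __init__(self, dim=256, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, local_map, global_tokens):
        # Average-pool the local feature map at several spatial scales.
        pooled = [F.adaptive_avg_pool2d(local_map, s).flatten(2).transpose(1, 2)
                  for s in self.scales]                      # each (B, s*s, dim)
        local_tokens = torch.cat(pooled, dim=1)
        # Cross-attention: global tokens query the multi-scale local tokens.
        fused, _ = self.attn(global_tokens, local_tokens, local_tokens)
        fused = self.proj(fused + global_tokens)             # residual fusion
        return fused.mean(dim=1)                             # (B, dim) embedding


if __name__ == "__main__":
    x = torch.randn(2, 3, 256, 128)                          # typical ReID input size
    local_map = LocalCNNBranch()(x)
    global_tokens = GlobalTransformerBranch()(x)
    embedding = MultiScalePoolingFusion()(local_map, global_tokens)
    print(embedding.shape)                                   # torch.Size([2, 256])
```

Here the fusion step simply lets the global tokens attend to average-pooled local features at several spatial scales, which is one straightforward way to realize a multi-scale pooling correlation between the two branches.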
