Journal of Frontiers of Computer Science and Technology, 2024, Vol. 18, Issue (8): 2109-2117. DOI: 10.3778/j.issn.1673-9418.2307087

• Graphics · Image •

Multi-label Image Recognition Using Channel Pixel Attention

YE Qingwen, ZHANG Qiuju   

  1. School of Mechanical Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  2. Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment & Technology, Wuxi, Jiangsu 214122, China
  • Online: 2024-08-01 Published: 2024-07-29

Abstract: Multi-label image recognition predicts the set of object category labels present in an image. To address two problems in this task, the difficulty of recognizing small objects and the imbalance of sample data, this paper proposes a simple and efficient channel pixel attention (CPA) module and a class-weighted cross-entropy loss, respectively. CPA computes channel attention and pixel attention scores to generate a corresponding pixel feature for each channel, which increases the network's attention to small objects; the pixel features are then pooled, gain-adjusted, and fed into a multi-layer perceptron for the final classification. The distribution of positive sample counts in the dataset is introduced as the weight of the classic cross-entropy (CE) loss, so that the model pays more attention to the features of objects with few samples. Comparative experiments are conducted on the public multi-label image datasets VOC 2007 (PASCAL VOC Challenge 2007), MS-COCO 2014 (Microsoft Common Objects in Context) and VAW (Visual Attributes in the Wild). Compared with existing state-of-the-art methods, the proposed method improves mean average precision (mAP) by 0.2, 0.7 and 0.9 percentage points on these datasets, respectively. On MS-COCO 2014 and VAW, the class-weighted cross-entropy loss improves mAP by 0.6 and 1.6 percentage points, respectively, over the commonly used cross-entropy loss without adding any computational cost, which verifies the effectiveness and state-of-the-art performance of the proposed method.

Key words: deep learning, image recognition, attention mechanism, multi-label classification, loss function
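
The abstract describes weighting the cross-entropy loss by the distribution of positive sample counts but does not give the formula. The sketch below is one plausible reading, not the authors' definition: it assumes inverse-frequency weights normalized to a mean of 1 and applies each class weight to that class's binary cross-entropy term. The function names `class_weights` and `weighted_bce` are hypothetical.

```python
import math


def class_weights(pos_counts):
    """Per-class weights from positive sample counts.

    Assumption: inverse frequency, normalized so the weights average to 1.
    The paper may use a different mapping from counts to weights.
    """
    inv = [1.0 / max(c, 1) for c in pos_counts]  # guard against zero counts
    mean_inv = sum(inv) / len(inv)
    return [v / mean_inv for v in inv]


def weighted_bce(logits, targets, weights, eps=1e-12):
    """Mean class-weighted binary cross-entropy for one multi-label sample."""
    total = 0.0
    for z, y, w in zip(logits, targets, weights):
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid probability for this class
        total += -w * (y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(logits)


# Hypothetical counts: class 0 has 1000 positives, class 1 has only 10.
weights = class_weights([1000, 10])
loss = weighted_bce([0.3, -1.2], [1, 1], weights)
```

Because the weights are computed once from dataset statistics, this kind of weighting adds only a per-class multiplication at training time, which is consistent with the abstract's claim of no added computational cost.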