Journal of Frontiers of Computer Science and Technology, 2024, Vol. 18, Issue (9): 2436-2448. DOI: 10.3778/j.issn.1673-9418.2308032

• Graphics·Image •

Research on Fourier Augmented Unbiased Cross-Domain Object Detection

WANG Bing, XU Pei, ZHANG Xingpeng   

  1. School of Computer Science, Southwest Petroleum University, Chengdu 610500, China
  • Online:2024-09-01 Published:2024-09-01

Abstract: Unbiased cross-domain object detection aims to make maximal use of source-domain knowledge through knowledge distillation and to narrow the model's cross-domain gap through domain adaptation. However, the pseudo labels produced by the mean teacher method commonly used for this task are unreliable, so a significant domain bias remains between the teacher and student models. Inspired by the invariance of phase information under the Fourier transform, this paper proposes the Fourier augmentation unbiased mean teacher (FAUMT) model, built on the mean teacher. Exploiting the invariance of Fourier phase information, the paper designs an amplitude mixing data augmentation (AMDA) module, which mixes phase information between the source and target domains to achieve data augmentation. Because data augmentation introduces additional noise, two consistency losses are designed to keep predictions consistent before and after augmentation. In addition, to balance the cross-domain bias between the source and target domains during training, a multi-layer adversarial learning (MAL) module is designed to perform domain alignment on pixel-level features at different levels. On the three benchmark datasets Clipart1K, Watercolor2K, and Comic2K, the proposed method achieves mAP of 47.5%, 58.9%, and 46.1%, respectively, outperforming the other algorithms compared.

Key words: domain adaptation, cross-domain object detection, deep neural network, mean teacher
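
As a rough illustration of the idea behind the AMDA module, the following Python/NumPy sketch interpolates the Fourier amplitude spectra of a source image and a target image while reusing the source phase. It assumes, based on the module's name and the stated phase invariance, that amplitudes are what get blended. The function name `amplitude_mix`, the weight `lam`, and the linear interpolation are illustrative choices, not details taken from the paper.

```python
import numpy as np


def amplitude_mix(source, target, lam=0.5):
    """Blend the Fourier amplitude of `source` with that of `target`,
    keeping the phase of `source`.

    source, target: real float arrays of shape (H, W) or (H, W, C), same shape.
    lam: hypothetical mixing weight in [0, 1]; lam=0 returns the source image.
    """
    if source.shape != target.shape:
        raise ValueError("source and target must have the same shape")

    # 2D FFT over the spatial axes only; channels (if any) are handled independently.
    f_src = np.fft.fft2(source, axes=(0, 1))
    f_tgt = np.fft.fft2(target, axes=(0, 1))

    amp_src, phase_src = np.abs(f_src), np.angle(f_src)
    amp_tgt = np.abs(f_tgt)

    # Linear interpolation of the amplitudes; the phase comes from the source.
    amp_mix = (1.0 - lam) * amp_src + lam * amp_tgt
    f_mix = amp_mix * np.exp(1j * phase_src)

    # For real inputs the mixed spectrum stays conjugate-symmetric,
    # so the imaginary part of the inverse transform is only round-off.
    return np.real(np.fft.ifft2(f_mix, axes=(0, 1)))


# Usage with random stand-ins for a source-domain and a target-domain image.
rng = np.random.default_rng(0)
src = rng.random((32, 32, 3))
tgt = rng.random((32, 32, 3))
aug = amplitude_mix(src, tgt, lam=0.3)
print(aug.shape)  # (32, 32, 3)
```

Keeping the source phase leaves object layout, and therefore the source bounding-box labels, usable for the augmented image, while the blended amplitude shifts its low-level appearance toward the target domain. How FAUMT chooses the mixing weight or restricts the frequency band is not stated in the abstract.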