Image Data Augmentation Method for Random Channel Perturbation

doi:10.3778/j.issn.1673-9418.2311022

Abstract

Occlusion-simulation methods in data augmentation set every pixel in a randomly cropped region of the input image to zero. This erases useful texture features and leads to poor network generalization. To address this, the paper proposes a random channel perturbation data augmentation method, ChannelCut, which comprises two variants, ChannelCut1 and ChannelCut2. First, three square regions are randomly selected on the input image, and the image is split into its three channel images. ChannelCut1 then sets the pixels of one square region to zero in each channel image, with each channel assigned a different region. ChannelCut2 keeps the pixels of the region that ChannelCut1 selected for each channel and instead sets the pixels of the other two square regions in that channel to zero. Finally, for each variant the three processed channel images are merged, yielding two randomly channel-perturbed images. The method is integrated into CNN models such as ResNet18, ShuffleNet V2, and MobileNet V3, and experiments are carried out on five datasets, including CIFAR-10 and Imagenette. On all five datasets, its classification accuracy exceeds that of mainstream methods, and it significantly improves the performance of the baseline models. Its advantage is larger in fine-grained image classification, and it is faster than automatic data augmentation methods that rely on reinforcement learning. ChannelCut retains image texture features to varying degrees and enriches image diversity. It is general and effective, and it significantly improves the robustness and generalization of convolutional neural network models.

Key words: data augmentation, occlusion simulation, channel perturbation, texture features, image classification
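
The two variants described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative reading of the abstract, not the authors' implementation. The function names `sample_squares` and `channel_cut`, the fixed square side `size`, the uniform sampling of square positions, the channel-last `H x W x 3` layout, and the handling of overlapping squares (a channel's own square is restored in the second variant) are all assumptions introduced here.

```python
import numpy as np


def sample_squares(height, width, size, rng, count=3):
    """Sample `count` distinct top-left corners of squares that fit in the image.

    Assumption: positions are uniform; squares may overlap but never coincide.
    """
    n_rows = height - size + 1
    n_cols = width - size + 1
    flat = rng.choice(n_rows * n_cols, size=count, replace=False)
    return [(int(i) // n_cols, int(i) % n_cols) for i in flat]


def channel_cut(image, size, rng):
    """Return (variant-1 image, variant-2 image, squares) for an HxWx3 array.

    Square k is assigned to channel k (an assumed pairing).
    """
    height, width, channels = image.shape
    if channels != 3:
        raise ValueError("expected a three-channel image")
    squares = sample_squares(height, width, size, rng)
    out1 = image.copy()
    out2 = image.copy()
    for channel, (top, left) in enumerate(squares):
        # Variant 1: zero only this channel's own square.
        out1[top:top + size, left:left + size, channel] = 0
        # Variant 2: zero the other two squares in this channel.
        for other, (o_top, o_left) in enumerate(squares):
            if other != channel:
                out2[o_top:o_top + size, o_left:o_left + size, channel] = 0
        # Restore the channel's own square in case another square overlapped it.
        out2[top:top + size, left:left + size, channel] = \
            image[top:top + size, left:left + size, channel]
    return out1, out2, squares


# Usage on a dummy all-white 32x32 image (CIFAR-10-sized).
rng = np.random.default_rng(0)
image = np.full((32, 32, 3), 255, dtype=np.uint8)
cut1, cut2, squares = channel_cut(image, size=8, rng=rng)
```

Because each square is zeroed in only one channel (first variant) or two channels (second variant), every masked location still carries information in at least one channel, which matches the abstract's claim that texture is partly preserved rather than erased.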

JIANG Wentao, LIU Yuwei, ZHANG Shengchong. Image Data Augmentation Method for Random Channel Perturbation[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(11): 2980-2995.
