Lightweight Cross-Gating Transformer for Image Restoration and Enhancement

doi:10.3778/j.issn.1673-9418.2301050

Abstract

Existing image restoration and image enhancement methods struggle to remain robust across multiple subtasks while keeping the parameter count and computational cost low. To address this problem, this paper proposes a lightweight cross-gating Transformer (CGT). On the one hand, it summarizes the limitations of the conventional global self-attention mechanism in capturing global dependencies and replaces it with a cross-level cross-gating self-attention mechanism. It also proposes a lightweight feed-forward network, so that cross-level local dependencies are learned at very small computational cost and clear features are reconstructed within local neighborhoods. On the other hand, conventional methods combine encoder and decoder features by plain addition or concatenation, treating both equally, which tends to cause information interference. To remedy this, a long-distance reset-update module is proposed that suppresses useless information and updates clear features. Comparative experiments against 25 recent methods on 9 public datasets covering three tasks (image denoising, image deraining, and low-light image enhancement) show that the proposed model, with a small number of parameters and low computational cost, achieves high peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) in both image restoration and image enhancement, reconstructs clear images close to real-world scenes, and reaches state-of-the-art image restoration performance.
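The abstract describes the long-distance reset-update module only in words: useless information is suppressed and clear features are updated when encoder and decoder features are combined. As a rough illustration of that idea, the sketch below fuses two feature vectors with GRU-style reset and update gates. This is an assumption about what such gating could look like, not the paper's actual module: the function name `reset_update_fuse`, the scalar gate weights and biases, and the use of plain Python lists in place of feature maps are all hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def reset_update_fuse(encoder_feat, decoder_feat,
                      reset_weight=1.0, reset_bias=0.0,
                      update_weight=1.0, update_bias=0.0):
    """Fuse encoder (skip) and decoder features with GRU-style gates.

    Hypothetical sketch: a real module would use learned convolutions or
    linear layers on feature maps, not scalar weights on flat lists.
    """
    if len(encoder_feat) != len(decoder_feat):
        raise ValueError("feature vectors must have the same length")
    fused = []
    for e, d in zip(encoder_feat, decoder_feat):
        # Reset gate: how much of the encoder feature is let through.
        r = sigmoid(reset_weight * (e + d) + reset_bias)
        # Update gate: how far the decoder feature moves to the candidate.
        z = sigmoid(update_weight * (e + d) + update_bias)
        # Candidate feature built from the gated encoder signal.
        candidate = math.tanh(r * e + d)
        # Blend of the old decoder feature and the candidate, weighted by z.
        fused.append((1.0 - z) * d + z * candidate)
    return fused


print(reset_update_fuse([0.5, -1.0], [0.2, 0.3]))
```

Unlike plain addition or concatenation, each output here is a gated blend, so an update gate near zero leaves the decoder feature essentially unchanged and a reset gate near zero blocks the encoder contribution to the candidate.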

Key words: image restoration, image enhancement, deep learning, Transformer, lightweight, feature fusion
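The abstract reports results in terms of peak signal-to-noise ratio. For readers unfamiliar with the metric, the sketch below computes it from its standard definition, 10·log10(MAX² / MSE), on flat lists of pixel values. The function name `psnr`, the 8-bit default peak of 255, and the toy pixel values are illustrative assumptions and are not taken from the paper's evaluation code.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the reference and the restored pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: four pixels, each off by 2 intensity levels (MSE = 4).
clean = [52, 55, 61, 59]
noisy = [54, 53, 63, 57]
print(psnr(clean, noisy))
```

Higher values indicate a restored image closer to the reference. Real evaluations operate on full images and often on a specific color channel, which this sketch does not model.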

XUE Jinqiang, WU Qin. Lightweight Cross-Gating Transformer for Image Restoration and Enhancement[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(3): 718-730.
