Positional Enhancement TransUnet for Medical Image Segmentation

doi:10.3778/j.issn.1673-9418.2406001

Abstract

Medical image segmentation helps doctors identify organs and lesions in medical images quickly and accurately, which is of great value for improving the efficiency of clinical diagnosis. Combining U-Net with Transformer is the mainstream approach in medical image segmentation. However, Transformer is weak at extracting local information, and the U-Net structure loses detailed position information during upsampling and downsampling. To address these problems, this paper proposes PETransUnet, a TransUnet medical image segmentation network with enhanced position information. The network first uses the positional efficient attention (PEA) block to enhance the position information of features. Second, the dual attention bridge (DAB) block narrows the semantic gap between features in the encoding stage and those in the decoding stage. Finally, the cross-channel attention fusion (CCAF) block reduces the position information lost during upsampling. The proposed method is validated on the publicly available Synapse dataset, achieving a Dice coefficient of 82.92% and an HD95 coefficient of 18.87%. On the ACDC dataset, it attains a Dice coefficient of 90.73%. On the LITS17 dataset, the Dice coefficients for liver and liver tumor segmentation are 94.85% and 74.47%, respectively. Compared with several recent algorithms, the proposed method achieves higher segmentation accuracy.

Key words: medical image segmentation, Transformer, feature fusion, position encoding
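
The Dice coefficient reported above measures the overlap between a predicted mask and a reference mask as 2|P ∩ T| / (|P| + |T|). The following is a minimal sketch of that standard definition for binary masks, not the authors' evaluation code; the function name `dice_coefficient`, the toy masks, and the convention that two empty masks score 1.0 are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for two binary masks of equal size."""
    # Flatten the 2D masks (lists of rows) into 1D sequences of 0/1 labels.
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labeled foreground in both masks.
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    # Sum of the foreground pixel counts of the two masks.
    total = sum(1 for a in p if a) + sum(1 for b in t if b)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical toy masks: 3 foreground pixels each, 2 of them shared.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
score = dice_coefficient(pred, target)
print(f"Dice = {score:.4f}")  # 2 * 2 / (3 + 3) = 0.6667
```

For multi-organ datasets, a per-class score is commonly obtained by binarizing the label map for each class and averaging the resulting values; whether the reported figures were averaged this way is not stated here.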

ZHAO Liang, LIU Chen, WANG Chunyan. Positional Enhancement TransUnet for Medical Image Segmentation[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(4): 976-988.

赵亮, 刘晨, 王春艳. 位置信息增强的TransUnet医学图像分割方法[J]. 计算机科学与探索, 2025, 19(4): 976-988.
