Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (5): 1136-1145. DOI: 10.3778/j.issn.1673-9418.2105095
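Received: 2021-06-10
Revised: 2021-09-29
Online: 2022-05-01
Published: 2022-05-19
Corresponding author: + E-mail: qshj@hunnu.edu.cn
About author: OU Yangliu (born 1999), male, from Huanggang, Hubei, M.S. candidate, CCF member. His research interests include computer vision and deep learning.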
OU Yangliu, HE Xi, QU Shaojun+
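Abstract:
Fully convolutional neural networks are powerful end-to-end models that have been widely and successfully applied to semantic segmentation. Researchers have proposed a series of methods based on fully convolutional networks, but the repeated downsampling performed by convolution and pooling discards contextual information in the image, which harms pixel-level classification. To address this loss of context, a pixel-based attention method is proposed. It computes the relationships between the pixels of the high-level feature map to obtain global information and strengthen the correlation between pixels, and then combines the result with atrous spatial pyramid pooling (ASPP) to extract further feature information. To address the loss of pixel detail in high-level feature maps, an attention method that works across feature levels is also proposed. It uses the information in the high-level feature map as guidance to mine information hidden in the low-level feature map, and then fuses the result with the high-level feature map so that both levels are fully used. In the experiments, the effectiveness of the method is verified by comparing the influence of each proposed module on the segmentation performance of the fully convolutional network. The method is also compared with current advanced networks on the widely used Cityscapes semantic segmentation dataset. The results show advantages in both objective metrics and subjective visual quality: the method reaches 69.3% accuracy (listed as mIoU in Table 3) on the official Cityscapes test set, about 3 to 5 percentage points higher than several recent advanced networks.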
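The pixel-based attention described in the abstract relates every pixel of the high-level feature map to every other pixel. A minimal NumPy sketch of that idea follows. It assumes a non-local, dot-product formulation with softmax normalisation and a residual connection; the function name `pixel_attention`, the scale factor `gamma`, and the absence of learned projections are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def pixel_attention(feat, gamma=1.0):
    """Hypothetical pixel-wise attention over a (C, H, W) feature map.

    Assumption: similarity is a plain dot product between pixel feature
    vectors; a trained module would normally add learned projections.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                      # (C, N), one column per pixel
    scores = x.T @ x                                # (N, N) pairwise pixel similarity
    scores -= scores.max(axis=1, keepdims=True)     # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # each row sums to 1
    context = x @ weights.T                         # (C, N) weighted sum over all pixels
    out = gamma * context + x                       # residual keeps the original features
    return out.reshape(c, h, w), weights


rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 4, 5))
attended, attn = pixel_attention(feature_map)
print(attended.shape, attn.shape)
```

Because the attention matrix has one row and one column per pixel, its size grows quadratically with the spatial resolution, which is one reason such a module is usually applied to the small high-level feature map rather than to the input image.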
OU Yangliu, HE Xi, QU Shaojun. Fully Convolutional Neural Network with Attention Module for Semantic Segmentation[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(5): 1136-1145.
Layer_name | 101-layer |
---|---|
Conv1 | 7×7, 64, stride 2 |
Pooling | 3×3 max pool, stride 2 |
Res1 | |
Res2 | |
Res3 | |
Res4 | |
Table 1 Structure of the four blocks of ResNet-101
Network model | ASPPAM | PAM | mIoU/% | FPS |
---|---|---|---|---|
ResNet-101-baseline | No | No | 68.1 | 25 |
ResNet-101-ASPPAM | Yes | No | 73.8 | 22 |
ResNet-101-PAM | No | Yes | 69.3 | 24 |
ResNet-101-ASPP | No | No | 70.7 | 23 |
ResNet-101-ASPPAM-PAM | Yes | Yes | 75.4 | 20 |
Table 2 Impact of the two modules on network performance
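The trade-off in Table 2 is easier to read as differences from the baseline. The short sketch below copies the mIoU and FPS values from the table and computes, for each variant, the mIoU gain in percentage points and the change in frames per second. The helper name `delta_vs_baseline` is hypothetical and used only for illustration.

```python
# Values copied from Table 2: (mIoU in %, FPS)
results = {
    "ResNet-101-baseline": (68.1, 25),
    "ResNet-101-ASPPAM": (73.8, 22),
    "ResNet-101-PAM": (69.3, 24),
    "ResNet-101-ASPP": (70.7, 23),
    "ResNet-101-ASPPAM-PAM": (75.4, 20),
}

base_miou, base_fps = results["ResNet-101-baseline"]


def delta_vs_baseline(name):
    """Return (mIoU gain in points, FPS change) relative to the baseline."""
    miou, fps = results[name]
    return round(miou - base_miou, 1), fps - base_fps


for name in results:
    gain, fps_change = delta_vs_baseline(name)
    print(f"{name}: {gain:+.1f} mIoU, {fps_change:+d} FPS")
```

Read this way, ASPPAM adds 5.7 points where plain ASPP adds 2.6, and the combined ASPPAM and PAM model gains 7.3 points at a cost of 5 FPS.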
Method | BaseNet | mIoU/% |
---|---|---|
Dilated FCN-16 | Res-101 | 47.29 |
PSPNet | Res-101 | 60.89 |
DeepLab-v3 | Res-101 | 60.91 |
DeepLab-v3+ | Res-101 | 64.06 |
DANet | Res-101 | 64.54 |
OCNet (baseOC) | Res-101 | 64.37 |
OCRNet | Res-101 | 66.54 |
EfficientFCN | Res-101 | 65.78 |
BiANet (without PAM) | Res-101 | 65.85 |
BiANet | Res-101 | 66.63 |
CANet (proposed) | Res-101 | 69.30 |
Table 3 Comparison with various advanced networks
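The abstract summarises Table 3 as a lead of roughly 3 to 5 percentage points over recent networks. The sketch below copies the mIoU values from the table and computes the margin of the proposed network over each method, so the claim can be checked row by row. The variable names are illustrative only.

```python
# mIoU (%) values copied from Table 3
test_miou = {
    "Dilated FCN-16": 47.29,
    "PSPNet": 60.89,
    "DeepLab-v3": 60.91,
    "DeepLab-v3+": 64.06,
    "DANet": 64.54,
    "OCNet (baseOC)": 64.37,
    "OCRNet": 66.54,
    "EfficientFCN": 65.78,
    "BiANet (without PAM)": 65.85,
    "BiANet": 66.63,
}
proposed = 69.30

# Margin of the proposed network over each method, in percentage points
margins = {name: round(proposed - score, 2) for name, score in test_miou.items()}

for name, margin in sorted(margins.items(), key=lambda item: item[1]):
    print(f"{name}: +{margin:.2f}")
```

The smallest margin is over BiANet (about 2.7 points) and the margin over DeepLab-v3+ is about 5.2 points; the older PSPNet, DeepLab-v3, and Dilated FCN-16 baselines trail by more.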
Class name | ResNet-Baseline | ResNet-ASPPAM | ResNet-PAM | ResNet-ASPP | Proposed |
---|---|---|---|---|---|
Road | 97.6 | 97.2 | 97.7 | 97.8 | 97.6 |
Sidewalk | 83.0 | 81.8 | 83.3 | 86.4 | 82.6 |
Building | 91.4 | 90.5 | 91.9 | 92.4 | 91.6 |
Wall | 36.6 | 51.8 | 52.8 | 62.7 | 60.5 |
Fence | 53.7 | 58.4 | 47.1 | 68.2 | 64.1 |
Pole | 60.1 | 58.3 | 61.2 | 60.1 | 57.6 |
Traffic light | 69.4 | 61.3 | 66.1 | 69.0 | 61.0 |
Traffic sign | 76.5 | 73.7 | 76.1 | 77.9 | 75.7 |
Vegetation | 91.8 | 91.1 | 91.9 | 91.9 | 91.6 |
Terrain | 56.2 | 65.9 | 67.4 | 70.7 | 66.1 |
Sky | 93.9 | 93.6 | 94.0 | 94.2 | 93.7 |
Person | 80.5 | 77.7 | 78.1 | 81.0 | 79.0 |
Rider | 59.9 | 63.0 | 60.6 | 62.8 | 61.1 |
Car | 93.0 | 92.2 | 94.3 | 94.4 | 93.1 |
Truck | 40.2 | 71.3 | 46.1 | 55.1 | 66.5 |
Bus | 55.9 | 63.5 | 29.0 | 20.1 | 80.2 |
Train | 21.1 | 77.0 | 43.0 | 8.3 | 79.0 |
Motorcycle | 56.0 | 59.7 | 60.0 | 72.8 | 56.2 |
Bicycle | 76.6 | 75.0 | 75.2 | 76.7 | 74.8 |
mIoU | 68.1 | 73.8 | 69.3 | 70.7 | 75.4 |
Table 4 Per-class accuracy on the Cityscapes validation set (%)
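The mIoU row of Table 4 can be reproduced from the per-class rows. The sketch below assumes the row is the unweighted mean over the 19 classes, which matches the printed values after rounding to one decimal. It also ranks the classes by how much the proposed network improves on the baseline. The scores are copied from the table; the function and variable names are illustrative.

```python
# Class names and per-class scores (%) copied from Table 4
classes = [
    "Road", "Sidewalk", "Building", "Wall", "Fence", "Pole", "Traffic light",
    "Traffic sign", "Vegetation", "Terrain", "Sky", "Person", "Rider", "Car",
    "Truck", "Bus", "Train", "Motorcycle", "Bicycle",
]
baseline = [97.6, 83.0, 91.4, 36.6, 53.7, 60.1, 69.4, 76.5, 91.8, 56.2,
            93.9, 80.5, 59.9, 93.0, 40.2, 55.9, 21.1, 56.0, 76.6]
proposed = [97.6, 82.6, 91.6, 60.5, 64.1, 57.6, 61.0, 75.7, 91.6, 66.1,
            93.7, 79.0, 61.1, 93.1, 66.5, 80.2, 79.0, 56.2, 74.8]


def mean_iou(per_class):
    """Unweighted mean of the per-class scores (assumed definition of mIoU)."""
    return sum(per_class) / len(per_class)


print(round(mean_iou(baseline), 1), round(mean_iou(proposed), 1))

# Classes ordered by improvement of the proposed network over the baseline
gains = sorted(
    ((name, round(p - b, 1)) for name, p, b in zip(classes, proposed, baseline)),
    key=lambda item: item[1],
    reverse=True,
)
for name, gain in gains[:4]:
    print(f"{name}: {gain:+.1f}")
```

Most of the overall gain comes from a few large-object classes (Train, Truck, Bus, Wall), while several small-object classes such as Traffic light and Pole score lower than the baseline.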
Layer | Params/MB | GFLOPS | Receptive field |
---|---|---|---|
Res1 | 0.761(0.346%) | 8.820(1.468%) | 32 |
Res2 | 4.300(1.954%) | 12.427(2.258%) | 42 |
Res3 | 92.016(41.799%) | 245.820(44.656%) | 138 |
Res4 | 52.780(23.975%) | 140.890(25.595%) | 162 |
ASPPAM | 53.510(24.310%) | 141.720(25.746%) | 236 |
Else | 16.770(7.616%) | 1.490(0.003%) | 238 |
Total | 220.137 | 551.167 | 238 |
Table 5 Network parameters
[1] LUO H L, ZHANG Y. A survey of image semantic segmentation based on deep network[J]. Acta Electronica Sinica, 2019, 47(10): 2211-2220.
[2] XU H, ZHU Y H, ZHEN T, et al. Survey of semantic methods based on deep neural network[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(1): 47-59.
[3] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[4] LU W C, PANG Y W, HE Y Q, et al. Real-time and accurate semantic segmentation based on separable residual modules[J]. Laser & Optoelectronics Progress, 2019, 56(5): 97-107.
[5] SHELHAMER E, LONG J, DARRELL T. Fully convolutional networks for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640-651.
[6] LI X X, HU X G, WANG Z Q, et al. Survey of instance segmentation based on deep learning[J]. Computer Engineering and Applications, 2021, 57(9): 60-67.
[7] ZHAO H, SHI J, QI X, et al. Pyramid scene parsing network[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 2881-2890.
[8] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]// LNCS 9351: Proceedings of the 2015 International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Oct 5-9, 2015. Cham: Springer, 2015: 234-241.
[9] YUAN Y, HUANG L, GUO J, et al. OCNet: object context network for scene parsing[J]. arXiv:1809.00916, 2018.
[10] CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation[J]. arXiv:1706.05587, 2017.
[11] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs[J]. arXiv:1412.7062, 2014.
[12] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(4): 834-848.
[13] KUANG H Y, WU J J. Survey of image semantic segmentation based on deep learning[J]. Computer Engineering and Applications, 2019, 55(19): 12-21.
[14] TIAN X, WANG L, DING Q. Review of image semantic segmentation based on deep learning[J]. Journal of Software, 2019, 30(2): 440-468.
[15] WANG Y R, CHEN Q L, WU J J. Research on image semantic segmentation for complex environments[J]. Computer Science, 2019, 46(9): 36-46.
[16] JING Z W, GUAN H Y, PENG D F, et al. Survey of research in image semantic segmentation based on deep neural network[J]. Computer Engineering, 2020, 46(10): 1-17.
[17] LI H, XIONG P, AN J, et al. Pyramid attention network for semantic segmentation[J]. arXiv:1805.10180, 2018.
[18] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// LNCS 11211: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 3-19.
[19] WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 7794-7803.
[20] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 770-778.
[21] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 7132-7141.
[22] FINNEY D J. Probit analysis: a statistical treatment of the sigmoid response curve[M]. Cambridge: Cambridge University Press, 1952.
[23] CHEN L C, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]// LNCS 11211: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 833-851.
[24] GARCIA-GARCIA A, ORTS-ESCOLANO S, OPREA S, et al. A review on deep learning techniques applied to semantic segmentation[J]. arXiv:1704.06857, 2017.
[25] FU J, LIU J, TIAN H, et al. Dual attention network for scene segmentation[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Los Angeles, Jun 16-19, 2019. Piscataway: IEEE, 2019: 3146-3154.
[26] YUAN Y, CHEN X, WANG J. Object-contextual representations for semantic segmentation[J]. arXiv:1909.11065, 2019.
[27] LIU J, HE J, ZHANG J, et al. EfficientFCN: holistically-guided decoding for semantic segmentation[C]// LNCS 12371: Proceedings of the 16th European Conference on Computer Vision, Glasgow, Aug 23-28, 2020. Cham: Springer, 2020: 1-17.
[28] WANG D, LI N, ZHOU Y, et al. Bilateral attention network for semantic segmentation[J]. IET Image Processing, 2021, 15(8): 1607-1616.