Fully Convolutional Neural Network with Attention Module for Semantic Segmentation

Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (5): 1136-1145. DOI: 10.3778/j.issn.1673-9418.2105095

OU Yangliu¹, HE Xi¹, QU Shaojun¹,²,⁺

1. College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China
2. Hunan Xiangjiang Artificial Intelligence Academy, Hunan Normal University, Changsha 410081, China

Received: 2021-06-10 Revised: 2021-09-29 Online: 2022-05-01 Published: 2022-05-19
Corresponding author: ⁺ E-mail: qshj@hunnu.edu.cn
About the authors: OU Yangliu, born in 1999 in Huanggang, Hubei, M.S. candidate, member of CCF. His research interests include computer vision and deep learning.
HE Xi, born in 2000 in Xiangtan, Hunan. His research interests include computer vision and deep learning.
QU Shaojun, born in 1979 in Yongshun, Hunan, Ph.D., senior experimentalist, member of CCF. His research interests include image segmentation, computer vision and deep learning.
Supported by:
National Natural Science Foundation of China (12071126); Scientific Research Fund of Hunan Provincial Education Department (19C1149); National College Students' Innovation and Entrepreneurship Training Program (S202010542021)

Abstract:

Fully convolutional neural networks are powerful end-to-end models that are widely used in semantic segmentation and have achieved great success. Researchers have proposed a series of methods based on fully convolutional networks; however, the repeated downsampling performed by convolution and pooling layers discards contextual information in the image, which harms pixel-level classification. To address this loss of context, a pixel-based attention method is proposed. It computes the relationships between pixels of the high-level feature map to obtain global information and strengthen the correlation between pixels, and is then combined with atrous spatial pyramid pooling to further extract image features. To address the loss of pixel detail in the high-level feature map, an attention method based on different levels of the image is proposed. It uses the information in the high-level feature map as guidance to mine the information hidden in the low-level feature map, and then fuses the result with the high-level feature map, so that both high-level and low-level information is fully used. In the experiments, the effectiveness of the proposed method is verified by comparing the effects of the different proposed modules on the segmentation performance of a fully convolutional neural network. The method is also compared with current advanced networks on the widely recognized Cityscapes semantic segmentation dataset. The results show that the proposed method has advantages in both objective evaluation metrics and subjective visual quality, and that it reaches 69.3% accuracy on the official Cityscapes test set, 3 to 5 percentage points higher than several recent advanced networks.

Key words: fully convolutional neural network, atrous spatial pyramid pooling, attention module, semantic segmentation
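
The abstract describes the pixel-based attention only in words: relationships between pixels of the high-level feature map are computed and used to bring global information into each pixel. A minimal sketch of that idea follows, written as plain dot-product self-attention over flattened pixels. The function names, the softmax normalisation and the residual addition are assumptions made for illustration; the module defined in the paper may differ.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def pixel_attention(features):
    """features: list of N pixel vectors (each a list of C floats),
    i.e. an H x W x C feature map flattened to N = H * W pixels.
    Returns N vectors where each pixel is its original feature plus a
    similarity-weighted sum of all pixel features (residual form, assumed).
    """
    n = len(features)
    out = []
    for i in range(n):
        # Similarity of pixel i to every pixel j (dot product, assumed).
        scores = [sum(a * b for a, b in zip(features[i], features[j]))
                  for j in range(n)]
        weights = softmax(scores)
        channels = len(features[i])
        # Global context: weighted average of all pixel features.
        context = [sum(weights[j] * features[j][c] for j in range(n))
                   for c in range(channels)]
        out.append([f + g for f, g in zip(features[i], context)])
    return out

# Toy 2 x 2 feature map with 2 channels, flattened to 4 pixels (hypothetical data).
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
attended = pixel_attention(toy)
```

In the toy input, pixel 0 resembles pixel 1 more than pixels 2 and 3, so the context added to pixel 0 is weighted towards its first channel. The cost is quadratic in the number of pixels, which is one reason attention of this kind is usually applied to the low-resolution high-level feature map.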

CLC Number:

TP391.4

OU Yangliu, HE Xi, QU Shaojun. Fully Convolutional Neural Network with Attention Module for Semantic Segmentation[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(5): 1136-1145.

Figures/Tables 13

Fig.1 CANet network structure diagram

Table 1 Structure of the four blocks of ResNet-101

Layer_name	101-layer
Conv1	7×7, 64, stride: 2
Pooling	3×3 maxpool, stride: 2
Res1	$[\begin{matrix} 1 \times 1, & 64 \\ 3 \times 3, & 64 \\ 1 \times 1, & 256 \end{matrix}] \times 3$
Res2	$[\begin{matrix} 1 \times 1, & 128 \\ 3 \times 3, & 128 \\ 1 \times 1, & 512 \end{matrix}] \times 4$
Res3	$[\begin{matrix} 1 \times 1, & 256 \\ 3 \times 3, & 256 \\ 1 \times 1, & 1024 \end{matrix}] \times 23$
Res4	$[\begin{matrix} 1 \times 1, & 512 \\ 3 \times 3, & 512 \\ 1 \times 1, & 2048 \end{matrix}] \times 3$
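
The block counts in Table 1 account for the "101" in the network's name. A short check, assuming the usual ResNet convention of counting the stem convolution, every convolution inside the bottleneck blocks, and one final fully connected classifier layer (the classifier is not listed in Table 1 and is an assumption here):

```python
# Number of bottleneck blocks per stage, from Table 1.
blocks_per_stage = {"Res1": 3, "Res2": 4, "Res3": 23, "Res4": 3}
convs_per_block = 3    # 1x1, 3x3, 1x1 in each bottleneck

stem_convs = 1         # Conv1 (7x7, 64, stride 2)
classifier_layers = 1  # final fully connected layer (assumed, not in Table 1)

weighted_layers = (stem_convs
                   + convs_per_block * sum(blocks_per_stage.values())
                   + classifier_layers)
print(weighted_layers)  # 101
```

In a segmentation network the classifier layer is normally replaced by a segmentation head, so the count only explains the backbone's name.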

Fig.2 Original picture

Fig.3 Visualization of the high-level feature map extracted without ASPPAM

Fig.4 Visualization of the high-level feature map extracted with ASPPAM

Fig.5 Pixel similarity attention module

Fig.6 Atrous spatial pyramid pooling attention module

Fig.7 Position attention module
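
Fig.6 concerns atrous spatial pyramid pooling, which applies atrous (dilated) convolutions at several rates in parallel. As a small illustration of why this enlarges the field of view without adding weights, the sketch below computes the extent covered by a dilated kernel and runs a 1-D dilated convolution. The rates 1, 6, 12 and 18 and the helper names are illustrative assumptions, not values taken from this paper.

```python
def effective_kernel_size(kernel_size, rate):
    """Span covered by a kernel whose taps are spaced `rate` samples apart."""
    return kernel_size + (kernel_size - 1) * (rate - 1)

def dilated_conv1d(signal, kernel, rate):
    """'Valid' 1-D atrous convolution (cross-correlation form, no padding)."""
    span = effective_kernel_size(len(kernel), rate)
    return [sum(w * signal[start + k * rate] for k, w in enumerate(kernel))
            for start in range(len(signal) - span + 1)]

for rate in (1, 6, 12, 18):  # illustrative rates only
    print(rate, effective_kernel_size(3, rate))  # spans 3, 13, 25, 37

signal = list(range(10))
print(dilated_conv1d(signal, [1, 0, -1], rate=2))  # [-4, -4, -4, -4, -4, -4]
```

A 3-tap kernel keeps three weights at every rate, yet at rate 18 it spans 37 samples, which is how parallel branches at different rates can gather context at several scales.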

Table 2 Impact of the two modules on network performance

Network model	ASPPAM	PAM	mIoU/%	FPS
ResNet-101-baseline	No	No	68.1	25
ResNet-101-ASPPAM	Yes	No	73.8	22
ResNet-101-PAM	No	Yes	69.3	24
ResNet-101-ASPP	No	No	70.7	23
ResNet-101-ASPPAM-PAM	Yes	Yes	75.4	20
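
The ablation in Table 2 is easiest to read as differences from the baseline. The following sketch simply recomputes those differences from the table's own numbers; the variable names are ad hoc.

```python
# mIoU (%) and FPS from Table 2.
table2 = {
    "ResNet-101-baseline":   (68.1, 25),
    "ResNet-101-ASPPAM":     (73.8, 22),
    "ResNet-101-PAM":        (69.3, 24),
    "ResNet-101-ASPP":       (70.7, 23),
    "ResNet-101-ASPPAM-PAM": (75.4, 20),
}
base_miou, base_fps = table2["ResNet-101-baseline"]

# Change in mIoU (percentage points) and FPS relative to the baseline.
gains = {name: (round(miou - base_miou, 1), fps - base_fps)
         for name, (miou, fps) in table2.items()
         if name != "ResNet-101-baseline"}

for name, (d_miou, d_fps) in gains.items():
    print(f"{name}: {d_miou:+.1f} mIoU points, {d_fps:+d} FPS")
```

On these numbers, ASPPAM alone gains 5.7 points against 2.6 for plain ASPP, and the full model gains 7.3 points at a cost of 5 FPS.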

Table 3 Comparison with various advanced networks

Method	BaseNet	mIoU/%
Dilated FCN-16	Res-101	47.29
PSPNet	Res-101	60.89
DeepLab-v3	Res-101	60.91
DeepLab-v3+	Res-101	64.06
DANet^[25]	Res-101	64.54
OCNet (baseOC)	Res-101	64.37
OCRNet^[26]	Res-101	66.54
EfficientFCN^[27]	Res-101	65.78
BiANet (without PAM)	Res-101	65.85
BiANet^[28]	Res-101	66.63
CANet (proposed)	Res-101	69.30
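
Table 3 compares methods by mIoU. For reference, a minimal sketch of the standard definition: per-class IoU is TP / (TP + FP + FN), and mIoU is the mean over classes. The 3-class confusion matrix below is a made-up toy example, not data from the paper, and the function names are ad hoc.

```python
def per_class_iou(confusion):
    """confusion[i][j] = number of pixels of true class i predicted as class j."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # true c, predicted as another class
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, truly another class
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious

def mean_iou(confusion):
    """Mean of the per-class IoUs, skipping classes that never occur (NaN)."""
    ious = [v for v in per_class_iou(confusion) if v == v]
    return sum(ious) / len(ious)

toy = [[8, 2, 0],
       [1, 7, 2],
       [0, 1, 9]]
print([round(v, 3) for v in per_class_iou(toy)])  # [0.727, 0.538, 0.75]
print(round(mean_iou(toy), 3))                    # 0.672
```

Because every class contributes equally to the mean, a rare class with poor IoU lowers mIoU as much as a frequent one.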

Table 4 Accuracy of each category on the Cityscapes validation set (%)

Class name	ResNet-Baseline	ResNet-ASPPAM	ResNet-PAM	ResNet-ASPP	Proposed
Road	97.6	97.2	97.7	97.8	97.6
Sidewalk	83.0	81.8	83.3	86.4	82.6
Building	91.4	90.5	91.9	92.4	91.6
Wall	36.6	51.8	52.8	62.7	60.5
Fence	53.7	58.4	47.1	68.2	64.1
Pole	60.1	58.3	61.2	60.1	57.6
Traffic light	69.4	61.3	66.1	69.0	61.0
Traffic sign	76.5	73.7	76.1	77.9	75.7
Vegetation	91.8	91.1	91.9	91.9	91.6
Terrain	56.2	65.9	67.4	70.7	66.1
Sky	93.9	93.6	94.0	94.2	93.7
Person	80.5	77.7	78.1	81.0	79.0
Rider	59.9	63.0	60.6	62.8	61.1
Car	93.0	92.2	94.3	94.4	93.1
Truck	40.2	71.3	46.1	55.1	66.5
Bus	55.9	63.5	29.0	20.1	80.2
Train	21.1	77.0	43.0	8.3	79.0
Motorcycle	56.0	59.7	60.0	72.8	56.2
Bicycle	76.6	75.0	75.2	76.7	74.8
mIoU	68.1	73.8	69.3	70.7	75.4
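
The mIoU row of Table 4 is the unweighted mean of the 19 per-class values in each column. The sketch below checks this for the baseline and the proposed network and lists the classes whose scores change most between the two; the helper name `column_mean` is ad hoc.

```python
classes = ["Road", "Sidewalk", "Building", "Wall", "Fence", "Pole",
           "Traffic light", "Traffic sign", "Vegetation", "Terrain", "Sky",
           "Person", "Rider", "Car", "Truck", "Bus", "Train", "Motorcycle",
           "Bicycle"]

# Per-class values (%) from Table 4, in the row order above.
baseline = [97.6, 83.0, 91.4, 36.6, 53.7, 60.1, 69.4, 76.5, 91.8, 56.2,
            93.9, 80.5, 59.9, 93.0, 40.2, 55.9, 21.1, 56.0, 76.6]
proposed = [97.6, 82.6, 91.6, 60.5, 64.1, 57.6, 61.0, 75.7, 91.6, 66.1,
            93.7, 79.0, 61.1, 93.1, 66.5, 80.2, 79.0, 56.2, 74.8]

def column_mean(values):
    """Unweighted mean over classes, rounded to one decimal as in the table."""
    return round(sum(values) / len(values), 1)

print(column_mean(baseline))  # 68.1
print(column_mean(proposed))  # 75.4

# Per-class change of the proposed network relative to the baseline.
deltas = {c: round(p - b, 1) for c, b, p in zip(classes, baseline, proposed)}
best = max(deltas, key=deltas.get)
worst = min(deltas, key=deltas.get)
print(best, deltas[best])    # Train 57.9
print(worst, deltas[worst])  # Traffic light -8.4
```

Most of the overall gain comes from a few classes (Train, Truck, Bus, Wall), while some small or thin classes such as Traffic light and Pole score lower than the baseline.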

Fig.8 Visualization of the ablation experiment results

Table 5 Network parameters

Layer	Params/MB	GFLOPS	Receptive field
Res1	0.761(0.346%)	8.820(1.468%)	32
Res2	4.300(1.954%)	12.427(2.258%)	42
Res3	92.016(41.799%)	245.820(44.656%)	138
Res4	52.780(23.975%)	140.890(25.595%)	162
ASPPAM	53.510(24.310%)	141.720(25.746%)	236
Else	16.770(7.616%)	1.490(0.003%)	238
Total	220.137	551.167	238
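
The percentages in the Params column of Table 5 are each component's share of the total. A short sketch that recomputes the total and the shares from the listed sizes; it should agree with the table to within a few thousandths of a percentage point, with the small differences presumably due to rounding of the listed sizes.

```python
# Parameter sizes (MB) per component, from Table 5.
params_mb = {"Res1": 0.761, "Res2": 4.300, "Res3": 92.016,
             "Res4": 52.780, "ASPPAM": 53.510, "Else": 16.770}

total = sum(params_mb.values())
shares = {name: round(100 * size / total, 3)
          for name, size in params_mb.items()}

print(round(total, 3))  # 220.137
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

Res3 and ASPPAM together hold roughly two thirds of the parameters, so they are the natural targets if the model size has to be reduced.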

References 28

[1]	LUO H L, ZHANG Y. A survey of image semantic segmentation based on deep network[J]. Acta Electronica Sinica, 2019, 47(10): 2211-2220.
[2]	XU H, ZHU Y H, ZHEN T, et al. Survey of image semantic segmentation methods based on deep neural network[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(1): 47-59.
[3]	LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[4]	LU W C, PANG Y W, HE Y Q, et al. Real-time and accurate semantic segmentation based on separable residual modules[J]. Laser & Optoelectronics Progress, 2019, 56(5): 97-107.
[5]	SHELHAMER E, LONG J, DARRELL T. Fully convolutional networks for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640-651.
[6]	LI X X, HU X G, WANG Z Q, et al. Survey of instance segmentation based on deep learning[J]. Computer Engineering and Applications, 2021, 57(9): 60-67.
[7]	ZHAO H, SHI J, QI X, et al. Pyramid scene parsing network[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 2881-2890.
[8]	RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]// LNCS 9351: Proceedings of the 2015 International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Oct 5-9, 2015. Cham: Springer, 2015: 234-241.
[9]	YUAN Y, HUANG L, GUO J, et al. OCNet: object context network for scene parsing[J]. arXiv:1809.00916, 2018.
[10]	CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation[J]. arXiv:1706.05587, 2017.
[11]	CHEN L C, PAPANDREOU G, KOKKINOS I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs[J]. arXiv:1412.7062, 2014.
[12]	CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(4): 834-848.
[13]	KUANG H Y, WU J J. Survey of image semantic segmentation based on deep learning[J]. Computer Engineering and Applications, 2019, 55(19): 12-21.
[14]	TIAN X, WANG L, DING Q. Review of image semantic segmentation based on deep learning[J]. Journal of Software, 2019, 30(2): 440-468.
[15]	WANG Y R, CHEN Q L, WU J J. Research on image semantic segmentation for complex environments[J]. Computer Science, 2019, 46(9): 36-46.
[16]	JING Z W, GUAN H Y, PENG D F, et al. Survey of research in image semantic segmentation based on deep neural network[J]. Computer Engineering, 2020, 46(10): 1-17.
[17]	LI H, XIONG P, AN J, et al. Pyramid attention network for semantic segmentation[J]. arXiv:1805.10180, 2018.
[18]	WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// LNCS 11211: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 3-19.
[19]	WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 7794-7803.
[20]	HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 770-778.
[21]	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 7132-7141.
[22]	FINNEY D J. Probit analysis: a statistical treatment of the sigmoid response curve[M]. Cambridge: Cambridge University Press, 1952.
[23]	CHEN L C, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]// LNCS 11211: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 833-851.
[24]	GARCIA-GARCIA A, ORTS-ESCOLANO S, OPREA S, et al. A review on deep learning techniques applied to semantic segmentation[J]. arXiv:1704.06857, 2017.
[25]	FU J, LIU J, TIAN H, et al. Dual attention network for scene segmentation[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Los Angeles, Jun 16-19, 2019. Piscataway: IEEE, 2019: 3146-3154.
[26]	YUAN Y, CHEN X, WANG J. Object-contextual representations for semantic segmentation[J]. arXiv:1909.11065, 2019.
[27]	LIU J, HE J, ZHANG J, et al. EfficientFCN: holistically-guided decoding for semantic segmentation[C]// LNCS 12371: Proceedings of the 16th European Conference on Computer Vision, Glasgow, Aug 23-28, 2020. Cham: Springer, 2020: 1-17.
[28]	WANG D, LI N, ZHOU Y, et al. Bilateral attention network for semantic segmentation[J]. IET Image Processing, 2021, 15(8): 1607-1616.
