Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (5): 1136-1145. DOI: 10.3778/j.issn.1673-9418.2105095
• Graphics and Image •

Fully Convolutional Neural Network with Attention Module for Semantic Segmentation

OU Yangliu1, HE Xi1, QU Shaojun1,2,+
Received: 2021-06-10
Revised: 2021-09-29
Online: 2022-05-01
Published: 2022-05-19
About author: OU Yangliu, born in 1999 in Huanggang, Hubei, M.S. candidate, member of CCF. His research interests include computer vision and deep learning.
Corresponding author: + E-mail: qshj@hunnu.edu.cn
OU Yangliu, HE Xi, QU Shaojun. Fully Convolutional Neural Network with Attention Module for Semantic Segmentation[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(5): 1136-1145.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2105095
| Layer name | 101-layer |
|---|---|
| Conv1 | 7×7, 64, stride 2 |
| Pooling | 3×3 max pool, stride 2 |
| Res1 | [1×1, 64; 3×3, 64; 1×1, 256] × 3 |
| Res2 | [1×1, 128; 3×3, 128; 1×1, 512] × 4 |
| Res3 | [1×1, 256; 3×3, 256; 1×1, 1024] × 23 |
| Res4 | [1×1, 512; 3×3, 512; 1×1, 2048] × 3 |

Table 1 Structure of the four residual blocks of ResNet-101
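For reference, the stem and the four residual stages in Table 1 map directly onto torchvision's ResNet-101. The sketch below is a minimal illustration; the `replace_stride_with_dilation` setting (dilating Res3/Res4 so the output stride stays at 8, a common choice for segmentation backbones) is an assumption rather than the paper's exact configuration.

```python
import torch
from torchvision.models import resnet101

# Build ResNet-101; dilating the last two stages (output stride 8) is a common
# segmentation choice and is assumed here, not taken from the paper.
backbone = resnet101(weights=None, replace_stride_with_dilation=[False, True, True])

# The blocks of Table 1 correspond to torchvision's layer1..layer4.
stages = {
    "Conv1+Pool": torch.nn.Sequential(backbone.conv1, backbone.bn1,
                                      backbone.relu, backbone.maxpool),
    "Res1": backbone.layer1,  # 3 bottlenecks, 256 output channels
    "Res2": backbone.layer2,  # 4 bottlenecks, 512 output channels
    "Res3": backbone.layer3,  # 23 bottlenecks, 1024 output channels
    "Res4": backbone.layer4,  # 3 bottlenecks, 2048 output channels
}

x = torch.randn(1, 3, 512, 1024)  # half-resolution Cityscapes-sized input
with torch.no_grad():
    for name, stage in stages.items():
        x = stage(x)
        print(f"{name:10s} -> {tuple(x.shape)}")
```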
| Network model | ASPPAM | PAM | mIoU/% | FPS |
|---|---|---|---|---|
| ResNet-101-baseline | No | No | 68.1 | 25 |
| ResNet-101-ASPPAM | Yes | No | 73.8 | 22 |
| ResNet-101-PAM | No | Yes | 69.3 | 24 |
| ResNet-101-ASPP | No | No | 70.7 | 23 |
| ResNet-101-ASPPAM-PAM | Yes | Yes | 75.4 | 20 |

Table 2 Impact of two modules on network performance
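Table 2 ablates the two attention modules. Purely as an illustration of the kind of design ASPPAM refers to, the sketch below attaches SE-style channel attention (cf. ref. [21]) to the concatenated output of an ASPP block (cf. ref. [10]); the dilation rates, squeeze ratio, and overall wiring are hypothetical and not the authors' implementation.

```python
import torch
import torch.nn as nn

class ASPPChannelAttention(nn.Module):
    """Hypothetical ASPP block re-weighted by SE-style channel attention.

    The dilation rates (1, 6, 12, 18) and the squeeze ratio are assumptions,
    not the configuration used in the paper.
    """
    def __init__(self, in_ch=2048, out_ch=256, rates=(1, 6, 12, 18), reduction=16):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True))
            for r in rates])
        fused = out_ch * len(rates)
        # Squeeze-and-excitation over the concatenated branch features
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(fused, fused // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(fused // reduction, fused, 1), nn.Sigmoid())
        self.project = nn.Conv2d(fused, out_ch, 1)

    def forward(self, x):
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.project(feats * self.attention(feats))

out = ASPPChannelAttention()(torch.randn(1, 2048, 64, 128))
print(out.shape)  # torch.Size([1, 256, 64, 128])
```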
| Method | BaseNet | mIoU/% |
|---|---|---|
| Dilated FCN-16 | Res-101 | 47.29 |
| PSPNet | Res-101 | 60.89 |
| DeepLab-v3 | Res-101 | 60.91 |
| DeepLab-v3+ | Res-101 | 64.06 |
| DANet[25] | Res-101 | 64.54 |
| OCNet (baseOC) | Res-101 | 64.37 |
| OCRNet[26] | Res-101 | 66.54 |
| EfficientFCN[27] | Res-101 | 65.78 |
| BiANet (without PAM) | Res-101 | 65.85 |
| BiANet[28] | Res-101 | 66.63 |
| CANet (proposed) | Res-101 | 69.30 |

Table 3 Comparison with various advanced networks
| Class name | ResNet-Baseline | ResNet-ASPPAM | ResNet-PAM | ResNet-ASPP | Proposed |
|---|---|---|---|---|---|
| Road | 97.6 | 97.2 | 97.7 | 97.8 | 97.6 |
| Sidewalk | 83.0 | 81.8 | 83.3 | 86.4 | 82.6 |
| Building | 91.4 | 90.5 | 91.9 | 92.4 | 91.6 |
| Wall | 36.6 | 51.8 | 52.8 | 62.7 | 60.5 |
| Fence | 53.7 | 58.4 | 47.1 | 68.2 | 64.1 |
| Pole | 60.1 | 58.3 | 61.2 | 60.1 | 57.6 |
| Traffic light | 69.4 | 61.3 | 66.1 | 69.0 | 61.0 |
| Traffic sign | 76.5 | 73.7 | 76.1 | 77.9 | 75.7 |
| Vegetation | 91.8 | 91.1 | 91.9 | 91.9 | 91.6 |
| Terrain | 56.2 | 65.9 | 67.4 | 70.7 | 66.1 |
| Sky | 93.9 | 93.6 | 94.0 | 94.2 | 93.7 |
| Person | 80.5 | 77.7 | 78.1 | 81.0 | 79.0 |
| Rider | 59.9 | 63.0 | 60.6 | 62.8 | 61.1 |
| Car | 93.0 | 92.2 | 94.3 | 94.4 | 93.1 |
| Truck | 40.2 | 71.3 | 46.1 | 55.1 | 66.5 |
| Bus | 55.9 | 63.5 | 29.0 | 20.1 | 80.2 |
| Train | 21.1 | 77.0 | 43.0 | 8.3 | 79.0 |
| Motorcycle | 56.0 | 59.7 | 60.0 | 72.8 | 56.2 |
| Bicycle | 76.6 | 75.0 | 75.2 | 76.7 | 74.8 |
| mIoU | 68.1 | 73.8 | 69.3 | 70.7 | 75.4 |

Table 4 Per-category accuracy (IoU) on Cityscapes validation set (%)
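The per-class scores in Table 4 are intersection-over-union values, and the mIoU row is their unweighted mean over the 19 Cityscapes evaluation classes. Below is a minimal sketch of the standard computation (accumulate a confusion matrix over the validation set, then take IoU = TP / (TP + FP + FN) per class); the toy data at the end is illustrative only.

```python
import numpy as np

NUM_CLASSES = 19     # Cityscapes evaluation classes
IGNORE_INDEX = 255   # pixels excluded from scoring

def confusion_matrix(pred, label, num_classes=NUM_CLASSES):
    """Accumulate a num_classes x num_classes confusion matrix for one image."""
    mask = label != IGNORE_INDEX
    idx = num_classes * label[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def iou_from_confusion(conf):
    """Per-class IoU = TP / (TP + FP + FN); mIoU is their unweighted mean."""
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    iou = tp / np.maximum(tp + fp + fn, 1)
    return iou, iou.mean()

# Toy example with random predictions on a single 512x1024 label map
rng = np.random.default_rng(0)
label = rng.integers(0, NUM_CLASSES, size=(512, 1024))
pred = rng.integers(0, NUM_CLASSES, size=(512, 1024))
per_class_iou, miou = iou_from_confusion(confusion_matrix(pred, label))
print(f"mIoU = {100 * miou:.1f}%")
```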
| Layer | Params/MB | GFLOPs | Receptive field |
|---|---|---|---|
| Res1 | 0.761 (0.346%) | 8.820 (1.468%) | 32 |
| Res2 | 4.300 (1.954%) | 12.427 (2.258%) | 42 |
| Res3 | 92.016 (41.799%) | 245.820 (44.656%) | 138 |
| Res4 | 52.780 (23.975%) | 140.890 (25.595%) | 162 |
| ASPPAM | 53.510 (24.310%) | 141.720 (25.746%) | 236 |
| Other | 16.770 (7.616%) | 1.490 (0.003%) | 238 |
| Total | 220.137 | 551.167 | 238 |

Table 5 Network parameters
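The parameter column of Table 5 can be reproduced for any PyTorch module by summing tensor sizes per stage; the GFLOPs and receptive-field columns need extra tooling or manual arithmetic and are not covered here. A small sketch follows, using a plain torchvision ResNet-101 as a stand-in for the full network and assuming the MB figures denote float32 storage (4 bytes per parameter).

```python
import torch.nn as nn
from torchvision.models import resnet101

def params_millions(module: nn.Module) -> float:
    """Total number of learnable parameters, in millions."""
    return sum(p.numel() for p in module.parameters()) / 1e6

def params_megabytes(module: nn.Module) -> float:
    """Storage size assuming float32 weights (4 bytes per parameter)."""
    return sum(p.numel() for p in module.parameters()) * 4 / 1e6

backbone = resnet101(weights=None)  # stand-in; the paper's full model adds ASPPAM etc.
stages = {"Res1": backbone.layer1, "Res2": backbone.layer2,
          "Res3": backbone.layer3, "Res4": backbone.layer4}

total_mb = params_megabytes(backbone)
for name, stage in stages.items():
    mb = params_megabytes(stage)
    print(f"{name}: {params_millions(stage):6.3f} M params, "
          f"{mb:7.3f} MB ({100 * mb / total_mb:5.2f}% of the backbone)")
print(f"Backbone total: {total_mb:.3f} MB")
```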
[1] LUO H L, ZHANG Y. A survey of image semantic segmentation based on deep network[J]. Acta Electronica Sinica, 2019, 47(10): 2211-2220.
[2] XU H, ZHU Y H, ZHEN T, et al. Survey of image semantic segmentation methods based on deep neural network[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(1): 47-59.
[3] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[4] LU W C, PANG Y W, HE Y Q, et al. Real-time and accurate semantic segmentation based on separable residual modules[J]. Laser & Optoelectronics Progress, 2019, 56(5): 97-107.
[5] SHELHAMER E, LONG J, DARRELL T. Fully convolutional networks for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640-651.
[6] LI X X, HU X G, WANG Z Q, et al. Survey of instance segmentation based on deep learning[J]. Computer Engineering and Applications, 2021, 57(9): 60-67.
[7] ZHAO H, SHI J, QI X, et al. Pyramid scene parsing network[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 2881-2890.
[8] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]// LNCS 9351: Proceedings of the 2015 International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Oct 5-9, 2015. Cham: Springer, 2015: 234-241.
[9] YUAN Y, HUANG L, GUO J, et al. OCNet: object context network for scene parsing[J]. arXiv:1809.00916, 2018.
[10] CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation[J]. arXiv:1706.05587, 2017.
[11] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs[J]. arXiv:1412.7062, 2014.
[12] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(4): 834-848.
[13] KUANG H Y, WU J J. Survey of image semantic segmentation based on deep learning[J]. Computer Engineering and Applications, 2019, 55(19): 12-21.
[14] TIAN X, WANG L, DING Q. Review of image semantic segmentation based on deep learning[J]. Journal of Software, 2019, 30(2): 440-468.
[15] WANG Y R, CHEN Q L, WU J J. Research on image semantic segmentation for complex environments[J]. Computer Science, 2019, 46(9): 36-46.
[16] JING Z W, GUAN H Y, PENG D F, et al. Survey of research in image semantic segmentation based on deep neural network[J]. Computer Engineering, 2020, 46(10): 1-17.
[17] LI H, XIONG P, AN J, et al. Pyramid attention network for semantic segmentation[J]. arXiv:1805.10180, 2018.
[18] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// LNCS 11211: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 3-19.
[19] WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 7794-7803.
[20] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 770-778.
[21] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 7132-7141.
[22] FINNEY D J. Probit analysis: a statistical treatment of the sigmoid response curve[M]. Cambridge: Cambridge University Press, 1952.
[23] CHEN L C, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]// LNCS 11211: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 833-851.
[24] GARCIA-GARCIA A, ORTS-ESCOLANO S, OPREA S, et al. A review on deep learning techniques applied to semantic segmentation[J]. arXiv:1704.06857, 2017.
[25] FU J, LIU J, TIAN H, et al. Dual attention network for scene segmentation[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Los Angeles, Jun 16-19, 2019. Piscataway: IEEE, 2019: 3146-3154.
[26] YUAN Y, CHEN X, WANG J. Object-contextual representations for semantic segmentation[J]. arXiv:1909.11065, 2019.
[27] LIU J, HE J, ZHANG J, et al. EfficientFCN: holistically-guided decoding for semantic segmentation[C]// LNCS 12371: Proceedings of the 16th European Conference on Computer Vision, Glasgow, Aug 23-28, 2020. Cham: Springer, 2020: 1-17.
[28] WANG D, LI N, ZHOU Y, et al. Bilateral attention network for semantic segmentation[J]. IET Image Processing, 2021, 15(8): 1607-1616.