Survey of Image Semantic Segmentation Methods Based on Deep Neural Network

doi:10.3778/j.issn.1673-9418.2004039

Abstract:

Image semantic segmentation has been an active research topic in computer vision in recent years. With the rise of deep learning, semantic segmentation methods built on deep neural networks have made significant progress and are widely used in real-world scenarios such as autonomous driving, intelligent security, intelligent robots, and human-computer interaction. This survey first briefly introduces several deep neural network models used for image semantic segmentation. It then describes in detail the mainstream image semantic segmentation methods based on deep neural networks, classifies them according to the techniques they use, and analyzes and summarizes the technical characteristics, advantages, and shortcomings of the representative algorithms in each class. Next, it summarizes the large-scale public datasets and performance evaluation metrics commonly used for image semantic segmentation and, on that basis, compares the experimental results of classic semantic segmentation methods. Finally, it discusses feasible directions for future research in semantic segmentation.

Key words: computer vision, image semantic segmentation, deep neural network

XU Hui, ZHU Yuhua, ZHEN Tong, LI Zhihui. Survey of Image Semantic Segmentation Methods Based on Deep Neural Network[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(1): 47-59.

