Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (10): 2377-2386. DOI: 10.3778/j.issn.1673-9418.2203015

• Graphics and Image •

Rapid and Ultra-lightweight Semantic Segmentation in Urban Traffic Scene

SHI Min, SHEN Jialin, YI Qingming, LUO Aiwen+

  1. College of Information Science and Technology, Jinan University, Guangzhou 510632, China
  • Received: 2022-02-11 Revised: 2022-04-15 Online: 2022-10-01 Published: 2022-10-14
  • About author: SHI Min, born in 1977, Ph.D., associate professor. Her research interests include image multimedia processing, video codec, etc.
    SHEN Jialin, born in 1997, M.S. candidate. His research interests include machine vision, deep learning, etc.
    YI Qingming, born in 1965, Ph.D., professor. Her research interest is multimedia information processing.
    LUO Aiwen, born in 1986, Ph.D., lecturer, member of CCF. Her research interests include edge machine vision and intelligent IC design.
  • Supported by:
    National Natural Science Foundation of China (62002134); Basic and Applied Research Foundation of Guangdong Province (2020A1515110645); Key Laboratory Project of Guangdong Province (2021KSY001); Innovation Leading Talent Project of Guangzhou (2019019); Fundamental Research Funds for the Central Universities at Jinan University (21620353); Project of JNU-Techtotop Joint Postgraduates Training Base (82621176)

Fast Ultra-lightweight Semantic Segmentation of Urban Traffic Scenes

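SHI Min, SHEN Jialin, YI Qingming, LUO Aiwen+

  1. College of Information Science and Technology, Jinan University, Guangzhou 510632
  • Corresponding author: + E-mail: luoaiwen@jnu.edu.cn
  • About author: SHI Min (born 1977), female, from Xiangfan, Hubei, Ph.D., associate professor. Her main research interests include image multimedia processing, video codec, etc.
    SHEN Jialin (born 1997), male, from Yunfu, Guangdong, M.S. candidate. His main research interests include machine vision, deep learning, etc.
    YI Qingming (born 1965), female, from Yueyang, Hunan, Ph.D., professor. Her main research interest is multimedia information processing.
    LUO Aiwen (born 1986), female, from Guangzhou, Guangdong, Ph.D., lecturer, member of CCF. Her main research interests include edge machine vision and intelligent IC design.
  • Supported by:
    National Natural Science Foundation of China (62002134); Basic and Applied Basic Research Foundation of Guangdong Province (2020A1515110645); Key Laboratory Project of Guangdong Province (2021KSY001); Innovation Leading Talent Project of Guangzhou (2019019); Fundamental Research Funds for the Central Universities at Jinan University (21620353); Project of JNU-Techtotop Joint Postgraduates Training Base (82621176)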

Abstract:

With the rapid development of autonomous driving, a growing number of researchers are exploring lightweight image semantic segmentation networks and applying them to road traffic scenes. However, existing semantic segmentation networks are usually difficult to deploy on edge devices with limited hardware resources because of their large number of parameters. To address this problem, this paper proposes a rapid and ultra-lightweight dual attention lightweight network (DALNet), composed of a channel attention bottleneck backbone (CABb) network and a spatial attention decoder (SAD) module, which performs well in extracting both the contextual semantic information and the spatial information of the image. The CABb network consists mainly of channel attention bottleneck (CABt) modules. CABt employs a split strategy to separate the feature channels and process multi-scale feature maps in parallel, and introduces a channel attention mechanism for channel fusion and multi-scale semantic information extraction. The SAD module adopts a spatial attention mechanism to guide the decoder as it upsamples the feature maps by bilinear interpolation and recovers the edge and detail information of the segmentation targets. Experimental results show that DALNet has only 0.48 million parameters and achieves 74.1% and 70.1% mean intersection over union (mIoU) on the popular urban traffic datasets Cityscapes and CamVid, respectively. At an input resolution of 512×1024, DALNet reaches an inference speed of 74 frame/s on a GTX 1080Ti GPU, which adequately meets the speed requirement of real-time semantic segmentation.
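
The mIoU figures quoted above are per-class intersection-over-union scores averaged over classes. The following is a minimal sketch of that metric, not the authors' evaluation code. The function name `mean_iou`, the flattened label lists, and the `ignore_label=255` default are assumptions made for illustration; benchmark toolkits normally accumulate the counts over the whole test set before averaging.

```python
def mean_iou(pred, target, num_classes, ignore_label=255):
    """Mean IoU over the classes that appear in pred or target.

    pred, target: flat sequences of integer class labels of equal length.
    Predictions are assumed to lie in [0, num_classes).
    ignore_label: target value excluded from scoring (assumed default).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # unlabeled pixel, not scored
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # a mismatch enlarges the union of both classes involved
            union[p] += 1
            union[t] += 1
    # skip classes that never occur so they do not distort the mean
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    predicted = [0, 0, 1, 1, 2, 0]
    labelled = [0, 1, 1, 1, 2, 255]
    print(mean_iou(predicted, labelled, num_classes=3))
```

In this toy call the last pixel is ignored, and the three classes score 1/2, 2/3, and 1, which are then averaged.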

Key words: rapid semantic segmentation, lightweight network, channel attention, spatial attention, urban traffic

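Abstract:

In recent years, with the rapid growth of autonomous driving, more and more researchers have begun to explore lightweight image semantic segmentation networks and apply them to road traffic scenes. Existing semantic segmentation networks, however, are usually difficult to deploy on edge devices with limited hardware resources because of their large parameter counts. To address this problem, a dual attention lightweight network (DALNet) is designed, consisting of a channel attention backbone network (CABb) and a spatial attention decoder (SAD) module. Combining a "channel-spatial" dual attention mechanism, DALNet performs well both in extracting the contextual semantic information of an image and in recovering its spatial information. CABb is mainly composed of channel attention bottleneck (CABt) modules; the CABt module uses a split strategy to separate the feature channels and process multi-scale feature maps in parallel, and introduces a channel attention mechanism to fuse channels and extract multi-scale semantic information. The SAD module uses a spatial attention mechanism to guide the decoder in bilinear-interpolation upsampling, recovering the edges and details of the segmentation targets. Experimental results show that, with only 0.48 million parameters, DALNet reaches up to 74.1% and 70.1% mean intersection over union (mIoU) on the urban traffic datasets Cityscapes and CamVid, respectively. With an input resolution of 512×1024, DALNet achieves a forward inference speed of 74 frame/s on a GTX 1080Ti GPU, far exceeding the speed required for real-time semantic segmentation.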
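
The abstract does not specify the internals of the CABt channel attention, so the sketch below only illustrates the general idea of channel attention. It uses a simplified squeeze-and-excitation style gate as a stand-in: each channel is pooled to a scalar and rescaled by a sigmoid of that scalar. The function name `channel_attention` and the omission of learned layers between pooling and the sigmoid are assumptions, not the paper's design.

```python
import math


def channel_attention(feature_maps):
    """Rescale each channel by a sigmoid gate of its global average.

    feature_maps: list of channels, each a 2-D list (H x W) of floats.
    A trained module would pass the pooled values through learned layers
    before the sigmoid; that step is left out of this sketch (assumption).
    """
    gated = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled = sum(values) / len(values)      # squeeze: global average pooling
        gate = 1.0 / (1.0 + math.exp(-pooled))  # gate in (0, 1)
        gated.append([[v * gate for v in row] for row in channel])
    return gated


if __name__ == "__main__":
    features = [
        [[1.0, 1.0], [1.0, 1.0]],    # strongly activated channel
        [[2.0, -2.0], [0.0, 0.0]],   # channel whose average is zero
    ]
    for channel in channel_attention(features):
        print(channel)
```

Channels with a larger average activation receive a gate closer to 1 and are preserved, while weaker channels are attenuated. This is the sense in which attention weights the fused channels.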

Key words: fast semantic segmentation, lightweight network, channel attention, spatial attention, urban traffic

CLC Number: