Adaptive Attention Mechanism Fusion for Real-Time Semantic Segmentation in Complex Scenes

CHEN Dan, LIU Le, WANG Chenhao, BAI Xiru, WANG Zichen

Citation: CHEN Dan, LIU Le, WANG Chenhao, BAI Xiru, WANG Zichen. Adaptive Attention Mechanism Fusion for Real-Time Semantic Segmentation in Complex Scenes[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT231338

doi: 10.11999/JEIT231338
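Funds: Yulin Science and Technology Bureau Program (No. 2019-146), Xi'an Qinchuangyuan Key Industrial Chain Core Technology Program (No. 23ZDCYJSGG0021-2023)
Details
    About the authors:

    CHEN Dan: female, associate professor; research interests: intelligent information processing and wireless laser communication

    LIU Le: female, master's student; research interests: artificial intelligence, computer vision, and semantic segmentation

    WANG Chenhao: male, lecturer; research interests: intelligent information processing

    BAI Xiru: female, bachelor's degree; research interests: intelligent information processing

    WANG Zichen: male, master's student; research interests: machine vision and intelligent navigation

    Corresponding author:

    CHEN Dan chdh@xaut.edu.cn

  • CLC number: TN911.7; TP391.4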

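  • Abstract: Achieving high accuracy at low computational cost is a major challenge for real-time semantic segmentation with Convolutional Neural Networks (CNNs). Complex urban street scenes contain many object categories and large illumination changes. To address these characteristics, this paper designs an efficient Adaptive Attention mechanism Fusion Network (AAFNet) for real-time semantic segmentation. The network extracts spatial detail and semantic information from the image separately, then passes both through a Feature Fusion Network (FFN) to obtain an accurate semantic image. AAFNet uses Dilated Depthwise separable convolution (DDW) to enlarge the receptive field of semantic feature extraction. It also proposes an Adaptive Attention mechanism Fusion Module (AAFM), built from Adaptive average pooling (Avp) and Adaptive max pooling (Amp), which refines the segmentation of object edges and reduces the miss rate for small objects. Semantic segmentation experiments were carried out on the Cityscapes and CamVid datasets of complex urban street scenes. AAFNet reaches a mean Intersection over Union (mIoU) of 73.0% at an inference speed of 32 frames/s on Cityscapes and 69.8% at 52 frames/s on CamVid. Compared with the Dilated Spatial Attention Network (DSANet), the Multi-Scale Context Fusion Network (MSCFNet), and the Lightweight Bilateral Asymmetric Residual Network (LBARNet), AAFNet achieves the highest mIoU.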
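
To give a rough sense of why depthwise separable convolution lowers the computational burden, the sketch below counts the weights of a standard 3×3 convolution and of its depthwise separable counterpart. The 128-channel width matches the widest layers listed in the structure tables of this paper. The function names are illustrative, bias terms are ignored, and this is not the authors' DDW implementation. Dilation is omitted because it spreads the kernel taps apart without changing the number of weights.

```python
def standard_conv_weights(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_weights(c_in, c_out, k=3):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


standard = standard_conv_weights(128, 128)
separable = depthwise_separable_weights(128, 128)
print(standard, separable, round(standard / separable, 1))
```

For a 128-to-128-channel layer the separable form needs roughly one eighth of the weights, which is the usual motivation for using it in lightweight segmentation networks.
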
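  • Figure 1  AAFNet architecture

    Figure 2  AAFM architecture

    Figure 3  Comparison of semantic segmentation results of AAFNet and other networks on the CamVid dataset

    Figure 4  Comparison of semantic segmentation results of AAFNet and other networks on the Cityscapes dataset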

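    Table 1  Structural parameters of the spatial feature extraction network

    Stage    Kernel  Stride  Input channels  Output channels
    Stage 1  3×3     2       3               32
             3×3     1       32              32
    Stage 2  3×3     2       32              64
             3×3     1       64              64
    Stage 3  3×3     2       64              128
             3×3     1       128             128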
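
The layer list of the spatial feature extraction network can be traced to see how the feature map shrinks. The sketch below is only an illustration under two assumptions that the table does not state: each 3×3 convolution uses a padding of 1, and the input is 512×1024 (the input size reported for Cityscapes in this paper). The helper names are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output length of a convolution along one axis (padding=1 is an assumption)."""
    return (size + 2 * padding - kernel) // stride + 1


# (stride, input channels, output channels) for the six 3x3 layers in the table
SPATIAL_LAYERS = [
    (2, 3, 32), (1, 32, 32),
    (2, 32, 64), (1, 64, 64),
    (2, 64, 128), (1, 128, 128),
]


def trace(height, width):
    """Return the (channels, height, width) shape after every layer."""
    shapes = []
    for stride, _c_in, c_out in SPATIAL_LAYERS:
        height = conv_out(height, stride=stride)
        width = conv_out(width, stride=stride)
        shapes.append((c_out, height, width))
    return shapes


for shape in trace(512, 1024):
    print(shape)
```

Under these assumptions the three stride-2 layers reduce the resolution to 1/8 of the input, so the branch ends with a 128-channel map of size 64×128.
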

    Table 2  Structural parameters of the semantic feature extraction network

    Step    Operation        Dilation rate  Input channels  Output channels
    Step 1  3×3 convolution                 3               32
            3×3 convolution                 32              32
            3×3 convolution                 32              32
    Step 2  Downsampling                    32              64
            3×FDSS-nbt                      64              64
    Step 3  Downsampling                    64              128
            FDSS-nbt         1              128             128
            FDSS-nbt         3              128             128
            FDSS-nbt         6              128             128
            FDSS-nbt         12             128             128
    Step 4  FDSS-nbt         3              128             128
            FDSS-nbt         6              128             128
            FDSS-nbt         12             128             128
            FDSS-nbt         24             128             128
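
The dilation rates in steps 3 and 4 determine how far each kernel reaches. A k×k kernel with dilation d covers k + (k − 1)(d − 1) positions along each axis, so a 3×3 kernel with dilation 24 spans 49 positions. The sketch below computes this for the listed rates. The stacked receptive field is a simplification: it assumes one dilated 3×3 convolution per FDSS-nbt block at stride 1, which is not stated in this section, so the real blocks may reach further.

```python
def effective_kernel(kernel, dilation):
    """Extent of a dilated kernel along one axis."""
    return kernel + (kernel - 1) * (dilation - 1)


def stacked_receptive_field(dilations, kernel=3):
    """Receptive field of stride-1 dilated convolutions applied in sequence.

    Assumes a single dilated convolution per block (a simplification).
    """
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf


STEP3 = (1, 3, 6, 12)   # dilation rates of step 3
STEP4 = (3, 6, 12, 24)  # dilation rates of step 4

print([effective_kernel(3, d) for d in STEP3 + STEP4])
print(stacked_receptive_field(STEP3 + STEP4))
```
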

    Table 3  AAFM ablation results on the CamVid dataset

    Configuration  SFEN  SeFEN  AAFM  mIoU (%)  Parameters  fps  GFLOPs
    Baseline                    ×     67.9      2.2M        73   9.29
    AAFNet                            69.8      2.4M        52   9.98

    Table 4  AAFM ablation results on the Cityscapes dataset

    Configuration  SFEN  SeFEN  AAFM  mIoU (%)  Parameters  fps  GFLOPs
    Baseline                    ×     69.7      2.2M        52   28.25
    AAFNet                            73.0      2.4M        32   30.34
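
The two ablation tables can be summarized as a cost-benefit figure for AAFM. The sketch below only redoes the arithmetic on the reported numbers (mIoU, fps, GFLOPs for the baseline and for the full AAFNet). The dictionary layout and the function name are illustrative.

```python
# (mIoU %, fps, GFLOPs) as reported in the two ablation tables
ABLATION = {
    "CamVid": {"baseline": (67.9, 73, 9.29), "aafnet": (69.8, 52, 9.98)},
    "Cityscapes": {"baseline": (69.7, 52, 28.25), "aafnet": (73.0, 32, 30.34)},
}


def aafm_cost_benefit(dataset):
    """mIoU gain, remaining fraction of fps, and extra GFLOPs after adding AAFM."""
    base_miou, base_fps, base_gflops = ABLATION[dataset]["baseline"]
    full_miou, full_fps, full_gflops = ABLATION[dataset]["aafnet"]
    return {
        "miou_gain": round(full_miou - base_miou, 1),
        "fps_ratio": round(full_fps / base_fps, 2),
        "extra_gflops": round(full_gflops - base_gflops, 2),
    }


for name in ABLATION:
    print(name, aafm_cost_benefit(name))
```

On these numbers AAFM adds 1.9 mIoU points on CamVid and 3.3 on Cityscapes, while the frame rate drops to about 71% and 62% of the baseline respectively.
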

    Table 5  Per-class IoU and mIoU of each network on the CamVid dataset (%)

    Network       sky   fence  pole  road  sign  tree
    ENet          91.0  21.7   25.6  91.9  28.5  67.9
    ERFNet        91.7  36.4   35.9  93.8  41.1  72.9
    CGNet         91.1  36.3   28.4  94.3  42.9  72.2
    ESNet         91.3  39.2   36.6  94.5  42.2  71.8
    DSANet        91.5  45.8   34.5  94.5  47.2  76.4
    RGPNet[21]    91.2  51.4   46.8  90.6  62.5  77.3
    PCNet         91.4  40.2   34.0  95.1  44.8  74.5
    AAFNet        91.6  43.7   34.9  95.2  50.6  76.0

    Network       sidewalk  building  car   pedestrian  bicyclist  mIoU
    ENet          75.0      72.3      76.1  37.7        41.1       57.2
    ERFNet        79.8      79.9      82.3  54.7        58.3       66.1
    CGNet         80.8      78.0      79.9  55.2        55.1       64.9
    ESNet         81.8      78.5      80.6  55.1        56.0       66.1
    DSANet        80.9      82.9      84.2  60.4        64.6       69.3
    RGPNet[21]    70.7      85.8      87.0  67.6        67.2       69.2
    PCNet         81.7      82.3      80.5  56.8        55.3       67.0
    AAFNet        82.8      83.2      85.2  60.1        64.1       69.8
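
As a consistency check, the mIoU column can be recomputed from the per-class values, assuming mIoU here is the unweighted mean over the 11 listed CamVid classes. The sketch below does this for the AAFNet row. The variable names are illustrative.

```python
# Per-class IoU (%) of AAFNet on CamVid, copied from the table
AAFNET_CAMVID = {
    "sky": 91.6, "fence": 43.7, "pole": 34.9, "road": 95.2,
    "sign": 50.6, "tree": 76.0, "sidewalk": 82.8, "building": 83.2,
    "car": 85.2, "pedestrian": 60.1, "bicyclist": 64.1,
}


def mean_iou(per_class):
    """Unweighted mean of the per-class IoU values."""
    return sum(per_class.values()) / len(per_class)


print(round(mean_iou(AAFNET_CAMVID), 1))
```

The result rounds to the 69.8% reported for AAFNet, which supports reading the mIoU column as a plain average over classes.
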

    Table 6  Per-class IoU and mIoU of each network on the Cityscapes dataset (%)

    Network       road  sidewalk  building  wall  fence  pole  traffic light  traffic sign  vegetation  terrain
    ERFNet        97.9  82.1      90.7      45.2  50.4   59.0  62.6           68.4          91.9        69.4
    CGNet         95.5  78.7      88.1      40.0  43.0   54.1  59.8           63.9          89.6        67.6
    ESNet         98.1  80.4      92.4      48.3  49.2   61.5  62.5           72.3          92.5        61.5
    AGLNet[22]    97.8  80.1      91.0      51.3  50.6   58.3  63.0           68.5          92.3        71.3
    DSANet        96.8  78.5      91.2      50.5  50.8   59.4  64.0           71.7          92.6        70.0
    MSCFNet       97.7  82.8      91.0      49.0  52.5   61.2  67.1           71.4          92.3        70.2
    MRFDCNet[23]  98.2  83.7      91.5      50    53.9   59.6  66.6           70.7          92.4        70.4
    PCNet         98.3  84.4      91.4      48.4  52.6   57.1  63.8           69.7          92.3        70.0
    AAFNet        98.5  87.9      91.9      43.8  56.6   64.0  68.4           76.1          92.6        62.2

    Network       sky   person  rider  car   truck  bus   train  motorcycle  bicycle  mIoU
    ERFNet        94.2  78.5    59.8   93.4  52.3   60.8  53.7   49.9        64.2     69.7
    CGNet         92.9  77.9    54.9   90.2  44.1   59.5  25.2   47.3        60.2     64.8
    ESNet         94.4  76.6    53.2   94.4  62.5   74.3  52.4   45.5        71.4     70.7
    AGLNet[22]    94.2  80.1    59.6   93.8  48.4   68.1  42.1   52.4        67.8     70.1
    DSANet        94.5  81.8    61.9   92.9  56.1   75.6  50.6   50.9        66.8     71.4
    MSCFNet       94.3  82.7    62.7   94.1  50.9   66.1  51.9   57.6        70.2     71.9
    MRFDCNet[23]  94.7  82      63.0   94.4  55.5   70.6  57.3   59          70.4     72.8
    PCNet         94.6  80.6    61.5   94.5  61.2   73.9  63.2   57.3        69.3     72.9
    AAFNet        94.0  82.9    58.3   94.2  62.4   73.9  48.6   57.2        73.8     73.0
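
For readers unfamiliar with the metric, per-class IoU is TP / (TP + FP + FN), computed from a pixel-level confusion matrix, and mIoU is its mean over classes. The sketch below shows the computation on a tiny made-up label sequence with three classes. The data and function names are purely illustrative and have no connection to the Cityscapes results above.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts pixels with true class t that were predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_iou(cm):
    """IoU = TP / (TP + FP + FN) for each class."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # class-c pixels that were missed
        fp = sum(cm[r][c] for r in range(n)) - tp     # pixels wrongly predicted as c
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


# Toy example: ten "pixels", three classes
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]

cm = confusion_matrix(y_true, y_pred, 3)
ious = per_class_iou(cm)
miou = sum(ious) / len(ious)
print(ious, round(miou, 3))
```
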

    Table 7  Parameters, inference speed, and computational complexity of each network on the CamVid dataset

    Network     ENet   ERFNet  CGNet  ESNet  DSANet  SegNet  LBARNet  AAFNet
    Parameters  0.36M  2.07M   0.49M  1.66M  3.12M   29.45M  0.6M     2.42M
    fps         75     71      64     71     42      17      72       52
    GFLOPs      1.39   8.83    2.3    8.0    12.51   29.21   -        9.98
    mIoU (%)    57.2   66.1    64.9   66.1   69.3    60.1    67.2     69.8

    Table 8  Parameters, inference speed, and computational complexity of each network on the Cityscapes dataset

    Network     Input size  GFLOPs  Parameters (M)  fps  mIoU (%)  GPU
    ENet        512×1024    4.35    0.4             44   58.3      TITAN Xp
    ERFNet      512×1024    28.86   2.1             32   69.7      TITAN Xp
    CGNet       512×1024    7.01    0.5             35   64.8      TITAN Xp
    ESNet       512×1024    24.35   1.06            54   70.7      TITAN Xp
    PSPNet[24]  512×1024    514.56  65.47           7    80.2      TITAN Xp
    SegNet      512×1024    326.26  29.45           18   -         TITAN Xp
    DABNet[25]  512×1024    10.46   0.76            84   70.1      TITAN Xp
    AGLNet      512×1024    13.88   1.12            43   70.1      TITAN Xp
    DSANet      512×1024    38.0    3.12            33   71.4      TITAN Xp
    ICNet       512×1024    40.37   12.87           30   70.6      TITAN Xp
    RGPNet      1024×2048   -       17.8            38   -         RTX 2080Ti
    MRFDCNet    512×1024    14.04   1.07            47   72.8      TITAN Xp
    BAFF        512×1024    15.96   0.82            -    68.0      TITAN Xp
    MHANet[26]  512×1024    14.25   5.42            203  71.9      RTX 2080Ti
    LBARNet     512×1024    -       0.6             72   70.8      GTX 3090
    AAFNet      512×1024    30.34   2.42            32   73.0      TITAN Xp
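
One way to read the speed and accuracy columns together is to list the networks that no other network beats on both fps and mIoU (the Pareto front). The sketch below does this for the rows measured on a TITAN Xp that report both values, since frame rates from different GPUs are not directly comparable. The selection rule and the names are illustrative and are not part of the paper's analysis.

```python
# (network, fps, mIoU %) for TITAN Xp rows that report both values
TITAN_XP_ROWS = [
    ("ENet", 44, 58.3), ("ERFNet", 32, 69.7), ("CGNet", 35, 64.8),
    ("ESNet", 54, 70.7), ("PSPNet", 7, 80.2), ("DABNet", 84, 70.1),
    ("AGLNet", 43, 70.1), ("DSANet", 33, 71.4), ("ICNet", 30, 70.6),
    ("MRFDCNet", 47, 72.8), ("AAFNet", 32, 73.0),
]


def dominates(a, b):
    """True if a is at least as good as b on fps and mIoU, and strictly better on one."""
    return a[1] >= b[1] and a[2] >= b[2] and (a[1] > b[1] or a[2] > b[2])


def pareto_front(rows):
    """Rows that no other row dominates, sorted by increasing fps."""
    front = [r for r in rows if not any(dominates(o, r) for o in rows)]
    return sorted(front, key=lambda r: r[1])


for name, fps, miou in pareto_front(TITAN_XP_ROWS):
    print(f"{name:10s} {fps:4d} fps  {miou:.1f} mIoU")
```

Under this rule AAFNet is on the front: among the TITAN Xp entries, only PSPNet reports a higher mIoU, and it runs at 7 fps.
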
  • [1] HAO Shijie, ZHOU Yuan, and GUO Yanrong. A brief survey on semantic segmentation with deep learning[J]. Neurocomputing, 2020, 406: 302–321. doi: 10.1016/j.neucom.2019.11.118.
    [2] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: Semantic image segmentation with deep convolutional nets, Atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834–848. doi: 10.1109/TPAMI.2016.2644615.
    [3] BADRINARAYANAN V, KENDALL A, and CIPOLLA R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481–2495. doi: 10.17863/CAM.17966.
    [4] CHEN Wenlin, WILSON J T, TYREE S, et al. Compressing neural networks with the hashing trick[C]. The 32nd International Conference on Machine Learning, Lille, France, 2015: 2285–2294. doi: 10.5555/3045118.3045361.
    [5] HAN Song, MAO Huizi, and DALLY W J. Deep compression: Compressing deep neural network with pruning, trained quantization and Huffman coding[C]. 4th International Conference on Learning Representations, San Juan, Puerto Rico, 2016: 3–7.
    [6] WU Jiaxiang, LENG Cong, WANG Yuhang, et al. Quantized convolutional neural networks for mobile devices[C]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 4820–4828. doi: 10.1109/CVPR.2016.521.
    [7] ROMERA E, ALVAREZ J M, BERGASA L M, et al. ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation[J]. IEEE Transactions on Intelligent Transportation Systems, 2018, 19(1): 263–272. doi: 10.1109/TITS.2017.2750080.
    [8] WANG Yu, ZHOU Quan, XIONG Jian, et al. ESNet: An efficient symmetric network for real-time semantic segmentation[C]. Second Chinese Conference on Pattern Recognition and Computer Vision, Xi’an, China, 2019: 41–52. doi: 10.1007/978-3-030-31723-2_4.
    [9] GAO Guangwei, XU Guoan, YU Yi, et al. MSCFNet: A lightweight network with multi-scale context fusion for real-time semantic segmentation[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(12): 25489–25499. doi: 10.1109/TITS.2021.3098355.
    [10] PASZKE A, CHAURASIA A, KIM S, et al. ENet: A deep neural network architecture for real-time semantic segmentation[EB/OL]. https://arxiv.org/pdf/1606.02147.pdf, 2016.
    [11] ZHAO Hengshuang, QI Xiaojuan, SHEN Xiaoyong, et al. ICNet for real-time semantic segmentation on high-resolution images[C]. 15th European Conference on Computer Vision, Munich, Germany, 2018: 418–434. doi: 10.1007/978-3-030-01219-9_25.
    [12] WU Tianyi, TANG Sheng, ZHANG Rui, et al. CGNet: A light-weight context guided network for semantic segmentation[J]. IEEE Transactions on Image Processing, 2021, 30: 1169–1179. doi: 10.1109/TIP.2020.3042065.
    [13] LV Qingxuan, SUN Xin, CHEN Changrui, et al. Parallel complement network for real-time semantic segmentation of road scenes[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(5): 4432–4444. doi: 10.1109/TITS.2020.3044672.
    [14] HUANG Tinghong, NIE Zhuoyun, WANG Qingguo, et al. Real-time image semantic segmentation based on block adaptive feature fusion[J]. Acta Automatica Sinica, 2021, 47(5): 1137–1148. doi: 10.16383/j.aas.c180645.
    [15] HU Xuegang and ZHOU Baoman. LBARNet: Lightweight bilateral asymmetric residual network for real-time semantic segmentation[J]. Computers & Graphics, 2023, 116: 1–12. doi: 10.1016/j.cag.2023.07.039.
    [16] ZHAO Hengshuang, ZHANG Yi, LIU Shu, et al. PSANet: Point-wise spatial attention network for scene parsing[C]. 15th European Conference on Computer Vision, Munich, Germany, 2018: 270–286. doi: 10.1007/978-3-030-01240-3_17.
    [17] FU Jun, LIU Jing, TIAN Haijie, et al. Dual attention network for scene segmentation[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Los Alamitos, USA, 2019: 3141–3149. doi: 10.1109/CVPR.2019.00326.
    [18] ELHASSAN M A M, HUANG Chenxi, YANG Chenhui, et al. DSANet: Dilated spatial attention for real-time semantic segmentation in urban street scenes[J]. Expert Systems with Applications, 2021, 183: 115090. doi: 10.1016/j.eswa.2021.115090.
    [19] WANG Nan, HOU Zhiqiang, PU Lei, et al. Real-time semantic segmentation analysis based on cavity separable convolution and attention mechanism[J]. Journal of Image and Graphics, 2022, 27(4): 1216–1225. doi: 10.11834/jig.200729.
    [20] GAO Xiang, LI Chungeng, and AN Jubai. Real-time image semantic segmentation based on attention mechanism and multi-label classification[J]. Journal of Computer-Aided Design & Computer Graphics, 2021, 33(1): 59–67. doi: 10.3724/SP.J.1089.2021.18233.
    [21] ARANI E, MARZBAN S, PATA A, et al. RGPNet: A real-time general purpose semantic segmentation[C]. IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, USA, 2021: 3008–3017. doi: 10.1109/WACV48630.2021.00305.
    [22] ZHOU Quan, WANG Yu, FAN Yawen, et al. AGLNet: Towards real-time semantic segmentation of self-driving images via attention-guided lightweight network[J]. Applied Soft Computing, 2020, 96: 106682. doi: 10.1016/j.asoc.2020.106682.
    [23] WANG Xiaotian and CAO Weiqun. MRFDCNet: Multireceptive field dense connection network for real-time semantic segmentation[J]. Mobile Information Systems, 2022, 2022: 6100292. doi: 10.1155/2022/6100292.
    [24] ZHAO Hengshuang, SHI Jianping, QI Xiaojuan, et al. Pyramid scene parsing network[C]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, 2017: 6230–6239. doi: 10.1109/CVPR.2017.660.
    [25] LI Gen, JIANG Shenlu, YUN I, et al. Depth-wise asymmetric bottleneck with point-wise aggregation decoder for real-time semantic segmentation in urban scenes[J]. IEEE Access, 2020, 8: 27495–27506. doi: 10.1109/ACCESS.2020.2971760.
    [26] WANG Xizhong, LIU Rui, DONG Jing, et al. Lightweight real-time image semantic segmentation network based on multi-resolution hybrid attention mechanism[J]. Wireless Communications and Mobile Computing, 2022, 2022: 3215083. doi: 10.1155/2022/3215083.
Publication history
  • Received:  2023-12-04
  • Revised:  2024-02-23
  • Published online:  2024-03-04
